[Feature Request] Add metadata updating for arXiv articles that have since been published in journals #57

Closed
swenqing opened this issue Aug 22, 2023 · 3 comments
Labels: enhancement (New feature or request)

Comments

@swenqing

Describe the feature

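Some arXiv articles saved in the past may, over time, have been published in a journal. I would like a feature that directly updates the item metadata for such published articles. Thank you.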

Additional context

Please add this feature to make citing the journal version easier.

swenqing added the enhancement label on Aug 22, 2023
@northword
Owner

northword commented Aug 22, 2023

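Preprints are rarely used in my field, so I am not very familiar with how they work. As far as I know, there is no centralized database or service that tracks whether a preprint has been formally published. The best approach I can think of so far is an exact title match against a search engine or database, but if the title was changed on the way to formal publication, the match may fail. Do you have a better idea?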
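
The exact-title matching idea can be sketched roughly as follows. This is only an illustration under assumptions: `normalizeTitle` and `titlesMatch` are hypothetical helpers (not part of this plugin), and the ASCII-only normalization is a simplification. It tolerates differences in case, punctuation, and line wrapping, but it still fails if the title was reworded before publication.

```typescript
// Hypothetical helpers for comparing a preprint title with a candidate
// title returned by a search engine or database.

// Lowercase the title, replace every run of characters other than ASCII
// letters and digits with a single space, and trim the ends. Hard line
// wraps inside a title are collapsed as a side effect.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Two titles "match" when their normalized forms are identical.
function titlesMatch(a: string, b: string): boolean {
  return normalizeTitle(a) === normalizeTitle(b);
}
```

Because non-ASCII letters are dropped on both sides, accented characters do not break the comparison, at the cost of a slightly looser match.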

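In addition, the official metadata update feature is already finished but has never been merged. Since completing it, the official team has been busy with the PDF reader (Z6), the EPUB reader, and the Android version, so it keeps being postponed.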

Updated (2023-08-22 21:55): This feature will be supported in a future version.

@northword
Owner

northword commented Aug 22, 2023

Some research notes related to this issue:

Getting the associated DOI from arXiv

https://github.com/Future-Scholars/paperlib/blob/8b19e83f3880e80e8b5a0eb99ac72ca461777323/app/repositories/scraper-repository/scrapers/arxiv.ts#L32-L44

  static preProcess(paperEntityDraft: PaperEntity): ScraperRequestType {
    const arxivID = formatString({
      str: paperEntityDraft.arxiv,
      removeStr: "arXiv:",
    });
    const scrapeURL = `https://export.arxiv.org/api/query?id_list=${arxivID}`;

    const headers = {
      "accept-encoding": "UTF-32BE",
    };

    return { scrapeURL, headers };
  }

arXiv API documentation: https://info.arxiv.org/help/api/user-manual.html

Formally published: https://arxiv.org/abs/2209.06949

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://arxiv.org/api/query?search_query%3D%26id_list%3D2209.06949%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>
  <title type="html">ArXiv Query: search_query=&amp;id_list=2209.06949&amp;start=0&amp;max_results=10</title>
  <id>http://arxiv.org/api/p77U4SDjpVvjSZX7r6obC5nLN6Q</id>
  <updated>2023-08-22T00:00:00-04:00</updated>
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">10</opensearch:itemsPerPage>
  <entry>
    <id>http://arxiv.org/abs/2209.06949v1</id>
    <updated>2022-09-14T21:51:10Z</updated>
    <published>2022-09-14T21:51:10Z</published>
    <title>Absence of a pressure gap and atomistic mechanism of the oxidation of
  pure Co nanoparticles</title>
    <summary>  We present a detailed atomistic picture of the oxidation mechanism of Co
nanoparticles and its impact on magnetism by experimentally following the
evolution of the structure, chemical composition, and magnetism of individual,
gas-phase grown Co nanoparticles during controlled oxidation. The early stage
oxidation occurs in a twostep process characterized by (i) the initial
formation of small CoO crystallites randomly distributed across the
nanoparticle surface, until their coalescence leads to structural completion of
the oxide shell and passivation of the metallic core; (ii) progressive
conversion of the CoO shell to Co3O4, accompanied by void formation due to the
nanoscale Kirkendall effect. The Co nanoparticles remain highly reactive toward
oxygen during phase (i), demonstrating the absence of a pressure gap whereby a
low reactivity at low pressures is postulated. Our results provide an important
benchmark for an improved understanding of the magnetism of oxidized cobalt
nanoparticles, with potential implications on their performance in catalytic
reactions.
</summary>
    <author>
      <name>Jaianth Vijayakumar</name>
    </author>
    <author>
      <name>Tatiana M. Savchenko</name>
    </author>
    <author>
      <name>David M. Bracher</name>
    </author>
    <author>
      <name>Gunnar Lumbeeck</name>
    </author>
    <author>
      <name>Armand Béché</name>
    </author>
    <author>
      <name>Jo Verbeeck</name>
    </author>
    <author>
      <name>Štefan Vajda</name>
    </author>
    <author>
      <name>Frithjof Nolting</name>
    </author>
    <author>
      <name>C. A. F. Vaz</name>
    </author>
    <author>
      <name>Armin Kleibert</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1038/s41467-023-35846-0</arxiv:doi>
    <link title="doi" href="http://dx.doi.org/10.1038/s41467-023-35846-0" rel="related"/>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">34 pages, 5 figures</arxiv:comment>
    <link href="http://arxiv.org/abs/2209.06949v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/2209.06949v1" rel="related" type="application/pdf"/>
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cond-mat.mtrl-sci" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cond-mat.mtrl-sci" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cond-mat.mes-hall" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>

Not formally published (probably?): https://arxiv.org/abs/2306.03514

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://arxiv.org/api/query?search_query%3D%26id_list%3D2306.03514%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>
  <title type="html">ArXiv Query: search_query=&amp;id_list=2306.03514&amp;start=0&amp;max_results=10</title>
  <id>http://arxiv.org/api/8SR2eakAqey+GcYsD8pz9ybrjS4</id>
  <updated>2023-08-22T00:00:00-04:00</updated>
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">10</opensearch:itemsPerPage>
  <entry>
    <id>http://arxiv.org/abs/2306.03514v3</id>
    <updated>2023-06-09T15:21:06Z</updated>
    <published>2023-06-06T09:00:10Z</published>
    <title>Recognize Anything: A Strong Image Tagging Model</title>
    <summary>  We present the Recognize Anything Model (RAM): a strong foundation model for
image tagging. RAM makes a substantial step for large models in computer
vision, demonstrating the zero-shot ability to recognize any common category
with high accuracy. RAM introduces a new paradigm for image tagging, leveraging
large-scale image-text pairs for training instead of manual annotations.
  The development of RAM comprises four key steps. Firstly, annotation-free
image tags are obtained at scale through automatic text semantic parsing.
Subsequently, a preliminary model is trained for automatic annotation by
unifying the caption and tagging tasks, supervised by the original texts and
parsed tags, respectively. Thirdly, a data engine is employed to generate
additional annotations and clean incorrect ones. Lastly, the model is retrained
with the processed data and fine-tuned using a smaller but higher-quality
dataset.
  We evaluate the tagging capabilities of RAM on numerous benchmarks and
observe impressive zero-shot performance, significantly outperforming CLIP and
BLIP. Remarkably, RAM even surpasses the fully supervised manners and exhibits
competitive performance with the Google tagging API. We are releasing the RAM
at \url{https://recognize-anything.github.io/} to foster the advancements of
large models in computer vision.
</summary>
    <author>
      <name>Youcai Zhang</name>
    </author>
    <author>
      <name>Xinyu Huang</name>
    </author>
    <author>
      <name>Jinyu Ma</name>
    </author>
    <author>
      <name>Zhaoyang Li</name>
    </author>
    <author>
      <name>Zhaochuan Luo</name>
    </author>
    <author>
      <name>Yanchun Xie</name>
    </author>
    <author>
      <name>Yuzhuo Qin</name>
    </author>
    <author>
      <name>Tong Luo</name>
    </author>
    <author>
      <name>Yaqian Li</name>
    </author>
    <author>
      <name>Shilong Liu</name>
    </author>
    <author>
      <name>Yandong Guo</name>
    </author>
    <author>
      <name>Lei Zhang</name>
    </author>
    <arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Homepage: https://recognize-anything.github.io/</arxiv:comment>
    <link href="http://arxiv.org/abs/2306.03514v3" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/2306.03514v3" rel="related" type="application/pdf"/>
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.CV" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.CV" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>

Extract the following two lines and force a metadata update using the DOI.

    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1038/s41467-023-35846-0</arxiv:doi>
    <link title="doi" href="http://dx.doi.org/10.1038/s41467-023-35846-0" rel="related"/>

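The accuracy of this approach still needs to be verified. My preliminary view is that if a DOI issued by a registrant other than arXiv exists, the paper can be considered formally published. But when no such DOI exists, could that simply mean the arXiv record has not been updated?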
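
A minimal sketch of this check, assuming the Atom response is already available as a string. `extractArxivDoi` and `publishedDoi` are hypothetical names, and the `10.48550/` prefix test is a defensive guard against DOIs that arXiv registers itself. A `null` result is inconclusive: it only says the arXiv record carries no external DOI.

```typescript
// DOIs that arXiv registers for its own preprints start with this prefix,
// so they do not point to a journal or conference version.
const ARXIV_DOI_PREFIX = "10.48550/";

// Pull the text of the first <arxiv:doi> element out of an arXiv API
// (Atom) response. A regular expression keeps the sketch dependency-free;
// a real implementation would more likely use an XML parser.
function extractArxivDoi(atomXml: string): string | null {
  const match = /<arxiv:doi\b[^>]*>([^<]+)<\/arxiv:doi>/.exec(atomXml);
  const doi = match ? match[1] : undefined;
  return doi ? doi.trim() : null;
}

// Assumption from this thread: a DOI outside arXiv's own prefix means the
// paper has been formally published. Returns that DOI, or null otherwise.
function publishedDoi(atomXml: string): string | null {
  const doi = extractArxivDoi(atomXml);
  if (doi === null || doi.toLowerCase().indexOf(ARXIV_DOI_PREFIX) === 0) {
    return null;
  }
  return doi;
}
```

For the first response above this would yield `10.1038/s41467-023-35846-0`, and for the second one `null`, since that entry has no `<arxiv:doi>` element.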

Additional ways to obtain this information

https://github.com/vict0rsch/PaperMemory#preprints

@invisprints

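Hello, and thank you very much for developing this plugin! The big problem right now is that the vast majority of articles posted on arXiv are never updated with their later conference or journal publication details; at most a comment is left. In CS, the more mainstream approach is to get the updated information from Semantic Scholar. Would you consider adding it? The Semantic Scholar API can be used to check whether a preprint posted on arXiv has been published as a paper (it does not cover everything, but it is simple and reliable, and updates are relatively timely).

Example: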

https://api.semanticscholar.org/graph/v1/paper/ArXiv:1705.07874?fields=publicationVenue,externalIds,journal,publicationTypes
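
A rough sketch of how such a lookup might be wired up, assuming the response is JSON with the fields requested above. The `S2Paper` shape, `s2UrlForArxivId`, and `publishedDoiFromS2` are hypothetical names for illustration. The check on the `10.48550/` prefix assumes that a record for an unpublished preprint may carry arXiv's own DOI.

```typescript
// Subset of the Semantic Scholar Graph API response used here. The field
// names follow the query above; their exact shapes are an assumption.
interface S2Paper {
  externalIds?: { DOI?: string; ArXiv?: string } | null;
  journal?: { name?: string } | null;
  publicationVenue?: { name?: string } | null;
  publicationTypes?: string[] | null;
}

const S2_FIELDS = "publicationVenue,externalIds,journal,publicationTypes";

// Build the lookup URL for a new-style arXiv identifier such as
// "1705.07874", with or without a leading "arXiv:" prefix.
function s2UrlForArxivId(arxivId: string): string {
  const id = arxivId.replace(/^arxiv:/i, "").trim();
  return `https://api.semanticscholar.org/graph/v1/paper/ArXiv:${encodeURIComponent(id)}?fields=${S2_FIELDS}`;
}

// Return the DOI of the published version, or null when the record has
// no DOI or only a DOI registered by arXiv itself.
function publishedDoiFromS2(paper: S2Paper): string | null {
  const doi = paper.externalIds ? paper.externalIds.DOI : undefined;
  if (!doi || doi.toLowerCase().indexOf("10.48550/") === 0) {
    return null;
  }
  return doi;
}
```

The URL can be fetched with any HTTP client and the parsed JSON passed to `publishedDoiFromS2`. The `journal` and `publicationVenue` fields could then supply the venue name when a DOI is missing.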
