mEDRA DOI fetcher implementation. #6641

alchimos · 2020-06-26T08:14:03Z

fixes #6602
I added the possibility to retrieve DOI metadata from mEDRA agency .
Some DOIs return 500 code even if they are correctly registered at mEDRA. I'm waiting for the mEDRA team to answer on that.

Change in CHANGELOG.md described (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

Siedlerchr · 2020-06-26T08:43:46Z

src/main/java/org/jabref/logic/net/URLDownload.java

+
+        // mEDRA requires Accept header attribute with desired content type
+        if (source.toString().contains(Medra.API_URL)) {
+            this.addHeader("Accept", Medra.CONTENT_TYPE_JSON);


Why don't you use the application/x-bibtex header as for the other DOI fetcher well? That would make fetching a lot easier!

I made a few tests but bibtext type seems to not work as exptected with mEDRA.
Here are the outputs of URLDonwnload.asString():

mEDRA
@article{OLLETT, Andrew_2018, title={S?tav?hana and N?g?rjuna}, ISSN={2507-0347}, url={http://doi.org/10.2143/JIABS.41.0.3285748}, DOI={10.2143/JIABS.41.0.3285748}, number={0}, journal={Journal of the International Association of Buddhist Studies}, publisher={Peeters online journals}, author={OLLETT, Andrew}, year={2018}, pages={421–472} }

INSPIREHEP (similarly for GoogleScholar):

@article{Maggiore:2017jjr, author = "Maggiore, M. and Campo, D. and Antonini, P. and Lombardi, A. and Manzolaro, M. and Andrighetto, A. and Monetti, A. and Scarpa, D. and Esposito, J. and Silvestrin, L.", title = "{SPES: A new cyclotron-based facility for research and applications with high-intensity beams}", doi = "10.1142/S0217732317400107", journal = "Mod. Phys. Lett. A", volume = "32", number = "17", pages = "1740010", year = "2017" }

From what I understand, as a result, BibtexParser is not able to parse mEDRA string and returns no entries, while it works correctly in the other case.

In general this works, but the specific authors of the citation contain umlauts which seem to be not correclty returned:
http://doi.org/10.2143/JIABS.41.0.3285748

I performed several other tests with random mEDRA DOIs retrieved at https://api.medra.org/doiprefixlist?doiPrefix=10.2143 but only 1 over maybe 50 happened to be correctly parsed in bibtex type. Json type, instead, never failed and the only errors was due to the 500 error code returned by mEDRA.
At this point i believe the bibtex style returned by mEDRA could be the issue, but since this is my first contribution I'm really open to any other advice or suggestion you think i could work on to tackle this issue.

I'm committing some refactoring of the previous code along with WebFetchers class modified to show mEDRA agency in the drop down list of the Web Search pane.

Thanks for the investigation! I think it's fine to use the json as workaround then.

From a users perspective, I don't want to know to whom a DOI actually belongs or to manually try another DOI fetcher.
It would be nice if there was some "magic" in the background. I just enter the DOI (no matter who "owns" it ) and I get my bibtex entry from it.

Some suggestions: Implement the IDBasedFetcher interface, it's meant for fetchers which work based on an ID, e.g DOI, ISBN
The SearchBasedFetcher is meant for fetcher where you can enter a search term e.g. "abcdf"
Scenario:

User enters doi (e.g. new entry from DOI)

If the DOI starts with medra prefix number => call medra fetcher and return result
else use the normal DOI results.

We have a similar setup for ISBNFetcher, we first check site a and then site b in case the first didn't return anything.
For parsing DOIs we have the DOI.java class.

Add a method like getPrefix to return the prefix of the DOI and then you could use that in the above scenario

I hope that makes it a bit more clear now. Feel free to ask if you have any further questions or problems.

Thanks @Siedlerchr, I really appreciate your suggestions.
Here are some more checks I've performed.

I've figured out that the DoiFetcher.performSearchById was correctly retrieving the bibtex of mEDRA DOIs, even before I made any changes, but cannot parse it for the reason already mentioned. Thus, JabRef not retrieving mEDRA DOIs metadata was not the issue in #6602.

Furthermore, I have not found an easy way of knowing the agency by prefix through mEDRA api without retrieving a list of DOIs registered for a particular prefix and then checking whether this is empty or not, so I don't think a getPrefix method would help.
The most straightforward way seems to be calling https://doi.org/doiRA/10.2143/10.2143/JCS.2.0.583382 (DOI handbook - 5.2.1) which in turn returns a single object array as follow:
[ { "DOI": "10.2143/JCS.2.0.583382", "RA": "mEDRA" } ]
I've then used this and created a getAgency method on DoiFetcher. This in turn would provide the condition to move processing toward mEDRA fetcher.

Last, as suggested I have implemented IdBasedParserFetcher instead of SearchBasedParserFetcher and overridden IdBasedParserFetcher.performSearchByID within Medra class.

I hope my info are clear and if you think this could be a good way of proceeding I would update the PR.
In this case I should remove SearchBasedParserFetcher from the open PR. Can I do delete it directly from the Files Changed view of this PR? Is it the correct way to do it?

Thank you so much.

Ah, yes, then the DoiFetcher getAgency makes more sense.
Just delete/modify your changes locally and push your new commit with the changes, the PR will then be updated automatically.

Siedlerchr

Thanks for tackling this issue. Have you tried to use the application/x-bibtex accept header?

src/main/java/org/jabref/logic/importer/SearchBasedParserFetcher.java

Siedlerchr · 2020-06-30T10:10:12Z

src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

+     * @throws IOException
+     */
+    public String getAgency(DOI doi) throws JSONException, IOException {
+        String agency = null;


We try to avoid null. Instead you can use Optionals.
https://blog.indrek.io/articles/optionals-in-java-8/

Could be something like this?

public Optional<String> getAgency(DOI doi) throws JSONException, IOException { Optional<String> agency = Optional.empty(); URLDownload download = new URLDownload(DOI.AGENCY_RESOLVER + "/" + doi.getDOI()); JSONObject response = new JSONArray(download.asString()).getJSONObject(0); if (response != null) { agency = Optional.of(response.optString("RA")); } return agency; }

and the condition in performSearchById would then be

if ("medra".equalsIgnoreCase(getAgency(doi.get()).get()))

You need to check if the value is present first:
getAgency(...).ifPresent() && getAgency(...).get().equals(medra)

Yes, sure, and I also needs to change agency = Optional.of(response.optString("RA")); to agency = Optional.ofNullable(response.optString("RA")); since optString could return a null.

Siedlerchr · 2020-06-30T10:11:28Z

In general looks already good! 👍 It would also be nice if you could add a/some tests for the new feature

Siedlerchr · 2020-07-01T06:23:21Z

Looks good so far, I only noticed one failing test, you have to add the Medra Fetcher to the test array in org.jabref.logic.importer.WebFetchersTest
Could you also please add one test case to the existing DOI Fetcher to check that it returns the correct value from MEDRA?

alchimos · 2020-07-01T07:40:43Z

Ok, I added a test case to DoiFetcher. Passed.

Regarding the test array in org.jabref.logic.importer.WebFetchersTest, if I've correctly understood the logic, I think I should remove Medra Fetcher in the below method, instead of adding. In this way tests are passed.

 @Test
    void getSearchBasedFetchersReturnsAllFetcherDerivingFromSearchBasedFetcher() throws Exception {
        Set<SearchBasedFetcher> searchBasedFetchers = WebFetchers.getSearchBasedFetchers(importFormatPreferences);
        try (ScanResult scanResult = classGraph.scan()) {
            ClassInfoList controlClasses = scanResult.getClassesImplementing(SearchBasedFetcher.class.getCanonicalName());
            Set<Class<?>> expected = new HashSet<>(controlClasses.loadClasses());

            expected.remove(SearchBasedParserFetcher.class);

            // Remove ACM, because it doesn't work currently
            expected.remove(ACMPortalFetcher.class);

            // Remove GROBID, because we don't want to show this to the user
            expected.remove(GrobidCitationFetcher.class);

            // Remove mEDRA, because it only works for ID based search
            expected.remove(Medra.class);

            assertEquals(expected, getClasses(searchBasedFetchers));
        }
    }

If you think this could be a proper approach, I will commit.

Siedlerchr · 2020-07-01T07:50:23Z

src/main/java/org/jabref/logic/importer/fetcher/Medra.java

+ * <p>
+ * It requires "Accept" request Header attribute to be set to desired content-type.
+ */
+public class Medra implements SearchBasedParserFetcher, IdBasedParserFetcher {


Do you really need the SearchBasedParserFetcher interface?
I think this is superflous. The IDBasedFetcher should be sufficient

Ah yes, I forgot to remove it.
In this case I don't see why i should add Medra Fetcher to org.jabref.logic.importer.WebFetchersTest array test, since it seems that this already loads Medra class to expected array of getIdBasedFetchersReturnsAllFetcherDerivingFromIdBasedFetcher() and the test is passed.
Am I missing anything?

That was probaby the reason for the failing test at first. If it passes now it's fine. Nothing to do then.
Please take a look at the checkstyle issues, than it's ready to go!

Maybe I fixed the style issues and wanted to run ./gradlew checkstyleMain checkstyleTest checkstyleJmh from terminal to test them before committing but i get:

`Starting a Gradle Daemon (subsequent builds will be faster) > Configure project : Found module name 'org.jabref' > Task :compileJava org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201) at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163) at org.apache.logging.log4j.plugins.processor.PluginProcessor.process(PluginProcessor.java:91) at org.gradle.api.internal.tasks.compile.processing.DelegatingProcessor.process(DelegatingProcessor.java:62)`

running with --info gives some more details:

> Task :compileJava Caching disabled for task ':compileJava' because: Build cache is disabled Task ':compileJava' is not up-to-date because: Task has failed previously. The input changes require a full rebuild for incremental task ':compileJava'. Full recompilation is required because no incremental change information is available. This is usually caused by clean builds or changing compiler arguments. Compiling with JDK Java compiler API. org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201) at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163)

Should I commit anyway and see what the tests say?

Yes, just commit.

alchimos · 2020-07-01T12:32:13Z

Still checkstyle issues. I'll work on that.

Siedlerchr · 2020-07-01T13:06:36Z

Are you using Intellij? Then you can import the style and let the IDE format your changes and organize the imports

alchimos · 2020-07-01T13:14:14Z

Nope, Eclipse.

Siedlerchr · 2020-07-01T13:26:52Z

For the imports you can use ctr+ shift +o will organize the imports

medra-fetcher # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.

Siedlerchr

Thanks for the work! 👍 Although these checkstyle stuff is sometimes annoying, it helps to have consistency across all files and to to avoid unnecessary conflicts due to different formatting.
I hope you nonetheless had a good first experience of contributing.

So, we now wait for a second developer to review your code and then we can merge.
For external contributors we require the review of at least two developers before we merge the code.

alchimos · 2020-07-01T15:20:34Z

I believe this has been a really great lesson for me since I've gone through several never encountered issues.
I hope this will be released at the end so I can move forward to another issue.

Thanks a lot for your patience and guiding advices.

Siedlerchr · 2020-07-01T15:23:56Z

Once your PR is merged in master it will be available in the next automatic build of development version, https://builds.jabref.org/master/

koppor · 2020-07-01T18:58:11Z

I think something went wrong with the merge of master into this branch. Need to investigate manually before merging.

General remark: The discussion in this PR seems to be long?

Are respective code comments added?
Might it be necessary to add an ADR?

alchimos · 2020-07-01T20:05:47Z

Hi @koppor, thanks for the feedback.
One of the last commit was rejected with non-fast-forwarding error, so i tried to pull the medra-fretcher branch back to local and then push back only the last changes. Could this have messed up something? And most important, is there anything I should/could do, for example creating new branch and work the implementation there?

About comments I inserted a few lines in the code where I thought they could have help. Is this you are referring to?

About ADR probably I need some more instructions. Is it something I should do on my own?

Thanks again.

tobiasdiez

Thanks for your contribution! The code looks already pretty good, and I've only a few minor remarks.

src/main/java/org/jabref/logic/importer/fetcher/CrossRef.java

src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

src/main/java/org/jabref/logic/importer/fetcher/Medra.java

- Rename getURLForID to getUrlForIdentifier - Shrink Medra fetcher - Shrink CrossRef fetcher

Reastion: It is the common base of both SearchBasedParserFetcher and IdBasedParserFetcher

Improve json reader

Siedlerchr · 2020-07-09T06:04:54Z

src/main/java/org/jabref/logic/importer/util/JsonReader.java

-            }
-            if (responseStrBuilder.toString().isBlank()) {
-                throw new ParseException("Empty input!");
+            String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8);


Suggested change

String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8);

String inputStr = new String((inputStream.readAllByte()), Charsets.UTF_8);

This is possible since java 9

Siedlerchr · 2020-07-09T09:52:42Z

Thanks for your contribution @Mind000 and the quick follow up! We hope you weren't frightened by our reviews ;) And we hope you will continue contributing!

Siedlerchr · 2020-07-09T10:08:07Z

Shit, saw this only after merge:
org.opentest4j.AssertionFailedError: The following classes are not allowed to depend on org.jabref.logic ==> expected: <[]> but was: <[src/main/java/org/jabref/model/entry/identifier/DOI.java]>

alchimos · 2020-07-09T12:12:56Z

Shit, saw this only after merge:
org.opentest4j.AssertionFailedError: The following classes are not allowed to depend on org.jabref.logic ==> expected: <[]> but was: <[src/main/java/org/jabref/model/entry/identifier/DOI.java]>

Anything I should do?
This happened when I moved the getAgency from DoiFetcher to DOI.

koppor · 2020-07-09T22:20:02Z

The idea behind our High-level architcutre is to have the model as model package. We are aware this leads to an anemic domain model.

Not sure whether downloading things should be part of a model.

Think, this is a devcall thing. We should discuss whether we want to "undo" the seperation of model and logic. What are our "limits" in model? @stefan-kolb

Siedlerchr · 2020-07-10T06:21:19Z

as the build is broken now I would vote to simply change it back to the original, e.g. move the agency fetching back to the fetcher.

* upstream/master: (243 commits) fix checkstyle mEDRA DOI fetcher implementation. (#6641) Bump bcprov-jdk15on from 1.65.01 to 1.66 (#6676) Bump appleboy/ssh-action from v0.0.6 to v0.1.2 (#6674) Add localisation strings and renamed formatter Bump gittools/actions from v0.9.2 to v0.9.3 (#6675) Skip non-working sourcespy site Let dependabot update github actions Add three formatters to fix new lines in abstract and digits in editors Update src/test/java/org/jabref/logic/importer/fetcher/CollectionOfComputerScienceBibliographiesParserTest.java Fix to imports Make fetcher test more specific by checking each field explicitly Use builder instead of "setField" statements (docs) fix styling Update jpackage notes Fix automerge workflow fix markdown Bump org.beryx.jlink from 2.20.0 to 2.21.0 Bump unirest-java from 3.8.00 to 3.8.06 Fix to dependency on Global ...

alchimos · 2020-07-13T06:27:42Z

What should I do now?

Siedlerchr · 2020-07-13T06:56:06Z

@Mind000 Just create a new PR where you move the DOI agency getting back to the DOIFetcher class. That is the simplest solution and this would not destroy our architecture constraints

alchimos · 2020-07-13T07:02:37Z

So, if I got it right, I have to push the same files as the last commit I made, changing only the getAgency location.

Siedlerchr · 2020-07-13T07:24:28Z

as the code is already merged into master you fetch and pull the upstream master in your local master then you create a new branch and then you just move the getAgency back to DoiFetcher. Then you commit your changes and push your new branch and create a new PR Am Mo., 13. Juli 2020 um 09:02 Uhr schrieb Giovanni Caldarola < notifications@github.com>:

…

So, if I got it right, I have to push the same files as the last commit I made, changing only the getAgency location. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#6641 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACOFZB6KFRY2YWRD7KQBYTR3KWRRANCNFSM4OJDHZDA> .

koppor · 2020-07-14T18:29:32Z

For internal referencre: follow-up PR is #6683.

Thus, everyhting should be fine now.

mEDRA DOI fetcher implementation.

2d75da2

Siedlerchr reviewed Jun 26, 2020

View reviewed changes

Siedlerchr requested changes Jun 26, 2020

View reviewed changes

mEDRA DOI fetcher implementation

df31be7

Siedlerchr reviewed Jun 28, 2020

View reviewed changes

src/main/java/org/jabref/logic/importer/SearchBasedParserFetcher.java Outdated Show resolved Hide resolved

mEDRA DOI fetcher implementation

b521e7c

Siedlerchr reviewed Jun 30, 2020

View reviewed changes

mEDRA DOI fetcher implementation

3539079

Siedlerchr reviewed Jul 1, 2020

View reviewed changes

mEDRA DOI fetcher implementation

1b78fd5

Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jul 1, 2020

Merge branch 'master' into medra-fetcher

81d5cc3

alchimos added 6 commits July 1, 2020 16:05

mEDRA DOI fetcher implementation

51423cb

mEDRA DOI fetcher implementation

a42904c

mEDRA DOI fetcher implementation

c0a329f

mEDRA DOI fetcher implementation

2ede4ac

mEDRA DOI fetcher implementation

a108974

Siedlerchr approved these changes Jul 1, 2020

View reviewed changes

alchimos requested a review from Siedlerchr July 7, 2020 06:51

tobiasdiez requested changes Jul 7, 2020

View reviewed changes

koppor reviewed Jul 7, 2020

View reviewed changes

src/main/java/org/jabref/logic/importer/fetcher/Medra.java Outdated Show resolved Hide resolved

koppor added 5 commits July 7, 2020 11:55

Rewrite JsonReader to accept an empty input stream

5504e52

Introduce "getUrlDownloadForIdentifier" and shrink fetcher code

6fd81c3

- Rename getURLForID to getUrlForIdentifier - Shrink Medra fetcher - Shrink CrossRef fetcher

Use parameterized tests (and add ISSN to Cisternino Paola)

9b9b14f

Fix parsing of names

3596442

Fix checkstyle

d17af0f

koppor removed the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jul 7, 2020

koppor and others added 3 commits July 7, 2020 15:23

Move "getUrlDownload" up to WebFetcher

dd0a665

Reastion: It is the common base of both SearchBasedParserFetcher and IdBasedParserFetcher

Use new method "getUrlDownload"

4fa073e

Merge pull request #1 from JabRef/improveJsonReader

faa304a

Improve json reader

Siedlerchr requested changes Jul 9, 2020

View reviewed changes

mEDRA DOI fetcher implementation

4523f1d

Siedlerchr approved these changes Jul 9, 2020

View reviewed changes

Merge branch 'master' into medra-fetcher

1793870

Siedlerchr merged commit 20e78a4 into JabRef:master Jul 9, 2020

koppor mentioned this pull request Jul 19, 2020

Using master of forks is dangeours #6699

Closed

alchimos deleted the medra-fetcher branch August 9, 2020 20:00

	String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8);
	String inputStr = new String((inputStream.readAllByte()), Charsets.UTF_8);

mEDRA DOI fetcher implementation. #6641

mEDRA DOI fetcher implementation. #6641

Conversation

alchimos commented Jun 26, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr commented Jun 30, 2020

Siedlerchr commented Jul 1, 2020

alchimos commented Jul 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr Jul 1, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alchimos commented Jul 1, 2020

Siedlerchr commented Jul 1, 2020

alchimos commented Jul 1, 2020

Siedlerchr commented Jul 1, 2020

Siedlerchr left a comment

Choose a reason for hiding this comment

alchimos commented Jul 1, 2020

Siedlerchr commented Jul 1, 2020

koppor commented Jul 1, 2020

alchimos commented Jul 1, 2020

tobiasdiez left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr commented Jul 9, 2020

Siedlerchr commented Jul 9, 2020

alchimos commented Jul 9, 2020

koppor commented Jul 9, 2020

Siedlerchr commented Jul 10, 2020

alchimos commented Jul 13, 2020

Siedlerchr commented Jul 13, 2020

alchimos commented Jul 13, 2020

Siedlerchr commented Jul 13, 2020 via email

koppor commented Jul 14, 2020

alchimos commented Jun 26, 2020 •

edited

Loading

Siedlerchr Jul 1, 2020 •

edited

Loading