Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mEDRA DOI fetcher implementation. #6641

Merged
merged 26 commits into from
Jul 9, 2020
Merged

mEDRA DOI fetcher implementation. #6641

merged 26 commits into from
Jul 9, 2020

Conversation

alchimos
Copy link
Contributor

@alchimos alchimos commented Jun 26, 2020

fixes #6602
I added the possibility to retrieve DOI metadata from mEDRA agency .
Some DOIs return 500 code even if they are correctly registered at mEDRA. I'm waiting for the mEDRA team to answer on that.

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.


// mEDRA requires Accept header attribute with desired content type
if (source.toString().contains(Medra.API_URL)) {
this.addHeader("Accept", Medra.CONTENT_TYPE_JSON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use the application/x-bibtex header as for the other DOI fetcher well? That would make fetching a lot easier!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few tests but bibtext type seems to not work as exptected with mEDRA.
Here are the outputs of URLDonwnload.asString():

mEDRA
@article{OLLETT, Andrew_2018, title={S?tav?hana and N?g?rjuna}, ISSN={2507-0347}, url={http://doi.org/10.2143/JIABS.41.0.3285748}, DOI={10.2143/JIABS.41.0.3285748}, number={0}, journal={Journal of the International Association of Buddhist Studies}, publisher={Peeters online journals}, author={OLLETT, Andrew}, year={2018}, pages={421–472} }

INSPIREHEP (similarly for GoogleScholar):

@article{Maggiore:2017jjr,
author = "Maggiore, M. and Campo, D. and Antonini, P. and Lombardi, A. and Manzolaro, M. and Andrighetto, A. and Monetti, A. and Scarpa, D. and Esposito, J. and Silvestrin, L.",
title = "{SPES: A new cyclotron-based facility for research and applications with high-intensity beams}",
doi = "10.1142/S0217732317400107",
journal = "Mod. Phys. Lett. A",
volume = "32",
number = "17",
pages = "1740010",
year = "2017"
}

From what I understand, as a result, BibtexParser is not able to parse mEDRA string and returns no entries, while it works correctly in the other case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this works, but the specific authors of the citation contain umlauts which seem to be not correclty returned:
http://doi.org/10.2143/JIABS.41.0.3285748

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I performed several other tests with random mEDRA DOIs retrieved at https://api.medra.org/doiprefixlist?doiPrefix=10.2143 but only 1 over maybe 50 happened to be correctly parsed in bibtex type. Json type, instead, never failed and the only errors was due to the 500 error code returned by mEDRA.
At this point i believe the bibtex style returned by mEDRA could be the issue, but since this is my first contribution I'm really open to any other advice or suggestion you think i could work on to tackle this issue.

I'm committing some refactoring of the previous code along with WebFetchers class modified to show mEDRA agency in the drop down list of the Web Search pane.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation! I think it's fine to use the json as workaround then.

From a users perspective, I don't want to know to whom a DOI actually belongs or to manually try another DOI fetcher.
It would be nice if there was some "magic" in the background. I just enter the DOI (no matter who "owns" it ) and I get my bibtex entry from it.

Some suggestions: Implement the IDBasedFetcher interface, it's meant for fetchers which work based on an ID, e.g DOI, ISBN
The SearchBasedFetcher is meant for fetcher where you can enter a search term e.g. "abcdf"
Scenario:

  1. User enters doi (e.g. new entry from DOI)
  2. If the DOI starts with medra prefix number => call medra fetcher and return result
    else use the normal DOI results.

We have a similar setup for ISBNFetcher, we first check site a and then site b in case the first didn't return anything.
For parsing DOIs we have the DOI.java class.

  1. Add a method like getPrefix to return the prefix of the DOI and then you could use that in the above scenario

I hope that makes it a bit more clear now. Feel free to ask if you have any further questions or problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Siedlerchr, I really appreciate your suggestions.
Here are some more checks I've performed.

I've figured out that the DoiFetcher.performSearchById was correctly retrieving the bibtex of mEDRA DOIs, even before I made any changes, but cannot parse it for the reason already mentioned. Thus, JabRef not retrieving mEDRA DOIs metadata was not the issue in #6602.

Furthermore, I have not found an easy way of knowing the agency by prefix through mEDRA api without retrieving a list of DOIs registered for a particular prefix and then checking whether this is empty or not, so I don't think a getPrefix method would help.
The most straightforward way seems to be calling https://doi.org/doiRA/10.2143/10.2143/JCS.2.0.583382 (DOI handbook - 5.2.1) which in turn returns a single object array as follow:
[ { "DOI": "10.2143/JCS.2.0.583382", "RA": "mEDRA" } ]
I've then used this and created a getAgency method on DoiFetcher. This in turn would provide the condition to move processing toward mEDRA fetcher.

Last, as suggested I have implemented IdBasedParserFetcher instead of SearchBasedParserFetcher and overridden IdBasedParserFetcher.performSearchByID within Medra class.

I hope my info are clear and if you think this could be a good way of proceeding I would update the PR.
In this case I should remove SearchBasedParserFetcher from the open PR. Can I do delete it directly from the Files Changed view of this PR? Is it the correct way to do it?

Thank you so much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, then the DoiFetcher getAgency makes more sense.
Just delete/modify your changes locally and push your new commit with the changes, the PR will then be updated automatically.

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this issue. Have you tried to use the application/x-bibtex accept header?

* @throws IOException
*/
public String getAgency(DOI doi) throws JSONException, IOException {
String agency = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid null. Instead you can use Optionals.
https://blog.indrek.io/articles/optionals-in-java-8/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be something like this?

public Optional<String> getAgency(DOI doi) throws JSONException, IOException {
         Optional<String> agency = Optional.empty();
         URLDownload download = new URLDownload(DOI.AGENCY_RESOLVER + "/" + doi.getDOI());
         JSONObject response = new JSONArray(download.asString()).getJSONObject(0);
         if (response != null) {
            agency = Optional.of(response.optString("RA"));
        }
        return agency;
    }

and the condition in performSearchById would then be

if ("medra".equalsIgnoreCase(getAgency(doi.get()).get()))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check if the value is present first:
getAgency(...).ifPresent() && getAgency(...).get().equals(medra)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure, and I also needs to change agency = Optional.of(response.optString("RA")); to agency = Optional.ofNullable(response.optString("RA")); since optString could return a null.

@Siedlerchr
Copy link
Member

In general looks already good! 👍 It would also be nice if you could add a/some tests for the new feature

@Siedlerchr
Copy link
Member

Looks good so far, I only noticed one failing test, you have to add the Medra Fetcher to the test array in org.jabref.logic.importer.WebFetchersTest
Could you also please add one test case to the existing DOI Fetcher to check that it returns the correct value from MEDRA?

@alchimos
Copy link
Contributor Author

alchimos commented Jul 1, 2020

Ok, I added a test case to DoiFetcher. Passed.

Regarding the test array in org.jabref.logic.importer.WebFetchersTest, if I've correctly understood the logic, I think I should remove Medra Fetcher in the below method, instead of adding. In this way tests are passed.

 @Test
    void getSearchBasedFetchersReturnsAllFetcherDerivingFromSearchBasedFetcher() throws Exception {
        Set<SearchBasedFetcher> searchBasedFetchers = WebFetchers.getSearchBasedFetchers(importFormatPreferences);
        try (ScanResult scanResult = classGraph.scan()) {
            ClassInfoList controlClasses = scanResult.getClassesImplementing(SearchBasedFetcher.class.getCanonicalName());
            Set<Class<?>> expected = new HashSet<>(controlClasses.loadClasses());

            expected.remove(SearchBasedParserFetcher.class);

            // Remove ACM, because it doesn't work currently
            expected.remove(ACMPortalFetcher.class);

            // Remove GROBID, because we don't want to show this to the user
            expected.remove(GrobidCitationFetcher.class);

            // Remove mEDRA, because it only works for ID based search
            expected.remove(Medra.class);

            assertEquals(expected, getClasses(searchBasedFetchers));
        }
    }

If you think this could be a proper approach, I will commit.

* <p>
* It requires "Accept" request Header attribute to be set to desired content-type.
*/
public class Medra implements SearchBasedParserFetcher, IdBasedParserFetcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need the SearchBasedParserFetcher interface?
I think this is superflous. The IDBasedFetcher should be sufficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I forgot to remove it.
In this case I don't see why i should add Medra Fetcher to org.jabref.logic.importer.WebFetchersTest array test, since it seems that this already loads Medra class to expected array of getIdBasedFetchersReturnsAllFetcherDerivingFromIdBasedFetcher() and the test is passed.
Am I missing anything?

Copy link
Member

@Siedlerchr Siedlerchr Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was probaby the reason for the failing test at first. If it passes now it's fine. Nothing to do then.
Please take a look at the checkstyle issues, than it's ready to go!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I fixed the style issues and wanted to run ./gradlew checkstyleMain checkstyleTest checkstyleJmh from terminal to test them before committing but i get:

`Starting a Gradle Daemon (subsequent builds will be faster)

> Configure project :
Found module name 'org.jabref'

> Task :compileJava
org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins
        at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201)
        at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163)
        at org.apache.logging.log4j.plugins.processor.PluginProcessor.process(PluginProcessor.java:91)
        at org.gradle.api.internal.tasks.compile.processing.DelegatingProcessor.process(DelegatingProcessor.java:62)`

running with --info gives some more details:

> Task :compileJava
Caching disabled for task ':compileJava' because:
  Build cache is disabled
Task ':compileJava' is not up-to-date because:
  Task has failed previously.
The input changes require a full rebuild for incremental task ':compileJava'.
Full recompilation is required because no incremental change information is available. This is usually caused by clean builds or changing compiler arguments.
Compiling with JDK Java compiler API.
org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins
        at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201)
        at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163)

Should I commit anyway and see what the tests say?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just commit.

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jul 1, 2020
@alchimos
Copy link
Contributor Author

alchimos commented Jul 1, 2020

Still checkstyle issues. I'll work on that.

@Siedlerchr
Copy link
Member

Are you using Intellij? Then you can import the style and let the IDE format your changes and organize the imports

@alchimos
Copy link
Contributor Author

alchimos commented Jul 1, 2020

Nope, Eclipse.

@Siedlerchr
Copy link
Member

For the imports you can use ctr+ shift +o will organize the imports

medra-fetcher
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! 👍 Although these checkstyle stuff is sometimes annoying, it helps to have consistency across all files and to to avoid unnecessary conflicts due to different formatting.
I hope you nonetheless had a good first experience of contributing.

So, we now wait for a second developer to review your code and then we can merge.
For external contributors we require the review of at least two developers before we merge the code.

@alchimos
Copy link
Contributor Author

alchimos commented Jul 1, 2020

I believe this has been a really great lesson for me since I've gone through several never encountered issues.
I hope this will be released at the end so I can move forward to another issue.

Thanks a lot for your patience and guiding advices.

@Siedlerchr
Copy link
Member

Once your PR is merged in master it will be available in the next automatic build of development version, https://builds.jabref.org/master/

@koppor
Copy link
Member

koppor commented Jul 1, 2020

I think something went wrong with the merge of master into this branch. Need to investigate manually before merging.

General remark: The discussion in this PR seems to be long?

  • Are respective code comments added?
  • Might it be necessary to add an ADR?

@alchimos
Copy link
Contributor Author

alchimos commented Jul 1, 2020

Hi @koppor, thanks for the feedback.
One of the last commit was rejected with non-fast-forwarding error, so i tried to pull the medra-fretcher branch back to local and then push back only the last changes. Could this have messed up something? And most important, is there anything I should/could do, for example creating new branch and work the implementation there?

About comments I inserted a few lines in the code where I thought they could have help. Is this you are referring to?

About ADR probably I need some more instructions. Is it something I should do on my own?

Thanks again.

@alchimos alchimos requested a review from Siedlerchr July 7, 2020 06:51
Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! The code looks already pretty good, and I've only a few minor remarks.

@koppor koppor removed the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jul 7, 2020
koppor and others added 3 commits July 7, 2020 15:23
Reastion: It is the common base of both SearchBasedParserFetcher and IdBasedParserFetcher
}
if (responseStrBuilder.toString().isBlank()) {
throw new ParseException("Empty input!");
String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8);
String inputStr = new String((inputStream.readAllByte()), Charsets.UTF_8);

This is possible since java 9

@Siedlerchr Siedlerchr merged commit 20e78a4 into JabRef:master Jul 9, 2020
@Siedlerchr
Copy link
Member

Thanks for your contribution @Mind000 and the quick follow up! We hope you weren't frightened by our reviews ;) And we hope you will continue contributing!

@Siedlerchr
Copy link
Member

Shit, saw this only after merge:
org.opentest4j.AssertionFailedError: The following classes are not allowed to depend on org.jabref.logic ==> expected: <[]> but was: <[src/main/java/org/jabref/model/entry/identifier/DOI.java]>

@alchimos
Copy link
Contributor Author

alchimos commented Jul 9, 2020

Shit, saw this only after merge:
org.opentest4j.AssertionFailedError: The following classes are not allowed to depend on org.jabref.logic ==> expected: <[]> but was: <[src/main/java/org/jabref/model/entry/identifier/DOI.java]>

Anything I should do?
This happened when I moved the getAgency from DoiFetcher to DOI.

@koppor
Copy link
Member

koppor commented Jul 9, 2020

The idea behind our High-level architcutre is to have the model as model package. We are aware this leads to an anemic domain model.

Not sure whether downloading things should be part of a model.

Think, this is a devcall thing. We should discuss whether we want to "undo" the seperation of model and logic. What are our "limits" in model? @stefan-kolb

@Siedlerchr
Copy link
Member

as the build is broken now I would vote to simply change it back to the original, e.g. move the agency fetching back to the fetcher.

Siedlerchr added a commit that referenced this pull request Jul 11, 2020
* upstream/master: (243 commits)
  fix checkstyle
  mEDRA DOI fetcher implementation. (#6641)
  Bump bcprov-jdk15on from 1.65.01 to 1.66 (#6676)
  Bump appleboy/ssh-action from v0.0.6 to v0.1.2 (#6674)
  Add localisation strings and renamed formatter
  Bump gittools/actions from v0.9.2 to v0.9.3 (#6675)
  Skip non-working sourcespy site
  Let dependabot update github actions
  Add three formatters to fix new lines in abstract and digits in editors
  Update src/test/java/org/jabref/logic/importer/fetcher/CollectionOfComputerScienceBibliographiesParserTest.java
  Fix to imports
  Make fetcher test more specific by checking each field explicitly
  Use builder instead of "setField" statements
  (docs) fix styling
  Update jpackage notes
  Fix automerge workflow
  fix markdown
  Bump org.beryx.jlink from 2.20.0 to 2.21.0
  Bump unirest-java from 3.8.00 to 3.8.06
  Fix to dependency on Global
  ...
@alchimos
Copy link
Contributor Author

What should I do now?

@Siedlerchr
Copy link
Member

@Mind000 Just create a new PR where you move the DOI agency getting back to the DOIFetcher class. That is the simplest solution and this would not destroy our architecture constraints

@alchimos
Copy link
Contributor Author

So, if I got it right, I have to push the same files as the last commit I made, changing only the getAgency location.

@Siedlerchr
Copy link
Member

Siedlerchr commented Jul 13, 2020 via email

@koppor
Copy link
Member

koppor commented Jul 14, 2020

For internal referencre: follow-up PR is #6683.

Thus, everyhting should be fine now.

@alchimos alchimos deleted the medra-fetcher branch August 9, 2020 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOI fetcher for medra
4 participants