Show information about the journal #6189

Closed
tobiasdiez opened this issue Mar 27, 2020 · 15 comments · Fixed by #10015

Comments

@tobiasdiez
Member

tobiasdiez commented Mar 27, 2020

In the entry editor, there should be a button next to the journal field that displays a bunch of information about the journal in a PopOver. For example, similar to the following (from Eigenfactor):
[image]

A few possible data sources are discussed here: https://academia.stackexchange.com/questions/3/where-can-i-find-the-impact-factor-for-a-given-journal (related: https://github.com/ikashnitsky/sjrdata)

@dimitra-karadima
Contributor

@tobiasdiez I want to tackle this issue! Can you give more specific details on where the relevant code is, so that I can add the button?
I have also googled how to run such a search programmatically. Based on the link http://www.eigenfactor.org/projects/journalRank/rankings.php?bsearch=COMMUNICATIONS+IN+MATHEMATICAL+PHYSICS&searchby=journal&orderby=eigenfactor
I thought I'd have just the "COMMUNICATIONS+IN+MATHEMATICAL+PHYSICS" part as a variable, in order to search for the specific journal every time. What do you think about it? I haven't done anything similar before, so if you have a better idea, it would be much appreciated!
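
The idea of keeping only the journal name variable could be sketched roughly as below. This is an illustration only: the helper name `eigenfactor_search_url` is made up here, the query parameters are copied from the link above, and it says nothing about whether the resulting page can be parsed reliably.

```python
from urllib.parse import urlencode

# Base address and parameter names are taken from the link quoted above.
BASE_URL = "http://www.eigenfactor.org/projects/journalRank/rankings.php"


def eigenfactor_search_url(journal_name):
    """Hypothetical helper: build the search URL for one journal name."""
    query = urlencode({
        "bsearch": journal_name.upper(),  # urlencode turns spaces into '+'
        "searchby": "journal",
        "orderby": "eigenfactor",
    })
    return BASE_URL + "?" + query


print(eigenfactor_search_url("Communications in Mathematical Physics"))
```

With that input, the printed URL matches the one in the link above.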

@tobiasdiez
Member Author

Thanks for your interest @dimitra-karadima! I had a quick look and it looks like none of the above sources have a proper API that would give us easy access to the data. Thus, it seems this will be a bigger project.

I would propose the following:

What do you think?

@dimitra-karadima
Contributor

@tobiasdiez very helpful input! But I don't think I can handle it right now. I am not really keen on Python scripts and JSON files, let alone combining them, even though you have done great work finding all the files needed! It is going to take me much more time than I thought, and right now I am kind of busy. So I am going to find a smaller issue and maybe come back to this when I find some extra time, if no one else has tackled it by then! And again, I am really sorry for dropping the issue.

@tobiasdiez
Member Author

That is very understandable. I wasn't aware of how much work this is until I wrote down what needs to be done.

But we do have a few issues that should be smaller tasks. For example, the ones tagged with good-first-issue; the ones concerning fetchers are usually also pretty self-contained. Looking forward to your next PR!

@ilippert
Contributor

Generally, it might be of interest to query whether journals are listed as open access journals.
https://doaj.org/api/v1/docs

@koobs

koobs commented Dec 22, 2020

CrossRef has a Journals resource that might be handy.
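
As a rough illustration of what using that resource might look like (this is not one of the calls from the original comment): the sketch below builds a request URL for a journal by ISSN and pulls a few fields out of a response. The helper names are made up, the sample payload is hand-written rather than a real response, and the field names `message`, `title`, `publisher` and `ISSN` are assumptions that should be checked against the Crossref documentation.

```python
import json
from urllib.parse import quote

# Assumed endpoint of the Crossref REST API journals resource.
CROSSREF_JOURNALS = "https://api.crossref.org/journals/"


def crossref_journal_url(issn):
    """Hypothetical helper: URL for looking up one journal by ISSN."""
    return CROSSREF_JOURNALS + quote(issn)


def summarize(payload):
    """Pick a few fields from a decoded response (field names are assumptions)."""
    message = payload.get("message", {})
    return {
        "title": message.get("title"),
        "publisher": message.get("publisher"),
        "issn": message.get("ISSN", []),
    }


# Hand-written sample payload, not real Crossref data.
sample = json.loads(
    '{"status": "ok", "message": {"title": "Example Journal", '
    '"publisher": "Example Publisher", "ISSN": ["1234-5678"]}}'
)
print(crossref_journal_url("1234-5678"))
print(summarize(sample))
```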

CrossRef API Documentation: Resources

Example API Calls:

@KallePettersson

Hi we are a group of five university students (@davyie, @osclind, @LukasGutenberg and @martinfalke) who would like to work on this issue as part of the course DD2480 Software Engineering Fundamentals at KTH Royal Institute of Technology. Is there anything in particular we should know about?

LukasGutenberg added a commit to DD2480-group18/jabref that referenced this issue Mar 3, 2021
- added another button for the journal field
- added tooltip for the button
- added a function the button is linked to where we can work on the feature
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 3, 2021
    - added initial python script to generate JSON from an API call
    - Is currently using Crossref API
@tobiasdiez
Member Author

tobiasdiez commented Mar 3, 2021

Cool, thanks for your interest! The approach outlined in #6189 (comment) is still pretty much up-to-date. Do you have any questions concerning this? (btw: in place of Python you could of course also use Java to download and pre-format the journal info if you prefer that)

Edit: It would also be nice if the info could be shown directly in the entry editor as a popover instead of a new dialog, using http://fxexperience.com/controlsfx/features/#popover.

LukasGutenberg added a commit to DD2480-group18/jabref that referenced this issue Mar 4, 2021
- popover implemented and located above assigned button
- ready to display relevant information once implemented
LukasGutenberg added a commit to DD2480-group18/jabref that referenced this issue Mar 4, 2021
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
        - Test is outlined and ready to be integrate with code
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
…tomated

automated test. The examples are written in the tests methods. This is
related to issue JabRef#6189.
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
davyie pushed a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
…oading and pre-processing journal data using Scimagojr.
osclind added a commit to DD2480-group18/jabref that referenced this issue Mar 8, 2021
- Fix a mistake in the naming of a new test file
@martinfalke
Contributor

We've attempted a bunch of different solutions for this issue and did not get very far with any of them. As the course through which we are working on this issue is coming to an end, we have decided to write a summary of our suggested way forward. Below, we have divided the work into sections based on how complete the different parts are. After that come notes on some of the individual parts.

Complete or requiring minor changes:

  • GUI button that can be pressed to show a PopOver above it, using placeholder content

Started but incomplete:

  • Python script that downloads and aggregates the journal data, see 1. below
  • Unit tests that validate the results of the Python script, see 2. below

Not started or minor start:

  • Designing and coding the display of data in the PopOver
  • Code for calling the Python script and populating the PopOver with the data (only the method that the button calls is implemented)

1. Python script

Concerning the Python script, we figured that it would be important to stick to the standard modules, so that no packages need to be installed prior to running it. However, two problems are currently unresolved. The first is choosing which API to use; see 3. for a comparison between a few that we considered. The second is choosing how to use the chosen API (e.g. live retrieval of data vs. downloading all of the data and storing it). This is further discussed in 4A.

This script uses the URL query from Scimagojr to retrieve data. We discovered some troubling issues with this method that are described in more detail inside the spoiler below.

Code with notes on problems

Problems

  • The data returned from Scimagojr is in .csv format. Its rows are separated by newlines, but some cells also contain newlines, which breaks those rows apart. The script currently only handles this by checking the length of the list cells returned from splitting on the delimiter ;: if it has fewer than five elements, the row is skipped entirely. This could instead be solved by hard-coding the expected number of columns per row and using it to detect broken rows. The broken rows should then be merged with their respective subsequent row.
  • Searching through each row for the correct ISSN involves a lot of string splitting and comparison, which is time-consuming to the point of being infeasible to do live. A case where the correct row is at the end of the .csv file can take up to a minute, if not more. This could be resolved if Scimagojr allowed filtering the query by ISSN (e.g. if a URL like https://www.scimagojr.com/journalrank.php?year=2014&issn=15461718&out=xls downloaded only the data for Nature Genetics (ISSN: 15461718) as a single .csv row).
  • The code would require maintenance of at least two things that may change over time. The first is the order of columns, which is currently hard-coded: the fifth column is checked for the ISSN. The second is the values of start_year and end_year, which are based on what data is available on Scimagojr. Presumably newer years are added eventually, but 2020 does not exist as of 2021-03-08, and old data might be removed.
  • This is not exactly a problem, but the script is meant to be called with one argument: the ISSN of the journal for which the button was pressed.

import sys
import urllib.error
import urllib.request as rq

def journal_url(year):
    # Search view: https://www.scimagojr.com/journalrank.php?year=2019
    base_url = 'https://www.scimagojr.com/journalrank.php?year='
    # '&out=xls' returns a .csv-file with the journal rankings of the specified year
    download_query = '&out=xls'
    return base_url + str(year) + download_query

def get_year_stats(year, issn):
    # Returns the row for the ISSN, prefixed with the year, or None if unavailable
    try:
        response = rq.urlopen(journal_url(year))
    except urllib.error.URLError:
        # urlopen raises on HTTP errors (HTTPError is a subclass of URLError)
        return None # TODO: revise potential HTTP error handling
    # Response Status 200 OK
    if response.status != 200:
        return None
    for line in response.read().splitlines():
        decoded = line.decode('utf-8')
        cells = decoded.split(';')
        if len(cells) < 5:
            continue # incomplete data, e.g. a row broken by a newline inside a cell
        # cells[4] should contain a list of ISSNs for the journal, separated by ', '
        # e.g. "12345678, 98765432"; surrounding quotes, if present, are stripped
        if issn in cells[4].strip('"').split(', '):
            return str(year) + ";" + decoded # prepend the year
    return None # TODO: handle case where ISSN is not found

# which journal the data is fetched for
journal_issn = '15458601' # default for testing
issn = sys.argv[1] if len(sys.argv) > 1 else journal_issn

# range of available years as of 2021-03-08
start_year = 1999
end_year = 2019
years = range(start_year, end_year+1)

journal_stats = []

for y in years:
    stats = get_year_stats(y, issn)
    if stats is not None: # skip years without data for this ISSN
        journal_stats.append(stats)

# TODO:
# write aggregated stats to .json-file
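
The first two problems listed above (cells containing newlines, and slow row-by-row searching) might both be eased by parsing the file once with the standard csv module and building an ISSN lookup table. The following is a minimal sketch under two assumptions that were not verified against the real Scimagojr download: cells containing newlines are wrapped in double quotes, and the header row names the ISSN column `Issn`. The sample data is made up apart from the Nature Genetics ISSN mentioned above.

```python
import csv
import io


def index_by_issn(csv_text, issn_column="Issn"):
    """Parse semicolon-delimited text once and map every ISSN to its row."""
    # csv only keeps a newline inside a cell if that cell is quoted (assumption).
    reader = csv.DictReader(io.StringIO(csv_text, newline=""), delimiter=";")
    index = {}
    for row in reader:
        # A cell may list several ISSNs, e.g. "12345678, 98765432".
        for issn in (row.get(issn_column) or "").split(","):
            issn = issn.strip()
            if issn:
                index[issn] = row
    return index


# Made-up sample with a newline inside a quoted cell.
sample = (
    'Rank;Title;Issn\n'
    '1;"Nature\nGenetics";"15461718, 00000000"\n'
    '2;Example Journal;12345678\n'
)
index = index_by_issn(sample)
print(index["15461718"]["Title"])
```

After the one-time parse, each lookup is a dictionary access instead of a scan over all rows.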

2. Unit tests for script results

The unit tests committed in src/test/java/org/jabref/gui/fieldeditors/JournalEditorPopOverTest.java were intended to be used for automated testing of the script working as expected. Since the script is incomplete, these have not been run and should instead be viewed as sketches for tests that can be used, if the feature is developed. They are mostly based on the unit tests found in ThemeTest.java as they also utilize file operations in a temp directory (@TempDir annotation from junit).

One of the tests, namely invalidISSNreturnsEmptyData, covers an ISSN that is not present in the temp file; the lookup should then either return empty data or throw an exception. The other test covers an ISSN that is present in the temp file, in which case the lookup should return the corresponding data.
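
For illustration, the same two cases could be sketched in Python against a temporary JSON file. This is not the Java test class mentioned above: the lookup function `find_journal`, the file name and the file layout (a JSON object keyed by ISSN) are all assumptions made up for this sketch.

```python
import json
import tempfile
import unittest
from pathlib import Path


def find_journal(data_file, issn):
    """Hypothetical lookup: return the stored entry for `issn`, or {} if absent."""
    journals = json.loads(Path(data_file).read_text(encoding="utf-8"))
    return journals.get(issn, {})


class JournalLookupTest(unittest.TestCase):
    def setUp(self):
        # Fresh temp directory per test, similar in spirit to JUnit's @TempDir.
        self.tmp = tempfile.TemporaryDirectory()
        self.data_file = Path(self.tmp.name) / "journals.json"
        self.data_file.write_text(
            json.dumps({"15461718": {"title": "Nature Genetics"}}),
            encoding="utf-8",
        )

    def tearDown(self):
        self.tmp.cleanup()

    def test_invalid_issn_returns_empty_data(self):
        self.assertEqual(find_journal(self.data_file, "00000000"), {})

    def test_valid_issn_returns_data(self):
        entry = find_journal(self.data_file, "15461718")
        self.assertEqual(entry["title"], "Nature Genetics")
```

Saved as a module, this can be run with `python -m unittest`.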

3. Comparison between APIs

CrossRef

Pros:

  • Public
  • Metadata
  • Easy to use

Cons:

  • No relevant data. We do not believe the metadata in the API is relevant for the purpose of the feature.

Scimagojr

Pros:

  • Public
  • Relevant metadata such as number of citations and other metrics available on year basis

Cons:

  • See our proposed solution for more information on why this API is not feasible for the feature
  • Can't filter the query that downloads a .csv-file by ISSN, meaning all rows need to be searched for the correct ISSN

Elsevier

Pros:

  • Pseudo public for institutions
  • Relevant metadata about journals on yearly basis
  • Reliable
  • Supports Scimagojr

Cons:

  • Requires API key (Maybe not feasible for an open source project)
  • Requires implementation of API key handling

DOAJ

Pros:

  • Public
  • Contains Metadata

Cons:

4. Conclusion and proposed solutions

We believe that this is a far more complex issue than it may initially seem. We urge the developers to reconsider whether the feature (and the experience it provides) is really worth the effort, as well as the complexity and need for maintenance it adds to the project. Below we propose a few solutions that we think are the most viable options moving forward.

A.

Download all this data at a set event (e.g. startup of the application) with a cleanup of previously saved results so that the results are ready to be presented as soon as the button is pressed. This would on the other hand slow down the startup of the program for a feature that may not be used all that much.

B.

On pressing the button, the webpage that has the stats is simply opened in a browser (e.g. the Scimagojr page for Nature Genetics). The problem of not being able to directly create a URL based on ISSN persists, though.
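
A minimal sketch of option B, using Python's standard webbrowser module. Because no journal-specific URL can be built from an ISSN, the sketch only opens the yearly ranking page whose address appears in the script above; the function names and the `opener` parameter (there so the function can be exercised without launching a browser) are assumptions for illustration.

```python
import webbrowser


def ranking_page_url(year):
    """URL of the Scimagojr ranking search view for one year."""
    return "https://www.scimagojr.com/journalrank.php?year=" + str(year)


def open_ranking_page(year, opener=webbrowser.open):
    """Open the ranking page in the default browser and return the URL used."""
    url = ranking_page_url(year)
    opener(url)  # by default this launches the system browser
    return url
```

Calling `open_ranking_page(2019)` would open the 2019 ranking, where the user still has to find the journal manually.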

C.

Scrap the feature altogether, perhaps at most provide a tooltip that recommends learning more about journals on an external webpage (such as Scimagojr).

@tobiasdiez
Member Author

Thanks a lot for the work and effort you put into this. Very much appreciated.

I didn't anticipate that it would be so difficult to get the data from Scimagojr. Sorry! Given these problems, I would suggest running the script once and then committing the generated JSON file with the code. This file can then be loaded, hopefully on the fly, when people click the button. This has the disadvantage that we need to rerun the script regularly, but that should only be necessary once a year, so it is no big deal.
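
The "generate once, load on click" idea could look roughly like the sketch below. It assumes the per-year rows have the `year;cell;cell;...` shape that the script above returns, and it stores them in a JSON object keyed by ISSN and then by year. The file name, function names and sample rows are made up for illustration.

```python
import json
import os
import tempfile


def aggregate(rows):
    """Turn 'year;cell;cell;...' strings into a {year: [cells]} mapping."""
    by_year = {}
    for row in rows:
        year, _, rest = row.partition(";")
        by_year[year] = rest.split(";")
    return by_year


def save_stats(path, stats_by_issn):
    """Write the aggregated data once; this file would be committed."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(stats_by_issn, handle, indent=2, sort_keys=True)


def load_stats(path, issn):
    """Load the file on demand and return the entry for one ISSN ({} if absent)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get(issn, {})


# Made-up rows standing in for the script's per-year output.
rows = ["2018;1;Example Journal", "2019;2;Example Journal"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal_stats.json")
    save_stats(path, {"15461718": aggregate(rows)})
    loaded = load_stats(path, "15461718")
    missing = load_stats(path, "00000000")
print(loaded["2019"])
```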

I understand that your project is coming to an end. Nonetheless it would be nice if you could prepare a PR with the changes you have so far. Then we can take it over from there. Of course, you are invited to continue working on it as well.

@martinfalke
Contributor

@tobiasdiez Absolutely no worries, it is a natural part of the development process and we are happy to have helped. A draft pull request for the issue can now be found at #7541.

@calixtus
Member

One could probably take the draft by @martinfalke and finish it if that is ok?

@JabRef JabRef deleted a comment from github-actions bot Nov 22, 2021
koppor pushed a commit that referenced this issue Sep 1, 2022
8d69f16 Create university-of-hull-harvard.csl (#6146)
139dfdd Create current organic synthesis.csl (#6139)
bb006c8 Update acta-universitatis-agriculturae-sueciae.csl (#6143)
5815da0 Create food-science-and-biotechnology.csl (#6132)
2702a7c Update harvard-university-for-the-creative-arts.csl (#6104)
ef34543 Update economic-geology.csl (#6128)
0adcd30 Bump mathieudutour/github-tag-action from 5.6 to 6.0 (#6141)
3c36e97 Create universite-du-quebec-a-montreal-prenoms.csl (#6073)
415bc05 Bump softprops/action-gh-release from 0.1.14 to 1 (#6142)
ae8c5e4 Create politique-europeenne.csl (#6074)
09cbc09 Update cell-numeric-superscript.csl (#6188)
6ee1ace Update avian-conservation-and-ecology.csl (#6191)
cb5c43f Update harvard-anglia-ruskin-university.csl (#6189)
5c4f4c0 Create anais-da-academia-brasileira-de-ciencias.csl (#6066)
a60dfe9 Update cardiff-university-harvard.csl (#6190)
999a45c Create sociologia-urbana-e-rurale.csl (#6042)
1bc9d62 Bluebook (#6183)
a4f2a72 Oxford Brookes (#6182)
88df8d5 Delete harvard-cardiff-university-old.csl (#6180)
b9302fd Update APA styles for "event" macro (#6174)
d4daec6 remove DOI for printed articles organizational-studies.csl (#6176)
acfc620 Create liver-transplantation.csl (#6167)
129a775 Change "event" to "event-title" (#6164)

git-subtree-dir: buildres/csl/csl-styles
git-subtree-split: 8d69f16
@aqurilla
Contributor

Hi, I would like to take up this issue

@calixtus
Member

Hi @aqurilla, thanks for your interest. We have already seen a few PRs from you, so there is no doubt that you are able to complete it. But from the comments above, it seems that completing the implementation will also take quite some time. So if you think that this is suitable for you, we would be very happy if you decide to work on this. If you have questions, you can always ask us via Gitter chat or by email. We would appreciate it if you create an early draft PR to document your progress, so we can help and support your work.

@aqurilla
Contributor

Appreciate the heads-up @calixtus! I'll create a draft PR to share progress on this issue
