Support Research Organization Registry (ROR) IDs #6640

Closed
mcuthill opened this issue Feb 11, 2020 · 31 comments
Labels
Feature: Metadata GREI 2 Consistent Metadata NIH OTA: 2.5.3 pm.GREI-d-2.5.3 Type: Suggestion an idea User Role: Curator Curates and reviews datasets, manages permissions User Role: Depositor Creates datasets, uploads data, etc.

Comments

@mcuthill

As a data steward for an organization producing and publishing data, we would like to see the Research Organization Registry ID option added to the Citation metadata block. Perhaps as an addition to the list of Identifier Schemes for authors, or attached to the Affiliation, Producer, Distributor, or similar. As can be seen here, a respectable list of supporters and signatories have already committed to the adoption and use of RORs going forward.

@pdurbin
Member

pdurbin commented Feb 12, 2020

@mcuthill hi! Two weeks ago I heard all about ROR at this event in Lisbon the day before PIDapalooza 2020: https://www.eventbrite.com/e/the-ror-community-meeting-lisbon-registration-82814758171

It was a fun group! Here's a pic from https://twitter.com/ResearchOrgs/status/1222159655377473539

[Image: photo from the tweet linked above]

Here are my main takeaways from that event:

I don't think ROR IDs make sense in the list of author identifier schemes (ORCID, etc.) (or do they?!?) but yes, ROR IDs could be tied to Affiliation and other fields you mentioned. (Would it make sense to re-title this issue to something like "Support Research Organization Registry (ROR) IDs"?) Off the top of my head I'm not sure how much work this would take.

@djbrooke djbrooke added the Small label Feb 12, 2020
@djbrooke djbrooke removed the Small label Feb 12, 2020
@djbrooke djbrooke changed the title Adding ROR to Author Identifier Schemes Support Research Organization Registry (ROR) IDs Feb 12, 2020
@mcuthill
Author

@pdurbin Thanks for sharing all the materials from that workshop! ROR definitely seems to be gaining momentum. You're right that it wouldn't generally fit in the Author identifier category, except in edge cases like ours (Ocean Networks Canada) where the data mostly isn't directly associated with a single PI so the organization serves as the author. It might be good to have it as an option in the Identifier Scheme list for situations like that, but also added to other field/s where organizations are normally identified.

@pdurbin
Member

pdurbin commented Feb 12, 2020

@mcuthill sure. As has been discussed extensively in #5029, Dataverse doesn't currently have a way to express the difference between a person and an organization in the "Author" fields and subfields, but I see what you mean. If there was a checkbox or something for "organization", perhaps we could prompt for a ROR ID. Something like this (you have to imagine the checkbox):

[Image: screenshot]

@mfenner

mfenner commented Feb 17, 2020

DataCite maybe two years ago added the optional property nameType that can be either "Personal" or "Organizational" for exactly this reason. We also separate out personal names into givenName and familyName fields. These details are important for properly formatting metadata into a citation in one of the many citation styles. We support ROR (or other organizational identifiers) for names that are for organizations.

DataCite uses a set of rules to "guess" whether an author is a person or organization. The most effective seems to be the list of common givenNames that we check against every author name on DOI registration.
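A given-name check of the kind described above might look roughly like the following. This is only an illustrative sketch, not DataCite's actual rule set: the `COMMON_GIVEN_NAMES` sample, the `guess_name_type` function, and the "Family, Given" handling are all assumptions made for the example.

```python
# Hypothetical, tiny sample; a real list would hold many thousands of names.
COMMON_GIVEN_NAMES = {"anna", "john", "maria", "martin", "philipp", "steven"}


def guess_name_type(name: str) -> str:
    """Guess "Personal" or "Organizational" for an author name string."""
    if "," in name:
        # Assume "Family, Given": only the part after the comma is checked.
        _, _, given_part = name.partition(",")
        candidates = given_part.split()
    else:
        # Assume "Given Family": only the first token is checked.
        candidates = name.split()[:1]
    cleaned = [token.strip(".").lower() for token in candidates]
    if any(token in COMMON_GIVEN_NAMES for token in cleaned):
        return "Personal"
    # No known given name found, so fall back to an organization.
    return "Organizational"
```

With this sample list, "Fenner, Martin" is classified as personal, while "Ocean Networks Canada" and "University of California, Berkeley" fall through to organizational. A heuristic like this will misclassify names that are missing from the list, which is one reason an explicit nameType is preferable when the depositor can supply it.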

@pdurbin
Member

pdurbin commented Feb 24, 2020

@mfenner thanks for the reminder about nameType. I see we have tests for it here:

However, these are used for a specific "export" format (OpenAIRE) rather than what Dataverse sends over the wire to DataCite.

As @jggautier has noted at #2917 (comment) and #6492 (comment), we already use your rules in that export format. Thanks!

@philippconzett
Contributor

philippconzett commented Sep 26, 2020

We are definitely in favor of implementing ROR in Dataverse. In a recent report (https://doi.org/10.29242/report.effectivedatapractices2020), the Association of Research Libraries (ARL) recommends wide adoption of these 5 core PIDs to power findability of research data, including ROR:

[Image: the five core PIDs recommended in the ARL report]

@mcuthill already mentioned some fields in the Citation Metadata schema where ROR would fit in. Here is my list of relevant fields:

  • Author
  • Contact
  • Producer
  • Contributor
  • Grant Information
  • Distributor

@mfenner

mfenner commented Sep 26, 2020

@philippconzett here is how we currently connect ROR IDs to DOIs at DataCite:

  • Hosted: Organization is hosting institution, connected via repository identifier.
  • Contributed: Organization is creator or contributor, connected via nameIdentifier.
  • Affiliated: Organization is creator or contributor affiliation, connected via affiliationIdentifier.
  • Funded: Organization is funder, connected via funderIdentifier.
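To make the categories above concrete, here is a hedged sketch of how the three per-DOI links (contributed, affiliated, funded) could be read out of a metadata record. The key names (`creators`, `contributors`, `nameIdentifiers`, `affiliation`, `fundingReferences`) are assumed to follow DataCite's JSON representation and should be checked against the current schema; `ror_links` and `is_ror` are hypothetical helpers. The "hosted" link is left out because it comes from the repository record rather than from the DOI metadata.

```python
ROR_PREFIX = "https://ror.org/"


def is_ror(identifier) -> bool:
    """True if the value looks like a ROR ID expressed as a URL."""
    return isinstance(identifier, str) and identifier.startswith(ROR_PREFIX)


def ror_links(attributes: dict) -> set:
    """Collect (relation, ror_id) pairs from one DOI's metadata attributes."""
    links = set()
    entries = attributes.get("creators", []) + attributes.get("contributors", [])
    for entry in entries:
        # Contributed: the organization itself is the creator or contributor.
        for name_id in entry.get("nameIdentifiers", []):
            if is_ror(name_id.get("nameIdentifier")):
                links.add(("contributed", name_id["nameIdentifier"]))
        # Affiliated: the organization is the creator's or contributor's affiliation.
        for aff in entry.get("affiliation", []):
            if isinstance(aff, dict) and is_ror(aff.get("affiliationIdentifier")):
                links.add(("affiliated", aff["affiliationIdentifier"]))
    # Funded: the organization is named as a funder.
    for funding in attributes.get("fundingReferences", []):
        if is_ror(funding.get("funderIdentifier")):
            links.add(("funded", funding["funderIdentifier"]))
    return links
```

Returning a set of pairs means a single DOI can carry several relations to the same organization, for example both an affiliation and funding.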

The relative numbers as of today are as follows:

[Image: screenshot of the relative numbers for hosted, contributed, affiliated, and funded]

This is data on all DataCite DOIs and 8 million Crossref DOIs in DataCite Commons. Crossref doesn't yet support ROR IDs in their schema, but we can link ROR ID and DOI via the Crossref Funder ID in funding information.

Affiliation is the classic use case for ROR; in addition, we have a small number of DOIs with organizations as creator or contributor. But by far the largest number is hosted: DOIs in a repository run by a particular organization identified by its ROR ID. This is of course one big reason why institutional repositories exist. For domain repositories that linkage is also useful, but with a different kind of information. For a repository that hosts content contributed by researchers from many different organizations, linking by affiliation is crucial.

For the 273,601 DataCite DOIs with at least one ROR ID as affiliation identifier, more than 220K are in the "institutional repository" category. Dryad is currently the implementation in the domain repository category with the biggest uptake.

When you look at a particular organization identified by ROR ID in DataCite Commons, e.g. UiT, you see these different sources aggregated in one place, e.g. Dryad datasets and publications from Crossref with funding: https://commons.datacite.org/ror.org/00wge5k78

Not all DOIs from DataverseNO are included yet, as this needs the new DataCite consortium organization structure to be in place to uniquely associate the repository with UiT. An organization where this transition has already happened is for example the University of Cambridge: https://commons.datacite.org/ror.org/013meh722

@philippconzett
Contributor

Thanks, @mfenner! This was useful information. And I guess the last section answers the question which I have had on my to-do list since August 27, namely "Why are there only 64 records [as of 2020-08-27] for UiT in the DataCite Commons overview?" So, once the DataCite consortium organization structure is in place, the numbers for UiT will be more correct. But will these numbers be based on the fact that UiT is running DataverseNO? In that case, will all the datasets published by other partner institutions of DataverseNO, e.g. NTNU (https://ror.org/05xg72x27), UiB (https://ror.org/03zga2b32) etc., also be associated with UiT? In terms of the Dataverse metadata schema, I think the correct association would be through the metadata field producer, ideally through ROR.

@mfenner

mfenner commented Sep 26, 2020

@philippconzett Mapping ROR ID and DOI via the repository as a "shortcut" only works reliably if it is an "institutional repository". If multiple institutions are behind a repository, as I can see for DataverseNO for example at https://www.re3data.org/repository/r3d100012538, things get more complicated. The safest way is of course to add the ROR ID to every single DOI, but I would suggest thinking about how this can also be done at the repository level in Dataverse, for example by defining "collections" for each repository partner institution.

The "contributed" group in my visualizations above includes contributors with a ROR ID as nameIdentifier, and if you use that for example with contributorTypes "producer", it would work with DataCite Commons today without additional work needed on our end. You can see this for the California Digital Library in this query (where they use contributor type "producer" for data management plans, some very recent work where DataCite helped): https://commons.datacite.org/ror.org/03yrm5c26?query=contributors.contributorType%3AProducer
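As an illustration of the approach described above, a contributor entry carrying a ROR ID as nameIdentifier with contributor type "Producer" could be assembled like this. It is a sketch only: the key names are assumed to match DataCite's JSON representation, and `producer_contributor` is a hypothetical helper, not part of Dataverse or any DataCite client.

```python
def producer_contributor(name: str, ror_id: str) -> dict:
    """Build a contributor entry for an organization acting as producer."""
    return {
        "name": name,
        "nameType": "Organizational",
        "contributorType": "Producer",
        "nameIdentifiers": [
            {
                # The ROR ID is given in its full URL form.
                "nameIdentifier": ror_id,
                "nameIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


# The California Digital Library ROR ID is taken from the query URL above.
cdl = producer_contributor("California Digital Library", "https://ror.org/03yrm5c26")
```

The resulting dictionary would be one element of a record's contributors list; how Dataverse would populate it from the Producer field is left open here.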

@philippconzett
Contributor

@mfenner Once ROR support is in place in Dataverse, we will add RORs to each dataset. We would simply add these RORs in the dataset/metadata templates for each partner institution. The ROR will then automatically be included in the Producer field (and if necessary other fields, e.g. Author Affiliation) of each published dataset.

You suggest we also should consider "defining "collections" for each repository partner institution". Each DataverseNO partner institution already has its own institutional collection (= sub-dataverse), e.g. UiB: https://dataverse.no/dataverse/uib. But currently, such collections do not get their own DOI in the Dataverse software. However, at the request of a research group, DataverseNO has recently minted a collection DOI (through DataCite Fabrica) for a sub-sub-collection; see https://doi.org/10.18710/AJ4S-X394. Would minting such collection DOIs be helpful to associate datasets with organizations in DataCite Commons?

@mfenner

mfenner commented Sep 26, 2020

If ROR IDs can be automatically included in the producer field, then maybe using collections is not needed. For repositories with content from multiple organizations, using ROR IDs per DOI is probably the "safest" way to associate content with an organization.

Something that would help then, and we have heard this in other contexts, is the ability to "bulk update", so that this information can also be added retroactively without too much trouble.
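A retroactive bulk update of this kind could be sketched as follows. Everything here is hypothetical: the record layout (`doi`, `collection`, `producer`), the collection aliases other than `uib`, and the `backfill_producer_ror` helper are invented for illustration and do not correspond to an existing Dataverse API. The ROR IDs are the ones quoted earlier in this thread.

```python
# Assumed mapping from a partner's collection alias to its ROR ID.
PARTNER_ROR = {
    "uit": "https://ror.org/00wge5k78",
    "ntnu": "https://ror.org/05xg72x27",
    "uib": "https://ror.org/03zga2b32",
}


def backfill_producer_ror(datasets, partner_ror=PARTNER_ROR):
    """Add a missing producer ROR ID in place; return the DOIs that changed."""
    changed = []
    for dataset in datasets:
        ror_id = partner_ror.get(dataset.get("collection"))
        if ror_id is None:
            continue  # unknown collection: leave the record untouched
        if dataset.get("producer", {}).get("ror"):
            continue  # never overwrite an identifier that is already set
        dataset.setdefault("producer", {})["ror"] = ror_id
        changed.append(dataset["doi"])
    return changed
```

Returning the list of changed DOIs gives a natural work list for whatever step then pushes the updated metadata to DataCite.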

@philippconzett
Contributor

This blog post may be of interest for the discussion in this issue thread: https://www.pidforum.org/t/organizational-identifier-adoption-in-datacite-metadata/1279.

@philippconzett
Contributor

I just noticed that support for PIDs for institutions is set out as a desired characteristic in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:

1.9 The repository supports PIDs for authors, funders, funding programmes and grants, institutions, and other relevant entities.

@doigl
Contributor

doigl commented Apr 26, 2021

Just to support the issue: we (University of Stuttgart) would also be very interested in having ROR IDs integrated with all the affiliation fields (Author, Contact, Producer, Distributor), ideally in the form of an external controlled vocabulary as the backend of an auto-fill field, with the label visible to humans and the ROR ID stored behind it and added to the DataCite metadata used to get a DOI.

In our repository, we have several datasets with authors from different organizations, so it would really be good if the ROR could be attached not only at the dataset level, but also at the author-affiliation level. And we still need to attach an ORCID to the author, so it should really be an identification of the affiliation of an author/contact and not an identification of the author itself.

@lmaylein
Contributor

lmaylein commented Nov 9, 2021

Just to support the issue: we (University of Stuttgart) would also be very interested in having ROR IDs integrated with all the affiliation fields (Author, Contact, Producer, Distributor), ideally in the form of an external controlled vocabulary as the backend of an auto-fill field, with the label visible to humans and the ROR ID stored behind it and added to the DataCite metadata used to get a DOI.

Heidelberg University would also appreciate this.

@stevenmce

stevenmce commented Nov 9, 2021

And +1 for ADA as well - we are looking at RORs for our CADRE project (https://cadre5safes.org.au/)

@pdurbin
Member

pdurbin commented Feb 10, 2022

With help from @Kris-LIBIS and the code in gdcc/dataverse-external-vocab-support#9 I was just able to search for "ucla" under Author Affiliation and see a list of organizations in ROR to select from. Here's a screenshot:

[Image: screenshot of the list of ROR organizations shown when searching for "ucla" under Author Affiliation]

@pdurbin
Member

pdurbin commented May 17, 2022

@landreev recently configured https://demo.dataverse.org with the same external controlled vocabulary example: Author Affiliation can be populated from ROR.

He put some nice screenshots at #8571 (comment)

@philippconzett
Contributor

Great! Just tested it. Works fine. Would it make sense to expand the search configuration to include non-initial positions, so that when searching, e.g., for "California" you would also get results where "California" is in the middle or at the end of the name, e.g., "University of California", "University of California, Berkeley"?

@Kris-LIBIS
Contributor

Kris-LIBIS commented May 18, 2022

@philippconzett That depends on the search API of ROR. But as far as I can tell from the docs and the screenshot above, that should already work.

Please note that this has been a quick proof of concept implementation. The ROR search API only returns the first 20 results. In order to retrieve more, support for pagination should be added. Then again, you can narrow your search by entering multiple words like "berk* calif*".

For instance: we use pagination in our author lookup:
[Image: screenshot of the paginated author lookup]
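The pagination support mentioned in this comment could be sketched roughly as below. This is not the code from gdcc/dataverse-external-vocab-support: the endpoint URL, the `query` and `page` parameters, and the `items` and `number_of_results` response keys are assumptions based on the public ROR REST API and may differ between API versions. `search_url` and `search_all` are hypothetical names, and the HTTP call is injected as `fetch_json` so that the sketch stays independent of any particular HTTP library.

```python
from urllib.parse import urlencode

ROR_API = "https://api.ror.org/organizations"  # assumed endpoint; may vary by API version


def search_url(query: str, page: int = 1) -> str:
    """Build the search URL for one page of results."""
    return f"{ROR_API}?{urlencode({'query': query, 'page': page})}"


def search_all(query, fetch_json, max_pages=5):
    """Yield organizations page by page; fetch_json(url) must return a dict."""
    seen = 0
    for page in range(1, max_pages + 1):
        data = fetch_json(search_url(query, page))
        items = data.get("items", [])
        if not items:
            break  # ran past the last page
        yield from items
        seen += len(items)
        if seen >= data.get("number_of_results", 0):
            break  # everything reported by the API has been returned
```

The `max_pages` cap keeps a broad query such as "university" from triggering a long chain of requests from an auto-complete field; narrowing the query, as suggested above, remains the cheaper option.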

@philippconzett
Contributor

Thanks, @Kris-LIBIS! It seems that the pagination configuration was the reason why I didn't see relevant results when searching for, e.g., "California". I guess pagination would be a configurable feature?

@mreekie mreekie added the NIH OTA: 1.5.1 collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository... label Oct 6, 2022
@pdurbin pdurbin added Type: Suggestion an idea Feature: Metadata User Role: Curator Curates and reviews datasets, manages permissions User Role: Depositor Creates datasets, uploads data, etc. labels Oct 9, 2022
@mreekie

mreekie commented Feb 13, 2023

Priority:

@mreekie mreekie added pm.GREI-d-1.5.1 NIH, yr1, aim5, task1: Standardize download metrics pm.GREI-d-1.5.2 NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations labels Mar 20, 2023
@cmbz

cmbz commented May 1, 2023

Update:

@jggautier
Contributor

jggautier commented May 18, 2023

I added this in the broader related issue at IQSS/dataverse-pm#19 and realized I should also mention here that in a Google Slide at https://docs.google.com/presentation/d/1PtqmEzAamuM2__V8psOIetgNODPQxjqSEOuxL3kAV-Y I've tried to summarize what support means and which types of metadata are and aren't supported in some way. I'm hoping this helps scope the work.

@sbarbosadataverse

Most recent update to this issue:

NIH Task 2.5.3* | Task 2.5.3: Participate in GREI ROR Working Group and define and scope Dataverse ROR support (New for Year 2) | Proposed: Membership in ROR WG and document and related issues (e.g., #6640) describing how Dataverse will support ROR and technical work needed to provide this support

@cmbz cmbz added NIH OTA: 2.5.3 pm.GREI-d-2.5.3 and removed NIH OTA: 1.5.1 collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository... pm.GREI-d-1.5.1 NIH, yr1, aim5, task1: Standardize download metrics pm.GREI-d-1.5.2 NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations labels Aug 28, 2023
@cmbz

cmbz commented Aug 28, 2023

Updated AIM labels to reflect relationship to Aim 2.5.3 rather than 1.5.1 and 1.5.2

@amandafrench

Amanda French, Technical Community Manager for ROR, here. Just a note that I'm available to answer any questions you might have as you integrate ROR. And regarding the discussion from 2020 about individuals vs. institutions as authors, you might take a look at the slides at https://doi.org/10.5281/zenodo.8074996 where @zzacharo showed how InvenioRDM handles that in the interface.

@pdurbin
Member

pdurbin commented Sep 1, 2023

Thanks @amandafrench! Much appreciated! 🎉❤️

@jggautier
Contributor

We'll be working on this as part of NIH-GREI funded work. @cmbz and I agreed to list this issue in IQSS/dataverse-pm#127, where related issues are listed, and close this GitHub issue.
