Use a standard vocabulary and annotation property for indicating metaclass #150

Open
cmungall opened this issue Jun 27, 2019 · 2 comments
Labels
Policy Discussions about PRO policies


@cmungall

cmungall commented Jun 27, 2019

PRO has an implicit ontology of metaclasses. These are currently represented by overloading rdfs:comment with a value like Category=sequence.

This is suboptimal, as the metaclass is not represented as a URI, so the user has no way of following this string to see what it means.

This is presumably due to an early limitation of OBO format (obof), but it is now possible to have arbitrary annotation assertions in OBO format.

I propose to use the biolink vocabulary here. For each class add a triple

PR_nnnn bl:category bl:ProteinIsoform (or similar)
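A minimal sketch of what the proposed migration could look like, assuming comments follow the `Category=<value>.` pattern shown above. It rewrites a comment-encoded category as an explicit triple in N-Triples syntax. The IRI prefixes, the `CATEGORY_TO_TERM` mapping, and the `category_triple` helper are all hypothetical; only the `sequence` category appears in this issue, and the target term is a placeholder rather than a confirmed biolink class.

```python
import re
from typing import Optional

# Hypothetical IRI prefixes; check the actual PRO IRI pattern and the
# biolink namespace against their own documentation before relying on them.
PR = "http://purl.obolibrary.org/obo/"
BL = "https://w3id.org/biolink/vocab/"

# Hypothetical mapping from PRO comment categories to metaclass terms.
CATEGORY_TO_TERM = {
    "sequence": "ProteinIsoform",
}

# Matches the overloaded comment form, e.g. "Category=sequence."
CATEGORY_RE = re.compile(r"Category=([\w-]+)\.?")


def category_triple(class_id: str, comment: str) -> Optional[str]:
    """Turn an rdfs:comment like 'Category=sequence.' into an explicit triple."""
    match = CATEGORY_RE.search(comment)
    if match is None:
        return None  # comment carries no category
    term = CATEGORY_TO_TERM.get(match.group(1))
    if term is None:
        return None  # unmapped category: leave for manual curation
    return f"<{PR}{class_id}> <{BL}category> <{BL}{term}> ."


print(category_triple("PR_nnnn", "Category=sequence."))
```

Once the category is a resolvable URI in object position, a user can follow it to its definition, which is exactly what the comment string does not allow.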

See also biolink/biolink-model#230

But whichever system is used, we need to ontologize the metaclasses. I think the PRO group are ideally placed to do this as they have thought about this a lot, and it deserves to be represented as a computable artefact rather than hidden in comments.

@nataled nataled self-assigned this Jun 27, 2019
@nataled nataled added the Policy Discussions about PRO policies label Jun 27, 2019
@nataled
Collaborator

nataled commented Jun 27, 2019

The original intent of the categories was just to have a way of explaining the overall structure of PRO. Other than that, we pretty much use them for internal purposes, and they've evolved accordingly. The simplest way of thinking about them is as subsets. In fact, from time to time we consider turning them into proper subsets, but really there is zero benefit to doing so (they are already internally computable, and no one has ever asked for it). We never really intended to ontologize them; they'd need a lot of work. We do, however, plan to expose them in some way (that is, take them out of the comments). By the way, there is documentation on what they mean:

https://proconsortium.org/PRO_QA.pdf (specifically, Q4)

Side note: looks like this doc needs updating.

@cmungall
Author

cmungall commented Jul 3, 2019

I don't know how other people use PRO, but exposing the categories would seem to be really useful to many people. I would not jump to claiming zero benefit.

For example, most of the ontologies I work on that use PRO don't use any species-specific level information. Building the import chain is a pain due to the size of PRO. Ideally PRO would provide downloadable subsets for cuts like this, but in their absence it's easier to do a SPARQL query based on a predictable property. We could do this by encoding the comment text in the query, but this is obviously suboptimal.

I think subsets would be better than the current situation, but a dedicated property would be better still.
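To illustrate why a predictable property helps, here is a minimal in-memory analogue of the SPARQL filter described above: keep only classes whose `bl:category` is not species-specific. The triples, the `ex:` metaclass terms, and the `species_neutral_classes` helper are hypothetical; the real list of species-specific categories would come from PRO's own category documentation.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metaclass terms treated as species-specific.
SPECIES_SPECIFIC = {"ex:OrganismSequence", "ex:OrganismGene"}


def species_neutral_classes(triples: Iterable[Triple]) -> Set[str]:
    """Return subjects that have a bl:category and none of them is species-specific."""
    categories: Dict[str, Set[str]] = {}
    for subject, predicate, obj in triples:
        if predicate == "bl:category":
            categories.setdefault(subject, set()).add(obj)
    return {s for s, cats in categories.items() if not cats & SPECIES_SPECIFIC}


triples = [
    ("PR:1", "bl:category", "bl:ProteinIsoform"),
    ("PR:2", "bl:category", "ex:OrganismSequence"),
    ("PR:3", "rdfs:comment", "Category=sequence."),  # category hidden in text
]
print(species_neutral_classes(triples))
```

Note that `PR:3` is not selected: its category exists only inside comment text, so any query keyed on the property cannot see it, and a query keyed on the comment depends on the exact wording staying stable.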

Quoting @nataled: "We never really intended to ontologize them; they'd need a lot of work."

What would the work be? You already have an implicit ontology with great documentation in a PDF file. I don't think this needs to be overthought: just a URI for each concept, included in PRO or in an ancillary ontology.
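As a rough sketch of how little is needed to give each category a URI, the snippet below emits minimal OBO `[Term]` stanzas for an ancillary metaclass ontology. The `PRMETA` ID space is hypothetical, and the category names are assumed examples; the authoritative list and definitions are in PRO_QA.pdf (Q4).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metaclass:
    term_id: str  # hypothetical ID space, e.g. "PRMETA:0000001"
    name: str     # category string as currently used in PRO comments


def to_obo(terms: List[Metaclass]) -> str:
    """Render each metaclass as a minimal OBO [Term] stanza, blank-line separated."""
    stanzas = [f"[Term]\nid: {t.term_id}\nname: {t.name}\n" for t in terms]
    return "\n".join(stanzas)


terms = [
    Metaclass("PRMETA:0000001", "gene"),      # assumed category name
    Metaclass("PRMETA:0000002", "sequence"),  # category named in this issue
]
print(to_obo(terms))
```

Definitions, `is_a` links between categories, and cross-references to biolink terms could be added to the same stanzas later without changing the IDs.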

We will likely end up doing this in the biolink model anyway, as we need a computable way to distinguish entries in PRO that denote generic forms from variant forms, etc.
