New exported format: Simple Knowledge Organization System (SKOS) (Basic implementation or better); RDF on Turtle #38

Open
fititnt opened this issue May 2, 2022 · 2 comments
Labels
archiva-farmatis archīva fōrmātīs; /formats of files/@eng-Latn; About (new) data formats to package dictionaries

Comments


fititnt commented May 2, 2022


Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data. -- via https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System

Based on a suggestion by @augusto-herrmann in https://dadosabertos.social/t/metadados-legislativos-e-semantica/390/3?u=rocha, let's add SKOS as an additional export format.

Notes

Default codes used for languages, while valid, may intentionally use non-normalized BCP47

Our language codes are valid BCP47, but not in the most normalized form (which would require an IANA lookup). This is mostly because we

  • use ISO 639-3, completely ignoring the shorter ISO 639-1 and ISO 639-2 codes; this practice is also used by linguists, including with Glottocodes
    • Example: por instead of pt (the ISO 639-1 code) in por-Latn
    • The benefit is less obvious for major European languages (for minority languages it is very relevant), but it makes it easier to be intentionally friendly to languages which never got into ISO 639-1 and ISO 639-2
  • use ISO 15924 even when it is likely unnecessary
    • Example: Arab in arb-Arab, Latn in lat-Latn, ...

SKOS does not state best practices for how to encode languages (it only recommends BCP47). So by settling on this default, we can later also release the language tables.
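The convention above can be sketched as a small helper. This is only an illustration: `make_language_tag` and `TAG_SHAPE` are hypothetical names, not part of the project's tooling, and the regular expression only checks the `language-Script` shape used in this issue. It does not validate anything against the IANA subtag registry.

```python
import re

# Shape used in this issue: 3-letter ISO 639-3 code + 4-letter ISO 15924 script.
# Assumption: this checks form only; it never consults the IANA registry.
TAG_SHAPE = re.compile(r"^[a-z]{3}-[A-Z][a-z]{3}$")


def make_language_tag(iso639_3: str, iso15924: str) -> str:
    """Build a tag such as 'por-Latn' from its two parts (hypothetical helper)."""
    tag = f"{iso639_3.lower()}-{iso15924.capitalize()}"
    if not TAG_SHAPE.match(tag):
        raise ValueError(f"unexpected tag shape: {tag!r}")
    return tag


if __name__ == "__main__":
    for language, script in [("por", "Latn"), ("arb", "Arab"), ("lat", "Latn")]:
        print(make_language_tag(language, script))
```

A two-letter code such as pt is rejected by this sketch on purpose, since the default described here always spells out the three-letter code and the script.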


Changes

@fititnt fititnt added the archiva-farmatis archīva fōrmātīs; /formats of files/@eng-Latn; About (new) data formats to package dictionaries label May 2, 2022
fititnt added a commit that referenced this issue May 2, 2022

fititnt commented May 2, 2022

Nice. At this point we have an exporter for a rudimentary skos:Concept + skos:prefLabel (linguistic only; it doesn't make sense to abuse BCP47 here the way we do in the tabular format to add inter-linguistic properties, since RDF/SKOS/Turtle obviously supports them natively).

One early attempt I abandoned was converting TBX to RDF based on the BPMLOD report, but that generated a somewhat ugly result, at least for simple cases like the ones we're handling at the moment. However, using SKOS, with examples from the software typically used for this type of task as reference, the output format actually feels like it could be edited by hand.
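To give a rough idea of what a skos:Concept + skos:prefLabel export looks like, here is a minimal sketch that writes Turtle by hand. It is not the code in 1603_1.py: `concept_to_turtle`, `escape_literal`, the `urn:example:` IRI, and the two labels are all made up for illustration, and only short single-line labels are handled.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )


def concept_to_turtle(iri: str, labels: dict) -> str:
    """Render one skos:Concept with one skos:prefLabel per language tag.

    `labels` maps a language tag (e.g. 'por-Latn') to the label text.
    """
    label_lines = [
        f'    skos:prefLabel "{escape_literal(text)}"@{tag}'
        for tag, text in sorted(labels.items())
    ]
    return f"<{iri}> a skos:Concept ;\n" + " ;\n".join(label_lines) + " ."


if __name__ == "__main__":
    # Hypothetical concept IRI and placeholder labels, not real dictionary data.
    print(SKOS_PREFIX)
    print()
    print(
        concept_to_turtle(
            "urn:example:1603_63_101:1",
            {"lat-Latn": "exemplum", "por-Latn": "exemplo"},
        )
    )
```

Each label carries its own language tag, so properties that are not tied to a single language could go into separate predicates instead of being squeezed into the tag.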


1603/63/101/1603_63_101.no11.skos.ttl

$ ./999999999/0/1603_1.py --methodus='status-quo' --status-quo-in-rdf-skos-turtle --codex-de 1603_63_101 > 1603/63/101/1603_63_101.no11.skos.ttl

Screenshot from 2022-05-02 05-01-15


fititnt commented Jun 14, 2022

Okay, we have a problem with how to organize the entire library.

At the moment, we treat each group as an entire skos:ConceptScheme, similar to how we publish the PDF versions as grouped packages. The UNESCO Thesaurus uses a different approach (https://vocabularies.unesco.org/exports/thesaurus/latest/unesco-thesaurus.ttl) which, by the way, requires another specification, as noted here: https://lists.w3.org/Archives/Public/public-esw-thes/2016Jan/0012.html.

Maybe the SKOS version should always have 1603 as its entry point?

I'm not 100% sure, but we could run an early test that implicitly creates the missing parent nodes (up to 1603, or whatever root numbering exists, since others could use different ones). This would not break things and would likely not require something like iso-thes:superGroup.
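The "implicit parents" idea could be tested with something like the sketch below. Everything here is an assumption for illustration: the function names, the `urn:example:` base IRI, and especially the choice of skos:broader to link levels, since whether a group should be a concept, a scheme or a collection is exactly the open question.

```python
def implicit_parents(code: str, separator: str = "_") -> list:
    """Return the ancestors of a code, nearest first, ending at the root.

    Example: '1603_63_101' yields ['1603_63', '1603'].
    """
    parts = code.split(separator)
    return [separator.join(parts[:end]) for end in range(len(parts) - 1, 0, -1)]


def broader_triples(code: str, base: str = "urn:example:") -> list:
    """Link each level to its parent with skos:broader (illustrative choice)."""
    chain = [code] + implicit_parents(code)
    return [
        f"<{base}{child}> skos:broader <{base}{parent}> ."
        for child, parent in zip(chain, chain[1:])
    ]


if __name__ == "__main__":
    for triple in broader_triples("1603_63_101"):
        print(triple)
```

Because the root is simply the first segment of the code, a library that does not start at 1603 would still get a complete chain without any extra configuration.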
