add codespell workflow, config and fix some typos (#127)

* Add GitHub action to run codespell on main for pushes and PRs
* Add rudimentary codespell config
* Run codespell throughout but ignore failures -- committing manually since example outputs are ignored by git
* [DATALAD RUNCMD] Do interactive fixing of typos

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
linkml · Feb 27, 2024 · 51a2845
1 parent e7fb8e1

Showing 10 changed files with 49 additions and 20 deletions.
diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
@@ -0,0 +1,23 @@
+# Codespell configuration is within pyproject.toml
+---
+name: Codespell
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2
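The workflow above only invokes codespell; the spelling logic lives in the tool itself, which matches each word against a curated list of common misspellings paired with suggested fixes, rather than checking words against a full dictionary. A minimal Python sketch of that lookup follows. The `MISSPELLINGS` table and the `fix_line` helper are hypothetical, and the table is built solely from the typos this commit fixes.

```python
import re

# Hypothetical mini-list built only from the fixes in this commit;
# codespell ships a much larger curated list of misspelling -> correction pairs.
MISSPELLINGS = {
    "inheritence": "inheritance",
    "betweeen": "between",
    "tyoes": "types",
    "shemas": "schemas",
    "seperate": "separate",
    "conversly": "conversely",
}

# A "word" here is any run of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


def fix_line(line: str) -> str:
    """Replace known misspellings word by word, keeping an initial capital."""

    def replace(match: re.Match) -> str:
        word = match.group()
        fix = MISSPELLINGS.get(word.lower())
        if fix is None:
            # Not a known misspelling: leave the word untouched.
            return word
        return fix.capitalize() if word[0].isupper() else fix

    return WORD_RE.sub(replace, line)


if __name__ == "__main__":
    print(fix_line("Used for inheritence and type checking"))
    print(fix_line("Conversly, if the dataset is part of a larger dataset"))
```

Because only listed misspellings are flagged, words absent from the list pass through unchanged. codespell's real list is far larger, so legitimate domain terms can collide with entries in it; excluding such terms is what the `ignore-regex` setting in `pyproject.toml` is for.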
diff --git a/docs/datamodel/types/Objectidentifier.md b/docs/datamodel/types/Objectidentifier.md
@@ -15,5 +15,5 @@ URI: [linkml:Objectidentifier](https://w3id.org/linkml/Objectidentifier)
 
 |  |  |  |
 | --- | --- | --- |
-| **Comments:** | | Used for inheritence and type checking |
+| **Comments:** | | Used for inheritance and type checking |
 
diff --git a/docs/intro/export.md b/docs/intro/export.md
@@ -84,7 +84,7 @@ this guards against accidental overwrites.
 
 schemasheets allows *custom* sheet formats that map to the LinkML standard.
 
-you can use the combination of sheets2linkml and linkml2sheets to convert betweeen two sheet specifications.
+you can use the combination of sheets2linkml and linkml2sheets to convert between two sheet specifications.
 
 For example, let's say for schema1.tsv, you use a spreadsheet with the following headers:
 

diff --git a/docs/intro/mixed-sheets.md b/docs/intro/mixed-sheets.md
@@ -23,4 +23,4 @@ For example:
 |C|ForProfit|||||Organization|||||||
 |C|NonProfit|||||Organization|||Q163740|||foo|
 
- * [personinfo with tyoes](https://docs.google.com/spreadsheets/d/1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ/edit#gid=509198484)
+ * [personinfo with types](https://docs.google.com/spreadsheets/d/1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ/edit#gid=509198484)
diff --git a/examples/output/docs/types/Objectidentifier.md b/examples/output/docs/types/Objectidentifier.md
@@ -15,5 +15,5 @@ URI: [linkml:Objectidentifier](https://w3id.org/linkml/Objectidentifier)
 
 |  |  |  |
 | --- | --- | --- |
-| **Comments:** | | Used for inheritence and type checking |
+| **Comments:** | | Used for inheritance and type checking |
 
diff --git a/examples/output/jsonld/combined.jsonld b/examples/output/jsonld/combined.jsonld
@@ -207,7 +207,7 @@
       "definition_uri": "https://w3id.org/linkml/Objectidentifier",
       "description": "A URI or CURIE that represents an object in the model.",
       "comments": [
-        "Used for inheritence and type checking"
+        "Used for inheritance and type checking"
       ],
       "from_schema": "https://w3id.org/linkml/types",
       "imported_from": "linkml:types",

diff --git a/pyproject.toml b/pyproject.toml
@@ -33,4 +33,10 @@ sheets2linkml = "schemasheets.schemamaker:convert"
 linkml2sheets = "schemasheets.schema_exporter:export_schema"
 sheets2project = "schemasheets.sheets_to_project:multigen"
 
-linkml2schemasheets-template = 'schemasheets.generate_populate:cli'
+linkml2schemasheets-template = 'schemasheets.generate_populate:cli'
+[tool.codespell]
+# Ref: https://github.com/codespell-project/codespell#using-a-config-file
+skip = '.git,*.lock'
+check-hidden = true
+ignore-regex = '\bOTU\b'
+# ignore-words-list = ''
diff --git a/schemasheets/schemamaker.py b/schemasheets/schemamaker.py
@@ -217,7 +217,7 @@ def get_current_element(self, elt: Element) -> Union[Element, PermissibleValue]:
         """
         sc = self.schema
         if isinstance(elt, SchemaDefinition):
-            # TODO: consider multiple shemas per sheet
+            # TODO: consider multiple schemas per sheet
             return sc
         elif isinstance(elt, PermissibleValue):
             return elt
@@ -321,7 +321,7 @@ def check_excess(descriptors):
                 for c in vmap[T_CLASS]:
                     if self.use_attributes:
                         # slots always belong to a class;
-                        # no seperate top level slots
+                        # no separate top level slots
                         a = SlotDefinition(main_elt.name)
                         c.attributes[main_elt.name] = a
                         yield a

diff --git a/tests/input/mixs6_core_test.tsv b/tests/input/mixs6_core_test.tsv
diff --git a/tests/input/rda-crosswalk.tsv b/tests/input/rda-crosswalk.tsv
@@ -12,7 +12,7 @@ A. From Google dataset search recommendaton	Thing	description	mandatory	Text	A d
 	CreativeWork	keywords		Text	Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas.	dct:keyword (R)	dcat:keyword	dcterms:subject (R)*	MD_Identification/descriptiveKeywords//keyword	keywords (R)	Subject (M); Topic Classification Term; Keywords	keywords (O)	collection/subject        	dcterms:subject	keywords (M)		keywords		keywords	keyword
 	CreativeWork	license		CreativeWork or URL	A license document that applies to this content, typically indicated by URL.  (A license under which the dataset is distributed.)	dct:license	dct:license	dcterms:rights	MD_Identification/resourceConstraints//reference/CI_Citation, or text in MD_LegalConstraints/useLimitation [restrictionCode = license]	license (R)		licenses (R)	collection/rights/licence[@rightsURI] AND/OR collection/rights/licence[@type] AND collection/rights/licence	dcterms:rights	license (R)	Rights (O)	rights	Rights (O)	license	
 	CreativeWork	creator		Organization or Person	The creator/author of this CreativeWork (dataset). This is the same as the Author property for CreativeWork.  (To uniquely identify individuals, use ORCID ID as the value of the sameAs property of the Person type. To uniquely identify institutions and organizations, use ROR ID. )	dct:creator	dcterms:creator	dcterms:creator (M) 	MD_Identification/citation//citedResponsibleParty//name  [role = one of {author, coAuthor, originator, editor}]	creator (M)	Author; authorName (M)	creator (M)	collection/citationInfo/citationMetadata/contributor OR relatedObject|relatedInfo party/name where relation=IsPrincipalInvestigatorOf OR relatedObject|relatedInfo party/name where relation=author OR relatedObject|relatedInfo party/name where relation=coInvestigator OR relatedObject|relatedInfo party/name where relation=hasCollector	dcterms:creator	creator (R)	Creator (R)	AuthEnty*	Creators (M)	author	ResourceHeader/Contact[@role=PrincipalInvestigator]  ResourceHeader/Contact[@role=DataProducer]
-	CreativeWork	isPartOf		CreativeWork	Indicates a CreativeWork that this CreativeWork is (in some sense) part of. Reverse property hasPart.  If the dataset is a collection of smaller datasets, use the hasPart property to denote such relationship. Conversly, if the dataset is part of a larger dataset, use isPartOf.	dct:isPartOf (R)	dcterms:isPartOf	isPartOf	MD_Identifcation/associatedResource/name/CI_Citation [associationType = 'largerWorkCitation']	 includedIn(Dataset) (R)			relatedObject|relatedInfo collection where relation[@type='isPartOf']        	dcterms:isPartOf			isPartOf		isPartOf	ParentID (only for Granule resource type)
+	CreativeWork	isPartOf		CreativeWork	Indicates a CreativeWork that this CreativeWork is (in some sense) part of. Reverse property hasPart.  If the dataset is a collection of smaller datasets, use the hasPart property to denote such relationship. Conversely, if the dataset is part of a larger dataset, use isPartOf.	dct:isPartOf (R)	dcterms:isPartOf	isPartOf	MD_Identifcation/associatedResource/name/CI_Citation [associationType = 'largerWorkCitation']	 includedIn(Dataset) (R)			relatedObject|relatedInfo collection where relation[@type='isPartOf']        	dcterms:isPartOf			isPartOf		isPartOf	ParentID (only for Granule resource type)
 	CreativeWork	hasPart		CreativeWork	Indicates a CreativeWork that is (in some sense) a part of this CreativeWork. Reverse property isPartOf	dct:hasPart (R)	dcterms:hasPart	hasPart	MD_Identifcation/associatedResource/name/CI_Citation [associationType = 'isComposedOf']	includes(Dataset) (R)		hasPart (O)	relatedObject|relatedInfo collection where relation[@type='hasPart']        	dcterms:hasPart			hasPart		hasPart	
 	CreativeWork	version		Number or Text	The version of the CreativeWork embodied by a specified resource.	owl:versionInfo	owl:versionInfo	Version (O)	MD_Identification/citation//edition	 version (O)	Version	version (R)	registryObject:collection:citationInfo:citationMetadata:version		version (R)		version	version (O)	version	ProviderVersion
 	CreativeWork	temporalCoverage		Text	The temporalCoverage of a CreativeWork indicates the period that the content applies to  (The data in the dataset covers a specific time interval. Only include this property if the dataset has a temporal dimension.)	dct:temporal	dcterms:temporal	Date	MD_Identification/extent//temporalElement/extent/TM_Primitive	temporalCoverage (O)	Time Period Covered		collection/coverage/temporal	dcterms:temporal (start); dcterms:temporal (end)		Temporal Coverage (O)	 temporal			TemporalDescription/TimeSpan