Further improvements for SEO (especially schema.org / GDS) #295

Open
m-mohr opened this issue Mar 29, 2023 · 8 comments

@m-mohr
Collaborator

m-mohr commented Mar 29, 2023

Google Search makes good use of STAC Browser:
https://www.google.de/search?q=site:mspc.lutana.de

Google DatasetSearch also picks it up, but the data is not ideal yet:
https://datasetsearch.research.google.com/search?src=0&query=Planet%20NICFI&docid=L2cvMTF0dDk0bGd6aw%3D%3D

Especially the schema.org data should be improved if possible.

@cboettig

cboettig commented Apr 4, 2023

Thanks @m-mohr !

I'd be keen to see the Science on Schema (SOSO) guidelines from the Earth Science Information Partners (ESIP) Federation aligned with the schema.org markup being generated from STAC JSON, because it would be so nice if these two JSON metadata formats from two widely adopted earth science communities were more interoperable 😊 !

@m-mohr
Collaborator Author

m-mohr commented Apr 4, 2023

@cboettig Is there an equivalent for STAC Collections in SOSO? I'm using a DataCatalog right now, but it does not seem to be part of SOSO. Edit: Just found https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#collections-of-datasets-using-schemaorg-datacatalog - I'm currently struggling to get the DataCatalog to be included in Google Dataset Search though...

I'll have a look at the Dataset guideline on what we can improve in the Browser.
I don't think there's a good equivalent for Data Repository? Maybe the root catalog?
What do you think is the best way to align SOSO and STAC?

One issue with Dataset is that it is likely the closest match for STAC Items, but STAC Items are often not very descriptive (they just have an id, but no title or description). Or should STAC Collections be Datasets?

@cboettig

cboettig commented Apr 4, 2023

Thanks @m-mohr -- good questions! I should follow up with the ESIP devs; I'm mostly a data consumer working across products in both STAC and ESIP, hoping to connect the dots!

I think there are analogous concepts for the collection / catalog / item levels of STAC, but I'm not sure of the best choices. My understanding is that schema.org/Dataset was based on the original W3C DCAT (Data Catalog Vocabulary) standard, now in its 3rd version, which I think has all these notions. I know the ESIP folks know the W3C standards well, and I think their style of schema.org roughly parallels that, but I'm no expert here.

@mbjones or others probably have good advice here.

@m-mohr
Collaborator Author

m-mohr commented Apr 4, 2023

Thanks. So right now I'm mapping:
STAC Collection (or Catalog) -> DataCatalog
STAC Item -> Dataset
STAC Asset -> DataDownload

I'm not sure whether that's ideal though, due to the limited information in a STAC Item. What we can find in GDS right now is just Datasets with limited information, but none of the DataCatalogs, which carry much more information.
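To make the mapping above concrete, here is a minimal Python sketch of the Item -> Dataset and Asset -> DataDownload steps. It is not STAC Browser's actual implementation (which is JavaScript); the function names, the example Item, and the example.com URL are all hypothetical. It also shows the problem described here: an Item without a title or description produces a Dataset with little more than an identifier.

```python
import json


def asset_to_data_download(asset: dict) -> dict:
    """Map one STAC Asset to a schema.org DataDownload (hypothetical helper)."""
    download = {"@type": "DataDownload", "contentUrl": asset["href"]}
    if "type" in asset:  # STAC media type -> schema.org encodingFormat
        download["encodingFormat"] = asset["type"]
    if "title" in asset:
        download["name"] = asset["title"]
    return download


def item_to_dataset(item: dict) -> dict:
    """Map one STAC Item to a schema.org Dataset (hypothetical helper)."""
    props = item.get("properties", {})
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "identifier": item["id"],
        # Many Items have no title, so fall back to the id.
        "name": props.get("title", item["id"]),
        "distribution": [
            asset_to_data_download(asset)
            for asset in item.get("assets", {}).values()
        ],
    }
    if "description" in props:
        dataset["description"] = props["description"]
    return dataset


# Made-up Item with the sparse metadata typical of many catalogs.
example_item = {
    "type": "Feature",
    "id": "tile-0001",
    "properties": {"datetime": "2023-03-01T00:00:00Z"},
    "assets": {
        "data": {
            "href": "https://example.com/tile-0001.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(item_to_dataset(example_item), indent=2))
```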

Any insights would be appreciated.

@cboettig

cboettig commented Apr 4, 2023

This seems reasonable to me at least. I'm also interested in the STAC extensions, or at least those extensions that have good parallels to science-on-schema (e.g. scientific citation, file info, table).

In some ways, mapping such extensions to schema.org is particularly compelling because there are schema.org-based dataset browsing tools that can already index fields such as "author" or "column name", which are not as first-class in STAC search.

@m-mohr m-mohr modified the milestones: 3.0.0, 3.1.0 Apr 5, 2023
@cboettig

@m-mohr regarding your comment:

"I'm currently struggling to get the DataCatalog to be included in Google Dataset Search though..."

Yeah, I noticed that too. I got some great advice from @mbjones on possible culprits for this:

  • Google tools do support schema.org (SO) markup in pages that generate it with JavaScript, but there are timeouts and other issues to pay attention to. Records will fail the Google ingest if some key metadata fall outside Google's limits: for example, schema:description must be > 50 and < 5000 characters or Google will reject the record.

  • Do you have a sitemap.xml that directs Google to these landing pages for crawling? Sometimes Google doesn't find pages to crawl without a sitemap. See: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/GETTING-STARTED.md#sitemaps
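Both points can be checked before publishing. The sketch below is only an illustration in Python: the function names and the example.com URLs are hypothetical, and the length limits are taken from the advice above (treated as strict bounds here, to be conservative), not from a verified reading of Google's documentation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

# Limits as quoted in this thread; whether they are inclusive is an assumption.
MIN_DESCRIPTION_CHARS = 50
MAX_DESCRIPTION_CHARS = 5000

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def description_problem(record: dict) -> Optional[str]:
    """Return a message if the record's description is likely to be rejected."""
    description = record.get("description")
    if not description:
        return "description is missing"
    length = len(description)
    if length <= MIN_DESCRIPTION_CHARS:
        return f"description too short ({length} characters)"
    if length >= MAX_DESCRIPTION_CHARS:
        return f"description too long ({length} characters)"
    return None


def build_sitemap(urls: Iterable[str]) -> str:
    """Build a minimal sitemap.xml body listing the given landing pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(description_problem({"description": "Too short."}))
    print(build_sitemap(["https://example.com/collections/a"]))
```

Many STAC Collections have one-line descriptions, so a check like the first one would flag exactly the records that may be dropped silently during ingest.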

Also, while I think the mapping you have

STAC Collection (or Catalog) -> DataCatalog
STAC Item -> Dataset
STAC Asset -> DataDownload

makes sense from a literal/technical standpoint, it does look like a lot of metadata fields often found on a Dataset in ESIP end up only on the DataCatalog for a STAC entry, e.g. using the Google Rich Results Test:

https://search.google.com/test/rich-results?url=https%3A%2F%2Fradiantearth.github.io%2Fstac-browser%2F%23%2Fexternal%2Fplanetarycomputer.microsoft.com%2Fapi%2Fstac%2Fv1%2Fcollections%2Fmobi%3F.language%3Den

This means lots of useful metadata gets missed (e.g. spatial coverage, temporal coverage, creator, license, copyrightHolder, producer, provider, keywords, etc. would, I think, all be cut off from Dataset Search since they aren't properties of the Dataset). Not sure if there's a good way to handle 'inheritance' in this context?
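One possible way to handle that 'inheritance' is to copy selected properties from the DataCatalog down onto each Dataset when the markup is generated. The Python sketch below assumes both records are already schema.org dictionaries; the function name, the list of inherited properties, and the example values are hypothetical choices, not anything STAC Browser does today.

```python
# Properties copied from the DataCatalog when the Dataset lacks them.
# The selection is an assumption based on the fields listed above.
INHERITED_PROPERTIES = (
    "spatialCoverage",
    "temporalCoverage",
    "creator",
    "license",
    "copyrightHolder",
    "producer",
    "provider",
    "keywords",
)


def inherit_from_catalog(dataset: dict, catalog: dict) -> dict:
    """Return a copy of the Dataset with missing properties filled from the catalog."""
    merged = dict(dataset)
    for key in INHERITED_PROPERTIES:
        # Values set on the Dataset itself always win.
        if key not in merged and key in catalog:
            merged[key] = catalog[key]
    # Keep an explicit link back to the catalog the values came from.
    link = {"@type": "DataCatalog"}
    for key in ("@id", "name"):
        if key in catalog:
            link[key] = catalog[key]
    merged["includedInDataCatalog"] = link
    return merged


example_catalog = {
    "@type": "DataCatalog",
    "@id": "https://example.com/collections/demo",
    "name": "Demo collection",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["biodiversity", "demo"],
}
example_dataset = {
    "@type": "Dataset",
    "name": "tile-0001",
    "keywords": ["tile"],
}

if __name__ == "__main__":
    print(inherit_from_catalog(example_dataset, example_catalog))
```

The trade-off is duplication: every Dataset repeats the collection-level values, which makes each record more useful on its own but increases page size for collections with many Items.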

Down the road, it would be really nice if some of the common extensions could also be translated into schema.org. For example, I think there's a really clean/simple mapping for the scientific citation extension and the table extension into schema.org / ESIP science-on-schema conventions, which I'd love to see included. Please let me know if I should open a separate issue for that. Our community may be able to contribute a PR if there's interest (and if I can find someone who knows JavaScript well...)
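As a rough illustration of what such a translation could look like, the Python sketch below maps `sci:doi` and `sci:publications` from the scientific citation extension, and `table:columns` from the table extension, onto schema.org properties. The target properties (`identifier`, `citation`, `variableMeasured`) are my assumption of a reasonable mapping and have not been checked against the science-on-schema guides; the function names and example values are hypothetical.

```python
def scientific_to_schema(fields: dict) -> dict:
    """Map scientific citation extension fields to schema.org (assumed mapping)."""
    out = {}
    if "sci:doi" in fields:
        out["identifier"] = "https://doi.org/" + fields["sci:doi"]
    citations = [
        publication["citation"]
        for publication in fields.get("sci:publications", [])
        if "citation" in publication
    ]
    if citations:
        out["citation"] = citations
    return out


def table_to_schema(fields: dict) -> dict:
    """Map table extension columns to variableMeasured (assumed mapping)."""
    variables = []
    for column in fields.get("table:columns", []):
        variable = {"@type": "PropertyValue", "name": column["name"]}
        if "description" in column:
            variable["description"] = column["description"]
        variables.append(variable)
    return {"variableMeasured": variables} if variables else {}


# Made-up fields as they might appear on a STAC Collection.
example_fields = {
    "sci:doi": "10.1234/example",
    "sci:publications": [
        {"doi": "10.1234/paper", "citation": "Doe, J. (2023). An example paper."}
    ],
    "table:columns": [
        {"name": "species", "description": "Scientific name", "type": "string"},
        {"name": "count", "type": "int64"},
    ],
}

if __name__ == "__main__":
    combined = {**scientific_to_schema(example_fields), **table_to_schema(example_fields)}
    print(combined)
```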

@m-mohr m-mohr modified the milestones: 3.1.0, future May 12, 2023
@m-mohr
Collaborator Author

m-mohr commented May 15, 2023

@cboettig Your comments are appreciated, thanks! I don't have the time right now to work on it, but I'll get back to it eventually.

@cboettig

Thanks for the heads up and no worries! Appreciate all the amazing work you're doing here.

@m-mohr m-mohr modified the milestones: 3.1.0, 3.2.0 Aug 16, 2023
@m-mohr m-mohr modified the milestones: 3.2.0, 3.3.0 Mar 7, 2024
@m-mohr m-mohr modified the milestones: 3.3.0, 3.4.0 Aug 9, 2024
@m-mohr m-mohr modified the milestones: 3.4.0, 3.5.0 Sep 11, 2024