Add Euclid HATS tutorial #108
Conversation
Co-authored-by: Brigitta Sipőcz <b.sipocz@gmail.com>
Co-authored-by: Jaladh Singhal <jaladhsinghal@gmail.com>
@@ -23,6 +23,9 @@ fsspec
sep>=1.4
h5py
requests
hats>=0.5.2
lsdb>=0.5.2
pyerfa>=2.0.1.3
do we need this?
Only if the notebook actually ends up using lsdb (currently, it does not). I'll prune the dependencies once that is known.
FYI: the new HATS requirement adds a very tight limit on version tolerance for our dependencies, so I will need to rethink how we do CI here. That points beyond this PR, so I will just push the CI workarounds for now, but will revise our CI approach for notebooks that rely on the latest and greatest features and libraries.
This now nicely triggered the need to update our CI approaches.
Thanks @bsipocz. Big picture, yes, we will need to require hats>=0.5.2 and lsdb>=0.5.2 in order to use lsdb with IRSA's HATS products. This notebook doesn't actually use lsdb right now, but it might before it's finished. If you'd prefer, I can just remove those dependencies for now and deal with adding them back later if/when needed.
Applied some feedback from @vandesai1. Remaining:
Looks mostly good to me! The pyarrow syntax throughout the notebook is easy to follow thanks to your comments and narrative.
Side note: from the HATS structure aspect, your notebook clearly demonstrates how to access multiple columns without needing to perform joins, and how to load only a slice of the data in memory (using pyarrow). But spatial filtering (where hpgeom and/or lsdb come into the picture) understandably isn't demonstrated, given the size of this notebook.
Related to this, I see your note to remove section 3.5 - perhaps this could become a second notebook that also does cross-matching of Euclid Q1 with some other catalog - maybe ZTF? This relates to the discussion we had at https://github.com/IPAC-SW/ipac-sp-notebooks/pull/104 (and the outline of such a tutorial). Let me know if you have a clear science use case in mind that could be pursued here.
pp_kwargs = dict(label=PHYSPARAM_GAL_Z + " (filtered)", color=tbl_colors["PHYSPARAM"], linestyle=":")
ax.hist(pp_df[PHYSPARAM_GAL_Z], **pp_kwargs, **hist_kwargs)
# Impose our final cuts.
pp_kwargs.update(label=PHYSPARAM_GAL_Z + " (quality)", linestyle="-")
[Maybe it's just me but] I find the "(filtered)" label quite confusing in contrast with "(quality)", because the latter is also a filtered set. I had to go back to previous cells to realize that "(filtered)" means "partial filter for quality" and "(quality)" means "final filter to eliminate further problematic sources". Maybe they could be named "(original filter)" and "(quality)", or some better naming that I can't think of (and AI can?!).
I struggled over those labels as well 😆. I'll give them some more thought.
- 'Norder' : (hats column) HEALPix order at which the data is partitioned.
- 'Npix' : (hats column) HEALPix pixel index at order Norder.
- 'Dir' : (hats column) Integer equal to 10_000 * floor[Npix / 10_000].
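As a concrete illustration of the 'Dir' formula above, here is a minimal sketch of how these three columns map to a partition path. The path template follows the standard HATS directory layout; the function name is hypothetical and not part of any library:

```python
def hats_partition_path(norder: int, npix: int) -> str:
    """Build a HATS-style partition path for a given HEALPix order and pixel.

    'Dir' groups up to 10,000 pixel files per directory so that no single
    directory listing becomes unmanageably large.
    """
    pixel_dir = 10_000 * (npix // 10_000)  # Dir = 10_000 * floor(Npix / 10_000)
    return f"Norder={norder}/Dir={pixel_dir}/Npix={npix}.parquet"


print(hats_partition_path(6, 12345))  # Norder=6/Dir=10000/Npix=12345.parquet
```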
[Optional] Here you can emphasize the last 3 partitioning columns by doing dataset.partitioning.schema in a cell.
Good idea, thanks.
s3_filesystem = pyarrow.fs.S3FileSystem()
schema = pyarrow.parquet.read_schema(euclid_parquet_schema_path, filesystem=s3_filesystem)
Since you have already read the dataset, why not use simpler syntax, replacing

s3_filesystem = pyarrow.fs.S3FileSystem()
schema = pyarrow.parquet.read_schema(euclid_parquet_schema_path, filesystem=s3_filesystem)

with

schema = dataset.schema
The schema in dataset.schema does not include the column metadata (units and descriptions), but the one I'm loading here does. I'll add some text mentioning that.
FYI for anyone interested, the reason is that including that metadata in the places that would make it show up in dataset.schema would result in a _metadata file (which is used to load the dataset) that is much bigger, and that makes it noticeably harder to work with. And the reason it's so much bigger is that the full schema gets repeated multiple times per data file (once per row group) in the _metadata file.
Oh, I wasn't aware of it. Some text for this will be helpful!
Thanks @jaladh-singhal!
Yes, that's also my thought at the moment. Need to finish getting the Euclid and ZTF HATS datasets released publicly, then I'll have a little more time to think through that notebook.
Ready for review, but should not be merged before the Euclid HATS catalog is released publicly.
Adds a notebook introducing the Euclid Q1 HATS product.
The dataset was initially in a testing bucket available from the Fornax and IPAC networks only; it has since been released publicly (nasa-fornax/fornax-demo-notebooks#416).