Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new non-standard FITS extension for ASDF-in-FITS #412

Merged
merged 10 commits into from
Jan 2, 2018

Conversation

drdavella
Copy link
Contributor

@drdavella drdavella commented Dec 14, 2017

The motivation for this update was a problem report from the archive. They pointed out that storing ASDF metadata in an IMAGE HDU goes against convention and would likely lead to incorrect handling of the data when the resulting file is processed by FITS readers.

It was originally proposed to instead use the FOREIGN extension. However, this convention has already been defined, and it seemed too restrictive to have to conform to this convention, especially since there is not already an implementation in astropy.

Instead, it was agreed that we would create a new ASDF extension for storing ASDF metadata in FITS files. This PR contains the implementation of the corresponding HDU, which is nearly identical to the FitsHDU in astropy.io.fits. In this case, the XTENSION keyword is ASDF. The ASDF library writes and expects to read extensions with an EXTNAME of ASDF.

The AsdfInFits implementation has been updated to write new files using the new convention by default. However, it is still capable of reading older files using the IMAGE extension, and it is still possible to optionally write files using the IMAGE extension as well.

I expect that these changes will break a lot of tests in the JWST pipeline, so I will make sure that everyone is on board before hitting the merge button.

This closes #336.

@stsci-bot
Copy link

stsci-bot bot commented Dec 14, 2017

Hi there @drdavella 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog. I help make sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 94.975% when pulling 9d996c5 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.


try:
asdf_extension = self._hdulist[ASDF_EXTENSION_NAME]
except (KeyError, IndexError, AttributeError):
self._hdulist.append(fits.ImageHDU(array, name=ASDF_EXTENSION_NAME))
self._hdulist.append(self._create_hdu(buff, use_image_hdu))
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array is not defined here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This was an oversight. I'm not sure it will be quite as simple as replacing the deleted line, but I'll add a unit test to exercise this case and go from there.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 94.974% when pulling 9d996c5 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

Copy link
Contributor

@bernie-simon bernie-simon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As James said, I don't see why you deleted line 238 in fits_embed.py. It's used at line 245.

# Allow writing to old-style ImageHDU for backwards compatibility
if use_image_hdu:
array = np.frombuffer(buff.getvalue(), np.uint8)
return fits.ImageHDU(array, name=ASDF_EXTENSION_NAME)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of the ASDF_EXTENSION_NAME value here going to cause the XTENSION keyword to still have a value of "ASDF", even though in this case we want it to be "IMAGE"? Or does "name" in this case set the value of the EXTNAME keyword?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the latter: the name keyword argument will make EXTNAME be ASDF, but the XTENSION will still be IMAGE since we're using ImageHDU.

However, this is the backwards compatible case. In the new case, both XTENSION and EXTNAME will be ASDF.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@jdavies-st jdavies-st Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Howard, here's a comparison of a current DMS NIRISS rate file read in and written out under the new asdf code. Compare the last HDU of the DMS version and the version with the new code:

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     244   ()      
  1  SCI           1 ImageHDU        48   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 ImageHDU         7   (5998,)   uint8   
---------------------------------------------------------------
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     244   ()      
  1  SCI           1 ImageHDU        48   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 NonstandardExtHDU      8   ()      

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the lack of Dimensions and Format info for the new ASDF extension is due to the fact that fits.info no longer knows how to handle this type of extension?

@jdavies-st
Copy link
Contributor

And it looks like there's really only one change to a test in jwst package that needs to be done to be compatible with this PR, and that's only because the test explicitly checks to see if the ASDF extension is an ImageHDU, something that is an ASDF implementation detail, and really shouldn't be tested on the JWST side anyway. So that particular assert in the test can be removed now to pave the way for this change.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.07%) to 95.095% when pulling d1cd048 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@drdavella
Copy link
Contributor Author

I just want to mention that this PR also explicitly "un-implements" the .update method of AsdfInFits objects. I don't think it was working previously, but I have explicitly added a NotImplementedError. I'm not sure this is worth supporting unless I get strong pushback from the JWST folks.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 95.07% when pulling 37fb6bd on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 95.071% when pulling 5e54fa3 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.09%) to 95.071% when pulling a29dbd0 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@eteq
Copy link
Contributor

eteq commented Dec 15, 2017

@drdavella - I was trying to test this out and I ran into a couple of oddities while doing so. It's not clear that it's related to these changes, but it's preventing me from testing this. Specifically, I follow the example from http://asdf.readthedocs.io/en/latest/asdf/examples.html#saving-asdf-in-fits (which looks to be unchanged in this PR) and I get a ValueError: Don't know how to write out ASDF version 1.2.0. Is that expected? I also tried doing the penultimate line as ff = fits_embed.AsdfInFits(hdulist, tree, version='1.1') (or any other version) with no luck...

(What I was actually trying to do was look at the generated fits files without asdf installed to see how things behave. But can't do that without an example.)

# Allow writing to old-style ImageHDU for backwards compatibility
if use_image_hdu:
array = np.frombuffer(buff.getvalue(), np.uint8)
return fits.ImageHDU(array, name=ASDF_EXTENSION_NAME)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the lack of Dimensions and Format info for the new ASDF extension is due to the fact that fits.info no longer knows how to handle this type of extension?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.222% when pulling 0022c16 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@drdavella
Copy link
Contributor Author

@eteq ASDF can't be used from the source directory. Try installing the package and doing the same thing somewhere else. I can try to add a better error message for this for the user but I don't think we're going to support this for a variety of reasons that can be discussed elsewhere.

@drdavella
Copy link
Contributor Author

drdavella commented Dec 15, 2017

@hbushouse I just pushed an update that will allow astropy.io.fits to recognize the custom extension, so I think the dimension and format information would be available if this is run again with the latest updates.

[edit] Actually it will only be automatically recognized as an AsdfHDU by astropy.io.fits when being read through ASDF, so it actually won't change anything here.

cc @jdavies-st

@eteq
Copy link
Contributor

eteq commented Dec 15, 2017

FWIW, it turns out the problem I had above was that I hadn't updated git submodule updateed... so not related.

Now that I got it working, I have an observation of the behavior that I think might be a bit surprising. Supposing you have a file embedded_asdf.fits that's got asdf in the final HDU, here's the behavior:

>>> from astropy.io import fits
>>> fits.open('embedded_asdf.fits')[-1]
<astropy.io.fits.hdu.base.NonstandardExtHDU at 0x10cd6df98>
>>> import asdf
>>> fits.open('embedded_asdf.fits')[-1]
<astropy.io.fits.hdu.base.NonstandardExtHDU at 0x111788780>
>>> from asdf import fits_embed
>>> fits.open('embedded_asdf.fits')[-1]
<asdf.fits_embed._AsdfHDU at 0x111793860>

So the class that comes out depends in whether asdf.fits_embed is imported, and I think that would surprise most users. So perhaps the registration should be in the root of asdf instead of in fits_embed? To go even further one could imagine having a note (a COMMENT?) somewhere in the header that explicitly says "if you're using python, you should import asdf" or something like that? Otherwise a novice user might see the first case and just be lost from the beginning.

@drdavella
Copy link
Contributor Author

@eteq very good point, will push an update shortly.

Copy link
Contributor

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from my comment above, one more thought from an inline review. The rest looks good though!

('PCOUNT', 0, 'number of parameters'),
('GCOUNT', 1, 'number of groups'),
('COMPRESS', compress, 'Uses gzip compression'),
('EXTNAME', cls._extension, 'Name of ASDF extension'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this right, the name is still hard-coded as ASDF. Is that desirable? I would have thought this should allow multiple ASDF-type extensions, with arbitrary names. Or is this necessary for backwards compatibility?

Copy link
Contributor Author

@drdavella drdavella Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly this is not the most generic design. However, since the AsdfInFits class is expected to be the only user of this HDU class, and since astropy.io.fits locates HDUs in an HDUList based on EXTNAME, it just made sense to set EXTNAME to ASDF here. In other words, this is a matter of convenience and an implementation detail of AsdfInFits, but not necessarily part of the abstract ASDF HDU convention itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple ASDF extensions could be distinguished via unique EXTVER values, assuming that keyword makes sense in an ASDF extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible, but I would need to investigate more. Currently AsdfInFits expects only a single ASDF extension to be present in a FITS file. Until the JWST pipeline has a use case for multiple ASDF extensions in the same file, I don't think it makes sense to support this.

Copy link
Contributor

@eteq eteq Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well, I'm not necessarily attached to this specific line as the spot to change it. Are you thinking that if I am a user and want to have an ASDF-in-FITS HDU with the name "WCS_MODEL", I'd be expected to do something like

fits_embed.AsdfInFits(hdulist, tree)
hdulist['ASDF'].header['EXTNAME] = 'WCS_MODEL'

? I'm really just saying I think it would be useful to have the option to do something like fits_embed.AsdfInFits(hdulist, tree, name='WCS_MODEL'), as the typical user is likely to see that as the main place where they set information about the hdu.

Copy link
Contributor

@eteq eteq Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I just realized my mis-conception. I thought it was possible to have multiple asdf trees in one file (I see @drdavella pointed this out above, but I didn't see it until now...). I see now that the "reader" mode only supports a single one. I guess that makes this use case less important. Probably I should make a separate issue about that, but this concern about the naming could wait until then...

Copy link
Contributor Author

@drdavella drdavella Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: the ASDF HDU is used solely for storing a serialized version of the ASDF metadata tree. AsdfInFits currently expects a one-to-one mapping between FITS files and ASDF trees. The actual data arrays, including any that are used for WCS models or any other purposes, are actually stored in conventional FITS HDUs. The details of this, including the assignment of any EXTNAME are controlled completely by the user. This array data is accessible by any FITS reader.

So it doesn't really make sense to use anything other than ASDF as the EXTNAME for this HDU because the only thing that will ever actually process it is our AsdfInFits code. Its just an implementation detail and anyone who reads the pure FITS file does not need to be concerned with it. Unless there becomes a compelling reason to have more than one ASDF metadata tree in the same FITS file, I don't think it makes sense to change anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that clears things up. I withdraw my request 😉

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.225% when pulling 24b0865 on drdavella:custom-hdu into 63c11ff on spacetelescope:master.

@drdavella
Copy link
Contributor Author

@hbushouse, @jdavies-st, @bernie-simon, @nden I'll wait for your call on when this is okay to merge. Based on discussions with @jdavies-st it doesn't sound like this change is necessarily going to be catastrophic for the JWST regression suite, but I want to make sure everyone is ready for it when it gets merged.

@drdavella
Copy link
Contributor Author

According to @nden there is a DMS delivery this week, so we will wait to merge until next week.

@drdavella
Copy link
Contributor Author

@nden just wondering if merging this would be a good way to ring in the new year?

@hbushouse
Copy link

@drdavella I would say go for it. Everything is pretty stable right now and has been since we did the tag/branch for DMS just before Christmas. No other major changes have been merged lately.

@drdavella drdavella merged commit 78524aa into asdf-format:master Jan 2, 2018
@drdavella drdavella deleted the custom-hdu branch January 2, 2018 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use FOREIGN extension instead of IMAGE for ASDF embedded in FITS
6 participants