Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new non-standard FITS extension for ASDF-in-FITS #412

Merged
merged 10 commits into from
Jan 2, 2018
5 changes: 5 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,11 @@
- Fix bug that caused serialized FITS tables to be duplicated in embedded ASDF
HDU. [#411]

- Create and use a new non-standard FITS extension instead of ImageHDU for
storing ASDF files embedded in FITS. Explicitly remove support for the
``.update`` method of ``AsdfInFits``, even though it didn't appear to be
working previously. [#412]

1.3.2 (unreleased)
------------------

Expand Down
104 changes: 92 additions & 12 deletions asdf/fits_embed.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,87 @@

try:
from astropy.io import fits
from astropy.io.fits.file import _File
from astropy.io.fits.header import Header, _pad_length
except ImportError:
raise ImportError("AsdfInFits requires astropy")


ASDF_EXTENSION_NAME = 'ASDF'

FITS_SOURCE_PREFIX = 'fits:'


__all__ = ['AsdfInFits']


class _AsdfHDU(fits.hdu.base.NonstandardExtHDU):
"""
A non-standard extension HDU for encapsulating an entire ASDF file within a
single HDU of a container FITS file. These HDUs have an extension (that is
an XTENSION keyword) of ASDF.
"""

_extension = ASDF_EXTENSION_NAME

@classmethod
def from_buff(cls, buff, compress=False, **kwargs):
"""
Creates a new _AsdfHDU from a given AsdfFile object.

Parameters
----------
buff : io.BytesIO
A buffer containing an ASDF metadata tree
compress : bool, optional
Gzip compress the contents of the ASDF HDU
"""

if compress:
buff = gzip.GzipFile(fileobj=buff, mode='wb')

# A proper HDU should still be padded out to a multiple of 2880
# technically speaking
data_length = buff.tell()
padding = (_pad_length(data_length) * cls._padding_byte).encode('ascii')
buff.write(padding)

buff.seek(0)

cards = [
('XTENSION', cls._extension, 'ASDF extension'),
('BITPIX', 8, 'array data type'),
('NAXIS', 1, 'number of array dimensions'),
('NAXIS1', len(buff.getvalue()), 'Axis length'),
('PCOUNT', 0, 'number of parameters'),
('GCOUNT', 1, 'number of groups'),
('COMPRESS', compress, 'Uses gzip compression'),
('EXTNAME', cls._extension, 'Name of ASDF extension'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this right, the name is still hard-coded as ASDF. Is that desirable? I would have thought this should allow multiple ASDF-type extensions, with arbitrary names. Or is this necessary for backwards compatibility?

Copy link
Contributor Author

@drdavella drdavella Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly this is not the most generic design. However, since the AsdfInFits class is expected to be the only user of this HDU class, and since astropy.io.fits locates HDUs in an HDUList based on EXTNAME, it just made sense to set EXTNAME to ASDF here. In other words, this is a matter of convenience and an implementation detail of AsdfInFits, but not necessarily part of the abstract ASDF HDU convention itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple ASDF extensions could be distinguished via unique EXTVER values, assuming that keyword makes sense in an ASDF extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible, but I would need to investigate more. Currently AsdfInFits expects only a single ASDF extension to be present in a FITS file. Until the JWST pipeline has a use case for multiple ASDF extensions in the same file, I don't think it makes sense to support this.

Copy link
Contributor

@eteq eteq Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well, I'm not necessarily attached to this specific line as the spot to change it. Are you thinking that if I am a user and want to have an ASDF-in-FITS HDU with the name "WCS_MODEL", I'd be expected to do something like

fits_embed.AsdfInFits(hdulist, tree)
hdulist['ASDF'].header['EXTNAME] = 'WCS_MODEL'

? I'm really just saying I think it would be useful to have the option to do something like fits_embed.AsdfInFits(hdulist, tree, name='WCS_MODEL'), as the typical user is likely to see that as the main place where they set information about the hdu.

Copy link
Contributor

@eteq eteq Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I just realized my mis-conception. I thought it was possible to have multiple asdf trees in one file (I see @drdavella pointed this out above, but I didn't see it until now...). I see now that the "reader" mode only supports a single one. I guess that makes this use case less important. Probably I should make a separate issue about that, but this concern about the naming could wait until then...

Copy link
Contributor Author

@drdavella drdavella Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: the ASDF HDU is used solely for storing a serialized version of the ASDF metadata tree. AsdfInFits currently expects a one-to-one mapping between FITS files and ASDF trees. The actual data arrays, including any that are used for WCS models or any other purposes, are actually stored in conventional FITS HDUs. The details of this, including the assignment of any EXTNAME are controlled completely by the user. This array data is accessible by any FITS reader.

So it doesn't really make sense to use anything other than ASDF as the EXTNAME for this HDU because the only thing that will ever actually process it is our AsdfInFits code. Its just an implementation detail and anyone who reads the pure FITS file does not need to be concerned with it. Unless there becomes a compelling reason to have more than one ASDF metadata tree in the same FITS file, I don't think it makes sense to change anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that clears things up. I withdraw my request 😉

]

header = Header(cards)
return cls._readfrom_internal(_File(buff), header=header)


@classmethod
def match_header(cls, header):
card = header.cards[0]
if card.keyword != 'XTENSION':
return False
xtension = card.value
if isinstance(xtension, str):
xtension = xtension.rstrip()
return xtension == cls._extension

# TODO: Add header verification

def _summary(self):
# TODO: Perhaps make this more descriptive...
return (self.name, self.ver, self.__class__.__name__, len(self._header))


fits.register_hdu(_AsdfHDU)


class _FitsBlock(object):
def __init__(self, hdu):
self._hdu = hdu
Expand Down Expand Up @@ -222,9 +294,17 @@ def open(cls, fd, uri=None, validate_checksums=False, extensions=None,
return cls._open_asdf(self, buff, uri=uri, mode='r',
validate_checksums=validate_checksums)

def _create_hdu(self, buff, use_image_hdu):
# Allow writing to old-style ImageHDU for backwards compatibility
if use_image_hdu:
array = np.frombuffer(buff.getvalue(), np.uint8)
return fits.ImageHDU(array, name=ASDF_EXTENSION_NAME)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of the ASDF_EXTENSION_NAME value here going to cause the XTENSION keyword to still have a value of "ASDF", even though in this case we want it to be "IMAGE"? Or does "name" in this case set the value of the EXTNAME keyword?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the latter: the name keyword argument will make EXTNAME be ASDF, but the XTENSION will still be IMAGE since we're using ImageHDU.

However, this is the backwards compatible case. In the new case, both XTENSION and EXTNAME will be ASDF.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@jdavies-st jdavies-st Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Howard, here's a comparison of a current DMS NIRISS rate file read in and written out under the new asdf code. Compare the last HDU of the DMS version and the version with the new code:

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     244   ()      
  1  SCI           1 ImageHDU        48   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 ImageHDU         7   (5998,)   uint8   
---------------------------------------------------------------
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     244   ()      
  1  SCI           1 ImageHDU        48   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 NonstandardExtHDU      8   ()      

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the lack of Dimensions and Format info for the new ASDF extension is due to the fact that fits.info no longer knows how to handle this type of extension?

else:
return _AsdfHDU.from_buff(buff)

def _update_asdf_extension(self, all_array_storage=None,
all_array_compression=None,
auto_inline=None, pad_blocks=False):
all_array_compression=None, auto_inline=None,
pad_blocks=False, use_image_hdu=False):
if self.blocks.streamed_block is not None:
raise ValueError(
"Can not save streamed data to ASDF-in-FITS file.")
Expand All @@ -235,27 +315,27 @@ def _update_asdf_extension(self, all_array_storage=None,
all_array_compression=all_array_compression,
auto_inline=auto_inline, pad_blocks=pad_blocks,
include_block_index=False)
array = np.frombuffer(buff.getvalue(), np.uint8)

try:
asdf_extension = self._hdulist[ASDF_EXTENSION_NAME]
except (KeyError, IndexError, AttributeError):
self._hdulist.append(fits.ImageHDU(array, name=ASDF_EXTENSION_NAME))
else:
asdf_extension.data = array
if ASDF_EXTENSION_NAME in self._hdulist:
del self._hdulist[ASDF_EXTENSION_NAME]
self._hdulist.append(self._create_hdu(buff, use_image_hdu))

def write_to(self, filename, all_array_storage=None,
all_array_compression=None, auto_inline=None,
pad_blocks=False, *args, **kwargs):
pad_blocks=False, use_image_hdu=False, *args, **kwargs):
self._update_asdf_extension(
all_array_storage=all_array_storage,
all_array_compression=all_array_compression,
auto_inline=auto_inline, pad_blocks=pad_blocks)
auto_inline=auto_inline, pad_blocks=pad_blocks,
use_image_hdu=use_image_hdu)

self._hdulist.writeto(filename, *args, **kwargs)

def update(self, all_array_storage=None, all_array_compression=None,
auto_inline=None, pad_blocks=False):
raise NotImplementedError(
"In-place update is not currently implemented for ASDF-in-FITS")

self._update_asdf_extension(
all_array_storage=all_array_storage,
all_array_compression=all_array_compression,
Expand Down
89 changes: 79 additions & 10 deletions asdf/tests/test_fits_embed.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,13 @@ def create_asdf_in_fits():

return fits_embed.AsdfInFits(hdulist, tree)

def test_embed_asdf_in_fits_file(tmpdir):
# Testing backwards compatibility ensures that we can continue to read and
# write files that use the old convention of ImageHDU to store the ASDF file.
@pytest.mark.parametrize('backwards_compat', [False, True])
def test_embed_asdf_in_fits_file(tmpdir, backwards_compat):
fits_testfile = str(tmpdir.join('test.fits'))
asdf_testfile = str(tmpdir.join('test.asdf'))

hdulist = fits.HDUList()
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float), name='SCI'))
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float), name='DQ'))
Expand All @@ -70,24 +76,30 @@ def test_embed_asdf_in_fits_file(tmpdir):
}

ff = fits_embed.AsdfInFits(hdulist, tree)
ff.write_to(os.path.join(str(tmpdir), 'test.fits'))

ff2 = asdf.AsdfFile(tree)
ff2.write_to(os.path.join(str(tmpdir), 'plain.asdf'))
ff.write_to(fits_testfile, use_image_hdu=backwards_compat)

with fits.open(os.path.join(str(tmpdir), 'test.fits')) as hdulist2:
with fits.open(fits_testfile) as hdulist2:
assert len(hdulist2) == 3
assert [x.name for x in hdulist2] == ['SCI', 'DQ', 'ASDF']
assert_array_equal(hdulist2[0].data, np.arange(512, dtype=np.float))
assert hdulist2['ASDF'].data.tostring().strip().endswith(b"...")
asdf_hdu = hdulist2['ASDF']
assert asdf_hdu.data.tostring().startswith(b'#ASDF')
# When in backwards compatibility mode, the ASDF file will be contained
# in an ImageHDU
if backwards_compat:
assert isinstance(asdf_hdu, fits.ImageHDU)
assert asdf_hdu.data.tostring().strip().endswith(b'...')
else:
assert isinstance(asdf_hdu, fits_embed._AsdfHDU)
assert len(asdf_hdu.data) % 2880 == 0

with fits_embed.AsdfInFits.open(hdulist2) as ff2:
assert_tree_match(tree, ff2.tree)

ff = asdf.AsdfFile(copy.deepcopy(ff2.tree))
ff.write_to('test.asdf')
ff.write_to(asdf_testfile)

with asdf.AsdfFile.open('test.asdf') as ff:
with asdf.AsdfFile.open(asdf_testfile) as ff:
assert_tree_match(tree, ff.tree)


Expand All @@ -102,7 +114,10 @@ def test_embed_asdf_in_fits_file_anonymous_extensions(tmpdir):
with fits.open(os.path.join(str(tmpdir), 'test.fits')) as hdulist:
assert len(hdulist) == 4
assert [x.name for x in hdulist] == ['PRIMARY', '', '', 'ASDF']
assert hdulist['ASDF'].data.tostring().strip().endswith(b"...")
asdf_hdu = hdulist['ASDF']
assert isinstance(asdf_hdu, fits_embed._AsdfHDU)
assert asdf_hdu.data.tostring().startswith(b'#ASDF')
assert len(asdf_hdu.data) % 2880 == 0

with fits_embed.AsdfInFits.open(hdulist) as ff2:
assert_tree_match(asdf_in_fits.tree, ff2.tree)
Expand All @@ -114,6 +129,60 @@ def test_embed_asdf_in_fits_file_anonymous_extensions(tmpdir):
assert_tree_match(asdf_in_fits.tree, ff.tree)


@pytest.mark.xfail(
reason="In-place update for ASDF-in-FITS does not currently work")
def test_update_in_place(tmpdir):
tempfile = str(tmpdir.join('test.fits'))

# Create a file and write it out
asdf_in_fits = create_asdf_in_fits()
asdf_in_fits.write_to(tempfile)

# Open the file and add data so it needs to be updated
with fits_embed.AsdfInFits.open(tempfile) as ff:
ff.tree['new_stuff'] = "A String"
ff.update()

# Open the updated file and make sure everything looks okay
with fits_embed.AsdfInFits.open(tempfile) as ff:
assert ff.tree['new_stuff'] == "A String"
assert_tree_match(ff.tree['model'], asdf_in_fits.tree['model'])


def test_update_and_write_new(tmpdir):
tempfile = str(tmpdir.join('test.fits'))
newfile = str(tmpdir.join('new.fits'))

# Create a file and write it out
asdf_in_fits = create_asdf_in_fits()
asdf_in_fits.write_to(tempfile)

# Open the file and add data so it needs to be updated
with fits_embed.AsdfInFits.open(tempfile) as ff:
ff.tree['new_stuff'] = "A String"
ff.write_to(newfile)

# Open the updated file and make sure everything looks okay
with fits_embed.AsdfInFits.open(newfile) as ff:
assert ff.tree['new_stuff'] == "A String"
assert_tree_match(ff.tree['model'], asdf_in_fits.tree['model'])


@pytest.mark.xfail(
reason="ASDF HDU implementation does not currently reseek after writing")
def test_access_hdu_data_after_write(tmpdir):
# There is actually probably not a great reason to support this kind of
# functionality, but I am adding a test here to record the failure for
# posterity.
tempfile = str(tmpdir.join('test.fits'))

asdf_in_fits = create_asdf_in_fits()
asdf_in_fits.write_to(tempfile)
asdf_hdu = asdf_in_fits._hdulist['ASDF']

assert asdf_hdu.data.tostring().startswith('#ASDF')


def test_create_in_tree_first(tmpdir):
tree = {
'model': {
Expand Down
7 changes: 5 additions & 2 deletions docs/asdf/examples.rst
Original file line number Diff line number Diff line change
Expand Up @@ -475,8 +475,11 @@ ASDF-in-FITS format.

.. runcode:: hidden

with open('content.asdf', 'wb') as fd:
fd.write(hdulist['ASDF'].data.tostring())
from astropy.io import fits

with fits.open('embedded_asdf.fits') as new_hdulist:
with open('content.asdf', 'wb') as fd:
fd.write(new_hdulist['ASDF'].data.tostring())

The special ASDF extension in the resulting FITS file looks like the
following. Note that the data source of the arrays uses the ``fits:``
Expand Down