Move ndarray conversion to a Converter #1537

Merged
merged 154 commits into from
Sep 8, 2023
4dc65a9
move asdf.block to asdf._block
braingram Apr 10, 2023
e100e6b
remove deprecated AsdfFile.blocks
braingram Apr 10, 2023
e807293
add components to break apart block manager
braingram Apr 11, 2023
71ac3e7
add read/write block index
braingram Apr 12, 2023
c1768bd
add block reader/writer
braingram Apr 13, 2023
e64f8a3
add _block.LinearStore for storing read blocks
braingram Apr 13, 2023
fc6befc
fix _block.store.LinearStore.__len__
braingram Apr 14, 2023
12e4c3f
break update
braingram Apr 14, 2023
de3e776
messy write support
braingram Apr 19, 2023
7944f03
fix resolve_and_inline
braingram Apr 19, 2023
07e06e5
stream blocks working
braingram Apr 20, 2023
b912b78
external blocks working
braingram Apr 21, 2023
eecd862
add block data caching for ndarraytype
braingram Apr 27, 2023
e610dd6
convert ndarray to using callback instead of raw block access
braingram Apr 27, 2023
9aae606
remove cludges added to new block manager
braingram Apr 27, 2023
8d4e686
reduce usage of asdffile reference in ndarray
braingram Apr 27, 2023
4d7183a
associate objects with write blocks
braingram Apr 28, 2023
a944de6
associate streamed block with object
braingram Apr 28, 2023
2a22176
Revert "break update"
braingram Apr 28, 2023
3eb3bff
update working
braingram Apr 28, 2023
aab0522
output compression extension list working
braingram Apr 28, 2023
081dd43
fix block converter tests
braingram Apr 28, 2023
ca364ef
rebased, broke 2 tests
braingram Apr 28, 2023
8cf8024
add tests for issues
braingram May 1, 2023
2c4f684
prevent modification of tree or blocks during write/validate
braingram May 1, 2023
afe172d
minor reorganization of 1013 test
braingram May 1, 2023
58c0a93
external block fixes
braingram May 1, 2023
d462b31
remove old block manager
braingram May 2, 2023
3d92597
increasing _block test coverage
braingram May 2, 2023
17d9d51
cleaning up _blocks usage in AsdfFile.write_to
braingram May 2, 2023
517cb14
drop asdffile from __init__ args to NDArrayType
braingram May 2, 2023
f7cc999
cleanup some comments
braingram May 2, 2023
f68f5ee
enable block only files
braingram May 2, 2023
f9da319
clear _write_fd during block manager _clear_write
braingram May 2, 2023
ef038f8
make serialization context aware of current object
braingram May 2, 2023
27df3a7
add _block/callback tests
braingram May 2, 2023
df704d7
add ndarray converter (external blocks broken)
braingram May 2, 2023
a3b4085
external blocks working again
braingram May 3, 2023
2c38900
update diff to not use NDArrayType directly
braingram May 3, 2023
4a26d7a
clean up NDArrayType, xfail 1530 test
braingram May 3, 2023
91024bb
fix failing test from rebase
braingram May 3, 2023
f955c93
remove Block usage in pytest plugin
braingram May 3, 2023
bcbd30d
Allow converter to handle subclasses
braingram May 3, 2023
17ae69f
fixes for remote_data tests
braingram May 3, 2023
ff8dbb3
restrict subclass conversion to ndarray
braingram May 4, 2023
4f41246
fix loading of empty inline arrays
braingram May 4, 2023
f29597e
fix config doctests
braingram May 4, 2023
c3404bf
catch old FutureWarning from numpy in tests
braingram May 4, 2023
933619e
seek to 0 before update shortcuts
braingram May 4, 2023
0dada44
allow file truncation on windows
braingram May 5, 2023
169482f
update array equality testing in tests
braingram May 5, 2023
7259e01
reset file position after memmapping on windows
braingram May 5, 2023
18d0bf7
attempt to fix array equality for older numpy
braingram May 5, 2023
4c388f5
fix reading for inline structured arrays
braingram May 5, 2023
fd56a32
add xfailed test for 1539
braingram May 8, 2023
43c57b8
increase number of tries to get reused memory
braingram May 8, 2023
7173631
fix validation of checksums on read
braingram May 8, 2023
e7018ad
add test for and fix asdf.stream deprecation
braingram May 8, 2023
5af8198
remove fd from generate_write_header args
braingram May 8, 2023
3a3c5bb
compute len of block magic from value instead of hard coding
braingram May 8, 2023
6fc6a4e
fix write for non-seekable files
braingram May 8, 2023
5141cc9
mock id for _block.store testing
braingram May 8, 2023
e4fb181
add _Operations for SerializationContext
braingram May 10, 2023
06550cb
don't assign blocks if conversion raised error
braingram May 10, 2023
6918225
add multi-block example update converter docs
braingram May 10, 2023
70e441c
fix multi-block converter example in docs
braingram May 10, 2023
42b6f89
add try/finally to update and copy blocks in block_size increments
braingram May 10, 2023
244522e
skip 1542 test on windows
braingram May 11, 2023
2eb07c9
add eq and copy to _block.Key
braingram May 11, 2023
21e3c9b
move block reading to Manager.read
braingram May 11, 2023
0f47eec
store uri in _block.Manager instead of AsdfFile
braingram May 11, 2023
07f1685
reduce code duplication in AsdfFile.update
braingram May 11, 2023
ef30d83
move bulk of AsdfFile.update into _blocks.Manager.update
braingram May 11, 2023
0eaf181
rename _streamed_block to _streamed_write_block
braingram May 11, 2023
b64f982
restore AsdfFile.version after write_to
braingram May 11, 2023
e871200
change _block.key.Key matches to matches_object
braingram May 11, 2023
10f3b57
fix handling of io_block_size
braingram May 11, 2023
bfed156
skip checksum validation for streamed blocks
braingram May 11, 2023
17b8652
add unit tests for _block.external
braingram May 11, 2023
27ddde4
move external block uri resolution to _block.external
braingram May 11, 2023
52d0e5d
fix missing uri check for external blocks
braingram May 11, 2023
aede9f3
remove reserve_blocks
braingram May 11, 2023
c0d75a4
simplify ndarray shape loading and fix error message
braingram May 12, 2023
6942107
move SerializationContext tests
braingram May 12, 2023
1bd9988
update SerializationContext tests
braingram May 12, 2023
c444093
allow >1 stream if the same block
braingram May 12, 2023
d07dc95
fix external cache test for windows
braingram May 12, 2023
5bb7d7a
adjust key assignment during deserialization
braingram May 12, 2023
d304575
add _block.manager unit tests
braingram May 15, 2023
8ba435e
make DataCallback.reassign private
braingram May 15, 2023
a0ddcce
update docs off _block submodule
braingram May 15, 2023
69cd693
make all block Key methods private
braingram May 15, 2023
2ef0dbb
support 'input' compression after overwriting compression
braingram May 15, 2023
a32bffb
update comment about input compression
braingram May 15, 2023
70f99cf
simplify ReadBlocks to a subclass of UserList
braingram May 15, 2023
dfb0d49
update _block.manager docstrings
braingram May 16, 2023
6676e31
update docs
braingram May 16, 2023
cc862db
add docs for subclass conversion config setting
braingram May 16, 2023
e0ed2cb
update changelog
braingram May 16, 2023
a3bdd90
update changes
braingram May 16, 2023
0fcff14
index write blocks by data
braingram May 17, 2023
a991c4e
remove _serialization_context._Operation context manager feature
braingram May 17, 2023
e07ea0d
add external block UseInternalType
braingram May 30, 2023
4e3a7aa
remove unused config_context in block options
braingram May 30, 2023
d7032b1
change key copy to match stype of options copy
braingram May 30, 2023
b227f9f
remove out-of-date block options comments
braingram May 30, 2023
ca0b9f4
remove leftover name and types from Stream
braingram May 30, 2023
067c650
check for valid key on Store.assign_object
braingram May 30, 2023
47a17f4
avoid fast_forwarding using header values for streamed blocks
braingram May 30, 2023
31568c0
remove NDArrayType subclassing from Stream
braingram May 30, 2023
254c165
change generate_write_block_header to return dict instead of packed h…
braingram May 30, 2023
4caae1a
rename and clarify _block.external.relative_uri_to_index
braingram May 30, 2023
b98f428
test for and handle files will odd 'tell' results
braingram May 30, 2023
361f69d
remove LinearStore
braingram May 31, 2023
c1e7975
add unit test for Store.keys_for_value
braingram May 31, 2023
c780a05
remove unused WriteBlocks functions add docs
braingram May 31, 2023
df8e961
add custom exception BlockIndexError for block index reading/parsing
braingram May 31, 2023
843fb8c
get version string from serialization context
braingram Jun 5, 2023
89974b1
select SerializationContext operation at creation
braingram Jun 5, 2023
930123a
remove unneeded code
braingram Jun 5, 2023
7736837
add warning when <4 non-null bytes after blocks
braingram Jun 6, 2023
550ae6d
preserve block index during asdftool edit
braingram Jun 6, 2023
67044aa
re-add SerializationContext to asdf.asdf for weldx
braingram Jun 6, 2023
c114ad2
allow external block memory mapping
braingram Jun 6, 2023
2c739db
re-add block manager close
braingram Jun 6, 2023
92a0ddb
clear SerializationContext object after block assignment
braingram Jun 6, 2023
7722296
fix external block loading over http
braingram Jun 7, 2023
b9c7ac8
fix file url parsing for external blocks on windows
braingram Jun 7, 2023
34f74fa
remove use of assert_roundtrip_tree in ndarray tests
braingram Jun 8, 2023
eb944b9
attempt to fix FutureWarning for array comparsion with older numpy
braingram Jun 8, 2023
cc2fd1f
rename some local variables to avoid confusion
braingram Aug 9, 2023
c7fb962
fix stream deprecation test
braingram Aug 9, 2023
335b678
fix reference uri resolution
braingram Aug 9, 2023
b6adf61
run pre-commit locally
braingram Aug 9, 2023
d36b929
ignore hypothesis files
braingram Aug 9, 2023
bb9a98c
move SerializationContext out of asdf.extension
braingram Aug 9, 2023
7cc553c
keep SerializationContext exposed at asdf.asdf
braingram Aug 9, 2023
c224590
temporarily use dev sphinx-asdf
braingram Aug 9, 2023
4148331
move SerializationContext back into asdf.extension
braingram Aug 9, 2023
ab67c48
deprecate import of asdf.asdf.SerializationContext
braingram Aug 9, 2023
1802597
move _issues tests to _regtests and rename tests
braingram Aug 9, 2023
e6e3c44
typo prevented dev sphinx_asdf install
braingram Aug 9, 2023
428dc9e
move write_to version reset into finally
braingram Aug 9, 2023
75e005d
add missing BlockAccess docstring
braingram Aug 9, 2023
f740292
add asdf.asdf.SerializationContext import deprecation to docs
braingram Aug 9, 2023
5a143c0
add parametrization to 1525 regression test
braingram Aug 14, 2023
adf9a99
remove unneeded line
braingram Aug 14, 2023
6651672
simplify ndarray subclass handling
braingram Aug 14, 2023
3a59907
remove unnecessary assign_object(None)
braingram Aug 14, 2023
2caf2c9
add warnings to failed block index reading
braingram Aug 14, 2023
9bf92fe
remove unnecessary config context
braingram Aug 14, 2023
688617a
add test_update_compressed_blocks
braingram Aug 14, 2023
1534bd9
add AsdfBlockIndexWarning
braingram Aug 14, 2023
8672815
remove sphinx-asdf dev version requirement
braingram Sep 5, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -59,3 +59,6 @@ asdf/_version.py

# airspeed velocity files
.asv

# hypothesis files
.hypothesis
4 changes: 4 additions & 0 deletions CHANGES.rst
@@ -20,6 +20,10 @@ The ASDF Standard is at v1.6.0
  AsdfFile.write_to and AsdfFile.update kwargs [#1592]
- Fix ``AsdfFile.info`` loading all array data [#1572]
- Blank out AsdfFile.tree on close [#1575]
- Move ndarray to a converter, add ``convert_unknown_ndarray_subclasses``
  to ``asdf.config.AsdfConfig``, move ``asdf.Stream`` to
  ``asdf.tags.core.Stream``, update block storage support for
  Converter and update internal block API [#1537]

2.15.1 (2023-08-07)
-------------------
3 changes: 1 addition & 2 deletions asdf/__init__.py
@@ -23,6 +23,5 @@
 from .asdf import open_asdf as open
 from .config import config_context, get_config
 from .exceptions import ValidationError
-from .stream import Stream
-from .tags.core import IntegerType
+from .tags.core import IntegerType, Stream
 from .tags.core.external_reference import ExternalArrayReference
61 changes: 61 additions & 0 deletions asdf/_block/__init__.py
@@ -0,0 +1,61 @@
"""
Submodule for reading and writing ASDF blocks.

The primary interface to this submodule is ``_block.manager.Manager``
that in some ways mimics the older ``BlockManager``. An instance
of ``Manager`` will be created by each `asdf.AsdfFile` instance.

Internally, this submodule is broken up into:
    - low-level:
        - ``io``: functions for reading and writing blocks
        - ``key``: ``Key`` used to implement ``Store`` (see below)
        - ``store``: ``Store`` special key-value store for indexing blocks
    - medium-level:
        - ``reader``: ``ReadBlock`` and ``read_blocks``
        - ``writer``: ``WriteBlock`` and ``write_blocks``
        - ``callback``: ``DataCallback`` for reading block data
        - ``external``: ``ExternalBlockCache`` for reading external blocks
        - ``options``: ``Options`` controlling block storage
    - high-level:
        - ``manager``: ``Manager`` and associated classes


The low-level ``io`` functions are responsible for reading and writing
bytes compatible with the block format defined in the ASDF standard.
These should be compatible with as wide a variety of file formats as possible
including files that are:
    - seekable and non-seekable
    - memory mappable
    - accessed from a remote server
    - stored in memory
    - etc

To help organize ASDF block data the ``key`` and ``store`` submodules
provide a special key-value store, ``Store``. ``Store`` uses ``Key``
instances to tie the lifetime of values to the lifetime of objects
in the ASDF tree (without keeping references to the objects) and
allows non-hashable objects to be used as keys. See the ``key``
submodule docstring for more details. One usage of ``Store`` is
for managing ASDF block ``Options``. ``Options`` determine where
and how array data will be written and a single ``Options`` instance
might be associated with several arrays within the ASDF tree
(if the arrays share the same base array). By using a ``Key`` generated
with the base array the block ``Options`` can be stored in a ``Store``
without keeping a reference to the base array and these ``Options``
will be made unavailable if the base array is garbage collected (so
they are not inappropriately assigned to a new array).

The medium-level submodules ``reader`` and ``writer`` each define
a helper class and function for reading or writing blocks:
    - ``ReadBlock`` and ``WriteBlock``
    - ``read_blocks`` and ``write_blocks``
These abstract some of the complexity of reading and writing blocks
using the low-level API and are the primary means by which the ``Manager``
reads and writes ASDF blocks. Reading of external blocks by the ``Manager``
requires some special handling which is contained in the ``external``
submodule.

To allow for lazy-loading of ASDF block data, ``callback`` defines
``DataCallback`` which allows reading block data even after the blocks
have been rearranged following an update-in-place.
"""
43 changes: 43 additions & 0 deletions asdf/_block/callback.py
@@ -0,0 +1,43 @@
"""
A `DataCallback` class is implemented here to allow
for reassignment of the index of an ASDF block corresponding
to a callback.

This is needed so that extension code can generate a callback
during deserialization of an ASDF file that will continue
to be valid even after an `AsdfFile.update` which might
reorder blocks.

To allow for 'low-level' block access needed for ndarray
`DataCallback` can be called with an optional ``_attr``
argument to cache data, access the block header and other
operations that we generally do not want to expose to
extension code.
"""
import weakref


class DataCallback:
    """
    A callable object used to read data from an ASDF block
    read from an ASDF file.
    """

    def __init__(self, index, read_blocks):
        self._reassign(index, read_blocks)

    def __call__(self, _attr=None):
        read_blocks = self._read_blocks_ref()
        if read_blocks is None:
            msg = "Attempt to read block data from missing block"
            raise OSError(msg)
        if _attr is None:
            return read_blocks[self._index].data
        else:
            # _attr allows NDArrayType to have low level block access for things
            # like reading the header and cached_data
            return getattr(read_blocks[self._index], _attr)

    def _reassign(self, index, read_blocks):
        self._index = index
        self._read_blocks_ref = weakref.ref(read_blocks)
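Reviewer note: a short standalone sketch of how the callback stays valid after blocks are reordered. ``DataCallback`` is copied from the diff above; ``FakeBlock`` and ``FakeReadBlocks`` are hypothetical stand-ins for the real block classes. The container subclasses ``UserList`` because a plain ``list`` cannot be weakly referenced.

```python
import weakref
from collections import UserList


class DataCallback:
    # same logic as the class added in this diff
    def __init__(self, index, read_blocks):
        self._reassign(index, read_blocks)

    def __call__(self, _attr=None):
        read_blocks = self._read_blocks_ref()
        if read_blocks is None:
            msg = "Attempt to read block data from missing block"
            raise OSError(msg)
        if _attr is None:
            return read_blocks[self._index].data
        return getattr(read_blocks[self._index], _attr)

    def _reassign(self, index, read_blocks):
        self._index = index
        self._read_blocks_ref = weakref.ref(read_blocks)


class FakeBlock:
    """Hypothetical block exposing only a ``data`` attribute."""

    def __init__(self, data):
        self.data = data


class FakeReadBlocks(UserList):
    """Hypothetical container of read blocks that supports weak references."""


blocks = FakeReadBlocks([FakeBlock(b"first"), FakeBlock(b"second")])
callback = DataCallback(1, blocks)
before_update = callback()

# simulate an update-in-place that moves the block from index 1 to index 0
reordered = FakeReadBlocks([blocks[1], blocks[0]])
callback._reassign(0, reordered)
after_update = callback()

print(before_update, after_update)
```

Because the callback holds only a weak reference to the container, calling it after the container has been garbage collected raises ``OSError`` instead of keeping the block data alive.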
4 changes: 4 additions & 0 deletions asdf/_block/exceptions.py
@@ -0,0 +1,4 @@
class BlockIndexError(Exception):
    """
    An error occurred while reading or parsing an ASDF block index
    """
64 changes: 64 additions & 0 deletions asdf/_block/external.py
@@ -0,0 +1,64 @@
"""
For external blocks, the previous block management
would cache data opened from external files (to return the
same underlying ndarray if the same external block
was referenced more than once). `ExternalBlockCache` is
used here to allow for the same behavior without requiring
the block manager to have a reference to the `AsdfFile`
(that references the block manager).
"""
import os
import urllib

import numpy as np

from asdf import generic_io, util


class UseInternalType:
    pass


UseInternal = UseInternalType()


class ExternalBlockCache:
    def __init__(self):
        self.clear()

    def load(self, base_uri, uri, memmap=False, validate_checksums=False):
        key = util.get_base_uri(uri)
        if key not in self._cache:
            resolved_uri = generic_io.resolve_uri(base_uri, uri)
            if resolved_uri == "" or resolved_uri == base_uri:
                return UseInternal

            from asdf import open as asdf_open

            with asdf_open(
                resolved_uri, "r", lazy_load=False, copy_arrays=True, validate_checksums=validate_checksums
            ) as af:
                blk = af._blocks.blocks[0]
                if memmap and blk.header["compression"] == b"\0\0\0\0":
                    parsed_url = util.patched_urllib_parse.urlparse(resolved_uri)
                    if parsed_url.scheme == "file":
                        # deal with leading slash for windows file://
                        filename = urllib.request.url2pathname(parsed_url.path)
                        arr = np.memmap(filename, np.uint8, "r", blk.data_offset, blk.cached_data.nbytes)
                    else:
                        arr = blk.cached_data
                else:
                    arr = blk.cached_data
            self._cache[key] = arr
        return self._cache[key]

    def clear(self):
        self._cache = {}

def relative_uri_for_index(uri, index):
    # get the os-native separated path for this uri
    path = util.patched_urllib_parse.urlparse(uri).path
    dirname, filename = os.path.split(path)
    filename = os.path.splitext(filename)[0] + f"{index:04d}.asdf"
    return filename
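Reviewer note: to make the naming scheme for external block files concrete, here is a standalone sketch of ``relative_uri_for_index``. It assumes the standard library ``urllib.parse.urlparse`` behaves like ``util.patched_urllib_parse.urlparse`` for a simple ``file://`` URI; the example URI is made up.

```python
import os
from urllib.parse import urlparse


def relative_uri_for_index(uri, index):
    # stdlib urlparse stands in for asdf's util.patched_urllib_parse (assumption)
    path = urlparse(uri).path
    _, filename = os.path.split(path)
    # strip the extension, then append the zero-padded block index
    return os.path.splitext(filename)[0] + f"{index:04d}.asdf"


names = [relative_uri_for_index("file:///data/obs.asdf", i) for i in range(3)]
print(names)
```

Only the file name is returned, so the external block files are named relative to the main file rather than by absolute path.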