initial block converter docs

asdf-format · Mar 31, 2023 · 3122d9c
1 parent 862de68
commit 3122d9c

Showing 2 changed files with 88 additions and 1 deletion.
diff --git a/asdf/extension/_converter.py b/asdf/extension/_converter.py
@@ -169,7 +169,7 @@ def reserve_blocks(self, obj, tag, ctx):
             The context of the current serialization request.
 
         Returns
-        ------
+        -------
         keys : list of unique hashable keys
             These keys will be used to reserve blocks for later use
         """

diff --git a/docs/asdf/extending/converters.rst b/docs/asdf/extending/converters.rst
@@ -280,6 +280,93 @@ With this modification we can successfully deserialize our ASDF file:
 
     assert reconstituted_f1.inverse.inverse is reconstituted_f1
 
+.. _extending_converter_block_storage:
+
+Block storage
+=============
+
+As described in :ref:`extending_converters`, Converters can return complex objects that will
+be passed to other Converters. If a Converter returns an ndarray, ASDF will recognize this
+array and store it in an ASDF block.
+
+For applications where this isn't possible, or where more control over block storage is required,
+Converters allow for more flexible block storage through use of the ``SerializationContext``
+provided as an argument to `Converter.to_yaml_tree`, `Converter.from_yaml_tree` and `Converter.select_tag`.
+
+A simple example of a Converter using block storage to store the ``payload`` for
+``BlockData`` object instances is as follows:
+
+.. runcode::
+
+    import asdf
+    import numpy as np
+    from asdf.extension import Converter, Extension
+
+    class BlockData:
+        def __init__(self, payload):
+            self.payload = payload
+
+
+    class BlockConverter(Converter):
+        tags = ["asdf://somewhere.org/tags/block_data-1.0.0"]
+        types = [BlockData]
+
+        def to_yaml_tree(self, obj, tag, ctx):
+            block_index = ctx.find_block_index(
+                id(obj),
+                lambda: np.ndarray(len(obj.payload), dtype="uint8", buffer=obj.payload),
+            )
+            return {"block_index": block_index}
+
+        def from_yaml_tree(self, node, tag, ctx):
+            block_index = node["block_index"]
+            data = ctx.load_block(block_index, by_index=True)
+            obj = BlockData(data.tobytes())
+            ctx.assign_block_key(block_index, id(obj))
+            return obj
+
+        def reserve_blocks(self, obj, tag, ctx):
+            return [id(obj)]
+
+    class BlockExtension(Extension):
+        tags = ["asdf://somewhere.org/tags/block_data-1.0.0"]
+        converters = [BlockConverter()]
+        extension_uri = "asdf://somewhere.org/extensions/block_data-1.0.0"
+
+    with asdf.config_context() as cfg:
+        cfg.add_extension(BlockExtension())
+        ff = asdf.AsdfFile({"example": BlockData(b"abcdefg")})
+        ff.write_to("block_converter_example.asdf")
+
+.. asdf:: block_converter_example.asdf
+
+To discuss the above example, it is helpful to first review some details of how ASDF
+:ref:`stores blocks <asdf-standard:block>`. Blocks are stored sequentially within an
+ASDF file following the YAML tree. Converters can read and write these blocks
+based on the index of the block within the file.
+
+During read (``Converter.from_yaml_tree``), data for a specific block can be read
+by providing the ``index`` of the block to ``SerializationContext.load_block`` (and setting the
+``by_index`` argument to ``True``). ``Converter.from_yaml_tree`` returns the deserialized custom
+object that will be placed in the ``AsdfFile.tree``. ``SerializationContext.assign_block_key``
+should also be called during ``Converter.from_yaml_tree`` to allow ASDF to associate
+a unique hashable key with any block used during conversion of this object. This is
+important as the ordering of blocks in memory might change during an update in place.
+Furthermore, this key can be used to lazily load block data by later calling
+``SerializationContext.load_block`` with the assigned key.
+
+During write, ``Converter.to_yaml_tree`` can prepare data to be stored in a block
+by calling ``SerializationContext.find_block_index`` to find the location of an
+available block. ``SerializationContext.find_block_index`` should be called with a
+hashable key unique to this object (and the same as the key used during reading)
+and a callback function that accepts no arguments and returns the ndarray to save
+within the block.
+
+A Converter that uses block storage must also define ``Converter.reserve_blocks``.
+``Converter.reserve_blocks`` will be called during memory management to free
+resources for unused blocks and to reserve blocks for later use. ``Converter.reserve_blocks`` must
+return a list of keys associated with the object provided as the first argument.
+
 .. _extending_converters_performance:
 
 Entry point performance considerations