initial block converter docs
braingram committed Mar 31, 2023
1 parent 862de68 commit 3122d9c
Showing 2 changed files with 88 additions and 1 deletion.
2 changes: 1 addition & 1 deletion asdf/extension/_converter.py
@@ -169,7 +169,7 @@ def reserve_blocks(self, obj, tag, ctx):
The context of the current serialization request.
Returns
- ------
+ -------
keys : list of unique hashable keys
These keys will be used to reserve blocks for later use
"""
87 changes: 87 additions & 0 deletions docs/asdf/extending/converters.rst
@@ -280,6 +280,93 @@ With this modification we can successfully deserialize our ASDF file:
assert reconstituted_f1.inverse.inverse is reconstituted_f1
.. _extending_converter_block_storage:

Block storage
=============

As described in :ref:`extending_converters`, Converters can return complex objects that will
be passed to other Converters. If a Converter returns an ndarray, ASDF will recognize this
array and store it in an ASDF block.

For applications where this isn't possible, or where more control of the block storage is required,
Converters allow for more flexible block storage through use of the ``SerializationContext``
provided as an argument to `Converter.to_yaml_tree`, `Converter.from_yaml_tree` and `Converter.select_tag`.

A simple example of a Converter using block storage to store the ``payload`` for
``BlockData`` object instances is as follows:

.. runcode::

    import asdf
    import numpy as np
    from asdf.extension import Converter, Extension

    class BlockData:
        def __init__(self, payload):
            self.payload = payload


    class BlockConverter(Converter):
        tags = ["asdf://somewhere.org/tags/block_data-1.0.0"]
        types = [BlockData]

        def to_yaml_tree(self, obj, tag, ctx):
            block_index = ctx.find_block_index(
                id(obj),
                lambda: np.ndarray(len(obj.payload), dtype="uint8", buffer=obj.payload),
            )
            return {"block_index": block_index}

        def from_yaml_tree(self, node, tag, ctx):
            block_index = node["block_index"]
            data = ctx.load_block(block_index, by_index=True)
            obj = BlockData(data.tobytes())
            ctx.assign_block_key(block_index, id(obj))
            return obj

        def reserve_blocks(self, obj, tag, ctx):
            return [id(obj)]

    class BlockExtension(Extension):
        tags = ["asdf://somewhere.org/tags/block_data-1.0.0"]
        converters = [BlockConverter()]
        extension_uri = "asdf://somewhere.org/extensions/block_data-1.0.0"

    with asdf.config_context() as cfg:
        cfg.add_extension(BlockExtension())
        ff = asdf.AsdfFile({"example": BlockData(b"abcdefg")})
        ff.write_to("block_converter_example.asdf")

.. asdf:: block_converter_example.asdf

To discuss the above example, it is helpful to first review some details of how ASDF
:ref:`stores blocks <asdf-standard:block>`. Blocks are stored sequentially within an
ASDF file following the YAML tree. Converters can read and write these blocks
based on the index of the block within the file.

During read (``Converter.from_yaml_tree``), data for a specific block can be read
by providing the ``index`` of the block to ``SerializationContext.load_block`` (and setting the
``by_index`` argument to ``True``). ``Converter.from_yaml_tree`` returns the deserialized custom
object that will be placed in the ``AsdfFile.tree``. ``SerializationContext.assign_block_key``
should also be called during ``Converter.from_yaml_tree`` to allow ASDF to associate
a unique hashable key with any block used during conversion of this object. This is
important because the ordering of blocks in memory might change during an update in place.
Furthermore, this key can be used to lazily load block data by later calling
``SerializationContext.load_block`` with the assigned key.
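
The reason for preferring a key over an index can be sketched with a toy model.
The ``ToyBlockStore`` class below is hypothetical and is not part of asdf; it only
borrows the method names used above and assumes that a key stays bound to the block
contents while the index merely reflects the current ordering.

```python
class ToyBlockStore:
    """Toy stand-in for a block manager; NOT the asdf implementation."""

    def __init__(self, blocks):
        self._blocks = list(blocks)  # block contents in their current order
        self._by_key = {}            # key -> block contents

    def load_block(self, ref, by_index=False):
        # Look the block up by position or by its assigned key.
        if by_index:
            return self._blocks[ref]
        return self._by_key[ref]

    def assign_block_key(self, index, key):
        # Bind the key to the contents currently found at this index.
        self._by_key[key] = self._blocks[index]

    def reorder(self, new_order):
        # Simulate blocks being rearranged, e.g. by an update in place.
        self._blocks = [self._blocks[i] for i in new_order]


store = ToyBlockStore([b"first", b"second"])
key = "payload-of-object-a"      # any unique hashable key (assumed name)
store.assign_block_key(1, key)   # the object was read from block 1

store.reorder([1, 0])            # block order changes afterwards

stale = store.load_block(1, by_index=True)  # index 1 now holds other data
by_key = store.load_block(key)              # key still finds the original
```

In this sketch the index lookup returns the wrong block after reordering, while
the key lookup still returns the data the object was built from.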

During write, ``Converter.to_yaml_tree`` can prepare data to be stored in a block
by calling ``SerializationContext.find_block_index`` to find the location of an
available block. ``SerializationContext.find_block_index`` should be called with a
hashable key unique to this object (and the same as the key used during reading)
and a callback function that accepts no arguments and returns the ndarray to save
within the block.
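
A minimal sketch of the key-plus-callback pattern follows. ``ToyWriter`` and
``write_blocks`` are hypothetical names, not asdf API, and the sketch assumes two
things that the text above only implies: that a key seen before maps back to the
same index, and that the callback is not invoked until the block is written.

```python
class ToyWriter:
    """Toy model of block index allocation during write; NOT the asdf API."""

    def __init__(self):
        self._index_by_key = {}  # key -> block index
        self._callbacks = []     # one zero-argument callback per block

    def find_block_index(self, key, data_callback):
        # Reuse the index already tied to this key, otherwise take a new one.
        if key not in self._index_by_key:
            self._index_by_key[key] = len(self._callbacks)
            self._callbacks.append(None)
        index = self._index_by_key[key]
        self._callbacks[index] = data_callback
        return index

    def write_blocks(self):
        # The data is produced only here, when each callback is invoked.
        return [callback() for callback in self._callbacks]


calls = []

def make_payload():
    # Records each invocation so the deferred evaluation is observable.
    calls.append("called")
    return b"abcdefg"

writer = ToyWriter()
first = writer.find_block_index("obj-a", make_payload)
second = writer.find_block_index("obj-a", make_payload)  # same key again
other = writer.find_block_index("obj-b", lambda: b"xyz")
calls_before_write = len(calls)
blocks = writer.write_blocks()
```

Passing a callback rather than the array itself means that, under these assumptions,
the data only needs to be materialized at the moment the block is written.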

A Converter that uses block storage must also define ``Converter.reserve_blocks``.
``Converter.reserve_blocks`` is called during memory management so that ASDF can free
resources for unused blocks and reserve blocks for later use. ``Converter.reserve_blocks`` must
return a list of keys associated with the object provided as the first argument.
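
How the returned keys might be used can be illustrated with a toy model. The
``prune_blocks`` helper below is hypothetical and only assumes that blocks whose
keys are not reserved by any object still in the tree may be released; the actual
memory management in asdf may differ.

```python
class Payload:
    """Stand-in for a custom object such as BlockData."""

    def __init__(self, data):
        self.data = data


def reserve_blocks(obj):
    # Mirrors the example converter: one key per object, based on id(obj).
    return [id(obj)]


def prune_blocks(blocks_by_key, live_objects):
    # Hypothetical helper: keep only blocks reserved by a live object.
    reserved = set()
    for obj in live_objects:
        reserved.update(reserve_blocks(obj))
    return {key: data for key, data in blocks_by_key.items() if key in reserved}


kept = Payload(b"keep me")
dropped = Payload(b"no longer in the tree")
blocks_by_key = {id(kept): kept.data, id(dropped): dropped.data}

# Only ``kept`` is still part of the tree, so only its block is retained.
remaining = prune_blocks(blocks_by_key, [kept])
```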

.. _extending_converters_performance:

Entry point performance considerations