numpy parser parameter table breaks if there is an empty line between parameters #167

Closed
machow opened this issue Jun 6, 2023 · 5 comments

@machow
Contributor

machow commented Jun 6, 2023

Describe the bug

From what I can tell, the numpydoc parser supports empty lines between parameter descriptions. e.g.

"""
Parameters
------------
x: int
    a thing
                                         <--- note this empty line
y: float
    another thing
"""

However, with griffe's numpy parser this triggers the end of the parameters table.

One possible solution

If we instead use the following rule to indicate a new section, it should be able to keep the table intact, without changing the current behavior too much.

  • detect directly the next section of documentation (e.g. line has words, and the next line starts with ---)
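As a rough illustration of the rule above, here is a minimal standalone sketch. It is not griffe's implementation; the helper names `is_section_heading` and `split_sections` are hypothetical, and "starts with ---" is assumed to mean a line made only of three or more dashes.

```python
import re

# Assumption: an underline is a line consisting only of three or more dashes.
_UNDERLINE = re.compile(r"^-{3,}\s*$")


def is_section_heading(lines, index):
    """Return True if lines[index] has words and the next line is a dash underline."""
    if index + 1 >= len(lines):
        return False
    title = lines[index].strip()
    underline = lines[index + 1].strip()
    return bool(title) and not _UNDERLINE.match(title) and bool(_UNDERLINE.match(underline))


def split_sections(docstring):
    """Split a docstring into (title, body_lines) pairs, using headings as the only delimiter."""
    lines = docstring.strip("\n").splitlines()
    sections = []
    title, body = None, []
    i = 0
    while i < len(lines):
        if is_section_heading(lines, i):
            # Flush whatever came before this heading, unless it is empty leading text.
            if title is not None or any(line.strip() for line in body):
                sections.append((title, body))
            title, body = lines[i].strip(), []
            i += 2  # skip the title and its underline
            continue
        body.append(lines[i])
        i += 1
    sections.append((title, body))
    return sections


example = """
Parameters
------------
x: int
    a thing

y: float
    another thing
"""

sections = split_sections(example)
for section_title, section_body in sections:
    print(section_title, section_body)
```

With this rule the empty line between `x` and `y` stays inside the Parameters body, because only a heading can start a new section.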

Breaking cases

Note that with this solution, these kinds of docstrings from the unit tests could be parsed differently:

Returns
-------
only_item : type
    Description.



Something.

Currently, these get parsed as 2 sections (a Returns and a generic Text section). However, AFAICT, the numpydoc parser just considers the "Something." to be part of the Returns section, since sections must be separated by headings.

from numpydoc.docscrape import NumpyDocString

docstring = """
Returns
-------
only_item : type
    Description.



Something.
"""

doc = NumpyDocString(docstring)
list(doc.items())

WDYT? Happy to find the right balance of current behavior / supporting table description variations!

@pawamoy
Member

pawamoy commented Jun 6, 2023

Thanks for the report!

I find it weird that

Returns
-------
only_item : type
    Description.



Something.

...would be parsed as a single Returns section. How do you intertwine prose and semantic sections? Is that not possible with Numpydoc's spec? And how should Something. be integrated within the Returns section? I think supporting blank lines in-between items is very niche. IMO it costs nothing to just remove the blank line, and it even makes the docstring more readable.

However, this is just my opinion, and I'm fine with anything that matches the spec: I'm not using this docstring style after all, so those who actually use it know better 🙂 Reading the spec again at https://numpydoc.readthedocs.io/en/latest/format.html, it does indeed seem that prose is not expected in between sections, and therefore headings (and dash lines) are the true section delimiters, not blank lines.

@machow
Contributor Author

machow commented Jun 8, 2023

Hmm... I dug a bit deeper, and it looks like sphinx does allow including extra narrative (to a degree).

It seems like the rule they use to end a section (sphinx code here) is...

  • Encounter a new section, OR
  • 2 empty lines in the table block

So maybe this could be a nice compromise? I didn't realize that numpydoc and sphinx both have parsers :/
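To make the two-part rule concrete, here is a small sketch of it as described in this comment. It is not the sphinx code; `find_section_end` is a hypothetical helper, and a "new section" is assumed to be a non-empty line followed by a line of three or more dashes.

```python
def find_section_end(lines, start=0):
    """Return the index where a section body starting at `start` ends.

    The body ends at a new heading, or just before a run of two empty lines,
    whichever comes first. A single empty line does not end the body.
    """
    blank_run = 0
    for i in range(start, len(lines)):
        if not lines[i].strip():
            blank_run += 1
            if blank_run == 2:
                return i - 1  # index of the first empty line in the run
            continue
        blank_run = 0
        # Assumed heading shape: this line has words, the next is only dashes.
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if len(following) >= 3 and set(following) == {"-"}:
            return i
    return len(lines)


# Body of the Returns example from the issue (heading already consumed).
returns_body = ["only_item : type", "    Description.", "", "", "", "Something."]
# Body of the Parameters example, with a single empty line between items.
params_body = ["x: int", "    a thing", "", "y: float", "    another thing"]

print(find_section_end(returns_body))
print(find_section_end(params_body))
```

Under this rule "Something." still falls outside the Returns body because three empty lines precede it, while the single empty line between `x` and `y` keeps both parameters in one table.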

@pawamoy
Member

pawamoy commented Jun 8, 2023

Parsers, parsers everywhere!

Thanks for investigating, the double blank line seems like a good compromise indeed. That's a breaking change though 🤔

@machow
Contributor Author

machow commented Jun 8, 2023

That's a breaking change though 🤔

Yeah, it's a tough spot :/. For what it's worth, I think there are libraries using sphinx in the wild that include empty lines in their parameter tables. I haven't seen people add narrative to sections like parameters, so I am in favor of breaking changes there 😬 (but if it's helpful, we could also refactor the numpy parser to be a class, and then I could subclass it as a new parser implementation somewhere?).

Here's an example docstring from the pandas DataFrame docs:

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.

Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If
    data is a dict, column order follows insertion-order. If a dict contains Series
    which have an index defined, it is aligned by its index. This alignment also
    occurs if data is a Series or a DataFrame itself. Alignment is done on
    Series/DataFrame inputs.

    If data is a list of dicts, column order follows insertion-order.

index : Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if
    no indexing information part of input data and no index provided.
columns : Index or array-like
    Column labels to use for resulting frame when data does not have them,
    defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,
    will perform column selection instead.
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool or None, default None
    Copy data from inputs.
    For dict data, the default of None behaves like ``copy=True``.  For DataFrame
    or 2d ndarray input, the default of None behaves like ``copy=False``.
    If data is a dict containing one or more Series (possibly of different dtypes),
    ``copy=False`` will ensure that these inputs are not copied.

    .. versionchanged:: 1.3.0

...

And numpy.array (maybe less of an issue, since its empty lines are inside parameter descriptions):

array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    __array__ method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array.  If not given, then the type will
    be determined as the minimum type required to hold the objects in the
    sequence.
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy will
    only be made if __array__ returns a copy, if obj is a nested sequence,
    or if a copy is needed to satisfy any of the other requirements
    (`dtype`, `order`, etc.).
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be in C order (row major) unless 'F' is
    specified, in which case it will be in Fortran order (column major).
    If object is an array the following holds.

    ===== ========= ===================================================
    order  no copy                     copy=True
    ===== ========= ===================================================
    'K'   unchanged F & C order preserved, otherwise most similar order
    'A'   unchanged F order if input is F and not C, otherwise C order
    'C'   C order   C order
    'F'   F order   F order
    ===== ========= ===================================================

    When ``copy=False`` and a copy is made for other reasons, the result is
    the same as if ``copy=True``, with some exceptions for 'A', see the
    Notes section. The default order is 'K'.
subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise
    the returned array will be forced to be a base-class array (default).
ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting
    array should have.  Ones will be prepended to the shape as
    needed to meet this requirement.
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

@pawamoy
Member

pawamoy commented Jun 8, 2023

Yep, the second example poses no issue, since the indentation helps us determine that it's still within the section (and section item).

We could add an option to the numpy parser? This way, there is no breaking change, and users can still opt in to having blank lines in sections (while needing double blank lines to separate prose from sections, or vice versa).
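A standalone sketch of what such an opt-in could look like is below. The function `read_block` and its `allow_blank_line` flag are hypothetical names for illustration only, not griffe's actual API. The default branch approximates the behavior described in this thread: a blank line followed by a non-indented line ends the block, while indented text after a blank line stays in the current item.

```python
def _is_heading(lines, i):
    """Assumed heading shape: a non-empty line followed by a line of 3+ dashes."""
    following = lines[i + 1].strip() if i + 1 < len(lines) else ""
    return bool(lines[i].strip()) and len(following) >= 3 and set(following) == {"-"}


def read_block(lines, start=0, allow_blank_line=False):
    """Collect a section body; return (block_lines, index_of_first_line_after_block)."""
    block = []
    i = start
    while i < len(lines):
        if _is_heading(lines, i):
            break  # a new section always ends the block
        if lines[i].strip():
            block.append(lines[i])
            i += 1
            continue
        # Blank line: measure the run of blanks and peek at the line after it.
        j = i
        while j < len(lines) and not lines[j].strip():
            j += 1
        if j == len(lines) or _is_heading(lines, j):
            break
        blanks = j - i
        indented = lines[j][:1].isspace()
        # Indented text continues the current item in both modes; with the
        # opt-in, exactly one blank line may also separate two items.
        if indented or (allow_blank_line and blanks == 1):
            block.extend(lines[i:j])
            i = j
            continue
        break
    return block, i


params = ["x: int", "    a thing", "", "y: float", "    another thing"]
default_block, default_end = read_block(params)
opt_in_block, opt_in_end = read_block(params, allow_blank_line=True)
print(default_block)
print(opt_in_block)
```

Keeping the flag off by default leaves existing docstrings parsed as before, which is the point of making it an option rather than a change in behavior.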
