Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C++ autosummary extension #117

Merged
merged 10 commits into from
Sep 1, 2022
Merged

Add C++ autosummary extension #117

merged 10 commits into from
Sep 1, 2022

Conversation

jbms
Copy link
Owner

@jbms jbms commented Jul 7, 2022

Fixes #92.

This is based on #116.

TODO:

  • Document how to use separate build step to preprocess C++ input header using normal compiler
  • Document how to extract compiler flags (especially include paths) from normal build process
  • Add unit tests of api_parser
  • Add unit tests of apigen
  • Add example of explicit(bool) auto-conversion
  • Support :order: option as in Python apigen
  • Support equivalent of python_apigen_default_groups / python_apigen_default_order
  • Support rST-style :group: and :relates: in addition to doxygen-style commands
  • Support @-prefixed doxygen commands in addition to \-prefixed ones (e.g. @param in addition to \param)
  • Add example of documenting multiple signatures together (by leaving no space between them)
  • Add tests of source location (e.g. in signature, and in docstring)
  • Add tests and documentation of NONITPICK

@jbms jbms requested a review from 2bndy5 July 7, 2022 23:00
@jbms jbms marked this pull request as draft July 7, 2022 23:00
@jbms jbms force-pushed the cpp-apigen branch 3 times, most recently from b16593f to 4266663 Compare July 7, 2022 23:07
@jbms
Copy link
Owner Author

jbms commented Jul 7, 2022

This is somewhat working, but perhaps we can work together to refine it.

@2bndy5
Copy link
Collaborator

2bndy5 commented Jul 7, 2022

documenting multiple signatures together (by leaving no space between them)

Is this related to overloads sharing the same docstring?

@jbms jbms force-pushed the cpp-apigen branch 2 times, most recently from 203fbe4 to 5852fdf Compare July 7, 2022 23:43
@jbms
Copy link
Owner Author

jbms commented Jul 7, 2022

documenting multiple signatures together (by leaving no space between them)

Is this related to overloads sharing the same docstring?

Yes --- the result is a single cpp:function with multiple signatures.

I added an example of this: Array::operator[]

@jbms
Copy link
Owner Author

jbms commented Jul 8, 2022

This is now updated to use types-clang rather than ones I generated myself. There are a few bugs in types-clang that required a cast as a workaround: tgockel/types-clang#2

@2bndy5
Copy link
Collaborator

2bndy5 commented Jul 8, 2022

I also noticed that CursorKind.value isn't typed at all in that stub lib...

Copy link
Collaborator

@2bndy5 2bndy5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First impressions.


.. rst:directive:: .. cpp-apigen-entity-summary:: entity-name

Generates a summary of a single Python entity.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a copy-n-paste artifact. 😄

Suggested change
Generates a summary of a single Python entity.
Generates a summary of a single C++ entity.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this example still needs to be fixed.

Also at the moment the entity is specified by the clang usr-derived id, but it should probably be something else friendlier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any documentation about the USR generation, so I often look to the src code. Some USR ids use the line number, but IIRC that is only applied to locally scoped members (like param decls and vars).

sphinx_immaterial/apidoc/cpp/api_parser.py Show resolved Hide resolved
input_path,
unsaved_files=[(input_path, input_content)],
args=tuple(config.compiler_flags) + ("-ferror-limit=0",),
options=( # TranslationUnit.PARSE_SKIP_FUNCTION_BODIES +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not ignore the function bodies?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment. If function bodies are skipped, the extent does not include the body. That interferes with detecting if there are multiple entities defined with no space between (so that they will all be documented together).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok. Yeah, a comment would've satisfied that inquiry.

sphinx_immaterial/apidoc/cpp/api_parser.py Show resolved Hide resolved
sphinx_immaterial/apidoc/cpp/api_parser.py Show resolved Hide resolved
Macros matching any of these patterns are undocumented.
"""

ignore_diagnostics: List[Pattern] = dataclasses.field(default_factory=list)
Copy link
Collaborator

@2bndy5 2bndy5 Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be made an arbitrary bool. In my research, I had a conf.py option to output these diags to the build log (defaulting to false) using the severity of the diag as a logging level. If users really want to use static analysis, then they can use clang-tidy separately.

Diagnostics can also be specified to the compiler_flags option (which are really just arbitrary clang args); the compilation database's path might be used in this way also.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found that any compiler errors are usually an indication that the result of parsing will be wrong, sometimes in subtle ways, and should normally not be ignored. However, for tensorstore I ran into some errors related to an undefined __builtin in a standard library header that I was not able to avoid, and did not seem to cause any problems, so I added this mechanism to suppress them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that clang will fail fast if it can't find an included lib, thus no parsing at all. That's why I mentioned the compilation database path (-p I think) can be used when third party libs are involved.

I haven't yet looked at how you're normalizing the path relative to the conf.py path, but that might also be a factor when passing certain args to clang.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paths are normalized using include_directory_map. We should probably try to support compilation databases, but one tricky thing about a compilation database is that it doesn't include the compiler built-in include paths. An alternative is to have the user preprocess the desired headers first using their normal compiler. The original file/line information is preserved by the added #line directives.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think users can specify -working-directory to clang. Or, maybe we should provide an option to do it.

@2bndy5
Copy link
Collaborator

2bndy5 commented Jul 8, 2022

Going forward, can we change NONITPICK to NO_NIT_PICK? I keep reading it as NON_IT_PICK 🤣

@jbms
Copy link
Owner Author

jbms commented Jul 8, 2022

Going forward, can we change NONITPICK to NO_NIT_PICK? I keep reading it as NON_IT_PICK rofl

Makes sense --- or could be: nowarnxref or something --- nitpick is a sphinx-specific term that may be confusing to readers of the C++ source code.

@2bndy5
Copy link
Collaborator

2bndy5 commented Jul 16, 2022

ok, I rebased this branch on latest main, and the CI is failing because it now tests for Windows and the fix for that is in my other branch (based on this one).

@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

I wonder if we should merge this as an experimental feature, since this PR also contains the default-literal-role and highlight changes that also apply to Python and JSON.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

We could do that. I've been think on ways to refactor this ext. My priorities are

  1. better parsing mechanism
  2. specify an entity via user friendly implementation

The database and diagnostics are a lesser priority for me. And I assume unit testing will just develop organically.

@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

To be clear, by better parsing mechanism, is that in regards to the doxygen syntax, or in regards to the C++ source itself?

For specifying an entity, we could allow it to be specified by its "object name", e.g. foo::bar::Baz or foo::bar::Baz[overload_id] if overloaded. Much more reasonable than the clang USR. Allowing it to be specified by signature, e.g. as supported by the C++ domain for cross references, would be significantly more challenging I think.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

To be clear, by better parsing mechanism, is that in regards to the doxygen syntax

Yes. I'm not sure how the C++ parsing would need improvement. I still want to provide a config option to turn off the doxygen syntax parsing, so the docstrings can be written in pure RST.

we could allow it to be specified by its "object name", e.g. foo::bar::Baz or foo::bar::Baz[overload_id] if overloaded

This was my thinking as well, but it would be more natural to use the parameter datatypes types for overloads (instead of a docstring specific ID). I find the ability to document multiple overloads with a single docstring very appealing - not even doxygen is capable of that.

@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

To be clear, by better parsing mechanism, is that in regards to the doxygen syntax

Yes. I'm not sure how the C++ parsing would need improvement. I still want to provide a config option to turn off the doxygen syntax parsing, so the docstrings can be written in pure RST.

Sounds good.

we could allow it to be specified by its "object name", e.g. foo::bar::Baz or foo::bar::Baz[overload_id] if overloaded

This was my thinking as well, but it would be more natural to use the parameter datatypes types for overloads (instead of a docstring specific ID). I find the ability to document multiple overloads with a single docstring very appealing - not even doxygen is capable of that.

That could be done using the C++ domain symbol resolution used for cross-references, but is a bit tricky because currently the symbol tree only gets populated as the c++ domain directives are actually run, which is too late. We would need to somehow create a separate one and pre-populate it. I also think it is often inconvenient to specify the full signature if you have a lot of template or function parameters.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

currently the symbol tree only gets populated as the c++ domain directives are actually run

I noticed that. Is there a reason you don't just save the parsed API to a builder env var (on a config-inited event)?

@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

There wasn't a need for the symbol tree except for normal xrefs so I just tried to keep it working closely to how it normally does.

Note that the c++ domain symbol lookup is slow if you have a lot of symbols in the same namespace because it just does a linear search over every name --- it would be better if it used a dict.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

Well I'm ok with this getting merged but there should be a warning on both config and demo doc pages. Possibly even a warning in the build log like you did for graphviz, but that seems a little excessive to me -- you went above and beyond my expectations there (again).

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

Instead of a task list in the thread OP, you could use a github project to track the progress of this extension.

jbms and others added 7 commits September 1, 2022 07:10
Also adds facilities for saving/restoring default role state.
… options

This also ensures that the default roles and highlight language are
restored after generated JSON/C++/Python object descriptions are
inserted.
Co-authored-by: Brendan <2bndy5@gmail.com>
* fix cross-refs in docs

* enable cpp.apigen verbosity and show number of found decls

* adjust clang imports for quicker edits

* fix windows path separator compensation

* support `\` and `@` as cmd prefixes

- support for param direction
- support for retval cmd
- add new strip_comment() to strip all forms of C++ comment syntax from comment tokens' text
- use new strip_comment() instead of text.lstrip()
  This will also preserve consistent indentation that is needed for
  code-blocks (which isn't supported yet)
- modify index_interva.h to test some of the new features

* use explicit role for cross-refs

* admonition importance of Linux path separators

* change admonished text (per request)

* change erroneous admonition text

* allow blank docstr lines to get normalized

- add multiline flag for re.sub(brief/details) call
- ran black on api_parser.py

* add some unit tests

* add a blank line to array.h docstr

* change array.h until we support multline comments

The indentation removed here (on a subsequent
line) is interpretted as a blockquote while still
considered within the :param: field.

This change also satisfies what the the Doxygen
parser expects; meaning unexpected indentation
is prohibited amidst single a multi-lined paragraph.

* requested changes

* try to get raw_comment first

* update tests about comment stripping

* only dedent multiline comments during stripping

adjust tests about leading whitespace for single line comments

* [no ci] remove outdated conditional statement

* latest review requests

- reverted xrefs in Config docs (now that roles are fixed)
- removed old algorithm for parsing docstring line-by-line
- added regex to remove non-docstring comments from a raw_comment
- revised docstring parsing test with pytest.mark.paramtrize
- added 2 tests to check for proper removal of non-docstring comments

* return None when no raw_comment exists

* replace non-doc comments with blank lines

also added IDs to the growing parametrized test_comment_styles()

* satisfy review request about demo src (`\returns`)
@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

I added a warning that it is experimental.

2bndy5
2bndy5 previously approved these changes Sep 1, 2022
@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

Did I mess up the rebase?

@jbms
Copy link
Owner Author

jbms commented Sep 1, 2022

Did I mess up the rebase?

Not sure what happened with types-clang

@jbms jbms marked this pull request as ready for review September 1, 2022 15:58
@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 1, 2022

I thought it was weird when the rebased push didn't trigger the CI (only RTD's CI ran).

setup.py Outdated Show resolved Hide resolved
@jbms jbms merged commit 21f2aee into main Sep 1, 2022
@jbms jbms deleted the cpp-apigen branch September 1, 2022 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorporation of TensorStore C++ autosummary extension
2 participants