NVIDIA / TransformerEngine Public

Notifications You must be signed in to change notification settings
Fork 451
Star 2.5k

Code
Issues 202
Pull requests 69
Discussions
Actions
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Discussions
Actions
Security
Insights

[PyTorch debug] Improve precision debug tools performance #1909

Open: pggPL wants to merge 21 commits into NVIDIA:main from pggPL:nvinspect_performance

pggPL (Collaborator) commented Jun 30, 2025 (edited)

Description

This PR speeds up layers that are not affected by any feature in a given iteration. Such layers should be exactly as fast as they are when the debug tools are not initialized at all.

I needed to fix three things:

1. Deciding whether a layer uses any feature in the current iteration caused a lot of CPU overhead: we called inspect_tensor_enabled and a few similar functions for every layer, iteration, and tensor. Calls like inspect_tensor_enabled may now return a tuple (bool, int), where the int is the next iteration in which the feature will be enabled. If every tensor of a layer returns (False, n), the non-debug layer is run until iteration n.
2. debug_api.step() is called after every iteration. Inside it, we call STATS_BUFFER.log(), which performs synchronization and some CPU operations even when no stats are logged. This logic is now skipped if no stat was logged.
3. COMM/GEMM overlap used to be disabled for the whole run; now it is disabled only when the layer is affected by at least one feature.
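The three changes above can be illustrated with a small, self-contained Python simulation. It is a sketch under stated assumptions, not Transformer Engine's implementation: FakeDebugAPI, HypotheticalLayer, HypotheticalStatsBuffer, step and simulate are hypothetical stand-ins, the inspect_tensor_enabled signature shown here is assumed, and it assumes the int in the returned tuple is the next iteration in which the feature is enabled. The real synchronization cost of STATS_BUFFER.log() is modeled only by a counter.

```python
from typing import List, Tuple


class FakeDebugAPI:
    """Hypothetical stand-in for the debug API (NOT the real nvdlfw_inspect interface).

    Pretends that one feature is enabled every `log_every` iterations.
    """

    def __init__(self, log_every: int) -> None:
        self.log_every = log_every
        self.calls = 0  # number of enablement queries: the CPU overhead item 1 targets

    def inspect_tensor_enabled(self, layer_name: str, tensor_name: str,
                               iteration: int) -> Tuple[bool, int]:
        """Return (enabled in this iteration, next iteration in which it is enabled)."""
        self.calls += 1
        enabled = iteration % self.log_every == 0
        next_iter = (iteration // self.log_every + 1) * self.log_every
        return enabled, next_iter


class HypotheticalLayer:
    """Caches the earliest iteration at which any of its tensors may need debug."""

    TENSORS = ("activation", "weight", "gradient")

    def __init__(self, name: str, api: FakeDebugAPI) -> None:
        self.name = name
        self.api = api
        self.next_debug_iter = 0  # the API is queried again only from this iteration on

    def debug_enabled(self, iteration: int) -> bool:
        # Fast path: nothing can be enabled before next_debug_iter, so skip all queries.
        if iteration < self.next_debug_iter:
            return False
        results = [self.api.inspect_tensor_enabled(self.name, t, iteration)
                   for t in self.TENSORS]
        if any(enabled for enabled, _ in results):
            self.next_debug_iter = iteration + 1  # re-check on the next iteration
            return True
        self.next_debug_iter = min(next_iter for _, next_iter in results)
        return False

    def forward(self, iteration: int) -> Tuple[bool, bool]:
        """Return (ran debug path, comm/GEMM overlap allowed)."""
        debug = self.debug_enabled(iteration)
        # Item 3: overlap is turned off only when a feature touches this layer.
        return debug, not debug


class HypotheticalStatsBuffer:
    def __init__(self) -> None:
        self.pending: List[tuple] = []
        self.flushes = 0  # stands in for the sync + CPU work of a real log()

    def record(self, stat: tuple) -> None:
        self.pending.append(stat)

    def log(self) -> List[tuple]:
        self.flushes += 1
        out, self.pending = self.pending, []
        return out


def step(buffer: HypotheticalStatsBuffer) -> List[tuple]:
    # Item 2: skip the expensive flush entirely when nothing was recorded.
    return buffer.log() if buffer.pending else []


def simulate(num_iters: int, log_every: int) -> Tuple[int, int, int]:
    """Return (API queries, buffer flushes, debug iterations) for one layer."""
    api = FakeDebugAPI(log_every)
    layer = HypotheticalLayer("layer_0", api)
    buffer = HypotheticalStatsBuffer()
    debug_iters = 0
    for it in range(num_iters):
        debug, _overlap = layer.forward(it)
        if debug:
            debug_iters += 1
            buffer.record((it, layer.name))
        step(buffer)
    return api.calls, buffer.flushes, debug_iters


if __name__ == "__main__":
    # Querying all 3 tensors in all 300 iterations would take 900 calls.
    print(simulate(300, 100))  # (18, 3, 3)
```

Re-querying on the iteration right after a debug iteration is a conservative choice in this sketch; a real implementation could also use the returned int when the feature is currently enabled.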

If we only want to log some stats every n iterations, this PR should make the debug run approach the speed of the non-debug workflow as n approaches infinity.
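A simple cost model makes this claim concrete. Assume (the symbols are introduced here for illustration only) that an iteration in which no feature is active costs exactly the non-debug time $T_0$, which is what this PR aims for, and that an iteration in which stats are logged costs $T_d$. With one logging iteration per window of n iterations, the average cost per iteration is:

```latex
\bar{T}(n) = \frac{T_d + (n-1)\,T_0}{n} = T_0 + \frac{T_d - T_0}{n},
\qquad
\lim_{n \to \infty} \bar{T}(n) = T_0 .
```

For example, with n = 100 only 1% of the extra cost $T_d - T_0$ is paid per iteration on average.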

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pggPL added 3 commits (June 27, 2025 09:10):
- 35e8438 turn on userbuffers for layers without debug (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- 02c0a6b code drop (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- 7f56375 working change (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from dda223b to 7f56375 (July 2, 2025 16:17)

pggPL mentioned this pull request in NVIDIA/nvidia-dlfw-inspect#7, "Multi feature invocation support and error handling" (Merged)

Commit b5024af: fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from e2f237d to b5024af (July 2, 2025 17:10)

pre-commit-ci bot and others added 3 commits (July 2, 2025 17:10):
- d9a7c34 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- 5d70fb4 fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from 58cf805 to 9893831 (July 2, 2025 20:26)

pre-commit-ci bot and others added 3 commits (July 2, 2025 20:27):
- 03dfae1 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- cbb7557 fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- a0ae480 fixes (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from dfa89ee to a0ae480 (July 3, 2025 10:20)

pre-commit-ci bot and others added 5 commits (July 3, 2025 10:20):
- 085a7c0 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- a246efb tests and fixes (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- fixes (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from 9522170 to 1343547 (July 3, 2025 14:33)

pre-commit-ci bot and others added 3 commits (July 3, 2025 14:33):
- 04ab037 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- 987a588 fixes (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)
- 7c1a1f7 fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from 968336f to 7c1a1f7 (July 3, 2025 14:43)

pggPL marked this pull request as ready for review (July 3, 2025 14:44)

pggPL (Collaborator, Author) commented Jul 3, 2025

The PR is ready for review. Waiting for NVIDIA/nvidia-dlfw-inspect#7 to be merged so that I can update the nvinspect version and run the tests.

Commit 7322fc2: update nvinspect version (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL force-pushed the nvinspect_performance branch from ba3d72e to 7322fc2 (July 3, 2025 20:42)

Commit 51444ea: [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

pggPL (Collaborator, Author) commented Jul 3, 2025

/te-ci pytorch L1

Commit b06fbbc: fix (Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>)

pggPL (Collaborator, Author) commented Jul 4, 2025

/te-ci pytorch

1 similar comment

timmoon10 (Collaborator) commented Jul 7, 2025

/te-ci pytorch

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

2 participants
