TE Gemma tutorial attempt#2 #1839


Draft: sudhakarsingh27 wants to merge 10 commits into main from te_gemma_tutorial_base
Conversation

sudhakarsingh27 (Collaborator)
Description

Adds a tutorial to showcase how to:

  1. use Transformer Engine's (TE) TransformerLayer in place of HuggingFace's GemmaDecoderLayer in Gemma models,
  2. use TE's non-paged and paged KV caches, and
  3. use CUDA Graphs and fp8_model_init to speed up generation.

Illustrative sketches of each step are shown below.
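
A minimal sketch of step 1, assuming the hyperparameters are read from the HuggingFace Gemma config; the tutorial's actual mapping (and the weight copying it performs) may differ:

```python
import transformer_engine.pytorch as te
from transformers import AutoConfig

# Pull Gemma's hyperparameters from the HuggingFace config.
config = AutoConfig.from_pretrained("google/gemma-7b")

# Build a TE TransformerLayer matching GemmaDecoderLayer's shape.
te_layer = te.TransformerLayer(
    hidden_size=config.hidden_size,
    ffn_hidden_size=config.intermediate_size,
    num_attention_heads=config.num_attention_heads,
    num_gqa_groups=config.num_key_value_heads,
    layernorm_epsilon=config.rms_norm_eps,
    hidden_dropout=0.0,
    attention_dropout=0.0,
    normalization="RMSNorm",    # Gemma normalizes with RMSNorm
    activation="geglu",         # Gemma's MLP is a gated GeLU
    self_attn_mask_type="causal",
)
# The tutorial would also copy the HF weights into te_layer and swap it
# into model.model.layers[i]; that wiring is omitted here.
```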
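For step 2, TE's generation-time KV cache is driven through InferenceParams, which is passed into the layer's forward call. The paged-cache constructor arguments vary across TE releases, so only the non-paged form is sketched; shapes and sizes below are illustrative assumptions:

```python
import torch
import transformer_engine.pytorch as te

# Pre-allocate a non-paged KV cache for up to 4 sequences of 2048 tokens.
inference_params = te.InferenceParams(
    max_batch_size=4,
    max_sequence_length=2048,
)

te_layer = te_layer.cuda()  # reuse the layer from the previous sketch
hidden = torch.randn(1, 4, 3072, device="cuda")  # one decode step, [s, b, h]

# The layer appends this step's K/V to the cache held in inference_params.
out = te_layer(hidden, inference_params=inference_params)
```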
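For step 3, fp8_model_init allocates weights directly in FP8 and make_graphed_callables captures the forward pass into a CUDA graph. Argument names follow TE's public API, but the sizes and single-layer setup are assumptions for illustration, not the tutorial's exact wiring:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

fp8_recipe = DelayedScaling()

# Keep only FP8 copies of the weights instead of higher-precision masters.
with te.fp8_model_init(enabled=True):
    layer = te.TransformerLayer(3072, 24576, 16, device="cuda")  # illustrative sizes

# Capture the forward pass into a CUDA graph; later calls replay it.
sample_input = torch.randn(16, 8, 3072, device="cuda")
graphed_layer = te.make_graphed_callables(
    layer,
    (sample_input,),
    fp8_enabled=True,
    fp8_recipe=fp8_recipe,
)
out = graphed_layer(sample_input)
```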

Attempt#1 @ #829

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)

@sudhakarsingh27 force-pushed the te_gemma_tutorial_base branch from 03729bc to 2a514cf on June 2, 2025 21:10
@sudhakarsingh27 force-pushed the te_gemma_tutorial_base branch from 2a514cf to 4757bfa on June 2, 2025 21:19
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27 force-pushed the te_gemma_tutorial_base branch 3 times, most recently from 5d7538e to 93960fd on June 16, 2025 22:09
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27 force-pushed the te_gemma_tutorial_base branch from 588fcd6 to 6cd3c1a on June 17, 2025 22:27