Issues: NVIDIA/TensorRT-LLM
- #783: [Issue Template] Short one-line summary of the issue #270. Opened Jan 1, 2024 by juney-nvidia.
- #2241 (bug): The accuracy of trt-llm-qwen-vl-chat is low. Opened Sep 19, 2024 by xiangxinhello.
- #2239 (bug): Cannot set earlyStopping 0 when using ModelRunnerCpp. Opened Sep 18, 2024 by PKaralupov.
- #2238 (question, triaged): How to set TensorRT-LLM to use Flash Attention 3. Opened Sep 18, 2024 by kanebay.
- #2237 (feature request, triaged): Working with vllm is much easier than working with tensorrt. Opened Sep 18, 2024 by Alireza3242.
- #2235 (question, triaged): Can tensorrt-llm support separating the prefill stage and decode stage onto different GPUs or different nodes via configuration, and if so, how? Opened Sep 18, 2024 by GGBond8488.
- #2233 (bug): gemma-2-27b bad outputs. Opened Sep 17, 2024 by siddhatiwari.
- #2227 (bug): whisper-medium decoder compilation blocks. Opened Sep 14, 2024 by skyCreateXian.
- #2226 (bug): "use_embedding_sharing" option not working for llama model. Opened Sep 14, 2024 by jxchenus.
- #2222 (bug): Error in tag v0.12.0 when building from source. Opened Sep 12, 2024 by zhangts20.
- #2217: [critical bug] Enabling cache reuse leads to logit violations and tokens going out of range. Opened Sep 10, 2024 by akhoroshev.
- #2214 (bug): Cannot build quantized int8 models for Phi3 128k models [TensorRT-LLM 0.12.0]. Opened Sep 10, 2024 by louis845.
- #2211: Can we directly pass the input_embeds to the generate function? Opened Sep 9, 2024 by OswaldoBornemann.
- #2209 (bug): Multi-GPU error: MPI_Unknown_error for examples/apps/chat.py. Opened Sep 9, 2024 by youxzAnt.
- #2208 (bug): Accuracy problem: Qwen speculative decoding, different output for num_draft_tokens=2 and num_draft_tokens=5. Opened Sep 9, 2024 by jasica528.
- #2207 (bug): [Nougat] Accuracy problem: both the float32 and bfloat16 trtllm engines produce output that differs from the float32 huggingface original model. Opened Sep 9, 2024 by ehuaa.
- #2206 (bug): Qwen-VL-Chat has an error. Opened Sep 9, 2024 by xiangxinhello.