
The accuracy of trt-llm-qwen-vl-chat is low. #2241

Open
2 of 4 tasks
xiangxinhello opened this issue Sep 19, 2024 · 1 comment
Assignees
Labels: not a bug (Some known limitation, but not a bug.) · triaged (Issue has been triaged by maintainers)

Comments

@xiangxinhello

System Info

trt-llm-0.12.0

Who can help?

@kaiyux

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This is the trt-llm-qwen-vl test script:

python3 run_chat.py \
    --tokenizer_dir=./Qwen-VL-Chat \
    --qwen_engine_dir=./trt_engines/Qwen-VL-7B-Chat \
    --vit_engine_dir=./plan \
    --display \
    --local_machine

Text (or 'q' to quit): 框出图中击掌的位置 (box the location of the high-five in the image)
击掌(539,512),(588,600)
But the result from huggingface-qwenvl is (536,509),(588,602).

This is the huggingface-qwenvl test script:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()

model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},  # either a local path or a URL
    {'text': '这是什么?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

# 2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)
print(response)
# Output: 击掌(536,509),(588,602)

image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
    image.save('1.jpg')
else:
    print("no box")

This is the trt-llm-qwen-vl result: [image: bounding box drawn on demo.jpeg]

This is the huggingface-qwenvl result: [image: bounding box drawn on demo.jpeg]

Expected behavior

I expect the output of trt-llm-qwen-vl to match that of huggingface-qwenvl.

Actual behavior

The bounding-box coordinates differ: TensorRT-LLM produces (539,512),(588,600), while Hugging Face produces (536,509),(588,602).

Additional notes

None.

@xiangxinhello xiangxinhello added the bug Something isn't working label Sep 19, 2024
@lfr-0531
Collaborator

From the results, the difference doesn't seem large; the output box still captures the semantically correct area.
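One way to sanity-check this claim is to compute the IoU (intersection over union) of the two reported boxes. The sketch below assumes the usual (x1, y1, x2, y2) corner convention for the coordinates printed above; note that Qwen-VL reportedly emits coordinates on a normalized grid rather than raw pixels, so an offset of a few units is a small relative error either way.

```python
# Quantify the gap between the two reported boxes with IoU.
# Boxes are (x1, y1, x2, y2), taken verbatim from the issue.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

trtllm_box = (539, 512, 588, 600)  # TensorRT-LLM output
hf_box = (536, 509, 588, 602)      # Hugging Face output
print(f"IoU = {iou(trtllm_box, hf_box):.3f}")  # IoU = 0.892
```

An IoU of roughly 0.89 is well above the 0.5 threshold commonly used to count a detection as correct, which supports reading this as a minor numerical divergence rather than a functional regression.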

@lfr-0531 lfr-0531 self-assigned this Sep 20, 2024
@lfr-0531 lfr-0531 added triaged Issue has been triaged by maintainers not a bug Some known limitation, but not a bug. and removed bug Something isn't working labels Sep 20, 2024