
The accuracy of trt-llm-qwen-vl-chat is low. #2241

Open
2 of 4 tasks
xiangxinhello opened this issue Sep 19, 2024 · 1 comment
Assignees
Labels: not a bug (Some known limitation, but not a bug.) · triaged (Issue has been triaged by maintainers)

Comments

@xiangxinhello

System Info

trt-llm-0.12.0

Who can help?

@kaiyux

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This is the trt-llm-qwen-vl test script:

python3 run_chat.py \
    --tokenizer_dir=./Qwen-VL-Chat \
    --qwen_engine_dir=./trt_engines/Qwen-VL-7B-Chat \
    --vit_engine_dir=./plan \
    --display \
    --local_machine

Text (or 'q' to quit): 框出图中击掌的位置 (box the location of the high-five in the image)
击掌(539,512),(588,600)
But the result from huggingface-qwenvl is (536,509),(588,602).

This is the huggingface-qwenvl test script:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()

model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},  # either a local path or a URL
    {'text': '这是什么?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

# 2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)
print(response)
# Output: 击掌(536,509),(588,602)

image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
    image.save('1.jpg')
else:
    print("no box")

This is the trt-llm-qwen-vl result: [image: bounding box drawn on demo.jpeg]

This is the huggingface-qwenvl result: [image: bounding box drawn on demo.jpeg]

Expected behavior

I expect the output of trt-llm-qwen-vl to match that of huggingface-qwenvl.

Actual behavior

The bounding-box coordinates differ: TensorRT-LLM produces (539,512),(588,600), while Hugging Face produces (536,509),(588,602).

Additional notes

None.

@xiangxinhello xiangxinhello added the bug Something isn't working label Sep 19, 2024
@lfr-0531
Collaborator

From the results, the difference doesn't seem large; the output box still captures the semantically correct area.
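One way to sanity-check this claim is to compute the IoU (intersection over union) of the two reported boxes. The sketch below assumes the usual (x1, y1, x2, y2) corner convention for the coordinates printed above; note that Qwen-VL reportedly emits coordinates on a normalized grid rather than raw pixels, so an offset of a few units is a small relative error either way.

```python
# Quantify the gap between the two reported boxes with IoU.
# Boxes are (x1, y1, x2, y2), taken verbatim from the issue.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

trtllm_box = (539, 512, 588, 600)  # TensorRT-LLM output
hf_box = (536, 509, 588, 602)      # Hugging Face output
print(f"IoU = {iou(trtllm_box, hf_box):.3f}")  # IoU = 0.892
```

An IoU of roughly 0.89 is well above the 0.5 threshold commonly used to count a detection as correct, which supports reading this as a minor numerical divergence rather than a functional regression.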

@lfr-0531 lfr-0531 self-assigned this Sep 20, 2024
@lfr-0531 lfr-0531 added triaged Issue has been triaged by maintainers not a bug Some known limitation, but not a bug. and removed bug Something isn't working labels Sep 20, 2024