System Info
trt-llm-0.12.0
Who can help?
@kaiyux
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
This is the trt-llm-qwenvl test script:

python3 run_chat.py \
    --tokenizer_dir=./Qwen-VL-Chat \
    --qwen_engine_dir=./trt_engines/Qwen-VL-7B-Chat \
    --vit_engine_dir=./plan \
    --display \
    --local_machine
Text (or 'q' to quit): 框出图中击掌的位置  ("Box the location of the high-five in the image")
击掌(539,512),(588,600)
but the huggingface-qwenvl result is (536,509),(588,602).
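As a quick sanity check on how far apart the two reported boxes actually are, here is a small IoU computation over the coordinates above. The `iou` helper is hypothetical and not part of either repro script; it is only a sketch for quantifying the mismatch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)  # intersection top-left
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)  # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

trt_box = ((539, 512), (588, 600))  # trt-llm-qwenvl output
hf_box = ((536, 509), (588, 602))   # huggingface-qwenvl output
print(round(iou(trt_box, hf_box), 3))  # → 0.892
```

An IoU of about 0.89 means the boxes overlap heavily but are not identical, which is consistent with a small numerical divergence between the two backends rather than a grossly wrong detection.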
this is huggingface-qwenvl test script
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
query = tokenizer.from_list_format([
{'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'}, # Either a local path or an url
{'text': '这是什么?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)
print(response)
击掌(536,509),(588,602)
image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
image.save('1.jpg')
else:
print("no box")
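To compare the two backends automatically rather than by eye, the coordinate pairs can be pulled out of the raw response text with a small regex. This `parse_boxes` helper is hypothetical (not part of either repro script) and assumes boxes are printed as `(x1,y1),(x2,y2)` pairs, as in the outputs pasted above:

```python
import re

def parse_boxes(text):
    """Extract ((x1, y1), (x2, y2)) box tuples from a model response string."""
    pts = [(int(x), int(y)) for x, y in re.findall(r'\((\d+),(\d+)\)', text)]
    # Group consecutive points into (top-left, bottom-right) pairs.
    return [(pts[i], pts[i + 1]) for i in range(0, len(pts) - 1, 2)]

print(parse_boxes('击掌(536,509),(588,602)'))  # → [((536, 509), (588, 602))]
```

With both responses parsed this way, the boxes from the TensorRT-LLM run and the Hugging Face run can be diffed or scored programmatically instead of eyeballing screenshots.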
[screenshot] trt-llm-qwenvl result
[screenshot] huggingface-qwenvl result
Expected behavior
I expect the output of trt-llm-qwenvl to match that of huggingface-qwenvl.
Actual behavior
The bounding-box coordinates differ noticeably: trt-llm-qwenvl returns (539,512),(588,600), while huggingface-qwenvl returns (536,509),(588,602).
Additional notes
None