Add MultipanelVQA and POPE vision-language scenarios #2517

Merged 3 commits on Mar 31, 2024

@ImKeTT ImKeTT commented Mar 28, 2024

Hello, this PR adds two vision-language scenarios to VHELM: MultipanelVQA from https://arxiv.org/abs/2401.15847 and the POPE benchmark from https://aclanthology.org/2023.emnlp-main.20/.

MultipanelVQA has two subjects (synthetic or real-world) and two question_type values (multiple-choice or open). I use get_short_answer_generation_adapter_spec for open-ended generation and get_multiple_choice_joint_adapter_spec for multiple-choice questions. For both scenarios, I use get_exact_match_metric_specs for evaluation.
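For readers skimming the thread, the question_type dispatch described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `pick_adapter_name` and the two lookup tables are hypothetical names, and the HELM function names are kept as plain strings so the snippet runs without HELM installed.

```python
# Hypothetical sketch of the dispatch described above; not the PR's actual code.
# The adapter function names come from the PR description and are used only as
# strings here, so nothing is imported from HELM.

SUBJECTS = ("synthetic", "real-world")

ADAPTER_BY_QUESTION_TYPE = {
    "open": "get_short_answer_generation_adapter_spec",
    "multiple-choice": "get_multiple_choice_joint_adapter_spec",
}

# Both scenarios are scored with exact match, per the PR description.
METRIC_SPECS_NAME = "get_exact_match_metric_specs"


def pick_adapter_name(subject: str, question_type: str) -> str:
    """Return the name of the adapter-spec function for a MultipanelVQA run."""
    if subject not in SUBJECTS:
        raise ValueError(f"Unknown subject: {subject!r}")
    if question_type not in ADAPTER_BY_QUESTION_TYPE:
        raise ValueError(f"Unknown question_type: {question_type!r}")
    return ADAPTER_BY_QUESTION_TYPE[question_type]
```

The subject only selects which split of the data is loaded, so in this sketch it is validated but does not affect the adapter choice.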

Here's a screenshot after running ./pre-commit.sh

pre-commit-mpvqa-pope

Here are several screenshots and the scenario_state.json files from toy runs of the two scenarios (Qwen-VL-Chat on 25 instances):
POPE
pope_cl
pope_scenario_state.json

MultipanelVQA-real-world
mpvqa_cl
mpvqa-real-open-scenario_state.json
mpvqa-real-mc-scenario_state.json

MultipanelVQA-synthetic
mpvqa_cl
mpvqa-syn-open-scenario_state.json
mpvqa-syn-mc-scenario_state.json

Please let me know how I can improve it.
Thanks!

@teetone teetone self-requested a review March 29, 2024 07:36

@teetone teetone left a comment

@ImKeTT Thanks for adding these! I had a few minor comments. Could you also add the conf file you used for these runs to the PR description?

src/helm/benchmark/run_specs/vlm_run_specs.py (outdated, resolved)
src/helm/benchmark/static/schema_vlm.yaml (outdated, resolved)

ImKeTT commented Mar 29, 2024

Thanks for reviewing @teetone! I've reframed POPE as a multiple-choice QA (MCQA) task and added more detailed descriptions for these two scenarios.

Here are the configuration files I used for this PR.
For MultipanelVQA

entries: [
    {description: "multipanelvqa:subject=synthetic,question_type=multiple-choice,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=synthetic,question_type=open,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=real-world,question_type=multiple-choice,model=qwen/qwen-vl-chat", priority: 1}
    {description: "multipanelvqa:subject=real-world,question_type=open,model=qwen/qwen-vl-chat", priority: 1}
    ]

For POPE

entries: [
    {description: "pope:model=qwen/qwen-vl-chat", priority: 1}
    ]
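The four MultipanelVQA entries are the cross product of subject and question_type, so a conf file like this can also be generated instead of written by hand. Below is a minimal sketch under that assumption; `multipanelvqa_entries` is a hypothetical helper and not part of HELM.

```python
from itertools import product

SUBJECTS = ["synthetic", "real-world"]
QUESTION_TYPES = ["multiple-choice", "open"]


def multipanelvqa_entries(model: str = "qwen/qwen-vl-chat", priority: int = 1) -> str:
    """Build the text of a run-entries conf covering every subject/question_type pair.

    Hypothetical helper for illustration; it only formats strings.
    """
    lines = []
    for subject, question_type in product(SUBJECTS, QUESTION_TYPES):
        description = (
            f"multipanelvqa:subject={subject},"
            f"question_type={question_type},model={model}"
        )
        lines.append(f'    {{description: "{description}", priority: {priority}}}')
    return "entries: [\n" + "\n".join(lines) + "\n]"


if __name__ == "__main__":
    print(multipanelvqa_entries())
```

Printing the result yields the four MultipanelVQA entries in the same order as listed in this comment; passing a different `model` string swaps the model for all of them at once.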

teetone commented Mar 30, 2024

Thanks @ImKeTT! Could you address one last comment in schema_vlm.yaml?

ImKeTT commented Mar 30, 2024

Sure, I think it's ready to go now, thanks @teetone !

@teetone teetone merged commit b29fb5e into stanford-crfm:main Mar 31, 2024
6 checks passed
@ImKeTT ImKeTT deleted the multipanel-and-pope branch April 9, 2024 15:21