Given a video with associated dialogue as premise, and two possible future events, the VLEP task requires systems to predict which one is more likely to happen. The task performance is evaluated using accuracy.
A prediction file is a .jsonl file. Each line in this file contains a single JSON string that can be loaded as a dict with two entries:
{"example_id": int, "pred_ans": int}
example_id is the id of the example, and pred_ans is the index of the predicted answer, in {0, 1}.
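As an illustration of the format described above, here is a minimal Python sketch for writing and reading such a file. The helper names write_predictions and load_predictions are hypothetical and are not part of this repository; only the two keys, example_id and pred_ans, come from the format itself.

```python
import json


def write_predictions(predictions, path):
    """Write a {example_id: pred_ans} mapping as one JSON object per line.

    Hypothetical helper, not part of the repository.
    """
    with open(path, "w") as f:
        for example_id, pred_ans in predictions.items():
            # pred_ans must be the index of one of the two candidate events.
            if pred_ans not in (0, 1):
                raise ValueError(
                    f"pred_ans for example {example_id} must be 0 or 1, got {pred_ans!r}"
                )
            entry = {"example_id": int(example_id), "pred_ans": int(pred_ans)}
            f.write(json.dumps(entry) + "\n")


def load_predictions(path):
    """Read a prediction file back into a {example_id: pred_ans} dict."""
    predictions = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            predictions[entry["example_id"]] = entry["pred_ans"]
    return predictions
```

Reading the file back with load_predictions before submitting is a cheap way to confirm that every line parses and carries both keys.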
At project root, run
bash standalone_eval/eval_sample.sh
This command uses eval.py to evaluate the provided sample_dev_submission.jsonl file and writes the output to sample_dev_submission_metrics_new.json. Its content should be similar, if not identical, to the sample_dev_submission_metrics.jsonl file.
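For reference, accuracy here is the fraction of examples whose predicted index matches the ground-truth index. The sketch below is not the code in eval.py; it assumes the predictions and the ground-truth answers are both available as dicts keyed by example id, and it counts a missing prediction as wrong, which is an assumption rather than documented behavior.

```python
def accuracy(predictions, gold):
    """Fraction of gold examples whose predicted answer index is correct.

    predictions, gold: dicts mapping example_id -> answer index in {0, 1}.
    Assumption: an example with no prediction is counted as incorrect.
    """
    if not gold:
        raise ValueError("gold must contain at least one example")
    correct = sum(
        1 for example_id, answer in gold.items()
        if predictions.get(example_id) == answer
    )
    return correct / len(gold)


# Two of the three predictions match the gold answers.
score = accuracy({1: 0, 2: 1, 3: 1}, {1: 0, 2: 0, 3: 1})
```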
To get your model's performance on the test split, please submit both dev and test predictions to our CodaLab evaluation server. The submission file should be a single .zip file (no enclosing folder) that contains the two prediction files, vlep_test_submission.jsonl and vlep_dev_submission.jsonl. Each of the *submission.jsonl files should be formatted as instructed above.
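One way to build such an archive is with Python's standard zipfile module, sketched below. The function name make_submission_zip and its arguments are hypothetical. Passing arcname places each file at the top level of the archive, so there is no enclosing folder regardless of where the source files live.

```python
import zipfile


def make_submission_zip(dev_path, test_path, zip_path):
    """Pack the two prediction files at the top level of a .zip archive.

    Hypothetical helper, not part of the repository.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any directory component of the source path.
        zf.write(dev_path, arcname="vlep_dev_submission.jsonl")
        zf.write(test_path, arcname="vlep_test_submission.jsonl")
```

Listing the archive contents afterwards (for example with zipfile.ZipFile(zip_path).namelist()) should show exactly the two file names, with no directory prefix.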