hltcoe/rag-run-validator

RAG Run Validator

A simple Python package for validating a RAG run for TREC RAG-related tracks, including RAGTIME, RAG, DRAGUN, BioGen, and iKAT.

The default schema can be found in ./rag_run_validator/defaut.json. It allows the citations for each response sentence to be either a list of document IDs or a dictionary mapping document IDs to scores.

Get Started

pip install git+https://github.com/hltcoe/rag-run-validator.git

Usage

You can validate a run from the command line through the CLI:

rag_run_validator your_run.jsonl

Please refer to rag_run_validator --help for more details.
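The command above takes a .jsonl file. As a minimal sketch of producing one, the snippet below writes one topic object per line, which is the usual JSON Lines convention and is assumed here rather than confirmed by this README. The helper name write_run_jsonl and the temporary output location are hypothetical; the field values are copied from the example run in this README.

```python
import json
import tempfile
from pathlib import Path


def write_run_jsonl(topics, path):
    """Write each topic dict as one JSON object per line.

    Hypothetical helper for illustration; it is not part of rag_run_validator.
    """
    with Path(path).open("w", encoding="utf-8") as f:
        for topic in topics:
            f.write(json.dumps(topic, ensure_ascii=False) + "\n")


# Illustrative run with a single topic, using the list form of citations.
run = [
    {
        "metadata": {
            "team_id": "my_fantastic_team",
            "run_id": "my_best_run_02",
            "topic_id": "101",
        },
        "responses": [
            {"text": "Sky is blue.", "citations": ["docid001", "docid003"]},
            {"text": "This is all.", "citations": []},
        ],
        "references": ["docid0001", "docid0002", "docid0003"],
    }
]

# Written to the system temp directory so the sketch has no side effects
# in the working directory; any writable path would do.
out_path = Path(tempfile.gettempdir()) / "your_run.jsonl"
write_run_jsonl(run, out_path)
```

The resulting file can then be passed to the rag_run_validator command shown above.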

A run can also be validated through the Python interface:

from rag_run_validator import validate

your_run = [
    {"metadata": {...}, "responses": [{"text": ...}]}
]

# a full run
validate(your_run)
# a single topic
validate(your_run[0])
# or a path to the file
validate("path_to_your_run.jsonl")
# a custom schema
validate("path_to_your_run.jsonl", {"$schema": ...})

The validate function raises a ValidationError exception if the input run does not comply with the schema.
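When a run covers many topics, it can help to see every failing topic at once rather than stopping at the first error. The sketch below validates topics one at a time and collects the error messages. It assumes only what is stated above: validate accepts a single topic and raises on failure. Since this README does not say where ValidationError is imported from, the sketch catches Exception broadly. The helper name collect_failures is hypothetical, and the validator is passed in as an argument so the snippet does not depend on the package being installed.

```python
def collect_failures(run, validate_fn):
    """Validate each topic separately and return {topic label: error message}.

    Hypothetical helper for illustration. `validate_fn` is expected to behave
    like rag_run_validator.validate when given a single topic.
    """
    failures = {}
    for index, topic in enumerate(run):
        # Fall back to the position in the run when no topic_id is available.
        meta = topic.get("metadata") if isinstance(topic, dict) else None
        label = meta.get("topic_id", f"#{index}") if isinstance(meta, dict) else f"#{index}"
        try:
            validate_fn(topic)
        except Exception as err:
            # Broad on purpose: narrow this to ValidationError once its
            # import path is confirmed.
            failures[label] = str(err)
    return failures
```

With the package installed, this would be called as collect_failures(your_run, validate), where validate is imported from rag_run_validator as shown earlier.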

Example Run

The default schema allows a run of the following format.

{
    "metadata": {
        "team_id": "my_fantastic_team",
        "run_id": "my_best_run_02",
        "topic_id": "101"
    },
    "responses": [
        {
            "text": "Sky is blue.",
            "citations": {
                "docid001": 0.3,
                "docid003": 0.1
            }
        },
        {
            "text": "The moon is made out of blue cheese.",
            "citations": {
                "docid002": 0.7
            }
        },
        {
            "text": "This is all.",
            "citations": {}
        }
    ],
    "references": [
        "docid0001",
        "docid0002",
        "docid0003"
    ]
}

Alternatively, citations can be given simply as a list of document IDs.

{
    "metadata": {
        "team_id": "my_fantastic_team",
        "run_id": "my_best_run_02",
        "topic_id": "101"
    },
    "responses": [
        {
            "text": "Sky is blue.",
            "citations": [
                "docid001",
                "docid003"
            ]
        },
        {
            "text": "The moon is made out of blue cheese.",
            "citations": [
                "docid002"
            ]
        },
        {
            "text": "This is all.",
            "citations": []
        }
    ],
    "references": [
        "docid0001",
        "docid0002",
        "docid0003"
    ]
}
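Because both citation forms are allowed, code that consumes a run has to handle either one. The sketch below normalizes a response's citations into a single dictionary shape, using None as the score when only a list of document IDs was given. The helper name citation_scores is hypothetical and not part of the package, and treating a missing citations field as empty is an assumption of this sketch, not something the schema is known to permit.

```python
def citation_scores(response):
    """Return the citations of one response as {doc_id: score or None}.

    Hypothetical helper for illustration; handles both the dictionary form
    (doc ID to score) and the list form (doc IDs only).
    """
    # Assumption: a missing "citations" key is treated like an empty list.
    citations = response.get("citations", [])
    if isinstance(citations, dict):
        # Copy so that callers cannot mutate the original run.
        return dict(citations)
    # List form carries no scores, so mark them as unknown.
    return {doc_id: None for doc_id in citations}


scored = citation_scores(
    {"text": "Sky is blue.", "citations": {"docid001": 0.3, "docid003": 0.1}}
)
unscored = citation_scores(
    {"text": "Sky is blue.", "citations": ["docid001", "docid003"]}
)
```

Both calls return a dictionary with the same keys, so downstream code can iterate over document IDs without checking which form the run used.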

Credit

Thanks to all RAG-related coordinators at TREC and others for the discussion. Please consider participating in the tracks mentioned at the top of this README. All coordinators would appreciate your participation and effort in improving RAG systems and evaluation.

Contact

If you have any questions, feel free to email Eugene Yang or raise an issue in this repository.
