From 346928d42d3f4ec530cb02f3b99fa1861725d2d8 Mon Sep 17 00:00:00 2001
From: Aakash Thatte
Date: Wed, 28 Feb 2024 15:54:37 +0530
Subject: [PATCH 1/2] Add docs for new entity-based recall metric

---
 .../metrics/context_entities_recall.md        | 35 +++++++++++++++++++
 docs/concepts/metrics/index.md                |  2 ++
 docs/references/metrics.rst                   |  1 +
 3 files changed, 38 insertions(+)
 create mode 100644 docs/concepts/metrics/context_entities_recall.md

diff --git a/docs/concepts/metrics/context_entities_recall.md b/docs/concepts/metrics/context_entities_recall.md
new file mode 100644
index 000000000..0e3548a23
--- /dev/null
+++ b/docs/concepts/metrics/context_entities_recall.md
@@ -0,0 +1,35 @@
+# Context entities recall
+
+This metric measures the recall of the retrieved context: the number of entities present in both `ground_truths` and `contexts`, relative to the number of entities present in `ground_truths` alone. Simply put, it is the fraction of entities in `ground_truths` that are recalled. This metric is useful in fact-based use cases such as a tourism help desk or historical QA. It helps evaluate the retrieval mechanism by comparing retrieved entities with those present in `ground_truths`, because where entities matter, we need `contexts` that cover them.
+
+To compute this metric, we use two sets, $GE$ and $CE$, the sets of entities present in `ground_truths` and in `contexts` respectively. We then take the number of elements in the intersection of these sets and divide it by the number of elements in $GE$, as given by the formula:
+
+```{math}
+:label: context_entity_recall
+\text{context entity recall} = \frac{| CE \cap GE |}{| GE |}
+````
+
+```{hint}
+**Ground truth**: The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in the Indian city of Agra. It was commissioned in 1631 by the Mughal emperor Shah Jahan to house the tomb of his favorite wife, Mumtaz Mahal.
+
+**High entity recall context**: The Taj Mahal is a symbol of love and architectural marvel located in Agra, India. It was built by the Mughal emperor Shah Jahan in memory of his beloved wife, Mumtaz Mahal. The structure is renowned for its intricate marble work and beautiful gardens surrounding it.
+
+**Low entity recall context**: The Taj Mahal is an iconic monument in India. It is a UNESCO World Heritage Site and attracts millions of visitors annually. The intricate carvings and stunning architecture make it a must-visit destination.
+
+````
+
+
+## Example
+
+```{code-block} python
+from ragas.metrics import ContextEntityRecall
+context_entity_recall = ContextEntityRecall()
+
+# Dataset({
+#     features: ['ground_truths','contexts'],
+#     num_rows: 25
+# })
+dataset: Dataset
+
+results = context_entity_recall.score(dataset)
+```
\ No newline at end of file
diff --git a/docs/concepts/metrics/index.md b/docs/concepts/metrics/index.md
index 069b8ff7f..e2f6abc65 100644
--- a/docs/concepts/metrics/index.md
+++ b/docs/concepts/metrics/index.md
@@ -14,6 +14,7 @@ Just like in any machine learning system, the performance of individual componen
 - [Context recall](context_recall.md)
 - [Context precision](context_precision.md)
 - [Context relevancy](context_relevancy.md)
+- [Context entity recall](context_entities_recall.md)
 
 ## End-to-End Evaluation
 
@@ -31,6 +32,7 @@ answer_relevance
 context_precision
 context_relevancy
 context_recall
+context_entities_recall
 semantic_similarity
 answer_correctness
 critique
diff --git a/docs/references/metrics.rst b/docs/references/metrics.rst
index c6e5fba8d..dbf2a89f2 100644
--- a/docs/references/metrics.rst
+++ b/docs/references/metrics.rst
@@ -8,6 +8,7 @@ Metrics
    ragas.metrics.answer_correctness
    ragas.metrics.context_precision
    ragas.metrics.context_recall
+   ragas.metrics.context_entity_recall
 
 .. automodule:: ragas.metrics
    :members:

From 09407c8d4ea489ec5921f93bb3614cd7e1d8c491 Mon Sep 17 00:00:00 2001
From: Aakash Thatte
Date: Thu, 29 Feb 2024 10:03:19 +0530
Subject: [PATCH 2/2] Add 'How this was calculated?'

---
 .../metrics/context_entities_recall.md        | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/concepts/metrics/context_entities_recall.md b/docs/concepts/metrics/context_entities_recall.md
index 0e3548a23..066f31018 100644
--- a/docs/concepts/metrics/context_entities_recall.md
+++ b/docs/concepts/metrics/context_entities_recall.md
@@ -18,6 +18,31 @@ To compute this metric, we use two sets, $GE$ and $CE$, the sets of entities pre
 
 ````
 
+:::{dropdown} How was this calculated?
+Let us consider the ground truth and the contexts given above.
+
+- **Step-1**: Find the entities present in the ground truth.
+    - Entities in ground truth (GE) - ['Taj Mahal', 'Yamuna', 'Agra', '1631', 'Shah Jahan', 'Mumtaz Mahal']
+- **Step-2**: Find the entities present in each context.
+    - Entities in context (CE1) - ['Taj Mahal', 'Agra', 'Shah Jahan', 'Mumtaz Mahal', 'India']
+    - Entities in context (CE2) - ['Taj Mahal', 'UNESCO', 'India']
+- **Step-3**: Use the formula given above to calculate the entity recall.
+    ```{math}
+    :label: context_entity_recall_1
+    \text{context entity recall - 1} = \frac{| CE1 \cap GE |}{| GE |}
+                     = 4/6
+                     \approx 0.667
+    ```
+
+    ```{math}
+    :label: context_entity_recall_2
+    \text{context entity recall - 2} = \frac{| CE2 \cap GE |}{| GE |}
+                     = 1/6
+                     \approx 0.167
+    ```
+
+    We can see that the first context has a higher entity recall because it covers more of the entities in the ground truth. If these two contexts were fetched by two retrieval mechanisms on the same set of documents, we could say that the first mechanism is better than the second in use cases where entities are important.
+:::
 
 ## Example
 
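
The arithmetic in the "How was this calculated?" dropdown can be checked with a minimal sketch using plain Python sets. This is not the `ragas` implementation: the helper name `entity_recall` is hypothetical, the entity lists are copied by hand from the worked example rather than extracted from text, and returning 0.0 for an empty ground-truth set is an assumption of this sketch.

```python
def entity_recall(ground_truth_entities, context_entities):
    """Return |CE ∩ GE| / |GE| for two collections of entity strings.

    Hypothetical helper for illustration only; it assumes the entities
    have already been extracted and does exact string matching.
    """
    ge = set(ground_truth_entities)
    ce = set(context_entities)
    if not ge:
        # Assumption: treat recall as 0.0 when there are no ground-truth entities.
        return 0.0
    return len(ce & ge) / len(ge)


# Entity lists taken from the worked example in the patch.
GE = ["Taj Mahal", "Yamuna", "Agra", "1631", "Shah Jahan", "Mumtaz Mahal"]
CE1 = ["Taj Mahal", "Agra", "Shah Jahan", "Mumtaz Mahal", "India"]
CE2 = ["Taj Mahal", "UNESCO", "India"]

recall_1 = entity_recall(GE, CE1)  # 4 shared entities out of 6
recall_2 = entity_recall(GE, CE2)  # 1 shared entity out of 6

print(f"{recall_1:.3f} {recall_2:.3f}")
```

Because 'India' appears in both contexts but not in GE, it does not affect either score; only entities from the ground truth count toward the numerator.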