docs: getting started #589

Merged · 13 commits · Feb 17, 2024
2 changes: 1 addition & 1 deletion docs/community/index.md
@@ -1,5 +1,5 @@
(community)=
# Community ❤️
# ❤️ Community

**"Alone we can do so little; together we can do so much." - Helen Keller**

2 changes: 1 addition & 1 deletion docs/concepts/index.md
@@ -1,5 +1,5 @@
(core-concepts)=
# Core Concepts
# 📚 Core Concepts
:::{toctree}
:caption: Concepts
:hidden:
69 changes: 30 additions & 39 deletions docs/getstarted/evaluation.md
@@ -1,36 +1,30 @@
(get-started-evaluation)=
# Evaluation
# Evaluating Your Test Set
jjmachan marked this conversation as resolved.

Welcome to the ragas quickstart. We're going to get you up and running with ragas as quickly as you can so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline.
Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. Our aim is to help you get set up with Ragas as quickly as possible so that you can focus on enhancing your Retrieval Augmented Generation pipelines while this library ensures your changes are improving the entire pipeline.
Member:

Is this a mistake?

> while this library ensures your changes are improving the entire pipeline.

Member Author:

Nope, it's correct, I think.

to kick things of lets start with the data

:::{note}
Are you using Azure OpenAI endpoints? Then checkout [this quickstart
guide](../howtos/customisations/azure-openai.ipynb)
:::

```bash
pip install ragas
```

Ragas also uses OpenAI for running some metrics so make sure you have your openai key ready and available in your environment
This guide uses OpenAI for running some metrics, so make sure you have your OpenAI key ready and available in your environment.

```python
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
```
:::{note}
By default, these metrics use OpenAI's API to compute the score. If you're using this metric, ensure that you've set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [LLM guide](../howtos/customisations/llms.ipynb) to learn more.
:::

Let's start with the data.

## The Data

For this tutorial we are going to use an example dataset from one of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/). The dataset has the following columns.
For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns:

- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
- answer: `list[str]` - The answer generated from the RAG pipeline and given to the user.
- answer: `list[str]` - The answer generated from the RAG pipeline and provided to the user.
- contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
- ground_truths: `list[list[str]]` - The ground truth answer to the questions. (only required if you are using context_recall)

Ideally your list of questions should reflect the questions your users give, including those that you have been problematic in the past.
- ground_truth: `list[str]` - The ground truth answer to the questions.

Ideally, your list of questions should reflect the questions your users ask, including those that have been problematic in the past.
jjmachan marked this conversation as resolved.
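
To make the expected shape concrete, here is a minimal sketch of a single row in that schema (the values are purely illustrative, not taken from the dataset):

```{code-block} python
:caption: sketch of one row in the expected schema (values illustrative)
row = {
    "question": "Which companies are the main contributors to GHG emissions?",
    "answer": "According to the retrieved reports, the main contributors are ...",
    # contexts is a list of strings: every passage passed to the LLM for this question
    "contexts": [
        "The report lists the top emitters as ...",
        "Fossil fuel companies account for ...",
    ],
    "ground_truth": "The main contributors are ...",
}
```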

```{code-block} python
:caption: import sample dataset
# (loading code collapsed in the diff view; see the sketch below)
amnesty_qa
```
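
Since the loading code is collapsed above, here is a minimal sketch of what it might look like; the Hugging Face `datasets` loader and the `english_v2` config name are assumptions, so check the dataset card before running it.

```{code-block} python
:caption: sketch of loading the sample dataset (config name assumed)
from datasets import load_dataset

# load the example evaluation dataset from the Hugging Face Hub
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa
```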

:::{seealso}
See [testset generation](./testset_generation.md) to learn how to generate your own synthetic data for evaluation.
See [test set generation](./testset_generation.md) to learn how to generate your own synthetic data for evaluation.
:::

## Metrics

Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems namely
Ragas provides several metrics to evaluate various aspects of your RAG systems:

1. Retriever: offers `context_precision` and `context_recall` which give you the measure of the performance of your retrieval system.
2. Generator (LLM): offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to the point the answers are to the question.
1. Retriever: Offers `context_precision` and `context_recall` that measure the performance of your retrieval system.
2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how on point the answers are to the question.

The harmonic mean of these 4 aspects gives you the **ragas score** which is a single measure of the performance of your QA system across all the important aspects.
There are numerous other metrics available in Ragas, check the [metrics guide](ragas-metrics) to learn more.

now lets import these metrics and understand more about what they denote
Now, let's import these metrics and understand more about what they denote.

```{code-block} python
:caption: import metrics
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
```
here you can see that we are using 4 metrics, but what do they represent?
Here we're using four metrics, but what do they represent?

1. faithfulness - the factual consistency of the answer to the context base on the question.
2. context_precision - a measure of how relevant the retrieved context is to the question. Conveys quality of the retrieval pipeline.
3. answer_relevancy - a measure of how relevant the answer is to the question
4. context_recall: measures the ability of the retriever to retrieve all the necessary information needed to answer the question.
1. Faithfulness - Measures the factual consistency of the answer to the context based on the question.
2. Context_precision - Measures how relevant the retrieved context is to the question, conveying the quality of the retrieval pipeline.
3. Answer_relevancy - Measures how relevant the answer is to the question.
4. Context_recall - Measures the retriever's ability to retrieve all necessary information required to answer the question.


:::{note}
by default these metrics are using OpenAI's API to compute the score. If you using this metric make sure you set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [llm guide](../howtos/customisations/llms.ipynb) to learn more
:::
To explore other metrics, check the [metrics guide](ragas-metrics).

## Evaluation

Running the evaluation is as simple as calling evaluate on the `Dataset` with the metrics of your choice.
Running the evaluation is as simple as calling `evaluate` on the `Dataset` with your chosen metrics.

```{code-block} python
:caption: evaluate using sample dataset
# (the evaluate(...) call is collapsed in the diff view; a sketch follows below)

result
```
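
Here is a minimal sketch of the full call, assuming the dataset loaded earlier exposes an `eval` split and that the four metrics were imported in the previous block:

```{code-block} python
:caption: sketch of the evaluate call (split name assumed)
from ragas import evaluate

result = evaluate(
    amnesty_qa["eval"],  # "eval" split name is an assumption
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)

result
```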
and there you have it, all the scores you need.
There you have it, all the scores you need.

Now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too!
If you want to delve deeper into the results and identify examples where your pipeline performed poorly or exceptionally well, you can convert it into a pandas DataFrame and use your standard analytics tools!

```{code-block} python
:caption: export results
df = result.to_pandas()
df.head()
```
<img src="../_static/imgs/quickstart-output.png" alt="quickstart-outputs" width="800" height="600" />
</p>
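
To dig one level deeper, here is a small follow-up sketch (continuing from the DataFrame above; the column name is assumed to match the `faithfulness` metric) that surfaces the weakest generations first:

```{code-block} python
:caption: sketch of surfacing the weakest rows first (column name assumed)
# sort ascending so low-faithfulness rows come first
df.sort_values("faithfulness").head()
```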

And thats it!
That's all!

If you have any suggestion/feedbacks/things your not happy about, please do share it in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁
If you have any suggestions, feedback or issues, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We value your input.
37 changes: 16 additions & 21 deletions docs/getstarted/index.md
@@ -1,49 +1,44 @@
(get-started)=
# Get Started
# 🚀 Get Started

:::{toctree}
:maxdepth: 1
:hidden:
install.md
evaluation.md
testset_generation.md
evaluation.md
monitoring.md
:::

Welcome to the Ragas tutorials! These beginner-friendly tutorials will guide you
through the fundamentals of working with Ragas. These tutorials do assume basic
knowledge of Python and Retrieval Augmented Generation (RAG) pipelines.
Welcome to the Ragas tutorials! If you're new to Ragas, the Get Started guides will walk you through the fundamentals of working with Ragas. These tutorials assume basic knowledge of Python and Retrieval Augmented Generation (RAG) pipelines.

Before you go further make sure you have [Ragas installed](./install.md)!
Before you proceed further, make sure you have [Ragas installed](./install.md)!

:::{note}
The tutorials only give you on overview of what you can do with ragas and the
basic skill you need to use it. If you want an in-depth explanation of the
core-concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also checkout the [How-to Guides](../howtos/index.md) if you want to specific applications of Ragas.
The tutorials only give you an overview of what you can do with Ragas and the basic skills needed to use it. If you want an in-depth explanation of the core concepts behind Ragas, check out the [Core Concepts](../concepts/index.md) page. You can also check out the [How-to Guides](../howtos/index.md) if you want specific applications of Ragas.
:::

If you have any questions about Ragas, feel free to join and ask in the `#questions` channel in our discord community.

If you have any questions about Ragas, feel free to join and ask in the
`#questions` channel in our discord community ❤ .
Let’s get started!

Let’s get started! 🏁

:::{card} Ragas Metrics and Evaluation
:link: get-started-evaluation
:::{card} Generate a Synthetic Testset
:link: get-started-testset-generation
:link-type: ref

How to use the Ragas Metrics to evaluate your RAG pipelines.
If you want to learn how to generate a synthetic testset to get started.
:::

:::{card} Synthetic Test data Generation
:link: get-started-testset-generation
:::{card} Evaluate your Testset
:link: get-started-evaluation
:link-type: ref

How to generate test set to assess your RAG pipelines
If you are looking to evaluate your RAG pipeline against your testset (your own dataset or synthetic).
:::
:::{card} Monitoring

:::{card} Monitor your RAG in Production
:link: get-started-monitoring
:link-type: ref

How to monitor your RAG systems in production.
If you're curious about monitoring the performance and quality of your RAG application in production.
:::
6 changes: 4 additions & 2 deletions docs/getstarted/install.md
@@ -1,11 +1,11 @@
# Install

You can install ragas with
To get started, install ragas with `pip`:
```bash
pip install ragas
```

If you want to install the latest version (from the main branch)
If you want to play around with the latest and greatest, install the latest version (from the main branch)
```bash
pip install git+https://github.com/explodinggradients/ragas.git
```
```bash
git clone https://github.com/explodinggradients/ragas.git
cd ragas
pip install -e .
```

Next, let's build a [synthetic testset](get-started-testset-generation) with your own data, or if you brought your own testset, let's learn how you can [evaluate it](get-started-evaluation) with Ragas.
32 changes: 17 additions & 15 deletions docs/getstarted/monitoring.md
@@ -1,27 +1,29 @@
(get-started-monitoring)=
# Monitoring
# Monitor your RAG in Production

Maintaining the quality and performance of an LLM application in a production environment can be challenging. Ragas provides with basic building blocks that you can use for production quality monitoring, offering valuable insights into your application's performance. This is achieved by constructing custom, smaller, more cost-effective, and faster models.
Maintaining the quality and performance of a RAG application in a production environment can be challenging. Ragas currently provides the essential building blocks that you can use for production-quality monitoring, offering valuable insights into your application's performance. However, we are also working towards building a more advanced production monitoring solution by addressing three questions:

1. How can we ensure the distribution of your production dataset remains consistent with your test set?
jjmachan marked this conversation as resolved.
2. How can we effectively extract insights from explicit and implicit signals your users provide to infer the quality of your RAG application and identify areas that require attention?
3. How can we construct custom, smaller, more cost-effective and faster models for evaluation and more advanced test set generation?

:::{note}
This is feature is still in beta access. You can requests for
[**early access**](https://calendly.com/shahules/30min) to try it out.
We are still developing and gathering feedback for upcoming releases. You can request
[**early access**](https://calendly.com/shahules/30min) to try it out or share the challenges you face in this area; we would love to hear your thoughts.
:::

The Ragas metrics can also be used with other LLM observability tools like
[Langsmith](https://www.langchain.com/langsmith) and
[Langfuse](https://langfuse.com/) to get model-based feedback about various
aspects of you application like those mentioned below
Additionally, you can use the Ragas metrics with other LLM observability tools like
- [Langsmith](../howtos/integrations/langsmith.ipynb)
- [Phoenix (Arize)](../howtos/integrations/ragas-arize.ipynb)
- [Langfuse](../howtos/integrations/langfuse.ipynb)
- [OpenLayer](https://openlayer.com/)

:::{seealso}
[Langfuse Integration](../howtos/integrations/langfuse.ipynb) to see Ragas
monitoring in action within the Langfuse dashboard and how to set it up
:::
to get model-based feedback about various aspects of your application, such as those mentioned below:

## Aspects to Monitor

1. Faithfulness: This feature assists in identifying and quantifying instances of hallucinations.
2. Bad retrieval: This feature helps identify and quantify poor context retrievals.
3. Bad response: This feature helps in recognizing and quantifying evasive, harmful, or toxic responses.
4. Bad format: This feature helps in detecting and quantifying responses with incorrect formatting.
5. Custom use-case: For monitoring other critical aspects that are specific to your use case. [Talk to founders](https://calendly.com/shahules/30min)
3. Bad response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses.
4. Bad format: This feature enables the detection and quantification of responses with incorrect formatting.
5. Custom use-case: For monitoring other critical aspects that are specific to your use case, [Talk to founders](https://calendly.com/shahules/30min).
14 changes: 7 additions & 7 deletions docs/getstarted/testset_generation.md
@@ -1,7 +1,7 @@
(get-started-testset-generation)=
# Synthetic test data generation
# Generate a Synthetic Test Set

This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To achieve this, we will utilize open-ai models, so please ensure you have your OpenAI API key ready and accessible within your environment.
This tutorial is designed to help you create a synthetic evaluation dataset for assessing your RAG pipeline. To accomplish this, we will utilize OpenAI models. Please ensure you have your OpenAI API key ready and accessible within your environment.

```{code-block} python
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
```

## Documents

To begin, we require a collection of documents to generate synthetic Question/Context/Answer samples. Here, we will employ the langchain document loader to load documents.
We first need a collection of documents to generate synthetic `Question/Context/Answer/Ground_Truth` samples. For this, we'll use the LangChain document loader to load documents.

```{code-block} python
:caption: Load documents from directory
# (loader setup collapsed in the diff view; a sketch follows below)
documents = loader.load()
```
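
Since the loader setup is collapsed above, here is a minimal sketch of one way to do it, assuming a LangChain `DirectoryLoader` and a local `your-data/` folder; swap in whichever loader and path fit your documents.

```{code-block} python
:caption: sketch of loading documents with a LangChain loader (loader and path assumed)
from langchain_community.document_loaders import DirectoryLoader

# point this at the folder that holds your source documents
# (DirectoryLoader needs the `unstructured` package for most file types)
loader = DirectoryLoader("your-data/")
documents = loader.load()
```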

:::{note}
Each Document object contains a metadata dictionary, which can be used to store additional information about the document which can be accessed with `Document.metadata`. Please ensure that the metadata dictionary contains a key called `file_name` as this will be used in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For example, pages belonging to the same research publication can be identifies using filename.
Each Document object contains a metadata dictionary, which can be used to store additional information about the document accessible via `Document.metadata`. Please ensure that the metadata dictionary contains a key called `file_name`, as this will be used in the generation process. The `file_name` attribute in metadata is used to identify chunks belonging to the same document. For instance, pages belonging to the same research publication can be identified using the filename.

An example of how to do this is shown below.
Here's an example of how to do this:

```{code-block} python
for document in documents:
document.metadata['file_name'] = document.metadata['source']
```
:::

At this point, we have a set of documents at our disposal, which will serve as the basis for creating synthetic Question/Context/Answer triplets.
At this stage, we have a set of documents ready, which will be used as the foundation for creating synthetic `Question/Context/Answer/Ground_Truth` samples.
jjmachan marked this conversation as resolved.

## Data Generation

We will now import and use Ragas' `Testsetgenerator` to promptly generate a synthetic test set from the loaded documents.
We will now import and use Ragas' `TestsetGenerator` to swiftly generate a synthetic test set from the loaded documents.

```{code-block} python
:caption: Create 10 samples using default configuration
# (generation code collapsed in the diff view; a sketch follows below)
```
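
Since the generation code is collapsed above, here is a minimal sketch of what that step could look like; the class paths, the `with_openai` constructor, and the distribution values are assumptions based on the Ragas test set generation API, so verify them against the current docs.

```{code-block} python
:caption: sketch of generating 10 synthetic samples (API details assumed)
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# generator backed by OpenAI models (constructor name is an assumption)
generator = TestsetGenerator.with_openai()

# generate 10 samples with an assumed mix of question evolutions
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_pandas()
```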
2 changes: 1 addition & 1 deletion docs/howtos/index.md
@@ -1,5 +1,5 @@
(how-to-guides)=
# How-to Guides
# 🛠️ How-to Guides


The how-to guides offer a more comprehensive overview of all the tools Ragas
2 changes: 1 addition & 1 deletion docs/references/index.rst
@@ -1,5 +1,5 @@
.. _references:
References
📖 References
==========

Reference documents for the ``ragas`` package.
Expand Down