The quanteda.llm package makes it easy to use LLMs with quanteda corpora (or character vectors and data frames) for classification, summarisation, scoring, and analysis of documents and text. quanteda provides a host of convenient functions for managing, manipulating, and describing corpora, as well as for linking document variables and metadata to documents. quanteda.llm makes it convenient to pass these texts to LLMs for analysis or classification, creating new document variables from the LLMs' outputs.
The package includes the following functions:
`ai_text()`:

- A generic function that can be used with any LLM supported by ellmer.
- Generates structured responses or classifications based on pre-defined instructions for texts in a quanteda corpus.
- Users can flexibly define prompts and the structure of responses via `type_object()` from the ellmer package.
- Users can add a dataset with examples to improve LLM performance (few-shot prompting).
- Supports resuming interrupted processes via a `result_env` environment.
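As a sketch of how `ai_text()` might be called (the argument names `chat_fn`, `model`, `type_object`, and `instructions` below are assumptions based on the description above; consult the package help for the exact signature):

```r
library(quanteda)
library(quanteda.llm)
library(ellmer)

corp <- corpus(c(doc1 = "The economy is improving steadily.",
                 doc2 = "Unemployment is rising sharply."))

# Define the structure of the desired response using ellmer's type_object()
sentiment_type <- type_object(
  sentiment = type_string("Overall sentiment: positive, negative, or neutral"),
  justification = type_string("Brief justification for the classification")
)

# Hypothetical call; chat_fn can be any ellmer chat function, and the
# model name assumes a locally installed Ollama model
result <- ai_text(
  corp,
  chat_fn = chat_ollama,
  model = "llama3.1",
  type_object = sentiment_type,
  instructions = "Classify the sentiment of each document."
)
```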
`ai_validate()`:

- Starts an interactive app to manually validate the LLM-generated outputs.
- Allows users to review LLM-generated outputs and justifications, marking them as valid or invalid.
- Supports resuming the validation process after interruptions via a `result_env` environment.
`ai_summary()`:

- A wrapper around `ai_text()` for summarising documents in a corpus.
- Uses a pre-defined `type_object()` to structure the summary output.
`ai_salience()`:

- A wrapper around `ai_text()` for computing salience scores for topics in a corpus.
- Uses a pre-defined `type_object()` to structure the salience classification output.
`ai_score()`:

- A wrapper around `ai_text()` for scoring documents based on a scale defined by a prompt.
- Uses a pre-defined `type_object()` to structure the scoring output.
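A minimal sketch of the wrapper functions (the argument names and the `topics` parameter are assumptions; check each function's help page for the exact interface):

```r
library(quanteda)
library(quanteda.llm)
library(ellmer)

corp <- corpus(c(doc1 = "The economy is improving steadily.",
                 doc2 = "Unemployment is rising sharply."))

# Summarise each document in the corpus (hypothetical arguments)
summaries <- ai_summary(corp, chat_fn = chat_ollama, model = "llama3.1")

# Compute salience scores for a set of topics (the `topics` argument
# is an assumption for illustration)
salience <- ai_salience(corp,
                        topics = c("economy", "employment"),
                        chat_fn = chat_ollama, model = "llama3.1")
```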
The package supports all LLMs currently available with the ellmer package, including:

- Anthropic's Claude: `chat_anthropic()`
- AWS Bedrock: `chat_aws_bedrock()`
- Azure OpenAI: `chat_azure_openai()`
- Cloudflare: `chat_cloudflare()`
- Databricks: `chat_databricks()`
- DeepSeek: `chat_deepseek()`
- GitHub model marketplace: `chat_github()`
- Google Gemini/Vertex AI: `chat_google_gemini()`, `chat_google_vertex()`
- Groq: `chat_groq()`
- Hugging Face: `chat_huggingface()`
- Mistral: `chat_mistral()`
- Ollama: `chat_ollama()`
- OpenAI: `chat_openai()`
- OpenRouter: `chat_openrouter()`
- perplexity.ai: `chat_perplexity()`
- Snowflake Cortex: `chat_snowflake()` and `chat_cortex_analyst()`
- vLLM: `chat_vllm()`
For authentication and usage of each of these LLMs, please refer to the respective ellmer documentation. For example, to use the `chat_openai()` models, you need to sign up for an API key from OpenAI, which you can save in your `.Renviron` file as `OPENAI_API_KEY`. To use the `chat_ollama()` models, first download and install Ollama, then install some models either from the command line (e.g. with `ollama pull llama3.1`) or within R using the rollama package. The Ollama app must be running for the models to be used.
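For instance, an OpenAI setup might look like the following (the model name is an illustrative assumption; any model supported by your provider works):

```r
# In ~/.Renviron (then restart R):
# OPENAI_API_KEY=<your key>

library(ellmer)

# Create a chat object that quanteda.llm functions can use as a backend
chat <- chat_openai(model = "gpt-4o-mini")
```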
You can install the development version of quanteda.llm from https://github.com/quanteda/quanteda.llm with:

```r
# install.packages("pak")
pak::pak("quanteda/quanteda.llm")
pak::pak("quanteda/quanteda.tidy")
```
To learn more about how to use the package, please refer to the following examples: