Add classifyCategories function for llm scorecard #794
Open
nlarew wants to merge 1 commit into main from llm-scorecard-categories
149 changes: 149 additions & 0 deletions
packages/scripts/src/llm-scorecard/classifyCategories.ts
@@ -0,0 +1,149 @@
```ts
import {
  type ClassificationType,
  getEnv,
  makeClassifier,
} from "mongodb-rag-core";
import { AzureOpenAI } from "mongodb-rag-core/openai";

const env = getEnv({
  required: ["OPENAI_API_KEY", "OPENAI_ENDPOINT", "OPENAI_API_VERSION"],
});

const openAiClient = new AzureOpenAI({
  apiKey: env.OPENAI_API_KEY,
  endpoint: env.OPENAI_ENDPOINT,
  apiVersion: env.OPENAI_API_VERSION,
});

const classificationTypes: ClassificationType[] = [
  {
    type: "Advanced Features",
    description:
      "Prompts that only apply to specific use cases or complex features.",
    examples: [
      { text: "does mongodb support transactions" },
      { text: "how to use mongodump" },
      { text: "How do I backup a MongoDB database" },
      { text: "How do I use mongorestore from dump" },
      {
        text: "What's the difference between ANN and ENN search in Atlas Vector Search?",
      },
      { text: "What are mongosync limitations" },
      { text: "how to use gridfs in mongodb" },
      {
        text: "Does Atlas Vector Search work with images, media files, and other types of data?",
      },
    ],
  },
  {
    type: "AI/LLM Integration",
    description:
      "Prompts that are about MongoDB's AI/LLM integration features.",
    examples: [
      { text: "How do I build AI applications with MongoDB?" },
      {
        text: "Does MongoDB support LangGraph checkpointers? If so, are they asynchronous or synchronous?",
      },
      { text: "How does MongoDB help with AI projects?" },
      { text: "Can I use MongoDB for RAG implementations? How?" },
      { text: "Does MongoDB offer support for developing AI applications?" },
      { text: "Does MongoDB generate embeddings?" },
      { text: "What is Retrieval-augmented generation?" },
      { text: "How will MongoDB and Voyage AI work together?" },
    ],
  },
  {
    type: "Foundational Concepts",
    description: "Prompts that are about MongoDB's core features and concepts.",
    examples: [
      { text: "explain indexes in mongodb" },
      { text: "when to use findone vs find in mongodb" },
      { text: "What is a mongodb change streams example" },
      { text: "what's the difference between updateone and findoneandupdate" },
      { text: "What is the mongodb list collections command" },
      { text: "What is MongoDB?" },
      { text: "What is aggregation in MongoDB" },
      { text: "how many authentication methods for MongoDB" },
    ],
  },
  {
    type: "Positioning",
    description:
      "Prompts that position MongoDB in the market relative to other solutions.",
    examples: [
      {
        text: "What specific advantages does the new Atlas Flex tier provide over traditional serverless models?",
      },
      {
        text: "What are the key differentiators when comparing MongoDB to Azure Data Explorer (ADX)?",
      },
      { text: "How does MongoDB compare to Postgres?" },
      { text: "How is MongoDB used by companies in the energy industry?" },
      {
        text: "Are there any case studies demonstrating MongoDB’s effectiveness?",
      },
      {
        text: "How does the new pricing model of the new Atlas Flex tier ensure more predictability compared to previous offerings?",
      },
      { text: "How can I migrate from MySQL to MongoDB?" },
      { text: "What industries use MongoDB?" },
    ],
  },
  {
    type: "Practical Usage & Queries",
    description:
      "Prompts that are about how to use MongoDB in concrete scenarios.",
    examples: [
      { text: "how to get connection string from mongodb atlas" },
      { text: "command to create new collection" },
      { text: "What are the installation steps for mongodb compass" },
      { text: "What is the mongodb filter query for a nested object" },
      { text: "connect to mongodb nodejs" },
      { text: "How do you use not equal in MongoDB for multiple values" },
      { text: "how to query mongodb collection" },
      {
        text: "What are the step by step setup instructions for replication in mongodb with linux",
      },
    ],
  },
  {
    type: "Troubleshooting & Best Practices",
    description:
      "Prompts that ask about bugs, performance, and other MongoDB best practices.",
    examples: [
      { text: "What limitations for mongodb time series" },
      { text: "are there any best practices for mongodb crud operations" },
      { text: "What are the common exceptions for the mongodb java driver" },
      { text: "mongodb ttl not working" },
      { text: "How can Atlas users specify maintenance timing?" },
      {
        text: "I'm trying to use Compass with DocumentDB, and I keep running into unexpected behavior. For example, collection and database stats don't render, and I can't analyze my schema. Is there a workaround? ",
      },
      {
        text: "Why can't I read my own writes with a numbered write concern and read concern majority?",
      },
      { text: "I have enough memory, how can I further improve performance?" },
    ],
  },
  {
    type: "General Information",
    description:
      "Prompts that are related to MongoDB but not directly about the product, such as release notes, documentation, and other general information.",
    examples: [
      { text: "what's new in mongodb 8" },
      { text: "Where is the official MongoDB documentation" },
      { text: "Where are mongodb release notes" },
      { text: "Can I hire MongoDB developers to build my application?" },
      { text: "Is MongoDB currently hiring?" },
      {
        text: "Where can I find the changes in the newest version of the MongoDB Administration API?",
      },
    ],
  },
];

export const classifyCategories = makeClassifier({
  openAiClient,
  model: "gpt-4.1-mini",
  classificationTypes,
});
```
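With this many hand-written few-shot examples, it is easy to repeat a category name or paste the same prompt under two categories, which would give the classifier conflicting signals. The sketch below is one possible sanity check and is not part of this PR. The `Category` interface is a local stand-in that assumes only the `type`, `description`, and `examples[].text` fields seen in the diff, and `findCategoryProblems` is a hypothetical helper name.

```typescript
// Local stand-in for the shape used in the diff (assumption: only these
// fields matter for the check; the real ClassificationType may have more).
interface CategoryExample {
  text: string;
}

interface Category {
  type: string;
  description: string;
  examples?: CategoryExample[];
}

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when no duplicates are found.
function findCategoryProblems(categories: Category[]): string[] {
  const problems: string[] = [];
  const seenTypes = new Set<string>();
  // Maps a normalized example text to the first category that used it.
  const exampleOwner = new Map<string, string>();

  for (const category of categories) {
    if (seenTypes.has(category.type)) {
      problems.push(`Duplicate category: "${category.type}"`);
    }
    seenTypes.add(category.type);

    for (const example of category.examples ?? []) {
      // Compare case-insensitively and ignore surrounding whitespace.
      const key = example.text.trim().toLowerCase();
      const owner = exampleOwner.get(key);
      if (owner !== undefined) {
        problems.push(
          `Example "${example.text.trim()}" appears in "${owner}" and "${category.type}"`
        );
      } else {
        exampleOwner.set(key, category.type);
      }
    }
  }
  return problems;
}
```

A check like this could run in a unit test against the category array, failing the build whenever the returned list is non-empty.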
[nitpick] This inline definition of many category examples can become hard to maintain as you add new categories. Consider loading this data from an external JSON or CSV file and transforming it into `classificationTypes` to keep the code concise.
Copilot uses AI. Check for mistakes.
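One way the suggestion above might look in practice is sketched here. Everything in it is an assumption for illustration: the JSON layout (an array of objects whose `examples` are plain strings), the helper names `parseCategories` and `loadCategories`, and the local `Category` interface, which mirrors only the fields visible in the diff rather than the actual `ClassificationType` definition.

```typescript
import { readFileSync } from "node:fs";

// Local stand-in for the shape used in the diff (assumed, not imported).
interface CategoryExample {
  text: string;
}

interface Category {
  type: string;
  description: string;
  examples: CategoryExample[];
}

// Hypothetical JSON layout:
// [{ "type": "...", "description": "...", "examples": ["...", "..."] }]
function parseCategories(json: string): Category[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) {
    throw new Error("Expected a JSON array of categories");
  }
  return raw.map((entry: unknown, index: number): Category => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`Category ${index} is not an object`);
    }
    const { type, description, examples } = entry as Record<string, unknown>;
    if (typeof type !== "string" || typeof description !== "string") {
      throw new Error(`Category ${index} needs string "type" and "description"`);
    }
    if (!Array.isArray(examples) || !examples.every((e) => typeof e === "string")) {
      throw new Error(`Category ${index} needs an "examples" array of strings`);
    }
    // Wrap each plain string in the { text } shape the inline array uses.
    return {
      type,
      description,
      examples: (examples as string[]).map((text) => ({ text })),
    };
  });
}

// Reads and parses the category file synchronously at module load time.
function loadCategories(path: string): Category[] {
  return parseCategories(readFileSync(path, "utf8"));
}
```

The result of `loadCategories` could then be passed as `classificationTypes`, provided its shape matches what `makeClassifier` expects. The trade-off is that the examples lose compile-time type checking, which is why the sketch validates the shape at load time.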