Using submodular optimization for context engineering. This repo includes examples of text selection, query fan-out, and passage reranking.
Recommended reading before diving into the code:
- Submodular Optimization for Text Selection, Passage Reranking & Context Engineering: examples of text selection and passage reranking with submodular optimization.
- Submodular Optimization for Diverse Query Generation in DeepResearch: an example of query fan-out with submodular optimization.
Setup:

```bash
npm install
export GOOGLE_GENERATIVE_AI_API_KEY=your_api_key_here
export JINA_API_KEY=your_jina_api_key_here
```
Usage (generate candidate queries, embed them, then run the optimizer):

```bash
npm run generate <prompt-file> "your query" [num_queries_or_range]
npm run embed <output-file>
node submodular_optimization.js <k>              # Single k value
node submodular_optimization.js <start>-<end>    # Range of k values
```
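The `<k>` argument accepts either a single number or an inclusive range. A minimal sketch of how such an argument could be parsed, using a hypothetical `parseKArg` helper that is not taken from `submodular_optimization.js`:

```javascript
// Hypothetical helper, not taken from submodular_optimization.js:
// "5" -> [5], "1-20" -> [1, 2, ..., 20] (inclusive range).
function parseKArg(arg) {
  const match = /^(\d+)(?:-(\d+))?$/.exec(String(arg).trim());
  if (!match) {
    throw new Error(`Expected <k> or <start>-<end>, got "${arg}"`);
  }
  const start = Number(match[1]);
  // No "-<end>" part means a single k value.
  const end = match[2] === undefined ? start : Number(match[2]);
  if (start < 1 || end < start) {
    throw new Error(`Invalid k or k range: "${arg}"`);
  }
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}

// In a CLI script the argument would typically come from process.argv[2].
console.log(parseKArg("5"));   // returns [5]
console.log(parseKArg("2-4")); // returns [2, 3, 4]
```

The optimizer would then run once per returned k value.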
Examples:

```bash
npm run generate prompt-v1.txt "machine learning" 5
npm run generate prompt-v2.txt "climate change" 2-10
npm run embed output-prompt-v1.txt.json
node submodular_optimization.js 5       # Select k=5 queries
node submodular_optimization.js 1-20    # Run the selection for every k from 1 to 20
```
Output files:
- `output-<prompt-file>.json`: generated query strings
- `output-<prompt-file>.embeddings.json`: query embeddings
- `output-<prompt-file>.submodular.embeddings.json`: optimized (selected) query embeddings, e.g. `output-prompt-v1.txt.submodular.embeddings.json`
The submodular optimization uses a lazy greedy algorithm that:
- Maximizes diversity by selecting queries that cover different aspects of the topic
- Maintains relevance by considering similarity to the original query
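- Uses cosine similarity between query embeddings to measure how related two queries are
- Uses lazy evaluation of marginal gains (under submodularity, stale gains are upper bounds), so not every candidate has to be re-scored at each step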
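The lazy greedy idea can be sketched as follows. This is a minimal illustration, not the repo's implementation: it assumes the objective `f` is monotone submodular (so a candidate's marginal gain can only shrink as the selection grows) and uses a hypothetical facility-location coverage objective over toy 2-D embeddings. Re-sorting an array stands in for a proper max-heap.

```javascript
// Cosine similarity between two embedding vectors (0 if either is all zeros).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Lazy greedy: f maps an array of candidate indices to a number and is assumed
// monotone submodular, so a cached gain is an upper bound on the current gain.
function lazyGreedy(numCandidates, k, f) {
  const selected = [];
  let currentValue = f(selected);
  // Gains start exact with respect to the empty set ("round" = selection size when computed).
  const queue = [];
  for (let i = 0; i < numCandidates; i++) {
    queue.push({ index: i, gain: f([i]) - currentValue, round: 0 });
  }
  while (selected.length < k && queue.length > 0) {
    queue.sort((x, y) => y.gain - x.gain); // simple stand-in for a max-heap
    const top = queue[0];
    if (top.round === selected.length) {
      // Fresh gain beats every other upper bound, so take it.
      selected.push(top.index);
      currentValue += top.gain;
      queue.shift();
    } else {
      // Stale bound: recompute against the current selection and re-rank.
      top.gain = f([...selected, top.index]) - currentValue;
      top.round = selected.length;
    }
  }
  return selected;
}

// Hypothetical demo objective: f(S) = sum_j max(0, max_{s in S} cos(e_j, e_s)).
const embeddings = [[1, 0], [0.98, 0.2], [0.98, -0.2], [0, 1]];
const sim = embeddings.map((a) => embeddings.map((b) => cosineSimilarity(a, b)));
const coverage = (S) =>
  sim.reduce((total, row) => total + Math.max(0, ...S.map((s) => row[s])), 0);

console.log(lazyGreedy(embeddings.length, 2, coverage)); // returns [1, 3]
```

On this toy data the three vectors near `[1, 0]` share a single slot, and the second pick goes to the distinct `[0, 1]` direction, which is the diversity effect listed above. A stale gain is only recomputed when its candidate reaches the top of the queue, which is where the savings over plain greedy come from.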
The objective function balances:
- Relevance: how closely the selected queries match the original query (weighted by α = 0.3)
- Coverage: how well the selected queries represent the full candidate set
- Diversity: how different the selected queries are from one another
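One hypothetical way to combine these terms is sketched here; the exact formula in `submodular_optimization.js` may differ. The sketch assumes f(S) = α · Σ_{s∈S} max(0, cos(q, e_s)) + (1 - α) · Σ_{j∈V} max(0, max_{s∈S} cos(e_j, e_s)), where q is the original query's embedding, V is the candidate set, and α = 0.3. In this version diversity is not a separate term: it emerges because the coverage term gives diminishing returns when a new query is close to one already selected.

```javascript
// Hypothetical relevance + coverage objective (an assumption, not the repo's formula).
const ALPHA = 0.3; // relevance weight from the README; coverage gets 1 - ALPHA

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// selected: indices into candidates; candidates: embedding vectors;
// query: embedding of the original query. Similarities are clipped at 0
// so the objective stays monotone.
function objective(selected, candidates, query, alpha = ALPHA) {
  // Relevance: modular term, each selected query scored against the original query.
  const relevance = selected.reduce(
    (sum, s) => sum + Math.max(0, cosine(query, candidates[s])),
    0,
  );
  // Coverage: facility location, each candidate is covered by its most similar selected query.
  const coverage = candidates.reduce(
    (sum, e) => sum + Math.max(0, ...selected.map((s) => cosine(e, candidates[s]))),
    0,
  );
  return alpha * relevance + (1 - alpha) * coverage;
}

const query = [1, 1];
const candidates = [[1, 0], [0.98, 0.2], [0, 1]];
console.log(objective([0, 1], candidates, query).toFixed(2)); // prints 2.00 (near-duplicate pair)
console.log(objective([0, 2], candidates, query).toFixed(2)); // prints 2.51 (diverse pair)
```

Even though candidate 1 is individually the most relevant, pairing it with its near-duplicate scores lower than the diverse pair, because candidate 2 adds coverage that candidate 1 mostly duplicates.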
Project structure:

```
submodular-optimization/
├── submodular_optimization.js                      # Main optimization algorithm
├── prompt-v*.txt                                   # Query generation prompts
├── output-prompt-v*.json                           # Generated query strings
├── output-prompt-v*.embeddings.json                # Original embeddings
└── output-prompt-v*.submodular.embeddings.json     # Optimized embeddings
```