Fix performance issue: cache agent knowledge to avoid reloading on every kickoff #3077

devin-ai-integration · 2025-06-27T09:53:02Z

Fix: Cache agent knowledge to prevent unnecessary reloading on repeated kickoffs

Summary

This PR implements a caching mechanism in the Agent.set_knowledge() method to resolve a significant performance issue where agent knowledge was reloaded on every crew kickoff. The reload happened at crew.py line 645, where knowledge sources were processed again (chunked, embedded, and stored) on each kickoff, causing substantial performance overhead.

Key Changes:

Added knowledge state tracking with private attributes _knowledge_loaded, _last_embedder, _last_knowledge_sources
Modified set_knowledge() to skip reloading when knowledge hasn't changed
Added reset_knowledge_cache() method for explicit cache clearing when needed
Added comprehensive test coverage for caching behavior and edge cases

The caching mechanism detects when knowledge must be reloaded (the knowledge sources or the embedder changed) and skips redundant processing when the same agent is reused across multiple kickoffs.
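The reload decision described above can be modelled without crewAI. The sketch below is hypothetical and framework-free: the private attribute names mirror the PR, but `CachedKnowledgeAgent`, the injected `load` callable (standing in for the chunk/embed/store step), and the `{"provider": ...}` embedder configs are assumptions for illustration, not crewAI's actual implementation.

```python
from typing import Any, Callable, List, Optional


class CachedKnowledgeAgent:
    """Framework-free model of the caching rule in set_knowledge().

    Hypothetical: `load` stands in for the chunk/embed/store step that
    crewAI performs when it builds Knowledge; it is not a crewAI API.
    """

    def __init__(self, knowledge_sources: List[Any],
                 load: Callable[[List[Any], Any], Any]) -> None:
        self.knowledge_sources = knowledge_sources
        self.knowledge: Optional[Any] = None
        self._load = load
        # Cache state, named after the PR's private attributes.
        self._knowledge_loaded = False
        self._last_embedder: Optional[Any] = None
        self._last_knowledge_sources: Optional[List[Any]] = None

    def set_knowledge(self, embedder: Any = None) -> None:
        # Cache hit: knowledge was already built from this embedder and these sources.
        if (self._knowledge_loaded
                and self.knowledge is not None
                and self._last_embedder == embedder
                and self._last_knowledge_sources == self.knowledge_sources):
            return
        # Cache miss: rebuild, then remember the state it was built from.
        self.knowledge = self._load(self.knowledge_sources, embedder)
        self._knowledge_loaded = True
        self._last_embedder = embedder
        self._last_knowledge_sources = list(self.knowledge_sources)

    def reset_knowledge_cache(self) -> None:
        # Force the next set_knowledge() call to reload.
        self._knowledge_loaded = False
        self._last_embedder = None
        self._last_knowledge_sources = None


# Usage: count how often the expensive load actually runs.
loads = []


def fake_load(sources, embedder):
    loads.append(embedder)
    return {"sources": list(sources), "embedder": embedder}


agent = CachedKnowledgeAgent(["Brandon's favorite color is blue."], load=fake_load)
agent.set_knowledge({"provider": "openai"})
agent.set_knowledge({"provider": "openai"})  # same state: skipped
agent.set_knowledge({"provider": "ollama"})  # embedder changed: reloaded
print(len(loads))  # 2
```

Storing a copy of the source list (rather than the list itself) is what lets an in-place `append` to `knowledge_sources` register as a change on the next call.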

Review & Testing Checklist for Human

Verify cache invalidation logic - Test that knowledge is properly reloaded when knowledge sources or embedder configurations change, and NOT reloaded when they stay the same
End-to-end performance testing - Create a crew with knowledge sources and run multiple kickoffs to verify the performance improvement actually occurs
Test edge cases - Verify behavior with different knowledge source types, embedder configurations, and the reset_knowledge_cache() method
Backward compatibility - Ensure existing workflows still work correctly with the new caching behavior

Recommended Test Plan:

Create an agent with knowledge sources (e.g., StringKnowledgeSource)
Run crew.kickoff() multiple times and measure/verify that knowledge loading only happens once
Change knowledge sources mid-way and verify knowledge gets reloaded
Test with different embedder configurations to ensure cache invalidation works
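The plan above can be expressed as a pytest-style test. To stay self-contained, this sketch uses a hypothetical `MiniAgent` that applies the same caching rule instead of crewAI's Agent, and patches its (also hypothetical) `_build_knowledge` method with `unittest.mock.patch.object` so loads are counted without network calls. The later commits in this PR take the analogous approach against the real code by patching `crewai.agent.Knowledge`.

```python
from unittest import mock


class MiniAgent:
    """Hypothetical agent applying the PR's caching rule; not crewAI's Agent."""

    def __init__(self, sources):
        self.knowledge_sources = list(sources)
        self.knowledge = None
        self._last_state = None  # (embedder, sources) the knowledge was built from

    def _build_knowledge(self, embedder):
        # Stand-in for the expensive step; a real call would reach the embedder.
        raise RuntimeError("unpatched: would call the embedding service")

    def set_knowledge(self, embedder=None):
        state = (embedder, list(self.knowledge_sources))
        if self.knowledge is not None and state == self._last_state:
            return  # cache hit
        self.knowledge = self._build_knowledge(embedder)
        self._last_state = state


def test_knowledge_reloads_only_when_state_changes():
    agent = MiniAgent(["Brandon's favorite color is blue."])
    # Patch the builder so nothing touches the network and calls are counted.
    with mock.patch.object(MiniAgent, "_build_knowledge", return_value=object()) as build:
        for _ in range(3):  # repeated kickoffs with unchanged state
            agent.set_knowledge("embedder-a")
        assert build.call_count == 1, "knowledge should load once for unchanged state"

        agent.knowledge_sources.append("Brandon's favorite food is pizza.")
        agent.set_knowledge("embedder-a")
        assert build.call_count == 2, "changed sources should trigger a reload"

        agent.set_knowledge("embedder-b")
        assert build.call_count == 3, "changed embedder should trigger a reload"
```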

Diagram

graph TD
    crew[src/crewai/crew.py]
    agent[src/crewai/agent.py]:::major-edit
    knowledge[src/crewai/knowledge/knowledge.py]:::context
    agent_tests[tests/agent_test.py]:::major-edit
    
    crew -->|calls set_knowledge| agent
    agent -->|creates/caches| knowledge
    agent_tests -->|tests caching behavior| agent
    
    subgraph "Agent Caching Logic"
        cache_check[Check _knowledge_loaded flag]
        compare_state[Compare _last_embedder & _last_knowledge_sources]
        skip_load[Skip knowledge loading]
        load_knowledge[Load knowledge & update cache]
        
        cache_check --> compare_state
        compare_state -->|same| skip_load
        compare_state -->|different| load_knowledge
    end
    
    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit  
        L3[Context/No Edit]:::context
    end

classDef major-edit fill:#90EE90
classDef minor-edit fill:#87CEEB
classDef context fill:#FFFFFF

Notes

Performance Impact: This fix addresses issue #3076 ("[BUG]crew.py reloads memory on every kickoff causing performance issues"), where repeated kickoffs caused significant performance degradation due to unnecessary knowledge reprocessing
Cache Strategy: Uses a simple state comparison (embedder config + knowledge sources) to determine when the cache is valid
Memory Considerations: The cache stores references to knowledge sources and embedder configs; monitor memory usage in long-running applications
Thread Safety: Current implementation is not thread-safe - consider this if agents are used in multi-threaded environments
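If agents are shared across threads, one option is to guard the check-and-load step with a lock. This is a hypothetical sketch, not the PR's code: `ThreadSafeKnowledgeCache`, its `get` method, and the `slow_load` stand-in are illustration only.

```python
import threading
import time
from typing import Any, Callable, List, Optional


class ThreadSafeKnowledgeCache:
    """Hypothetical lock-guarded version of the set_knowledge() cache check."""

    def __init__(self, load: Callable[[List[Any], Any], Any]) -> None:
        self._load = load
        self._lock = threading.Lock()
        self._state: Optional[tuple] = None  # (embedder, sources) last loaded
        self.knowledge: Optional[Any] = None

    def get(self, sources: List[Any], embedder: Any) -> Any:
        state = (embedder, tuple(sources))
        # The check and the load happen under one lock, so threads that miss
        # at the same moment cannot each run the expensive load.
        with self._lock:
            if self.knowledge is None or state != self._state:
                self.knowledge = self._load(sources, embedder)
                self._state = state
            return self.knowledge


loads = []  # records each expensive load (called under the cache's lock)


def slow_load(sources, embedder):
    loads.append(embedder)
    time.sleep(0.01)  # simulate chunking and embedding
    return f"knowledge for {embedder}"


cache = ThreadSafeKnowledgeCache(slow_load)
threads = [threading.Thread(target=cache.get, args=(["doc"], "embedder-a"))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(loads))  # 1
```

Holding the lock during the load serializes all loads for that cache, which seems acceptable when each agent owns one cache; finer-grained locking would only pay off if many different states were loaded concurrently.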

- Add caching mechanism in Agent.set_knowledge to track loaded state
- Skip knowledge reloading when sources and embedder haven't changed
- Add reset_knowledge_cache method for explicit cache clearing
- Add comprehensive tests for caching behavior and edge cases
- Fixes issue #3076 performance overhead on repeated kickoffs

Co-Authored-By: João <joao@crewai.com>

devin-ai-integration · 2025-06-27T09:53:04Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


joaomdmoura · 2025-06-27T09:54:55Z

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comments on Knowledge Caching Implementation

Overview

This pull request effectively implements a caching strategy for knowledge in the Agent class to optimize performance during repeated operations. The changes are made to agent.py and agent_test.py, and while the implementation demonstrates good practices, several key areas could benefit from refinement.

File: `src/crewai/agent.py`

Positive Aspects

Effective Caching Mechanism: The integration of a caching mechanism using instance attributes is well-conceived. This minimizes repetitive loading during multiple agent kickoffs.
Clear Cache Invalidation Logic: The logic to invalidate the cache is structured clearly, ensuring that the state is accurately managed.
Error Handling: The implementation incorporates specific exceptions which strengthen the robustness of the functionality.
Added Reset Functionality: A reset feature allows manual management of the cache, enhancing flexibility.

Areas for Improvement

Cache State Variables Naming

Current:

self._knowledge_loaded = True
self._last_embedder = current_embedder
self._last_knowledge_sources = self.knowledge_sources

Suggested:

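self._knowledge_cache = {
    'loaded': True,
    'embedder': current_embedder,
    'sources': self.knowledge_sources.copy() if self.knowledge_sources else None
}

Reasoning: Consolidating cache-related variables into a single attribute enhances encapsulation and clarity. Use a single leading underscore: inside the class, a double-underscore name such as __knowledge_cache is mangled to _Agent__knowledge_cache, so a string lookup like hasattr(self, '__knowledge_cache') never finds it. Because Agent is a Pydantic model, the attribute also needs a PrivateAttr declaration, e.g. _knowledge_cache: Optional[Dict[str, Any]] = PrivateAttr(default=None).

Cache Validation Logic Extraction

Suggested Implementation:

def _is_knowledge_cache_valid(self, current_embedder) -> bool:
    # _knowledge_cache stays None until knowledge has been loaded once
    cache = self._knowledge_cache
    if cache is None:
        return False
    return (cache['loaded'] and
            self.knowledge is not None and
            cache['embedder'] == current_embedder and
            cache['sources'] == self.knowledge_sources)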

Reasoning: Extracting validation into a separate method can improve readability and maintainability.

Knowledge Sources Copying

Current:

self._last_knowledge_sources = self.knowledge_sources.copy() if self.knowledge_sources else None

Suggested:

import copy

self._last_knowledge_sources = copy.deepcopy(self.knowledge_sources) if self.knowledge_sources else None

Reasoning: deepcopy ensures that the cached sources are entirely independent of the original sources, safeguarding integrity.
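The difference only shows up when a source object is mutated in place. This sketch uses a hypothetical `FakeSource` dataclass in place of a real knowledge source to show why a shallow snapshot can hide such a change from the equality check.

```python
import copy
from dataclasses import dataclass


@dataclass
class FakeSource:
    # Hypothetical stand-in for a knowledge source; only `content` matters here.
    content: str


sources = [FakeSource("Brandon's favorite color is blue.")]

shallow_snapshot = sources.copy()       # new list, same FakeSource objects
deep_snapshot = copy.deepcopy(sources)  # new list, new FakeSource objects

# Mutate the source in place, as a caller might between kickoffs.
sources[0].content = "Brandon's favorite color is red."

# The shallow snapshot shares the mutated object, so the comparison still
# says "unchanged" and a cache would wrongly be treated as valid.
print(shallow_snapshot == sources)  # True
# The deep snapshot kept the old content, so the change is detected.
print(deep_snapshot == sources)     # False
```

Replacing the list or appending to it is already caught by the shallow `.copy()`; deepcopy matters only for in-place edits to the source objects, at the cost of copying them on every load.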

Type Hints

Suggested Addition:

```python
def reset_knowledge_cache(self) -> None:
    """Clear the cached knowledge state so the next set_knowledge() call reloads."""
```

Reasoning: Including type hints can enhance documentation and improve IDE support.

File: `tests/agent_test.py`

Positive Aspects

Comprehensive Test Coverage: The test file demonstrates thorough coverage for the new caching functionality, emphasizing both positive and negative cases.
Clarity in Test Descriptions: Good test case descriptions provide immediate context.
Effective Use of Mocking: The mocking strategies employed enhance the credibility of the tests.

Areas for Improvement

Setup Duplication:

Current: Setup code is frequently repeated in tests.

Suggested:

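import pytest

from crewai import Agent
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource


@pytest.fixture
def test_agent():
    content = "Brandon's favorite color is blue."
    return Agent(
        role="Researcher",
        goal="Research about Brandon",
        backstory="You are a researcher.",
        knowledge_sources=[StringKnowledgeSource(content=content)]
    )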

Reasoning: Using fixtures minimizes duplication and enhances maintainability.

Assertion Messages:

Current:
```
assert mock_add_sources.call_count == 1
```

Suggested:

assert mock_add_sources.call_count == 1, "Knowledge sources should only be loaded once for cached content"

Reasoning: Adding messages to assertions helps in debugging failed tests.

Edge Case Coverage:

Suggested Additional Test:

def test_agent_knowledge_cache_with_invalid_sources(test_agent):
    # pytest injects the fixture by name; calling test_agent() directly raises an error
    agent = test_agent
    agent.knowledge_sources = ["invalid source"]

    with pytest.raises(ValueError) as exc_info:
        agent.set_knowledge()
    assert "Invalid Knowledge Configuration" in str(exc_info.value)

Reasoning: Adding tests for edge cases ensures your logic can handle errors gracefully.

General Recommendations

Documentation: Expand docstrings to explain the caching mechanism and the conditions for cache invalidation. Comments will enhance understanding for future maintenance.
Performance Monitoring: Introduce logging for cache hits and misses to optimize performance and resource management, especially for larger datasets.
Code Organization: Consider encapsulating caching logic in a dedicated mixin for clarity and reusability. This can simplify the Agent class implementation.
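The logging and mixin suggestions could be combined roughly as follows. Everything here is hypothetical: `KnowledgeCacheMixin`, `ensure_knowledge`, `_build_knowledge`, `DemoAgent`, and the `knowledge_cache` logger name are for illustration only. On a Pydantic model such as crewAI's Agent, the `_cache_state` attribute would likely need a PrivateAttr declaration, as the PR's later commit did for its cache attributes.

```python
import logging
from typing import Any, List, Optional

logger = logging.getLogger("knowledge_cache")  # hypothetical logger name


class KnowledgeCacheMixin:
    """Hypothetical mixin holding cache state plus hit/miss logging.

    The host class must provide `knowledge_sources`, `knowledge`, and a
    `_build_knowledge(embedder)` method; all three are assumptions here.
    """

    _cache_state: Optional[tuple] = None  # (embedder, sources) last loaded

    def _knowledge_cache_valid(self, embedder: Any) -> bool:
        return (self.knowledge is not None
                and self._cache_state == (embedder, list(self.knowledge_sources)))

    def ensure_knowledge(self, embedder: Any = None) -> None:
        if self._knowledge_cache_valid(embedder):
            logger.debug("knowledge cache hit")
            return
        logger.info("knowledge cache miss, rebuilding knowledge")
        self.knowledge = self._build_knowledge(embedder)
        self._cache_state = (embedder, list(self.knowledge_sources))

    def reset_knowledge_cache(self) -> None:
        self._cache_state = None


class DemoAgent(KnowledgeCacheMixin):
    """Minimal host class for the mixin (hypothetical)."""

    def __init__(self, sources: List[Any]) -> None:
        self.knowledge_sources = sources
        self.knowledge: Optional[Any] = None
        self.builds = 0

    def _build_knowledge(self, embedder: Any) -> Any:
        self.builds += 1
        return f"knowledge build #{self.builds}"


logging.basicConfig(level=logging.DEBUG)
agent = DemoAgent(["Brandon's favorite color is blue."])
agent.ensure_knowledge("embedder-a")  # logs a miss and builds
agent.ensure_knowledge("embedder-a")  # logs a hit
print(agent.builds)  # 1
```

Logging hits at DEBUG and misses at INFO keeps routine kickoffs quiet while still surfacing every rebuild.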

Conclusion

The PR enhances the Agent class's functionality by implementing an effective caching strategy that significantly improves performance. With a few refinements in code quality and testing strategies, the implementation can be made even more robust and maintainable. Overall, excellent work on the implementation!

- Add proper PrivateAttr declarations for cache attributes to fix mypy errors
- Simplify tests to focus on set_knowledge method directly instead of full kickoff
- Remove network calls and invalid method mocking from tests
- All knowledge caching functionality verified working locally

Co-Authored-By: João <joao@crewai.com>

- Remove hasattr/getattr calls that caused mypy type-checker errors
- Fix test mocking to use 'crewai.agent.Knowledge' for proper isolation
- Prevent network calls in tests by mocking Knowledge class constructor
- All knowledge-related tests now pass locally without API dependencies

Co-Authored-By: João <joao@crewai.com>

- Update VCR cassettes for knowledge-related tests
- Ensures CI has consistent test recordings

Co-Authored-By: João <joao@crewai.com>

devin-ai-integration bot and others added 3 commits June 27, 2025 10:01

update: refresh test cassettes after local test runs

255758e
