Implement LLM generations, logprobs, and XML parsing features #3053

devin-ai-integration · 2025-06-24T05:14:36Z

Implement LLM Generations, Logprobs, and XML Parsing Features

This PR implements the feature request from issue #3052, adding support for advanced LLM parameters and XML tag parsing capabilities to the CrewAI framework.

🚀 Features Added

1. LLM Multiple Generations Support

Added n parameter support to generate multiple completions
Added logprobs and top_logprobs parameters for accessing log probabilities
Added return_full_completion parameter to access complete response metadata

2. Agent-Level LLM Parameter Control

New Agent parameters: llm_n, llm_logprobs, llm_top_logprobs
New return_completion_metadata parameter for accessing generation metadata
Parameters are automatically passed through to the underlying LLM instance

3. XML Content Extraction Utility

New xml_parser.py utility for extracting content from XML tags
Support for extracting <thinking>, <reasoning>, <conclusion> and other custom tags
Functions for cleaning agent output by removing internal tags

4. Enhanced Output Classes

Extended TaskOutput with completion metadata and helper methods
Extended LiteAgentOutput with completion metadata support
New methods: get_generations(), get_logprobs(), get_usage_metrics()

📝 Usage Examples

Multiple Generations

from crewai import Agent, LLM

# Create agent with multiple generations
agent = Agent(
    role="writer",
    goal="write creative content",
    backstory="You are a creative writer",
    llm_n=3,  # Generate 3 different versions
    return_completion_metadata=True
)

result = agent.execute_task(task)
generations = result.get_generations()  # Access all 3 generations

XML Tag Extraction

from crewai.utilities.xml_parser import extract_xml_content

agent_output = """
<thinking>
Let me analyze this step by step...
</thinking>

Based on my analysis, the answer is 42.
"""

thinking = extract_xml_content(agent_output, "thinking")
print(thinking)  # "Let me analyze this step by step..."

Log Probabilities

agent = Agent(
    role="analyst",
    goal="analyze with confidence scores",
    backstory="You are a data analyst",
    llm_logprobs=5,  # Get top 5 log probabilities
    return_completion_metadata=True
)

result = agent.execute_task(task)
logprobs = result.get_logprobs()  # Access log probabilities
usage = result.get_usage_metrics()  # Access token usage

🧪 Testing

Added comprehensive test suite covering all new functionality
Integration tests for agent execution with multiple generations
XML parser tests with realistic agent output examples
Backward compatibility tests to ensure existing code continues to work

🔄 Backward Compatibility

All changes are fully backward compatible. Existing code will continue to work exactly as before. The new functionality is opt-in through new parameters and methods.

📁 Files Changed

Core Implementation

src/crewai/llm.py - Enhanced LLM class with new parameters and completion metadata
src/crewai/agent.py - Added LLM parameter support
src/crewai/lite_agent.py - Added completion metadata support
src/crewai/tasks/task_output.py - Enhanced with metadata and helper methods
src/crewai/utilities/agent_utils.py - Updated to handle completion metadata

New Utilities

src/crewai/utilities/xml_parser.py - XML content extraction utility

Tests and Examples

tests/test_llm_generations_logprobs.py - Core functionality tests
tests/test_integration_llm_features.py - Integration tests
tests/test_xml_parser_examples.py - XML parser tests
examples/llm_generations_example.py - Usage examples

🔗 Related

Fixes [FEATURE] How to obtain n generations or generations in different tags? #3052
Link to Devin run: https://app.devin.ai/sessions/0b0dd75fc13d4ec1a93c6cd68b00ac3c
Requested by: João (joao@crewai.com)

✅ Verification

The implementation has been tested with:

Basic functionality verification showing all features work correctly
XML parsing with various tag formats
Agent parameter passing and LLM integration
Completion metadata access and manipulation

All new features work as expected while maintaining full backward compatibility with existing CrewAI applications.

- Add support for n generations and logprobs parameters in LLM class - Extend Agent class to accept LLM generation parameters (llm_n, llm_logprobs, llm_top_logprobs) - Add return_full_completion parameter to access complete LLM response metadata - Implement XML parser utility for extracting content from tags like <thinking> - Add completion metadata support to TaskOutput and LiteAgentOutput classes - Add comprehensive tests and examples demonstrating new functionality - Maintain full backward compatibility with existing code Addresses issue #3052: How to obtain n generations or generations in different tags Co-Authored-By: João <joao@crewai.com>

joaomdmoura · 2025-06-24T05:18:12Z

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment

Overview

The recent pull request implements significant enhancements for handling multiple LLM generations, tracking log probabilities, and parsing structured XML content. The changes reflect a substantial upgrade across core components and include increased test coverage.

Key Components Analysis

1. LLM Class Enhancements

Strengths:
- The implementation supports multiple generations and handles completion metadata effectively.
- Configuration options provide added flexibility for end-users.

Specific Code Improvements:

Type Hints & Validation: It is advisable to include stricter type hints for method parameters and return types. For example:

def call(
    self,
    messages: Union[str, List[Dict[str, str]]],  # Enhance with strict types
    ...
) -> Union[str, Dict[str, Any]]:

2. XML Parser Implementation

Strengths:
- Robust handling of nested tags and comprehensive utility functions.
- Effective error handling allows for more resilient code.

Improvements Needed:

Validation & Sanitization: Adding validation checks to ensure the content integrity and avoid processing of malformed XML:

def extract_xml_content(text: str, tag: str) -> Optional[str]:
    if not isinstance(text, str) or not isinstance(tag, str):
        raise TypeError("Text and tag must be strings")

3. Agent Class Updates

Strengths:
- Enhancements enable dynamic adjustment of generation parameters, improving user interactions.

Areas for Improvement:

Ensure all parameters are validated to prevent inappropriate settings and conflicts.

For instance, add checks to validate integer parameters are non-negative:

@validator('llm_n', 'llm_logprobs')
def validate_llm_params(cls, v):
    if v is not None and not isinstance(v, int):
        raise ValueError(f"Parameter must be integer, got {type(v)}")
    return v

4. Task Output Enhancement

Strengths:
- New methods for accessing output details improve usability and tracking.
Implementation Suggestions:
- Limit the size of metadata to avoid excessive resource use during large operations.

Testing Coverage Analysis

The PR includes comprehensive test cases that affirm functionality, with tests for edge cases and integration scenarios.
Recommendations for Additional Tests:
- Include tests for boundary conditions and cases of invalid parameters to ensure robustness against diverse scenarios.

Performance Considerations

XML Parsing Optimization: Consider caching compiled regex patterns for efficiency.
Memory Usage: Utilize generator functions for output retrieval to lower memory footprint:
```
def get_generations(self) -> Generator[str, None, None]: ...
```

Security Recommendations

Ensure stringent input validation on XML content to prevent injection attacks.
Output sanitization should be implemented to clean potentially dangerous content before serving.

Documentation Recommendations

Enhance documentation to cover:
- Usage examples for new features.
- Guidelines on security best practices.
- Performance considerations and pitfalls.

In summary, while the implementation shows great promise, it requires focused improvements in security, validation, and performance optimization to ensure reliability and usability moving forward.

mplachta · 2025-06-24T05:18:59Z

Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #3053: Implement LLM Generations, Logprobs, and XML Parsing Features

Summary of Key Findings

This PR introduces significant, well-integrated features to enhance the LLM handling capabilities and output processing in crewAI:

Extended LLM class to support multiple generations (n parameter), detailed log probabilities (logprobs, top_logprobs), and optionally return full completion metadata including choices and usage.
Correspondingly extended the Agent to accept these new LLM parameters and pass them to the underlying LLM.
Added completion_metadata fields and accessor methods (e.g., get_generations(), get_logprobs()) to both LiteAgentOutput and TaskOutput to facilitate easy extraction of generational and probability data.
Created a dedicated XML parser utility to parse and extract multiple XML-like tags from agent output text, supporting use cases like structured reasoning and solution presentation.
Developed clear example scripts demonstrating usage of multiple generations, XML tag extraction, and log probability analysis.
Comprehensive and well-structured tests covering integration flows, unit tests of new methods, and XML parser correctness.
All changes preserve backward compatibility by making new behaviors opt-in with parameters like return_full_completion.

Detailed Review and Improvement Suggestions by Component

1. Examples (`examples/llm_generations_example.py`)

Code Quality: Clear and instructive example script with distinct showcases of multi-generation, XML parsing, and logprobs.
Improvements:
- Add type hints on example functions for better clarity and maintainability.
- Extract repeated print blocks into a helper to reduce code duplication and improve readability.
- Provide more descriptive output labels clarifying primary vs alternative generations to aid user understanding.
Example:
```
def print_generations(generations: list):
    print(f"Generated {len(generations)} alternatives:")
    for i, generation in enumerate(generations, 1):
        print(f"\nGeneration {i}:\n{generation}\n{'-'*30}")
```

2. Agent Class (`src/crewai/agent.py`)

Code Quality: New llm_n, llm_logprobs, llm_top_logprobs, return_completion_metadata fields are well documented and integrated in post-init setup.

Improvements:

Refactor repeated hasattr checks and direct attribute assignments into a loop to keep code concise and easier to maintain.
Add validation for numerical parameters to ensure they are non-negative integers, failing fast on invalid values with clear error messages.

Example:

def post_init_setup(self):
    ...
    llm_params = [('n', self.llm_n), ('logprobs', self.llm_logprobs), ('top_logprobs', self.llm_top_logprobs)]
    for attr, value in llm_params:
        if value is not None and hasattr(self.llm, attr):
            if isinstance(value, int) and value >= 0:
                setattr(self.llm, attr, value)
            else:
                raise ValueError(f"Agent parameter '{attr}' must be a non-negative integer")
    if hasattr(self.llm, 'return_full_completion'):
        self.llm.return_full_completion = self.return_completion_metadata
    ...

3. LiteAgent Output (`src/crewai/lite_agent.py`) and TaskOutput (`src/crewai/tasks/task_output.py`)

Code Quality: Both classes were extended to include completion_metadata and accessor methods that simplify retrieving multiple generations, log probabilities, and usage metrics.
Duplicates: The method implementations are nearly identical, violating DRY principles.

Improvements:

Extract the common extraction logic into a shared utility function or base class method.
Normalize "choices" so that downstream code does not need to handle mixed dict/object types at every access point.
Add more robust type hints or runtime checks to clarify and enforce expected metadata formats.

Example:

def get_generations(self) -> Optional[List[str]]:
    choices = self.completion_metadata.get("choices", [])
    generations = []
    for choice in choices:
        msg = getattr(choice, "message", None) or choice.get("message", {})
        content = getattr(msg, "content", "") or msg.get("content", "")
        generations.append(content)
    return generations or None

4. LLM Class (`src/crewai/llm.py`)

Code Quality: The core LLM class robustly supports returning full completion metadata while preserving existing simple text return modes.

Improvements:

Refactor repeated construction of full completion metadata dict into a dedicated helper method to reduce code duplication.
Add stricter runtime validation for structure of response objects to avoid errors when underlying third party shapes change unexpectedly.
Consider creating a completion metadata model or data class to improve maintainability, typing, and IDE support.
Enhance docstrings to explicitly mention the impact on return types when using return_full_completion.

Example:

def _make_completion_metadata(self, response, content):
    return {
        "content": content,
        "choices": getattr(response, "choices", []),
        "usage": getattr(response, "usage", None),
        "model": getattr(response, "model", None),
        "created": getattr(response, "created", None),
        "id": getattr(response, "id", None),
        "object": getattr(response, "object", "chat.completion"),
        "system_fingerprint": getattr(response, "system_fingerprint", None),
    }

5. Utilities (`src/crewai/utilities/agent_utils.py`)

Code Quality: Extended get_llm_response to support full completion metadata and added consistent error checking.
Improvements:
- Unify empty or invalid response checks to a single place for cleaner logic.
- Optionally add logging or debugging when full metadata responses are returned.
- In process_llm_response, make explicit the expectation of either dict or string input and clarify processing steps.

6. XML Parser Utilities (`src/crewai/utilities/xml_parser.py`)

Code Quality: Well-designed regex-based utilities for extracting, removing, and stripping XML-like tags, with support for attributes.
Improvements:
- Document explicitly that these utilities assume well-formed, flat pseudo-XML and may not support nested or malformed tags fully.
- (Optional) Add warnings or errors when tags are unbalanced or malformed to improve robustness.
- Consider adding functionality or documentation about handling nested tags or escaping.

7. Tests

Coverage: Thorough unit and integration tests cover LLM features, agent interactions, metadata extraction, XML parsing, and more.
Improvements:
- Refactor repeated mock completion setup into pytest fixtures to simplify test code and improve reuse.
- Add negative tests for malformed or incomplete completion metadata cases to ensure robustness.
- Include docstrings on all test methods for clarity and ease of understanding test intent.

Historical and Contextual Notes

This PR addresses issue #3052, responding to user requests on how to obtain multiple LLM generations or tag-structured outputs.
The approach respects backward compatibility by using optional parameters and toggles.
The XML parser is a new lightweight parsing utility tailored for the typical form of agent outputs, avoiding heavy XML dependencies.
Tests confirm robust evidence of correct and expected behavior in expanded features.

Summary Table of Recommendations

Area	Issue / Suggestion	Suggested Improvement
`agent.py`	Repetitive param assignment and lack of validation	Refactor looping assignments and add validation
`llm.py`	Code duplication constructing completion dict	Extract helper method for dict construction
`lite_agent.py`, `task_output.py`	Duplicate metadata extraction methods	Extract shared utilities, normalize choice data
`agent_utils.py`	Repeated error checking	Consolidate error handling and log selectively
`xml_parser.py`	Limited malformed tag handling, no nested support	Document limitations, consider adding error warnings
Tests	Repeated mocks, missing negative cases	Use fixtures, add edge case tests and docstrings
Examples	Code duplication in print, missing type hints	Helper functions, add type hints

Final Remarks

This PR is a high-quality, impactful enhancement that meaningfully expands the crewAI LLM integration and output utility. It carefully preserves legacy behavior while enabling advanced use cases such as multiple generations, confidence metrics via logprobs, and easily parsed structured reasoning via XML tags.

Addressing the above noted improvements around DRY principles, validation, and documentation will improve maintainability and robustness. The comprehensive test coverage demonstrates attention to quality.

Thank you for considering these points; happy to discuss or help with any follow-up refinements!

- Remove unused imports from test and example files - Fix f-string formatting in examples - Add proper type handling for Union[str, Dict] in agent_utils.py - Fix XML parser test assertion to match expected output - Use isinstance() for proper type narrowing in LLM calls Co-Authored-By: João <joao@crewai.com>

- Fix type checker errors in reasoning_handler.py: handle Union[str, dict] response types - Fix type checker error in crew_chat.py: convert final_response to string for dict - Update test_task_callback_returns_task_output to include completion_metadata field - Fix integration test attribute access in test_lite_agent_with_xml_extraction Co-Authored-By: João <joao@crewai.com>

- Fix test_crew_with_llm_parameters: mock _run_sequential_process instead of kickoff to avoid circular mocking - Fix test_lite_agent_with_xml_extraction: access result.raw instead of result.output for LiteAgentOutput Co-Authored-By: João <joao@crewai.com>

…string - Mock _invoke_loop to return proper AgentFinish object with output attribute - This should resolve the 'str' object has no attribute 'output' error in CI Co-Authored-By: João <joao@crewai.com>

devin-ai-integration · 2025-07-02T16:14:24Z

Closing due to inactivity for more than 7 days. Configure here.

devin-ai-integration bot and others added 4 commits June 24, 2025 05:21

Fix LiteAgent test mocking: return AgentFinish object instead of raw …

76fea44

…string - Mock _invoke_loop to return proper AgentFinish object with output attribute - This should resolve the 'str' object has no attribute 'output' error in CI Co-Authored-By: João <joao@crewai.com>

devin-ai-integration bot closed this Jul 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement LLM generations, logprobs, and XML parsing features #3053

Implement LLM generations, logprobs, and XML parsing features #3053

Uh oh!

devin-ai-integration bot commented Jun 24, 2025

Uh oh!

joaomdmoura commented Jun 24, 2025

Uh oh!

mplachta commented Jun 24, 2025

Uh oh!

devin-ai-integration bot commented Jul 2, 2025

Uh oh!

Uh oh!

Implement LLM generations, logprobs, and XML parsing features #3053

Implement LLM generations, logprobs, and XML parsing features #3053

Uh oh!

Conversation

devin-ai-integration bot commented Jun 24, 2025

Implement LLM Generations, Logprobs, and XML Parsing Features

🚀 Features Added

1. LLM Multiple Generations Support

2. Agent-Level LLM Parameter Control

3. XML Content Extraction Utility

4. Enhanced Output Classes

📝 Usage Examples

Multiple Generations

XML Tag Extraction

Log Probabilities

🧪 Testing

🔄 Backward Compatibility

📁 Files Changed

Core Implementation

New Utilities

Tests and Examples

🔗 Related

✅ Verification

Uh oh!

joaomdmoura commented Jun 24, 2025

Code Review Comment

Overview

Key Components Analysis

1. LLM Class Enhancements

2. XML Parser Implementation

3. Agent Class Updates

4. Task Output Enhancement

Testing Coverage Analysis

Performance Considerations

Security Recommendations

Documentation Recommendations

Uh oh!

mplachta commented Jun 24, 2025

Code Review for PR #3053: Implement LLM Generations, Logprobs, and XML Parsing Features

Summary of Key Findings

Detailed Review and Improvement Suggestions by Component

1. Examples (examples/llm_generations_example.py)

2. Agent Class (src/crewai/agent.py)

3. LiteAgent Output (src/crewai/lite_agent.py) and TaskOutput (src/crewai/tasks/task_output.py)

4. LLM Class (src/crewai/llm.py)

5. Utilities (src/crewai/utilities/agent_utils.py)

6. XML Parser Utilities (src/crewai/utilities/xml_parser.py)

7. Tests

Historical and Contextual Notes

Summary Table of Recommendations

Final Remarks

Uh oh!

devin-ai-integration bot commented Jul 2, 2025

Uh oh!

Uh oh!

1. Examples (`examples/llm_generations_example.py`)

2. Agent Class (`src/crewai/agent.py`)

3. LiteAgent Output (`src/crewai/lite_agent.py`) and TaskOutput (`src/crewai/tasks/task_output.py`)

4. LLM Class (`src/crewai/llm.py`)

5. Utilities (`src/crewai/utilities/agent_utils.py`)

6. XML Parser Utilities (`src/crewai/utilities/xml_parser.py`)