
Major changes in the documentation and schemas created #99


Open · wants to merge 68 commits into base: main

Conversation

@ypriverol (Member) commented Jun 24, 2025

PR Type

Enhancement


Description

  • Major CLI restructuring with organized command groups

  • Enhanced logging system with structured output support

  • New quantms workflow conversion command

  • Improved error handling and verbose logging
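The enhanced logging described above could be wired up along these lines. This is a hypothetical sketch of a `get_logger`-style utility (the actual `quantmsio/utils/logger.py` module may differ); the format string and the `verbose` flag are illustrative assumptions:

```python
import logging

def get_logger(name: str, verbose: bool = False) -> logging.Logger:
    """Return a named logger with a structured, pipe-delimited format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger

log = get_logger("quantmsio.demo", verbose=True)
log.debug("verbose logging enabled")
```

A CLI `--verbose` flag would then simply toggle the `verbose` argument when the command group initializes its logger.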


Changes walkthrough 📝

Relevant files

Enhancement (28 files)

  • quantms.py: Add quantms workflow conversion command (+318/-0)
  • diann.py: Add DIA-NN conversion commands (+256/-0)
  • psm.py: Add PSM conversion and comparison commands (+184/-0)
  • feature.py: Add feature conversion command (+167/-0)
  • plot.py: Add visualization plotting commands (+207/-0)
  • quantmsioc.py: Restructure CLI with organized command groups (+87/-42)
  • de.py: Add differential expression transformation command (+128/-0)
  • stats.py: Add statistical analysis commands (+114/-0)
  • maxquant.py: Update MaxQuant conversion commands (+31/-21)
  • psm.py: Add enhanced logging to PSM processing (+17/-6)
  • feature.py: Add enhanced logging to feature processing (+19/-1)
  • project.py: Add project management commands (+84/-0)
  • ae.py: Add absolute expression transformation command (+82/-0)
  • format.py: Update schema with nullable fields (+6/-0)
  • fragpipe.py: Update FragPipe PSM conversion method (+24/-21)
  • gene.py: Add gene mapping transformation command (+66/-0)
  • project.py: Update project save methods (+22/-7)
  • fragpipe.py: Update FragPipe conversion command (+15/-7)
  • uniprot.py: Add UniProt mapping transformation command (+56/-0)
  • ibaq.py: Add IBAQ transformation command (+55/-0)
  • spectra.py: Add spectra transformation command (+58/-0)
  • anndata.py: Update AnnData merge command (+20/-9)
  • attach.py: Update file attachment command (+6/-6)
  • mztab.py: Add logging to MzTab class (+3/-1)
  • logger.py: Add logger utility module (+10/-0)
  • __init__.py: Add convert commands package (+3/-0)
  • __init__.py: Add transform commands package (+3/-0)
  • __init__.py: Add utils commands package (+3/-0)

Bug fix (1 file)

  • pride_utils.py: Fix regex patterns with raw strings (+2/-2)

Documentation (7 files)

  • feature.avsc: Update feature schema with scan format (+18/-10)
  • README.md: Major documentation update with usage examples (+168/-1)
  • psm.avsc: Update PSM schema with record types (+11/-7)
  • README_LOGGING.md: Add comprehensive logging documentation (+127/-0)
  • differential.avsc: Add differential expression schema (+51/-0)
  • ibaq_usecase.adoc: Update IBAQ use case documentation (+7/-7)
  • absolute.avsc: Add absolute expression schema (+46/-0)

Dependencies (2 files)

  • enviroment.yml: Update DuckDB version requirement (+1/-1)
  • pyproject.toml: Update Python and PyOpenMS version requirements (+2/-2)

Additional files (13 files, removed)

  • ae_command.py (+0/-60)
  • de_command.py (+0/-67)
  • diann_command.py (+0/-191)
  • feature_command.py (+0/-102)
  • generate_gene_message_command.py (+0/-49)
  • generate_spectra_message_command.py (+0/-46)
  • ibaq_command.py (+0/-52)
  • map_latest_uniport_command.py (+0/-44)
  • plot_command.py (+0/-126)
  • project_command.py (+0/-81)
  • psm_command.py (+0/-79)
  • quantms_command.py (+0/-140)
  • statistic_command.py (+0/-104)

Summary by CodeRabbit

    • New Features

      • Introduced CLI command groups for conversion, transformation, visualization, statistics, and project management, enhancing command-line organization.
      • Added support for MaxQuant and quantms project data conversion with protein group processing and batch workflows.
      • Added new CLI commands for gene and spectrum mapping, UniProt mapping, merging absolute expression files, and differential expression conversion.
      • Introduced a new optimized mzTab parser with DuckDB integration and a protein group quantification module using SQL aggregation.
      • Added logger configuration utility and enhanced resource management in core components.
      • Added comprehensive GitHub Actions workflows for scheduled, unit, integration, and full test suites.
    • Improvements

      • Refactored CLI commands for consistent hyphenated option naming and grouped commands logically.
      • Enhanced data schemas with nullable fields, renamed intensity structures, and added peptide and feature counts for protein groups.
      • Improved logging, error handling, and resource cleanup across modules.
      • Updated documentation with detailed user, developer, logging, and implementation guides.
      • Reorganized imports and standardized code style.
    • Bug Fixes

      • Corrected typographical errors in documentation and CLI command examples.
      • Fixed regex pattern usage in gene name extraction for robustness.
    • Tests

      • Added extensive integration tests covering full conversion workflows for LFQ and TMT datasets.
      • Added unit and integration tests for MaxQuant protein group transformations and msstats input processing.
      • Ensured proper resource cleanup in tests to avoid dangling database connections.
    • Chores

      • Updated dependency versions and raised minimum Python requirement to 3.10.
      • Expanded .gitignore to exclude additional temporary and generated files.
      • Removed deprecated CLI commands and files, consolidating functionality into new modules.

    @coderabbitai (bot) left a comment

    Actionable comments posted: 23

    ♻️ Duplicate comments (7)
    quantmsio/commands/utils/project.py (1)

    77-79: Fix typos in parameter names.

    The parameter names have typos: sortware_name and sortware_version should be software_name and software_version.

    -    project_handler.add_software_provider(
    -        sortware_name=software_name, sortware_version=software_version
    -    )
    +    project_handler.add_software_provider(
    +        software_name=software_name, software_version=software_version
    +    )
    quantmsio/commands/transform/de.py (1)

    67-68: Improve exception chaining.

    Use raise ... from e to preserve the original exception context for better debugging.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
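For illustration, `raise ... from e` records the original exception as `__cause__`, so the full traceback chain survives re-raising. This is a generic sketch, not tied to the click command above; `parse` is a made-up function:

```python
def parse(value: str) -> int:
    """Convert a string to int, wrapping failures in a higher-level error."""
    try:
        return int(value)
    except ValueError as e:
        # "from e" links the original ValueError as __cause__,
        # so debuggers and logs show both tracebacks.
        raise RuntimeError(f"could not parse {value!r}") from e

try:
    parse("not-a-number")
except RuntimeError as err:
    assert isinstance(err.__cause__, ValueError)
```

Without `from e`, Python still shows the original error as implicit context, but an explicit chain distinguishes "raised because of" from "raised while handling".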
    quantmsio/commands/convert/maxquant.py (3)

    98-99: Fix exception chaining to preserve error context.

    When re-raising exceptions, use from e to preserve the error chain for better debugging.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e

    217-218: Fix exception chaining to preserve error context.

    Same issue as above - use proper exception chaining.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e

    319-320: Fix exception chaining to preserve error context.

    Same issue as above - use proper exception chaining.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
    quantmsio/commands/convert/diann.py (2)

    91-93: Fix exception chaining to preserve error context.

    When re-raising exceptions, use from e to preserve the error chain for better debugging.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e

    144-146: Fix exception chaining to preserve error context.

    Same issue as above - use proper exception chaining.

    -        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
    +        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
    🧹 Nitpick comments (33)
    quantmsio/core/quantms/psm.py (1)

    1-3: Remove unused import.

    The os import on line 2 is unused and should be removed. The added logging import aligns well with the PR's enhanced-logging objectives.

     import logging
    -import os
     import re
    quantmsio/commands/transform/uniprot.py (2)

    50-51: Redundant parameter validation.

    The manual validation of required parameters is unnecessary since click's required=True already handles this validation and will raise appropriate errors.

    -    if not all([feature_file, fasta, output_folder]):
    -        raise click.UsageError("Please provide all required parameters")
    -

    56-58: Remove unused variable assignment.

    The output_path variable is assigned but never used. The filename is passed directly to the core function.

         filename = create_uuid_filename(output_prefix, ".feature.parquet")
    -    output_path = output_folder / filename
         map_peptide_to_protein(str(feature_file), str(fasta), str(output_folder), filename)
    README.md (2)

    114-114: Minor punctuation formatting issue.

    There's a loose punctuation mark in the environment variables list formatting.


    132-144: Add language specification to code block.

    The project structure code block should specify a language for better rendering and compliance with markdown standards.

    -```
    +```text
     quantmsio/
     ├── __init__.py
     ├── quantmsioc.py          # CLI entry point
    .github/workflows/python-package.yml (1)

    72-72: Add missing newline at end of file.

    The file should end with a newline character for better POSIX compliance.

           env:
    -        PYTHONIOENCODING: utf-8
    +        PYTHONIOENCODING: utf-8
    +
    quantmsio/core/quantms/feature.py (1)

    61-62: Use 'yield from' for cleaner generator delegation.

    The current implementation yields each element individually. Using yield from is more efficient and cleaner.

    -        for msstats in msstats_in.generate_msstats_in(file_num, protein_str):
    -            yield msstats
    +        yield from msstats_in.generate_msstats_in(file_num, protein_str)
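In isolation, the delegation pattern the comment recommends looks like this (a minimal sketch with made-up generator names):

```python
def inner():
    yield 1
    yield 2

def outer_loop():
    # Element-by-element re-yielding: works, but verbose.
    for x in inner():
        yield x

def outer_delegate():
    # Delegation: shorter, and also forwards send()/throw() correctly.
    yield from inner()

assert list(outer_loop()) == list(outer_delegate()) == [1, 2]
```

Beyond brevity, `yield from` matters when the inner generator participates in coroutine-style `send()`/`throw()` protocols, which a plain loop does not forward.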
    .github/workflows/integration-tests.yml (1)

    23-23: Remove trailing spaces and add missing newline.

    The file has trailing spaces on multiple lines and is missing a newline at the end.

    Remove trailing spaces from all affected lines and add a newline at the end of the file:

         timeout-minutes: 60  # 1 hour timeout for the entire job
    -    
    +
         
         steps:
         - uses: actions/checkout@v4
    -    
    +
         
         - name: Set up Python 3.10
           uses: actions/setup-python@v5
           with:
             python-version: "3.10"
    -        
    +
             
         # Install package in development mode
         poetry install
    -        
    +
             
         - name: Run Full Conversion Suite Tests
           if: ${{ github.event.inputs.test_type == 'all' || github.event.inputs.test_type == 'full-conversion-suite' || github.event_name == 'release' }}
           run: |
             poetry run pytest -vv tests/test_full_conversion_suite.py --timeout=900
           env:
             PYTHONIOENCODING: utf-8
    -        
    +
             
         - name: Run LFQ Tests Only
           if: ${{ github.event.inputs.test_type == 'lfq-only' }}
           run: |
             poetry run pytest -vv tests/test_full_conversion_suite.py -k "lfq" --timeout=900
           env:
             PYTHONIOENCODING: utf-8
    -        
    +
             
    -    - name: Run TMT Tests Only  
    +    - name: Run TMT Tests Only
           if: ${{ github.event.inputs.test_type == 'tmt-only' }}
           run: |
             poetry run pytest -vv tests/test_full_conversion_suite.py -k "tmt" --timeout=900
           env:
             PYTHONIOENCODING: utf-8
    -        
    +
             
         - name: Run All Integration Tests
           if: ${{ github.event.inputs.test_type == 'all' || github.event_name == 'release' }}
           run: |
             poetry run pytest -vv -m "integration" --timeout=900
           env:
    -        PYTHONIOENCODING: utf-8 
    +        PYTHONIOENCODING: utf-8
    +

    Also applies to: 26-26, 31-31, 40-40, 47-47, 54-54, 55-55, 61-61, 67-67

    quantmsio/commands/utils/project.py (1)

    68-69: Remove redundant parameter validation.

    The parameter validation is unnecessary since Click already enforces required parameters with required=True.

    -    if not all([project_accession, sdrf_file, output_folder]):
    -        raise click.UsageError("Please provide all required parameters")
    -
    quantmsio/commands/transform/de.py (1)

    16-26: Consider reducing function parameters.

    The function has 9 parameters, which exceeds the recommended limit. Consider grouping related parameters into a configuration object or dataclass.

    from dataclasses import dataclass
    from typing import Optional
    from pathlib import Path
    
    @dataclass
    class DifferentialConversionConfig:
        msstats_file: Path
        sdrf_file: Path
        project_file: Optional[Path] = None
        protein_file: Optional[Path] = None
        fdr_threshold: float = 0.05
        output_folder: Path = None
        output_prefix: Optional[str] = None
        delete_existing: bool = True
        verbose: bool = False
    
    def convert_msstats_differential(config: DifferentialConversionConfig) -> None:
        # Function implementation using config.attribute_name
    quantmsio/commands/utils/stats.py (1)

    9-9: Address pylint suggestion for dictionary literal.

    The static analysis tool suggests using dictionary literal syntax instead of dict() call.

    -CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
    +CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
    .github/workflows/python-app.yml (2)

    35-35: Fix YAML indentation consistency.

    The YAML file has inconsistent indentation. Steps should be indented by 6 spaces, not 4.

         steps:
    -    - uses: actions/checkout@v4
    +      - uses: actions/checkout@v4

    Apply this indentation fix to all three job definitions.

    Also applies to: 64-64, 88-88


    106-106: Add missing newline at end of file.

    The file is missing a newline character at the end, which is a common YAML best practice.

             poetry run pytest -vv --timeout=900
           env:
             PYTHONIOENCODING: utf-8
    +
    tests/test_full_conversion_suite.py (3)

    51-56: Address unused loop variables.

    The static analysis correctly identifies unused loop variables that should be renamed with underscore prefix.

    -    for dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
    -        for file_type, file_path in files.items():
    +    for _dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
    +        for _file_type, file_path in files.items():

    93-93: Fix f-string without placeholders.

    The f-string on line 93 doesn't contain any placeholders and should be a regular string.

    -        print(f"[TIMEOUT] Timeout after 10 minutes")
    +        print("[TIMEOUT] Timeout after 10 minutes")

    84-94: Simplify conditional return structure.

    The else clause after return is unnecessary and can be simplified.

            if result.returncode == 0:
                print(f"[OK] Success in {duration:.2f}s")
                return True, duration, result.stdout, result.stderr
    -        else:
    -            print(f"[FAIL] Failed in {duration:.2f}s")
    -            print(f"Error: {result.stderr}")
    -            return False, duration, result.stdout, result.stderr
    +        
    +        print(f"[FAIL] Failed in {duration:.2f}s")
    +        print(f"Error: {result.stderr}")
    +        return False, duration, result.stdout, result.stderr
    quantmsio/commands/convert/quantms.py (1)

    191-201: Consider grouping related parameters for better maintainability.

    The function has 9 parameters, which is acceptable for a CLI command but could become harder to maintain as it grows. Consider grouping related parameters into configuration objects if more options are added in the future.

    For future consideration, you could group related parameters:

    • Input parameters: input_file, msstats_file, sdrf_file
    • Output parameters: output_folder, output_prefix
    • Processing options: compute_topn, compute_ibaq, topn
    • General options: verbose

    This would make the function signature cleaner and more extensible.

    quantmsio/commands/utils/plot.py (1)

    14-14: Use dict literal instead of dict() call.

    For better readability and slight performance improvement, use a dict literal.

    -CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
    +CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
    docs/IMPLEMENTATION_DETAILS.md (3)

    73-73: Use en dash for number ranges.

    For better typography, use an en dash (–) instead of a hyphen (-) when specifying ranges.

    -- **Before:** 7+ minutes for large datasets
    -- **After:** 10-45 seconds for the same datasets
    +- **Before:** 7+ minutes for large datasets  
    +- **After:** 10–45 seconds for the same datasets

    177-177: Add language specifiers to fenced code blocks.

    Several code blocks are missing language specifiers, which helps with syntax highlighting.

    For the code blocks starting at lines 177, 271, 281, and 291, add language specifiers:

    Line 177:

    -```
    +```text

    Lines 271, 281, 291:

    -```
    +```text

    This improves readability with proper syntax highlighting.

    Also applies to: 271-271, 281-281, 291-291


    270-270: Remove trailing punctuation from headings.

    Markdown headings should not have trailing punctuation according to common style guides.

    -#### Input (Raw msstats data):
    +#### Input (Raw msstats data)
    
    -#### After Channel Mapping:
    +#### After Channel Mapping
    
    -#### After Aggregation (Final output):
    +#### After Aggregation (Final output)

    Also applies to: 280-280, 290-290

    tests/test_msstats_in.py (4)

    88-88: Remove unnecessary f-string prefixes.

    These strings don't contain any placeholders, so the f-prefix is not needed.

    -        print(f"\nLFQ test completed successfully!")
    +        print("\nLFQ test completed successfully!")
    ...
    -        print(f"\nTMT test completed successfully!")
    +        print("\nTMT test completed successfully!")
    ...
    -    print(f"\nDataset Comparison:")
    +    print("\nDataset Comparison:")

    Also applies to: 331-331, 403-403


    236-240: Remove unused loop variable.

    The count variable is not used within the loop body.

    -                for channel, count in intensities_channel_counts.items():
    +                for channel, _ in intensities_channel_counts.items():
                         if channel not in channel_counts:
                             channel_counts[channel] = 0

    251-254: Remove unused variable assignment.

    The file_channel_counts variable is assigned but never used.

    -            # Original counting from channel column
    -            file_channel_counts = msstats_batch.groupby(
    -                ["reference_file_name", "channel"]
    -            ).size()
    -            # Note: Not using this since it only shows surviving channels after deduplication
    +            # Note: Not using groupby counting since it only shows surviving channels after deduplication

    282-284: Simplify dictionary key iteration.

    Remove unnecessary .keys() when iterating over dictionary keys.

    -                for channel in file_channels.keys()
    +                for channel in file_channels
    quantmsio/core/quantms/mztab.py (2)

    495-495: Simplify dictionary get() call.

    The default None is redundant when using dict.get().

    -                        columns = kwargs.get("usecols", None)
    +                        columns = kwargs.get("usecols")

    83-126: Consider reducing instance attributes.

    The class has 22 instance attributes, exceeding the recommended limit of 7. Consider grouping related attributes into data classes.

    Would you like me to help refactor this by grouping related attributes? For example:

    • Position tracking attributes into a PositionTracker class
    • DuckDB-related attributes into a DuckDBConfig class
    • Metadata cache into a MetadataCache class
    quantmsio/core/quantms/msstats_in.py (3)

    14-27: Consider using Python 3 style super() and adding type hints

    The initialization is well-structured, but could benefit from modern Python conventions and type hints for better code clarity.

     class MsstatsIN(DuckDB):
    +    """MSstats input data processor using DuckDB for efficient data handling.
    +    
    +    Supports both LFQ and isobaric labeling (TMT/iTRAQ) experiments.
    +    """
    +    
         def __init__(
             self,
             report_path: Union[Path, str],
             sdrf_path: Union[Path, str],
    -        duckdb_max_memory="16GB",
    -        duckdb_threads=4,
    -    ):
    -        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
    +        duckdb_max_memory: str = "16GB",
    +        duckdb_threads: int = 4,
    +    ) -> None:
    +        super().__init__(report_path, duckdb_max_memory, duckdb_threads)
             self._sdrf = SDRFHandler(sdrf_path)
             self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
             self._sample_map = self._sdrf.get_sample_map_run()
             self._optimized_setup_done = False

    219-219: Fix whitespace before colon in slicing operations

    PEP 8 recommends no whitespace before colons in slicing operations.

    -            references[i : i + file_num] for i in range(0, len(references), file_num)
    +            references[i:i + file_num] for i in range(0, len(references), file_num)
    -            yield file_list[i : i + batch_size]
    +            yield file_list[i:i + batch_size]

    Also applies to: 533-533


    671-685: Consider relying on context manager for cleanup instead of del

    The __del__ method is not guaranteed to be called and can cause issues with garbage collection. Since you already have context manager support, consider emphasizing its use in documentation.

         def __del__(self):
             """Cleanup database views and tables."""
    +        # Note: __del__ is not guaranteed to be called. Prefer using context manager.
             try:
                 if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
                     self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
                     self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
                     self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
                     self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
                     self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
                 # Always call parent cleanup to close connection and remove database file
                 if hasattr(self, "_duckdb") and self._duckdb:
                     self.destroy_duckdb_database()
             except:
    -            pass
    +            pass  # Suppress exceptions in destructor to prevent issues during GC
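A minimal sketch of the context-manager pattern the comment prefers over `__del__`, using a hypothetical resource wrapper (not the actual MsstatsIN class): cleanup happens deterministically in `__exit__`, regardless of exceptions, instead of at garbage-collection time.

```python
class DBHandle:
    """Hypothetical resource wrapper: deterministic cleanup via __exit__."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised in the with-block

    def close(self):
        # Drop views/tables and close the connection here.
        self.closed = True

with DBHandle() as db:
    assert not db.closed
assert db.closed  # cleanup ran as soon as the with-block exited
```

With this shape, `__del__` becomes a best-effort safety net rather than the primary cleanup path.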
    quantmsio/core/quantms/pg.py (3)

    60-66: Improve exception handling in destructor

    Using bare except in del is acceptable but could be improved with contextlib.suppress.

    +from contextlib import suppress
    +
         def __del__(self):
             """Destructor to ensure cleanup."""
    -        try:
    +        with suppress(Exception):
                 self.cleanup()
    -        except:
    -            pass  # Ignore errors in destructor

    229-273: Consider using ExitStack for cleaner resource management

    The current implementation manually tracks file handles. Python's contextlib.ExitStack could simplify this.

    from contextlib import ExitStack
    
    def _safe_file_open(self, file_path, mode="r"):
        """Safely open files with automatic resource management."""
        import gzip
        
        # Handle both compressed and uncompressed files
        if str(file_path).endswith(".gz"):
            return gzip.open(file_path, "rt" if "t" not in mode else mode, encoding="utf-8")
        else:
            return open(file_path, mode, encoding="utf-8")

    Then use it directly with with statements without the complex tracking logic.
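    A runnable variant of that idea, with an illustrative `safe_file_open` helper (the name and signature are assumptions, not the PR's actual API): the function returns an open handle for either `.gz` or plain files, and the caller's `with` statement replaces any manual tracking.

    ```python
    import gzip
    import os
    import tempfile

    def safe_file_open(file_path, mode="rt", encoding="utf-8"):
        """Open .gz or plain text files transparently (illustrative helper)."""
        if str(file_path).endswith(".gz"):
            return gzip.open(file_path, mode, encoding=encoding)
        return open(file_path, mode, encoding=encoding)

    # Usage: the with-statement guarantees the handle is closed.
    path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write("hello")
    with safe_file_open(path) as fh:
        assert fh.read() == "hello"
    ```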


    583-583: Fix whitespace before colon in slicing operation

    -                yield unique_files[i : i + batch_size]
    +                yield unique_files[i:i + batch_size]
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 57ea767 and 583005d.

    ⛔ Files ignored due to path filters (14)
    • tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/maxquant/maxquant_full/evidence.txt.gz is excluded by !**/*.gz
    • tests/examples/maxquant/maxquant_simple/sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz is excluded by !**/*.gz
    • tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz is excluded by !**/*.gz
    • tests/examples/quantms/dda-lfq-small/PXD040438.sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-lfq-small/PXD040438_msstats_in.csv is excluded by !**/*.csv
    • tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz is excluded by !**/*.gz
    • tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz is excluded by !**/*.gz
    • tests/examples/quantms/dda-plex-small/MSV000079033-Blood-Plasma-iTRAQ.sdrf.tsv is excluded by !**/*.tsv
    • tests/examples/quantms/dda-plex-small/MSV000079033_msstats_in.csv is excluded by !**/*.csv
    📒 Files selected for processing (59)
    • .github/workflows/integration-tests.yml (1 hunks)
    • .github/workflows/python-app.yml (2 hunks)
    • .github/workflows/python-package.yml (1 hunks)
    • .gitignore (1 hunks)
    • README.md (1 hunks)
    • docs/IMPLEMENTATION_DETAILS.md (1 hunks)
    • docs/README.adoc (5 hunks)
    • docs/feature.avsc (6 hunks)
    • docs/pg.avsc (3 hunks)
    • docs/tools.adoc (2 hunks)
    • pyproject.toml (3 hunks)
    • quantmsio/commands/convert/__init__.py (1 hunks)
    • quantmsio/commands/convert/diann.py (1 hunks)
    • quantmsio/commands/convert/fragpipe.py (1 hunks)
    • quantmsio/commands/convert/maxquant.py (1 hunks)
    • quantmsio/commands/convert/quantms.py (1 hunks)
    • quantmsio/commands/convert/quantms_project.py (1 hunks)
    • quantmsio/commands/transform/anndata.py (1 hunks)
    • quantmsio/commands/transform/de.py (1 hunks)
    • quantmsio/commands/transform/gene.py (1 hunks)
    • quantmsio/commands/transform/ibaq.py (1 hunks)
    • quantmsio/commands/transform/spectra.py (1 hunks)
    • quantmsio/commands/transform/uniprot.py (1 hunks)
    • quantmsio/commands/utils/plot.py (1 hunks)
    • quantmsio/commands/utils/project.py (1 hunks)
    • quantmsio/commands/utils/stats.py (1 hunks)
    • quantmsio/core/ae.py (1 hunks)
    • quantmsio/core/combiner.py (1 hunks)
    • quantmsio/core/common.py (3 hunks)
    • quantmsio/core/de.py (1 hunks)
    • quantmsio/core/diann/diann.py (3 hunks)
    • quantmsio/core/duckdb.py (1 hunks)
    • quantmsio/core/format.py (9 hunks)
    • quantmsio/core/fragpipe/fragpipe.py (3 hunks)
    • quantmsio/core/maxquant/maxquant.py (3 hunks)
    • quantmsio/core/msstats_in.py (0 hunks)
    • quantmsio/core/mztab.py (0 hunks)
    • quantmsio/core/project.py (2 hunks)
    • quantmsio/core/quantms/feature.py (4 hunks)
    • quantmsio/core/quantms/msstats_in.py (1 hunks)
    • quantmsio/core/quantms/mztab.py (1 hunks)
    • quantmsio/core/quantms/pg.py (1 hunks)
    • quantmsio/core/quantms/psm.py (5 hunks)
    • quantmsio/core/sdrf.py (2 hunks)
    • quantmsio/operate/plots.py (1 hunks)
    • quantmsio/operate/query.py (2 hunks)
    • quantmsio/operate/statistics.py (1 hunks)
    • quantmsio/operate/tools.py (1 hunks)
    • quantmsio/quantmsioc.py (3 hunks)
    • quantmsio/utils/file_utils.py (1 hunks)
    • tests/test_diann.py (2 hunks)
    • tests/test_feature.py (1 hunks)
    • tests/test_full_conversion_suite.py (1 hunks)
    • tests/test_maxquant.py (2 hunks)
    • tests/test_msstats_in.py (1 hunks)
    • tests/test_project.py (1 hunks)
    • tests/test_psm.py (1 hunks)
    • tests/test_query.py (1 hunks)
    • tests/test_sdrf.py (4 hunks)
    💤 Files with no reviewable changes (2)
    • quantmsio/core/msstats_in.py
    • quantmsio/core/mztab.py
    ✅ Files skipped from review due to trivial changes (13)
    • tests/test_query.py
    • quantmsio/operate/plots.py
    • quantmsio/core/ae.py
    • quantmsio/core/de.py
    • quantmsio/utils/file_utils.py
    • tests/test_psm.py
    • quantmsio/core/duckdb.py
    • tests/test_project.py
    • tests/test_feature.py
    • quantmsio/core/combiner.py
    • quantmsio/operate/tools.py
    • tests/test_sdrf.py
    • docs/tools.adoc
    🚧 Files skipped from review as they are similar to previous changes (11)
    • quantmsio/commands/convert/init.py
    • pyproject.toml
    • quantmsio/commands/transform/anndata.py
    • quantmsio/commands/transform/ibaq.py
    • quantmsio/commands/transform/spectra.py
    • quantmsio/core/project.py
    • docs/pg.avsc
    • quantmsio/quantmsioc.py
    • quantmsio/commands/convert/fragpipe.py
    • quantmsio/core/format.py
    • docs/feature.avsc
    🧰 Additional context used
    🧬 Code Graph Analysis (10)
    quantmsio/operate/query.py (2)
    quantmsio/utils/pride_utils.py (3)
    • generate_gene_name_map (35-55)
    • get_gene_accessions (58-71)
    • get_unanimous_name (19-32)
    quantmsio/operate/statistics.py (1)
    • close (108-112)
    tests/test_diann.py (3)
    quantmsio/core/diann/diann.py (2)
    • get_report_from_database (48-65)
    • generate_pg_matrix (109-210)
    quantmsio/core/quantms/feature.py (1)
    • Feature (18-292)
    quantmsio/core/duckdb.py (1)
    • get_unique_references (66-69)
    quantmsio/operate/statistics.py (1)
    quantmsio/operate/query.py (1)
    • close (344-348)
    quantmsio/commands/convert/maxquant.py (3)
    quantmsio/core/maxquant/maxquant.py (5)
    • MaxQuant (54-750)
    • write_psm_to_file (362-373)
    • write_feature_to_file (380-399)
    • write_features_to_file (419-443)
    • write_protein_groups_to_file (684-711)
    quantmsio/core/project.py (1)
    • create_uuid_filename (59-62)
    quantmsio/utils/logger.py (1)
    • get_logger (8-10)
    quantmsio/commands/convert/diann.py (3)
    quantmsio/core/diann/diann.py (4)
    • DiaNNConvert (34-486)
    • write_feature_to_file (443-462)
    • write_features_to_file (464-486)
    • write_pg_matrix_to_file (423-441)
    quantmsio/core/project.py (1)
    • create_uuid_filename (59-62)
    quantmsio/utils/logger.py (1)
    • get_logger (8-10)
    quantmsio/commands/transform/de.py (3)
    quantmsio/core/de.py (6)
    • DifferentialExpressionHandler (33-269)
    • load_project_file (100-113)
    • load_msstats_file (81-98)
    • load_sdrf_file (255-261)
    • set_fdr_threshold (263-269)
    • convert_msstats_to_quantms (115-220)
    quantmsio/utils/file_utils.py (1)
    • extract_protein_list (21-38)
    quantmsio/utils/logger.py (1)
    • get_logger (8-10)
    quantmsio/commands/transform/gene.py (1)
    quantmsio/operate/tools.py (1)
    • generate_feature_of_gene (104-120)
    quantmsio/commands/utils/plot.py (1)
    quantmsio/operate/plots.py (5)
    • plot_distribution_of_ibaq (14-43)
    • plot_intensity_box_of_samples (180-217)
    • plot_intensity_distribution_of_samples (117-145)
    • plot_peptide_distribution_of_protein (148-177)
    • plot_peptides_of_lfq_condition (46-114)
    quantmsio/commands/utils/project.py (1)
    quantmsio/core/project.py (7)
    • check_directory (22-56)
    • populate_from_pride_archive (123-148)
    • populate_from_sdrf (282-288)
    • add_quantms_version (150-151)
    • add_software_provider (168-175)
    • add_sdrf_file (290-326)
    • save_updated_project_info (257-280)
    quantmsio/commands/utils/stats.py (1)
    quantmsio/operate/statistics.py (15)
    • IbaqStatistics (26-46)
    • ParquetStatistics (49-116)
    • get_number_of_proteins (10-11)
    • get_number_of_proteins (32-38)
    • get_number_of_proteins (80-92)
    • get_number_of_peptides (13-14)
    • get_number_of_peptides (62-66)
    • get_number_of_samples (16-17)
    • get_number_of_samples (40-46)
    • get_number_of_samples (74-78)
    • get_number_of_peptidoforms (19-20)
    • get_number_of_peptidoforms (68-72)
    • get_number_msruns (22-23)
    • get_number_msruns (94-98)
    • get_number_of_psms (100-106)
    🪛 Ruff (0.11.9)
    quantmsio/core/quantms/psm.py

    2-2: os imported but unused

    Remove unused import: os

    (F401)


    62-62: Use a context manager for opening files

    (SIM115)
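The SIM115 pattern flagged above can be shown with a minimal, hypothetical sketch (function names and the throwaway file are illustrative, not code from this PR): a bare `open()` leaks its handle if a later call raises, while a `with` block guarantees closure.

```python
import os
import tempfile

def count_lines_unsafe(path):
    handle = open(path)             # flagged by SIM115: handle leaks if the next line raises
    return len(handle.readlines())

def count_lines_safe(path):
    # the context manager closes the file even if readlines() raises
    with open(path) as handle:
        return len(handle.readlines())

# quick self-check against a throwaway file
_tmp = tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt")
_tmp.write("a\nb\nc\n")
_tmp.close()
lines_counted = count_lines_safe(_tmp.name)
os.unlink(_tmp.name)
```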

    quantmsio/commands/convert/maxquant.py

    99-99: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)


    218-218: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)


    320-320: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)
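The B904 findings can be sketched generically (the function and message below are illustrative, not taken from the PR): chaining with `raise ... from err` preserves the root cause as `__cause__`, so tracebacks distinguish the original error from failures inside the handler.

```python
def parse_intensity(raw: str) -> float:
    try:
        return float(raw)
    except ValueError as err:
        # "from err" keeps the original ValueError attached as __cause__
        raise RuntimeError(f"could not parse intensity: {raw!r}") from err

try:
    parse_intensity("n/a")
except RuntimeError as exc:
    cause_type = type(exc.__cause__).__name__
```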

    quantmsio/commands/convert/quantms.py

    86-86: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)


    140-140: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)


    249-249: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    quantmsio/core/maxquant/maxquant.py

    733-733: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    735-735: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    737-737: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    739-739: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    743-743: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    745-745: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)


    749-749: Use is and is not for type comparisons, or isinstance() for isinstance checks

    (E721)
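A hedged illustration of the E721 fix (the classes are invented for the example): `isinstance()` is the idiomatic replacement for comparing `type()` results with `==`. Note the behavioral difference it introduces — `isinstance()` also accepts subclasses, so if an exact-type check is genuinely intended, `type(x) is str` is the right spelling instead.

```python
def is_text_old(value):
    return type(value) == str      # flagged by E721

def is_text_new(value):
    return isinstance(value, str)  # preferred replacement

class ProteinId(str):              # a str subclass, purely illustrative
    pass

old_result = is_text_old(ProteinId("P12345"))  # exact-type check: subclass fails
new_result = is_text_new(ProteinId("P12345"))  # isinstance: subclass passes
```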

    quantmsio/commands/convert/quantms_project.py

    5-5: logging imported but unused

    Remove unused import: logging

    (F401)


    12-12: pandas imported but unused

    Remove unused import: pandas

    (F401)


    13-13: pyarrow imported but unused

    Remove unused import: pyarrow

    (F401)


    14-14: pyarrow.parquet imported but unused

    Remove unused import: pyarrow.parquet

    (F401)


    22-22: quantmsio.core.quantms.mztab.MzTabProteinGroups imported but unused

    Remove unused import: quantmsio.core.quantms.mztab.MzTabProteinGroups

    (F401)

    tests/test_msstats_in.py

    88-88: f-string without any placeholders

    Remove extraneous f prefix

    (F541)


    236-236: Loop control variable count not used within loop body

    (B007)


    251-251: Local variable file_channel_counts is assigned to but never used

    Remove assignment to unused variable file_channel_counts

    (F841)


    283-283: Use key in dict instead of key in dict.keys()

    Remove .keys()

    (SIM118)


    331-331: f-string without any placeholders

    Remove extraneous f prefix

    (F541)


    403-403: f-string without any placeholders

    Remove extraneous f prefix

    (F541)

    quantmsio/core/quantms/msstats_in.py

    68-68: Do not use bare except

    (E722)


    76-76: Local variable e is assigned to but never used

    Remove assignment to unused variable e

    (F841)


    105-105: Local variable mapping_df is assigned to but never used

    Remove assignment to unused variable mapping_df

    (F841)


    119-119: Local variable sample_df is assigned to but never used

    Remove assignment to unused variable sample_df

    (F841)


    260-260: Do not use bare except

    (E722)


    266-266: Do not use bare except

    (E722)


    683-683: Do not use bare except

    (E722)
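Narrowing a bare `except:` (E722) usually means naming only the errors the fallback is meant to handle, so `KeyboardInterrupt`, `SystemExit`, and genuine bugs still propagate. A self-contained sketch (names are illustrative, not the PR's code):

```python
def channel_to_int(channel: str, default: int = -1) -> int:
    """Parse a channel label as an int, falling back for non-numeric labels."""
    try:
        return int(channel)
    except (ValueError, TypeError):   # instead of a bare "except:"
        return default

ok = channel_to_int("3")
fallback = channel_to_int("TMT126")
```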

    tests/test_maxquant.py

    6-6: pytest imported but unused

    Remove unused import: pytest

    (F401)

    quantmsio/core/quantms/pg.py

    62-65: Use contextlib.suppress(Exception) instead of try-except-pass

    (SIM105)
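The SIM105 rewrite is mechanical: `contextlib.suppress` replaces a try/except/pass block whose only purpose is to swallow one exception type. A small, self-contained sketch with illustrative values:

```python
import contextlib

values = ["1.5", "n/a", "2.5"]
parsed = []
for raw in values:
    # equivalent to: try: parsed.append(float(raw))  except ValueError: pass
    with contextlib.suppress(ValueError):
        parsed.append(float(raw))
```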


    64-64: Do not use bare except

    (E722)


    180-180: Do not use bare except

    (E722)


    241-241: Use a context manager for opening files

    (SIM115)


    246-246: Use a context manager for opening files

    (SIM115)


    261-264: Use contextlib.suppress(ValueError) instead of try-except-pass

    (SIM105)


    270-270: Do not use bare except

    (E722)


    519-519: Local variable protein_df is assigned to but never used

    Remove assignment to unused variable protein_df

    (F841)

    quantmsio/core/quantms/mztab.py

    255-255: Use a context manager for opening files

    (SIM115)


    258-258: Use a context manager for opening files

    (SIM115)


    414-414: Use a context manager for opening files

    (SIM115)


    478-478: Use a context manager for opening files

    (SIM115)


    495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)

    Replace kwargs.get("usecols", None) with kwargs.get("usecols")

    (SIM910)
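The SIM910 simplification in one line: `dict.get` already returns `None` for a missing key, so spelling out the default is redundant.

```python
kwargs = {"sep": "\t"}
usecols_verbose = kwargs.get("usecols", None)  # flagged by SIM910
usecols_simple = kwargs.get("usecols")         # identical behavior
```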


    680-680: Do not use bare except

    (E722)


    691-691: Do not use bare except

    (E722)


    785-788: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)


    787-787: Do not use bare except

    (E722)


    792-795: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)


    794-794: Do not use bare except

    (E722)


    800-803: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)


    802-802: Do not use bare except

    (E722)

    tests/test_full_conversion_suite.py

    51-51: Loop control variable dataset_name not used within loop body

    Rename unused dataset_name to _dataset_name

    (B007)


    52-52: Loop control variable file_type not used within loop body

    Rename unused file_type to _file_type

    (B007)


    93-93: f-string without any placeholders

    Remove extraneous f prefix

    (F541)

    quantmsio/commands/convert/diann.py

    93-93: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)


    146-146: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    quantmsio/commands/transform/de.py

    68-68: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    quantmsio/commands/transform/uniprot.py

    57-57: Local variable output_path is assigned to but never used

    Remove assignment to unused variable output_path

    (F841)

    quantmsio/core/sdrf.py

    127-129: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    🪛 Flake8 (7.2.0)
    quantmsio/core/quantms/psm.py

    [error] 2-2: 'os' imported but unused

    (F401)

    quantmsio/commands/convert/quantms_project.py

    [error] 5-5: 'logging' imported but unused

    (F401)


    [error] 12-12: 'pandas as pd' imported but unused

    (F401)


    [error] 13-13: 'pyarrow as pa' imported but unused

    (F401)


    [error] 14-14: 'pyarrow.parquet as pq' imported but unused

    (F401)


    [error] 22-22: 'quantmsio.core.quantms.mztab.MzTabProteinGroups' imported but unused

    (F401)

    tests/test_msstats_in.py

    [error] 88-88: f-string is missing placeholders

    (F541)


    [error] 251-251: local variable 'file_channel_counts' is assigned to but never used

    (F841)


    [error] 331-331: f-string is missing placeholders

    (F541)


    [error] 403-403: f-string is missing placeholders

    (F541)

    quantmsio/core/quantms/msstats_in.py

    [error] 68-68: do not use bare 'except'

    (E722)


    [error] 76-76: local variable 'e' is assigned to but never used

    (F841)


    [error] 105-105: local variable 'mapping_df' is assigned to but never used

    (F841)


    [error] 119-119: local variable 'sample_df' is assigned to but never used

    (F841)


    [error] 219-219: whitespace before ':'

    (E203)


    [error] 260-260: do not use bare 'except'

    (E722)


    [error] 266-266: do not use bare 'except'

    (E722)


    [error] 533-533: whitespace before ':'

    (E203)


    [error] 683-683: do not use bare 'except'

    (E722)

    tests/test_maxquant.py

    [error] 6-6: 'pytest' imported but unused

    (F401)

    quantmsio/core/quantms/pg.py

    [error] 64-64: do not use bare 'except'

    (E722)


    [error] 180-180: do not use bare 'except'

    (E722)


    [error] 270-270: do not use bare 'except'

    (E722)


    [error] 519-519: local variable 'protein_df' is assigned to but never used

    (F841)


    [error] 583-583: whitespace before ':'

    (E203)

    quantmsio/core/quantms/mztab.py

    [error] 680-680: do not use bare 'except'

    (E722)


    [error] 691-691: do not use bare 'except'

    (E722)


    [error] 769-769: whitespace before ':'

    (E203)


    [error] 787-787: do not use bare 'except'

    (E722)


    [error] 794-794: do not use bare 'except'

    (E722)


    [error] 802-802: do not use bare 'except'

    (E722)

    tests/test_full_conversion_suite.py

    [error] 93-93: f-string is missing placeholders

    (F541)

    quantmsio/commands/transform/uniprot.py

    [error] 57-57: local variable 'output_path' is assigned to but never used

    (F841)

    🪛 Pylint (3.3.7)
    quantmsio/core/quantms/psm.py

    [refactor] 62-62: Consider using 'with' for resource-allocating operations

    (R1732)

    quantmsio/commands/convert/maxquant.py

    [refactor] 144-144: Too many arguments (8/5)

    (R0913)


    [refactor] 144-144: Too many positional arguments (8/5)

    (R0917)


    [refactor] 259-259: Too many arguments (7/5)

    (R0913)


    [refactor] 259-259: Too many positional arguments (7/5)

    (R0917)

    quantmsio/commands/convert/quantms.py

    [refactor] 55-55: Too many arguments (6/5)

    (R0913)


    [refactor] 55-55: Too many positional arguments (6/5)

    (R0917)


    [refactor] 191-191: Too many arguments (9/5)

    (R0913)


    [refactor] 191-191: Too many positional arguments (9/5)

    (R0917)


    [refactor] 191-191: Too many local variables (16/15)

    (R0914)

    quantmsio/core/maxquant/maxquant.py

    [refactor] 684-684: Too many arguments (6/5)

    (R0913)


    [refactor] 684-684: Too many positional arguments (6/5)

    (R0917)

    quantmsio/commands/convert/quantms_project.py

    [error] 22-22: No name 'MzTabProteinGroups' in module 'quantmsio.core.quantms.mztab'

    (E0611)


    [refactor] 47-47: Too many arguments (6/5)

    (R0913)


    [refactor] 47-47: Too many positional arguments (6/5)

    (R0917)


    [refactor] 47-47: Too many local variables (25/15)

    (R0914)


    [refactor] 47-47: Too many branches (14/12)

    (R0912)


    [refactor] 47-47: Too many statements (78/50)

    (R0915)


    [refactor] 247-247: Too many arguments (6/5)

    (R0913)


    [refactor] 247-247: Too many positional arguments (6/5)

    (R0917)

    quantmsio/core/quantms/feature.py

    [refactor] 61-62: Use 'yield from' directly instead of yielding each element one by one

    (R1737)
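The R1737 refactor in miniature (generator names and data are illustrative): `yield from` delegates to the inner iterable directly instead of looping and yielding one element at a time, which is both shorter and slightly faster.

```python
def iter_batches_old(batches):
    for batch in batches:
        for row in batch:   # flagged: yields each element one by one
            yield row

def iter_batches_new(batches):
    for batch in batches:
        yield from batch    # delegates the whole inner iterable

rows_old = list(iter_batches_old([[1, 2], [3]]))
rows = list(iter_batches_new([[1, 2], [3]]))
```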

    tests/test_msstats_in.py

    [refactor] 98-98: Too many local variables (34/15)

    (R0914)


    [refactor] 152-253: Too many nested blocks (6/5)

    (R1702)


    [refactor] 152-253: Too many nested blocks (6/5)

    (R1702)


    [refactor] 98-98: Too many branches (31/12)

    (R0912)


    [refactor] 98-98: Too many statements (114/50)

    (R0915)

    quantmsio/core/quantms/msstats_in.py

    [refactor] 22-22: Consider using Python 3 style super() without arguments

    (R1725)
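R1725 in a nutshell: Python 3's zero-argument `super()` replaces the explicit `super(ClassName, self)` spelling. The classes below are invented for the example:

```python
class BaseReader:
    def __init__(self, path):
        self.path = path

class TsvReader(BaseReader):
    def __init__(self, path):
        # instead of: super(TsvReader, self).__init__(path)
        super().__init__(path)
        self.sep = "\t"

reader = TsvReader("report.tsv")
```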


    [refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one

    (R1737)

    quantmsio/core/quantms/pg.py

    [error] 73-73: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif"

    (R1723)


    [error] 110-110: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 92-92: Too many branches (13/12)

    (R0912)


    [refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 285-285: Too many arguments (9/5)

    (R0913)


    [refactor] 285-285: Too many positional arguments (9/5)

    (R0917)


    [refactor] 285-285: Too many local variables (35/15)

    (R0914)


    [refactor] 285-285: Too many statements (55/50)

    (R0915)


    [refactor] 571-571: Useless return at end of function or method

    (R1711)


    [refactor] 627-627: Too many arguments (7/5)

    (R0913)


    [refactor] 627-627: Too many positional arguments (7/5)

    (R0917)


    [refactor] 627-627: Too many local variables (23/15)

    (R0914)

    quantmsio/core/quantms/mztab.py

    [refactor] 83-83: Too many instance attributes (22/7)

    (R0902)


    [refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 414-414: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 478-478: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 606-640: Too many nested blocks (7/5)

    (R1702)


    [refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

    tests/test_full_conversion_suite.py

    [refactor] 84-90: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

    quantmsio/commands/convert/diann.py

    [refactor] 17-17: Too many arguments (12/5)

    (R0913)


    [refactor] 17-17: Too many positional arguments (12/5)

    (R0917)


    [refactor] 17-17: Too many local variables (19/15)

    (R0914)


    [refactor] 96-96: Too many arguments (7/5)

    (R0913)


    [refactor] 96-96: Too many positional arguments (7/5)

    (R0917)

    quantmsio/commands/transform/de.py

    [refactor] 16-16: Too many arguments (9/5)

    (R0913)


    [refactor] 16-16: Too many positional arguments (9/5)

    (R0917)

    quantmsio/commands/transform/gene.py

    [refactor] 45-45: Too many arguments (6/5)

    (R0913)


    [refactor] 45-45: Too many positional arguments (6/5)

    (R0917)

    quantmsio/commands/utils/plot.py

    [refactor] 14-14: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.

    (R1735)
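For R1735, a dict literal and a `dict(...)` call build the same mapping; the literal is simply the idiomatic spelling for click's `context_settings`:

```python
context_settings_call = dict(help_option_names=["-h", "--help"])  # flagged
CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}        # preferred
```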

    quantmsio/commands/utils/project.py

    [refactor] 45-45: Too many arguments (6/5)

    (R0913)


    [refactor] 45-45: Too many positional arguments (6/5)

    (R0917)


    [error] 86-86: Unexpected keyword argument 'output_file_name' in method call

    (E1123)

    quantmsio/commands/utils/stats.py

    [refactor] 9-9: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.

    (R1735)

    🪛 actionlint (1.7.7)
    .github/workflows/python-app.yml

    37-37: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)


    66-66: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)


    90-90: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    .github/workflows/python-package.yml

    23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)


    56-56: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    .github/workflows/integration-tests.yml

    28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    🪛 YAMLlint (1.37.1)
    .github/workflows/python-app.yml

    [warning] 35-35: wrong indentation: expected 6 but found 4

    (indentation)


    [warning] 64-64: wrong indentation: expected 6 but found 4

    (indentation)


    [warning] 88-88: wrong indentation: expected 6 but found 4

    (indentation)


    [error] 106-106: no new line character at the end of file

    (new-line-at-end-of-file)

    .github/workflows/python-package.yml

    [error] 72-72: no new line character at the end of file

    (new-line-at-end-of-file)

    .github/workflows/integration-tests.yml

    [error] 23-23: trailing spaces

    (trailing-spaces)


    [error] 26-26: trailing spaces

    (trailing-spaces)


    [error] 31-31: trailing spaces

    (trailing-spaces)


    [error] 40-40: trailing spaces

    (trailing-spaces)


    [error] 47-47: trailing spaces

    (trailing-spaces)


    [error] 54-54: trailing spaces

    (trailing-spaces)


    [error] 55-55: trailing spaces

    (trailing-spaces)


    [error] 61-61: trailing spaces

    (trailing-spaces)


    [error] 67-67: no new line character at the end of file

    (new-line-at-end-of-file)


    [error] 67-67: trailing spaces

    (trailing-spaces)

    🪛 LanguageTool
    README.md

    [uncategorized] ~114-~114: Loose punctuation mark.
    Context: ...ment variables: - QUANTMSIO_LOG_LEVEL: Set logging level (DEBUG, INFO, WARNING...

    (UNLIKELY_OPENING_PUNCTUATION)

    docs/IMPLEMENTATION_DETAILS.md

    [typographical] ~73-~73: If specifying a range, consider using an en dash instead of a hyphen.
    Context: ...minutes for large datasets - After: 10-45 seconds for the same datasets - **Memor...

    (HYPHEN_TO_EN)

    🪛 markdownlint-cli2 (0.17.2)
    README.md

    132-132: Fenced code blocks should have a language specified
    null

    (MD040, fenced-code-language)

    docs/IMPLEMENTATION_DETAILS.md

    177-177: Fenced code blocks should have a language specified
    null

    (MD040, fenced-code-language)


    270-270: Trailing punctuation in heading
    Punctuation: ':'

    (MD026, no-trailing-punctuation)


    271-271: Fenced code blocks should have a language specified
    null

    (MD040, fenced-code-language)


    280-280: Trailing punctuation in heading
    Punctuation: ':'

    (MD026, no-trailing-punctuation)


    281-281: Fenced code blocks should have a language specified
    null

    (MD040, fenced-code-language)


    290-290: Trailing punctuation in heading
    Punctuation: ':'

    (MD026, no-trailing-punctuation)


    291-291: Fenced code blocks should have a language specified
    null

    (MD040, fenced-code-language)

    🪛 GitHub Check: CodeFactor
    quantmsio/commands/convert/quantms_project.py

    [notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199
    Complex Method

    tests/test_msstats_in.py

    [notice] 98-331: tests/test_msstats_in.py#L98-L331
    Complex Method

    quantmsio/core/quantms/msstats_in.py

    [notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260
    Do not use bare 'except'. (E722)


    [notice] 683-683: quantmsio/core/quantms/msstats_in.py#L683
    Do not use bare 'except'. (E722)


    [notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
    Try, Except, Pass detected. (B110)


    [notice] 266-266: quantmsio/core/quantms/msstats_in.py#L266
    Do not use bare 'except'. (E722)


    [notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68
    Do not use bare 'except'. (E722)

    quantmsio/core/quantms/pg.py

    [notice] 258-258: quantmsio/core/quantms/pg.py#L258
    Do not use bare 'except'. (E722)


    [notice] 64-64: quantmsio/core/quantms/pg.py#L64
    Do not use bare 'except'. (E722)


    [notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65
    Try, Except, Pass detected. (B110)


    [notice] 172-172: quantmsio/core/quantms/pg.py#L172
    Do not use bare 'except'. (E722)


    [notice] 258-259: quantmsio/core/quantms/pg.py#L258-L259
    Try, Except, Pass detected. (B110)

    quantmsio/core/quantms/mztab.py

    [notice] 802-802: quantmsio/core/quantms/mztab.py#L802
    Do not use bare 'except'. (E722)


    [notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788
    Try, Except, Pass detected. (B110)


    [notice] 794-794: quantmsio/core/quantms/mztab.py#L794
    Do not use bare 'except'. (E722)


    [notice] 680-680: quantmsio/core/quantms/mztab.py#L680
    Do not use bare 'except'. (E722)


    [notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795
    Try, Except, Pass detected. (B110)


    [notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803
    Try, Except, Pass detected. (B110)


    [notice] 691-691: quantmsio/core/quantms/mztab.py#L691
    Do not use bare 'except'. (E722)


    [notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
    Try, Except, Continue detected. (B112)


    [notice] 787-787: quantmsio/core/quantms/mztab.py#L787
    Do not use bare 'except'. (E722)

    ⏰ Context from checks skipped due to timeout of 90000ms (5)
    • GitHub Check: Codacy Static Code Analysis
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests (3.10)
    • GitHub Check: unit-tests (3.11)
    🔇 Additional comments (41)
    .gitignore (1)

    8-70: Excellent comprehensive .gitignore update.

    The extensive additions cover all the essential ignore patterns for the quantmsio project, including DuckDB databases, temporary processing files, test outputs, and standard Python development artifacts. These patterns align well with the project's data processing workflows and CLI command structure.

    quantmsio/core/sdrf.py (1)

    11-11: Minor import reordering.

    The import order change is stylistic and doesn't affect functionality.

    quantmsio/operate/statistics.py (1)

    108-116: Excellent resource management implementation.

The addition of explicit close() and __del__() methods properly handles the DuckDB connection lifecycle. The implementation correctly checks for attribute existence, closes the connection, and sets it to None to prevent double-closure issues. This pattern aligns with the similar implementation in quantmsio/operate/query.py and effectively prevents resource leaks.
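As a hedged sketch of that lifecycle pattern — using a stand-in connection class instead of a real DuckDB handle, so nothing below is the project's actual code:

```python
class _Connection:
    """Stand-in for a DuckDB connection, purely for illustration."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class StatisticsHandle:
    def __init__(self):
        self._duckdb = _Connection()

    def close(self):
        # hasattr guard: __del__ can run even if __init__ failed early
        if hasattr(self, "_duckdb") and self._duckdb is not None:
            self._duckdb.close()
            self._duckdb = None   # makes a second close() a no-op

    def __del__(self):
        self.close()

stats = StatisticsHandle()
conn = stats._duckdb
stats.close()
stats.close()  # safe: _duckdb is already None
```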

    quantmsio/operate/query.py (2)

    15-19: Appropriate import organization.

    The updated imports from pride_utils improve modularity by importing specific utility functions needed by the Query class methods.


    344-352: Consistent resource management implementation.

    The addition of close() and __del__() methods matches the pattern implemented in quantmsio/operate/statistics.py (lines 108-112), providing consistent DuckDB connection management across the codebase. This prevents resource leaks and ensures proper cleanup.

    quantmsio/core/quantms/psm.py (2)

    117-122: Improved logic separation for minimum score finding.

    The refactoring to separate grouping and index extraction steps improves code clarity and makes the intent more explicit. This is a good improvement over the previous implementation.
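A minimal pandas sketch of that two-step pattern — group first, then extract the index of each group's minimum score. Column names here are illustrative, not the PR's actual schema:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "peptidoform": ["PEPTIDE", "PEPTIDE", "SEQUENCE"],
        "score": [0.01, 0.20, 0.05],
    }
)

# step 1: group by peptidoform; step 2: index of the minimum score per group
best_idx = df.groupby("peptidoform")["score"].idxmin()
best_rows = df.loc[best_idx]   # one best-scoring row per peptidoform
```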


    231-250: Enhanced logging provides good traceability.

    The addition of structured logging statements provides excellent traceability for PSM conversion operations, including input/output paths and optional protein filtering. This aligns well with the PR's logging enhancement objectives.

    quantmsio/commands/transform/gene.py (1)

    1-67: Well-structured CLI command with appropriate parameter handling.

    The command follows good click patterns and properly handles the optional partitions parameter by splitting comma-separated values. While static analysis flags the high parameter count, all parameters appear necessary for the gene mapping functionality.

    The delegation to generate_feature_of_gene keeps the CLI layer thin and focused on parameter handling, which is a good separation of concerns.

    tests/test_diann.py (2)

    3-4: Import path updates reflect module restructuring.

    The updated import paths align with the module restructuring described in the PR objectives, moving to more specific submodules within quantmsio.core.quantms.


    38-125: Comprehensive test coverage for protein group transformation.

    This test excellently validates the complex nested structure of protein group data after transformation. It thoroughly checks:

    • Presence of required columns (intensities, additional_intensities)
    • Proper list structure for intensity arrays
    • Correct nested objects with expected fields (sample_accession, channel, intensity)
    • Validation of DIA-NN-specific values (LFQ channel)
    • Structure of additional intensity types with proper nesting

    The test provides robust coverage for the refactored protein group handling and ensures data integrity throughout the transformation process.

    README.md (2)

    8-25: Excellent transformation to comprehensive user documentation.

    The README has been completely restructured from a brief specification note into a comprehensive user and developer guide. This aligns perfectly with the PR's CLI reorganization and enhanced functionality, providing clear documentation for all the new command groups and features.


    174-176: License inconsistency needs clarification.

    The development section mentions MIT License, but the copyright notice at the bottom specifies GNU General Public License. This inconsistency should be resolved to avoid confusion.

    Please clarify which license applies to this project - MIT (mentioned in line 176) or GPL (mentioned in the copyright notice starting at line 196). Consistent licensing information is important for open source projects.

    quantmsio/core/quantms/feature.py (2)

    58-62: Excellent use of context manager for resource management.

    The refactoring to use a context manager with MsstatsIN ensures proper resource cleanup and automatic database closure. This is a significant improvement for reliability.


    156-162: Good addition of structured logging.

    The logging statements provide valuable traceability by recording input parameters, output paths, and optional filter files. This enhances debugging and monitoring capabilities.

    Also applies to: 185-192

    quantmsio/core/diann/diann.py (5)

    15-24: Import reorganization looks good.

    The reordering of imports to move DIANN_USECOLS after DIANN_PG_USECOLS and updating imports from quantmsio.core.quantms aligns with the module restructuring mentioned in the AI summary.


    176-195: Review the additional_intensities structure for completeness.

    The enhanced additional_intensities structure with nested intensities array containing intensity_name and intensity_value pairs is well-structured. This provides good separation between primary intensities and additional/derived intensities.


    203-208: Good practice: Clean up intermediate columns.

    Dropping the raw count columns unique_sequences and total_features after transformation is good practice to avoid confusion and reduce memory usage.


    370-370: Verify consistency in additional_intensities key naming.

The key renamed from "additional_intensity" to "intensities" in the add_additional_msg method must match the key used in generate_pg_matrix. Both methods now use "intensities", which is correct.


    159-173: Manual Verification Required: Confirm intensities Array Structure Matches Schema

    I was unable to locate any schema definitions for the new intensities field in the repository. Please ensure that the generated array with these properties:

    • sample_accession (string)
    • channel (string)
    • intensity (numeric)

    aligns with the updated schema expectations.

    Affected location:
    • quantmsio/core/diann/diann.py (Lines 159–173)

    # Create intensities array using the pg_quantity field (mapped from PG.Quantity)
    report.loc[:, "intensities"] = report[
        ["reference_file_name", "pg_quantity"]
    ].apply(
        lambda rows: [
            {
                "sample_accession": self._sample_map[
                    rows["reference_file_name"] + "-LFQ"
                ],
                "channel": "LFQ",
                "intensity": rows["pg_quantity"],
            }
        ],
        axis=1,
    )
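    Stripped of the pandas plumbing, each row ends up carrying a one-element array shaped like this (the sample map and the values below are invented for illustration):

    ```python
    # Hypothetical sample map: "<run>-<channel>" -> sample accession
    sample_map = {"run_01-LFQ": "S1"}


    def build_intensities(reference_file_name: str, pg_quantity: float) -> list:
        """Mirror of the per-row lambda: one LFQ entry per reference file."""
        return [
            {
                "sample_accession": sample_map[reference_file_name + "-LFQ"],
                "channel": "LFQ",
                "intensity": pg_quantity,
            }
        ]


    row = build_intensities("run_01", 1234.5)
    print(row)  # [{'sample_accession': 'S1', 'channel': 'LFQ', 'intensity': 1234.5}]
    ```

    Whatever schema is adopted should pin down exactly these three keys and their types.
    
    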
    quantmsio/core/common.py (4)

    8-11: Import reorganization improves module structure.

    The updated imports from quantmsio and quantmsio.core.format align with the project's module restructuring and make dependencies more explicit.


    99-111: Explicit column definition improves maintainability.

    Defining DIANN_PG_USECOLS explicitly rather than dynamically generating it improves readability and makes it easier to maintain the column list independently of the mapping dictionary.


    163-197: New mapping dictionaries extend format support.

    The addition of MAXQUANT_PG_MAP and MZTAB_PG_MAP with their corresponding usage columns extends the library's support for multiple proteomics data formats. The field names are well-chosen and follow consistent naming conventions.


    90-96: All DIANN_PG_MAP entries are correctly used in generate_pg_matrix.

    The newly added mappings ("PG.Quantity" → pg_quantity, "Proteotypic" → unique_sequences, "Precursor.Quantity" → total_features) are referenced in diann.py for both the peptide_counts and feature_counts structs and in the intensities array creation, and are then dropped as expected. No further changes needed.
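    The rename-then-consume-then-drop flow can be sketched without pandas; the mapping subset comes from the review above, while `rename_and_consume` and the record values are hypothetical:

    ```python
    # Illustrative subset of a DIA-NN protein-group mapping (field names as in the review).
    DIANN_PG_MAP = {
        "PG.Quantity": "pg_quantity",
        "Proteotypic": "unique_sequences",
        "Precursor.Quantity": "total_features",
    }


    def rename_and_consume(record: dict) -> dict:
        # 1) rename raw report columns to quantms.io field names
        renamed = {DIANN_PG_MAP.get(k, k): v for k, v in record.items()}
        # 2) derive structured fields from the renamed columns
        renamed["feature_counts"] = {"total_features": renamed["total_features"]}
        # 3) drop intermediate count columns once consumed
        for intermediate in ("unique_sequences", "total_features"):
            renamed.pop(intermediate, None)
        return renamed


    out = rename_and_consume({"PG.Quantity": 7.5, "Proteotypic": 3, "Precursor.Quantity": 12})
    print(out)  # {'pg_quantity': 7.5, 'feature_counts': {'total_features': 12}}
    ```
    
    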

    quantmsio/commands/utils/stats.py (1)

    12-15: CLI command structure follows good practices.

    The command group setup with proper context settings and clear documentation follows Click framework best practices.

    .github/workflows/python-app.yml (2)

    11-15: Good addition of scheduling and manual dispatch.

    Adding cron scheduling for daily full test runs and workflow_dispatch for manual triggering improves the CI/CD workflow flexibility.


    32-83: Excellent workflow restructuring for better testing granularity.

    The separation into unit-tests, integration-tests, and full-test-suite jobs provides:

    1. Faster feedback for unit tests
    2. Conditional integration tests based on labels
    3. Comprehensive testing on main branch and scheduled runs

    This is a significant improvement over the single build job approach.

    tests/test_full_conversion_suite.py (3)

    66-80: Excellent subprocess configuration for robust testing.

    The subprocess configuration includes:

    1. UTF-8 encoding handling
    2. Proper timeout (10 minutes for TMT protein groups)
    3. Error handling with text output
    4. Working directory set to workspace root

    This ensures robust and reliable CLI testing.


    119-167: Comprehensive test coverage for all conversion types.

    The test suite covers all major conversion workflows (PSM, Feature, Protein Groups) for both LFQ and TMT datasets. Each test:

    1. Validates input file existence
    2. Creates temporary output directories
    3. Runs the CLI command via subprocess
    4. Analyzes output files for expected structure
    5. Asserts success and data validity

    This provides excellent end-to-end validation of the CLI functionality.

    Also applies to: 169-223, 225-278


    100-116: Good output file analysis helper.

    The analyze_output_file function provides useful validation of parquet outputs including row counts, column counts, file size, and column names. The error handling ensures graceful failure when files can't be analyzed.

    tests/test_maxquant.py (1)

    356-362: Update main block to remove duplicate test function calls.

    After removing the duplicate test functions, update the main block to only call the unique test functions.

     if __name__ == "__main__":
         test_maxquant_protein_groups_transform()
         test_maxquant_protein_groups_with_sample_columns()
    -    test_maxquant_pg_basic_transformation()
    -    test_maxquant_pg_sample_specific_columns()
         test_mztab_pg_gene_extraction()
         print("All MaxQuant protein group tests passed!")

    Likely an incorrect or invalid review comment.

    quantmsio/core/maxquant/maxquant.py (3)

    445-496: Well-implemented batch processing for protein groups.

    The iter_protein_groups_batch method demonstrates good practices:

    • Dynamic column detection for sample-specific intensities
    • Efficient chunked reading
    • Proper metadata storage using DataFrame attributes

    587-619: Good separation of concerns in intensity creation methods.

    The helper methods _create_intensities_from_samples and _create_additional_intensities properly encapsulate the logic for creating structured intensity data, making the code more maintainable and testable.

    Also applies to: 620-663


    733-749: Use identity comparison (is) when checking against type objects.

    Comparing type objects with == is not recommended; since dtype here holds the types themselves rather than instances, the idiomatic check is is, as the diff below shows. (isinstance() is the right tool when testing a value against a type, which is not the case here.)

             for field, dtype in required_fields.items():
                 if field not in df.columns:
    -                if dtype == list:
    +                if dtype is list:
                         df[field] = [[] for _ in range(len(df))]
    -                elif dtype == str:
    +                elif dtype is str:
                         df[field] = None
    -                elif dtype == float:
    +                elif dtype is float:
                         df[field] = 0.0
    -                elif dtype == int:
    +                elif dtype is int:
                         df[field] = 0
                 else:
                     # Handle type conversions
    -                if dtype == float:
    +                if dtype is float:
                         df[field] = pd.to_numeric(df[field], errors="coerce").fillna(0.0)
    -                elif dtype == int:
    +                elif dtype is int:
                         df[field] = (
                             pd.to_numeric(df[field], errors="coerce").fillna(0).astype(int)
                         )
    -                elif dtype == str:
    +                elif dtype is str:
                         df[field] = df[field].astype(str).fillna("")

    Likely an incorrect or invalid review comment.
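    Whichever wording the comment intended, the distinction is easy to demonstrate in isolation (standalone snippet, not project code): `is` for comparing type objects, `isinstance()` for checking a value against a type.

    ```python
    def default_for(dtype):
        # Comparing type objects: identity (`is`) is the idiomatic check.
        if dtype is list:
            return []
        if dtype is str:
            return ""
        if dtype is float:
            return 0.0
        if dtype is int:
            return 0
        raise TypeError(f"unsupported dtype: {dtype!r}")


    print(default_for(float))  # 0.0

    # For values (not type objects), use isinstance(), which also respects subclasses.
    assert isinstance(True, int)   # bool is a subclass of int
    assert type(True) is bool      # but its exact type is bool, not int
    ```
    
    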

    quantmsio/core/quantms/mztab.py (1)

    246-266: Context manager already handles file operations correctly.

    The _safe_file_open method is properly implemented as a context manager that handles both regular and gzipped files. The static analysis warnings about lines 255 and 258 are false positives - the files are properly closed in the finally block.
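    For reference, a minimal gzip-aware context manager of this kind might look like the sketch below; this illustrates the pattern only and is not the actual `_safe_file_open` implementation.

    ```python
    import gzip
    import tempfile
    from contextlib import contextmanager
    from pathlib import Path


    @contextmanager
    def safe_file_open(path, mode="rt", encoding="utf-8"):
        """Open a plain or gzip-compressed text file transparently, always closing it."""
        path = Path(path)
        if path.suffix == ".gz":
            handle = gzip.open(path, mode, encoding=encoding)
        else:
            handle = open(path, mode, encoding=encoding)
        try:
            yield handle
        finally:
            handle.close()


    # Usage: the caller's code is identical for .gz and plain files.
    tmp = Path(tempfile.mkdtemp())
    gz_path = tmp / "report.mztab.gz"
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("MTD\tmzTab-version\t1.0\n")

    with safe_file_open(gz_path) as f:
        first_line = f.readline()
    print(first_line.strip())
    ```

    The `finally` clause is what makes the static-analysis warnings false positives: the handle is closed on every path.
    
    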

    quantmsio/core/quantms/msstats_in.py (2)

    105-106: Variables are used implicitly in DuckDB SQL context

    While static analysis flags these as unused, they're actually referenced in the SQL CREATE TABLE statements through DuckDB's implicit DataFrame registration.

    Also applies to: 119-120


    122-210: Well-structured SQL view creation with proper column handling

    The implementation properly handles optional columns and creates appropriate views for both LFQ and TMT/iTRAQ experiments. The dynamic SQL generation based on available columns is a good approach.

    docs/README.adoc (2)

    1227-1280: Excellent documentation of intensity data structure

    The clear distinction between primary intensities (raw measurements) and additional intensities (derived values) with concrete examples greatly improves the format specification. The semantic guidelines provide valuable guidance for implementers.


    1405-1481: Well-structured protein group field definitions

    The addition of structured peptide_counts and feature_counts fields, along with clearer descriptions of existing fields, improves the protein group specification. The mapping to tool-specific fields (DIA-NN, FragPipe, MaxQuant) is particularly helpful.

    quantmsio/core/quantms/pg.py (3)

    285-434: Well-architected protein quantification pipeline

    The implementation effectively uses DuckDB for scalable SQL-based aggregation with proper error handling and detailed logging. The step-by-step approach with timing information aids debugging and performance analysis.


    627-782: Comprehensive protein row construction with proper data transformation

    The method effectively transforms SQL aggregation results into the structured protein group format. The handling of optional computations (TopN, iBAQ) and fallback values for missing data is well implemented.


    517-524: DuckDB Implicit DataFrame Registration – No Change Required

    The protein_df variable is picked up by DuckDB’s Python API at runtime—when you run

    msstats_in._duckdb.execute(
        "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
    )

    the protein_df object in the Python namespace is used as the source table. Although static analysis flags it as “unused,” it’s consumed implicitly by the SQL engine.

    Optional (to satisfy linters): explicitly register the DataFrame before use:

    msstats_in._duckdb.register("protein_df", protein_df)
    msstats_in._duckdb.execute(
        "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
    )

    File: quantmsio/core/quantms/pg.py
    Lines: 517–524

    Comment on lines 366 to 391 (the diff juxtaposes the original signature and body with the updated one):

    Original:

        def write_psms_to_parquet(
            self,
            file_path: Path,
            batch_size: int = 10000,
            output_prefix_file: Optional[str] = None,
            **metadata,
        ):
            if not file_path.exists():
                raise FileNotFoundError(file_path)
            if not self.output_directory.exists():
                self.output_directory.mkdir(parents=True)
            if not output_prefix_file:
                output_prefix_file = "psm"

            file_uuid = uuid.uuid4()
            output_path = (
                self.output_directory / f"{output_prefix_file}-{file_uuid}.psm.parquet"
            )

            metadata["file_type"] = "psm"
            metadata["uuid"] = str(file_uuid)
            metadata["creation_date"] = date.today().isoformat()

    Updated (same def write_psms_to_parquet(self, ... opening):

            msms_file: Path,
            batch_size: int = 1000000,
            output_prefix: Optional[str] = None,
        ) -> None:
            """
            Write PSMs to parquet file.

            Args:
                msms_file: Path to the psm.tsv file
                batch_size: Number of rows to process at a time
                output_prefix: Prefix for the output file name
            """
            if not output_prefix:
                output_prefix = "psm"

            file_uuid = str(uuid.uuid4())
            output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"

            metadata = {
                "file_type": "psm",
                "uuid": file_uuid,
                "creation_date": date.today().isoformat(),
            }


    🛠️ Refactor suggestion

    Review the aggressive batch size increase and missing safety checks.

    The refactoring improves code clarity with better parameter naming, type annotations, and documentation. However, there are two concerns:

    1. Batch size increase: The default batch size increased 100x from 10,000 to 1,000,000. This could cause memory issues on systems with limited resources.

    2. Missing safety checks: The removal of file existence checks and output directory creation could lead to less informative error messages and runtime failures.

    Consider these improvements:

    def write_psms_to_parquet(
        self,
        msms_file: Path,
    -   batch_size: int = 1000000,
    +   batch_size: int = 100000,  # More conservative default
        output_prefix: Optional[str] = None,
    ) -> None:
        """
        Write PSMs to parquet file.
    
        Args:
            msms_file: Path to the psm.tsv file
            batch_size: Number of rows to process at a time
            output_prefix: Prefix for the output file name
        """
    +   # Validate input file exists
    +   if not msms_file.exists():
    +       raise FileNotFoundError(f"PSM file not found: {msms_file}")
    +   
    +   # Ensure output directory exists
    +   self.output_directory.mkdir(parents=True, exist_ok=True)
    +   
        if not output_prefix:
            output_prefix = "psm"
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    Suggested change

    Before:

        def write_psms_to_parquet(
            self,
            file_path: Path,
            batch_size: int = 10000,
            output_prefix_file: Optional[str] = None,
            **metadata,
        ):
            if not file_path.exists():
                raise FileNotFoundError(file_path)
            if not self.output_directory.exists():
                self.output_directory.mkdir(parents=True)
            if not output_prefix_file:
                output_prefix_file = "psm"
            file_uuid = uuid.uuid4()
            output_path = (
                self.output_directory / f"{output_prefix_file}-{file_uuid}.psm.parquet"
            )
            metadata["file_type"] = "psm"
            metadata["uuid"] = str(file_uuid)
            metadata["creation_date"] = date.today().isoformat()
            msms_file: Path,
            batch_size: int = 1000000,
            output_prefix: Optional[str] = None,
        ) -> None:
            """
            Write PSMs to parquet file.
            Args:
                msms_file: Path to the psm.tsv file
                batch_size: Number of rows to process at a time
                output_prefix: Prefix for the output file name
            """
            if not output_prefix:
                output_prefix = "psm"
            file_uuid = str(uuid.uuid4())
            output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"
            metadata = {
                "file_type": "psm",
                "uuid": file_uuid,
                "creation_date": date.today().isoformat(),
            }

    After:

        def write_psms_to_parquet(
            self,
            msms_file: Path,
            batch_size: int = 100000,  # More conservative default
            output_prefix: Optional[str] = None,
        ) -> None:
            """
            Write PSMs to parquet file.
            Args:
                msms_file: Path to the psm.tsv file
                batch_size: Number of rows to process at a time
                output_prefix: Prefix for the output file name
            """
            # Validate input file exists
            if not msms_file.exists():
                raise FileNotFoundError(f"PSM file not found: {msms_file}")
            # Ensure output directory exists
            self.output_directory.mkdir(parents=True, exist_ok=True)
            if not output_prefix:
                output_prefix = "psm"
            file_uuid = str(uuid.uuid4())
            output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"
            metadata = {
                "file_type": "psm",
                "uuid": file_uuid,
                "creation_date": date.today().isoformat(),
            }
            # ... rest of implementation ...
    🤖 Prompt for AI Agents
    In quantmsio/core/fragpipe/fragpipe.py around lines 366 to 391, the default
    batch size was increased from 10,000 to 1,000,000, which may cause memory issues
    on some systems; reduce the default batch size back to a safer value like
    10,000. Also, add safety checks to verify that the input msms_file exists before
    processing and ensure the output directory exists or create it if missing to
    prevent runtime errors and provide clearer error messages.
    

    Comment on lines +127 to +129
    raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
    )

    🛠️ Refactor suggestion

    Improve exception chaining for better error context.

    The string conversion fix for the error message is good. However, consider using proper exception chaining to preserve the original error context.

    Apply this diff to implement proper exception chaining:

    except FileNotFoundError:
    -   raise FileNotFoundError(
    -       "The SDRF file provided not found: " + str(sdrf_file)
    -   )
    +   raise FileNotFoundError(
    +       "The SDRF file provided not found: " + str(sdrf_file)
    +   ) from None

    Alternatively, if you want to preserve the original exception context:

    -except FileNotFoundError:
    +except FileNotFoundError as err:
        raise FileNotFoundError(
            "The SDRF file provided not found: " + str(sdrf_file)
    -   )
    +   ) from err
    📝 Committable suggestion


    Suggested change

        raise FileNotFoundError(
            "The SDRF file provided not found: " + str(sdrf_file)
        )

        except FileNotFoundError:
            raise FileNotFoundError(
                "The SDRF file provided not found: " + str(sdrf_file)
            ) from None

    Suggested change

        raise FileNotFoundError(
            "The SDRF file provided not found: " + str(sdrf_file)
        )

        except FileNotFoundError as err:
            raise FileNotFoundError(
                "The SDRF file provided not found: " + str(sdrf_file)
            ) from err
    🧰 Tools
    🪛 Ruff (0.11.9)

    127-129: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

    (B904)

    🤖 Prompt for AI Agents
    In quantmsio/core/sdrf.py around lines 127 to 129, the FileNotFoundError is
    raised with a custom message but lacks proper exception chaining. Modify the
    code to use "raise FileNotFoundError(...) from original_exception" where
    original_exception is the caught exception, to preserve the original error
    context and improve debugging. This involves catching the original exception
    first, then raising the new FileNotFoundError with the "from" keyword
    referencing the caught exception.
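    The practical difference `from err` makes can be demonstrated standalone (the `load_sdrf` helper below is hypothetical; the point is that the original exception survives as `__cause__`):

    ```python
    def load_sdrf(path: str) -> str:
        try:
            with open(path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError as err:
            # `from err` preserves the original exception as __cause__,
            # so tracebacks show both errors instead of just the outer one.
            raise FileNotFoundError(
                f"The SDRF file provided was not found: {path}"
            ) from err


    try:
        load_sdrf("does-not-exist.sdrf.tsv")
    except FileNotFoundError as e:
        print(e)                  # The SDRF file provided was not found: ...
        print(type(e.__cause__))  # <class 'FileNotFoundError'>
    ```

    With `from None` instead, `__cause__` stays empty and the implicit context is suppressed, which is the right choice when the inner error adds no information.
    
    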
    

    Comment on lines +58 to +69

        if self.mztab_path.stat().st_size == 0:
            raise ValueError("File is empty")
        f = open(self.mztab_path)
        # Use seekable file path to handle gzip files
        seekable_path = self._get_seekable_file_path()
        f = open(seekable_path, "r", encoding="utf-8")
        pos = self._get_pos("PEH")
        f.seek(pos)
        line = f.readline()
        while not line.startswith("PEH"):
            line = f.readline()
        self._pep_columns = line.split("\n")[0].split("\t")
        f.close()

    🛠️ Refactor suggestion

    Use context manager for file operations.

    The file handling has been improved to support gzip files, but the manual file opening and closing should use a context manager for better resource management and exception safety.

    -    # Use seekable file path to handle gzip files
    -    seekable_path = self._get_seekable_file_path()
    -    f = open(seekable_path, "r", encoding="utf-8")
    -    pos = self._get_pos("PEH")
    -    f.seek(pos)
    -    line = f.readline()
    -    while not line.startswith("PEH"):
    -        line = f.readline()
    -    self._pep_columns = line.split("\n")[0].split("\t")
    -    f.close()
    +    # Use seekable file path to handle gzip files
    +    seekable_path = self._get_seekable_file_path()
    +    with open(seekable_path, "r", encoding="utf-8") as f:
    +        pos = self._get_pos("PEH")
    +        f.seek(pos)
    +        line = f.readline()
    +        while not line.startswith("PEH"):
    +            line = f.readline()
    +        self._pep_columns = line.split("\n")[0].split("\t")
    📝 Committable suggestion


    Suggested change

        if self.mztab_path.stat().st_size == 0:
            raise ValueError("File is empty")
        f = open(self.mztab_path)
        # Use seekable file path to handle gzip files
        seekable_path = self._get_seekable_file_path()
        f = open(seekable_path, "r", encoding="utf-8")
        pos = self._get_pos("PEH")
        f.seek(pos)
        line = f.readline()
        while not line.startswith("PEH"):
            line = f.readline()
        self._pep_columns = line.split("\n")[0].split("\t")
        f.close()

        if self.mztab_path.stat().st_size == 0:
            raise ValueError("File is empty")
        # Use seekable file path to handle gzip files
        seekable_path = self._get_seekable_file_path()
        with open(seekable_path, "r", encoding="utf-8") as f:
            pos = self._get_pos("PEH")
            f.seek(pos)
            line = f.readline()
            while not line.startswith("PEH"):
                line = f.readline()
            self._pep_columns = line.split("\n")[0].split("\t")
    🧰 Tools
    🪛 Ruff (0.11.9)

    62-62: Use a context manager for opening files

    (SIM115)

    🪛 Pylint (3.3.7)

    [refactor] 62-62: Consider using 'with' for resource-allocating operations

    (R1732)

    🤖 Prompt for AI Agents
    In quantmsio/core/quantms/psm.py around lines 58 to 69, the file is opened and
    closed manually which risks resource leaks if exceptions occur. Replace the
    manual open and close calls with a context manager using the "with" statement to
    ensure the file is properly closed after reading, improving resource management
    and exception safety.
    

        - name: Set up Python ${{ matrix.python-version }}
          uses: actions/setup-python@v3
          uses: actions/setup-python@v4

    ⚠️ Potential issue

    Update GitHub Actions to latest version.

    The actions/setup-python@v4 action is outdated and should be updated to v5 for better compatibility and security.

    -      uses: actions/setup-python@v4
    +      uses: actions/setup-python@v5

    Apply this change to both occurrences (lines 23 and 56).

    Also applies to: 56-56

    🧰 Tools
    🪛 actionlint (1.7.7)

    23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    🤖 Prompt for AI Agents
    In .github/workflows/python-package.yml at lines 23 and 56, the GitHub Action
    'actions/setup-python' is using version 'v4', which is outdated. Update both
    occurrences to use 'actions/setup-python@v5' to ensure better compatibility and
    security.
    

        - uses: actions/checkout@v4

        - name: Set up Python 3.10
          uses: actions/setup-python@v4

    ⚠️ Potential issue

    Update GitHub Actions to latest version.

    The actions/setup-python@v4 action is outdated and should be updated to v5.

    -      uses: actions/setup-python@v4
    +      uses: actions/setup-python@v5
    📝 Committable suggestion


    Suggested change
    uses: actions/setup-python@v4
    uses: actions/setup-python@v5
    🧰 Tools
    🪛 actionlint (1.7.7)

    28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    🤖 Prompt for AI Agents
    In .github/workflows/integration-tests.yml at line 28, the GitHub Action uses an
    outdated version of actions/setup-python (v4). Update the version tag from v4 to
    v5 to use the latest version of the action.
    

        project_handler.populate_from_sdrf(str(sdrf_file))
        project_handler.add_quantms_version(quantmsio_version=quantmsio_version)
        project_handler.add_software_provider(
            sortware_name="quantms", sortware_version=quantms_version

    ⚠️ Potential issue

    Fix typo in parameter name.

    The parameter name has a typo that will cause a runtime error.

    -            sortware_name="quantms", sortware_version=quantms_version
    +            software_name="quantms", software_version=quantms_version
    📝 Committable suggestion


    Suggested change
    sortware_name="quantms", sortware_version=quantms_version
    software_name="quantms", software_version=quantms_version
    🤖 Prompt for AI Agents
    In quantmsio/commands/convert/quantms_project.py at line 115, there is a typo in
    the parameter name "sortware_name" which should be corrected to "software_name"
    to prevent runtime errors. Update the parameter name to the correct spelling.
    

    Comment on lines +56 to +58

        except Exception as e:
            print(f"Warning: Could not setup optimized processing: {e}")
            # Fall back to original processing

    🛠️ Refactor suggestion

    Log exception details for better debugging

    The current exception handling swallows all exceptions and only prints a warning. This could make debugging difficult in production.

             except Exception as e:
    -            print(f"Warning: Could not setup optimized processing: {e}")
    +            import logging
    +            logger = logging.getLogger(__name__)
    +            logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
                 # Fall back to original processing
    +            self._optimized_setup_done = False
    📝 Committable suggestion


    Suggested change

        except Exception as e:
            print(f"Warning: Could not setup optimized processing: {e}")
            # Fall back to original processing

        except Exception as e:
            import logging
            logger = logging.getLogger(__name__)
            logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
            # Fall back to original processing
            self._optimized_setup_done = False
    🤖 Prompt for AI Agents
    In quantmsio/core/quantms/msstats_in.py around lines 56 to 58, the exception
    handling only prints a warning message without logging the full exception
    details, which hinders debugging. Modify the except block to log the complete
    exception information, including the stack trace, using a proper logging
    mechanism instead of just printing a warning. This will provide better
    visibility into the error cause during production issues.
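    A hedged sketch of the suggested logging behaviour; the `setup_optimized_processing` wrapper and the failing setup below are invented for illustration:

    ```python
    import logging

    logging.basicConfig(level=logging.WARNING)
    logger = logging.getLogger("quantmsio.msstats")


    def setup_optimized_processing(setup) -> bool:
        """Try an optional fast path; on failure, log details and fall back."""
        try:
            setup()
            return True
        except Exception as e:
            # The exception class name makes log lines searchable, and
            # exc_info=True attaches the full traceback for debugging.
            logger.warning(
                "Could not setup optimized processing: %s: %s",
                type(e).__name__,
                e,
                exc_info=True,
            )
            return False


    def broken_setup():
        raise RuntimeError("duckdb extension missing")


    print(setup_optimized_processing(broken_setup))  # False
    ```
    
    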
    

    @coderabbitai coderabbitai bot left a comment

    Actionable comments posted: 0

    ♻️ Duplicate comments (11)
    quantmsio/commands/convert/quantms_project.py (3)

    5-5: Remove unused import.

    The logging import is not used in this file and should be removed.


    111-111: Fix typo in parameter name.

    The parameter name has a typo that will cause a runtime error.

    -            sortware_name="quantms", sortware_version=quantms_version
    +            software_name="quantms", software_version=quantms_version

    43-201: Refactor complex function to improve maintainability.

    This function is overly complex with multiple responsibilities and too many parameters. Consider breaking it down into smaller, focused functions for better maintainability.

    quantmsio/core/quantms/mztab.py (2)

    680-680: Replace bare except clauses with specific exceptions.

    Using bare except: can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle.

    Also applies to: 691-691


    785-803: Replace bare except clauses in cleanup methods.

    The cleanup methods use bare except clauses which should be replaced with specific exception types for better error handling.

    tests/test_maxquant.py (3)

    6-6: Remove unused import.

    The pytest import is not used in this file and should be removed.


    223-287: Remove duplicate test function.

    This function test_maxquant_pg_basic_transformation is an exact duplicate of test_maxquant_protein_groups_transform and should be removed to avoid redundancy.


    289-329: Remove duplicate test function.

    This function test_maxquant_pg_sample_specific_columns is an exact duplicate of test_maxquant_protein_groups_with_sample_columns and should be removed to avoid redundancy.

    quantmsio/core/quantms/msstats_in.py (3)

    68-73: Fix exception handling in column detection

    Multiple issues with exception handling:

    1. Bare except clause at line 68
    2. Unused variable 'e' at line 76
    3. Should log the exception details for debugging
                 try:
                     columns_query = "PRAGMA table_info('report')"
                     columns_info = self._duckdb.execute(columns_query).df()
                     available_columns = set(columns_info["name"].tolist())
    -            except:
    +            except (AttributeError, KeyError, ValueError) as e:
    +                import logging
    +                logger = logging.getLogger(__name__)
    +                logger.debug(f"PRAGMA query failed: {e}")
                     # Fallback: get columns from a sample query
                     sample_query = "SELECT * FROM report LIMIT 1"
                     sample_data = self._duckdb.execute(sample_query).df()
                     available_columns = set(sample_data.columns.tolist())
    
                 self._available_columns = available_columns
    
    -        except Exception as e:
    +        except Exception:
                 # Default column set based on common msstats format
                 self._available_columns = {

    Also applies to: 76-87


    56-58: Replace bare except with specific exception handling and improve fallback handling

    The bare except clause should catch specific exceptions, use proper logging instead of print, and ensure _optimized_setup_done remains False on failure.

    -        except Exception as e:
    -            print(f"Warning: Could not setup optimized processing: {e}")
    -            # Fall back to original processing
    +        except Exception as e:
    +            import logging
    +            logger = logging.getLogger(__name__)
    +            logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
    +            # Fall back to original processing
    +            self._optimized_setup_done = False

    260-267: Replace bare except clauses in fallback statistics method

    The bare except clauses could catch system exceptions and hide real issues.

             try:
                 columns_query = "PRAGMA table_info('report')"
                 columns_info = self._duckdb.execute(columns_query).df()
                 available_columns = set(columns_info["name"].tolist())
    -        except:
    +        except (AttributeError, KeyError, ValueError):
                 # If pragma doesn't work, try a different approach
                 try:
                     sample_query = "SELECT * FROM report LIMIT 1"
                     sample_data = self._duckdb.execute(sample_query).df()
                     available_columns = set(sample_data.columns.tolist())
    -            except:
    +            except Exception:
                     # Fallback to basic assumptions
                     available_columns = {
    🧹 Nitpick comments (3)
    tests/test_full_conversion_suite.py (1)

    70-76: Minor style improvement: simplify conditional structure.

    The else clause after return is unnecessary and can be simplified.

    -        if result.returncode == 0:
    -            print(f"[OK] Success in {duration:.2f}s")
    -            return True, duration, result.stdout, result.stderr
    -        else:
    -            print(f"[FAIL] Failed in {duration:.2f}s")
    -            print(f"Error: {result.stderr}")
    -            return False, duration, result.stdout, result.stderr
    +        if result.returncode == 0:
    +            print(f"[OK] Success in {duration:.2f}s")
    +            return True, duration, result.stdout, result.stderr
    +        
    +        print(f"[FAIL] Failed in {duration:.2f}s")
    +        print(f"Error: {result.stderr}")
    +        return False, duration, result.stdout, result.stderr
    quantmsio/core/quantms/msstats_in.py (1)

    105-106: Keep DataFrame assignments that DuckDB's replacement scan consumes

    The DataFrames look unused to linters, but DuckDB's replacement scan resolves mapping_df by its local variable name inside the SQL string, so removing the assignment would break the query. Keep it and silence the false positive:

             mapping_df = pd.DataFrame(mapping_data)
    -        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        mapping_df = pd.DataFrame(mapping_data)  # noqa: F841 -- consumed by DuckDB replacement scan
    +        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")

    Or alternatively:

             # Insert into DuckDB using pandas DataFrame
             mapping_df = pd.DataFrame(mapping_data)
    -        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        self._duckdb.register("mapping_df", mapping_df)
    +        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        self._duckdb.unregister("mapping_df")

    Also applies to: 119-120

    quantmsio/core/quantms/pg.py (1)

    285-295: Consider using a configuration object to reduce method arguments

    The method has 9 arguments which impacts readability and maintainability. Consider using a configuration dataclass or dictionary.

    from dataclasses import dataclass
    
    @dataclass
    class QuantificationConfig:
        compute_topn: bool = True
        topn: int = 3
        compute_ibaq: bool = True
        file_num: int = 10
        duckdb_max_memory: str = "16GB"
        duckdb_threads: int = 4
    
    def quantify_from_msstats_optimized(
        self,
        msstats_path: str,
        sdrf_path: str,
        config: Optional[QuantificationConfig] = None
    ) -> pd.DataFrame:
        if config is None:
            config = QuantificationConfig()
        # Use config.compute_topn, config.topn, etc.
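A runnable reduction of that suggestion, with a stand-in body (the real method would run the DuckDB pipeline; only the names follow the snippet above):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantificationConfig:
    compute_topn: bool = True
    topn: int = 3
    compute_ibaq: bool = True
    file_num: int = 10
    duckdb_max_memory: str = "16GB"
    duckdb_threads: int = 4


def quantify_from_msstats_optimized(
    msstats_path: str,
    sdrf_path: str,
    config: Optional[QuantificationConfig] = None,
) -> dict:
    # One optional config object replaces nine positional arguments
    config = config or QuantificationConfig()
    # Stand-in body: return the settings that would drive the real computation
    return {
        "topn": config.topn if config.compute_topn else None,
        "threads": config.duckdb_threads,
    }


result = quantify_from_msstats_optimized(
    "report.tsv", "exp.sdrf.tsv", QuantificationConfig(topn=5)
)
```

Callers that accept the defaults simply omit the config argument, which also keeps future option additions backward compatible.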
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 583005d and 8ce4f5e.

    📒 Files selected for processing (10)
    • quantmsio/commands/convert/quantms_project.py (1 hunks)
    • quantmsio/core/duckdb.py (2 hunks)
    • quantmsio/core/quantms/msstats_in.py (1 hunks)
    • quantmsio/core/quantms/mztab.py (1 hunks)
    • quantmsio/core/quantms/pg.py (1 hunks)
    • tests/test_diann.py (2 hunks)
    • tests/test_feature.py (4 hunks)
    • tests/test_full_conversion_suite.py (1 hunks)
    • tests/test_maxquant.py (2 hunks)
    • tests/test_query.py (2 hunks)
    🚧 Files skipped from review as they are similar to previous changes (3)
    • tests/test_query.py
    • tests/test_feature.py
    • quantmsio/core/duckdb.py
    🧰 Additional context used
    🧬 Code Graph Analysis (2)
    quantmsio/core/quantms/mztab.py (8)
    quantmsio/quantmsioc.py (1)
    • project (84-86)
    quantmsio/core/project.py (1)
    • create_uuid_filename (59-62)
    quantmsio/operate/tools.py (1)
    • get_modification_details (205-247)
    quantmsio/utils/pride_utils.py (1)
    • get_quantmsio_modifications (271-286)
    quantmsio/core/quantms/pg.py (2)
    • _safe_file_open (229-273)
    • cleanup (35-58)
    quantmsio/operate/query.py (1)
    • close (344-348)
    quantmsio/operate/statistics.py (1)
    • close (108-112)
    quantmsio/core/fragpipe/fragpipe.py (2)
    • append (43-44)
    • peptidoform (194-219)
    quantmsio/core/quantms/msstats_in.py (4)
    quantmsio/core/duckdb.py (3)
    • DuckDB (13-101)
    • destroy_duckdb_database (96-101)
    • query_field (86-94)
    quantmsio/core/sdrf.py (3)
    • SDRFHandler (90-454)
    • get_experiment_type_from_sdrf (269-314)
    • get_sample_map_run (407-420)
    quantmsio/operate/tools.py (1)
    • get_protein_accession (267-272)
    quantmsio/utils/pride_utils.py (1)
    • clean_peptidoform_sequence (147-159)
    🪛 Ruff (0.11.9)
    tests/test_maxquant.py

    6-6: pytest imported but unused

    Remove unused import: pytest

    (F401)

    quantmsio/core/quantms/mztab.py

    255-255: Use a context manager for opening files

    (SIM115)


    258-258: Use a context manager for opening files

    (SIM115)


    414-414: Use a context manager for opening files

    (SIM115)


    478-478: Use a context manager for opening files

    (SIM115)


    495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)

    Replace kwargs.get("usecols", None) with kwargs.get("usecols")

    (SIM910)


    680-680: Do not use bare except

    (E722)


    691-691: Do not use bare except

    (E722)


    785-788: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)


    792-795: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)


    800-803: Use contextlib.suppress(Exception) instead of try-except-pass

    Replace with contextlib.suppress(Exception)

    (SIM105)

    quantmsio/core/quantms/pg.py

    62-65: Use contextlib.suppress(Exception) instead of try-except-pass

    (SIM105)


    241-241: Use a context manager for opening files

    (SIM115)


    246-246: Use a context manager for opening files

    (SIM115)


    261-264: Use contextlib.suppress(ValueError) instead of try-except-pass

    (SIM105)


    519-519: Local variable protein_df is assigned to but never used

    Remove assignment to unused variable protein_df

    (F841)

    quantmsio/commands/convert/quantms_project.py

    5-5: logging imported but unused

    Remove unused import: logging

    (F401)

    quantmsio/core/quantms/msstats_in.py

    76-76: Local variable e is assigned to but never used

    Remove assignment to unused variable e

    (F841)


    105-105: Local variable mapping_df is assigned to but never used

    Remove assignment to unused variable mapping_df

    (F841)


    119-119: Local variable sample_df is assigned to but never used

    Remove assignment to unused variable sample_df

    (F841)


    260-260: Do not use bare except

    (E722)


    266-266: Do not use bare except

    (E722)

    🪛 Flake8 (7.2.0)
    tests/test_maxquant.py

    [error] 6-6: 'pytest' imported but unused

    (F401)

    quantmsio/core/quantms/mztab.py

    [error] 680-680: do not use bare 'except'

    (E722)


    [error] 691-691: do not use bare 'except'

    (E722)


    [error] 769-769: whitespace before ':'

    (E203)

    quantmsio/core/quantms/pg.py

    [error] 519-519: local variable 'protein_df' is assigned to but never used

    (F841)


    [error] 583-583: whitespace before ':'

    (E203)

    quantmsio/commands/convert/quantms_project.py

    [error] 5-5: 'logging' imported but unused

    (F401)

    quantmsio/core/quantms/msstats_in.py

    [error] 76-76: local variable 'e' is assigned to but never used

    (F841)


    [error] 105-105: local variable 'mapping_df' is assigned to but never used

    (F841)


    [error] 119-119: local variable 'sample_df' is assigned to but never used

    (F841)


    [error] 219-219: whitespace before ':'

    (E203)


    [error] 260-260: do not use bare 'except'

    (E722)


    [error] 266-266: do not use bare 'except'

    (E722)


    [error] 533-533: whitespace before ':'

    (E203)

    🪛 GitHub Check: CodeFactor
    quantmsio/core/quantms/mztab.py

    [notice] 680-680: quantmsio/core/quantms/mztab.py#L680
    Do not use bare 'except'. (E722)


    [notice] 691-691: quantmsio/core/quantms/mztab.py#L691
    Do not use bare 'except'. (E722)


    [notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
    Try, Except, Continue detected. (B112)


    [notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795
    Try, Except, Pass detected. (B110)


    [notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788
    Try, Except, Pass detected. (B110)


    [notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803
    Try, Except, Pass detected. (B110)

    quantmsio/core/quantms/pg.py

    [notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65
    Try, Except, Pass detected. (B110)


    [notice] 270-271: quantmsio/core/quantms/pg.py#L270-L271
    Try, Except, Pass detected. (B110)

    quantmsio/commands/convert/quantms_project.py

    [notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199
    Complex Method

    quantmsio/core/quantms/msstats_in.py

    [notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260
    Do not use bare 'except'. (E722)


    [notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68
    Do not use bare 'except'. (E722)


    [notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
    Try, Except, Pass detected. (B110)

    🪛 Pylint (3.3.7)
    quantmsio/core/quantms/mztab.py

    [refactor] 83-83: Too many instance attributes (22/7)

    (R0902)


    [refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 414-414: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 478-478: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 606-640: Too many nested blocks (7/5)

    (R1702)


    [refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

    quantmsio/core/quantms/pg.py

    [error] 73-73: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif"

    (R1723)


    [error] 110-110: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 92-92: Too many branches (13/12)

    (R0912)


    [refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 285-285: Too many arguments (9/5)

    (R0913)


    [refactor] 285-285: Too many positional arguments (9/5)

    (R0917)


    [refactor] 285-285: Too many local variables (35/15)

    (R0914)


    [refactor] 285-285: Too many statements (55/50)

    (R0915)


    [refactor] 571-571: Useless return at end of function or method

    (R1711)


    [refactor] 627-627: Too many arguments (7/5)

    (R0913)


    [refactor] 627-627: Too many positional arguments (7/5)

    (R0917)


    [refactor] 627-627: Too many local variables (23/15)

    (R0914)

    quantmsio/commands/convert/quantms_project.py

    [refactor] 43-43: Too many arguments (6/5)

    (R0913)


    [refactor] 43-43: Too many positional arguments (6/5)

    (R0917)


    [refactor] 43-43: Too many local variables (25/15)

    (R0914)


    [refactor] 43-43: Too many branches (14/12)

    (R0912)


    [refactor] 43-43: Too many statements (78/50)

    (R0915)


    [refactor] 243-243: Too many arguments (6/5)

    (R0913)


    [refactor] 243-243: Too many positional arguments (6/5)

    (R0917)

    quantmsio/core/quantms/msstats_in.py

    [refactor] 22-22: Consider using Python 3 style super() without arguments

    (R1725)


    [refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one

    (R1737)

    tests/test_full_conversion_suite.py

    [refactor] 70-76: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

    ⏰ Context from checks skipped due to timeout of 90000ms (4)
    • GitHub Check: Codacy Static Code Analysis
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests (3.10)
    🔇 Additional comments (13)
    quantmsio/core/quantms/mztab.py (2)

    83-136: LGTM! Excellent architecture for MzTab processing.

    The class design demonstrates good practices with:

    • Optional DuckDB integration for large files
    • Proper gzip support
    • Resource management with context managers
    • Intelligent file size-based optimization
    • Comprehensive metadata caching

    The initialization logic is well-structured and handles edge cases appropriately.
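The file-size-based optimization can be gated with a simple predicate; the 100 MB threshold below is an assumption for illustration, not the value used in mztab.py:

```python
import os
import tempfile
from pathlib import Path

# Assumed threshold for illustration; the real cutoff lives in mztab.py
DUCKDB_THRESHOLD_BYTES = 100 * 1024 * 1024


def should_use_duckdb(path) -> bool:
    """Enable the heavier DuckDB backend only for large input files."""
    return Path(path).stat().st_size > DUCKDB_THRESHOLD_BYTES


with tempfile.NamedTemporaryFile(suffix=".mztab", delete=False) as tmp:
    tmp.write(b"MTD\ttitle\tsmall file\n")
    small_path = tmp.name

use_duckdb = should_use_duckdb(small_path)
os.unlink(small_path)
```

Small files skip database setup entirely, so the common case pays no initialization cost.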


    246-266: Well-implemented context manager for file operations.

    The _safe_file_open method properly handles both gzipped and regular files with appropriate error handling and resource cleanup.

    tests/test_diann.py (2)

    20-27: Excellent addition of proper cleanup handling.

    The try-finally blocks ensure that DuckDB resources are properly cleaned up even if exceptions occur during processing. This prevents resource leaks and is a good practice for database connections.

    Also applies to: 35-45


    48-145: Comprehensive test coverage for protein group intensity structure.

    This new test function excellently validates:

    • The structured intensity representation with proper sample_accession and channel mapping
    • Both intensities and additional_intensities fields
    • Nested intensity arrays with correct types ("normalize_intensity" and "lfq")
    • Proper channel assignment ("LFQ" for DIA-NN)

    The test provides thorough coverage for the refactored protein group handling and ensures data integrity.
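A hypothetical record matching the structure those assertions describe — field names are inferred from the bullets above, not copied from the quantms.io schema:

```python
record = {
    "pg_accessions": ["P12345"],  # hypothetical accession
    "intensities": [
        {"sample_accession": "Sample-1", "channel": "LFQ", "intensity": 1234.5},
    ],
    "additional_intensities": [
        {
            "sample_accession": "Sample-1",
            "channel": "LFQ",
            "intensities": [
                {"intensity_name": "normalize_intensity", "intensity_value": 0.87},
                {"intensity_name": "lfq", "intensity_value": 1200.0},
            ],
        },
    ],
}

# DIA-NN entries should carry the LFQ channel and both intensity types
names = {
    i["intensity_name"]
    for entry in record["additional_intensities"]
    for i in entry["intensities"]
}
```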

    tests/test_maxquant.py (1)

    331-357: Excellent test for gene name extraction.

    This test function provides good coverage for the gene name extraction logic from protein descriptions, testing various formats including edge cases like missing gene names and multiple gene entries.

    tests/test_full_conversion_suite.py (3)

    17-43: Excellent test infrastructure setup.

    The helper functions provide a solid foundation for integration tests with proper workspace detection and file path management. The separation of LFQ and TMT test files is well-organized.


    46-84: Well-implemented command execution with robust error handling.

    The run_command function includes excellent practices:

    • Proper UTF-8 encoding handling
    • Reasonable timeout for long-running operations
    • Comprehensive error capture and logging
    • Good timing information for performance monitoring
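Those practices reduce to a small wrapper around subprocess.run; the signature below is modeled on the description, not copied from the test file:

```python
import subprocess
import sys
import time


def run_command(cmd, timeout: int = 600):
    """Run a CLI command, capturing UTF-8 output with a timeout and wall-clock timing."""
    start = time.time()
    result = subprocess.run(
        cmd,
        capture_output=True,   # capture stdout and stderr for error reporting
        text=True,
        encoding="utf-8",      # avoid locale-dependent decode failures
        timeout=timeout,       # kill long-running conversions
    )
    duration = time.time() - start
    return result.returncode == 0, duration, result.stdout, result.stderr


ok, duration, out, err = run_command([sys.executable, "-c", "print('hello')"])
```

A TimeoutExpired exception from subprocess.run still needs handling at the call site if the suite should report timeouts as failures rather than crash.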

    105-426: Comprehensive integration test coverage.

    This test suite provides excellent end-to-end validation of the conversion workflows:

    • Tests all major data types (PSM, Feature, Protein Groups)
    • Covers both LFQ and TMT experiments
    • Validates output file generation and content structure
    • Includes proper column count and data row verification
    • Uses appropriate pytest markers for integration tests

    The tests ensure the CLI commands produce valid and complete quantms.io parquet outputs.

    quantmsio/core/quantms/msstats_in.py (1)

    28-35: Well-designed context manager and cleanup implementation

    The implementation correctly follows context manager protocol and ensures proper cleanup of database resources. The try/except/pass in __del__ is appropriate for destructor cleanup to avoid exceptions during garbage collection.

    Also applies to: 671-685
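A minimal sketch of the protocol being praised, assuming a generic close() resource rather than the actual MsstatsIN API:

```python
class ManagedResource:
    """Context-manager cleanup with a defensive destructor fallback."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # release resources but propagate any exception

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass  # never raise during garbage collection


with ManagedResource() as res:
    assert not res.closed  # still open inside the with block
```

The with statement guarantees cleanup on the happy path; the destructor is only a last-resort safety net for objects that were never used as context managers.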

    quantmsio/core/quantms/pg.py (4)

    19-26: Excellent resource management pattern

    Great implementation of resource tracking and cleanup:

    • Initializing tracking lists before super().__init__ prevents issues if parent initialization fails
    • Comprehensive cleanup of file handles and temporary files
    • Proper context manager support
    • The try/except/pass in __del__ is appropriate for destructors

    Also applies to: 35-66


    229-274: Well-implemented safe file handling with automatic format detection

    The _safe_file_open method provides excellent file handling:

    • Automatic detection and handling of gzipped files
    • Proper resource tracking for cleanup
    • Context manager pattern ensures files are closed
    • Appropriate error handling in cleanup
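The gzip autodetection described above can be sketched as a small context manager that checks the magic bytes — a simplified stand-in for _safe_file_open:

```python
import gzip
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def safe_file_open(path):
    """Open a text file transparently, whether gzip-compressed or plain."""
    p = Path(path)
    with open(p, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    handle = (
        gzip.open(p, "rt", encoding="utf-8")
        if is_gzip
        else open(p, "rt", encoding="utf-8")
    )
    try:
        yield handle
    finally:
        handle.close()


tmp = Path(tempfile.mkdtemp())
(tmp / "plain.tsv").write_text("PSH\tsequence\n", encoding="utf-8")
with gzip.open(tmp / "zipped.tsv.gz", "wt", encoding="utf-8") as f:
    f.write("PSH\tsequence\n")

with safe_file_open(tmp / "plain.tsv") as fh1, safe_file_open(tmp / "zipped.tsv.gz") as fh2:
    same = fh1.read() == fh2.read()
```

Sniffing the magic bytes is more robust than trusting a .gz extension, which is the key property the review highlights.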

    591-626: Well-structured SQL generation with proper aggregation

    The SQL generation is clean and efficient:

    • Proper use of GROUP BY for aggregation
    • Good selection of aggregate functions (SUM, COUNT, MAX, AVG)
    • Filtering out null anchor proteins and zero intensities
    • Proper ordering for consistent results
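That aggregation shape can be reproduced with stdlib sqlite3; the columns are illustrative, while the real SQL lives in pg.py:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msstats (anchor_protein TEXT, intensity REAL)")
con.executemany(
    "INSERT INTO msstats VALUES (?, ?)",
    [("P1", 10.0), ("P1", 5.0), ("P2", 0.0), (None, 3.0)],
)
rows = con.execute(
    """
    SELECT anchor_protein,
           SUM(intensity) AS total_intensity,
           COUNT(*)       AS feature_count,
           MAX(intensity) AS max_intensity
    FROM msstats
    WHERE anchor_protein IS NOT NULL AND intensity > 0
    GROUP BY anchor_protein
    ORDER BY anchor_protein
    """
).fetchall()
```

The WHERE clause drops the NULL anchor protein and the zero-intensity row before grouping, so only P1 survives with its two positive features.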

    518-523: Keep the DataFrame assignment but mark it for linters

    The protein_df variable looks unused to linters, but DuckDB's replacement scan resolves it by its local variable name inside the SQL string, so the assignment is required:

                 # Convert to DataFrame and load into DuckDB
                 if protein_data:
    -                protein_df = pd.DataFrame(protein_data)
    +                protein_df = pd.DataFrame(protein_data)  # noqa: F841 -- consumed by DuckDB replacement scan
                     msstats_in._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
                     msstats_in._duckdb.execute(
                         "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
                     )

    Likely an incorrect or invalid review comment.


    @coderabbitai coderabbitai bot left a comment


    Actionable comments posted: 1

    ♻️ Duplicate comments (2)
    quantmsio/core/quantms/mztab.py (1)

    784-808: Replace bare except clauses with specific exceptions.

    Using bare except: can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle.

    The previous review comment about replacing bare except clauses is still applicable. Apply the suggested fix:

             if self._duckdb:
                 try:
                     self._duckdb.close()
    -            except:
    +            except Exception as e:
    +                self.logger.warning(f"Failed to close DuckDB connection: {e}")
    -                pass
    
             if self._duckdb_name and Path(self._duckdb_name).exists():
                 try:
                     os.remove(self._duckdb_name)
    -            except:
    +            except (OSError, IOError) as e:
    +                self.logger.warning(f"Failed to remove DuckDB file {self._duckdb_name}: {e}")
    -                pass
    
             if self._temp_decompressed_file and self._temp_decompressed_file.exists():
                 try:
                     self._temp_decompressed_file.unlink()
    -            except:
    +            except (OSError, IOError) as e:
    +                self.logger.warning(f"Failed to remove temporary file {self._temp_decompressed_file}: {e}")
    -                pass
    quantmsio/core/quantms/msstats_in.py (1)

    56-58: This is a duplicate of a previous review comment.

    The exception handling has been improved with logging, but the logger attribute issue needs to be addressed.

    🧹 Nitpick comments (10)
    quantmsio/core/quantms/mztab.py (3)

    32-81: Consider refactoring complex nested logic.

    The function has deeply nested conditionals that make it difficult to follow. Consider extracting helper methods for better readability.

    Consider this refactoring approach:

    +def _parse_modification_base(line_parts: list, modifications: dict) -> dict:
    +    """Parse base modification information."""
    +    values = line_parts[2].replace("[", "").replace("]", "").split(",")
    +    accession = values[1].strip()
    +    name = values[2].strip()
    +    index = line_parts[1].split("[")[1].split("]")[0]
    +    modifications[accession] = [name, index, None, None]
    +    return modifications
    +
    +def _find_modification_by_index(modifications: dict, index: str) -> Optional[str]:
    +    """Find modification accession by index."""
    +    for key, value in modifications.items():
    +        if value[1] == index:
    +            return key
    +    return None
    +
     def fetch_modifications_from_mztab_line(line: str, _modifications: dict) -> dict:
         line = line.strip()
         line_parts = line.split("\t")
         if line_parts[0] == "MTD" and "_mod[" in line_parts[1]:
             if "site" not in line_parts[1] and "position" not in line_parts[1]:
    -            values = line_parts[2].replace("[", "").replace("]", "").split(",")
    -            accession = values[1].strip()
    -            name = values[2].strip()
    -            index = line_parts[1].split("[")[1].split("]")[0]
    -            _modifications[accession] = [name, index, None, None]
    +            _modifications = _parse_modification_base(line_parts, _modifications)
             elif "site" in line_parts[1]:
                 index = line_parts[1].split("[")[1].split("]")[0]
    -            accession = None
    -            for key, value in _modifications.items():
    -                if value[1] == index:
    -                    accession = key
    +            accession = _find_modification_by_index(_modifications, index)
                 if accession is None:
                     raise Exception("The accession for the modification is None")
                 _modifications[accession][2] = line_parts[2]
             elif "position" in line_parts[1]:
                 index = line_parts[1].split("[")[1].split("]")[0]
    -            accession = None
    -            for key, value in _modifications.items():
    -                if value[1] == index:
    -                    accession = key
    +            accession = _find_modification_by_index(_modifications, index)
                 if accession is None:
                     raise Exception("The accession for the modification is None")
                 _modifications[accession][3] = line_parts[2]
         return _modifications

    407-448: Consider improving file handle management for chunked reads.

    The method opens file handles for chunked reading without immediate context management. While this may be intentional for streaming operations, consider documenting this pattern or exploring alternative approaches.

    For better resource management, consider this pattern:

    +    def _create_chunked_reader(self, seekable_path: Path, pos: int, nrows: int, **kwargs):
    +        """Create a chunked reader with proper resource management."""
    +        f = open(seekable_path, "r", encoding="utf-8")
    +        f.seek(pos)
    +        return pd.read_csv(f, sep="\t", nrows=nrows, low_memory=False, **kwargs), f
    +
         def __load_second(self, header, **kwargs):
             seekable_path = self._get_seekable_file_path()
             
             if "chunksize" in kwargs and kwargs["chunksize"] is not None:
    -            f = open(seekable_path, "r", encoding="utf-8")
    -            if header == "PSH":
    -                f.seek(self._psm_pos)
    -                return pd.read_csv(f, sep="\t", nrows=self._psm_len, low_memory=False, **kwargs)
    +            if header == "PSH":
    +                return self._create_chunked_reader(seekable_path, self._psm_pos, self._psm_len, **kwargs)
                 # ... similar for other headers

    495-495: Simplify kwargs.get usage.

    -                        columns = kwargs.get("usecols", None)
    +                        columns = kwargs.get("usecols")
    quantmsio/core/quantms/msstats_in.py (7)

    22-22: Modernize the super() call to Python 3 style.

    The current super() call uses the outdated Python 2 style with explicit arguments.

    -        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
    +        super().__init__(report_path, duckdb_max_memory, duckdb_threads)

    114-116: Register the DataFrame explicitly.

    The mapping_df variable appears unused, but DuckDB resolves it by name via a replacement scan. Explicit registration keeps the dependency visible and satisfies the linter:

    -        # Insert into DuckDB using pandas DataFrame
    -        mapping_df = pd.DataFrame(mapping_data)
    -        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        # Insert into DuckDB via explicit registration
    +        self._duckdb.register("mapping_df", pd.DataFrame(mapping_data))
    +        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        self._duckdb.unregister("mapping_df")

    128-130: Register the DataFrame explicitly.

    The sample_df variable appears unused, but DuckDB resolves it by name via a replacement scan. Explicit registration keeps the dependency visible and satisfies the linter:

    -        # Insert into DuckDB using pandas DataFrame
    -        sample_df = pd.DataFrame(sample_data)
    -        self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
    +        # Insert into DuckDB via explicit registration
    +        self._duckdb.register("sample_df", pd.DataFrame(sample_data))
    +        self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
    +        self._duckdb.unregister("sample_df")

    132-220: Well-structured dynamic SQL view creation.

    The method effectively handles different experiment types and available columns to create optimized database views. The SQL construction appears safe since inputs are controlled internally.

    Consider using parameterized queries or a SQL builder library for future enhancements to improve maintainability and safety:

    # Example of safer approach for future consideration
    def _build_charge_expression(self):
        """Build charge column expression based on available columns."""
        if self.has_precursor_charge and self.has_charge:
            return "COALESCE(PrecursorCharge, Charge)"
        # ... rest of logic

    492-494: Replace print statement with proper logging.

    Consistent with other parts of the code, use logging instead of print statements.

    -                    print(
    -                        f"Warning: SQL aggregation failed ({e}), falling back to pandas"
    -                    )
    +                    self.logger.warning(
    +                        f"SQL aggregation failed ({e}), falling back to pandas"
    +                    )

    582-582: Replace print statement with proper logging.

    Use logging instead of print for consistency with the rest of the codebase.

    -            print(f"Filtering for protein: {protein_str}")
    +            self.logger.info(f"Filtering for protein: {protein_str}")

    685-698: Improve cleanup exception handling with logging.

    While try/except/pass is acceptable in __del__ methods, adding logging can help with debugging resource cleanup issues.

         def __del__(self):
             """Cleanup database views and tables."""
             try:
                 if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
                     self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
                     self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
                     self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
                     self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
                     self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
                 # Always call parent cleanup to close connection and remove database file
                 if hasattr(self, "_duckdb") and self._duckdb:
                     self.destroy_duckdb_database()
    -        except Exception:
    -            pass
    +        except Exception as e:
    +            # Log cleanup failures but don't raise to avoid issues during garbage collection
    +            try:
    +                import logging
    +                logging.getLogger(__name__).debug(f"Error during cleanup: {e}")
    +            except Exception:
    +                pass  # If even logging fails, silently continue
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 8ce4f5e and a88aa3d.

    📒 Files selected for processing (2)
    • quantmsio/core/quantms/msstats_in.py (1 hunks)
    • quantmsio/core/quantms/mztab.py (1 hunks)
    🧰 Additional context used
    🪛 Ruff (0.11.9)
    quantmsio/core/quantms/mztab.py

    255-255: Use a context manager for opening files

    (SIM115)


    258-258: Use a context manager for opening files

    (SIM115)


    414-414: Use a context manager for opening files

    (SIM115)


    478-478: Use a context manager for opening files

    (SIM115)


    495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)

    Replace kwargs.get("usecols", None) with kwargs.get("usecols")

    (SIM910)

    quantmsio/core/quantms/msstats_in.py

    89-89: Local variable e is assigned to but never used

    Remove assignment to unused variable e

    (F841)


    115-115: Local variable mapping_df is assigned to but never used

    Remove assignment to unused variable mapping_df

    (F841)


    129-129: Local variable sample_df is assigned to but never used

    Remove assignment to unused variable sample_df

    (F841)

    🪛 GitHub Check: CodeFactor
    quantmsio/core/quantms/mztab.py

    [notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
    Try, Except, Continue detected. (B112)

    quantmsio/core/quantms/msstats_in.py

    [notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
    Try, Except, Pass detected. (B110)

    🪛 Flake8 (7.2.0)
    quantmsio/core/quantms/mztab.py

    [error] 769-769: whitespace before ':'

    (E203)

    quantmsio/core/quantms/msstats_in.py

    [error] 89-89: local variable 'e' is assigned to but never used

    (F841)


    [error] 115-115: local variable 'mapping_df' is assigned to but never used

    (F841)


    [error] 129-129: local variable 'sample_df' is assigned to but never used

    (F841)


    [error] 229-229: whitespace before ':'

    (E203)


    [error] 547-547: whitespace before ':'

    (E203)

    🪛 Pylint (3.3.7)
    quantmsio/core/quantms/mztab.py

    [refactor] 83-83: Too many instance attributes (22/7)

    (R0902)


    [refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 414-414: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 478-478: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 606-640: Too many nested blocks (7/5)

    (R1702)


    [refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

    quantmsio/core/quantms/msstats_in.py

    [refactor] 22-22: Consider using Python 3 style super() without arguments

    (R1725)


    [error] 70-70: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 78-78: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 272-272: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 281-281: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one

    (R1737)

    ⏰ Context from checks skipped due to timeout of 90000ms (5)
    • GitHub Check: Codacy Static Code Analysis
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests (3.11)
    • GitHub Check: unit-tests (3.10)
    • GitHub Check: unit-tests
    🔇 Additional comments (7)
    quantmsio/core/quantms/mztab.py (6)

    17-30: LGTM! Function logic is sound.

    The modification list generation function correctly handles null values and processes modifications appropriately.


    83-136: Well-structured initialization with good performance optimization.

    The conditional DuckDB setup based on file size is a smart optimization. The high number of instance attributes is justified given the comprehensive functionality provided.


    137-244: Excellent DuckDB integration with proper fallback handling.

    The implementation provides good performance optimization with appropriate fallback mechanisms when DuckDB operations fail. The file size threshold and configuration settings are well-chosen.
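
    The MsstatsIN internals are not part of this diff; purely as an illustrative sketch of the query-with-fallback pattern praised here, with stdlib `sqlite3` standing in for DuckDB and an invented `psm` table (both assumptions):

    ```python
    import sqlite3

    def query_with_fallback(conn, sql, fallback):
        """Run the query on the SQL engine; on failure, use the slower fallback path."""
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error:
            # Engine path failed (bad SQL, missing column, ...): fall back
            return fallback()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE psm (sequence TEXT)")
    conn.execute("INSERT INTO psm VALUES ('PEPTIDE')")

    rows = query_with_fallback(conn, "SELECT sequence FROM psm", lambda: [])
    bad = query_with_fallback(conn, "SELECT nope FROM psm", lambda: [("fallback",)])
    ```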


    246-312: Robust file handling with comprehensive gzip support.

    The implementation correctly handles both regular and gzipped files with appropriate context management. The temporary file creation for seekable operations on gzipped files is a necessary optimization.
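
    The temporary-file trick for seekable access to gzipped input can be sketched roughly like this (helper name and suffix handling are illustrative, not the module's actual API):

    ```python
    import gzip
    import shutil
    import tempfile
    from pathlib import Path

    def ensure_seekable(path: Path) -> Path:
        """Decompress a .gz file to a temporary file so callers can seek freely."""
        if path.suffix != ".gz":
            return path  # plain files are already seekable
        with gzip.open(path, "rb") as src:
            with tempfile.NamedTemporaryFile(delete=False, suffix=".tmp") as dst:
                shutil.copyfileobj(src, dst)
                return Path(dst.name)

    # Build a small gzipped file and read it back through the helper
    work = Path(tempfile.mkdtemp())
    gz_path = work / "demo.txt.gz"
    with gzip.open(gz_path, "wt") as f:
        f.write("MTD\ttitle\tdemo\n")

    plain = ensure_seekable(gz_path)
    content = plain.read_text()
    ```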


    313-367: Efficient metadata parsing with proper caching.

    The once-only parsing pattern with caching is well-implemented and prevents redundant processing. Good error handling ensures the system remains functional even with parsing issues.
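
    The parse-once-and-cache pattern referred to here looks roughly like the following (class and attribute names are invented for illustration; only the caching idea is taken from the review):

    ```python
    class MetadataCache:
        """Parse mzTab MTD lines once and reuse the result on later calls."""

        def __init__(self, lines):
            self._lines = lines
            self._metadata = None
            self.parse_count = 0  # for illustration only

        def get_metadata(self):
            if self._metadata is None:  # parse exactly once
                self.parse_count += 1
                meta = {}
                for line in self._lines:
                    if line.startswith("MTD\t"):
                        _, key, value = line.rstrip("\n").split("\t", 2)
                        meta[key] = value
                self._metadata = meta
            return self._metadata

    cache = MetadataCache(["MTD\tmzTab-version\t1.0\n", "PSH\t...\n"])
    first = cache.get_metadata()
    second = cache.get_metadata()
    ```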


    519-773: Comprehensive and well-designed public API.

    The methods provide excellent functionality with proper DuckDB optimization and pandas fallback. The streaming capabilities and efficient querying methods demonstrate thoughtful design for different use cases.

    quantmsio/core/quantms/msstats_in.py (1)

    235-261: Excellent SQL-based performance optimization.

    The file statistics method effectively uses SQL aggregation for better performance compared to iterating through data in Python. The fallback mechanism ensures robustness.
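
    The point about pushing aggregation into SQL rather than looping in Python can be shown with a minimal stand-in (stdlib `sqlite3` instead of DuckDB, invented `report` schema):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (reference TEXT, intensity REAL)")
    conn.executemany(
        "INSERT INTO report VALUES (?, ?)",
        [("run1.mzML", 10.0), ("run1.mzML", 20.0), ("run2.mzML", 5.0)],
    )

    # One aggregate query instead of a Python loop over every row
    stats = conn.execute(
        "SELECT reference, COUNT(*), SUM(intensity) "
        "FROM report GROUP BY reference ORDER BY reference"
    ).fetchall()
    ```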

    Comment on lines +70 to +78
    self.logger.warning(
        f"Failed to get column info via PRAGMA: {e}, trying fallback method"
    )
    try:
        sample_query = "SELECT * FROM report LIMIT 1"
        sample_data = self._duckdb.execute(sample_query).df()
        available_columns = set(sample_data.columns.tolist())
    except Exception as e2:
        self.logger.warning(f"Fallback column detection also failed: {e2}")

    ⚠️ Potential issue

    Fix missing logger attribute and unused variable.

    The code references self.logger but the MsstatsIN class doesn't inherit from a class that provides a logger attribute, which will cause an AttributeError.

    +import logging
    +
     class MsstatsIN(DuckDB):
         def __init__(
             self,
             report_path: Union[Path, str],
             sdrf_path: Union[Path, str],
             duckdb_max_memory="16GB",
             duckdb_threads=4,
         ):
             super().__init__(report_path, duckdb_max_memory, duckdb_threads)
    +        self.logger = logging.getLogger(__name__)
             self._sdrf = SDRFHandler(sdrf_path)

    Also, remove the unused exception variable:

    -            except Exception as e:
    +            except Exception:
                     # Fallback: get columns from a sample query
                     self.logger.warning(
    -                    f"Failed to get column info via PRAGMA: {e}, trying fallback method"
    +                    "Failed to get column info via PRAGMA, trying fallback method"
                     )
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    Suggested change
    self.logger.warning(
        f"Failed to get column info via PRAGMA: {e}, trying fallback method"
    )
    try:
        sample_query = "SELECT * FROM report LIMIT 1"
        sample_data = self._duckdb.execute(sample_query).df()
        available_columns = set(sample_data.columns.tolist())
    except Exception as e2:
        self.logger.warning(f"Fallback column detection also failed: {e2}")

    # At the top of quantmsio/core/quantms/msstats_in.py, add the logging import:
    import logging


    class MsstatsIN(DuckDB):
        def __init__(
            self,
            report_path: Union[Path, str],
            sdrf_path: Union[Path, str],
            duckdb_max_memory="16GB",
            duckdb_threads=4,
        ):
            super().__init__(report_path, duckdb_max_memory, duckdb_threads)
            # Initialize a logger for this class
            self.logger = logging.getLogger(__name__)
            self._sdrf = SDRFHandler(sdrf_path)

        ...

        def _detect_columns(self):
            try:
                columns_query = "PRAGMA table_info('report')"
                columns_info = self._duckdb.execute(columns_query).df()
                available_columns = set(columns_info["name"].tolist())
            except Exception:
                # Fallback: get columns from a sample query
                self.logger.warning(
                    "Failed to get column info via PRAGMA, trying fallback method"
                )
                try:
                    sample_query = "SELECT * FROM report LIMIT 1"
                    sample_data = self._duckdb.execute(sample_query).df()
                    available_columns = set(sample_data.columns.tolist())
                except Exception as e2:
                    self.logger.warning(f"Fallback column detection also failed: {e2}")
            self._available_columns = available_columns
        ...
    🧰 Tools
    🪛 Pylint (3.3.7)

    [error] 70-70: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 78-78: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)

    🤖 Prompt for AI Agents
    In quantmsio/core/quantms/msstats_in.py around lines 70 to 78, the code uses
    self.logger for logging but the MsstatsIN class lacks a logger attribute,
    causing an AttributeError. Add a logger attribute to the MsstatsIN class, for
    example by initializing it with a standard logging.getLogger instance. Also,
    remove the unused exception variable e2 in the fallback except block by omitting
    the variable name in the except clause or replacing it with an underscore.
    

    @coderabbitai coderabbitai bot left a comment

    Actionable comments posted: 5

    🔭 Outside diff range comments (1)
    quantmsio/utils/pride_utils.py (1)

    674-702: Critical bug: Unreachable code after early return.

    The function has unreachable code after the early return on line 674. The main processing logic from lines 676-702 will never execute because the function exits early when modification_string is null/empty.

    Apply this diff to fix the logic flow:

         if (
             modification_string is None
             or modification_string == "null"
             or modification_string == ""
             or pd.isna(modification_string)
         ):
             return peptide_sequence
    
    -        modifications = get_modifications_object_from_mztab_line(
    -            modification_string=modification_string,
    -            modifications_definition=modifications_definition,
    -        )
    -
    -        aa_index = 0
    -        result_peptide: str = ""
    -        peptide_sequence = list(peptide_sequence)
    -        # Add n-term modification if it is present
    -        for key_index, value_index in modifications.items():
    -            if aa_index in value_index["position"]:
    -                result_peptide = "[" + key_index + "]" + result_peptide
    -        if len(result_peptide) > 0:
    -            result_peptide = result_peptide + "-"
    +    modifications = get_modifications_object_from_mztab_line(
    +        modification_string=modification_string,
    +        modifications_definition=modifications_definition,
    +    )
    +
    +    aa_index = 0
    +    result_peptide: str = ""
    +    peptide_sequence = list(peptide_sequence)
    +    # Add n-term modification if it is present
    +    for key_index, value_index in modifications.items():
    +        if aa_index in value_index["position"]:
    +            result_peptide = "[" + key_index + "]" + result_peptide
    +    if len(result_peptide) > 0:
    +        result_peptide = result_peptide + "-"
    +        
         aa_index += 1
         for aa in peptide_sequence:
    ♻️ Duplicate comments (4)
    quantmsio/commands/convert/quantms_project.py (1)

    80-103: Fix critical typo in parameter name.

    There's a typo in the parameter name that will cause a runtime error when the method is called.

            project_handler.add_software_provider(
    -            sortware_name="quantms", sortware_version=quantms_version
    +            software_name="quantms", software_version=quantms_version
            )
    quantmsio/core/quantms/msstats_in.py (1)

    68-68: Replace bare except clause with specific exception handling

    Bare except clauses can catch system exceptions like KeyboardInterrupt. Be more specific about expected exceptions.

                 try:
                     columns_query = "PRAGMA table_info('report')"
                     columns_info = self._duckdb.execute(columns_query).df()
                     available_columns = set(columns_info["name"].tolist())
    -            except Exception as e:
    +            except (AttributeError, KeyError, ValueError) as e:
                     # Fallback: get columns from a sample query
                     self.logger.warning(
                         f"Failed to get column info via PRAGMA: {e}, trying fallback method"
                     )
    quantmsio/core/quantms/mztab.py (2)

    680-680: Replace bare except clauses with specific exception handling

    Be specific about which exceptions to handle.

                         try:
                             count_result = self._duckdb.execute(
                                 f"SELECT COUNT(*) as count FROM {table}"
                             ).fetchone()
                             if count_result:
                                 stats["sections"][table.upper()] = {
                                     "row_count": count_result[0]
                                 }
    -                    except:
    +                    except (duckdb.Error, KeyError):
                             stats["sections"][table.upper()] = {"row_count": 0}
                             section_name = {"PSH": "PSM", "PEH": "PEP", "PRH": "PRT"}[
                                 header
                             ]
                             stats["sections"][section_name] = {"row_count": length}
                         except (ValueError, KeyError, IndexError) as e:
                             # Log specific errors for debugging but continue processing other sections
                             self.logger.debug(f"Failed to extract length for section {header}: {e}")
                             continue
    -                    except Exception:
    +                    except Exception as e:
    +                        # Log unexpected errors but continue processing other sections
    +                        self.logger.warning(f"Unexpected error extracting length for section {header}: {e}")
                             continue

    Also applies to: 691-691


    787-787: Replace bare except clauses in cleanup methods

    Using bare except: can catch system exits and keyboard interrupts. Be more specific.

             if self._duckdb:
                 try:
                     self._duckdb.close()
    -            except:
    -                pass
    +            except Exception as e:
    +                self.logger.warning(f"Failed to close DuckDB connection: {e}")
                 self._duckdb = None
    
             if self._duckdb_name and Path(self._duckdb_name).exists():
                 try:
                     os.remove(self._duckdb_name)
    -            except:
    -                pass
    +            except (OSError, IOError) as e:
    +                self.logger.warning(
    +                    f"Failed to remove DuckDB file {self._duckdb_name}: {e}"
    +                )
                 self._duckdb_name = None
    
             # Clean up temporary decompressed file
             if self._temp_decompressed_file and self._temp_decompressed_file.exists():
                 try:
                     self._temp_decompressed_file.unlink()
    -            except:
    -                pass
    +            except (OSError, IOError) as e:
    +                self.logger.warning(
    +                    f"Failed to remove temporary file {self._temp_decompressed_file}: {e}"
    +                )
                 self._temp_decompressed_file = None

    Also applies to: 794-794, 802-802

    🧹 Nitpick comments (19)
    quantmsio/commands/convert/quantms_project.py (5)

    21-34: Improve robustness of project prefix extraction.

    The get_project_prefix function makes assumptions about filename structure that could lead to incorrect results.

    def get_project_prefix(sdrf_file: Path) -> str:
    -    """Extract project prefix from SDRF filename (e.g. 'PXD000865' from 'PXD000865.sdrf.tsv')."""
    +    """Extract project prefix from SDRF filename (e.g. 'PXD000865' from 'PXD000865.sdrf.tsv')."""
        filename = sdrf_file.name
    -    # Remove .sdrf.tsv and any variations like _openms_design.sdrf.tsv
    -    prefix = filename.split(".sdrf")[0].split("_openms")[0]
    +    # Extract prefix before first occurrence of .sdrf or _openms
    +    prefix = filename.split(".sdrf")[0].split("_openms")[0]
    +    # Validate that we have a reasonable project accession format
    +    if not prefix or len(prefix) < 3:
    +        raise ValueError(f"Could not extract valid project prefix from filename: {filename}")
        return prefix
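
    A runnable sketch of the suggested helper (validation threshold of 3 characters is the suggestion's assumption, not an existing check):

    ```python
    from pathlib import Path

    def get_project_prefix(sdrf_file: Path) -> str:
        """Extract project prefix from an SDRF filename, e.g. 'PXD000865'."""
        filename = sdrf_file.name
        # Take the text before the first ".sdrf" or "_openms" marker
        prefix = filename.split(".sdrf")[0].split("_openms")[0]
        if not prefix or len(prefix) < 3:
            raise ValueError(
                f"Could not extract valid project prefix from filename: {filename}"
            )
        return prefix

    plain = get_project_prefix(Path("PXD000865.sdrf.tsv"))
    design = get_project_prefix(Path("PXD000865_openms_design.sdrf.tsv"))
    ```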

    81-81: Fix indentation issues in function definitions.

    Multiple function definitions have continuation line indentation issues that violate PEP 8 style guidelines.

    -def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path, 
    -                       quantmsio_version: str, quantms_version: str):
    +def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path,
    +                        quantmsio_version: str, quantms_version: str):
    
    -def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, 
    -                     project_accession: str, generate_ibaq_view: bool) -> list:
    +def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path,
    +                      project_accession: str, generate_ibaq_view: bool) -> list:
    
    -def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str, 
    -                       output_folder_path: Path):
    +def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
    +                        output_folder_path: Path):
    
    -def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, 
    -                 project_accession: str) -> list:
    +def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path,
    +                  project_accession: str) -> list:
    
    -def _register_files_in_project(created_files: list, output_folder_path: Path, 
    -                              project_accession: str):
    +def _register_files_in_project(created_files: list, output_folder_path: Path,
    +                               project_accession: str):
    
    -    project_handler = _initialize_project(output_folder_path, project_accession, 
    -                                        sdrf_file, quantmsio_version, quantms_version)
    +    project_handler = _initialize_project(output_folder_path, project_accession,
    +                                         sdrf_file, quantmsio_version, quantms_version)
    
    -        feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path, 
    -                                        project_accession, generate_ibaq_view)
    +        feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path,
    +                                         project_accession, generate_ibaq_view)

    Also applies to: 107-107, 136-136, 149-149, 172-172, 226-226, 233-233


    93-93: Remove unused variable assignments.

    Variables are assigned but never used, creating unnecessary code.

    -        # Save initial project file
    -        project_json = str(output_folder_path / f"{project_accession}.project.json")
             project_handler.save_project_info(
                 output_prefix_file=project_accession,
                 output_folder=str(output_folder_path),
                 delete_existing=True,
             )
    
    -    # Initialize project
    -    project_handler = _initialize_project(output_folder_path, project_accession,
    -                                         sdrf_file, quantmsio_version, quantms_version)
    +    # Initialize project
    +    _initialize_project(output_folder_path, project_accession,
    +                       sdrf_file, quantmsio_version, quantms_version)

    Also applies to: 225-225


    192-247: Consider reducing function complexity.

    The quantmsio_workflow function has 6 parameters and manages the entire workflow orchestration. While it has been improved by extracting helper functions, it could benefit from further modularization.

    Consider creating a QuantmsProjectConverter class to encapsulate the workflow state and reduce parameter passing:

    class QuantmsProjectConverter:
        def __init__(self, base_folder: str, output_folder: str, project_accession: str,
                     quantms_version: str, quantmsio_version: str, generate_ibaq_view: bool):
            self.base_folder = base_folder
            self.output_folder = output_folder
            self.project_accession = project_accession
            self.quantms_version = quantms_version
            self.quantmsio_version = quantmsio_version
            self.generate_ibaq_view = generate_ibaq_view
            self.created_files = []
            
        def convert(self) -> None:
            """Execute the full conversion workflow."""
            self._setup_and_validate()
            self._initialize_project()
            self._convert_data()
            self._register_files()

    304-306: Handle potential type mismatch in output directory assignment.

    The output_dir parameter is Optional[Path] but gets assigned a string value, which could cause type checking issues.

        # Default output to sibling quantms.io directory
        if not output_dir:
    -        output_dir = str(quantms_dir.parent / "quantms.io")
    +        output_dir = quantms_dir.parent / "quantms.io"
    quantmsio/utils/pride_utils.py (2)

    516-519: Simplify optional field assignments using dict.get()

    These if-else blocks can be simplified using dict.get() with a default value.

     def _set_optional_psm_fields(psm: dict, es: dict) -> None:
         """Set optional PSM fields with proper fallbacks."""
         # Handle decoy peptide information
    -    if "opt_global_cv_MS:1002217_decoy_peptide" in es:
    -        psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
    -    else:
    -        psm["is_decoy"] = None
    +    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)
    
         # Handle posterior error probability
         if "opt_global_Posterior_Error_Probability_score" in es:
             psm["posterior_error_probability"] = es["opt_global_Posterior_Error_Probability_score"]
         else:
             psm["posterior_error_probability"] = None
    
         # Handle global q-value
    -    if "opt_global_q-value" in es:
    -        psm["global_qvalue"] = es["opt_global_q-value"]
    -    else:
    -        psm["global_qvalue"] = None
    +    psm["global_qvalue"] = es.get("opt_global_q-value", None)
    
         # Handle consensus support
    -    if "opt_global_consensus_support" in es:
    -        psm["consensus_support"] = es["opt_global_consensus_support"]
    -    else:
    -        psm["consensus_support"] = None
    +    psm["consensus_support"] = es.get("opt_global_consensus_support", None)

    Also applies to: 528-531, 534-537


    513-538: Simplify field assignments using dict.get() method.

    The static analysis tool correctly identifies opportunities to simplify the conditional assignments.

    Apply this diff to simplify the field assignments:

    -    # Handle decoy peptide information
    -    if "opt_global_cv_MS:1002217_decoy_peptide" in es:
    -        psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
    -    else:
    -        psm["is_decoy"] = None
    +    # Handle decoy peptide information
    +    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)
    
    -    # Handle global q-value
    -    if "opt_global_q-value" in es:
    -        psm["global_qvalue"] = es["opt_global_q-value"]
    -    else:
    -        psm["global_qvalue"] = None
    +    # Handle global q-value
    +    psm["global_qvalue"] = es.get("opt_global_q-value", None)
    
    -    # Handle consensus support
    -    if "opt_global_consensus_support" in es:
    -        psm["consensus_support"] = es["opt_global_consensus_support"]
    -    else:
    -        psm["consensus_support"] = None
    +    # Handle consensus support
    +    psm["consensus_support"] = es.get("opt_global_consensus_support", None)
    tests/test_msstats_in.py (3)

    88-88: Remove unnecessary f-string prefixes

    These strings don't contain any placeholders, so the f-string prefix is not needed.

    -        print(f"\nLFQ test completed successfully!")
    +        print("\nLFQ test completed successfully!")
    -    print(f"\nTMT test completed successfully!")
    +    print("\nTMT test completed successfully!")
    -    print(f"\nDataset Comparison:")
    +    print("\nDataset Comparison:")

    Also applies to: 356-356, 428-428


    220-223: Replace unused loop variable with underscore

    The count variable is not used within the loop body.

         if intensities_channel_counts:
    -        for channel, count in intensities_channel_counts.items():
    +        for channel, _ in intensities_channel_counts.items():
                 if channel not in channel_counts:
                     channel_counts[channel] = 0

    264-270: Simplify dictionary membership test

    Remove unnecessary .keys() when checking dictionary membership.

         all_channels = sorted(
             set(
                 channel
                 for file_channels in file_channel_matrix.values()
    -            for channel in file_channels.keys()
    +            for channel in file_channels
             )
         )
    quantmsio/core/quantms/msstats_in.py (1)

    416-422: Use 'yield from' for better performance

    When yielding all elements from an iterator without modification, use yield from.

             if not self._optimized_setup_done:
                 # Fallback to original method
    -            for batch in self.generate_msstats_in(file_batch_size):
    -                yield batch
    +            yield from self.generate_msstats_in(file_batch_size)
                 return
    quantmsio/core/quantms/mztab.py (1)

    414-414: Consider using context managers for file operations

    These file operations should ideally use context managers to ensure proper resource cleanup.

    For chunked reading operations, the current approach is necessary since the file handle needs to remain open across multiple chunk iterations. However, consider documenting this design choice:

             if "chunksize" in kwargs and kwargs["chunksize"] is not None:
                 # For chunked reading, we need to keep the file handle open
    +            # Note: Caller is responsible for closing the file handle when done iterating
                 f = open(seekable_path, "r", encoding="utf-8")

    Also applies to: 478-478

    quantmsio/core/quantms/pg.py (7)

    76-84: Simplify conditional structure by removing unnecessary elif.

    The elif condition is unreachable after the break statement.

     for line in file:
         if line.startswith("PRH"):
             protein_columns = line.strip().split("\t")[1:]
             break
    -    elif line.startswith("PRT\t") and not protein_columns:
    +    if line.startswith("PRT\t") and not protein_columns:
             # Fallback: use first PRT line to determine column count
             prt_parts = line.strip().split("\t")
             # Generate default column names based on actual data
             protein_columns = [f"col_{i}" for i in range(len(prt_parts))]
             break

    89-92: Remove unnecessary empty finally block.

    The finally block serves no purpose here as file handles are already managed by the context manager.

     except Exception as e:
         logging.getLogger("quantmsio.core.mztab").warning(
             f"Could not extract protein columns: {e}"
         )
    -finally:
    -    # File handle is automatically tracked and will be cleaned up
    -    pass
     return protein_columns

    194-197: Simplify conditional by removing unnecessary elif.

     if len(parts) >= 3:
         return parts[2]  # PROT_HUMAN
    -elif len(parts) >= 2:
    +if len(parts) >= 2:
         return parts[1]  # P12345

    283-289: Remove unnecessary else after return.

     def _convert_to_parquet_format(self, df: pd.DataFrame) -> pa.Table:
         """Convert DataFrame to parquet format using PG_SCHEMA."""
         if df.empty:
             # For empty DataFrames, create an empty table with the correct schema
             return pa.Table.from_arrays(
                 [pa.array([], type=field.type) for field in PG_SCHEMA], schema=PG_SCHEMA
             )
    -    else:
    -        return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False)
    +    return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False)

    437-440: Document the cleanup behavior after context manager exit.

    The cleanup() call happens after the MsstatsIN context manager has already exited. Consider adding a comment to clarify this is intentional.

     # Context manager automatically cleans up DuckDB resources
     # Cleanup any temporary files created during processing
    +# Note: This cleanup is for MzTabProteinGroups resources, not MsstatsIN
     self.cleanup()

    577-577: Remove unnecessary return statement.

    Remove the empty return statement as Python functions return None by default.


    589-589: Fix slice formatting.

    -yield unique_files[i : i + batch_size]
    +yield unique_files[i:i + batch_size]
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between a88aa3d and 0ff3ab3.

    📒 Files selected for processing (6)
    • quantmsio/commands/convert/quantms_project.py (1 hunks)
    • quantmsio/core/quantms/msstats_in.py (1 hunks)
    • quantmsio/core/quantms/mztab.py (1 hunks)
    • quantmsio/core/quantms/pg.py (1 hunks)
    • quantmsio/utils/pride_utils.py (4 hunks)
    • tests/test_msstats_in.py (1 hunks)
    🧰 Additional context used
    🪛 GitHub Check: CodeFactor
    tests/test_msstats_in.py

    [notice] 187-231: tests/test_msstats_in.py#L187-L231
    Complex Method

    🪛 Ruff (0.11.9)
    tests/test_msstats_in.py

    88-88: f-string without any placeholders

    Remove extraneous f prefix

    (F541)


    220-220: Loop control variable count not used within loop body

    (B007)


    268-268: Use key in dict instead of key in dict.keys()

    Remove .keys()

    (SIM118)


    356-356: f-string without any placeholders

    Remove extraneous f prefix

    (F541)


    428-428: f-string without any placeholders

    Remove extraneous f prefix

    (F541)

    quantmsio/commands/convert/quantms_project.py

    93-93: Local variable project_json is assigned to but never used

    Remove assignment to unused variable project_json

    (F841)


    225-225: Local variable project_handler is assigned to but never used

    Remove assignment to unused variable project_handler

    (F841)

    quantmsio/core/quantms/msstats_in.py

    89-89: Local variable e is assigned to but never used

    Remove assignment to unused variable e

    (F841)


    115-115: Local variable mapping_df is assigned to but never used

    Remove assignment to unused variable mapping_df

    (F841)


    129-129: Local variable sample_df is assigned to but never used

    Remove assignment to unused variable sample_df

    (F841)

    quantmsio/core/quantms/mztab.py

    255-255: Use a context manager for opening files

    (SIM115)


    258-258: Use a context manager for opening files

    (SIM115)


    414-414: Use a context manager for opening files

    (SIM115)


    478-478: Use a context manager for opening files

    (SIM115)


    495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)

    Replace kwargs.get("usecols", None) with kwargs.get("usecols")

    (SIM910)

    quantmsio/core/quantms/pg.py

    243-243: Use a context manager for opening files

    (SIM115)


    248-248: Use a context manager for opening files

    (SIM115)


    525-525: Local variable protein_df is assigned to but never used

    Remove assignment to unused variable protein_df

    (F841)

    quantmsio/utils/pride_utils.py

    516-519: Use psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None) instead of an if block

    Replace with psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)

    (SIM401)


    528-531: Use psm["global_qvalue"] = es.get("opt_global_q-value", None) instead of an if block

    Replace with psm["global_qvalue"] = es.get("opt_global_q-value", None)

    (SIM401)


    534-537: Use psm["consensus_support"] = es.get("opt_global_consensus_support", None) instead of an if block

    Replace with psm["consensus_support"] = es.get("opt_global_consensus_support", None)

    (SIM401)

    🪛 Flake8 (7.2.0)
    tests/test_msstats_in.py

    [error] 88-88: f-string is missing placeholders

    (F541)


    [error] 356-356: f-string is missing placeholders

    (F541)


    [error] 428-428: f-string is missing placeholders

    (F541)

    quantmsio/commands/convert/quantms_project.py

    [error] 81-81: continuation line under-indented for visual indent

    (E128)


    [error] 93-93: local variable 'project_json' is assigned to but never used

    (F841)


    [error] 107-107: continuation line under-indented for visual indent

    (E128)


    [error] 136-136: continuation line under-indented for visual indent

    (E128)


    [error] 149-149: continuation line under-indented for visual indent

    (E128)


    [error] 172-172: continuation line under-indented for visual indent

    (E128)


    [error] 225-225: local variable 'project_handler' is assigned to but never used

    (F841)


    [error] 226-226: continuation line under-indented for visual indent

    (E128)


    [error] 233-233: continuation line under-indented for visual indent

    (E128)

    quantmsio/core/quantms/msstats_in.py

    [error] 89-89: local variable 'e' is assigned to but never used

    (F841)


    [error] 115-115: local variable 'mapping_df' is assigned to but never used

    (F841)


    [error] 129-129: local variable 'sample_df' is assigned to but never used

    (F841)


    [error] 229-229: whitespace before ':'

    (E203)


    [error] 547-547: whitespace before ':'

    (E203)

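E203 often stems from a Black/Flake8 disagreement: Black puts a space before `:` in slices with compound bounds, which Flake8 then flags. Projects commonly add E203 to Flake8's ignore list rather than fight the formatter. Both spellings are equivalent:

```python
rows = list(range(10))
offset = 2
# Flake8 flags the spaced form (whitespace before ':'), although
# Black's slice formatting produces exactly this layout.
chunk = rows[offset + 1 : 5]
chunk_no_space = rows[offset + 1:5]
```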
    quantmsio/core/quantms/mztab.py

    [error] 775-775: whitespace before ':'

    (E203)

    quantmsio/core/quantms/pg.py

    [error] 525-525: local variable 'protein_df' is assigned to but never used

    (F841)


    [error] 589-589: whitespace before ':'

    (E203)

    🪛 Pylint (3.3.7)
    tests/test_msstats_in.py

    [refactor] 187-187: Too many branches (15/12)

    (R0912)

    quantmsio/commands/convert/quantms_project.py

    [refactor] 192-192: Too many arguments (6/5)

    (R0913)


    [refactor] 192-192: Too many positional arguments (6/5)

    (R0917)


    [refactor] 192-192: Too many local variables (16/15)

    (R0914)


    [refactor] 289-289: Too many arguments (6/5)

    (R0913)


    [refactor] 289-289: Too many positional arguments (6/5)

    (R0917)

    quantmsio/core/quantms/msstats_in.py

    [refactor] 22-22: Consider using Python 3 style super() without arguments

    (R1725)


    [error] 70-70: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 78-78: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 272-272: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [error] 281-281: Instance of 'MsstatsIN' has no 'logger' member

    (E1101)


    [refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one

    (R1737)

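The two Pylint refactor hints for this file (R1725 and R1737) both have one-line fixes; a sketch with placeholder classes:

```python
class Base:
    def __init__(self, path: str):
        self.path = path


class Reader(Base):
    def __init__(self, path: str):
        # R1725: 'super(Reader, self).__init__(path)' still works, but the
        # argument-free Python 3 form is preferred.
        super().__init__(path)


def flatten(chunks):
    # R1737: 'yield from' replaces 'for row in chunk: yield row',
    # delegating to each inner iterable directly.
    for chunk in chunks:
        yield from chunk
```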
    quantmsio/core/quantms/mztab.py

    [refactor] 83-83: Too many instance attributes (22/7)

    (R0902)


    [refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 414-414: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 478-478: Consider using 'with' for resource-allocating operations

    (R1732)


    [refactor] 606-640: Too many nested blocks (7/5)

    (R1702)


    [refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 721-738: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)

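R1705 is a pure control-flow flattening: since `return` already exits the function, the `elif`/`else` branches only add nesting. A sketch (the tag values are illustrative):

```python
def section_kind(tag: str) -> str:
    # Flagged by R1705 when written with elif/else after return.
    if tag == "PRT":
        return "protein"
    elif tag == "PEP":
        return "peptide"
    else:
        return "unknown"


def section_kind_flat(tag: str) -> str:
    # Each return exits the function, so plain ifs carry the same
    # logic with one less indentation level.
    if tag == "PRT":
        return "protein"
    if tag == "PEP":
        return "peptide"
    return "unknown"
```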
    quantmsio/core/quantms/pg.py

    [error] 75-75: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 76-84: Unnecessary "elif" after "break", remove the leading "el" from "elif"

    (R1723)


    [error] 112-112: Non-iterable value file is used in an iterating context

    (E1133)


    [refactor] 94-94: Too many branches (13/12)

    (R0912)


    [refactor] 194-197: Unnecessary "elif" after "return", remove the leading "el" from "elif"

    (R1705)


    [refactor] 283-289: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

    (R1705)


    [refactor] 291-291: Too many arguments (9/5)

    (R0913)


    [refactor] 291-291: Too many positional arguments (9/5)

    (R0917)


    [refactor] 291-291: Too many local variables (35/15)

    (R0914)


    [refactor] 291-291: Too many statements (55/50)

    (R0915)


    [refactor] 577-577: Useless return at end of function or method

    (R1711)


    [refactor] 633-633: Too many arguments (7/5)

    (R0913)


    [refactor] 633-633: Too many positional arguments (7/5)

    (R0917)


    [refactor] 633-633: Too many local variables (23/15)

    (R0914)

    ⏰ Context from checks skipped due to timeout of 90000ms (4)
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests
    • GitHub Check: unit-tests (3.10)
    • GitHub Check: unit-tests (3.11)
    🔇 Additional comments (10)
    quantmsio/commands/convert/quantms_project.py (1)

    1-4: LGTM!

    The module docstring is clear and concise.

    quantmsio/utils/pride_utils.py (7)

    46-46: Good practice using raw strings for regex patterns!

    Using raw strings for regex patterns prevents potential issues with escape sequences and makes the regex patterns more readable.

    Also applies to: 52-52


    482-511: Excellent refactoring into focused helper functions!

    Breaking down the fetch_psm_from_mztab_line function into smaller, focused helper functions improves readability and maintainability. Each helper function has a clear single responsibility.


    46-46: Good fix: Using raw strings for regex patterns.

    Converting regex patterns to raw strings prevents potential escaping issues and follows Python best practices for regex patterns.

    Also applies to: 52-52


    482-511: Well-designed helper function with good key management.

    The function effectively handles optional keys by removing them from the required list when they don't exist in the data, preventing KeyError exceptions.


    540-552: Helper function provides clear separation of concerns.

    The peptidoform field logic is properly encapsulated and handles both explicit and derived peptidoform sequences correctly.


    554-571: Improved error handling and null safety.

    The function properly handles the case where ms_runs is None and provides clear error messages when ms_run is not found in the index.


    573-600: Excellent refactoring that improves maintainability.

    The decomposition of the monolithic function into focused helper functions significantly improves code readability and testability while maintaining the same functionality.

    quantmsio/core/quantms/mztab.py (1)

    83-819: Well-designed mzTab parser with comprehensive features!

    The implementation provides excellent functionality including:

    • Support for both regular and gzipped files
    • DuckDB integration for performance
    • Proper resource management with context managers
    • Efficient section extraction and streaming
    • Good error handling and logging

    The class structure is complex but justified by the feature set.

    quantmsio/core/quantms/pg.py (1)

    622-622: Verify file names are safe for SQL queries.

    Ensure that file_batch contains properly sanitized file names to prevent any potential SQL injection issues.

    The file names come from the msstats data which should be trusted, but it's good practice to validate inputs used in SQL queries.

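One way to make the file-name interpolation robust regardless of input trust is parameter binding. The sketch below uses stdlib `sqlite3` for portability; DuckDB's Python API accepts the same `?` placeholders via `connection.execute(sql, params)`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msstats (reference TEXT, intensity REAL)")
con.execute("INSERT INTO msstats VALUES (?, ?)", ("run1.mzML", 1.5))

hostile = "run1.mzML'; DROP TABLE msstats; --"
rows = con.execute(
    "SELECT reference FROM msstats WHERE reference = ?", (hostile,)
).fetchall()
# The hostile string is treated as a value, not SQL: it matches
# nothing, and the table survives intact.
```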
    Comment on lines 127 to 128
        if generate_ibaq_view:
            _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
    🛠️ Refactor suggestion

    Add created IBAQ file to tracking list.

    The IBAQ file generation doesn't add the created file to the tracking list, which means it won't be registered in the project metadata.

            # Generate IBAQ view if requested
            if generate_ibaq_view:
    -            _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
    +            ibaq_file = _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
    +            if ibaq_file:
    +                created_files.append(("ibaq-file", str(ibaq_file)))

    Update _generate_ibaq_view to return the created file path:

    -def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
    -                        output_folder_path: Path):
    +def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
    +                        output_folder_path: Path) -> Optional[Path]:
         """Generate IBAQ view from feature data."""
         print("\n=== Generating IBAQ View ===")
         try:
             ibaq_file = create_uuid_filename(project_accession, ".ibaq.parquet")
             ibaq_path = output_folder_path / ibaq_file
             write_ibaq_feature(str(sdrf_file), str(feature_file), str(ibaq_path))
             print("IBAQ view generation completed successfully")
    +        return ibaq_path
         except Exception as e:
             print(f"ERROR: IBAQ view generation failed: {str(e)}", file=sys.stderr)
    +        return None
    🤖 Prompt for AI Agents
    In quantmsio/commands/convert/quantms_project.py around lines 127 to 128, the
    code calls _generate_ibaq_view but does not add the created IBAQ file to the
    tracking list, so it is not registered in the project metadata. Modify
    _generate_ibaq_view to return the path of the created IBAQ file, then capture
    this return value here and append it to the tracking list that manages project
    files to ensure proper registration.
    


            self._available_columns = available_columns

        except Exception as e:
    🛠️ Refactor suggestion

    Remove unused variable assignments

    These variables are assigned but never used.

    -        except Exception as e:
    +        except Exception:
                 # Default column set based on common msstats format
                 self._available_columns = {
             # Insert into DuckDB using pandas DataFrame
             mapping_df = pd.DataFrame(mapping_data)
    -        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        self._duckdb.register("mapping_df", mapping_df)
    +        self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
    +        self._duckdb.unregister("mapping_df")
             # Insert into DuckDB using pandas DataFrame
             sample_df = pd.DataFrame(sample_data)
    -        self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
    +        self._duckdb.register("sample_df", sample_df)
    +        self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
    +        self._duckdb.unregister("sample_df")

    Also applies to: 115-115, 129-129

    🧰 Tools
    🪛 Flake8 (7.2.0)

    [error] 89-89: local variable 'e' is assigned to but never used

    (F841)

    🪛 Ruff (0.11.9)

    89-89: Local variable e is assigned to but never used

    Remove assignment to unused variable e

    (F841)

    🤖 Prompt for AI Agents
    In quantmsio/core/quantms/msstats_in.py at lines 89, 115, and 129, the exception
    handlers assign the caught exception to a variable (e.g., 'except Exception as
    e:') but never use this variable. Remove the variable assignment by changing
    these lines to 'except Exception:' to avoid unused variable warnings and clean
    up the code.
    

    Comment on lines +14 to +27
class MsstatsIN(DuckDB):
    def __init__(
        self,
        report_path: Union[Path, str],
        sdrf_path: Union[Path, str],
        duckdb_max_memory="16GB",
        duckdb_threads=4,
    ):
        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
        self._sdrf = SDRFHandler(sdrf_path)
        self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
        self._sample_map = self._sdrf.get_sample_map_run()
        self._optimized_setup_done = False

    ⚠️ Potential issue

    Add missing logger attribute to the class

    The class uses self.logger in multiple places but doesn't initialize it. This will cause AttributeError at runtime.

     class MsstatsIN(DuckDB):
         def __init__(
             self,
             report_path: Union[Path, str],
             sdrf_path: Union[Path, str],
             duckdb_max_memory="16GB",
             duckdb_threads=4,
         ):
             super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
    +        self.logger = logging.getLogger(__name__)
             self._sdrf = SDRFHandler(sdrf_path)
             self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
             self._sample_map = self._sdrf.get_sample_map_run()
             self._optimized_setup_done = False

    Don't forget to import logging at the top of the file:

    import logging
    🧰 Tools
    🪛 Pylint (3.3.7)

    [refactor] 22-22: Consider using Python 3 style super() without arguments

    (R1725)

    🤖 Prompt for AI Agents
    In quantmsio/core/quantms/msstats_in.py around lines 14 to 27, the MsstatsIN
    class uses self.logger but does not initialize it, which will cause an
    AttributeError at runtime. To fix this, import the logging module at the top of
    the file and add a logger attribute initialization in the __init__ method, for
    example by setting self.logger = logging.getLogger(__name__).
    
