Major changes in the documentation and schemas created #99
Actionable comments posted: 23
♻️ Duplicate comments (7)
quantmsio/commands/utils/project.py (1)

Lines 77-79: Fix typos in parameter names. The parameter names `sortware_name` and `sortware_version` should be `software_name` and `software_version`.

```diff
-    project_handler.add_software_provider(
-        sortware_name=software_name, sortware_version=software_version
-    )
+    project_handler.add_software_provider(
+        software_name=software_name, software_version=software_version
+    )
```

quantmsio/commands/transform/de.py (1)
Lines 67-68: Improve exception chaining. Use `raise ... from e` to preserve the original exception context for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```

quantmsio/commands/convert/maxquant.py (3)
Lines 98-99: Fix exception chaining to preserve error context. When re-raising exceptions, use `from e` to preserve the error chain for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 217-218: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 319-320: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```

quantmsio/commands/convert/diann.py (2)
Lines 91-93: Fix exception chaining to preserve error context. When re-raising exceptions, use `from e` to preserve the error chain for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 144-146: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
🧹 Nitpick comments (33)
quantmsio/core/quantms/psm.py (1)
Lines 1-3: Remove the unused import and consider the logging addition. The `os` import on line 2 is unused and should be removed. The addition of the `logging` import aligns well with the PR's enhanced logging system objectives.

```diff
 import logging
-import os
 import re
```
quantmsio/commands/transform/uniprot.py (2)
Lines 50-51: Redundant parameter validation. The manual validation of required parameters is unnecessary since Click's `required=True` already handles this validation and will raise appropriate errors.

```diff
-    if not all([feature_file, fasta, output_folder]):
-        raise click.UsageError("Please provide all required parameters")
-
```
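The same guarantee can be sketched with stdlib `argparse` (used here instead of Click so the example stays dependency-free; the `--fasta` flag is illustrative): when the parser itself enforces a required option, a manual `if not all([...])` check after parsing is dead code, because the parser rejects the invocation first.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--fasta", required=True)

# With the required option supplied, parsing succeeds.
ok = parser.parse_args(["--fasta", "seqs.fa"])
assert ok.fasta == "seqs.fa"

# Without it, the parser rejects the call before any handler-level
# validation could run (argparse signals the error via SystemExit).
try:
    parser.parse_args([])
    missing_rejected = False
except SystemExit:
    missing_rejected = True
assert missing_rejected
```

Click behaves analogously with `required=True`: the command callback is never entered when a required option is missing.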
Lines 56-58: Remove the unused variable assignment. The `output_path` variable is assigned but never used; the filename is passed directly to the core function.

```diff
 filename = create_uuid_filename(output_prefix, ".feature.parquet")
-output_path = output_folder / filename
 map_peptide_to_protein(str(feature_file), str(fasta), str(output_folder), filename)
```
README.md (2)
Line 114: Minor punctuation formatting issue. There is a loose punctuation mark in the environment variables list formatting.

```diff
-The package can be configured using environment variables:
+The package can be configured using environment variables:
```
Lines 132-144: Add a language specification to the code block. The project structure code block should specify a language for better rendering and compliance with markdown standards.

````diff
-```
+```text
 quantmsio/
 ├── __init__.py
 ├── quantmsioc.py  # CLI entry point
````

.github/workflows/python-package.yml (1)
Line 72: Add a missing newline at the end of the file for POSIX compliance.

```diff
     env:
-      PYTHONIOENCODING: utf-8
+      PYTHONIOENCODING: utf-8
+
```

quantmsio/core/quantms/feature.py (1)
Lines 61-62: Use `yield from` for cleaner generator delegation. The current implementation yields each element individually; `yield from` delegates to the inner generator more efficiently.

```diff
-        for msstats in msstats_in.generate_msstats_in(file_num, protein_str):
-            yield msstats
+        yield from msstats_in.generate_msstats_in(file_num, protein_str)
```

.github/workflows/integration-tests.yml (1)
Line 23: Remove trailing spaces and add a missing newline. The file has trailing spaces on multiple lines and is missing a newline at the end. Remove trailing spaces from all affected lines and add a newline at the end of the file:

```diff
     timeout-minutes: 60  # 1 hour timeout for the entire job
-    
+
     steps:
       - uses: actions/checkout@v4
-      
+
       - name: Set up Python 3.10
         uses: actions/setup-python@v5
         with:
           python-version: "3.10"
-      
+
       # Install package in development mode
       poetry install
-      
+
       - name: Run Full Conversion Suite Tests
         if: ${{ github.event.inputs.test_type == 'all' || github.event.inputs.test_type == 'full-conversion-suite' || github.event_name == 'release' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
       - name: Run LFQ Tests Only
         if: ${{ github.event.inputs.test_type == 'lfq-only' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py -k "lfq" --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
-      - name: Run TMT Tests Only 
+      - name: Run TMT Tests Only
         if: ${{ github.event.inputs.test_type == 'tmt-only' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py -k "tmt" --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
       - name: Run All Integration Tests
         if: ${{ github.event.inputs.test_type == 'all' || github.event_name == 'release' }}
         run: |
           poetry run pytest -vv -m "integration" --timeout=900
         env:
-          PYTHONIOENCODING: utf-8
+          PYTHONIOENCODING: utf-8
+
```

Also applies to: 26-26, 31-31, 40-40, 47-47, 54-54, 55-55, 61-61, 67-67
quantmsio/commands/utils/project.py (1)
Lines 68-69: Remove redundant parameter validation. The parameter validation is unnecessary since Click already enforces required parameters with `required=True`.

```diff
-    if not all([project_accession, sdrf_file, output_folder]):
-        raise click.UsageError("Please provide all required parameters")
-
```

quantmsio/commands/transform/de.py (1)
Lines 16-26: Consider reducing function parameters. The function has 9 parameters, which exceeds the recommended limit. Consider grouping related parameters into a configuration object or dataclass:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class DifferentialConversionConfig:
    msstats_file: Path
    sdrf_file: Path
    project_file: Optional[Path] = None
    protein_file: Optional[Path] = None
    fdr_threshold: float = 0.05
    output_folder: Optional[Path] = None
    output_prefix: Optional[str] = None
    delete_existing: bool = True
    verbose: bool = False

def convert_msstats_differential(config: DifferentialConversionConfig) -> None:
    ...  # implementation uses config.<attribute_name>
```

quantmsio/commands/utils/stats.py (1)
Line 9: Address the pylint suggestion for a dictionary literal. The static analysis tool suggests using dictionary literal syntax instead of a `dict()` call.

```diff
-CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
+CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
```

.github/workflows/python-app.yml (2)
Line 35: Fix YAML indentation consistency. The file has inconsistent indentation; steps should be indented by 6 spaces, not 4.

```diff
     steps:
-    - uses: actions/checkout@v4
+      - uses: actions/checkout@v4
```

Apply this indentation fix to all three job definitions.
Also applies to: 64-64, 88-88
Line 106: Add a missing newline at the end of the file, a common YAML best practice.

```diff
         poetry run pytest -vv --timeout=900
       env:
         PYTHONIOENCODING: utf-8
+
```
tests/test_full_conversion_suite.py (3)
Lines 51-56: Address unused loop variables. The static analysis correctly identifies unused loop variables that should be renamed with an underscore prefix.

```diff
-    for dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
-        for file_type, file_path in files.items():
+    for _dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
+        for _file_type, file_path in files.items():
```
Line 93: Fix the f-string without placeholders. The f-string on line 93 contains no placeholders and should be a regular string.

```diff
-            print(f"[TIMEOUT] Timeout after 10 minutes")
+            print("[TIMEOUT] Timeout after 10 minutes")
```
Lines 84-94: Simplify the conditional return structure. The `else` clause after `return` is unnecessary and can be removed.

```diff
 if result.returncode == 0:
     print(f"[OK] Success in {duration:.2f}s")
     return True, duration, result.stdout, result.stderr
-else:
-    print(f"[FAIL] Failed in {duration:.2f}s")
-    print(f"Error: {result.stderr}")
-    return False, duration, result.stdout, result.stderr
+
+print(f"[FAIL] Failed in {duration:.2f}s")
+print(f"Error: {result.stderr}")
+return False, duration, result.stdout, result.stderr
```

quantmsio/commands/convert/quantms.py (1)
Lines 191-201: Consider grouping related parameters for better maintainability. The function has 9 parameters, which is acceptable for a CLI command but could become harder to maintain as it grows. For future consideration, you could group related parameters:

- Input parameters: `input_file`, `msstats_file`, `sdrf_file`
- Output parameters: `output_folder`, `output_prefix`
- Processing options: `compute_topn`, `compute_ibaq`, `topn`
- General options: `verbose`

This would make the function signature cleaner and more extensible.
quantmsio/commands/utils/plot.py (1)
Line 14: Use a dict literal instead of a `dict()` call, for better readability and a slight performance improvement.

```diff
-CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
+CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
```

docs/IMPLEMENTATION_DETAILS.md (3)
Line 73: Use an en dash for number ranges. For better typography, use an en dash (–) instead of a hyphen (-) when specifying ranges.

```diff
 - **Before:** 7+ minutes for large datasets
-- **After:** 10-45 seconds for the same datasets
+- **After:** 10–45 seconds for the same datasets
```
Line 177: Add language specifiers to fenced code blocks. Several code blocks are missing language specifiers, which help with syntax highlighting.

For the code blocks starting at lines 177, 271, 281, and 291, add a language specifier:

````diff
-```
+```text
````

This improves readability with proper syntax highlighting.
Also applies to: 271-271, 281-281, 291-291
Line 270: Remove trailing punctuation from headings. Markdown headings should not have trailing punctuation according to common style guides.

```diff
-#### Input (Raw msstats data):
+#### Input (Raw msstats data)
-#### After Channel Mapping:
+#### After Channel Mapping
-#### After Aggregation (Final output):
+#### After Aggregation (Final output)
```

Also applies to: 280-280, 290-290
tests/test_msstats_in.py (4)
Line 88: Remove unnecessary f-string prefixes. These strings don't contain any placeholders, so the `f` prefix is not needed.

```diff
-    print(f"\nLFQ test completed successfully!")
+    print("\nLFQ test completed successfully!")
 ...
-    print(f"\nTMT test completed successfully!")
+    print("\nTMT test completed successfully!")
 ...
-    print(f"\nDataset Comparison:")
+    print("\nDataset Comparison:")
```

Also applies to: 331-331, 403-403
Lines 236-240: Remove the unused loop variable. The `count` variable is not used within the loop body.

```diff
-    for channel, count in intensities_channel_counts.items():
+    for channel, _ in intensities_channel_counts.items():
         if channel not in channel_counts:
             channel_counts[channel] = 0
```
Lines 251-254: Remove the unused variable assignment. The `file_channel_counts` variable is assigned but never used.

```diff
-    # Original counting from channel column
-    file_channel_counts = msstats_batch.groupby(
-        ["reference_file_name", "channel"]
-    ).size()
-    # Note: Not using this since it only shows surviving channels after deduplication
+    # Note: Not using groupby counting since it only shows surviving channels after deduplication
```
Lines 282-284: Simplify dictionary key iteration. Remove the unnecessary `.keys()` when iterating over dictionary keys.

```diff
-    for channel in file_channels.keys()
+    for channel in file_channels
```
495-495
: Simplify dictionary get() call.The default
None
is redundant when usingdict.get()
.- columns = kwargs.get("usecols", None) + columns = kwargs.get("usecols")
Lines 83-126: Consider reducing instance attributes. The class has 22 instance attributes, exceeding the recommended limit of 7. Consider grouping related attributes into data classes.

Would you like me to help refactor this by grouping related attributes? For example:

- Position tracking attributes into a `PositionTracker` class
- DuckDB-related attributes into a `DuckDBConfig` class
- Metadata cache into a `MetadataCache` class

quantmsio/core/quantms/msstats_in.py (3)
Lines 14-27: Consider using Python 3 style `super()` and adding type hints. The initialization is well-structured, but could benefit from modern Python conventions and type hints for better code clarity.

```diff
 class MsstatsIN(DuckDB):
+    """MSstats input data processor using DuckDB for efficient data handling.
+
+    Supports both LFQ and isobaric labeling (TMT/iTRAQ) experiments.
+    """
+
     def __init__(
         self,
         report_path: Union[Path, str],
         sdrf_path: Union[Path, str],
-        duckdb_max_memory="16GB",
-        duckdb_threads=4,
-    ):
-        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+        duckdb_max_memory: str = "16GB",
+        duckdb_threads: int = 4,
+    ) -> None:
+        super().__init__(report_path, duckdb_max_memory, duckdb_threads)
         self._sdrf = SDRFHandler(sdrf_path)
         self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
         self._sample_map = self._sdrf.get_sample_map_run()
         self._optimized_setup_done = False
```
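A dependency-free sketch of the zero-argument `super()` form suggested above (the class names here are illustrative, not the real `MsstatsIN`/`DuckDB`): in Python 3 it resolves the same MRO entry as the legacy explicit spelling.

```python
class Base:
    def __init__(self, path: str, threads: int = 4) -> None:
        self.path = path
        self.threads = threads

class Child(Base):
    def __init__(self, path: str, threads: int = 4) -> None:
        # Python 3 zero-argument form; equivalent to super(Child, self)
        # but immune to class-rename mistakes.
        super().__init__(path, threads)
        self.extra = True

c = Child("report.tsv", threads=8)
assert (c.path, c.threads, c.extra) == ("report.tsv", 8, True)
```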
Line 219: Fix whitespace before colons in slicing operations. PEP 8 recommends no whitespace before a colon in a slice.

```diff
-        references[i : i + file_num] for i in range(0, len(references), file_num)
+        references[i:i + file_num] for i in range(0, len(references), file_num)
```

```diff
-        yield file_list[i : i + batch_size]
+        yield file_list[i:i + batch_size]
```

Also applies to: 533-533
Lines 671-685: Consider relying on the context manager for cleanup instead of `__del__`. The `__del__` method is not guaranteed to be called and can cause issues with garbage collection. Since you already have context manager support, consider emphasizing its use in documentation.

```diff
     def __del__(self):
         """Cleanup database views and tables."""
+        # Note: __del__ is not guaranteed to be called. Prefer using context manager.
         try:
             if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
                 self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
                 self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
                 self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
                 self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
                 self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
             # Always call parent cleanup to close connection and remove database file
             if hasattr(self, "_duckdb") and self._duckdb:
                 self.destroy_duckdb_database()
         except:
-            pass
+            pass  # Suppress exceptions in destructor to prevent issues during GC
```

quantmsio/core/quantms/pg.py (3)
Lines 60-66: Improve exception handling in the destructor. Using a bare except in `__del__` is acceptable but could be improved with `contextlib.suppress`.

```diff
+from contextlib import suppress
+
     def __del__(self):
         """Destructor to ensure cleanup."""
-        try:
+        with suppress(Exception):
             self.cleanup()
-        except:
-            pass  # Ignore errors in destructor
```
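A minimal runnable sketch of the `contextlib.suppress` pattern (the `cleanup` function below is hypothetical, standing in for the class's cleanup step): it replaces `try`/`except`/`pass` while making the swallowed exception class explicit.

```python
from contextlib import suppress

def cleanup() -> None:
    # Hypothetical cleanup step that fails, e.g. a backing store already gone.
    raise RuntimeError("backing store already gone")

# Before: try: cleanup() / except: pass  (bare except, flagged by E722).
# After: only Exception subclasses are swallowed, and the intent is explicit;
# KeyboardInterrupt and SystemExit still propagate.
with suppress(Exception):
    cleanup()

reached = True  # execution continues past the failed cleanup
assert reached
```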
Lines 229-273: Consider using `ExitStack` for cleaner resource management. The current implementation manually tracks file handles; Python's `contextlib.ExitStack` could simplify this.

```python
from contextlib import ExitStack

def _safe_file_open(self, file_path, mode="r"):
    """Safely open files with automatic resource management."""
    import gzip

    # Handle both compressed and uncompressed files
    if str(file_path).endswith(".gz"):
        return gzip.open(file_path, "rt" if "t" not in mode else mode, encoding="utf-8")
    else:
        return open(file_path, mode, encoding="utf-8")
```

Then use it directly with `with` statements, without the complex tracking logic.
Line 583: Fix whitespace before the colon in the slicing operation.

```diff
-        yield unique_files[i : i + batch_size]
+        yield unique_files[i:i + batch_size]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (14)

- tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/maxquant/maxquant_full/evidence.txt.gz is excluded by `!**/*.gz`
- tests/examples/maxquant/maxquant_simple/sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-lfq-small/PXD040438.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-small/PXD040438_msstats_in.csv is excluded by `!**/*.csv`
- tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-plex-small/MSV000079033-Blood-Plasma-iTRAQ.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-plex-small/MSV000079033_msstats_in.csv is excluded by `!**/*.csv`
📒 Files selected for processing (59)

- .github/workflows/integration-tests.yml (1 hunks)
- .github/workflows/python-app.yml (2 hunks)
- .github/workflows/python-package.yml (1 hunks)
- .gitignore (1 hunks)
- README.md (1 hunks)
- docs/IMPLEMENTATION_DETAILS.md (1 hunks)
- docs/README.adoc (5 hunks)
- docs/feature.avsc (6 hunks)
- docs/pg.avsc (3 hunks)
- docs/tools.adoc (2 hunks)
- pyproject.toml (3 hunks)
- quantmsio/commands/convert/__init__.py (1 hunks)
- quantmsio/commands/convert/diann.py (1 hunks)
- quantmsio/commands/convert/fragpipe.py (1 hunks)
- quantmsio/commands/convert/maxquant.py (1 hunks)
- quantmsio/commands/convert/quantms.py (1 hunks)
- quantmsio/commands/convert/quantms_project.py (1 hunks)
- quantmsio/commands/transform/anndata.py (1 hunks)
- quantmsio/commands/transform/de.py (1 hunks)
- quantmsio/commands/transform/gene.py (1 hunks)
- quantmsio/commands/transform/ibaq.py (1 hunks)
- quantmsio/commands/transform/spectra.py (1 hunks)
- quantmsio/commands/transform/uniprot.py (1 hunks)
- quantmsio/commands/utils/plot.py (1 hunks)
- quantmsio/commands/utils/project.py (1 hunks)
- quantmsio/commands/utils/stats.py (1 hunks)
- quantmsio/core/ae.py (1 hunks)
- quantmsio/core/combiner.py (1 hunks)
- quantmsio/core/common.py (3 hunks)
- quantmsio/core/de.py (1 hunks)
- quantmsio/core/diann/diann.py (3 hunks)
- quantmsio/core/duckdb.py (1 hunks)
- quantmsio/core/format.py (9 hunks)
- quantmsio/core/fragpipe/fragpipe.py (3 hunks)
- quantmsio/core/maxquant/maxquant.py (3 hunks)
- quantmsio/core/msstats_in.py (0 hunks)
- quantmsio/core/mztab.py (0 hunks)
- quantmsio/core/project.py (2 hunks)
- quantmsio/core/quantms/feature.py (4 hunks)
- quantmsio/core/quantms/msstats_in.py (1 hunks)
- quantmsio/core/quantms/mztab.py (1 hunks)
- quantmsio/core/quantms/pg.py (1 hunks)
- quantmsio/core/quantms/psm.py (5 hunks)
- quantmsio/core/sdrf.py (2 hunks)
- quantmsio/operate/plots.py (1 hunks)
- quantmsio/operate/query.py (2 hunks)
- quantmsio/operate/statistics.py (1 hunks)
- quantmsio/operate/tools.py (1 hunks)
- quantmsio/quantmsioc.py (3 hunks)
- quantmsio/utils/file_utils.py (1 hunks)
- tests/test_diann.py (2 hunks)
- tests/test_feature.py (1 hunks)
- tests/test_full_conversion_suite.py (1 hunks)
- tests/test_maxquant.py (2 hunks)
- tests/test_msstats_in.py (1 hunks)
- tests/test_project.py (1 hunks)
- tests/test_psm.py (1 hunks)
- tests/test_query.py (1 hunks)
- tests/test_sdrf.py (4 hunks)
💤 Files with no reviewable changes (2)
- quantmsio/core/msstats_in.py
- quantmsio/core/mztab.py
✅ Files skipped from review due to trivial changes (13)
- tests/test_query.py
- quantmsio/operate/plots.py
- quantmsio/core/ae.py
- quantmsio/core/de.py
- quantmsio/utils/file_utils.py
- tests/test_psm.py
- quantmsio/core/duckdb.py
- tests/test_project.py
- tests/test_feature.py
- quantmsio/core/combiner.py
- quantmsio/operate/tools.py
- tests/test_sdrf.py
- docs/tools.adoc
🚧 Files skipped from review as they are similar to previous changes (11)
- quantmsio/commands/convert/__init__.py
- pyproject.toml
- quantmsio/commands/transform/anndata.py
- quantmsio/commands/transform/ibaq.py
- quantmsio/commands/transform/spectra.py
- quantmsio/core/project.py
- docs/pg.avsc
- quantmsio/quantmsioc.py
- quantmsio/commands/convert/fragpipe.py
- quantmsio/core/format.py
- docs/feature.avsc
🧰 Additional context used
🧬 Code Graph Analysis (10)
quantmsio/operate/query.py (2)
- quantmsio/utils/pride_utils.py (3)
  - generate_gene_name_map (35-55)
  - get_gene_accessions (58-71)
  - get_unanimous_name (19-32)
- quantmsio/operate/statistics.py (1)
  - close (108-112)

tests/test_diann.py (3)
- quantmsio/core/diann/diann.py (2)
  - get_report_from_database (48-65)
  - generate_pg_matrix (109-210)
- quantmsio/core/quantms/feature.py (1)
  - Feature (18-292)
- quantmsio/core/duckdb.py (1)
  - get_unique_references (66-69)

quantmsio/operate/statistics.py (1)
- quantmsio/operate/query.py (1)
  - close (344-348)

quantmsio/commands/convert/maxquant.py (3)
- quantmsio/core/maxquant/maxquant.py (5)
  - MaxQuant (54-750)
  - write_psm_to_file (362-373)
  - write_feature_to_file (380-399)
  - write_features_to_file (419-443)
  - write_protein_groups_to_file (684-711)
- quantmsio/core/project.py (1)
  - create_uuid_filename (59-62)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/convert/diann.py (3)
- quantmsio/core/diann/diann.py (4)
  - DiaNNConvert (34-486)
  - write_feature_to_file (443-462)
  - write_features_to_file (464-486)
  - write_pg_matrix_to_file (423-441)
- quantmsio/core/project.py (1)
  - create_uuid_filename (59-62)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/transform/de.py (3)
- quantmsio/core/de.py (6)
  - DifferentialExpressionHandler (33-269)
  - load_project_file (100-113)
  - load_msstats_file (81-98)
  - load_sdrf_file (255-261)
  - set_fdr_threshold (263-269)
  - convert_msstats_to_quantms (115-220)
- quantmsio/utils/file_utils.py (1)
  - extract_protein_list (21-38)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/transform/gene.py (1)
- quantmsio/operate/tools.py (1)
  - generate_feature_of_gene (104-120)

quantmsio/commands/utils/plot.py (1)
- quantmsio/operate/plots.py (5)
  - plot_distribution_of_ibaq (14-43)
  - plot_intensity_box_of_samples (180-217)
  - plot_intensity_distribution_of_samples (117-145)
  - plot_peptide_distribution_of_protein (148-177)
  - plot_peptides_of_lfq_condition (46-114)

quantmsio/commands/utils/project.py (1)
- quantmsio/core/project.py (7)
  - check_directory (22-56)
  - populate_from_pride_archive (123-148)
  - populate_from_sdrf (282-288)
  - add_quantms_version (150-151)
  - add_software_provider (168-175)
  - add_sdrf_file (290-326)
  - save_updated_project_info (257-280)

quantmsio/commands/utils/stats.py (1)
- quantmsio/operate/statistics.py (15)
  - IbaqStatistics (26-46)
  - ParquetStatistics (49-116)
  - get_number_of_proteins (10-11), (32-38), (80-92)
  - get_number_of_peptides (13-14), (62-66)
  - get_number_of_samples (16-17), (40-46), (74-78)
  - get_number_of_peptidoforms (19-20), (68-72)
  - get_number_msruns (22-23), (94-98)
  - get_number_of_psms (100-106)
🪛 Ruff (0.11.9)

quantmsio/core/quantms/psm.py
- 2-2: `os` imported but unused; remove unused import (F401)
- 62-62: Use a context manager for opening files (SIM115)

quantmsio/commands/convert/maxquant.py
- 99-99, 218-218, 320-320: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

quantmsio/commands/convert/quantms.py
- 86-86, 140-140, 249-249: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/core/maxquant/maxquant.py
- 733-733, 735-735, 737-737, 739-739, 743-743, 745-745, 749-749: Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks (E721)

quantmsio/commands/convert/quantms_project.py
- 5-5: `logging` imported but unused (F401)
- 12-12: `pandas` imported but unused (F401)
- 13-13: `pyarrow` imported but unused (F401)
- 14-14: `pyarrow.parquet` imported but unused (F401)
- 22-22: `quantmsio.core.quantms.mztab.MzTabProteinGroups` imported but unused (F401)

tests/test_msstats_in.py
- 88-88, 331-331, 403-403: f-string without any placeholders; remove extraneous `f` prefix (F541)
- 236-236: Loop control variable `count` not used within loop body (B007)
- 251-251: Local variable `file_channel_counts` is assigned to but never used (F841)
- 283-283: Use `key in dict` instead of `key in dict.keys()` (SIM118)

quantmsio/core/quantms/msstats_in.py
- 68-68, 260-260, 266-266, 683-683: Do not use bare `except` (E722)
- 76-76: Local variable `e` is assigned to but never used (F841)
- 105-105: Local variable `mapping_df` is assigned to but never used (F841)
- 119-119: Local variable `sample_df` is assigned to but never used (F841)

tests/test_maxquant.py
- 6-6: `pytest` imported but unused (F401)

quantmsio/core/quantms/pg.py
- 62-65: Use `contextlib.suppress(Exception)` instead of `try`-`except`-`pass` (SIM105)
- 64-64, 180-180, 270-270: Do not use bare `except` (E722)
- 241-241, 246-246: Use a context manager for opening files (SIM115)
- 261-264: Use `contextlib.suppress(ValueError)` instead of `try`-`except`-`pass` (SIM105)
- 519-519: Local variable `protein_df` is assigned to but never used (F841)

quantmsio/core/quantms/mztab.py
- 255-255, 258-258, 414-414, 478-478: Use a context manager for opening files (SIM115)
- 495-495: Use `kwargs.get("usecols")` instead of `kwargs.get("usecols", None)` (SIM910)
- 680-680, 691-691, 787-787, 794-794, 802-802: Do not use bare `except` (E722)
- 785-788, 792-795, 800-803: Use `contextlib.suppress(Exception)` instead of `try`-`except`-`pass` (SIM105)

tests/test_full_conversion_suite.py
- 51-51: Loop control variable `dataset_name` not used within loop body; rename to `_dataset_name` (B007)
- 52-52: Loop control variable `file_type` not used within loop body; rename to `_file_type` (B007)
- 93-93: f-string without any placeholders; remove extraneous `f` prefix (F541)

quantmsio/commands/convert/diann.py
- 93-93, 146-146: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/commands/transform/de.py
- 68-68: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/commands/transform/uniprot.py
- 57-57: Local variable `output_path` is assigned to but never used (F841)

quantmsio/core/sdrf.py
- 127-129: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)
🪛 Flake8 (7.2.0)

quantmsio/core/quantms/psm.py
- [error] 2-2: 'os' imported but unused (F401)

quantmsio/commands/convert/quantms_project.py
- [error] 5-5: 'logging' imported but unused (F401)
- [error] 12-12: 'pandas as pd' imported but unused (F401)
- [error] 13-13: 'pyarrow as pa' imported but unused (F401)
- [error] 14-14: 'pyarrow.parquet as pq' imported but unused (F401)
- [error] 22-22: 'quantmsio.core.quantms.mztab.MzTabProteinGroups' imported but unused (F401)

tests/test_msstats_in.py
- [error] 88-88, 331-331, 403-403: f-string is missing placeholders (F541)
- [error] 251-251: local variable 'file_channel_counts' is assigned to but never used (F841)

quantmsio/core/quantms/msstats_in.py
- [error] 68-68, 260-260, 266-266, 683-683: do not use bare 'except' (E722)
- [error] 76-76: local variable 'e' is assigned to but never used (F841)
- [error] 105-105: local variable 'mapping_df' is assigned to but never used (F841)
- [error] 119-119: local variable 'sample_df' is assigned to but never used (F841)
- [error] 219-219, 533-533: whitespace before ':' (E203)

tests/test_maxquant.py
- [error] 6-6: 'pytest' imported but unused (F401)

quantmsio/core/quantms/pg.py
- [error] 64-64, 180-180, 270-270: do not use bare 'except' (E722)
- [error] 519-519: local variable 'protein_df' is assigned to but never used (F841)
- [error] 583-583: whitespace before ':' (E203)

quantmsio/core/quantms/mztab.py
- [error] 680-680, 691-691, 787-787, 794-794, 802-802: do not use bare 'except' (E722)
- [error] 769-769: whitespace before ':' (E203)

tests/test_full_conversion_suite.py
- [error] 93-93: f-string is missing placeholders (F541)

quantmsio/commands/transform/uniprot.py
- [error] 57-57: local variable 'output_path' is assigned to but never used (F841)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/psm.py
[refactor] 62-62: Consider using 'with' for resource-allocating operations
(R1732)
quantmsio/commands/convert/maxquant.py
[refactor] 144-144: Too many arguments (8/5)
(R0913)
[refactor] 144-144: Too many positional arguments (8/5)
(R0917)
[refactor] 259-259: Too many arguments (7/5)
(R0913)
[refactor] 259-259: Too many positional arguments (7/5)
(R0917)
quantmsio/commands/convert/quantms.py
[refactor] 55-55: Too many arguments (6/5)
(R0913)
[refactor] 55-55: Too many positional arguments (6/5)
(R0917)
[refactor] 191-191: Too many arguments (9/5)
(R0913)
[refactor] 191-191: Too many positional arguments (9/5)
(R0917)
[refactor] 191-191: Too many local variables (16/15)
(R0914)
quantmsio/core/maxquant/maxquant.py
[refactor] 684-684: Too many arguments (6/5)
(R0913)
[refactor] 684-684: Too many positional arguments (6/5)
(R0917)
quantmsio/commands/convert/quantms_project.py
[error] 22-22: No name 'MzTabProteinGroups' in module 'quantmsio.core.quantms.mztab'
(E0611)
[refactor] 47-47: Too many arguments (6/5)
(R0913)
[refactor] 47-47: Too many positional arguments (6/5)
(R0917)
[refactor] 47-47: Too many local variables (25/15)
(R0914)
[refactor] 47-47: Too many branches (14/12)
(R0912)
[refactor] 47-47: Too many statements (78/50)
(R0915)
[refactor] 247-247: Too many arguments (6/5)
(R0913)
[refactor] 247-247: Too many positional arguments (6/5)
(R0917)
quantmsio/core/quantms/feature.py
[refactor] 61-62: Use 'yield from' directly instead of yielding each element one by one
(R1737)
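R1737 flags a generator loop whose body only re-yields each element; the refactor it suggests is a one-line delegation. A dependency-free sketch (illustrative names only):

```python
def batches_loop(source):
    # The pattern Pylint flags (R1737): a loop that only re-yields
    for item in source:
        yield item


def batches_delegated(source):
    # The suggested replacement: delegate directly to the iterable
    yield from source
```

Both produce identical sequences; `yield from` additionally forwards `send()`/`throw()` correctly if the source is itself a generator.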
tests/test_msstats_in.py
[refactor] 98-98: Too many local variables (34/15)
(R0914)
[refactor] 152-253: Too many nested blocks (6/5)
(R1702)
[refactor] 152-253: Too many nested blocks (6/5)
(R1702)
[refactor] 98-98: Too many branches (31/12)
(R0912)
[refactor] 98-98: Too many statements (114/50)
(R0915)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one
(R1737)
quantmsio/core/quantms/pg.py
[error] 73-73: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
[error] 110-110: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 92-92: Too many branches (13/12)
(R0912)
[refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 285-285: Too many arguments (9/5)
(R0913)
[refactor] 285-285: Too many positional arguments (9/5)
(R0917)
[refactor] 285-285: Too many local variables (35/15)
(R0914)
[refactor] 285-285: Too many statements (55/50)
(R0915)
[refactor] 571-571: Useless return at end of function or method
(R1711)
[refactor] 627-627: Too many arguments (7/5)
(R0913)
[refactor] 627-627: Too many positional arguments (7/5)
(R0917)
[refactor] 627-627: Too many local variables (23/15)
(R0914)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
tests/test_full_conversion_suite.py
[refactor] 84-90: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/commands/convert/diann.py
[refactor] 17-17: Too many arguments (12/5)
(R0913)
[refactor] 17-17: Too many positional arguments (12/5)
(R0917)
[refactor] 17-17: Too many local variables (19/15)
(R0914)
[refactor] 96-96: Too many arguments (7/5)
(R0913)
[refactor] 96-96: Too many positional arguments (7/5)
(R0917)
quantmsio/commands/transform/de.py
[refactor] 16-16: Too many arguments (9/5)
(R0913)
[refactor] 16-16: Too many positional arguments (9/5)
(R0917)
quantmsio/commands/transform/gene.py
[refactor] 45-45: Too many arguments (6/5)
(R0913)
[refactor] 45-45: Too many positional arguments (6/5)
(R0917)
quantmsio/commands/utils/plot.py
[refactor] 14-14: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.
(R1735)
quantmsio/commands/utils/project.py
[refactor] 45-45: Too many arguments (6/5)
(R0913)
[refactor] 45-45: Too many positional arguments (6/5)
(R0917)
[error] 86-86: Unexpected keyword argument 'output_file_name' in method call
(E1123)
quantmsio/commands/utils/stats.py
[refactor] 9-9: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.
(R1735)
🪛 actionlint (1.7.7)
.github/workflows/python-app.yml
37-37: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
66-66: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
90-90: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/python-package.yml
23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
56-56: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/integration-tests.yml
28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 YAMLlint (1.37.1)
.github/workflows/python-app.yml
[warning] 35-35: wrong indentation: expected 6 but found 4
(indentation)
[warning] 64-64: wrong indentation: expected 6 but found 4
(indentation)
[warning] 88-88: wrong indentation: expected 6 but found 4
(indentation)
[error] 106-106: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/python-package.yml
[error] 72-72: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/integration-tests.yml
[error] 23-23: trailing spaces
(trailing-spaces)
[error] 26-26: trailing spaces
(trailing-spaces)
[error] 31-31: trailing spaces
(trailing-spaces)
[error] 40-40: trailing spaces
(trailing-spaces)
[error] 47-47: trailing spaces
(trailing-spaces)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 55-55: trailing spaces
(trailing-spaces)
[error] 61-61: trailing spaces
(trailing-spaces)
[error] 67-67: no new line character at the end of file
(new-line-at-end-of-file)
[error] 67-67: trailing spaces
(trailing-spaces)
🪛 LanguageTool
README.md
[uncategorized] ~114-~114: Loose punctuation mark.
Context: ...ment variables: - QUANTMSIO_LOG_LEVEL
: Set logging level (DEBUG, INFO, WARNING...
(UNLIKELY_OPENING_PUNCTUATION)
docs/IMPLEMENTATION_DETAILS.md
[typographical] ~73-~73: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...minutes for large datasets - After: 10-45 seconds for the same datasets - **Memor...
(HYPHEN_TO_EN)
🪛 markdownlint-cli2 (0.17.2)
README.md
132-132: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
docs/IMPLEMENTATION_DETAILS.md
177-177: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
270-270: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
271-271: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
280-280: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
281-281: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
290-290: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
291-291: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 GitHub Check: CodeFactor
quantmsio/commands/convert/quantms_project.py
[notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199
Complex Method
tests/test_msstats_in.py
[notice] 98-331: tests/test_msstats_in.py#L98-L331
Complex Method
quantmsio/core/quantms/msstats_in.py
[notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260
Do not use bare 'except'. (E722)
[notice] 683-683: quantmsio/core/quantms/msstats_in.py#L683
Do not use bare 'except'. (E722)
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
Try, Except, Pass detected. (B110)
[notice] 266-266: quantmsio/core/quantms/msstats_in.py#L266
Do not use bare 'except'. (E722)
[notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68
Do not use bare 'except'. (E722)
quantmsio/core/quantms/pg.py
[notice] 258-258: quantmsio/core/quantms/pg.py#L258
Do not use bare 'except'. (E722)
[notice] 64-64: quantmsio/core/quantms/pg.py#L64
Do not use bare 'except'. (E722)
[notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65
Try, Except, Pass detected. (B110)
[notice] 172-172: quantmsio/core/quantms/pg.py#L172
Do not use bare 'except'. (E722)
[notice] 258-259: quantmsio/core/quantms/pg.py#L258-L259
Try, Except, Pass detected. (B110)
quantmsio/core/quantms/mztab.py
[notice] 802-802: quantmsio/core/quantms/mztab.py#L802
Do not use bare 'except'. (E722)
[notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788
Try, Except, Pass detected. (B110)
[notice] 794-794: quantmsio/core/quantms/mztab.py#L794
Do not use bare 'except'. (E722)
[notice] 680-680: quantmsio/core/quantms/mztab.py#L680
Do not use bare 'except'. (E722)
[notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795
Try, Except, Pass detected. (B110)
[notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803
Try, Except, Pass detected. (B110)
[notice] 691-691: quantmsio/core/quantms/mztab.py#L691
Do not use bare 'except'. (E722)
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
Try, Except, Continue detected. (B112)
[notice] 787-787: quantmsio/core/quantms/mztab.py#L787
Do not use bare 'except'. (E722)
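Most of the E722/B110 notices above share one shape: a bare `except` that silently swallows everything. A generic sketch of the tightened version (hypothetical example, not the project's code):

```python
import logging

logger = logging.getLogger("example")


def parse_intensity(raw):
    """Parse a numeric field, falling back to None on bad input.

    The flagged shape was roughly:
        try: return float(raw)
        except: pass          # E722 / B110
    Naming the expected exceptions keeps KeyboardInterrupt, SystemExit,
    and genuine bugs from being swallowed, and the debug log preserves
    a trace of the fallback.
    """
    try:
        return float(raw)
    except (TypeError, ValueError) as err:
        logger.debug("could not parse %r: %s", raw, err)
        return None
```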
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests (3.11)
🔇 Additional comments (41)
.gitignore (1)
8-70: Excellent comprehensive .gitignore update. The extensive additions cover all the essential ignore patterns for the quantmsio project, including DuckDB databases, temporary processing files, test outputs, and standard Python development artifacts. These patterns align well with the project's data processing workflows and CLI command structure.
quantmsio/core/sdrf.py (1)
11-11: Minor import reordering. The import order change is stylistic and doesn't affect functionality.
quantmsio/operate/statistics.py (1)
108-116: Excellent resource management implementation. The addition of explicit close() and __del__() methods properly handles the DuckDB connection lifecycle. The implementation correctly checks for attribute existence, closes the connection, and sets it to None to prevent double-closure issues. This pattern aligns with the similar implementation in quantmsio/operate/query.py and effectively prevents resource leaks.
quantmsio/operate/query.py (2)
15-19: Appropriate import organization. The updated imports from pride_utils improve modularity by importing specific utility functions needed by the Query class methods.
344-352: Consistent resource management implementation. The addition of close() and __del__() methods matches the pattern implemented in quantmsio/operate/statistics.py (lines 108-112), providing consistent DuckDB connection management across the codebase. This prevents resource leaks and ensures proper cleanup.
quantmsio/core/quantms/psm.py (2)
117-122: Improved logic separation for minimum score finding. The refactoring to separate grouping and index extraction steps improves code clarity and makes the intent more explicit. This is a good improvement over the previous implementation.
231-250: Enhanced logging provides good traceability. The addition of structured logging statements provides excellent traceability for PSM conversion operations, including input/output paths and optional protein filtering. This aligns well with the PR's logging enhancement objectives.
quantmsio/commands/transform/gene.py (1)
1-67: Well-structured CLI command with appropriate parameter handling. The command follows good click patterns and properly handles the optional partitions parameter by splitting comma-separated values. While static analysis flags the high parameter count, all parameters appear necessary for the gene mapping functionality. The delegation to generate_feature_of_gene keeps the CLI layer thin and focused on parameter handling, which is a good separation of concerns.
tests/test_diann.py (2)
3-4: Import path updates reflect module restructuring. The updated import paths align with the module restructuring described in the PR objectives, moving to more specific submodules within quantmsio.core.quantms.
38-125: Comprehensive test coverage for protein group transformation. This test excellently validates the complex nested structure of protein group data after transformation. It thoroughly checks:
- Presence of required columns (intensities, additional_intensities)
- Proper list structure for intensity arrays
- Correct nested objects with expected fields (sample_accession, channel, intensity)
- Validation of DIA-NN-specific values (LFQ channel)
- Structure of additional intensity types with proper nesting
The test provides robust coverage for the refactored protein group handling and ensures data integrity throughout the transformation process.
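The structural checks listed above can be condensed into plain assertions over one representative row. The dict shape below is inferred from the review comment, not copied from the test file:

```python
def validate_pg_row(row):
    """Assert the nested intensity structure a protein-group row should have."""
    assert "intensities" in row and "additional_intensities" in row
    assert isinstance(row["intensities"], list)
    for entry in row["intensities"]:
        # every primary intensity entry carries sample, channel, and value
        assert {"sample_accession", "channel", "intensity"} <= set(entry)
        assert entry["channel"] == "LFQ"  # DIA-NN label-free runs
    for extra in row["additional_intensities"]:
        # derived values live in a nested intensities array of name/value pairs
        for item in extra.get("intensities", []):
            assert {"intensity_name", "intensity_value"} <= set(item)
    return True


example_row = {
    "intensities": [
        {"sample_accession": "S1", "channel": "LFQ", "intensity": 1234.5},
    ],
    "additional_intensities": [
        {"intensities": [{"intensity_name": "normalized", "intensity_value": 0.8}]},
    ],
}
```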
README.md (2)
8-25: Excellent transformation to comprehensive user documentation. The README has been completely restructured from a brief specification note into a comprehensive user and developer guide. This aligns perfectly with the PR's CLI reorganization and enhanced functionality, providing clear documentation for all the new command groups and features.
174-176: License inconsistency needs clarification. The development section mentions MIT License, but the copyright notice at the bottom specifies GNU General Public License. This inconsistency should be resolved to avoid confusion.
Please clarify which license applies to this project - MIT (mentioned in line 176) or GPL (mentioned in the copyright notice starting at line 196). Consistent licensing information is important for open source projects.
quantmsio/core/quantms/feature.py (2)
58-62: Excellent use of context manager for resource management. The refactoring to use a context manager with MsstatsIN ensures proper resource cleanup and automatic database closure. This is a significant improvement for reliability.
156-162: Good addition of structured logging. The logging statements provide valuable traceability by recording input parameters, output paths, and optional filter files. This enhances debugging and monitoring capabilities.
Also applies to: 185-192
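The close()/__del__ pairing and the context-manager usage praised in the comments above combine naturally in a single wrapper. This is a generic sketch, not the actual MsstatsIN implementation:

```python
class ConnectionWrapper:
    """Owns a connection-like object and guarantees cleanup exactly once."""

    def __init__(self, conn):
        self._conn = conn

    def close(self):
        # Idempotent: safe to call twice, and safe to call from __del__
        if getattr(self, "_conn", None) is not None:
            self._conn.close()
            self._conn = None

    def __del__(self):
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


class FakeConn:
    """Stand-in for a DuckDB connection, counting close() calls."""

    def __init__(self):
        self.closed = 0

    def close(self):
        self.closed += 1


conn = FakeConn()
with ConnectionWrapper(conn) as wrapper:
    pass  # work with the connection here
wrapper.close()  # second call is a no-op
```

The attribute-existence check via getattr mirrors the pattern the review describes: __del__ can run on a partially constructed object, so close() must not assume _conn exists.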
quantmsio/core/diann/diann.py (5)
15-24: Import reorganization looks good. The reordering of imports to move DIANN_USECOLS after DIANN_PG_USECOLS and updating imports from quantmsio.core.quantms aligns with the module restructuring mentioned in the AI summary.
176-195: Review the additional_intensities structure for completeness. The enhanced additional_intensities structure with a nested intensities array containing intensity_name and intensity_value pairs is well-structured. This provides good separation between primary intensities and additional/derived intensities.
203-208: Good practice: Clean up intermediate columns. Dropping the raw count columns unique_sequences and total_features after transformation is good practice to avoid confusion and reduce memory usage.
370-370: Verify consistency in additional_intensities key naming. The change from "additional_intensity" to "intensities" in the add_additional_msg method should be consistent with the key used in generate_pg_matrix. Both methods now use "intensities", which is correct.
159-173: Manual Verification Required: Confirm the intensities array structure matches the schema. I was unable to locate any schema definitions for the new intensities field in the repository. Please ensure that the generated array with these properties:
- sample_accession (string)
- channel (string)
- intensity (numeric)
aligns with the updated schema expectations.
Affected location:
• quantmsio/core/diann/diann.py (Lines 159–173)

# Create intensities array using the pg_quantity field (mapped from PG.Quantity)
report.loc[:, "intensities"] = report[
    ["reference_file_name", "pg_quantity"]
].apply(
    lambda rows: [
        {
            "sample_accession": self._sample_map[
                rows["reference_file_name"] + "-LFQ"
            ],
            "channel": "LFQ",
            "intensity": rows["pg_quantity"],
        }
    ],
    axis=1,
)

quantmsio/core/common.py (4)
8-11: Import reorganization improves module structure. The updated imports from quantmsio and quantmsio.core.format align with the project's module restructuring and make dependencies more explicit.
99-111: Explicit column definition improves maintainability. Defining DIANN_PG_USECOLS explicitly rather than dynamically generating it improves readability and makes it easier to maintain the column list independently of the mapping dictionary.
163-197: New mapping dictionaries extend format support. The addition of MAXQUANT_PG_MAP and MZTAB_PG_MAP with their corresponding usage columns extends the library's support for multiple proteomics data formats. The field names are well-chosen and follow consistent naming conventions.
90-96: All DIANN_PG_MAP entries are correctly used in generate_pg_matrix. The newly added mappings ("PG.Quantity" → pg_quantity, "Proteotypic" → unique_sequences, "Precursor.Quantity" → total_features) are referenced in diann.py for both the peptide_counts and feature_counts structs and in the intensities array creation, and are then dropped as expected. No further changes needed.
quantmsio/commands/utils/stats.py (1)
12-15: CLI command structure follows good practices. The command group setup with proper context settings and clear documentation follows Click framework best practices.
.github/workflows/python-app.yml (2)
11-15: Good addition of scheduling and manual dispatch. Adding cron scheduling for daily full test runs and workflow_dispatch for manual triggering improves the CI/CD workflow flexibility.
32-83: Excellent workflow restructuring for better testing granularity. The separation into unit-tests, integration-tests, and full-test-suite jobs provides:
- Faster feedback for unit tests
- Conditional integration tests based on labels
- Comprehensive testing on main branch and scheduled runs
This is a significant improvement over the single build job approach.
tests/test_full_conversion_suite.py (3)
66-80
: Excellent subprocess configuration for robust testing. The subprocess configuration includes:
- UTF-8 encoding handling
- Proper timeout (10 minutes for TMT protein groups)
- Error handling with text output
- Working directory set to workspace root
This ensures robust and reliable CLI testing.
119-167
: Comprehensive test coverage for all conversion types. The test suite covers all major conversion workflows (PSM, Feature, Protein Groups) for both LFQ and TMT datasets. Each test:
- Validates input file existence
- Creates temporary output directories
- Runs the CLI command via subprocess
- Analyzes output files for expected structure
- Asserts success and data validity
This provides excellent end-to-end validation of the CLI functionality.
Also applies to: 169-223, 225-278
100-116
: Good output file analysis helper. The analyze_output_file function provides useful validation of parquet outputs including row counts, column counts, file size, and column names. The error handling ensures graceful failure when files can't be analyzed.
tests/test_maxquant.py (1)
356-362
: Update main block to remove duplicate test function calls. After removing the duplicate test functions, update the main block to only call the unique test functions.
 if __name__ == "__main__":
     test_maxquant_protein_groups_transform()
     test_maxquant_protein_groups_with_sample_columns()
-    test_maxquant_pg_basic_transformation()
-    test_maxquant_pg_sample_specific_columns()
     test_mztab_pg_gene_extraction()
     print("All MaxQuant protein group tests passed!")
Likely an incorrect or invalid review comment.
quantmsio/core/maxquant/maxquant.py (3)
445-496
: Well-implemented batch processing for protein groups. The iter_protein_groups_batch method demonstrates good practices:
- Dynamic column detection for sample-specific intensities
- Efficient chunked reading
- Proper metadata storage using DataFrame attributes
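Chunked iteration of the kind iter_protein_groups_batch is praised for can be outlined without pandas. A dependency-free sketch with illustrative column names, not the project's implementation:

```python
import csv
import io


def iter_batches(text, batch_size):
    """Yield lists of row dicts, never holding the whole file in memory."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


tsv = "protein\tintensity\nP1\t10\nP2\t20\nP3\t30\n"
batches = list(iter_batches(tsv, batch_size=2))
```

In the real code the same shape is provided by pandas' chunked reading; the key property is that each batch can be transformed and written out before the next one is loaded.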
587-619: Good separation of concerns in intensity creation methods. The helper methods _create_intensities_from_samples and _create_additional_intensities properly encapsulate the logic for creating structured intensity data, making the code more maintainable and testable.
Also applies to: 620-663
733-749
: Use isinstance() for type checking instead of direct comparison. Direct type comparison with == is not recommended. Use isinstance() for proper type checking.

 for field, dtype in required_fields.items():
     if field not in df.columns:
-        if dtype == list:
+        if dtype is list:
             df[field] = [[] for _ in range(len(df))]
-        elif dtype == str:
+        elif dtype is str:
             df[field] = None
-        elif dtype == float:
+        elif dtype is float:
             df[field] = 0.0
-        elif dtype == int:
+        elif dtype is int:
             df[field] = 0
     else:
         # Handle type conversions
-        if dtype == float:
+        if dtype is float:
             df[field] = pd.to_numeric(df[field], errors="coerce").fillna(0.0)
-        elif dtype == int:
+        elif dtype is int:
             df[field] = (
                 pd.to_numeric(df[field], errors="coerce").fillna(0).astype(int)
             )
-        elif dtype == str:
+        elif dtype is str:
             df[field] = df[field].astype(str).fillna("")

Likely an incorrect or invalid review comment.
quantmsio/core/quantms/mztab.py (1)
246-266: Context manager already handles file operations correctly. The _safe_file_open method is properly implemented as a context manager that handles both regular and gzipped files. The static analysis warnings about lines 255 and 258 are false positives - the files are properly closed in the finally block.
quantmsio/core/quantms/msstats_in.py (2)
105-106
: Variables are used implicitly in DuckDB SQL context. While static analysis flags these as unused, they're actually referenced in the SQL CREATE TABLE statements through DuckDB's implicit DataFrame registration.
Also applies to: 119-120
122-210
: Well-structured SQL view creation with proper column handling. The implementation properly handles optional columns and creates appropriate views for both LFQ and TMT/iTRAQ experiments. The dynamic SQL generation based on available columns is a good approach.
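The dynamic-SQL idea described above - only reference columns that actually exist - can be sketched with sqlite3 from the standard library standing in for DuckDB; the execute-a-string API shape is analogous, and the table and column names here are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msstats (protein TEXT, intensity REAL, channel TEXT)")
con.execute("INSERT INTO msstats VALUES ('P1', 10.0, 'LFQ')")

# Discover which optional columns are actually present in the table
present = {row[1] for row in con.execute("PRAGMA table_info(msstats)")}
optional = [c for c in ("channel", "fraction") if c in present]

# Build the view's column list only from columns that exist
cols = ", ".join(["protein", "intensity"] + optional)
con.execute(f"CREATE VIEW msstats_view AS SELECT {cols} FROM msstats")

row = con.execute("SELECT * FROM msstats_view").fetchone()
```

Here "fraction" is absent, so the view is created with protein, intensity, and channel only - the same conditional-column pattern the review praises.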
docs/README.adoc (2)
1227-1280
: Excellent documentation of intensity data structure. The clear distinction between primary intensities (raw measurements) and additional intensities (derived values) with concrete examples greatly improves the format specification. The semantic guidelines provide valuable guidance for implementers.
1405-1481
: Well-structured protein group field definitions. The addition of structured peptide_counts and feature_counts fields, along with clearer descriptions of existing fields, improves the protein group specification. The mapping to tool-specific fields (DIA-NN, FragPipe, MaxQuant) is particularly helpful.
quantmsio/core/quantms/pg.py (3)
285-434
: Well-architected protein quantification pipeline. The implementation effectively uses DuckDB for scalable SQL-based aggregation with proper error handling and detailed logging. The step-by-step approach with timing information aids debugging and performance analysis.
627-782
: Comprehensive protein row construction with proper data transformation. The method effectively transforms SQL aggregation results into the structured protein group format. The handling of optional computations (TopN, iBAQ) and fallback values for missing data is well implemented.
517-524
: DuckDB Implicit DataFrame Registration – No Change Required. The protein_df variable is picked up by DuckDB's Python API at runtime. When you run

msstats_in._duckdb.execute(
    "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
)

the protein_df object in the Python namespace is used as the source table. Although static analysis flags it as "unused," it's consumed implicitly by the SQL engine.
Optional (to satisfy linters): explicitly register the DataFrame before use:

msstats_in._duckdb.register("protein_df", protein_df)
msstats_in._duckdb.execute(
    "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
)

File: quantmsio/core/quantms/pg.py
Lines: 517–524
 def write_psms_to_parquet(
     self,
-    file_path: Path,
-    batch_size: int = 10000,
-    output_prefix_file: Optional[str] = None,
-    **metadata,
- ):
-    if not file_path.exists():
-        raise FileNotFoundError(file_path)
-    if not self.output_directory.exists():
-        self.output_directory.mkdir(parents=True)
-    if not output_prefix_file:
-        output_prefix_file = "psm"
-
-    file_uuid = uuid.uuid4()
-    output_path = (
-        self.output_directory / f"{output_prefix_file}-{file_uuid}.psm.parquet"
-    )
-
-    metadata["file_type"] = "psm"
-    metadata["uuid"] = str(file_uuid)
-    metadata["creation_date"] = date.today().isoformat()
+    msms_file: Path,
+    batch_size: int = 1000000,
+    output_prefix: Optional[str] = None,
+ ) -> None:
+    """
+    Write PSMs to parquet file.
+
+    Args:
+        msms_file: Path to the psm.tsv file
+        batch_size: Number of rows to process at a time
+        output_prefix: Prefix for the output file name
+    """
+    if not output_prefix:
+        output_prefix = "psm"
+
+    file_uuid = str(uuid.uuid4())
+    output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"
+
+    metadata = {
+        "file_type": "psm",
+        "uuid": file_uuid,
+        "creation_date": date.today().isoformat(),
+    }
🛠️ Refactor suggestion
Review the aggressive batch size increase and missing safety checks.
The refactoring improves code clarity with better parameter naming, type annotations, and documentation. However, there are two concerns:
- Batch size increase: The default batch size increased 100x from 10,000 to 1,000,000. This could cause memory issues on systems with limited resources.
- Missing safety checks: The removal of file existence checks and output directory creation could lead to less informative error messages and runtime failures.
Consider these improvements:
def write_psms_to_parquet(
self,
msms_file: Path,
- batch_size: int = 1000000,
+ batch_size: int = 100000, # More conservative default
output_prefix: Optional[str] = None,
) -> None:
"""
Write PSMs to parquet file.
Args:
msms_file: Path to the psm.tsv file
batch_size: Number of rows to process at a time
output_prefix: Prefix for the output file name
"""
+ # Validate input file exists
+ if not msms_file.exists():
+ raise FileNotFoundError(f"PSM file not found: {msms_file}")
+
+ # Ensure output directory exists
+ self.output_directory.mkdir(parents=True, exist_ok=True)
+
if not output_prefix:
output_prefix = "psm"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def write_psms_to_parquet(
    self,
    msms_file: Path,
    batch_size: int = 100000,  # More conservative default
    output_prefix: Optional[str] = None,
) -> None:
    """
    Write PSMs to parquet file.

    Args:
        msms_file: Path to the psm.tsv file
        batch_size: Number of rows to process at a time
        output_prefix: Prefix for the output file name
    """
    # Validate input file exists
    if not msms_file.exists():
        raise FileNotFoundError(f"PSM file not found: {msms_file}")

    # Ensure output directory exists
    self.output_directory.mkdir(parents=True, exist_ok=True)

    if not output_prefix:
        output_prefix = "psm"

    file_uuid = str(uuid.uuid4())
    output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"

    metadata = {
        "file_type": "psm",
        "uuid": file_uuid,
        "creation_date": date.today().isoformat(),
    }
    # ... rest of implementation ...
🤖 Prompt for AI Agents
In quantmsio/core/fragpipe/fragpipe.py around lines 366 to 391, the default
batch size was increased from 10,000 to 1,000,000, which may cause memory issues
on some systems; reduce the default batch size back to a safer value like
10,000. Also, add safety checks to verify that the input msms_file exists before
processing and ensure the output directory exists or create it if missing to
prevent runtime errors and provide clearer error messages.
raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)
🛠️ Refactor suggestion
Improve exception chaining for better error context.
The string conversion fix for the error message is good. However, consider using proper exception chaining to preserve the original error context.
Apply this diff to implement proper exception chaining:
except FileNotFoundError:
- raise FileNotFoundError(
- "The SDRF file provided not found: " + str(sdrf_file)
- )
+ raise FileNotFoundError(
+ "The SDRF file provided not found: " + str(sdrf_file)
+ ) from None
Alternatively, if you want to preserve the original exception context:
-except FileNotFoundError:
+except FileNotFoundError as err:
raise FileNotFoundError(
"The SDRF file provided not found: " + str(sdrf_file)
- )
+ ) from err
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)

except FileNotFoundError:
    raise FileNotFoundError(
        "The SDRF file provided not found: " + str(sdrf_file)
    ) from None

raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)

except FileNotFoundError as err:
    raise FileNotFoundError(
        "The SDRF file provided not found: " + str(sdrf_file)
    ) from err
🧰 Tools
🪛 Ruff (0.11.9)
127-129: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
🤖 Prompt for AI Agents
In quantmsio/core/sdrf.py around lines 127 to 129, the FileNotFoundError is
raised with a custom message but lacks proper exception chaining. Modify the
code to use "raise FileNotFoundError(...) from original_exception" where
original_exception is the caught exception, to preserve the original error
context and improve debugging. This involves catching the original exception
first, then raising the new FileNotFoundError with the "from" keyword
referencing the caught exception.
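As a minimal sketch of the chaining semantics discussed above (the `load_config` helper and its error message are made up for illustration), `raise ... from err` stores the original exception on `__cause__`, so tracebacks show both errors:

```python
def load_config(path: str) -> str:
    """Wrap a low-level failure in a domain-specific error, keeping the cause."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError as err:
        # "from err" records the original error on __cause__ for tracebacks;
        # "from None" would suppress it instead
        raise RuntimeError(f"Config file missing: {path}") from err
```

Callers can still inspect the underlying cause via `exc.__cause__`.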
if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
f = open(self.mztab_path)
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
f = open(seekable_path, "r", encoding="utf-8")
pos = self._get_pos("PEH")
f.seek(pos)
line = f.readline()
while not line.startswith("PEH"):
    line = f.readline()
self._pep_columns = line.split("\n")[0].split("\t")
f.close()
🛠️ Refactor suggestion
Use context manager for file operations.
The file handling has been improved to support gzip files, but the manual file opening and closing should use a context manager for better resource management and exception safety.
- # Use seekable file path to handle gzip files
- seekable_path = self._get_seekable_file_path()
- f = open(seekable_path, "r", encoding="utf-8")
- pos = self._get_pos("PEH")
- f.seek(pos)
- line = f.readline()
- while not line.startswith("PEH"):
- line = f.readline()
- self._pep_columns = line.split("\n")[0].split("\t")
- f.close()
+ # Use seekable file path to handle gzip files
+ seekable_path = self._get_seekable_file_path()
+ with open(seekable_path, "r", encoding="utf-8") as f:
+ pos = self._get_pos("PEH")
+ f.seek(pos)
+ line = f.readline()
+ while not line.startswith("PEH"):
+ line = f.readline()
+ self._pep_columns = line.split("\n")[0].split("\t")
if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
f = open(self.mztab_path)
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
f = open(seekable_path, "r", encoding="utf-8")
pos = self._get_pos("PEH")
f.seek(pos)
line = f.readline()
while not line.startswith("PEH"):
    line = f.readline()
self._pep_columns = line.split("\n")[0].split("\t")
f.close()

if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
with open(seekable_path, "r", encoding="utf-8") as f:
    pos = self._get_pos("PEH")
    f.seek(pos)
    line = f.readline()
    while not line.startswith("PEH"):
        line = f.readline()
    self._pep_columns = line.split("\n")[0].split("\t")
🧰 Tools
🪛 Ruff (0.11.9)
62-62: Use a context manager for opening files (SIM115)
🪛 Pylint (3.3.7)
[refactor] 62-62: Consider using 'with' for resource-allocating operations (R1732)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/psm.py around lines 58 to 69, the file is opened and
closed manually which risks resource leaks if exceptions occur. Replace the
manual open and close calls with a context manager using the "with" statement to
ensure the file is properly closed after reading, improving resource management
and exception safety.
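The with-statement pattern recommended above can be shown in isolation. `read_header_after` is a hypothetical stand-in for the PEH-scanning code, using `io.StringIO` in place of a real file handle:

```python
import io


def read_header_after(marker: str, text: str) -> list:
    """Find the first line starting with `marker` and return its tab-split fields.

    Mirrors the PEH-scanning pattern above: the with-block guarantees the
    handle is closed even if the marker is never found and we raise.
    """
    with io.StringIO(text) as f:  # stands in for open(path, "r", encoding="utf-8")
        for line in f:
            if line.startswith(marker):
                return line.rstrip("\n").split("\t")
    raise ValueError(f"No line starting with {marker!r}")
```

Because the `with` block owns the handle, no explicit `close()` call is needed on either the success or the error path.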
 - name: Set up Python ${{ matrix.python-version }}
-  uses: actions/setup-python@v3
+  uses: actions/setup-python@v4
Update GitHub Actions to latest version.
The actions/setup-python@v4 action is outdated and should be updated to v5 for better compatibility and security.
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
Apply this change to both occurrences (lines 23 and 56).
Also applies to: 56-56
🧰 Tools
🪛 actionlint (1.7.7)
23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/python-package.yml at lines 23 and 56, the GitHub Action
'actions/setup-python' is using version 'v4', which is outdated. Update both
occurrences to use 'actions/setup-python@v5' to ensure better compatibility and
security.
- uses: actions/checkout@v4

- name: Set up Python 3.10
  uses: actions/setup-python@v4
Update GitHub Actions to latest version.
The actions/setup-python@v4 action is outdated and should be updated to v5.
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
-uses: actions/setup-python@v4
+uses: actions/setup-python@v5
🧰 Tools
🪛 actionlint (1.7.7)
28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/integration-tests.yml at line 28, the GitHub Action uses an
outdated version of actions/setup-python (v4). Update the version tag from v4 to
v5 to use the latest version of the action.
project_handler.populate_from_sdrf(str(sdrf_file))
project_handler.add_quantms_version(quantmsio_version=quantmsio_version)
project_handler.add_software_provider(
    sortware_name="quantms", sortware_version=quantms_version
Fix typo in parameter name.
The parameter name has a typo that will cause a runtime error.
- sortware_name="quantms", sortware_version=quantms_version
+ software_name="quantms", software_version=quantms_version
-sortware_name="quantms", sortware_version=quantms_version
+software_name="quantms", software_version=quantms_version
🤖 Prompt for AI Agents
In quantmsio/commands/convert/quantms_project.py at line 115, there is a typo in
the parameter name "sortware_name" which should be corrected to "software_name"
to prevent runtime errors. Update the parameter name to the correct spelling.
except Exception as e:
    print(f"Warning: Could not setup optimized processing: {e}")
    # Fall back to original processing
🛠️ Refactor suggestion
Log exception details for better debugging
The current exception handling swallows all exceptions and only prints a warning. This could make debugging difficult in production.
except Exception as e:
- print(f"Warning: Could not setup optimized processing: {e}")
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
# Fall back to original processing
+ self._optimized_setup_done = False
except Exception as e:
    print(f"Warning: Could not setup optimized processing: {e}")
    # Fall back to original processing

except Exception as e:
    import logging
    logger = logging.getLogger(__name__)
    logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
    # Fall back to original processing
    self._optimized_setup_done = False
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 56 to 58, the exception
handling only prints a warning message without logging the full exception
details, which hinders debugging. Modify the except block to log the complete
exception information, including the stack trace, using a proper logging
mechanism instead of just printing a warning. This will provide better
visibility into the error cause during production issues.
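A minimal sketch of the logging approach the prompt describes; the logger name and the `setup_with_fallback` helper are assumptions made for the example:

```python
import logging

logger = logging.getLogger("quantms.sketch")


def setup_with_fallback(setup) -> bool:
    """Run an optional optimization step; log the failure type and fall back."""
    try:
        setup()
        return True
    except Exception as exc:
        # exc_info=True attaches the full traceback to the log record,
        # which is what a bare print() loses
        logger.warning(
            "Could not setup optimized processing: %s: %s",
            type(exc).__name__, exc, exc_info=True,
        )
        return False
```

Returning an explicit flag lets the caller keep the unoptimized path as the fallback.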
Actionable comments posted: 0
♻️ Duplicate comments (11)
quantmsio/commands/convert/quantms_project.py (3)
5-5: Remove unused import. The logging import is not used in this file and should be removed.
111-111: Fix typo in parameter name. The parameter name has a typo that will cause a runtime error.
- sortware_name="quantms", sortware_version=quantms_version
+ software_name="quantms", software_version=quantms_version
43-201: Refactor complex function to improve maintainability. This function is overly complex with multiple responsibilities and too many parameters. Consider breaking it down into smaller, focused functions for better maintainability.
quantmsio/core/quantms/mztab.py (2)
680-680: Replace bare except clauses with specific exceptions. Using except: can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle. Also applies to: 691-691
785-803: Replace bare except clauses in cleanup methods. The cleanup methods use bare except clauses which should be replaced with specific exception types for better error handling.
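For the cleanup-method pattern flagged above, `contextlib.suppress` from the standard library states "ignore exactly these errors" more explicitly than try/except/pass. The `cleanup_paths` helper is illustrative, not the project's code:

```python
import contextlib
import os


def cleanup_paths(paths) -> None:
    """Best-effort cleanup: ignore only the errors os.remove is expected to raise."""
    for path in paths:
        # Equivalent to try/os.remove/except (FileNotFoundError, PermissionError)/pass,
        # but the suppressed exception types are visible at a glance
        with contextlib.suppress(FileNotFoundError, PermissionError):
            os.remove(path)
```

Anything outside the listed exception types (for example KeyboardInterrupt) still propagates, which is the point of avoiding a bare except.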
tests/test_maxquant.py (3)
6-6: Remove unused import. The pytest import is not used in this file and should be removed.
223-287: Remove duplicate test function. test_maxquant_pg_basic_transformation is an exact duplicate of test_maxquant_protein_groups_transform and should be removed to avoid redundancy.
289-329: Remove duplicate test function. test_maxquant_pg_sample_specific_columns is an exact duplicate of test_maxquant_protein_groups_with_sample_columns and should be removed to avoid redundancy.
quantmsio/core/quantms/msstats_in.py (3)
68-73: Fix exception handling in column detection. Multiple issues with exception handling:
- Bare except clause at line 68
- Unused variable 'e' at line 76
- Should log the exception details for debugging
 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except:
+except (AttributeError, KeyError, ValueError) as e:
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug(f"PRAGMA query failed: {e}")
     # Fallback: get columns from a sample query
     sample_query = "SELECT * FROM report LIMIT 1"
     sample_data = self._duckdb.execute(sample_query).df()
     available_columns = set(sample_data.columns.tolist())
 self._available_columns = available_columns
-except Exception as e:
+except Exception:
     # Default column set based on common msstats format
     self._available_columns = {
Also applies to: 76-87
56-58: Replace bare except with specific exception handling and improve fallback handling. The bare except clause should catch specific exceptions, use proper logging instead of print, and ensure _optimized_setup_done remains False on failure.
-except Exception as e:
-    print(f"Warning: Could not setup optimized processing: {e}")
-    # Fall back to original processing
+except Exception as e:
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
+    # Fall back to original processing
+    self._optimized_setup_done = False
260-267: Replace bare except clauses in fallback statistics method. The bare except clauses could catch system exceptions and hide real issues.
 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except:
+except (AttributeError, KeyError, ValueError):
     # If pragma doesn't work, try a different approach
     try:
         sample_query = "SELECT * FROM report LIMIT 1"
         sample_data = self._duckdb.execute(sample_query).df()
         available_columns = set(sample_data.columns.tolist())
-    except:
+    except Exception:
         # Fallback to basic assumptions
         available_columns = {
🧹 Nitpick comments (3)
tests/test_full_conversion_suite.py (1)
70-76: Minor style improvement: simplify conditional structure. The else clause after return is unnecessary and can be simplified.
-if result.returncode == 0:
-    print(f"[OK] Success in {duration:.2f}s")
-    return True, duration, result.stdout, result.stderr
-else:
-    print(f"[FAIL] Failed in {duration:.2f}s")
-    print(f"Error: {result.stderr}")
-    return False, duration, result.stdout, result.stderr
+if result.returncode == 0:
+    print(f"[OK] Success in {duration:.2f}s")
+    return True, duration, result.stdout, result.stderr
+
+print(f"[FAIL] Failed in {duration:.2f}s")
+print(f"Error: {result.stderr}")
+return False, duration, result.stdout, result.stderr
105-106
: Remove unused DataFrame assignmentsThe DataFrames are created but the variables are never used. DuckDB references them by name in the SQL string.
- mapping_df = pd.DataFrame(mapping_data) - self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + # DuckDB can reference the DataFrame directly by name + self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM pd.DataFrame(mapping_data)")Or alternatively:
# Insert into DuckDB using pandas DataFrame mapping_df = pd.DataFrame(mapping_data) - self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + self._duckdb.register("mapping_df", mapping_df) + self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + self._duckdb.unregister("mapping_df")Also applies to: 119-120
quantmsio/core/quantms/pg.py (1)
285-295
: Consider using a configuration object to reduce method argumentsThe method has 9 arguments which impacts readability and maintainability. Consider using a configuration dataclass or dictionary.
from dataclasses import dataclass @dataclass class QuantificationConfig: compute_topn: bool = True topn: int = 3 compute_ibaq: bool = True file_num: int = 10 duckdb_max_memory: str = "16GB" duckdb_threads: int = 4 def quantify_from_msstats_optimized( self, msstats_path: str, sdrf_path: str, config: Optional[QuantificationConfig] = None ) -> pd.DataFrame: if config is None: config = QuantificationConfig() # Use config.compute_topn, config.topn, etc.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
quantmsio/commands/convert/quantms_project.py (1 hunks)
quantmsio/core/duckdb.py (2 hunks)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
quantmsio/core/quantms/pg.py (1 hunks)
tests/test_diann.py (2 hunks)
tests/test_feature.py (4 hunks)
tests/test_full_conversion_suite.py (1 hunks)
tests/test_maxquant.py (2 hunks)
tests/test_query.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/test_query.py
- tests/test_feature.py
- quantmsio/core/duckdb.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsio/core/quantms/mztab.py (8)
quantmsio/quantmsioc.py (1)
project
(84-86)quantmsio/core/project.py (1)
create_uuid_filename
(59-62)quantmsio/operate/tools.py (1)
get_modification_details
(205-247)quantmsio/utils/pride_utils.py (1)
get_quantmsio_modifications
(271-286)quantmsio/core/quantms/pg.py (2)
_safe_file_open
(229-273)cleanup
(35-58)quantmsio/operate/query.py (1)
close
(344-348)quantmsio/operate/statistics.py (1)
close
(108-112)quantmsio/core/fragpipe/fragpipe.py (2)
append
(43-44)peptidoform
(194-219)
quantmsio/core/quantms/msstats_in.py (4)
quantmsio/core/duckdb.py (3)
DuckDB
(13-101)destroy_duckdb_database
(96-101)query_field
(86-94)quantmsio/core/sdrf.py (3)
SDRFHandler
(90-454)get_experiment_type_from_sdrf
(269-314)get_sample_map_run
(407-420)quantmsio/operate/tools.py (1)
get_protein_accession
(267-272)quantmsio/utils/pride_utils.py (1)
clean_peptidoform_sequence
(147-159)
🪛 Ruff (0.11.9)
tests/test_maxquant.py
6-6: pytest imported but unused. Remove unused import: pytest (F401)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files (SIM115)
258-258: Use a context manager for opening files (SIM115)
414-414: Use a context manager for opening files (SIM115)
478-478: Use a context manager for opening files (SIM115)
495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None) (SIM910)
680-680: Do not use bare except (E722)
691-691: Do not use bare except (E722)
785-788: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
792-795: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
800-803: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
quantmsio/core/quantms/pg.py
62-65: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
241-241: Use a context manager for opening files (SIM115)
246-246: Use a context manager for opening files (SIM115)
261-264: Use contextlib.suppress(ValueError) instead of try-except-pass (SIM105)
519-519: Local variable protein_df is assigned to but never used. Remove assignment to unused variable protein_df (F841)
quantmsio/commands/convert/quantms_project.py
5-5: logging imported but unused. Remove unused import: logging (F401)
quantmsio/core/quantms/msstats_in.py
76-76: Local variable e is assigned to but never used (F841)
105-105: Local variable mapping_df is assigned to but never used (F841)
119-119: Local variable sample_df is assigned to but never used (F841)
260-260: Do not use bare except (E722)
266-266: Do not use bare except (E722)
🪛 Flake8 (7.2.0)
tests/test_maxquant.py
[error] 6-6: 'pytest' imported but unused (F401)
quantmsio/core/quantms/mztab.py
[error] 680-680: do not use bare 'except' (E722)
[error] 691-691: do not use bare 'except' (E722)
[error] 769-769: whitespace before ':' (E203)
quantmsio/core/quantms/pg.py
[error] 519-519: local variable 'protein_df' is assigned to but never used (F841)
[error] 583-583: whitespace before ':' (E203)
quantmsio/commands/convert/quantms_project.py
[error] 5-5: 'logging' imported but unused (F401)
quantmsio/core/quantms/msstats_in.py
[error] 76-76: local variable 'e' is assigned to but never used (F841)
[error] 105-105: local variable 'mapping_df' is assigned to but never used (F841)
[error] 119-119: local variable 'sample_df' is assigned to but never used (F841)
[error] 219-219: whitespace before ':' (E203)
[error] 260-260: do not use bare 'except' (E722)
[error] 266-266: do not use bare 'except' (E722)
[error] 533-533: whitespace before ':' (E203)
🪛 GitHub Check: CodeFactor
quantmsio/core/quantms/mztab.py
[notice] 680-680: quantmsio/core/quantms/mztab.py#L680 Do not use bare 'except'. (E722)
[notice] 691-691: quantmsio/core/quantms/mztab.py#L691 Do not use bare 'except'. (E722)
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692 Try, Except, Continue detected. (B112)
[notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795 Try, Except, Pass detected. (B110)
[notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788 Try, Except, Pass detected. (B110)
[notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803 Try, Except, Pass detected. (B110)
quantmsio/core/quantms/pg.py
[notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65 Try, Except, Pass detected. (B110)
[notice] 270-271: quantmsio/core/quantms/pg.py#L270-L271 Try, Except, Pass detected. (B110)
quantmsio/commands/convert/quantms_project.py
[notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199 Complex Method
quantmsio/core/quantms/msstats_in.py
[notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260 Do not use bare 'except'. (E722)
[notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68 Do not use bare 'except'. (E722)
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684 Try, Except, Pass detected. (B110)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7) (R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations (R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations (R1732)
[refactor] 606-640: Too many nested blocks (7/5) (R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
quantmsio/core/quantms/pg.py
[error] 73-73: Non-iterable value file is used in an iterating context (E1133)
[refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif" (R1723)
[error] 110-110: Non-iterable value file is used in an iterating context (E1133)
[refactor] 92-92: Too many branches (13/12) (R0912)
[refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
[refactor] 285-285: Too many arguments (9/5) (R0913)
[refactor] 285-285: Too many positional arguments (9/5) (R0917)
[refactor] 285-285: Too many local variables (35/15) (R0914)
[refactor] 285-285: Too many statements (55/50) (R0915)
[refactor] 571-571: Useless return at end of function or method (R1711)
[refactor] 627-627: Too many arguments (7/5) (R0913)
[refactor] 627-627: Too many positional arguments (7/5) (R0917)
[refactor] 627-627: Too many local variables (23/15) (R0914)
quantmsio/commands/convert/quantms_project.py
[refactor] 43-43: Too many arguments (6/5) (R0913)
[refactor] 43-43: Too many positional arguments (6/5) (R0917)
[refactor] 43-43: Too many local variables (25/15) (R0914)
[refactor] 43-43: Too many branches (14/12) (R0912)
[refactor] 43-43: Too many statements (78/50) (R0915)
[refactor] 243-243: Too many arguments (6/5) (R0913)
[refactor] 243-243: Too many positional arguments (6/5) (R0917)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments (R1725)
[refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one (R1737)
tests/test_full_conversion_suite.py
[refactor] 70-76: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
🔇 Additional comments (13)
quantmsio/core/quantms/mztab.py (2)
83-136: LGTM! Excellent architecture for MzTab processing. The class design demonstrates good practices with:
- Optional DuckDB integration for large files
- Proper gzip support
- Resource management with context managers
- Intelligent file size-based optimization
- Comprehensive metadata caching
The initialization logic is well-structured and handles edge cases appropriately.
246-266: Well-implemented context manager for file operations. The _safe_file_open method properly handles both gzipped and regular files with appropriate error handling and resource cleanup.
tests/test_diann.py (2)
20-27: Excellent addition of proper cleanup handling. The try-finally blocks ensure that DuckDB resources are properly cleaned up even if exceptions occur during processing. This prevents resource leaks and is a good practice for database connections.
Also applies to: 35-45
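The try-finally pattern praised here reduces to a few lines; `FakeConnection` stands in for a real DuckDB connection purely for illustration:

```python
class FakeConnection:
    """Minimal stand-in for a database connection, used to show the pattern."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def process(conn, work):
    """Run `work`; the finally-block closes the connection on success or error."""
    try:
        return work()
    finally:
        conn.close()
```

Whether `work()` returns normally or raises, the connection is closed, which is exactly what the reviewed tests verify for the DuckDB handles.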
48-145: Comprehensive test coverage for protein group intensity structure. This new test function excellently validates:
- The structured intensity representation with proper sample_accession and channel mapping
- Both intensities and additional_intensities fields
- Nested intensity arrays with correct types ("normalize_intensity" and "lfq")
- Proper channel assignment ("LFQ" for DIA-NN)
The test provides thorough coverage for the refactored protein group handling and ensures data integrity.
331-357
: Excellent test for gene name extraction.This test function provides good coverage for the gene name extraction logic from protein descriptions, testing various formats including edge cases like missing gene names and multiple gene entries.
tests/test_full_conversion_suite.py (3)
17-43
: Excellent test infrastructure setup.The helper functions provide a solid foundation for integration tests with proper workspace detection and file path management. The separation of LFQ and TMT test files is well-organized.
46-84
: Well-implemented command execution with robust error handling.The
run_command
function includes excellent practices:
- Proper UTF-8 encoding handling
- Reasonable timeout for long-running operations
- Comprehensive error capture and logging
- Good timing information for performance monitoring
105-426
: Comprehensive integration test coverage.This test suite provides excellent end-to-end validation of the conversion workflows:
- Tests all major data types (PSM, Feature, Protein Groups)
- Covers both LFQ and TMT experiments
- Validates output file generation and content structure
- Includes proper column count and data row verification
- Uses appropriate pytest markers for integration tests
The tests ensure the CLI commands produce valid and complete quantms.io parquet outputs.
quantmsio/core/quantms/msstats_in.py (1)
28-35
: Well-designed context manager and cleanup implementationThe implementation correctly follows context manager protocol and ensures proper cleanup of database resources. The try/except/pass in
__del__
is appropriate for destructor cleanup to avoid exceptions during garbage collection.Also applies to: 671-685
quantmsio/core/quantms/pg.py (4)
19-26
: Excellent resource management patternGreat implementation of resource tracking and cleanup:
- Initializing tracking lists before
super().__init__
prevents issues if parent initialization fails- Comprehensive cleanup of file handles and temporary files
- Proper context manager support
- The try/except/pass in
__del__
is appropriate for destructorsAlso applies to: 35-66
229-274
: Well-implemented safe file handling with automatic format detectionThe
_safe_file_open
method provides excellent file handling:
- Automatic detection and handling of gzipped files
- Proper resource tracking for cleanup
- Context manager pattern ensures files are closed
- Appropriate error handling in cleanup
591-626
: Well-structured SQL generation with proper aggregationThe SQL generation is clean and efficient:
- Proper use of GROUP BY for aggregation
- Good selection of aggregate functions (SUM, COUNT, MAX, AVG)
- Filtering out null anchor proteins and zero intensities
- Proper ordering for consistent results
518-523
: Remove unused DataFrame assignmentThe
protein_df
variable is created but never used directly - DuckDB references DataFrames by name in SQL.# Convert to DataFrame and load into DuckDB if protein_data: - protein_df = pd.DataFrame(protein_data) msstats_in._duckdb.execute("DROP TABLE IF EXISTS protein_groups") msstats_in._duckdb.execute( - "CREATE TABLE protein_groups AS SELECT * FROM protein_df" + "CREATE TABLE protein_groups AS SELECT * FROM pd.DataFrame(protein_data)" )Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
quantmsio/core/quantms/mztab.py (1)
784-808
: Replace bare except clauses with specific exceptions.Using bare
except:
can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle.The previous review comment about replacing bare except clauses is still applicable. Apply the suggested fix:
if self._duckdb: try: self._duckdb.close() - except: + except Exception as e: + self.logger.warning(f"Failed to close DuckDB connection: {e}") - pass if self._duckdb_name and Path(self._duckdb_name).exists(): try: os.remove(self._duckdb_name) - except: + except (OSError, IOError) as e: + self.logger.warning(f"Failed to remove DuckDB file {self._duckdb_name}: {e}") - pass if self._temp_decompressed_file and self._temp_decompressed_file.exists(): try: self._temp_decompressed_file.unlink() - except: + except (OSError, IOError) as e: + self.logger.warning(f"Failed to remove temporary file {self._temp_decompressed_file}: {e}") - passquantmsio/core/quantms/msstats_in.py (1)
56-58
: This is a duplicate of a previous review comment.The exception handling has been improved with logging, but the logger attribute issue needs to be addressed.
🧹 Nitpick comments (10)
quantmsio/core/quantms/mztab.py (3)
32-81
: Consider refactoring complex nested logic.The function has deeply nested conditionals that make it difficult to follow. Consider extracting helper methods for better readability.
Consider this refactoring approach:

+def _parse_modification_base(line_parts: list, modifications: dict) -> dict:
+    """Parse base modification information."""
+    values = line_parts[2].replace("[", "").replace("]", "").split(",")
+    accession = values[1].strip()
+    name = values[2].strip()
+    index = line_parts[1].split("[")[1].split("]")[0]
+    modifications[accession] = [name, index, None, None]
+    return modifications
+
+def _find_modification_by_index(modifications: dict, index: str) -> Optional[str]:
+    """Find modification accession by index."""
+    for key, value in modifications.items():
+        if value[1] == index:
+            return key
+    return None
+
 def fetch_modifications_from_mztab_line(line: str, _modifications: dict) -> dict:
     line = line.strip()
     line_parts = line.split("\t")
     if line_parts[0] == "MTD" and "_mod[" in line_parts[1]:
         if "site" not in line_parts[1] and "position" not in line_parts[1]:
-            values = line_parts[2].replace("[", "").replace("]", "").split(",")
-            accession = values[1].strip()
-            name = values[2].strip()
-            index = line_parts[1].split("[")[1].split("]")[0]
-            _modifications[accession] = [name, index, None, None]
+            _modifications = _parse_modification_base(line_parts, _modifications)
         elif "site" in line_parts[1]:
             index = line_parts[1].split("[")[1].split("]")[0]
-            accession = None
-            for key, value in _modifications.items():
-                if value[1] == index:
-                    accession = key
+            accession = _find_modification_by_index(_modifications, index)
             if accession is None:
                 raise Exception("The accession for the modification is None")
             _modifications[accession][2] = line_parts[2]
         elif "position" in line_parts[1]:
             index = line_parts[1].split("[")[1].split("]")[0]
-            accession = None
-            for key, value in _modifications.items():
-                if value[1] == index:
-                    accession = key
+            accession = _find_modification_by_index(_modifications, index)
             if accession is None:
                 raise Exception("The accession for the modification is None")
             _modifications[accession][3] = line_parts[2]
     return _modifications
407-448
: Consider improving file handle management for chunked reads. The method opens file handles for chunked reading without immediate context management. While this may be intentional for streaming operations, consider documenting this pattern or exploring alternative approaches.
For better resource management, consider this pattern:

+    def _create_chunked_reader(self, seekable_path: Path, pos: int, nrows: int, **kwargs):
+        """Create a chunked reader with proper resource management."""
+        f = open(seekable_path, "r", encoding="utf-8")
+        f.seek(pos)
+        return pd.read_csv(f, sep="\t", nrows=nrows, low_memory=False, **kwargs), f
+
     def __load_second(self, header, **kwargs):
         seekable_path = self._get_seekable_file_path()
         if "chunksize" in kwargs and kwargs["chunksize"] is not None:
-            f = open(seekable_path, "r", encoding="utf-8")
-            if header == "PSH":
-                f.seek(self._psm_pos)
-                return pd.read_csv(f, sep="\t", nrows=self._psm_len, low_memory=False, **kwargs)
+            if header == "PSH":
+                return self._create_chunked_reader(seekable_path, self._psm_pos, self._psm_len, **kwargs)
             # ... similar for other headers
495-495
: Simplify kwargs.get usage.

- columns = kwargs.get("usecols", None)
+ columns = kwargs.get("usecols")

quantmsio/core/quantms/msstats_in.py (7)
22-22
: Modernize the super() call to Python 3 style. The current super() call uses the outdated Python 2 style with explicit arguments.
- super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+ super().__init__(report_path, duckdb_max_memory, duckdb_threads)
114-116
: Remove unnecessary variable assignment.

The `mapping_df` name is resolved by DuckDB's replacement scan, so the assignment is not truly dead, but registering the DataFrame explicitly makes the dependency visible and silences the unused-variable lint:

- # Insert into DuckDB using pandas DataFrame
- mapping_df = pd.DataFrame(mapping_data)
- self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ # Insert into DuckDB after registering the DataFrame explicitly
+ self._duckdb.register("mapping_df", pd.DataFrame(mapping_data))
+ self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
128-130
: Remove unnecessary variable assignment.

The `sample_df` name is resolved by DuckDB's replacement scan, so the assignment is not truly dead, but registering the DataFrame explicitly makes the dependency visible and silences the unused-variable lint:

- # Insert into DuckDB using pandas DataFrame
- sample_df = pd.DataFrame(sample_data)
- self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ # Insert into DuckDB after registering the DataFrame explicitly
+ self._duckdb.register("sample_df", pd.DataFrame(sample_data))
+ self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
132-220
: Well-structured dynamic SQL view creation. The method effectively handles different experiment types and available columns to create optimized database views. The SQL construction appears safe since inputs are controlled internally.
Consider using parameterized queries or a SQL builder library for future enhancements to improve maintainability and safety:
# Example of safer approach for future consideration
def _build_charge_expression(self):
    """Build charge column expression based on available columns."""
    if self.has_precursor_charge and self.has_charge:
        return "COALESCE(PrecursorCharge, Charge)"
    # ... rest of logic
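As a self-contained illustration of this column-selection pattern: the SQL expression can be derived from the set of columns actually present in the report. The function name and the column names below mirror the review's sketch and are illustrative, not the project's real API.

```python
def build_charge_expression(available_columns: set) -> str:
    """Pick a SQL expression for the charge column based on which
    columns the input report actually contains."""
    has_precursor = "PrecursorCharge" in available_columns
    has_charge = "Charge" in available_columns
    if has_precursor and has_charge:
        # Prefer PrecursorCharge but fall back per-row to Charge
        return "COALESCE(PrecursorCharge, Charge)"
    if has_precursor:
        return "PrecursorCharge"
    if has_charge:
        return "Charge"
    # Neither column present: emit a NULL placeholder
    return "NULL"

print(build_charge_expression({"PrecursorCharge", "Charge"}))
# → COALESCE(PrecursorCharge, Charge)
```

Because the column names are detected from the table itself (never from user input), interpolating them into the view definition stays safe.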
492-494
: Replace print statement with proper logging. Consistent with other parts of the code, use logging instead of print statements.

- print(
-     f"Warning: SQL aggregation failed ({e}), falling back to pandas"
- )
+ self.logger.warning(
+     f"SQL aggregation failed ({e}), falling back to pandas"
+ )
582-582
: Replace print statement with proper logging. Use logging instead of print for consistency with the rest of the codebase.

- print(f"Filtering for protein: {protein_str}")
+ self.logger.info(f"Filtering for protein: {protein_str}")
685-698
: Improve cleanup exception handling with logging.

While try/except/pass is acceptable in `__del__` methods, adding logging can help with debugging resource cleanup issues.

 def __del__(self):
     """Cleanup database views and tables."""
     try:
         if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
             self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
             self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
             self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
             self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
             self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
         # Always call parent cleanup to close connection and remove database file
         if hasattr(self, "_duckdb") and self._duckdb:
             self.destroy_duckdb_database()
-    except Exception:
-        pass
+    except Exception as e:
+        # Log cleanup failures but don't raise to avoid issues during garbage collection
+        try:
+            import logging
+            logging.getLogger(__name__).debug(f"Error during cleanup: {e}")
+        except Exception:
+            pass  # If even logging fails, silently continue
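An alternative worth considering is to avoid `__del__` entirely and register cleanup with `weakref.finalize`, which runs the callback at garbage collection or interpreter exit and lets failures be handled in a plain function. This is a minimal sketch under assumed names (`ViewHolder` and `FakeConn` stand in for the real connection-owning class and the DuckDB connection):

```python
import weakref

class ViewHolder:
    """Sketch: register cleanup via weakref.finalize instead of __del__."""

    def __init__(self, conn):
        self._conn = conn
        # The callback must not reference self, or the object is kept alive
        self._finalizer = weakref.finalize(self, ViewHolder._cleanup, conn)

    @staticmethod
    def _cleanup(conn):
        try:
            conn.close()
        except Exception:
            pass  # never raise from finalization

    def close(self):
        # Explicit close runs the finalizer exactly once
        self._finalizer()

class FakeConn:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

conn = FakeConn()
holder = ViewHolder(conn)
holder.close()
print(conn.closed)  # → True
```

`weakref.finalize` guarantees the callback runs at most once, so calling `close()` explicitly and then letting the object be garbage collected does not double-close the connection.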
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files
(SIM115)
258-258: Use a context manager for opening files
(SIM115)
414-414: Use a context manager for opening files
(SIM115)
478-478: Use a context manager for opening files
(SIM115)
495-495: Use kwargs.get("usecols")
instead of kwargs.get("usecols", None)
Replace kwargs.get("usecols", None)
with kwargs.get("usecols")
(SIM910)
quantmsio/core/quantms/msstats_in.py
89-89: Local variable e
is assigned to but never used
Remove assignment to unused variable e
(F841)
115-115: Local variable mapping_df
is assigned to but never used
Remove assignment to unused variable mapping_df
(F841)
129-129: Local variable sample_df
is assigned to but never used
Remove assignment to unused variable sample_df
(F841)
🪛 GitHub Check: CodeFactor
quantmsio/core/quantms/mztab.py
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
Try, Except, Continue detected. (B112)
quantmsio/core/quantms/msstats_in.py
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
Try, Except, Pass detected. (B110)
🪛 Flake8 (7.2.0)
quantmsio/core/quantms/mztab.py
[error] 769-769: whitespace before ':'
(E203)
quantmsio/core/quantms/msstats_in.py
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
[error] 115-115: local variable 'mapping_df' is assigned to but never used
(F841)
[error] 129-129: local variable 'sample_df' is assigned to but never used
(F841)
[error] 229-229: whitespace before ':'
(E203)
[error] 547-547: whitespace before ':'
(E203)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 272-272: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 281-281: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one
(R1737)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.11)
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests
🔇 Additional comments (7)
quantmsio/core/quantms/mztab.py (6)
17-30
: LGTM! Function logic is sound.The modification list generation function correctly handles null values and processes modifications appropriately.
83-136
: Well-structured initialization with good performance optimization.The conditional DuckDB setup based on file size is a smart optimization. The high number of instance attributes is justified given the comprehensive functionality provided.
137-244
: Excellent DuckDB integration with proper fallback handling.The implementation provides good performance optimization with appropriate fallback mechanisms when DuckDB operations fail. The file size threshold and configuration settings are well-chosen.
246-312
: Robust file handling with comprehensive gzip support.The implementation correctly handles both regular and gzipped files with appropriate context management. The temporary file creation for seekable operations on gzipped files is a necessary optimization.
313-367
: Efficient metadata parsing with proper caching.The once-only parsing pattern with caching is well-implemented and prevents redundant processing. Good error handling ensures the system remains functional even with parsing issues.
519-773
: Comprehensive and well-designed public API.The methods provide excellent functionality with proper DuckDB optimization and pandas fallback. The streaming capabilities and efficient querying methods demonstrate thoughtful design for different use cases.
quantmsio/core/quantms/msstats_in.py (1)
235-261
: Excellent SQL-based performance optimization. The file statistics method effectively uses SQL aggregation for better performance compared to iterating through data in Python. The fallback mechanism ensures robustness.
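The aggregate-in-SQL-with-fallback shape praised here can be sketched with the stdlib `sqlite3` module standing in for DuckDB (table and column names are illustrative, not the project's schema):

```python
import sqlite3

def file_stats(rows):
    """Aggregate per-file statistics in SQL; fall back to plain
    Python if the SQL path raises."""
    try:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE report (filename TEXT, intensity REAL)")
        conn.executemany("INSERT INTO report VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT filename, COUNT(*), SUM(intensity) "
            "FROM report GROUP BY filename ORDER BY filename"
        )
        return cur.fetchall()
    except sqlite3.Error:
        # Fallback: aggregate in plain Python if the SQL path fails
        acc = {}
        for fname, inten in rows:
            cnt, total = acc.get(fname, (0, 0.0))
            acc[fname] = (cnt + 1, total + inten)
        return sorted((f, c, s) for f, (c, s) in acc.items())

stats = file_stats([("a.raw", 1.0), ("a.raw", 2.0), ("b.raw", 5.0)])
print(stats)  # → [('a.raw', 2, 3.0), ('b.raw', 1, 5.0)]
```

Pushing the GROUP BY into the engine avoids materializing the full report in Python, which is the same win the DuckDB version gets.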
self.logger.warning(
    f"Failed to get column info via PRAGMA: {e}, trying fallback method"
)
try:
    sample_query = "SELECT * FROM report LIMIT 1"
    sample_data = self._duckdb.execute(sample_query).df()
    available_columns = set(sample_data.columns.tolist())
except Exception as e2:
    self.logger.warning(f"Fallback column detection also failed: {e2}")
Fix missing logger attribute and unused variable.
The code references self.logger, but the MsstatsIN class doesn't inherit from a class that provides a logger attribute, which will cause an AttributeError.
+import logging
+
class MsstatsIN(DuckDB):
def __init__(
self,
report_path: Union[Path, str],
sdrf_path: Union[Path, str],
duckdb_max_memory="16GB",
duckdb_threads=4,
):
super().__init__(report_path, duckdb_max_memory, duckdb_threads)
+ self.logger = logging.getLogger(__name__)
self._sdrf = SDRFHandler(sdrf_path)
Also, remove the unused exception variable:
- except Exception as e:
+ except Exception:
# Fallback: get columns from a sample query
self.logger.warning(
- f"Failed to get column info via PRAGMA: {e}, trying fallback method"
+ "Failed to get column info via PRAGMA, trying fallback method"
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
self.logger.warning(
    f"Failed to get column info via PRAGMA: {e}, trying fallback method"
)
try:
    sample_query = "SELECT * FROM report LIMIT 1"
    sample_data = self._duckdb.execute(sample_query).df()
    available_columns = set(sample_data.columns.tolist())
except Exception as e2:
    self.logger.warning(f"Fallback column detection also failed: {e2}")

# At the top of quantmsio/core/quantms/msstats_in.py, add the logging import:
import logging

class MsstatsIN(DuckDB):
    def __init__(
        self,
        report_path: Union[Path, str],
        sdrf_path: Union[Path, str],
        duckdb_max_memory="16GB",
        duckdb_threads=4,
    ):
        super().__init__(report_path, duckdb_max_memory, duckdb_threads)
        # Initialize a logger for this class
        self.logger = logging.getLogger(__name__)
        self._sdrf = SDRFHandler(sdrf_path)
    ...

    def _detect_columns(self):
        try:
            columns_query = "PRAGMA table_info('report')"
            columns_info = self._duckdb.execute(columns_query).df()
            available_columns = set(columns_info["name"].tolist())
        except Exception:
            # Fallback: get columns from a sample query
            self.logger.warning(
                "Failed to get column info via PRAGMA, trying fallback method"
            )
            try:
                sample_query = "SELECT * FROM report LIMIT 1"
                sample_data = self._duckdb.execute(sample_query).df()
                available_columns = set(sample_data.columns.tolist())
            except Exception as e2:
                self.logger.warning(f"Fallback column detection also failed: {e2}")
        self._available_columns = available_columns
    ...
🧰 Tools
🪛 Pylint (3.3.7)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 70 to 78, the code uses
self.logger for logging but the MsstatsIN class lacks a logger attribute,
causing an AttributeError. Add a logger attribute to the MsstatsIN class, for
example by initializing it with a standard logging.getLogger instance. Also,
remove the unused exception variable e2 in the fallback except block by omitting
the variable name in the except clause or replacing it with an underscore.
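A minimal, runnable sketch of the suggested fix (class and method names below are illustrative stand-ins, not the project's real API): give the class its own logger via `logging.getLogger(__name__)` so `self.logger` exists before any fallback branch runs.

```python
import logging

logging.basicConfig(level=logging.WARNING)

class MsstatsLike:
    """Stand-in for MsstatsIN: owns a logger instead of relying on a base class."""

    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def detect_columns(self, pragma_ok: bool):
        # Hedged stand-in for the PRAGMA-then-fallback column detection
        if not pragma_ok:
            self.logger.warning(
                "Failed to get column info via PRAGMA, trying fallback method"
            )
            return ["fallback"]
        return ["name", "type"]

obj = MsstatsLike()
print(obj.detect_columns(True))  # → ['name', 'type']
```

Using a module-level logger name (`__name__`) keeps log records attributable to the file without coupling the class to any particular logging setup.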
Actionable comments posted: 5
🔭 Outside diff range comments (1)
quantmsio/utils/pride_utils.py (1)
674-702
: Critical bug: Unreachable code after early return. The function has unreachable code after the early return on line 674. The main processing logic from lines 676-702 will never execute because the function exits early when modification_string is null/empty.
Apply this diff to fix the logic flow:
     if (
         modification_string is None
         or modification_string == "null"
         or modification_string == ""
         or pd.isna(modification_string)
     ):
         return peptide_sequence

-    modifications = get_modifications_object_from_mztab_line(
-        modification_string=modification_string,
-        modifications_definition=modifications_definition,
-    )
-
-    aa_index = 0
-    result_peptide: str = ""
-    peptide_sequence = list(peptide_sequence)
-    # Add n-term modification if it is present
-    for key_index, value_index in modifications.items():
-        if aa_index in value_index["position"]:
-            result_peptide = "[" + key_index + "]" + result_peptide
-    if len(result_peptide) > 0:
-        result_peptide = result_peptide + "-"
+    modifications = get_modifications_object_from_mztab_line(
+        modification_string=modification_string,
+        modifications_definition=modifications_definition,
+    )
+
+    aa_index = 0
+    result_peptide: str = ""
+    peptide_sequence = list(peptide_sequence)
+    # Add n-term modification if it is present
+    for key_index, value_index in modifications.items():
+        if aa_index in value_index["position"]:
+            result_peptide = "[" + key_index + "]" + result_peptide
+    if len(result_peptide) > 0:
+        result_peptide = result_peptide + "-"
+
+    aa_index += 1
     for aa in peptide_sequence:
♻️ Duplicate comments (4)
quantmsio/commands/convert/quantms_project.py (1)
80-103
: Fix critical typo in parameter name. There's a typo in the parameter name that will cause a runtime error when the method is called.
  project_handler.add_software_provider(
-     sortware_name="quantms", sortware_version=quantms_version
+     software_name="quantms", software_version=quantms_version
  )

quantmsio/core/quantms/msstats_in.py (1)
68-68
: Replace bare except clause with specific exception handling

Bare except clauses can catch system exceptions like KeyboardInterrupt. Be more specific about expected exceptions.

 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except Exception as e:
+except (AttributeError, KeyError, ValueError) as e:
     # Fallback: get columns from a sample query
     self.logger.warning(
         f"Failed to get column info via PRAGMA: {e}, trying fallback method"
     )

quantmsio/core/quantms/mztab.py (2)
680-680
: Replace bare except clauses with specific exception handling

Be specific about which exceptions to handle.

 try:
     count_result = self._duckdb.execute(
         f"SELECT COUNT(*) as count FROM {table}"
     ).fetchone()
     if count_result:
         stats["sections"][table.upper()] = {
             "row_count": count_result[0]
         }
-except:
+except (duckdb.Error, KeyError):
     stats["sections"][table.upper()] = {"row_count": 0}

 section_name = {"PSH": "PSM", "PEH": "PEP", "PRH": "PRT"}[
     header
 ]
 stats["sections"][section_name] = {"row_count": length}
 except (ValueError, KeyError, IndexError) as e:
     # Log specific errors for debugging but continue processing other sections
     self.logger.debug(f"Failed to extract length for section {header}: {e}")
     continue
-except Exception:
+except Exception as e:
+    # Log unexpected errors but continue processing other sections
+    self.logger.warning(f"Unexpected error extracting length for section {header}: {e}")
     continue

Also applies to: 691-691
787-787
: Replace bare except clauses in cleanup methods

Using bare `except:` can catch system exits and keyboard interrupts. Be more specific.

 if self._duckdb:
     try:
         self._duckdb.close()
-    except:
-        pass
+    except Exception as e:
+        self.logger.warning(f"Failed to close DuckDB connection: {e}")
     self._duckdb = None
 if self._duckdb_name and Path(self._duckdb_name).exists():
     try:
         os.remove(self._duckdb_name)
-    except:
-        pass
+    except (OSError, IOError) as e:
+        self.logger.warning(
+            f"Failed to remove DuckDB file {self._duckdb_name}: {e}"
+        )
     self._duckdb_name = None
 # Clean up temporary decompressed file
 if self._temp_decompressed_file and self._temp_decompressed_file.exists():
     try:
         self._temp_decompressed_file.unlink()
-    except:
-        pass
+    except (OSError, IOError) as e:
+        self.logger.warning(
+            f"Failed to remove temporary file {self._temp_decompressed_file}: {e}"
+        )
     self._temp_decompressed_file = None

Also applies to: 794-794, 802-802
🧹 Nitpick comments (19)
quantmsio/commands/convert/quantms_project.py (5)
21-34
: Improve robustness of project prefix extraction.

The get_project_prefix function makes assumptions about filename structure that could lead to incorrect results.

 def get_project_prefix(sdrf_file: Path) -> str:
     """Extract project prefix from SDRF filename (e.g. 'PXD000865' from 'PXD000865.sdrf.tsv')."""
     filename = sdrf_file.name
-    # Remove .sdrf.tsv and any variations like _openms_design.sdrf.tsv
-    prefix = filename.split(".sdrf")[0].split("_openms")[0]
+    # Extract prefix before first occurrence of .sdrf or _openms
+    prefix = filename.split(".sdrf")[0].split("_openms")[0]
+    # Validate that we have a reasonable project accession format
+    if not prefix or len(prefix) < 3:
+        raise ValueError(f"Could not extract valid project prefix from filename: {filename}")
     return prefix
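The hardened extraction above can be exercised directly; this standalone sketch reproduces the suggested logic so the two suffix variants from the docstring can be checked:

```python
from pathlib import Path

def get_project_prefix(sdrf_file: Path) -> str:
    """Extract project prefix from an SDRF filename, e.g. 'PXD000865'
    from 'PXD000865.sdrf.tsv' or 'PXD000865_openms_design.sdrf.tsv'."""
    filename = sdrf_file.name
    # Cut at the first .sdrf or _openms marker, whichever comes first
    prefix = filename.split(".sdrf")[0].split("_openms")[0]
    # Reject empty or implausibly short prefixes early
    if not prefix or len(prefix) < 3:
        raise ValueError(f"Could not extract valid project prefix from filename: {filename}")
    return prefix

print(get_project_prefix(Path("PXD000865.sdrf.tsv")))                # → PXD000865
print(get_project_prefix(Path("PXD000865_openms_design.sdrf.tsv")))  # → PXD000865
```

The length check is a heuristic (ProteomeXchange accessions are much longer than three characters); a stricter regex on the accession format would be the next step if needed.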
81-81
: Fix indentation issues in function definitions. Multiple function definitions have continuation line indentation issues that violate PEP 8 style guidelines.
-def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path, - quantmsio_version: str, quantms_version: str): +def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path, + quantmsio_version: str, quantms_version: str): -def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, - project_accession: str, generate_ibaq_view: bool) -> list: +def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, + project_accession: str, generate_ibaq_view: bool) -> list: -def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str, - output_folder_path: Path): +def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str, + output_folder_path: Path): -def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, - project_accession: str) -> list: +def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, + project_accession: str) -> list: -def _register_files_in_project(created_files: list, output_folder_path: Path, - project_accession: str): +def _register_files_in_project(created_files: list, output_folder_path: Path, + project_accession: str): - project_handler = _initialize_project(output_folder_path, project_accession, - sdrf_file, quantmsio_version, quantms_version) + project_handler = _initialize_project(output_folder_path, project_accession, + sdrf_file, quantmsio_version, quantms_version) - feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path, - project_accession, generate_ibaq_view) + feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path, + project_accession, generate_ibaq_view)Also applies to: 107-107, 136-136, 149-149, 172-172, 226-226, 233-233
93-93
: Remove unused variable assignments. Variables are assigned but never used, creating unnecessary code.

- # Save initial project file
- project_json = str(output_folder_path / f"{project_accession}.project.json")
  project_handler.save_project_info(
      output_prefix_file=project_accession,
      output_folder=str(output_folder_path),
      delete_existing=True,
  )

- # Initialize project
- project_handler = _initialize_project(output_folder_path, project_accession,
-                                       sdrf_file, quantmsio_version, quantms_version)
+ # Initialize project
+ _initialize_project(output_folder_path, project_accession,
+                     sdrf_file, quantmsio_version, quantms_version)

Also applies to: 225-225
192-247
: Consider reducing function complexity.

The quantmsio_workflow function has 6 parameters and manages the entire workflow orchestration. While it has been improved by extracting helper functions, it could benefit from further modularization. Consider creating a QuantmsProjectConverter class to encapsulate the workflow state and reduce parameter passing:

class QuantmsProjectConverter:
    def __init__(self, base_folder: str, output_folder: str, project_accession: str,
                 quantms_version: str, quantmsio_version: str, generate_ibaq_view: bool):
        self.base_folder = base_folder
        self.output_folder = output_folder
        self.project_accession = project_accession
        self.quantms_version = quantms_version
        self.quantmsio_version = quantmsio_version
        self.generate_ibaq_view = generate_ibaq_view
        self.created_files = []

    def convert(self) -> None:
        """Execute the full conversion workflow."""
        self._setup_and_validate()
        self._initialize_project()
        self._convert_data()
        self._register_files()
304-306
: Handle potential type mismatch in output directory assignment.

The output_dir parameter is Optional[Path] but gets assigned a string value, which could cause type checking issues.

  # Default output to sibling quantms.io directory
  if not output_dir:
-     output_dir = str(quantms_dir.parent / "quantms.io")
+     output_dir = quantms_dir.parent / "quantms.io"

quantmsio/utils/pride_utils.py (2)
516-519
: Simplify optional field assignments using dict.get()

These if-else blocks can be simplified using dict.get() with a default value.

 def _set_optional_psm_fields(psm: dict, es: dict) -> None:
     """Set optional PSM fields with proper fallbacks."""
     # Handle decoy peptide information
-    if "opt_global_cv_MS:1002217_decoy_peptide" in es:
-        psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
-    else:
-        psm["is_decoy"] = None
+    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)

     # Handle posterior error probability
     if "opt_global_Posterior_Error_Probability_score" in es:
         psm["posterior_error_probability"] = es["opt_global_Posterior_Error_Probability_score"]
     else:
         psm["posterior_error_probability"] = None

     # Handle global q-value
-    if "opt_global_q-value" in es:
-        psm["global_qvalue"] = es["opt_global_q-value"]
-    else:
-        psm["global_qvalue"] = None
+    psm["global_qvalue"] = es.get("opt_global_q-value", None)

     # Handle consensus support
-    if "opt_global_consensus_support" in es:
-        psm["consensus_support"] = es["opt_global_consensus_support"]
-    else:
-        psm["consensus_support"] = None
+    psm["consensus_support"] = es.get("opt_global_consensus_support", None)

Also applies to: 528-531, 534-537
513-538
: Simplify field assignments using dict.get() method. The static analysis tool correctly identifies opportunities to simplify the conditional assignments.
Apply this diff to simplify the field assignments:
- # Handle decoy peptide information
- if "opt_global_cv_MS:1002217_decoy_peptide" in es:
-     psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
- else:
-     psm["is_decoy"] = None
+ # Handle decoy peptide information
+ psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)

- # Handle global q-value
- if "opt_global_q-value" in es:
-     psm["global_qvalue"] = es["opt_global_q-value"]
- else:
-     psm["global_qvalue"] = None
+ # Handle global q-value
+ psm["global_qvalue"] = es.get("opt_global_q-value", None)

- # Handle consensus support
- if "opt_global_consensus_support" in es:
-     psm["consensus_support"] = es["opt_global_consensus_support"]
- else:
-     psm["consensus_support"] = None
+ # Handle consensus support
+ psm["consensus_support"] = es.get("opt_global_consensus_support", None)

tests/test_msstats_in.py (3)
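The equivalence is easy to demonstrate: `dict.get(key, None)` returns the stored value when the key is present and None otherwise, which is exactly what each if/else pair does. A minimal sketch with the same opt_global keys:

```python
def set_optional_fields(es: dict) -> dict:
    """Collapse each if/else lookup into a single dict.get() call."""
    psm = {}
    # .get() defaults to None when the key is absent
    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide")
    psm["global_qvalue"] = es.get("opt_global_q-value")
    psm["consensus_support"] = es.get("opt_global_consensus_support")
    return psm

print(set_optional_fields({"opt_global_q-value": 0.01}))
# → {'is_decoy': None, 'global_qvalue': 0.01, 'consensus_support': None}
```

Note that the explicit `, None` default in the diff is optional; `get()` already defaults to None, so either spelling behaves identically.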
88-88
: Remove unnecessary f-string prefixes. These strings don't contain any placeholders, so the f-string prefix is not needed.
- print(f"\nLFQ test completed successfully!") + print("\nLFQ test completed successfully!")- print(f"\nTMT test completed successfully!") + print("\nTMT test completed successfully!")- print(f"\nDataset Comparison:") + print("\nDataset Comparison:")Also applies to: 356-356, 428-428
220-223
: Replace unused loop variable with underscoreThe
count
variable is not used within the loop body.if intensities_channel_counts: - for channel, count in intensities_channel_counts.items(): + for channel, _ in intensities_channel_counts.items(): if channel not in channel_counts: channel_counts[channel] = 0
264-270
: Simplify dictionary membership testRemove unnecessary
.keys()
when checking dictionary membership.all_channels = sorted( set( channel for file_channels in file_channel_matrix.values() - for channel in file_channels.keys() + for channel in file_channels ) )quantmsio/core/quantms/msstats_in.py (1)
416-422
: Use 'yield from' for better performanceWhen yielding all elements from an iterator without modification, use
yield from
.if not self._optimized_setup_done: # Fallback to original method - for batch in self.generate_msstats_in(file_batch_size): - yield batch + yield from self.generate_msstats_in(file_batch_size) returnquantmsio/core/quantms/mztab.py (1)
414-414
: Consider using context managers for file operationsThese file operations should ideally use context managers to ensure proper resource cleanup.
For chunked reading operations, the current approach is necessary since the file handle needs to remain open across multiple chunk iterations. However, consider documenting this design choice:
if "chunksize" in kwargs and kwargs["chunksize"] is not None: # For chunked reading, we need to keep the file handle open + # Note: Caller is responsible for closing the file handle when done iterating f = open(seekable_path, "r", encoding="utf-8")
Also applies to: 478-478
quantmsio/core/quantms/pg.py (7)
76-84
: Simplify conditional structure by removing unnecessaryelif
.The
elif
condition is unreachable after thebreak
statement.for line in file: if line.startswith("PRH"): protein_columns = line.strip().split("\t")[1:] break - elif line.startswith("PRT\t") and not protein_columns: + if line.startswith("PRT\t") and not protein_columns: # Fallback: use first PRT line to determine column count prt_parts = line.strip().split("\t") # Generate default column names based on actual data protein_columns = [f"col_{i}" for i in range(len(prt_parts))] break
89-92
: Remove unnecessary empty finally block.The finally block serves no purpose here as file handles are already managed by the context manager.
except Exception as e: logging.getLogger("quantmsio.core.mztab").warning( f"Could not extract protein columns: {e}" ) -finally: - # File handle is automatically tracked and will be cleaned up - pass return protein_columns
194-197
: Simplify conditional by removing unnecessaryelif
.if len(parts) >= 3: return parts[2] # PROT_HUMAN -elif len(parts) >= 2: +if len(parts) >= 2: return parts[1] # P12345
283-289
: Remove unnecessaryelse
afterreturn
.def _convert_to_parquet_format(self, df: pd.DataFrame) -> pa.Table: """Convert DataFrame to parquet format using PG_SCHEMA.""" if df.empty: # For empty DataFrames, create an empty table with the correct schema return pa.Table.from_arrays( [pa.array([], type=field.type) for field in PG_SCHEMA], schema=PG_SCHEMA ) - else: - return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False) + return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False)
437-440
: Document the cleanup behavior after context manager exit.The cleanup() call happens after the MsstatsIN context manager has already exited. Consider adding a comment to clarify this is intentional.
# Context manager automatically cleans up DuckDB resources # Cleanup any temporary files created during processing +# Note: This cleanup is for MzTabProteinGroups resources, not MsstatsIN self.cleanup()
577-577
: Remove unnecessary return statement.Remove the empty return statement as Python functions return
None
by default.
589-589
: Fix slice formatting.-yield unique_files[i : i + batch_size] +yield unique_files[i:i + batch_size]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
quantmsio/commands/convert/quantms_project.py (1 hunks)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
quantmsio/core/quantms/pg.py (1 hunks)
quantmsio/utils/pride_utils.py (4 hunks)
tests/test_msstats_in.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeFactor
tests/test_msstats_in.py
[notice] 187-231: tests/test_msstats_in.py#L187-L231
Complex Method
🪛 Ruff (0.11.9)
tests/test_msstats_in.py
88-88: f-string without any placeholders
Remove extraneous f prefix
(F541)
220-220: Loop control variable count not used within loop body
(B007)
268-268: Use key in dict instead of key in dict.keys()
Remove .keys()
(SIM118)
356-356: f-string without any placeholders
Remove extraneous f prefix
(F541)
428-428: f-string without any placeholders
Remove extraneous f prefix
(F541)
quantmsio/commands/convert/quantms_project.py
93-93: Local variable project_json is assigned to but never used
Remove assignment to unused variable project_json
(F841)
225-225: Local variable project_handler is assigned to but never used
Remove assignment to unused variable project_handler
(F841)
quantmsio/core/quantms/msstats_in.py
89-89: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
115-115: Local variable mapping_df is assigned to but never used
Remove assignment to unused variable mapping_df
(F841)
129-129: Local variable sample_df is assigned to but never used
Remove assignment to unused variable sample_df
(F841)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files
(SIM115)
258-258: Use a context manager for opening files
(SIM115)
414-414: Use a context manager for opening files
(SIM115)
478-478: Use a context manager for opening files
(SIM115)
495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)
Replace kwargs.get("usecols", None) with kwargs.get("usecols")
(SIM910)
quantmsio/core/quantms/pg.py
243-243: Use a context manager for opening files
(SIM115)
248-248: Use a context manager for opening files
(SIM115)
525-525: Local variable protein_df is assigned to but never used
Remove assignment to unused variable protein_df
(F841)
quantmsio/utils/pride_utils.py
516-519: Use psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None) instead of an if block
Replace with psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)
(SIM401)
528-531: Use psm["global_qvalue"] = es.get("opt_global_q-value", None) instead of an if block
Replace with psm["global_qvalue"] = es.get("opt_global_q-value", None)
(SIM401)
534-537: Use psm["consensus_support"] = es.get("opt_global_consensus_support", None) instead of an if block
Replace with psm["consensus_support"] = es.get("opt_global_consensus_support", None)
(SIM401)
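The SIM401 findings above all follow one pattern: a four-line membership-test conditional collapses to a single dict.get call. A minimal before/after sketch (the key and values are invented for illustration) shows the two forms are equivalent:

```python
es = {"opt_global_q-value": "0.01"}
psm = {}

# Before: explicit membership test with an else branch
if "opt_global_q-value" in es:
    psm["global_qvalue"] = es["opt_global_q-value"]
else:
    psm["global_qvalue"] = None

# After: dict.get returns the default (None) when the key is absent
psm_after = {"global_qvalue": es.get("opt_global_q-value", None)}
print(psm["global_qvalue"] == psm_after["global_qvalue"])  # True

# Missing key: both forms yield None
print({}.get("opt_global_consensus_support", None))  # None
```

Since None is already dict.get's default, the second argument can even be dropped, which is what the related SIM910 finding for kwargs.get("usecols", None) suggests.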
🪛 Flake8 (7.2.0)
tests/test_msstats_in.py
[error] 88-88: f-string is missing placeholders
(F541)
[error] 356-356: f-string is missing placeholders
(F541)
[error] 428-428: f-string is missing placeholders
(F541)
quantmsio/commands/convert/quantms_project.py
[error] 81-81: continuation line under-indented for visual indent
(E128)
[error] 93-93: local variable 'project_json' is assigned to but never used
(F841)
[error] 107-107: continuation line under-indented for visual indent
(E128)
[error] 136-136: continuation line under-indented for visual indent
(E128)
[error] 149-149: continuation line under-indented for visual indent
(E128)
[error] 172-172: continuation line under-indented for visual indent
(E128)
[error] 225-225: local variable 'project_handler' is assigned to but never used
(F841)
[error] 226-226: continuation line under-indented for visual indent
(E128)
[error] 233-233: continuation line under-indented for visual indent
(E128)
quantmsio/core/quantms/msstats_in.py
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
[error] 115-115: local variable 'mapping_df' is assigned to but never used
(F841)
[error] 129-129: local variable 'sample_df' is assigned to but never used
(F841)
[error] 229-229: whitespace before ':'
(E203)
[error] 547-547: whitespace before ':'
(E203)
quantmsio/core/quantms/mztab.py
[error] 775-775: whitespace before ':'
(E203)
quantmsio/core/quantms/pg.py
[error] 525-525: local variable 'protein_df' is assigned to but never used
(F841)
[error] 589-589: whitespace before ':'
(E203)
🪛 Pylint (3.3.7)
tests/test_msstats_in.py
[refactor] 187-187: Too many branches (15/12)
(R0912)
quantmsio/commands/convert/quantms_project.py
[refactor] 192-192: Too many arguments (6/5)
(R0913)
[refactor] 192-192: Too many positional arguments (6/5)
(R0917)
[refactor] 192-192: Too many local variables (16/15)
(R0914)
[refactor] 289-289: Too many arguments (6/5)
(R0913)
[refactor] 289-289: Too many positional arguments (6/5)
(R0917)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 272-272: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 281-281: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one
(R1737)
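The R1737 finding suggests delegating inner iteration with yield from instead of yielding each element one by one. A small sketch of the equivalence (generator names are illustrative):

```python
def yield_each(chunks):
    # Flagged style: yield every element of every chunk explicitly
    for chunk in chunks:
        for row in chunk:
            yield row


def yield_from_chunks(chunks):
    # Suggested style: delegate the inner iteration to the chunk itself
    for chunk in chunks:
        yield from chunk


chunks = [[1, 2], [3], [4, 5]]
print(list(yield_from_chunks(chunks)))  # [1, 2, 3, 4, 5]
```

Both produce the same sequence; yield from is shorter and also forwards send/throw correctly when the inner iterable is itself a generator.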
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 721-738: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/core/quantms/pg.py
[error] 75-75: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 76-84: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
[error] 112-112: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 94-94: Too many branches (13/12)
(R0912)
[refactor] 194-197: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 283-289: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 291-291: Too many arguments (9/5)
(R0913)
[refactor] 291-291: Too many positional arguments (9/5)
(R0917)
[refactor] 291-291: Too many local variables (35/15)
(R0914)
[refactor] 291-291: Too many statements (55/50)
(R0915)
[refactor] 577-577: Useless return at end of function or method
(R1711)
[refactor] 633-633: Too many arguments (7/5)
(R0913)
[refactor] 633-633: Too many positional arguments (7/5)
(R0917)
[refactor] 633-633: Too many local variables (23/15)
(R0914)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests (3.11)
🔇 Additional comments (10)
quantmsio/commands/convert/quantms_project.py (1)
1-4
: LGTM! The module docstring is clear and concise.
quantmsio/utils/pride_utils.py (7)
46-46
: Good practice using raw strings for regex patterns! Using raw strings for regex patterns prevents potential issues with escape sequences and makes the regex patterns more readable.
Also applies to: 52-52
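To make the benefit concrete, a small illustrative example (the pattern and input strings are invented, not taken from pride_utils): with the r prefix, backslash sequences reach the regex engine exactly as written, whereas in a plain string Python interprets them first.

```python
import re

# Raw string: the regex engine receives \[ and \d exactly as written
pattern = r"\[score=(\d+)\]"
match = re.search(pattern, "peptide [score=42] mapped")
print(match.group(1))  # 42

# Without the raw prefix, every backslash must be doubled to survive
# Python's own escape processing; forgetting this with sequences like
# \b or \t silently changes the pattern, which raw strings avoid.
assert re.search("\\[score=(\\d+)\\]", "x [score=7]").group(1) == "7"
```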
482-511
: Excellent refactoring into focused helper functions! Breaking down the fetch_psm_from_mztab_line function into smaller, focused helper functions improves readability and maintainability. Each helper function has a clear single responsibility.
46-46
: Good fix: using raw strings for regex patterns. Converting regex patterns to raw strings prevents potential escaping issues and follows Python best practices for regex patterns.
Also applies to: 52-52
482-511
: Well-designed helper function with good key management. The function effectively handles optional keys by removing them from the required list when they don't exist in the data, preventing KeyError exceptions.
540-552
: Helper function provides clear separation of concerns. The peptidoform field logic is properly encapsulated and handles both explicit and derived peptidoform sequences correctly.
554-571
: Improved error handling and null safety. The function properly handles the case where ms_runs is None and provides clear error messages when ms_run is not found in the index.
573-600
: Excellent refactoring that improves maintainability. The decomposition of the monolithic function into focused helper functions significantly improves code readability and testability while maintaining the same functionality.
quantmsio/core/quantms/mztab.py (1)
83-819
: Well-designed mzTab parser with comprehensive features! The implementation provides excellent functionality including:
- Support for both regular and gzipped files
- DuckDB integration for performance
- Proper resource management with context managers
- Efficient section extraction and streaming
- Good error handling and logging
The class structure is complex but justified by the feature set.
quantmsio/core/quantms/pg.py (1)
622-622
: Verify file names are safe for SQL queries. Ensure that file_batch contains properly sanitized file names to prevent any potential SQL injection issues. The file names come from the msstats data, which should be trusted, but it's good practice to validate inputs used in SQL queries.
if generate_ibaq_view:
    _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
🛠️ Refactor suggestion
Add created IBAQ file to tracking list.
The IBAQ file generation doesn't add the created file to the tracking list, which means it won't be registered in the project metadata.
# Generate IBAQ view if requested
if generate_ibaq_view:
- _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
+ ibaq_file = _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
+ if ibaq_file:
+ created_files.append(("ibaq-file", str(ibaq_file)))
Update _generate_ibaq_view to return the created file path:
-def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
- output_folder_path: Path):
+def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
+ output_folder_path: Path) -> Optional[Path]:
"""Generate IBAQ view from feature data."""
print("\n=== Generating IBAQ View ===")
try:
ibaq_file = create_uuid_filename(project_accession, ".ibaq.parquet")
ibaq_path = output_folder_path / ibaq_file
write_ibaq_feature(str(sdrf_file), str(feature_file), str(ibaq_path))
print("IBAQ view generation completed successfully")
+ return ibaq_path
except Exception as e:
print(f"ERROR: IBAQ view generation failed: {str(e)}", file=sys.stderr)
+ return None
🤖 Prompt for AI Agents
In quantmsio/commands/convert/quantms_project.py around lines 127 to 128, the
code calls _generate_ibaq_view but does not add the created IBAQ file to the
tracking list, so it is not registered in the project metadata. Modify
_generate_ibaq_view to return the path of the created IBAQ file, then capture
this return value here and append it to the tracking list that manages project
files to ensure proper registration.
|
self._available_columns = available_columns

except Exception as e:
🛠️ Refactor suggestion
Remove unused variable assignments
These variables are assigned but never used.
- except Exception as e:
+ except Exception:
# Default column set based on common msstats format
self._available_columns = {
# Insert into DuckDB using pandas DataFrame
mapping_df = pd.DataFrame(mapping_data)
- self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ self._duckdb.register("mapping_df", mapping_df)
+ self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ self._duckdb.unregister("mapping_df")
# Insert into DuckDB using pandas DataFrame
sample_df = pd.DataFrame(sample_data)
- self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ self._duckdb.register("sample_df", sample_df)
+ self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ self._duckdb.unregister("sample_df")
Also applies to: 115-115, 129-129
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
🪛 Ruff (0.11.9)
89-89: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py at lines 89, 115, and 129, the exception
handlers assign the caught exception to a variable (e.g., 'except Exception as
e:') but never use this variable. Remove the variable assignment by changing
these lines to 'except Exception:' to avoid unused variable warnings and clean
up the code.
class MsstatsIN(DuckDB):
    def __init__(
        self,
        report_path: Union[Path, str],
        sdrf_path: Union[Path, str],
        duckdb_max_memory="16GB",
        duckdb_threads=4,
    ):
        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
        self._sdrf = SDRFHandler(sdrf_path)
        self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
        self._sample_map = self._sdrf.get_sample_map_run()
        self._optimized_setup_done = False
Add missing logger attribute to the class
The class uses self.logger in multiple places but doesn't initialize it. This will cause an AttributeError at runtime.
class MsstatsIN(DuckDB):
def __init__(
self,
report_path: Union[Path, str],
sdrf_path: Union[Path, str],
duckdb_max_memory="16GB",
duckdb_threads=4,
):
super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+ self.logger = logging.getLogger(__name__)
self._sdrf = SDRFHandler(sdrf_path)
self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
self._sample_map = self._sdrf.get_sample_map_run()
self._optimized_setup_done = False
Don't forget to import logging at the top of the file:
import logging
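A minimal sketch of the suggested fix, with the surrounding class reduced to a stub for illustration (the stub class and its transform method are hypothetical, not the project's API):

```python
import logging


class MsstatsInStub:
    """Stub mirroring the suggested fix: create the logger in __init__."""

    def __init__(self):
        # A module-scoped logger; with logging.getLogger(__name__) the
        # logger name follows the module the class lives in.
        self.logger = logging.getLogger(__name__)

    def transform(self):
        # self.logger now exists, so Pylint's E1101 no longer applies
        self.logger.debug("transforming msstats input")
        return True


obj = MsstatsInStub()
print(obj.transform())  # True
```

Because getLogger returns the same logger object for the same name, creating it per instance is cheap and keeps the class's log records grouped under the module's logger hierarchy.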
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 14 to 27, the MsstatsIN
class uses self.logger but does not initialize it, which will cause an
AttributeError at runtime. To fix this, import the logging module at the top of
the file and add a logger attribute initialization in the __init__ method, for
example by setting self.logger = logging.getLogger(__name__).
PR Type
Enhancement
Description
Major CLI restructuring with organized command groups
Enhanced logging system with structured output support
New quantms workflow conversion command
Improved error handling and verbose logging
Changes walkthrough 📝
28 files
Add quantms workflow conversion command
Add DIA-NN conversion commands
Add PSM conversion and comparison commands
Add feature conversion command
Add visualization plotting commands
Restructure CLI with organized command groups
Add differential expression transformation command
Add statistical analysis commands
Update MaxQuant conversion commands
Add enhanced logging to PSM processing
Add enhanced logging to feature processing
Add project management commands
Add absolute expression transformation command
Update schema with nullable fields
Update FragPipe PSM conversion method
Add gene mapping transformation command
Update project save methods
Update FragPipe conversion command
Add UniProt mapping transformation command
Add IBAQ transformation command
Add spectra transformation command
Update AnnData merge command
Update file attachment command
Add logging to MzTab class
Add logger utility module
Add convert commands package
Add transform commands package
Add utils commands package
1 file
Fix regex patterns with raw strings
7 files
Update feature schema with scan format
Major documentation update with usage examples
Update PSM schema with record types
Add comprehensive logging documentation
Add differential expression schema
Update IBAQ use case documentation
Add absolute expression schema
2 files
Update DuckDB version requirement
Update Python and PyOpenMS version requirements
13 files
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests
Chores
Updated .gitignore to exclude additional temporary and generated files.