Major changes in the documentation and schemas created #99
Actionable comments posted: 23
♻️ Duplicate comments (7)
quantmsio/commands/utils/project.py (1)

Lines 77-79: Fix typos in parameter names. The parameter names `sortware_name` and `sortware_version` should be `software_name` and `software_version`.

```diff
-    project_handler.add_software_provider(
-        sortware_name=software_name, sortware_version=software_version
-    )
+    project_handler.add_software_provider(
+        software_name=software_name, software_version=software_version
+    )
```

quantmsio/commands/transform/de.py (1)
Lines 67-68: Improve exception chaining. Use `raise ... from e` to preserve the original exception context for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```

quantmsio/commands/convert/maxquant.py (3)
Lines 98-99: Fix exception chaining to preserve error context. When re-raising exceptions, use `from e` to preserve the error chain for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 217-218: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 319-320: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```

quantmsio/commands/convert/diann.py (2)
Lines 91-93: Fix exception chaining to preserve error context. When re-raising exceptions, use `from e` to preserve the error chain for better debugging.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
Lines 144-146: Fix exception chaining to preserve error context. Same issue as above; use proper exception chaining.

```diff
-        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.")
+        raise click.ClickException(f"Error: {str(e)}\nCheck the logs for more details.") from e
```
🧹 Nitpick comments (33)
quantmsio/core/quantms/psm.py (1)
Lines 1-3: Remove the unused import and consider the logging addition. The `os` import on line 2 is unused and should be removed. The addition of the `logging` import aligns well with the PR's enhanced logging system objectives.

```diff
 import logging
-import os
 import re
```
quantmsio/commands/transform/uniprot.py (2)
Lines 50-51: Redundant parameter validation. The manual validation of required parameters is unnecessary since Click's `required=True` already handles this validation and will raise appropriate errors.

```diff
-    if not all([feature_file, fasta, output_folder]):
-        raise click.UsageError("Please provide all required parameters")
-
```
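The same guarantee can be sketched with stdlib `argparse` (used here instead of Click so the example stays dependency-free; the `--fasta` flag is illustrative): when the parser itself enforces a required option, a manual `if not all([...])` check after parsing is dead code, because the parser rejects the invocation first.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--fasta", required=True)

# With the required option supplied, parsing succeeds.
ok = parser.parse_args(["--fasta", "seqs.fa"])
assert ok.fasta == "seqs.fa"

# Without it, the parser rejects the call before any handler-level
# validation could run (argparse signals the error via SystemExit).
try:
    parser.parse_args([])
    missing_rejected = False
except SystemExit:
    missing_rejected = True
assert missing_rejected
```

Click behaves analogously with `required=True`: the command callback is never entered when a required option is missing.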
Lines 56-58: Remove the unused variable assignment. The `output_path` variable is assigned but never used; the filename is passed directly to the core function.

```diff
 filename = create_uuid_filename(output_prefix, ".feature.parquet")
-output_path = output_folder / filename
 map_peptide_to_protein(str(feature_file), str(fasta), str(output_folder), filename)
```
README.md (2)
Line 114: Minor punctuation formatting issue. There is a loose punctuation mark in the environment variables list formatting.

```diff
-The package can be configured using environment variables:
+The package can be configured using environment variables:
```
Lines 132-144: Add a language specification to the code block. The project structure code block should specify a language for better rendering and compliance with markdown standards.

````diff
-```
+```text
 quantmsio/
 ├── __init__.py
 ├── quantmsioc.py  # CLI entry point
````

.github/workflows/python-package.yml (1)
Line 72: Add a missing newline at the end of the file for POSIX compliance.

```diff
     env:
-      PYTHONIOENCODING: utf-8
+      PYTHONIOENCODING: utf-8
+
```

quantmsio/core/quantms/feature.py (1)
Lines 61-62: Use `yield from` for cleaner generator delegation. The current implementation yields each element individually; `yield from` delegates to the inner generator more efficiently.

```diff
-        for msstats in msstats_in.generate_msstats_in(file_num, protein_str):
-            yield msstats
+        yield from msstats_in.generate_msstats_in(file_num, protein_str)
```

.github/workflows/integration-tests.yml (1)
Line 23: Remove trailing spaces and add a missing newline. The file has trailing spaces on multiple lines and is missing a newline at the end. Remove trailing spaces from all affected lines and add a newline at the end of the file:

```diff
     timeout-minutes: 60  # 1 hour timeout for the entire job
-    
+
     steps:
       - uses: actions/checkout@v4
-      
+
       - name: Set up Python 3.10
         uses: actions/setup-python@v5
         with:
           python-version: "3.10"
-      
+
       # Install package in development mode
       poetry install
-      
+
       - name: Run Full Conversion Suite Tests
         if: ${{ github.event.inputs.test_type == 'all' || github.event.inputs.test_type == 'full-conversion-suite' || github.event_name == 'release' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
       - name: Run LFQ Tests Only
         if: ${{ github.event.inputs.test_type == 'lfq-only' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py -k "lfq" --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
-      - name: Run TMT Tests Only 
+      - name: Run TMT Tests Only
         if: ${{ github.event.inputs.test_type == 'tmt-only' }}
         run: |
           poetry run pytest -vv tests/test_full_conversion_suite.py -k "tmt" --timeout=900
         env:
           PYTHONIOENCODING: utf-8
-      
+
       - name: Run All Integration Tests
         if: ${{ github.event.inputs.test_type == 'all' || github.event_name == 'release' }}
         run: |
           poetry run pytest -vv -m "integration" --timeout=900
         env:
-          PYTHONIOENCODING: utf-8
+          PYTHONIOENCODING: utf-8
+
```

Also applies to: 26-26, 31-31, 40-40, 47-47, 54-54, 55-55, 61-61, 67-67
quantmsio/commands/utils/project.py (1)
Lines 68-69: Remove redundant parameter validation. The parameter validation is unnecessary since Click already enforces required parameters with `required=True`.

```diff
-    if not all([project_accession, sdrf_file, output_folder]):
-        raise click.UsageError("Please provide all required parameters")
-
```

quantmsio/commands/transform/de.py (1)
Lines 16-26: Consider reducing function parameters. The function has 9 parameters, which exceeds the recommended limit. Consider grouping related parameters into a configuration object or dataclass:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class DifferentialConversionConfig:
    msstats_file: Path
    sdrf_file: Path
    project_file: Optional[Path] = None
    protein_file: Optional[Path] = None
    fdr_threshold: float = 0.05
    output_folder: Optional[Path] = None
    output_prefix: Optional[str] = None
    delete_existing: bool = True
    verbose: bool = False

def convert_msstats_differential(config: DifferentialConversionConfig) -> None:
    ...  # implementation uses config.<attribute_name>
```

quantmsio/commands/utils/stats.py (1)
Line 9: Address the pylint suggestion for a dictionary literal. The static analysis tool suggests using dictionary literal syntax instead of a `dict()` call.

```diff
-CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
+CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
```

.github/workflows/python-app.yml (2)
Line 35: Fix YAML indentation consistency. The file has inconsistent indentation; steps should be indented by 6 spaces, not 4.

```diff
     steps:
-    - uses: actions/checkout@v4
+      - uses: actions/checkout@v4
```

Apply this indentation fix to all three job definitions.
Also applies to: 64-64, 88-88
Line 106: Add a missing newline at the end of the file, a common YAML best practice.

```diff
         poetry run pytest -vv --timeout=900
       env:
         PYTHONIOENCODING: utf-8
+
```
tests/test_full_conversion_suite.py (3)
Lines 51-56: Address unused loop variables. The static analysis correctly identifies unused loop variables that should be renamed with an underscore prefix.

```diff
-    for dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
-        for file_type, file_path in files.items():
+    for _dataset_name, files in [("LFQ", lfq_files), ("TMT", tmt_files)]:
+        for _file_type, file_path in files.items():
```
Line 93: Fix the f-string without placeholders. The f-string on line 93 contains no placeholders and should be a regular string.

```diff
-            print(f"[TIMEOUT] Timeout after 10 minutes")
+            print("[TIMEOUT] Timeout after 10 minutes")
```
Lines 84-94: Simplify the conditional return structure. The `else` clause after `return` is unnecessary and can be removed.

```diff
 if result.returncode == 0:
     print(f"[OK] Success in {duration:.2f}s")
     return True, duration, result.stdout, result.stderr
-else:
-    print(f"[FAIL] Failed in {duration:.2f}s")
-    print(f"Error: {result.stderr}")
-    return False, duration, result.stdout, result.stderr
+
+print(f"[FAIL] Failed in {duration:.2f}s")
+print(f"Error: {result.stderr}")
+return False, duration, result.stdout, result.stderr
```

quantmsio/commands/convert/quantms.py (1)
Lines 191-201: Consider grouping related parameters for better maintainability. The function has 9 parameters, which is acceptable for a CLI command but could become harder to maintain as it grows. For future consideration, you could group related parameters:

- Input parameters: `input_file`, `msstats_file`, `sdrf_file`
- Output parameters: `output_folder`, `output_prefix`
- Processing options: `compute_topn`, `compute_ibaq`, `topn`
- General options: `verbose`

This would make the function signature cleaner and more extensible.
quantmsio/commands/utils/plot.py (1)
Line 14: Use a dict literal instead of a `dict()` call, for better readability and a slight performance improvement.

```diff
-CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
+CONTEXT_SETTINGS = {"help_option_names": ["-h", "--help"]}
```

docs/IMPLEMENTATION_DETAILS.md (3)
Line 73: Use an en dash for number ranges. For better typography, use an en dash (–) instead of a hyphen (-) when specifying ranges.

```diff
 - **Before:** 7+ minutes for large datasets
-- **After:** 10-45 seconds for the same datasets
+- **After:** 10–45 seconds for the same datasets
```
Line 177: Add language specifiers to fenced code blocks. Several code blocks are missing language specifiers, which help with syntax highlighting.

For the code blocks starting at lines 177, 271, 281, and 291, add a language specifier:

````diff
-```
+```text
````

This improves readability with proper syntax highlighting.
Also applies to: 271-271, 281-281, 291-291
Line 270: Remove trailing punctuation from headings. Markdown headings should not have trailing punctuation according to common style guides.

```diff
-#### Input (Raw msstats data):
+#### Input (Raw msstats data)
-#### After Channel Mapping:
+#### After Channel Mapping
-#### After Aggregation (Final output):
+#### After Aggregation (Final output)
```

Also applies to: 280-280, 290-290
tests/test_msstats_in.py (4)
Line 88: Remove unnecessary f-string prefixes. These strings don't contain any placeholders, so the `f` prefix is not needed.

```diff
-    print(f"\nLFQ test completed successfully!")
+    print("\nLFQ test completed successfully!")
 ...
-    print(f"\nTMT test completed successfully!")
+    print("\nTMT test completed successfully!")
 ...
-    print(f"\nDataset Comparison:")
+    print("\nDataset Comparison:")
```

Also applies to: 331-331, 403-403
Lines 236-240: Remove the unused loop variable. The `count` variable is not used within the loop body.

```diff
-    for channel, count in intensities_channel_counts.items():
+    for channel, _ in intensities_channel_counts.items():
         if channel not in channel_counts:
             channel_counts[channel] = 0
```
Lines 251-254: Remove the unused variable assignment. The `file_channel_counts` variable is assigned but never used.

```diff
-    # Original counting from channel column
-    file_channel_counts = msstats_batch.groupby(
-        ["reference_file_name", "channel"]
-    ).size()
-    # Note: Not using this since it only shows surviving channels after deduplication
+    # Note: Not using groupby counting since it only shows surviving channels after deduplication
```
Lines 282-284: Simplify dictionary key iteration. Remove the unnecessary `.keys()` when iterating over dictionary keys.

```diff
-    for channel in file_channels.keys()
+    for channel in file_channels
```
495-495
: Simplify dictionary get() call.The default
None
is redundant when usingdict.get()
.- columns = kwargs.get("usecols", None) + columns = kwargs.get("usecols")
Lines 83-126: Consider reducing instance attributes. The class has 22 instance attributes, exceeding the recommended limit of 7. Consider grouping related attributes into data classes.

Would you like me to help refactor this by grouping related attributes? For example:

- Position tracking attributes into a `PositionTracker` class
- DuckDB-related attributes into a `DuckDBConfig` class
- Metadata cache into a `MetadataCache` class

quantmsio/core/quantms/msstats_in.py (3)
Lines 14-27: Consider using Python 3 style `super()` and adding type hints. The initialization is well-structured, but could benefit from modern Python conventions and type hints for better code clarity.

```diff
 class MsstatsIN(DuckDB):
+    """MSstats input data processor using DuckDB for efficient data handling.
+
+    Supports both LFQ and isobaric labeling (TMT/iTRAQ) experiments.
+    """
+
     def __init__(
         self,
         report_path: Union[Path, str],
         sdrf_path: Union[Path, str],
-        duckdb_max_memory="16GB",
-        duckdb_threads=4,
-    ):
-        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+        duckdb_max_memory: str = "16GB",
+        duckdb_threads: int = 4,
+    ) -> None:
+        super().__init__(report_path, duckdb_max_memory, duckdb_threads)
         self._sdrf = SDRFHandler(sdrf_path)
         self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
         self._sample_map = self._sdrf.get_sample_map_run()
         self._optimized_setup_done = False
```
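A dependency-free sketch of the zero-argument `super()` form suggested above (the class names here are illustrative, not the real `MsstatsIN`/`DuckDB`): in Python 3 it resolves the same MRO entry as the legacy explicit spelling.

```python
class Base:
    def __init__(self, path: str, threads: int = 4) -> None:
        self.path = path
        self.threads = threads

class Child(Base):
    def __init__(self, path: str, threads: int = 4) -> None:
        # Python 3 zero-argument form; equivalent to super(Child, self)
        # but immune to class-rename mistakes.
        super().__init__(path, threads)
        self.extra = True

c = Child("report.tsv", threads=8)
assert (c.path, c.threads, c.extra) == ("report.tsv", 8, True)
```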
Line 219: Fix whitespace before colons in slicing operations. PEP 8 recommends no whitespace before a colon in a slice.

```diff
-        references[i : i + file_num] for i in range(0, len(references), file_num)
+        references[i:i + file_num] for i in range(0, len(references), file_num)
```

```diff
-        yield file_list[i : i + batch_size]
+        yield file_list[i:i + batch_size]
```

Also applies to: 533-533
Lines 671-685: Consider relying on the context manager for cleanup instead of `__del__`. The `__del__` method is not guaranteed to be called and can cause issues with garbage collection. Since you already have context manager support, consider emphasizing its use in documentation.

```diff
     def __del__(self):
         """Cleanup database views and tables."""
+        # Note: __del__ is not guaranteed to be called. Prefer using context manager.
         try:
             if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
                 self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
                 self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
                 self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
                 self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
                 self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
             # Always call parent cleanup to close connection and remove database file
             if hasattr(self, "_duckdb") and self._duckdb:
                 self.destroy_duckdb_database()
         except:
-            pass
+            pass  # Suppress exceptions in destructor to prevent issues during GC
```

quantmsio/core/quantms/pg.py (3)
Lines 60-66: Improve exception handling in the destructor. Using a bare except in `__del__` is acceptable but could be improved with `contextlib.suppress`.

```diff
+from contextlib import suppress
+
     def __del__(self):
         """Destructor to ensure cleanup."""
-        try:
+        with suppress(Exception):
             self.cleanup()
-        except:
-            pass  # Ignore errors in destructor
```
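A minimal runnable sketch of the `contextlib.suppress` pattern (the `cleanup` function below is hypothetical, standing in for the class's cleanup step): it replaces `try`/`except`/`pass` while making the swallowed exception class explicit.

```python
from contextlib import suppress

def cleanup() -> None:
    # Hypothetical cleanup step that fails, e.g. a backing store already gone.
    raise RuntimeError("backing store already gone")

# Before: try: cleanup() / except: pass  (bare except, flagged by E722).
# After: only Exception subclasses are swallowed, and the intent is explicit;
# KeyboardInterrupt and SystemExit still propagate.
with suppress(Exception):
    cleanup()

reached = True  # execution continues past the failed cleanup
assert reached
```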
Lines 229-273: Consider using `ExitStack` for cleaner resource management. The current implementation manually tracks file handles; Python's `contextlib.ExitStack` could simplify this.

```python
from contextlib import ExitStack

def _safe_file_open(self, file_path, mode="r"):
    """Safely open files with automatic resource management."""
    import gzip

    # Handle both compressed and uncompressed files
    if str(file_path).endswith(".gz"):
        return gzip.open(file_path, "rt" if "t" not in mode else mode, encoding="utf-8")
    else:
        return open(file_path, mode, encoding="utf-8")
```

Then use it directly with `with` statements, without the complex tracking logic.
Line 583: Fix whitespace before the colon in the slicing operation.

```diff
-        yield unique_files[i : i + batch_size]
+        yield unique_files[i:i + batch_size]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (14)

- tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/maxquant/maxquant_full/evidence.txt.gz is excluded by `!**/*.gz`
- tests/examples/maxquant/maxquant_simple/sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-lfq-small/PXD040438.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-lfq-small/PXD040438_msstats_in.csv is excluded by `!**/*.csv`
- tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz is excluded by `!**/*.gz`
- tests/examples/quantms/dda-plex-small/MSV000079033-Blood-Plasma-iTRAQ.sdrf.tsv is excluded by `!**/*.tsv`
- tests/examples/quantms/dda-plex-small/MSV000079033_msstats_in.csv is excluded by `!**/*.csv`
📒 Files selected for processing (59)

- .github/workflows/integration-tests.yml (1 hunks)
- .github/workflows/python-app.yml (2 hunks)
- .github/workflows/python-package.yml (1 hunks)
- .gitignore (1 hunks)
- README.md (1 hunks)
- docs/IMPLEMENTATION_DETAILS.md (1 hunks)
- docs/README.adoc (5 hunks)
- docs/feature.avsc (6 hunks)
- docs/pg.avsc (3 hunks)
- docs/tools.adoc (2 hunks)
- pyproject.toml (3 hunks)
- quantmsio/commands/convert/__init__.py (1 hunks)
- quantmsio/commands/convert/diann.py (1 hunks)
- quantmsio/commands/convert/fragpipe.py (1 hunks)
- quantmsio/commands/convert/maxquant.py (1 hunks)
- quantmsio/commands/convert/quantms.py (1 hunks)
- quantmsio/commands/convert/quantms_project.py (1 hunks)
- quantmsio/commands/transform/anndata.py (1 hunks)
- quantmsio/commands/transform/de.py (1 hunks)
- quantmsio/commands/transform/gene.py (1 hunks)
- quantmsio/commands/transform/ibaq.py (1 hunks)
- quantmsio/commands/transform/spectra.py (1 hunks)
- quantmsio/commands/transform/uniprot.py (1 hunks)
- quantmsio/commands/utils/plot.py (1 hunks)
- quantmsio/commands/utils/project.py (1 hunks)
- quantmsio/commands/utils/stats.py (1 hunks)
- quantmsio/core/ae.py (1 hunks)
- quantmsio/core/combiner.py (1 hunks)
- quantmsio/core/common.py (3 hunks)
- quantmsio/core/de.py (1 hunks)
- quantmsio/core/diann/diann.py (3 hunks)
- quantmsio/core/duckdb.py (1 hunks)
- quantmsio/core/format.py (9 hunks)
- quantmsio/core/fragpipe/fragpipe.py (3 hunks)
- quantmsio/core/maxquant/maxquant.py (3 hunks)
- quantmsio/core/msstats_in.py (0 hunks)
- quantmsio/core/mztab.py (0 hunks)
- quantmsio/core/project.py (2 hunks)
- quantmsio/core/quantms/feature.py (4 hunks)
- quantmsio/core/quantms/msstats_in.py (1 hunks)
- quantmsio/core/quantms/mztab.py (1 hunks)
- quantmsio/core/quantms/pg.py (1 hunks)
- quantmsio/core/quantms/psm.py (5 hunks)
- quantmsio/core/sdrf.py (2 hunks)
- quantmsio/operate/plots.py (1 hunks)
- quantmsio/operate/query.py (2 hunks)
- quantmsio/operate/statistics.py (1 hunks)
- quantmsio/operate/tools.py (1 hunks)
- quantmsio/quantmsioc.py (3 hunks)
- quantmsio/utils/file_utils.py (1 hunks)
- tests/test_diann.py (2 hunks)
- tests/test_feature.py (1 hunks)
- tests/test_full_conversion_suite.py (1 hunks)
- tests/test_maxquant.py (2 hunks)
- tests/test_msstats_in.py (1 hunks)
- tests/test_project.py (1 hunks)
- tests/test_psm.py (1 hunks)
- tests/test_query.py (1 hunks)
- tests/test_sdrf.py (4 hunks)
💤 Files with no reviewable changes (2)
- quantmsio/core/msstats_in.py
- quantmsio/core/mztab.py
✅ Files skipped from review due to trivial changes (13)
- tests/test_query.py
- quantmsio/operate/plots.py
- quantmsio/core/ae.py
- quantmsio/core/de.py
- quantmsio/utils/file_utils.py
- tests/test_psm.py
- quantmsio/core/duckdb.py
- tests/test_project.py
- tests/test_feature.py
- quantmsio/core/combiner.py
- quantmsio/operate/tools.py
- tests/test_sdrf.py
- docs/tools.adoc
🚧 Files skipped from review as they are similar to previous changes (11)
- quantmsio/commands/convert/__init__.py
- pyproject.toml
- quantmsio/commands/transform/anndata.py
- quantmsio/commands/transform/ibaq.py
- quantmsio/commands/transform/spectra.py
- quantmsio/core/project.py
- docs/pg.avsc
- quantmsio/quantmsioc.py
- quantmsio/commands/convert/fragpipe.py
- quantmsio/core/format.py
- docs/feature.avsc
🧰 Additional context used
🧬 Code Graph Analysis (10)
quantmsio/operate/query.py (2)
- quantmsio/utils/pride_utils.py (3)
  - generate_gene_name_map (35-55)
  - get_gene_accessions (58-71)
  - get_unanimous_name (19-32)
- quantmsio/operate/statistics.py (1)
  - close (108-112)

tests/test_diann.py (3)
- quantmsio/core/diann/diann.py (2)
  - get_report_from_database (48-65)
  - generate_pg_matrix (109-210)
- quantmsio/core/quantms/feature.py (1)
  - Feature (18-292)
- quantmsio/core/duckdb.py (1)
  - get_unique_references (66-69)

quantmsio/operate/statistics.py (1)
- quantmsio/operate/query.py (1)
  - close (344-348)

quantmsio/commands/convert/maxquant.py (3)
- quantmsio/core/maxquant/maxquant.py (5)
  - MaxQuant (54-750)
  - write_psm_to_file (362-373)
  - write_feature_to_file (380-399)
  - write_features_to_file (419-443)
  - write_protein_groups_to_file (684-711)
- quantmsio/core/project.py (1)
  - create_uuid_filename (59-62)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/convert/diann.py (3)
- quantmsio/core/diann/diann.py (4)
  - DiaNNConvert (34-486)
  - write_feature_to_file (443-462)
  - write_features_to_file (464-486)
  - write_pg_matrix_to_file (423-441)
- quantmsio/core/project.py (1)
  - create_uuid_filename (59-62)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/transform/de.py (3)
- quantmsio/core/de.py (6)
  - DifferentialExpressionHandler (33-269)
  - load_project_file (100-113)
  - load_msstats_file (81-98)
  - load_sdrf_file (255-261)
  - set_fdr_threshold (263-269)
  - convert_msstats_to_quantms (115-220)
- quantmsio/utils/file_utils.py (1)
  - extract_protein_list (21-38)
- quantmsio/utils/logger.py (1)
  - get_logger (8-10)

quantmsio/commands/transform/gene.py (1)
- quantmsio/operate/tools.py (1)
  - generate_feature_of_gene (104-120)

quantmsio/commands/utils/plot.py (1)
- quantmsio/operate/plots.py (5)
  - plot_distribution_of_ibaq (14-43)
  - plot_intensity_box_of_samples (180-217)
  - plot_intensity_distribution_of_samples (117-145)
  - plot_peptide_distribution_of_protein (148-177)
  - plot_peptides_of_lfq_condition (46-114)

quantmsio/commands/utils/project.py (1)
- quantmsio/core/project.py (7)
  - check_directory (22-56)
  - populate_from_pride_archive (123-148)
  - populate_from_sdrf (282-288)
  - add_quantms_version (150-151)
  - add_software_provider (168-175)
  - add_sdrf_file (290-326)
  - save_updated_project_info (257-280)

quantmsio/commands/utils/stats.py (1)
- quantmsio/operate/statistics.py (15)
  - IbaqStatistics (26-46)
  - ParquetStatistics (49-116)
  - get_number_of_proteins (10-11), (32-38), (80-92)
  - get_number_of_peptides (13-14), (62-66)
  - get_number_of_samples (16-17), (40-46), (74-78)
  - get_number_of_peptidoforms (19-20), (68-72)
  - get_number_msruns (22-23), (94-98)
  - get_number_of_psms (100-106)
🪛 Ruff (0.11.9)

quantmsio/core/quantms/psm.py
- 2-2: `os` imported but unused; remove unused import (F401)
- 62-62: Use a context manager for opening files (SIM115)

quantmsio/commands/convert/maxquant.py
- 99-99, 218-218, 320-320: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

quantmsio/commands/convert/quantms.py
- 86-86, 140-140, 249-249: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/core/maxquant/maxquant.py
- 733-733, 735-735, 737-737, 739-739, 743-743, 745-745, 749-749: Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks (E721)

quantmsio/commands/convert/quantms_project.py
- 5-5: `logging` imported but unused (F401)
- 12-12: `pandas` imported but unused (F401)
- 13-13: `pyarrow` imported but unused (F401)
- 14-14: `pyarrow.parquet` imported but unused (F401)
- 22-22: `quantmsio.core.quantms.mztab.MzTabProteinGroups` imported but unused (F401)

tests/test_msstats_in.py
- 88-88, 331-331, 403-403: f-string without any placeholders; remove extraneous `f` prefix (F541)
- 236-236: Loop control variable `count` not used within loop body (B007)
- 251-251: Local variable `file_channel_counts` is assigned to but never used (F841)
- 283-283: Use `key in dict` instead of `key in dict.keys()` (SIM118)

quantmsio/core/quantms/msstats_in.py
- 68-68, 260-260, 266-266, 683-683: Do not use bare `except` (E722)
- 76-76: Local variable `e` is assigned to but never used (F841)
- 105-105: Local variable `mapping_df` is assigned to but never used (F841)
- 119-119: Local variable `sample_df` is assigned to but never used (F841)

tests/test_maxquant.py
- 6-6: `pytest` imported but unused (F401)

quantmsio/core/quantms/pg.py
- 62-65: Use `contextlib.suppress(Exception)` instead of `try`-`except`-`pass` (SIM105)
- 64-64, 180-180, 270-270: Do not use bare `except` (E722)
- 241-241, 246-246: Use a context manager for opening files (SIM115)
- 261-264: Use `contextlib.suppress(ValueError)` instead of `try`-`except`-`pass` (SIM105)
- 519-519: Local variable `protein_df` is assigned to but never used (F841)

quantmsio/core/quantms/mztab.py
- 255-255, 258-258, 414-414, 478-478: Use a context manager for opening files (SIM115)
- 495-495: Use `kwargs.get("usecols")` instead of `kwargs.get("usecols", None)` (SIM910)
- 680-680, 691-691, 787-787, 794-794, 802-802: Do not use bare `except` (E722)
- 785-788, 792-795, 800-803: Use `contextlib.suppress(Exception)` instead of `try`-`except`-`pass` (SIM105)

tests/test_full_conversion_suite.py
- 51-51: Loop control variable `dataset_name` not used within loop body; rename to `_dataset_name` (B007)
- 52-52: Loop control variable `file_type` not used within loop body; rename to `_file_type` (B007)
- 93-93: f-string without any placeholders; remove extraneous `f` prefix (F541)

quantmsio/commands/convert/diann.py
- 93-93, 146-146: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/commands/transform/de.py
- 68-68: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)

quantmsio/commands/transform/uniprot.py
- 57-57: Local variable `output_path` is assigned to but never used (F841)

quantmsio/core/sdrf.py
- 127-129: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)
🪛 Flake8 (7.2.0)

quantmsio/core/quantms/psm.py
- [error] 2-2: 'os' imported but unused (F401)

quantmsio/commands/convert/quantms_project.py
- [error] 5-5: 'logging' imported but unused (F401)
- [error] 12-12: 'pandas as pd' imported but unused (F401)
- [error] 13-13: 'pyarrow as pa' imported but unused (F401)
- [error] 14-14: 'pyarrow.parquet as pq' imported but unused (F401)
- [error] 22-22: 'quantmsio.core.quantms.mztab.MzTabProteinGroups' imported but unused (F401)

tests/test_msstats_in.py
- [error] 88-88, 331-331, 403-403: f-string is missing placeholders (F541)
- [error] 251-251: local variable 'file_channel_counts' is assigned to but never used (F841)

quantmsio/core/quantms/msstats_in.py
- [error] 68-68, 260-260, 266-266, 683-683: do not use bare 'except' (E722)
- [error] 76-76: local variable 'e' is assigned to but never used (F841)
- [error] 105-105: local variable 'mapping_df' is assigned to but never used (F841)
- [error] 119-119: local variable 'sample_df' is assigned to but never used (F841)
- [error] 219-219, 533-533: whitespace before ':' (E203)

tests/test_maxquant.py
- [error] 6-6: 'pytest' imported but unused (F401)

quantmsio/core/quantms/pg.py
- [error] 64-64, 180-180, 270-270: do not use bare 'except' (E722)
- [error] 519-519: local variable 'protein_df' is assigned to but never used (F841)
- [error] 583-583: whitespace before ':' (E203)

quantmsio/core/quantms/mztab.py
- [error] 680-680, 691-691, 787-787, 794-794, 802-802: do not use bare 'except' (E722)
- [error] 769-769: whitespace before ':' (E203)

tests/test_full_conversion_suite.py
- [error] 93-93: f-string is missing placeholders (F541)

quantmsio/commands/transform/uniprot.py
- [error] 57-57: local variable 'output_path' is assigned to but never used (F841)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/psm.py
[refactor] 62-62: Consider using 'with' for resource-allocating operations
(R1732)
quantmsio/commands/convert/maxquant.py
[refactor] 144-144: Too many arguments (8/5)
(R0913)
[refactor] 144-144: Too many positional arguments (8/5)
(R0917)
[refactor] 259-259: Too many arguments (7/5)
(R0913)
[refactor] 259-259: Too many positional arguments (7/5)
(R0917)
quantmsio/commands/convert/quantms.py
[refactor] 55-55: Too many arguments (6/5)
(R0913)
[refactor] 55-55: Too many positional arguments (6/5)
(R0917)
[refactor] 191-191: Too many arguments (9/5)
(R0913)
[refactor] 191-191: Too many positional arguments (9/5)
(R0917)
[refactor] 191-191: Too many local variables (16/15)
(R0914)
quantmsio/core/maxquant/maxquant.py
[refactor] 684-684: Too many arguments (6/5)
(R0913)
[refactor] 684-684: Too many positional arguments (6/5)
(R0917)
quantmsio/commands/convert/quantms_project.py
[error] 22-22: No name 'MzTabProteinGroups' in module 'quantmsio.core.quantms.mztab'
(E0611)
[refactor] 47-47: Too many arguments (6/5)
(R0913)
[refactor] 47-47: Too many positional arguments (6/5)
(R0917)
[refactor] 47-47: Too many local variables (25/15)
(R0914)
[refactor] 47-47: Too many branches (14/12)
(R0912)
[refactor] 47-47: Too many statements (78/50)
(R0915)
[refactor] 247-247: Too many arguments (6/5)
(R0913)
[refactor] 247-247: Too many positional arguments (6/5)
(R0917)
quantmsio/core/quantms/feature.py
[refactor] 61-62: Use 'yield from' directly instead of yielding each element one by one
(R1737)
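R1737 flags a generator loop whose body only re-yields each element; the refactor it suggests is a one-line delegation. A dependency-free sketch (illustrative names only):

```python
def batches_loop(source):
    # The pattern Pylint flags (R1737): a loop that only re-yields
    for item in source:
        yield item


def batches_delegated(source):
    # The suggested replacement: delegate directly to the iterable
    yield from source
```

Both produce identical sequences; `yield from` additionally forwards `send()`/`throw()` correctly if the source is itself a generator.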
tests/test_msstats_in.py
[refactor] 98-98: Too many local variables (34/15)
(R0914)
[refactor] 152-253: Too many nested blocks (6/5)
(R1702)
[refactor] 152-253: Too many nested blocks (6/5)
(R1702)
[refactor] 98-98: Too many branches (31/12)
(R0912)
[refactor] 98-98: Too many statements (114/50)
(R0915)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one
(R1737)
quantmsio/core/quantms/pg.py
[error] 73-73: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
[error] 110-110: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 92-92: Too many branches (13/12)
(R0912)
[refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 285-285: Too many arguments (9/5)
(R0913)
[refactor] 285-285: Too many positional arguments (9/5)
(R0917)
[refactor] 285-285: Too many local variables (35/15)
(R0914)
[refactor] 285-285: Too many statements (55/50)
(R0915)
[refactor] 571-571: Useless return at end of function or method
(R1711)
[refactor] 627-627: Too many arguments (7/5)
(R0913)
[refactor] 627-627: Too many positional arguments (7/5)
(R0917)
[refactor] 627-627: Too many local variables (23/15)
(R0914)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
tests/test_full_conversion_suite.py
[refactor] 84-90: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/commands/convert/diann.py
[refactor] 17-17: Too many arguments (12/5)
(R0913)
[refactor] 17-17: Too many positional arguments (12/5)
(R0917)
[refactor] 17-17: Too many local variables (19/15)
(R0914)
[refactor] 96-96: Too many arguments (7/5)
(R0913)
[refactor] 96-96: Too many positional arguments (7/5)
(R0917)
quantmsio/commands/transform/de.py
[refactor] 16-16: Too many arguments (9/5)
(R0913)
[refactor] 16-16: Too many positional arguments (9/5)
(R0917)
quantmsio/commands/transform/gene.py
[refactor] 45-45: Too many arguments (6/5)
(R0913)
[refactor] 45-45: Too many positional arguments (6/5)
(R0917)
quantmsio/commands/utils/plot.py
[refactor] 14-14: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.
(R1735)
quantmsio/commands/utils/project.py
[refactor] 45-45: Too many arguments (6/5)
(R0913)
[refactor] 45-45: Too many positional arguments (6/5)
(R0917)
[error] 86-86: Unexpected keyword argument 'output_file_name' in method call
(E1123)
quantmsio/commands/utils/stats.py
[refactor] 9-9: Consider using '{"help_option_names": ['-h', '--help']}' instead of a call to 'dict'.
(R1735)
🪛 actionlint (1.7.7)
.github/workflows/python-app.yml
37-37: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
66-66: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
90-90: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/python-package.yml
23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
56-56: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/integration-tests.yml
28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 YAMLlint (1.37.1)
.github/workflows/python-app.yml
[warning] 35-35: wrong indentation: expected 6 but found 4
(indentation)
[warning] 64-64: wrong indentation: expected 6 but found 4
(indentation)
[warning] 88-88: wrong indentation: expected 6 but found 4
(indentation)
[error] 106-106: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/python-package.yml
[error] 72-72: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/integration-tests.yml
[error] 23-23: trailing spaces
(trailing-spaces)
[error] 26-26: trailing spaces
(trailing-spaces)
[error] 31-31: trailing spaces
(trailing-spaces)
[error] 40-40: trailing spaces
(trailing-spaces)
[error] 47-47: trailing spaces
(trailing-spaces)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 55-55: trailing spaces
(trailing-spaces)
[error] 61-61: trailing spaces
(trailing-spaces)
[error] 67-67: no new line character at the end of file
(new-line-at-end-of-file)
[error] 67-67: trailing spaces
(trailing-spaces)
🪛 LanguageTool
README.md
[uncategorized] ~114-~114: Loose punctuation mark.
Context: ...ment variables: - QUANTMSIO_LOG_LEVEL
: Set logging level (DEBUG, INFO, WARNING...
(UNLIKELY_OPENING_PUNCTUATION)
docs/IMPLEMENTATION_DETAILS.md
[typographical] ~73-~73: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...minutes for large datasets - After: 10-45 seconds for the same datasets - **Memor...
(HYPHEN_TO_EN)
🪛 markdownlint-cli2 (0.17.2)
README.md
132-132: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
docs/IMPLEMENTATION_DETAILS.md
177-177: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
270-270: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
271-271: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
280-280: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
281-281: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
290-290: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
291-291: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 GitHub Check: CodeFactor
quantmsio/commands/convert/quantms_project.py
[notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199
Complex Method
tests/test_msstats_in.py
[notice] 98-331: tests/test_msstats_in.py#L98-L331
Complex Method
quantmsio/core/quantms/msstats_in.py
[notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260
Do not use bare 'except'. (E722)
[notice] 683-683: quantmsio/core/quantms/msstats_in.py#L683
Do not use bare 'except'. (E722)
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
Try, Except, Pass detected. (B110)
[notice] 266-266: quantmsio/core/quantms/msstats_in.py#L266
Do not use bare 'except'. (E722)
[notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68
Do not use bare 'except'. (E722)
quantmsio/core/quantms/pg.py
[notice] 258-258: quantmsio/core/quantms/pg.py#L258
Do not use bare 'except'. (E722)
[notice] 64-64: quantmsio/core/quantms/pg.py#L64
Do not use bare 'except'. (E722)
[notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65
Try, Except, Pass detected. (B110)
[notice] 172-172: quantmsio/core/quantms/pg.py#L172
Do not use bare 'except'. (E722)
[notice] 258-259: quantmsio/core/quantms/pg.py#L258-L259
Try, Except, Pass detected. (B110)
quantmsio/core/quantms/mztab.py
[notice] 802-802: quantmsio/core/quantms/mztab.py#L802
Do not use bare 'except'. (E722)
[notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788
Try, Except, Pass detected. (B110)
[notice] 794-794: quantmsio/core/quantms/mztab.py#L794
Do not use bare 'except'. (E722)
[notice] 680-680: quantmsio/core/quantms/mztab.py#L680
Do not use bare 'except'. (E722)
[notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795
Try, Except, Pass detected. (B110)
[notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803
Try, Except, Pass detected. (B110)
[notice] 691-691: quantmsio/core/quantms/mztab.py#L691
Do not use bare 'except'. (E722)
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
Try, Except, Continue detected. (B112)
[notice] 787-787: quantmsio/core/quantms/mztab.py#L787
Do not use bare 'except'. (E722)
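Most of the E722/B110 notices above share one shape: a bare `except` that silently swallows everything. A generic sketch of the tightened version (hypothetical example, not the project's code):

```python
import logging

logger = logging.getLogger("example")


def parse_intensity(raw):
    """Parse a numeric field, falling back to None on bad input.

    The flagged shape was roughly:
        try: return float(raw)
        except: pass          # E722 / B110
    Naming the expected exceptions keeps KeyboardInterrupt, SystemExit,
    and genuine bugs from being swallowed, and the debug log preserves
    a trace of the fallback.
    """
    try:
        return float(raw)
    except (TypeError, ValueError) as err:
        logger.debug("could not parse %r: %s", raw, err)
        return None
```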
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests (3.11)
🔇 Additional comments (41)
.gitignore (1)
8-70: Excellent comprehensive .gitignore update. The extensive additions cover all the essential ignore patterns for the quantmsio project, including DuckDB databases, temporary processing files, test outputs, and standard Python development artifacts. These patterns align well with the project's data processing workflows and CLI command structure.
quantmsio/core/sdrf.py (1)
11-11: Minor import reordering. The import order change is stylistic and doesn't affect functionality.
quantmsio/operate/statistics.py (1)
108-116: Excellent resource management implementation. The addition of explicit close() and __del__() methods properly handles the DuckDB connection lifecycle. The implementation correctly checks for attribute existence, closes the connection, and sets it to None to prevent double-closure issues. This pattern aligns with the similar implementation in quantmsio/operate/query.py and effectively prevents resource leaks.
quantmsio/operate/query.py (2)
15-19: Appropriate import organization. The updated imports from pride_utils improve modularity by importing specific utility functions needed by the Query class methods.
344-352: Consistent resource management implementation. The addition of close() and __del__() methods matches the pattern implemented in quantmsio/operate/statistics.py (lines 108-112), providing consistent DuckDB connection management across the codebase. This prevents resource leaks and ensures proper cleanup.
quantmsio/core/quantms/psm.py (2)
117-122: Improved logic separation for minimum score finding. The refactoring to separate grouping and index extraction steps improves code clarity and makes the intent more explicit. This is a good improvement over the previous implementation.
231-250: Enhanced logging provides good traceability. The addition of structured logging statements provides excellent traceability for PSM conversion operations, including input/output paths and optional protein filtering. This aligns well with the PR's logging enhancement objectives.
quantmsio/commands/transform/gene.py (1)
1-67: Well-structured CLI command with appropriate parameter handling. The command follows good click patterns and properly handles the optional partitions parameter by splitting comma-separated values. While static analysis flags the high parameter count, all parameters appear necessary for the gene mapping functionality. The delegation to generate_feature_of_gene keeps the CLI layer thin and focused on parameter handling, which is a good separation of concerns.
tests/test_diann.py (2)
3-4: Import path updates reflect module restructuring. The updated import paths align with the module restructuring described in the PR objectives, moving to more specific submodules within quantmsio.core.quantms.
38-125: Comprehensive test coverage for protein group transformation. This test excellently validates the complex nested structure of protein group data after transformation. It thoroughly checks:
- Presence of required columns (intensities, additional_intensities)
- Proper list structure for intensity arrays
- Correct nested objects with expected fields (sample_accession, channel, intensity)
- Validation of DIA-NN-specific values (LFQ channel)
- Structure of additional intensity types with proper nesting
The test provides robust coverage for the refactored protein group handling and ensures data integrity throughout the transformation process.
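The structural checks listed above can be condensed into plain assertions over one representative row. The dict shape below is inferred from the review comment, not copied from the test file:

```python
def validate_pg_row(row):
    """Assert the nested intensity structure a protein-group row should have."""
    assert "intensities" in row and "additional_intensities" in row
    assert isinstance(row["intensities"], list)
    for entry in row["intensities"]:
        # every primary intensity entry carries sample, channel, and value
        assert {"sample_accession", "channel", "intensity"} <= set(entry)
        assert entry["channel"] == "LFQ"  # DIA-NN label-free runs
    for extra in row["additional_intensities"]:
        # derived values live in a nested intensities array of name/value pairs
        for item in extra.get("intensities", []):
            assert {"intensity_name", "intensity_value"} <= set(item)
    return True


example_row = {
    "intensities": [
        {"sample_accession": "S1", "channel": "LFQ", "intensity": 1234.5},
    ],
    "additional_intensities": [
        {"intensities": [{"intensity_name": "normalized", "intensity_value": 0.8}]},
    ],
}
```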
README.md (2)
8-25: Excellent transformation to comprehensive user documentation. The README has been completely restructured from a brief specification note into a comprehensive user and developer guide. This aligns perfectly with the PR's CLI reorganization and enhanced functionality, providing clear documentation for all the new command groups and features.
174-176: License inconsistency needs clarification. The development section mentions MIT License, but the copyright notice at the bottom specifies GNU General Public License. This inconsistency should be resolved to avoid confusion.
Please clarify which license applies to this project - MIT (mentioned in line 176) or GPL (mentioned in the copyright notice starting at line 196). Consistent licensing information is important for open source projects.
quantmsio/core/quantms/feature.py (2)
58-62: Excellent use of context manager for resource management. The refactoring to use a context manager with MsstatsIN ensures proper resource cleanup and automatic database closure. This is a significant improvement for reliability.
156-162: Good addition of structured logging. The logging statements provide valuable traceability by recording input parameters, output paths, and optional filter files. This enhances debugging and monitoring capabilities.
Also applies to: 185-192
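The close()/__del__ pairing and the context-manager usage praised in the comments above combine naturally in a single wrapper. This is a generic sketch, not the actual MsstatsIN implementation:

```python
class ConnectionWrapper:
    """Owns a connection-like object and guarantees cleanup exactly once."""

    def __init__(self, conn):
        self._conn = conn

    def close(self):
        # Idempotent: safe to call twice, and safe to call from __del__
        if getattr(self, "_conn", None) is not None:
            self._conn.close()
            self._conn = None

    def __del__(self):
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


class FakeConn:
    """Stand-in for a DuckDB connection, counting close() calls."""

    def __init__(self):
        self.closed = 0

    def close(self):
        self.closed += 1


conn = FakeConn()
with ConnectionWrapper(conn) as wrapper:
    pass  # work with the connection here
wrapper.close()  # second call is a no-op
```

The attribute-existence check via getattr mirrors the pattern the review describes: __del__ can run on a partially constructed object, so close() must not assume _conn exists.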
quantmsio/core/diann/diann.py (5)
15-24: Import reorganization looks good. The reordering of imports to move DIANN_USECOLS after DIANN_PG_USECOLS and updating imports from quantmsio.core.quantms aligns with the module restructuring mentioned in the AI summary.
176-195: Review the additional_intensities structure for completeness. The enhanced additional_intensities structure with a nested intensities array containing intensity_name and intensity_value pairs is well-structured. This provides good separation between primary intensities and additional/derived intensities.
203-208: Good practice: Clean up intermediate columns. Dropping the raw count columns unique_sequences and total_features after transformation is good practice to avoid confusion and reduce memory usage.
370-370: Verify consistency in additional_intensities key naming. The change from "additional_intensity" to "intensities" in the add_additional_msg method should be consistent with the key used in generate_pg_matrix. Both methods now use "intensities", which is correct.
159-173: Manual Verification Required: Confirm the intensities array structure matches the schema. I was unable to locate any schema definitions for the new intensities field in the repository. Please ensure that the generated array with these properties:
- sample_accession (string)
- channel (string)
- intensity (numeric)
aligns with the updated schema expectations.
Affected location:
• quantmsio/core/diann/diann.py (Lines 159–173)

# Create intensities array using the pg_quantity field (mapped from PG.Quantity)
report.loc[:, "intensities"] = report[
    ["reference_file_name", "pg_quantity"]
].apply(
    lambda rows: [
        {
            "sample_accession": self._sample_map[
                rows["reference_file_name"] + "-LFQ"
            ],
            "channel": "LFQ",
            "intensity": rows["pg_quantity"],
        }
    ],
    axis=1,
)

quantmsio/core/common.py (4)
8-11: Import reorganization improves module structure. The updated imports from quantmsio and quantmsio.core.format align with the project's module restructuring and make dependencies more explicit.
99-111: Explicit column definition improves maintainability. Defining DIANN_PG_USECOLS explicitly rather than dynamically generating it improves readability and makes it easier to maintain the column list independently of the mapping dictionary.
163-197: New mapping dictionaries extend format support. The addition of MAXQUANT_PG_MAP and MZTAB_PG_MAP with their corresponding usage columns extends the library's support for multiple proteomics data formats. The field names are well-chosen and follow consistent naming conventions.
90-96: All DIANN_PG_MAP entries are correctly used in generate_pg_matrix. The newly added mappings ("PG.Quantity" → pg_quantity, "Proteotypic" → unique_sequences, "Precursor.Quantity" → total_features) are referenced in diann.py for both the peptide_counts and feature_counts structs and in the intensities array creation, and are then dropped as expected. No further changes needed.
quantmsio/commands/utils/stats.py (1)
12-15: CLI command structure follows good practices. The command group setup with proper context settings and clear documentation follows Click framework best practices.
.github/workflows/python-app.yml (2)
11-15: Good addition of scheduling and manual dispatch. Adding cron scheduling for daily full test runs and workflow_dispatch for manual triggering improves the CI/CD workflow flexibility.
32-83: Excellent workflow restructuring for better testing granularity. The separation into unit-tests, integration-tests, and full-test-suite jobs provides:
- Faster feedback for unit tests
- Conditional integration tests based on labels
- Comprehensive testing on main branch and scheduled runs
This is a significant improvement over the single build job approach.
tests/test_full_conversion_suite.py (3)
66-80
: Excellent subprocess configuration for robust testing. The subprocess configuration includes:
- UTF-8 encoding handling
- Proper timeout (10 minutes for TMT protein groups)
- Error handling with text output
- Working directory set to workspace root
This ensures robust and reliable CLI testing.
119-167
: Comprehensive test coverage for all conversion types. The test suite covers all major conversion workflows (PSM, Feature, Protein Groups) for both LFQ and TMT datasets. Each test:
- Validates input file existence
- Creates temporary output directories
- Runs the CLI command via subprocess
- Analyzes output files for expected structure
- Asserts success and data validity
This provides excellent end-to-end validation of the CLI functionality.
Also applies to: 169-223, 225-278
100-116
: Good output file analysis helper. The analyze_output_file function provides useful validation of parquet outputs including row counts, column counts, file size, and column names. The error handling ensures graceful failure when files can't be analyzed.
tests/test_maxquant.py (1)
356-362
: Update main block to remove duplicate test function calls. After removing the duplicate test functions, update the main block to only call the unique test functions.
 if __name__ == "__main__":
     test_maxquant_protein_groups_transform()
     test_maxquant_protein_groups_with_sample_columns()
-    test_maxquant_pg_basic_transformation()
-    test_maxquant_pg_sample_specific_columns()
     test_mztab_pg_gene_extraction()
     print("All MaxQuant protein group tests passed!")
Likely an incorrect or invalid review comment.
quantmsio/core/maxquant/maxquant.py (3)
445-496
: Well-implemented batch processing for protein groups. The iter_protein_groups_batch method demonstrates good practices:
- Dynamic column detection for sample-specific intensities
- Efficient chunked reading
- Proper metadata storage using DataFrame attributes
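Chunked iteration of the kind iter_protein_groups_batch is praised for can be outlined without pandas. A dependency-free sketch with illustrative column names, not the project's implementation:

```python
import csv
import io


def iter_batches(text, batch_size):
    """Yield lists of row dicts, never holding the whole file in memory."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


tsv = "protein\tintensity\nP1\t10\nP2\t20\nP3\t30\n"
batches = list(iter_batches(tsv, batch_size=2))
```

In the real code the same shape is provided by pandas' chunked reading; the key property is that each batch can be transformed and written out before the next one is loaded.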
587-619: Good separation of concerns in intensity creation methods. The helper methods _create_intensities_from_samples and _create_additional_intensities properly encapsulate the logic for creating structured intensity data, making the code more maintainable and testable.
Also applies to: 620-663
733-749
: Use isinstance() for type checking instead of direct comparison. Direct type comparison with == is not recommended. Use isinstance() for proper type checking.

 for field, dtype in required_fields.items():
     if field not in df.columns:
-        if dtype == list:
+        if dtype is list:
             df[field] = [[] for _ in range(len(df))]
-        elif dtype == str:
+        elif dtype is str:
             df[field] = None
-        elif dtype == float:
+        elif dtype is float:
             df[field] = 0.0
-        elif dtype == int:
+        elif dtype is int:
             df[field] = 0
     else:
         # Handle type conversions
-        if dtype == float:
+        if dtype is float:
             df[field] = pd.to_numeric(df[field], errors="coerce").fillna(0.0)
-        elif dtype == int:
+        elif dtype is int:
             df[field] = (
                 pd.to_numeric(df[field], errors="coerce").fillna(0).astype(int)
             )
-        elif dtype == str:
+        elif dtype is str:
             df[field] = df[field].astype(str).fillna("")

Likely an incorrect or invalid review comment.
quantmsio/core/quantms/mztab.py (1)
246-266: Context manager already handles file operations correctly. The _safe_file_open method is properly implemented as a context manager that handles both regular and gzipped files. The static analysis warnings about lines 255 and 258 are false positives - the files are properly closed in the finally block.
quantmsio/core/quantms/msstats_in.py (2)
105-106
: Variables are used implicitly in DuckDB SQL context. While static analysis flags these as unused, they're actually referenced in the SQL CREATE TABLE statements through DuckDB's implicit DataFrame registration.
Also applies to: 119-120
122-210
: Well-structured SQL view creation with proper column handling. The implementation properly handles optional columns and creates appropriate views for both LFQ and TMT/iTRAQ experiments. The dynamic SQL generation based on available columns is a good approach.
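The dynamic-SQL idea described above - only reference columns that actually exist - can be sketched with sqlite3 from the standard library standing in for DuckDB; the execute-a-string API shape is analogous, and the table and column names here are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msstats (protein TEXT, intensity REAL, channel TEXT)")
con.execute("INSERT INTO msstats VALUES ('P1', 10.0, 'LFQ')")

# Discover which optional columns are actually present in the table
present = {row[1] for row in con.execute("PRAGMA table_info(msstats)")}
optional = [c for c in ("channel", "fraction") if c in present]

# Build the view's column list only from columns that exist
cols = ", ".join(["protein", "intensity"] + optional)
con.execute(f"CREATE VIEW msstats_view AS SELECT {cols} FROM msstats")

row = con.execute("SELECT * FROM msstats_view").fetchone()
```

Here "fraction" is absent, so the view is created with protein, intensity, and channel only - the same conditional-column pattern the review praises.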
docs/README.adoc (2)
1227-1280
: Excellent documentation of intensity data structure. The clear distinction between primary intensities (raw measurements) and additional intensities (derived values) with concrete examples greatly improves the format specification. The semantic guidelines provide valuable guidance for implementers.
1405-1481
: Well-structured protein group field definitions. The addition of structured peptide_counts and feature_counts fields, along with clearer descriptions of existing fields, improves the protein group specification. The mapping to tool-specific fields (DIA-NN, FragPipe, MaxQuant) is particularly helpful.
quantmsio/core/quantms/pg.py (3)
285-434
: Well-architected protein quantification pipeline. The implementation effectively uses DuckDB for scalable SQL-based aggregation with proper error handling and detailed logging. The step-by-step approach with timing information aids debugging and performance analysis.
627-782
: Comprehensive protein row construction with proper data transformation. The method effectively transforms SQL aggregation results into the structured protein group format. The handling of optional computations (TopN, iBAQ) and fallback values for missing data is well implemented.
517-524
: DuckDB Implicit DataFrame Registration – No Change Required. The protein_df variable is picked up by DuckDB's Python API at runtime. When you run

msstats_in._duckdb.execute(
    "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
)

the protein_df object in the Python namespace is used as the source table. Although static analysis flags it as "unused," it's consumed implicitly by the SQL engine.
Optional (to satisfy linters): explicitly register the DataFrame before use:

msstats_in._duckdb.register("protein_df", protein_df)
msstats_in._duckdb.execute(
    "CREATE TABLE protein_groups AS SELECT * FROM protein_df"
)

File: quantmsio/core/quantms/pg.py
Lines: 517–524
 def write_psms_to_parquet(
     self,
-    file_path: Path,
-    batch_size: int = 10000,
-    output_prefix_file: Optional[str] = None,
-    **metadata,
- ):
-    if not file_path.exists():
-        raise FileNotFoundError(file_path)
-    if not self.output_directory.exists():
-        self.output_directory.mkdir(parents=True)
-    if not output_prefix_file:
-        output_prefix_file = "psm"
-
-    file_uuid = uuid.uuid4()
-    output_path = (
-        self.output_directory / f"{output_prefix_file}-{file_uuid}.psm.parquet"
-    )
-
-    metadata["file_type"] = "psm"
-    metadata["uuid"] = str(file_uuid)
-    metadata["creation_date"] = date.today().isoformat()
+    msms_file: Path,
+    batch_size: int = 1000000,
+    output_prefix: Optional[str] = None,
+ ) -> None:
+    """
+    Write PSMs to parquet file.
+
+    Args:
+        msms_file: Path to the psm.tsv file
+        batch_size: Number of rows to process at a time
+        output_prefix: Prefix for the output file name
+    """
+    if not output_prefix:
+        output_prefix = "psm"
+
+    file_uuid = str(uuid.uuid4())
+    output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"
+
+    metadata = {
+        "file_type": "psm",
+        "uuid": file_uuid,
+        "creation_date": date.today().isoformat(),
+    }
🛠️ Refactor suggestion
Review the aggressive batch size increase and missing safety checks.
The refactoring improves code clarity with better parameter naming, type annotations, and documentation. However, there are two concerns:
- Batch size increase: The default batch size increased 100x from 10,000 to 1,000,000. This could cause memory issues on systems with limited resources.
- Missing safety checks: The removal of file existence checks and output directory creation could lead to less informative error messages and runtime failures.
Consider these improvements:
def write_psms_to_parquet(
self,
msms_file: Path,
- batch_size: int = 1000000,
+ batch_size: int = 100000, # More conservative default
output_prefix: Optional[str] = None,
) -> None:
"""
Write PSMs to parquet file.
Args:
msms_file: Path to the psm.tsv file
batch_size: Number of rows to process at a time
output_prefix: Prefix for the output file name
"""
+ # Validate input file exists
+ if not msms_file.exists():
+ raise FileNotFoundError(f"PSM file not found: {msms_file}")
+
+ # Ensure output directory exists
+ self.output_directory.mkdir(parents=True, exist_ok=True)
+
if not output_prefix:
output_prefix = "psm"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def write_psms_to_parquet(
    self,
    msms_file: Path,
    batch_size: int = 100000,  # More conservative default
    output_prefix: Optional[str] = None,
) -> None:
    """
    Write PSMs to parquet file.

    Args:
        msms_file: Path to the psm.tsv file
        batch_size: Number of rows to process at a time
        output_prefix: Prefix for the output file name
    """
    # Validate input file exists
    if not msms_file.exists():
        raise FileNotFoundError(f"PSM file not found: {msms_file}")

    # Ensure output directory exists
    self.output_directory.mkdir(parents=True, exist_ok=True)

    if not output_prefix:
        output_prefix = "psm"

    file_uuid = str(uuid.uuid4())
    output_path = self.output_directory / f"{output_prefix}-{file_uuid}.psm.parquet"

    metadata = {
        "file_type": "psm",
        "uuid": file_uuid,
        "creation_date": date.today().isoformat(),
    }
    # ... rest of implementation ...
🤖 Prompt for AI Agents
In quantmsio/core/fragpipe/fragpipe.py around lines 366 to 391, the default
batch size was increased from 10,000 to 1,000,000, which may cause memory issues
on some systems; reduce the default batch size back to a safer value like
10,000. Also, add safety checks to verify that the input msms_file exists before
processing and ensure the output directory exists or create it if missing to
prevent runtime errors and provide clearer error messages.
raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)
🛠️ Refactor suggestion
Improve exception chaining for better error context.
The string conversion fix for the error message is good. However, consider using proper exception chaining to preserve the original error context.
Apply this diff to implement proper exception chaining:
except FileNotFoundError:
- raise FileNotFoundError(
- "The SDRF file provided not found: " + str(sdrf_file)
- )
+ raise FileNotFoundError(
+ "The SDRF file provided not found: " + str(sdrf_file)
+ ) from None
Alternatively, if you want to preserve the original exception context:
-except FileNotFoundError:
+except FileNotFoundError as err:
raise FileNotFoundError(
"The SDRF file provided not found: " + str(sdrf_file)
- )
+ ) from err
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)

except FileNotFoundError:
    raise FileNotFoundError(
        "The SDRF file provided not found: " + str(sdrf_file)
    ) from None

raise FileNotFoundError(
    "The SDRF file provided not found: " + str(sdrf_file)
)

except FileNotFoundError as err:
    raise FileNotFoundError(
        "The SDRF file provided not found: " + str(sdrf_file)
    ) from err
🧰 Tools
🪛 Ruff (0.11.9)
127-129: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
🤖 Prompt for AI Agents
In quantmsio/core/sdrf.py around lines 127 to 129, the FileNotFoundError is
raised with a custom message but lacks proper exception chaining. Modify the
code to use "raise FileNotFoundError(...) from original_exception" where
original_exception is the caught exception, to preserve the original error
context and improve debugging. This involves catching the original exception
first, then raising the new FileNotFoundError with the "from" keyword
referencing the caught exception.
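As a minimal sketch of the chaining semantics discussed above (the `load_config` helper and its error message are made up for illustration), `raise ... from err` stores the original exception on `__cause__`, so tracebacks show both errors:

```python
def load_config(path: str) -> str:
    """Wrap a low-level failure in a domain-specific error, keeping the cause."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError as err:
        # "from err" records the original error on __cause__ for tracebacks;
        # "from None" would suppress it instead
        raise RuntimeError(f"Config file missing: {path}") from err
```

Callers can still inspect the underlying cause via `exc.__cause__`.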
if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
f = open(self.mztab_path)
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
f = open(seekable_path, "r", encoding="utf-8")
pos = self._get_pos("PEH")
f.seek(pos)
line = f.readline()
while not line.startswith("PEH"):
    line = f.readline()
self._pep_columns = line.split("\n")[0].split("\t")
f.close()
🛠️ Refactor suggestion
Use context manager for file operations.
The file handling has been improved to support gzip files, but the manual file opening and closing should use a context manager for better resource management and exception safety.
- # Use seekable file path to handle gzip files
- seekable_path = self._get_seekable_file_path()
- f = open(seekable_path, "r", encoding="utf-8")
- pos = self._get_pos("PEH")
- f.seek(pos)
- line = f.readline()
- while not line.startswith("PEH"):
- line = f.readline()
- self._pep_columns = line.split("\n")[0].split("\t")
- f.close()
+ # Use seekable file path to handle gzip files
+ seekable_path = self._get_seekable_file_path()
+ with open(seekable_path, "r", encoding="utf-8") as f:
+ pos = self._get_pos("PEH")
+ f.seek(pos)
+ line = f.readline()
+ while not line.startswith("PEH"):
+ line = f.readline()
+ self._pep_columns = line.split("\n")[0].split("\t")
if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
f = open(self.mztab_path)
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
f = open(seekable_path, "r", encoding="utf-8")
pos = self._get_pos("PEH")
f.seek(pos)
line = f.readline()
while not line.startswith("PEH"):
    line = f.readline()
self._pep_columns = line.split("\n")[0].split("\t")
f.close()

if self.mztab_path.stat().st_size == 0:
    raise ValueError("File is empty")
# Use seekable file path to handle gzip files
seekable_path = self._get_seekable_file_path()
with open(seekable_path, "r", encoding="utf-8") as f:
    pos = self._get_pos("PEH")
    f.seek(pos)
    line = f.readline()
    while not line.startswith("PEH"):
        line = f.readline()
    self._pep_columns = line.split("\n")[0].split("\t")
🧰 Tools
🪛 Ruff (0.11.9)
62-62: Use a context manager for opening files (SIM115)
🪛 Pylint (3.3.7)
[refactor] 62-62: Consider using 'with' for resource-allocating operations (R1732)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/psm.py around lines 58 to 69, the file is opened and
closed manually which risks resource leaks if exceptions occur. Replace the
manual open and close calls with a context manager using the "with" statement to
ensure the file is properly closed after reading, improving resource management
and exception safety.
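The with-statement pattern recommended above can be shown in isolation. `read_header_after` is a hypothetical stand-in for the PEH-scanning code, using `io.StringIO` in place of a real file handle:

```python
import io


def read_header_after(marker: str, text: str) -> list:
    """Find the first line starting with `marker` and return its tab-split fields.

    Mirrors the PEH-scanning pattern above: the with-block guarantees the
    handle is closed even if the marker is never found and we raise.
    """
    with io.StringIO(text) as f:  # stands in for open(path, "r", encoding="utf-8")
        for line in f:
            if line.startswith(marker):
                return line.rstrip("\n").split("\t")
    raise ValueError(f"No line starting with {marker!r}")
```

Because the `with` block owns the handle, no explicit `close()` call is needed on either the success or the error path.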
 - name: Set up Python ${{ matrix.python-version }}
-  uses: actions/setup-python@v3
+  uses: actions/setup-python@v4
Update GitHub Actions to latest version.
The actions/setup-python@v4 action is outdated and should be updated to v5 for better compatibility and security.
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
Apply this change to both occurrences (lines 23 and 56).
Also applies to: 56-56
🧰 Tools
🪛 actionlint (1.7.7)
23-23: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/python-package.yml at lines 23 and 56, the GitHub Action
'actions/setup-python' is using version 'v4', which is outdated. Update both
occurrences to use 'actions/setup-python@v5' to ensure better compatibility and
security.
- uses: actions/checkout@v4

- name: Set up Python 3.10
  uses: actions/setup-python@v4
Update GitHub Actions to latest version.
The actions/setup-python@v4 action is outdated and should be updated to v5.
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
-uses: actions/setup-python@v4
+uses: actions/setup-python@v5
🧰 Tools
🪛 actionlint (1.7.7)
28-28: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/integration-tests.yml at line 28, the GitHub Action uses an
outdated version of actions/setup-python (v4). Update the version tag from v4 to
v5 to use the latest version of the action.
project_handler.populate_from_sdrf(str(sdrf_file))
project_handler.add_quantms_version(quantmsio_version=quantmsio_version)
project_handler.add_software_provider(
    sortware_name="quantms", sortware_version=quantms_version
Fix typo in parameter name.
The parameter name has a typo that will cause a runtime error.
- sortware_name="quantms", sortware_version=quantms_version
+ software_name="quantms", software_version=quantms_version
-sortware_name="quantms", sortware_version=quantms_version
+software_name="quantms", software_version=quantms_version
🤖 Prompt for AI Agents
In quantmsio/commands/convert/quantms_project.py at line 115, there is a typo in
the parameter name "sortware_name" which should be corrected to "software_name"
to prevent runtime errors. Update the parameter name to the correct spelling.
except Exception as e:
    print(f"Warning: Could not setup optimized processing: {e}")
    # Fall back to original processing
🛠️ Refactor suggestion
Log exception details for better debugging
The current exception handling swallows all exceptions and only prints a warning. This could make debugging difficult in production.
except Exception as e:
- print(f"Warning: Could not setup optimized processing: {e}")
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
# Fall back to original processing
+ self._optimized_setup_done = False
except Exception as e:
    print(f"Warning: Could not setup optimized processing: {e}")
    # Fall back to original processing

except Exception as e:
    import logging
    logger = logging.getLogger(__name__)
    logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
    # Fall back to original processing
    self._optimized_setup_done = False
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 56 to 58, the exception
handling only prints a warning message without logging the full exception
details, which hinders debugging. Modify the except block to log the complete
exception information, including the stack trace, using a proper logging
mechanism instead of just printing a warning. This will provide better
visibility into the error cause during production issues.
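A minimal sketch of the logging approach the prompt describes; the logger name and the `setup_with_fallback` helper are assumptions made for the example:

```python
import logging

logger = logging.getLogger("quantms.sketch")


def setup_with_fallback(setup) -> bool:
    """Run an optional optimization step; log the failure type and fall back."""
    try:
        setup()
        return True
    except Exception as exc:
        # exc_info=True attaches the full traceback to the log record,
        # which is what a bare print() loses
        logger.warning(
            "Could not setup optimized processing: %s: %s",
            type(exc).__name__, exc, exc_info=True,
        )
        return False
```

Returning an explicit flag lets the caller keep the unoptimized path as the fallback.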
Actionable comments posted: 0
♻️ Duplicate comments (11)
quantmsio/commands/convert/quantms_project.py (3)
5-5: Remove unused import. The logging import is not used in this file and should be removed.
111-111: Fix typo in parameter name. The parameter name has a typo that will cause a runtime error.
- sortware_name="quantms", sortware_version=quantms_version
+ software_name="quantms", software_version=quantms_version
43-201: Refactor complex function to improve maintainability. This function is overly complex with multiple responsibilities and too many parameters. Consider breaking it down into smaller, focused functions for better maintainability.
quantmsio/core/quantms/mztab.py (2)
680-680: Replace bare except clauses with specific exceptions. Using except: can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle. Also applies to: 691-691
785-803: Replace bare except clauses in cleanup methods. The cleanup methods use bare except clauses which should be replaced with specific exception types for better error handling.
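For the cleanup-method pattern flagged above, `contextlib.suppress` from the standard library states "ignore exactly these errors" more explicitly than try/except/pass. The `cleanup_paths` helper is illustrative, not the project's code:

```python
import contextlib
import os


def cleanup_paths(paths) -> None:
    """Best-effort cleanup: ignore only the errors os.remove is expected to raise."""
    for path in paths:
        # Equivalent to try/os.remove/except (FileNotFoundError, PermissionError)/pass,
        # but the suppressed exception types are visible at a glance
        with contextlib.suppress(FileNotFoundError, PermissionError):
            os.remove(path)
```

Anything outside the listed exception types (for example KeyboardInterrupt) still propagates, which is the point of avoiding a bare except.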
tests/test_maxquant.py (3)
6-6: Remove unused import. The pytest import is not used in this file and should be removed.
223-287: Remove duplicate test function. test_maxquant_pg_basic_transformation is an exact duplicate of test_maxquant_protein_groups_transform and should be removed to avoid redundancy.
289-329: Remove duplicate test function. test_maxquant_pg_sample_specific_columns is an exact duplicate of test_maxquant_protein_groups_with_sample_columns and should be removed to avoid redundancy.
quantmsio/core/quantms/msstats_in.py (3)
68-73: Fix exception handling in column detection. Multiple issues with exception handling:
- Bare except clause at line 68
- Unused variable 'e' at line 76
- Should log the exception details for debugging
 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except:
+except (AttributeError, KeyError, ValueError) as e:
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug(f"PRAGMA query failed: {e}")
     # Fallback: get columns from a sample query
     sample_query = "SELECT * FROM report LIMIT 1"
     sample_data = self._duckdb.execute(sample_query).df()
     available_columns = set(sample_data.columns.tolist())
 self._available_columns = available_columns
-except Exception as e:
+except Exception:
     # Default column set based on common msstats format
     self._available_columns = {
Also applies to: 76-87
56-58: Replace bare except with specific exception handling and improve fallback handling. The bare except clause should catch specific exceptions, use proper logging instead of print, and ensure _optimized_setup_done remains False on failure.
-except Exception as e:
-    print(f"Warning: Could not setup optimized processing: {e}")
-    # Fall back to original processing
+except Exception as e:
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.warning(f"Could not setup optimized processing: {type(e).__name__}: {e}")
+    # Fall back to original processing
+    self._optimized_setup_done = False
260-267: Replace bare except clauses in fallback statistics method. The bare except clauses could catch system exceptions and hide real issues.
 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except:
+except (AttributeError, KeyError, ValueError):
     # If pragma doesn't work, try a different approach
     try:
         sample_query = "SELECT * FROM report LIMIT 1"
         sample_data = self._duckdb.execute(sample_query).df()
         available_columns = set(sample_data.columns.tolist())
-    except:
+    except Exception:
         # Fallback to basic assumptions
         available_columns = {
🧹 Nitpick comments (3)
tests/test_full_conversion_suite.py (1)
70-76: Minor style improvement: simplify conditional structure. The else clause after return is unnecessary and can be simplified.
-if result.returncode == 0:
-    print(f"[OK] Success in {duration:.2f}s")
-    return True, duration, result.stdout, result.stderr
-else:
-    print(f"[FAIL] Failed in {duration:.2f}s")
-    print(f"Error: {result.stderr}")
-    return False, duration, result.stdout, result.stderr
+if result.returncode == 0:
+    print(f"[OK] Success in {duration:.2f}s")
+    return True, duration, result.stdout, result.stderr
+
+print(f"[FAIL] Failed in {duration:.2f}s")
+print(f"Error: {result.stderr}")
+return False, duration, result.stdout, result.stderr
105-106
: Remove unused DataFrame assignmentsThe DataFrames are created but the variables are never used. DuckDB references them by name in the SQL string.
- mapping_df = pd.DataFrame(mapping_data) - self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + # DuckDB can reference the DataFrame directly by name + self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM pd.DataFrame(mapping_data)")Or alternatively:
# Insert into DuckDB using pandas DataFrame mapping_df = pd.DataFrame(mapping_data) - self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + self._duckdb.register("mapping_df", mapping_df) + self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df") + self._duckdb.unregister("mapping_df")Also applies to: 119-120
quantmsio/core/quantms/pg.py (1)
285-295
: Consider using a configuration object to reduce method argumentsThe method has 9 arguments which impacts readability and maintainability. Consider using a configuration dataclass or dictionary.
from dataclasses import dataclass @dataclass class QuantificationConfig: compute_topn: bool = True topn: int = 3 compute_ibaq: bool = True file_num: int = 10 duckdb_max_memory: str = "16GB" duckdb_threads: int = 4 def quantify_from_msstats_optimized( self, msstats_path: str, sdrf_path: str, config: Optional[QuantificationConfig] = None ) -> pd.DataFrame: if config is None: config = QuantificationConfig() # Use config.compute_topn, config.topn, etc.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
quantmsio/commands/convert/quantms_project.py (1 hunks)
quantmsio/core/duckdb.py (2 hunks)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
quantmsio/core/quantms/pg.py (1 hunks)
tests/test_diann.py (2 hunks)
tests/test_feature.py (4 hunks)
tests/test_full_conversion_suite.py (1 hunks)
tests/test_maxquant.py (2 hunks)
tests/test_query.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/test_query.py
- tests/test_feature.py
- quantmsio/core/duckdb.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsio/core/quantms/mztab.py (8)
quantmsio/quantmsioc.py (1)
project
(84-86)quantmsio/core/project.py (1)
create_uuid_filename
(59-62)quantmsio/operate/tools.py (1)
get_modification_details
(205-247)quantmsio/utils/pride_utils.py (1)
get_quantmsio_modifications
(271-286)quantmsio/core/quantms/pg.py (2)
_safe_file_open
(229-273)cleanup
(35-58)quantmsio/operate/query.py (1)
close
(344-348)quantmsio/operate/statistics.py (1)
close
(108-112)quantmsio/core/fragpipe/fragpipe.py (2)
append
(43-44)peptidoform
(194-219)
quantmsio/core/quantms/msstats_in.py (4)
quantmsio/core/duckdb.py (3)
DuckDB
(13-101)destroy_duckdb_database
(96-101)query_field
(86-94)quantmsio/core/sdrf.py (3)
SDRFHandler
(90-454)get_experiment_type_from_sdrf
(269-314)get_sample_map_run
(407-420)quantmsio/operate/tools.py (1)
get_protein_accession
(267-272)quantmsio/utils/pride_utils.py (1)
clean_peptidoform_sequence
(147-159)
🪛 Ruff (0.11.9)
tests/test_maxquant.py
6-6: pytest imported but unused. Remove unused import: pytest (F401)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files (SIM115)
258-258: Use a context manager for opening files (SIM115)
414-414: Use a context manager for opening files (SIM115)
478-478: Use a context manager for opening files (SIM115)
495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None) (SIM910)
680-680: Do not use bare except (E722)
691-691: Do not use bare except (E722)
785-788: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
792-795: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
800-803: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
quantmsio/core/quantms/pg.py
62-65: Use contextlib.suppress(Exception) instead of try-except-pass (SIM105)
241-241: Use a context manager for opening files (SIM115)
246-246: Use a context manager for opening files (SIM115)
261-264: Use contextlib.suppress(ValueError) instead of try-except-pass (SIM105)
519-519: Local variable protein_df is assigned to but never used. Remove assignment to unused variable protein_df (F841)
quantmsio/commands/convert/quantms_project.py
5-5: logging imported but unused. Remove unused import: logging (F401)
quantmsio/core/quantms/msstats_in.py
76-76: Local variable e is assigned to but never used (F841)
105-105: Local variable mapping_df is assigned to but never used (F841)
119-119: Local variable sample_df is assigned to but never used (F841)
260-260: Do not use bare except (E722)
266-266: Do not use bare except (E722)
🪛 Flake8 (7.2.0)
tests/test_maxquant.py
[error] 6-6: 'pytest' imported but unused (F401)
quantmsio/core/quantms/mztab.py
[error] 680-680: do not use bare 'except' (E722)
[error] 691-691: do not use bare 'except' (E722)
[error] 769-769: whitespace before ':' (E203)
quantmsio/core/quantms/pg.py
[error] 519-519: local variable 'protein_df' is assigned to but never used (F841)
[error] 583-583: whitespace before ':' (E203)
quantmsio/commands/convert/quantms_project.py
[error] 5-5: 'logging' imported but unused (F401)
quantmsio/core/quantms/msstats_in.py
[error] 76-76: local variable 'e' is assigned to but never used (F841)
[error] 105-105: local variable 'mapping_df' is assigned to but never used (F841)
[error] 119-119: local variable 'sample_df' is assigned to but never used (F841)
[error] 219-219: whitespace before ':' (E203)
[error] 260-260: do not use bare 'except' (E722)
[error] 266-266: do not use bare 'except' (E722)
[error] 533-533: whitespace before ':' (E203)
🪛 GitHub Check: CodeFactor
quantmsio/core/quantms/mztab.py
[notice] 680-680: quantmsio/core/quantms/mztab.py#L680 Do not use bare 'except'. (E722)
[notice] 691-691: quantmsio/core/quantms/mztab.py#L691 Do not use bare 'except'. (E722)
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692 Try, Except, Continue detected. (B112)
[notice] 794-795: quantmsio/core/quantms/mztab.py#L794-L795 Try, Except, Pass detected. (B110)
[notice] 787-788: quantmsio/core/quantms/mztab.py#L787-L788 Try, Except, Pass detected. (B110)
[notice] 802-803: quantmsio/core/quantms/mztab.py#L802-L803 Try, Except, Pass detected. (B110)
quantmsio/core/quantms/pg.py
[notice] 64-65: quantmsio/core/quantms/pg.py#L64-L65 Try, Except, Pass detected. (B110)
[notice] 270-271: quantmsio/core/quantms/pg.py#L270-L271 Try, Except, Pass detected. (B110)
quantmsio/commands/convert/quantms_project.py
[notice] 45-199: quantmsio/commands/convert/quantms_project.py#L45-L199 Complex Method
quantmsio/core/quantms/msstats_in.py
[notice] 260-260: quantmsio/core/quantms/msstats_in.py#L260 Do not use bare 'except'. (E722)
[notice] 68-68: quantmsio/core/quantms/msstats_in.py#L68 Do not use bare 'except'. (E722)
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684 Try, Except, Pass detected. (B110)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7) (R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations (R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations (R1732)
[refactor] 606-640: Too many nested blocks (7/5) (R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
quantmsio/core/quantms/pg.py
[error] 73-73: Non-iterable value file is used in an iterating context (E1133)
[refactor] 74-82: Unnecessary "elif" after "break", remove the leading "el" from "elif" (R1723)
[error] 110-110: Non-iterable value file is used in an iterating context (E1133)
[refactor] 92-92: Too many branches (13/12) (R0912)
[refactor] 192-195: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)
[refactor] 277-283: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
[refactor] 285-285: Too many arguments (9/5) (R0913)
[refactor] 285-285: Too many positional arguments (9/5) (R0917)
[refactor] 285-285: Too many local variables (35/15) (R0914)
[refactor] 285-285: Too many statements (55/50) (R0915)
[refactor] 571-571: Useless return at end of function or method (R1711)
[refactor] 627-627: Too many arguments (7/5) (R0913)
[refactor] 627-627: Too many positional arguments (7/5) (R0917)
[refactor] 627-627: Too many local variables (23/15) (R0914)
quantmsio/commands/convert/quantms_project.py
[refactor] 43-43: Too many arguments (6/5) (R0913)
[refactor] 43-43: Too many positional arguments (6/5) (R0917)
[refactor] 43-43: Too many local variables (25/15) (R0914)
[refactor] 43-43: Too many branches (14/12) (R0912)
[refactor] 43-43: Too many statements (78/50) (R0915)
[refactor] 243-243: Too many arguments (6/5) (R0913)
[refactor] 243-243: Too many positional arguments (6/5) (R0917)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments (R1725)
[refactor] 405-406: Use 'yield from' directly instead of yielding each element one by one (R1737)
tests/test_full_conversion_suite.py
[refactor] 70-76: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
🔇 Additional comments (13)
quantmsio/core/quantms/mztab.py (2)
83-136: LGTM! Excellent architecture for MzTab processing. The class design demonstrates good practices with:
- Optional DuckDB integration for large files
- Proper gzip support
- Resource management with context managers
- Intelligent file size-based optimization
- Comprehensive metadata caching
The initialization logic is well-structured and handles edge cases appropriately.
246-266: Well-implemented context manager for file operations. The _safe_file_open method properly handles both gzipped and regular files with appropriate error handling and resource cleanup.
tests/test_diann.py (2)
20-27: Excellent addition of proper cleanup handling. The try-finally blocks ensure that DuckDB resources are properly cleaned up even if exceptions occur during processing. This prevents resource leaks and is a good practice for database connections.
Also applies to: 35-45
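The try-finally pattern praised here reduces to a few lines; `FakeConnection` stands in for a real DuckDB connection purely for illustration:

```python
class FakeConnection:
    """Minimal stand-in for a database connection, used to show the pattern."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def process(conn, work):
    """Run `work`; the finally-block closes the connection on success or error."""
    try:
        return work()
    finally:
        conn.close()
```

Whether `work()` returns normally or raises, the connection is closed, which is exactly what the reviewed tests verify for the DuckDB handles.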
48-145: Comprehensive test coverage for protein group intensity structure. This new test function excellently validates:
- The structured intensity representation with proper sample_accession and channel mapping
- Both intensities and additional_intensities fields
- Nested intensity arrays with correct types ("normalize_intensity" and "lfq")
- Proper channel assignment ("LFQ" for DIA-NN)
The test provides thorough coverage for the refactored protein group handling and ensures data integrity.
331-357
: Excellent test for gene name extraction.This test function provides good coverage for the gene name extraction logic from protein descriptions, testing various formats including edge cases like missing gene names and multiple gene entries.
tests/test_full_conversion_suite.py (3)
17-43
: Excellent test infrastructure setup.The helper functions provide a solid foundation for integration tests with proper workspace detection and file path management. The separation of LFQ and TMT test files is well-organized.
46-84
: Well-implemented command execution with robust error handling.The
run_command
function includes excellent practices:
- Proper UTF-8 encoding handling
- Reasonable timeout for long-running operations
- Comprehensive error capture and logging
- Good timing information for performance monitoring
105-426
: Comprehensive integration test coverage.This test suite provides excellent end-to-end validation of the conversion workflows:
- Tests all major data types (PSM, Feature, Protein Groups)
- Covers both LFQ and TMT experiments
- Validates output file generation and content structure
- Includes proper column count and data row verification
- Uses appropriate pytest markers for integration tests
The tests ensure the CLI commands produce valid and complete quantms.io parquet outputs.
quantmsio/core/quantms/msstats_in.py (1)
28-35
: Well-designed context manager and cleanup implementationThe implementation correctly follows context manager protocol and ensures proper cleanup of database resources. The try/except/pass in
__del__
is appropriate for destructor cleanup to avoid exceptions during garbage collection.Also applies to: 671-685
quantmsio/core/quantms/pg.py (4)
19-26
: Excellent resource management patternGreat implementation of resource tracking and cleanup:
- Initializing tracking lists before
super().__init__
prevents issues if parent initialization fails- Comprehensive cleanup of file handles and temporary files
- Proper context manager support
- The try/except/pass in
__del__
is appropriate for destructorsAlso applies to: 35-66
229-274
: Well-implemented safe file handling with automatic format detectionThe
_safe_file_open
method provides excellent file handling:
- Automatic detection and handling of gzipped files
- Proper resource tracking for cleanup
- Context manager pattern ensures files are closed
- Appropriate error handling in cleanup
591-626
: Well-structured SQL generation with proper aggregationThe SQL generation is clean and efficient:
- Proper use of GROUP BY for aggregation
- Good selection of aggregate functions (SUM, COUNT, MAX, AVG)
- Filtering out null anchor proteins and zero intensities
- Proper ordering for consistent results
518-523
: Remove unused DataFrame assignmentThe
protein_df
variable is created but never used directly - DuckDB references DataFrames by name in SQL.# Convert to DataFrame and load into DuckDB if protein_data: - protein_df = pd.DataFrame(protein_data) msstats_in._duckdb.execute("DROP TABLE IF EXISTS protein_groups") msstats_in._duckdb.execute( - "CREATE TABLE protein_groups AS SELECT * FROM protein_df" + "CREATE TABLE protein_groups AS SELECT * FROM pd.DataFrame(protein_data)" )Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
quantmsio/core/quantms/mztab.py (1)
784-808
: Replace bare except clauses with specific exceptions.Using bare
except:
can catch system exits and keyboard interrupts. Be more specific about which exceptions to handle.The previous review comment about replacing bare except clauses is still applicable. Apply the suggested fix:
if self._duckdb: try: self._duckdb.close() - except: + except Exception as e: + self.logger.warning(f"Failed to close DuckDB connection: {e}") - pass if self._duckdb_name and Path(self._duckdb_name).exists(): try: os.remove(self._duckdb_name) - except: + except (OSError, IOError) as e: + self.logger.warning(f"Failed to remove DuckDB file {self._duckdb_name}: {e}") - pass if self._temp_decompressed_file and self._temp_decompressed_file.exists(): try: self._temp_decompressed_file.unlink() - except: + except (OSError, IOError) as e: + self.logger.warning(f"Failed to remove temporary file {self._temp_decompressed_file}: {e}") - passquantmsio/core/quantms/msstats_in.py (1)
56-58
: This is a duplicate of a previous review comment.The exception handling has been improved with logging, but the logger attribute issue needs to be addressed.
🧹 Nitpick comments (10)
quantmsio/core/quantms/mztab.py (3)
32-81
: Consider refactoring complex nested logic.The function has deeply nested conditionals that make it difficult to follow. Consider extracting helper methods for better readability.
Consider this refactoring approach:

+def _parse_modification_base(line_parts: list, modifications: dict) -> dict:
+    """Parse base modification information."""
+    values = line_parts[2].replace("[", "").replace("]", "").split(",")
+    accession = values[1].strip()
+    name = values[2].strip()
+    index = line_parts[1].split("[")[1].split("]")[0]
+    modifications[accession] = [name, index, None, None]
+    return modifications
+
+def _find_modification_by_index(modifications: dict, index: str) -> Optional[str]:
+    """Find modification accession by index."""
+    for key, value in modifications.items():
+        if value[1] == index:
+            return key
+    return None
+
 def fetch_modifications_from_mztab_line(line: str, _modifications: dict) -> dict:
     line = line.strip()
     line_parts = line.split("\t")
     if line_parts[0] == "MTD" and "_mod[" in line_parts[1]:
         if "site" not in line_parts[1] and "position" not in line_parts[1]:
-            values = line_parts[2].replace("[", "").replace("]", "").split(",")
-            accession = values[1].strip()
-            name = values[2].strip()
-            index = line_parts[1].split("[")[1].split("]")[0]
-            _modifications[accession] = [name, index, None, None]
+            _modifications = _parse_modification_base(line_parts, _modifications)
         elif "site" in line_parts[1]:
             index = line_parts[1].split("[")[1].split("]")[0]
-            accession = None
-            for key, value in _modifications.items():
-                if value[1] == index:
-                    accession = key
+            accession = _find_modification_by_index(_modifications, index)
             if accession is None:
                 raise Exception("The accession for the modification is None")
             _modifications[accession][2] = line_parts[2]
         elif "position" in line_parts[1]:
             index = line_parts[1].split("[")[1].split("]")[0]
-            accession = None
-            for key, value in _modifications.items():
-                if value[1] == index:
-                    accession = key
+            accession = _find_modification_by_index(_modifications, index)
             if accession is None:
                 raise Exception("The accession for the modification is None")
             _modifications[accession][3] = line_parts[2]
     return _modifications
407-448
: Consider improving file handle management for chunked reads. The method opens file handles for chunked reading without immediate context management. While this may be intentional for streaming operations, consider documenting this pattern or exploring alternative approaches.
For better resource management, consider this pattern:

+    def _create_chunked_reader(self, seekable_path: Path, pos: int, nrows: int, **kwargs):
+        """Create a chunked reader with proper resource management."""
+        f = open(seekable_path, "r", encoding="utf-8")
+        f.seek(pos)
+        return pd.read_csv(f, sep="\t", nrows=nrows, low_memory=False, **kwargs), f
+
     def __load_second(self, header, **kwargs):
         seekable_path = self._get_seekable_file_path()
         if "chunksize" in kwargs and kwargs["chunksize"] is not None:
-            f = open(seekable_path, "r", encoding="utf-8")
-            if header == "PSH":
-                f.seek(self._psm_pos)
-                return pd.read_csv(f, sep="\t", nrows=self._psm_len, low_memory=False, **kwargs)
+            if header == "PSH":
+                return self._create_chunked_reader(seekable_path, self._psm_pos, self._psm_len, **kwargs)
             # ... similar for other headers
495-495
: Simplify kwargs.get usage.

- columns = kwargs.get("usecols", None)
+ columns = kwargs.get("usecols")

quantmsio/core/quantms/msstats_in.py (7)
22-22
: Modernize the super() call to Python 3 style. The current super() call uses the outdated Python 2 style with explicit arguments.
- super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+ super().__init__(report_path, duckdb_max_memory, duckdb_threads)
114-116
: Remove unnecessary variable assignment.

The `mapping_df` name is resolved by DuckDB's replacement scan, so the assignment is not truly dead, but registering the DataFrame explicitly makes the dependency visible and silences the unused-variable lint:

- # Insert into DuckDB using pandas DataFrame
- mapping_df = pd.DataFrame(mapping_data)
- self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ # Insert into DuckDB after registering the DataFrame explicitly
+ self._duckdb.register("mapping_df", pd.DataFrame(mapping_data))
+ self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
128-130
: Remove unnecessary variable assignment.

The `sample_df` name is resolved by DuckDB's replacement scan, so the assignment is not truly dead, but registering the DataFrame explicitly makes the dependency visible and silences the unused-variable lint:

- # Insert into DuckDB using pandas DataFrame
- sample_df = pd.DataFrame(sample_data)
- self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ # Insert into DuckDB after registering the DataFrame explicitly
+ self._duckdb.register("sample_df", pd.DataFrame(sample_data))
+ self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
132-220
: Well-structured dynamic SQL view creation. The method effectively handles different experiment types and available columns to create optimized database views. The SQL construction appears safe since inputs are controlled internally.
Consider using parameterized queries or a SQL builder library for future enhancements to improve maintainability and safety:
# Example of safer approach for future consideration
def _build_charge_expression(self):
    """Build charge column expression based on available columns."""
    if self.has_precursor_charge and self.has_charge:
        return "COALESCE(PrecursorCharge, Charge)"
    # ... rest of logic
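As a self-contained illustration of this column-selection pattern: the SQL expression can be derived from the set of columns actually present in the report. The function name and the column names below mirror the review's sketch and are illustrative, not the project's real API.

```python
def build_charge_expression(available_columns: set) -> str:
    """Pick a SQL expression for the charge column based on which
    columns the input report actually contains."""
    has_precursor = "PrecursorCharge" in available_columns
    has_charge = "Charge" in available_columns
    if has_precursor and has_charge:
        # Prefer PrecursorCharge but fall back per-row to Charge
        return "COALESCE(PrecursorCharge, Charge)"
    if has_precursor:
        return "PrecursorCharge"
    if has_charge:
        return "Charge"
    # Neither column present: emit a NULL placeholder
    return "NULL"

print(build_charge_expression({"PrecursorCharge", "Charge"}))
# → COALESCE(PrecursorCharge, Charge)
```

Because the column names are detected from the table itself (never from user input), interpolating them into the view definition stays safe.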
492-494
: Replace print statement with proper logging. Consistent with other parts of the code, use logging instead of print statements.

- print(
-     f"Warning: SQL aggregation failed ({e}), falling back to pandas"
- )
+ self.logger.warning(
+     f"SQL aggregation failed ({e}), falling back to pandas"
+ )
582-582
: Replace print statement with proper logging. Use logging instead of print for consistency with the rest of the codebase.

- print(f"Filtering for protein: {protein_str}")
+ self.logger.info(f"Filtering for protein: {protein_str}")
685-698
: Improve cleanup exception handling with logging.

While try/except/pass is acceptable in `__del__` methods, adding logging can help with debugging resource cleanup issues.

 def __del__(self):
     """Cleanup database views and tables."""
     try:
         if hasattr(self, "_duckdb") and self._duckdb and self._optimized_setup_done:
             self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats")
             self._duckdb.execute("DROP TABLE IF EXISTS channel_mapping")
             self._duckdb.execute("DROP TABLE IF EXISTS sample_mapping")
             self._duckdb.execute("DROP TABLE IF EXISTS protein_groups")
             self._duckdb.execute("DROP VIEW IF EXISTS processed_msstats_with_pg")
         # Always call parent cleanup to close connection and remove database file
         if hasattr(self, "_duckdb") and self._duckdb:
             self.destroy_duckdb_database()
-    except Exception:
-        pass
+    except Exception as e:
+        # Log cleanup failures but don't raise to avoid issues during garbage collection
+        try:
+            import logging
+            logging.getLogger(__name__).debug(f"Error during cleanup: {e}")
+        except Exception:
+            pass  # If even logging fails, silently continue
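An alternative worth considering is to avoid `__del__` entirely and register cleanup with `weakref.finalize`, which runs the callback at garbage collection or interpreter exit and lets failures be handled in a plain function. This is a minimal sketch under assumed names (`ViewHolder` and `FakeConn` stand in for the real connection-owning class and the DuckDB connection):

```python
import weakref

class ViewHolder:
    """Sketch: register cleanup via weakref.finalize instead of __del__."""

    def __init__(self, conn):
        self._conn = conn
        # The callback must not reference self, or the object is kept alive
        self._finalizer = weakref.finalize(self, ViewHolder._cleanup, conn)

    @staticmethod
    def _cleanup(conn):
        try:
            conn.close()
        except Exception:
            pass  # never raise from finalization

    def close(self):
        # Explicit close runs the finalizer exactly once
        self._finalizer()

class FakeConn:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

conn = FakeConn()
holder = ViewHolder(conn)
holder.close()
print(conn.closed)  # → True
```

`weakref.finalize` guarantees the callback runs at most once, so calling `close()` explicitly and then letting the object be garbage collected does not double-close the connection.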
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files
(SIM115)
258-258: Use a context manager for opening files
(SIM115)
414-414: Use a context manager for opening files
(SIM115)
478-478: Use a context manager for opening files
(SIM115)
495-495: Use kwargs.get("usecols")
instead of kwargs.get("usecols", None)
Replace kwargs.get("usecols", None)
with kwargs.get("usecols")
(SIM910)
quantmsio/core/quantms/msstats_in.py
89-89: Local variable e
is assigned to but never used
Remove assignment to unused variable e
(F841)
115-115: Local variable mapping_df
is assigned to but never used
Remove assignment to unused variable mapping_df
(F841)
129-129: Local variable sample_df
is assigned to but never used
Remove assignment to unused variable sample_df
(F841)
🪛 GitHub Check: CodeFactor
quantmsio/core/quantms/mztab.py
[notice] 691-692: quantmsio/core/quantms/mztab.py#L691-L692
Try, Except, Continue detected. (B112)
quantmsio/core/quantms/msstats_in.py
[notice] 683-684: quantmsio/core/quantms/msstats_in.py#L683-L684
Try, Except, Pass detected. (B110)
🪛 Flake8 (7.2.0)
quantmsio/core/quantms/mztab.py
[error] 769-769: whitespace before ':'
(E203)
quantmsio/core/quantms/msstats_in.py
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
[error] 115-115: local variable 'mapping_df' is assigned to but never used
(F841)
[error] 129-129: local variable 'sample_df' is assigned to but never used
(F841)
[error] 229-229: whitespace before ':'
(E203)
[error] 547-547: whitespace before ':'
(E203)
🪛 Pylint (3.3.7)
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 715-732: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 272-272: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 281-281: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one
(R1737)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.11)
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests
🔇 Additional comments (7)
quantmsio/core/quantms/mztab.py (6)
17-30
: LGTM! Function logic is sound.The modification list generation function correctly handles null values and processes modifications appropriately.
83-136
: Well-structured initialization with good performance optimization.The conditional DuckDB setup based on file size is a smart optimization. The high number of instance attributes is justified given the comprehensive functionality provided.
137-244
: Excellent DuckDB integration with proper fallback handling.The implementation provides good performance optimization with appropriate fallback mechanisms when DuckDB operations fail. The file size threshold and configuration settings are well-chosen.
246-312
: Robust file handling with comprehensive gzip support.The implementation correctly handles both regular and gzipped files with appropriate context management. The temporary file creation for seekable operations on gzipped files is a necessary optimization.
313-367
: Efficient metadata parsing with proper caching.The once-only parsing pattern with caching is well-implemented and prevents redundant processing. Good error handling ensures the system remains functional even with parsing issues.
519-773
: Comprehensive and well-designed public API.The methods provide excellent functionality with proper DuckDB optimization and pandas fallback. The streaming capabilities and efficient querying methods demonstrate thoughtful design for different use cases.
quantmsio/core/quantms/msstats_in.py (1)
235-261
: Excellent SQL-based performance optimization. The file statistics method effectively uses SQL aggregation for better performance compared to iterating through data in Python. The fallback mechanism ensures robustness.
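The aggregate-in-SQL-with-fallback shape praised here can be sketched with the stdlib `sqlite3` module standing in for DuckDB (table and column names are illustrative, not the project's schema):

```python
import sqlite3

def file_stats(rows):
    """Aggregate per-file statistics in SQL; fall back to plain
    Python if the SQL path raises."""
    try:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE report (filename TEXT, intensity REAL)")
        conn.executemany("INSERT INTO report VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT filename, COUNT(*), SUM(intensity) "
            "FROM report GROUP BY filename ORDER BY filename"
        )
        return cur.fetchall()
    except sqlite3.Error:
        # Fallback: aggregate in plain Python if the SQL path fails
        acc = {}
        for fname, inten in rows:
            cnt, total = acc.get(fname, (0, 0.0))
            acc[fname] = (cnt + 1, total + inten)
        return sorted((f, c, s) for f, (c, s) in acc.items())

stats = file_stats([("a.raw", 1.0), ("a.raw", 2.0), ("b.raw", 5.0)])
print(stats)  # → [('a.raw', 2, 3.0), ('b.raw', 1, 5.0)]
```

Pushing the GROUP BY into the engine avoids materializing the full report in Python, which is the same win the DuckDB version gets.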
self.logger.warning(
    f"Failed to get column info via PRAGMA: {e}, trying fallback method"
)
try:
    sample_query = "SELECT * FROM report LIMIT 1"
    sample_data = self._duckdb.execute(sample_query).df()
    available_columns = set(sample_data.columns.tolist())
except Exception as e2:
    self.logger.warning(f"Fallback column detection also failed: {e2}")
Fix missing logger attribute and unused variable.
The code references self.logger, but the MsstatsIN class doesn't inherit from a class that provides a logger attribute, which will cause an AttributeError.
+import logging
+
class MsstatsIN(DuckDB):
def __init__(
self,
report_path: Union[Path, str],
sdrf_path: Union[Path, str],
duckdb_max_memory="16GB",
duckdb_threads=4,
):
super().__init__(report_path, duckdb_max_memory, duckdb_threads)
+ self.logger = logging.getLogger(__name__)
self._sdrf = SDRFHandler(sdrf_path)
Also, remove the unused exception variable:
- except Exception as e:
+ except Exception:
# Fallback: get columns from a sample query
self.logger.warning(
- f"Failed to get column info via PRAGMA: {e}, trying fallback method"
+ "Failed to get column info via PRAGMA, trying fallback method"
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
self.logger.warning(
    f"Failed to get column info via PRAGMA: {e}, trying fallback method"
)
try:
    sample_query = "SELECT * FROM report LIMIT 1"
    sample_data = self._duckdb.execute(sample_query).df()
    available_columns = set(sample_data.columns.tolist())
except Exception as e2:
    self.logger.warning(f"Fallback column detection also failed: {e2}")

# At the top of quantmsio/core/quantms/msstats_in.py, add the logging import:
import logging

class MsstatsIN(DuckDB):
    def __init__(
        self,
        report_path: Union[Path, str],
        sdrf_path: Union[Path, str],
        duckdb_max_memory="16GB",
        duckdb_threads=4,
    ):
        super().__init__(report_path, duckdb_max_memory, duckdb_threads)
        # Initialize a logger for this class
        self.logger = logging.getLogger(__name__)
        self._sdrf = SDRFHandler(sdrf_path)
    ...

    def _detect_columns(self):
        try:
            columns_query = "PRAGMA table_info('report')"
            columns_info = self._duckdb.execute(columns_query).df()
            available_columns = set(columns_info["name"].tolist())
        except Exception:
            # Fallback: get columns from a sample query
            self.logger.warning(
                "Failed to get column info via PRAGMA, trying fallback method"
            )
            try:
                sample_query = "SELECT * FROM report LIMIT 1"
                sample_data = self._duckdb.execute(sample_query).df()
                available_columns = set(sample_data.columns.tolist())
            except Exception as e2:
                self.logger.warning(f"Fallback column detection also failed: {e2}")
        self._available_columns = available_columns
    ...
🧰 Tools
🪛 Pylint (3.3.7)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 70 to 78, the code uses
self.logger for logging but the MsstatsIN class lacks a logger attribute,
causing an AttributeError. Add a logger attribute to the MsstatsIN class, for
example by initializing it with a standard logging.getLogger instance. Also,
remove the unused exception variable e2 in the fallback except block by omitting
the variable name in the except clause or replacing it with an underscore.
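A minimal, runnable sketch of the suggested fix (class and method names below are illustrative stand-ins, not the project's real API): give the class its own logger via `logging.getLogger(__name__)` so `self.logger` exists before any fallback branch runs.

```python
import logging

logging.basicConfig(level=logging.WARNING)

class MsstatsLike:
    """Stand-in for MsstatsIN: owns a logger instead of relying on a base class."""

    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def detect_columns(self, pragma_ok: bool):
        # Hedged stand-in for the PRAGMA-then-fallback column detection
        if not pragma_ok:
            self.logger.warning(
                "Failed to get column info via PRAGMA, trying fallback method"
            )
            return ["fallback"]
        return ["name", "type"]

obj = MsstatsLike()
print(obj.detect_columns(True))  # → ['name', 'type']
```

Using a module-level logger name (`__name__`) keeps log records attributable to the file without coupling the class to any particular logging setup.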
Actionable comments posted: 5
🔭 Outside diff range comments (1)
quantmsio/utils/pride_utils.py (1)
674-702
: Critical bug: Unreachable code after early return. The function has unreachable code after the early return on line 674. The main processing logic from lines 676-702 will never execute because the function exits early when modification_string is null/empty.
Apply this diff to fix the logic flow:
     if (
         modification_string is None
         or modification_string == "null"
         or modification_string == ""
         or pd.isna(modification_string)
     ):
         return peptide_sequence

-    modifications = get_modifications_object_from_mztab_line(
-        modification_string=modification_string,
-        modifications_definition=modifications_definition,
-    )
-
-    aa_index = 0
-    result_peptide: str = ""
-    peptide_sequence = list(peptide_sequence)
-    # Add n-term modification if it is present
-    for key_index, value_index in modifications.items():
-        if aa_index in value_index["position"]:
-            result_peptide = "[" + key_index + "]" + result_peptide
-    if len(result_peptide) > 0:
-        result_peptide = result_peptide + "-"
+    modifications = get_modifications_object_from_mztab_line(
+        modification_string=modification_string,
+        modifications_definition=modifications_definition,
+    )
+
+    aa_index = 0
+    result_peptide: str = ""
+    peptide_sequence = list(peptide_sequence)
+    # Add n-term modification if it is present
+    for key_index, value_index in modifications.items():
+        if aa_index in value_index["position"]:
+            result_peptide = "[" + key_index + "]" + result_peptide
+    if len(result_peptide) > 0:
+        result_peptide = result_peptide + "-"
+
+    aa_index += 1
     for aa in peptide_sequence:
♻️ Duplicate comments (4)
quantmsio/commands/convert/quantms_project.py (1)
80-103
: Fix critical typo in parameter name. There's a typo in the parameter name that will cause a runtime error when the method is called.
  project_handler.add_software_provider(
-     sortware_name="quantms", sortware_version=quantms_version
+     software_name="quantms", software_version=quantms_version
  )

quantmsio/core/quantms/msstats_in.py (1)
68-68
: Replace bare except clause with specific exception handling

Bare except clauses can catch system exceptions like KeyboardInterrupt. Be more specific about expected exceptions.

 try:
     columns_query = "PRAGMA table_info('report')"
     columns_info = self._duckdb.execute(columns_query).df()
     available_columns = set(columns_info["name"].tolist())
-except Exception as e:
+except (AttributeError, KeyError, ValueError) as e:
     # Fallback: get columns from a sample query
     self.logger.warning(
         f"Failed to get column info via PRAGMA: {e}, trying fallback method"
     )

quantmsio/core/quantms/mztab.py (2)
680-680
: Replace bare except clauses with specific exception handling

Be specific about which exceptions to handle.

 try:
     count_result = self._duckdb.execute(
         f"SELECT COUNT(*) as count FROM {table}"
     ).fetchone()
     if count_result:
         stats["sections"][table.upper()] = {
             "row_count": count_result[0]
         }
-except:
+except (duckdb.Error, KeyError):
     stats["sections"][table.upper()] = {"row_count": 0}

 section_name = {"PSH": "PSM", "PEH": "PEP", "PRH": "PRT"}[
     header
 ]
 stats["sections"][section_name] = {"row_count": length}
 except (ValueError, KeyError, IndexError) as e:
     # Log specific errors for debugging but continue processing other sections
     self.logger.debug(f"Failed to extract length for section {header}: {e}")
     continue
-except Exception:
+except Exception as e:
+    # Log unexpected errors but continue processing other sections
+    self.logger.warning(f"Unexpected error extracting length for section {header}: {e}")
     continue

Also applies to: 691-691
787-787
: Replace bare except clauses in cleanup methods

Using bare `except:` can catch system exits and keyboard interrupts. Be more specific.

 if self._duckdb:
     try:
         self._duckdb.close()
-    except:
-        pass
+    except Exception as e:
+        self.logger.warning(f"Failed to close DuckDB connection: {e}")
     self._duckdb = None
 if self._duckdb_name and Path(self._duckdb_name).exists():
     try:
         os.remove(self._duckdb_name)
-    except:
-        pass
+    except (OSError, IOError) as e:
+        self.logger.warning(
+            f"Failed to remove DuckDB file {self._duckdb_name}: {e}"
+        )
     self._duckdb_name = None
 # Clean up temporary decompressed file
 if self._temp_decompressed_file and self._temp_decompressed_file.exists():
     try:
         self._temp_decompressed_file.unlink()
-    except:
-        pass
+    except (OSError, IOError) as e:
+        self.logger.warning(
+            f"Failed to remove temporary file {self._temp_decompressed_file}: {e}"
+        )
     self._temp_decompressed_file = None

Also applies to: 794-794, 802-802
🧹 Nitpick comments (19)
quantmsio/commands/convert/quantms_project.py (5)
21-34
: Improve robustness of project prefix extraction.

The get_project_prefix function makes assumptions about filename structure that could lead to incorrect results.

 def get_project_prefix(sdrf_file: Path) -> str:
     """Extract project prefix from SDRF filename (e.g. 'PXD000865' from 'PXD000865.sdrf.tsv')."""
     filename = sdrf_file.name
-    # Remove .sdrf.tsv and any variations like _openms_design.sdrf.tsv
-    prefix = filename.split(".sdrf")[0].split("_openms")[0]
+    # Extract prefix before first occurrence of .sdrf or _openms
+    prefix = filename.split(".sdrf")[0].split("_openms")[0]
+    # Validate that we have a reasonable project accession format
+    if not prefix or len(prefix) < 3:
+        raise ValueError(f"Could not extract valid project prefix from filename: {filename}")
     return prefix
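The hardened extraction above can be exercised directly; this standalone sketch reproduces the suggested logic so the two suffix variants from the docstring can be checked:

```python
from pathlib import Path

def get_project_prefix(sdrf_file: Path) -> str:
    """Extract project prefix from an SDRF filename, e.g. 'PXD000865'
    from 'PXD000865.sdrf.tsv' or 'PXD000865_openms_design.sdrf.tsv'."""
    filename = sdrf_file.name
    # Cut at the first .sdrf or _openms marker, whichever comes first
    prefix = filename.split(".sdrf")[0].split("_openms")[0]
    # Reject empty or implausibly short prefixes early
    if not prefix or len(prefix) < 3:
        raise ValueError(f"Could not extract valid project prefix from filename: {filename}")
    return prefix

print(get_project_prefix(Path("PXD000865.sdrf.tsv")))                # → PXD000865
print(get_project_prefix(Path("PXD000865_openms_design.sdrf.tsv")))  # → PXD000865
```

The length check is a heuristic (ProteomeXchange accessions are much longer than three characters); a stricter regex on the accession format would be the next step if needed.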
81-81
: Fix indentation issues in function definitions. Multiple function definitions have continuation line indentation issues that violate PEP 8 style guidelines.
-def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path, - quantmsio_version: str, quantms_version: str): +def _initialize_project(output_folder_path: Path, project_accession: str, sdrf_file: Path, + quantmsio_version: str, quantms_version: str): -def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, - project_accession: str, generate_ibaq_view: bool) -> list: +def _convert_features(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, + project_accession: str, generate_ibaq_view: bool) -> list: -def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str, - output_folder_path: Path): +def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str, + output_folder_path: Path): -def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, - project_accession: str) -> list: +def _convert_psms(mztab_file: Path, sdrf_file: Path, output_folder_path: Path, + project_accession: str) -> list: -def _register_files_in_project(created_files: list, output_folder_path: Path, - project_accession: str): +def _register_files_in_project(created_files: list, output_folder_path: Path, + project_accession: str): - project_handler = _initialize_project(output_folder_path, project_accession, - sdrf_file, quantmsio_version, quantms_version) + project_handler = _initialize_project(output_folder_path, project_accession, + sdrf_file, quantmsio_version, quantms_version) - feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path, - project_accession, generate_ibaq_view) + feature_files = _convert_features(mztab_file, sdrf_file, output_folder_path, + project_accession, generate_ibaq_view)Also applies to: 107-107, 136-136, 149-149, 172-172, 226-226, 233-233
93-93
: Remove unused variable assignments. Variables are assigned but never used, creating unnecessary code.

- # Save initial project file
- project_json = str(output_folder_path / f"{project_accession}.project.json")
  project_handler.save_project_info(
      output_prefix_file=project_accession,
      output_folder=str(output_folder_path),
      delete_existing=True,
  )

- # Initialize project
- project_handler = _initialize_project(output_folder_path, project_accession,
-                                       sdrf_file, quantmsio_version, quantms_version)
+ # Initialize project
+ _initialize_project(output_folder_path, project_accession,
+                     sdrf_file, quantmsio_version, quantms_version)

Also applies to: 225-225
192-247
: Consider reducing function complexity.

The quantmsio_workflow function has 6 parameters and manages the entire workflow orchestration. While it has been improved by extracting helper functions, it could benefit from further modularization. Consider creating a QuantmsProjectConverter class to encapsulate the workflow state and reduce parameter passing:

class QuantmsProjectConverter:
    def __init__(self, base_folder: str, output_folder: str, project_accession: str,
                 quantms_version: str, quantmsio_version: str, generate_ibaq_view: bool):
        self.base_folder = base_folder
        self.output_folder = output_folder
        self.project_accession = project_accession
        self.quantms_version = quantms_version
        self.quantmsio_version = quantmsio_version
        self.generate_ibaq_view = generate_ibaq_view
        self.created_files = []

    def convert(self) -> None:
        """Execute the full conversion workflow."""
        self._setup_and_validate()
        self._initialize_project()
        self._convert_data()
        self._register_files()
304-306
: Handle potential type mismatch in output directory assignment.

The output_dir parameter is Optional[Path] but gets assigned a string value, which could cause type checking issues.

  # Default output to sibling quantms.io directory
  if not output_dir:
-     output_dir = str(quantms_dir.parent / "quantms.io")
+     output_dir = quantms_dir.parent / "quantms.io"

quantmsio/utils/pride_utils.py (2)
516-519
: Simplify optional field assignments using dict.get()

These if-else blocks can be simplified using dict.get() with a default value.

 def _set_optional_psm_fields(psm: dict, es: dict) -> None:
     """Set optional PSM fields with proper fallbacks."""
     # Handle decoy peptide information
-    if "opt_global_cv_MS:1002217_decoy_peptide" in es:
-        psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
-    else:
-        psm["is_decoy"] = None
+    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)

     # Handle posterior error probability
     if "opt_global_Posterior_Error_Probability_score" in es:
         psm["posterior_error_probability"] = es["opt_global_Posterior_Error_Probability_score"]
     else:
         psm["posterior_error_probability"] = None

     # Handle global q-value
-    if "opt_global_q-value" in es:
-        psm["global_qvalue"] = es["opt_global_q-value"]
-    else:
-        psm["global_qvalue"] = None
+    psm["global_qvalue"] = es.get("opt_global_q-value", None)

     # Handle consensus support
-    if "opt_global_consensus_support" in es:
-        psm["consensus_support"] = es["opt_global_consensus_support"]
-    else:
-        psm["consensus_support"] = None
+    psm["consensus_support"] = es.get("opt_global_consensus_support", None)

Also applies to: 528-531, 534-537
513-538
: Simplify field assignments using dict.get() method. The static analysis tool correctly identifies opportunities to simplify the conditional assignments.
Apply this diff to simplify the field assignments:
- # Handle decoy peptide information
- if "opt_global_cv_MS:1002217_decoy_peptide" in es:
-     psm["is_decoy"] = es["opt_global_cv_MS:1002217_decoy_peptide"]
- else:
-     psm["is_decoy"] = None
+ # Handle decoy peptide information
+ psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)

- # Handle global q-value
- if "opt_global_q-value" in es:
-     psm["global_qvalue"] = es["opt_global_q-value"]
- else:
-     psm["global_qvalue"] = None
+ # Handle global q-value
+ psm["global_qvalue"] = es.get("opt_global_q-value", None)

- # Handle consensus support
- if "opt_global_consensus_support" in es:
-     psm["consensus_support"] = es["opt_global_consensus_support"]
- else:
-     psm["consensus_support"] = None
+ # Handle consensus support
+ psm["consensus_support"] = es.get("opt_global_consensus_support", None)

tests/test_msstats_in.py (3)
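The equivalence is easy to demonstrate: `dict.get(key, None)` returns the stored value when the key is present and None otherwise, which is exactly what each if/else pair does. A minimal sketch with the same opt_global keys:

```python
def set_optional_fields(es: dict) -> dict:
    """Collapse each if/else lookup into a single dict.get() call."""
    psm = {}
    # .get() defaults to None when the key is absent
    psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide")
    psm["global_qvalue"] = es.get("opt_global_q-value")
    psm["consensus_support"] = es.get("opt_global_consensus_support")
    return psm

print(set_optional_fields({"opt_global_q-value": 0.01}))
# → {'is_decoy': None, 'global_qvalue': 0.01, 'consensus_support': None}
```

Note that the explicit `, None` default in the diff is optional; `get()` already defaults to None, so either spelling behaves identically.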
88-88
: Remove unnecessary f-string prefixes. These strings don't contain any placeholders, so the f-string prefix is not needed.
- print(f"\nLFQ test completed successfully!") + print("\nLFQ test completed successfully!")- print(f"\nTMT test completed successfully!") + print("\nTMT test completed successfully!")- print(f"\nDataset Comparison:") + print("\nDataset Comparison:")Also applies to: 356-356, 428-428
220-223
: Replace unused loop variable with underscoreThe
count
variable is not used within the loop body.if intensities_channel_counts: - for channel, count in intensities_channel_counts.items(): + for channel, _ in intensities_channel_counts.items(): if channel not in channel_counts: channel_counts[channel] = 0
264-270
: Simplify dictionary membership testRemove unnecessary
.keys()
when checking dictionary membership.all_channels = sorted( set( channel for file_channels in file_channel_matrix.values() - for channel in file_channels.keys() + for channel in file_channels ) )quantmsio/core/quantms/msstats_in.py (1)
416-422
: Use 'yield from' for better performanceWhen yielding all elements from an iterator without modification, use
yield from
.if not self._optimized_setup_done: # Fallback to original method - for batch in self.generate_msstats_in(file_batch_size): - yield batch + yield from self.generate_msstats_in(file_batch_size) returnquantmsio/core/quantms/mztab.py (1)
414-414
: Consider using context managers for file operationsThese file operations should ideally use context managers to ensure proper resource cleanup.
For chunked reading operations, the current approach is necessary since the file handle needs to remain open across multiple chunk iterations. However, consider documenting this design choice:
if "chunksize" in kwargs and kwargs["chunksize"] is not None: # For chunked reading, we need to keep the file handle open + # Note: Caller is responsible for closing the file handle when done iterating f = open(seekable_path, "r", encoding="utf-8")
Also applies to: 478-478
quantmsio/core/quantms/pg.py (7)
76-84
: Simplify conditional structure by removing unnecessaryelif
.The
elif
condition is unreachable after thebreak
statement.for line in file: if line.startswith("PRH"): protein_columns = line.strip().split("\t")[1:] break - elif line.startswith("PRT\t") and not protein_columns: + if line.startswith("PRT\t") and not protein_columns: # Fallback: use first PRT line to determine column count prt_parts = line.strip().split("\t") # Generate default column names based on actual data protein_columns = [f"col_{i}" for i in range(len(prt_parts))] break
89-92
: Remove unnecessary empty finally block.The finally block serves no purpose here as file handles are already managed by the context manager.
except Exception as e: logging.getLogger("quantmsio.core.mztab").warning( f"Could not extract protein columns: {e}" ) -finally: - # File handle is automatically tracked and will be cleaned up - pass return protein_columns
194-197
: Simplify conditional by removing unnecessaryelif
.if len(parts) >= 3: return parts[2] # PROT_HUMAN -elif len(parts) >= 2: +if len(parts) >= 2: return parts[1] # P12345
283-289
: Remove unnecessaryelse
afterreturn
.def _convert_to_parquet_format(self, df: pd.DataFrame) -> pa.Table: """Convert DataFrame to parquet format using PG_SCHEMA.""" if df.empty: # For empty DataFrames, create an empty table with the correct schema return pa.Table.from_arrays( [pa.array([], type=field.type) for field in PG_SCHEMA], schema=PG_SCHEMA ) - else: - return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False) + return pa.Table.from_pandas(df, schema=PG_SCHEMA, preserve_index=False)
437-440
: Document the cleanup behavior after context manager exit.The cleanup() call happens after the MsstatsIN context manager has already exited. Consider adding a comment to clarify this is intentional.
# Context manager automatically cleans up DuckDB resources # Cleanup any temporary files created during processing +# Note: This cleanup is for MzTabProteinGroups resources, not MsstatsIN self.cleanup()
577-577
: Remove unnecessary return statement.Remove the empty return statement as Python functions return
None
by default.
589-589
: Fix slice formatting.-yield unique_files[i : i + batch_size] +yield unique_files[i:i + batch_size]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
quantmsio/commands/convert/quantms_project.py (1 hunks)
quantmsio/core/quantms/msstats_in.py (1 hunks)
quantmsio/core/quantms/mztab.py (1 hunks)
quantmsio/core/quantms/pg.py (1 hunks)
quantmsio/utils/pride_utils.py (4 hunks)
tests/test_msstats_in.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeFactor
tests/test_msstats_in.py
[notice] 187-231: tests/test_msstats_in.py#L187-L231
Complex Method
🪛 Ruff (0.11.9)
tests/test_msstats_in.py
88-88: f-string without any placeholders
Remove extraneous f prefix
(F541)
220-220: Loop control variable count not used within loop body
(B007)
268-268: Use key in dict instead of key in dict.keys()
Remove .keys()
(SIM118)
356-356: f-string without any placeholders
Remove extraneous f prefix
(F541)
428-428: f-string without any placeholders
Remove extraneous f prefix
(F541)
quantmsio/commands/convert/quantms_project.py
93-93: Local variable project_json is assigned to but never used
Remove assignment to unused variable project_json
(F841)
225-225: Local variable project_handler is assigned to but never used
Remove assignment to unused variable project_handler
(F841)
quantmsio/core/quantms/msstats_in.py
89-89: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
115-115: Local variable mapping_df is assigned to but never used
Remove assignment to unused variable mapping_df
(F841)
129-129: Local variable sample_df is assigned to but never used
Remove assignment to unused variable sample_df
(F841)
quantmsio/core/quantms/mztab.py
255-255: Use a context manager for opening files
(SIM115)
258-258: Use a context manager for opening files
(SIM115)
414-414: Use a context manager for opening files
(SIM115)
478-478: Use a context manager for opening files
(SIM115)
495-495: Use kwargs.get("usecols") instead of kwargs.get("usecols", None)
Replace kwargs.get("usecols", None) with kwargs.get("usecols")
(SIM910)
quantmsio/core/quantms/pg.py
243-243: Use a context manager for opening files
(SIM115)
248-248: Use a context manager for opening files
(SIM115)
525-525: Local variable protein_df is assigned to but never used
Remove assignment to unused variable protein_df
(F841)
quantmsio/utils/pride_utils.py
516-519: Use psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None) instead of an if block
Replace with psm["is_decoy"] = es.get("opt_global_cv_MS:1002217_decoy_peptide", None)
(SIM401)
528-531: Use psm["global_qvalue"] = es.get("opt_global_q-value", None) instead of an if block
Replace with psm["global_qvalue"] = es.get("opt_global_q-value", None)
(SIM401)
534-537: Use psm["consensus_support"] = es.get("opt_global_consensus_support", None) instead of an if block
Replace with psm["consensus_support"] = es.get("opt_global_consensus_support", None)
(SIM401)
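The SIM401 findings above all follow one pattern: a four-line membership-test conditional collapses to a single dict.get call. A minimal before/after sketch (the key and values are invented for illustration) shows the two forms are equivalent:

```python
es = {"opt_global_q-value": "0.01"}
psm = {}

# Before: explicit membership test with an else branch
if "opt_global_q-value" in es:
    psm["global_qvalue"] = es["opt_global_q-value"]
else:
    psm["global_qvalue"] = None

# After: dict.get returns the default (None) when the key is absent
psm_after = {"global_qvalue": es.get("opt_global_q-value", None)}
print(psm["global_qvalue"] == psm_after["global_qvalue"])  # True

# Missing key: both forms yield None
print({}.get("opt_global_consensus_support", None))  # None
```

Since None is already dict.get's default, the second argument can even be dropped, which is what the related SIM910 finding for kwargs.get("usecols", None) suggests.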
🪛 Flake8 (7.2.0)
tests/test_msstats_in.py
[error] 88-88: f-string is missing placeholders
(F541)
[error] 356-356: f-string is missing placeholders
(F541)
[error] 428-428: f-string is missing placeholders
(F541)
quantmsio/commands/convert/quantms_project.py
[error] 81-81: continuation line under-indented for visual indent
(E128)
[error] 93-93: local variable 'project_json' is assigned to but never used
(F841)
[error] 107-107: continuation line under-indented for visual indent
(E128)
[error] 136-136: continuation line under-indented for visual indent
(E128)
[error] 149-149: continuation line under-indented for visual indent
(E128)
[error] 172-172: continuation line under-indented for visual indent
(E128)
[error] 225-225: local variable 'project_handler' is assigned to but never used
(F841)
[error] 226-226: continuation line under-indented for visual indent
(E128)
[error] 233-233: continuation line under-indented for visual indent
(E128)
quantmsio/core/quantms/msstats_in.py
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
[error] 115-115: local variable 'mapping_df' is assigned to but never used
(F841)
[error] 129-129: local variable 'sample_df' is assigned to but never used
(F841)
[error] 229-229: whitespace before ':'
(E203)
[error] 547-547: whitespace before ':'
(E203)
quantmsio/core/quantms/mztab.py
[error] 775-775: whitespace before ':'
(E203)
quantmsio/core/quantms/pg.py
[error] 525-525: local variable 'protein_df' is assigned to but never used
(F841)
[error] 589-589: whitespace before ':'
(E203)
🪛 Pylint (3.3.7)
tests/test_msstats_in.py
[refactor] 187-187: Too many branches (15/12)
(R0912)
quantmsio/commands/convert/quantms_project.py
[refactor] 192-192: Too many arguments (6/5)
(R0913)
[refactor] 192-192: Too many positional arguments (6/5)
(R0917)
[refactor] 192-192: Too many local variables (16/15)
(R0914)
[refactor] 289-289: Too many arguments (6/5)
(R0913)
[refactor] 289-289: Too many positional arguments (6/5)
(R0917)
quantmsio/core/quantms/msstats_in.py
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
[error] 70-70: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 78-78: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 272-272: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[error] 281-281: Instance of 'MsstatsIN' has no 'logger' member
(E1101)
[refactor] 419-420: Use 'yield from' directly instead of yielding each element one by one
(R1737)
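The R1737 finding suggests delegating inner iteration with yield from instead of yielding each element one by one. A small sketch of the equivalence (generator names are illustrative):

```python
def yield_each(chunks):
    # Flagged style: yield every element of every chunk explicitly
    for chunk in chunks:
        for row in chunk:
            yield row


def yield_from_chunks(chunks):
    # Suggested style: delegate the inner iteration to the chunk itself
    for chunk in chunks:
        yield from chunk


chunks = [[1, 2], [3], [4, 5]]
print(list(yield_from_chunks(chunks)))  # [1, 2, 3, 4, 5]
```

Both produce the same sequence; yield from is shorter and also forwards send/throw correctly when the inner iterable is itself a generator.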
quantmsio/core/quantms/mztab.py
[refactor] 83-83: Too many instance attributes (22/7)
(R0902)
[refactor] 369-374: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 415-429: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 433-447: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 414-414: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 478-478: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 606-640: Too many nested blocks (7/5)
(R1702)
[refactor] 654-657: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 721-738: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
quantmsio/core/quantms/pg.py
[error] 75-75: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 76-84: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
[error] 112-112: Non-iterable value file is used in an iterating context
(E1133)
[refactor] 94-94: Too many branches (13/12)
(R0912)
[refactor] 194-197: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 283-289: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 291-291: Too many arguments (9/5)
(R0913)
[refactor] 291-291: Too many positional arguments (9/5)
(R0917)
[refactor] 291-291: Too many local variables (35/15)
(R0914)
[refactor] 291-291: Too many statements (55/50)
(R0915)
[refactor] 577-577: Useless return at end of function or method
(R1711)
[refactor] 633-633: Too many arguments (7/5)
(R0913)
[refactor] 633-633: Too many positional arguments (7/5)
(R0917)
[refactor] 633-633: Too many local variables (23/15)
(R0914)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: unit-tests
- GitHub Check: unit-tests
- GitHub Check: unit-tests (3.10)
- GitHub Check: unit-tests (3.11)
🔇 Additional comments (10)
quantmsio/commands/convert/quantms_project.py (1)
1-4
: LGTM! The module docstring is clear and concise.
quantmsio/utils/pride_utils.py (7)
46-46
: Good practice using raw strings for regex patterns! Using raw strings for regex patterns prevents potential issues with escape sequences and makes the regex patterns more readable.
Also applies to: 52-52
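To make the benefit concrete, a small illustrative example (the pattern and input strings are invented, not taken from pride_utils): with the r prefix, backslash sequences reach the regex engine exactly as written, whereas in a plain string Python interprets them first.

```python
import re

# Raw string: the regex engine receives \[ and \d exactly as written
pattern = r"\[score=(\d+)\]"
match = re.search(pattern, "peptide [score=42] mapped")
print(match.group(1))  # 42

# Without the raw prefix, every backslash must be doubled to survive
# Python's own escape processing; forgetting this with sequences like
# \b or \t silently changes the pattern, which raw strings avoid.
assert re.search("\\[score=(\\d+)\\]", "x [score=7]").group(1) == "7"
```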
482-511
: Excellent refactoring into focused helper functions! Breaking down the fetch_psm_from_mztab_line function into smaller, focused helper functions improves readability and maintainability. Each helper function has a clear single responsibility.
46-46
: Good fix: using raw strings for regex patterns. Converting regex patterns to raw strings prevents potential escaping issues and follows Python best practices for regex patterns.
Also applies to: 52-52
482-511
: Well-designed helper function with good key management. The function effectively handles optional keys by removing them from the required list when they don't exist in the data, preventing KeyError exceptions.
540-552
: Helper function provides clear separation of concerns. The peptidoform field logic is properly encapsulated and handles both explicit and derived peptidoform sequences correctly.
554-571
: Improved error handling and null safety. The function properly handles the case where ms_runs is None and provides clear error messages when ms_run is not found in the index.
573-600
: Excellent refactoring that improves maintainability. The decomposition of the monolithic function into focused helper functions significantly improves code readability and testability while maintaining the same functionality.
quantmsio/core/quantms/mztab.py (1)
83-819
: Well-designed mzTab parser with comprehensive features! The implementation provides excellent functionality including:
- Support for both regular and gzipped files
- DuckDB integration for performance
- Proper resource management with context managers
- Efficient section extraction and streaming
- Good error handling and logging
The class structure is complex but justified by the feature set.
quantmsio/core/quantms/pg.py (1)
622-622
: Verify file names are safe for SQL queries. Ensure that file_batch contains properly sanitized file names to prevent any potential SQL injection issues. The file names come from the msstats data, which should be trusted, but it's good practice to validate inputs used in SQL queries.
if generate_ibaq_view:
    _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
🛠️ Refactor suggestion
Add created IBAQ file to tracking list.
The IBAQ file generation doesn't add the created file to the tracking list, which means it won't be registered in the project metadata.
# Generate IBAQ view if requested
if generate_ibaq_view:
- _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
+ ibaq_file = _generate_ibaq_view(sdrf_file, feature_file, project_accession, output_folder_path)
+ if ibaq_file:
+ created_files.append(("ibaq-file", str(ibaq_file)))
Update _generate_ibaq_view to return the created file path:
-def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
- output_folder_path: Path):
+def _generate_ibaq_view(sdrf_file: Path, feature_file: Path, project_accession: str,
+ output_folder_path: Path) -> Optional[Path]:
"""Generate IBAQ view from feature data."""
print("\n=== Generating IBAQ View ===")
try:
ibaq_file = create_uuid_filename(project_accession, ".ibaq.parquet")
ibaq_path = output_folder_path / ibaq_file
write_ibaq_feature(str(sdrf_file), str(feature_file), str(ibaq_path))
print("IBAQ view generation completed successfully")
+ return ibaq_path
except Exception as e:
print(f"ERROR: IBAQ view generation failed: {str(e)}", file=sys.stderr)
+ return None
🤖 Prompt for AI Agents
In quantmsio/commands/convert/quantms_project.py around lines 127 to 128, the
code calls _generate_ibaq_view but does not add the created IBAQ file to the
tracking list, so it is not registered in the project metadata. Modify
_generate_ibaq_view to return the path of the created IBAQ file, then capture
this return value here and append it to the tracking list that manages project
files to ensure proper registration.
|
self._available_columns = available_columns

except Exception as e:
🛠️ Refactor suggestion
Remove unused variable assignments
These variables are assigned but never used.
- except Exception as e:
+ except Exception:
# Default column set based on common msstats format
self._available_columns = {
# Insert into DuckDB using pandas DataFrame
mapping_df = pd.DataFrame(mapping_data)
- self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ self._duckdb.register("mapping_df", mapping_df)
+ self._duckdb.execute("CREATE TABLE channel_mapping AS SELECT * FROM mapping_df")
+ self._duckdb.unregister("mapping_df")
# Insert into DuckDB using pandas DataFrame
sample_df = pd.DataFrame(sample_data)
- self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ self._duckdb.register("sample_df", sample_df)
+ self._duckdb.execute("CREATE TABLE sample_mapping AS SELECT * FROM sample_df")
+ self._duckdb.unregister("sample_df")
Also applies to: 115-115, 129-129
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 89-89: local variable 'e' is assigned to but never used
(F841)
🪛 Ruff (0.11.9)
89-89: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py at lines 89, 115, and 129, the exception
handlers assign the caught exception to a variable (e.g., 'except Exception as
e:') but never use this variable. Remove the variable assignment by changing
these lines to 'except Exception:' to avoid unused variable warnings and clean
up the code.
class MsstatsIN(DuckDB):
    def __init__(
        self,
        report_path: Union[Path, str],
        sdrf_path: Union[Path, str],
        duckdb_max_memory="16GB",
        duckdb_threads=4,
    ):
        super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
        self._sdrf = SDRFHandler(sdrf_path)
        self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
        self._sample_map = self._sdrf.get_sample_map_run()
        self._optimized_setup_done = False
Add missing logger attribute to the class
The class uses self.logger in multiple places but doesn't initialize it. This will cause an AttributeError at runtime.
class MsstatsIN(DuckDB):
def __init__(
self,
report_path: Union[Path, str],
sdrf_path: Union[Path, str],
duckdb_max_memory="16GB",
duckdb_threads=4,
):
super(MsstatsIN, self).__init__(report_path, duckdb_max_memory, duckdb_threads)
+ self.logger = logging.getLogger(__name__)
self._sdrf = SDRFHandler(sdrf_path)
self.experiment_type = self._sdrf.get_experiment_type_from_sdrf()
self._sample_map = self._sdrf.get_sample_map_run()
self._optimized_setup_done = False
Don't forget to import logging at the top of the file:
import logging
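A minimal sketch of the suggested fix, with the surrounding class reduced to a stub for illustration (the stub class and its transform method are hypothetical, not the project's API):

```python
import logging


class MsstatsInStub:
    """Stub mirroring the suggested fix: create the logger in __init__."""

    def __init__(self):
        # A module-scoped logger; with logging.getLogger(__name__) the
        # logger name follows the module the class lives in.
        self.logger = logging.getLogger(__name__)

    def transform(self):
        # self.logger now exists, so Pylint's E1101 no longer applies
        self.logger.debug("transforming msstats input")
        return True


obj = MsstatsInStub()
print(obj.transform())  # True
```

Because getLogger returns the same logger object for the same name, creating it per instance is cheap and keeps the class's log records grouped under the module's logger hierarchy.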
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 22-22: Consider using Python 3 style super() without arguments
(R1725)
🤖 Prompt for AI Agents
In quantmsio/core/quantms/msstats_in.py around lines 14 to 27, the MsstatsIN
class uses self.logger but does not initialize it, which will cause an
AttributeError at runtime. To fix this, import the logging module at the top of
the file and add a logger attribute initialization in the __init__ method, for
example by setting self.logger = logging.getLogger(__name__).
PR Type
Enhancement
Description
Major CLI restructuring with organized command groups
Enhanced logging system with structured output support
New quantms workflow conversion command
Improved error handling and verbose logging
Changes walkthrough 📝
28 files
Add quantms workflow conversion command
Add DIA-NN conversion commands
Add PSM conversion and comparison commands
Add feature conversion command
Add visualization plotting commands
Restructure CLI with organized command groups
Add differential expression transformation command
Add statistical analysis commands
Update MaxQuant conversion commands
Add enhanced logging to PSM processing
Add enhanced logging to feature processing
Add project management commands
Add absolute expression transformation command
Update schema with nullable fields
Update FragPipe PSM conversion method
Add gene mapping transformation command
Update project save methods
Update FragPipe conversion command
Add UniProt mapping transformation command
Add IBAQ transformation command
Add spectra transformation command
Update AnnData merge command
Update file attachment command
Add logging to MzTab class
Add logger utility module
Add convert commands package
Add transform commands package
Add utils commands package
1 file
Fix regex patterns with raw strings
7 files
Update feature schema with scan format
Major documentation update with usage examples
Update PSM schema with record types
Add comprehensive logging documentation
Add differential expression schema
Update IBAQ use case documentation
Add absolute expression schema
2 files
Update DuckDB version requirement
Update Python and PyOpenMS version requirements
13 files
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests
Chores
Updated .gitignore to exclude additional temporary and generated files.