From cef9a99c9d3a84bd70fb358cf919be5d687c0fd2 Mon Sep 17 00:00:00 2001 From: anjag01 Date: Fri, 9 May 2025 10:38:45 +0200 Subject: [PATCH 1/3] Update file-handling.md added subsection compression of mzML files --- .../types-of-topp-tools/file-handling.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/getting-started/types-of-topp-tools/file-handling.md b/docs/getting-started/types-of-topp-tools/file-handling.md index 83324433..887b357f 100644 --- a/docs/getting-started/types-of-topp-tools/file-handling.md +++ b/docs/getting-started/types-of-topp-tools/file-handling.md @@ -33,6 +33,21 @@ The TOPP tools work only on the HUPO-PSI `mzML` format. If you need to convert * For format names as file extension, the tool derives the format from the extension. For other extensions, the file formats of the input and output file can be given explicitly. +## Compression of mzML files + +TOPP tools now support writing compressed .mzML.gz files for efficient storage. For example, PeakPickerHiRes can output compressed files: + +`PeakPickerHiRes -in input.mzML -out output.mzML.gz -threads 8` + +Compression uses pigz (parallel gzip) if installed for faster performance, falling back to Boost's gzip otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed. + +Trade-offs: + +Efficiency: .mzML.gz files are 2-3x smaller; pigz is significantly faster but CPU-intensive. +Compatibility: Ensure downstream tools support .mzML.gz. + +This feature, integrated into MzMLHandler::writeTo, supports indexed mzML and enhances data management. 
+ ## Converting between DTA and mzML Sequest DTA files can be extracted from a mzML file using the `DTAExtractor`: @@ -119,4 +134,4 @@ It can currently write the following formats: This example shows how to convert pepXML to idXML: -`IDFileConverter -in infile.pepXML -out outfile.idXML` \ No newline at end of file +`IDFileConverter -in infile.pepXML -out outfile.idXML` From d3936707918fff7eb29f99eef26a6077c520173d Mon Sep 17 00:00:00 2001 From: anjag01 Date: Thu, 22 May 2025 16:42:34 +0200 Subject: [PATCH 2/3] Update file-handling.md added a subsection for the compression of mzML files --- docs/getting-started/types-of-topp-tools/file-handling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/getting-started/types-of-topp-tools/file-handling.md b/docs/getting-started/types-of-topp-tools/file-handling.md index 887b357f..c94b32ca 100644 --- a/docs/getting-started/types-of-topp-tools/file-handling.md +++ b/docs/getting-started/types-of-topp-tools/file-handling.md @@ -39,14 +39,14 @@ TOPP tools now support writing compressed .mzML.gz files for efficient storage. `PeakPickerHiRes -in input.mzML -out output.mzML.gz -threads 8` -Compression uses pigz (parallel gzip) if installed for faster performance, falling back to Boost's gzip otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed. +Compression uses pigz (parallel gzip) if installed for faster performance, falling back to OpenMS's internal compression mechanism otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed. Trade-offs: Efficiency: .mzML.gz files are 2-3x smaller; pigz is significantly faster but CPU-intensive. Compatibility: Ensure downstream tools support .mzML.gz. 
-This feature, integrated into MzMLHandler::writeTo, supports indexed mzML and enhances data management. +This feature supports indexed mzML and enhances data management. ## Converting between DTA and mzML From d935be5dfc7a329eb6ac530acf840060659000cc Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Thu, 3 Jul 2025 12:04:38 +0200 Subject: [PATCH 3/3] add section on writing mzML.gz files --- .../types-of-topp-tools/file-handling.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/getting-started/types-of-topp-tools/file-handling.md b/docs/getting-started/types-of-topp-tools/file-handling.md index c94b32ca..5272ff00 100644 --- a/docs/getting-started/types-of-topp-tools/file-handling.md +++ b/docs/getting-started/types-of-topp-tools/file-handling.md @@ -35,18 +35,24 @@ formats of the input and output file can be given explicitly. ## Compression of mzML files -TOPP tools now support writing compressed .mzML.gz files for efficient storage. For example, PeakPickerHiRes can output compressed files: + +OpenMS has supported **reading** of compressed mzML, mzXML, and mzData for a long time. + +Since OpenMS 3.5, TOPP tools that produce mzML output files also support **writing** gzip-compressed `.mzML.gz` files. +To enable compression, use `.mzML.gz` instead of `.mzML` as the extension of the output filename. + +For example, PeakPickerHiRes can output compressed files like this: `PeakPickerHiRes -in input.mzML -out output.mzML.gz -threads 8` -Compression uses pigz (parallel gzip) if installed for faster performance, falling back to OpenMS's internal compression mechanism otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed. +Compression uses the `pigz` (parallel gzip) tool, if installed, or falls back to OpenMS's internal compression mechanism otherwise. 
`pigz` offers faster compression speed, even if only using one thread. The number of threads used for compression is determined by the usual `-threads` flag of the TOPP tool. +Without `pigz`, the internal gzip compressor is used, which supports only a single thread, irrespective of the value given to `-threads`. + -Trade-offs: +Compression efficiency: `.mzML.gz` files are typically 2-3x smaller than the uncompressed `.mzML`. +Compression speed: `pigz` is significantly faster than the internal compression. Install `pigz` if possible (it is available via the usual package managers). -Efficiency: .mzML.gz files are 2-3x smaller; pigz is significantly faster but CPU-intensive. -Compatibility: Ensure downstream tools support .mzML.gz. -This feature supports indexed mzML and enhances data management. ## Converting between DTA and mzML
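
Note on the compression section added in PATCH 3/3: the fallback it describes (use `pigz` if installed, otherwise the internal single-threaded compressor) can be checked from the shell before starting a long run. The snippet below is only a sketch. `compression_backend` is a hypothetical helper, not an OpenMS command, and it assumes that OpenMS finds `pigz` by looking it up on the `PATH`, which the patch does not state.

```shell
# Hypothetical helper (not part of OpenMS): guess which compressor a
# .mzML.gz write would use, assuming pigz is looked up on the PATH.
compression_backend() {
  if command -v pigz >/dev/null 2>&1; then
    # pigz is installed: compression can honor the -threads value
    echo "pigz"
  else
    # no pigz: the internal gzip compressor runs single-threaded
    echo "internal"
  fi
}

backend="$(compression_backend)"
echo "expected compressor: ${backend}"
```

After a tool has written a file such as `output.mzML.gz`, running `gzip -t output.mzML.gz` tests the integrity of the archive without unpacking it.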