Releases: MikkelSchubert/adapterremoval

AdapterRemoval v2.3.4

24 Aug 15:25

This release adds a couple of new command-line options for handling non-ACGTN
bases in FASTQ data and back-ports a few minor fixes from the development
branch.

Added

  • Added support for converting Uracils (U) in input data to Thymine (T) via the
    --convert-uracils flag.
  • Added support for replacing IUPAC-encoded degenerate bases with Ns via the
    --mask-degenerate-bases flag.
  • Added DESTDIR support to make install.
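
The effect of the two new base-handling flags can be illustrated with a minimal Python sketch. This is not AdapterRemoval's implementation: the function names are hypothetical, input is assumed to be upper-case, and the set of degenerate codes is the standard IUPAC list (N itself is left unchanged).

```python
# IUPAC codes that stand for more than one nucleotide, excluding N.
DEGENERATE_BASES = "RYSWKMBDHV"

# Translation table mapping every degenerate code to N.
_MASK_TABLE = str.maketrans(DEGENERATE_BASES, "N" * len(DEGENERATE_BASES))


def convert_uracils(sequence):
    """Hypothetical stand-in for --convert-uracils: replace U with T."""
    return sequence.replace("U", "T")


def mask_degenerate_bases(sequence):
    """Hypothetical stand-in for --mask-degenerate-bases: replace codes with N."""
    return sequence.translate(_MASK_TABLE)


print(convert_uracils("ACGUUA"))
print(mask_degenerate_bases("ACRYGTN"))
```

Either way, the result contains only A, C, G, T, and N, which is what the rest of the program expects.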

Fixed

  • Improved progress timer accuracy, so updates occur closer to every 1M reads.

Changed

  • Minor improvements to --help text and documentation.

AdapterRemoval v3.0.0-alpha2

20 Aug 14:09
Pre-release

This is the second alpha release of AdapterRemoval v3. It is the intention that
a third alpha release, or the final 3.0 release, will follow within the next
couple of months.

As with alpha 1, changes that affect how AdapterRemoval is used (e.g. by
removing options) or that result in different output compared to AdapterRemoval
v2 are marked with the label "[BREAKING]".

In addition to changes listed below, this release includes increased throughput
thanks to improved parallelization of various steps in the internal pipeline,
support for AVX512 and general improvements to the SIMD alignment algorithms,
loop unrolling of non-SIMD alignments to significantly increase throughput when
SIMD is not available, and a significant decrease in the number of allocations
to decrease overhead.

This release requires a compiler with support for C++17, and libdeflate is now a
mandatory dependency.

Draft documentation is available here and a pre-compiled binary for x86-64
Linux systems is attached below.

Added

  • Added support for converting (U)racils in input data to T(hymine) via the
    --convert-uracils flag.
  • Added support for replacing IUPAC-encoded degenerate bases with Ns via the
    --mask-degenerate-bases flag.
  • Added support for writing output in SAM/BAM formats, with optional
    user-supplied read-group information.
  • Added support for alignments using AVX512 instructions. AVX512 support is
    only available when AdapterRemoval is compiled with GCC v11+ or Clang v8+.
  • Added support for selecting output file formats via the file extension and
    via the --out-format option. A corresponding option, --stdout-format, was
    added to select the format for data written to STDOUT.
  • Added support for reading from STDIN or writing to STDOUT when '-' is used as
    the filename, as an alternative to using /dev/stdin or /dev/stdout.
  • Added dedicated threads solely for writing output data. This allows compute
    threads to work at full capacity, as long as the destination can consume
    written data fast enough. This may result in CPU utilization exceeding
    --threads by a couple of percent.
  • Added support for setting DESTDIR when running make install.
  • Added --licenses flag for displaying licenses of 3rd party code used by /
    incorporated into AdapterRemoval.
  • Added --simd option allowing the user to select the specific SIMD
    instruction set they wish to use.
  • Added Containerfile for building static binaries using alpine/musl.

Changed

  • [BREAKING] Changed the default --mm/--mismatch-rate from 1/3 to 1/6,
    in order to decrease the false positive rate, in particular for read merging.
  • [BREAKING] Default to writing gzip-compressed FASTQ files; output written
    to STDOUT is uncompressed by default.
  • [BREAKING] Discarded reads are no longer saved by default.
  • [BREAKING] Output files for discarded reads and singleton (orphan)
    paired-end reads are only created if filtering is enabled.
  • [BREAKING] The --basename / --out-prefix no longer defaults to
    your_output. Instead the user is required to set at least one --out-*
    option.
  • [BREAKING] Merged --identify-adapters and --report-only commands. The
    adapter sequence is presently only reported in the HTML report, but will be
    added to the JSON report following some planned changes.
  • [BREAKING] Reverted --min-complexity being enabled by default.
  • Increased the default --threads value to 2.
  • A number of command-line options were renamed for consistency; use of the old
    names is still supported, but will trigger a warning message.
  • Re-organized compression: level 1 is streamed using isa-l, while levels 2-13
    correspond to libdeflate levels 1 to 12.
  • Changed the default compression level to 5 on the new scale (libdeflate level
    4); this results in a ~40% increase in throughput at the cost of roughly ~3%
    larger output files.
  • Setting an --out-* option in demultiplexing mode overrides the basename /
    prefix for that specific output type.
  • Add smoothing to GC values calculated for the GC content curve, to account
    for the fact that possible GC% values are unevenly distributed depending on
    the read length.
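
The re-organized compression scale described above can be sketched as a small lookup. This is only an illustration of the stated mapping: the function name is hypothetical, and rejecting values outside 1-13 is a simplification of this sketch, since the notes do not say how other values are handled.

```python
def backend_for_level(level):
    """Map a level on the new scale to (library, library level).

    Level 1 is streamed with isa-l (the isa-l level used is not stated in
    the release notes, so None is returned for it); levels 2-13 map to
    libdeflate levels 1-12.
    """
    if level == 1:
        return ("isa-l", None)
    if 2 <= level <= 13:
        return ("libdeflate", level - 1)
    # Assumption of this sketch: anything else is treated as out of range.
    raise ValueError(f"level {level} is outside the 1-13 range covered here")


print(backend_for_level(5))
```

The new default of 5 therefore corresponds to libdeflate level 4, as stated in the notes.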

Removed

The following changes are all [BREAKING] as described above:

  • Support for the original merging algorithm has been removed. The
    --merge-strategy additive method produces very similar, but slightly more
    conservative scores.
  • Removed the ability to randomly sample a base if no best base could be
    selected in case of mismatches. Such bases are now changed to N, while both
    methods assign a Phred score of 0 (!).

AdapterRemoval v3.0.0-alpha1

07 Nov 19:07
Pre-release

This is the first alpha release of AdapterRemoval v3. This is a major revision
of AdapterRemoval, with the goals of simplifying usage by picking a sensible set of
default settings, adding new features to handle a wider range of data, providing
human/machine readable reports, and improving overall throughput.

This release features a number of breaking changes compared to AdapterRemoval v2
and it is therefore recommended that you carefully read the list of changes
below. Changes that affect how AdapterRemoval is used (e.g. by removing options)
or that result in different output compared to AdapterRemoval v2 are marked with
the label "[BREAKING]".

This is an alpha release; not all planned features are complete (more QC reports
are planned among other things), additional optimizations will be attempted, and
documentation still needs to be expanded further before the final release.
Feedback is very welcome in the meantime.

Draft documentation is available here and a pre-compiled binary for x86-64 Linux systems is attached below.

Added

  • Reports are now available in JSON format for easy parsing and in HTML format
    for human consumption. These replace the old --settings file.
  • AVX2 enabled alignment algorithm for a significant performance boost (YMMV).
  • Added support for detecting supported CPU extensions (SSE/AVX) at runtime.
  • Support for combining output simply by specifying the same filename for
    multiple output types, e.g. --output1 file.fq --output2 file.fq will
    produce interleaved output.
  • Added handling for /dev/null as a "magic" output filename. Read-types
    writing to this exact path will be discarded early in the pipeline, saving
    time previously spent processing, compressing, and writing FASTQ reads.
  • Added read complexity filter inspired by [fastp].
  • Added the ability to only process the first N reads/read pairs via the
    newly added --head N command-line option.
  • Added estimation of duplication rates based on the [FastQC] algorithm.
  • Automatic detection of mate separators based on the first chunk of reads
    processed. The --mate-separator is therefore only required in cases where
    the results are ambiguous.
  • Automatic gzip compression of output files with a .gz extension. This makes
    it possible to compress only a subset of files and removes the need for the
    --gzip option when manually specifying output files.
  • Added options --prefix-read1, --prefix-read2, and --prefix-merged for
    adding custom prefixes to the names of FASTQ reads.

Changed

  • [BREAKING] Default adapters have been changed to the [recommended Illumina
    sequences], equivalent to the first 33 bp of the adapter sequences used by
    AdapterRemoval v2. This makes the default settings more generally applicable.
  • [BREAKING] The trimming options --trimwindows, --trimns,
    --trimqualities, and --minquality have been deprecated in favor of the
    modified Mott's algorithm, which is enabled by default. The trimming
    algorithm used may be changed using the new --trim-strategy option.
  • [BREAKING] Merging now defaults to using the conservative algorithm,
    meaning that matching quality scores are assigned Q_match = max(Q_a, Q_b)
    instead of Q_match ~= Q_a + Q_b, and that same-quality mismatches are
    assigned 'N' instead of one being picked at random. Motivated in part by
    doi:10.1186/s12859-018-2579-2. This can be changed using --merge-strategy.
  • The --merge option no longer has any effect when processing SE data;
    previously this option would treat reads with at least --minalignmentlength
    adapter as pseudo-merged reads.
  • [BREAKING] Merged reads are no longer given a M_ name prefix and merged
    reads that have been trimmed after merging are no longer given an MT_ name
    prefix. Instead, see the new option --prefix-merged.
  • [BREAKING] Default filenames have all been revised and now include proper
    extensions to indicate the format.
  • [BREAKING] The executable is now named adapterremoval3. This was done to
    allow v3 to coexist with AdapterRemoval v2 and to prevent accidental use of
    the wrong version.
  • [BREAKING] Changed the default --maxns value from 1000 to "infinite".
  • --gzip now defaults to compressing independent blocks of 64kb data using
    libdeflate. This significantly improves throughput in both single- and
    (especially) multi-threaded mode, but may be incompatible with a few programs.
    Compression levels of 3 and below use isa-l for compression and provide a
    more universally compatible output.
  • The term "merging" is now used consistently instead of "collapsing", including
    for default output filenames. Options have been renamed, but old option names
    continue to work (except for --outputcollapsedtruncated).
  • Improvements to alignment algorithm in order to terminate early if possible.
  • Logging is now done more consistently and exposes options to increase or
    decrease the amount of messages printed (debug, info, warning, errors).
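
The per-base rule of the conservative merging strategy described above can be sketched as follows. The function name is hypothetical and this is not the program's actual code. The handling of matches and of same-quality mismatches follows the notes; the handling of mismatches with different qualities (keep the higher-quality base, score it with the difference) is an assumption modeled on FASTQ-join-style merging and is not stated in these notes.

```python
def merge_base(base_a, qual_a, base_b, qual_b):
    """Merge one overlapping position; qualities are Phred scores (ints)."""
    if base_a == base_b:
        # Stated in the notes: Q_match = max(Q_a, Q_b).
        return base_a, max(qual_a, qual_b)
    if qual_a == qual_b:
        # Stated in the notes: same-quality mismatches become 'N';
        # a Phred score of 0 is assigned to such bases.
        return "N", 0
    # ASSUMPTION (not from the notes): keep the better-supported base and
    # use the difference between the two scores as its quality.
    if qual_a > qual_b:
        return base_a, qual_a - qual_b
    return base_b, qual_b - qual_a


print(merge_base("A", 30, "A", 20))
print(merge_base("A", 30, "C", 30))
```

Because no base is picked at random, repeated runs on the same input give the same merged sequence.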

Removed

The following changes are all [BREAKING] as described above:

  • The --outputcollapsedtruncated option has been removed and all merged reads
    (whether quality trimmed or not) are simply written to --outputmerged.
  • The --qualitybase-output option has been removed. Output is now always Phred+33.
  • The --combined-output option has been removed in favor of allowing arbitrary
    merging of output files (see above).
  • The --settings option has been replaced by --out-json and --out-html for
    machine and human readable reports, respectively.
  • Removed support for guessing the intended command-line argument based on
    prefixes. I.e. --th will no longer be accepted for --threads. Due to the
    number of options added, removed, and renamed, this is no longer reliable.
  • The deprecated --pcr1 and --pcr2 options have been removed.
  • Dropped undocumented support for '.' as equivalent to 'N' in FASTQ reads.
  • Support for reading and writing of bzip2 files has been removed.

AdapterRemoval v2.3.3

15 Apr 09:33
  • Updated Catch2 to fix compilation with glibc 2.34, courtesy of loganrosen.

AdapterRemoval v2.3.2

17 Mar 11:26
  • Improved error messages when AdapterRemoval failed to open or write FASTQ
    files (issue #42).
  • Fixed build on some architectures. Patch courtesy of Andreas Tille/the Debian
    build team.
  • Fixed display of max Phred scores in FASTQ validation error messages.
  • Removed benchmarking scripts which were included in the repo for the sake of
    making Schubert et al. 2016 reproducible. This is no longer relevant.
  • Use 'install' in the Makefile; patch courtesy of Eric DEVEAUD.
  • Added --collapse-deterministic to .settings file.
  • Fixed --minadapteroverlap being misapplied in PE mode.
  • Added --collapse-conservatively merge algorithm based on FASTQ-join. See
    the man-page for more information.

AdapterRemoval v2.3.1

12 Oct 20:08
  • Added --preserve5p option. This option prevents AdapterRemoval from trimming
    the 5p end of reads when the --trimqualities, --trimns, and --trimwindows
    options are used. Neither end of collapsed reads is trimmed when this
    option is used.
  • Fixed Ns being miscounted as As when constructing consensus adapter sequences
    using --identify-adapters.

AdapterRemoval v2.3.0

12 Mar 17:06
  • Fixed --collapse producing slightly different results on 32 bit and 64 bit
    architectures. Courtesy of Andreas Tille.
  • Added support for output files without a basename; to create such output
    files, use an empty basename (--basename "") or a basename ending with a
    slash (--basename path/).
  • Added support for managing file handles to allow AdapterRemoval to run
    when the number of output files exceeds the number of file handles, e.g.
    when demultiplexing large numbers of samples.
  • Reworked demultiplexing to improve performance for many paired barcodes.

AdapterRemoval v2.2.4

10 Feb 16:56
  • Fixed bug in --trim5p N which would cause AdapterRemoval to abort if N was greater
    than the pre-trimmed read length.
  • Fixed --identify-adapters not respecting the --mate-separator option.

AdapterRemoval v2.2.3

22 Jan 21:41
  • Added support for trimming reads by a fixed amount: --trim5p N --trim3p N.
    Different values may be given for each mate: --trim5p N1 N2. Trimming is
    carried out after adapters have been removed and reads have been collapsed,
    if enabled, but before quality trimming (Ns and low qualities).
  • Added option for deterministic read merging (--collapse-deterministic). In
    this mode AdapterRemoval will set a merged base to 'N' with quality 0 if
    the corresponding bases on the two mates differ, and if both have the same
    quality score. The default behavior is to select one of the two bases at
    random.
  • Fixed reporting of line numbers in error messages.
  • Added conda installation instructions, courtesy of Maxime Borry (maxibor).
  • Fixed reading mate 2 adapters specified via --adapter-list. Adapters would
    be used in the reverse orientation compared to --adapter2. Courtesy of
    Karolis (KarolisM).
  • Fixed various typos and improved help/error messages.
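
Fixed-amount trimming as introduced by --trim5p and --trim3p can be sketched in a few lines. The function name is hypothetical and this is only an illustration. Clamping to an empty read when the requested amount exceeds the read length is an assumption of this sketch, not documented behavior.

```python
def trim_fixed(sequence, qualities, trim5p=0, trim3p=0):
    """Remove a fixed number of bases from the 5' and 3' ends of a read."""
    # Assumption: never trim past the end of the read; an over-long trim
    # simply yields an empty read instead of an error.
    start = min(trim5p, len(sequence))
    end = max(start, len(sequence) - trim3p)
    # Bases and quality scores must be trimmed identically.
    return sequence[start:end], qualities[start:end]


print(trim_fixed("ACGTACGT", "IIIIHHHH", trim5p=2, trim3p=3))
```

Per the notes, this step would run after adapter removal and collapsing, but before trimming of Ns and low-quality bases.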

AdapterRemoval v2.2.2

17 Jul 14:28
  • Made gzip and bzip2 support mandatory.
  • Added support for Intel compilers, courtesy of Kevin Murray (kdmurray91).