Recentrifuge command line

Jose Manuel Martí edited this page Mar 27, 2024 · 10 revisions

Command layout

The layout of the Recentrifuge (rcf) command (ver. 1.14.0) is:

usage: rcf [-h] [-V] [-n PATH] [--format GENERIC_FORMAT]
           (-f FILE | -g FILE | -l FILE | -r FILE | -k FILE) [-o FILE]
           [-e OUTPUT_TYPE] [-p] [--nohtml] [-a | -c CONTROLS_NUMBER]
           [-s SCORING] [-y NUMBER] [-m INT] [-x TAXID] [-i TAXID]
           [-z NUMBER] [-w INT] [-u SUMMARY_BEHAVIOR] [-t]
           [--nokollapse] [-d] [--strain] [--sequential]
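
As a rough illustration of the layout above, the following Python sketch assembles an rcf argument list that respects the mutually exclusive input group (exactly one of -f, -g, -l, -r, -k, which may be repeated). The helper name `build_rcf_args`, the classifier keys, and the file names are hypothetical and are not part of Recentrifuge; only flags listed in the usage text are emitted.

```python
import shlex

# Input flags copied from the usage text; one kind of input per run.
INPUT_FLAGS = {
    "centrifuge": "-f",
    "generic": "-g",
    "lmat": "-l",
    "clark": "-r",
    "kraken": "-k",
}


def build_rcf_args(classifier, samples, nodes_path=None, outprefix=None,
                   generic_format=None):
    """Assemble an argv list for rcf (hypothetical helper, not rcf code)."""
    if classifier not in INPUT_FLAGS:
        raise ValueError(f"unknown classifier: {classifier!r}")
    if not samples:
        raise ValueError("at least one input sample is required")
    if classifier == "generic" and not generic_format:
        # The help text states that -g requires --format.
        raise ValueError("-g requires --format")

    args = ["rcf"]
    if nodes_path:
        args += ["-n", nodes_path]
    if classifier == "generic":
        args += ["--format", generic_format]
    for sample in samples:
        # The same input flag is repeated once per sample.
        args += [INPUT_FLAGS[classifier], sample]
    if outprefix:
        args += ["-o", outprefix]
    return args


# Hypothetical file and directory names, for illustration only.
cmd = build_rcf_args("kraken", ["s1.krk", "s2.krk"],
                     nodes_path="taxdump", outprefix="run1")
print(shlex.join(cmd))  # rcf -n taxdump -k s1.krk -k s2.krk -o run1
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a system where rcf is installed.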

Groups of options and flags

Input

Define Recentrifuge input files and formats

  -n PATH, --nodespath PATH
                        path for the nodes information files (nodes.dmp and
                        names.dmp from NCBI)
  --format GENERIC_FORMAT
                        format of the output files from a generic classifier
                        included with the option -g. It is a string like
                        "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" where valid file
                        TYPes are csv/tsv/ssv, and the remaining fields give
                        the column number (starting at 1) of the TaxID
                        assigned, the LENgth of the read, and the SCOre
                        given to the assignment, plus the taxid code used for
                        UNClassified reads
  -f FILE, --file FILE  Centrifuge output files; if a single directory is
                        entered, every .out file inside will be taken as a
                        different sample; multiple -f is available to include
                        several Centrifuge samples
  -g FILE, --generic FILE
                        output file from a generic classifier; it requires the
                        flag --format (see that option for details); multiple
                        -g is available to include several generic samples
  -l FILE, --lmat FILE  LMAT output dir or file prefix; if just "." is
                        entered, every subdirectory under the current
                        directory will be taken as a sample and scanned
                        looking for LMAT output files; multiple -l is
                        available to include several samples.
  -r FILE, --clark FILE
                        CLARK full-mode output files; if a single directory is
                        entered, every .csv file inside will be taken as a
                        different sample; multiple -r is available to include
                        several CLARK, CLARK-l, and CLARK-S full-mode samples.
  -k FILE, --kraken FILE
                        Kraken output files; if a single directory is entered,
                        every .krk file inside will be taken as a different
                        sample; multiple -k is available to include several
                        Kraken (version 1 or 2) samples.
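
To make the --format string more concrete, here is a minimal Python sketch that splits such a string into its fields and uses the 1-based column numbers to pick values out of an already split row. This is not Recentrifuge's own parser: the function names are hypothetical, and treating all five fields as mandatory is an assumption made for the example.

```python
VALID_TYPES = {"csv", "tsv", "ssv"}
COLUMN_KEYS = ("TID", "LEN", "SCO")


def parse_generic_format(spec):
    """Parse a string like "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" into a dict."""
    fields = {}
    for item in spec.split(","):
        key, sep, value = item.strip().partition(":")
        if not sep or not value:
            raise ValueError(f"malformed field: {item!r}")
        fields[key.upper()] = value

    # Assumption for this sketch: every field must be present.
    missing = {"TYP", "UNC", *COLUMN_KEYS} - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if fields["TYP"] not in VALID_TYPES:
        raise ValueError(f"invalid file type: {fields['TYP']!r}")

    parsed = {"TYP": fields["TYP"], "UNC": fields["UNC"]}
    for key in COLUMN_KEYS:
        column = int(fields[key])
        if column < 1:
            raise ValueError(f"{key} column numbers start at 1")
        parsed[key] = column
    return parsed


def extract_fields(row, fmt):
    """Pick taxid, length and score from a split row using 1-based columns."""
    return {key: row[fmt[key] - 1] for key in COLUMN_KEYS}


fmt = parse_generic_format("TYP:csv,TID:1,LEN:3,SCO:6,UNC:0")
# Made-up row: taxid in column 1, read length in column 3, score in column 6.
row = ["562", "read_1", "150", "a", "b", "0.98"]
print(extract_fields(row, fmt))
```

Note that the UNC value is kept as a string here, since it is a taxid code rather than a column number.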

Output

Related to the Recentrifuge output files

  -o FILE, --outprefix FILE
                        output prefix; if not given, it will be inferred from
                        input files; an HTML filename is still accepted for
                        backwards compatibility with legacy --outhtml option
  -e OUTPUT_TYPE, --extra OUTPUT_TYPE
                        type of extra output to be generated, and can be one
                        of ['FULL', 'CSV', 'MULTICSV', 'TSV', 'DYNOMICS']
  -p, --pickle          pickle (serialize) statistics and data results in
                        pandas DataFrames (format affected by selection of
                        --extra)
  --nohtml              suppress saving the HTML output file

Tuning

Coarse tuning of algorithm parameters

  -a, --avoidcross      avoid cross analysis
  -c CONTROLS_NUMBER, --controls CONTROLS_NUMBER
                        this number of first samples will be treated as
                        negative controls; default is no controls
  -s SCORING, --scoring SCORING
                        type of scoring to be applied, and can be one of
                        ['SHEL', 'LENGTH', 'LOGLENGTH', 'NORMA', 'LMAT',
                        'CLARK_C', 'CLARK_G', 'KRAKEN', 'GENERIC']
  -y NUMBER, --minscore NUMBER
                        minimum score/confidence of the classification of a
                        read to pass the quality filter; all pass by default
  -m INT, --mintaxa INT
                        minimum taxa to avoid collapsing one level into the
                        parent (if not specified a value will be automatically
                        assigned)
  -x TAXID, --exclude TAXID
                        NCBI taxid code to exclude a taxon and all underneath
                        (multiple -x is available to exclude several taxids)
  -i TAXID, --include TAXID
                        NCBI taxid code to include a taxon and all underneath
                        (multiple -i is available to include several taxids);
                        by default, all the taxa are considered for inclusion
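
Because -c treats the first N samples as negative controls, the order in which samples are passed matters. The sketch below, a hedged illustration rather than anything shipped with Recentrifuge, places the controls first and derives the matching -c value, then appends repeated -x/-i options and an optional -y threshold. The helper name `tuning_args` and the file names are hypothetical, and Centrifuge input (-f) is assumed only for the sake of the example.

```python
def tuning_args(controls, samples, exclude=(), include=(), minscore=None):
    """Build input and tuning arguments with control samples listed first."""
    # Controls go first so that "-c N" covers exactly those files.
    ordered = list(controls) + [s for s in samples if s not in controls]

    args = []
    for path in ordered:
        args += ["-f", path]  # assumption: Centrifuge output files
    if controls:
        # -c and -a are mutually exclusive, so -a is never emitted here.
        args += ["-c", str(len(controls))]
    for taxid in exclude:
        args += ["-x", str(taxid)]
    for taxid in include:
        args += ["-i", str(taxid)]
    if minscore is not None:
        args += ["-y", str(minscore)]
    return args


# Hypothetical file names; 9606 is the NCBI taxid of Homo sapiens.
args = tuning_args(
    controls=["blank.out"],
    samples=["gut.out", "blank.out", "skin.out"],
    exclude=[9606],
    minscore=50,
)
print(" ".join(args))
# -f blank.out -f gut.out -f skin.out -c 1 -x 9606 -y 50
```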

Fine tuning

Fine tuning of algorithm parameters

  -z NUMBER, --ctrlminscore NUMBER
                        minimum score/confidence of the classification of a
                        read in control samples to pass the quality filter; it
                        defaults to "minscore"
  -w INT, --ctrlmintaxa INT
                        minimum taxa in control samples to avoid collapsing
                        one level into the parent (if not specified a value
                        will be automatically assigned)
  -u SUMMARY_BEHAVIOR, --summary SUMMARY_BEHAVIOR
                        choice for summary behaviour, and can be one of
                        ['ADD', 'ONLY', 'AVOID']
  -t, --takeoutroot     remove counts directly assigned to the "root" level
  --nokollapse          show the "cellular organisms" taxon
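
The relationship between -y and -z described above can be summarized in a few lines of Python. This is only a sketch of the documented defaulting rule (the control threshold falls back to "minscore", and with no threshold all reads pass); the function names are hypothetical, and the `>=` comparison is an assumption about how the filter is applied.

```python
def effective_min_scores(minscore=None, ctrlminscore=None):
    """Return (sample_threshold, control_threshold); None means no filter."""
    control = minscore if ctrlminscore is None else ctrlminscore
    return minscore, control


def passes(score, threshold):
    # Assumption: a read passes when its score is at least the threshold.
    return threshold is None or score >= threshold


sample_min, control_min = effective_min_scores(minscore=35)
print(sample_min, control_min)  # 35 35
```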

Advanced

Advanced modes of running

  -d, --debug           increase output verbosity and perform additional
                        checks (default: False)
  --sequential          deactivate parallel processing (default: False)
  --strain              set strain level instead of species as the resolution
                        limit for the robust contamination removal algorithm;
                        use with caution, this is an EXPERIMENTAL feature

Other

Other useful arguments

  -h, --help            show the help message and exit
  -V, --version         show program's version number and exit