LexicMap 0.4.0

@shenwei356 shenwei356 released this 18 Aug 13:42

v0.4.0 - 2024-08-15

  • New commands:
    • lexicmap utils 2blast: Convert the default search output to blast-style format.
  • lexicmap index:
    • Support suffix matching of seeds: seeds are now immune to any single SNP, at the cost of doubling the seed data.
    • Better sketching desert filling for highly-repetitive regions.
    • Change the default value of --seed-max-desert from 900 to 200 to increase alignment sensitivity.
    • Mask gap regions (N's).
    • Fix skipping interval regions by further including the last k-1 bases of contigs.
    • Fix a bug in indexing small genomes.
    • Change the default value of -b, --batch-size from 10,000 to 5,000.
    • Improve lexichash data structure.
    • Write and merge seed data in parallel, new flag -J/--seed-data-threads.
    • Improve the log.
  • lexicmap search:
    • Fix chaining for highly-repetitive regions.
    • Perform more accurate alignment with WFA.
    • Use buffered reader for seeds file reading.
    • Fix object recycling and reduce memory usage.
    • Fix alignment against genomes with many short contigs.
    • Fix early quit when meeting a sequence shorter than k.
    • Add a new option -J/--max-query-conc to limit the maximum number of concurrent queries,
      with a default value of 12 instead of the number of CPUs, which reduces memory usage
      in batch searching.
    • Result format:
      • Cluster alignments of each target sequence.
      • Remove the column seeds.
      • Add columns gaps, cigar, align, which can be reformatted with lexicmap utils 2blast.
  • lexicmap utils kmers:
    • Fix the progress bar.
    • Fix a bug where some masks do not have any k-mer.
    • Add a new column prefix to show the length of common prefix between the seed and the probe.
    • Add a new column reversed to indicate if the k-mer is reversed for suffix matching.
  • lexicmap utils masks:
    • Support outputting only a specific mask.
  • lexicmap utils seed-pos:
    • New columns: sseqid and pos_seq.
    • More accurate seed distance.
    • Add histograms of the number of seeds in sliding windows.
  • lexicmap utils subseq:
    • Fix a bug when the given end position is larger than the sequence length.
    • Add the strand ("+" or "-") in the sequence header.
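
The "immune to any single SNP" note under lexicmap index can be illustrated with a small sketch. This is not LexicMap's code: the function names and the example k-mer are hypothetical, and the only assumption taken from the notes above is that suffix matching is done by also matching reversed k-mers (see the new reversed column of lexicmap utils kmers). With prefix matching alone, a SNP near the start of a k-mer leaves almost no shared prefix. When the suffix is matched as well, a single SNP at position i of a k-mer of length k still leaves max(i, k-1-i) matching bases on one side.

```python
from typing import Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_lengths(seed: str, query: str) -> Tuple[int, int]:
    """Return (common prefix length, common suffix length).

    The suffix length is computed as the common prefix of the
    reversed strings, mirroring the idea of indexing reversed k-mers.
    """
    prefix = common_prefix_len(seed, query)
    suffix = common_prefix_len(seed[::-1], query[::-1])
    return prefix, suffix


def worst_case_single_snp(kmer: str) -> Tuple[int, int]:
    """Try a SNP at every position of the k-mer.

    Returns the worst case of (prefix-only match length,
    best of prefix and suffix match length).
    """
    worst_prefix_only = len(kmer)
    worst_prefix_or_suffix = len(kmer)
    for i, base in enumerate(kmer):
        alt = "A" if base != "A" else "C"  # any different base will do
        mutated = kmer[:i] + alt + kmer[i + 1:]
        prefix, suffix = match_lengths(kmer, mutated)
        worst_prefix_only = min(worst_prefix_only, prefix)
        worst_prefix_or_suffix = min(worst_prefix_or_suffix, max(prefix, suffix))
    return worst_prefix_only, worst_prefix_or_suffix


# Hypothetical 31-mer, used only for illustration.
kmer = "ACGTTGCAAGGCTTACGATCGGATACCTGAA"
prefix_only, prefix_or_suffix = worst_case_single_snp(kmer)
print(prefix_only, prefix_or_suffix)
```

For this 31-mer the worst case with prefix matching alone is 0 shared bases (SNP at the first position), while matching either end always leaves at least 15. Storing the reversed k-mers as well is one way to read the "doubled seed data" cost mentioned above.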