Kmer array overflow. currKmerArrayOffset=0, kmerBufferPos=1024, kmerArraySize=1024. #274

Closed
kad-ecoli opened this issue Feb 13, 2020 · 9 comments

Comments

@kad-ecoli

Expected Behavior

mmseqs2 linclust successfully clusters a 49-sequence protein FASTA file.

Current Behavior

mmseqs2 fails with a "Kmer array overflow" error.

Steps to Reproduce (for bugs)

~/seqdb/JGI/script/mmseqs2/bin/mmseqs createdb DB.fasta DB -v 1
mkdir tmp
~/seqdb/JGI/script/mmseqs2/bin/mmseqs linclust DB DB_clu tmp -c 0.9 --cov-mode 1 --threads 1 -v 1

MMseqs Output (for bugs)

Kmer array overflow. currKmerArrayOffset=0, kmerBufferPos=1024, kmerArraySize=1024.
Error: kmermatcher died

Context

The input file DB.fasta and all intermediate files are attached.
linclust_3300021621.zip

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
    MMseqs2 Version: 481696b5f426f991211894d8a855bf9d60065c8f
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
    https://mmseqs.com/latest/mmseqs-linux-sse41.tar.gz
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
  • Operating system and version:
    LSB Version:    :core-4.1-amd64:core-4.1-noarch
    Distributor ID: CentOS
    Description:    CentOS Linux release 7.7.1908 (Core)
    Release:        7.7.1908
    Codename:       Core

@martin-steinegger
Member

Thank you for the report. Could you please rerun it with the latest version?

@martin-steinegger
Member

Oh, the error still exists in the newest version.

@martin-steinegger
Member

martin-steinegger commented Feb 14, 2020

Okay, the bug should be fairly rare. It occurs if linclust extracts exactly 1024 k-mers.
A quick fix is to increase the number of k-mers per sequence, e.g. to 30 (--kmer-per-seq 30).

@kad-ecoli
Author

Could you give an example command showing how to increase the number of k-mers per sequence, e.g. to 30?

@martin-steinegger
Member

~/seqdb/JGI/script/mmseqs2/bin/mmseqs linclust DB DB_clu tmp -c 0.9 --cov-mode 1 --threads 1 -v 1 --kmer-per-seq 30

martin-steinegger added a commit that referenced this issue Feb 14, 2020
@martin-steinegger
Member

Should be fixed now. Thank you for reporting.

If you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send your address to themartinsteinegger at gmail com.

@kad-ecoli
Author

I am afraid the issue is not yet solved. I ran mmseqs using the same method on the following set of sequences. mmseqs again complains about "Kmer array overflow. currKmerArrayOffset=10240, kmerBufferPos=1024, kmerArraySize=11264." if --kmer-per-seq 30 is not set.
DB.fasta.zip

@martin-steinegger
Member

@kad-ecoli Oh, there still seem to be cases where an off-by-one error can occur. I pushed a fix. Could you try the latest version? Thank you for your patience. I got your mail address for the stickers! We will send them out this week.
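
For readers following the off-by-one discussion, here is a minimal Python sketch of how such a bounds check can misfire. It is not the kmermatcher code: the function `reports_overflow` and its `inclusive` flag are hypothetical, and the assumption that the real check compares `currKmerArrayOffset + kmerBufferPos` against `kmerArraySize` is inferred only from the error messages quoted in this thread.

```python
def reports_overflow(curr_offset: int, buffer_pos: int, array_size: int,
                     inclusive: bool) -> bool:
    """Hypothetical bounds check before copying a k-mer buffer into an array.

    inclusive=True models an off-by-one check that also rejects an exact fit;
    inclusive=False models a check that only rejects a real overrun.
    """
    end = curr_offset + buffer_pos  # one past the last slot that would be written
    if inclusive:
        return end >= array_size    # an exact fit (end == array_size) is flagged
    return end > array_size         # an exact fit is accepted


# (currKmerArrayOffset, kmerBufferPos, kmerArraySize) copied from the two
# error messages reported in this thread.
reports = [(0, 1024, 1024), (10240, 1024, 11264)]
for offset, pos, size in reports:
    print(offset + pos == size,
          reports_overflow(offset, pos, size, inclusive=True),
          reports_overflow(offset, pos, size, inclusive=False))
```

In both reported cases the offset plus the buffer position equals the array size exactly, so the buffer would fit; only a check that treats an exact fit as an overrun would raise an error here. That matches the description of the bug as rare and tied to extracting exactly 1024 k-mers.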

@kad-ecoli
Author

The new fix works fine. I also received the stickers. Thank you for the little foxes.

RuoshiZhang added a commit to soedinglab/spacepharer that referenced this issue May 12, 2020
46c843895 Update combine pval agg-mode 3
67d610136 Disable fancy progress bars on travis to reduce output
203a21736 Updated two more tests to use tighter ROC thresholds
a9052f449 Update regression with tighter bounds for ROC tests
c62736a6d Correctly parse keys from data files in filterdb --filter-file This was causing a linsearch instability
fe007cb4e Use MultiParam for gapOpen, gapExtend costs
3513001d3 Add easy-rbh workflow
d0d3032e9 Fix RBH search if using -a to show alignments
ce1a43bf1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
ea24e4934 Fix issues with abs. path if using aria2c
5228745f5 Improve --alignment-mode parameter description and make it a non expert parameter
fffa9b10e Fix various inconsistencies and usability issues with alignall: * alignall alignment-mode did not correspond to align alignment-mode * add-backtrace did not do anything, has to be specified now if backtrace is needed * Did return a alignment db type even though it is incompatible with that type, uses generic for now * various parameters were passed but unused   - zdrop and scorebias are used now (however see below)   - realign, alt ali, max accept/reject, wrapped are now gone
290668474 Fix wrong warning
813d81f29 Update regression
264d78117 Switch greedy clustering algorithm back to old idea
c09f6574e Improve nucleotide clustering workflow
38a737708 Set k-mers in linclust to 0 for the nucleotide clustering
7df6e3f75 Replace characters that can not be reversed by N in extract frames
e9678f625 Update regression
f886e868f Add nucleotide support to cluster (workflow nucleotide_clustering), clust module will infer identity automatically if missing, Improve low. mem. greedy incremental algorithm, Update regression
5f8735872 Add kmers-per-sequence-scale to linsearch
0310eb607 Change --kmer-per-seq-scale to a multi parameter, add error if cluster is called with a nucleotide sequence
e258bc8d8 Fix #299 PDB70 database creation was not working
7095f37e4 Add support reverse complemente in rescorediagonal --rescore-mode 0 and 1
61ca48883 Fix result2dnamsa
70d014e41 Add search-type 4 to Search
462f24cbb Add module result2dnamsa
5670d990e Fix regression error
e4451d591 Add result direction parameter to kmersearch
12c499dcd Fix reverse sequences issues in linclust and linsearch
44499c3ce Update filterdb regression test
807b4a56a Fix issue soedinglab/MMseqs2#290. Filterdb checked for mode == true but mode was 2.
24479bc27 Fix Docker
a578f52a7 Fix char signedness on PPC
a0d64a989 Update regression
a07a266f9 Working on PPC64LE support
09734177c Remove remaining _mm_shuffle_epi32
cdef78a69 Merge pull request #285 from hgsommer/misc_small
283c8d03f Replace goto end in ssw
6bfc50281 Fix c/p mistake in convertalignments
e61da3447 Fix spelling of 'length'
9a63760fa Replace nested ternary operator
4349b5c6e Avoid repeatedly checking for profile db types
c170a11f5 Call MsaFilter::shuffleSequences() from MsaFilter::filter()
ef49ba220 Return value from MsaFilter::filter()
d155dc36c Replace int by bool literals for bool variable
ec6722adc Align headings with column in PSSMCalculator::printProfile()
548a9bd68 Avoid forward declaration of ScoreMatrix
d0fbe471f Do some cleanup in StripedSmithWaterman.cpp
91d1aeddc Replace check for zero-sized containers by empty()
e47b8eed9 Remove superfluous parameter from ssw_init()
250b1221d Simplify return statements
4fe1116ae Remove counting zero scores in Sequence::mapProfile()
4303728b5 Replace multiplication by zero
1bd602420 Remove increment by zero
e4d4389f2 Move check for exit condition in front of allocations
556d26d1a Clean up function signatures in MultipleAlignment
3863af9ac Move include back to header to restore build
e1208493a Remove unused TmpResult score field
1fd4db8f2 Die if DBReader cannot reopen files (e.g. no more file handles left)
1e21b87ba Purge sequenceLookup early since its recreate in split databases
40854ddcd Prefiltering and CacheFriendlyOperations refactoring
2433e086b WASM work in progress
14014cd0e Fix prefilter overflow instability
e0f971848 Add conda forge to conda install instructions
aa175d636 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment)
d1607bc8a Remove LINE_MAX
eca2155d7 Clear string buffer instead of reassigning in swapresults
0f4645edd Fix wrong reverse marking in linsearch reported by UBSAN
5b612a327 Missing mpi binaries for travis regression
83d22417a Next try for ARM compiler flags
7ad122f0a Missed a few variables
ac7914bea Do not require a cmake variable to build ARM
0dcfaadbb Update regression to fix broken samtools call on ARM
29927b4c4 More NEON fixes, we assume signed chars, ARM uses unsigned by default
7760220ff Next try to get the ARM regression to work
cc6d0d52b Add hack to not break travis log size limit
5408c3d10 Try to get NEON to compile
83192cabd Fix search workflow parameters printed twice
f6f001c8c Fix new clang-10 warnings and further travis fixes
259e64341 llvm-10 alias is not whitelisted in travis yet
b1249fd54 Fix errors in Travis YAML from previous commit
18486d4c5 Update travis - use native aarch64 for neon - use xenial - shorten script
98c37f3c3 shortend MultiParam usage, improved line breaks in usage
c9be07f1a Add gcc-9 to travis
2e5fb309a Fix travis clang build
d5865c894 Remove MultiParam g++-9 warning
73679835b Rework target split merging
ca5869397 Fix RESSIZE issue in slice search if sequences are used
491900b99 Improve usage text of cluster/linclust
0166850a2 Remove old greedy incremental clustering code and just run the memory efficient version instead.
15163e64c Fix Verbosity in workflows
aa78af463 Fix issue soedinglab/MMseqs2#274
7846dfce3 fixed clang template error
e1206371c extended MultiParam class, replaced ScoreMatrixFile type by MultiParam<char*>
b88b54756 rewrite alphabetSize as multi parameter
ecb4e35d4 started template class MultiParam to store sequence type specific values
e1a1c1226 changed dbtype comparision in AlignmentSymmetry
2a829aef7 Replace symlinkat call with getcwd/chdir/symlink/chdir to fix Conda build using macOS 10.9 SDK
28e83e8d5 Add OpenMP include to DBReader
fb00aa0c3 Fix realloc issue while IndexTable creation of profiles
504e5021f Take max. seq. len of query and target db in prefilter and alignment
16e235214 Fix bug if seq. len > max seq. length in Alignment
80d0187de Fix asan issue
751f5c19f Make ZDROP an expert parameter, change description text
1b6edd0d4 Rework x detection (SIMD)
9677254ab Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1ac1e6866 Fix max seq issues in prefilter
cb737033c Reset download strategy to not use aria2c for the NCBI download
c95f3ee0e fixed ksw2 test
72b95c0ce Error if we cannot download from NCBI
1d0aad50b Fix databases not piecing togehter all kalamari accessions
516723d53 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
d81b6cca5 added zdrop parameter to control banded nucleotide alignment
e2e39a971 Add Kalamari Contaminants database
c0c538ea3 Various fixes in databases script
08cc95b3a Fix createtaxdb redownloading when taxdump already exists
018eb3498 Remove a bit whitespace in front of each parameter in usage message
8aa7513de add aggregatetax example, fix typos
8bcd7c740 Fix typo
8e581b762 Rework usage texts
7dc25764a Hide most parameters from createindex
2baa609e8 Add examples to many modules
00a7d7696 fixed bugs for long or wrapped nucleotide sequences
a4bdcb478 eggNOG profiles should not depend on the deleted MSAs
4c7830954 Fix eggNOG database construction
f7a5599c8 Cleanup not needed files immediately in databases workflow
3ed3690d4 Fix downloads always restarting in databases workflow
4cfac9a8a Fix aria warning with more than 16 connections
e0a00e10d Revert "Use SW instead of BandedNucAln if we don't have diagonals"
7ac966b2e Fix result2msa could fail if it was writing compressed output
95729ac7c Fix wrong output DB type written in alignall
f899e7c7a Use SW instead of BandedNucAln if we don't have diagonals
c08d9fa8e Allow parameter descriptions to span multiple lines
57868498e MMseqs2 is not limited to proteins, update README to reflect that
11818b0a2 Cleanup hiding parameters in workflows
c481cea60 Remove some useless includes
2f64aeeb8 Fix databases timestamp appending instead of overwriting
ae9e9e329 Add eggNOG setup procedure to databases
31c8e5d50 Shorten two short parameter descriptions
2f49d3e3e Read header from lookup in msa2profile if available
1356869b0 add option to reverese profile dbs
ac3482e80 More issues with zlib and tar2db
aaafafe43 Fix tar2db keys
c751d9e2f More tar2db fixes
a9c93014c Fix variadic input to tar2db
51a761305 Add tar2db module to convert content of any tar to a DB
96f9a91e5 Use nedmalloc on Windows/Cygwin
73f5c2a2d Add databases workflow to README
5a7ac9e54 make align output consistent
c5ebe5297 fixed setcover cluster mode (by fixing bug in similarity reading for short aln results e.g. hamming distance aln)
481696b5f Fix databases output
c6b4a57a8 Beginning cleaning up parameter descriptions
a9552a177 Show default value of bool parameters
af89c4677 Add a proposed example text structure
9c17f4eba Rework module description texts, better categories, shorten all descriptions, prepare to replace long descriptions with examples
00ff199e8 Add Resfinder DB
f1011ecb4 Fix krona again marked as vendored
02001ab03 missing mode resulted in different top1
4375463bc Header db should not have to be a unsplit db
edccbf33f Actually fix extractorfs lookup creation
041e8e558 Improve README
a8f2c7bad Remove correct workflow script in createtaxdb.sh
26c8202a9 print createdb cmd line again
df02bae34 Refactor createseqfiledb, remove stringstream
2523ebe1a do not write null byte
af847a724 Fix clang warning from DBConcat
ef1ec596f extend dbconcat to handle auxillary files
528bd2134 not needed
dec1b9215 Silence warning in GCC 4.8 casting function to void*
2d44c886d Fix extractorfs not being able to create lookup
ffe66afac Replace isnumber with isdigit. Add more tests to TestTaxExpr
fbe09867e Rework Taxon Expr parsing
f58329ef5 Add constructor to define custom functions to ExpressionParser
b6ef07281 Initialize expressionparser per thread, was not thread safe
f966bfa62 Fix reallocation issue in BandedAlignment
bbd3c2bb7 Add +1 to realloc in BandedNucleotideAligner but not to length
6b6e82ae6 Add +1 to realloc in mapSequence
75e2c8ec4 Fix off by one issues in realloc in rescorediagonal and BandedNucleotideAligner
afd14c8c2 First step to get rid of maxSeqLen
13ca612db Fix allocation issue in kermatcher if sequences are longer than > 2^16
62de5ba93 Fix off by one in computation for splits in kmermatcher
35e95d180 Change int_sequence to char (big change)
ecf82f2f4 Revert "Temporarily disable soft split mode for createdb in easy workflows"
d19219dd4 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1a0d898ec Fix softlink issue in createdb soedinglab/MMseqs2#265
13e0fe466 Temporarily disable soft split mode for createdb in easy workflows
4487b6e14 Fix view module to work with softlinked createdb dbs
c1e9eb0e3 Fix MPI issue if only one server is used
e781c3fe5 fix MPI compile error
9bcff2844 Fix Filter2 bug of HH-suite in MMseqs2 soedinglab/hh-suite#182
01db79d33 Fix some bugs in splitting handling
d9a887453 Fix memory splitting issues in kmermatcher, kmerindexdb
37880f083 Fix MPI in kmermatcher and indexdb
bee93123f Update regression
03a89ff1c Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6ca967362 Update the way how k-mers are extracted in kmermatcher. Extraction should be now ~3 times faster.
f1388309d Introducing databases workflow to automatically setup and download common databases
d78fdbb06 Add progress to convertmsa
18acba224 Do not recreate _mapping file if it already exists in createtaxdb
63a373f5a Skip validations steps correctly if a input db is neither INPUT nor OUTPUT
d95caa1a7 Allow modules with zero parameters
9f8aff948 Allow modules to handle -h or --help themselves
cf5691f92 Typo
8ebc9d16b fixed access mode
31895414d Clarify parameter help in createdb
f644744a8 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c287719d9 Remove check for profiles for splice serach. It should also work with sequence databases.
c75fe9acf regression submodule w filtertaxseqdb
7587a872f Add one more missing check in kmermatcher
8d4e9f4fc Remove +1 from size in initKmerPositionMemory
aca141e95 Fix shellcheck error in splicesearch
8bdff50e1 Move +1 from initKmerPositionMemory outside
f12821e35 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
d74b76ca5 Avoid overflow in kmermatcher if split is needed
fd90ff2c3 Move compiled data resources into subfolders
2fd9f25d2 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b439ce831 Make the slice search applicable to other databases types, not just profiles
589a2e276 Fix apply crashing on empty entries
82542a6ac Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c0acdd8f3 Fix memory leak in createsubdb.
5129a956d Validate taxonomic ranks and make input/output formats consistent
53bb55b38 Fix issues in hash function soedinglab/MMseqs2#252
764c4a3e7 Fix lca message
c013a6929 Fix LCA output message
a1206690d Change db validator from result2stats
714f5b4fb Replace mmaped input file with std c io in createsubdb
6e43e9413 Add remove .source file to rmdb
3e58bb85b Fix result2flat soedinglab/MMseqs2#261
3e27833db Revert easycluster.sh back to result2flat. Reason is that createsubdb can not handle soft linked sequence databases (input.0 -> input.fas)
33354680f Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1e92fb504 Replace result2repseq and result2flat with createsubdb and convert2fasta
55bcdd303 single step clustering could potential cluster unrelated sequences due to hash collisions
fdd0646b1 Fix clusthash issues with parallelization and nucl input
e62a1c717 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1336b7ad2 Add MSA to allDb and allDbAndFlat
48a037a2e Update Prefiltering.cpp
a1adbf52d Fix warning: Remove useless copy constructor from Matcher::result_t
d3ca42657 Remove truncatedCounter variable in QueryMatcher
4647525ec Show full help text if "Error in argument " occurs
4149ae457 Remove annoying message in prefilter (truncated result). Move it to the statistics section.
d5aab5b86 Update regression
1f1e049e6 Fix output of unclassified hits in convertalis
83ff5c601 Fix permission issues for tmp directory
cce6e6714 add support to output taxon in easy-search when using an indexed database
f200bdd62 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6f28a29ae Fix seg. fault if all sequences could be classified
473d60580 Update batches
b52668f6e Add chat icon
af54c8e8e Update README.md
7eb6a0b70 Makde addtaxonomy more resilient against invalid taxonomy mappings
3482b0e91 Merge pull request #260 from RuoshiZhang/master
36f49f5b5 Fix issue in memory computation for split
bcb97d63f Update README.md
abcd97de7 write same number of fields even if no hit
38e102181 Update regression to hopefully fix windows failure
f41511465 Fix spelling error
1fd24924e Add a search-type 4 for trans-trans search returning a nucl backtrace in offsetalignment
31f6d7ac3 add aggragatetax to assign set tax by majority vote
b6e8ee239 allow more dbtypes in swapdb
c9d02ef21 add option to view rank index
49db7258e typo fix
9c32930f3 Merge branch 'master' of github.com:soedinglab/MMseqs2
17b5494fe Fix auto detection of dbtype in createdb
8831df81d Merge branch 'master' of github.com:soedinglab/MMseqs2
be1a9822c Fix createseqfiledb soedinglab/MMseqs2#258
02be0c4ea Fix summarizeresult to support reverse position in alignment
7ef586276 added filtertaxseqdb
00f2fd2b8 added mode for all but index
127db8c6d minor tidying for filtertaxdb
8144e7653 Merge branch 'master' of github.com:soedinglab/MMseqs2
48f77fa7d Fix ASan issue in filterdb
d722d5724 Fix warning in filterdb
4a4e6ea15 Update regression test for filterdb
31a7dc124 filterdb --join-db ignores lines it cannot join instead of crash
6c6faa96d filterdb's --extract-lines works together with --trim-to-one-column
12bee8142 filterdb can filter by rows with value within percentage #249
5c919ab95 Allow double parameters separately from floats in parsing
f9be8a88d Remove broken filterdb paths
1dc04f5e1 Refactoring of filterdb
90e3a9aaf Fix bug for enforced dbtypes in createdb
a4cee78db New regression to check stdin support
17ec97c78 Add stdin support to easy workflows
76c9e7c36 Fix compiler warnings in KSeqWrapper
0cc45536b Overwrite dbtype correctly in createdb
c0045182b Add stdin to createdb
02a88e438 use https instead of ftp for downloading taxdb data
a33bd27f4 offsetalignments now correctly returns a nucleotide backtrace if needed
456e1b5ab include VTML40 in binary for easier access
775de3850 Add missed target .source file for reading in convertalis
c08c071b2 Overload patterncompiler isMatch for pos of match
ba6aa8d12 avoid appending extra tabs besthitperset

git-subtree-dir: lib/mmseqs
git-subtree-split: 46c8438958edccd8fd09640eb174e2449529e4df
martin-steinegger added a commit to steineggerlab/conterminator that referenced this issue Oct 25, 2022
c48da9d7 Update Prefiltering.cpp
45891515 Reset errno before various strto* calls
7e284099 Update docker install instruction to GHCR
28b00883 Fix FASTA input not ending with a newline resulting in invalid sequence db with --createdb-mode 1 (#617)
a81d9e72 Fix issue with gcc 4.9
8799829d Fix compile error
1761bd60 Add module db2tar: Create a tar file from a database
dcd180be (Re)add support for tar-writing to microtar
fea8d203 Add support for external k-mer thresholds for the prefilter
ede0be15 Rework rescore diagonal
8f78b0ab Rework ungapped alignment
aabc78c2 Fix indexdb
ce8cd536 Fix masking issue
304a99bb Delete unmasked index to fix asan issue
67949d70 Fix #586 summarizeresult should not reject hits that match the coverage threshold
3d4840b3 Use macos-11 in azure
8ff26f23 Support finding taxonomy db paths from other prefilter databases
8ff72796 Add speedup shortcut to TaxonomyExpression for a single tax identifier
1d631726 Add taxonomic filtering during prefilter with --taxon-list
3b9cf881 Add URIs as allowed parameter inputs
1c739ae7 Add easy parsable tsv output to databases
ba4e11f1 workflow_dispatch can tag container as latest
7ebd2e04 Revert alignment profile in sequence.cpp
5185d3cb Allow tagging of docker containers through workflow dispatch
eb203d35 Build docker image in GH action and publish to ghcr
678c82ac GTDB ar122_taxonomy does not exist anymore, replace with different file #561
7be78c81 Fix tar2db breaking with --tar-include/exclude #561
d1555862 Encode more
16b57741 Encode " \n\t[]{}^$?|.~!*" as b64
b0b8e85f Fix truncated profile sequences in convertalis #567
96b20099 Fix broken badges in README (and remove travis)
407b315e Fix multi-threading issues in pairaln
92deb92f Fix unpackdb parameter
be8c278c Progress update fix
58593ec0 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
3f8695ea Add multi-thread support to pairaln
e9e829c7 Fix seg. fault in realign
ce7bf53b Point Kalamari3.7v to a fixed commit soedinglab/MMseqs2#531
fcf52600 Remove a level of indirection to access compatible index version
922e2691 Fix failing utility tests
74c3aa65 Fix typo (violoations -> violations) (#526)
7281baf9 Add --comp-bias-corr-scale
d89fcecf Write serialized index in appenddbtoindex
79ea1ee3 Fix new IndexReader USER_SELECT trying to read header databases as fallback
a506d677 Allow subprojects to build their own precomputed indices
75af0c82 Add appenddbtoindex to argument a precomputed index in sub-projects
4f046dd1 Add mask prob to mask sequence
38cf3f10 Fix TestIndexTable
b768f48f Add --mask-prob parameter
bfc6f85b removed error message for wrapped scoring, should work with all rescore modes
edb8223d Fix pairaln
6e7ed700 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
e19df7ce Rework pairing to support more than two sequences
9fded60a Add environment variable MMSEQS_IGNORE_INDEX to ignore an existing precomputed target index
efacc690 Cushioning the overestimated number of diagonals in case of many successive hits on one diagonal
5fc318b6 Add convertalis --format-mode 4 to print blast-tab headers
80fcadde Disable profile gap scores in msa2profile temporarily
9cc89aa5 Fix huge memory allocations introduced in 49c2b70
a8c30da5 result2msa correctly prints X residues
482dedc6 Explicitly set threads in Cirrus
75e9bfaa Update tectonic in azure to fix error in userguide building
16830a52 Fix number of CPUs used in cirrus
aab640d2 Fix gap pseudocount mode again
716fb621 Turn --k-score into MuliParam so it works correctly in iterative-profile search
56816b39 Resfinder download should not use tar wildcards, broken in busybox #494
e85ceb9d Change the url for UniRef* from ftp to https in databases downloader (#496)
49c2b70b Fix mem. issue
09e261bf Avoid substracting from getMaxSeqLen
4b77690e Move maxSeqLen logig to getMaxSeqLen() to avoid index issues
d8736973 Fix max length in DBReader Allocate CSProfile only when needed
42bf6438 Rework download database
5afd33c3 Make "databases" usable in sub-projects
f6518799 Update regression
f3f5b133 Update k-score sensitivity fitting for no-cntxt profile searches
3e92abf7 Add db-load-mode support to pairaln
5e245d17 copy dbtype and clear map
4a3bb340 Merge branch 'master' of https://github.com/milot-mirdita/mmseqs2
9a0df0d2 Add pairaln
fa44760e Fix recent forgotten else in getKmerThreshold
45b2b521 Revert "Try increasing the k-mer thresholds again for 5/6-mers"
be119433 Fix prefilter not correctly masking extended dbtype for comparision
e3ce4605 Fix memory leak in MappingReader uncovered by ASan
06bdc5e7 Fix missing cassert header in tsv2exprofiledb
8521fb45 Remove useless calls to opendir/closedir in FileUtil
885b4699 Add workflow to create expandable profile (profile-profile) db from a bunch of TSV files
ad05844f Add missing pseudocount check in indexdb
e33c32aa Fit new values for prefilter
7950368f Fix another broken test
b456cf51 Fix unused variables in lca
003cd244 Merge remote-tracking branch 'main/master'
6a8f586b Add extended dbtype to check for context specific pseudocounts, so that the correctly fitted kmer thresholds can be used
92a19497 Fix uninitialized warning in addtaxonomy
2e75435e Fix createbintaxonomy mapping dump size written
178eacff impl. contextPseudoCnts getKmerThreshold, values not fitted yet
35c67c87 Change pos. spec. gap costs to templates
9defdf89 fixed bug for uneven number of repeated kmers
0c26a107 replaced global with end_to_end in rescore mode variable
9064061d fixed size_t parameter handling
3fa46fe3 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
763fa9ff Change compress loop to omp static to keep order
49710b7f Fix sub. mat asan issue
d0a00d6a Update Sub. Mat. logic for aa2num mapping
ccf55559 Fix test
e4aae927 Make taxonomy mapping mmap'able for instant read-in
c66fd1b1 Fix syntax error in filterresult
87623596 Fix issues with include identities in filterresult
91617c4b Add includeIdentity to filterresult
fe16da39 Stay compatible with previous short A3M header output format
ce5b2418 Fix wrong assumption about header databases IDs with new index database scheme in result2msa
a54df874 Remove E-value threshold in filterresults
5647a56a Allow --diff 0
d5656191 Add MSA output mode for A3M+aln info
85ce8472 Expand can filter in each target cluster before expanding
ae4c7ab1 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
38ab523a Merge branch 'master' of https://github.com/soedinglab/mmseqs2
5e0d11f2 Extend MSA filtering for bucketed filtering within qid buckets
c6d8ae0c Add filter min enable
25cb16ff Enable result2profile/filterresult to read new expand alignment index
37225004 Don't mask consensus sequences in profiles
b2a34020 Ignore cacode warnings
c3e90f41 Allow indexing of profile-profile db
f3491183 Make sure very large database don't overflow localThreads
66fa3c76 Update regression to remove result2pp from expand check
87fed2e6 Merge remote-tracking branch 'main/master'
5b75b842 Try increasing the k-mer thresholds again for 5/6-mers
ad5837b3 Revert "result2msa now supports reading from index"
7ee3e794 Fix wrong database name printed for variadic input when creating a tmp directory
15fdf48e result2msa now supports reading from index
7aade9df Change deep copies to const references in result2msa
ce7cf754 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
31eb67ae Add A3M support to result2msa
56f7685b Add symlinks/copies for _taxonomy file #474
904d0c6d Transition old compiler tests from travis to CirrusCI
442d8983 Fix memory issues in QueryMatcher
17c8028e Move fixRlimitNoFile to Application
c6634976 Fix the forbidden symbols when using unpackdb (#467)
488df863 Refactoring of gff2db
d822533f Build update function for DbType validators
a09a704e Remove bash dependency in regression to fix FreeBSD in CirrusCI
4f1996a4 Fix FreeBSD on CirrusCI samtools issue
a2e2129c Add CirrusCI to test FreeBSD
01492c95 Revert "Make sure QueryMatcher::radixSortByScoreSize cant corrupt memory"
15ace29a Fix posix_madvise on FreeBSD returning error if size=0 (See #460)
86152a2f Remove useless calls to std::map::operator[]
d4dd06d2 Fix iterative profile search restartable again
91b61706 Make sure QueryMatcher::radixSortByScoreSize cant corrupt memory
af317095 Save a buch of work when sequences are not needed in expand*
be5a1da4 Replace many aligned allocation in MultipleAlignment with single allocation
7469d599 Fix unused warning
942a012a Move MultiParam::format out of header to avoid compilation warning
d2148058 Fix unused parameter warning
40ba03f4 Disable warnings from nedmalloc (external dependency)
c811a511 Fix tests after profile-profile refactoring
7a8ee485 Try to fix profile-profile alignment for SSE
68862ed2 Add missing simd.h functions for SSE
a09de7eb Fix compile errors
807d97a9 Merge remote-tracking branch 'main/master' into ppmerge
4578f8ba Temporary change to slicesearch to speed things up
3a51b445 Add support to support position-specific gap penalties in profile-profile alignment in iterative search.
139e4502 Get rid of MathUtil::popCount in favor of __builtin_popcount
bbfd6e26 Add preload mode to expand(aln/2profile)
b14d0136 Fix a few more tests
635911ec Increase sortresult buffer for matcher result
d6c19db9 Fix exhaustive search parameter in examples
e86afeab Move substitution matrix init code out of Parameters::parseParameters to fix tests
62f7aba1 Replace biorxiv citation for taxonomy paper
24f6b52a Cleanup magic value with constant in kseq
c7f6a37e Allocate at least a 20 * 20 matrix in StripedSmithWaterman
57de8c8d Fix profile2repseq input database type
96a069e5 Shellcheck fix
52c6ae87 "Can not" to "Cannot" in DBReader and cleanup
e39d02af MemoryMapped cannot accidentally segfault on 0-byte sized files anymore
2d7411a1 Revert "Bug fix with empty temporary files"
7be4fca9 Add VOGDB to database downloader
dd5db429 Update dbCAN2 to V9 and make remove .aln suffix from profile names
d4a33542 Always set a value for FILTER_RESULT in exhaustive search
ec1f599e Update regression for recent change to nucl-nucl search
c967985e changed rescoring for nucleotide sequences only in prefilter
19064f27 Revert "fixed rescoring for nucleotide sequences with multiple diagonals for one target exceeding UCHAR_MAX count"
c54c5382 fixed signed error
f751bcc9 fixed rescoring for nucleotide sequences with multiple diagonals for one target exceeding UCHAR_MAX count
1d770285 Fix endless loop in rescorediagonal
4462533c Don't allow iterative profile search in taxonomy #432
64a2265f Make sure no backtraces are computed in lcaalign
b8501a1b Fix previous broken commit
971b442e Fix additional two more memory leaks before exit
7fbc0b65 Fix memory leak in DBWriter::createRenumberedDB
a6cab565 Fix prefilter/alignment with 0-size query input #433
14a3dce2 createsubdb and view can now return results from identifiers in .lookup with --id-mode 1
6622c9f0 Fix DBReader::USE_LOOKUP_REV
d77de8da Fix extractorfs sometimes loading invalid start/stop codons on non-avx2 platforms
5daca424 Fix typos in extractorfs warnings for short input sequences
fe61aeee Replace strcpy in microtar
0523594f Add support for GNU tar specific filenames and some lesser used entry types to tar2db
5ed18ff0 Merge commit '15242315f80fbda1bffc05cd41fa47c192373902'
15242315 Squashed 'lib/simde/simde/' changes from 79bf0b7c..1f4a28c4
bb02734e Get rid of more scanf calls
fa4cd2a7 Fix arch selection on ARM (use -mcpu instead of -march) and s390x (enable -mzvector)
a202b3c2 Squashed 'lib/simde/simde/' changes from b6c9c964..79bf0b7c
fb39ca1e Merge commit 'a202b3c2d58cc2f80ecfb2123158377f08bc6510'
3d40f105 Fixes for gap panalties merge
2718ca75 First attempt to merge prof-prof and gap-penalties
93f90b04 Fixes to last merge
b7811188 Merge branch 'master' into main-master
22a7bfa2 Add iterativepp workflow
1a87a226 Cleanup Matcher::compressAlignment
6885bad8 Get rid of sscanf in Matcher::uncompressAlignment
50ce7a5c Fix previous commit writing dbtypes for big endian
852f04de Fix compile error
afa6d02d Read/write dbtypes always as little-endian
6269994f Explicitly support size_t in Parameters
d9744e3c Fix some 32-bit issues #418
c25aec57 Cleanup kmergenerator header
be343e98 Additional s390x fixes (linclust might work now)
45111b64 Add initial fixes to get MMseqs2 working on s390x
b1704ccc Merge branch 'master' of https://github.com/soedinglab/mmseqs2
f388ead8 Add parameter --alignment-output-mode, remove alignment mode 5
2a4a2dc5 Add correlation score parameter to align
f9d2ae30 Add support for new Multiparameter type
cbc1b489 Refactor pseudocounts
1e58454a Restore K4000.crf from history
f6eadeaa --majority parameter was missing from taxonomy workflow
24217dc9 Reduce number of threads on travis ARM
ff4c9029 Remove SORTRESULT_PAR from search.cpp
178d3b5f Fix exhaustive search
247de411 Move warning from inner loop to outer in extractorfs
6a0dcee4 Update Regression test
f92447d0 Rename slice to exhaustive search, add filterresult
6c2fefce Set pca to 0.0 in expand2profile
0cc7e674 Add unpackdb to split a database into separate files #406
877344c3 Add USE_SYSTEM_ZSTD cmake flag to use system provided zstd #411
bbd56417 Replace throw with abort in ALP again
46c26ce9 Add missing licenses and readmes for code in lib #403
20543e0a Update ALP to 1.98 and add readme/license
d5717e82 Add CDD to databases downloader #410
04b27f98 msa2profile always copies lookup/source files instead of linking them to be independent from the MSA db
2d83f517 msa2profile/result can skip the first sequence
242a8faf Pass threads to tar2db in databases workflow
a19f5a52 Allow clustering of clustering input with set-cover or connected-component by ignoring scores/weight
39a41403 Don't set INT_MAX as --max-seqs in slice search to avoid huge allocations in prefilter
9290a2b5 Allow sequence database input in taxonomyreport #408
aaba0c7f Short circuit cluster-reassign if nothing can be reassigned
3822a8f5 Fix tmp files not getting removed in linclust/cluster with --remove-tmp--files
2a35e025 Fix kmermatcher setting user k-mer pattern in auto k-mer selection and breaking
a1050359 Rename accelerated 2bLCA to approximate 2bLCA to be consistent with manuscript
11698a5b Rename LICENCE to LICENSE soedinglab/MMseqs2#402
0828d865 Allow result database input in taxonomyreport #401
b31ebb64 Krona taxonomy report was not working if no sequence was unclassified
9f0fb3ed Cleanup taxonomyreport
a2d9568d Fix wrong azure dependency
b1367fc2 Make resultToBuffer buffer sizes consistent (needs further refactoring)
98f9939d Get rid of results temporary array in msa2result
d495e0e9 Replace texlive with tectonic for userguide building
e03b5257 Fix MMseqs2 Taxonomy citation
602689c1 Update examples in mmseqs (easy-)taxonomy invocation
ecf152cf Improve (easy-)taxonomy description text by reordering parameters by importance
e0b04434 Improve description of --orf-filter
a7f91d46 Add warning if cluster or prefilter input is used in majoritylca with invalid --vote-mode
a3399397 Update regression to include recent speedup
d5da12d7 Add GTDB to databases downloader
83780f4c Respect verbosity for rmdb calls in databases
9011c15d Improve output of databases list
86c03fd4 Increase buffer sizes in tar2db
2bd03c68 Fix tar directory (symlink, etc) entries causing tar2db to stop early
7bdb222d Use DBWriter to write .lookup multi-threaded in tar2db
23c9e1e7 Don't use multiple threads in tar2db when reading .tar.gz/.tgz as nearly all the time is spent inside zlib
2e128d4f Increase zlib buffer in tar2db to speedup reading
c1911893 Fix multiple locations where Util::checkAllocation would never be called as the preceding allocation would already terminate on failure
1f302134 Fix two compilation failures revealed by Debian
5b03cdff Another instance of the same warning
3fda449b Fix compile warning
3b0197af Encode species names in taxonomy blocklist to make sure we don't block random nodes in non-NCBI taxonomies (e.g. GTDB)
ab2426f8 Fix String MultiParameter (e.g. sub matrices) breaking if filenames contain whitespaces
e8de3507 Encode whitespace containing parameters as base64 to better deal with shell word splitting in workflows
c7a7c366 Add instructions to simd.h
6672bbc9 Fix missing newline in log message
84034a52 Remove useless taxonomy ancestor warning
6609c6cd Fix invalid taxonomy output mode being set
441c52cf Fix taxpercontig not working with easy-taxonomy
4ce38109 lca is not computed by easy-taxonomy anymore
9d631c16 Fix cleanup of taxonomy intermediate files
d0f596f5 taxonomyreport and addtaxonomy output is now adjustable in easy-taxonomy
6bfd08d5 Cleanup default set parameters in easy-taxonomy
afcade16 Improve default taxonomy parameter lists shown (without -h)
fc126b3e Improve error messages when something is wrong with the input/output paths
3b49310f Improve unrecognized parameter message
83b9e9a1 Remove useless missing tmp dir warning
d0a9b79f Fix typo
48f9737a Add ORF filter parameters only to taxonomy for now
a6068975 Disable unfinished ORF filter in search
336d9d04 Add taxonomy citation
f7fde6fe Reduce binary taxonomy dump memory requirements slightly
eff61cfe Add \0 byte after serialization
7e63e1ea Fix typo in Parameters.h
019de271 Add vector of predefined substitution matrices
34b3a539 Merge pull request #389 from mr-c/simde_v0.7.0
74724b3a Cleanup headers in kmermatcher
73fd5cfa Update xxhash to v0.8.0
8dd192c0 Don't create false _has_{builtin,attribute}
c2d60348 Squashed 'lib/simde/simde/' changes from f2257f11..b6c9c964
062ef995 Merge commit 'c2d60348af5c036eb2cbc7974d84065e16ab4096' into simde_v0.7.0
bad16c76 Check correctly for existing of binary tax dump in createtaxdb
457cacab Replace string concatenation in aggregatetax with append
a5169557 Fix strcmp comperator in nrtotaxmapping too
0da81a03 Fix ASAN free-delete mismatch
4fa7cb27 Replace std::sort in StringBlock with fast sort
dc4f9ed4 Wrong comparision used in sort comperator was crashing clang
e09b3db3 Move taxonomy version to cpp file
1645696b Use less threads on PPC64LE regression
9c0a99ca Fix compile error in taxonomy test
f1ab0b3c Fix missing newline in lca
7ff6dc5e Add version check for binary taxonomy
df301e3b Create serialized/mmapable taxonomy in createtaxdb, taxonomy loads instantly compared to before
3addec8e Remove debug output mode from createtaxdb again
5407ca4c Don't create taxonomy files in createtaxdb again if they already exist
95968440 read correct number of CPUs in macos build script of nproc is not available
0defb362 Split aggregatetax and aggregatetaxweight parameter lists
86e6b0b7 Cleanup weightedMajorityLCA
d03a8d03 Add score vote mode to taxonomy weighted voting
553a670d Split non-index parts over more files if a split index is requested
f5a762ff Do not read e-values for tax-id 0 again in aggregatetax
4c1137c2 Add majoritylca module for majority voting based taxonomy from alignment results
4224c6a6 Move majority lca voting to NcbiTaxonomy class
6ab700bf Fix parameter order in lca and aggregatetax
26a8e478 Skip secondary structure in msa2* with (c)a3m input
6f56a262 Fix: Extract the correct source name when tar2db and createdb are used together
ea83a916 Fix cmake deprecation warning
ca6aea96 Fix #379: E-value parameters are now correctly parsed as doubles instead of floats
1cec7419 Fix atomic check when cross-compiling
aed7d976 Fix now correctly switching to xcode 12.2 in azure
184d834a Try building macOS ARM binaries on Azure's Catalina VMs
9b819686 Fix not returning error in mergeresultsbyset after error case
9f718741 Add MMSEQS_FORCE_MERGE env var for forcing generating fully merged dbs
3df79c30 Build arm64 macos binary only on big sur (not in CI yet)
acfa3ef1 Build universal mac binary for sse/avx and arm neon
f4f38685 Add symlinks to splitdb #376
41adb5d4 Add cpdb and lndb, place them and rmdb, mvdb into same file
99410a2e Revert "Remove handling of pre-split sequences in splitsequence"
3c0000ba Remove handling of pre-split sequences in splitsequence
6bb22ecc Add splitsequence parameters to all relevant workflows
d204e91f createtaxdb can create a taxdb by mapping through .source
1c52b75a Fix tar2db would create entries for non regular tar files
2719ba2f Allow createdb to read generic dbtype (to use in combination with tar2db)
9e990b30 Add missing stdin dbtype to getDbTypeName
c8e082e3 Increase number of opened files limit when DBReader is used
2a972e91 Fix gapped score calculation in proteinaln2nucl
750e8844 Update regression for taxonomy
35ad87ed Remove debug message
6a882624 Unify TaxPerContig and Taxonomy
7da33b05 Acc 2bLCA is now default for protein and translated taxonomy, tophit is always used for nucl-nucl
9d0169cc Taxonomy search mode fully integrated into alignment module
f8d2878e Refactor alignment to allow computing a limited number of realignments
1cc54190 New 2blca could compute LCA from res not finding anything in first aln
5067c1d4 Taxonomy refactoring
18da8d6e Set approx 2blca as default taxpercontig mode
6da35599 Make taxpercontig orf-prefilter parameters adjustable
45c4de7f Include file size and modified date of inputs in tmp file hash calculation #372
cc472544 Fix #371: --cov-mode 5 was not working
8e8e9a0b Fix MPI compile issue
f537370a BC breaking: Unify in result2msa --compress --summarize --omit-consensus to --msa-format-mode, support stockholm output
951d51b4 Don't link header db etc in filterresult to output db
349c2765 Move currentKey out of ifdef in tar2db
d95e41e7 Always compute result files in easy-taxonomy
31a90e13 Actually fix the uninitialized warning
20eeaabc Fix uninitialized warning
3c94c0a2 splitsequence can create a sequence database with original headers
aca7380b Return bit-score in proteinaln2nucl instead of raw-score
18588bb3 Fix filterresult off by one issue
9b74117e proteinaln2nucl can now compute scores and evalues
8ea08f0c Add curl flag to follow redirects to database downloader
1cf3002a Fix compiler warning
5dc4bcd4 Update eggnog urls (fix curl bug)
20a03128 Fix id issue in tar2db
be4d2e07 Add multi-threading support to tar2db
f6831608 Merge pull request #359 from mr-c/spelling
b244246b Spelling typos fixes
d9f2041e Merge branch 'master' of https://github.com/soedinglab/mmseqs2
971f9d90 Turn profiles from lin-space to scores, add average profile-profile code
96d452cb Inline single use of DBWriter::mergeFiles to mergedbs
24ecc26c Fix some compilation flags would not be correctly set during cross-compilation
beabb353 Make sure to flush stdout/err before calling any workflows
a1622068 Add missing dbtypes to allDbAndFlat
49240a30 Setting APT::Immediate-Configure=false fixes cross-compiler installation
d4fd0729 Next try to fix cross-compilation
bd3e49fe Remove ubuntu-toolchain ppa breaking cross-compiler installation on azure
4b9b3b56 Remove all other apt sources from azure before installing cross-compilers
57f429a0 remove unused remnants of the past in alignment class
de06950f Reduce calls to posix_memalign, fixes lock contention of some platforms
d3b0cf9a Fix result2profile could allocate not enough memory if target database contained much longer sequences than query database
1a490efe Support ungapped alignments in sliced search
3af62f06 Fix banded_sw
333cc350 Fix addtaxonomy always crashing due to invalid check
29e327f9 change orf filter params to match test runs
cc7d7da3 result2repseq should preload the sequence database into memory
63794225 Improve createsubdb help text
951d5a72 Add nrtotaxmapping to create taxonomy mapping from NR
90e71f99 Squashed 'lib/simde/simde/' changes from 938d82c8..f2257f11
df69c26e Merge commit '90e71f9968d3925e545c45d7c68325dd3cd0c588' into master
48950b95 Correctly pass threads/verbosity in taxonomy workflow
9d3ab794 Merge commit 'b6a4528e818ca644f8200fc84b2d1856ecd8f5c7' into master
b6a4528e Squashed 'lib/simde/simde/' changes from 2119ac73..938d82c8
725d9f63 Modified Profile-Profile alignment implementation with templates.
113e3212 Fix ASAN issue in extractorf when using AVX2
b15e95a1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b7ec0e93 Fix setcover issues with dbs > 2^31 sequences
f8b3f8b1 Add biocontainers badge
b7ac683c Update cluster update regression
4d665ce9 Automatically set cluster parameters also in cluster update
b5a08833 Fix #272 remove deleted sequences from old clustering in cluster update
66f77ce8 Cleanup subtractdbs
b2ac9e0b remove confusing comments
3d2e394a Limit number of jobs used for compiling on travis
d58cc78c Fix invalid symlinks in result2repseq
21f71466 Cluster update refactoring
60d5be17 Add missing var to profile
12b78e3f Merge branch 'master' of https://github.com/haydenji0731/MMseqs
2aaac47a First running version of double max profile/profile
fbe754e0 Fix missing newline in first sequence in entries of result2msa
db1c38b1 Made changes to SSW class for Profile2Profile Alignment
a29379e2 Do not map scores if not needed in result2pp
e80ec9a3 Updated ROC for result2pp
769aa78a Add seqdb preloading in result2pp
cf8b1429 Remove more unused parameter from result2pp
f2a29339 Update regression to include result2pp test
3fac8dde Copy profile information of unaligned regions from query profile
967a4555 Cleanup and fix result2pp
b2f49a25 Add NR taxonomy information
efdbe941 Change serial sort to std::sort
97a8f1dc Update regression
0c123fe7 Fix comp. bias correction in expandaln
401d8e6f Add --max-seqs to ungappedprefilter
f57d1a71 Update expandaln, expand2profile and regression
a62ea9a9 Update reassign cov. mode in prefilter and fix regression
64f9294b Update regression to include expansion test
61d8b64d Fix coverage read in for nucl-nucl alignment results (#339)
45ae9276 Compute evalues and sort correctly in expandaln
38fab36e Fix wrong sequences being loading in expandaln due to wrong sorting
3aa032be Cleanup in MultipleAlignment
ea3212f0 Fix realloc size in profile set size increase
7cca0508 Fix restart cluster-reassign
0945e5a5 Add prefilter parameter to reassign
4e436c79 Fix compile error in tests
47e62299 Avoid constant allocations in PSSMCalculator
657a97c0 Don't clone the whole result_t vector uselessly in profile related modules
b87cae01 MultipleAlignment does not require constantly allocating and deallocating Sequence objects anymore
486e13ac Remove add internal ID parameter in result2msa
0a8a7a3a expand2profile module should be able to directly build a new profile
a84e6f48 Make max set size in profile classes dynamically growable
5baf62ab Cleanup Sequence class
e4b2ffb0 Move PSSM masking and writing to its own file
d10a6104 Fix clang warning
76d7d83b Fix progressbar in first clust readin step
01937be2 Taxonomy expressions in filtertax(seq)db interpret , as || now #320
fddf635d Add SILVA to databases module
9ec7c5e6 Fix MPI warning
ce65cb86 disable ICC in travis, beta08 breaks their setvars.sh script and SIMDe has many issues
87183135 Fix warning in clang
97653a92 Check the return code of fclose to handle full disk errors better
06bd0cfd Add filterresult for pairwise HHblits filtering to reduce redundancy in a result db #316
3bdaf488 Fix various result2msa modes (compress works cleanly now, --filter-msa mode could return invalid MSAs)
c1f78338 Fix invalid projected backtraces in expandaln
d741a251 Remove circular include
595625a1 Cleanup result2msa/profile
8ad36374 Unify to computation of alignments in msa2result and transitivealign
55534d71 Fix wrong lengths used in msa2profile
5d10ce00 Rewrite expandaln module
4be0d6e1 Add msa2result module for generating result dbs from MSAs
a179ab27 Cleanup DBConcat
a9c56e57 Merge branch 'master' of github.com:soedinglab/MMseqs2
ec3b8254 Try out new aggregate tax algoritm
cfba9f02 Fox .index.0 files not being removed after sorting
dde4b2e3 Next try to downgrade ICC
618331da Downgrade ICC since latest version seems to be broken
ed45a9f2 Remove unused variables in rewritten microtar
328732a1 Update regression
ae7398d6 Added fident to convertalis. fident prints the fraction of the sequence identity. pident reports the percentage. soedinglab/MMseqs2#337
a61b9eb9 handle the unranked root and cell orgs
d2141f32 ORF filter with high-eval thr ungapped alignment
ea01a174 Remove useless cast in QueryMatcher
1e95b6bd Update tantan
207d0d21 Allow overwriting string parameters with empty strings
755a7b03 Add new binaries to README and fix whitespace
be05b8d0 Add orf-filter to taxpercontig and cleanup
22e17aa4 orf-filter should also work in easy-search and easy-taxonomy
7fefa8af added mode to ByteParser
4393c5aa typo
b05d7d75 Speed up read index and kmermatcher
3f9a6031 Fix --search-type 4 in createindex
18e90119 Rework read index in DBReader
1eb72611 Do not sort indexes when already ordered while DB close
65f246b1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
053fd61c Improve multi-threaded speed of writing clustering results
e1a71066 Fix typo in arch name
3b1e528c Add SSE2 binary to Docker
d66ee416 Use Ubuntu 20.04 for cross-compilation
8ba605e8 Add SSE2 and cross-compiled ARM64/POWER8/POWER9 builds to azure
a5e485ba Fix broken checks for libraries when cross-compiling
7fe0cb90 Fix progress bar in DBConcat
cef0731b Create translated index if --search-type 4 is used in createindex
47afc572 Fix --search-mode 4 issues in offsetalignment
80fdcbed Change cluster reassing to bool soedinglab/MMseqs2#329
57e8a9df Allow ORF filter only in combination with query nucleotides
d55f06ce Fix Pfam.full database creation
659cc1f8 Add additional experimental ORF prefiltering step before translated search
e934f1c4 made ByteParser more informative
4d14c9fe tax-lineage modes: 0 nothing, 1 names, 2 taxids
b777cd09 Disable ips4o on ppc for now
95a88524 find_package is case sensitive
8e797b1d Allow disabling use of IPS4O, cleanup
850a196b added seqs assignment agreement to the output
e21dc40f Fix wrong existence checks for databases in workflows
5901a0a9 Set minimum clang to 5.0 for now
d7b46e60 Disable ips4o on cygwin
033fda23 Change travis gcc check to 4.9
908675d2 Add includes
ee7b5c11 Change random_shuffle to shuffle
d1a1af5e Rewrite atomic check in cmake
d6590f39 Add missing FastSort.h
704d0fb4 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
109be7bb Change sort to ips4o if possible
02059366 Fix warning
d092a469 Fix kmermatcher MPI support
b001dfb2 Made modifications for Profile-Profile alignment. Changes belong to SSW, Alignment, Matcher. Right before integrating lin space vector cost calculation for H value.
521c0d25 Made modifications to ssw algorithm implementation.
2f1db01c Rename martin.steinegger@mpibpc.mpg.de to martin.steinegger@snu.ac.kr
0f7b6856 Fix #326 wrong citation link
62a387ed Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c125a217 Fix issues in expandaln
648bc1f6 Add Pfam-B download script
16e79a2a Add dbCAN2 download script
7c0ed7f8 Microtar would try to seek backwards resulting in horrible gzip read performance
cab0e838 Fix #323 createdb not correctly reading gz/bzip with --createdb-mode 1
1d650034 mmseqs --help should not give a useless correction suggestion
35c58af9 Improve download of taxdmp file in createtaxdb
68feeb20 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6546822c Add missing .dbtype to newSeqDb header in cluster update
2a787482 Merge pull request #321 from milot-mirdita/simde
72d19b96 Seems like travis reduced the RAM available on ARM
565ad3f9 Add script to update SIMDe
b9783a7f Squashed 'lib/simde/simde/' content from commit 2119ac73
9828f0d6 Merge commit 'b9783a7fca1677486f2f830a9c59fda11330980c' as 'lib/simde/simde'
641ef68b Remove submodule in preparation for subtree
b6dd6447 Work around clang issue
a877dc00 Rebuild SIMD autodetection
5ba9e7ae Cleanup warnings
3980d2a7 Add one Newton-Raphson it to make division with _mm_rcp_ps always consistent
27b82963 Try limiting threads in ppc to not crash on 4gig ram
c95bdcc1 Silence strict aliasing warning in Itoa for NEON
590cfb96 Rebuild 128/256 bit SIMD split in simd.h
f5750fee Enable building on non-x86 and less than SSE4.1
21d798f0 Remove not finished createtaxdb changes
b59c3381 Make orf information available through convertalis
284bb757 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
f4bbce84 Add MemoryTracker, Account for index size when computing available memory
e2510e8f fixed comment because it wasmisleading
def7ace2 Add convertalis HTML output based on MMseqs2 app (app.mmseqs.com)
dd3ff63a Fix convertkb to work without a mapping file
dc054792 Previous lookup writer would always report failing
52ac0f36 Refactor lookup writing to not corrupt memory if an accession is too long
9f2be0e0 Disable ICC travis for now
d319bb92 Merge branch 'master' of github.com:soedinglab/MMseqs2
d1522365 remove appendtaxaln
648cf836 refined code as per Milot's feedback
94db0316 One more INT -> UINT warning that ICC complains about
271b7c13 Next try for travis
2c1dfdd4 Fix terminate value in SSW again
6016e1b1 Try to fix travis
ddaaaf7f Fix various warnings reported by ICC, add ICC to travis
f3adc10f added aggregatetaxweights to get rid of appendtaxaln
211dd7a7 change to SSTR
517b01ba true in createParameterString saves defining taxonomy defaults
aa068c86 added taxonomy default parameter values because life
f9d1face in the process of adding taxPerContig workflow
94f895a3 fixed english
19f1dfbd moved weight const back
8db3e714 moved definition of constants
05cfc8bd added mode of tax output: both lca and aln
2cd59046 added voteMode parameter
128f57b5 extended aggregatetax to handle eval-based weights
e4a10bd7 added appendtaxaln for extending aggregatetax
0c29da4a Actually fix the filterdb --join-db issue
7ff6ae7c Restore fix lost char in joindb mode change
f5c8b28c Update README.md
e4f7e745 Add qOrfStart/qOrfEnd, dbOrfStart,dbOrfEnd to offsetalignment
cf40916c Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c0dac797 Do not write null byte in splitdb
cbb542af added rand id to tmp files created at localTmp
214e87e9 Remove goto in lca.cpp
c8309fce Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b761ddf4 Fix issue with qset format output
80bff832 Do not write .lookup in easy workflows if not needed
21f7a05f createdb can now read a database containing FASTA/Q entries
d5a05376 Fix whitespace and cleanup output strings in createdb
d14b622e Fix cygwin compile issue
9e5fb33b Introduce KSeqWrapper to read from memory location
dc7b9626 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
de6f7524 Fix soft link createdb bug if multiple input file are provided
b06bee91 Fix alpha regex
46c84389 Update combine pval agg-mode 3
67d61013 Disable fancy progress bars on travis to reduce output
203a2173 Updated two more tests to use tighter ROC thresholds
a9052f44 Update regression with tighter bounds for ROC tests
c62736a6 Correctly parse keys from data files in filterdb --filter-file This was causing a linsearch instability
fe007cb4 Use MultiParam for gapOpen, gapExtend costs
3513001d Add easy-rbh workflow
d0d3032e Fix RBH search if using -a to show alignments
ce1a43bf Merge branch 'master' of https://github.com/soedinglab/mmseqs2
ea24e493 Fix issues with abs. path if using aria2c
5228745f Improve --alignment-mode parameter description and make it a non expert parameter
fffa9b10 Fix various inconsistencies and usability issues with alignall: * alignall alignment-mode did not correspond to align alignment-mode * add-backtrace did not do anything, has to be specified now if backtrace is needed * Did return a alignment db type even though it is incompatible with that type, uses generic for now * various parameters were passed but unused   - zdrop and scorebias are used now (however see below)   - realign, alt ali, max accept/reject, wrapped are now gone
29066847 Fix wrong warning
813d81f2 Update regression
264d7811 Switch greedy clustering algorithm back to old idea
c09f6574 Improve nucleotide clustering workflow
38a73770 Set k-mers in linclust to 0 for the nucleotide clustering
7df6e3f7 Replace characters that can not be reversed by N in extract frames
e9678f62 Update regression
f886e868 Add nucleotide support to cluster (workflow nucleotide_clustering), clust module will infer identity automatically if missing, Improve low. mem. greedy incremental algorithm, Update regression
5f873587 Add kmers-per-sequence-scale to linsearch
0310eb60 Change --kmer-per-seq-scale to a multi parameter, add error if cluster is called with a nucleotide sequence
e258bc8d Fix #299 PDB70 database creation was not working
7095f37e Add support reverse complemente in rescorediagonal --rescore-mode 0 and 1
61ca4888 Fix result2dnamsa
70d014e4 Add search-type 4 to Search
462f24cb Add module result2dnamsa
5670d990 Fix regression error
e4451d59 Add result direction parameter to kmersearch
12c499dc Fix reverse sequences issues in linclust and linsearch
44499c3c Update filterdb regression test
807b4a56 Fix issue soedinglab/MMseqs2#290. Filterdb checked for mode == true but mode was 2.
24479bc2 Fix Docker
a578f52a Fix char signedness on PPC
a0d64a98 Update regression
a07a266f Working on PPC64LE support
09734177 Remove remaining _mm_shuffle_epi32
cdef78a6 Merge pull request #285 from hgsommer/misc_small
283c8d03 Replace goto end in ssw
6bfc5028 Fix c/p mistake in convertalignments
e61da344 Fix spelling of 'length'
9a63760f Replace nested ternary operator
4349b5c6 Avoid repeatedly checking for profile db types
c170a11f Call MsaFilter::shuffleSequences() from MsaFilter::filter()
ef49ba22 Return value from MsaFilter::filter()
d155dc36 Replace int by bool literals for bool variable
ec6722ad Align headings with column in PSSMCalculator::printProfile()
548a9bd6 Avoid forward declaration of ScoreMatrix
d0fbe471 Do some cleanup in StripedSmithWaterman.cpp
91d1aedd Replace check for zero-sized containers by empty()
e47b8eed Remove superfluous parameter from ssw_init()
250b1221 Simplify return statements
4fe1116a Remove counting zero scores in Sequence::mapProfile()
4303728b Replace multiplication by zero
1bd60242 Remove increment by zero
e4d4389f Move check for exit condition in front of allocations
556d26d1 Clean up function signatures in MultipleAlignment
3863af9a Move include back to header to restore build
e1208493 Remove unused TmpResult score field
1fd4db8f Die if DBReader cannot reopen files (e.g. no more file handles left)
1e21b87b Purge sequenceLookup early since its recreate in split databases
40854ddc Prefiltering and CacheFriendlyOperations refactoring
2433e086 WASM work in progress
14014cd0 Fix prefilter overflow instability
e0f97184 Add conda forge to conda install instructions
aa175d63 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment)
d1607bc8 Remove LINE_MAX
eca2155d Clear string buffer instead of reassigning in swapresults
0f4645ed Fix wrong reverse marking in linsearch reported by UBSAN
5b612a32 Missing mpi binaries for travis regression
83d22417 Next try for ARM compiler flags
7ad122f0 Missed a few variables
ac7914be Do not require a cmake variable to build ARM
0dcfaadb Update regression to fix broken samtools call on ARM
29927b4c More NEON fixes, we assume signed chars, ARM uses unsigned by default
7760220f Next try to get the ARM regression to work
cc6d0d52 Add hack to not break travis log size limit
5408c3d1 Try to get NEON to compile
83192cab Fix search workflow parameters printed twice
f6f001c8 Fix new clang-10 warnings and further travis fixes
259e6434 llvm-10 alias is not whitelisted in travis yet
b1249fd5 Fix errors in Travis YAML from previous commit
18486d4c Update travis - use native aarch64 for neon - use xenial - shorten script
98c37f3c shortend MultiParam usage, improved line breaks in usage
c9be07f1 Add gcc-9 to travis
2e5fb309 Fix travis clang build
d5865c89 Remove MultiParam g++-9 warning
73679835 Rework target split merging
ca586939 Fix RESSIZE issue in slice search if sequences are used
491900b9 Improve usage text of cluster/linclust
0166850a Remove old greedy incremental clustering code and just run the memory efficient version instead.
15163e64 Fix Verbosity in workflows
aa78af46 Fix issue soedinglab/MMseqs2#274
7846dfce fixed clang template error
e1206371 extended MultiParam class, replaced ScoreMatrixFile type by MultiParam<char*>
b88b5475 rewrite alphabetSize as multi parameter
ecb4e35d started template class MultiParam to store sequence type specific values
e1a1c122 changed dbtype comparision in AlignmentSymmetry
2a829aef Replace symlinkat call with getcwd/chdir/symlink/chdir to fix Conda build using macOS 10.9 SDK
28e83e8d Add OpenMP include to DBReader
fb00aa0c Fix realloc issue while IndexTable creation of profiles
504e5021 Take max. seq. len of query and target db in prefilter and alignment
16e23521 Fix bug if seq. len > max seq. length in Alignment
80d0187d Fix asan issue
751f5c19 Make ZDROP an expert parameter, change description text
1b6edd0d Rework x detection (SIMD)
9677254a Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1ac1e686 Fix max seq issues in prefilter
cb737033 Reset download strategy to not use aria2c for the NCBI download
c95f3ee0 fixed ksw2 test
72b95c0c Error if we cannot download from NCBI
1d0aad50 Fix databases not piecing togehter all kalamari accessions
516723d5 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
d81b6cca added zdrop parameter to control banded nucleotide alignment
e2e39a97 Add Kalamari Contaminants database
c0c538ea Various fixes in databases script
08cc95b3 Fix createtaxdb redownloading when taxdump already exists
018eb349 Remove a bit whitespace in front of each parameter in usage message
8aa7513d add aggregatetax example, fix typos
8bcd7c74 Fix typo
8e581b76 Rework usage texts
7dc25764 Hide most parameters from createindex
2baa609e Add examples to many modules
00a7d769 fixed bugs for long or wrapped nucleotide sequences
a4bdcb47 eggNOG profiles should not depend on the deleted MSAs
4c783095 Fix eggNOG database construction
f7a5599c Cleanup not needed files immediately in databases workflow
3ed3690d Fix downloads always restarting in databases workflow
4cfac9a8 Fix aria warning with more than 16 connections
e0a00e10 Revert "Use SW instead of BandedNucAln if we don't have diagonals"
7ac966b2 Fix result2msa could fail if it was writing compressed output
95729ac7 Fix wrong output DB type written in alignall
f899e7c7 Use SW instead of BandedNucAln if we don't have diagonals
c08d9fa8 Allow parameter descriptions to span multiple lines
57868498 MMseqs2 is not limited to proteins, update README to reflect that
11818b0a Cleanup hiding parameters in workflows
c481cea6 Remove some useless includes
2f64aeeb Fix databases timestamp appending instead of overwriting
ae9e9e32 Add eggNOG setup procedure to databases
31c8e5d5 Shorten two short parameter descriptions
2f49d3e3 Read header from lookup in msa2profile if available
1356869b add option to reverese profile dbs
ac3482e8 More issues with zlib and tar2db
aaafafe4 Fix tar2db keys
c751d9e2 More tar2db fixes
a9c93014 Fix variadic input to tar2db
51a76130 Add tar2db module to convert content of any tar to a DB
96f9a91e Use nedmalloc on Windows/Cygwin
73f5c2a2 Add databases workflow to README
5a7ac9e5 make align output consistent
c5ebe529 fixed setcover cluster mode (by fixing bug in similarity reading for short aln results e.g. hamming distance aln)
481696b5 Fix databases output
c6b4a57a Beginning cleaning up parameter descriptions
a9552a17 Show default value of bool parameters
af89c467 Add a proposed example text structure

git-subtree-dir: lib/mmseqs
git-subtree-split: c48da9d781b81804727b5cccfed7f97cfcc20c9d