Kmer array overflow. currKmerArrayOffset=0, kmerBufferPos=1024, kmerArraySize=1024. #274

kad-ecoli · 2020-02-13T16:53:41Z

Expected Behavior

mmseqs2 successfully runs linclust on a 49-sequence protein FASTA file.

Current Behavior

mmseqs2 complains about a Kmer array overflow.

Steps to Reproduce (for bugs)

~/seqdb/JGI/script/mmseqs2/bin/mmseqs createdb DB.fasta DB -v 1
mkdir tmp
~/seqdb/JGI/script/mmseqs2/bin/mmseqs linclust DB DB_clu tmp -c 0.9 --cov-mode 1 --threads 1 -v 1

MMseqs Output (for bugs)

Kmer array overflow. currKmerArrayOffset=0, kmerBufferPos=1024, kmerArraySize=1024.
Error: kmermatcher died

Context

The input file DB.fasta and all intermediate files are attached.
linclust_3300021621.zip

Your Environment

Include as many relevant details about the environment you experienced the bug in.

Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
MMseqs2 Version: 481696b5f426f991211894d8a855bf9d60065c8f
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
https://mmseqs.com/latest/mmseqs-linux-sse41.tar.gz
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
Operating system and version:

LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.7.1908 (Core)
Release:        7.7.1908
Codename:       Core


martin-steinegger · 2020-02-13T22:10:10Z

Thank you for the report. Could you please rerun it with the latest version?

martin-steinegger · 2020-02-14T00:17:09Z

Oh, the error still exists in the newest version.

martin-steinegger · 2020-02-14T00:40:25Z

Okay, the bug should be fairly rare. It occurs if linclust extracts exactly 1024 k-mers.
A quick fix is to increase the number of k-mers per sequence, e.g. to 30 (--kmer-per-seq 30).

kad-ecoli · 2020-02-14T13:57:45Z

Could you give an example command showing how to increase the number of k-mers per sequence, e.g. to 30?

martin-steinegger · 2020-02-14T14:30:41Z

~/seqdb/JGI/script/mmseqs2/bin/mmseqs linclust DB DB_clu tmp -c 0.9 --cov-mode 1 --threads 1 -v 1 --kmer-per-seq 30
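
For scripted runs, the workaround can be applied only when the overflow actually occurs. The following is a minimal sketch, not part of MMseqs2: `MMSEQS` and `linclust_with_fallback` are hypothetical names, and it assumes the failed run exits with a non-zero status and prints the "Kmer array overflow" message as shown in this issue. Whether leftover files in tmp or a partial DB_clu need to be removed before the retry is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: run linclust once; if it fails with the "Kmer array overflow"
# message, retry with the workaround suggested above (--kmer-per-seq 30).
# MMSEQS is a hypothetical variable naming the mmseqs binary to call.
MMSEQS="${MMSEQS:-mmseqs}"

linclust_with_fallback() {
    # Arguments: DB DB_clu tmp [further linclust options...]
    local log status
    # Capture stdout and stderr of the first attempt together with its exit status.
    if log="$("$MMSEQS" linclust "$@" 2>&1)"; then
        status=0
    else
        status=$?
    fi
    # Retry only for this specific failure; other errors are passed through.
    if [ "$status" -ne 0 ] && printf '%s\n' "$log" | grep -q 'Kmer array overflow'; then
        echo "Kmer array overflow detected, retrying with --kmer-per-seq 30" >&2
        "$MMSEQS" linclust "$@" --kmer-per-seq 30
        return $?
    fi
    printf '%s\n' "$log"
    return "$status"
}
```

With the paths from this issue it would be called as `MMSEQS=~/seqdb/JGI/script/mmseqs2/bin/mmseqs` followed by `linclust_with_fallback DB DB_clu tmp -c 0.9 --cov-mode 1 --threads 1 -v 1`.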

martin-steinegger · 2020-02-16T21:27:11Z

Should be fixed now. Thank you for reporting.

If you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send your address to themartinsteinegger at gmail com.

kad-ecoli · 2020-02-25T21:33:45Z

I am afraid the issue is not yet solved. I ran mmseqs using the same method on the following set of sequences. mmseqs again complains about "Kmer array overflow. currKmerArrayOffset=10240, kmerBufferPos=1024, kmerArraySize=11264." if --kmer-per-seq 30 is not set.
DB.fasta.zip

martin-steinegger · 2020-02-26T03:23:09Z

@kad-ecoli Oh, there still seem to be cases where an off-by-one error could occur. I pushed a fix. Could you try the latest version? Thank you for your patience. I got your mail address for the stickers! We will send them out this week.
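
The two error messages reported in this thread can be checked arithmetically. The sketch below extracts the three counters from a message and prints how many slots would remain, under the assumption (not confirmed in this thread) that currKmerArrayOffset plus kmerBufferPos is the quantity compared against kmerArraySize. `remaining_slots` is a hypothetical helper, not part of MMseqs2.

```shell
#!/usr/bin/env bash
# Sketch: pull the three counters out of a "Kmer array overflow" message
# and print kmerArraySize - currKmerArrayOffset - kmerBufferPos.
remaining_slots() {
    local msg="$1" offset pos size
    offset="$(printf '%s\n' "$msg" | sed -n 's/.*currKmerArrayOffset=\([0-9]*\).*/\1/p')"
    pos="$(printf '%s\n' "$msg" | sed -n 's/.*kmerBufferPos=\([0-9]*\).*/\1/p')"
    size="$(printf '%s\n' "$msg" | sed -n 's/.*kmerArraySize=\([0-9]*\).*/\1/p')"
    echo $(( size - offset - pos ))
}

# The two messages quoted in this issue:
remaining_slots "Kmer array overflow. currKmerArrayOffset=0, kmerBufferPos=1024, kmerArraySize=1024."
remaining_slots "Kmer array overflow. currKmerArrayOffset=10240, kmerBufferPos=1024, kmerArraySize=11264."
```

Both calls print 0: in each report the buffer exactly fills the array rather than exceeding it, which is the kind of boundary case an off-by-one comparison would trip on.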

kad-ecoli · 2020-03-08T15:15:35Z

The new fix works fine. I also received the stickers. Thank you for the little foxes.

Commits referencing this issue (from a git-subtree update, git-subtree-dir: lib/mmseqs, git-subtree-split: 46c8438958edccd8fd09640eb174e2449529e4df):
aa78af463 Fix issue soedinglab/MMseqs2#274
aa175d636 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment)

d5da12d7 Add GTDB to databases downloader 83780f4c Respect verbosity for rmdb calls in databases 9011c15d Improve output of databases list 86c03fd4 Increase buffer sizes in tar2db 2bd03c68 Fix tar directory (symlink, etc) entries causing tar2db to stop early 7bdb222d Use DBWriter to write .lookup multi-threaded in tar2db 23c9e1e7 Don't use multiple threads in tar2db when reading .tar.gz/.tgz as nearly all the time is spent inside zlib 2e128d4f Increase zlib buffer in tar2db to speedup reading c1911893 Fix multiple locations where Util::checkAllocation would never be called as the preceding allocation would already terminate on failure 1f302134 Fix two compilation failures revealed by Debian 5b03cdff Another instance of the same warning 3fda449b Fix compile warning 3b0197af Encode species names in taxonomy blocklist to make sure we don't block random nodes in non-NCBI taxonomies (e.g. GTDB) ab2426f8 Fix String MultiParameter (e.g. sub matrices) breaking if filenames contain whitespaces e8de3507 Encode whitespace containing parameters as base64 to better deal with shell word splitting in workflows c7a7c366 Add instructions to simd.h 6672bbc9 Fix missing newline in log message 84034a52 Remove useless taxonomy ancestor warning 6609c6cd Fix invalid taxonomy output mode being set 441c52cf Fix taxpercontig not working with easy-taxonomy 4ce38109 lca is not computed by easy-taxonomy anymore 9d631c16 Fix cleanup of taxonomy intermediate files d0f596f5 taxonomyreport and addtaxonomy output is now adjustable in easy-taxonomy 6bfd08d5 Cleanup default set parameters in easy-taxonomy afcade16 Improve default taxonomy parameter lists shown (without -h) fc126b3e Improve error messages when something is wrong with the input/output paths 3b49310f Improve unrecognized parameter message 83b9e9a1 Remove useless missing tmp dir warning d0a9b79f Fix typo 48f9737a Add ORF filter parameters only to taxonomy for now a6068975 Disable unfinished ORF filter in search 336d9d04 Add taxonomy 
citation f7fde6fe Reduce binary taxonomy dump memory requirements slightly eff61cfe Add \0 byte after serialization 7e63e1ea Fix typo in Parameters.h 019de271 Add vector of predefined substitution matrices 34b3a539 Merge pull request #389 from mr-c/simde_v0.7.0 74724b3a Cleanup headers in kmermatcher 73fd5cfa Update xxhash to v0.8.0 8dd192c0 Don't create false _has_{builtin,attribute} c2d60348 Squashed 'lib/simde/simde/' changes from f2257f11..b6c9c964 062ef995 Merge commit 'c2d60348af5c036eb2cbc7974d84065e16ab4096' into simde_v0.7.0 bad16c76 Check correctly for existing of binary tax dump in createtaxdb 457cacab Replace string concatenation in aggregatetax with append a5169557 Fix strcmp comperator in nrtotaxmapping too 0da81a03 Fix ASAN free-delete mismatch 4fa7cb27 Replace std::sort in StringBlock with fast sort dc4f9ed4 Wrong comparision used in sort comperator was crashing clang e09b3db3 Move taxonomy version to cpp file 1645696b Use less threads on PPC64LE regression 9c0a99ca Fix compile error in taxonomy test f1ab0b3c Fix missing newline in lca 7ff6dc5e Add version check for binary taxonomy df301e3b Create serialized/mmapable taxonomy in createtaxdb, taxonomy loads instantly compared to before 3addec8e Remove debug output mode from createtaxdb again 5407ca4c Don't create taxonomy files in createtaxdb again if they already exist 95968440 read correct number of CPUs in macos build script of nproc is not available 0defb362 Split aggregatetax and aggregatetaxweight parameter lists 86e6b0b7 Cleanup weightedMajorityLCA d03a8d03 Add score vote mode to taxonomy weighted voting 553a670d Split non-index parts over more files if a split index is requested f5a762ff Do not read e-values for tax-id 0 again in aggregatetax 4c1137c2 Add majoritylca module for majority voting based taxonomy from alignment results 4224c6a6 Move majority lca voting to NcbiTaxonomy class 6ab700bf Fix parameter order in lca and aggregatetax 26a8e478 Skip secondary structure in msa2* with (c)a3m 
input 6f56a262 Fix: Extract the correct source name when tar2db and createdb are used together ea83a916 Fix cmake deprecation warning ca6aea96 Fix #379: E-value parameters are now correctly parsed as doubles instead of floats 1cec7419 Fix atomic check when cross-compiling aed7d976 Fix now correctly switching to xcode 12.2 in azure 184d834a Try building macOS ARM binaries on Azure's Catalina VMs 9b819686 Fix not returning error in mergeresultsbyset after error case 9f718741 Add MMSEQS_FORCE_MERGE env var for forcing generating fully merged dbs 3df79c30 Build arm64 macos binary only on big sur (not in CI yet) acfa3ef1 Build universal mac binary for sse/avx and arm neon f4f38685 Add symlinks to splitdb #376 41adb5d4 Add cpdb and lndb, place them and rmdb, mvdb into same file 99410a2e Revert "Remove handling of pre-split sequences in splitsequence" 3c0000ba Remove handling of pre-split sequences in splitsequence 6bb22ecc Add splitsequence parameters to all relevant workflows d204e91f createtaxdb can create a taxdb by mapping through .source 1c52b75a Fix tar2db would create entries for non regular tar files 2719ba2f Allow createdb to read generic dbtype (to use in combination with tar2db) 9e990b30 Add missing stdin dbtype to getDbTypeName c8e082e3 Increase number of opened files limit when DBReader is used 2a972e91 Fix gapped score calculation in proteinaln2nucl 750e8844 Update regression for taxonomy 35ad87ed Remove debug message 6a882624 Unify TaxPerContig and Taxonomy 7da33b05 Acc 2bLCA is now default for protein and translated taxonomy, tophit is always used for nucl-nucl 9d0169cc Taxonomy search mode fully integrated into alignment module f8d2878e Refactor alignment to allow computing a limited number of realignments 1cc54190 New 2blca could compute LCA from res not finding anything in first aln 5067c1d4 Taxonomy refactoring 18da8d6e Set approx 2blca as default taxpercontig mode 6da35599 Make taxpercontig orf-prefilter parameters adjustable 45c4de7f Include file 
size and modified date of inputs in tmp file hash calculation #372 cc472544 Fix #371: --cov-mode 5 was not working 8e8e9a0b Fix MPI compile issue f537370a BC breaking: Unify in result2msa --compress --summarize --omit-consensus to --msa-format-mode, support stockholm output 951d51b4 Don't link header db etc in filterresult to output db 349c2765 Move currentKey out of ifdef in tar2db d95e41e7 Always compute result files in easy-taxonomy 31a90e13 Actually fix the uninitialized warning 20eeaabc Fix uninitialized warning 3c94c0a2 splitsequence can create a sequence database with original headers aca7380b Return bit-score in proteinaln2nucl instead of raw-score 18588bb3 Fix filterresult off by one issue 9b74117e proteinaln2nucl can now compute scores and evalues 8ea08f0c Add curl flag to follow redirects to database downloader 1cf3002a Fix compiler warning 5dc4bcd4 Update eggnog urls (fix curl bug) 20a03128 Fix id issue in tar2db be4d2e07 Add multi-threading support to tar2db f6831608 Merge pull request #359 from mr-c/spelling b244246b Spelling typos fixes d9f2041e Merge branch 'master' of https://github.com/soedinglab/mmseqs2 971f9d90 Turn profiles from lin-space to scores, add average profile-profile code 96d452cb Inline single use of DBWriter::mergeFiles to mergedbs 24ecc26c Fix some compilation flags would not be correctly set during cross-compilation beabb353 Make sure to flush stdout/err before calling any workflows a1622068 Add missing dbtypes to allDbAndFlat 49240a30 Setting APT::Immediate-Configure=false fixes cross-compiler installation d4fd0729 Next try to fix cross-compilation bd3e49fe Remove ubuntu-toolchain ppa breaking cross-compiler installation on azure 4b9b3b56 Remove all other apt sources from azure before installing cross-compilers 57f429a0 remove unused remnants of the past in alignment class de06950f Reduce calls to posix_memalign, fixes lock contention of some platforms d3b0cf9a Fix result2profile could allocate not enough memory if target 
database contained much longer sequences than query database 1a490efe Support ungapped alignments in sliced search 3af62f06 Fix banded_sw 333cc350 Fix addtaxonomy always crashing due to invalid check 29e327f9 change orf filter params to match test runs cc7d7da3 result2repseq should preload the sequence database into memory 63794225 Improve createsubdb help text 951d5a72 Add nrtotaxmapping to create taxonomy mapping from NR 90e71f99 Squashed 'lib/simde/simde/' changes from 938d82c8..f2257f11 df69c26e Merge commit '90e71f9968d3925e545c45d7c68325dd3cd0c588' into master 48950b95 Correctly pass threads/verbosity in taxonomy workflow 9d3ab794 Merge commit 'b6a4528e818ca644f8200fc84b2d1856ecd8f5c7' into master b6a4528e Squashed 'lib/simde/simde/' changes from 2119ac73..938d82c8 725d9f63 Modified Profile-Profile alignment implementation with templates. 113e3212 Fix ASAN issue in extractorf when using AVX2 b15e95a1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 b7ec0e93 Fix setcover issues with dbs > 2^31 sequences f8b3f8b1 Add biocontainers badge b7ac683c Update cluster update regression 4d665ce9 Automatically set cluster parameters also in cluster update b5a08833 Fix #272 remove deleted sequences from old clustering in cluster update 66f77ce8 Cleanup subtractdbs b2ac9e0b remove confusing comments 3d2e394a Limit number of jobs used for compiling on travis d58cc78c Fix invalid symlinks in result2repseq 21f71466 Cluster update refactoring 60d5be17 Add missing var to profile 12b78e3f Merge branch 'master' of https://github.com/haydenji0731/MMseqs 2aaac47a First running version of double max profile/profile fbe754e0 Fix missing newline in first sequence in entries of result2msa db1c38b1 Made changes to SSW class for Profile2Profile Alignment a29379e2 Do not map scores if not needed in result2pp e80ec9a3 Updated ROC for result2pp 769aa78a Add seqdb preloading in result2pp cf8b1429 Remove more unused parameter from result2pp f2a29339 Update regression to include 
result2pp test 3fac8dde Copy profile information of unaligned regions from query profile 967a4555 Cleanup and fix result2pp b2f49a25 Add NR taxonomy information efdbe941 Change serial sort to std::sort 97a8f1dc Update regression 0c123fe7 Fix comp. bias correction in expandaln 401d8e6f Add --max-seqs to ungappedprefilter f57d1a71 Update expandaln, expand2profile and regression a62ea9a9 Update reassign cov. mode in prefilter and fix regression 64f9294b Update regression to include expansion test 61d8b64d Fix coverage read in for nucl-nucl alignment results (#339) 45ae9276 Compute evalues and sort correctly in expandaln 38fab36e Fix wrong sequences being loading in expandaln due to wrong sorting 3aa032be Cleanup in MultipleAlignment ea3212f0 Fix realloc size in profile set size increase 7cca0508 Fix restart cluster-reassign 0945e5a5 Add prefilter parameter to reassign 4e436c79 Fix compile error in tests 47e62299 Avoid constant allocations in PSSMCalculator 657a97c0 Don't clone the whole result_t vector uselessly in profile related modules b87cae01 MultipleAlignment does not require constantly allocating and deallocating Sequence objects anymore 486e13ac Remove add internal ID parameter in result2msa 0a8a7a3a expand2profile module should be able to directly build a new profile a84e6f48 Make max set size in profile classes dynamically growable 5baf62ab Cleanup Sequence class e4b2ffb0 Move PSSM masking and writing to its own file d10a6104 Fix clang warning 76d7d83b Fix progressbar in first clust readin step 01937be2 Taxonomy expressions in filtertax(seq)db interpret , as || now #320 fddf635d Add SILVA to databases module 9ec7c5e6 Fix MPI warning ce65cb86 disable ICC in travis, beta08 breaks their setvars.sh script and SIMDe has many issues 87183135 Fix warning in clang 97653a92 Check the return code of fclose to handle full disk errors better 06bd0cfd Add filterresult for pairwise HHblits filtering to reduce redundancy in a result db #316 3bdaf488 Fix various result2msa 
modes (compress works cleanly now, --filter-msa mode could return invalid MSAs) c1f78338 Fix invalid projected backtraces in expandaln d741a251 Remove circular include 595625a1 Cleanup result2msa/profile 8ad36374 Unify to computation of alignments in msa2result and transitivealign 55534d71 Fix wrong lengths used in msa2profile 5d10ce00 Rewrite expandaln module 4be0d6e1 Add msa2result module for generating result dbs from MSAs a179ab27 Cleanup DBConcat a9c56e57 Merge branch 'master' of github.com:soedinglab/MMseqs2 ec3b8254 Try out new aggregate tax algoritm cfba9f02 Fox .index.0 files not being removed after sorting dde4b2e3 Next try to downgrade ICC 618331da Downgrade ICC since latest version seems to be broken ed45a9f2 Remove unused variables in rewritten microtar 328732a1 Update regression ae7398d6 Added fident to convertalis. fident prints the fraction of the sequence identity. pident reports the percentage. soedinglab/MMseqs2#337 a61b9eb9 handle the unranked root and cell orgs d2141f32 ORF filter with high-eval thr ungapped alignment ea01a174 Remove useless cast in QueryMatcher 1e95b6bd Update tantan 207d0d21 Allow overwriting string parameters with empty strings 755a7b03 Add new binaries to README and fix whitespace be05b8d0 Add orf-filter to taxpercontig and cleanup 22e17aa4 orf-filter should also work in easy-search and easy-taxonomy 7fefa8af added mode to ByteParser 4393c5aa typo b05d7d75 Speed up read index and kmermatcher 3f9a6031 Fix --search-type 4 in createindex 18e90119 Rework read index in DBReader 1eb72611 Do not sort indexes when already ordered while DB close 65f246b1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 053fd61c Improve multi-threaded speed of writing clustering results e1a71066 Fix typo in arch name 3b1e528c Add SSE2 binary to Docker d66ee416 Use Ubuntu 20.04 for cross-compilation 8ba605e8 Add SSE2 and cross-compiled ARM64/POWER8/POWER9 builds to azure a5e485ba Fix broken checks for libraries when cross-compiling 
7fe0cb90 Fix progress bar in DBConcat cef0731b Create translated index if --search-type 4 is used in createindex 47afc572 Fix --search-mode 4 issues in offsetalignment 80fdcbed Change cluster reassing to bool soedinglab/MMseqs2#329 57e8a9df Allow ORF filter only in combination with query nucleotides d55f06ce Fix Pfam.full database creation 659cc1f8 Add additional experimental ORF prefiltering step before translated search e934f1c4 made ByteParser more informative 4d14c9fe tax-lineage modes: 0 nothing, 1 names, 2 taxids b777cd09 Disable ips4o on ppc for now 95a88524 find_package is case sensitive 8e797b1d Allow disabling use of IPS4O, cleanup 850a196b added seqs assignment agreement to the output e21dc40f Fix wrong existence checks for databases in workflows 5901a0a9 Set minimum clang to 5.0 for now d7b46e60 Disable ips4o on cygwin 033fda23 Change travis gcc check to 4.9 908675d2 Add includes ee7b5c11 Change random_shuffle to shuffle d1a1af5e Rewrite atomic check in cmake d6590f39 Add missing FastSort.h 704d0fb4 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 109be7bb Change sort to ips4o if possible 02059366 Fix warning d092a469 Fix kmermatcher MPI support b001dfb2 Made modifications for Profile-Profile alignment. Changes belong to SSW, Alignment, Matcher. Right before integrating lin space vector cost calculation for H value. 521c0d25 Made modifications to ssw algorithm implementation. 
2f1db01c Rename martin.steinegger@mpibpc.mpg.de to martin.steinegger@snu.ac.kr 0f7b6856 Fix #326 wrong citation link 62a387ed Merge branch 'master' of https://github.com/soedinglab/mmseqs2 c125a217 Fix issues in expandaln 648bc1f6 Add Pfam-B download script 16e79a2a Add dbCAN2 download script 7c0ed7f8 Microtar would try to seek backwards resulting in horrible gzip read performance cab0e838 Fix #323 createdb not correctly reading gz/bzip with --createdb-mode 1 1d650034 mmseqs --help should not give a useless correction suggestion 35c58af9 Improve download of taxdmp file in createtaxdb 68feeb20 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 6546822c Add missing .dbtype to newSeqDb header in cluster update 2a787482 Merge pull request #321 from milot-mirdita/simde 72d19b96 Seems like travis reduced the RAM available on ARM 565ad3f9 Add script to update SIMDe b9783a7f Squashed 'lib/simde/simde/' content from commit 2119ac73 9828f0d6 Merge commit 'b9783a7fca1677486f2f830a9c59fda11330980c' as 'lib/simde/simde' 641ef68b Remove submodule in preparation for subtree b6dd6447 Work around clang issue a877dc00 Rebuild SIMD autodetection 5ba9e7ae Cleanup warnings 3980d2a7 Add one Newton-Raphson it to make division with _mm_rcp_ps always consistent 27b82963 Try limiting threads in ppc to not crash on 4gig ram c95bdcc1 Silence strict aliasing warning in Itoa for NEON 590cfb96 Rebuild 128/256 bit SIMD split in simd.h f5750fee Enable building on non-x86 and less than SSE4.1 21d798f0 Remove not finished createtaxdb changes b59c3381 Make orf information available through convertalis 284bb757 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 f4bbce84 Add MemoryTracker, Account for index size when computing available memory e2510e8f fixed comment because it wasmisleading def7ace2 Add convertalis HTML output based on MMseqs2 app (app.mmseqs.com) dd3ff63a Fix convertkb to work without a mapping file dc054792 Previous lookup writer would always report failing 
52ac0f36 Refactor lookup writing to not corrupt memory if an accession is too long 9f2be0e0 Disable ICC travis for now d319bb92 Merge branch 'master' of github.com:soedinglab/MMseqs2 d1522365 remove appendtaxaln 648cf836 refined code as per Milot's feedback 94db0316 One more INT -> UINT warning that ICC complains about 271b7c13 Next try for travis 2c1dfdd4 Fix terminate value in SSW again 6016e1b1 Try to fix travis ddaaaf7f Fix various warnings reported by ICC, add ICC to travis f3adc10f added aggregatetaxweights to get rid of appendtaxaln 211dd7a7 change to SSTR 517b01ba true in createParameterString saves defining taxonomy defaults aa068c86 added taxonomy default parameter values because life f9d1face in the process of adding taxPerContig workflow 94f895a3 fixed english 19f1dfbd moved weight const back 8db3e714 moved definition of constants 05cfc8bd added mode of tax output: both lca and aln 2cd59046 added voteMode parameter 128f57b5 extended aggregatetax to handle eval-based weights e4a10bd7 added appendtaxaln for extending aggregatetax 0c29da4a Actually fix the filterdb --join-db issue 7ff6ae7c Restore fix lost char in joindb mode change f5c8b28c Update README.md e4f7e745 Add qOrfStart/qOrfEnd, dbOrfStart,dbOrfEnd to offsetalignment cf40916c Merge branch 'master' of https://github.com/soedinglab/mmseqs2 c0dac797 Do not write null byte in splitdb cbb542af added rand id to tmp files created at localTmp 214e87e9 Remove goto in lca.cpp c8309fce Merge branch 'master' of https://github.com/soedinglab/mmseqs2 b761ddf4 Fix issue with qset format output 80bff832 Do not write .lookup in easy workflows if not needed 21f7a05f createdb can now read a database containing FASTA/Q entries d5a05376 Fix whitespace and cleanup output strings in createdb d14b622e Fix cygwin compile issue 9e5fb33b Introduce KSeqWrapper to read from memory location dc7b9626 Merge branch 'master' of https://github.com/soedinglab/mmseqs2 de6f7524 Fix soft link createdb bug if multiple input file are 
provided b06bee91 Fix alpha regex 46c84389 Update combine pval agg-mode 3 67d61013 Disable fancy progress bars on travis to reduce output 203a2173 Updated two more tests to use tighter ROC thresholds a9052f44 Update regression with tighter bounds for ROC tests c62736a6 Correctly parse keys from data files in filterdb --filter-file This was causing a linsearch instability fe007cb4 Use MultiParam for gapOpen, gapExtend costs 3513001d Add easy-rbh workflow d0d3032e Fix RBH search if using -a to show alignments ce1a43bf Merge branch 'master' of https://github.com/soedinglab/mmseqs2 ea24e493 Fix issues with abs. path if using aria2c 5228745f Improve --alignment-mode parameter description and make it a non expert parameter fffa9b10 Fix various inconsistencies and usability issues with alignall: * alignall alignment-mode did not correspond to align alignment-mode * add-backtrace did not do anything, has to be specified now if backtrace is needed * Did return a alignment db type even though it is incompatible with that type, uses generic for now * various parameters were passed but unused - zdrop and scorebias are used now (however see below) - realign, alt ali, max accept/reject, wrapped are now gone 29066847 Fix wrong warning 813d81f2 Update regression 264d7811 Switch greedy clustering algorithm back to old idea c09f6574 Improve nucleotide clustering workflow 38a73770 Set k-mers in linclust to 0 for the nucleotide clustering 7df6e3f7 Replace characters that can not be reversed by N in extract frames e9678f62 Update regression f886e868 Add nucleotide support to cluster (workflow nucleotide_clustering), clust module will infer identity automatically if missing, Improve low. mem. 
greedy incremental algorithm, Update regression 5f873587 Add kmers-per-sequence-scale to linsearch 0310eb60 Change --kmer-per-seq-scale to a multi parameter, add error if cluster is called with a nucleotide sequence e258bc8d Fix #299 PDB70 database creation was not working 7095f37e Add support reverse complemente in rescorediagonal --rescore-mode 0 and 1 61ca4888 Fix result2dnamsa 70d014e4 Add search-type 4 to Search 462f24cb Add module result2dnamsa 5670d990 Fix regression error e4451d59 Add result direction parameter to kmersearch 12c499dc Fix reverse sequences issues in linclust and linsearch 44499c3c Update filterdb regression test 807b4a56 Fix issue soedinglab/MMseqs2#290. Filterdb checked for mode == true but mode was 2. 24479bc2 Fix Docker a578f52a Fix char signedness on PPC a0d64a98 Update regression a07a266f Working on PPC64LE support 09734177 Remove remaining _mm_shuffle_epi32 cdef78a6 Merge pull request #285 from hgsommer/misc_small 283c8d03 Replace goto end in ssw 6bfc5028 Fix c/p mistake in convertalignments e61da344 Fix spelling of 'length' 9a63760f Replace nested ternary operator 4349b5c6 Avoid repeatedly checking for profile db types c170a11f Call MsaFilter::shuffleSequences() from MsaFilter::filter() ef49ba22 Return value from MsaFilter::filter() d155dc36 Replace int by bool literals for bool variable ec6722ad Align headings with column in PSSMCalculator::printProfile() 548a9bd6 Avoid forward declaration of ScoreMatrix d0fbe471 Do some cleanup in StripedSmithWaterman.cpp 91d1aedd Replace check for zero-sized containers by empty() e47b8eed Remove superfluous parameter from ssw_init() 250b1221 Simplify return statements 4fe1116a Remove counting zero scores in Sequence::mapProfile() 4303728b Replace multiplication by zero 1bd60242 Remove increment by zero e4d4389f Move check for exit condition in front of allocations 556d26d1 Clean up function signatures in MultipleAlignment 3863af9a Move include back to header to restore build e1208493 Remove unused 
TmpResult score field 1fd4db8f Die if DBReader cannot reopen files (e.g. no more file handles left) 1e21b87b Purge sequenceLookup early since its recreate in split databases 40854ddc Prefiltering and CacheFriendlyOperations refactoring 2433e086 WASM work in progress 14014cd0 Fix prefilter overflow instability e0f97184 Add conda forge to conda install instructions aa175d63 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment) d1607bc8 Remove LINE_MAX eca2155d Clear string buffer instead of reassigning in swapresults 0f4645ed Fix wrong reverse marking in linsearch reported by UBSAN 5b612a32 Missing mpi binaries for travis regression 83d22417 Next try for ARM compiler flags 7ad122f0 Missed a few variables ac7914be Do not require a cmake variable to build ARM 0dcfaadb Update regression to fix broken samtools call on ARM 29927b4c More NEON fixes, we assume signed chars, ARM uses unsigned by default 7760220f Next try to get the ARM regression to work cc6d0d52 Add hack to not break travis log size limit 5408c3d1 Try to get NEON to compile 83192cab Fix search workflow parameters printed twice f6f001c8 Fix new clang-10 warnings and further travis fixes 259e6434 llvm-10 alias is not whitelisted in travis yet b1249fd5 Fix errors in Travis YAML from previous commit 18486d4c Update travis - use native aarch64 for neon - use xenial - shorten script 98c37f3c shortend MultiParam usage, improved line breaks in usage c9be07f1 Add gcc-9 to travis 2e5fb309 Fix travis clang build d5865c89 Remove MultiParam g++-9 warning 73679835 Rework target split merging ca586939 Fix RESSIZE issue in slice search if sequences are used 491900b9 Improve usage text of cluster/linclust 0166850a Remove old greedy incremental clustering code and just run the memory efficient version instead. 
15163e64 Fix Verbosity in workflows aa78af46 Fix issue soedinglab/MMseqs2#274 7846dfce fixed clang template error e1206371 extended MultiParam class, replaced ScoreMatrixFile type by MultiParam<char*> b88b5475 rewrite alphabetSize as multi parameter ecb4e35d started template class MultiParam to store sequence type specific values e1a1c122 changed dbtype comparision in AlignmentSymmetry 2a829aef Replace symlinkat call with getcwd/chdir/symlink/chdir to fix Conda build using macOS 10.9 SDK 28e83e8d Add OpenMP include to DBReader fb00aa0c Fix realloc issue while IndexTable creation of profiles 504e5021 Take max. seq. len of query and target db in prefilter and alignment 16e23521 Fix bug if seq. len > max seq. length in Alignment 80d0187d Fix asan issue 751f5c19 Make ZDROP an expert parameter, change description text 1b6edd0d Rework x detection (SIMD) 9677254a Merge branch 'master' of https://github.com/soedinglab/mmseqs2 1ac1e686 Fix max seq issues in prefilter cb737033 Reset download strategy to not use aria2c for the NCBI download c95f3ee0 fixed ksw2 test 72b95c0c Error if we cannot download from NCBI 1d0aad50 Fix databases not piecing togehter all kalamari accessions 516723d5 Merge branch 'master' of https://github.com/soedinglab/MMseqs2 d81b6cca added zdrop parameter to control banded nucleotide alignment e2e39a97 Add Kalamari Contaminants database c0c538ea Various fixes in databases script 08cc95b3 Fix createtaxdb redownloading when taxdump already exists 018eb349 Remove a bit whitespace in front of each parameter in usage message 8aa7513d add aggregatetax example, fix typos 8bcd7c74 Fix typo 8e581b76 Rework usage texts 7dc25764 Hide most parameters from createindex 2baa609e Add examples to many modules 00a7d769 fixed bugs for long or wrapped nucleotide sequences a4bdcb47 eggNOG profiles should not depend on the deleted MSAs 4c783095 Fix eggNOG database construction f7a5599c Cleanup not needed files immediately in databases workflow 3ed3690d Fix downloads always 
restarting in databases workflow 4cfac9a8 Fix aria warning with more than 16 connections e0a00e10 Revert "Use SW instead of BandedNucAln if we don't have diagonals" 7ac966b2 Fix result2msa could fail if it was writing compressed output 95729ac7 Fix wrong output DB type written in alignall f899e7c7 Use SW instead of BandedNucAln if we don't have diagonals c08d9fa8 Allow parameter descriptions to span multiple lines 57868498 MMseqs2 is not limited to proteins, update README to reflect that 11818b0a Cleanup hiding parameters in workflows c481cea6 Remove some useless includes 2f64aeeb Fix databases timestamp appending instead of overwriting ae9e9e32 Add eggNOG setup procedure to databases 31c8e5d5 Shorten two short parameter descriptions 2f49d3e3 Read header from lookup in msa2profile if available 1356869b add option to reverese profile dbs ac3482e8 More issues with zlib and tar2db aaafafe4 Fix tar2db keys c751d9e2 More tar2db fixes a9c93014 Fix variadic input to tar2db 51a76130 Add tar2db module to convert content of any tar to a DB 96f9a91e Use nedmalloc on Windows/Cygwin 73f5c2a2 Add databases workflow to README 5a7ac9e5 make align output consistent c5ebe529 fixed setcover cluster mode (by fixing bug in similarity reading for short aln results e.g. hamming distance aln) 481696b5 Fix databases output c6b4a57a Beginning cleaning up parameter descriptions a9552a17 Show default value of bool parameters af89c467 Add a proposed example text structure git-subtree-dir: lib/mmseqs git-subtree-split: c48da9d781b81804727b5cccfed7f97cfcc20c9d

martin-steinegger added a commit that referenced this issue Feb 14, 2020

Fix issue #274

aa78af4

martin-steinegger closed this as completed Feb 16, 2020

martin-steinegger added a commit that referenced this issue Feb 26, 2020

Fix off by one in kmermatcher #274 (comment)

aa175d6

