Clusters listed in debug.report.tsv not in report.tsv #188

Closed
whottel opened this issue Jul 27, 2017 · 3 comments

Comments

@whottel

whottel commented Jul 27, 2017

For my samples I see the following in the log:
__________________________ Assembling each cluster ___________________________
Will run 1 cluster(s) in parallel
Constructing cluster APH_9__Ia (1 of 9)
Constructing cluster Chlamydophila_psittaci_16S (2 of 9)
Constructing cluster Helicobacter_pylori_16S (3 of 9)
Constructing cluster Mycobacterium_abscessus_16S+ (4 of 9)
Constructing cluster Neisseria_- (5 of 9)
Constructing cluster OXA_29 (6 of 9)
Constructing cluster Pasteurella_multocida_16S (7 of 9)
Not constructing cluster Propionibacterium_acnes_16S because it only has 2 reads (8 of 9)
Constructing cluster rrsB+ (9 of 9)

I notice these clusters are listed in debug.report.tsv for all of my samples. In some of my samples the final report.tsv file is empty. In others only one of the clusters listed in the debug file is described in the report.tsv file. Can I tweak some of the nucmer or assembly options so that all the clusters are described in the report.tsv file?

@martinghunt
Contributor

If a cluster doesn't make it into the report file, it's because there's no good evidence that a sequence from that cluster exists in the reads. Usually a few spurious reads mapped and the assembly then failed. Even if the assembly works, you still need a nucmer match. What do the lines in debug.report.tsv that got removed from report.tsv look like? You could try lowering --nucmer_min_id and --nucmer_min_len.
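
As a rough sketch of that last suggestion, both options could be passed to `ariba run` with lower values. Only the two option names come from this thread. The helper name `build_ariba_cmd`, the threshold values, the file names, and the positional argument order (reference directory, two read files, output directory) are assumptions to check against `ariba run --help`.

```python
import shlex

def build_ariba_cmd(ref_dir, reads1, reads2, outdir, min_id=80, min_len=10):
    """Build (but do not run) an `ariba run` command with relaxed nucmer thresholds.

    The defaults 80 and 10 are arbitrary illustrations, not recommendations,
    and the positional argument order is an assumption.
    """
    return [
        "ariba", "run",
        "--nucmer_min_id", str(min_id),
        "--nucmer_min_len", str(min_len),
        ref_dir, reads1, reads2, outdir,
    ]

# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = build_ariba_cmd("ref_db", "reads_1.fq.gz", "reads_2.fq.gz", "out_relaxed")
print(shlex.join(cmd))
```

Once the argument order is confirmed, the list can be handed to `subprocess.run(cmd, check=True)`.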

@whottel
Author

whottel commented Jul 31, 2017

The debug.report.tsv output looks like this:
flag | reads | cluster
1024 | 16 | APH_9__Ia
1024 | 200 | Chlamydophila_psittaci_16S
1024 | 46 | Helicobacter_pylori_16S
1024 | 206 | Mycobacterium_abscessus_16S+
1024 | 684 | Neisseria_-
1024 | 20 | OXA_29
1024 | 522 | Pasteurella_multocida_16S
1024 | 522 | rrsB+

The other columns are empty. The corresponding report.tsv file is entirely empty.
I have tried lowering --nucmer_min_id and --nucmer_min_len, going so far as setting each to 1, and I still get the same result.
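
To list which clusters were dropped for each sample, the two files can be compared by cluster name. This is a minimal sketch, assuming both files are tab-separated with a header row (possibly starting with `#`) that contains a column named `cluster`. The function names are illustrative and not part of ARIBA.

```python
import csv
from pathlib import Path

def clusters_in_report(path):
    """Return the set of cluster names found in a TSV report.

    Assumes a tab-separated file whose header row has a 'cluster' column.
    A missing or empty file yields an empty set.
    """
    path = Path(path)
    if not path.exists() or path.stat().st_size == 0:
        return set()
    with path.open(newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        header = [name.lstrip("#") for name in next(rows, [])]
        if "cluster" not in header:
            return set()
        col = header.index("cluster")
        return {row[col] for row in rows if len(row) > col and row[col]}

def missing_clusters(debug_path, report_path):
    """Clusters present in the debug report but absent from the final report."""
    return sorted(clusters_in_report(debug_path) - clusters_in_report(report_path))
```

For example, `missing_clusters("out/debug.report.tsv", "out/report.tsv")` (paths hypothetical) would return every cluster name from the debug file when report.tsv is empty.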

@martinghunt
Contributor

Flag 1024 means that it failed to find the closest reference sequence. My guess is that for each cluster there is no nucmer match between the contigs and the reference sequences. This should be in the log file around the text "Looking for closest match from sequences within cluster". You could do extra checks by using the --noclean option and analysing the assemblies yourself: they are called clusters/*/assembly.fa.
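
The checks described above could be scripted roughly as follows. This sketch assumes the flag is a bit field (only the meaning of 1024 is taken from this thread), that the log is plain text or gzipped, and that `clusters/*/assembly.fa` sits under the output directory of a run made with --noclean. All function names and paths are illustrative.

```python
import gzip
from pathlib import Path

REF_SEQ_CHOOSE_FAIL = 1024  # per the comment above: closest reference not found
NEEDLE = "Looking for closest match from sequences within cluster"

def set_bits(flag):
    """Split an integer flag into the powers of two that are set in it."""
    return [1 << i for i in range(flag.bit_length()) if flag & (1 << i)]

def read_lines(path):
    """Read a text file, transparently handling a .gz suffix."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()

def log_context(log_path, needle=NEEDLE, after=5):
    """Yield each log line containing `needle` plus the `after` lines that follow."""
    lines = read_lines(log_path)
    for i, line in enumerate(lines):
        if needle in line:
            yield lines[i:i + after + 1]

def contig_lengths(fasta_path):
    """Return {contig_name: length} for a FASTA file using plain text parsing."""
    lengths, name = {}, None
    for line in read_lines(fasta_path):
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths

def summarise_assemblies(outdir):
    """Map each cluster directory name to the contig lengths in its assembly.fa."""
    return {
        path.parent.name: contig_lengths(path)
        for path in sorted(Path(outdir).glob("clusters/*/assembly.fa"))
    }

print(set_bits(1024))                    # [1024]
print(bool(1024 & REF_SEQ_CHOOSE_FAIL))  # True
```

A cluster whose assembly.fa is missing or has only very short contigs points to an assembly problem, while reasonable contigs with flag 1024 would suggest that the contigs simply do not match the cluster's reference sequences.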
