Issue with logo plots #62

Open
guillemsanchezsanchez1996 opened this issue May 30, 2023 · 1 comment
Comments


guillemsanchezsanchez1996 commented May 30, 2023

Hello everybody,

First of all, thanks a lot to the Conga team for creating this nice package, and for keeping the gd TCR "aficionados" in mind in your work. I am struggling to run the make_tcr_logos.py script with a gd TSV file. I have edited the script to include "human_gd" as an option for organism. The error is the following:

python scripts/make_tcr_logos.py --tcrs_tsvfile data/CD4_naive_1.tsv --outfile_prefix CD4_naive_1_2 --organism human_gd
Read 321 paired TCRs from data/CD4_naive_1.tsv
made: CD4_naive_1_2_tcr_logo_A.png
Traceback (most recent call last):
  File "/home/willy_s/conga/scripts/make_tcr_logos.py", line 68, in <module>
    make_tcr_logo_for_tcrs(
  File "/home/willy_s/conga/conga/tcrdist/make_tcr_logo.py", line 515, in make_tcr_logo_for_tcrs
    cmds = make_default_logo_svg_cmds(
  File "/home/willy_s/conga/conga/tcrdist/make_tcr_logo.py", line 376, in make_default_logo_svg_cmds
    b_junction_results = tcr_sampler.analyze_junction( organism, vb_gene, jb_gene,
  File "/home/willy_s/conga/conga/tcrdist/tcr_sampler.py", line 401, in analyze_junction
    assert 3*len(cdr3_protseq) == len(ncount)
AssertionError

Do you have any idea what is going on? I think the problem is with the delta sequence logo.

Guillem

PS: Here is a snapshot of my TCR file. I have not seen any strange sequences (e.g. CDR3s with very few amino acids).
[image: screenshot of the TCR table]

@sschattgen
Collaborator

Hi Guillem,

Thanks for your interest.

The error occurs because the length of a CDR3 nucleotide sequence is not equal to 3 times the length of its amino acid sequence. It seems you have extra or missing nucleotides or amino acids somewhere in the table. You can use pandas and this bit of code to find the rows that are causing the error.

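import pandas as pd

# Load the TCR table that was passed to make_tcr_logos.py
df = pd.read_table('your_table.tsv')

# Rows where either CDR3 nucleotide sequence is not exactly 3x the amino acid length
df[
    (3*df.cdr3a.str.len() != df.cdr3a_nucseq.str.len()) |
    (3*df.cdr3b.str.len() != df.cdr3b_nucseq.str.len())
]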
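
If it helps to see which chain is off and by how much, the same length check can be sketched without pandas. This is only an illustration: the function name `find_length_mismatches` and the toy rows are made up here, and the column names are assumed to match the ones used in the snippet above.

```python
import csv
import io

# Column names follow the snippet above; adjust them if your table differs.
CHAIN_COLUMNS = [("cdr3a", "cdr3a_nucseq"), ("cdr3b", "cdr3b_nucseq")]


def find_length_mismatches(handle):
    """Return (row_number, protein_column, n_amino_acids, n_nucleotides)
    for every CDR3 whose nucleotide length is not 3x its amino acid length.

    Hypothetical helper, not part of conga. Row numbers count data rows,
    starting at 1 and excluding the header line.
    """
    mismatches = []
    reader = csv.DictReader(handle, delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        for prot_col, nuc_col in CHAIN_COLUMNS:
            # Treat a missing or empty cell as an empty sequence.
            prot = (row.get(prot_col) or "").strip()
            nuc = (row.get(nuc_col) or "").strip()
            if 3 * len(prot) != len(nuc):
                mismatches.append((row_number, prot_col, len(prot), len(nuc)))
    return mismatches


# Toy data, invented for illustration: row 2 has one extra nucleotide in cdr3b_nucseq.
toy_tsv = (
    "cdr3a\tcdr3a_nucseq\tcdr3b\tcdr3b_nucseq\n"
    "CAV\ttgtgctgtg\tCAS\ttgtgccagc\n"
    "CAL\ttgtgctctg\tCAS\ttgtgccagca\n"
)

toy_mismatches = find_length_mismatches(io.StringIO(toy_tsv))
for row_number, column, n_aa, n_nt in toy_mismatches:
    print(f"row {row_number}: {column} has {n_aa} aa but {n_nt} nt (expected {3 * n_aa})")
```

For a real table, pass an open file instead of the toy string, e.g. `with open('your_table.tsv', newline='') as fh: find_length_mismatches(fh)`.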
