Error in Aggregate the individual GEX runs into a single AnnData object #52

Open
QianhuiXu opened this issue Oct 24, 2022 · 2 comments

@QianhuiXu

QianhuiXu commented Oct 24, 2022

Hello,

conga is a wonderful tool!

I ran into an issue while exploring the fancy_conga_pipeline_with_batches_and_gammadelta_tcrs notebook.

My command:

gex_datasets = sorted(glob.glob('*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)
all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    # The folder name is also the donor ID
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    # change the barcode suffix to '-1' to match the GEX data
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0)+'-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)
    # process the contigs to generate conga clonotypes
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism = 'human', # using 'human' for TCRab
        clones_file = donor_clones_file,
        stringent = True, # (the default) see Note #1 on clonotype filtering
    )
    # read the GEX data and the clonotypes into CoNGA
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file,
        allow_missing_kpca_file=True)
    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease) # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num # conga batch ids are integers
    all_data.append( adata )
new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')

Error:

IndexError Traceback (most recent call last)
/tmp/ipykernel_1354605/1967687937.py in
33
34 # concatenate the datasets
---> 35 new_adata = all_data[0].concatenate(all_data[1:])
36 #save the aggregate AnnData object
37 new_adata.write('merged_gex_abtcr.h5ad')

IndexError: list index out of range

I'm really at a loss as to how to proceed, and any guidance would be much appreciated!
Thank you for your kind help!

@phbradley
Owner

Hi there, thanks for trying conga, and thanks for the feedback. This error suggests that the list "all_data" is empty, which may be because the preceding loop did not execute. The loop runs over the folders found by the glob command:

gex_datasets = sorted(glob.glob('*-CD3'))

Could you check and see whether the expected files are present and in the directory where the notebook is running? These would be the *-CD3 folders that have the GEX counts data in them.
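
One way to run that check is sketched below. The helper name `require_matches` is hypothetical (it is not part of conga), and the sketch uses only the standard library. It returns the same sorted list as the glob call above, but stops with a readable message when nothing matches, instead of failing later at the concatenate step.

```python
import glob
import os


def require_matches(pattern):
    """Return the sorted paths matching `pattern`, or raise if there are none.

    Hypothetical helper, not part of conga: it makes an empty glob result
    visible immediately, together with the directory the notebook runs in.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"no paths match {pattern!r} (working directory: {os.getcwd()})"
        )
    return matches
```

In the notebook this would be used as `gex_datasets = require_matches('*-CD3')`. If it raises, either change the working directory or pass a pattern that includes the folder holding the *-CD3 data.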

@QianhuiXu
Author

QianhuiXu commented Oct 25, 2022

Thank you for your help! I have solved this error by changing the reading directory: gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
But I ran into another issue in the next step. I have put the *-gdTCR_filtered_contig_annotations.csv files in the reading directory ('/home/shpc_100668/conga/GSE144469_RAW/').

My command:

gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)
all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0)+'-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism = 'human', # using 'human' for TCRab
        clones_file = donor_clones_file,
        stringent = True, # (the default) see Note #1 on clonotype filtering
    )
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file,
        allow_missing_kpca_file=True)
    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease) # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num
    all_data.append( adata )
new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')

error:
ab_counts: []
old_unpaired_barcodes: 0 old_paired_barcodes: 0 new_stringent_paired_barcodes: 0
reading: /home/shpc_100668/conga/GSE144469_RAW/C1-CD3 of type 10x_mtx
total barcodes: 3862 (3862, 33538)
reading: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones.tsv
WARNING: missing kpca_file: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones_AB.dist_50_kpcs
WARNING: X_tcr_pca will be empty
Reducing to the 0 barcodes (out of 3862) with paired TCR sequence data
/home/shpc_100668/conga/conga/preprocess.py:233: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
if adata.isview: # ran into trouble with AnnData views vs copies

AttributeError Traceback (most recent call last)
/tmp/ipykernel_2715303/7264258.py in
23 adata = conga.preprocess.read_dataset(
24 gex_dir, '10x_mtx', donor_clones_file,
---> 25 allow_missing_kpca_file=True)
26 disease = donor[:-1]
27 adata.obs['disease'] = disease

~/conga/conga/preprocess.py in read_dataset(gex_data, gex_data_type, clones_file, make_var_names_unique, keep_cells_without_tcrs, kpca_file, allow_missing_kpca_file, gex_only, suffix_for_non_gene_features)
403
404 tcrs = [ barcode2tcr[x] for x in adata.obs.index ]
--> 405 store_tcrs_in_adata( adata, tcrs )
406
407 return adata

~/conga/conga/preprocess.py in store_tcrs_in_adata(adata, tcrs)
178
179 # ensure lower case
--> 180 adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
181 adata.obs['cdr3b_nucseq'] = adata.obs.cdr3b_nucseq.str.lower()
182

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
179 # we're accessing the attribute of the class, i.e., Dataset.geo
180 return self._accessor
--> 181 accessor_obj = self._accessor(obj)
182 # Replace the property with the accessor object. Inspired by:
183 # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in __init__(self, data)
166 from pandas.core.arrays.string_ import StringDtype
167
--> 168 self._inferred_dtype = self._validate(data)
169 self._is_categorical = is_categorical_dtype(data.dtype)
170 self._is_string = isinstance(data.dtype, StringDtype)

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in _validate(data)
223
224 if inferred_dtype not in allowed_types:
--> 225 raise AttributeError("Can only use .str accessor with string values!")
226 return inferred_dtype
227

AttributeError: Can only use .str accessor with string values!

Thank you for your kind help!
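
A possible reading of the log above, offered as a guess rather than a confirmed diagnosis: now that gex_dir is a full path, gex_dir.split('-')[0] keeps the directory prefix, so donor is no longer a bare donor ID such as C1. The log line "reading: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones.tsv" is consistent with this. No barcode would end with such a string, which would explain the empty "ab_counts: []", the 0 paired barcodes, and the failing .str call on a table with no TCR rows. A minimal sketch of taking the donor ID from the folder name only follows; `donor_from_gex_dir` is a hypothetical helper, not a conga function.

```python
import os


def donor_from_gex_dir(gex_dir):
    """Return the donor ID encoded in a GEX folder name such as 'C1-CD3'.

    Hypothetical helper, not part of conga. Only the last path component is
    used, so relative and absolute paths give the same donor ID.
    """
    folder = os.path.basename(os.path.normpath(gex_dir))
    return folder.split("-")[0]


gex_dir = "/home/shpc_100668/conga/GSE144469_RAW/C1-CD3"
print(gex_dir.split("-")[0])        # prints /home/shpc_100668/conga/GSE144469_RAW/C1
print(donor_from_gex_dir(gex_dir))  # prints C1
```

If this is the cause, replacing the donor line in the loop with `donor = donor_from_gex_dir(gex_dir)` should also make `disease = donor[:-1]` yield 'C', 'NC' or 'CT' again. Printing `len(donor_contigs)` before calling make_10x_clones_file is a cheap way to confirm that the barcode suffix actually matched.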
