Error in Aggregate the individual GEX runs into a single AnnData object #52

QianhuiXu · 2022-10-24T13:59:31Z

Hello,

conga is a wonderful tool!

I ran into an issue while exploring the `fancy_conga_pipeline_with_batches_and_gammadelta_tcrs` notebook.

My command:

```python
# imports assumed to come from earlier notebook cells
import glob
import pandas as pd
import conga.preprocess
from conga.tcrdist.make_10x_clones_file import make_10x_clones_file

gex_datasets = sorted(glob.glob('*-CD3'))
diseases = ['C', 'NC', 'CT']  # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)

all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    # The folder name is also the donor ID
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    # change the barcode suffix to '-1' to match the GEX data
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0) + '-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)

    # process the contigs to generate conga clonotypes
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism='human',  # using 'human' for TCRab
        clones_file=donor_clones_file,
        stringent=True,  # (the default) see Note #1 on clonotype filtering
    )

    # read the GEX data and the clonotypes into CoNGA
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file,
        allow_missing_kpca_file=True)
    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease)  # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num  # conga batch ids are integers
    all_data.append(adata)

# concatenate the datasets
new_adata = all_data[0].concatenate(all_data[1:])
# save the aggregate AnnData object
new_adata.write('merged_gex_abtcr.h5ad')
```

Error:

```
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_1354605/1967687937.py in <module>
     33 
     34 # concatenate the datasets
---> 35 new_adata = all_data[0].concatenate(all_data[1:])
     36 #save the aggregate AnnData object
     37 new_adata.write('merged_gex_abtcr.h5ad')

IndexError: list index out of range
```

I'm really at a loss as to how to proceed, and any guidance would be much appreciated!
Thank you for your kind help!
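The per-donor step in the loop above selects one donor's rows from the combined contigs file by barcode suffix and then resets the suffix to `-1` so the barcodes match that donor's GEX matrix. A minimal sketch of that step on hypothetical toy data follows; the barcode format (`<barcode>-1-<donor>`) and the helper name `split_donor_contigs` are assumptions for illustration, not taken from conga or GSE144469. It also shows one thing worth checking in the real data: `str.endswith('C1')` is true for a barcode ending in `NC1`, so anchoring on `'-' + donor` (assuming the donor ID is always preceded by `-`) keeps donors apart.

```python
import pandas as pd

# Hypothetical toy contigs; real barcodes are ASSUMED to end with '-<donor>'.
all_contigs = pd.DataFrame({
    "barcode": ["AAACCTGA-1-C1", "AAACCTGC-1-C1", "AAAGATGT-1-NC1"],
    "chain": ["TRA", "TRB", "TRB"],
})


def split_donor_contigs(all_contigs: pd.DataFrame, donor: str) -> pd.DataFrame:
    """Keep one donor's contigs and reset each barcode suffix to '-1' (hypothetical helper)."""
    # Anchoring on '-' + donor keeps 'NC1' barcodes out of donor 'C1'.
    mask = all_contigs.barcode.str.endswith("-" + donor)
    donor_contigs = all_contigs[mask].copy()
    # Keep the part before the first '-' and append '-1' to match the GEX data.
    donor_contigs["barcode"] = donor_contigs.barcode.str.split("-").str.get(0) + "-1"
    return donor_contigs


print(all_contigs.barcode.str.endswith("C1").sum())  # 3: 'NC1' also ends with 'C1'
print(split_donor_contigs(all_contigs, "C1").barcode.tolist())
# ['AAACCTGA-1', 'AAACCTGC-1']
```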

phbradley · 2022-10-24T15:55:44Z

Hi there, thanks for trying conga, and thanks for the feedback. This error suggests that the list `all_data` is empty, which may be because the preceding loop did not execute. The loop was over the files found by the glob command:

```python
gex_datasets = sorted(glob.glob('*-CD3'))
```

Could you check and see whether the expected files are present and in the directory where the notebook is running? These would be the `*-CD3` folders that have the GEX counts data in them.
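The diagnosis above can be made explicit in code. A minimal sketch, assuming the GEX folders are named `<donor>-CD3` as in this thread; `find_gex_dirs` is a hypothetical helper, not part of conga. It resolves the pattern against an explicit directory instead of the notebook's working directory and raises immediately when nothing matches, so an empty `gex_datasets` shows up as a clear error rather than as the later `IndexError` at `all_data[0]`.

```python
import glob
import os


def find_gex_dirs(data_dir: str, pattern: str = "*-CD3") -> list:
    """Return sorted folders in data_dir matching pattern, or raise if none (hypothetical helper)."""
    matches = glob.glob(os.path.join(data_dir, pattern))
    # Keep directories only; the GEX data are 10x_mtx folders.
    gex_dirs = sorted(d for d in matches if os.path.isdir(d))
    if not gex_dirs:
        # Fail here instead of later at all_data[0] (IndexError).
        raise FileNotFoundError(
            f"no folders matching {pattern!r} in {os.path.abspath(data_dir)!r}"
        )
    return gex_dirs


# Hypothetical usage with the directory from this thread:
# gex_datasets = find_gex_dirs("/home/shpc_100668/conga/GSE144469_RAW")
```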

QianhuiXu · 2022-10-25T05:24:04Z

Thank you for your help! I solved this error by changing the reading directory:

`gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))`

However, I ran into another issue in the next step. I have put the `*-gdTCR_filtered_contig_annotations.csv` files in the reading directory (`/home/shpc_100668/conga/GSE144469_RAW/`).

My command:

```python
gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
diseases = ['C', 'NC', 'CT']  # colitis, no-colitis, healthy control
contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)
all_data = []
for donor_num, gex_dir in enumerate(gex_datasets):
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0) + '-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism='human',  # using 'human' for TCRab
        clones_file=donor_clones_file,
        stringent=True,  # (the default) see Note #1 on clonotype filtering
    )
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file,
        allow_missing_kpca_file=True)
    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease)  # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num
    all_data.append(adata)
new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')
```

Error:

```
ab_counts: []
old_unpaired_barcodes: 0 old_paired_barcodes: 0 new_stringent_paired_barcodes: 0
reading: /home/shpc_100668/conga/GSE144469_RAW/C1-CD3 of type 10x_mtx
total barcodes: 3862 (3862, 33538)
reading: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones.tsv
WARNING: missing kpca_file: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones_AB.dist_50_kpcs
WARNING: X_tcr_pca will be empty
Reducing to the 0 barcodes (out of 3862) with paired TCR sequence data
/home/shpc_100668/conga/conga/preprocess.py:233: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_2715303/7264258.py in <module>
     23     adata = conga.preprocess.read_dataset(
     24         gex_dir, '10x_mtx', donor_clones_file,
---> 25         allow_missing_kpca_file=True)
     26     disease = donor[:-1]
     27     adata.obs['disease'] = disease

~/conga/conga/preprocess.py in read_dataset(gex_data, gex_data_type, clones_file, make_var_names_unique, keep_cells_without_tcrs, kpca_file, allow_missing_kpca_file, gex_only, suffix_for_non_gene_features)
    403 
    404     tcrs = [ barcode2tcr[x] for x in adata.obs.index ]
--> 405     store_tcrs_in_adata( adata, tcrs )
    406 
    407     return adata

~/conga/conga/preprocess.py in store_tcrs_in_adata(adata, tcrs)
    178 
    179     # ensure lower case
--> 180     adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
    181     adata.obs['cdr3b_nucseq'] = adata.obs.cdr3b_nucseq.str.lower()
    182 

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    179             # we're accessing the attribute of the class, i.e., Dataset.geo
    180             return self._accessor
--> 181         accessor_obj = self._accessor(obj)
    182         # Replace the property with the accessor object. Inspired by:
    183         # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in __init__(self, data)
    166         from pandas.core.arrays.string_ import StringDtype
    167 
--> 168         self._inferred_dtype = self._validate(data)
    169         self._is_categorical = is_categorical_dtype(data.dtype)
    170         self._is_string = isinstance(data.dtype, StringDtype)

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in _validate(data)
    223 
    224         if inferred_dtype not in allowed_types:
--> 225             raise AttributeError("Can only use .str accessor with string values!")
    226         return inferred_dtype
    227 

AttributeError: Can only use .str accessor with string values!
```
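One plausible reading of this log: once `gex_datasets` holds absolute paths, `gex_dir.split('-')[0]` returns `/home/shpc_100668/conga/GSE144469_RAW/C1` instead of `C1`, which matches the clones file being read from `.../GSE144469_RAW/C1_abtcr_clones.tsv`. No contig barcode ends with that full path, so no contigs are kept (`ab_counts: []`), the dataset is reduced to 0 barcodes, and the empty TCR columns are presumably not string-typed, which would explain the `.str` accessor error. Even past that point, `donor[:-1]` would give `/home/shpc_100668/conga/GSE144469_RAW/C`, and `diseases.index(disease)` would raise `ValueError`. A minimal sketch that takes the donor ID from the folder name only; `donor_from_gex_dir` is a hypothetical helper, not a conga function:

```python
import os


def donor_from_gex_dir(gex_dir: str) -> str:
    """Return the donor ID (e.g. 'C1') from a GEX folder path like '.../C1-CD3' (hypothetical helper)."""
    # Drop any trailing slash and the parent directories before splitting,
    # so only the folder name '<donor>-CD3' is parsed.
    folder = os.path.basename(os.path.normpath(gex_dir))
    return folder.split("-")[0]


path = "/home/shpc_100668/conga/GSE144469_RAW/C1-CD3"
print(path.split("-")[0])        # /home/shpc_100668/conga/GSE144469_RAW/C1
print(donor_from_gex_dir(path))  # C1
```

In the loop, `donor = donor_from_gex_dir(gex_dir)` would replace the `split` line; the per-donor CSV and TSV files would then be written to the notebook's working directory rather than next to the GEX folders.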

Thank you for your kind help!
