Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seek for help running representation unifcation #121

Closed
shiying-sxu opened this issue Jul 7, 2022 · 7 comments
Closed

Seek for help running representation unifcation #121

shiying-sxu opened this issue Jul 7, 2022 · 7 comments

Comments

@shiying-sxu
Copy link

Hi,
I met a problem in the step of representation unification.
After I execute the command according to representation_unification.md, my vcf output folder is empty. I carefully check the steps and find that the CreateTensorFullAlignment command is missing in step 4 in the representation_unification.md file provided now, but after I add this command Found that the files in my CANDIDATE_DETAILS_PATH are all empty. Can you help me?
Sincerely

@aquaskyline
Copy link
Member

RU doesn't require CreateTensorFullAlignment and we are unable to pinpoint the problem per your description. We suggest you check the output of each step to ensure they are intact.

@shiying-sxu
Copy link
Author

The result of my running is as follows, can you help me to see what is wrong?

This is WhatsHap 1.2.1 running under Python 3.6.10
Working on 1 samples from 1 family
Leaving chromosome 'chr1' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr2' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr3' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr4' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr5' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr6' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr7' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr8' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr9' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr10' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr11' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr12' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr13' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr14' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr15' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr16' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr17' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr18' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr19' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr20' unchanged (present in VCF but not requested by option --chromosome)
Leaving chromosome 'chr21' unchanged (present in VCF but not requested by option --chromosome)
======== Working on chromosome 'chr22'
---- Processing individual HG001
Using maximum coverage per sample of 15X
Number of variants skipped due to missing genotypes: 0
Number of remaining heterozygous variants: 24447
Reading alignments and detecting alleles ...
Found 84382 reads covering 24442 variants
Kept 66806 reads that cover at least two variants each
Reducing coverage to at most 15X by selecting most informative reads ...
Selected 18985 reads covering 24437 variants
Variants covered by at least one phase-informative read in at least one individual after read selection: 24437
Phasing 1 sample by solving the MEC problem ...
MEC cost: 358080
No. of phased blocks: 310
Largest block contains 9714 variants (39.8% of accessible variants) between position 30263536 and 42499006
======== Writing VCF
Done writing VCF
Changed 120 genotypes while writing VCF
Leaving chromosome 'chrX' unchanged (present in VCF but not requested by option --chromosome)

== SUMMARY ==
Maximum memory usage: 0.417 GB
Time spent reading BAM/CRAM: 82.1 s
Time spent parsing VCF: 57.8 s
Time spent selecting reads: 8.1 s
Time spent phasing: 125.4 s
Time spent writing VCF: 48.8 s
Time spent finding components: 4.3 s
Time spent on rest: 32.2 s
Total elapsed time: 358.6 s
Found 1 sample(s) in input VCF
Found 1 sample(s) in BAM file
Reading alignments and detecting alleles ...
Found 84015 reads covering 24143 variants

== SUMMARY ==
Total alignments processed: 431114
Alignments that could be tagged: 83055
Alignments spanning multiple phase sets: 0
Finished in 211.6 s
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[faidx] Truncated sequence: chr22:46903593-50973515
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
cat: '/work/Clair3-main/data/outputref-HG001_GRCh38/vcf_output/vcf_*': 没有那个文件或目录
[WARNING] No vcf file found, please check the setting
[WARNING] No variant found, please check the setting

@zhengzhenxian
Copy link
Collaborator

Hi,
seem no record found in your VCF output, are you following the steps here for RU? There is no need to add CreateTensorFullAlignment step in RU, as we have included this step in UnifyRepresentation submodule, pls check the code here for more details.

@shiying-sxu
Copy link
Author

The command I execute is as follows

Setup variables

CLAIR3="/work/Clair3-main/clair3.py" # clair3.py
PYPY="/home/user/anaconda3/envs/clair3/bin/pypy3.6" # e.g. pypy3
WHATSHAP="/home/user/anaconda3/envs/clair3/bin/whatshap" # e.g. whatshap
PARALLEL="/home/user/anaconda3/envs/clair3/bin/parallel" # e.g. parallel
TABIX="/home/user/anaconda3/envs/clair3/bin/tabix" # e.g. tabix
SAMTOOLS="/home/user/anaconda3/envs/clair3/bin/samtools" # e.g. samtools

Input parameters

PLATFORM="ont" # e.g. {ont, hifi, ilmn}
VCF_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.vcf.gz" # [YOUR_VCF_FILE_PATH]e.g. hg003.vcf.gz
BAM_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.bam" # e.g. hg003.bam alignment.bam
REFERENCE_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.fa" # e.g. hg003.fasta
BED_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.bed" # e.g. hg003.bed
OUTPUT_DIR="/work/Clair3-main/data/outputref-HG001_GRCh38" # e.g. output

Chromosome prefix ("chr" if chromosome names have the "chr" prefix)染色体前缀

CHR_PREFIX="chr"

array of chromosomes (do not include "chr"-prefix)

CHR=(22)

Number of threads to be used

THREADS=24

The number of chucks to be divided into for parallel processing

#并行加工需分割的卡盘数量
chunk_num=15
CHUNK_LIST=seq 1 ${chunk_num}

Minimum AF required for a candidate variant

MIN_AF=0.08

Temporary working directory

SPLIT_BED_PATH="${OUTPUT_DIR}/split_beds"
VCF_OUTPUT_PATH="${OUTPUT_DIR}/vcf_output"
VAR_OUTPUT_PATH="${OUTPUT_DIR}/var"
PHASE_VCF_PATH="${OUTPUT_DIR}/phased_vcf"
PHASE_BAM_PATH="${OUTPUT_DIR}/phased_bam"

mkdir -p ${SPLIT_BED_PATH}
mkdir -p ${VCF_OUTPUT_PATH}
mkdir -p ${VAR_OUTPUT_PATH}
mkdir -p ${PHASE_VCF_PATH}
mkdir -p ${PHASE_BAM_PATH}

2. Phase VCF file using WhatsHap

To apply representation unification, using a phased read alignment is highly recommended in order to get more precious unified result.

WhatsHap phasing vcf file if vcf file includes '|' in INFO tag

cd ${OUTPUT_DIR}

WhatsHap phasing vcf file if vcf file includes '|' in INFO tag

${WHATSHAP} unphase ${VCF_FILE_PATH} > ${OUTPUT_DIR}/INPUT.vcf.gz

WhatsHap phase vcf file

${PARALLEL} --joblog ${PHASE_VCF_PATH}/phase.log -j${THREADS}
"${WHATSHAP} phase
--output ${PHASE_VCF_PATH}/phased_{1}.vcf.gz
--reference ${REFERENCE_FILE_PATH}
--chromosome ${CHR_PREFIX}{1}
--ignore-read-groups
--distrust-genotypes
${OUTPUT_DIR}/INPUT.vcf.gz
${BAM_FILE_PATH}" ::: ${CHR[@]}

Index phased vcf file

${PARALLEL} -j ${THREADS} tabix -p vcf ${PHASE_VCF_PATH}/phased_{1}.vcf.gz ::: ${CHR[@]}

```

#3. Haplotag read alignment using WhatsHap

WhatsHap haplotags bam file

${PARALLEL} --joblog ${PHASE_BAM_PATH}/haplotag.log -j${THREADS}
"${WHATSHAP} haplotag
--output ${PHASE_BAM_PATH}/{1}.bam
--reference ${REFERENCE_FILE_PATH}
--regions ${CHR_PREFIX}{1}
--ignore-read-groups
${PHASE_VCF_PATH}/phased_{1}.vcf.gz
${BAM_FILE_PATH}" ::: ${CHR[@]}

Index the phased bam file using samtools

${PARALLEL} --joblog ${PHASE_BAM_PATH}/index.log -j ${THREADS} ${SAMTOOLS} index -@12 ${PHASE_BAM_PATH}/{1}.bam ::: ${CHR[@]}
#4.Prepare true variant set

Split bed file regions according to the contig name and extend bed region

${PARALLEL} --joblog ${SPLIT_BED_PATH}/split_extend_bed.log -j${THREADS}
"${PYPY} ${CLAIR3} SplitExtendBed
--bed_fn ${BED_FILE_PATH}
--output_fn ${SPLIT_BED_PATH}/{1}
--ctgName ${CHR_PREFIX}{1}" ::: ${CHR[@]}

#Get true variant label information from VCF file
${PARALLEL} --joblog ${VAR_OUTPUT_PATH}/get_truth.log -j${THREADS}
"${PYPY} ${CLAIR3} GetTruth
--vcf_fn ${PHASE_VCF_PATH}/phased_{1}.vcf.gz
--ctgName ${CHR_PREFIX}{1}
--var_fn ${VAR_OUTPUT_PATH}/var_{1}" ::: ${CHR[@]}

#5. Unify Representation for true variant set and candidate sites
${PARALLEL} --joblog ${OUTPUT_DIR}/unify_repre.log -j${THREADS}
"${PYPY} ${CLAIR3} UnifyRepresentation
--bam_fn ${PHASE_BAM_PATH}/{1}.bam
--var_fn ${VAR_OUTPUT_PATH}/var_{1}
--ref_fn ${REFERENCE_FILE_PATH}
--bed_fn ${BED_FILE_PATH}
--extend_bed ${SPLIT_BED_PATH}/{1}
--output_vcf_fn ${VCF_OUTPUT_PATH}/vcf_{1}_{2}
--min_af ${MIN_AF}
--chunk_id {2}
--chunk_num ${chunk_num}
--platform ${PLATFORM}
--ctgName ${CHR_PREFIX}{1}" ::: ${CHR[@]} ::: ${CHUNK_LIST[@]} > ${OUTPUT_DIR}/RU.log

#6. Merge and sort unified VCF output
cat ${VCF_OUTPUT_PATH}/vcf_* | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_DIR}/unified.vcf
bgzip -f ${OUTPUT_DIR}/unified.vcf
tabix -f -p vcf ${OUTPUT_DIR}/unified.vcf.gz

@aquaskyline aquaskyline changed the title Representation Unification error Seek for help running representation unifcation Jul 7, 2022
@zhengzhenxian
Copy link
Collaborator

Still no hint for the empty output. Could you check each file's validity in step 5? We speculated that some files are empty or missing in your step 5.

@shiying-sxu
Copy link
Author

Still no hint for the empty output. Could you check each file's validity in step 5? We speculated that some files are empty or missing in your step 5.

After I reinstalled the environment, it can run correctly, I think it should be an environment problem.
In addition, I have a problem that when I try to run quickdemo, I get an error no module named libclair3, and I can't install it correctly with the command conda install libclair3

@zhengzhenxian
Copy link
Collaborator

After I reinstalled the environment, it can run correctly, I think it should be an environment problem. In addition, I have a problem that when I try to run quickdemo, I get an error no module named libclair3, and I can't install it correctly with the command conda install libclair3

libclair3 is built via this command(source activate clair3 && make PREFIX=${CONDA_PREFIX}) in installation option 4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants