Generating FASTA datasets for Variant Processing

This repository converts datasets from public resources into FASTA files compatible with proteomic search tools. It is one building block in generating custom variant-containing FASTA databases.

Required R Packages

install.packages(c("dplyr", "stringr", "readr"))

Input:

WT_og.csv: CSV file listing BRAF variants, with the wild-type (WT) protein sequence and the corresponding mutation for each entry.

Columns are as such:

Code:

Variant_Fasta_Conversion.Rmd: This R Markdown script processes a CSV file containing wild-type (WT) protein sequences and their corresponding mutations, and generates the mutated sequences along with reversed copies of every sequence. Because these are protein sequences, they are reversed residue by residue; no complement is taken. The script first loads the required libraries. For each entry, it parses the mutation description (e.g., A333N), verifies that the wild-type residue is present at the stated position, and substitutes the new amino acid at that position. Both the WT and the mutated sequences are then reversed. Finally, the script creates FASTA headers for the forward and reversed sequences, with special handling for WT sequences, which are labeled separately.
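The parse, verify, substitute, and reverse steps described above can be sketched as follows. This is a minimal illustration rather than the script's actual code: the function names `apply_mutation` and `reverse_sequence` are hypothetical, the toy sequence is made up, and the sketch assumes single-residue substitutions written as one letter, a 1-based position, and one letter.

```r
library(stringr)

# Hypothetical helper: apply one substitution such as "A3N" to a protein sequence.
apply_mutation <- function(wt_seq, mutation) {
  # Split the description into wild-type residue, position, and new residue.
  parts <- str_match(mutation, "^([A-Z])([0-9]+)([A-Z])$")
  if (is.na(parts[1, 1])) stop("Unrecognized mutation format: ", mutation)
  wt_res  <- parts[1, 2]
  pos     <- as.integer(parts[1, 3])
  new_res <- parts[1, 4]

  if (pos < 1 || pos > str_length(wt_seq)) {
    stop("Position ", pos, " is outside the sequence")
  }
  # Verify that the sequence carries the expected wild-type residue.
  observed <- str_sub(wt_seq, pos, pos)
  if (observed != wt_res) {
    stop("Expected ", wt_res, " at position ", pos, " but found ", observed)
  }
  # Substitute the new amino acid in place.
  str_sub(wt_seq, pos, pos) <- new_res
  wt_seq
}

# Hypothetical helper: reverse a protein sequence residue by residue.
reverse_sequence <- function(seq) {
  paste(rev(strsplit(seq, "")[[1]]), collapse = "")
}

wt_seq  <- "MKALV"                        # toy sequence, not real BRAF data
mutated <- apply_mutation(wt_seq, "A3N")  # "MKNLV"
mutated_reversed <- reverse_sequence(mutated)  # "VLNKM"
```

Stopping with an error on a wild-type mismatch is one possible design choice; the actual script may instead flag or skip such rows.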

Output:

WT_mutated_with_variant.fasta: FASTA file containing the forward and reversed WT sequences, plus the forward and reversed sequences for each mutation. Each mutant sequence carries its amino acid substitution.

Headers are structured as follows:

Fwd/Rev_spVariantTag | UniProt ID | Organism (OS) Gene Name (GN)
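As a rough sketch of how headers in this layout might be assembled and written out, the example below builds forward and reversed records for a WT entry and one variant. Everything specific here is an assumption for illustration: the helper `make_header`, the `OS=`/`GN=` key style, the absence of spaces around the pipes, the use of "WT" as the wild-type tag, and the placeholder accession `P00000` and gene `GENE1`. The script's real output may differ in these details.

```r
# Hypothetical helper: build one FASTA header in the layout described above.
make_header <- function(direction, variant_tag, uniprot_id, organism, gene) {
  sprintf(">%s_sp%s|%s|OS=%s GN=%s",
          direction, variant_tag, uniprot_id, organism, gene)
}

# Hypothetical helper: reverse a protein sequence residue by residue.
reverse_sequence <- function(seq) {
  paste(rev(strsplit(seq, "")[[1]]), collapse = "")
}

# Toy records: one WT entry and one variant (made-up sequences).
records <- data.frame(
  tag = c("WT", "A3N"),
  fwd = c("MKALV", "MKNLV"),
  stringsAsFactors = FALSE
)

# Emit a forward and a reversed record for every row.
fasta_lines <- character(0)
for (i in seq_len(nrow(records))) {
  rev_seq <- reverse_sequence(records$fwd[i])
  fasta_lines <- c(
    fasta_lines,
    make_header("Fwd", records$tag[i], "P00000", "Homo sapiens", "GENE1"),
    records$fwd[i],
    make_header("Rev", records$tag[i], "P00000", "Homo sapiens", "GENE1"),
    rev_seq
  )
}

# Write to a temporary file so the sketch does not overwrite real output.
out_path <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_path)
```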

WT_mutated_output.csv: CSV file containing the original and mutated sequences. Its columns include the protein sequence, variant tag, position of the mutation, the mutation itself, the mutated sequence, etc.


BackusLab/Custom-Variant-Database
