This pipeline processes mutation data for the MOLT4 cell line. It performs:
- Filtering of missense variants with valid UniProt IDs
- Fetching canonical protein sequences via the UniProt API
- Applying amino acid mutations to sequences
- Generating forward and reverse protein sequences
- Extracting peptides around mutation sites (using K/R cleavage logic)
- Saving annotated outputs in FASTA and CSV formats
- Verifying peptide correctness through position and boundary checks
Ensure the following R packages are installed:
install.packages(c("tidyverse", "httr", "stringr", "readr", "knitr"))
Required File:
MOLT4 mutations.csv
Required Columns (case-insensitive):
- Uniprot ID: UniProt accession (e.g., P12345)
- Variant Info: Must include the string "missense_variant"
- Protein.Change: Mutation notation (e.g., p.P750Q)
- Gene: Gene symbol (e.g., TP53)
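Because the column names are matched case-insensitively, it can help to resolve them against the actual header before running anything else. The following is a minimal base R sketch of such a check; `find_required_columns` is a hypothetical helper name, not a function from the pipeline script.

```r
# Hypothetical helper: map each required column name onto the matching
# column in the data frame, ignoring case. Stops if any is missing.
find_required_columns <- function(df,
                                  required = c("Uniprot ID", "Variant Info",
                                               "Protein.Change", "Gene")) {
  idx <- match(tolower(required), tolower(names(df)))
  if (anyNA(idx)) {
    stop("Missing required column(s): ",
         paste(required[is.na(idx)], collapse = ", "))
  }
  # Names of the result are the required names; values are the actual headers.
  setNames(names(df)[idx], required)
}

# Toy input with inconsistent capitalisation in the header.
example <- data.frame(
  `uniprot id` = "P12345",
  `VARIANT INFO` = "missense_variant",
  `Protein.Change` = "p.P750Q",
  gene = "TP53",
  check.names = FALSE
)
cols <- find_required_columns(example)
```

Note that base `read.csv()` rewrites headers such as `Uniprot ID` to `Uniprot.ID` unless `check.names = FALSE` is set, so the check should run on the header exactly as it was read in.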
File Name | Description |
---|---|
MOLT4_mutations_with_sequences.csv | Filtered entries + canonical sequences |
MOLT4_analysis_summary.csv | Summary of processing/filtering stats |
MOLT4_mutated_protein_output.csv | Mutated forward/reverse protein sequences + metadata |
MOLT4_mutated_protein_database.fasta | FASTA-formatted protein sequences (mutated) |
MOLT4_mutated_peptide_database.fasta | FASTA-formatted peptides around mutations |
MOLT4_peptide_data.csv | Peptide details and positional metadata |
- Place MOLT4 mutations.csv in your working directory
- Open the R script or R Markdown file
- Run all code blocks from top to bottom (recommended: RStudio)
- Output files will be saved to your working directory
- Filters for rows with a valid UniProt ID and a missense_variant annotation
- Fetches the canonical FASTA sequence from the UniProt API
- Adds it to the dataset
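The filtering and fetching step could look roughly like the sketch below. It is an illustration under assumptions, not the script's actual code: the function names are hypothetical, "valid" is taken to mean a non-empty, non-missing ID, and the URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is assumed to be the endpoint in use.

```r
# Keep rows with a non-empty UniProt ID whose variant annotation
# contains the literal string "missense_variant".
filter_missense <- function(df) {
  id <- df[["Uniprot ID"]]
  has_id <- !is.na(id) & nzchar(trimws(id))
  is_missense <- grepl("missense_variant", df[["Variant Info"]], fixed = TRUE)
  df[has_id & is_missense, , drop = FALSE]
}

# Collapse FASTA text (one header line plus wrapped sequence lines)
# into a single sequence string.
parse_fasta_sequence <- function(fasta_text) {
  lines <- strsplit(fasta_text, "\r?\n")[[1]]
  paste(lines[!startsWith(lines, ">") & nzchar(lines)], collapse = "")
}

# Assumed endpoint; returns NA when the request does not succeed.
fetch_uniprot_sequence <- function(accession) {
  url <- paste0("https://rest.uniprot.org/uniprotkb/", accession, ".fasta")
  response <- httr::GET(url)
  if (httr::status_code(response) != 200) return(NA_character_)
  parse_fasta_sequence(httr::content(response, as = "text", encoding = "UTF-8"))
}

# Toy data: only the first row has both an ID and a missense annotation.
mutations <- data.frame(
  `Uniprot ID` = c("P12345", "", NA, "Q5SV97"),
  `Variant Info` = c("missense_variant", "missense_variant",
                     "missense_variant", "stop_gained"),
  check.names = FALSE
)
kept <- filter_missense(mutations)
```

Adding the sequences to the dataset would then be a matter of something like `kept$Sequence <- vapply(kept[["Uniprot ID"]], fetch_uniprot_sequence, character(1))`, where the `Sequence` column name is again only a placeholder.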
- Parses mutations (e.g., p.P750Q)
- Replaces the amino acid at the given position in the sequence
- Validates original amino acid at the target site
- Generates reverse of mutated sequence
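A compact way to picture the mutation step is shown below. This is a sketch with hypothetical function names; it assumes single-letter notation of the form `p.<ref><position><alt>` and returns `NA` whenever the notation cannot be parsed or the reference residue does not match the sequence.

```r
# Parse "p.P750Q" into reference residue, position, and alternate residue.
parse_protein_change <- function(change) {
  m <- regmatches(change, regexec("^p\\.([A-Z])([0-9]+)([A-Z])$", change))[[1]]
  if (length(m) != 4) return(NULL)
  list(ref = m[2], pos = as.integer(m[3]), alt = m[4])
}

# Substitute the alternate residue, but only if the sequence really has
# the reference residue at that position.
apply_mutation <- function(sequence, change) {
  mut <- parse_protein_change(change)
  if (is.null(mut) || mut$pos > nchar(sequence)) return(NA_character_)
  if (substr(sequence, mut$pos, mut$pos) != mut$ref) return(NA_character_)
  substr(sequence, mut$pos, mut$pos) <- mut$alt
  sequence
}

# Reverse a sequence character by character.
reverse_sequence <- function(sequence) {
  paste(rev(strsplit(sequence, "")[[1]]), collapse = "")
}

mutated <- apply_mutation("MKTPAR", "p.P4Q")   # P at position 4 becomes Q
reversed <- reverse_sequence(mutated)
```

Returning `NA` on a reference mismatch keeps such rows visible in the output instead of silently mutating the wrong residue.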
- Locates the second K/R residue upstream and downstream of the mutation
- Extracts peptide sequence from that window, inclusive of the K/R
- Applied to both forward and reverse sequences
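The window logic can be sketched as follows. The function name and return fields are hypothetical, and one behaviour is an assumption not stated above: when fewer than two K/R residues exist on a side, the window falls back to the corresponding end of the protein.

```r
# Extract the peptide spanning from the n-th K/R upstream of `pos`
# to the n-th K/R downstream of `pos`, both inclusive (n = 2 by default).
extract_peptide <- function(sequence, pos, n = 2) {
  residues <- strsplit(sequence, "")[[1]]
  kr <- which(residues %in% c("K", "R"))
  upstream <- rev(kr[kr < pos])    # nearest K/R first
  downstream <- kr[kr > pos]       # nearest K/R first
  # Assumption: use the sequence ends when a side has fewer than n K/R.
  start <- if (length(upstream) >= n) upstream[n] else 1L
  end <- if (length(downstream) >= n) downstream[n] else length(residues)
  list(
    peptide = substr(sequence, start, end),
    start = start,
    end = end,
    mutation_offset = pos - start + 1   # 1-based position inside the peptide
  )
}

# K/R residues sit at positions 2, 5, 11 and 14; the mutation is at position 8.
result <- extract_peptide("AKCDRGHQLMKNPRST", pos = 8)
```

For the reverse sequence the same function applies once the mutation position is mapped to `nchar(sequence) - pos + 1`.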
- Checks boundary correctness of extracted peptides
- Validates mutation position
- Ensures the mutation is reflected in the peptide
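These checks might be expressed along the lines of the sketch below. The exact criteria used by the script are not spelled out above, so the boundary rule here is an assumption: each peptide edge must be either a K/R residue or an end of the protein. The function name `verify_peptide` is hypothetical.

```r
# Return a named logical vector with one entry per check.
verify_peptide <- function(sequence, peptide, start, pos, alt) {
  end <- start + nchar(peptide) - 1
  offset <- pos - start + 1
  last <- substr(peptide, nchar(peptide), nchar(peptide))
  c(
    # The peptide is exactly the stated slice of the mutated sequence.
    matches_sequence = substr(sequence, start, end) == peptide,
    # The mutation position falls inside the peptide.
    position_in_range = pos >= start && pos <= end,
    # The alternate residue is present at the expected offset.
    mutation_present = substr(peptide, offset, offset) == alt,
    # Assumed boundary rule: K/R at each edge, or the protein terminus.
    start_ok = start == 1 || substr(peptide, 1, 1) %in% c("K", "R"),
    end_ok = end == nchar(sequence) || last %in% c("K", "R")
  )
}

checks <- verify_peptide("AKCDRGHQLMKNPRST", "KCDRGHQLMKNPR",
                         start = 2, pos = 8, alt = "Q")
```

A peptide passes when `all(checks)` is `TRUE`; keeping the individual flags makes it easier to see which check failed for a given row.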
Fwd_spP750Q|Q5SV97-1|PERM1_P750Q OS=Homo sapiens GN=PERM1P750Q (Sequence)
Fwd_spP750Q|Q5SV97-1|PERM1_P750Q OS=Homo sapiens GN=PERM1 (Truncated Sequence)