
HW5 2018

Meg Staton edited this page Oct 9, 2018 · 7 revisions

Fill in the missing code to make this script work as described. Save the script as a PLAIN TEXT file (not a Microsoft Word document) and email it to mstaton1@utk.edu.

Due by midnight on October 17 for full credit; 20% off if submitted by midnight on October 24.

Note: do not hardcode index locations for any sequence feature (such as the location of the restriction enzyme cut site). Compute them and store them in variables instead, so the script still works if you change the sequences at the top.
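To illustrate the difference, here is a minimal sketch, assuming Python 3. The function `hindiii_site` and both sequences are hypothetical, not part of the assignment. `str.find` returns the index where a pattern first occurs (or -1 if it does not occur), so a computed index follows the site wherever it moves, while a hardcoded slice only fits one particular sequence.

```python
def hindiii_site(seq):
    """Hypothetical helper: index of the first A of the HindIII site A/AGCTT, or -1 if absent."""
    return seq.find('AAGCTT')

# Two hypothetical sequences with the site at different positions
seq_a = 'TTTAAGCTTGG'
seq_b = 'GGGGGTTTAAGCTTGG'

for seq in (seq_a, seq_b):
    i = hindiii_site(seq)
    print(i, seq[i:i + 6])   # prints "3 AAGCTT", then "8 AAGCTT"

# A hardcoded index that was right for seq_a misses the site in seq_b
print(seq_b[3:9])            # prints "GGTTTA"
```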

Instructions:

# HW #5
# DNA and Python

# here are the input strings
upstream = 'GAAATTGGTTATATCACCAAGTCATCAACTATTTACCATATGCCAAATCCAAATCCAA'
downstream = 'GCTTAATAATAATAATTCTTATTATATATCCAGAGGCTGCTGCTCTGCCATATT'

# create a variable called my_dna that has the upstream sequence joined to the downstream sequence 
# (1 pt)


# get the total length of my_dna and store in a variable named totalLen
# (1 pt)


# get the number of Gs in my_dna and store in a variable named gnum
# (1 pt)


# get the number of Cs in my_dna and store in a variable named cnum
# (1 pt)


# calculate the GC content (as a fraction of the total length, not a percentage) and store in a variable named gc
gc = (gnum + cnum)/totalLen

# print the gc content of my_dna
# the printed line should read "GC content is " followed by the GC content
# (1 pt)


# find the location of the restriction enzyme cut site A/AGCTT (HindIII) and 
# store in a variable named hindiii_index 


# print the hindiii_index variable
# note that this should print the location of the first A of the recognition pattern AAGCTT
# the printed line should read "HindIII cut site is " and then the string index 
# (1 pt)


# get the part of the sequence before the cut site (including the A before the 
# cut site) and print with the format:
# Before the cut site: <bases here>
# (1 pt)


# get the part of the sequence after the cut site (including the AGCTT)
# and print with the format:
# After the cut site: <bases here>
# (1 pt)


# there is a TATA box in the string. print everything upstream of it (not including the TATA itself)
# print with the format:
# Upstream of TATA box: <bases here>
# (1 pt)


# there is a start codon (ATG) in the string. print everything downstream of it (not including the ATG itself), in lower case
# print with the format:
# Downstream of start codon: <bases here>
# (1 pt)
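The slicing steps at the end of the script rely on Python's half-open slices: `s[i:j]` includes index `i` but stops just before index `j`. Here is a minimal sketch on a made-up sequence, assuming Python 3; `toy` and the other variable names are hypothetical and differ from the ones the assignment asks for.

```python
# Hypothetical toy sequence (not the homework input)
toy = 'CCTATAAGGATGCCAAGCTTA'

# Cut site A/AGCTT: find returns the index of the first A in AAGCTT
site = toy.find('AAGCTT')
before = toy[:site + 1]      # up to and including that first A
after = toy[site + 1:]       # starts with AGCTT
print('Before the cut site:', before)
print('After the cut site:', after)

# Everything upstream of the TATA box stops just before its first T
tata = toy.find('TATA')
print('Upstream of TATA box:', toy[:tata])

# Skip the 3 bases of ATG itself, then lowercase the rest
atg = toy.find('ATG')
print('Downstream of start codon:', toy[atg + 3:].lower())
```

Because `before + after` always rebuilds `toy`, no base is lost or duplicated at the cut. If a pattern is absent, `find` returns -1 and these slices quietly produce the wrong bases, so a more defensive script would check for -1 first.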

Example output (not the actual output; this was run with different input sequences):

GC content is 0.41739130434782606
HindIII cut site is 64
Before the cut site: GATTTTGTATATTGGCAGTAGAAAACACTTCGGCCGCCACCAAGCATATGCCAAATCCAAATCCA
After the cut site: AGCTTAAGTTCTAGGTTTCATTTCGCTACTCCTAACATTTGGGCAGACTT
Upstream of TATA box: ATTTTG
Downstream of start codon: ccaaatccaaatccaagcttaagttctaggtttcatttcgctactcctaacatttgggcagactt
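The numbers in this example output can be checked against each other, assuming Python 3: since the cut splits the sequence into exactly two parts, joining the printed Before and After strings should rebuild the example's input sequence. The sketch below (variable names are illustrative) recovers the reported GC content (48 G/C bases out of 115), the HindIII index, and the downstream sequence from those two lines.

```python
# Copied from the "Before the cut site" and "After the cut site" example lines
before = 'GATTTTGTATATTGGCAGTAGAAAACACTTCGGCCGCCACCAAGCATATGCCAAATCCAAATCCA'
after = 'AGCTTAAGTTCTAGGTTTCATTTCGCTACTCCTAACATTTGGGCAGACTT'

example_dna = before + after        # the two halves rebuild the full example sequence
total = len(example_dna)
gc_bases = example_dna.count('G') + example_dna.count('C')
print(total, gc_bases)              # prints "115 48"
print(gc_bases / total)             # compare with the "GC content is" line (48/115)
print(example_dna.find('AAGCTT'))   # prints 64, matching "HindIII cut site is 64"

atg = example_dna.find('ATG')
print(example_dna[atg + 3:].lower())  # matches the "Downstream of start codon" line
```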