The main class to get to grips with in Biopython is Seq. This is the primary interface to the sequence data that you will work with. It is imported as:
from Bio.Seq import Seq
Once you have the Seq class, you can create sequences from standard Python strings:
my_dna = Seq("AGTACACTGGTT")
Once you have the DNA in a Seq object, you can perform standard operations on it, such as getting the complement of the sequence:
my_dna.complement()
and the reverse complement:
my_dna.reverse_complement()
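To make concrete what these two methods compute, here is a minimal plain-Python sketch for unambiguous DNA. It is not how Biopython implements them: the helper names complement and reverse_complement below are hypothetical stand-ins that work on ordinary strings, and they assume upper-case input containing only A, C, G and T.

```python
# Plain-Python sketch, not Biopython's implementation.
# Assumes upper-case, unambiguous DNA (only A, C, G, T).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return dna.translate(_DNA_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna)[::-1]


print(complement("AGTACACTGGTT"))          # TCATGTGACCAA
print(reverse_complement("AGTACACTGGTT"))  # AACCAGTGTACT
```

The reverse complement is usually the more useful of the two, because it is the opposite strand read in the conventional 5' to 3' direction.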
You can get the corresponding RNA sequence from a DNA sequence by using the transcribe method:
my_rna = my_dna.transcribe()
Once you have an RNA sequence, you can again do standard operations on it, such as getting the complement:
my_rna.complement_rna()
It is also possible to convert back from an RNA sequence to a DNA sequence:
my_dna_from_rna = my_rna.back_transcribe()
If everything is working correctly, this should give us back the original data:
my_dna == my_dna_from_rna
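The round trip works because transcription, as modelled here, treats the DNA as the coding strand and simply swaps T for U. A minimal plain-Python sketch of that idea follows; the function names are hypothetical helpers operating on ordinary strings, assuming upper-case input, and are not Biopython's own code.

```python
# Plain-Python sketch of transcription on the coding strand.
# Assumes upper-case sequences; not Biopython's implementation.

def transcribe(dna: str) -> str:
    """DNA coding strand to RNA: replace thymine (T) with uracil (U)."""
    return dna.replace("T", "U")


def back_transcribe(rna: str) -> str:
    """RNA back to DNA coding strand: replace uracil (U) with thymine (T)."""
    return rna.replace("U", "T")


dna = "AGTACACTGGTT"
rna = transcribe(dna)
print(rna)                          # AGUACACUGGUU
print(back_transcribe(rna) == dna)  # True
```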
Once you have an RNA sequence, you can get the expressed protein with translate:
my_protein = my_rna.translate()
my_protein
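Translation reads the RNA three bases at a time and looks each codon up in a genetic code table. The sketch below illustrates this in plain Python using the standard genetic code; it is an illustration under stated assumptions rather than Biopython's implementation. The names CODON_TABLE and translate are hypothetical, the input is assumed to be upper-case unambiguous RNA, stop codons are written as "*", and any trailing bases that do not fill a whole codon are ignored.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second and third codon position ("*" marks a stop).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate complete codons from the first base; ignore a partial tail."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


print(translate("AGUACACUGGUU"))  # STLV
```

Here the RNA from the earlier steps splits into the codons AGU, ACA, CUG and GUU, which map to serine, threonine, leucine and valine.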
Given a particular sequence:
new_seq = Seq("AAATGGCAAAA")
Use the Biopython documentation to discover how you can count how many times the subsequence AA is present.
Do you get the count you expect? Can you find a way to count all instances of AA, even those that overlap?