Identify the open reading frame in the following DNA sequence, the protein that this gene encodes, its function, and its source. You can consult the bioinformatics exercise “Project 1: Databases for the Storage and ‘Mining’ of Genome Sequences” (ATTACHED). The procedure to identify the gene and the protein that it encodes is as follows:

** Look carefully at the DNA sequence provided (ATTACHED Q_10_i_dna_sequence.pdf), and identify the start site for transcription. **

Click on the DNA sequence at the start site of transcription, select all of the sequence from that point onward, and copy it.
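If you would like to cross-check the start point you picked by eye, the following is a minimal sketch rather than part of the assigned procedure. It assumes that the open reading frame begins at an ATG start codon and ends at the first in-frame stop codon (TAA, TAG, or TGA) on the strand as given. The function names and the toy sequence are hypothetical; paste in the sequence from the attached file instead.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def clean(raw):
    """Uppercase the text and drop anything that is not A, C, G or T
    (spaces, line breaks, position numbers copied from a PDF)."""
    return re.sub(r"[^ACGT]", "", raw.upper())

def longest_forward_orf(dna):
    """Return (start_index, orf) for the longest ATG...stop stretch
    found on the given strand, or (-1, "") if there is none."""
    best = (-1, "")
    # The lookahead lets overlapping ATG positions be found as well.
    for match in re.finditer(r"(?=ATG)", dna):
        start = match.start()
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orf = dna[start:pos + 3]
                if len(orf) > len(best[1]):
                    best = (start, orf)
                break
    return best

# Hypothetical toy sequence, not the one from the assignment.
example = clean("ccgt ATG GCC AAA TGA ggc")
start, orf = longest_forward_orf(example)
print(start, orf)
```

The sketch only scans the strand as written. blastx translates the query in all six reading frames, so it will still find the protein if the gene lies on the reverse strand.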

Go to the National Center for Biotechnology Information (NCBI) website and click on BLAST on the right-hand side under “Popular Resources”. BLAST is a program that will allow you to find the protein sequence for the DNA sequence (gene) you submit. Next, click on blastx in the left-hand column under the title “Basic Blast”. Paste the DNA sequence into the box and click BLAST. The search may take a few seconds, and the page will keep updating until the search is completed.

When the search is complete, you will see a figure showing the most homologous results, or “sequences producing significant alignments”, followed by a list of what these proteins are. Your protein will be the first one on the list. Click on the accession number or sequence identifier on the left-hand side to bring up more information. You should be able to find the name, function, size (number of amino acids), and source (name of the organism) for the protein.

Your answer should include the:

1. Amino acid sequence of the protein
2. Size of the protein
3. Identity of the protein
4. Function of the protein
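As a sanity check on items 1 and 2, here is a rough sketch that translates an open reading frame with the standard genetic code and counts the residues. It assumes the standard code applies to the source organism, and the `translate` helper and the toy input are hypothetical. The BLAST record remains the authoritative source for the sequence and its size.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(orf):
    """Translate a DNA open reading frame into one-letter amino acid
    codes, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        # Codons containing anything other than A, C, G, T become 'X'.
        aa = CODON_TABLE.get(orf[pos:pos + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy ORF, not the one from the assignment.
protein = translate("ATGGCCAAATGA")
print(protein, len(protein))
```

The length printed here is the number of amino acids, which is how the size of the protein is reported in the BLAST record.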

