Biology 205 Online Lab Exercise #2: Having a BLAST!
Introduction:
Imagine you are a research associate at a well-known biotech company and your boss (who, by the way, knows nothing about biology) has asked you to do some research on the enzyme polynucleotide phosphorylase. Search PubMed to find one or more articles summarizing the function of polynucleotide phosphorylase in Escherichia coli. Note that the enzyme name is often abbreviated PNPase.
The scenario:
Your boss wants to know the following things about this protein:
1. What are the functions of this protein in E. coli?
2. Is this protein or a related protein also found in humans (Homo sapiens)? If so, is it associated with any medical conditions or genetic disorders?
3. Are there any chemical assays for this protein?
4. Why is this protein important in the history of modern genetics? You might find a Google (www.google.com) search most useful for answering this question.
Learning goals:
Procedure:
This is a somewhat free-form exercise and you should help each other explore the databases. The specific things that you can do with these programs change from semester to semester. We will explore the available options as a group, but each of you will do the activity parts of the lab on your own, turning in your own assignment. The activities are due the next lab, but we will spend part of lecture discussing this activity and dealing with any problems related to it.
Search tips:
NCBI searches can use Boolean operators, which are terms such as AND, OR, or NOT. In NCBI searches these terms must be in upper case. For example, you might search for general review articles about this protein by searching for polynucleotide phosphorylase AND review. Scan some of the abstracts and list the main functions of this enzyme in E. coli.
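As a rough illustration of how such a query is put together, the following Python sketch joins search terms with an upper-case Boolean operator and builds a URL for NCBI's E-utilities esearch service. The helper names boolean_query and esearch_url are invented for this example. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(terms, operator="AND"):
    """Join search terms with an upper-case Boolean operator."""
    op = operator.upper()  # NCBI expects AND, OR, NOT in upper case
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"not a Boolean operator: {operator!r}")
    return f" {op} ".join(terms)


def esearch_url(query, db="pubmed"):
    """Build (but do not send) an esearch URL for the given database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


query = boolean_query(["polynucleotide phosphorylase", "review"])
print(query)
print(esearch_url(query))
```

Passing db="protein" instead of the default would point the same query at the protein database rather than PubMed.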
If you did your search properly you should see a series of numbered citations with the author, title of the article and journal citation.
To the far right of the author's name you should see links in pale blue that say "Related articles" and "Links", which you may find useful. When you get your initial PubMed search results, underneath the blank search box there will be a link in pale blue called "Limits". This allows you to do advanced searching by publication type, date, etc.
If you want, another way to search is to open up the TaxBrowser from the opening NCBI page. This links to data about all the major organisms used in genetics and presents you with a series of links for these organisms. Clicking on the Escherichia coli link takes you to another page that has a table labeled "entrez records", which has a series of links to different databases that you can search about E. coli. Personally, I like doing Boolean searches from the main NCBI window.
Activity 1: Before you go any further, do a general search for PNPase. You might even do a Google search. What does this protein do?
About Protein database records:
Suppose your boss wants you to find the sequence of amino acids in the polynucleotide phosphorylase from E. coli. To do this, do an NCBI search using the protein database rather than PubMed.
You will get a series of accession numbers and citations. Find one that deals specifically with this enzyme and open it up. You will see a database record with a long series of database fields some of which have comments which may or may not be useful. A typical data record for this enzyme might start out like this:
LOCUS P05055 711 aa linear BCT 15-JUN-2002
DEFINITION Polyribonucleotide nucleotidyltransferase (Polynucleotide
phosphorylase) (PNPase).
ACCESSION P05055
VERSION P05055 GI:1172545
DBSOURCE swissprot: locus PNP_ECOLI, accession P05055;
Toward the end of the data record is this sequence of fields:
ORIGIN
1 mlnpivrkfq ygqhtvtlet gmmarqataa vmvsmddtav fvtvvgqkka kpgqdffplt
61 vnyqertyaa gripgsffrr egrpsegetl iarlidrpir plfpegfvne vqviatvvsv
121 npqvnpdiva migasaalsl sgipfngpig aarvgyindq yvlnptqdel keskldlvva
181 gteaavlmve seaqllsedq mlgavvfghe qqqvviqnin elvkeagkpr wdwqpepvne
241 alnarvaala earlsdayri tdkqeryaqv dviksetiat llaedetlde nelgeilhai
301 eknvvrsrvl agepridgre kdmirgldvr tgvlprthgs alftrgetqa lvtatlgtar
361 daqvldelmg ertdtflfhy nfppysvget gmvgspkrre ighgrlakrg vlavmpdmdk
421 fpytvrvvse itesngsssm asvcgaslal mdagvpikaa vagiamglvk egdnyvvlsd
481 ilgdedhlgd mdfkvagsrd gisalqmdik iegitkeimq valnqakgar lhilgvmeqa
541 inaprgdise faprihtiki npdkikdvig kggsviralt eetgttieie ddgtvkiaat
601 dgekakhair rieeitaeie vgrvytgkvt rivdfgafva igggkeglvh isqiadkrve
661 kvtdylqmgq evpvkvlevd rqgrirlsik eateqsqpaa apeapaaeqg e
//
This gives the sequence of amino acids in the protein as recorded in one of the major protein databases, in this case the Swiss-Prot database. The letters are the one-letter symbols for the amino acids. The genetic code table on page 145 in Russell shows these in addition to the standard three-letter abbreviations. So, for instance, the letter 'f' stands for phenylalanine.
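If you paste the ORIGIN block into a text file, a few lines of Python can strip out the position numbers and spaces and count the residues. This is only a sketch: the function name sequence_from_origin is made up for this example, and the sample below contains just the first line of the record shown above.

```python
# One-letter symbols for the 20 standard amino acids.
ONE_LETTER = {
    "a": "alanine", "r": "arginine", "n": "asparagine", "d": "aspartic acid",
    "c": "cysteine", "q": "glutamine", "e": "glutamic acid", "g": "glycine",
    "h": "histidine", "i": "isoleucine", "l": "leucine", "k": "lysine",
    "m": "methionine", "f": "phenylalanine", "p": "proline", "s": "serine",
    "t": "threonine", "w": "tryptophan", "y": "tyrosine", "v": "valine",
}


def sequence_from_origin(text):
    """Return the bare sequence from an ORIGIN block.

    Position numbers, spaces, the ORIGIN keyword and the closing //
    are all dropped; only the blocks of letters are kept.
    """
    blocks = []
    for line in text.splitlines():
        for token in line.split():
            if token.isalpha() and token != "ORIGIN":
                blocks.append(token.lower())
    return "".join(blocks)


sample = """ORIGIN
1 mlnpivrkfq ygqhtvtlet gmmarqataa vmvsmddtav fvtvvgqkka kpgqdffplt
//"""

seq = sequence_from_origin(sample)
print(len(seq), "residues in the sample")
print("first residue:", ONE_LETTER[seq[0]])
```

Run on the complete ORIGIN block, the residue count should agree with the length given on the LOCUS line of the record.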
I recommend that you find and examine the whole record I am referring to by doing the following from the opening NCBI page: Type or paste the accession number P05055 into the search window, select proteins from the pull-down menu, and press "Go". This should bring up just the P05055 database record from the protein database. Notice that you get a whole series of references about this protein and its gene, and for many of them you can access at least the abstract.
For example as part of this database record there are the following references:
TITLE The complete genome sequence of Escherichia coli
K-12
JOURNAL Science 277 (5331), 1453-1474 (1997)
MEDLINE 97426617
PUBMED 9278503
REMARK SEQUENCE FROM N.A.
STRAIN=K12 / MG1655
REFERENCE 3 (residues 1 to 711)
AUTHORS Portier,C. and Regnier,P.
TITLE Expression of the rpsO and pnp genes: structural analysis of a DNA
fragment carrying their control regions
JOURNAL Nucleic Acids Res. 12 (15), 6091-6102 (1984)
MEDLINE 84297215
PUBMED 6382163
REMARK SEQUENCE OF 1-196 FROM NA
REFERENCE 4 (residues 1 to 711)
AUTHORS Evans,S. and Dennis,P.P.
TITLE Promoter activity and transcript mapping in the regulatory region
for genes encoding ribosomal protein S15 and polynucleotide
phosphorylase of Escherichia coli
In the actual record the MEDLINE and PubMed record numbers are live
links, or you can search for them separately.
At any time while searching, if you click on the link that says history from one of the NCBI databases, you will get a list of your recent searches. This is useful if you need to backtrack. Also, when looking up information about a gene or a protein, keep in mind that researchers working on each of the major organisms in genetics have developed standard names for genes. Record these, as they are often useful for searching. For instance, the gene in E. coli that codes for polynucleotide phosphorylase is called pnp.
I recommend leaving a text editor open in a separate window and cutting and pasting materials of interest to a text file for later use. Be sure to document what you do and the sources of your information just as in standard library research.
Blinks, Domains and Links.
When you do a search and get a database record, or a list of database records, there are often one or more of the following three links on the right-hand side of the page by each record. These links say "BLink, Domains, Links".
The "Links" link is the easiest to understand. It gives you links to related references in the literature and often to the DNA sequence of the gene or to related protein sequences.
The "Domains" link enables you to examine the domains present in your protein and look for similar domains in proteins of other organisms. A protein domain is a discrete or independent region of a protein with its own characteristic folding pattern and specific function in the protein. Analysis of domains is useful in diverse areas such as drug design and evolutionary biology.
BLink stands for BLAST link, and this is a new feature on NCBI. Opening this link automatically uses a program called BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to search the available sequence databases and extract the proteins most similar to the one you have.
Having a BLAST.
Often you have a protein sequence and want to compare it with that of other
organisms or look for similar proteins, perhaps with different functions. This
can be done using BLAST. BLAST (Basic Local Alignment Search Tool) is a search
engine that searches for protein or nucleotide sequences similar to one that
you provide. To get to it, return to the main NCBI page, click the BLAST
link if it is visible at the top of whatever page you are on,
or open this link: http://www.ncbi.nlm.nih.gov/BLAST/.
BLAST is an extremely powerful and complex program and we can only scratch the surface. You may find this tutorial http://www.geospiza.com/outreach/BLAST/ useful. Examples of how BLAST is useful include the following:
Before you do a BLAST search, read about BLAST in the BLAST overview to get
a basic idea of the types of things you can do with BLAST. BLAST compares
sequences based on their alignment and calculates a similarity score (S) and
an E-value. The E-value is the number of alignments with a score of S or
larger that would be expected by chance when searching a database of random
nucleotide or protein sequences of the same size, depending on what sort of
sequence comparison you are doing. The smaller the E-value, the less likely
it is that the match is due to chance.
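To get a feel for how the score and the E-value are connected, here is a small Python sketch based on the standard relation for normalized (bit) scores, E = m x n x 2^(-S'), where m is the query length and n is the total length of the database. The function name expect_value and the database size are assumptions made up for this example, and real BLAST reports use adjusted "effective" lengths, so the numbers will not match a report exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with S' the bit score, m the query
    length and n the total number of residues in the database.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


QUERY_LENGTH = 711            # residues in E. coli PNPase (from the LOCUS line)
DATABASE_LENGTH = 10 ** 9     # hypothetical database size, for illustration only

for score in (30, 50, 100):
    e = expect_value(score, QUERY_LENGTH, DATABASE_LENGTH)
    print(f"bit score {score:3d}  ->  E-value {e:.3g}")
```

Each extra bit of score halves the E-value, which is why strong matches in a BLAST report have E-values that are vanishingly small.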
BLAST can search for proteins or for nucleic acid sequences, and the searches operate pretty much the same way. When you open the BLAST search window you can enter either an accession number or a sequence of monomers: amino acids if you are doing a polypeptide search using BLASTp, or nucleotides if you are doing a nucleic acid search using BLASTn.
Activity 2: Doing BLAST searches using BLink from the NCBI site:
Return to the E. coli polynucleotide phosphorylase record. At the top of the page on the right-hand side you will see the "BLink" link. Click on this link and examine the resulting display. Depending on whether you are dealing with a protein or a nucleic acid, you may be able to get a BLASTp or a BLASTn report for your sequence. Do this for the E. coli PNPase and briefly describe three types of information you can get from BLink, at least this semester. :-)
Activity 3: Doing BLAST searches directly:
In addition to using BLink you can do more general BLAST searches by selecting the BLAST link from the main NCBI page. This links to a page that allows you to do a number of specialized searches, BLASTn and BLASTp being the most generally useful. If you open the BLASTp or BLASTn link you are presented with a search window. This is a very flexible search. For example, you can enter an accession code from a protein database or a nucleotide database. Note that if you enter an accession code for a protein into BLASTn's window you get an error message, and vice versa.
Try doing a direct BLASTp search using the accession code (P05055) for the E. coli PNPase and briefly describe three types of information you can get from this sort of BLASTp search. The search may take a few minutes, as your job is batched along with other jobs.
The other thing you can do with BLAST is enter a sequence of monomers: an amino acid sequence for a BLASTp search or a nucleotide sequence for a BLASTn search. You can even use RNA sequences for a BLASTn search. Just for grins, cut and paste the amino acid sequence at the end of the E. coli PNPase record into the BLASTp search window to verify that you can do this.
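BLAST search windows accept either a bare sequence or a sequence in FASTA format (a header line beginning with ">" followed by the sequence). If you want to tidy up a sequence copied from a database record before pasting it, a sketch along these lines would do it. The function name to_fasta and the header text are made up for this example.

```python
def to_fasta(header, sequence, width=60):
    """Format a sequence as FASTA text.

    Anything that is not a letter (position numbers, spaces, line
    breaks) is removed, and the sequence is wrapped at `width` letters.
    """
    cleaned = "".join(ch for ch in sequence if ch.isalpha()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines)


# First line of the ORIGIN block from the P05055 record, as copied.
copied = "1 mlnpivrkfq ygqhtvtlet gmmarqataa vmvsmddtav fvtvvgqkka kpgqdffplt"
print(to_fasta("P05055 (first 60 residues only)", copied))
```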
Activity 4: Your boss has gotten excited about PNPase and is curious about human PNPase. Do an OMIM search to get you started and find references to both the nucleotide and the protein sequence for a human PNPase. Write the gene name, the OMIM accession number, and a reference to a major paper that discusses this gene here:
Pull up a protein and a nucleotide sequence for your PNPase gene, investigate the protein and nucleotide information for the gene, and answer the following questions.
A. How many base pairs does the gene have? Note that the sequence given is
the sequence that is transcribed to make the RNA.
B. You may remember that each sequence of 3 bases is supposed to correspond
to a codon. So about how many amino acids long should the human PNPase protein
be?
C. How many amino acid residues are actually in the PNPase protein?
Your answers may differ depending on whether the nucleotide sequence is for the original gene or for one consisting of so-called 'cDNA'.
D. Using either your protein database accession number or the amino acid sequence for human PNPase cut and pasted into the BLASTp search window, do a BLASTp search for all similar sequences. What sequences are most similar to human PNPase?
Find the comparison between E. coli PNPase and human PNPase by scrolling down through this report and, in a paragraph, compare the human PNPase to the E. coli PNPase. How are the proteins different and how are they similar?
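The arithmetic behind question B can be sketched in a few lines of Python. This assumes the sequence you are counting is pure coding sequence (codons only, ending in one stop codon). The function name and the example length are made up for illustration, with the length chosen to correspond to the 711-residue E. coli protein rather than to any human record.

```python
def residues_from_bases(n_bases, includes_stop=True):
    """Estimate protein length from the length of a coding sequence.

    Three bases make one codon; if the sequence ends in a stop codon,
    that codon does not add an amino acid.
    """
    codons = n_bases // 3
    return codons - 1 if includes_stop else codons


# Hypothetical coding sequence: 711 codons plus one stop codon.
coding_bases = (711 + 1) * 3
print(coding_bases, "bases ->", residues_from_bases(coding_bases), "residues")
```

A real gene or mRNA record is usually longer than this estimate suggests, because it can include untranslated regions and, for a genomic sequence, introns.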
Activity 5: Getting to your own projects (10 points, due next Thursday): We have just scratched the surface with BLAST and the other features you can link to using the NCBI databases! Your assignment for the next lab: do an OMIM search, or another sort of search if need be, for a gene that appears to be related to your project in some way. Find the accession number for a protein coded for by the gene (assuming the gene codes for a protein). Do a BLASTp search for the protein and write a half-page summary about this gene and its protein. Make sure that you provide references and accession numbers from OMIM and from the protein and nucleotide databases used. Check to see if a similar gene is found in both yeast and E. coli and report your findings (yes or no), and briefly describe what is known about the function of the gene in these organisms (if anything!).
If you do not have a specific gene related to your project, use the mystery sequences found in the file mystery.html. The first sequence represents an mRNA isolated from a human nerve cell in the brain, and the second sequence in the file gives the corresponding protein product. The mRNA is given in terms of the equivalent (not complementary) DNA sequence. Use either the mRNA sequence or the protein sequence and do the appropriate BLAST search to figure out what gene the mRNA and the protein are from. Write a half-page summary as described in the previous paragraph.
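An "equivalent (not complementary)" DNA sequence has the same bases as the mRNA, with T written in place of U. The Python sketch below shows that relationship and a simple translation using the standard genetic code. It assumes the reading frame starts at the first base, which is generally not true of a full mRNA with an untranslated leader. The function names and the short test sequence are made up for illustration and are not taken from mystery.html.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def dna_to_mrna(dna):
    """Write the equivalent (coding-strand) DNA as mRNA: T becomes U."""
    return dna.upper().replace("T", "U")


def translate(sequence):
    """Translate DNA or RNA from its first base until a stop codon."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


example = "ATGCTGAATCCGTAA"  # made-up test sequence
print(dna_to_mrna(example))
print(translate(example))
```

For the assignment itself you do not need to translate anything by hand, since BLASTn takes the nucleotide sequence directly and BLASTp takes the protein sequence.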
In a future lab we will examine how we can use BLAST and various protein and gene databases to look at evolutionary relationships among different groups of organisms, an approach called molecular phylogeny.
pgd created 03/10/02
extensively revised 10/14/03, 04/05/04