Biology 205 Studying evolutionary relationships and population genetics using NCBI

The NCBI databases provide access to a number of tools for studying population genetics and the evolutionary relationships between species using nucleotide or amino acid sequences. Today we will use a familiar protein, cytochrome C, to learn about the NCBI tools that allow investigators to do comparative studies between different organisms. These sorts of studies are important for understanding evolutionary relationships among organisms and for providing insight into the possible functions of different genes.

Complete the following five activities and send me your answers electronically by next Wednesday.
1. First go to the NCBI site map and find the definitions of the following terms using the site search engine found there:

A. Homology or Homologous sequences





B. Orthology or Orthologous sequences


 


C. Paralogy or Paralogous sequences

 

 

D. Synteny

 

 


E. Phylogeny or Phylogenetic tree.


 


2. Do a BLASTp search for human cytochrome C against the Swiss-Prot database. If you wish, you can enter the accession number P00001 directly into the BLASTp search page.

When the BLAST results appear (this may take a few minutes), click on the taxonomy report link. The taxonomy report arranges the significant BLAST alignments into several reports.

The lineage report shows the BLAST results arranged by standard taxonomic groups, centered on your protein sequence, in this case human cytochrome C. The lineage report for cytochrome C should start out looking like this:

Eukaryota           [eukaryotes]
. Fungi/Metazoa group [eukaryotes]
. . Bilateria [animals]
. . . Coelomata [animals]
. . . . Deuterostomia [animals]
. . . . . Vertebrata [vertebrates]
. . . . . . Gnathostomata [vertebrates]
. . . . . . . Euteleostomi [vertebrates]
. . . . . . . . Tetrapoda [vertebrates]
. . . . . . . . . Amniota [vertebrates]
. . . . . . . . . . Theria [mammals]
. . . . . . . . . . . Eutheria [mammals]
. . . . . . . . . . . . Primates [mammals]
. . . . . . . . . . . . . Catarrhini [mammals]
. . . . . . . . . . . . . . Homo sapiens (man) ------------------------------- 194 1 hit [mammals] Cytochrome c
. . . . . . . . . . . . . . Macaca mulatta (rhesus macaque) .................. 189 1 hit [mammals] Cytochrome c
. . . . . . . . . . . . . Ateles sp. ----------------------------------------- 184 1 hit [mammals] Cytochrome c
. . . . . . . . . . . . Oryctolagus cuniculus (domestic rabbit) -------------- 179 1 hit [mammals] Cytochrome c
. . . . . . . . . . . . Mus musculus (mouse) ................................. 178 2 hits [mammals] Cytochrome c, somatic
. . . . . . . . . . . . Hippopotamus amphibius ............................... 176 1 hit [mammals] Cytochrome c
. . . . . . . . . . . . Bos taurus (bovine) ....

Notice that the upper part of the report gives the nested taxonomic hierarchy, from domain to kingdom down to species, for the species in the database that have your protein. Starting with humans and then moving to other species, each entry gives a similarity score (the BLAST or bit score; a somewhat technical look at what these scores mean is here: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head3), the number of significant BLAST hits, the taxonomic group for the organism, and then a link to whatever protein the hits are for. You can click on the links to get more information about the species, taxonomic group, or proteins involved.
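
If you save the lineage report as text, the species rows can be pulled apart with a short script. The following is only a sketch: the row layout (a ". " per taxonomic level, the organism name, a run of dashes or dots, the score, the hit count, the bracketed group, and the protein name) is assumed from the sample rows shown above, and the names ROW and parse_row are made up for this illustration. The sample rows are shortened versions of rows from the report above.

```python
import re

# Assumed layout of a species row, inferred only from the sample report:
#   ". . . Organism name ----- SCORE N hit(s) [group] Protein"
ROW = re.compile(
    r"^(?P<indent>(?:\. )*)"        # one ". " per taxonomic level
    r"(?P<organism>.+?)\s+"         # organism, with optional common name
    r"[-.]{3,}\s+"                  # filler run of dashes or dots
    r"(?P<score>\d+)\s+"            # BLAST (bit) score
    r"(?P<hits>\d+)\s+hits?\s+"     # number of significant hits
    r"\[(?P<group>[^\]]+)\]\s*"     # taxonomic group in brackets
    r"(?P<protein>.*)$"             # protein the hits are for
)


def parse_row(line):
    """Return a dict for a species row, or None for any other line."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return {
        "depth": len(match.group("indent")) // 2,
        "organism": match.group("organism"),
        "score": int(match.group("score")),
        "hits": int(match.group("hits")),
        "group": match.group("group"),
        "protein": match.group("protein").strip(),
    }


# Rows shortened from the sample report (indentation reduced).
sample = [
    ". . Bilateria [animals]",
    ". . Homo sapiens (man) ---------- 194 1 hit [mammals] Cytochrome c",
    ". . Mus musculus (mouse) ........ 178 2 hits [mammals] Cytochrome c, somatic",
]
rows = [r for r in map(parse_row, sample) if r is not None]
# List the hits from highest to lowest score, ignoring taxonomy.
for r in sorted(rows, key=lambda r: r["score"], reverse=True):
    print(r["organism"], r["score"], r["hits"])
```

Group-only lines such as "Bilateria [animals]" carry no score, so parse_row returns None for them and they are skipped.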

A. Looking at the BLAST scores (higher scores correspond to more similar protein sequences), do the scores make sense in terms of taxonomic relationships? Are there any vertebrates that are not closely related to humans but have high BLAST scores against human cytochrome C?

If yes, what organism is it? Scroll down to the organism report, just after the lineage report, to get information about this organism. The organism report arranges the BLAST hits by similarity (BLAST score) regardless of the taxonomic relationships between the organisms.






B. What might explain this odd finding? You might compare how similar this organism's cytochrome C sequence is to human cytochrome C, relative to the sequences of species that are more closely related to us. To do this, go to the original BLAST report and look at the listing of hits arranged by bit score, which is found just after the graphical display.
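
One simple way to put a number on "how similar" two aligned sequences are is percent identity: the share of aligned positions where both sequences have the same residue. The sketch below is not how BLAST computes its bit score; it is a simpler measure, and the function name percent_identity and the short sequences are made up for illustration (they are not real cytochrome C sequences). It assumes the two sequences are already aligned to the same length, with "-" marking gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions where the residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up 10-residue fragments that differ at one position.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

Comparing this value for the odd organism and for a close relative of humans shows whether the high score reflects a nearly identical sequence.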








C. Just for grins, do a BLAST search using this other organism's cytochrome C sequence. Does human cytochrome C come out as similar to this organism's? Which organisms are most similar to your target sequence in terms of BLAST scores?






3. In medicine and genetics we often study gene action and diseases using various animal models. One useful resource (http://www.ncbi.nlm.nih.gov/Homology/) allows you to find regions of homology between mouse and human chromosomes and to identify paralogous and orthologous sequences. To use this database you need to know your gene's name and which human chromosome it is on. We did this last week for our cytochrome C gene, so you might repeat that search using LocusLink. Then, knowing the chromosome and gene name, you can find where the gene's ortholog is in the mouse genome (if an ortholog is known). See if you can get the mouse-human homology site to work for the cytochrome C gene.

A. What mouse chromosome is the gene on?

 

 

 

 

B. Compare the region of the mouse chromosome that contains the ortholog with the corresponding region of the human chromosome that contains the cytochrome C gene locus. Are the regions similar or different in terms of the genes found there? This will take some digging.

 

 

 

 

4. Population genetics data. Finding population genetics data about your gene may require several approaches.

A. Return to the NCBI database and do a search for cytochrome C using PopSet. I recommend that you search using cytochrome C AND human. Do you find any direct references to cytochrome C? Notice that you see references to mitochondrial DNA sequences. Why do these references come up when you search for cytochrome C?
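
The same search can also be written as a query URL instead of being typed into the web form. The sketch below only builds the URL and does not contact NCBI. It assumes NCBI's E-utilities ESearch service, which this handout does not otherwise use; the address, the database name "popset", and the helper name build_esearch_url are assumptions for illustration, so check them against the current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed ESearch endpoint of the NCBI E-utilities (not taken from this handout).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build (but do not send) a search URL for one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# The recommended search, with AND as the Boolean operator.
url = build_esearch_url("popset", "cytochrome c AND human")
print(url)
```

Writing the query this way makes the Boolean operator explicit: AND returns only records that match both terms.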







B. Use the NCBI home page to do an OMIM search by clicking on the OMIM link at the top of the page (not the pull-down menu), or use this link: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&cmd=Limits . Restrict your search to allelic variants by checking the appropriate box on this page. Do you find any allelic variants involving cytochrome C or related genes? Note that if you use the OMIM accession number for human cytochrome C (123970), you do not find any allelic variants for the human cytochrome C gene. Why might this not be surprising?



Tip: Often when you do a general OMIM search there will be a section with references to the population genetics of your gene.






5. Another useful resource is LocusLink. You can search LocusLink using OMIM numbers such as 123970 for our cytochrome C. When you do the LocusLink search you will typically see a series of colored squares, which are explained on the LocusLink help page. If LocusLink has information about allelic variants, there will be a purple square labeled 'V' for variant. Clicking on that square should pull up any data related to allelic variants, possibly including population genetics data. Does this search yield any information about allelic variants for the cytochrome C gene?

 

 

 

 

 

 

 

 

 

pgd 03/23/03