Probability in Genetics.

VBS Home page,VBS Course Navigator, Biology 205- Genetics, Probability in Genetics, Previous Page, Next Page,top of page

Introduction.  In order to understand genetics you need some basic concepts from probability. The goal of this exercise is to introduce the important probability concepts and illustrate their use with some elementary examples from genetics. At first reading this material may seem a bit abstract, but as you work with problems you will begin to get comfortable with it.

Basic Definitions:

Sample space.

All possible outcomes and their frequencies for a random experiment. For example, if you flip a coin once the sample space is the set {H,T}, and if the coin is fair the frequency of H and the frequency of T in the sample space are each 1/2. Suppose you have two coins: the sample space when both coins are flipped is {HH, HT, TH, TT}. An event is a subset of the sample space. Describing the sample space can be a little tricky, for example when the outcomes in the sample space are not equally probable. In this case we describe the sample space as a list of outcomes and their frequencies, all of which have to sum to 1.0. A good example is the cell cycle problem discussed below.

Ordered vs unordered event.

If the sequence in which several events occur jointly is important, then the joint event is ordered. For example, it may be important to distinguish between {HT} and {TH} when two coins are flipped. Each of these is an ordered event. But if you define your event as 'exactly one head' when two coins are flipped, then this can happen in two ways, {HT} and {TH}. So the event 'exactly one head' is an unordered event.

Probability.

The expected frequency of a particular event when an experiment is repeated an infinite number of times is the probability of the event. For a single coin toss, the probability of a head on a single toss is 1/2. Probabilities are always assumed to be real numbers between 0 and 1. Probabilities in genetics are often predicted based on certain hypotheses and then the predictions are used to test the hypothesis using real data.
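The idea of probability as a long-run frequency can be illustrated with a small simulation. This is only a sketch: the function name `head_frequency` and the fixed random seed are assumptions made for the illustration, and a finite number of tosses can only approximate the "infinite number of times" in the definition.

```python
import random

def head_frequency(n_tosses, seed=0):
    """Fraction of heads observed in n_tosses simulated fair-coin flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

# The observed frequency tends to settle near 1/2 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, head_frequency(n))
```

With only 10 tosses the observed frequency can be far from 1/2; with many tosses it is usually very close.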

We will refer to the probability of an event as P(event). So for a coin flipped once and only once, with the sample space {T,H}, P(H) = 1/2. Notice that in the absence of other information, we will often assume that elementary events, such as the possible results of a single toss of a coin, are equally likely. The difficulty for us will come when elementary events are combined.

Counting: Often it is useful to have some rules for counting the number of possible events (outcomes of an experiment) or the number of events in a certain subset of all possible events.

Ordered events. Example. Consider flipping two coins once. For each coin there are two possible outcomes. Hence the total number of possible outcomes involving both coins is 2*2 = 4. They are HH, HT, TH and TT. Notice that we make a distinction here between HT and TH. In other words, the sequence of the events is important. When this is true we talk about ordered events.

Useful rule for the number of ordered events: In general, if each event has N possible outcomes and you have M joint events, then the total number of possible ordered combinations of the M joint events is N^M. Thus for a single die there are six possible outcomes. If you have 10 dice, there are 6^10 possible outcomes if all 10 dice are tossed at once: 6 outcomes for the first die times 6 for the second die, etc.
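The N^M rule can be checked by listing the ordered outcomes directly. The following is a minimal Python sketch; the helper name `ordered_outcomes` is made up for this illustration.

```python
from itertools import product

def ordered_outcomes(outcomes, m):
    """All ordered results of m joint events that share the same possible outcomes."""
    return list(product(outcomes, repeat=m))

# Two coins: N = 2 outcomes per coin, M = 2 coins, so 2^2 = 4 ordered outcomes.
two_coins = ["".join(r) for r in ordered_outcomes("HT", 2)]
print(two_coins)                   # ['HH', 'HT', 'TH', 'TT']

# Three dice: N = 6, M = 3, so 6^3 = 216 ordered outcomes.
three_dice = ordered_outcomes(range(1, 7), 3)
print(len(three_dice), 6 ** 3)     # 216 216

# Ten dice: too many outcomes to list comfortably, but easy to count.
print(6 ** 10)                     # 60466176
```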

Unordered events. Sometimes the order is not important. For example, if in the case of our two coins we define the event of interest as getting exactly one head, there are two ordered ways to do this, HT and TH, and we really don't care about the order. These are unordered events. We will see this type of event a lot in genetics. For example, a heterozygous individual Aa could have received the A allele from her mother and the a allele from her father or vice versa, but we really don't care which. But in counting we have to count both events!

In general we can ask for N experiments with two possible outcomes for each experiment about the number of events involving M of the first outcome and N-M of the second outcome. For example if you flip 10 coins you can ask about the number of ways you can get 6 heads in 10 flips.

This is the number of ordered sequences that contain exactly 6 heads in 10 flips {TTTTHHHHHH, THTTTHHHHH, etc.}.

Useful rule for the number of unordered joint events, where there are N joint events with two possible outcomes each and M of them show the first outcome:

Number of unordered joint events = N!/[(M!)(N-M)!]

Where N! is N factorial or 1*2*3* ...*N

For our coin example, the number of ways you can get 6 heads in 10 tosses is:

10!/(6! 4!) = (10*9*8*7)/(4*3*2*1) = 210

Make sure you understand why this is!

Note that for this example the total number of possible ordered outcomes is 2^N, or 2 raised to the Nth power, with N = 10. So the probability of any single ordered outcome is 1/(2^N). Thus the probability of getting 6 heads in 10 coin tosses is:

(number of ordered outcomes resulting in 6 heads)/(total number of possible ordered outcomes) =

[10!/(6! 4!)]/2^10 = 210/1024

This works since we are assuming the probabilities for all the ordered outcomes are equal.
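The count of 6 heads in 10 tosses, and the resulting probability, can be verified both with the factorial formula and by brute-force enumeration. This is a sketch under the stated assumption that all ordered outcomes are equally likely; the function name `prob_heads` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_heads(m, n):
    """P(exactly m heads in n fair-coin tosses), as an exact fraction."""
    return Fraction(comb(n, m), 2 ** n)

# Brute-force check: count the ordered outcomes that contain exactly 6 heads.
count = sum(1 for seq in product("HT", repeat=10) if seq.count("H") == 6)

print(comb(10, 6), count)   # 210 210
print(prob_heads(6, 10))    # 105/512
```

Here `math.comb(n, m)` computes N!/[(M!)(N-M)!] directly, and 105/512 is 210/1024 in lowest terms.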

Knowing about counting enables us to define probability in the following way :

The probability of a particular event happening is simply the frequency that the event would be found in a random sample of all events from the sample space.

For example, with our two coin experiment there are four possible outcomes and the sample space is {HH, HT, TH, TT}. Since these events are equally frequent, the probability of each one is 1/4.

The probability of the unordered event exactly one head in two tosses = P(HT) + P(TH) = 1/2

Applications of these ideas to genetics.

The cells in your body receive half of their chromosomes from your father and half from your mother. So for each pair of homologous chromosomes one will be a maternal chromosome and one will be a paternal chromosome. During meiosis, when the haploid gametes are formed, one member of each pair, but not both, ends up in a gamete. This is the principle of segregation.

1. How many possible combinations of maternal and paternal chromosomes are there?

Answer. For any single pair of chromosomes, the gamete has either the maternal chromosome or the paternal chromosome, so there are two possibilities for any single pair. Since we have 23 pairs of homologous chromosomes, there are 2^23 possible combinations of maternal vs. paternal chromosomes in the gametes.

2. How many of these combinations involve 10 maternal chromosomes and 13 paternal chromosomes?

Answer: We want the total number of ordered combinations involving 10 maternal and 13 paternal chromosomes. So use the formula:

N!/[(M!)(N-M)!] where N = 23 and M = 13
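The numbers in answers 1 and 2 are easy to evaluate by machine. A short Python sketch (the variable names are chosen only for this illustration):

```python
from math import comb, factorial

n_pairs, n_paternal = 23, 13

# N!/[(M!)(N-M)!] with N = 23 and M = 13, written out exactly as in the formula.
by_formula = factorial(n_pairs) // (
    factorial(n_paternal) * factorial(n_pairs - n_paternal)
)

print(by_formula)                   # 1144066
print(by_formula == comb(23, 10))   # True: choosing 13 paternal is choosing 10 maternal
print(2 ** n_pairs)                 # 8388608 possible maternal/paternal combinations
```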

3. What is the probability of getting a gamete with 10 maternal and 13 paternal chromosomes? Assume any ordered combination is equally likely.

4. Recall that DNA is made from a sequence of four nucleotides that differ in terms of which nitrogen base A,T,G,C is present in each nucleotide.

Suppose you have a region of DNA which is 100 nucleotides long. How many possible nucleotide sequences are there for this region of DNA?

Rules for handling probabilities:

Independent and mutually exclusive events:

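Two events A and B are said to be independent if the occurrence of one does not change the probability of the other. In that case the probability of both events happening jointly is the product of their individual probabilities.

Useful rule: multiplication rule for the joint probability of two independent events A,B: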

P(A and B) = P(A)*P(B)

So for example if we flip two coins P(H on the first coin and H on the second coin) = P(H on the first coin)*P(H on the second coin) = 1/4

Two events are said to be mutually exclusive if P(A and B) = 0

For example for a single flip of a coin the event H and the event T are mutually exclusive since they cannot happen at the same time.

Note that the probabilities of all the mutually exclusive events that make up a sample space must sum to 1. Thus for a single coin flip P(H or T) = P(H) + P(T) = 1.

For example, a die has 6 different faces which are mutually exclusive. If each one is equally likely, then the probability of any one occurring on a single toss is 1/6. So all the probabilities sum to 1 for our sample space {1,2,3,4,5,6}.

Useful rules: Addition rules for mutually exclusive events:

Given N mutually exclusive events {A1,A2, ... ,AN} that together make up the whole sample space, then

P(A1) + P(A2) + ... + P(AN) = 1.0

P(A particular mutually exclusive event) = 1 - P(all the others)

Example for coin tossing.

Suppose I toss a coin 10 times (or 10 coins once) and ask what is the probability of getting at least one head in 10 tosses. We could do this in two ways. One is to sum the probabilities P(exactly 1 head in 10) + P(exactly 2 heads in 10) + ... + P(10 heads in 10), or we can simply use:

P(at least 1 head in 10) = 1 - P(no heads in 10) = 1 - 1/(2^10)
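Both routes to the probability of at least one head give the same answer, which a few lines of Python can confirm. This sketch uses exact fractions and assumes a fair coin.

```python
from fractions import Fraction
from math import comb

n = 10

# Long way: add P(exactly k heads) for k = 1, 2, ..., 10.
long_way = sum(Fraction(comb(n, k), 2 ** n) for k in range(1, n + 1))

# Short way: complement of the single event "no heads at all".
short_way = 1 - Fraction(1, 2 ** n)

print(long_way, short_way)   # 1023/1024 1023/1024
```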

Application to genetics:

5. For maternal and paternal chromosomes in the gamete example, what is the probability of a gamete having at least one maternal chromosome?

P(at least one maternal chromosome) = 1 - P(no maternal chromosomes) = 1 - 1/2^23

6. For a type of tissue the probability of a randomly selected cell being in a particular stage of the cell cycle is given by the following table which describes the sample space:

Cell cycle stage    Probability
Interphase          0.5
Prophase            0.1
Metaphase           0.05
Anaphase            0.2
Telophase           0.15

Suppose you examine 100 cells at random from this tissue. What is the probability of seeing at least one metaphase? Hint: what is the probability of seeing no metaphases in 100 cells?
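The table above is an example of a sample space whose outcomes are not equally likely. The sketch below stores it as a Python dictionary, checks that the probabilities sum to 1, and follows the hint by taking the complement of "no metaphases". It assumes the 100 cells are sampled independently of one another; the function name `p_at_least_one` is made up for the illustration.

```python
from fractions import Fraction

# Sample space from the table above: stage -> probability (exact fractions).
cell_cycle = {
    "Interphase": Fraction("0.5"),
    "Prophase": Fraction("0.1"),
    "Metaphase": Fraction("0.05"),
    "Anaphase": Fraction("0.2"),
    "Telophase": Fraction("0.15"),
}
assert sum(cell_cycle.values()) == 1   # a valid sample space sums to 1

def p_at_least_one(p_single, n):
    """P(an event occurs at least once in n independent trials)."""
    return 1 - (1 - p_single) ** n

# Assumption: each of the 100 cells is an independent draw from this sample space.
p = p_at_least_one(cell_cycle["Metaphase"], 100)
print(float(p))
```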

Conditional probability and Bayes' Theorem:

Often our estimate of the probability of an event B will be modified based on partial information that we have or because we know a particular event A has taken place. This kind of probability is called conditional probability. For the events A and B we often use the phrase "the probability of B given A". This is often written as P(B | A), where the '|' means 'given'. If two events are independent then it is always the case that

P(B | A) = P(B), and if two events are mutually exclusive then P(B | A) = 0.

Conditional probability is particularly useful where events are correlated with each other or in situations where we are given partial information about an event that restricts what part of the total sample space we need to examine.

Example: suppose we have two dice and we are interested in the total number of dots that come up when both dice are tossed.

What is the probability of this sum being 5?

If we don't know anything in advance, then we are interested in the outcomes 1 + 4, 2 + 3, 3 + 2, 4 + 1, since these are the combinations of faces that add up to five dots total.

The probability of this is clearly going to be 4*1/6^2 = 4/36 = 1/9. Where does the 1/6^2 come from?

What is P(5 dots total for both dice | the first die comes up with 1 dot)?

This turns out to be 1/6 since now we only have to deal with the second die and there is only one possibility for the second die where the total dots is 5, namely the second die comes up 4.

Useful rule: Bayes' theorem for conditional probabilities:

P(B | A ) = P(A and B)/P(A)

So for our dice problem

P(5 dots total for both dice | the first die comes up with 1 dot) = P(second die comes up 4 and the first die comes up 1)/P(first die comes up 1)

or assuming the dice are independent:

P(second comes up 4)*P(first comes up 1)/P(first comes up 1) = P(second comes up 4) = 1/6
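The conditional probability for the dice can also be checked by listing all 36 equally likely rolls and applying P(B | A) = P(A and B)/P(A) directly. This is a minimal sketch; the helper name `prob` is an assumption of the illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on a (first, second) roll."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_a = prob(lambda r: r[0] == 1)                         # A: first die shows 1
p_a_and_b = prob(lambda r: r[0] == 1 and sum(r) == 5)   # A and B: total is also 5

print(p_a, p_a_and_b, p_a_and_b / p_a)                  # 1/6 1/36 1/6
```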

Bayes' theorem seems simple, but it is very important in genetic counseling and pedigree analysis.

Application to genetics.

7. Many types of color blindness are what is called X linked, that is, determined by genes on the X chromosome. Suppose a woman is carrying one X chromosome with the gene for a particular type of color blindness; her other X chromosome does not have this gene. If she is married to a man who does not have this gene on his X chromosome, consider the following:

A. What is the probability that her first child will carry the X chromosome with the gene associated with color blindness?

Hint: remember the male has sex chromosomes X and Y. For now let 'X*' be an X chromosome carrying the allele for color blindness and 'X' be the chromosome carrying the allele for normal vision.

B. An individual will have normal color vision if he or she has at least one X chromosome without the gene for color blindness. What is the probability that the child is color blind?

Hints: P(color blind) = 1 - P(at least one X chromosome without the gene for color blindness)

= 1 - [P(at least one X chromosome without the gene for color blindness | child is female)*P(child is female) +

P(at least one X chromosome without the gene for color blindness | child is male)*P(child is male)]

C. Suppose amniocentesis reveals that the child is male. What is the probability that the child is color blind?

Hint: use Bayes' theorem or the tree diagrams discussed below.

Summary of tricks you should know:

Counting tricks

The number of ordered events: In general, if each event has N possible outcomes and you have M joint events, then the total number of possible ordered combinations of the M joint events is N^M. Thus for a single die there are six possible outcomes. If you have 10 dice, there are 6^10 possible outcomes if all 10 dice are tossed at once: 6 outcomes for the first die times 6 for the second die, etc.

The number of unordered joint events, where there are N joint events with two possible outcomes each and M of them show the first outcome:

Number of unordered joint events = N!/[(M!)(N-M)!]

Where N! is N factorial or 1*2*3* ...*N

Probability tricks

Multiplication rule for the joint probability of two independent events A,B:

P(A and B) = P(A)*P(B)

So for example if we flip two coins P(H on the first coin and H on the second coin) = P(H on the first coin)*P(H on the second coin) = 1/4

Addition rules for mutually exclusive events:

Given N mutually exclusive events {A1,A2, ... ,AN} that together make up the whole sample space, then

P(A1) + P(A2) + ... + P(AN) = 1.0

Here is a really useful trick:

P(A particular mutually exclusive event) = 1 - P(all the others)

Bayes' theorem for conditional probabilities:

P(B | A ) = P(A and B)/P(A)

Here P( B | A) means B given that A is true.

Tree or branching diagrams

Often it is useful to visualize genetics problems in terms of tree or branch diagrams, and we will often make use of this technique. The basic idea is that if you have an event with a number of possible outcomes, then you can represent the possible outcomes as branches on a tree and write the probability associated with each outcome beside its branch. The simplest useful tree diagram is one for two possible outcomes of a single event, say the toss of a coin where there are just the possible outcomes {T,H}.

Suppose we now have two independent tosses of the same coin or two separate coins tossed together. We can represent this situation as the accompanying tree diagram.

Notice you can get the probability of any ordered combination of T, H by just traveling down the corresponding path on the tree and multiplying the probabilities as you go.  Thus P{TH} = 1/2 x 1/2 = 1/4.  We can of course get this result using our multiplication rule for two independent events, but tree diagrams are an easy way to conceptualize what is going on in a particular problem. Often genetics problems are done using so-called Punnett Squares and you may have been taught that particular technique. Tree diagrams have some big advantages over Punnett Squares:

• It is easier to visually make the connection between the diagram and the underlying probability concepts using tree diagrams
• Tree diagrams force careful analysis of the problem
• Tree diagrams allow easier handling of conditional probabilities. This will become very apparent when we deal with the topic of linkage later on.
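The rule of traveling down a path and multiplying the probabilities along it can be sketched in Python for the two-coin tree. The data layout (one list of branches per toss) and the function name `walk` are assumptions made for this illustration, and the tree shown assumes the two tosses are independent.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Each level of the tree lists the (outcome, probability) branches for one toss.
levels = [
    [("H", half), ("T", half)],   # first toss
    [("H", half), ("T", half)],   # second toss
]

def walk(levels, path="", p=Fraction(1)):
    """Yield (path, probability) for every root-to-leaf path in the tree."""
    if not levels:
        yield path, p
        return
    for outcome, branch_p in levels[0]:
        # Multiply the probabilities as we travel down the branch.
        yield from walk(levels[1:], path + outcome, p * branch_p)

leaves = dict(walk(levels))
print(leaves["TH"])                  # 1/4
print(leaves["HT"] + leaves["TH"])   # 1/2, the unordered event "exactly one head"
```

Changing the branch probabilities at any level gives the tree for an unfair coin, and the leaf probabilities still sum to 1.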

Here is the tree diagram for the color blindness problem. Note that I show the possible gametes for the X*X female and then, for each of the gametes produced by the female, a second set of branches for the possible male gametes. I could just as easily have drawn my tree starting with the male gametes. The outcomes in terms of the offspring would still be the same. You might draw the diagram starting with the male's gametes first just for practice.

Problems to work on in groups

Counting problems

1.  If you have 10 coins, how many possible combinations of heads and tails are there for all 10 coins?  Hint: how many combinations for 1 coin; two coins; three coins?

2. Proteins are made up of chains of amino acids.  Insulin is a relatively small protein with 51 amino acid residues. How many possible proteins of length 51 can be made with 20 possible amino acids for each position in the protein?

3.  Humans have 23 pairs of chromosomes.  Gametes get one chromosome from each pair. How many different gametes are possible just looking at combinations of chromosomes? Hints: Suppose an animal has just one pair of chromosomes; how many gametes are possible? In this problem we are ignoring any sort of recombination within each pair of chromosomes.

4. If you have a pair of dice, how many combinations of two faces adding up to a 6 are possible? Hint: first think how many combinations of two numbers between 1 and 6 sum to six (e.g. 1 + 5, 5 + 1).

Problems 5 and 6 deal with so-called sampling without replacement and are more challenging than the other problems. We won't generally encounter this sort of problem in genetics, but they are related to conditional probability, which we will encounter.

5. A standard deck of cards has 52 cards, ignoring the Joker.   If you are dealt four cards at random from a shuffled deck, how many hands are there that have all four aces? Hints: As the hand is dealt, the cards are removed from the deck. This is an example of sampling without replacement. If you are being dealt a hand, how many aces can you get on the first card? How many for the second card given that you got an ace on the first card? How about the third card given that you already have two aces? And for the fourth card?

6. How many possible hands of four cards are there? Hint: note that there are 52 ways of selecting the first card, 51 ways of selecting the second card, etc.

Probability problems

7.  For problem 1, what is the probability of getting all heads for the 10 coins (i.e, P(HHHHHHHHHH))?

8. For the pair of dice from problem 4, what is the probability of getting two faces adding up to a 6? Hints: First list all the events in the set of all sums equalling 6, then note that for a single die P(1) = 1/6 etc. and use the rule for independent probabilities. For instance, what is P(first die face = 1 and second die face = 5)?

9.  A particular hypothetical human disease occurs with a probability of 0.1 in males and with a probability of 0.4 in females.

A. Assuming that the frequency of males is 0.5 and females 0.5 in a very large population, what is the probability that an individual selected at random from this population will have the disease?

B. What is the probability that an individual will be male and have the disease? Hint: One way to do this is to use Bayes' theorem but the problem is better solved with a tree diagram.

pgd 02/23/02