Biology 205 General Genetics

Probability in Genetics.

Introduction. In order to understand genetics you need some basic concepts of probability. The notions of probability and chance are used in every area of genetics, from basic considerations of the outcome of meiosis and Mendel's so-called laws of inheritance to gene sequencing problems and database searches. The goal of this exercise is to introduce the important probability concepts and illustrate their use with some elementary examples from genetics.

Learning goals:



Basic Definitions:

Sample space.

The set of all possible outcomes of a random experiment. For example, if you flip a coin once, the sample space is the set {H,T}. If you have two coins, the sample space when both coins are flipped is {HH, HT, TH, TT}. An event is a subset of the sample space.
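As an illustration only (Python is our choice here; the handout itself uses no code), the short sketch below builds the two-coin sample space by listing every ordered combination of faces, then picks out one event as a subset. The names `faces`, `sample_space` and `same_face` are hypothetical.

```python
from itertools import product

# Each coin has two possible faces (this sketch assumes only H and T).
faces = ["H", "T"]

# product(faces, repeat=2) yields every ordered pair (first coin, second coin).
sample_space = ["".join(outcome) for outcome in product(faces, repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# An event is just a subset of the sample space, e.g. "both coins show the same face".
same_face = [o for o in sample_space if o[0] == o[1]]
print(same_face)  # ['HH', 'TT']
```

Changing `repeat` or the list of faces gives the sample space for more coins, or for dice.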

Activity 1

Suppose you have two dice, each with faces 1 through 6. List the sample space (the set of all possible outcomes for a single toss of both dice):



Activity 2

You have four balls in an urn: a red ball, a green ball, a blue ball and an orange ball. Suppose you remove the balls from the urn one at a time in random order. You might, for instance, pull the green ball out first, then the blue one, and so on. List the sample space for this experiment. Hint: you are listing all possible arrangements of the four balls.

Ordered vs. unordered events.

If the order in which several joint events occur is important, then the event is ordered. For example, maybe it is important to distinguish between {HT} and {TH} when two coins are flipped. This is an ordered event. Another example would be the sample space for the experiment in Activity 2. Now suppose, for the coin example, that you define your event as 'exactly one head' when two coins are flipped. This can happen in two ways, {HT} and {TH}, so the event 'exactly one head' is an unordered event because you don't care which coin shows the head! If you think about it carefully, an unordered event describes a subset of the ordered events in your original sample space.
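A minimal Python sketch of the same idea, assuming two coins labeled H and T: the unordered event 'exactly one head' is found by filtering the ordered sample space. The variable names are illustrative, not standard.

```python
from itertools import product

# Ordered sample space for two coin flips: 'HT' (head first) differs from 'TH'.
ordered = ["".join(o) for o in product("HT", repeat=2)]

# The unordered event "exactly one head" collects every ordered outcome
# containing one H, regardless of which coin shows it.
exactly_one_head = [o for o in ordered if o.count("H") == 1]
print(exactly_one_head)  # ['HT', 'TH']
```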

Activity 3

Return to the dice example from Activity 1. List the outcomes in the sample space that correspond to the unordered event 'the sum of both faces is equal to 6 when the dice are tossed once':


Some counting rules: Often we won't want to list every possible event; instead we need to count how many there are, and there are some useful rules for doing so.

Counting rule for the number of ordered events: In general, if a single event can have one of X possible outcomes, then the total number of possible ordered combinations of Y joint events is X^Y. Thus for a single die there are six possible outcomes. If you have 10 dice, there are 6^10 possible outcomes when all 10 dice are tossed at once: 6 outcomes for the first die times 6 for the second die, and so on.
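The counting rule can be checked in Python on a case small enough to list outright. This is a sketch; `count_ordered` is a hypothetical helper, and enumeration with `itertools.product` is only practical for a few dice.

```python
from itertools import product


def count_ordered(num_outcomes: int, num_events: int) -> int:
    """Counting rule: X possible outcomes per event and Y joint events give X**Y."""
    return num_outcomes ** num_events


# Brute-force check on a case small enough to enumerate: 3 dice.
faces = range(1, 7)
enumerated = sum(1 for _ in product(faces, repeat=3))
print(enumerated, count_ordered(6, 3))  # 216 216

# The 10-dice example from the text, computed with the rule alone.
print(count_ordered(6, 10))  # 60466176
```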


A powerful rule for unordered events: In general, for N experiments with two possible outcomes each, we can ask how many ways there are to get M of the first outcome and N-M of the second outcome. For example, if you flip 10 coins you can ask about the number of ways you can get 6 heads in 10 flips. This is the number of ordered sequences of 10 flips that contain exactly 6 heads and 4 tails {TTTTHHHHHH, THTTTHHHHH, etc.}. Useful rule for the number of unordered joint events where there are N independent events and M of them are of one particular type:

Number of possible unordered joint events = N!/[(M!)(N-M)!]

where N! is N factorial, or 1*2*3*...*N (and 0! = 1).

For our coin example, the number of ways you can get 6 heads in 10 tosses is 10!/(6! 4!) = 10*9*8*7/(4*3*2*1) = 210. Make sure you understand why this is! Note that for this example the total number of possible ordered events is 2^N, or 2 raised to the Nth power (here 2^10 = 1024).

In statistics the expression N!/[(M!)(N-M)!] is often called "N choose M"; it represents the number of ways of selecting M things out of a group of N things.
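The N choose M formula can be checked against brute-force counting for the 6-heads-in-10-flips example. The sketch below uses Python's `math.factorial` and `math.comb` (available in Python 3.8 and later); the variable names are illustrative.

```python
from itertools import product
from math import comb, factorial

N, M = 10, 6  # 10 coin flips, 6 heads (the example in the text)

# Formula: N! / [M! (N - M)!]
by_formula = factorial(N) // (factorial(M) * factorial(N - M))

# Brute force: count ordered sequences of 10 flips containing exactly 6 heads.
by_enumeration = sum(1 for seq in product("HT", repeat=N) if seq.count("H") == M)

print(by_formula, by_enumeration, comb(N, M))  # 210 210 210
print(2 ** N)  # 1024 ordered sequences in total
```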


Applications of these ideas to genetics.


Activity 4

A. Each cell in your body receives half of its chromosomes from your father and half from your mother. So for each pair of homologous chromosomes, one will be a maternal chromosome and one will be a paternal chromosome. During meiosis, when the haploid gametes are formed, one member of each pair (either one, but not both) ends up in a gamete. This is the principle of segregation. For humans (N = 23), how many possible combinations of maternal and paternal chromosomes are there? Show your reasoning. Hint: are we dealing with ordered or unordered events here?




B. How many of the combinations from part A involve 10 maternal chromosomes and 13 paternal chromosomes? Hint: are we dealing with an ordered or unordered event here?


Activity 5

Recall that DNA is made from a sequence of nucleotides of four kinds, which differ in which nitrogenous base (A, T, G or C) each contains. Suppose you have a region of DNA that is 100 nucleotides long. How many possible nucleotide sequences are there for this region of DNA?



Typically we repeat experiments many times in genetics and other areas of science, and we are often interested in how often a certain outcome might be expected to happen. The probability of an event is the relative frequency with which it is expected to occur when an experiment is repeated an infinite number of times. For a single coin toss, the probability of a head is 1/2. Probabilities are always real numbers between 0 and 1. Probabilities in genetics are often predicted based on certain hypotheses, and the predictions are then used to test the hypotheses against real data. We will refer to the probability of an event as P(event) or sometimes Pr(event). So for a coin flipped once and only once, with sample space {T,H}, P(H) = 1/2. Notice that in the absence of other information, we will often assume that elementary events, such as the result of a single toss of a coin, are equally likely. The difficulty for us comes when elementary events are combined.
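The frequency definition of probability can be illustrated with a simulation. This Python sketch assumes a fair coin modeled by the standard `random` module with a fixed seed; `relative_frequency_of_heads` is a hypothetical helper. Any finite run only approximates P(H) = 1/2, and the printed fractions will vary with the seed and the number of tosses.

```python
import random


def relative_frequency_of_heads(num_tosses: int, seed: int = 1) -> float:
    """Simulate fair coin tosses and return the fraction that came up heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


# The relative frequency tends to settle closer to 1/2 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```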


Independent and mutually exclusive events:

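Two events A and B are said to be independent if the occurrence of one does not change the probability of the other. In that case the probability of both events happening jointly is simply the product of their individual probabilities.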

Useful rule: the multiplication rule for the joint probability of two independent events A and B:

P(A and B) = P(A)*P(B)

So for example if we flip two coins P(H on the first coin and H on the second coin) = P(H on the first coin)*P(H on the second coin) = 1/4
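A sketch, assuming fair coins, that checks the multiplication rule against direct counting of equally likely outcomes. Python's `fractions.Fraction` keeps the arithmetic exact, so the result prints as 1/4 rather than 0.25.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)  # assumed fair coin

# Multiplication rule for independent events.
by_rule = p_heads * p_heads

# Check: count the equally likely ordered outcomes where both coins show H.
outcomes = list(product("HT", repeat=2))
by_counting = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))

print(by_rule, by_counting)  # 1/4 1/4
```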

Two events A and B are said to be mutually exclusive if they cannot both happen, so that P(A and B) = 0.

For example, for a single flip of a coin the event H and the event T are mutually exclusive, since they cannot happen at the same time.

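Note that the probabilities of all the mutually exclusive outcomes that make up a sample space must sum to 1. Thus if A and B are mutually exclusive and together make up the whole sample space (as H and T do for a single coin flip), P(A or B) = P(A) + P(B) = 1.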

For example, a die has 6 different faces, which are mutually exclusive. If each one is equally likely, then the probability of any one occurring on a single toss is 1/6. So the probabilities sum to 6*1/6 = 1 for our sample space {1,2,3,4,5,6}.

Useful addition rules for mutually exclusive events:

Rule 1: Given N mutually exclusive events {A1, A2, ..., AN} that together make up the whole sample space, then

P(A1) + P(A2) + ... + P(AN) = 1.0


Rule 2: P(a particular event) = 1 - P(all the other mutually exclusive events). This rule is very useful!

For example:

Suppose I toss a coin 10 times (or 10 coins once) and ask what the probability is of getting at least one head in 10 tosses. We could do this in two ways. One is to sum the probabilities P(exactly 1 head in 10) + P(exactly 2 heads in 10) + ... + P(10 heads in 10). Or we can simply use Rule 2:

P(at least 1 head in 10 tosses) = 1 - P(no heads in 10 tosses) = 1 - 1/(2^10) = 1023/1024
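Both routes can be compared in a short Python sketch, assuming a fair coin. The hypothetical helper `prob_exactly` combines the N choose M count with the multiplication rule to give the probability of exactly k heads; exact `Fraction` arithmetic shows that the long sum and the complement agree, and that all the cases together sum to 1 (Rule 1).

```python
from fractions import Fraction
from math import comb

n = 10
p = Fraction(1, 2)  # probability of a head on one toss (fair coin assumed)


def prob_exactly(k: int) -> Fraction:
    """P(exactly k heads in n tosses): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Long way: add up the mutually exclusive cases 1, 2, ..., 10 heads.
direct_sum = sum(prob_exactly(k) for k in range(1, n + 1))

# Short way (Rule 2): 1 minus the probability of the only other case, 0 heads.
complement = 1 - prob_exactly(0)

# Rule 1: the cases 0, 1, ..., 10 heads make up the whole sample space.
total = sum(prob_exactly(k) for k in range(n + 1))

print(direct_sum, complement)  # 1023/1024 1023/1024
print(total)  # 1
```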


Application to genetics:

Activity 6

For maternal and paternal chromosomes in the human gamete example, what is the probability of a gamete having at least one maternal chromosome?




Activity 7

For a certain type of tissue, the probability that a randomly selected cell is in a particular stage of the cell cycle is given by the following table:

Cell cycle stage    Probability
Interphase          0.5
Prophase            0.1
Metaphase           0.05
Anaphase            0.2
Telophase           0.15


Suppose you examine 100 cells from this tissue.

A. On average what proportion of cells would you expect to be in metaphase?




B. What is the probability of seeing at least one metaphase?




Conditional probability and Bayes' Theorem:

Often our estimate of the probability of an event B will be modified based on partial information that we have, or because we know that a particular event A has taken place. This kind of probability is called conditional probability. For the events A and B we use the phrase "the probability of B given A", often written as P(B | A), where the '|' means 'given'. If two events are independent, then it is always the case that

P(B | A) = P(B), and if two events are mutually exclusive, then P(B | A) = 0.

Conditional probability is particularly useful where events are correlated with each other or in situations where we are given partial information about an event that restricts what part of the total sample space we need to examine.

For example, suppose we have two dice and we are interested in the total number of dots that come up when both dice are tossed.

What is the probability of this sum being 5?

If we don't know anything in advance, then we are interested in the outcomes 1 + 4, 2 + 3, 3 + 2 and 4 + 1, since these are the combinations of numbers that add up to five dots in total. Note that even though ultimately we are not interested in the order of the dice, to calculate the probability that the sum of the two faces is 5 we still have to treat 1 + 4 (1 on the first die and 4 on the second die) as different from 4 + 1.

The probability of this is clearly going to be 4*1/6^2 = 4/36 = 1/9. Where does the 1/6^2 come from?

What is P(5 dots total for both dice | the first die comes up with 1 dot)?

This turns out to be 1/6 since now we only have to deal with the second die and there is only one possibility for the second die where the total dots is 5, namely the second die comes up 4.

Formula for conditional probability (the starting point for Bayes' theorem):

P(B | A ) = P(A and B)/P(A)

So for our dice problem

P(5 dots total for both dice | the first die comes up with 1 dot) = P(second die comes up 4 and the first die comes up 1)/P(first die comes up 1)

or, assuming the two dice are independent:

P(second comes up 4)*P(first comes up 1)/P(first comes up 1) = P(second comes up 4) = 1/6
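The same dice results can be checked by brute force. This sketch, assuming two fair and independent dice, lists all 36 ordered outcomes and applies P(B | A) = P(A and B)/P(A) by counting; the helper names `prob`, `sum_is_5` and `first_is_1` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die); fair dice assumed.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event, given as a function that tests one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def sum_is_5(o):
    return o[0] + o[1] == 5


def first_is_1(o):
    return o[0] == 1


def sum_5_and_first_1(o):
    return sum_is_5(o) and first_is_1(o)


p_sum_5 = prob(sum_is_5)
# P(B | A) = P(A and B) / P(A), with A = "first die is 1" and B = "sum is 5".
p_sum_5_given_first_1 = prob(sum_5_and_first_1) / prob(first_is_1)

print(p_sum_5)                # 1/9
print(p_sum_5_given_first_1)  # 1/6
```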

This formula, and Bayes' theorem, P(A | B) = P(B | A)*P(A)/P(B), which follows from it, seem simple but are very important in genetic counseling and pedigree analysis.
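As a small worked illustration (our own, not part of the original exercises), Bayes' theorem, P(A | B) = P(B | A)*P(A)/P(B), can be used to run the dice example backwards: given that the total is 5, what is the probability that the first die shows 1? The sketch assumes fair, independent dice and plugs in the values from the dice example.

```python
from fractions import Fraction

# Quantities for the two-dice example (fair, independent dice assumed).
p_first_1 = Fraction(1, 6)              # P(A): first die shows 1
p_sum_5 = Fraction(4, 36)               # P(B): 1+4, 2+3, 3+2, 4+1 out of 36
p_sum_5_given_first_1 = Fraction(1, 6)  # P(B | A): second die must show 4

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_first_1_given_sum_5 = p_sum_5_given_first_1 * p_first_1 / p_sum_5
print(p_first_1_given_sum_5)  # 1/4
```

The answer, 1/4, agrees with direct counting: exactly one of the four outcomes that sum to 5 (namely 1 + 4) has a 1 on the first die.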

Application to genetics.

Activity 8

Many types of color blindness are what are called X-linked, that is, determined by genes on the X chromosome. Suppose a woman carries one X chromosome with the gene for a particular type of color blindness; her other X chromosome does not have this gene. She is married to a man who does not have this gene on his X chromosome. You may remember that color blindness is X-linked recessive.

A. What is the probability that her first child will carry the X chromosome with the gene associated with color blindness?




B. Suppose amniocentesis reveals that the child is male. What is the probability that the child is color blind? Hint: this involves conditional probability.





Tree or branch diagrams: a very useful trick!

Often it is useful to visualize genetics problems in terms of tree or branch diagrams, and we will often make use of this technique. The basic idea is that if you have an event with a number of possible outcomes, then you can represent the possible outcomes as branches on a tree and write the probability associated with each outcome beside its branch. The simplest useful tree diagram is one for a single event with two possible outcomes, say the toss of a coin, where there are just the possible events {T,H}.

Suppose we now have two independent tosses of the same coin or two separate coins tossed together. We can represent this situation as the accompanying tree diagram.


Notice that you can get the probability of any ordered combination of T and H by just traveling down the possible paths on the tree and multiplying the probabilities as you go. Thus P(TH) = 1/2 x 1/2 = 1/4. We can of course get this result using our multiplication rule for two independent events, but tree diagrams are an easy way to conceptualize what is going on in a particular problem. Genetics problems are often solved using so-called Punnett squares, and you may have been taught that particular technique. Tree diagrams have the following big advantages over Punnett squares:
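Walking down every path of a tree and multiplying probabilities is easy to mechanize. In this Python sketch, each level of the tree is a hypothetical list of (outcome, probability) branches, and `tree_paths` is an illustrative helper, not a standard function; fair coins are assumed.

```python
from fractions import Fraction

# One level of the tree: each branch is (outcome label, probability).
# A fair coin is assumed here; change the probabilities for a biased coin.
coin_branches = [("T", Fraction(1, 2)), ("H", Fraction(1, 2))]


def tree_paths(levels):
    """Walk every path through the tree, multiplying probabilities along the way."""
    paths = [("", Fraction(1))]  # start at the root with probability 1
    for branches in levels:
        paths = [(label + b, p * pb) for label, p in paths for b, pb in branches]
    return paths


for path, p in tree_paths([coin_branches, coin_branches]):
    print(path, p)
# TT 1/4
# TH 1/4
# HT 1/4
# HH 1/4
```

Because each level is just a list of branches, the same helper works for trees with more levels or with unequal branch probabilities.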







Activity 9

Suppose you have two dice and want to determine the probability that when both dice are tossed you will get AT LEAST one 6. Represent this problem as a simple tree diagram. Think!






Activity 10

Solve the following problems:

Counting problems

1.  If you have 10 coins, how many possible combinations of heads and tails are there for all 10 coins?  Hint: how many combinations for 1 coin; two coins; three coins?




2. Proteins are made up of chains of amino acids. Insulin is a relatively small protein with 51 amino acid residues. How many possible proteins of length 51 can be made with 20 possible amino acids for each position in the protein?




3. You may remember that in messenger RNA each group of 3 nucleotide bases (A, U, G, C) is called a codon. How many different codons (for instance, AAA) are possible?




4. Suppose you have a deck of 52 different cards. How many possible hands of four cards are there?


A powerful rule for probabilities for unordered events

Scientists are often interested in the probability that a particular outcome will occur a certain number of times in a group of a particular size. For instance, suppose 10 babies are born on a given day in a hospital: 3 girls and 7 boys. A scientist might want to determine how likely this outcome would be if, for each birth, the probability of a girl is 1/2 and the probability of a boy is 1/2. By the 19th century, scientists including Gregor Mendel realized that this sort of problem can be solved using the binomial expansion of

(a + b)^n, where n is any positive integer corresponding to the size of the group, a is the probability of one outcome and b is the probability of the other (so a + b = 1). You may know from algebra that each term of the binomial expansion is given by:

n!/[(n-m)! m!] * a^(n-m) * b^m

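For the baby example, let n = 10 and m = 3, with a = P(boy) = 1/2 and b = P(girl) = 1/2, so that the term gives the probability of exactly m = 3 girls (and n - m = 7 boys). Remember that n! is n factorial.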

Thus P(3 girls and 7 boys in 10 births) can easily be found by substitution. What is this probability?
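The binomial term can be written as a small Python function. In this sketch, `binomial_term` is a hypothetical helper whose arguments follow the symbols n, m, a and b used above; it is checked here on the earlier 6-heads-in-10-tosses coin example, and the baby example can be handled the same way by substitution.

```python
from fractions import Fraction
from math import comb


def binomial_term(n: int, m: int, a: Fraction, b: Fraction) -> Fraction:
    """The term n!/[(n-m)! m!] * a^(n-m) * b^m of the expansion of (a + b)^n.

    Read as: the probability of exactly m outcomes of the kind with probability b
    (and n - m of the kind with probability a) in n independent trials.
    """
    return comb(n, m) * a ** (n - m) * b**m


# Checked on the earlier coin example: exactly 6 heads in 10 tosses of a fair coin.
half = Fraction(1, 2)
print(binomial_term(10, 6, half, half))  # 105/512

# The terms for m = 0..n make up the whole expansion of (a + b)^n = 1^n, so they sum to 1.
print(sum(binomial_term(10, m, half, half) for m in range(11)))  # 1
```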

Activity 11

A biologist studying an endangered species of lemur has found that, for some reason, on average 80% of the offspring at birth are female and 20% are male. Suppose a zoo anticipates that its captive lemurs will produce 30 offspring.

A. What is the expected number of females in this batch of offspring?




B. What is the probability that no males will be born?




C. What is the probability that no females will be born?




D. What is the probability that exactly 5 males will be born? Use the binomial expansion.



Activity 12

Albinism in humans is controlled by a recessive gene (c). Suppose two heterozygous individuals (genotype Cc) marry. Assume they have six children.

A. What is the probability that at least one child will be albino?




B. What is the probability that three will be albino and three will have pigmentation?





C. A challenge: What is the probability that three of the children will be boys AND all three boys will be albino? Hint: assume gender is independent of whether or not a child turns out to be albino. Not too tricky if you think carefully!




Activity 13

These sorts of probability calculations are common in genetic counseling. Huntington's disease is a rare condition inherited as a dominant lethal. Individuals who are Hh live well into adulthood, often having children before the debilitating effects of the disease become obvious; hh individuals live a normal life. A woman whose father died of Huntington's disease marries a man whose parents did not have the disease. Assume the woman's mother did not develop the disease either. If the woman and the man decide to have a child, what is the probability that the child will develop Huntington's?

Hint for solving this problem: Huntington's is very rare so what is the most likely genotype of the woman's father?






pgd revised 08/27/03