The Hardy Weinberg Law and Natural Selection.


Goal: Investigate the Hardy Weinberg law of equilibrium and some simple models of natural selection used in population genetics. After today you should be able to do the following:


The Hardy Weinberg law of equilibrium is one of the most important concepts in population genetics. First, in many situations involving complete dominance, it provides a way to infer what proportion of the population is homozygous dominant and what proportion is heterozygous simply from the proportion that is homozygous recessive. This is particularly useful for estimating how many individuals may be carrying rare recessive alleles for human genetic disorders.

More importantly, Hardy Weinberg provides the framework from which population geneticists construct models of the evolutionary process, particularly at the level of microevolution, that is, genetic change within a population.

The Hardy Weinberg Law of Equilibrium.

Suppose we are studying a single locus with two alleles, A and a. Let p be the frequency of the A allele in the population and q the frequency of the a allele. The Hardy Weinberg law says that, in the absence of evolution, the following relationship holds or rapidly becomes true:

The frequencies of the genotypes AA, Aa and aa are $p^2$, $2pq$ and $q^2$, respectively. Note that these terms are simply the expansion of

$$(p + q)^2 = p^2 + 2pq + q^2$$
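A minimal Python sketch of the relationship above, assuming a single locus with two alleles and Hardy Weinberg conditions. The function name `hw_genotype_frequencies` is hypothetical, and p = 0.7 is an illustrative value only:

```python
def hw_genotype_frequencies(p):
    """Expected (AA, Aa, aa) frequencies under Hardy Weinberg for allele frequency p of A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p  # only two alleles, so p + q = 1
    return p * p, 2 * p * q, q * q  # the three terms of (p + q)^2


# Illustrative allele frequency only: p = 0.7, so q = 0.3
f_AA, f_Aa, f_aa = hw_genotype_frequencies(0.7)
print(round(f_AA, 2), round(f_Aa, 2), round(f_aa, 2))  # 0.49 0.42 0.09
```

Because the three terms come from expanding $(p + q)^2$ with $p + q = 1$, they always sum to 1, which is a handy check on hand calculations.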

Question 1. Phenylketonuria (PKU) is an autosomal recessive disorder. The frequency of individuals who have PKU is about 1/12,000. Use the Hardy Weinberg law to estimate the frequency of individuals who are heterozygous carriers of PKU.
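Running the relationship backward is how the carrier estimate described at the start of this page works. If affected individuals (aa) occur at frequency $q^2$, then q is its square root, p = 1 - q, and carriers occur at frequency 2pq. This is a sketch that assumes the population is in Hardy Weinberg proportions at this locus. `carrier_frequency` is a hypothetical helper, and the input is an illustrative value, not the PKU figure:

```python
import math


def carrier_frequency(affected_frequency):
    """Estimate the heterozygote frequency 2pq from the frequency q^2 of affected (aa) individuals."""
    q = math.sqrt(affected_frequency)  # q^2 is the frequency of homozygous recessives
    p = 1.0 - q
    return 2 * p * q


# Illustrative value only (not the PKU figure): 1 affected individual in 10,000
print(round(carrier_frequency(1 / 10_000), 4))  # 0.0198
```

So a recessive condition affecting 1 person in 10,000 implies that roughly 1 person in 50 is a carrier. This is why rare recessive alleles are mostly hidden in heterozygotes.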

Question 2. Suppose a population has three alleles, A1, A2 and A3, with frequencies p, q and r, respectively.

A. List the possible genotypes

B. Note that the Hardy Weinberg law can be generalized to this situation as the expansion of $(p + q + r)^2$. Use this fact to estimate the frequencies of the possible genotypes in a population where p = 0.2 and q = 0.4.

Hint: why didn't I give you r ?
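Here is a sketch of the generalization in part B for any number of alleles, assuming random mating. Each allele contributes a squared (homozygote) term, and each pair of distinct alleles contributes a doubled cross (heterozygote) term. The helper name is hypothetical, and the allele frequencies are illustrative, not those of the question:

```python
from itertools import combinations_with_replacement


def multi_allele_genotypes(freqs):
    """Expected genotype frequencies from the expansion of (p1 + p2 + ... + pk)^2.

    freqs maps allele name -> allele frequency; the frequencies must sum to 1.
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(freqs, 2):
        if a == b:
            genotypes[a + a] = freqs[a] ** 2  # homozygote: squared term
        else:
            genotypes[a + b] = 2 * freqs[a] * freqs[b]  # heterozygote: doubled cross term
    return genotypes


# Illustrative frequencies only (not those in Question 2)
g = multi_allele_genotypes({"A1": 0.5, "A2": 0.3, "A3": 0.2})
for genotype, freq in g.items():
    print(genotype, round(freq, 3))
```

With k alleles this gives k(k + 1)/2 genotypes, so six when k = 3. As in the two-allele case, the genotype frequencies sum to 1.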

Question 3. A puzzler!

The ABO blood system is a well-known example of codominance. A certain population has the following frequencies for the blood types:

A =0.40

B =0.27

AB =0.24

O = 0.09

A. Estimate the frequency of the 'i' allele in the population. 

B. Let p be the frequency of the $I^A$ allele in the population, q the frequency of the $I^B$ allele and r the frequency of the 'i' allele. Estimate the frequencies of all three alleles. Clearly show all your work.


·Assume Hardy Weinberg can be extended to situations where you have multiple allele systems.

·Assume random mating, which means random combinations of gametes.

Write a general algebraic expression for the frequency of blood type A as a function of p and r. A formula from algebra for solving a certain kind of equation might be useful.
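Under the two assumptions above, the phenotype frequencies are O = $r^2$, A = $p^2 + 2pr$, B = $q^2 + 2qr$ and AB = $2pq$. So r is the square root of the type O frequency, and p is the positive root of the quadratic $p^2 + 2rp - A = 0$. This is a sketch with hypothetical helper names. It runs a round trip on made-up allele frequencies, not the data of Question 3:

```python
import math


def abo_phenotypes(p, q, r):
    """Expected ABO phenotype frequencies under Hardy Weinberg (p = I^A, q = I^B, r = i)."""
    return {
        "A": p * p + 2 * p * r,  # I^A I^A and I^A i
        "B": q * q + 2 * q * r,  # I^B I^B and I^B i
        "AB": 2 * p * q,         # I^A I^B (codominant)
        "O": r * r,              # ii
    }


def abo_allele_estimates(freq_a, freq_b, freq_o):
    """Estimate (p, q, r) from phenotype frequencies by solving the equations above."""
    r = math.sqrt(freq_o)               # O = r^2
    p = -r + math.sqrt(r * r + freq_a)  # positive root of p^2 + 2rp - A = 0
    q = -r + math.sqrt(r * r + freq_b)  # positive root of q^2 + 2rq - B = 0
    return p, q, r


# Round trip with made-up allele frequencies (not the data in Question 3)
pheno = abo_phenotypes(0.25, 0.15, 0.60)
p, q, r = abo_allele_estimates(pheno["A"], pheno["B"], pheno["O"])
print(round(p, 3), round(q, 3), round(r, 3), round(p + q + r, 3))  # 0.25 0.15 0.6 1.0
```

With real survey data the three estimates need not sum exactly to 1. The unused AB frequency (which should be close to 2pq) gives another consistency check.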

Question 4. Suppose I am a geneticist studying sickle cell anemia in a particular population, and I find the following genotype frequencies in 1000 infants screened for the $B^A$ and $B^S$ alleles:

$B^AB^A$ = 0.60

$B^AB^S$ = 0.35

$B^SB^S$ = 0.05

A. Estimate the frequencies of the $B^A$ and $B^S$ alleles in the population. Hint: remember how we did the snapdragon example in lecture. See pages 98-99 for a discussion of the genetics of sickle cell.

B. Use the resulting allele frequencies to predict what the genotype frequencies should be under Hardy Weinberg equilibrium.

C. Do the observed data suggest that the Hardy Weinberg law holds for this population? Test this by comparing the observed genotype frequencies with the expected genotype frequencies from part B, using a certain statistical test.
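Estimating allele frequencies from genotypes is gene counting: each homozygote carries two copies of its allele, and each heterozygote carries one. A chi-square goodness-of-fit test is the usual way to compare observed and expected genotype counts. This is a sketch with hypothetical helper names and hypothetical counts, not the sickle cell data:

```python
def allele_freqs_from_counts(n_AA, n_Aa, n_aa):
    """Gene counting: homozygotes carry two copies of their allele, heterozygotes one of each."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / total_alleles
    return p, 1.0 - p


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing observed genotype counts with Hardy Weinberg expectations."""
    n = n_AA + n_Aa + n_aa
    p, q = allele_freqs_from_counts(n_AA, n_Aa, n_aa)
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts (not the sickle cell data in Question 4)
chi2 = hw_chi_square(50, 30, 20)
print(round(chi2, 2))
```

There are three genotype classes, and one allele frequency is estimated from the same data. The test therefore has 1 degree of freedom, so a statistic above about 3.84 indicates a significant departure from Hardy Weinberg at the 5% level. The test works on counts, not proportions. To use the frequencies in Question 4, first multiply them by the 1000 infants screened.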

Simple models of natural selection.

The Hardy Weinberg law strictly applies only when five basic assumptions are met, which you should review. Population geneticists develop mathematical models of evolution by modeling what happens when one or more of these assumptions are seriously violated.

Natural selection refers to the differential survival and reproductive success of different phenotypes in a population in response to environmental conditions. Let's suppose, for instance, that butterflies in a certain population come in three wing colors: white, grey and black. Let $W_{11}$, $W_{12}$ and $W_{22}$ be the relative number of offspring left by each type of butterfly. This relative number is often called Darwinian fitness. See page 657 in your text.

For example, suppose the white butterflies leave on average 15 offspring per butterfly that survives to adulthood, the grey butterflies leave 18 offspring per butterfly and the black butterflies leave 9 offspring per butterfly. We then use these numbers to calculate the Darwinian fitness per adult butterfly of each phenotype:

$$W_{11} = 15/18, \quad W_{12} = 18/18, \quad W_{22} = 9/18$$

Notice that the Darwinian fitnesses are relative to the phenotype with the highest absolute reproductive success. Mathematically, this simply normalizes the fitness values so that the maximum Darwinian fitness is 1.0. This also allows fitness to be expressed in terms of a selection coefficient, which is simply $1 - W_{ij}$ for each of the Darwinian fitnesses.

For our example, what are the selection coefficients $S_{11}$, $S_{12}$ and $S_{22}$ for each phenotype?
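The normalization and the definition of the selection coefficient translate directly into two small functions. This is a sketch with hypothetical helper names, using the white/grey/black offspring numbers from the text:

```python
def relative_fitness(offspring):
    """Darwinian fitnesses: each phenotype's offspring per adult divided by the largest value."""
    best = max(offspring.values())
    return {phenotype: n / best for phenotype, n in offspring.items()}


def selection_coefficients(fitness):
    """Selection coefficient s = 1 - W for each phenotype."""
    return {phenotype: 1.0 - w for phenotype, w in fitness.items()}


# Offspring per adult from the white/grey/black example above
w = relative_fitness({"white": 15, "grey": 18, "black": 9})
s = selection_coefficients(w)
print({phenotype: round(value, 3) for phenotype, value in s.items()})
```

The fittest phenotype always gets W = 1 and s = 0. The other selection coefficients measure the proportional shortfall in reproductive success relative to it.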

It is often easier to express population genetics models in terms of the selection coefficients rather than the Darwinian fitnesses.

Note that natural selection is simply the differential survival and reproduction of different phenotypes. Natural selection will lead to evolution only if the phenotype is at least partly determined by genetics. So let's suppose for simplicity that wing color in this butterfly is controlled by two alleles, A1 and A2. Complete the following table for a hypothetical butterfly population:

| Phenotype | White | Grey | Black |
|---|---|---|---|
| Genotype | A1A1 | A1A2 | A2A2 |
| Number of adults in parent generation (time t) | 1000 | 750 | 100 |
| Reproductive success per adult of each genotype | 11 | 18 | 5 |
| Darwinian fitness | $W_{11}$ = | $W_{12}$ = | $W_{22}$ = |
| Selection coefficient | $S_{11}$ = | $S_{12}$ = | $S_{22}$ = |

Now we have all the information necessary to predict how the frequencies of the A1 and A2 alleles will change over time in this population. We will do this by writing an expression for p', the frequency of the A1 allele at time t+1, as a function of the current value of p.

Question 5. 

Use the data from the table to calculate the frequency p of the A1 allele and the frequency q of the A2 allele in the parent generation. This will also give you the frequency of each type of gamete produced by the parents. Remember that p + q = 1.0, so check yourself.

Average fitness of the alleles

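In simple selection models written in terms of p and q, we can calculate the average Darwinian fitness of each allele: $W_1$ for A1 and $W_2$ for A2. Each genotype's fitness is weighted by the number of copies of the allele it carries. A heterozygote carries only one copy of each allele, so it enters with a factor of 1/2. For the table above:

$$W_1 = \frac{1000\,W_{11} + \tfrac{1}{2}\cdot 750\,W_{12}}{1000 + \tfrac{1}{2}\cdot 750} \quad \text{for A1, and}$$

$$W_2 = \frac{\tfrac{1}{2}\cdot 750\,W_{12} + 100\,W_{22}}{\tfrac{1}{2}\cdot 750 + 100} \quad \text{for A2.}$$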

The A1 allele will spread if $W_1 > W_2$, and the A2 allele will spread if $W_2 > W_1$.
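The two count-based formulas can be evaluated directly. This is a sketch with a hypothetical helper name, using made-up counts and fitnesses rather than the butterfly table:

```python
def allele_fitnesses_from_counts(n11, n12, n22, w11, w12, w22):
    """Average fitnesses (W1, W2) of alleles A1 and A2 from adult genotype counts.

    Homozygotes are weighted by their full count and heterozygotes by half,
    since a heterozygote carries only one copy of each allele.
    """
    w1 = (w11 * n11 + 0.5 * w12 * n12) / (n11 + 0.5 * n12)
    w2 = (0.5 * w12 * n12 + w22 * n22) / (0.5 * n12 + n22)
    return w1, w2


# Hypothetical counts and fitnesses (not the butterfly table)
w1, w2 = allele_fitnesses_from_counts(100, 200, 100, 1.0, 1.0, 0.5)
if w1 > w2:
    verdict = "A1 spreads"
elif w2 > w1:
    verdict = "A2 spreads"
else:
    verdict = "no change"
print(w1, w2, verdict)  # 1.0 0.75 A1 spreads
```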

Which allele has the higher average fitness and will thus become more common in the very next generation?

The expressions for the average allele fitnesses can be rewritten in terms of genotype frequencies, and they turn out to be:

(Equation 1A)

$$W_1 = \frac{W_{11}p_{11} + \tfrac{1}{2}W_{12}p_{12}}{p_{11} + \tfrac{1}{2}p_{12}}$$

(Equation 1B)

$$W_2 = \frac{W_{22}p_{22} + \tfrac{1}{2}W_{12}p_{12}}{p_{22} + \tfrac{1}{2}p_{12}}$$

Here $p_{11}$ is the frequency of the A1A1 genotype, $p_{12}$ the frequency of the A1A2 genotype and $p_{22}$ the frequency of the A2A2 genotype in the actual population. Notice that the average allele fitnesses are really functions of the genotype Darwinian fitnesses and the frequencies of the different genotypes. This should suggest that the average allele fitnesses are not constants. Predictions about the spread of an allele based on average allele fitness therefore hold only locally, at that particular set of genotype frequencies.
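To see that the average allele fitnesses are not constants, evaluate Equations 1A and 1B at several allele frequencies, assuming Hardy Weinberg genotype proportions ($p_{11} = p^2$, $p_{12} = 2pq$, $p_{22} = q^2$). This is a sketch with a hypothetical helper name and made-up fitnesses in which the heterozygote is fittest:

```python
def allele_fitnesses(p11, p12, p22, w11, w12, w22):
    """Equations 1A and 1B: average allele fitnesses from genotype frequencies."""
    w1 = (w11 * p11 + 0.5 * w12 * p12) / (p11 + 0.5 * p12)
    w2 = (w22 * p22 + 0.5 * w12 * p12) / (p22 + 0.5 * p12)
    return w1, w2


# Hypothetical fitnesses with a heterozygote advantage: W11 = 0.8, W12 = 1.0, W22 = 0.6
for p in (0.1, 0.5, 0.9):
    q = 1.0 - p
    w1, w2 = allele_fitnesses(p * p, 2 * p * q, q * q, 0.8, 1.0, 0.6)
    print(p, "A1 favored" if w1 > w2 else "A2 favored")
```

With these fitnesses A1 is favored while it is rare, but A2 is favored once p is large. A comparison of $W_1$ and $W_2$ made at one set of genotype frequencies cannot be extrapolated to others. Under Hardy Weinberg proportions the equations simplify to $W_1 = W_{11}p + W_{12}q$ and $W_2 = W_{12}p + W_{22}q$.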

For this reason, analyses based on differences in average allele fitness are generally used only when the question under consideration is whether an allele will spread when it is very rare.

Equations for change in allele frequencies

What we typically want is an expression for p' in terms of p in the previous generation when natural selection is operating but all the other assumptions of Hardy Weinberg apply. This turns out to be relatively easy and is given by

$$p' = \frac{W_1 p}{W_1 p + W_2 q}$$

which, after substituting the average allele fitnesses for Hardy Weinberg genotype proportions, becomes:

(Equation 2A)

$$p' = \frac{W_{11}p^2 + W_{12}pq}{W}$$

(Equation 2B)

$$W = W_{11}p^2 + 2W_{12}pq + W_{22}q^2$$

W is called the average population fitness.

So combining 2A and 2B gives a general formula for p':

(Equation 2C)

$$p' = \frac{W_{11}p^2 + W_{12}pq}{W_{11}p^2 + 2W_{12}pq + W_{22}q^2}$$

This expression is the same as given in your text in tables 22.10 and 22.11 

Now we have an expression that allows us to follow the change in allele frequencies from generation to generation simply by reevaluating this expression at each generation.
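Following allele frequencies from generation to generation is just repeated evaluation of Equation 2C. This is a sketch with hypothetical function names, using the fitnesses 15/18, 18/18 and 9/18 from the white/grey/black example. Because the heterozygote is fittest there, p settles at an intermediate value instead of going to 0 or 1:

```python
def next_p(p, w11, w12, w22):
    """Equation 2C: frequency of A1 in the next generation."""
    q = 1.0 - p
    w_bar = w11 * p * p + 2 * w12 * p * q + w22 * q * q  # Equation 2B
    return (w11 * p * p + w12 * p * q) / w_bar


def trajectory(p0, w11, w12, w22, generations):
    """Allele frequencies p_0, p_1, ..., p_generations."""
    ps = [p0]
    for _ in range(generations):
        ps.append(next_p(ps[-1], w11, w12, w22))
    return ps


# Fitnesses from the white/grey/black example: 15/18, 18/18, 9/18
path = trajectory(0.1, 15 / 18, 1.0, 9 / 18, 200)
print(round(path[-1], 4))  # 0.75
```

When all three fitnesses are equal, `next_p` returns p unchanged, which is the Hardy Weinberg result with no selection.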

Generally population geneticists use this general formula and derive algebraic expressions for specific cases in terms of the selection coefficients. Some of these common expressions for different types of natural selection are in table 22.12 of your text.

Notice that these are written in terms of $\Delta p$, the change in allele frequency between time t and time t+1. The general form of the difference equation is simply:

(Equation 3A)

$$\Delta p = p' - p = \frac{W_{11}p^2 + W_{12}pq}{W} - p$$

After some algebra this becomes:

$$\Delta p = \frac{pq\,[\,p(W_{11} - W_{12}) + q(W_{12} - W_{22})\,]}{W}$$

(Equation 3B)

$$\Delta p = \frac{pq\,[\,p(W_{11} - W_{12}) + q(W_{12} - W_{22})\,]}{W_{11}p^2 + 2W_{12}pq + W_{22}q^2}$$
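A quick numerical check that Equations 3A and 3B describe the same quantity. This is a sketch with hypothetical function names and arbitrary made-up fitnesses:

```python
def delta_p(p, w11, w12, w22):
    """Equation 3B: one-generation change in the frequency of A1."""
    q = 1.0 - p
    w_bar = w11 * p * p + 2 * w12 * p * q + w22 * q * q
    return p * q * (p * (w11 - w12) + q * (w12 - w22)) / w_bar


def delta_p_direct(p, w11, w12, w22):
    """Equation 3A: the same change computed as p' - p from Equation 2C."""
    q = 1.0 - p
    w_bar = w11 * p * p + 2 * w12 * p * q + w22 * q * q
    return (w11 * p * p + w12 * p * q) / w_bar - p


# The two forms agree, up to rounding, at arbitrary allele frequencies
for p in (0.05, 0.3, 0.6, 0.95):
    assert abs(delta_p(p, 0.7, 1.0, 0.4) - delta_p_direct(p, 0.7, 1.0, 0.4)) < 1e-12
```

The factored form in Equation 3B makes the equilibria easy to read off. $\Delta p = 0$ when p = 0, when q = 0 (p = 1), or when the bracketed term vanishes.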

Note that the corresponding expression in your text is written in terms of the other allele, which occurs with frequency q. It is the same expression as Equation 3B, except multiplied by -1. The trick to deriving Equation 3B from Equation 3A is to put both terms over the common denominator W, writing p as pW/W, and then to expand W in terms of the Darwinian fitnesses. Using p + q = 1, the numerator can then be factored to yield Equation 3B.
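Written out step by step, the algebra for going from Equation 3A to Equation 3B looks like this. This is one possible route, and it uses only p + q = 1:

$$
\begin{aligned}
\Delta p &= \frac{W_{11}p^2 + W_{12}pq}{W} - \frac{pW}{W} \\
&= \frac{W_{11}p^2 + W_{12}pq - p\,(W_{11}p^2 + 2W_{12}pq + W_{22}q^2)}{W} \\
&= \frac{W_{11}p^2(1-p) + W_{12}pq(1-2p) - W_{22}pq^2}{W} \\
&= \frac{W_{11}p^2q + W_{12}pq(q-p) - W_{22}pq^2}{W} \qquad (1-p = q,\; 1-2p = q-p) \\
&= \frac{pq\,[\,p(W_{11}-W_{12}) + q(W_{12}-W_{22})\,]}{W}
\end{aligned}
$$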

These difference equations are useful because, as we shall see, they allow one to investigate equilibrium values for allele frequencies, that is, values of p at which p does not change. Remember that at equilibrium $\Delta p = 0$.

Question 6. 

A. Looking at table 22.12, which type of selection is shown in the butterfly example? You should refer to the Darwinian fitnesses and selection coefficients you calculated earlier to help you.

B. Substitute the values of the selection coefficients and write a specific expression for the butterfly example. Note that the formulas in table 22.12 of the text are given in terms of $\Delta q$, so all you have to remember is that $\Delta p = -\Delta q$.

Stability analysis:

Population geneticists can often find equilibrium values and determine whether or not the equilibria are stable. This can be done using calculus, but a graphical analysis is often useful. You will analyze the selection model for the butterfly example. Graphical stability analysis involves plotting $\Delta p$ as a function of p and examining the slope of the function around any equilibrium points.

Question 7. Use a calculator or spreadsheet and your formula from 6B, or the general formula in Equation 2C or Equation 3B, to complete the following table for p' as a function of p.

A simple Excel worksheet.
A more complex worksheet for graphing.

| p | p' | $\Delta p = p' - p$ |
|---|---|---|

Next, roughly sketch the graph showing the equilibrium values. If you have access to a programmable graphing calculator or a spreadsheet program, this should be easy. The points where the graph crosses the x axis ($\Delta p = 0$) are equilibrium points.

B. How many equilibrium points are there for this model?

C. Which equilibrium is stable? How do you know? Hint: for a stable equilibrium point, what happens to $\Delta p$ for values of p smaller than the equilibrium or larger than the equilibrium?
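The table and the search for crossings of the x axis can be scripted instead of done by hand. This is a sketch with hypothetical function names and made-up fitnesses (W11 = 0.8, W12 = 1.0, W22 = 0.6), not the butterfly values. It lists p, p' and $\Delta p$ on a grid and reports neighboring grid points between which $\Delta p$ strictly changes sign. The trivial equilibria at p = 0 and p = 1, where $\Delta p$ is exactly zero, are not reported by this check:

```python
def next_p(p, w11, w12, w22):
    """Equation 2C."""
    q = 1.0 - p
    w_bar = w11 * p * p + 2 * w12 * p * q + w22 * q * q
    return (w11 * p * p + w12 * p * q) / w_bar


def tabulate(w11, w12, w22, steps=20):
    """Rows (p, p', delta p) for p = 0, 1/steps, ..., 1."""
    rows = []
    for i in range(steps + 1):
        p = i / steps
        p_next = next_p(p, w11, w12, w22)
        rows.append((p, p_next, p_next - p))
    return rows


def bracket_interior_equilibria(rows):
    """Pairs of neighboring p values between which delta p strictly changes sign."""
    return [(a[0], b[0]) for a, b in zip(rows, rows[1:]) if a[2] * b[2] < 0]


# Hypothetical fitnesses (not the butterfly table)
rows = tabulate(0.8, 1.0, 0.6)
for p, p_next, dp in rows:
    print(f"{p:.2f}  {p_next:.4f}  {dp:+.4f}")
print(bracket_interior_equilibria(rows))  # [(0.65, 0.7)]
```

The bracket contains the exact interior equilibrium, which for these fitnesses is $p^* = (W_{12} - W_{22})/(2W_{12} - W_{11} - W_{22}) = 2/3$. A finer grid (larger `steps`) narrows it down.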

Graphical analysis example:

[Figure: graph of $\Delta p$ as a function of p]


Notice that there are three equilibrium points: one at p = 0.0, one at p = 1.0 and a third at p = 0.65. Equilibrium points are typically either stable or unstable. Let's examine the first equilibrium point, at p = 0. This means that the population has only the A2 allele. If a very small number of A1 alleles are introduced, p becomes slightly greater than zero. Taking this arbitrarily small value above the equilibrium point as the new p, notice that $\Delta p$ is greater than zero, and hence p' is larger than p. This means that p should keep increasing over time, because this happens as long as $\Delta p$ is positive. The equilibrium at p = 0 is therefore unstable. The same reasoning applies to the equilibrium at p = 1.0. Just below it $\Delta p$ is negative, so p moves away from 1.0, and that equilibrium is also unstable.

But what about the equilibrium at p = 0.65? Notice that if you pick a value of p slightly smaller than 0.65, say p = 0.64, then $\Delta p$ is positive, meaning that p will creep back up toward the equilibrium point. If p is, say, 0.66, then $\Delta p$ is negative and p should creep back down to the equilibrium point. Hence, the third equilibrium point is stable.
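The rule used in this discussion can be checked numerically. An equilibrium is stable if $\Delta p$ is positive just below it and negative just above it. This is a sketch with hypothetical names and made-up fitnesses, since the fitnesses behind the graph are not given. With W11 = 0.8, W12 = 1.0 and W22 = 0.6 the interior equilibrium lies at 2/3 rather than 0.65. `eps` is an arbitrary small step. This crude rule would also report an exactly neutral model ($\Delta p = 0$ everywhere) as unstable:

```python
def delta_p(p, w11, w12, w22):
    """Equation 3B: one-generation change in the frequency of A1."""
    q = 1.0 - p
    w_bar = w11 * p * p + 2 * w12 * p * q + w22 * q * q
    return p * q * (p * (w11 - w12) + q * (w12 - w22)) / w_bar


def classify_equilibrium(p_star, w11, w12, w22, eps=1e-3):
    """Stable if delta p > 0 just below p* and delta p < 0 just above it.

    Sides falling outside [0, 1] are skipped, so p* = 0 and p* = 1 are judged on one side only.
    """
    below_ok = p_star - eps < 0 or delta_p(p_star - eps, w11, w12, w22) > 0
    above_ok = p_star + eps > 1 or delta_p(p_star + eps, w11, w12, w22) < 0
    return "stable" if below_ok and above_ok else "unstable"


# Hypothetical heterozygote-advantage fitnesses: W11 = 0.8, W12 = 1.0, W22 = 0.6
for p_star in (0.0, 2 / 3, 1.0):
    print(round(p_star, 3), classify_equilibrium(p_star, 0.8, 1.0, 0.6))
```

Reversing the pattern (heterozygote least fit, for example W11 = 1.0, W12 = 0.5, W22 = 1.0) reverses the verdicts. The interior equilibrium becomes unstable and the ones at p = 0 and p = 1 become stable.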

This is a fairly traditional analysis. Some genetic systems in population genetics may exhibit nonlinear behavior, for example complex models involving certain types of frequency-dependent selection. These models are often best analyzed by plotting p' as a function of p.

pgd created 04/18/03 revised 12/18/04