Sunday, February 28, 2010

Jukes-Cantor (5)

I am trying to see how the equations in the Jukes-Cantor model of sequence evolution work, and then, eventually, to extend this to other models. In order to test my understanding, I'll want to work out some practical examples. But I'm not there yet.

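What I want to do here is to wrap up something from the first post. There we had two differential equations for the rates of change of the probabilities at a particular nucleotide position, where PXX(t) is the probability that the position still holds nucleotide X at time t, and PXY(t) is the probability that it holds one particular different nucleotide Y: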

d/dt(PXX(t)) = -3*α*e^(-4αt)
d/dt(PXY(t)) = α*e^(-4αt)

And we'd like to express these rates in terms of PXX(t) and PXY(t) themselves, using the solutions:

PXX(t) = 1/4 + 3/4*e^(-4αt)
PXY(t) = 1/4 - 1/4*e^(-4αt)

Taking the first one, we have

PXX(t) = 1/4 + 3/4*e^(-4αt)
3*e^(-4αt) = 4*(PXX(t) - 1/4)
-3*α*e^(-4αt) = -4*α*(PXX(t) - 1/4)
d/dt(PXX(t)) = α - 4*α*PXX(t)

And for the second

PXY(t) = 1/4 - 1/4*e^(-4αt)
e^(-4αt) = 1 - 4*PXY(t)
α*e^(-4αt) = α - 4*α*PXY(t)
d/dt(PXY(t)) = α - 4*α*PXY(t)
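As a quick numerical sanity check on the algebra, here is a minimal Python sketch. It compares the derivatives written in terms of time with the form α - 4*α*P at a handful of time points. The value of ALPHA and the function names are my own choices for illustration, not anything from the model itself.

```python
import math

ALPHA = 0.1  # assumed substitution rate, chosen only for illustration


def p_xx(t, alpha=ALPHA):
    """Probability that the position still holds X at time t."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def p_xy(t, alpha=ALPHA):
    """Probability that the position holds one particular other nucleotide Y."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)


def dp_xx(t, alpha=ALPHA):
    """d/dt of PXX, written as a function of time."""
    return -3 * alpha * math.exp(-4 * alpha * t)


def dp_xy(t, alpha=ALPHA):
    """d/dt of PXY, written as a function of time."""
    return alpha * math.exp(-4 * alpha * t)


def rhs(p, alpha=ALPHA):
    """The shared form derived above: alpha - 4*alpha*P."""
    return alpha - 4 * alpha * p


def max_mismatch(times, alpha=ALPHA):
    """Largest absolute difference between the two ways of writing each slope."""
    worst = 0.0
    for t in times:
        worst = max(
            worst,
            abs(dp_xx(t, alpha) - rhs(p_xx(t, alpha), alpha)),
            abs(dp_xy(t, alpha) - rhs(p_xy(t, alpha), alpha)),
        )
    return worst


if __name__ == "__main__":
    # Should print something at the level of floating-point round-off.
    print(max_mismatch([0.0, 0.5, 1.0, 5.0, 20.0]))
```

If the derivation is right, the printed mismatch should be down at round-off level for any positive value of ALPHA.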

So each slope is a linear function of its own probability: a constant term α, minus 4*α times the probability. But the most interesting thing is that the form is the same for both PXX and PXY!

I wasn't expecting this but it makes sense, because at long times we come to equilibrium (the stationary distribution of the Markov chain), where both probabilities equal 1/4 and both rates are the same: α - 4*α*(1/4) = 0. At time-zero we have PXX = 1 and the rate is -3*α, while PXY = 0 and the rate is α. I think it's OK.
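The same checks can be done by treating dP/dt = α - 4*α*P as the starting point and stepping it forward in time. The sketch below uses a simple forward Euler scheme, which is my choice for brevity rather than anything the model prescribes. The value 0.1 for α, the step count, and the function names are all assumptions for illustration.

```python
def rate(p, alpha):
    """Slope dP/dt = alpha - 4*alpha*P, the same form for PXX and PXY."""
    return alpha - 4 * alpha * p


def integrate(p0, alpha, t_end, steps):
    """Forward-Euler integration of dP/dt = rate(P) from P(0) = p0 to t_end."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * rate(p, alpha)
    return p


if __name__ == "__main__":
    alpha = 0.1  # assumed rate, for illustration only

    # Slopes at time zero: PXX starts at 1, PXY starts at 0.
    print(rate(1.0, alpha), rate(0.0, alpha))

    # Slope at the stationary value 1/4.
    print(rate(0.25, alpha))

    # Long-time values starting from PXX(0) = 1 and PXY(0) = 0.
    print(integrate(1.0, alpha, 50.0, 50000))
    print(integrate(0.0, alpha, 50.0, 50000))
```

Starting from 1 or from 0, the same equation drives the probability toward 1/4, which is why one form can describe both PXX and PXY: only the initial condition differs.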