Python for Bioinformatics: Plotting the normal distribution with Python

Wednesday, February 24, 2010

Plotting the normal distribution with Python

It is nice to be able to add a plot of the normal distribution on top of another plot, say a histogram of your data. I've done it before from R (here) using code like this (which assumes we have some data in an array M):

plot(function(x) dnorm(x,0,sd(M)),
  min(M),max(M),lwd=3,
  add=T,col='red')

I wanted to find out how to do this using numpy and matplotlib. It turns out that while R has these functions built-in, numpy doesn't seem to have them. On this page showing an example exactly like what I want to do, the SciPy docs implement the pdf for the normal distribution as if it had come straight out of wikipedia. So, we'll do the same!

Here is a first shot at it, just the normal pdf, without any sample data. If you want "curves", (or what look like curves but are actually very closely spaced "points"), just make the variable dx smaller. The plot code is similar to what we did the other day, with the Poisson distribution.

import math
import numpy as np
import matplotlib.pyplot as plt

def normal(mu,sigma):
    def f(x):
        z = 1.0*(x-mu)/sigma
        e = math.e**(-0.5*z**2)
        C = math.sqrt(2*math.pi)*sigma
        return 1.0*e/C
    return f

X = 2
dx = 0.1
R = np.arange(-X,X+dx,dx)

L = list()
sdL = (0.5,1,2,3)
for sd in sdL:
    f = normal(mu=0,sigma=sd)
    L.append([f(x) for x in R])
colors = ['r','b','purple']

for c,P in zip(colors,L):
    plt.plot(R,P,zorder=1,color='0.2',lw=1.5)
    plt.scatter(R,P,zorder=2,s=50,color=c)
    
ax = plt.axes()
ax.set_xlim(-2.1,2.1)
#ax.set_ylim(-0.01,0.5)
plt.savefig('example.png')