Saturday, March 5, 2011

Handling large sequence sets (7) Encore



There is one more thing to say about the HITS project. Using bowtie worked well enough that I feel comfortable posting a plot similar to Fig 2 of the paper. Each point represents one of the 1148 genes plotted. The x-axis is saturation (the fraction of TA sites in that gene that were hit by the transposon in the library), and the y-axis is the selection index (the ratio of total hits in the lung to total hits in the library).
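To make the two axes concrete, here is a minimal Python sketch of how the values for a single gene could be computed from per-TA-site hit counts. The function names and the counts are made up for illustration, and the real analysis may normalize read totals between the two samples, which this sketch does not do.

```python
def saturation(ta_site_hits):
    """Fraction of a gene's TA sites with at least one hit in the library."""
    if not ta_site_hits:
        return 0.0
    n_hit = sum(1 for n in ta_site_hits if n > 0)
    return n_hit / len(ta_site_hits)


def selection_index(lung_total, library_total):
    """Ratio of total hits in the lung to total hits in the library.

    Returns None when the gene has no hits in the library,
    since the ratio is undefined in that case.
    """
    if library_total == 0:
        return None
    return lung_total / library_total


# Made-up hit counts at the 8 TA sites of one hypothetical gene.
library_counts = [3, 0, 5, 1, 0, 2, 4, 0]
lung_counts = [1, 0, 0, 0, 0, 1, 0, 0]

sat = saturation(library_counts)  # 5 of the 8 sites were hit
si = selection_index(sum(lung_counts), sum(library_counts))  # 2 / 15
print(sat, si)
```

A gene with no hits in the library has an undefined selection index, and a selection index of zero cannot be drawn on a log axis, so both cases need some thought before plotting.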

It doesn't look exactly like the paper, but it's pretty good.

I couldn't get matplotlib to do the semi-log plot, so I used R:


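setwd('Desktop')
# columns: 2 = saturation, 3 = selection index
data = read.table('plot_data.txt', header=FALSE)
# default color for every gene
color.list = rep('steelblue', nrow(data))
# low selection index: red
sel = data[,3] < 0.15
color.list[sel] = 'red'
# low saturation: gray (applied last, so it overrides red)
sel = data[,2] < 0.4
color.list[sel] = 'lightgray'
# semi-log plot: log scale on the y-axis only
plot(data[,2], data[,3],
     col=color.list, log='y', pch=16,
     xlab='saturation', ylab='selection index')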


And that really is it for this. Here is hemR again: