Saturday, December 4, 2010

Don't forget grep

I've always had a hard time with text processing in R. (Of course, with RPy, I don't need to worry about this so much anymore.) Still, it's worth remembering the R function grep, which by default returns the indices of the elements of x that match pattern; the basic usage is:

grep(pattern, x)

Here's a very simple example:

> names = paste(rep(c('A','B'),3),1:6,sep='')
> names
[1] "A1" "B2" "A3" "B4" "A5" "B6"
> sel = grep('B',names)
> sel
[1] 2 4 6
> colors = rep('red',length(names))
> colors[sel] = 'blue'
> colors
[1] "red"  "blue" "red"  "blue" "red"  "blue"

We might use this in ape to color the labels for the tips of a tree depending on the origin of the sequence (say, database or patient sample).

In Python, you'd probably do something like this:

>>> L = list('AB')*3
>>> L = [item + str(i+1) for i,item in enumerate(L)]
>>> L
['A1', 'B2', 'A3', 'B4', 'A5', 'B6']
>>> cL = ['red'] * len(L)
>>> for i,n in enumerate(L):
...     if 'B' in n:
...         cL[i] = 'blue'
...
>>> cL
['red', 'blue', 'red', 'blue', 'red', 'blue']
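
If you want something closer to the R idiom, here is a minimal sketch. The grep helper below is my own hypothetical function, not part of the Python standard library; it uses re.search to treat the pattern as a regular expression. Note that it returns 0-based indices, whereas R's grep returns 1-based ones.

```python
import re

def grep(pattern, items):
    """Hypothetical helper: 0-based indices of the items matching a regex."""
    rx = re.compile(pattern)
    return [i for i, item in enumerate(items) if rx.search(item)]

# Build the same labels as above: A1, B2, A3, B4, A5, B6
names = [letter + str(i + 1) for i, letter in enumerate('AB' * 3)]

# Indices of the labels containing 'B'
sel = set(grep('B', names))

# Default to red, switch the selected positions to blue
colors = ['blue' if i in sel else 'red' for i in range(len(names))]
print(colors)
```

Because the pattern is a regular expression, the same helper should also handle anchored matches such as grep('^A', names).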