Thursday, November 18, 2010

RPy: R from Python (2)

I took another try at RPy this morning (earlier post here). I think I'm on the same machine, but I'm not positive. I installed it with easy_install and have:


R version 2.10.0 (2009-10-26)
[R.app GUI 1.30 (5511) x86_64-apple-darwin9.8.0]
rpy2 2.1.7-20101104


Let's re-run the example from the other day:

>>> import rpy2.robjects as robjects
>>> from rpy2.robjects.packages import importr
>>> importr('Bolstad')
Warning message:
package 'Bolstad' was built under R version 2.10.1
<rpy2.robjects.packages.SignatureTranslatedPackage object at 0x10056c890>
>>> binobp = robjects.r['binobp']
>>> result = binobp(68,200,1,1)
Posterior Mean : 0.3415842
..

which launches an X11 window for the plot and also prints a summary of the results. The result variable is:

>>> result
<Vector - Python:0x1005792d8 / R:0x100c6ca10>
>>> type(result)
<class 'rpy2.robjects.vectors.Vector'>
>>> help(result)

Help on Vector in module rpy2.robjects.vectors object (in part):

class Vector(rpy2.robjects.robject.RObjectMixin, rpy2.rinterface.SexpVector)
| R vector-like object. Items can be accessed with:
| - the method "__getitem__" ("[" operator)


It has __getitem__.

Since it's a Vector, it sounds like it should behave like a list rather than a dictionary. Consistent with that, we cannot retrieve items by name, only by index:

>>> result['mean']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.6-universal/egg/rpy2/robjects/vectors.py", line 183, in __getitem__
TypeError: 'str' object cannot be interpreted as an index
>>> result[4]
<FloatVector - Python:0x100575ab8 / R:0x102a857d8>

What we were missing previously was a way to associate the components of the R list (posterior, mean, etc.) with the values in the result. In the help(result) output I can see what was suggested in the comments: the names attribute.

>>> print result.names
[1] "posterior" "likelihood" "prior" "pi" "mean"
[6] "var" "sd" "quantiles"
>>> names = result.names
>>> type(names)
<class 'rpy2.robjects.vectors.StrVector'>
>>> for n in names:
... print n
...
posterior
likelihood
prior
pi
mean
var
sd
quantiles

result.names is a StrVector and does not support index(); however, it can easily be converted to a Python list:

>>> names = result.names
>>> names.index('mean')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'StrVector' object has no attribute 'index'
>>> names = list(names)
>>> names.index('mean')
4



>>> mean = result[names.index('mean')]
>>> mean
<FloatVector - Python:0x100571d88 / R:0x102a857d8>
>>> mean = list(mean)
>>> type(mean[0])
<type 'float'>
>>> mean[0]
0.34158415841584161
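Calling names.index() for every component gets tedious. A minimal sketch of an alternative is to pair names and values once into a Python dict. Below, plain Python lists stand in for result.names and result, and the helper name r_list_to_dict is hypothetical. With rpy2 itself, dict(zip(result.names, result)) should do the same job, assuming both objects iterate element by element as the for loop over names above suggests.

```python
def r_list_to_dict(names, values):
    """Pair each R list component name with its value (hypothetical helper).

    names and values can be any two iterables of equal length, e.g. the
    Python lists produced by list(result.names) and list(result).
    """
    names = list(names)
    values = list(values)
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return dict(zip(names, values))


# Stand-in data: the component names printed above, with placeholder values.
# Only 'mean' carries the real number from the session; the rest are None.
names = ["posterior", "likelihood", "prior", "pi", "mean",
         "var", "sd", "quantiles"]
values = [None, None, None, None, [0.34158415841584161], None, None, None]

components = r_list_to_dict(names, values)
print(components["mean"][0])
```

Each component is then one lookup away (components["sd"], components["quantiles"], ...) instead of an index() call per name.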

So I guess the point is that the objects we're using are complex because we may still want to do R-like things with them, but they can be converted to standard Python types if you know the right attribute to access. In this case it should have been easy, since names is the only attribute in the help that really looks promising. By the way, this is precisely the thing that bugs me most about R: complex objects that I spend a long time trying to take apart to get what I want. We'll have to see how things go with more standard R lists, etc.
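As a cross-check on the posterior mean that binobp printed, the Beta prior is conjugate to the binomial likelihood, so the posterior can be worked out by hand. This sketch assumes binobp(68, 200, 1, 1) means 68 successes in 200 trials with a Beta(1, 1) (uniform) prior; the posterior is then Beta(a + x, b + n - x) = Beta(69, 133), with mean (a + x)/(a + b + n) = 69/202. The function name beta_binomial_posterior is hypothetical.

```python
import math


def beta_binomial_posterior(successes, trials, a=1, b=1):
    """Posterior Beta(a', b') for a binomial proportion under a Beta(a, b) prior.

    Returns (a', b', posterior mean, posterior variance).
    """
    a_post = a + successes
    b_post = b + trials - successes
    total = a_post + b_post
    mean = float(a_post) / total
    var = float(a_post * b_post) / (total ** 2 * (total + 1))
    return a_post, b_post, mean, var


a_post, b_post, mean, var = beta_binomial_posterior(68, 200, 1, 1)
print("Beta(%d, %d)" % (a_post, b_post))  # prints Beta(69, 133)
print(mean)                               # 69/202, the posterior mean
print(math.sqrt(var))                     # posterior standard deviation
```

69/202 = 0.3415841584..., which agrees with both the "Posterior Mean : 0.3415842" summary and the mean[0] value extracted above.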

The documentation seems to be a bit dated, using for example this:

from rpy import *
r.wilcox_test

which doesn't work, even after substituting rpy2:

>>> from rpy2 import *
>>> r.wilcox_test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'r' is not defined



>>> import rpy2.robjects as robjects
>>> robjects.r['wilcox_test']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.6-universal/egg/rpy2/robjects/__init__.py", line 241, in __getitem__
LookupError: 'wilcox_test' not found

That sure looks like a bug.
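[UPDATE: It's not. The R function is actually named wilcox.test, so robjects.r['wilcox.test'] is the right lookup. importr exposes it under a Python-friendly name with the dot turned into an underscore: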

>>> from rpy2 import robjects
>>> from rpy2.robjects.packages import importr
>>> stats = importr('stats')
>>> stats.wilcox_test
<SignatureTranslatedFunction - Python:0x100586248 / R:0x100ad5678>

/UPDATE]
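Dots are legal in R names but not in Python attribute names, which is why importr shows wilcox.test as wilcox_test. This is a simplified, hypothetical sketch of that kind of renaming, not rpy2's actual code. It maps Python-friendly names back to the original R names and refuses to continue if two R names collapse to the same Python name.

```python
def translate_r_names(r_names):
    """Map Python-friendly names ('.' replaced by '_') to the original R names.

    Simplified, hypothetical stand-in for importr's translation; raises
    ValueError when two different R names translate to the same Python name.
    """
    mapping = {}
    for r_name in r_names:
        py_name = r_name.replace(".", "_")
        if py_name in mapping and mapping[py_name] != r_name:
            raise ValueError("%r and %r both translate to %r"
                             % (mapping[py_name], r_name, py_name))
        mapping[py_name] = r_name
    return mapping


# A few function names from R's stats package.
lookup = translate_r_names(["wilcox.test", "t.test", "dchisq", "rnorm"])
print(lookup["wilcox_test"])  # prints wilcox.test
```

So robjects.r['wilcox_test'] fails because no R object by that name exists. It is only the translated alias that importr puts on the package object.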
But this works:

>>> seq = robjects.r['seq']
>>> grid = seq(0,10,length=100)
>>> grid
<FloatVector - Python:0x1050146c8 / R:0x1025b3930>
>>> degrees = 4
>>> dchisq = robjects.r['dchisq']
>>> values = [dchisq(x,degrees) for x in grid]
>>> par = robjects.r['par']
>>> par(ann=0)
<Vector - Python:0x104fffbd8 / R:0x100bdd148>

This launched an X11 window.

>>> plot = robjects.r['plot']
>>> plot(grid, values, type='lines')
Warning message:
In plot.xy(xy, type, ...) :
plot type 'lines' will be truncated to first character
<RObject - Python:0x100581ab8 / R:0x1008dad78>
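Each dchisq(x, degrees) call above gets a single number, so values is a Python list of length-one FloatVectors. Because R's dchisq is vectorized, dchisq(grid, degrees) should return all 100 densities in one call. To see what grid and values hold, here is a pure-Python sketch of the same computation. It assumes seq(0, 10, length=100) gives 100 evenly spaced points from 0 to 10 and uses the chi-square density f(x) = x^(k/2-1) e^(-x/2) / (2^(k/2) Gamma(k/2)), which for k = 4 reduces to x e^(-x/2) / 4. The helper names are hypothetical.

```python
import math


def seq_length(start, stop, length):
    """Evenly spaced points, like R's seq(start, stop, length=...).

    The last point is set to stop exactly, as R does.
    """
    if length == 1:
        return [float(start)]
    step = (stop - start) / float(length - 1)
    return [start + i * step for i in range(length - 1)] + [float(stop)]


def chisq_density(x, df):
    """Chi-square probability density with df degrees of freedom."""
    if x < 0:
        return 0.0
    k = df / 2.0
    if x == 0 and k < 1:
        return float("inf")  # density is unbounded at 0 when df < 2
    return x ** (k - 1) * math.exp(-x / 2.0) / (2 ** k * math.gamma(k))


grid = seq_length(0, 10, 100)
degrees = 4
values = [chisq_density(x, degrees) for x in grid]
```

For 4 degrees of freedom the curve peaks at x = df - 2 = 2, which is where the line in the plot should top out.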

The documentation directs us to /examples in the distribution. Where is it? It's not in site-packages, except inside the .egg:

>>> rpy2.__doc__
>>> rpy2.__file__
'/Library/Python/2.6/site-packages/rpy2-2.1.7_20101104-py2.6-macosx-10.6-universal.egg/rpy2/__init__.pyc'


$ cd /Library/Python/2.6/site-packages/rpy2-2.1.7_20101104-py2.6-macosx-10.6-universal.egg/rpy2
-bash: cd: /Library/Python/2.6/site-packages/rpy2-2.1.7_20101104-py2.6-macosx-10.6-universal.egg/rpy2: Not a directory

The .egg is a single zipped file rather than a directory, so I guess we'll go to the Demos linked from the main doc page instead.
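A zipped egg is an ordinary zip archive, so Python's zipfile module can list its contents without unpacking it. From the shell, unzip -l on the egg should show the same listing. Below is a minimal sketch; the helper name list_egg_entries and the "example" keyword are assumptions, and I don't know whether this egg actually contains an examples directory.

```python
import zipfile


def list_egg_entries(egg, keyword="example"):
    """Return member names of a zipped egg whose path contains keyword.

    egg may be a file path or a file-like object; matching is case-insensitive.
    """
    with zipfile.ZipFile(egg) as archive:
        return [name for name in archive.namelist()
                if keyword in name.lower()]


# Path from the session above (a file, not a directory):
# egg_path = ("/Library/Python/2.6/site-packages/"
#             "rpy2-2.1.7_20101104-py2.6-macosx-10.6-universal.egg")
# for name in list_egg_entries(egg_path):
#     print(name)
```

If nothing matches, the examples were most likely left out of the egg, and a source tarball would be the place to look.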

One last example, which is not quite what I was hoping for!

>>> rnorm = robjects.r['rnorm']
>>> cbind = robjects.r['cbind']
>>> x = rnorm(1000,0,1)
>>> y = rnorm(1000,0,4)
>>> M = cbind(x,y)
>>> plot = robjects.r['plot']
>>> plot(M)
<RObject - Python:0x100570908 / R:0x1008dad78>

plot(x,y) looks the same. Hmmm...
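plot(M) and plot(x, y) should look the same: when R's plot gets a single two-column matrix, it takes the first column as x and the second as y. A pure-Python sketch shows why. Its cbind is a hypothetical, simplified stand-in that requires equal-length columns (R's cbind recycles shorter ones), and random.gauss(mu, sigma) plays the role of rnorm(n, mean, sd).

```python
import random


def cbind(*columns):
    """Column-bind equal-length sequences into a list of row tuples."""
    if len(set(len(c) for c in columns)) != 1:
        raise ValueError("columns must have equal length")
    return [tuple(row) for row in zip(*columns)]


random.seed(1)
x = [random.gauss(0, 1) for _ in range(1000)]  # like rnorm(1000, 0, 1)
y = [random.gauss(0, 4) for _ in range(1000)]  # like rnorm(1000, 0, 4)
M = cbind(x, y)

# For a two-column matrix, plot(M) draws column 1 against column 2,
# which is exactly the set of (x, y) pairs plot(x, y) draws.
points_from_matrix = [(row[0], row[1]) for row in M]
points_from_xy = list(zip(x, y))
print(points_from_matrix == points_from_xy)  # True
```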

[ UPDATE: Solved! See here ]