Calling R from Python: rpy2
The most common way to call R from Python is the rpy2 package.
Introduction
Modules
The package is made of several sub-packages or modules:
- rpy2.rinterface: Low-level interface to R, when speed and flexibility matter most. Close to R's C-level API.
- rpy2.robjects: High-level interface, when ease-of-use matters most. Should be the right pick for casual and general use. Based on the previous one.
- rpy2.interactive: High-level interface, with an eye for interactive work. Largely based on rpy2.robjects.
- rpy2.rpy_classic: High-level interface similar to the one in RPy-1.x. This is provided for compatibility reasons, as well as to facilitate the migration to RPy2.
- rpy2.rlike: Data structures and functions to mimic some of R's features and specificities in pure Python (no embedded R process).
Loading R in Python
import rpy2.robjects as robjects
R packages in Python
Importing R packages
Importing R packages is often the first step when running R code, and rpy2 provides the function rpy2.robjects.packages.importr(), which makes that step very similar to importing Python packages.
from rpy2.robjects.packages import importr
# import R's "base" package
base = importr('base')
The r instance
We mentioned earlier that rpy2 is running an embedded R. This may be a little abstract, so there is an object rpy2.robjects.r to make it tangible.
Getting R objects in Python
The __getitem__() method of rpy2.robjects.r gets the R object associated with a given symbol:
>>> pi = robjects.r['pi']
>>> pi[0]
3.141592653589793
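Evaluating R code
The object r is also callable, and the string passed in a call is evaluated as R code. Note that + applied to the returned vector concatenates, as it would for a Python list; to add numbers, index the vector first: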
>>> piplus2 = robjects.r('pi') + 2
>>> piplus2.r_repr()
c(3.14159265358979, 2)
>>> pi0plus2 = robjects.r('pi')[0] + 2
>>> print(pi0plus2)
5.141592653589793
String representation of R objects
An R object has a string representation that can be used directly in R code to be evaluated.
>>> letters = robjects.r['letters']
>>> rcode = 'paste(%s, collapse="-")' %(letters.r_repr())
>>> res = robjects.r(rcode)
>>> print(res)
[1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
R vectors
In R, data are mostly represented by vectors, even when looking like scalars. When looking closely at the R object pi used previously, we can observe that this is in fact a vector of length 1.
>>> len(robjects.r['pi'])
1
>>> robjects.r['pi'][0]
3.141592653589793
Creating R vectors
Creating R vectors from Python is simple:
>>> res = robjects.StrVector(['abc', 'def'])
>>> print(res.r_repr())
c("abc", "def")
>>> res = robjects.IntVector([1, 2, 3])
>>> print(res.r_repr())
1:3
>>> res = robjects.FloatVector([1.1, 2.2, 3.3])
>>> print(res.r_repr())
c(1.1, 2.2, 3.3)
More complex objects, such as matrices, are most easily created through R functions:
>>> v = robjects.FloatVector([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
>>> m = robjects.r['matrix'](v, nrow = 2)
>>> print(m)
     [,1] [,2] [,3]
[1,]  1.1  3.3  5.5
[2,]  2.2  4.4  6.6
Calling R functions
Calling R functions is disappointingly similar to calling Python functions.
>>> rsort = robjects.r['sort']
>>> res = rsort(robjects.IntVector([1,2,3]), decreasing=True)
>>> print(res.r_repr())
c(3L, 2L, 1L)
By default, calling an R function returns an R object.
Some examples
Linear models
import rpy2.robjects as robjects
from rpy2.robjects import FloatVector
from rpy2.robjects.packages import importr
stats = importr('stats')
base = importr('base')
ctl = FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14])
trt = FloatVector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69])
# factor with 2 levels: "Ctl" repeated 10 times, then "Trt" repeated 10 times
group = base.gl(2, 10, 20, labels = robjects.StrVector(["Ctl", "Trt"]))
# "+" concatenates rpy2 vectors (Python sequence semantics): 20 weights
weight = ctl + trt
# bind the symbols used in the model formula in R's global environment
robjects.globalenv["weight"] = weight
robjects.globalenv["group"] = group
lm_D9 = stats.lm("weight ~ group")
print(stats.anova(lm_D9))
# omitting the intercept
lm_D90 = stats.lm("weight ~ group - 1")
print(base.summary(lm_D90))
>>> print(lm_D9.names)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "contrasts" "xlevels" "call" "terms"
[13] "model"
>>> print(lm_D9.rx2('coefficients'))
(Intercept) groupTrt
5.032 -0.371
>>> print(lm_D9.rx('coefficients'))
$coefficients
(Intercept) groupTrt
5.032 -0.371
Creating an R vector or matrix, and filling its cells using Python code
from rpy2.robjects import NA_Real
from rpy2.robjects.packages import importr
base = importr('base')
# create a numerical matrix of size 100x10 filled with NAs
m = base.matrix(NA_Real, nrow=100, ncol=10)
# fill the matrix; rx uses R's 1-based indices, and the tuple
# (row_i, col_i) selects a cell as m[row_i, col_i] would in R
for row_i in range(1, 100+1):
    for col_i in range(1, 10+1):
        m.rx[row_i, col_i] = row_i + col_i * 100
The high-level interface to R
The robjects package
This module should be the right pick for casual and general use. Its aim is to abstract some of the details and provide an intuitive interface to both Python and R programmers.
>>> import rpy2.robjects as robjects
r: the R instance
The instance can be seen as the entry point to an embedded R process. The elements that would be accessible from an equivalent R environment are accessible as attributes of the instance.
>>> pi = robjects.r.pi
>>> letters = robjects.r.letters
>>> plot = robjects.r.plot
>>> dir = robjects.r.dir
When safety matters most, we recommend using __getitem__() to get a given R object. It also works for R names, such as as.null, that are not valid Python identifiers.
>>> as_null = robjects.r['as.null']
Storing the object in a Python variable protects it from garbage collection, even if it is deleted from the objects visible to an R user.
>>> robjects.globalenv['foo'] = 1.2
>>> foo = robjects.r['foo']
>>> foo[0]
1.2
>>> robjects.r['rm']('foo')
>>> robjects.r['foo']
LookupError: 'foo' not found
>>> foo[0]
1.2
Evaluating R code in a string
Just like with RPy-1.x, R code contained in a string can be evaluated on the fly by calling the r instance.
>>> print(robjects.r('1+2'))
[1] 3
>>> sqr = robjects.r('function(x) x^2')
>>> print(sqr)
function (x)
x^2
>>> print(sqr(2))
[1] 4
The astute reader will quickly realize that R objects referenced by Python variables can be plugged into R code through their R representation.
>>> x = robjects.r.rnorm(100)
>>> robjects.r('hist(%s, xlab="x", main="hist(x)")' %x.r_repr())
R environments
R environments can be described to the Python user as a hybrid of a dictionary and a scope.
The first of all environments is called the Global Environment, which can also be referred to as the R workspace.
Assigning a value to a symbol in an environment has been made as simple as assigning a value to a key in a Python dictionary.
>>> robjects.r.ls(robjects.globalenv)
>>> robjects.globalenv["a"] = 123
>>> print(robjects.r.ls(robjects.globalenv))
An environment is also iterable, returning all the symbols (keys) it contains.
>>> env = robjects.r.baseenv()
>>> [x for x in env]
<a long list returned>
Functions
R functions exposed by rpy2's high-level interface can be used:
- like any regular Python function as they are callable objects
- through their method rcall()
Callable
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')
graphics = importr('graphics')
plot = graphics.plot
rnorm = stats.rnorm
plot(rnorm(100), ylab="random")
This all looks fine and simple until R arguments with names such as na.rm are encountered. By default, rpy2 addresses this by translating '.' (dot) in the R argument name into '_' in the Python argument name.
In Python one can write:
from rpy2.robjects.packages import importr
base = importr('base')
base.rank(0, na_last = True)
R is capable of introspection, and can return the arguments accepted by a function through the function formals().
>>> from rpy2.robjects.packages import importr
>>> stats = importr('stats')
>>> rnorm = stats.rnorm
>>> rnorm.formals()
<Vector - Python:0x8790bcc / R:0x93db250>
>>> tuple(rnorm.formals().names)
('n', 'mean', 'sd')
rcall()
The method Function.rcall() is an alternative way to call an underlying R function.
R formulae
For tasks such as modelling and plotting, an R formula can be a terse, yet readable, way of expressing what is wanted.
The class robjects.Formula represents an R formula.
from rpy2.robjects import IntVector, Formula
from rpy2.robjects.packages import importr
stats = importr('stats')
x = IntVector(range(1, 11))
y = x.ro + stats.rnorm(10, sd=0.2)
fmla = Formula('y ~ x')
# bind x and y in the formula's environment so lm() can find them
env = fmla.environment
env['x'] = x
env['y'] = y
fit = stats.lm(fmla)
Other options are:
- Evaluate R code on the fly so that the model-fitting function has a symbol in R
fit = robjects.r('lm(%s)' %fmla.r_repr())
- Evaluate R code where all symbols are defined
R packages
Importing R packages
In R, this is achieved with the functions library() and require(), which attach the namespace of the package to the R search path. In rpy2, the function importr() is used instead:
from rpy2.robjects.packages import importr
utils = importr("utils")
Vectors and arrays
Besides functions and environments, most of the objects an R user interacts with are vector-like. For example, any scalar is in fact a vector of length one.
The class Vector has a constructor:
>>> x = robjects.Vector(3)
Creating vectors
Creating vectors can be achieved either from R or from Python.
When vectors are created from R, there is little to worry about, as rpy2.robjects exposes them with the appropriate classes.
When one wants to create a vector from Python, either the class Vector or the convenience classes IntVector, FloatVector, BoolVector, StrVector can be used.
Factor vectors: FactorVector
>>> sv = robjects.StrVector('ababbc')
>>> fac = robjects.FactorVector(sv)
>>> print(fac)
[1] a b a b b c
Levels: a b c
>>> tuple(fac)
(1, 2, 1, 2, 2, 3)
>>> tuple(fac.levels)
('a', 'b', 'c')
Extracting vector elements
Extracting, Python-style
The python __getitem__() method behaves like a Python user would expect it for a vector (and indexing starts at zero).
>>> x = robjects.r.seq(1, 5)
>>> tuple(x)
(1, 2, 3, 4, 5)
>>> x.names = robjects.StrVector('abcde')
>>> print(x)
a b c d e
1 2 3 4 5
>>> x[0]
1
>>> x[4]
5
>>> x[-1]
5
Extracting, R-style
Access to R-style extraction/subsetting is provided through the two delegators rx and rx2, representing the R functions [ and [[ respectively.
>>> print(x.rx(1))
[1] 1
>>> print(x.rx('a'))
a
1
Assigning to vectors
Assigning, Python-style
Since vectors are exposed as Python mutable sequences, the assignment works as for regular Python lists.
>>> x = robjects.IntVector((1,2,3))
>>> print(x)
[1] 1 2 3
>>> x[0] = 9
>>> print(x)
[1] 9 2 3
In R, vectors can be named, that is, elements of the vector can have a name.
>>> x = robjects.ListVector({'a': 1, 'b': 2, 'c': 3})
>>> x[x.names.index('b')] = 9
Assigning, R-style
The attributes rx and rx2 used previously can again be used:
>>> x = robjects.IntVector(range(1, 4))
>>> print(x)
[1] 1 2 3
>>> x.rx[1] = 9
>>> print(x)
[1] 9 2 3
For the sake of complete compatibility with R, arguments can be named (and passed as a dict or rpy2.rlike.container.TaggedList).
>>> x = robjects.ListVector({'a': 1, 'b': 2, 'c': 3})
>>> x.rx2[{'i': x.names.index('b') + 1}] = 9  # R indices start at 1
Missing values
In S/S-PLUS/R, special NA values can be used in a data vector to indicate that a value is missing, and rpy2.robjects makes aliases for those available as the data objects NA_Logical, NA_Real, NA_Integer, NA_Character and NA_Complex.
>>> x = robjects.IntVector(range(3))
>>> x[0] = robjects.NA_Integer
>>> print(x)
[1] NA 1 2
>>> x[0] is robjects.NA_Integer
True
>>> x[0] == robjects.NA_Integer
True
>>> [y for y in x if y is not robjects.NA_Integer]
[1, 2]
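Arithmetic operations
Mathematical operations on two vectors are performed element-wise in R, recycling the shorter vector if necessary. To expose that to Python, a delegating attribute ro is provided for vector-like objects.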
>>> x = robjects.r.seq(1, 10)
>>> print((x.ro + 1).r_repr())
2:11
Names
R vectors can have a name given to all or some of the elements. The property names can be used to get, or set, those names.
>>> x = robjects.r.seq(1, 5)
>>> x.names = robjects.StrVector('abcde')
>>> x.names[0]
'a'
>>> x.names[0] = 'z'
>>> tuple(x.names)
('z', 'b', 'c', 'd', 'e')
Array
In R, arrays are simply vectors with a dimension attribute. That fact is reflected in the class hierarchy, with robjects.Array inheriting from robjects.Vector.
Matrix
A Matrix is a special case of Array. As with arrays, one must remember that this is just a vector with dimension attributes (number of rows, number of columns).
>>> m = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)
>>> print(m)
[,1] [,2]
[1,] 0 5
[2,] 1 6
[3,] 2 7
[4,] 3 8
[5,] 4 9
>>> m = robjects.r.matrix(robjects.IntVector(range(2, 8)), nrow=3)
>>> print(m)
[,1] [,2]
[1,] 2 5
[2,] 3 6
[3,] 4 7
>>> m[0]
2
>>> m[5]
7
>>> print(m.rx(1))
[1] 2
>>> print(m.rx(6))
[1] 7
DataFrame
In rpy2.robjects, DataFrame represents the R class data.frame.
Creating a DataFrame can be done by:
- Using the constructor for the class
- Create the data.frame through R
- Read data from a file using the static method DataFrame.from_csvfile()
The DataFrame constructor accepts either an rinterface.SexpVector (with typeof equal to VECSXP, that is, an R list) or any Python object implementing the method items() (for example dict or rpy2.rlike.container.OrdDict).
>>> d = {'a': robjects.IntVector((1,2,3)), 'b': robjects.IntVector((4,5,6))}
>>> dataf = robjects.DataFrame(d)
To create a DataFrame and be certain of the column order, an ordered dictionary can be used:
>>> import rpy2.rlike.container as rlc
>>> od = rlc.OrdDict([('letter', robjects.StrVector(('x', 'y', 'z'))),
...                   ('value', robjects.IntVector((1,2,3)))])
>>> dataf = robjects.DataFrame(od)
>>> print(dataf.colnames)
[1] "letter" "value"
Here again, Python’s __getitem__() will work as a Python programmer will expect it to:
>>> len(dataf)
2
>>> dataf[0]
<Vector - Python:0x8a58c2c / R:0x8e7dd08>
The DataFrame is composed of columns, with each column being possibly of a different type:
>>> [column.rclass[0] for column in dataf]
['factor', 'integer']
>>> dataf.rx(1)
<DataFrame - Python:0x8a584ac / R:0x95a6fb8>
>>> print(dataf.rx(1))
letter
1 x
2 y
3 z
>>> dataf.rx2(1)
<Vector - Python:0x8a4bfcc / R:0x8e7dd08>
>>> print(dataf.rx2(1))
[1] x y z
Levels: x y z
Converting R objects to Python objects
The approach followed in rpy2 has 2 levels (rinterface and robjects), and conversion functions help moving between them.
Protocols
R vectors are mapped to Python objects implementing the methods __getitem__() / __setitem__() in the sequence protocol so elements can be accessed easily.
R functions are mapped to Python objects implementing the __call__() so they can be called just as if they were functions.
R environments are mapped to Python objects implementing __getitem__() / __setitem__() in the mapping protocol, so elements can be accessed as in a Python dict.
Conversion
In its high-level interface, rpy2 uses a conversion system whose task is to convert objects between the following 3 representations:
- lower-level interface to R (rpy2.rinterface level)
- higher-level interface to R (rpy2.robjects level)
- other (non-rpy2) representations
The numpy package
High-level interface
From rpy2 to numpy
R vectors or arrays can be converted to numpy arrays using numpy.array() or numpy.asarray().
import numpy
import rpy2.robjects as robjects
ltr = robjects.r.letters
ltr_np = numpy.array(ltr)
From numpy to rpy2
The automatic conversion of numpy objects into rpy2 objects can be activated (and deactivated) with:
from rpy2.robjects import numpy2ri
numpy2ri.activate()
numpy2ri.deactivate()
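Author: plutoese
Link: https://www.jianshu.com/p/d8578362245a
Source: Jianshu
Copyright belongs to the author. Any form of reproduction requires the author's authorization and must credit the source.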