R-table和tapply函数
table可统计数据的频数
tapply可根据因子、向量和要计算的函数计算
> class<-c(1,2,3,2,1,2,1,3)
> class
[1] 1 2 3
> c(81,65,72,88,73,91,56,90)->student
> class
[1] 1 2 3 2 1 2 1 3
>factor(class)->class
> tapply(student,class,mean)
1 2 3
70.00000 81.33333 81.00000
> tapply(student,class,min)
1 2 3
56 65 72
> tapply(student,class,max)
1 2 3
81 91 90
> table(class)
class
1 2 3
3 3 2
>
Apply a Function Over a Ragged Array
Description
Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.
Usage
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
Arguments
X |
an atomic object, typically a vector. |
INDEX |
list of factors, each of same length as |
FUN |
the function to be applied, or |
... |
optional arguments to |
simplify |
If |
Value
If FUN
is not NULL
, it is passed to match.fun
, and hence it can be a function or a symbol or character string naming a function.
When FUN
is present, tapply
calls FUN
for each cell that has any data in it. If FUN
returns a single atomic value for each such cell (e.g., functions mean
or var
) and when simplify
is TRUE
,tapply
returns a multi-way array containing the values, and NA
for the empty cells. The array has the same number of dimensions as INDEX
has components; the number of levels in a dimension is the number of levels (nlevels()
) in the corresponding component of INDEX
. Note that if the return value has a class (e.g. an object of class "Date"
) the class is discarded.
Note that contrary to S, simplify = TRUE
always returns an array, possibly 1-dimensional.
If FUN
does not return a single atomic value, tapply
returns an array of mode list
whose components are the values of the individual calls to FUN
, i.e., the result is a list with a dim
attribute.
When there is an array answer, its dimnames
are named by the names of INDEX
and are based on the levels of the grouping factors (possibly after coercion).
For a list result, the elements corresponding to empty cells are NULL
.
Note
Optional arguments to FUN
supplied by the ...
argument are not divided into cells. It is therefore inappropriate for FUN
to expect additional arguments with the same length as X
.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
the convenience functions by
and aggregate
(using tapply
); apply
, lapply
with its versions sapply
and mapply
.
Examples
require(stats) groups <- as.factor(rbinom(32, n = 5, prob = 0.4)) tapply(groups, groups, length) #- is almost the same as table(groups) ## contingency table from data.frame : array with named dimnames tapply(warpbreaks$breaks, warpbreaks[,-1], sum) tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum) n <- 17; fac <- factor(rep(1:3, length = n), levels = 1:5) table(fac) tapply(1:n, fac, sum) tapply(1:n, fac, sum, simplify = FALSE) tapply(1:n, fac, range) tapply(1:n, fac, quantile) ## example of ... argument: find quarterly means tapply(presidents, cycle(presidents), mean, na.rm = TRUE) ind <- list(c(1, 2, 2), c("A", "A", "B")) table(ind) tapply(1:3, ind) #-> the split vector tapply(1:3, ind, sum)
问题:
有数万个数据,两列数据 一列为名称(A列 ) 一列为值(x列),一个名称可对应多个值,一个值可能有多个名称,具体问题如下所示
A1 x1
A2 x2
A3 x3
A4 x4
A1 x5
A2 x3
A5 x6
A1 x7
想得到的结果,将A列名称唯一化,出现一个值对应多个值的列表,且想批量处理
A1 x1 x5 x7
A2 x2 x3
A3 x3
A4 x4
A5 x6
解决方案1:perl
use strict;
use warnings;
my %hash;
open OUT, "> lines.txt" or die"$!";
while () {
chomp;
my ($line1,$line2) = split/\s+/;
push @{$hash{$line1}},$line2;
}
foreach my $key(sort keys %hash) {
print OUT "$key\t@{$hash{$key}}\n";
}
close OUT;
__DATA__
A1 x1
A2 x2
A3 x3
A4 x4
A1 x5
A2 x3
A5 x6
tapply(d[,2], d[,1], print)
R-table和tapply函数的更多相关文章
- R语言中apply函数
前言 刚开始接触R语言时,会听到各种的R语言使用技巧,其中最重要的一条就是不要用循环,效率特别低,要用向量计算代替循环计算. 那么,这是为什么呢?原因在于R的循环操作for和while,都是基于R语言 ...
- C调用lua的table里面的函数
网上搜索C.C++调用lua函数,有一大堆复制粘贴的. 但是搜索<C调用lua的table里面的函数> 怎么就没几个呢? 经过探索,发现其实逻辑是这样的: 1.根据name获取table ...
- R语言数据读入函数read.table
1.read.table:可以读TXT也可以读CSV (1)file:文件名 (2)header:是否包含表头 (3)sep:分隔符,如果不设定默认是空格 (4)dec:标志小数点符号,有些国家的小数 ...
- R apply() 函数和 tapply() 函数
apply(a,b,c) a是矩阵 b是行或列的代表,1是行,2是列 c是执行函数,如求和-sum,求平均-mean,求-range tapply(a,b,c) a是一个一维数据, ...
- R语言:常用函数【转】
数据结构 一.数据管理vector:向量 numeric:数值型向量 logical:逻辑型向量 character:字符型向量list:列表 data.frame:数据框 c:连接为向量或列表len ...
- R语言——基本绘图函数
通过一个综合的例子测试绘图函数 学习的内容是tigerfish老师的教程. 第一节:基本知识 用seq函数产生100位学生的学号. > num = seq(,) > num [] [] [ ...
- R8—批量生成文件夹,批量读取文件夹名称+R文件管理系统操作函数
一. 批量生成文件夹,批量读取文件夹名称 今日,工作中遇到这样一个问题:boss给我们提供了200多家公司的ID代码(如6007.7920等),需要根据这些ID号去搜索下载新闻,从而将下载到的新闻存到 ...
- R语言常用数学函数
语言的数学运算和一些简单的函数整理如下: 向量可以进行那些常规的算术运算,不同长度的向量可以相加,这种情况下最短的向量将被循环使用. > x <- 1:4 > a <- 1 ...
- R语言 三个函数sort();rank();order()
R语言入门,弄懂了几个简单的函数,分享一下:R语言排序有几个基本函数: sort():rank():order()sort()是对向量进行从小到大的排序rank()返回的是对向量中每个数值对应的秩or ...
随机推荐
- ios处理键盘
#pragma mark - Keyboard - (void)addKeyboardNoti { [[NSNotificationCenter defaultCenter] addObserver: ...
- ubuntu for win10 里运行apache+php
一直想试试ubuntu for win10中运行网站测试一下,弄了好久,今天终于基本弄明白了, ubuntu for win10里的IP就是外面WIN10的IP,在里面建立网站了可以直接在外面WIN1 ...
- python2和Python3异同总结
1. python3 异常不再接收逗号(,)作为参数: ## python3 中这样可以正常运行 try: print("在这里执行的代码,有异常进入except") except ...
- HTML5学习笔记(十二):JavaScript新增Map和Set
Map JavaScript的默认对象表示方式{}可以视为其他语言中的Map或Dictionary的数据结构,即一组键值对. 但是JavaScript的对象有个小问题,就是键必须是字符串.但实际上Nu ...
- [Windows Azure] How to Scale an Application
How to Scale an Application To use this feature and other new Windows Azure capabilities, sign up fo ...
- linux命令(48):打乱一个文本文件的所有行
如果用python读进内存再打乱的思路,如果大文件的话,就比较麻烦了 网上找到一个简单的方法,shuf: $ shuf --help 用法: shuf [选项]... [文件] 或者: shuf -e ...
- 【转&参考】MySQL利用frm和ibd文件进行数据恢复
MySQL利用frm和idb文件进行数据恢复 源MySQL现状: 版本:5.6.* 存储引擎:innodb存储引擎 要恢复数据库:skill 重点要恢复表:slot_value 已有的文件: 备份了所 ...
- LeetCode: Distinct Subsequences 解题报告
Distinct Subsequences Given a string S and a string T, count the number of distinct subsequences of ...
- Windows + Ubuntu双系统时间不一致
在安装Ubuntu和Windows双系统的情况下,Ubuntu的时间总会和Windows的时间相差8小时,原因在于widows认为BIOS时间是本地时间,Ubuntu认为BIOS时间是UTC时间 su ...
- 6. 支持向量机(SVM)核函数
1. 感知机原理(Perceptron) 2. 感知机(Perceptron)基本形式和对偶形式实现 3. 支持向量机(SVM)拉格朗日对偶性(KKT) 4. 支持向量机(SVM)原理 5. 支持向量 ...