Data Manipulation with dplyr in R
Data Manipulation with dplyr in R
select
select(data,变量名)
The filter and arrange verbs
arrange
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
# Add a verb to sort in descending order of public_work
counties_selected %>%arrange(desc(public_work))
filter
counties_selected <- counties %>%
select(state, county, population)
# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
filter(state == "California",
population > 1000000)
#筛选多个变量
filter(id %in% c("a","b","c"...)) 存在
filter(id %in% c("a","b","c"...)) 不存在
fct_relevel {forcats}
Reorder factor levels by hand
排序,order不好使的时候
f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")
# Move to the third position
fct_relevel(f, "a", after = 2)
# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)
# Revel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
Filtering and arranging
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
>
> # Filter for Texas and more than 10000 people; sort in descending order of private_work
> counties_selected %>%filter(state=='Texas',population>10000)%>%arrange(desc(private_work))
# A tibble: 169 x 6
state county population private_work public_work self_employed
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Texas Gregg 123178 84.7 9.8 5.4
2 Texas Collin 862215 84.1 10 5.8
3 Texas Dallas 2485003 83.9 9.5 6.4
4 Texas Harris 4356362 83.4 10.1 6.3
5 Texas Andrews 16775 83.1 9.6 6.8
6 Texas Tarrant 1914526 83.1 11.4 5.4
7 Texas Titus 32553 82.5 10 7.4
8 Texas Denton 731851 82.2 11.9 5.7
9 Texas Ector 149557 82 11.2 6.7
10 Texas Moore 22281 82 11.7 5.9
# ... with 159 more rows
Mutate
counties_selected <- counties %>%
select(state, county, population, public_work)
# Sort in descending order of the public_workers column
counties_selected %>%
mutate(public_workers = public_work * population / 100) %>%arrange(desc(public_workers))
counties %>%
# Select the five columns
select(state, county, population, men, women) %>%
# Add the proportion_men variable
mutate(proportion_men = men / population) %>%
# Filter for population of at least 10,000
filter(population >= 10000) %>%
# Arrange proportion of men in descending order
arrange(desc(proportion_men))
The count verb
counties_selected %>%count(region,sort=TRUE)
counties_selected %>%count(state,wt=citizens,sort=TRUE)
Summarizing
# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%summarize(
min_population=min(population),
max_unemployment=max(unemployment),
average_income=mean(income)
)
# Add a density column, then sort in descending order
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population),
density=total_population/total_area) %>%arrange(desc(density))
发现了,归根到底是一种函数关系,看看该怎样处理这个函数比较简单,如果写不出来,可能和小学的时候应用题写不出来有关系
top_n
按照优先级来筛选
# Extract the most populated row for each state
counties_selected %>%
group_by(state, metro) %>%
summarize(total_pop = sum(population)) %>%
top_n(1, total_pop)
Selecting
Using the select verb, we can answer interesting questions about our dataset by focusing in on related groups of verbs. Data manipulation primitives in R and Python Both R and Python are incredibly good tools to manipula ... dplyr and data.table are amazing packages that make data manipulation in R fun. Both packages have t ... dplyr 0.4.0 January 9, 2015 in Uncategorized I’m very pleased to announce that dplyr 0.4.0 is now av ... 1.错误描写叙述 java.sql.SQLException: Can not issue data manipulation statements with executeQuery(). at c ... 转: Can not issue data manipulation statements with executeQuery()错误解决 2012年03月27日 15:47:52 katalya 阅 ... 2018-02-19 18:03:54 一.数据操纵语言(Data Manipulation Language) 数据操纵语言是指插入,删除和更新语言. 二.视图(View) 数据库三级模式,两级映射 ... 这个错误提示是说无法发行sql语句到指定的位置 错误写法: 正确写法: excuteQuery是查询语句,而我要调用的是更新的语句,所以这样数据库很为难到底要干嘛,实际我想用的是更新,但是我写成了查询 ... Can not issue data manipulation statements with executeQuery() 报错的解决方案: 把“ResultSet rs = statement. ... Hive Data Manipulation Language Hive Data Manipulation Language Loading files into tables Syntax Syn ... Jmeter性能监控插件由客户端插件和服务器端程序组成. 官方文档及插件下载地址https://jmeter-plugins.org/wiki/PerfMon/ 将插件 plugins-manager ... 会动的汉克狗: <!doctype html> <html lang="en"> <head> <meta charset="U ... Linux系统之网络相关的命令 网络概述 网络:通过通信介质和通信设备 将分布不同地点的两台或多台计算机,经过相应的程序实现通信switch 交换机router 路由器网络的功能:数据通信:利用网络传 ... 本次的实验环境是: kali linux -3 kali linux 全版本地址: http://old.kali.org/kali-images/ 楼主的主系统是:kali linux 如果想学好 ... 查看机器型号等 dmidecode 是一个读取电脑 DMI(桌面管理接口(Desktop Management Interface))表内容并且以人类可读的格式显示系统硬件信息的工具.这个表包含系统硬 ... 原文来我的公众号:Spark性能优化指南——初级篇 一. Spark作业原理 我们使用spark-submit提交一个Spark作业之后,这个作业就会启动一个对应的Driver进程.该进程是向集群管理 ... 文章目录 1.环境准备 1.1下载zooKeeper 1.3安装zooKeeper 1.4配置zooKeeper环境变量 1.5 修改zookeeper集群配置文件 1.6 创建myid文件 1.7 ... 个人博客 地址:https://www.wenhaofan.com/article/20190407105818 引入依赖 <dependency> <groupId>org. ... 安卓开发中长按弹出菜单的创建方法: 1.首先给View注册上下文菜单registerForContextMenu(); 2.添加上下文菜单内容onCreateContextMenu(): ---可以通 ... 引子 曾经很多次看过最大流的模板,基础概念什么的也看了很多遍.也曾经用过强者同学的板子,然而却一直不会网络流.虽然曾经尝试过写,然而即使最简单的一种算法也没有写成功过,然后对着强者大神的代码一点一点的 ...
The colon (
Data Manipulation with dplyr in R的更多相关文章
随机推荐