Data Manipulation with dplyr in R

select
The filter and arrange verbs
arrange
filter
fct_relevel {forcats}
Filtering and arranging
Mutate
The count verb
Summarizing
top_n
Selecting
rename
transmute
Grouped mutates
Window functions

select

select(data, column_names)
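As a minimal sketch of the call shape above: the tibble `counties_toy` below is a made-up stand-in for the course's `counties` data (its values are invented for illustration), and the colon form selects a contiguous range of columns.

```r
library(dplyr)

# Hypothetical stand-in for the `counties` data; all values are made up
counties_toy <- tibble(
  state      = c("Texas", "Texas", "California"),
  county     = c("Dallas", "Gregg", "Los Angeles"),
  population = c(2500000, 120000, 10000000),
  men        = c(1230000, 60000, 4900000),
  women      = c(1270000, 60000, 5100000)
)

# Keep the named columns, in the order given
named <- counties_toy %>% select(state, county, population)

# A colon selects every column from `population` through `women`
ranged <- counties_toy %>% select(state, population:women)

names(named)
names(ranged)
```

The first call keeps three columns; the second keeps `state` plus the three columns in the `population:women` range.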

The filter and arrange verbs

arrange

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Add a verb to sort in descending order of public_work
counties_selected %>%
  arrange(desc(public_work))
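To make the sort order concrete, here is a small sketch on invented data (the tibble `toy` and its values are assumptions, not part of the course data). It shows that `arrange()` sorts ascending by default and that a second column breaks ties.

```r
library(dplyr)

# Made-up data with a tie in public_work
toy <- tibble(
  county      = c("A", "B", "C", "D"),
  public_work = c(10.0, 25.5, 10.0, 17.2),
  population  = c(500, 300, 900, 100)
)

# Ascending order is the default
ascending <- toy %>% arrange(public_work)

# desc() reverses a column; the second argument breaks the tie between A and C
sorted <- toy %>% arrange(desc(public_work), desc(population))

sorted$county
```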

filter

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
  filter(state == "California",
         population > 1000000)

# Filter on multiple values

filter(id %in% c("a","b","c"...))    # keep rows whose id is in the set

filter(!id %in% c("a","b","c"...))   # keep rows whose id is not in the set
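The in-set and not-in-set cases can be sketched on a small made-up table. The names `toy` and `wanted` are hypothetical; negating the `%in%` test with `!` is standard R.

```r
library(dplyr)

# Made-up data: five ids, of which three are wanted
toy <- tibble(id = c("a", "b", "c", "d", "e"), value = 1:5)
wanted <- c("a", "c", "e")

# Rows whose id appears in `wanted`
in_set <- toy %>% filter(id %in% wanted)

# Rows whose id does not appear in `wanted`
not_in_set <- toy %>% filter(!(id %in% wanted))

in_set$id
not_in_set$id
```

The parentheses around `id %in% wanted` are optional, because `%in%` binds more tightly than `!`, but they make the intent easier to read.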

fct_relevel {forcats}

Reorder factor levels by hand

Use this to control sort order when order() does not give the result you want.

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
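A short sketch of what these calls do to the level order, and of why this helps with sorting: `arrange()` sorts a factor by its levels, not alphabetically. The tibble `sizes` is a made-up example, not taken from the notes.

```r
library(dplyr)
library(forcats)

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))

levels(fct_relevel(f, "a"))               # "a" "b" "c" "d"
levels(fct_relevel(f, "a", after = 2))    # "b" "c" "a" "d"
levels(fct_relevel(f, "a", after = Inf))  # "b" "c" "d" "a"
levels(fct_relevel(f, rev))               # "a" "d" "c" "b"

# Hypothetical use: sort rows in a custom order instead of alphabetically
sizes <- tibble(size = c("large", "small", "medium"))

ordered_sizes <- sizes %>%
  mutate(size = fct_relevel(factor(size), "small", "medium", "large")) %>%
  arrange(size)

as.character(ordered_sizes$size)
```

Without the relevel step, `arrange(size)` would sort the strings alphabetically as large, medium, small.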

Filtering and arranging

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Filter for Texas and more than 10000 people; sort in descending order of private_work
counties_selected %>%
  filter(state == "Texas", population > 10000) %>%
  arrange(desc(private_work))

# A tibble: 169 x 6
   state county  population private_work public_work self_employed
   <chr> <chr>        <dbl>        <dbl>       <dbl>         <dbl>
 1 Texas Gregg       123178         84.7         9.8           5.4
 2 Texas Collin      862215         84.1        10             5.8
 3 Texas Dallas     2485003         83.9         9.5           6.4
 4 Texas Harris     4356362         83.4        10.1           6.3
 5 Texas Andrews      16775         83.1         9.6           6.8
 6 Texas Tarrant    1914526         83.1        11.4           5.4
 7 Texas Titus        32553         82.5        10             7.4
 8 Texas Denton      731851         82.2        11.9           5.7
 9 Texas Ector       149557         82          11.2           6.7
10 Texas Moore        22281         82          11.7           5.9
# ... with 159 more rows

Mutate

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Sort in descending order of the public_workers column
counties_selected %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

counties %>%
  # Select the five columns
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order
  arrange(desc(proportion_men))
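The arithmetic behind `public_workers` is worth spelling out: `public_work` appears to be a percentage, so multiplying by `population` and dividing by 100 gives a head count. A sketch on made-up numbers (the tibble `toy` is an assumption):

```r
library(dplyr)

# Made-up data: public_work is a percentage of the population
toy <- tibble(
  county      = c("A", "B", "C"),
  population  = c(1000, 20000, 50000),
  public_work = c(10, 15, 5)
)

result <- toy %>%
  # 10% of 1000 = 100, 15% of 20000 = 3000, 5% of 50000 = 2500
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

result$county
result$public_workers
```

County C has the lowest percentage but the second-highest count, which is why the computed column sorts differently from the raw percentage.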

The count verb

counties_selected %>%
  count(region, sort = TRUE)

counties_selected %>%
  count(state, wt = citizens, sort = TRUE)
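The difference between the two calls: without `wt`, `count()` counts rows per group; with `wt`, it sums the weight column per group. A sketch on made-up data (the tibble `toy` is hypothetical) compares the weighted count with an equivalent `group_by()` plus `summarize()`.

```r
library(dplyr)

# Made-up data: five counties in two states
toy <- tibble(
  state    = c("Texas", "Texas", "Ohio", "Ohio", "Ohio"),
  citizens = c(100, 200, 50, 60, 70)
)

# Number of rows per state: Ohio 3, Texas 2
by_rows <- toy %>% count(state, sort = TRUE)

# Sum of citizens per state: Texas 300, Ohio 180
by_weight <- toy %>% count(state, wt = citizens, sort = TRUE)

# The same weighted result written out by hand
by_hand <- toy %>%
  group_by(state) %>%
  summarize(n = sum(citizens)) %>%
  arrange(desc(n))
```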

Summarizing

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%
  summarize(min_population = min(population),
            max_unemployment = max(unemployment),
            average_income = mean(income))

# Add a density column, then sort in descending order
counties_selected %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            density = total_population / total_area) %>%
  arrange(desc(density))
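A worked sketch of the grouped density calculation on made-up numbers (the tibble `toy` is an assumption). It also shows that the state-level density computed from totals differs from the mean of the county densities.

```r
library(dplyr)

# Made-up data: state X has two counties, state Y has one
toy <- tibble(
  state      = c("X", "X", "Y"),
  land_area  = c(10, 30, 100),
  population = c(400, 600, 500)
)

by_state <- toy %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            # Later summaries can use the ones defined just above
            density = total_population / total_area,
            # For comparison: averaging per-county densities gives another number
            mean_county_density = mean(population / land_area)) %>%
  arrange(desc(density))

by_state
```

For state X the totals give 1000 / 40 = 25, while the county densities are 40 and 20, with mean 30. Which one is wanted depends on the question being asked.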

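Takeaway: in the end, each of these steps is just a functional relationship between columns, so the task is to find the simplest way to express that function. If I cannot write it down, the difficulty is probably the same one I had with word problems in primary school.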

top_n

Filter rows by rank: keep the top n rows according to a given column.

# Extract the most populated row for each state
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop)
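A sketch of the same pattern on made-up data (the tibble `toy` is hypothetical). After `summarize()` the result is still grouped by `state`, so `top_n()` picks one row per state. Since dplyr 1.0.0, `top_n()` is superseded by `slice_max()`, shown alongside for comparison.

```r
library(dplyr)

# Made-up data: populations split by state and metro status
toy <- tibble(
  state      = c("X", "X", "X", "Y", "Y"),
  metro      = c("Metro", "Metro", "Nonmetro", "Metro", "Nonmetro"),
  population = c(100, 200, 500, 50, 20)
)

# summarize() drops the last grouping level, leaving groups by state
totals <- toy %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population), .groups = "drop_last")

# One row per state: the metro category with the largest total
with_top_n <- totals %>% top_n(1, total_pop)

# Equivalent call with the newer verb
with_slice <- totals %>% slice_max(total_pop, n = 1)
```

Both calls keep ties by default, so a state can return more than one row when two categories share the maximum.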

Selecting

Using the select verb, we can answer interesting questions about our dataset by focusing on related groups of columns.

The colon (
