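A while ago I took an introductory R course on edX, and this post is a summary of it. The course focuses on R's data structures and also covers the basic plotting tools.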

Workspace

    ls()
    ## character(0)
    rm(a)
    ## Warning in rm(a): object 'a' not found
    ls()
    ## character(0)

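Basic data types

  1. logical

    • TRUE/FALSE/NA; T and F also work, but the full forms are recommended; in some contexts 0 and non-zero numbers are treated as FALSE and TRUE
  2. numeric
    • integer is numeric
    • numeric not always integer
  3. character

Other atomic types:

  • double: double-precision floating point, the storage type behind ordinary numeric values
  • complex: complex numbers
  • raw: store raw bytes

is.*() tests whether its argument is of the type named by *.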

    # logical
    TRUE
    ## [1] TRUE
    class(TRUE)
    ## [1] "logical"
    FALSE
    ## [1] FALSE
    class(NA)
    ## [1] "logical"
    T
    ## [1] TRUE
    F
    ## [1] FALSE
    # numeric
    2
    ## [1] 2
    class(2)
    ## [1] "numeric"
    2.5
    ## [1] 2.5
    class(2.5)
    ## [1] "numeric"
    2L
    ## [1] 2
    class(2L)
    ## [1] "integer"
    is.numeric(2)
    ## [1] TRUE
    is.numeric(2L)
    ## [1] TRUE
    # integer is numeric
    # numeric not always integer
    is.integer(2)
    ## [1] FALSE
    is.integer(2L)
    ## [1] TRUE
    # character
    "I love data science!"
    ## [1] "I love data science!"
    class("I love data science!")
    ## [1] "character"
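The relationship between numeric, integer and double is easier to see by comparing class() with typeof(). The sketch below is an addition to the course notes; the variable names x_num and x_int are made up for illustration.

```r
# class() reports the high-level class, typeof() the internal storage type
x_num <- 2    # a plain numeric literal is stored as a double
x_int <- 2L   # the L suffix asks for an integer

class(x_num)         # "numeric"
typeof(x_num)        # "double"
typeof(x_int)        # "integer"
is.double(x_num)     # TRUE
typeof(1i)           # "complex"
typeof(as.raw(255))  # "raw"
```

So "numeric" and "double" describe the same values here; only the L suffix (or as.integer()) produces a true integer.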

Coercion

as.*() returns its argument converted to the type named by *; some conversions are not possible.

    as.numeric(TRUE)
    ## [1] 1
    as.numeric(FALSE)
    ## [1] 0
    as.character(4)
    ## [1] "4"
    as.numeric("4.5")
    ## [1] 4.5
    as.integer("4.5")
    ## [1] 4
    as.numeric("Hello")
    ## Warning: NAs introduced by coercion
    ## [1] NA
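Because a failed conversion yields NA with only a warning, it can help to check which inputs failed. This is a minimal sketch, not part of the course; raw_input, parsed and failed are hypothetical names.

```r
# Hypothetical input: some strings are numbers, one is not
raw_input <- c("4.5", "Hello", "10")

# Convert, silencing the coercion warning because we check for NA ourselves
parsed <- suppressWarnings(as.numeric(raw_input))

# An element failed if it became NA without being NA to begin with
failed <- is.na(parsed) & !is.na(raw_input)

raw_input[failed]   # "Hello"
parsed[!failed]     # 4.5 10.0
```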

Vector

  • Sequence of data elements
  • Same basic type
    • Automatic coercion if necessary
  • character, numeric, logical
  • Single value = Vector

Creating: c() or the : operator

    # c()
    drawn_suits <- c("hearts", "spades", "diamonds",
                     "diamonds", "spades")
    drawn_suits
    ## [1] "hearts"   "spades"   "diamonds" "diamonds" "spades"
    is.vector(drawn_suits)
    ## [1] TRUE
    # :
    1:5
    ## [1] 1 2 3 4 5
    is.vector(1:5)
    ## [1] TRUE

Naming: names()

    remain <- c(11, 12, 11, 13)
    suits <- c("spades", "hearts", "diamonds", "clubs")
    names(remain) <- suits
    remain
    ##   spades   hearts diamonds    clubs
    ##       11       12       11       13
    # or
    remain <- c(spades = 11, hearts = 12,
                diamonds = 11, clubs = 13)
    remain
    ##   spades   hearts diamonds    clubs
    ##       11       12       11       13
    # or
    remain <- c("spades" = 11, "hearts" = 12,
                "diamonds" = 11, "clubs" = 13)
    remain
    ##   spades   hearts diamonds    clubs
    ##       11       12       11       13

A single value is still a vector

    my_apples <- 5
    my_oranges <- "six"
    is.vector(my_apples)
    ## [1] TRUE
    is.vector(my_oranges)
    ## [1] TRUE
    length(my_apples)
    ## [1] 1
    length(my_oranges)
    ## [1] 1

Coercion

    drawn_ranks <- c(7, 4, "A", 10, "K", 3, 2, "Q")
    drawn_ranks
    ## [1] "7"  "4"  "A"  "10" "K"  "3"  "2"  "Q"
    class(drawn_ranks)
    ## [1] "character"
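The automatic coercion follows a fixed direction: logical, then integer, then double, then character. The sketch below is an added illustration with made-up variable names.

```r
# Each mix is promoted to the most general type present
v1 <- c(TRUE, 2L)             # logical + integer -> integer
v2 <- c(TRUE, 2L, 2.5)        # adding a double   -> numeric (double)
v3 <- c(TRUE, 2L, 2.5, "a")   # adding a string   -> character

class(v1)  # "integer"
class(v2)  # "numeric"
class(v3)  # "character"
v3         # "TRUE" "2" "2.5" "a"
```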

Basic arithmetic

Arithmetic on vectors generalizes naturally from arithmetic on single numbers.

    # with a number: + - * /
    earnings <- c(50, 100, 30)
    earnings * 3
    ## [1] 150 300  90
    earnings^2
    ## [1]  2500 10000   900
    # with a vector: + - * /
    earnings <- c(50, 100, 30)
    expenses <- c(30, 40, 80)
    bank <- earnings - expenses
    # sum() and >
    sum(bank)
    ## [1] 30
    earnings > expenses
    ## [1]  TRUE  TRUE FALSE
    # multiplication and division are done element-wise!
    earnings * c(1, 2, 3)
    ## [1]  50 200  90
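When the two vectors differ in length, R recycles the shorter one. The sketch below is an added example; the four-element earnings4 and the rates vector are hypothetical.

```r
# The shorter vector is repeated until it matches the longer one
earnings4 <- c(50, 100, 30, 20)
rates <- c(1, 2)

recycled <- earnings4 * rates   # same as earnings4 * c(1, 2, 1, 2)
recycled                        # 50 200 30 40

# If the longer length is not a multiple of the shorter one, R still
# recycles but emits a warning; it is suppressed here for a quiet demo
partial <- suppressWarnings(c(50, 100, 30) * rates)
partial                         # 50 200 30
```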

Subsetting

Three ways to index

  • By position (R counts from 1)
  • By name, using names()
  • By logical value

    remain <- c(spades = 11, hearts = 12,
                diamonds = 11, clubs = 13)
    remain[1]
    ## spades
    ##     11
    remain["spades"]
    ## spades
    ##     11
    remain[c(4, 1)] # useful for reordering or picking out specific elements
    ##  clubs spades
    ##     13     11
    remain[c("clubs", "spades")]
    ##  clubs spades
    ##     13     11
    # logical indexing; a shorter index vector is recycled automatically
    remain[c(TRUE, FALSE)]
    ##   spades diamonds
    ##       11       11
    remain[c(TRUE, FALSE, TRUE, FALSE)]
    ##   spades diamonds
    ##       11       11
    # negative index: "all but it", returns every element except the given ones
    remain[-1]
    ##   hearts diamonds    clubs
    ##       12       11       13
    remain[-c(1, 2)]
    ## diamonds    clubs
    ##       11       13
    # remain[-"spades"] # does not work
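Since negative indexing does not accept names, excluding an element by name needs a logical or name-based workaround. The following is an added sketch of two common ways, plus logical indexing driven by a condition; the result variable names are made up.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# Logical index built from a condition on the values
above_11 <- remain[remain > 11]

# "All but spades", via a comparison on the names ...
no_spades_a <- remain[names(remain) != "spades"]

# ... or by removing the name from the set of names
no_spades_b <- remain[setdiff(names(remain), "spades")]

names(above_11)     # "hearts" "clubs"
names(no_spades_a)  # "hearts" "diamonds" "clubs"
```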

Matrix

  • Vector: 1D array of data elements
  • Matrix: 2D array of data elements
  • Rows and columns
  • One atomic vector type

Creating: matrix()

Matrices are filled column by column by default.

    # direct creation
    matrix(1:6, nrow = 2)
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    5
    ## [2,]    2    4    6
    matrix(1:6, ncol = 3)
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    5
    ## [2,]    2    4    6
    matrix(1:6, nrow = 2, byrow = TRUE)
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]    4    5    6
    # creation with recycling
    matrix(1:3, nrow = 2, ncol = 3)
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    2
    ## [2,]    2    1    3
    matrix(1:4, nrow = 2, ncol = 3)
    ## Warning in matrix(1:4, nrow = 2, ncol = 3): data length [4] is not a
    ## sub-multiple or multiple of the number of columns [3]
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    1
    ## [2,]    2    4    2
    # creation by combining
    cbind(1:3, 1:3)
    ##      [,1] [,2]
    ## [1,]    1    1
    ## [2,]    2    2
    ## [3,]    3    3
    rbind(1:3, 1:3)
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]    1    2    3
    m <- matrix(1:6, byrow = TRUE, nrow = 2)
    rbind(m, 7:9)
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]    4    5    6
    ## [3,]    7    8    9
    cbind(m, c(10, 11))
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    1    2    3   10
    ## [2,]    4    5    6   11

Naming

rownames(), colnames()

    m <- matrix(1:6, byrow = TRUE, nrow = 2)
    rownames(m) <- c("row1", "row2")
    m
    ##      [,1] [,2] [,3]
    ## row1    1    2    3
    ## row2    4    5    6
    colnames(m) <- c("col1", "col2", "col3")
    m
    ##      col1 col2 col3
    ## row1    1    2    3
    ## row2    4    5    6
    # naming at creation time
    m <- matrix(1:6, byrow = TRUE, nrow = 2,
                dimnames = list(c("row1", "row2"),
                                c("col1", "col2", "col3")))
    m
    ##      col1 col2 col3
    ## row1    1    2    3
    ## row2    4    5    6

Coercion

    num <- matrix(1:8, ncol = 2)
    num
    ##      [,1] [,2]
    ## [1,]    1    5
    ## [2,]    2    6
    ## [3,]    3    7
    ## [4,]    4    8
    char <- matrix(LETTERS[1:6], nrow = 4, ncol = 3)
    char
    ##      [,1] [,2] [,3]
    ## [1,] "A"  "E"  "C"
    ## [2,] "B"  "F"  "D"
    ## [3,] "C"  "A"  "E"
    ## [4,] "D"  "B"  "F"
    # combining a numeric and a character matrix coerces everything to character
    cbind(num, char)
    ##      [,1] [,2] [,3] [,4] [,5]
    ## [1,] "1"  "5"  "A"  "E"  "C"
    ## [2,] "2"  "6"  "B"  "F"  "D"
    ## [3,] "3"  "7"  "C"  "A"  "E"
    ## [4,] "4"  "8"  "D"  "B"  "F"

Subsetting

    m <- matrix(sample(1:15, 12), nrow = 3) # random, so your values will differ
    rownames(m) <- c("r1", "r2", "r3")
    colnames(m) <- c("a", "b", "c", "d")
    m
    ##     a  b  c  d
    ## r1  7  5  6 10
    ## r2  3  9 12  8
    ## r3 15 13  2  4
    m[1,3]
    ## [1] 6
    m[3,]
    ##  a  b  c  d
    ## 15 13  2  4
    m[,3]
    ## r1 r2 r3
    ##  6 12  2
    m[4] # a single index counts down the columns
    ## [1] 5
    m[2, c(2, 3)]
    ##  b  c
    ##  9 12
    m[c(1, 2), c(2, 3)]
    ##    b  c
    ## r1 5  6
    ## r2 9 12
    m[c(1, 3), c(1, 3, 4)]
    ##     a c  d
    ## r1  7 6 10
    ## r3 15 2  4
    m["r2","c"]
    ## [1] 12
    m[2,"c"]
    ## [1] 12
    m[3, c("c", "d")]
    ## c d
    ## 2 4
    m[c(FALSE, FALSE, TRUE),
      c(FALSE, TRUE, FALSE, TRUE)]
    ##  b  d
    ## 13  4
    m[c(FALSE, FALSE, TRUE),
      c(FALSE, TRUE)]
    ##  b  d
    ## 13  4
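Selecting a single row or column drops the result to a plain vector, as several outputs above show. If a matrix is needed, the drop = FALSE argument of [ keeps the dimensions. This is an added sketch using a fixed (non-random) matrix; row_vec and row_mat are hypothetical names.

```r
# A fixed 3 x 4 matrix so the results are reproducible
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c", "d")))

row_vec <- m[3, ]                # dimensions dropped: named vector
row_mat <- m[3, , drop = FALSE]  # dimensions kept: 1 x 4 matrix

is.matrix(row_vec)  # FALSE
is.matrix(row_mat)  # TRUE
dim(row_mat)        # 1 4
```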

Matrix arithmetic

  • colSums(), rowSums()
  • Standard arithmetic possible
  • Element-wise computation

    the_fellowship <- c(316, 556)
    two_towers <- c(343, 584)
    return_king <- c(378, 742)
    lotr_matrix <- rbind(the_fellowship, two_towers, return_king)
    colnames(lotr_matrix) <- c("US", "non-US")
    rownames(lotr_matrix) <- c("Fellowship", "Two Towers",
                               "Return King")
    lotr_matrix
    ##              US non-US
    ## Fellowship  316    556
    ## Two Towers  343    584
    ## Return King 378    742
    # with a number: + - * /
    lotr_matrix / 1.12
    ##                   US   non-US
    ## Fellowship  282.1429 496.4286
    ## Two Towers  306.2500 521.4286
    ## Return King 337.5000 662.5000
    lotr_matrix - 50
    ##              US non-US
    ## Fellowship  266    506
    ## Two Towers  293    534
    ## Return King 328    692
    # with a matrix: + - * / (element-wise, not linear-algebra matrix operations)
    theater_cut <- matrix(c(50, 80, 100), nrow = 3, ncol = 2)
    theater_cut
    ##      [,1] [,2]
    ## [1,]   50   50
    ## [2,]   80   80
    ## [3,]  100  100
    lotr_matrix - theater_cut
    ##              US non-US
    ## Fellowship  266    506
    ## Two Towers  263    504
    ## Return King 278    642
    # with a vector
    lotr_matrix - c(50, 80, 100) # the vector is recycled down the columns
    ##              US non-US
    ## Fellowship  266    506
    ## Two Towers  263    504
    ## Return King 278    642
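The bullet list names colSums() and rowSums() without showing them, and the notes point out that * is element-wise. The added sketch below demonstrates both functions and contrasts * with the linear-algebra product %*%; the small matrix a is a made-up example.

```r
lotr_matrix <- rbind(c(316, 556), c(343, 584), c(378, 742))
dimnames(lotr_matrix) <- list(c("Fellowship", "Two Towers", "Return King"),
                              c("US", "non-US"))

col_totals <- colSums(lotr_matrix)  # total per region
row_totals <- rowSums(lotr_matrix)  # total per film

col_totals  # US 1037, non-US 1882
row_totals  # Fellowship 872, Two Towers 927, Return King 1120

# Element-wise product versus true matrix product
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
elementwise <- a * a         # each entry squared
matprod <- a %*% a           # rows times columns
```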

Factors

  • Factors for categorical variables
  • Limited number of different values
  • Belong to category

Creating factors: factor()

    blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
    blood
    ## [1] "B"  "AB" "O"  "A"  "O"  "O"  "A"  "B"
    blood_factor <- factor(blood) # levels default to alphabetical order
    blood_factor
    ## [1] B  AB O  A  O  O  A  B
    ## Levels: A AB B O
    str(blood_factor)
    ##  Factor w/ 4 levels "A","AB","B","O": 3 2 4 1 4 4 1 3
    # custom levels
    blood_factor2 <- factor(blood,
                            levels = c("O", "A", "B", "AB"))
    blood_factor2
    ## [1] B  AB O  A  O  O  A  B
    ## Levels: O A B AB
    str(blood_factor2)
    ##  Factor w/ 4 levels "O","A","B","AB": 3 4 1 2 1 1 2 3

Rename factor levels

    blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
    # 1.1
    blood_factor <- factor(blood)
    levels(blood_factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")
    # 1.2
    blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
    blood_factor <- factor(blood)
    factor(blood,
           levels = c("O", "A", "B", "AB"),
           labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))
    ## [1] BT_B  BT_AB BT_O  BT_A  BT_O  BT_O  BT_A  BT_B
    ## Levels: BT_O BT_A BT_B BT_AB
    # 2
    factor(blood, labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))
    ## [1] BT_B  BT_AB BT_O  BT_A  BT_O  BT_O  BT_A  BT_B
    ## Levels: BT_A BT_AB BT_B BT_O

Ordered factor

    blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
    blood_factor <- factor(blood)
    blood_factor[1] < blood_factor[2]
    ## Warning in Ops.factor(blood_factor[1], blood_factor[2]): '<' not meaningful
    ## for factors
    ## [1] NA
    # comparisons are only meaningful for an ordered factor, as below
    tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
    tshirt_factor <- factor(tshirt, ordered = TRUE,
                            levels = c("S", "M", "L"))
    tshirt_factor
    ## [1] M L S S L M L M
    ## Levels: S < M < L
    tshirt_factor[1] < tshirt_factor[2]
    ## [1] TRUE
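A few things one typically does with a factor once it exists: count the categories, look at the underlying integer codes, and filter by level order. This sketch is an addition to the course material; size_counts, size_codes and at_least_m are hypothetical names.

```r
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
tshirt_factor <- factor(tshirt, ordered = TRUE,
                        levels = c("S", "M", "L"))

# Frequency of each level, reported in level order (S, M, L)
size_counts <- table(tshirt_factor)

# The integer codes stored behind the labels (S = 1, M = 2, L = 3)
size_codes <- as.integer(tshirt_factor)

# Ordered factors can be compared against a level name
at_least_m <- tshirt_factor[tshirt_factor >= "M"]

size_counts         # S: 2, M: 3, L: 3
size_codes          # 2 3 1 1 3 2 3 2
length(at_least_m)  # 6
```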

List

Vector - Matrix - List

  • Vector: 1D, same type
  • Matrix: 2D, same type
  • List:
    • Different R objects
    • No coercion
    • Loss of some functionality

Creating lists: list()

    list("Rsome times", 190, 5)
    ## [[1]]
    ## [1] "Rsome times"
    ##
    ## [[2]]
    ## [1] 190
    ##
    ## [[3]]
    ## [1] 5
    song <- list("Rsome times", 190, 5)
    is.list(song)
    ## [1] TRUE

Naming lists

    # 1
    song <- list("Rsome times", 190, 5)
    names(song) <- c("title", "duration", "track")
    song
    ## $title
    ## [1] "Rsome times"
    ##
    ## $duration
    ## [1] 190
    ##
    ## $track
    ## [1] 5
    # 2
    song <- list(title = "Rsome times",
                 duration = 190,
                 track = 5)
    song
    ## $title
    ## [1] "Rsome times"
    ##
    ## $duration
    ## [1] 190
    ##
    ## $track
    ## [1] 5
    str(song)
    ## List of 3
    ##  $ title   : chr "Rsome times"
    ##  $ duration: num 190
    ##  $ track   : num 5

Nested lists

    similar_song <- list(title = "R you on time?",
                         duration = 230)
    song <- list(title = "Rsome times",
                 duration = 190, track = 5,
                 similar = similar_song)
    str(song)
    ## List of 4
    ##  $ title   : chr "Rsome times"
    ##  $ duration: num 190
    ##  $ track   : num 5
    ##  $ similar :List of 2
    ##   ..$ title   : chr "R you on time?"
    ##   ..$ duration: num 230

Subsetting

[ versus [[

    similar_song <- list(title = "R you on time?",
                         duration = 230)
    song <- list(title = "Rsome times",
                 duration = 190, track = 5,
                 similar = similar_song)
    str(song)
    ## List of 4
    ##  $ title   : chr "Rsome times"
    ##  $ duration: num 190
    ##  $ track   : num 5
    ##  $ similar :List of 2
    ##   ..$ title   : chr "R you on time?"
    ##   ..$ duration: num 230
    song[1]
    ## $title
    ## [1] "Rsome times"
    song[[1]]
    ## [1] "Rsome times"
    song[c(1, 3)]
    ## $title
    ## [1] "Rsome times"
    ##
    ## $track
    ## [1] 5
    # song[[c(1, 3)]] # does not work: it means song[[1]][[3]]
    # song[[1]][[3]]  # does not work: subscript out of bounds
    song[["duration"]]
    ## [1] 190
    song["duration"]
    ## $duration
    ## [1] 190
    song[c(FALSE, TRUE, TRUE, FALSE)]
    ## $duration
    ## [1] 190
    ##
    ## $track
    ## [1] 5
    # song[[c(FALSE, TRUE, TRUE, FALSE)]] # does not work
    # song[[F]][[T]][[T]][[F]]            # does not work either
    # list in list
    song[[4]][[1]]
    ## [1] "R you on time?"
    song[[c(4, 1)]]
    ## [1] "R you on time?"
    song[c("duration", "similar")]
    ## $duration
    ## [1] 190
    ##
    ## $similar
    ## $similar$title
    ## [1] "R you on time?"
    ##
    ## $similar$duration
    ## [1] 230

[[ or [ ?

  • [[ to select list element
  • [ results in sublist
  • [[ and $ to subset and extend lists
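The practical consequence of the [ versus [[ distinction is that only [[ (or $) gives back the element itself, so only that result can be used in arithmetic. An added sketch, with made-up variable names:

```r
song <- list(title = "Rsome times", duration = 190, track = 5)

sub_list <- song["duration"]    # a list of length 1
element  <- song[["duration"]]  # the number stored inside

class(sub_list)  # "list"
class(element)   # "numeric"

minutes <- element / 60  # works: element is a number

# Dividing the sublist fails, so the error is caught here for the demo
attempt <- tryCatch(sub_list / 60, error = function(e) "error")
attempt          # "error"
```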

Extending lists

This brings in one of R's more important operators, $.

    similar_song <- list(title = "R you on time?",
                         duration = 230)
    song <- list(title = "Rsome times",
                 duration = 190, track = 5,
                 similar = similar_song)
    # $
    song$duration
    ## [1] 190
    # extending
    friends <- c("Kurt", "Florence",
                 "Patti", "Dave")
    song$sent <- friends # or song[["sent"]] <- friends
    song$similar$reason <- "too long"
    song
    ## $title
    ## [1] "Rsome times"
    ##
    ## $duration
    ## [1] 190
    ##
    ## $track
    ## [1] 5
    ##
    ## $similar
    ## $similar$title
    ## [1] "R you on time?"
    ##
    ## $similar$duration
    ## [1] 230
    ##
    ## $similar$reason
    ## [1] "too long"
    ##
    ##
    ## $sent
    ## [1] "Kurt"     "Florence" "Patti"    "Dave"

Data Frame

  • Observations
  • Variables
  • Example: people
    • each person = observation
    • properties (name, age …) = variables
  • Rows = observations (persons)
  • Columns = variables (age, name, …)

Different variables may have different types, but all observations of one variable share the same type.

Data frames mostly come up when importing data.

Creating data frames

    name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
    age <- c(28, 30, 21, 39, 35)
    child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
    df <- data.frame(name, age, child)
    str(df)
    ## 'data.frame':    5 obs. of  3 variables:
    ##  $ name : Factor w/ 5 levels "Anne","Cath",..: 1 5 3 4 2
    ##  $ age  : num  28 30 21 39 35
    ##  $ child: logi  FALSE TRUE TRUE FALSE TRUE

Naming data frames

    name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
    age <- c(28, 30, 21, 39, 35)
    child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
    df <- data.frame(name, age, child)
    names(df) <- c("Name", "Age", "Child")
    str(df)
    ## 'data.frame':    5 obs. of  3 variables:
    ##  $ Name : Factor w/ 5 levels "Anne","Cath",..: 1 5 3 4 2
    ##  $ Age  : num  28 30 21 39 35
    ##  $ Child: logi  FALSE TRUE TRUE FALSE TRUE
    df <- data.frame(Name = name, Age = age, Child = child) # also
    str(df)
    ## 'data.frame':    5 obs. of  3 variables:
    ##  $ Name : Factor w/ 5 levels "Anne","Cath",..: 1 5 3 4 2
    ##  $ Age  : num  28 30 21 39 35
    ##  $ Child: logi  FALSE TRUE TRUE FALSE TRUE

As the output shows, the character vector was converted to a factor automatically. This was the default before R 4.0.0; from 4.0.0 on, stringsAsFactors defaults to FALSE. Setting the argument explicitly avoids depending on the implicit behaviour.

    name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
    age <- c(28, 30, 21, 39, 35)
    child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
    df <- data.frame(name, age, child,
                     stringsAsFactors = FALSE)
    str(df)
    ## 'data.frame':    5 obs. of  3 variables:
    ##  $ name : chr  "Anne" "Pete" "Frank" "Julia" ...
    ##  $ age  : num  28 30 21 39 35
    ##  $ child: logi  FALSE TRUE TRUE FALSE TRUE
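To check what a given call produced without reading str() output, the column class can be inspected directly. A small added sketch; df_chr and df_fct are hypothetical names, and both calls set stringsAsFactors explicitly so the result does not depend on the R version.

```r
name <- c("Anne", "Pete", "Frank", "Julia", "Cath")

# Same data, with the argument set both ways
df_chr <- data.frame(name, stringsAsFactors = FALSE)
df_fct <- data.frame(name, stringsAsFactors = TRUE)

class(df_chr$name)   # "character"
class(df_fct$name)   # "factor"
levels(df_fct$name)  # "Anne" "Cath" "Frank" "Julia" "Pete"
```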

Subsetting

Subset Data Frame

  • Subsetting syntax from matrices and lists
  • [ from matrices
  • [[ and $ from lists

    name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
    age <- c(28, 30, 21, 39, 35)
    child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
    people <- data.frame(name, age, child,
                         stringsAsFactors = FALSE)
    # matrix-style operations
    people[3,2]
    ## [1] 21
    people[3,"age"]
    ## [1] 21
    people[,"age"]
    ## [1] 28 30 21 39 35
    people[3,] # returns a data frame, which my R notebook does not display
    ##    name age child
    ## 3 Frank  21  TRUE
    people[c(3, 5), c("age", "child")] # same as above
    ##   age child
    ## 3  21  TRUE
    ## 5  35  TRUE
    # list-style operations
    people$age
    ## [1] 28 30 21 39 35
    people[["age"]]
    ## [1] 28 30 21 39 35
    people[[2]]
    ## [1] 28 30 21 39 35
    # the next two return a data frame, which my R notebook does not display
    people["age"]
    ##   age
    ## 1  28
    ## 2  30
    ## 3  21
    ## 4  39
    ## 5  35
    people[2]
    ##   age
    ## 1  28
    ## 2  30
    ## 3  21
    ## 4  39
    ## 5  35
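The logical indexing shown earlier for vectors and matrices carries over to data frames, which is how rows are usually filtered by a condition. This is an added sketch; older, parents and older_sub are made-up names, and subset() is a base R convenience function not covered in the notes above.

```r
people <- data.frame(name = c("Anne", "Pete", "Frank", "Julia", "Cath"),
                     age = c(28, 30, 21, 39, 35),
                     child = c(FALSE, TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

# Rows where the condition is TRUE, all columns
older <- people[people$age > 30, ]

# A logical column can be used as the row index directly
parents <- people[people$child, "name"]

# subset() expresses the same filter without repeating people$
older_sub <- subset(people, age > 30, select = c(name, age))

older$name  # "Julia" "Cath"
parents     # "Pete" "Frank" "Cath"
```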

Extending data frames

Extend Data Frame

  • Add columns = add variables
  • Add rows = add observations

    name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
    age <- c(28, 30, 21, 39, 35)
    child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
    people <- data.frame(name, age, child,
                         stringsAsFactors = FALSE)
    # Add column
    height <- c(163, 177, 163, 162, 157)
    people$height <- height
    str(people)
    ## 'data.frame':    5 obs. of  4 variables:
    ##  $ name  : chr  "Anne" "Pete" "Frank" "Julia" ...
    ##  $ age   : num  28 30 21 39 35
    ##  $ child : logi  FALSE TRUE TRUE FALSE TRUE
    ##  $ height: num  163 177 163 162 157
    # also
    people[["height"]] <- height
    str(people)
    ## 'data.frame':    5 obs. of  4 variables:
    ##  $ name  : chr  "Anne" "Pete" "Frank" "Julia" ...
    ##  $ age   : num  28 30 21 39 35
    ##  $ child : logi  FALSE TRUE TRUE FALSE TRUE
    ##  $ height: num  163 177 163 162 157
    weight <- c(74, 63, 68, 55, 56)
    cbind(people, weight)
    ##    name age child height weight
    ## 1  Anne  28 FALSE    163     74
    ## 2  Pete  30  TRUE    177     63
    ## 3 Frank  21  TRUE    163     68
    ## 4 Julia  39 FALSE    162     55
    ## 5  Cath  35  TRUE    157     56
    # Add row: be careful, this can fail when the column names differ
    tom <- data.frame("Tom", 37, FALSE, 183)
    # rbind(people, tom)
    # fails with:
    # Error : names do not match previous names
    tom <- data.frame(name = "Tom", age = 37,
                      child = FALSE, height = 183)
    rbind(people, tom)
    ##    name age child height
    ## 1  Anne  28 FALSE    163
    ## 2  Pete  30  TRUE    177
    ## 3 Frank  21  TRUE    163
    ## 4 Julia  39 FALSE    162
    ## 5  Cath  35  TRUE    157
    ## 6   Tom  37 FALSE    183

Sorting

This part mainly covers sort() and order(); of the two, order() is the better fit for reordering a data frame.

    str(people)
    ## 'data.frame':    5 obs. of  4 variables:
    ##  $ name  : chr  "Anne" "Pete" "Frank" "Julia" ...
    ##  $ age   : num  28 30 21 39 35
    ##  $ child : logi  FALSE TRUE TRUE FALSE TRUE
    ##  $ height: num  163 177 163 162 157
    # sort() sorts the elements of the vector directly
    sort(people$age)
    ## [1] 21 28 30 35 39
    # order() returns the positions of the elements in sorted order
    ranks <- order(people$age)
    ranks
    ## [1] 3 1 2 5 4
    people$age
    ## [1] 28 30 21 39 35
    people[ranks, ] # reorders the rows
    ##    name age child height
    ## 3 Frank  21  TRUE    163
    ## 1  Anne  28 FALSE    163
    ## 2  Pete  30  TRUE    177
    ## 5  Cath  35  TRUE    157
    ## 4 Julia  39 FALSE    162
    # or, for descending order
    people[order(people$age, decreasing = TRUE), ]
    ##    name age child height
    ## 4 Julia  39 FALSE    162
    ## 5  Cath  35  TRUE    157
    ## 2  Pete  30  TRUE    177
    ## 1  Anne  28 FALSE    163
    ## 3 Frank  21  TRUE    163
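order() also accepts several keys, with later keys breaking ties in earlier ones. The added sketch below sorts by height and, for equal heights, by age in descending order; negating a numeric key is one common way to reverse just that key. The names idx and sorted_people are hypothetical.

```r
people <- data.frame(name = c("Anne", "Pete", "Frank", "Julia", "Cath"),
                     age = c(28, 30, 21, 39, 35),
                     height = c(163, 177, 163, 162, 157),
                     stringsAsFactors = FALSE)

# Primary key: height ascending; tie-breaker: age descending
idx <- order(people$height, -people$age)
sorted_people <- people[idx, ]

idx                 # 5 4 1 3 2
sorted_people$name  # "Cath" "Julia" "Anne" "Frank" "Pete"
```

Anne and Frank share a height of 163, so the second key decides that Anne (28) comes before Frank (21).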

Graphics

This part mainly covers plot() and hist() from the graphics package.

plot() draws a different kind of chart depending on the type of data it is given:

  1. plot() (categorical): bar chart, e.g. plot(countries$continent)
  2. plot() (numerical): scatter plot of the values against their index, e.g. plot(countries$population)
  3. plot() (2x numerical): scatter plot, e.g.

    plot(countries$area, countries$population)

    plot(log(countries$area), log(countries$population))
  4. plot() (2x categorical): a variant of the bar chart, e.g.

    plot(countries$continent, countries$religion)

hist() draws a histogram, e.g.

    hist(africa$population)

    hist(africa$population, breaks = 10)

Other graphics functions

  • barplot()
  • boxplot()
  • pairs()
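The countries and africa data sets come from the course and are not included here, so the calls above cannot be run as written. As a stand-in, the following sketch uses the iris and mtcars data sets that ship with R; the choice of columns is arbitrary and only meant to trigger each plot type.

```r
# Categorical input (a factor): bar chart of counts per level
plot(iris$Species)

# One numeric vector: values plotted against their index
plot(mtcars$mpg)

# Two numeric vectors: scatter plot
plot(mtcars$wt, mtcars$mpg)

# Histogram; breaks is a suggestion for the number of bins
h <- hist(mtcars$mpg, breaks = 10)

# One of the other functions from the list: box plot of mpg per cylinder count
boxplot(mpg ~ cyl, data = mtcars)
```

hist() also returns its binning invisibly, so h$counts and h$breaks can be inspected after the call.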

Customizing plots

Customization comes down to changing parameters, so there is not much to explain.

This is where par() comes in. It manages a list of shared graphical parameters, holding the common properties used by plots, so the basic properties for several plots can be set once.

For example:

    par(col = "blue")
    plot(mercury$temperature, mercury$pressure)

Commonly used plot() arguments:

    plot(mercury$temperature, mercury$pressure,
         xlab = "Temperature",
         ylab = "Pressure",
         main = "T vs P for Mercury", # title
         type = "o",
         col = "orange",
         col.main = "darkgray",
         cex.axis = 0.6, # the cex family sets scaling factors
         lty = 5, # line type
         pch = 4 # plot symbol
    )

Multiple plots

The mfrow and mfcol parameters place several plots in one figure window. With mfrow, the plots produced by the plot() calls that follow fill the grid row by row; with mfcol, they fill it column by column.

    # fill by row
    par(mfrow = c(2,2))
    plot(shop$ads, shop$sales)
    plot(shop$comp, shop$sales)
    plot(shop$inv, shop$sales)
    plot(shop$size_dist, shop$sales)
    # fill by column
    par(mfcol = c(2,2))
    plot(shop$ads, shop$sales)
    plot(shop$comp, shop$sales)
    plot(shop$inv, shop$sales)
    plot(shop$size_dist, shop$sales)

Reset the grid

    par(mfrow = c(1,1))

Compared with this, layout() is more flexible.

    grid <- matrix(c(1, 1, 2, 3), nrow = 2,
                   ncol = 2, byrow = TRUE)
    layout(grid)
    plot(shop$ads, shop$sales) # goes into cell 1 of grid
    plot(shop$comp, shop$sales) # goes into cell 2 of grid
    plot(shop$inv, shop$sales) # goes into cell 3 of grid

Reset the grid

    layout(1)
    par(mfcol = c(1,1))

Reset all parameters

    old_par <- par(no.readonly = TRUE) # save only the parameters that can be set
    par(col = "red")
    plot(shop$ads, shop$sales)
    par(old_par) # restore the saved settings
    plot(shop$ads, shop$sales)

Linear fitting

This brings in the function lm() (linear model): lm(a ~ b) fits the linear model a = k*b + c.

    plot(shop$ads, shop$sales,
         pch = 16, col = 2,
         xlab = "advertisement",
         ylab = "net sales")
    lm_sales <- lm(shop$sales ~ shop$ads)
    abline(coef(lm_sales), lwd = 2) # draw the line from the model coefficients, line width 2
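The shop data set belongs to the course and is not available here. As a runnable stand-in, the sketch below fits the same kind of model on the built-in cars data set (stopping distance against speed); using the data argument of lm() instead of shop$... style columns is a stylistic choice, not something the course prescribes.

```r
# Fit dist = k * speed + c on the built-in cars data
fit <- lm(dist ~ speed, data = cars)

# coef() returns the intercept (c) first, then the slope (k)
coefs <- coef(fit)
names(coefs)   # "(Intercept)" "speed"

# Scatter plot with the fitted line on top
plot(cars$speed, cars$dist,
     pch = 16, col = 2,
     xlab = "speed", ylab = "stopping distance")
abline(fit, lwd = 2)   # abline() also accepts the fitted model directly
```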
