Using basic functions for string processing

Base R functions compared with stringr functions

> library(stringr)  # provides the str_* functions used below
> states <- row.names(USArrests)
> # Extract substrings
> substr(x = states, start = 1, stop = 4)
[1] "Alab" "Alas" "Ariz" "Arka" "Cali" "Colo" "Conn" "Dela" "Flor" "Geor" "Hawa" "Idah" "Illi" "Indi" "Iowa" "Kans" "Kent"
[18] "Loui" "Main" "Mary" "Mass" "Mich" "Minn" "Miss" "Miss" "Mont" "Nebr" "Neva" "New " "New " "New " "New " "Nort" "Nort"
[35] "Ohio" "Okla" "Oreg" "Penn" "Rhod" "Sout" "Sout" "Tenn" "Texa" "Utah" "Verm" "Virg" "Wash" "West" "Wisc" "Wyom"
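
For comparison with `substr()`, stringr provides `str_sub()`, which also accepts negative positions counted from the end of each string. A minimal sketch, assuming the stringr package is installed; the variable names `first4` and `last3` are illustrative only:

```r
library(stringr)

states <- row.names(USArrests)

# Same result as substr(states, 1, 4)
first4 <- str_sub(states, start = 1, end = 4)

# Negative positions count from the end of each string
last3 <- str_sub(states, start = -3)

head(first4, 3)
head(last3, 3)
```
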
> abbreviate(states, minlength = 5)
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware
"Alabm" "Alask" "Arizn" "Arkns" "Clfrn" "Colrd" "Cnnct" "Delwr"
Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas
"Flord" "Georg" "Hawai" "Idaho" "Illns" "Indin" "Iowa" "Kanss"
Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi
"Kntck" "Lousn" "Maine" "Mryln" "Mssch" "Mchgn" "Mnnst" "Mssss"
Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York
"Missr" "Montn" "Nbrsk" "Nevad" "NwHmp" "NwJrs" "NwMxc" "NwYrk"
North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina
"NrthC" "NrthD" "Ohio" "Oklhm" "Oregn" "Pnnsy" "RhdIs" "SthCr"
South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia
"SthDk" "Tnnss" "Texas" "Utah" "Vrmnt" "Virgn" "Wshng" "WstVr"
Wisconsin Wyoming
"Wscns" "Wymng"
> # Count the characters in each string
> nchar(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_count(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_length(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
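
The three functions agree on ordinary strings, but they can differ on missing values. The sketch below, assuming stringr is installed, shows the case that most often surprises: a logical `NA` passed to `nchar()` is first coerced to the string "NA".

```r
library(stringr)

nchar(NA)             # 2, because logical NA is coerced to the string "NA"
nchar(NA_character_)  # NA
str_length(NA)        # NA
```
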
> # Upper and lower case
> tolower(states) # convert to lowercase
[1] "alabama" "alaska" "arizona" "arkansas" "california" "colorado" "connecticut"
[8] "delaware" "florida" "georgia" "hawaii" "idaho" "illinois" "indiana"
[15] "iowa" "kansas" "kentucky" "louisiana" "maine" "maryland" "massachusetts"
[22] "michigan" "minnesota" "mississippi" "missouri" "montana" "nebraska" "nevada"
[29] "new hampshire" "new jersey" "new mexico" "new york" "north carolina" "north dakota" "ohio"
[36] "oklahoma" "oregon" "pennsylvania" "rhode island" "south carolina" "south dakota" "tennessee"
[43] "texas" "utah" "vermont" "virginia" "washington" "west virginia" "wisconsin"
[50] "wyoming"
> toupper(states) # convert to uppercase
[1] "ALABAMA" "ALASKA" "ARIZONA" "ARKANSAS" "CALIFORNIA" "COLORADO" "CONNECTICUT"
[8] "DELAWARE" "FLORIDA" "GEORGIA" "HAWAII" "IDAHO" "ILLINOIS" "INDIANA"
[15] "IOWA" "KANSAS" "KENTUCKY" "LOUISIANA" "MAINE" "MARYLAND" "MASSACHUSETTS"
[22] "MICHIGAN" "MINNESOTA" "MISSISSIPPI" "MISSOURI" "MONTANA" "NEBRASKA" "NEVADA"
[29] "NEW HAMPSHIRE" "NEW JERSEY" "NEW MEXICO" "NEW YORK" "NORTH CAROLINA" "NORTH DAKOTA" "OHIO"
[36] "OKLAHOMA" "OREGON" "PENNSYLVANIA" "RHODE ISLAND" "SOUTH CAROLINA" "SOUTH DAKOTA" "TENNESSEE"
[43] "TEXAS" "UTAH" "VERMONT" "VIRGINIA" "WASHINGTON" "WEST VIRGINIA" "WISCONSIN"
[50] "WYOMING"
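
The stringr counterparts of `tolower()` and `toupper()` are `str_to_lower()` and `str_to_upper()`; stringr also has `str_to_title()`, which base R does not offer directly. A short sketch, assuming stringr is installed:

```r
library(stringr)

states <- row.names(USArrests)

lower <- str_to_lower(states)   # same as tolower(states) for these names
upper <- str_to_upper(states)   # same as toupper(states) for these names

# Capitalise the first letter of each word
title <- str_to_title("new hampshire")
title
```
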
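> # Character replacement
> chartr("Tt", "Uu", "AgCTcctTagct")
[1] "AgCUccuUagcu"
> # Note: this pattern matches only uppercase "T", so lowercase "t" is left unchanged
> str_replace_all("AgCTcctTagct", pattern = "T", replacement = "U")
[1] "AgCUcctUagct"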
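
The `str_replace_all()` call above replaces only the uppercase "T", so its result differs from `chartr()`. One way to reproduce the `chartr()` result with stringr is to pass a named vector of replacements; this is a sketch assuming stringr is installed, and `dna` and `rna` are illustrative names:

```r
library(stringr)

dna <- "AgCTcctTagct"

# Each name is a pattern, each value its replacement
rna <- str_replace_all(dna, c("T" = "U", "t" = "u"))
rna
```
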
> # String concatenation
> paste("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
> str_c("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
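
`paste()` and `str_c()` give the same result here, but they treat missing values differently: `paste()` turns `NA` into the text "NA", while `str_c()` propagates the missing value. A small sketch, assuming stringr is installed:

```r
library(stringr)

with_paste <- paste("control", NA, sep = "_")   # NA becomes the text "NA"
with_str_c <- str_c("control", NA, sep = "_")   # result is a missing value

with_paste
with_str_c

# Both accept collapse= to join a vector into a single string
joined <- str_c(c("a", "b", "c"), collapse = "+")
joined
```
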
> x <- c("I love R", "I'm fascinated by Statistics", "I")
> # Pattern matching: which elements contain the pattern
> grep(pattern = "love", x = x)
[1] 1
> grep(pattern = "love", x = x, value = TRUE)
[1] "I love R"
> grepl(pattern = "love", x = x)
[1] TRUE FALSE FALSE
> str_detect(string = x, pattern = "love")
[1] TRUE FALSE FALSE
> # match() returns the position of the first exact match
> match(x = "I", table = x)
[1] 3
> "I" %in% x
[1] TRUE
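
Unlike `grep()`, both `match()` and `%in%` compare whole elements rather than searching inside them. The base R sketch below looks up several values at once; the vector `x` is redefined so the example is self-contained:

```r
x <- c("I love R", "I'm fascinated by Statistics", "I")

# "love" occurs inside an element but is not itself an element
pos <- match(c("I", "love"), x)
found <- c("I", "love") %in% x

pos     # position of the exact match, or NA when there is none
found   # logical version of the same lookup
```
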
> # Splitting strings
> text <- "I love R.\nI'm fascinated by Statistics."
> cat(text)
I love R.
I'm fascinated by Statistics.
> strsplit(text, split = " ")
[[1]]
[1] "I" "love" "R.\nI'm" "fascinated" "by" "Statistics."
> strsplit(text, split = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
> str_split(text, pattern = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
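
Both `strsplit()` and `str_split()` treat the separator as a regular expression by default, which matters when splitting on a character such as "." that has a special meaning in a regex. A sketch of splitting on a literal dot, assuming stringr is installed; the string `version` is a made-up example:

```r
library(stringr)

version <- "3.6.1"

# fixed = TRUE (base R) and fixed() (stringr) match the separator literally
base_parts <- strsplit(version, split = ".", fixed = TRUE)[[1]]
stringr_parts <- str_split(version, pattern = fixed("."))[[1]]

base_parts
stringr_parts
```
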
> # Pattern replacement
> test_vector3 <- c("Without the vowels,We can still read the word.")
> sub(pattern = "[aeiou]", replacement = "-", x = test_vector3)
[1] "W-thout the vowels,We can still read the word."
> gsub(pattern = "[aeiou]", replacement = "-", x = test_vector3)
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
> str_replace_all(string = test_vector3, pattern = "[aeiou]",
+ replacement = "-")
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
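
The transcript pairs `gsub()` with `str_replace_all()`; the stringr counterpart of `sub()`, which replaces only the first match, is `str_replace()`. A brief sketch, assuming stringr is installed:

```r
library(stringr)

test_vector3 <- "Without the vowels,We can still read the word."

# Replace only the first vowel, like sub()
first_only <- str_replace(test_vector3, pattern = "[aeiou]", replacement = "-")
first_only
```
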
> # Wrapping strings for formatted output
> string <- "Each character string in the input is first split into\n paragraphs
+ (or lines containing whitespace)"
> strwrap(x = string, width = 30)
[1] "Each character string in the" "input is first split into" "paragraphs (or lines" "containing whitespace)"
> str_wrap(string = string, width = 30)
[1] "Each character string in\nthe input is first split\ninto paragraphs (or lines\ncontaining whitespace)"
> cat(str_wrap(string = string, width = 30))
Each character string in
the input is first split
into paragraphs (or lines
containing whitespace)
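
The output above shows the main structural difference: `strwrap()` returns one vector element per line, while `str_wrap()` returns a single string with embedded newlines. The sketch below, assuming stringr is installed, uses a string similar to the one above and prints both results with `writeLines()`:

```r
library(stringr)

string <- "Each character string in the input is first split into\n paragraphs (or lines containing whitespace)"

lines_base <- strwrap(string, width = 30)   # character vector, one element per line
wrapped <- str_wrap(string, width = 30)     # one string containing "\n"

length(lines_base)
length(wrapped)

# writeLines() prints each vector element on its own line
# and honours the embedded newlines in the single string
writeLines(lines_base)
writeLines(wrapped)
```
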
