A Quick Introduction to spark-shell (Scala)

>> Original post from the Tijun blog: http://www.cnblogs.com/tijun/ <<

1. Start the shell

./bin/spark-shell

The :help command prints a summary of the available REPL commands:

scala> :help

All commands can be abbreviated, e.g., :he instead of :help.

:edit <id>|<line>        edit history

:help [command]          print this summary or command-specific help

:history [num]           show the history (optional num is commands to show)

:h? <string>             search the history

:imports [name name ...] show import history, identifying sources of names

:implicits [-v]          show the implicits in scope

:javap <path|class>      disassemble a file or class name

:line <id>|<line>        place line(s) at the end of history

:load <path>             interpret lines in a file

:paste [-raw] [path]     enter paste mode or paste a file

:power                   enable power user mode

:quit                    exit the interpreter

:replay [options]        reset the repl and replay all previous commands

:require <path>          add a jar to the classpath

:reset [options]         reset the repl to its initial state, forgetting all session entries

:save <path>             save replayable session to a file

:sh <command line>       run a shell command (result is implicitly => List[String])

:settings <options>      update compiler options, if possible; see reset

:silent                  disable/enable automatic printing of results

:type [-v] <expr>        display the type of an expression without evaluating it

:kind [-v] <expr>        display the kind of expression's type

:warnings                show the suppressed warnings from the most recent line which had any

2. Load a file with spark to create a Dataset

scala> val textFile = spark.read.textFile("hdfs://cluster1/input/README.txt")

textFile: org.apache.spark.sql.Dataset[String] = [value: string]

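3. Load a file with sc to create an RDD

This rebinds the name textFile, so the steps below operate on the RDD rather than on the Dataset from step 2.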

scala> val textFile=sc.textFile("hdfs://cluster1/input/README.txt")

textFile: org.apache.spark.rdd.RDD[String] = hdfs://cluster1/input/README.txt MapPartitionsRDD[1] at textFile at <console>:24
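Since steps 2 and 3 bind the same name to two different abstractions, it may help to see how a Dataset and an RDD convert into each other. The following is only a minimal sketch, not part of the original session: it assumes a local master (`local[*]`), uses a small in-memory sample instead of the HDFS file, and the names `session`, `sampleLines`, `ds`, `rdd` and `back` are made up for illustration. Inside spark-shell a session already exists, so `getOrCreate()` should simply return it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: a local session; inside spark-shell getOrCreate() reuses the existing one.
val session = SparkSession.builder()
  .appName("dataset-rdd-sketch")
  .master("local[*]")
  .getOrCreate()
import session.implicits._ // brings in the String encoder and the toDS() syntax

// Hypothetical sample standing in for the lines of README.txt
val sampleLines = Seq("For the latest information about Hadoop", "", "Hadoop uses Jetty")

val ds: Dataset[String] = sampleLines.toDS() // same type as spark.read.textFile(...) returns
val rdd: RDD[String] = ds.rdd                // Dataset -> RDD
val back: Dataset[String] = rdd.toDS()       // RDD -> Dataset

println(ds.count())  // 3
println(rdd.count()) // 3
```

Both sides expose `count()` and `first()`, but `reduceByKey`, used in the word count below, is only available on RDDs of key/value pairs.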

4. Count the lines (items) in textFile

scala> textFile.count()    // Number of items in this RDD

res0: Long =

5. View the first item

scala> textFile.first()   // First item in this RDD

res1: String = For the latest information about Hadoop, please visit our website at:
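`count()` and `first()` are actions, meaning they trigger a computation and return a value to the driver. A couple of related calls are sketched below on a small in-memory RDD. This is an illustration under assumptions, not part of the original session: the master is `local[*]`, the sample lines are invented, and `session`, `context` and `lines` are hypothetical names (`context` plays the role of `sc` in spark-shell).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; inside spark-shell getOrCreate() reuses the existing one.
val session = SparkSession.builder()
  .appName("rdd-actions-sketch")
  .master("local[*]")
  .getOrCreate()
val context = session.sparkContext // the object spark-shell exposes as `sc`

// Hypothetical sample; a single partition keeps the ordering easy to reason about
val lines = context.parallelize(Seq(
  "For the latest information about Hadoop, please visit our website at:",
  "",
  "A second sample line"
), numSlices = 1)

val total = lines.count()                                    // number of items
val firstTwo = lines.take(2)                                 // first two items as Array[String]
val hadoopLines = lines.filter(_.contains("Hadoop")).count() // lines mentioning "Hadoop"

println(s"total=$total, hadoopLines=$hadoopLines")
firstTwo.foreach(println)
```

`filter` is a transformation and stays lazy; nothing is read until an action such as `count()` or `take(n)` runs.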

Everything so far is simple; next comes a complete word count.

6. Word count

scala> val wordsRdd=textFile.flatMap(line=>line.split(" "))

wordsRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[] at flatMap at <console>:

scala> val kvsRdd=wordsRdd.map(word=>(word,1))

kvsRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[] at map at <console>:

scala> val countRdd=kvsRdd.reduceByKey(_+_)

countRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[] at reduceByKey at <console>:

scala> countRdd.collect()

res2: Array[(String, Int)] = Array((under,), (this,), (distribution,), (Technology,), (country,), (is,), (Jetty,), (currently,), (permitted.,), (check,), (have,), (Security,), (U.S.,), (with,), (BIS,), (This,), (mortbay.org.,), ((ECCN),), (using,), (security,), (Department,), (export,), (reside,), (any,), (algorithms.,), (from,), (re-export,), (has,), (SSL,), (Industry,), (Administration,), (details,), (provides,), (http://hadoop.apache.org/core/,1), (country's,1), (Unrestricted,1), (740.13),1), (policies,1), (country,,1), (concerning,1), (uses,1), (Apache,1), (possession,,2), (information,2), (our,2), (as,1), ("",18), (Bureau,1), (wiki,,1), (please,2), (form,1), (information.,1), (ENC,1), (Export,2), (included,1), (asymmetric,1), (Commodity,1), (For,1),...
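The `("",18)` entry in the result comes from `split(" ")`: an empty line yields one empty token, and consecutive spaces yield empty tokens between them. The sketch below mirrors the three RDD steps with plain Scala collections, so it runs without a cluster. It splits on runs of whitespace, drops empty tokens, and sorts by count. The `wordCount` name and the sample lines are illustrative assumptions, and `groupBy` followed by a sum stands in for `reduceByKey`.

```scala
// Plain Scala collections stand in for the RDD; no Spark is needed to run this.
def wordCount(lines: Seq[String]): Seq[(String, Int)] = {
  // like flatMap(line => line.split(" ")), but without empty tokens
  val words = lines.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)
  // like map(word => (word, 1))
  val pairs = words.map(word => (word, 1))
  // like reduceByKey(_ + _): sum the 1s for each distinct word
  val counts = pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  // most frequent first, ties broken alphabetically
  counts.toSeq.sortBy { case (word, n) => (-n, word) }
}

// Hypothetical sample lines, including an empty line and a double space
val sample = Seq("Hadoop uses Jetty", "", "Hadoop  uses SSL")
wordCount(sample).foreach(println)
// (Hadoop,2)
// (uses,2)
// (Jetty,1)
// (SSL,1)
```

On the RDDs from this section, the corresponding calls would be `wordsRdd.filter(_.nonEmpty)` before the `map`, and `countRdd.sortBy(_._2, ascending = false).take(10)` in place of `collect()` to bring only the ten most frequent words back to the driver.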

That is all for now; this post will be expanded later.
