Developing and Running Spark Locally in IDEA
1. Create a sparkTest project and add a Scala object named Test.
2. The code for the Test object is as follows:
package sparkTest

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by jiahong on 15-8-2.
 */
object Test {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("Test").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("/home/jiahong/sparkWorkSpace/input")
    // Count each word's occurrences, then sort by count from high to low
    val result = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
    result.saveAsTextFile("/home/jiahong/sparkWorkSpace/output")
    print(result)
  }
}
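To make the word-count pipeline easier to follow, here is a minimal sketch of my own (not part of the original project; the sample lines are hypothetical) that runs the same transformation chain on a small in-memory collection and prints the result instead of saving it:

package sparkTest

import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountSketch").setMaster("local"))
    // Hypothetical lines standing in for the contents of input/test.txt
    val lines = sc.parallelize(Seq("hello spark", "hello scala", "hello spark"))
    // Same chain as in Test: split, count per word, sort by count descending
    val result = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
    // Prints (hello,3), (spark,2), (scala,1)
    result.collect().foreach(println)
    sc.stop()
  }
}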
To test Hive locally instead, the code looks like this:
package sparkTest.sparkSql
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}

object HiveSqlTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HiveLink").setMaster("spark://JIAs-Mac.local:7077")
      .setJars(Array("/Users/JIA/Desktop/jar/hiveTest/sparkTest.jar"))
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    val sql = "select * from Test limit 100"
    sqlContext.sql(sql).map(s => s(0) + "," + s(1) + "," + s(2) + "," + s(3) + "," + s(4)).collect().foreach(println)
  }
}
Note: hive-site.xml must be placed inside the project; create a Resources folder and mark it as Resources root so that the file ends up on the classpath.
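To confirm that hive-site.xml really is on the runtime classpath, a small check like the following can be run (my own sketch, not from the original post):

package sparkTest.sparkSql

object CheckHiveSite {
  def main(args: Array[String]): Unit = {
    // HiveContext only picks up hive-site.xml if it is visible as a classpath resource
    val url = getClass.getClassLoader.getResource("hive-site.xml")
    println(if (url == null) "hive-site.xml NOT found on classpath" else "found at: " + url)
  }
}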
3. To configure local running, open Edit Configurations from the top-right corner of IDEA.
4. In the run configuration, enter -Dspark.master=local under VM options, and local under Program arguments.
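As a side note, here is a sketch of my own (an assumption, not part of the original code): because SparkConf also reads spark.* Java system properties, the -Dspark.master=local VM option can select the master by itself if setMaster is left out of the program:

package sparkTest

import org.apache.spark.{SparkConf, SparkContext}

object TestFromVmOptions {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master is taken from the -Dspark.master=local VM option
    val conf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(conf)
    println("master = " + sc.master)
    sc.stop()
  }
}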
5. Click Run. Before running, start Spark on the local machine first. The console output of the run is shown below:
/usr/lib/jdk/jdk1..0_79/bin/java -Dspark.master=local -Didea.launcher.port= -Didea.launcher.bin.path=/home/jiahong/idea-IC-141.1532./bin -Dfile.encoding=UTF- -classpath /usr/lib/jdk/jdk1..0_79/jre/lib/resources.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfxrt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/charsets.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jsse.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/rt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/plugin.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/deploy.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfr.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/javaws.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/management-agent.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jce.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/zipfs.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/dnsns.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunec.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunjce_provider.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunpkcs11.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/localedata.jar:/home/jiahong/IdeaProjects/sparkTest/out/production/sparkTest:/home/jiahong/apache/spark-1.3.-bin-hadoop2./lib/spark-assembly-1.3.-hadoop2.6.0.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors-migration.jar:/home/jiahong/apache/scala-2.10./lib/scala-reflect.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors.jar:/home/jiahong/apache/scala-2.10./lib/scala-swing.jar:/home/jiahong/apache/scala-2.10./lib/scala-library.jar:/home/jiahong/idea-IC-141.1532./lib/idea_rt.jar com.intellij.rt.execution.application.AppMain sparkTest.Test local
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
// :: INFO SparkContext: Running Spark version 1.3.
// :: WARN Utils: Your hostname, jiahong-OptiPlex- resolves to a loopback address: 127.0.1.1; using 192.168.199.187 instead (on interface eth0)
// :: WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
// :: WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO SecurityManager: Changing view acls to: jiahong
// :: INFO SecurityManager: Changing modify acls to: jiahong
// :: INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jiahong); users with modify permissions: Set(jiahong)
// :: INFO Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917]
// :: INFO Utils: Successfully started service 'sparkDriver' on port .
// :: INFO SparkEnv: Registering MapOutputTracker
// :: INFO SparkEnv: Registering BlockManagerMaster
// :: INFO DiskBlockManager: Created local directory at /tmp/spark-a2cbde0d--4a95-80df-a99a14127efc/blockmgr-3cbdae80-810a-4ecf-b012-0979b3d714d0
// :: INFO MemoryStore: MemoryStore started with capacity 469.5 MB
// :: INFO HttpFileServer: HTTP File server directory is /tmp/spark--df98-4e7e-afa1-4dd36b655012/httpd-28cb8de9-caa4---347cea890b07
// :: INFO HttpServer: Starting HTTP Server
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'HTTP file server' on port .
// :: INFO SparkEnv: Registering OutputCommitCoordinator
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'SparkUI' on port .
// :: INFO SparkUI: Started SparkUI at http://jiahong-OptiPlex-7010.lan:4040
// :: INFO Executor: Starting executor ID <driver> on host localhost
// :: INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917/user/HeartbeatReceiver
// :: INFO NettyBlockTransferService: Server created on
// :: INFO BlockManagerMaster: Trying to register BlockManager
// :: INFO BlockManagerMasterActor: Registering block manager localhost: with 469.5 MB RAM, BlockManagerId(<driver>, localhost, )
// :: INFO BlockManagerMaster: Registered BlockManager
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 178.6 KB, free 469.4 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.8 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost: (size: 24.8 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
// :: INFO SparkContext: Created broadcast from textFile at Test.scala:
// :: INFO FileInputFormat: Total input paths to process :
MapPartitionsRDD[] at map at Test.scala:
// :: INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
// :: INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
// :: INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
// :: INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
// :: INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
// :: INFO SparkContext: Starting job: saveAsTextFile at Test.scala:
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Got job (saveAsTextFile at Test.scala:) with output partitions (allowLocal=false)
// :: INFO DAGScheduler: Final stage: Stage (saveAsTextFile at Test.scala:)
// :: INFO DAGScheduler: Parents of final stage: List(Stage )
// :: INFO DAGScheduler: Missing parents: List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which has no missing parents
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.6 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost: (size: 2.6 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 0.0 (TID )
// :: INFO HadoopRDD: Input split: file:/home/jiahong/sparkWorkSpace/input/test.txt:+
// :: INFO Executor: Finished task 0.0 in stage 0.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on localhost (/)
// :: INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.092 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage , Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Missing parents for Stage : List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost: (size: 2.1 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 1.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 1.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO Executor: Finished task 0.0 in stage 1.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.077 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 124.7 KB, free 469.2 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 74.9 KB, free 469.1 MB)
// :: INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost: (size: 74.9 KB, free: 469.4 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 2.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 2.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO FileOutputCommitter: Saved output of task 'attempt_201508021058_0002_m_000000_2' to file:/home/jiahong/sparkWorkSpace/output/_temporary//task_201508021058_0002_m_000000
// :: INFO SparkHadoopMapRedUtil: attempt_201508021058_0002_m_000000_2: Committed
// :: INFO Executor: Finished task 0.0 in stage 2.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (saveAsTextFile at Test.scala:) finished in 0.138 s
// :: INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Job finished: saveAsTextFile at Test.scala:, took 0.483353 s
MapPartitionsRDD[] at map at Test.scala:
Process finished with exit code
6. The results are as follows:
The input directory contains a test.txt file with the following content:
After the job runs, the output directory contains the following files:
Note:
On the first run you may encounter the following problem:
Exception in thread "main" java.lang.NoSuchMethodError:
The fix: check which Scala version your Spark build uses when it starts, install that same version locally, and then update the Scala SDK in IDEA to match. I originally had Scala 2.11.7 installed locally, which caused this error; Spark's Scala version turned out to be 2.10.4, so I installed that version and updated it in IDEA, after which the program ran correctly.
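If it is unclear which versions are actually in play, a quick diagnostic sketch of my own (not from the original post) is to print the Scala and Spark versions seen by the program and compare them against the cluster:

package sparkTest

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Scala library version on the program's classpath
    println("scala: " + scala.util.Properties.versionString)
    val sc = new SparkContext(new SparkConf().setAppName("VersionCheck").setMaster("local"))
    // Spark version of the assembly on the classpath
    println("spark: " + sc.version)
    sc.stop()
  }
}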