Spark: Implementing WordCount in Scala and Java
http://www.cnblogs.com/byrhuangqiang/p/4017725.html
To write Scala in IDEA, I installed, configured, and learned the IntelliJ IDEA integrated development environment today. IDEA really is excellent; once you get the hang of it, it is very comfortable to use. For how to set up the Scala and IDEA development environment, see the references at the end of this post.
WordCount is implemented here in both Scala and Java. The Java version, JavaWordCount, is the example that ships with Spark ($SPARK_HOME/examples/src/main/java/org/apache/spark/examples/JavaWordCount.java).
1. Environment
- OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
- Hadoop: 2.4.1
- JDK: 1.7.0_60
- Spark: 1.1.0
- Scala: 2.11.2
- IDE: IntelliJ IDEA 13.1.3
Note: IDEA, Scala, and the JDK need to be installed on the client Windows machine, and the Scala plugin must be installed into IDEA.
2. Word Count in Scala

package com.hq

/**
 * User: hadoop
 * Date: 2014/10/10 0010
 * Time: 18:59
 */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

/**
 * Count the occurrences of each word
 */
object WordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))

    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    sc.stop()
  }
}
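The SparkConf above deliberately leaves the master and application name unset, because they are supplied later through spark-submit. For a quick test directly inside IDEA without a cluster, a local-mode variant can look like the following sketch (the object name WordCountLocal and the local[2] master are my assumptions, not part of the original post):

import org.apache.spark.{SparkConf, SparkContext}

// Local-mode sketch (assumption: run straight from the IDE, no spark-submit).
// "local[2]" runs Spark with two worker threads on the local machine.
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountLocal").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))           // read the input file
      .flatMap(_.split(" "))       // split each line into words
      .map((_, 1))                 // pair every word with a count of 1
      .reduceByKey(_ + _)          // sum the counts per word
      .collect()
      .foreach(println)
    sc.stop()
  }
}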

3. Word Count in Java

package com.hq;

/**
 * User: hadoop
 * Date: 2014/10/10 0010
 * Time: 19:26
 */

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class JavaWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {

    if (args.length < 1) {
      System.err.println("Usage: JavaWordCount <file>");
      System.exit(1);
    }

    SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    JavaRDD<String> lines = ctx.textFile(args[0], 1);

    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String s) {
        return Arrays.asList(SPACE.split(s));
      }
    });

    JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

    JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
      }
    });

    List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<?, ?> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    ctx.stop();
  }
}
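For reference, on newer Spark releases (2.x and later) the Java FlatMapFunction returns an Iterator rather than an Iterable, and the anonymous classes above can be replaced with Java 8 lambdas. A sketch under that assumption (this is not the 1.1.0 API used in this post; lines and SPACE are the same variables as above):

// Sketch for Spark 2.x+ with Java 8 lambdas; reuses lines and SPACE from the code above.
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
JavaPairRDD<String, Integer> counts = words
        .mapToPair(s -> new Tuple2<>(s, 1))    // (word, 1)
        .reduceByKey((i1, i2) -> i1 + i2);     // sum per word
counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));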

4. Packaging and Running with IDEA
4.1 IDEA project structure
Create a Scala project in IDEA and add the Spark API jar to the project libraries (spark-assembly-1.1.0-hadoop2.4.0.jar, found under $SPARK_HOME/lib/); a build-tool alternative is sketched below.
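If the project is managed with sbt instead of importing the assembly jar by hand (an alternative, not what this post does), the core dependency would look roughly like the line below; note that the Spark 1.1.0 artifacts on Maven Central are published for Scala 2.10, so the project's Scala version has to match.

// build.sbt sketch (assumption: sbt project; adjust versions to your cluster)
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"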
4.2 Building the jar
File ---> Project Structure
After the artifact is configured, choose Build -> Build Artifacts... from the menu bar and run Build to package the project. When packaging finishes, the status bar shows a "Compilation completed successfully..." message; the jar can then be found under the configured output path, as shown below.
ScalaTest1848.jar is the jar produced from our code; it contains three classes: HelloWord, WordCount, and JavaWordCount.
This jar can be used to run the Java or Scala word-count program on the Spark cluster.
4.3 Running word count on the Spark standalone cluster
Upload the jar to the server and place it at /home/ebupt/test/WordCount.jar.
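The upload can be done with scp, for example (hostname and paths as in this post; the local file name being the jar IDEA produced is my assumption):

scp ScalaTest1848.jar ebupt@eb174:/home/ebupt/test/WordCount.jar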
Upload a text file to HDFS to serve as the word-count input: hdfs://eb170:8020/user/ebupt/text
Its contents are as follows:

import org apache spark api java JavaPairRDD
import org apache spark api java JavaRDD
import org apache spark api java JavaSparkContext
import org apache spark api java function FlatMapFunction
import org apache spark api java function Function
import org apache spark api java function Function2
import org apache spark api java function PairFunction
import scala Tuple2
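The file can be pushed to HDFS with the HDFS shell, for example (assuming the HDFS client is configured on the node and the local file is named text):

hdfs dfs -put text hdfs://eb170:8020/user/ebupt/text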

Submit and run the job with the spark-submit command; for detailed usage, see spark-submit --help:

[ebupt@eb174 bin]$ spark-submit --help
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
① Submit the Scala word count:
[ebupt@eb174 test]$ spark-submit --master spark://eb174:7077 --name WordCountByscala --class com.hq.WordCount --executor-memory 1G --total-executor-cores 2 ~/test/WordCount.jar hdfs://eb170:8020/user/ebupt/text
② Submit the Java word count:
[ebupt@eb174 test]$ spark-submit --master spark://eb174:7077 --name JavaWordCountByHQ --class com.hq.JavaWordCount --executor-memory 1G --total-executor-cores 2 ~/test/WordCount.jar hdfs://eb170:8020/user/ebupt/text
③ The two runs produce similar output, so only one is shown:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/10 19:24:51 INFO SecurityManager: Changing view acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: Changing modify acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ebupt, ); users with modify permissions: Set(ebupt, )
14/10/10 19:24:52 INFO Slf4jLogger: Slf4jLogger started
14/10/10 19:24:52 INFO Remoting: Starting remoting
14/10/10 19:24:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@eb174:56344]
14/10/10 19:24:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@eb174:56344]
14/10/10 19:24:52 INFO Utils: Successfully started service 'sparkDriver' on port 56344.
14/10/10 19:24:52 INFO SparkEnv: Registering MapOutputTracker
14/10/10 19:24:52 INFO SparkEnv: Registering BlockManagerMaster
14/10/10 19:24:52 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141010192452-3398
14/10/10 19:24:52 INFO Utils: Successfully started service 'Connection manager for block manager' on port 41110.
14/10/10 19:24:52 INFO ConnectionManager: Bound socket to port 41110 with id = ConnectionManagerId(eb174,41110)
14/10/10 19:24:52 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/10/10 19:24:52 INFO BlockManagerMaster: Trying to register BlockManager
14/10/10 19:24:52 INFO BlockManagerMasterActor: Registering block manager eb174:41110 with 265.4 MB RAM
14/10/10 19:24:52 INFO BlockManagerMaster: Registered BlockManager
14/10/10 19:24:52 INFO HttpFileServer: HTTP File server directory is /tmp/spark-8051667e-bfdb-4ecd-8111-52992b16bb13
14/10/10 19:24:52 INFO HttpServer: Starting HTTP Server
14/10/10 19:24:52 INFO Utils: Successfully started service 'HTTP file server' on port 48233.
14/10/10 19:24:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/10 19:24:53 INFO SparkUI: Started SparkUI at http://eb174:4040
14/10/10 19:24:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/10 19:24:53 INFO SparkContext: Added JAR file:/home/ebupt/test/WordCountByscala.jar at http://10.1.69.174:48233/jars/WordCountByscala.jar with timestamp 1412940293532
14/10/10 19:24:53 INFO AppClient$ClientActor: Connecting to master spark://eb174:7077...
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141010192453-0009
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/0 on worker-20141008204132-eb176-49618 (eb176:49618) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/0 on hostPort eb176:49618 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/1 on worker-20141008204132-eb175-56337 (eb175:56337) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/1 on hostPort eb175:56337 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/0 is now RUNNING
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/1 is now RUNNING
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=163705, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.2 MB)
14/10/10 19:24:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb174:41110 (size: 12.3 KB, free: 265.4 MB)
14/10/10 19:24:53 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/10/10 19:24:54 INFO FileInputFormat: Total input paths to process : 1
14/10/10 19:24:54 INFO SparkContext: Starting job: collect at WordCount.scala:26
14/10/10 19:24:54 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Got job 0 (collect at WordCount.scala:26) with 2 output partitions (allowLocal=false)
14/10/10 19:24:54 INFO DAGScheduler: Final stage: Stage 0(collect at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Missing parents: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:26), which has no missing parents
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(3400) called with curMem=176338, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 265.2 MB)
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(2082) called with curMem=179738, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb174:41110 (size: 2.0 KB, free: 265.4 MB)
14/10/10 19:24:54 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/10/10 19:24:54 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:26)
14/10/10 19:24:54 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@eb176:35482/user/Executor#1456950111] with ID 0
14/10/10 19:24:56 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, eb176, ANY, 1238 bytes)
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@eb175:35502/user/Executor#-1231100997] with ID 1
14/10/10 19:24:56 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, eb175, ANY, 1238 bytes)
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb176:33296 with 530.3 MB RAM
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb175:32903 with 530.3 MB RAM
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb176/10.1.69.176:39218]
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb175/10.1.69.175:55227]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb176/10.1.69.176:33296]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb175/10.1.69.175:32903]
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb175/10.1.69.175:32903], 1 messages pending
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb176/10.1.69.176:33296], 1 messages pending
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb175:32903 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb176:33296 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb176:33296 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb175:32903 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1697 ms on eb175 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1715 ms on eb176 (2/2)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO DAGScheduler: Stage 1 (map at WordCount.scala:26) finished in 3.593 s
14/10/10 19:24:58 INFO DAGScheduler: looking for newly runnable stages
14/10/10 19:24:58 INFO DAGScheduler: running: Set()
14/10/10 19:24:58 INFO DAGScheduler: waiting: Set(Stage 0)
14/10/10 19:24:58 INFO DAGScheduler: failed: Set()
14/10/10 19:24:58 INFO DAGScheduler: Missing parents for Stage 0: List()
14/10/10 19:24:58 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26), which is now runnable
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(2096) called with curMem=181820, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(1338) called with curMem=183916, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1338.0 B, free 265.2 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb174:41110 (size: 1338.0 B, free: 265.4 MB)
14/10/10 19:24:58 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/10/10 19:24:58 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/10 19:24:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, eb175, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, eb176, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb175:32903 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb176:33296 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@eb175:59119
14/10/10 19:24:58 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 144 bytes
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@eb176:39028
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 109 ms on eb176 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 120 ms on eb175 (2/2)
14/10/10 19:24:58 INFO DAGScheduler: Stage 0 (collect at WordCount.scala:26) finished in 0.123 s
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO SparkContext: Job finished: collect at WordCount.scala:26, took 3.815637915 s
(scala,1)
(Function2,1)
(JavaSparkContext,1)
(JavaRDD,1)
(Tuple2,1)
(,1)
(org,7)
(apache,7)
(JavaPairRDD,1)
(java,7)
(function,4)
(api,7)
(Function,1)
(PairFunction,1)
(spark,7)
(FlatMapFunction,1)
(import,8)
14/10/10 19:24:58 INFO SparkUI: Stopped Spark web UI at http://eb174:4040
14/10/10 19:24:58 INFO DAGScheduler: Stopping DAGScheduler
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Shutting down all executors
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@5e92c11b
14/10/10 19:24:58 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5e92c11b
java.nio.channels.CancelledKeyException
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:310)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/10/10 19:24:59 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/10/10 19:24:59 INFO ConnectionManager: Selector thread was interrupted!
14/10/10 19:24:59 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:59 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 WARN ConnectionManager: All connections not cleaned up
14/10/10 19:24:59 INFO ConnectionManager: ConnectionManager stopped
14/10/10 19:24:59 INFO MemoryStore: MemoryStore cleared
14/10/10 19:24:59 INFO BlockManager: BlockManager stopped
14/10/10 19:24:59 INFO BlockManagerMaster: BlockManagerMaster stopped
14/10/10 19:24:59 INFO SparkContext: Successfully stopped SparkContext
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/10/10 19:24:59 INFO Remoting: Remoting shut down
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

5. References
On using IDEA: "Scala从零开始:使用Intellij IDEA写hello world" (Scala from scratch: writing Hello World with IntelliJ IDEA)
Scala WordCount: "Spark wordcount开发并提交到集群运行" (developing a Spark wordcount and submitting it to a cluster)
Java WordCount: "用java编写spark程序,简单示例及运行" (writing a Spark program in Java: a simple example and how to run it) and "Spark在Yarn上运行Wordcount程序" (running a Wordcount program with Spark on YARN)