1. Spark Shell Testing

The Spark Shell is a tool well suited to rapid prototyping of Spark programs, and it also helps you get familiar with Scala. Even if you do not know Scala, you can still use it. The Spark Shell lets users interact with a Spark cluster and submit queries on the fly, which makes it handy both for debugging and for Spark beginners.

Test case 1:

[Spark@Master spark]$ MASTER=spark://Master:7077 bin/spark-shell  // connect to the cluster
Spark assembly has been built with Hive, including Datanucleus jars on classpath
// :: INFO spark.SecurityManager: Changing view acls to: Spark,
// :: INFO spark.SecurityManager: Changing modify acls to: Spark,
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
// :: INFO spark.HttpServer: Starting HTTP Server
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'HTTP class server' on port .
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10. (Java HotSpot(TM) -Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
// :: INFO spark.SecurityManager: Changing view acls to: Spark,
// :: INFO spark.SecurityManager: Changing modify acls to: Spark,
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
// :: INFO slf4j.Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:45322]
// :: INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:45322]
// :: INFO util.Utils: Successfully started service 'sparkDriver' on port .
// :: INFO spark.SparkEnv: Registering MapOutputTracker
// :: INFO spark.SparkEnv: Registering BlockManagerMaster
// :: INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local--e9cc
// :: INFO util.Utils: Successfully started service 'Connection manager for block manager' on port .
// :: INFO network.ConnectionManager: Bound socket to port with id = ConnectionManagerId(Master,)
// :: INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
// :: INFO storage.BlockManagerMaster: Trying to register BlockManager
// :: INFO storage.BlockManagerMasterActor: Registering block manager Master: with 267.3 MB RAM
// :: INFO storage.BlockManagerMaster: Registered BlockManager
// :: INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-87ad77b3-40b1--958f-b1d632f2b4f5
// :: INFO spark.HttpServer: Starting HTTP Server
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'HTTP file server' on port .
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'SparkUI' on port .
// :: INFO ui.SparkUI: Started SparkUI at http://Master:4040
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
// :: INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
// :: INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala>
// :: INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app--
// :: INFO client.AppClient$ClientActor: Executor added: app--/ on worker--Slave1- (Slave1:) with cores
// :: INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app--/ on hostPort Slave1: with cores, 512.0 MB RAM
// :: INFO client.AppClient$ClientActor: Executor added: app--/ on worker--Slave2- (Slave2:) with cores
// :: INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app--/ on hostPort Slave2: with cores, 512.0 MB RAM
// :: INFO client.AppClient$ClientActor: Executor updated: app--/ is now RUNNING
// :: INFO client.AppClient$ClientActor: Executor updated: app--/ is now RUNNING
// :: INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:41369/user/Executor#-1591583962] with ID 0
// :: INFO storage.BlockManagerMasterActor: Registering block manager Slave1: with 267.3 MB RAM
// :: INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave2:47569/user/Executor#-1622351454] with ID 1
// :: INFO storage.BlockManagerMasterActor: Registering block manager Slave2: with 267.3 MB RAM

scala> val file = sc.textFile("hdfs://Master:9000/data/test1")
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master: (size: 12.6 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
file: org.apache.spark.rdd.RDD[String] = hdfs://Master:9000/data/test1 MappedRDD[1] at textFile at <console>:12

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
// :: INFO mapred.FileInputFormat: Total input paths to process :
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[] at reduceByKey at <console>:

scala> count.collect()
// :: INFO spark.SparkContext: Starting job: collect at <console>:
// :: INFO scheduler.DAGScheduler: Registering RDD (map at <console>:)
// :: INFO scheduler.DAGScheduler: Got job (collect at <console>:) with output partitions (allowLocal=false)
// :: INFO scheduler.DAGScheduler: Final stage: Stage (collect at <console>:)
// :: INFO scheduler.DAGScheduler: Parents of final stage: List(Stage )
// :: INFO scheduler.DAGScheduler: Missing parents: List(Stage )
// :: INFO scheduler.DAGScheduler: Submitting Stage (MappedRDD[] at map at <console>:), which has no missing parents
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master: (size: 2.0 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from Stage (MappedRDD[] at map at <console>:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with tasks
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID , Slave2, NODE_LOCAL, bytes)
// :: INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID , Slave1, NODE_LOCAL, bytes)
// :: INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:]
// :: INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:]
// :: INFO network.ConnectionManager: Accepted connection from [Slave2/192.168.8.31:]
// :: INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:], messages pending
// :: INFO network.SendingConnection: Initiating connection to [Slave2/192.168.8.31:]
// :: INFO network.SendingConnection: Connected to [Slave2/192.168.8.31:], messages pending
// :: INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1: (size: 2.0 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave2: (size: 2.0 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1: (size: 12.6 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave2: (size: 12.6 KB, free: 267.3 MB)
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID ) in ms on Slave2 (/)
// :: INFO scheduler.DAGScheduler: Stage (map at <console>:) finished in 8.626 s
// :: INFO scheduler.DAGScheduler: looking for newly runnable stages
// :: INFO scheduler.DAGScheduler: running: Set()
// :: INFO scheduler.DAGScheduler: waiting: Set(Stage )
// :: INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.DAGScheduler: failed: Set()
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
// :: INFO scheduler.DAGScheduler: Missing parents for Stage : List()
// :: INFO scheduler.DAGScheduler: Submitting Stage (ShuffledRDD[] at reduceByKey at <console>:), which is now runnable
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 267.1 MB)
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1327.0 B, free 267.1 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master: (size: 1327.0 B, free: 267.3 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from Stage (ShuffledRDD[] at reduceByKey at <console>:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID , Slave2, PROCESS_LOCAL, bytes)
// :: INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID , Slave1, PROCESS_LOCAL, bytes)
// :: INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1: (size: 1327.0 B, free: 267.3 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave2: (size: 1327.0 B, free: 267.3 MB)
// :: INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle to sparkExecutor@Slave1:
// :: INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle is bytes
// :: INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle to sparkExecutor@Slave2:
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on Slave2 (/)
// :: INFO scheduler.DAGScheduler: Stage (collect at <console>:) finished in 0.179 s
// :: INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO spark.SparkContext: Job finished: collect at <console>:, took 8.947687849 s
res0: Array[(String, Int)] = Array((spark,1), (hadoop,2), (hbase,1))

scala>
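With the INFO logging stripped away, the entire interactive session above boils down to three inputs at the scala> prompt (these require a live spark-shell connected to the cluster, so they are shown here only as a recap):

```scala
// sc is the SparkContext that spark-shell creates automatically
val file = sc.textFile("hdfs://Master:9000/data/test1")                    // RDD of lines
val count = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)      // word -> total
count.collect()   // res0: Array((spark,1), (hadoop,2), (hbase,1))
```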

Test case 2:

Run one of the example programs bundled with Spark:

[Spark@Master spark]$ bin/run-example org.apache.spark.examples.SparkPi 2 spark://192.168.8.29:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
// :: INFO spark.SecurityManager: Changing view acls to: Spark,
// :: INFO spark.SecurityManager: Changing modify acls to: Spark,
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
// :: INFO slf4j.Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:60670]
// :: INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:60670]
// :: INFO util.Utils: Successfully started service 'sparkDriver' on port .
// :: INFO spark.SparkEnv: Registering MapOutputTracker
// :: INFO spark.SparkEnv: Registering BlockManagerMaster
// :: INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local--
// :: INFO util.Utils: Successfully started service 'Connection manager for block manager' on port .
// :: INFO network.ConnectionManager: Bound socket to port with id = ConnectionManagerId(Master,)
// :: INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
// :: INFO storage.BlockManagerMaster: Trying to register BlockManager
// :: INFO storage.BlockManagerMasterActor: Registering block manager Master: with 267.3 MB RAM
// :: INFO storage.BlockManagerMaster: Registered BlockManager
// :: INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark---4e30-89a3-83a560210e14
// :: INFO spark.HttpServer: Starting HTTP Server
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'HTTP file server' on port .
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'SparkUI' on port .
// :: INFO ui.SparkUI: Started SparkUI at http://Master:4040
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/lib/spark-examples-1.1.-hadoop2.4.0.jar at http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
// :: INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@Master:60670/user/HeartbeatReceiver
// :: INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:
// :: INFO scheduler.DAGScheduler: Got job (reduce at SparkPi.scala:) with output partitions (allowLocal=false)
// :: INFO scheduler.DAGScheduler: Final stage: Stage (reduce at SparkPi.scala:)
// :: INFO scheduler.DAGScheduler: Parents of final stage: List()
// :: INFO scheduler.DAGScheduler: Missing parents: List()
// :: INFO scheduler.DAGScheduler: Submitting Stage (MappedRDD[] at map at SparkPi.scala:), which has no missing parents
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 267.3 MB)
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from Stage (MappedRDD[] at map at SparkPi.scala:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO executor.Executor: Running task 0.0 in stage 0.0 (TID )
// :: INFO executor.Executor: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
// :: INFO util.Utils: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar to /tmp/fetchFileTemp7489373377783107634.tmp
// :: INFO executor.Executor: Adding file:/tmp/spark-ad7b4d7f--406b-b3a9-21bd79fddf9f/spark-examples-1.1.-hadoop2.4.0.jar to class loader
// :: INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID ). bytes result sent to driver
// :: INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO executor.Executor: Running task 1.0 in stage 0.0 (TID )
// :: INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID ). bytes result sent to driver
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on localhost (/)
// :: INFO scheduler.DAGScheduler: Stage (reduce at SparkPi.scala:) finished in 0.936 s
// :: INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID ) in ms on localhost (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:, took 1.3590325 s
Pi is roughly 3.13872
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
// :: INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
// :: INFO scheduler.DAGScheduler: Stopping DAGScheduler
// :: INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
// :: INFO network.ConnectionManager: Selector thread was interrupted!
// :: INFO network.ConnectionManager: ConnectionManager stopped
// :: INFO storage.MemoryStore: MemoryStore cleared
// :: INFO storage.BlockManager: BlockManager stopped
// :: INFO storage.BlockManagerMaster: BlockManagerMaster stopped
// :: INFO spark.SparkContext: Successfully stopped SparkContext
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
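SparkPi estimates π by Monte Carlo sampling: it throws random points into the square [-1, 1] × [-1, 1] and counts the fraction that land inside the unit circle, which approaches π/4. The following is a minimal, non-distributed sketch of the same idea in plain Scala (not the bundled example itself, which parallelizes the sampling across the cluster):

```scala
import scala.util.Random

object LocalPi {
  def main(args: Array[String]): Unit = {
    val n = 100000
    // Count random points of the square that fall inside the unit circle
    val inside = (1 to n).count { _ =>
      val x = Random.nextDouble() * 2 - 1
      val y = Random.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    // Area ratio circle/square = pi/4, so scale the hit fraction by 4
    println(s"Pi is roughly ${4.0 * inside / n}")
  }
}
```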

2. Write a Spark program with IntelliJ IDEA (Scala plugin), package it as a .jar, and submit it to the Spark cluster to run

The code of com.husor.Test.WordCount.scala is as follows:

package com.husor.Test

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._

/**
 * Created by huxiu on 2014/11/27.
 */
object WordCount {
  def main(args: Array[String]) {
    println("Test is starting......")

    if (args.length < 2) {
      System.err.println("Usage: HDFS_InputFile <File> HDFS_OutputDir <Directory>")
      System.exit(1)
    }

    //System.setProperty("hadoop.home.dir", "d:\\winutil\\")

    val conf = new SparkConf().setAppName("WordCount")
                              .setSparkHome("SPARK_HOME")
    val spark = new SparkContext(conf)
    //val spark = new SparkContext("local", "WordCount")

    val file = spark.textFile(args(0))

    // Print the counts on the console instead of saving them:
    //file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    //val wordcounts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
    val wordCounts = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.saveAsTextFile(args(1))

    spark.stop()
    println("Test is Succeed!!!")
  }
}

The corresponding launch script, runSpark.sh, is:

#!/bin/bash

set -x

spark-submit \
--class com.husor.Test.WordCount \
--master spark://Master:7077 \
--executor-memory 512m \
--total-executor-cores 1 \
/home/Spark/husor/spark/SparkTest.jar \
hdfs://Master:9000/data/test1 \
hdfs://Master:9000/user/huxiu/SparkWordCount

Make runSpark.sh executable (chmod +x runSpark.sh); the run then looks like this:

[Spark@Master spark]$ ./runSpark.sh
+ spark-submit --class com.husor.Test.WordCount --master spark://Master:7077 --executor-memory 512m --total-executor-cores 1 /home/Spark/husor/spark/SparkTest.jar hdfs://Master:9000/data/test1 hdfs://Master:9000/user/huxiu/SparkWordCount
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Test is starting......
// :: INFO spark.SecurityManager: Changing view acls to: Spark,
// :: INFO spark.SecurityManager: Changing modify acls to: Spark,
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
// :: INFO slf4j.Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:37899]
// :: INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:37899]
// :: INFO util.Utils: Successfully started service 'sparkDriver' on port .
// :: INFO spark.SparkEnv: Registering MapOutputTracker
// :: INFO spark.SparkEnv: Registering BlockManagerMaster
// :: INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local--
// :: INFO util.Utils: Successfully started service 'Connection manager for block manager' on port .
// :: INFO network.ConnectionManager: Bound socket to port with id = ConnectionManagerId(Master,)
// :: INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
// :: INFO storage.BlockManagerMaster: Trying to register BlockManager
// :: INFO storage.BlockManagerMasterActor: Registering block manager Master: with 267.3 MB RAM
// :: INFO storage.BlockManagerMaster: Registered BlockManager
// :: INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-83b486ec--4f71-be00-0418e485151f
// :: INFO spark.HttpServer: Starting HTTP Server
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'HTTP file server' on port .
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'SparkUI' on port .
// :: INFO ui.SparkUI: Started SparkUI at http://Master:4040
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/SparkTest.jar at http://Master:34902/jars/SparkTest.jar with timestamp 1417407052941
// :: INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
// :: INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
// :: INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app--
// :: INFO client.AppClient$ClientActor: Executor added: app--/ on worker--Slave1- (Slave1:) with cores
// :: INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app--/ on hostPort Slave1: with cores, 512.0 MB RAM
// :: INFO client.AppClient$ClientActor: Executor updated: app--/ is now RUNNING
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master: (size: 12.6 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
// :: INFO mapred.FileInputFormat: Total input paths to process :
// :: INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
// :: INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
// :: INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
// :: INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
// :: INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
// :: INFO spark.SparkContext: Starting job: saveAsTextFile at WordCount.scala:
// :: INFO scheduler.DAGScheduler: Registering RDD (map at WordCount.scala:)
// :: INFO scheduler.DAGScheduler: Got job (saveAsTextFile at WordCount.scala:) with output partitions (allowLocal=false)
// :: INFO scheduler.DAGScheduler: Final stage: Stage (saveAsTextFile at WordCount.scala:)
// :: INFO scheduler.DAGScheduler: Parents of final stage: List(Stage )
// :: INFO scheduler.DAGScheduler: Missing parents: List(Stage )
// :: INFO scheduler.DAGScheduler: Submitting Stage (MappedRDD[] at map at WordCount.scala:), which has no missing parents
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master: (size: 2.0 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from Stage (MappedRDD[] at map at WordCount.scala:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with tasks
// :: INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:38410/user/Executor#898843507] with ID 0
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID , Slave1, NODE_LOCAL, bytes)
// :: INFO storage.BlockManagerMasterActor: Registering block manager Slave1: with 267.3 MB RAM
// :: INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:]
// :: INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:]
// :: INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:], messages pending
// :: INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1: (size: 2.0 KB, free: 267.3 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1: (size: 12.6 KB, free: 267.3 MB)
// :: INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID , Slave1, NODE_LOCAL, bytes)
// :: INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
// :: INFO scheduler.DAGScheduler: Stage (map at WordCount.scala:) finished in 4.444 s
// :: INFO scheduler.DAGScheduler: looking for newly runnable stages
// :: INFO scheduler.DAGScheduler: running: Set()
// :: INFO scheduler.DAGScheduler: waiting: Set(Stage )
// :: INFO scheduler.DAGScheduler: failed: Set()
// :: INFO scheduler.DAGScheduler: Missing parents for Stage : List()
// :: INFO scheduler.DAGScheduler: Submitting Stage (MappedRDD[] at saveAsTextFile at WordCount.scala:), which is now runnable
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.2 KB, free 267.0 MB)
// :: INFO storage.MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.4 KB, free 267.0 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master: (size: 19.4 KB, free: 267.2 MB)
// :: INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from Stage (MappedRDD[] at saveAsTextFile at WordCount.scala:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID , Slave1, PROCESS_LOCAL, bytes)
// :: INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1: (size: 19.4 KB, free: 267.2 MB)
// :: INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle to sparkExecutor@Slave1:
// :: INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle is bytes
// :: INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID , Slave1, PROCESS_LOCAL, bytes)
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.DAGScheduler: Stage (saveAsTextFile at WordCount.scala:) finished in 0.710 s
// :: INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID ) in ms on Slave1 (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO spark.SparkContext: Job finished: saveAsTextFile at WordCount.scala:, took 5.556490798 s
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
// :: INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
// :: INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
// :: INFO scheduler.DAGScheduler: Stopping DAGScheduler
// :: INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
// :: INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
// :: INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(Slave1,)
// :: INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,)
// :: INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,)
// :: INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
// :: INFO network.ConnectionManager: Selector thread was interrupted!
// :: INFO network.ConnectionManager: ConnectionManager stopped
// :: INFO storage.MemoryStore: MemoryStore cleared
// :: INFO storage.BlockManager: BlockManager stopped
// :: INFO storage.BlockManagerMaster: BlockManagerMaster stopped
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
// :: INFO spark.SparkContext: Successfully stopped SparkContext
Test is Succeed!!!
// :: INFO Remoting: Remoting shut down
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00001
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(spark,)
(hadoop,)
(hbase,)
[Spark@Master spark]$ hdfs dfs -ls /user/huxiu/SparkWordCount/
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found items
-rw-r--r-- Spark huxiu -- : /user/huxiu/SparkWordCount/_SUCCESS
-rw-r--r-- Spark huxiu -- : /user/huxiu/SparkWordCount/part-
-rw-r--r-- Spark huxiu -- : /user/huxiu/SparkWordCount/part-
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
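The (word,count) pairs printed by hdfs dfs -cat above were produced by the SparkWordCount job. Its source is not shown in this post, but the core logic can be sketched in plain Scala without a SparkContext (the object and method names here are hypothetical, and plain collections stand in for the RDD):

```scala
// Spark-free sketch of the word-count logic behind the output above.
// In the real job, flatMap/map/reduceByKey run on an RDD read from HDFS;
// here ordinary Scala collections stand in for the RDD.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group equal words together
      .map { case (word, occs) => (word, occs.size) } // count each group

  def main(args: Array[String]): Unit = {
    // Sample input standing in for the HDFS file the job actually read
    val counts = wordCount(Seq("spark hadoop", "spark hbase"))
    counts.toSeq.sortBy(_._1).foreach { case (w, c) => println(s"($w,$c)") }
  }
}
```

In the real job the same pipeline would be expressed as rdd.flatMap(...).map((_, 1)).reduceByKey(_ + _) and saved back to HDFS, which is where the part-NNNNN files in the listing come from.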

Note:

During the run you may hit the exception "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". Memory was clearly sufficient, yet the job still could not acquire any resources. Checking the firewall revealed the cause: the nodes only allowed access on port 80 and blocked everything else.

Solution:

Stop the firewall on every node (service iptables stop), then rerun the runSpark.sh script described above on the Spark on YARN cluster.
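With more than a couple of nodes it is easier to stop the firewall over SSH from the master. A minimal sketch, assuming the node names from this post (Master, Slave1, Slave2) and passwordless SSH; the DRY_RUN toggle is an addition for safety, and note that on CentOS 7+ the service is firewalld rather than iptables:

```shell
#!/bin/sh
# Stop iptables on every node in the cluster.
# Node names are assumptions taken from the surrounding text.
NODES="Master Slave1 Slave2"
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually execute the ssh commands

for node in $NODES; do
  cmd="ssh $node 'service iptables stop && chkconfig iptables off'"
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $cmd"    # preview mode: show what would be executed
  else
    eval "$cmd"
  fi
done
```

A better long-term fix than disabling the firewall outright is to open the ports Spark actually uses (7077 for the master, the executor and driver ports, and 4040/8080 for the web UIs).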
