Spark Notes: Reading Hive Data from the spark-shell Client
1. Copy hive-site.xml into spark/conf, and copy mysql-connector-java-xxx-bin.jar into hive/lib (a minimal hive-site.xml sketch follows the Scala snippet below)
2. Start the Hive metastore service: hive --service metastore
3. Start the Hadoop daemons: sh $HADOOP_HOME/sbin/start-all.sh
4. Start the Spark daemons: sh $SPARK_HOME/sbin/start-all.sh
5. Launch the shell: spark-shell
6. Query Hive from Scala (Spark SQL):
scala> val conf = new SparkConf().setAppName("SparkHive").setMaster("local") // optional: spark-shell has already created a context automatically
scala> val sc = new SparkContext(conf) // optional: spark-shell has already created a context automatically
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ") // the field delimiter here must match the data file
scala> sqlContext.sql("LOAD DATA INPATH '/user/spark/src.txt' INTO TABLE src")
scala> sqlContext.sql("SELECT * FROM src").collect().foreach(println)
scala> sc.stop()
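For step 1, the two entries that matter most in hive-site.xml are the metastore URI and the warehouse directory; the values below are taken from the session log further down (thrift://192.168.66.66:9083 and /user/hive/warehouse). This is a minimal sketch, not a complete file — keep whatever other properties your existing hive-site.xml already carries, including the JDBC settings for the MySQL-backed metastore.

<configuration>
  <!-- address of the running metastore service (matches the log below) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.66.66:9083</value>
  </property>
  <!-- default warehouse location (matches the log below) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>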
The transcript of a sample session follows.

SQL context available as sqlContext.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
17/12/05 10:38:51 INFO HiveContext: Initializing execution hive, version 1.2.1
17/12/05 10:38:51 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
17/12/05 10:38:51 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
17/12/05 10:38:51 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore to file:/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore
17/12/05 10:38:51 INFO metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore;create=true
17/12/05 10:38:51 INFO HiveMetaStore: 0: Shutting down the object store...
17/12/05 10:38:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=Shutting down the object store...
17/12/05 10:38:51 INFO HiveMetaStore: 0: Metastore shutdown complete.
17/12/05 10:38:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=Metastore shutdown complete.
17/12/05 10:38:51 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/12/05 10:38:51 INFO ObjectStore: ObjectStore, initialize called
17/12/05 10:38:51 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/12/05 10:38:51 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
17/12/05 10:38:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/12/05 10:39:01 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/12/05 10:39:01 INFO ObjectStore: Initialized ObjectStore
17/12/05 10:39:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/12/05 10:39:02 INFO HiveMetaStore: Added admin role in metastore
17/12/05 10:39:02 INFO HiveMetaStore: Added public role in metastore
17/12/05 10:39:02 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/d66a519b-e512-4295-b707-0f688aa238ea_resources
17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea/_tmp_space.db
17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
17/12/05 10:39:02 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/12/05 10:39:02 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/12/05 10:39:02 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
17/12/05 10:39:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
17/12/05 10:39:08 INFO metastore: Trying to connect to metastore with URI thrift://192.168.66.66:9083
17/12/05 10:39:08 INFO metastore: Connected to metastore.
17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/4989df94-ba31-4ef6-ab78-369043e2067e_resources
17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e/_tmp_space.db
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3be94b12

scala> sqlContext.sql("use siat")
17/12/05 10:39:36 INFO ParseDriver: Parsing command: use siat
17/12/05 10:39:41 INFO ParseDriver: Parse Completed
17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:45 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:45 INFO ParseDriver: Parsing command: use siat
17/12/05 10:39:49 INFO ParseDriver: Parse Completed
17/12/05 10:39:50 INFO PerfLogger: </PERFLOG method=parse start=1512441585044 end=1512441590042 duration=4998 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:50 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:51 INFO Driver: Semantic Analysis Completed
17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441590188 end=1512441591560 duration=1372 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:51 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=compile start=1512441584491 end=1512441591758 duration=7267 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:51 INFO Driver: Concurrency mode is disabled, not creating a lock manager
17/12/05 10:39:51 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:51 INFO Driver: Starting command(queryId=root_20171205103945_2f994f07-9e52-456b-97ee-d03e722116ff): use siat
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441584488 end=1512441592212 duration=7724 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO Driver: Starting task [Stage-0:DDL] in serial mode
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=runTasks start=1512441592212 end=1512441592496 duration=284 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441591760 end=1512441592497 duration=737 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO Driver: OK
17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592571 end=1512441592571 duration=0 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441584478 end=1512441592571 duration=8093 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592612 end=1512441592613 duration=1 from=org.apache.hadoop.hive.ql.Driver>
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sqlContext.sql("drop table src")
17/12/05 10:40:13 INFO ParseDriver: Parsing command: drop table src
17/12/05 10:40:13 INFO ParseDriver: Parse Completed
17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:17 INFO ParseDriver: Parsing command: DROP TABLE src
17/12/05 10:40:17 INFO ParseDriver: Parse Completed
17/12/05 10:40:17 INFO PerfLogger: </PERFLOG method=parse start=1512441617979 end=1512441617998 duration=19 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO Driver: Semantic Analysis Completed
17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441617999 end=1512441619115 duration=1116 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=compile start=1512441617977 end=1512441619116 duration=1139 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO Hive: Dumping metastore api call timing information for : compilation phase
17/12/05 10:40:19 INFO Hive: Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=3999
17/12/05 10:40:19 INFO Driver: Concurrency mode is disabled, not creating a lock manager
17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO Driver: Starting command(queryId=root_20171205104017_dd3db388-5058-4af4-9076-90035b4837d9): DROP TABLE src
17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441617977 end=1512441619119 duration=1142 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:40:19 INFO Driver: Starting task [Stage-0:DDL] in serial mode
17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=runTasks start=1512441619119 end=1512441664030 duration=44911 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO Hive: Dumping metastore api call timing information for : execution phase
17/12/05 10:41:04 INFO Hive: Total time spent in this metastore function was greater than 1000ms : dropTable_(String, String, boolean, boolean, boolean, )=44266
17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441619118 end=1512441664031 duration=44913 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO Driver: OK
17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664032 end=1512441664032 duration=0 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441617976 end=1512441664051 duration=46075 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664054 end=1512441664054 duration=0 from=org.apache.hadoop.hive.ql.Driver>
res1: org.apache.spark.sql.DataFrame = []

scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ")
17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
17/12/05 10:41:57 INFO ParseDriver: Parse Completed
17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
17/12/05 10:41:57 INFO ParseDriver: Parse Completed
17/12/05 10:41:57 INFO PerfLogger: </PERFLOG method=parse start=1512441717568 end=1512441717619 duration=51 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO CalcitePlanner: Starting Semantic Analysis
17/12/05 10:41:58 INFO CalcitePlanner: Creating table siat.src position=27
17/12/05 10:41:58 INFO Driver: Semantic Analysis Completed
17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441717619 end=1512441718637 duration=1018 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=compile start=1512441717565 end=1512441718637 duration=1072 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO Driver: Concurrency mode is disabled, not creating a lock manager
17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO Driver: Starting command(queryId=root_20171205104157_e9b5ed54-e7dc-448a-984c-6d5cb37f964f): CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441717565 end=1512441718735 duration=1170 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:41:58 INFO Driver: Starting task [Stage-0:DDL] in serial mode
17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=runTasks start=1512441718735 end=1512441721846 duration=3111 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO Hive: Dumping metastore api call timing information for : execution phase
17/12/05 10:42:01 INFO Hive: Total time spent in this metastore function was greater than 1000ms : createTable_(Table, )=2431
17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441718638 end=1512441721849 duration=3211 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO Driver: OK
17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721852 end=1512441721882 duration=30 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441717564 end=1512441721883 duration=4319 from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721883 end=1512441721883 duration=0 from=org.apache.hadoop.hive.ql.Driver>
res2: org.apache.spark.sql.DataFrame = [result: string]

scala> sqlContext.sql("select * from src").collect().foreach(println)
17/12/05 10:42:54 INFO ParseDriver: Parsing command: select * from src
17/12/05 10:42:54 INFO ParseDriver: Parse Completed
17/12/05 10:42:56 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/05 10:42:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 467.6 KB, free 142.8 MB)
17/12/05 10:43:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.8 MB)
17/12/05 10:43:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
17/12/05 10:43:02 INFO SparkContext: Created broadcast 0 from collect at <console>:30
17/12/05 10:43:04 INFO FileInputFormat: Total input paths to process : 0
17/12/05 10:43:04 INFO SparkContext: Starting job: collect at <console>:30
17/12/05 10:43:04 INFO DAGScheduler: Job 0 finished: collect at <console>:30, took 0.043396 s

scala> val res = sqlContext.sql("select * from src").collect().foreach(println)
17/12/05 10:43:25 INFO ParseDriver: Parsing command: select * from src
17/12/05 10:43:25 INFO ParseDriver: Parse Completed
17/12/05 10:43:26 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 467.6 KB, free 142.3 MB)
17/12/05 10:43:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.3 MB)
17/12/05 10:43:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
17/12/05 10:43:27 INFO SparkContext: Created broadcast 1 from collect at <console>:29
17/12/05 10:43:27 INFO FileInputFormat: Total input paths to process : 0
17/12/05 10:43:27 INFO SparkContext: Starting job: collect at <console>:29
17/12/05 10:43:27 INFO DAGScheduler: Job 1 finished: collect at <console>:29, took 0.000062 s

scala> res

scala> val res = sqlContext.sql("select count(*) from src").collect().foreach(println)
17/12/05 10:43:47 INFO ParseDriver: Parsing command: select count(*) from src
17/12/05 10:43:47 INFO ParseDriver: Parse Completed
17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 467.0 KB, free 141.8 MB)
17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 40.4 KB, free 141.8 MB)
17/12/05 10:43:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.66.66:36024 (size: 40.4 KB, free: 143.1 MB)
17/12/05 10:43:48 INFO SparkContext: Created broadcast 2 from collect at <console>:29
17/12/05 10:43:49 INFO FileInputFormat: Total input paths to process : 0
17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
17/12/05 10:43:49 INFO SparkContext: Starting job: collect at <console>:29
17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
17/12/05 10:43:49 INFO DAGScheduler: Registering RDD 15 (collect at <console>:29)
17/12/05 10:43:49 INFO DAGScheduler: Got job 2 (collect at <console>:29) with 1 output partitions
17/12/05 10:43:49 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <console>:29)
17/12/05 10:43:49 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/12/05 10:43:49 INFO DAGScheduler: Missing parents: List()
17/12/05 10:43:49 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29), which has no missing parents
17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.0 KB, free 142.7 MB)
17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.0 KB, free 142.7 MB)
17/12/05 10:43:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.66.66:36024 (size: 6.0 KB, free: 143.2 MB)
17/12/05 10:43:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
17/12/05 10:43:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29)
17/12/05 10:43:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/12/05 10:44:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:44:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:44:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:45:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:45:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:45:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:45:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:45:57 INFO AppClient$ClientEndpoint: Executor added: app-20171205103712-0001/0 on worker-20171204180628-192.168.66.66-7078 (192.168.66.66:7078) with 2 cores
17/12/05 10:45:57 INFO SparkDeploySchedulerBackend: Granted executor ID app-20171205103712-0001/0 on hostPort 192.168.66.66:7078 with 2 cores, 512.0 MB RAM
17/12/05 10:45:59 INFO AppClient$ClientEndpoint: Executor updated: app-20171205103712-0001/0 is now RUNNING
17/12/05 10:46:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:46:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:46:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/12/05 10:46:46 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (xinfang:10363) with ID 0
17/12/05 10:46:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xinfang, partition 0,PROCESS_LOCAL, 1999 bytes)
17/12/05 10:46:48 INFO BlockManagerMasterEndpoint: Registering block manager xinfang:34620 with 143.3 MB RAM, BlockManagerId(0, xinfang, 34620)
17/12/05 10:46:51 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on xinfang:34620 (size: 6.0 KB, free: 143.2 MB)
17/12/05 10:47:07 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xinfang:10363
17/12/05 10:47:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 82 bytes
17/12/05 10:47:14 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 27243 ms on xinfang (1/1)
17/12/05 10:47:14 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/05 10:47:14 INFO DAGScheduler: ResultStage 1 (collect at <console>:29) finished in 204.228 s
17/12/05 10:47:14 INFO DAGScheduler: Job 2 finished: collect at <console>:29, took 204.785107 s
[0]

scala> res

scala> sc.stop()
17/12/05 10:48:32 INFO SparkUI: Stopped Spark web UI at http://192.168.66.66:4041
17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Shutting down all executors
17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
17/12/05 10:48:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/05 10:48:36 INFO MemoryStore: MemoryStore cleared
17/12/05 10:48:36 INFO BlockManager: BlockManager stopped
17/12/05 10:48:36 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/05 10:48:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/12/05 10:48:36 INFO SparkContext: Successfully stopped SparkContext

scala> 17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/12/05 10:48:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
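Two things in this transcript are worth calling out. First, every scan reports "Total input paths to process : 0" and the final count comes back as [0]: the src table was freshly created and no LOAD DATA had been run in this session, so the queries legitimately return nothing. Second, the repeated "Initial job has not accepted any resources" warning means no executor had yet registered with the application; the count job only proceeded once executor app-20171205103712-0001/0 came up with 2 cores and 512 MB. If the warning persists, check the master web UI for registered workers and free resources, and consider launching the shell against the standalone master with explicit resource requests — a sketch, assuming the default standalone master port 7077:

spark-shell --master spark://192.168.66.66:7077 --executor-memory 512m --total-executor-cores 2

The memory and core figures above simply mirror what the log shows being granted; size them to your own cluster.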