[Spark][Python] An example of Spark reading MySQL and creating a DataFrame:

Here the accounts table in the loudacre MySQL database is read through Spark's JDBC data source:
mydf001 = sqlContext.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost/loudacre") \
    .option("dbtable", "accounts") \
    .option("user", "training") \
    .option("password", "training") \
    .load()
In [10]: mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
....: .option("dbtable","accounts").option("user","training").option("password","training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame
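Two things are worth noting. First, load() only contacts MySQL to resolve the table schema; no rows are read yet, which is why the output above shows nothing but HiveContext and metastore initialization. Second, the result is an ordinary DataFrame, so it can be inspected before any job runs. A small sketch; the column names in select() are hypothetical, for illustration only:

# printSchema() prints the schema Spark fetched from the table
# metadata at load() time; it does not start a Spark job.
mydf001.printSchema()

# select() is a lazy transformation; show() is the action that
# actually reads rows. Column names here are assumptions.
mydf001.select("acct_num", "city").show(5)

Running an action such as count() triggers the actual read: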
In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 32 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761
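Out[12] shows that accounts has 129,761 rows. Notice in the scheduler log that the job ran with a single task per stage ("Adding task set 0.0 with 1 tasks"), so the whole table was pulled through one JDBC connection. For larger tables the read can be partitioned so that several tasks fetch ranges in parallel. A minimal sketch, assuming acct_num is a roughly evenly distributed numeric key; the column name and bounds are hypothetical:

# Partitioned JDBC read: Spark splits [0, 130000) on acct_num into
# 4 ranges and reads them with 4 parallel tasks/connections.
mydf_part = sqlContext.read.jdbc(
    url="jdbc:mysql://localhost/loudacre",
    table="accounts",
    column="acct_num",
    lowerBound=0,
    upperBound=130000,
    numPartitions=4,
    properties={"user": "training", "password": "training"},
)
print(mydf_part.rdd.getNumPartitions())  # expect 4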