Big Data from Beginner to Expert 11 - Spark DataFrame Basic Operations

More articles related to "Big Data from Beginner to Expert 11 - Spark DataFrame Basic Operations"

Big Data from Beginner to Expert 14 -- String Operations in Hive

I. Basic operations: concat(string, string, string); concat_ws(string, string, string), e.g. select customer_id, concat_ws(" ", first_name, last_name), email, address_id from customer; lower(string); initcap(string); the if expression, e.g. select customer_id, if (length(first_name)>6 , substring…
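To make the return values of these functions concrete without a Hive installation, here is a minimal plain-Scala sketch of what concat, concat_ws, lower and initcap produce. The object and method names (HiveStringFunctions, concatWs and so on) are hypothetical stand-ins, not Hive's implementation, and NULL handling is deliberately left out.

```scala
// Hypothetical plain-Scala stand-ins for the Hive string functions above.
// NULL handling is ignored; only the non-NULL behavior is illustrated.
object HiveStringFunctions {
  // concat(a, b, ...): join the arguments with no separator
  def concat(parts: String*): String = parts.mkString

  // concat_ws(sep, a, b, ...): join the arguments with sep between each pair
  def concatWs(sep: String, parts: String*): String = parts.mkString(sep)

  // lower(s): convert every character to lower case
  def lower(s: String): String = s.toLowerCase

  // initcap(s): first letter of each word upper case, the rest lower case.
  // Simplification: words are split on single spaces only.
  def initcap(s: String): String =
    s.split(" ", -1)
      .map(w => if (w.isEmpty) w else w.substring(0, 1).toUpperCase + w.substring(1).toLowerCase)
      .mkString(" ")
}
```

For example, HiveStringFunctions.concatWs(" ", "MARY", "SMITH") gives "MARY SMITH", which initcap then turns into "Mary Smith", the same shape of output the customer query above produces for a full name.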

Big Data from Beginner to Expert 12 -- Registering a Spark DataFrame as a Hive Temporary Table

I. Load the initial data and turn it into a DataFrame: val ny = sc.textFile("data/new_york/"); val header = ny.first; val filterNY = ny.filter(listing => { listing.split(",").size == 14 && listing != header }); val nyMap = filterNY.map(listing => { val listingInfo = listing.…
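The filter-and-map step in this snippet can be sketched on a local Seq instead of an RDD, which makes the logic easy to check outside spark-shell. The Listing case class and ListingParser object are hypothetical: the original field names are cut off, so the record simply keeps the 14 raw columns. In spark-shell the same lambdas would go into ny.filter(...) and filterNY.map(...), and an RDD of case-class records can then be turned into a DataFrame with toDF() after importing the session's implicits (for example import spark.implicits._ in Spark 2.x).

```scala
// Plain-Scala sketch of the header filter + parse step, on a local Seq instead of an RDD.
object ListingParser {
  // Hypothetical record type: the original column names are truncated,
  // so each listing simply keeps its 14 raw fields in order.
  final case class Listing(fields: Vector[String])

  // The original snippet keeps only rows with exactly 14 comma-separated fields
  val ExpectedColumns: Int = 14

  // Mirrors ny.filter(...): drop the header line and any malformed row
  def filterRows(lines: Seq[String]): Seq[String] =
    if (lines.isEmpty) Seq.empty
    else {
      val header = lines.head // plays the role of ny.first
      lines.filter(line => line.split(",").size == ExpectedColumns && line != header)
    }

  // Mirrors filterNY.map(...): turn each remaining line into a record
  def parse(lines: Seq[String]): Seq[Listing] =
    filterRows(lines).map(line => Listing(line.split(",").toVector))
}
```

One side effect worth knowing: Java's String.split(",") drops trailing empty strings and does not understand quoted fields, so a row whose last columns are empty, or which has a comma inside a quoted value, fails the size check and is silently discarded, both here and in the original RDD filter.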

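Big Data from Beginner to Expert 11 - Spark DataFrame Basic Operations

// dataframe is the topic. I. Get the base data, first loading it as an RDD: val ny = sc.textFile("data/new_york/"); val header = ny.first; val filterNY = ny.filter(listing => { listing.split(",").size == 14 && listing != header }) // Because the DataFrame will mostly be processed as a table from here on, we add here…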

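Big Data from Beginner to Expert 18 -- Importing a Relational Database into HDFS and Hive Tables with Sqoop

I. Choose the database; here we use the standard MySQL sakila database: mysql -u root -D sakila -p. II. First, try importing a table's data into HDFS files, so that Spark can later process it as a DataFrame or an RDD: sqoop import --connect "jdbc:mysql://host03.xyy:3306/sakila" --username root --password root --table rental --target-dir "Sqo…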

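Big Data from Beginner to Expert 2 -- Three Ways to Get Data into a Spark RDD

Log in to the operating system as the hdfs or spark user and run spark-shell. spark-shell also accepts arguments, which override the defaults: spark-shell --master yarn --num-executors 2 --executor-memory 2G --driver-memory 1536M. The defaults are usually set in /etc/spark/conf/spark-env.sh. I. Getting data from an array. 1. Generate the array by enumerating its elements: val arr = Array(1,2,3,4,5,6,7)…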
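The idea of building an RDD from a local array can be illustrated without a cluster. In spark-shell, sc.parallelize(arr, numSlices) distributes the array across partitions; the sketch below approximates how a local collection is cut into contiguous slices. The ArraySource object and its slices helper are hypothetical, and the index arithmetic is modeled on Spark's ParallelCollectionRDD as far as I know, so treat it as an illustration rather than Spark's actual code.

```scala
// Plain-Scala approximation of splitting a local array into partitions,
// in the spirit of sc.parallelize(arr, numSlices). Not Spark's actual code.
object ArraySource {
  // 1. Generate the array by enumerating its elements, as in the snippet above
  val arr: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7)

  // Slice i covers indices [i * n / numSlices, (i + 1) * n / numSlices)
  def slices[T](data: Array[T], numSlices: Int): List[List[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val all = data.toList
    val n = data.length.toLong
    (0 until numSlices).toList.map { i =>
      val start = ((i * n) / numSlices).toInt
      val end = (((i + 1) * n) / numSlices).toInt
      all.slice(start, end)
    }
  }
}
```

For example, ArraySource.slices(ArraySource.arr, 2) yields List(List(1, 2, 3), List(4, 5, 6, 7)). When the length does not divide evenly, slice sizes differ by at most one, and every element lands in exactly one slice.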

Big Data from Beginner to Expert 13 -- Preparing the MySQL Database for Later Lessons

We will be using the sakila database extensively in the rest of the course, so please follow the installation process below. Importing the Sakila Database. I. Change the file. (This step may already have been done in the originally provided files.) Find and replace all "InnoDB…


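Big Data from Beginner to Expert 9 - A Real WordCount

Big Data from Beginner to Expert 8 - MapReduce Operations on Spark RDDs with Composite Keys and Composite Values

Big Data from Beginner to Expert 5 -- Using the reduce Method of Spark RDDs

Big Data from Beginner to Expert 4 -- How to Use map on Spark RDDs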