Training series 5: using the reduce method of a Spark RDD 1. Prepare the data in the spark-shell environment val collegesRdd = sc.textFile("/user/hdfs/CollegeNavigator.csv") val header = collegesRdd.first val headerlessRdd = collegesRdd.filter( line => { line != header } ) 2. Prepare the map of student counts val countStuMap= he…
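Since the excerpt above is cut off before the reduce step itself, here is a minimal sketch of the pattern, assuming a spark-shell session where `sc` is already defined. The sample rows, the two-column layout, and the names `sampleRdd`, `studentCounts`, and `totalStudents` are hypothetical and are not taken from CollegeNavigator.csv.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
// Hypothetical sample rows in a made-up "name,studentCount" layout.
val sampleRdd = sc.parallelize(Seq(
  "College A,1200",
  "College B,800",
  "College C,500"
))

// Map each line to its student count (the second field in this made-up layout).
val studentCounts = sampleRdd.map(line => line.split(",")(1).toInt)

// reduce combines the elements pairwise; the function should be
// associative and commutative because partitions are combined in any order.
val totalStudents = studentCounts.reduce((a, b) => a + b)
```

Addition is a safe choice for reduce because the result does not depend on how the data is partitioned.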
Having covered the RDD filter earlier, this post looks at Spark's map. 1. Load the file val collegesRdd = sc.textFile("/user/hdfs/CollegeNavigator.csv") val header = collegesRdd.first 2. Use filter to keep only the data rows val headerlessRdd = collegesRdd.filter( line => { line != header } ) 3. Check the actual data format scala> h…
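Log in to the operating system as the hdfs or spark user and run spark-shell. spark-shell also accepts parameters, which override the defaults: spark-shell --master yarn --num-executors 2 --executor-memory 2G --driver-memory 1536M The defaults are usually set in /etc/spark/conf/spark-env.sh I. Obtain it automatically from an Array 1. Build the array by listing its elements val arr=Array(1,2,3,4,5,6,7)…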
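The excerpt above stops right after the array is defined. One common way to turn a local array into an RDD is `sc.parallelize`; the sketch below assumes that is the intended next step (the original text is truncated, so this is an assumption) and that it runs inside spark-shell. The names `arrRdd` and `collected` are hypothetical.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
val arr = Array(1, 2, 3, 4, 5, 6, 7)

// Assumption: distribute the local array across the cluster as an RDD[Int].
val arrRdd = sc.parallelize(arr)

// collect() brings the elements back to the driver as a local Array.
val collected = arrRdd.collect()
```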
I. Prepare the base data. This time we use the flights data. scala> val flights = sc.textFile("/user/hdfs/data/Flights/flights.csv") flights: org.apache.spark.rdd.RDD[String] = /user/hdfs/data/Flights/flights.csv MapPartitionsRDD[3] at textFile at <console>:24 scala> val…
I. How to use filter on an RDD 1. Remove the header row (the first line) scala> val collegesRdd = sc.textFile("/user/hdfs/CollegeNavigator.csv") collegesRdd: org.apache.spark.rdd.RDD[String] = /user/hdfs/CollegeNavigator.csv MapPartitionsRDD[3] at textFile at <console>:24 scala>…
// dataframe is the topic I. Get the base data. First load the data as an RDD val ny = sc.textFile("data/new_york/") val header = ny.first val filterNY = ny.filter(listing=>{ listing.split(",").size==14 && listing!=header }) // Because the DataFrame is mostly processed in tabular form later on, here we add…
//groupbykey I. Prepare the data val flights=sc.textFile("data/Flights/flights.csv") val sampleFlights=sc.parallelize(flights.take(1000)) val header=sampleFlights.first val filteredFlights=sampleFlights.filter(line=>{ line!=header&&line.split(",&…
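The excerpt above ends before groupByKey is actually applied. As a rough illustration of what the operation does, the sketch below uses hypothetical (carrier, delay) pairs rather than the real flights.csv columns, and assumes a spark-shell session where `sc` is defined. The names `pairs`, `grouped`, and `totals` are made up for this example.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
// Hypothetical key-value pairs; not taken from flights.csv.
val pairs = sc.parallelize(Seq(("AA", 10), ("DL", 5), ("AA", 7)))

// groupByKey gathers all values that share a key: RDD[(String, Iterable[Int])].
val grouped = pairs.groupByKey()

// Sum the grouped values per key and bring the small result to the driver.
val totals = grouped.mapValues(_.sum).collectAsMap()
```

For plain aggregations such as this sum, `reduceByKey` usually shuffles less data; groupByKey is shown here only because it is the topic of the excerpt.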
1. Initial data preparation (same as in the earlier chapters) val collegesRdd = sc.textFile("/user/hdfs/CollegeNavigator.csv") val header = collegesRdd.first val headerlessRdd = collegesRdd.filter( line => { line != header } ) 2. Build the map val typeMapCount= headerlessRdd.map(line=>{val strtype=l…
I. Choose a database. Here we use the standard MySQL sakila database mysql -u root -D sakila -p II. First try importing the table's data into HDFS files, so that it can later be processed with Spark DataFrames or RDDs sqoop import --connect "jdbc:mysql://host03.xyy:3306/sakila" --username root --password root --table rental --target-dir "Sqo…
We will be using the sakila database extensively throughout the rest of the course, and it would be great if you could follow the installation process below. Importing the Sakila Database I. Change the file. This step may already have been done in the file originally provided. Find and Replace all "InnoDB…