I've just started learning Scala. The syntax looked simple, so on a whim I wrote two small examples. First, the Java version:

public class Calculate {
    public static void test1() {
        // Print the 9x9 multiplication table
        for (int i = 1; i < 10; i++) {
            for (int j = 1; j <= i; j++) {
                System.out.print(j + "*" + i + "=" + i * j + " ");
            }
            System.out.println();
        }
    }

    public static void main(String[] args) {
        test1();
    }
}
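Since the post is about trying Scala, here is a minimal sketch of the same multiplication table rewritten in Scala (my own translation for comparison, not necessarily the author's second example):

object Calculate {
  def test1(): Unit = {
    // Same 9x9 multiplication table, using for comprehensions
    for (i <- 1 until 10) {
      for (j <- 1 to i) {
        print(j + "*" + i + "=" + (i * j) + " ")
      }
      println()
    }
  }

  def main(args: Array[String]): Unit = test1()
}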
Over the past few years, and again recently, I have read all kinds of books, tutorials, and official docs, but material that explains RDD, DataFrame, and Dataset clearly, with solid supporting evidence, is rare. Even some self-proclaimed Spark experts cannot explain this area properly. As for some of the domestic books, Xiaohou really wants to ask: are you OK? Stop titling books "Mastering XXX"; rename them "XXX from Beginner to Giving Up". That would at least stop wasting readers' time, wouldn't it? Everyone keeps handing us conclusions, but as someone who has long been active in open-source communities and still does front-line big data development, Xiaohou knows one piece of engineering culture well: "To experience..."
Step 1: Learn to use the Scala interpreter. The easiest way to get started with Scala is the interpreter, an interactive "shell" for writing Scala expressions and programs. You need to install Scala before using it; see First Steps to Scala for details. You can launch it by typing scala at the command prompt:

$ scala
Welcome to Scala version 2.9.2.
Type in expressions to have them evaluated.
Type :help for more information.
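As a quick sanity check, a first session in the interpreter might look like this (res0 is the name the REPL assigns to an unnamed result):

scala> 1 + 2
res0: Int = 3

scala> val msg = "Hello, Scala"
msg: String = Hello, Scala

scala> println(msg)
Hello, Scala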
Using MLlib in Scala

The following code snippets can be executed in spark-shell.

Binary Classification

The following code snippet illustrates how to load a sample dataset and run a training algorithm on the training data using a static method in the algorithm object.
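A sketch of what such a snippet looks like, following the old RDD-based MLlib API (assuming Spark 1.x; the file path mllib/data/sample_svm_data.txt and the iteration count are illustrative):

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse a whitespace-separated data file: "label f1 f2 ..."
val data = sc.textFile("mllib/data/sample_svm_data.txt")
val parsedData = data.map { line =>
  val parts = line.split(' ')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}

// Train an SVM via the static train() method on the algorithm object
val numIterations = 20
val model = SVMWithSGD.train(parsedData, numIterations)

// Evaluate on the training set and compute the training error
val labelAndPreds = parsedData.map { point =>
  (point.label, model.predict(point.features))
}
val trainErr = labelAndPreds.filter { case (l, p) => l != p }.count.toDouble / parsedData.count
println("Training Error = " + trainErr)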
scala> val input = sc.textFile("/home/simon/SparkWorkspace/test.txt")
input: org.apache.spark.rdd.RDD[String] = /home/simon/SparkWorkspace/test.txt MapPartitionsRDD[32] at textFile at <console>:24

scala> input.foreach(println)
hello sim
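A natural next step after printing the file is a small word count over the same RDD; a minimal sketch continuing the session:

scala> val counts = input.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
scala> counts.collect().foreach(println)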
scala> val rdd1 = sc.parallelize(Array("coffe","coffe","hellp","hellp","pandas","mokey"))
rdd1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[8] at parallelize at <console>:24
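Since the array contains duplicates, the obvious operations to try next are distinct and countByValue; a small sketch continuing the session:

scala> rdd1.distinct().collect()   // keeps one copy of each string
scala> rdd1.countByValue()         // Map from each string to how often it occurs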
foreach(f: T => Unit) applies the function f to every element of the RDD; f returns nothing.

/**
 * Applies a function f to all elements of this RDD.
 */
def foreach(f: T => Unit): Unit

scala> val rdd = sc.parallelize(1 to 9, 2)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
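Completing the truncated example, calling foreach to print each element might look like this (note that on a real cluster foreach runs on the executors, so the output appears in executor stdout rather than at the driver, and element order across the 2 partitions is not guaranteed):

scala> rdd.foreach(x => print(x + " "))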