I. Why is Spark more efficient than MapReduce? An example: select a.state, count(*), AVG(c.price) from a join b on (a.id=b.id) join c on (a.itemId=c.itemId) group by a.state. If this is implemented in Hive, the query is converted into 3 jobs, each with one map and one reduce, and the result of each reduce is stored on HDFS. 1. HDFS data storage…
SparkContext can convert a collection into an RDD via parallelize: def main(args: Array[String]): Unit = { val conf = new SparkConf(); val list = List(1, 2, 3, 4, 5,6); conf.set("spark.master", "local") conf.set("spark.app.name", "spark demo")…
Converting a Scala collection to a DS/DF: case class TestPerson(name: String, age: Long, salary: Double) val tom = TestPerson(,35.5) val sam = TestPerson(,40.5) val PersonList = mutable.MutableList[TestPerson]() PersonList += tom PersonList += sam val personDS = PersonList.to…
Original article: https://my.oschina.net/tearsky/blog/629201 Summary: 1. Operation category READ is not supported in state standby 2. Setting the spark.deploy.recoveryMode option to ZOOKEEPER 3. How to configure multiple Masters 4. No Space Left on the device (too many shuffle temporary files) 5. java.lang.OutOfMemory, unable to cr…
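Guiding questions: 1. The cluster's currently available resources cannot meet the application's requirements; how can this be resolved? 2. Too much is piling up in heap memory; is there a good remedy? 1. WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory. The cluster's currently available resources cannot satisfy the resources the application requests. Resources fall into 2…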
Just relaying links here. Setup guide: https://www.cnblogs.com/mafly/p/redis_cluster.html (tested, works). Common problems: 1. The following error appears when operating on a key in redis: (error) MOVED 5798 127.0.0.1:7001 https://www.fashici.com/tech/356.html 2. [ERR] Not all 16384 slots are covered by nodes. https://blog.csdn.net/vtopqx/artic…