最近测试环境基于shc[https://github.com/hortonworks-spark/shc]的hbase-connector总是异常连接不到zookeeper,看下报错日志: 18/06/20 10:45:02 INFO RecoverableZooKeeper: Process identifier=hconnection-0x5175ab05 connecting to ZooKeeper ensemble=localhost:2181 18/06/20 10:45:02 IN…
今天本来想写一个spark dataframe unionall的demo,由于粗心报下面错误: Exception in thread "main" org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the same number of columns, but the left table has 3 columns and the right has 4; at o…
Spark Streaming揭秘 Day15 No Receivers方式思考 在前面也有比较多的篇幅介绍了Receiver在SparkStreaming中的应用,但是我们也会发现,传统的Receiver虽然使用比较方便,但是还是存在不少问题的,今天主要围绕kafka direct access讨论下,如果抛开Receiver来实现inputDStream该怎么做. KafkaRDD 我们知道,在Spark中,数据的访问主要是通过RDD来进行,而针对Kafka的数据,是使用了KafkaRDD.…