Installing Apache Spark on Ubuntu 16.04
Santosh Srinivas
on 07 Nov 2016, tagged on Apache Spark, Analytics, Data Minin
I've finally gotten around to a long-pending to-do item: playing with Apache Spark.
The following installation steps worked for me on Ubuntu 16.04.
- Download the latest pre-built version from http://spark.apache.org/downloads.html
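The options that worked for me were Spark release 2.0.1, pre-built for Hadoop 2.7 (the package downloads as spark-2.0.1-bin-hadoop2.7.tgz).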
- Unzip and move Spark
cd ~/Downloads/
tar xzvf spark-2.0.1-bin-hadoop2.7.tgz
mv spark-2.0.1-bin-hadoop2.7/ spark
sudo mv spark/ /usr/lib/
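Before going further, it can help to confirm that the move left a usable Spark directory behind. The snippet below is only a sketch: check_spark_layout is a hypothetical helper name (not something Spark ships), and it simply looks for the files this post relies on later.

```shell
# check_spark_layout DIR: report whether DIR contains the files used later in this post.
# (Hypothetical helper for illustration; not part of Spark.)
check_spark_layout() {
    dir="$1"
    for f in bin/pyspark bin/run-example conf/spark-env.sh.template; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            return 1
        fi
    done
    echo "ok: $dir"
}

# /usr/lib/spark is the location chosen in the step above.
check_spark_layout /usr/lib/spark || true
```

If it prints a "missing" line, the tar or mv step probably did not finish as expected.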
- Install SBT
As described on the sbt Download page:
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
sudo apt-get update
sudo apt-get install sbt
- Make sure Java is installed
If it isn't, install Java:
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
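To see whether Java and sbt actually ended up on the PATH, something like the following should do. It is a small sketch; report_tool is a made-up name for illustration.

```shell
# report_tool NAME: say whether NAME can be found on PATH.
# (Hypothetical helper for illustration.)
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

report_tool java
report_tool sbt
```

Running java -version afterwards also shows which Java release is in use.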
- Configure Spark
cd /usr/lib/spark/conf/
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following lines:
JAVA_HOME=/usr/lib/jvm/java-8-oracle
SPARK_WORKER_MEMORY=4g
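If you would rather not open an editor, the same two lines can be appended from the shell. This is a minimal sketch under the assumption that spark-env.sh is writable by your user; set_conf is a hypothetical helper that skips a key when it is already present, so running it twice does not duplicate lines.

```shell
# set_conf FILE KEY VALUE: append KEY=VALUE to FILE unless KEY is already set there.
# (Hypothetical helper for illustration.)
set_conf() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        echo "$key already set in $file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Usage with the path and values from this post:
# set_conf /usr/lib/spark/conf/spark-env.sh JAVA_HOME /usr/lib/jvm/java-8-oracle
# set_conf /usr/lib/spark/conf/spark-env.sh SPARK_WORKER_MEMORY 4g
```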
- Configure IPv6
Basically, disable IPv6: open the file with sudo vi /etc/sysctl.conf
and add the lines below
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
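The sysctl.conf change does not take effect by itself; it is picked up at the next boot or when the file is reloaded with sudo sysctl -p. A rough way to check the current state is to read the kernel's flag under /proc, as in this sketch (ipv6_status is a hypothetical helper name):

```shell
# ipv6_status FILE: interpret a disable_ipv6 flag file.
# Prints "disabled" for 1, "enabled" for anything else, "unknown" if unreadable.
# (Hypothetical helper for illustration.)
ipv6_status() {
    if [ ! -r "$1" ]; then
        echo "unknown"
    elif [ "$(cat "$1")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Reload /etc/sysctl.conf first with: sudo sysctl -p
ipv6_status /proc/sys/net/ipv6/conf/all/disable_ipv6
```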
- Configure .bashrc
I modified .bashrc in Sublime Text using subl ~/.bashrc
and added the following lines
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SBT_HOME=/usr/share/sbt-launcher-packaging
export SPARK_HOME=/usr/lib/spark
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
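After saving .bashrc, open a new terminal (or run source ~/.bashrc) so the exports apply. The sketch below then checks whether the Spark directories really are on the PATH; path_has is a hypothetical helper name.

```shell
# path_has DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# (Hypothetical helper for illustration.)
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

for d in /usr/lib/spark/bin /usr/lib/spark/sbin; do
    if path_has "$d"; then
        echo "on PATH: $d"
    else
        echo "not on PATH: $d"
    fi
done
```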
- Configure fish (Optional - But I love the fish shell)
Modify config.fish
using subl ~/.config/fish/config.fish
and add the following lines
#Credit: http://fishshell.com/docs/current/tutorial.html#tut_startup
set -x PATH $PATH /usr/lib/spark
set -x PATH $PATH /usr/lib/spark/bin
set -x PATH $PATH /usr/lib/spark/sbin
- Test Spark (Should work both in fish and bash)
Run pyspark (it is available in /usr/lib/spark/bin/) and try it out.
For example:
>>> a = 5
>>> b = 3
>>> a+b
8
>>> print("Welcome to Spark")
Welcome to Spark
## type Ctrl-d to exit
Also try the built-in run-example script: run-example org.apache.spark.examples.SparkPi
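SparkPi prints a lot of log output around its one interesting line, which starts with "Pi is roughly". A small wrapper can filter that down. This is only a sketch: spark_pi is a hypothetical function name, the default of 10 slices is an arbitrary choice, and it assumes Spark's default logging setup, which writes logs to stderr.

```shell
# spark_pi [SLICES]: run the bundled SparkPi example and print only its result line.
# (Hypothetical wrapper for illustration; SLICES defaults to 10.)
spark_pi() {
    if ! command -v run-example >/dev/null 2>&1; then
        echo "run-example not found on PATH" >&2
        return 1
    fi
    # Logs go to stderr by default; the result line is printed on stdout.
    run-example org.apache.spark.examples.SparkPi "${1:-10}" 2>/dev/null | grep "Pi is roughly"
}

spark_pi 10 || true
```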
That's it! You are ready to rock on using Apache Spark!
Next, I plan to check out analysis using R, as described in http://www.milanor.net/blog/wp-content/uploads/2016/11/interactiveDataAnalysiswithSparkR_v5.pdf