Spark Cluster Setup

Video tutorials

1. Youku

2. YouTube

Installing Scala

Download: http://www.scala-lang.org/download/

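Upload scala-2.10.5.tgz to the installer directory of the hadoop user on both the master and the slave machine.

Every step in this section has to be carried out on both machines.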

[hadoop@master installer]$ ls

hadoop2  hadoop-2.6.0.tar.gz  scala-2.10.5.tgz

Extract the archive:

[hadoop@master installer]$ tar -zxvf scala-2.10.5.tgz

[hadoop@master installer]$ mv scala-2.10.5 scala

[hadoop@master installer]$ cd scala

[hadoop@master scala]$ pwd

/home/hadoop/installer/scala
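Because the same unpack-and-rename step has to be repeated on every machine, it can be wrapped in a small function that is safe to rerun. This is only a sketch: install_tarball is a hypothetical helper name, not something shipped with Scala or Hadoop, and it assumes the archive contains a single top-level directory.

```shell
# Sketch: unpack a .tgz archive and give the extracted directory a short name.
# install_tarball is a hypothetical helper, not part of Scala or Hadoop.
install_tarball() {
    local archive="$1"     # e.g. scala-2.10.5.tgz
    local extracted="$2"   # top-level directory inside the archive, e.g. scala-2.10.5
    local target="$3"      # short name used afterwards, e.g. scala

    # Skip the work if the target already exists, so the step can be rerun safely.
    if [ -d "$target" ]; then
        echo "$target already exists, skipping"
        return 0
    fi
    tar -zxf "$archive" || return 1
    mv "$extracted" "$target"
}

# Usage on each machine, from /home/hadoop/installer:
#   install_tarball scala-2.10.5.tgz scala-2.10.5 scala
```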

Configure the environment variables:

[hadoop@master ~]$ vim .bashrc

# .bashrc

# Source global definitions

if [ -f /etc/bashrc ]; then

. /etc/bashrc

fi

# User specific aliases and functions

export JAVA_HOME=/usr/java/jdk1.7.0_79

export HADOOP_HOME=/home/hadoop/installer/hadoop2

export SCALA_HOME=/home/hadoop/installer/scala

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin

[hadoop@master ~]$ . .bashrc
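After sourcing .bashrc it is worth confirming that each *_HOME variable really points at a directory, since a typo here only shows up later as a "command not found". The following is a minimal sketch; check_home_dirs is a hypothetical helper and the variable names passed to it are the ones exported above.

```shell
# Sketch: report whether each named variable is set and points at a directory.
# check_home_dirs is a hypothetical helper, not a Hadoop or Scala tool.
check_home_dirs() {
    local name value status=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name-}"
        if [ -n "$value" ] && [ -d "$value" ]; then
            echo "ok      $name=$value"
        else
            echo "missing $name=$value"
            status=1
        fi
    done
    return $status
}

# Usage:
#   check_home_dirs JAVA_HOME HADOOP_HOME SCALA_HOME
```

The function returns a non-zero status if any variable fails the check, so it can also be used in a setup script with `|| exit 1`.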

Installing Python

Installing gcc

[root@master ~]# mkdir /RHEL5U4

[root@master ~]# mount /dev/cdrom /media/

[root@master media]# cp -r * /RHEL5U4/

[root@master ~]# vim /etc/yum.repos.d/iso.repo

[rhel-Server]

name=5u4_Server

baseurl=file:///RHEL5U4/Server

enabled=1

gpgcheck=0

gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

yum clean all

yum install gcc
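The repository file can also be written non-interactively instead of through vim. The sketch below uses yum's conventional lowercase key names (name, baseurl, enabled, gpgcheck, gpgkey); write_iso_repo is a hypothetical helper, and the paths in the usage line are the ones used in this post.

```shell
# Sketch: generate a yum repo definition that points at a local copy of the install media.
# write_iso_repo is a hypothetical helper, not a yum command.
write_iso_repo() {
    local repo_file="$1"   # e.g. /etc/yum.repos.d/iso.repo
    local media_dir="$2"   # e.g. /RHEL5U4 (absolute path to the copied media)
    cat > "$repo_file" <<EOF
[rhel-Server]
name=5u4_Server
baseurl=file://$media_dir/Server
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
EOF
}

# Usage (as root):
#   write_iso_repo /etc/yum.repos.d/iso.repo /RHEL5U4
#   yum clean all && yum install gcc
```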

Building and installing Python

[root@master installer]# tar -zxvf Python-2.7.12.tgz

Upload zlib-1.2.8.tar.gz.

Use it to replace the zlib under /root/installer/Python-2.7.12/Modules.

[root@master Python-2.7.12]# ./configure --prefix=/usr/local/python27

[root@master Python-2.7.12]# make

[root@master Python-2.7.12]# make install

[root@master Python-2.7.12]# mv /usr/bin/python /usr/bin/python_old

[root@master Python-2.7.12]# ln -s /usr/local/python27/bin/python /usr/bin/

[root@master Python-2.7.12]# python

Python 2.7.12 (default, Nov  7 2016, 21:42:16)

[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>
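One side effect is worth checking after this step. On RHEL 5, yum is itself a Python script whose first line is normally #!/usr/bin/python, so once /usr/bin/python points at 2.7 it would start under an interpreter that does not have the yum modules. A common workaround is to point that script at the renamed old interpreter. The sketch below assumes that layout (check the first line of /usr/bin/yum before changing anything); set_shebang is a hypothetical helper.

```shell
# Sketch: rewrite the interpreter line of a script, leaving the rest untouched.
# set_shebang is a hypothetical helper; the yum path in the usage line is an assumption.
set_shebang() {
    local script="$1" interpreter="$2" tmp status
    # Only touch files that actually start with a shebang line.
    head -n 1 "$script" | grep -q '^#!' || return 1
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then overwrite the original in place
    # so that its permissions are kept.
    sed "1s|^#!.*|#!$interpreter|" "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (as root), after the mv/ln steps above:
#   set_shebang /usr/bin/yum /usr/bin/python_old
```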

Installing Spark

Download: http://spark.apache.org/downloads.html

Upload spark-2.0.0-bin-hadoop2.6.tgz to the installer directory of the hadoop user on master.

Extract the archive:

[hadoop@master installer]$ tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz

[hadoop@master installer]$ mv spark-2.0.0-bin-hadoop2.6 spark2

[hadoop@master installer]$ cd spark2/

[hadoop@master spark2]$ ls

bin  conf  data  examples  jars  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn

[hadoop@master spark2]$ pwd

/home/hadoop/installer/spark2

[hadoop@master ~]$ vim .bashrc

# .bashrc

# Source global definitions

if [ -f /etc/bashrc ]; then

. /etc/bashrc

fi

# User specific aliases and functions

export JAVA_HOME=/usr/java/jdk1.7.0_79

export HADOOP_HOME=/home/hadoop/installer/hadoop2

export SCALA_HOME=/home/hadoop/installer/scala

export SPARK_HOME=/home/hadoop/installer/spark2

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin

[hadoop@master ~]$ . .bashrc

[hadoop@master ~]$ scp .bashrc slave:~

.bashrc                                                                                            100%  621     0.6KB/s   00:00

Then, on the slave machine, run:

[hadoop@slave ~]$ . .bashrc

Configuring Spark

[hadoop@master conf]$ cp spark-env.sh.template spark-env.sh

[hadoop@master conf]$ vim spark-env.sh

#!/usr/bin/env bash

#

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#    http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

#

export JAVA_HOME=/usr/java/jdk1.7.0_79

export SCALA_HOME=/home/hadoop/installer/scala

export SPARK_MASTER_HOST=master

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_EXECUTOR_MEMORY=600M

export SPARK_DRIVER_MEMORY=600M

[hadoop@master conf]$ vim slaves

master

slave

[hadoop@master installer]$ scp -r spark2 slave:~/installer/
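With more than one worker, the copy step can be driven from the same slaves file that Spark reads. The sketch below only prints the scp commands (a dry run) so they can be reviewed first; list_workers and print_copy_commands are hypothetical helpers, and the host names and paths in the usage line are the ones from this post.

```shell
# Sketch: derive the scp commands for every worker listed in a conf/slaves style file.
# Both functions are hypothetical helpers, not part of Spark.

# Print one host per line, dropping comments, surrounding whitespace and blank lines.
list_workers() {
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Print (not run) one scp command per worker, skipping the local host.
print_copy_commands() {
    local slaves_file="$1" src_dir="$2" dest_dir="$3" self="$4" host
    list_workers "$slaves_file" | while read -r host; do
        # The local host already has the directory, so no copy is needed.
        [ "$host" = "$self" ] && continue
        echo "scp -r $src_dir $host:$dest_dir"
    done
}

# Usage, from /home/hadoop/installer on master:
#   print_copy_commands spark2/conf/slaves spark2 '~/installer/' master
# Pipe the output to sh once the printed commands look right.
```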

Starting the Spark cluster

[hadoop@master ~]$ start-master.sh

[hadoop@master ~]$ start-slaves.sh

[hadoop@master ~]$ jps

17769 ResourceManager

20192 Master

20275 Worker

17443 NameNode

20521 Jps

17631 SecondaryNameNode

[hadoop@slave ~]$ jps

13297 DataNode

15367 Worker

13408 NodeManager

16245 Jps
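Reading jps listings by eye gets tedious once there are several nodes. As a rough sketch, the check can be scripted: missing_daemons is a hypothetical helper that reads jps output on standard input and prints whichever of the expected process names are absent.

```shell
# Sketch: list the expected daemons that do not appear in `jps` output.
# missing_daemons is a hypothetical helper, not part of the JDK or Spark.
missing_daemons() {
    local jps_output name status=0
    jps_output=$(cat)
    for name in "$@"; do
        # jps prints "<pid> <main class>"; compare the second column exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
            status=1
        fi
    done
    return $status
}

# Usage:
#   jps | missing_daemons Master Worker            # on master
#   ssh slave jps | missing_daemons Worker         # on slave, assuming jps is on the remote PATH
```

The function prints nothing and returns 0 when every expected daemon is running.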

Spark wordcount

[hadoop@master ~]$ spark-shell

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel).

16/11/04 11:05:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

16/11/04 11:05:09 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.

Spark context Web UI available at http://192.168.3.100:4040

Spark context available as 'sc' (master = local[*], app id = local-1478228709028).

Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.7.0_79)

Type in expressions to have them evaluated.

Type :help for more information.
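Note that the startup output above reports master = local[*]: started without a --master option, this shell runs inside a single local JVM and does not use the standalone Master and Workers. To run against the cluster, spark-shell accepts a --master spark://HOST:PORT URL, where 7077 is the standalone master's default port. The helper below is a hypothetical convenience for building that URL, and it assumes the port was not changed.

```shell
# Sketch: build the URL of a standalone Spark master.
# spark_master_url is a hypothetical helper; 7077 is the default port and is
# assumed here unless SPARK_MASTER_PORT or a second argument says otherwise.
spark_master_url() {
    local host="${1:-${SPARK_MASTER_HOST:-localhost}}"
    local port="${2:-${SPARK_MASTER_PORT:-7077}}"
    echo "spark://$host:$port"
}

# Usage:
#   spark-shell --master "$(spark_master_url master)"
```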

scala> val file = sc.textFile("hdfs://master:9000/data/input/wordcount")

16/11/04 11:05:14 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes

file: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/data/input/wordcount MapPartitionsRDD[1] at textFile at <console>:24

scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26

scala> count.collect()

res0: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (cluster.,1), (its,1), ([run,1), (general,2), (have,1), (pre-built,1), (YARN,,1), (locally,2), (changed,1), (locally.,1), (sc.parallelize(1,1), (only,1), (Configuration,1), (This,2), (basic,1), (first,1), (learning,,1), ([Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse),1), (documentation,3), (graph,1), (Hive,2), (several,1), (["Specifying,1), ("yarn",1), (page](http://spark.apache.org/documentation.html),1), ([params]`.,1), ([project,2), (prefer,1), (SparkPi,2), (<http://spark.apache.org/>,1), (engine,1), (version,1), (file,1), (documentation...

scala>
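For a small input file, the result can be cross-checked outside Spark with ordinary shell tools. This is a sketch rather than an exact equivalent: word_count is a hypothetical helper that splits on single spaces the way line.split(" ") does, but it drops empty tokens, which the Spark job would count as words.

```shell
# Sketch: count space-separated words read from standard input.
# word_count is a hypothetical helper; output is "<count> <word>", sorted by word.
word_count() {
    tr ' ' '\n' |          # one token per line
        grep -v '^$' |     # drop empty tokens (blank lines, repeated spaces)
        LC_ALL=C sort |    # group identical tokens together
        uniq -c |          # count each group
        awk '{ print $1, $2 }'
}

# Usage, with the HDFS path shown in the RDD description above:
#   hdfs dfs -cat hdfs://master:9000/data/input/wordcount | word_count
```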
