Disclaimer:
author: 龚细军
date: --
type: notes
When reposting, please credit the source and include the corresponding link.
Link: http://www.cnblogs.com/gongxijun/p/5726024.html

Everything in these notes comes from actual hands-on work. The Hadoop version used is hadoop-2.7.2 and the operating system is Kylin Linux.

Assumed starting point: the JDK is already installed, and Hadoop has already been downloaded and unpacked.
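
A quick way to confirm the JDK before continuing (standard commands, nothing project-specific):

$ java -version       # should print the installed JDK version
$ echo $JAVA_HOME     # Hadoop's scripts expect this to point at the JDK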

1. After downloading and unpacking Hadoop

Enter the installation directory and you will see the following folders and files:

/hadoop-2.7.2$ ls
bin include lib LICENSE.txt NOTICE.txt README.txt share
etc input libexec logs output sbin wc-in

A quick tour of the layout:

bin directory: Hadoop's command-line tools, such as hadoop, hdfs, yarn, and mapred. This is the directory you will use most day to day.

We can invoke them like this (note that hadoop dfs still works in 2.x but is deprecated in favor of hdfs dfs):

/hadoop-2.7.2$ bin/hadoop dfs -cat output/* | more
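
A few more of the bin/ tools, for orientation (a quick sketch; these are standard 2.7.2 commands, but the output naturally depends on your setup):

/hadoop-2.7.2$ bin/hdfs dfs -ls /         # list the HDFS root directory
/hadoop-2.7.2$ bin/yarn version           # print the Hadoop/YARN build version
/hadoop-2.7.2$ bin/mapred queue -list     # show the configured MapReduce queues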

include directory: header files for C/C++ development.

lib directory: native libraries, mainly for C/C++ development.

etc directory: the configuration files (older releases used a conf directory instead). Inside it you will see:

/hadoop-2.7.2/etc/hadoop$ ls | grep .xml
capacity-scheduler.xml
core-site.xml
hadoop-policy.xml
hdfs-site.xml
hdfs-site.xml~
httpfs-site.xml
kms-acls.xml
kms-site.xml
mapred-queues.xml.template
mapred-site.xml.template
ssl-client.xml.example
ssl-server.xml.example
yarn-site.xml

Pseudo-distributed configuration

1. Configure core-site.xml (note that in Hadoop 2.x the key fs.default.name is deprecated in favor of fs.defaultFS; the old name still works but logs a warning):

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

2. Configure hdfs-site.xml (dfs.name.dir, dfs.data.dir, and dfs.permissions are likewise legacy names; in 2.x they map to dfs.namenode.name.dir, dfs.datanode.data.dir, and dfs.permissions.enabled):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/gongxijun/HDFS/fileinput</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/gongxijun/HDFS/fileoutput</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>
            If "true", enable permission checking in HDFS. If "false", permission
            checking is turned off, but all other behavior is unchanged. Switching
            from one value to the other does not change the mode, owner, or group
            of files or directories.
        </description>
    </property>
</configuration>
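
The two local paths above should exist and be writable by the user running Hadoop; creating them up front (paths copied from the config above) avoids permission surprises at startup:

$ mkdir -p /home/gongxijun/HDFS/fileinput    # NameNode metadata (dfs.name.dir)
$ mkdir -p /home/gongxijun/HDFS/fileoutput   # DataNode block storage (dfs.data.dir)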

3. Configure mapred-site.xml: copy mapred-site.xml.template to mapred-site.xml, then add the following. (mapred.job.tracker is the old Hadoop 1.x JobTracker address; Hadoop 2.x runs MapReduce on YARN, and with the defaults this demo actually executes with the local job runner, as the FILE-only counters further down confirm.)

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

After that, start Hadoop by running ./start-all.sh (in 2.7.2 the script lives in the sbin directory).
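
On a fresh install the usual first-run sequence, executed from the hadoop-2.7.2 directory, looks roughly like this (the format step initializes the NameNode and is only needed once):

/hadoop-2.7.2$ bin/hdfs namenode -format   # one-time NameNode initialization
/hadoop-2.7.2$ sbin/start-all.sh           # start the HDFS and YARN daemons
/hadoop-2.7.2$ jps                         # verify NameNode, DataNode, etc. are running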

pom.xml configuration for the program (the ${hadoop.version} property is assumed to be set to 2.7.2 elsewhere in the POM):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>zookeeper</artifactId>
            <groupId>org.apache.zookeeper</groupId>
        </exclusion>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jsp-api</artifactId>
            <groupId>javax.servlet.jsp</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jasper-runtime</artifactId>
            <groupId>tomcat</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jasper-compiler</artifactId>
            <groupId>tomcat</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jersey-server</artifactId>
            <groupId>com.sun.jersey</groupId>
        </exclusion>
        <exclusion>
            <artifactId>asm</artifactId>
            <groupId>asm</groupId>
        </exclusion>
    </exclusions>
</dependency>
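
hadoop-common alone does not supply the org.apache.hadoop.mapreduce classes the demo below imports; those live in the MapReduce client artifacts, so the project presumably also declares dependencies along these lines (a sketch, not copied from the original POM):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<!-- needed to actually submit and run jobs, locally or on YARN -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>${hadoop.version}</version>
</dependency>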

The demo program (a word count):

package com.qunar.mapReduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

/**
 * *********************************************************
 * Author: XiJun.Gong
 * Date: 2016-07-29 14:59
 * Version: default 1.0.0
 * Class description: word-count MapReduce demo
 * *********************************************************
 */
public class MapReduceDemo {

    /** Mapper: emits (word, 1) for every whitespace-separated token. */
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    /** Reducer: sums the counts collected for each word. */
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Scanner reader = new Scanner(System.in);
        // Each pair of paths read from stdin is one job: first the input
        // file, then the output directory (which must not already exist).
        // A Job instance can only be submitted once, so create a fresh one
        // on every iteration instead of reusing a single Job in the loop.
        while (reader.hasNext()) {
            Job job = Job.getInstance(configuration, "wordCount");
            job.setJarByClass(MapReduceDemo.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(reader.next()));
            FileOutputFormat.setOutputPath(job, new Path(reader.next()));
            job.waitForCompletion(true);
        }
    }
}
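
As written, the program blocks on stdin waiting for path pairs, which is convenient when launching from the IDE (as the log below shows). To run it outside the IDE, one plausible route is to package it with Maven and submit it through the hadoop launcher; the jar path here is hypothetical and depends on your POM:

$ mvn clean package
/hadoop-2.7.2$ bin/hadoop jar /path/to/your-artifact.jar com.qunar.mapReduce.MapReduceDemo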

Running the program prints startup output like this:

Connected to the target VM, address: '127.0.0.1:51980', transport: 'socket'
12:41:05.404 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.441 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.442 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.444 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
12:41:05.871 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
12:41:05.883 [main] DEBUG org.apache.hadoop.security.Groups - Creating new Groups object
12:41:05.895 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/home/gongxijun/Qunar/idea-IU-139.1117.1/bin::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12:41:05.897 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12:41:05.900 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
12:41:05.905 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
12:41:05.957 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
12:41:05.957 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
12:41:05.961 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login
12:41:05.962 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit
12:41:05.968 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: gongxijun" with name gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "gongxijun"
12:41:05.970 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:gongxijun (auth:SIMPLE)

Then type the paths on stdin; the Scanner reads the input file first, then the output directory:

/home/gongxijun/web进阶.txt
/home/gongxijun/a.txt

When the job finishes it prints the counters:

12:44:36.992 [main] INFO  org.apache.hadoop.mapreduce.Job - Counters: 33
    File System Counters
        FILE: Number of bytes read=6316
        FILE: Number of bytes written=518809
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=84
        Map output records=85
        Map output bytes=1476
        Map output materialized bytes=1652
        Input split bytes=99
        Combine input records=0
        Combine output records=0
        Reduce input groups=82
        Reduce shuffle bytes=1652
        Reduce input records=85
        Reduce output records=82
        Spilled Records=170
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=9
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=459276288
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1335
    File Output Format Counters
        Bytes Written=1311

The results end up in the a.txt output directory:

(kafuka卡夫卡)    1
(缺陷: 1
(需要重点学习) 1
---去查看QMQ--message---->broker 1
/ 1
1. 2
1.判断线程安全的两个机准: 1
2. 3
3. 1
Apache 1
Cache 1
Client: 1
ConCurrentHashMap 1
Dubbo 1
Executor 1
Futrue/CountDownLatch 1
Guava 1
HTTP: 1
HashMap 1
Hession 1
HttpComponents 1
Java 1
Json 1
Key-Value 1
Kryo(重点) 1
LRU 1
Protobuf 1
QMQ/AMQ/rabbitimq 1
ReadWriterLock 1
ReentrantLock 1
async-http-client 1
c3p0 1
client实现 1
dbpc 1
redis 1
seriialization 1
servlet 1
snchronized 1
spymemcached 1
tomcat-jdbc 1
xmemcached 1
一致性Hash 1
一: 1
三: 1
乐观锁: 1
二: 1
互斥 1
共享数据 1
分布式锁? 1
分布式: 1
前端轮询,后端异步: 1
单例的 1
参数回调 1
可复用资源,创建代价大 1
可扩展性,服务降级,负载均衡,灰度 1
可重入锁 1
可靠性 1
回顾 1
场景: 1
对象池: 1
将对象的状态信息转换为可以存储或传输形式的过程. 1
尽量不要使用本地缓存 1
并发修改 1
序列化: 1
建议: 1
异步调用 1
异步: 1
形成环) 1
性能 1
方式: 1
本地缓存太大,可以使用对象池 1
概念: 1
池化技术 1
消息队列: 1
类型: 1
线程池 1
缓存--本地 1
读写锁: 1
连接池: 1
(分段锁) 1
(推荐使用) 1
, 1
