Modifying Hadoop Source Code and Applying the Modified Source to a Deployed Hadoop Installation
My Hadoop version is hadoop-2.7.3. From the Hadoop website you can download both the source package hadoop-2.7.3-src and the pre-built package hadoop-2.7.3. The latter can be deployed directly; the former, hadoop-2.7.3-src, must be built with mvn before it can be deployed. Any code changes have to be made in the hadoop-2.7.3-src tree, and the source can only be deployed or used after running mvn, so we first need a basic understanding of Maven. Running mvn on hadoop-2.7.3-src tends to hit all kinds of problems (JDK 1.7 works best).
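For reference, here is a minimal sketch of fetching both packages from the Apache archive (the exact mirror URLs are assumptions; any official mirror works):
- $ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
- $ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
- $ tar -xzf hadoop-2.7.3-src.tar.gz # source tree: modify and build with mvn
- $ tar -xzf hadoop-2.7.3.tar.gz # pre-built tree: deploy directly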
The hadoop-2.7.3-src tree contains a BUILDING.txt file that is worth reading carefully; it explains how to build Hadoop. Its contents are reproduced below:
- Build instructions for Hadoop
- ----------------------------------------------------------------------------------
- Requirements:
- * Unix System
- * JDK 1.7+
- * Maven 3.0 or later
- * Findbugs 1.3.9 (if running findbugs)
- * ProtocolBuffer 2.5.0
- * CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
- * Zlib devel (if compiling native code)
- * openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
- * Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
- * Internet connection for first build (to fetch all Maven and Hadoop dependencies)
- ----------------------------------------------------------------------------------
- Installing required packages for clean install of Ubuntu 14.04 LTS Desktop:
- * Oracle JDK 1.7 (preferred)
- $ sudo apt-get purge openjdk*
- $ sudo apt-get install software-properties-common
- $ sudo add-apt-repository ppa:webupd8team/java
- $ sudo apt-get update
- $ sudo apt-get install oracle-java7-installer
- * Maven
- $ sudo apt-get -y install maven
- * Native libraries
- $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
- * ProtocolBuffer 2.5.0 (required)
- $ sudo apt-get -y install libprotobuf-dev protobuf-compiler
- Optional packages:
- * Snappy compression
- $ sudo apt-get install snappy libsnappy-dev
- * Bzip2
- $ sudo apt-get install bzip2 libbz2-dev
- * Jansson (C Library for JSON)
- $ sudo apt-get install libjansson-dev
- * Linux FUSE
- $ sudo apt-get install fuse libfuse-dev
- ----------------------------------------------------------------------------------
- Maven main modules:
- hadoop (Main Hadoop project)
- - hadoop-project (Parent POM for all Hadoop Maven modules. )
- (All plugins & dependencies versions are defined here.)
- - hadoop-project-dist (Parent POM for modules that generate distributions.)
- - hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs)
- - hadoop-assemblies (Maven assemblies used by the different modules)
- - hadoop-common-project (Hadoop Common)
- - hadoop-hdfs-project (Hadoop HDFS)
- - hadoop-mapreduce-project (Hadoop MapReduce)
- - hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- - hadoop-dist (Hadoop distribution assembler)
- ----------------------------------------------------------------------------------
- Where to run Maven from?
- It can be run from any module. The only catch is that if not run from trunk
- all modules that are not part of the build run must be installed in the local
- Maven cache or available in a Maven repository.
- ----------------------------------------------------------------------------------
- Maven build goals:
- * Clean : mvn clean
- * Compile : mvn compile [-Pnative]
- * Run tests : mvn test [-Pnative]
- * Create JAR : mvn package
- * Run findbugs : mvn compile findbugs:findbugs
- * Run checkstyle : mvn compile checkstyle:checkstyle
- * Install JAR in M2 cache : mvn install
- * Deploy JAR to Maven repo : mvn deploy
- * Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
- * Run Rat : mvn apache-rat:check
- * Build javadocs : mvn javadoc:javadoc
- * Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
- * Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
- Build options:
- * Use -Pnative to compile/bundle native code
- * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
- * Use -Psrc to create a project source TAR.GZ
- * Use -Dtar to create a TAR with the distribution (using -Pdist)
- Snappy build options:
- Snappy is a compression library that can be utilized by the native code.
- It is currently an optional component, meaning that Hadoop can be built with
- or without this dependency.
- * Use -Drequire.snappy to fail the build if libsnappy.so is not found.
- If this option is not specified and the snappy library is missing,
- we silently build a version of libhadoop.so that cannot make use of snappy.
- This option is recommended if you plan on making use of snappy and want
- to get more repeatable builds.
- * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
- header files and library files. You do not need this option if you have
- installed snappy using a package manager.
- * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
- files. Similarly to snappy.prefix, you do not need this option if you have
- installed snappy using a package manager.
- * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
- the final tar file. This option requires that -Dsnappy.lib is also given,
- and it ignores the -Dsnappy.prefix option.
- OpenSSL build options:
- OpenSSL includes a crypto library that can be utilized by the native code.
- It is currently an optional component, meaning that Hadoop can be built with
- or without this dependency.
- * Use -Drequire.openssl to fail the build if libcrypto.so is not found.
- If this option is not specified and the openssl library is missing,
- we silently build a version of libhadoop.so that cannot make use of
- openssl. This option is recommended if you plan on making use of openssl
- and want to get more repeatable builds.
- * Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto
- header files and library files. You do not need this option if you have
- installed openssl using a package manager.
- * Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library
- files. Similarly to openssl.prefix, you do not need this option if you have
- installed openssl using a package manager.
- * Use -Dbundle.openssl to copy the contents of the openssl.lib directory into
- the final tar file. This option requires that -Dopenssl.lib is also given,
- and it ignores the -Dopenssl.prefix option.
- Tests options:
- * Use -DskipTests to skip tests when running the following Maven goals:
- 'package', 'install', 'deploy' or 'verify'
- * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
- * -Dtest.exclude=<TESTCLASSNAME>
- * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
- ----------------------------------------------------------------------------------
- Building components separately
- If you are building a submodule directory, all the hadoop dependencies this
- submodule has will be resolved as all other 3rd party dependencies. This is,
- from the Maven cache or from a Maven repository (if not available in the cache
- or the SNAPSHOT 'timed out').
- An alternative is to run 'mvn install -DskipTests' from Hadoop source top
- level once; and then work from the submodule. Keep in mind that SNAPSHOTs
- time out after a while, using the Maven '-nsu' will stop Maven from trying
- to update SNAPSHOTs from external repos.
- ----------------------------------------------------------------------------------
- Protocol Buffer compiler
- The version of Protocol Buffer compiler, protoc, must match the version of the
- protobuf JAR.
- If you have multiple versions of protoc in your system, you can set in your
- build shell the HADOOP_PROTOC_PATH environment variable to point to the one you
- want to use for the Hadoop build. If you don't define this environment variable,
- protoc is looked up in the PATH.
- ----------------------------------------------------------------------------------
- Importing projects to eclipse
- When you import the project to eclipse, install hadoop-maven-plugins at first.
- $ cd hadoop-maven-plugins
- $ mvn install
- Then, generate eclipse project files.
- $ mvn eclipse:eclipse -DskipTests
- At last, import to eclipse by specifying the root directory of the project via
- [File] > [Import] > [Existing Projects into Workspace].
- ----------------------------------------------------------------------------------
- Building distributions:
- Create binary distribution without native code and without documentation:
- $ mvn package -Pdist -DskipTests -Dtar
- Create binary distribution with native code and with documentation:
- $ mvn package -Pdist,native,docs -DskipTests -Dtar
- Create source distribution:
- $ mvn package -Psrc -DskipTests
- Create source and binary distributions with native code and documentation:
- $ mvn package -Pdist,native,docs,src -DskipTests -Dtar
- Create a local staging version of the website (in /tmp/hadoop-site)
- $ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
- ----------------------------------------------------------------------------------
- Installing Hadoop
- Look for these HTML files after you build the document by the above commands.
- * Single Node Setup:
- hadoop-project-dist/hadoop-common/SingleCluster.html
- * Cluster Setup:
- hadoop-project-dist/hadoop-common/ClusterSetup.html
- ----------------------------------------------------------------------------------
- Handling out of memory errors in builds
- ----------------------------------------------------------------------------------
- If the build process fails with an out of memory error, you should be able to fix
- it by increasing the memory used by maven -which can be done via the environment
- variable MAVEN_OPTS.
- Here is an example setting to allocate between 256 and 512 MB of heap space to
- Maven
- export MAVEN_OPTS="-Xms256m -Xmx512m"
- ----------------------------------------------------------------------------------
- Building on Windows
- ----------------------------------------------------------------------------------
- Requirements:
- * Windows System
- * JDK 1.7+
- * Maven 3.0 or later
- * Findbugs 1.3.9 (if running findbugs)
- * ProtocolBuffer 2.5.0
- * CMake 2.6 or newer
- * Windows SDK 7.1 or Visual Studio 2010 Professional
- * Windows SDK 8.1 (if building CPU rate control for the container executor)
- * zlib headers (if building native code bindings for zlib)
- * Internet connection for first build (to fetch all Maven and Hadoop dependencies)
- * Unix command-line tools from GnuWin32: sh, mkdir, rm, cp, tar, gzip. These
- tools must be present on your PATH.
- Unix command-line tools are also included with the Windows Git package which
- can be downloaded from http://git-scm.com/download/win.
- If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
- Do not use Visual Studio Express. It does not support compiling for 64-bit,
- which is problematic if running a 64-bit system. The Windows SDK 7.1 is free to
- download here:
- http://www.microsoft.com/en-us/download/details.aspx?id=8279
- The Windows SDK 8.1 is available to download at:
- http://msdn.microsoft.com/en-us/windows/bg162891.aspx
- Cygwin is neither required nor supported.
- ----------------------------------------------------------------------------------
- Building:
- Keep the source code tree in a short path to avoid running into problems related
- to Windows maximum path length limitation. (For example, C:\hdc).
- Run builds from a Windows SDK Command Prompt. (Start, All Programs,
- Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt.)
- JAVA_HOME must be set, and the path must not contain spaces. If the full path
- would contain spaces, then use the Windows short path instead.
- You must set the Platform environment variable to either x64 or Win32 depending
- on whether you're running a 64-bit or 32-bit system. Note that this is
- case-sensitive. It must be "Platform", not "PLATFORM" or "platform".
- Environment variables on Windows are usually case-insensitive, but Maven treats
- them as case-sensitive. Failure to set this environment variable correctly will
- cause msbuild to fail while building the native code in hadoop-common.
- set Platform=x64 (when building on a 64-bit system)
- set Platform=Win32 (when building on a 32-bit system)
- Several tests require that the user must have the Create Symbolic Links
- privilege.
- All Maven goals are the same as described above with the exception that
- native code is built by enabling the 'native-win' Maven profile. -Pnative-win
- is enabled by default when building on Windows since the native components
- are required (not optional) on Windows.
- If native code bindings for zlib are required, then the zlib headers must be
- deployed on the build machine. Set the ZLIB_HOME environment variable to the
- directory containing the headers.
- set ZLIB_HOME=C:\zlib-1.2.7
- At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested
- with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
- the zlib 1.2.7 source tree.
- http://www.zlib.net/
- ----------------------------------------------------------------------------------
- Building distributions:
- * Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar]
BUILDING.txt
After each submodule is compiled, its jar is generated in a target directory that sits alongside that module's source directory.
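For example, after the hadoop-hdfs module has been built (with the commands shown later in this post), its target directory looks roughly like this (a sketch; the exact file list is an assumption based on version 2.7.3):
- $ ls hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/target/
- classes/  hadoop-hdfs-2.7.3.jar  hadoop-hdfs-2.7.3-tests.jar  ...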
In fact, mvn is driven by pom.xml, so any directory that contains a pom.xml can be built with mvn. Let's take modifying the main() method of NameNode.java as an example, shown below:
- // NameNode.java
- public static void main(String argv[]) throws Exception {
- System.out.println("Hello NameNode!"); // added by me: start-all.sh launches NameNode's main() method, which then prints this line
- if (DFSUtil.parseHelpArgument(argv, NameNode.USAGE, System.out, true)) {
- System.exit(0);
- }
- try {
- StringUtils.startupShutdownMessage(NameNode.class, argv, LOG);
- NameNode namenode = createNameNode(argv, null);
- if (namenode != null) {
- namenode.join();
- }
- } catch (Throwable e) {
- LOG.error("Failed to start namenode.", e);
- terminate(1, e);
- }
- }
After modifying the NameNode.java source in hadoop-2.7.3-src, here is how the change reaches the deployed cluster.
The NameNode.java we modified lives at hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java. In the deployed hadoop-2.7.3, the same class is packaged inside hadoop-2.7.3/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar (open hadoop-hdfs-2.7.3.jar with an archive tool and you will find org/apache/hadoop/hdfs/server/namenode/NameNode.class; in other words, the compiled .class files are packed into the jar). After mvn finishes, the rebuilt hadoop-hdfs-2.7.3.jar sits in the corresponding hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/target directory, and we simply replace the matching jar in the deployed Hadoop with this rebuilt one. There are several ways to run mvn:
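As a quick sanity check, here is a sketch of confirming that the class really lives in that jar (the deployment path /usr/local/hadoop-2.7.3 is an assumption; adjust it to your installation):
- $ # the entry org/apache/hadoop/hdfs/server/namenode/NameNode.class should appear
- $ jar tf /usr/local/hadoop-2.7.3/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar | grep 'namenode/NameNode\.class'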
(1) We could run mvn on the entire modified project and then redeploy it. The drawback is that every change requires building the whole project and redeploying, so this approach is best avoided.
(2) We could run mvn on the entire project but only swap in the single jar that contains our change. This improves on (1): since Hadoop is already deployed, we only replace the jar holding the modified file instead of redeploying everything.
(3) NameNode.java lives under hadoop-hdfs-project, which has its own pom.xml, so we can run mvn on that module alone and then replace the corresponding jar in the deployed Hadoop with the generated hadoop-hdfs-2.7.3.jar. (This saves time, because hadoop-2.7.3-src contains many other modules that would otherwise have to be built.)
(4) NameNode.java is actually under hadoop-hdfs-project/hadoop-hdfs, which also has its own pom.xml. Compared with (3), we build only hadoop-hdfs and ignore the other modules under hadoop-hdfs-project, saving even more time. We then replace the corresponding jar in the deployed Hadoop with the generated hadoop-hdfs-2.7.3.jar, which is produced in hadoop-hdfs's target directory.
So we use approach (4). Note that the very first mvn run over the source is always slow, because Maven has to download a lot of artifacts (mainly jars and .pom files). Run the following commands:
- $ su root # switch to the root user first, otherwise the build is likely to fail
- $ mvn package -Pdist -DskipTests -Dtar # builds whichever module directory you run it from
If the build succeeds, Maven prints BUILD SUCCESS.
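Here is a minimal sketch of swapping the rebuilt jar into the deployed installation and restarting HDFS (the deployment path /usr/local/hadoop-2.7.3 is an assumption; adjust it to your layout):
- $ export HADOOP_HOME=/usr/local/hadoop-2.7.3 # assumed deployment path
- $ $HADOOP_HOME/sbin/stop-dfs.sh # stop HDFS before touching the jar
- $ cp $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar ~/hadoop-hdfs-2.7.3.jar.bak # keep a backup of the original
- $ cp hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.3.jar $HADOOP_HOME/share/hadoop/hdfs/
- $ $HADOOP_HOME/sbin/start-dfs.sh # restart with the modified jar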
The end result is that starting the NameNode now prints Hello NameNode!, which can be checked as follows.
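A sketch of verifying the output after startup (the log file name pattern hadoop-&lt;user&gt;-namenode-&lt;host&gt;.out is standard but still an assumption here; stdout from the NameNode daemon normally ends up in the .out file under $HADOOP_HOME/logs):
- $ $HADOOP_HOME/sbin/start-dfs.sh
- $ grep "Hello NameNode" $HADOOP_HOME/logs/hadoop-*-namenode-*.out # expected output: Hello NameNode!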
One more note: everything above applies because we modified the hadoop-2.7.3-src source itself, which is why mvn is required. If you only use the interfaces Hadoop already provides to implement your own functionality, there is no need for mvn; you can work in eclipse instead. See my earlier post 编写hadoop程序并打成jar包上传到hadoop集群运行 (writing a Hadoop program, packaging it as a jar, and running it on a Hadoop cluster).