Introduction to Hadoop

I. Overview

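Hadoop is an open-source distributed computing platform for storing big data and processing it with MapReduce. It is good at storing huge volumes of data in any format, including unstructured data. It has two cores:

  • HDFS: the Hadoop Distributed File System, highly fault-tolerant and scalable, written in Java
  • MapReduce: an open-source implementation of Google's MapReduce, a distributed programming model that makes it easier to develop parallel applications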

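With Hadoop you can easily pool computing resources into your own distributed computing platform, make full use of the cluster's computing and storage capacity, and process massive amounts of data.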
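A Unix pipeline on one machine can serve as a rough analogy for the MapReduce model. It is an illustration only, not Hadoop code. A word count splits into the same three phases: the map step emits one key per word, the shuffle sorts so that equal keys sit next to each other, and the reduce step counts each group. The function name `wordcount` is made up for this sketch.

```shell
#!/usr/bin/env bash
# Single-machine analogy of the MapReduce word-count flow (illustration only).
wordcount() {
  tr -cs '[:alpha:]' '\n' |        # map: split the text into one word (key) per line
    tr '[:upper:]' '[:lower:]' |   # normalise keys to lower case
    grep -v '^$' |                 # drop empty tokens
    sort |                         # shuffle: bring identical keys together
    uniq -c |                      # reduce: count each group of identical keys
    awk '{print $2 "\t" $1}'       # emit "word<TAB>count", as a reducer would
}
```

For example, `printf 'Hello Hadoop hello HDFS\n' | wordcount` prints `hadoop 1`, `hdfs 1` and `hello 2`, with one tab-separated pair per line. In Hadoop the same phases run in parallel across the cluster. Each map task reads one input split, which by default corresponds to one HDFS block.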

II. Advantages of Hadoop

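  1. High reliability: Hadoop stores and processes data reliably, bit for bit
  2. High scalability: Hadoop distributes data and computation across clusters of available machines, and these clusters can scale out to thousands of nodes
  3. High efficiency: Hadoop can move data between nodes dynamically to keep the load balanced, which keeps processing fast
  4. High fault tolerance: Hadoop automatically keeps multiple replicas of the data and automatically reassigns failed tasks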
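The replication in item 4 has a direct storage cost. The sketch below estimates how a single file is laid out in HDFS. It assumes the Hadoop 3 defaults: a block size of 128 MB (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). `hdfs_footprint` is a hypothetical helper, not a Hadoop command.

```shell
#!/usr/bin/env bash
# Estimate the HDFS layout of one file (hypothetical helper, whole MB only).
# Args: file size in MB, replication factor (default 3), block size in MB (default 128).
hdfs_footprint() {
  local size_mb="$1" repl="${2:-3}" block_mb="${3:-128}"
  # Number of blocks = ceil(size / block size); the last block may be partial.
  local blocks=$(( (size_mb + block_mb - 1) / block_mb ))
  # Every block is stored repl times, on different DataNodes.
  local replicas=$(( blocks * repl ))
  # A partial last block only occupies its real size, so raw usage is size * repl.
  local raw_mb=$(( size_mb * repl ))
  echo "blocks=${blocks} replicas=${replicas} raw_mb=${raw_mb}"
}
```

For example, `hdfs_footprint 300` prints `blocks=3 replicas=9 raw_mb=900`. Because the three copies of each block live on different DataNodes, losing one node still leaves two copies of every block.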

III. Related projects

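  • Common: common utilities that support Hadoop and its subprojects, mainly FileSystem, RPC, and serialization libraries.
  • Avro: a data serialization system. It provides rich data structures, a fast, compact binary format, a container file for storing persistent data, remote procedure calls (RPC), and simple integration with dynamic languages.
  • MapReduce: a programming model for parallel processing of large data sets (larger than 1 TB).
  • HDFS: the distributed file system.
  • YARN: distributed resource management.
  • Chukwa: an open-source data collection system for monitoring and analyzing data from large distributed systems.
  • Hive: a data warehouse built on top of Hadoop. It provides tools for data summarization, ad hoc querying, and analysis of data sets stored in Hadoop files. Hive offers a way to impose structure on the data, plus a query language similar to the SQL of traditional RDBMSs, so that users who know SQL can query data in Hadoop. This language is called HiveQL.
  • HBase: a distributed, column-oriented open-source database suited to unstructured data. It is mainly used for big data that needs random access and real-time reads and writes.
  • Pig: a platform for analyzing and evaluating large data sets. Its most notable strength is that its structure holds up well under heavy parallelization.
  • ZooKeeper: a coordination service for distributed applications. It provides synchronization, configuration management, group services, and naming.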

IV. Building and installing Hadoop from source

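I'm on a 32-bit ARM system (a Raspberry Pi). The official prebuilt release only ships 64-bit native libraries, which can't be used here, so I had to build from source.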

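According to BUILDING.txt in the source tree, the following tools must be available before building Hadoop:

Build instructions for Hadoop

Requirements:
Unix System
JDK 1.8
Maven 3.3 or later
ProtocolBuffer 2.5.0
CMake 3.1 or newer (if compiling native code)
Zlib devel (if compiling native code)
openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
Internet connection for first build (to fetch all Maven and Hadoop dependencies)
Python (for releasedocs)
bats (for shell code tests)
Node.js / bower / Ember-cli (for building the YARN UI v2)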
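---------------------------------------------------------------------
The easiest way to get an environment with all the appropriate tools is through the provided Docker config.
This requires a recent version of Docker (1.4.1 and higher are known to work). On Linux, install Docker and run this command:
$ ./start-build-env.sh
The prompt you then get is located in a mounted copy of the source tree, and all tools required for testing and building are already installed and configured.
Note that inside this Docker environment you ONLY have access to the Hadoop source tree you started from. So if you need to run
dev-support/bin/test-patch /path/to/my.patch
then the patch file must be placed inside the Hadoop source tree.

Installing the required packages on a clean Ubuntu install: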
Oracle JDK 1.8 (preferred)
$ sudo apt-get purge openjdk*
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
Maven
$ sudo apt-get -y install maven
Native libraries
$ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
ProtocolBuffer 2.5.0 (required)
$ sudo apt-get -y install protobuf-compiler
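Before starting a build that can take an hour or more, it may be worth checking the installed toolchain against the requirements above. The sketch below assumes GNU `sort -V` and the usual version banners printed by `mvn -version`, `cmake --version`, `protoc --version` and `java -version`. The function names are hypothetical. BUILDING.txt asks for CMake 3.1 or newer, so the CMake 3.0.0 tried in Problem 2 below would fail this check.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Extract the first x.y or x.y.z number from a version banner on stdin.
first_version() {
  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# Print one line per tool, based on the BUILDING.txt requirements.
check_build_tools() {
  local v
  v="$(mvn -version 2>/dev/null | first_version)"
  if version_ge "${v:-0}" 3.3; then echo "maven $v OK"; else echo "maven ${v:-missing}: need 3.3 or later"; fi
  v="$(cmake --version 2>/dev/null | first_version)"
  if version_ge "${v:-0}" 3.1; then echo "cmake $v OK"; else echo "cmake ${v:-missing}: need 3.1 or newer"; fi
  # ProtocolBuffer must be exactly 2.5.0
  v="$(protoc --version 2>/dev/null | first_version)"
  if [ "$v" = "2.5.0" ]; then echo "protoc $v OK"; else echo "protoc ${v:-missing}: need exactly 2.5.0"; fi
  # java prints its version banner on stderr
  v="$(java -version 2>&1 | first_version)"
  case "$v" in
    1.8*) echo "java $v OK" ;;
    *) echo "java ${v:-missing}: need JDK 1.8" ;;
  esac
}
```

Run `check_build_tools` in the same shell you will build from, so that it sees the same PATH as Maven.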
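# 1. Download the source (closer.cgi returns a mirror-selection page, not the tarball itself;
#    fetch the file from the mirror it suggests, or from archive.apache.org for older releases)
wget https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.1.2/hadoop-3.1.2-src.tar.gz
# 2. Extract (x = extract, z = gunzip, v = verbose, f = archive file)
tar -zxvf hadoop-3.1.2-src.tar.gz
cd hadoop-3.1.2-src
# 3. Build with Maven: dist and native profiles, skip tests, produce a tarball
mvn package -Pdist,native -DskipTests -Dtar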

Building this thing took me three days on and off. Here is a summary of the problems I ran into.

Problem 1:

The build failed at mvn package -Pdist,native -DskipTests -Dtar:

[ERROR] Failed to execute goal org.codehaus.mojo:native-maven-plugin:1.0-alpha-8:javah (default) on project hadoop-common: Error running javah command: Error executing command line. Exit code:2 -> [Help 1]

Fix:

Open hadoop-common-project/hadoop-common/pom.xml in vim and change the javah path to an absolute path:

<javahPath>${env.JAVA_HOME}/bin/javah</javahPath>
change to
<javahPath>/usr/bin/javah</javahPath>
# Use the actual path on your machine
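The same edit can be made non-interactively with sed. This is a minimal sketch that assumes GNU sed and assumes the `<javahPath>` element appears in the pom exactly as quoted above. `set_javah_path` is a hypothetical helper, and `/usr/bin/javah` stands in for the real path on your machine.

```shell
#!/usr/bin/env bash
# Rewrite <javahPath>${env.JAVA_HOME}/bin/javah</javahPath> to an absolute path.
# Args: absolute javah path, pom file (defaults to the hadoop-common pom).
set_javah_path() {
  local javah="$1"
  local pom="${2:-hadoop-common-project/hadoop-common/pom.xml}"
  # '#' is the sed delimiter because both paths contain '/'; a .bak copy is kept.
  sed -i.bak 's#<javahPath>\${env\.JAVA_HOME}/bin/javah</javahPath>#<javahPath>'"$javah"'</javahPath>#' "$pom"
  # Show the result for a quick visual check.
  grep -n '<javahPath>' "$pom"
}
```

Run it from the source root, for example `set_javah_path /usr/bin/javah`. The replacement path must not contain `#` or `&`, because sed would interpret them.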

Problem 2:

The build failed at mvn package -Pdist,native -DskipTests -Dtar:

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.2:cmake-compile (cmake-compile) on project hadoop-common: CMake failed with error code 1 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.2:cmake-compile (cmake-compile) on project hadoop-common: CMake failed with error code 1

Fix:

Suspecting the CMake version, I installed CMake 3.0:

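# Download, build, and install CMake 3.0.0
wget https://cmake.org/files/v3.0/cmake-3.0.0.tar.gz
tar -zxvf cmake-3.0.0.tar.gz
cd cmake-3.0.0
./configure
make
# Install once: checkinstall runs "make install" itself and registers a .deb package
# (if you skip checkinstall, run "sudo make install" instead)
sudo apt-get install checkinstall
sudo checkinstall
# The binaries go to /usr/local/bin, which normally comes before /usr/bin in PATH;
# check that the shell now picks up the new version
hash -r
cmake --version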

That still didn't work. Running mvn package -Pdist,native -DskipTests -Dtar -e -X to print the full debug log turned up:

[INFO] Running cmake /home/wangjun/software/hadoop-3.1.2-src/hadoop-common-project/hadoop-common/src -DGENERATED_JAVAH=/home/wangjun/software/hadoop-3.1.2-src/hadoop-common-project/hadoop-common/target/native/javah -DJVM_ARCH_DATA_MODEL=32 -DREQUIRE_BZIP2=false -DREQUIRE_ISAL=false -DREQUIRE_OPENSSL=false -DREQUIRE_SNAPPY=false -DREQUIRE_ZSTD=false -G Unix Makefiles
[INFO] with extra environment variables {}
[WARNING] Soft-float JVM detected
[WARNING] CMake Error at /home/wangjun/software/hadoop-3.1.2-src/hadoop-common-project/hadoop-common/HadoopCommon.cmake:182 (message):
[WARNING] Soft-float dev libraries required (e.g. 'apt-get install libc6-dev-armel'
[WARNING] on Debian/Ubuntu)
[WARNING] Call Stack (most recent call first):
[WARNING] CMakeLists.txt:26 (include)
[WARNING]
[WARNING]
[WARNING] -- Configuring incomplete, errors occurred!
[WARNING] See also "/home/wangjun/software/hadoop-3.1.2-src/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".
[WARNING] See also "/home/wangjun/software/hadoop-3.1.2-src/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeError.log".

The log hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeError.log shows the error:

gnu/stubs-soft.h: No such file or directory
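When several native modules are built, it can help to list every CMakeError.log under the source tree and show the lines that mention errors. This small sketch assumes the logs sit under a `CMakeFiles/` directory, as in the path above. `show_cmake_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print each CMakeError.log under a source tree, followed by its error lines.
# Arg: source root (default: current directory).
show_cmake_errors() {
  local root="${1:-.}" log
  while IFS= read -r log; do
    echo "== $log"
    # Case-insensitive match; shows at most 20 lines per log.
    grep -n -i 'error' "$log" | head -n 20
  done < <(find "$root" -path '*/CMakeFiles/CMakeError.log' -type f)
}
```

Running `show_cmake_errors` from the source root after a failed `mvn package` points straight at messages like the missing `gnu/stubs-soft.h` above.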

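Fix: the system (and its JDK, java-8-openjdk-armhf) uses the hard-float ABI, so the soft-float headers that -mfloat-abi=softfp pulls in are not installed. Edit hadoop-common-project/hadoop-common/HadoopCommon.cmake and change both occurrences of -mfloat-abi=softfp to -mfloat-abi=hard. References: https://blog.csdn.net/wuyusheng314/article/details/79428996 and https://stackoverflow.com/questions/49139125/fatal-error-gnu-stubs-soft-h-no-such-file-or-directory. It's best to re-extract the original source tarball, make the change, and rebuild from scratch; otherwise the build may still fail.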
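Scripting this edit makes it easy to reapply after re-extracting a fresh tarball. This is a minimal sketch that assumes GNU sed. `use_hard_float_abi` is a hypothetical helper. It only makes sense on a hard-float (armhf) system like this one.

```shell
#!/usr/bin/env bash
# Replace every -mfloat-abi=softfp with -mfloat-abi=hard in HadoopCommon.cmake.
# Arg: file path (default: its location relative to the source root).
use_hard_float_abi() {
  local f="${1:-hadoop-common-project/hadoop-common/HadoopCommon.cmake}"
  sed -i.bak 's/-mfloat-abi=softfp/-mfloat-abi=hard/g' "$f"
  # List the remaining float-ABI flags so you can confirm both places changed.
  grep -n 'mfloat-abi' "$f"
}
```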

After this change a new problem appeared: building Apache Hadoop MapReduce NativeTask failed with:

[WARNING] /usr/bin/ranlib libgtest.a
[WARNING] make[2]: Leaving directory '/home/wangjun/software/hadoop-3.1.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native'
[WARNING] /usr/local/bin/cmake -E cmake_progress_report /home/wangjun/software/hadoop-3.1.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/CMakeFiles 1
[WARNING] [ 7%] Built target gtest
[WARNING] make[1]: Leaving directory '/home/wangjun/software/hadoop-3.1.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native'
[WARNING] /tmp/ccpXG9td.s: Assembler messages:
[WARNING] /tmp/ccpXG9td.s:2040: Error: bad instruction `bswap r5'
[WARNING] /tmp/ccpXG9td.s:2063: Error: bad instruction `bswap r1'
[WARNING] make[2]: *** [CMakeFiles/nativetask.dir/build.make:79: CMakeFiles/nativetask.dir/main/native/src/codec/BlockCodec.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/Makefile2:96: CMakeFiles/nativetask.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs....
[WARNING] /tmp/ccBbS5rL.s: Assembler messages:
[WARNING] /tmp/ccBbS5rL.s:1959: Error: bad instruction `bswap r5'
[WARNING] /tmp/ccBbS5rL.s:1982: Error: bad instruction `bswap r1'
[WARNING] make[2]: *** [CMakeFiles/nativetask_static.dir/build.make:79: CMakeFiles/nativetask_static.dir/main/native/src/codec/BlockCodec.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] /tmp/cc6DHbGO.s: Assembler messages:
[WARNING] /tmp/cc6DHbGO.s:979: Error: bad instruction `bswap r2'
[WARNING] /tmp/cc6DHbGO.s:1003: Error: bad instruction `bswap r3'
[WARNING] make[2]: *** [CMakeFiles/nativetask_static.dir/build.make:125: CMakeFiles/nativetask_static.dir/main/native/src/codec/Lz4Codec.cc.o] Error 1
[WARNING] make[1]: *** [CMakeFiles/Makefile2:131: CMakeFiles/nativetask_static.dir/all] Error 2
[WARNING] make: *** [Makefile:77: all] Error 2

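The errors point to an instruction problem: bswap is an x86 instruction, and the ARM assembler rejects it. After some googling I found the fix: https://issues.apache.org/jira/browse/HADOOP-14922 and https://issues.apache.org/jira/browse/HADOOP-11505

Edit primitives.h by applying the changes shown in https://issues.apache.org/jira/secure/attachment/12693989/HADOOP-11505.001.patch, then rebuild.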
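A JIRA attachment like this is a unified diff, so you can try it with `patch` before falling back to editing primitives.h by hand. The patch was written against an older source tree and may not apply cleanly to 3.1.2; the `--dry-run` step checks that first. The sketch assumes the usual git-style `a/` and `b/` path prefixes (hence `-p1`). `try_patch` is a hypothetical helper, meant to be run from the source root.

```shell
#!/usr/bin/env bash
# Apply a patch file only if it applies cleanly (run from the source root).
# Assumes git-style a/ and b/ path prefixes, hence -p1.
try_patch() {
  local patch_file="$1"
  if patch -p1 --dry-run < "$patch_file" > /dev/null; then
    patch -p1 < "$patch_file"
  else
    echo "patch does not apply cleanly; edit the files by hand using the diff" >&2
    return 1
  fi
}
```

Usage: download the attachment with `wget https://issues.apache.org/jira/secure/attachment/12693989/HADOOP-11505.001.patch`, then run `try_patch HADOOP-11505.001.patch`. If the dry run fails, read the diff and make the primitives.h changes manually.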

After three days of torment it finally worked! Here is what a successful build looks like:

[INFO] No site descriptor found: nothing to attach.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Hadoop Main 3.1.2:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 3.532 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 6.274 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 3.668 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 5.743 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 1.739 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 4.782 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 10.777 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 5.156 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 18.468 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 8.293 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [03:15 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 14.700 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 15.340 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.876 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [ 46.540 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [02:34 min]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [ 12.125 s]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 20.005 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 8.934 s]
[INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [01:08 min]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.892 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [ 0.879 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [ 25.531 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [01:57 min]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [ 14.521 s]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [ 0.920 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [ 23.432 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 28.782 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [ 9.515 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [ 14.077 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [ 12.728 s]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [ 51.338 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [ 8.675 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [ 13.937 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [ 10.853 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [ 12.546 s]
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [ 1.069 s]
[INFO] Apache Hadoop YARN TimelineService HBase Common .... SUCCESS [ 17.176 s]
[INFO] Apache Hadoop YARN TimelineService HBase Client .... SUCCESS [ 15.662 s]
[INFO] Apache Hadoop YARN TimelineService HBase Servers ... SUCCESS [ 0.901 s]
[INFO] Apache Hadoop YARN TimelineService HBase Server 1.2 SUCCESS [ 17.512 s]
[INFO] Apache Hadoop YARN TimelineService HBase tests ..... SUCCESS [ 17.327 s]
[INFO] Apache Hadoop YARN Router .......................... SUCCESS [ 14.430 s]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [ 1.990 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [ 10.400 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [ 7.210 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [ 2.549 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [ 38.022 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [ 35.908 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [ 15.180 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [ 18.915 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [ 15.852 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [ 12.987 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 12.106 s]
[INFO] Apache Hadoop YARN Services ........................ SUCCESS [ 1.812 s]
[INFO] Apache Hadoop YARN Services Core ................... SUCCESS [ 8.685 s]
[INFO] Apache Hadoop YARN Services API .................... SUCCESS [ 9.236 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [ 0.859 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [ 0.840 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [ 34.971 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [ 7.376 s]
[INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [02:07 min]
[INFO] Apache Hadoop MapReduce Uploader ................... SUCCESS [ 9.915 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 14.651 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [ 15.959 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 11.747 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 16.314 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 7.115 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [ 8.686 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 12.413 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 10.490 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 7.894 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 7.098 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 19.457 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 12.452 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [04:55 min]
[INFO] Apache Hadoop Kafka Library support ................ SUCCESS [ 36.248 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 43.752 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 34.905 s]
[INFO] Apache Hadoop Client Aggregator .................... SUCCESS [ 17.099 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 18.819 s]
[INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [ 29.363 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 30.145 s]
[INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [ 8.970 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 46.265 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.883 s]
[INFO] Apache Hadoop Client API ........................... SUCCESS [08:41 min]
[INFO] Apache Hadoop Client Runtime ....................... SUCCESS [06:39 min]
[INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [ 4.040 s]
[INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [13:29 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [ 1.937 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 1.865 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:56 min]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [ 5.050 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 6.457 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [ 0.829 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:06 h
[INFO] Finished at: 2019-09-03T14:14:45+08:00
[INFO] ------------------------------------------------------------------------

The build output is under hadoop-dist. To give you a feel for it, here are all the versions I tried while getting this thing to build:

cmake-3.0.0
cmake-3.3.0
hadoop-2.7.7-src.tar.gz
hadoop-2.9.2-src
hadoop-3.1.2-src
protobuf-2.5.0
cmake-3.1.0
hadoop-2.8.5-src
hadoop-2.9.2-src.tar.gz
hadoop-3.1.2-src.tar.gz
cmake-3.1.0.tar.gz
hadoop-2.7.7-src
hadoop-2.8.5-src.tar.gz
hadoop-3.1.2
hadoop-3.1.2.tar.gz

V. Starting and running Hadoop

Copy hadoop-3.1.2.tar.gz from hadoop-dist/target to where you want to install it, and extract it there.
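This step can be sketched as a small function. The default tarball path matches the build output above. The default destination `$HOME`, the helper name `install_hadoop`, and the assumption that the archive unpacks into a directory named after itself (here hadoop-3.1.2) are all illustrative.

```shell
#!/usr/bin/env bash
# Extract a Hadoop binary tarball into a target directory and print the new path.
# Args: tarball (default: the build output), destination (default: $HOME).
install_hadoop() {
  local tarball="${1:-hadoop-dist/target/hadoop-3.1.2.tar.gz}" dest="${2:-$HOME}"
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
  # Assumes the archive's top-level directory matches its name, e.g. hadoop-3.1.2
  echo "$dest/$(basename "$tarball" .tar.gz)"
}
```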

# Go into the bin directory and format HDFS before the first start
cd hadoop-3.1.2/bin
./hdfs namenode -format
......
......
2019-09-03 14:35:53,356 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at raspberrypi/127.0.1.1
************************************************************/
# Start all services
cd ../sbin/
./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as wangjun in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [raspberrypi]
Starting datanodes
Starting secondary namenodes [raspberrypi]
Starting resourcemanager
Starting nodemanagers
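start-all.sh only reports that it is starting the daemons, not that they stayed up. One way to check is `jps`, which ships with the JDK and lists running Java processes by main class name. The sketch assumes the five daemons named in the output above (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager). `check_daemons` is hypothetical and reads `jps` output from stdin.

```shell
#!/usr/bin/env bash
# Report which Hadoop daemons appear in `jps` output read from stdin.
# Returns non-zero if any expected daemon is missing.
check_daemons() {
  local out d missing=0
  out="$(cat)"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w so that "NameNode" does not match inside "SecondaryNameNode"
    if printf '%s\n' "$out" | grep -qw "$d"; then
      echo "$d running"
    else
      echo "$d NOT running"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_daemons`. If a daemon is missing, look at its log in the logs directory of the installation.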

Open port 8088 (http://localhost:8088) to see the Hadoop management UI.

Hadoop web UIs:

# All Applications
http://localhost:8088
# DataNode Information
http://localhost:9864
# Namenode Information
http://localhost:9870
# NodeManager
http://localhost:8042
# SecondaryNamenode information
http://localhost:9868
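To probe all of these UIs from the command line, a loop over the ports with curl is enough. The sketch assumes curl is installed and the default ports listed above. `check_web_uis` is hypothetical. A 2xx or 3xx code means the UI answered, and 000 means nothing answered on that port.

```shell
#!/usr/bin/env bash
# Print the HTTP status of each default Hadoop 3 web UI port on a host.
# 000 means the connection failed (daemon not running or port blocked).
check_web_uis() {
  local host="${1:-localhost}" port code
  for port in 8088 9864 9870 8042 9868; do
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://${host}:${port}/")"
    echo "port ${port}: HTTP ${code:-000}"
  done
}
```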

Problem 1: errors at startup:

$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as wangjun in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [raspberrypi]
raspberrypi: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [raspberrypi]
raspberrypi: ERROR: JAVA_HOME is not set and could not be found.
Starting resourcemanager
Starting nodemanagers
localhost: ERROR: JAVA_HOME is not set and could not be found.

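Fix: the start scripts launch each daemon over SSH (hence the raspberrypi: and localhost: prefixes in the output), and those sessions don't inherit JAVA_HOME from your shell, so set it explicitly in hadoop-env.sh:

vim ./etc/hadoop/hadoop-env.sh
# export JAVA_HOME=
Uncomment this line and set it to your actual Java installation path, for example:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-armhf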
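If you are not sure of the path, it can be derived from the java binary on your PATH. This sketch assumes GNU readlink and a JDK 8 layout in which `java` may resolve to `<jdk>/jre/bin/java`. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Turn the resolved path of a java binary into a JAVA_HOME candidate.
java_home_from() {
  local p="${1%/bin/java}"   # drop the trailing /bin/java
  echo "${p%/jre}"           # JDK 8 packages often resolve into <jdk>/jre/bin/java
}

# Append an explicit JAVA_HOME export to hadoop-env.sh (the last assignment wins).
# Args: JAVA_HOME value, env file (default: etc/hadoop/hadoop-env.sh in the install dir).
set_hadoop_java_home() {
  local home="$1" env_file="${2:-etc/hadoop/hadoop-env.sh}"
  printf 'export JAVA_HOME=%s\n' "$home" >> "$env_file"
}
```

Usage, from the installation directory: `set_hadoop_java_home "$(java_home_from "$(readlink -f "$(command -v java)")")"`.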
