Big Data Practice Exercises
1. In the root directory of the HDFS file system, recursively create the directory "1daoyun/file", upload the attached BigDataSkills.txt file to it, and use the appropriate command to list the files in the 1daoyun/file directory.
Answer:
[root@master MapReduce]# hadoop fs -mkdir -p /1daoyun/file
[root@master MapReduce]# hadoop fs -put BigDataSkills.txt /1daoyun/file
[root@master MapReduce]# hadoop fs -ls /1daoyun/file
Found 1 items
-rw-r--r--   3 root hdfs       1175 2018-02-12 08:01 /1daoyun/file/BigDataSkills.txt
2. In the root directory of the HDFS file system, recursively create the directory "1daoyun/file" and upload the attached BigDataSkills.txt file to it. During the upload, set the replication factor of BigDataSkills.txt in HDFS to 2, then use the fsck tool to check the number of replicas of its storage blocks.
Answer:
[root@master MapReduce]# hadoop fs -mkdir -p /1daoyun/file
[root@master MapReduce]# hadoop fs -D dfs.replication=2 -put BigDataSkills.txt /1daoyun/file
[root@master MapReduce]# hadoop fsck /1daoyun/file/BigDataSkills.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://master.hadoop:50070/fsck?ugi=root&path=%2F1daoyun%2Ffile%2FBigDataSkills.txt
FSCK started by root (auth:SIMPLE) from /10.0.6.123 for path /1daoyun/file/BigDataSkills.txt at Mon Feb 12 08:11:47 UTC 2018
.
/1daoyun/file/BigDataSkills.txt: Under replicated BP-297530755-10.0.6.123-1518056860260:blk_1073746590_5766. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
Status: HEALTHY
 Total size: 1175 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 1175 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 1 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 1 (50.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 12 08:11:47 UTC 2018 in 1 milliseconds
The filesystem under path '/1daoyun/file/BigDataSkills.txt' is HEALTHY
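The replica figures in the fsck summary can be checked by hand. The sketch below is only an illustration: the function name replica_summary is hypothetical, and it assumes the percentage is missing replicas divided by expected replicas, which is consistent with the "Missing replicas: 1 (50.0 %)" line in the transcript. With a target of 2 replicas and a single DataNode, one replica of the block cannot be placed.

```python
def replica_summary(target_replicas, live_replicas, blocks=1):
    """Return (missing_replicas, missing_percent) for a file.

    Assumption (not taken from the Hadoop source): the percentage is
    missing / expected * 100, where expected = target_replicas * blocks.
    """
    expected = target_replicas * blocks
    missing = max(target_replicas - live_replicas, 0) * blocks
    percent = 100.0 * missing / expected if expected else 0.0
    return missing, percent


if __name__ == "__main__":
    # Values from the transcript: target 2, one live replica, one block.
    missing, percent = replica_summary(target_replicas=2, live_replicas=1)
    print(f"Missing replicas: {missing} ({percent:.1f} %)")
```

On a cluster with at least two DataNodes the same call with live_replicas=2 would report no missing replicas.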
3. The root directory of the HDFS file system contains a directory /apps. Enable snapshots on this directory, create a snapshot of it named apps_1daoyun, and use the appropriate command to list the snapshot.
Answer:
[hdfs@master ~]$ hadoop dfsadmin -allowSnapshot /apps
Allowing snaphot on /apps succeeded
[hdfs@master ~]$ hadoop fs -createSnapshot /apps apps_1daoyun
Created snapshot /apps/.snapshot/apps_1daoyun
[hdfs@master ~]$ hadoop fs -ls /apps/.snapshot
Found 1 items
drwxrwxrwx   - hdfs hdfs          0 2017-05-07 09:48 /apps/.snapshot/apps_1daoyun
4. To protect against accidental deletion, HDFS provides a trash (recycle bin) feature, but too many trashed files take up a large amount of storage. In the Linux shell, use the "vi" command to edit the relevant configuration file and parameter to disable the trash feature, then restart the affected services.
Answer:
[root@master ~]# vi /etc/hadoop/2.6.1.0-129/0/hdfs-site.xml
<property>
  <name>fs.trash.interval</name>
  <value>0</value>
</property>
[root@master ~]# su - hdfs
Last login: Mon May 8 09:31:52 UTC 2017
[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop namenode
[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode
[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop datanode
[hdfs@master ~]$ /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode
5. Use a command to show the number of directories, the number of files, and the total file size under the /tmp directory in HDFS.
Answer:
[root@master ~]# hadoop fs -count /tmp
          21            6               4336 /tmp
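Without the -q option, hadoop fs -count prints four columns: directory count, file count, content size in bytes, and path. A minimal sketch that labels those columns follows; the names CountLine and parse_count_line are hypothetical helpers, not part of Hadoop.

```python
from typing import NamedTuple


class CountLine(NamedTuple):
    dir_count: int
    file_count: int
    content_size: int  # bytes
    path: str


def parse_count_line(line: str) -> CountLine:
    """Parse one line of `hadoop fs -count` output (no -q option)."""
    dirs, files, size, path = line.strip().split(None, 3)
    return CountLine(int(dirs), int(files), int(size), path)


if __name__ == "__main__":
    # The line printed in the transcript above.
    print(parse_count_line("          21            6               4336 /tmp"))
```

For the transcript line this gives 21 directories, 6 files, and 4336 bytes under /tmp.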
6. On the cluster nodes, the directory /usr/hdp/2.6.1.0-129/hadoop-mapreduce/ contains the example JAR hadoop-mapreduce-examples.jar. Run the wordcount program in this JAR to count the words in /1daoyun/file/BigDataSkills.txt, write the result to the /1daoyun/output directory, and use the appropriate command to view the word counts.
Answer:
[root@master ~]# hadoop jar /usr/hdp/2.6.1.0-129/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.2.6.1.0-129.jar wordcount /1daoyun/file/BigDataSkills.txt /1daoyun/output
[root@master ~]# hadoop fs -cat /1daoyun/output/part-r-00000
"duiya	1
hello	1
nisibusisha	1
wosha"	1
zsh	1
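The wordcount example splits the input on whitespace and emits one count per distinct token, sorted by key. A minimal local sketch of the same idea is shown here; the sample text is hypothetical (the real contents of BigDataSkills.txt are not reproduced in this document), and word_count is an illustrative name.

```python
from collections import Counter


def word_count(text: str) -> list[tuple[str, int]]:
    """Count whitespace-separated tokens and return them sorted by token."""
    counts = Counter(text.split())
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical sample input; punctuation stays attached to tokens.
    sample = 'hello "duiya zsh\nnisibusisha wosha"'
    for token, count in word_count(sample):
        print(f"{token}\t{count}")
```

Because tokens are split only on whitespace, quotation marks stay attached to the neighbouring word, which is why entries such as "duiya and wosha" appear in the job output.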
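7. On the cluster nodes, the directory /usr/hdp/2.6.1.0-129/hadoop-mapreduce/ contains the example JAR hadoop-mapreduce-examples.jar. Run the sudoku program in this JAR to solve the Sudoku puzzle in the table below (empty cells are shown as ".").
8 | . | . | . | . | . | . | . | .
. | . | 3 | 6 | . | . | . | . | .
. | 7 | . | . | 9 | . | 2 | . | .
. | 5 | . | . | . | 7 | . | . | .
. | . | . | . | 4 | 5 | 7 | . | .
. | . | . | 1 | . | . | . | 3 | .
. | . | 1 | . | . | . | . | 6 | 8
. | . | 8 | 5 | . | . | . | 1 | .
. | 9 | . | . | . | . | 4 | . | .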
Answer:
[root@master ~]# cat puzzle1.dta
8 ? ? ? ? ? ? ? ?
? ? 3 6 ? ? ? ? ?
? 7 ? ? 9 ? 2 ? ?
? 5 ? ? ? 7 ? ? ?
? ? ? ? 4 5 7 ? ?
? ? ? 1 ? ? ? 3 ?
? ? 1 ? ? ? ? 6 8
? ? 8 5 ? ? ? 1 ?
? 9 ? ? ? ? 4 ? ?
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar sudoku /root/puzzle1.dta
WARNING: Use "yarn jar" to launch YARN applications.
Solving /root/puzzle1.dta
8 1 2 7 5 3 6 4 9
9 4 3 6 8 2 1 7 5
6 7 5 4 9 1 2 8 3
1 5 4 2 3 7 8 9 6
3 6 9 8 4 5 7 2 1
2 8 7 1 6 9 5 3 4
5 2 1 9 7 4 3 6 8
4 3 8 5 2 6 9 1 7
7 9 6 3 1 8 4 5 2

Found 1 solutions
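The solver's output can be verified independently of Hadoop. The sketch below is a plain checker, not the algorithm used by the example JAR: it confirms that every row, column, and 3x3 box of the printed solution contains the digits 1 to 9 exactly once and that the solution agrees with the given cells. The function names are illustrative; "?" marks an empty cell as in puzzle1.dta.

```python
PUZZLE = """\
8 ? ? ? ? ? ? ? ?
? ? 3 6 ? ? ? ? ?
? 7 ? ? 9 ? 2 ? ?
? 5 ? ? ? 7 ? ? ?
? ? ? ? 4 5 7 ? ?
? ? ? 1 ? ? ? 3 ?
? ? 1 ? ? ? ? 6 8
? ? 8 5 ? ? ? 1 ?
? 9 ? ? ? ? 4 ? ?
"""

SOLUTION = """\
8 1 2 7 5 3 6 4 9
9 4 3 6 8 2 1 7 5
6 7 5 4 9 1 2 8 3
1 5 4 2 3 7 8 9 6
3 6 9 8 4 5 7 2 1
2 8 7 1 6 9 5 3 4
5 2 1 9 7 4 3 6 8
4 3 8 5 2 6 9 1 7
7 9 6 3 1 8 4 5 2
"""


def parse(grid_text):
    """Turn a space-separated 9x9 grid into a list of rows of strings."""
    return [line.split() for line in grid_text.strip().splitlines()]


def is_valid_solution(puzzle, solution):
    """Check the Sudoku rules and agreement with the given cells."""
    digits = set("123456789")
    rows = solution
    cols = [list(col) for col in zip(*solution)]
    boxes = [
        [solution[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    units_ok = all(set(unit) == digits for unit in rows + cols + boxes)
    givens_ok = all(
        p == "?" or p == s
        for p_row, s_row in zip(puzzle, solution)
        for p, s in zip(p_row, s_row)
    )
    return units_ok and givens_ok


if __name__ == "__main__":
    print(is_valid_solution(parse(PUZZLE), parse(SOLUTION)))
```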
8. On the cluster nodes, the directory /usr/hdp/2.6.1.0-129/hadoop-mapreduce/ contains the example JAR hadoop-mapreduce-examples.jar. Run the grep program in this JAR to count how many times "Hadoop" occurs in /1daoyun/file/BigDataSkills.txt in the file system, then view the result.
Answer:
[root@master hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar grep /1daoyun/file/BigDataSkills.txt /output hadoop
[root@master hadoop-mapreduce]# hadoop fs -cat /output/part-r-00000
2	hadoop
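The grep example counts every match of a regular expression and prints one line per distinct matched string. A small local sketch of that behaviour is given here; grep_count and the sample text are hypothetical. It also shows that the match is case-sensitive, so the pattern hadoop used in the transcript does not count occurrences spelled "Hadoop".

```python
import re
from collections import Counter


def grep_count(text: str, pattern: str) -> Counter:
    """Count occurrences of each distinct string matched by `pattern`.

    The pattern should contain no capture groups, so that re.findall
    returns the full matched strings.
    """
    return Counter(re.findall(pattern, text))


if __name__ == "__main__":
    sample = "Hadoop runs jobs; hadoop fs lists files; hadoop jar runs a JAR"
    print(grep_count(sample, "hadoop"))
    print(grep_count(sample, "Hadoop"))
    print(grep_count(sample, "[Hh]adoop"))
```

A character class such as [Hh]adoop is one way to count both spellings in a single run.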
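9. Start the HBase database of the XianDian big data platform, using the RegionServer on the master node. Start the HBase shell from the Linux shell and show the current system user inside the HBase shell. (Use lowercase for all database commands.)
Answer:
hbase(main):003:0> whoami
root (auth:SIMPLE)
    groups: root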
10. Enable HBase security authorization. In the HBase shell, grant the root user read, write, and execute permissions on the table xiandian_user, then use the appropriate command to view the permissions.
Answer:
Parameter: Enable Authorization
Value: native
hbase(main):002:0> grant 'root','RWX','xiandian_user'
0 row(s) in 0.4800 seconds
hbase(main):003:0> user_permission 'xiandian_user'
User                 Namespace,Table,Family,Qualifier:Permission
 root                default,xiandian_user,,: [Permission: actions=READ,WRITE,EXEC]
1 row(s) in 0.1180 seconds
11. Log in to the HBase database and create a table named member with the column families 'address' and 'info'. After creating it, insert the following data into the table:
'xiandianA','info:age','24'
'xiandianA','info:birthday','1990-07-17'
'xiandianA','info:company','alibaba'
'xiandianA','address:contry','china'
'xiandianA','address:province','zhejiang'
'xiandianA','address:city','hangzhou'
After inserting the data, use a command to query all info data for xiandianA in the member table. Finally, change xiandianA's age to 99 and query only info:age.
Answer:
hbase(main):001:0> create 'member','address','info'
0 row(s) in 1.5730 seconds

=> Hbase::Table - member
hbase(main):002:0> list
TABLE
emp
member
2 row(s) in 0.0240 seconds
hbase(main):007:0> put 'member','xiandianA','info:age','24'
0 row(s) in 0.1000 seconds
hbase(main):008:0> put 'member','xiandianA','info:birthday','1990-07-17'
0 row(s) in 0.0130 seconds
hbase(main):010:0> put 'member','xiandianA','info:company','alibaba'
0 row(s) in 0.0080 seconds
hbase(main):011:0> put 'member','xiandianA','address:contry','china'
0 row(s) in 0.0080 seconds
hbase(main):012:0> put 'member','xiandianA','address:province','zhejiang'
0 row(s) in 0.0070 seconds
hbase(main):013:0> put 'member','xiandianA','address:city','hangzhou'
0 row(s) in 0.0090 seconds
hbase(main):014:0> get 'member','xiandianA','info'
COLUMN                CELL
 info:age             timestamp=1522140592336, value=24
 info:birthday        timestamp=1522140643072, value=1990-07-17
 info:company         timestamp=1522140745172, value=alibaba
3 row(s) in 0.0170 seconds
hbase(main):015:0>
hbase(main):016:0* put 'member','xiandianA','info:age','99'
0 row(s) in 0.0080 seconds
hbase(main):018:0> get 'member','xiandianA','info:age'
COLUMN                CELL
 info:age             timestamp=1522141564423, value=99
1 row(s) in 0.0140 seconds
12. A namespace is a logical grouping of tables, analogous to a database in a relational database system; tables in the same group serve similar purposes. Log in to the HBase database, create a namespace named newspace, create the table member in this namespace with the column families 'address' and 'info', and check the result with list. After creating the table, insert the following data into it:
'xiandianA','info:age','24'
'xiandianA','info:birthday','1990-07-17'
'xiandianA','info:company','alibaba'
'xiandianA','address:contry','china'
'xiandianA','address:province','zhejiang'
'xiandianA','address:city','hangzhou'
After inserting the data, use the scan command to query only info:age in the table, with startrow set to xiandianA.
Answer:
hbase(main):022:0> create_namespace 'newspace'
0 row(s) in 0.1130 seconds
hbase(main):023:0> create 'newspace:member','address','info'
0 row(s) in 1.5270 seconds
hbase(main):024:0> list
TABLE
emp
member
newspace:member
3 row(s) in 0.0100 seconds

=> ["emp", "member", "newspace:member"]
hbase(main):033:0> put 'newspace:member','xiandianA','info:age','24'
0 row(s) in 0.0620 seconds
hbase(main):037:0> put 'newspace:member','xiandianA','info:birthday','1990-07-17'
0 row(s) in 0.0110 seconds
hbase(main):038:0> put 'newspace:member','xiandianA','info:company','alibaba'
0 row(s) in 0.0130 seconds
hbase(main):039:0> put 'newspace:member','xiandianA','address:contry','china'
0 row(s) in 0.0070 seconds
hbase(main):040:0> put 'newspace:member','xiandianA','address:province','zhejiang'
0 row(s) in 0.0070 seconds
hbase(main):041:0> put 'newspace:member','xiandianA','address:city','hangzhou'
0 row(s) in 0.0070 seconds
hbase(main):044:0> scan 'newspace:member', {COLUMNS => ['info:age'],STARTROW => 'xiandianA'}
ROW                   COLUMN+CELL
 xiandianA            column=info:age, timestamp=1522214952401, value=24
1 row(s) in 0.0160 seconds
13. Log in to the master node and create a local file named hbasetest.txt. Write its contents so that it creates a table 'test' with the column family 'cf' and then inserts the following data in batch:
'row1', 'cf:a', 'value1'
'row2', 'cf:b', 'value2'
'row3', 'cf:c', 'value3'
'row4', 'cf:d', 'value4'
After inserting the data, query the table contents with the scan command, then query only row1 with the get command, and finally exit the HBase shell. Run hbasetest.txt with a command, and submit the contents of hbasetest.txt together with the output returned by the command.
Answer:
[root@exam1 ~]# cat hbasetest.txt
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
put 'test', 'row4', 'cf:d', 'value4'
scan 'test'
get 'test', 'row1'
exit
[root@exam1 ~]# hbase shell hbasetest.txt
0 row(s) in 1.5010 seconds

TABLE
test
1 row(s) in 0.0120 seconds

0 row(s) in 0.1380 seconds

0 row(s) in 0.0090 seconds

0 row(s) in 0.0050 seconds

0 row(s) in 0.0050 seconds

ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1522314428726, value=value1
 row2                 column=cf:b, timestamp=1522314428746, value=value2
 row3                 column=cf:c, timestamp=1522314428752, value=value3
 row4                 column=cf:d, timestamp=1522314428758, value=value4
4 row(s) in 0.0350 seconds

COLUMN                CELL
 cf:a                 timestamp=1522314428726, value=value1
1 row(s) in 0.0190 seconds
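For larger batches, a script file like hbasetest.txt is easier to generate than to type. The sketch below is one possible way to do that; build_hbase_script is a hypothetical helper, and the output is simply the text of a shell script to pass to hbase shell as in the transcript.

```python
def build_hbase_script(table, family, rows):
    """Build the text of an HBase shell script.

    `rows` is a list of (row_key, qualifier, value) tuples; every value
    is written into the single column family `family`.
    """
    lines = [f"create '{table}', '{family}'", f"list '{table}'"]
    for row_key, qualifier, value in rows:
        lines.append(
            f"put '{table}', '{row_key}', '{family}:{qualifier}', '{value}'"
        )
    # Scan the whole table, fetch the first row, then leave the shell.
    lines += [f"scan '{table}'", f"get '{table}', '{rows[0][0]}'", "exit"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    data = [("row1", "a", "value1"), ("row2", "b", "value2"),
            ("row3", "c", "value3"), ("row4", "d", "value4")]
    print(build_hbase_script("test", "cf", data), end="")
```

Writing the returned text to hbasetest.txt reproduces the file shown in the answer.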
14. Use Hive to create the table xd_phy_course as an external table with the external storage location /1daoyun/data/hive, and load phy_course_xd.txt into it. The structure of xd_phy_course is shown in the table below. After loading, query the structure of xd_phy_course in Hive. (Use lowercase for all database commands.)
stname(string) | stID(int) | class(string) | opt_cour(string)
Answer:
hive> create external table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';
OK
Time taken: 1.197 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 0.96 seconds
hive> desc xd_phy_course;
OK
stname              	string
stid                	int
class               	string
opt_cour            	string
Time taken: 0.588 seconds, Fetched: 4 row(s)
15. Use Hive to count the total number of students at a university enrolled in each elective physical education course in phy_course_xd.txt. The structure of phy_course_xd.txt is shown in the table below; the elective course field is opt_cour. Write the result into the table phy_opt_count and query its contents with a SELECT statement. (Use lowercase for all database commands.)
stname(string) | stID(int) | class(string) | opt_cour(string)
Answer:
hive> create table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 4.067 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 1.422 seconds
hive> create table phy_opt_count (opt_cour string,cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 1.625 seconds
hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour,count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;
Query ID = root_20170507125642_6af22d21-ae88-4daf-a346-4b1cbcd7d9fe
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1494149668396_0004)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.51 s
--------------------------------------------------------------------------------
Loading data to table default.phy_opt_count
Table default.phy_opt_count stats: [numFiles=1, numRows=10, totalSize=138, rawDataSize=128]
OK
Time taken: 13.634 seconds
hive> select * from phy_opt_count;
OK
badminton	234
basketball	224
football	206
gymnastics	220
opt_cour	0
swimming	234
table tennis	277
taekwondo	222
tennis	223
volleyball	209
Time taken: 0.065 seconds, Fetched: 10 row(s)
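The query groups rows by opt_cour and counts distinct stID values in each group. The sketch below mirrors that logic on a few hypothetical records (the real rows of phy_course_xd.txt are not shown in this document); count_distinct_by_course is an illustrative name.

```python
from collections import defaultdict


def count_distinct_by_course(records):
    """Count distinct student IDs per course.

    `records` is an iterable of (stname, st_id, class_name, opt_cour).
    """
    ids_by_course = defaultdict(set)
    for _stname, st_id, _class_name, opt_cour in records:
        ids_by_course[opt_cour].add(st_id)
    return {course: len(ids) for course, ids in sorted(ids_by_course.items())}


if __name__ == "__main__":
    # Hypothetical sample rows; the duplicate student is counted once.
    sample = [
        ("stu_a", 1, "Network_1401", "tennis"),
        ("stu_a", 1, "Network_1401", "tennis"),
        ("stu_b", 2, "Software_1403", "tennis"),
        ("stu_c", 3, "Software_1403", "swimming"),
    ]
    print(count_distinct_by_course(sample))
```

Using a set per course is what makes the count distinct; a plain counter would count the repeated row twice.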
16. Use Hive to compute the average physical education score of each class at a university from phy_course_score_xd.txt, using the round function to keep two decimal places. The structure of phy_course_score_xd.txt is shown in the table below; the class field is class and the score field is score. (Use lowercase for all database commands.)
stname(string) | stID(int) | class(string) | opt_cour(string) | score(float)
Answer:
hive> create table phy_course_score_xd (stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.339 seconds
hive> load data local inpath '/root/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, totalSize=1910]
OK
Time taken: 1.061 seconds
hive> select class,round(avg(score)) from phy_course_score_xd group by class;
Query ID = root_20170507131823_0bfb1faf-3bfb-42a5-b7eb-3a6a284081ae
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 26.68 s
--------------------------------------------------------------------------------
OK
Network_1401	73.0
Software_1403	72.0
class	NULL
Time taken: 27.553 seconds, Fetched: 3 row(s)
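The query in the transcript calls round with a single argument, which rounds to zero decimal places (hence 73.0 and 72.0). Hive's round also accepts a second argument giving the number of decimals, so round(avg(score), 2) is the form that keeps two decimal places. The sketch below shows the same grouping and rounding locally; the scores are hypothetical and average_by_class is an illustrative name.

```python
from collections import defaultdict


def average_by_class(records, decimals=2):
    """Return {class_name: average score rounded to `decimals` places}.

    `records` is an iterable of (class_name, score) pairs.
    """
    scores = defaultdict(list)
    for class_name, score in records:
        scores[class_name].append(score)
    return {
        class_name: round(sum(values) / len(values), decimals)
        for class_name, values in sorted(scores.items())
    }


if __name__ == "__main__":
    # Hypothetical scores, not taken from phy_course_score_xd.txt.
    sample = [
        ("Network_1401", 70.5), ("Network_1401", 75.25), ("Network_1401", 73.0),
        ("Software_1403", 80.0), ("Software_1403", 65.5),
    ]
    print(average_by_class(sample))
```

Note that Python's round and Hive's round can differ on exact ties, so this is only a sketch of the grouping logic, not a bit-for-bit reproduction of Hive's result.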
17. Use Hive to find the highest physical education score of each class at a university from phy_course_score_xd.txt. The structure of phy_course_score_xd.txt is shown in the table below; the class field is class and the score field is score. (Use lowercase for all database commands.)
stname(string) | stID(int) | class(string) | opt_cour(string) | score(float)
Answer:
hive> create table phy_course_score_xd (stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.339 seconds
hive> load data local inpath '/root/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, totalSize=1910]
OK
Time taken: 1.061 seconds
hive> select class,max(score) from phy_course_score_xd group by class;
Query ID = root_20170507131942_86a2bf55-49ac-4c2e-b18b-8f63191ce349
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.08 s
--------------------------------------------------------------------------------
OK
Network_1401	95.0
Software_1403	100.0
class	NULL
Time taken: 144.035 seconds, Fetched: 3 row(s)
18. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt, joined by a single underscore "_", as shown in the figure below. The structure of weblog_entries.txt is shown in the table below. (Use lowercase for all database commands.)
md5(STRING) | url(STRING) | request_date(STRING) | request_time(STRING) | ip(STRING)
Answer:
hive> create table weblog_entries (md5 string,url string,request_date string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.502 seconds
hive> load data local inpath '/root/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, totalSize=251130]
OK
Time taken: 1.203 seconds
hive> select concat_ws('_', request_date, request_time) from weblog_entries;
2012-05-10_21:29:01
2012-05-10_21:13:47
2012-05-10_21:12:37
2012-05-10_21:34:20
2012-05-10_21:27:00
2012-05-10_21:33:53
2012-05-10_21:10:19
2012-05-10_21:12:05
2012-05-10_21:25:58
2012-05-10_21:34:28
Time taken: 0.265 seconds, Fetched: 3000 row(s)
19. Use Hive to create a table dynamically from the result of a query on the web log weblog_entries.txt. Create a new table named weblog_entries_url_length with three fields from the web log data (url, request_date, and request_time) plus a new field named "url_length" that holds the length of the url string. The structure of weblog_entries.txt is shown in the table below. When done, query the contents of the weblog_entries_url_length table. (Use lowercase for all database commands.)
md5(STRING) | url(STRING) | request_date(STRING) | request_time(STRING) | ip(STRING)
Answer:
hive> create table weblog_entries (md5 string,url string,request_date string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.502 seconds
hive> load data local inpath '/root/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, totalSize=251130]
OK
Time taken: 1.203 seconds
hive> create table weblog_entries_url_length as select url, request_date, request_time, length(url) as url_length from weblog_entries;
Query ID = root_20170507065123_e3105d8b-84b6-417f-ab58-21ea15723e0a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 4.10 s
--------------------------------------------------------------------------------
Moving data to: hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length
Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]
OK
Time taken: 5.874 seconds
hive> select * from weblog_entries_url_length;
/qnrxlxqacgiudbtfggcg.html	2012-05-10	21:29:01	26
/sbbiuot.html	2012-05-10	21:13:47	13
/ofxi.html	2012-05-10	21:12:37	10
/hjmdhaoogwqhp.html	2012-05-10	21:34:20	19
/angjbmea.html	2012-05-10	21:27:00	14
/mmdttqsnjfifkihcvqu.html	2012-05-10	21:33:53	25
/eorxuryjadhkiwsf.html	2012-05-10	21:10:19	22
/e.html	2012-05-10	21:12:05	7
/khvc.html	2012-05-10	21:25:58	10
/c.html	2012-05-10	21:34:28	7
Time taken: 0.08 seconds, Fetched: 3000 row(s)
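The create table ... as select statement projects three existing columns and appends a derived one. A short sketch of the same projection is given below, using three rows copied from the query output above; with_url_length is an illustrative name, and Python's len matches Hive's length here because the URLs are plain ASCII.

```python
def with_url_length(rows):
    """Append the length of the url to each (url, request_date, request_time)."""
    return [(url, date, time, len(url)) for url, date, time in rows]


if __name__ == "__main__":
    rows = [
        ("/sbbiuot.html", "2012-05-10", "21:13:47"),
        ("/ofxi.html", "2012-05-10", "21:12:37"),
        ("/e.html", "2012-05-10", "21:12:05"),
    ]
    for row in with_url_length(rows):
        print("\t".join(str(field) for field in row))
```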
20. Install the Sqoop clients on the master and slaver nodes, then check the Sqoop version on the master node.
Answer:
[root@master ~]# sqoop version
Warning: /usr/hdp/2.4.3.0-227/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/05/07 06:56:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.3.0-227
Sqoop 1.4.6.2.4.3.0-227
git commit id d296ad374bd38a1c594ef0f5a2d565d71e798aa6
Compiled by jenkins on Sat Sep 10 00:58:52 UTC 2016
21. Use Sqoop to list all tables in the ambari database in MySQL on the master node.
Answer:
[root@master ~]# sqoop list-tables --connect jdbc:mysql://localhost/ambari --username root --password bigdata
Warning: /usr/hdp/2.4.3.0-227/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/05/07 07:07:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.3.0-227
17/05/07 07:07:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/07 07:07:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
ClusterHostMapping
QRTZ_BLOB_TRIGGERS
QRTZ_CALENDARS
QRTZ_CRON_TRIGGERS
QRTZ_FIRED_TRIGGERS
QRTZ_JOB_DETAILS
QRTZ_LOCKS
QRTZ_PAUSED_TRIGGER_GRPS
QRTZ_SCHEDULER_STATE
QRTZ_SIMPLE_TRIGGERS
QRTZ_SIMPROP_TRIGGERS
QRTZ_TRIGGERS
adminpermission
adminprincipal
adminprincipaltype
adminprivilege
adminresource
adminresourcetype
alert_current
alert_definition
alert_group
alert_group_target
alert_grouping
alert_history
alert_notice
alert_target
alert_target_states
ambari_sequences
artifact
blueprint
blueprint_configuration
clusterEvent
cluster_version
clusterconfig
clusterconfigmapping
clusters
clusterservices
clusterstate
confgroupclusterconfigmapping
configgroup
configgrouphostmapping
execution_command
groups
hdfsEvent
host_role_command
host_version
hostcomponentdesiredstate
hostcomponentstate
hostconfigmapping
hostgroup
hostgroup_component
hostgroup_configuration
hosts
hoststate
job
kerberos_descriptor
kerberos_principal
kerberos_principal_host
key_value_store
mapreduceEvent
members
metainfo
repo_version
request
requestoperationlevel
requestresourcefilter
requestschedule
requestschedulebatchrequest
role_success_criteria
servicecomponentdesiredstate
serviceconfig
serviceconfighosts
serviceconfigmapping
servicedesiredstate
stack
stage
task
taskAttempt
topology_host_info
topology_host_request
topology_host_task
topology_hostgroup
topology_logical_request
topology_logical_task
topology_request
upgrade
upgrade_group
upgrade_item
users
viewentity
viewinstance
viewinstancedata
viewinstanceproperty
viewmain
viewparameter
viewresource
widget
widget_layout
widget_layout_user_widget
workflow
22. In MySQL, create a database named xiandian and, in it, a table xd_phy_course with the structure shown in Table 1. Use Hive to create the table xd_phy_course and load phy_course_xd.txt into it; its structure is shown in Table 2. Use Sqoop to export the xd_phy_course table from the Hive data warehouse to the xd_phy_course table of the xiandian database in MySQL on the master node.
Table 1
stname VARCHAR(20) | stID INT(1) | class VARCHAR(20) | opt_cour VARCHAR(20)
Table 2
stname(string) | stID(int) | class(string) | opt_cour(string)
Answer:
[root@master ~]# mysql -uroot -pbigdata
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 37
Server version: 5.5.44-MariaDB MariaDB Server

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database xiandian;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> use xiandian;
Database changed
MariaDB [xiandian]> create table xd_phy_course(stname varchar(20),stID int(1),class varchar(20),opt_cour varchar(20));
Query OK, 0 rows affected (0.20 sec)
hive> create table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 3.136 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 1.129 seconds
[root@master ~]# sqoop export --connect jdbc:mysql://localhost:3306/xiandian --username root --password bigdata --table xd_phy_course --hcatalog-table xd_phy_course
Warning: /usr/hdp/2.4.3.0-227/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/05/07 07:29:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.3.0-227
17/05/07 07:29:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/07 07:29:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/05/07 07:29:48 INFO tool.CodeGenTool: Beginning code generation
17/05/07 07:29:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `xd_phy_course` AS t LIMIT 1
17/05/07 07:29:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `xd_phy_course` AS t LIMIT 1
17/05/07 07:29:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.3.0-227/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/35d4b31b4d93274ba6bde54b3e56a821/xd_phy_course.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/05/07 07:29:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/35d4b31b4d93274ba6bde54b3e56a821/xd_phy_course.jar
17/05/07 07:29:50 INFO mapreduce.ExportJobBase: Beginning export of xd_phy_course
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
23. Use Pig in local mode to count the clicks per IP in the system log access-log.txt. Group the records by IP with the GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the rows in each group, and finally display the result with the DUMP statement.
Answer:
grunt> copyFromLocal /root/Pig/access-log.txt /user/root/input/log1.txt
grunt> A = LOAD '/user/root/input/log1.txt' USING PigStorage (' ') AS (ip,others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group,COUNT(A);
grunt> dump result;
2018-02-13 08:13:36,520 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.7.3.2.6.1.0-129	0.16.0.2.6.1.0-129	root	2018-02-13 08:13:37	2018-02-13 08:13:41	GROUP_BY

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_local963723433_0001	1	1	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a	A,group_ip,result	GROUP_BY,COMBINER	file:/tmp/temp-1479363025/tmp133834330,

Input(s):
Successfully read 62991 records from: "/user/root/input/log1.txt"

Output(s):
Successfully stored 182 records in: "file:/tmp/temp-1479363025/tmp133834330"

(220.181.108.186,1)
(222.171.234.225,142)
(http://www.1daoyun.com/course/toregeister",1)
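The Pig script loads each line with a space delimiter, keeps the first field as ip, groups by it, and counts the rows per group. A local sketch of the same grouping follows; the log lines are hypothetical and count_hits_by_ip is an illustrative name.

```python
from collections import Counter


def count_hits_by_ip(lines):
    """Count log lines per first space-separated field (the IP column)."""
    ips = (line.split(" ", 1)[0] for line in lines if line.strip())
    return Counter(ips)


if __name__ == "__main__":
    # Hypothetical log lines; only the first field matters here.
    sample = [
        "222.171.234.225 GET /index.html",
        "220.181.108.186 GET /robots.txt",
        "222.171.234.225 GET /course",
    ]
    for ip, hits in sorted(count_hits_by_ip(sample).items()):
        print(f"({ip},{hits})")
```

Any line whose first field is not an IP address still forms its own group, which explains entries such as the URL that appears in the DUMP output.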
24. Use Pig to compute the highest temperature of each year in the weather dataset temperature.txt. Group the records by year with the GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to compute the maximum of each group, and finally display the result with the DUMP statement.
Answer:
grunt> copyFromLocal /root/Pig/temperature.txt /user/root/temp.txt
grunt> A = LOAD '/user/root/temp.txt' USING PigStorage(' ') AS (year:int,temperature:int);
grunt> B = GROUP A BY year;
grunt> C = FOREACH B GENERATE group,MAX(A.temperature);
grunt> dump C;
2018-02-13 08:18:52,107 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
(2012,40)
(2013,36)
(2014,37)
(2015,39)
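GROUP ... BY year followed by MAX(A.temperature) is a per-key maximum. The sketch below computes the same thing on a few hypothetical "year temperature" lines; max_temperature_by_year is an illustrative name and the readings are not taken from temperature.txt.

```python
def max_temperature_by_year(lines):
    """Return [(year, max_temperature), ...] sorted by year.

    Each line holds a year and a temperature separated by whitespace.
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue
        year, temperature = (int(field) for field in line.split())
        best[year] = max(temperature, best.get(year, temperature))
    return sorted(best.items())


if __name__ == "__main__":
    sample = ["2012 38", "2012 40", "2013 36", "2013 30"]
    for year, temperature in max_temperature_by_year(sample):
        print(f"({year},{temperature})")
```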
25. Use Pig to count the number of IP addresses per country in the dataset ip_to_country. Group the records by country with the GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the IP addresses in each group, save the result to the /data/pig/output directory, and view the result.
Answer:
grunt> copyFromLocal /root/Pig/ip_to_country.txt /user/root/ip_to_country.txt
grunt> ip_countries = LOAD '/user/root/ip_to_country.txt' AS (ip: chararray, country:chararray);
grunt> country_grpd = GROUP ip_countries BY country;
grunt> country_counts = FOREACH country_grpd GENERATE FLATTEN(group),COUNT(ip_countries) as counts;
grunt> STORE country_counts INTO '/data/pig/output';
2018-02-13 08:23:35,621 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
Moldova, Republic of	1
Syrian Arab Republic	1
United Arab Emirates	2
Bosnia and Herzegovina	1
Iran, Islamic Republic of	2
Tanzania, United Republic of	1
26. Install the Mahout client on the master node, open a Linux shell, and run the mahout command to list the example programs shipped with Mahout.
Answer:
[root@master ~]# mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.6.1.0-129/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.6.1.0-129/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.6.1.0-129/mahout/mahout-examples-0.9.0.2.6.1.0-129-job.jar
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  buildforest: : Build the random forest classifier
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  describe: : Describe the fields and target variable in a data set
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testforest: : Test the random forest classifier
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
27. Use the Mahout mining tool to recommend items from the dataset user-item-score.txt (user, item, score). Use the item-based collaborative filtering algorithm with Euclidean distance as the similarity measure, 3 recommendations per user, non-boolean data, a maximum preference value of 4, and a minimum preference value of 1. Save the recommendation output to the output directory and use the -cat command to view the contents of the result file part-r-00000.
Answer:
[hdfs@master ~]$ hadoop fs -mkdir -p /data/mahout/project
[hdfs@master ~]$ hadoop fs -put user-item-score.txt /data/mahout/project
[hdfs@master ~]$ mahout recommenditembased -i /data/mahout/project/user-item-score.txt -o /data/mahout/project/output -n 3 -b false -s SIMILARITY_EUCLIDEAN_DISTANCE --maxPrefsPerUser 4 --minPrefsPerUser 1 --maxPrefsInItemSimilarity 4 --tempDir /data/mahout/project/temp
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.4.3.0-227/hadoop/bin/hadoop and
17/05/15 19:37:25 INFO driver.MahoutDriver: Program took 259068 ms (Minutes: 4.3178)
[hdfs@master ~]$ hadoop fs -cat /data/mahout/project/output/part-r-00000
1	[105:3.5941463,104:3.4639049]
2	[106:3.5,105:2.714964,107:2.0]
3	[103:3.59246,102:3.458911]
4	[107:4.7381864,105:4.2794304,102:4.170158]
5	[103:3.8962872,102:3.8564017,107:3.7692602]
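Item-based collaborative filtering scores each pair of items by how similarly the same users rated them. The sketch below uses one common Euclidean-distance similarity, 1 / (1 + d), computed over the users who rated both items. This formula is an assumption chosen for illustration and is not claimed to be exactly what SIMILARITY_EUCLIDEAN_DISTANCE computes inside Mahout; the ratings are hypothetical.

```python
import math


def euclidean_similarity(ratings_a, ratings_b):
    """Similarity of two items from their {user_id: rating} dictionaries.

    Assumed definition: 1 / (1 + Euclidean distance over co-rating users).
    Returns 0.0 when no user rated both items.
    """
    common_users = ratings_a.keys() & ratings_b.keys()
    if not common_users:
        return 0.0
    distance = math.sqrt(
        sum((ratings_a[user] - ratings_b[user]) ** 2 for user in common_users)
    )
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    # Hypothetical ratings of two items by users 1, 2, and 3.
    item_101 = {1: 4.0, 2: 1.0, 3: 3.0}
    item_102 = {1: 1.0, 2: 1.0}
    print(euclidean_similarity(item_101, item_102))
```

Identical rating vectors give a similarity of 1.0, and the value falls toward 0 as the ratings diverge.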
28. Install and start the Flume component on the master node, open a Linux shell, run the flume-ng help command, and review the flume-ng usage information.
Answer:
[root@master ~]# flume-ng help
Usage: /usr/hdp/2.6.1.0-129/flume/bin/flume-ng.distro <command> [options]...
commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  password                  create a password file for use in flume config
  version                   show Flume version info
global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option
agent options:
  --conf-file,-f <file>     specify a config file (required)
  --name,-n <name>          the name of this agent (required)
  --help,-h                 display help text
avro-client options:
  --rpcProps,-P <file>      RPC client properties file with server connection params
  --host,-H <host>          hostname to which events will be sent
  --port,-p <port>          port of the avro source
  --dirname <dir>           directory to stream to avro source
  --filename,-F <file>      text file to stream to avro source (default: std input)
  --headerFile,-R <file>    File containing event headers as key/value pairs on each new line
  --help,-h                 display help text
  Either --rpcProps or both --host and --port must be specified.
password options:
  --outfile                 The file in which encoded password is stored
Note that if <conf> directory is specified, then it is always included first in the classpath.
29. Using the provided template hdfs-example.conf, configure Flume NG so that the local path /opt/xiandian/ on the master node is watched and its files are uploaded to HDFS in real time. Set the HDFS storage path to /data/flume/, keep the uploaded file names unchanged, set the file type to DataStream, and then start the flume-ng agent.
Answer:
[root@master ~]# flume-ng agent --conf-file hdfs-example.conf --name master -Dflume.root.logger=INFO,console
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/bin/hadoop) for HDFS access
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/tez/lib/slf4j-api-1.7.5.jar from classpath
Info: Including HBASE libraries found via (/bin/hbase) for HBASE access
Info: Excluding /usr/hdp/2.4.3.0-227/hbase/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/tez/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/hdp/2.4.3.0-227/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Including Hive libraries found via () for Hive access
[root@master ~]# cat hdfs-example.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
master.sources = webmagic
master.sinks = k1
master.channels = c1
# Describe/configure the source
master.sources.webmagic.type = spooldir
master.sources.webmagic.fileHeader = true
master.sources.webmagic.fileHeaderKey = fileName
master.sources.webmagic.fileSuffix = .COMPLETED
master.sources.webmagic.deletePolicy = never
master.sources.webmagic.spoolDir = /opt/xiandian/
master.sources.webmagic.ignorePattern = ^$
master.sources.webmagic.consumeOrder = oldest
master.sources.webmagic.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
master.sources.webmagic.batchsize = 5
master.sources.webmagic.channels = c1
# Use a channel which buffers events in memory
master.channels.c1.type = memory
# Describe the sink
master.sinks.k1.type = hdfs
master.sinks.k1.channel = c1
master.sinks.k1.hdfs.path = hdfs://master:8020/data/flume/%{dicName}
master.sinks.k1.hdfs.filePrefix = %{fileName}
master.sinks.k1.hdfs.fileType = DataStream
30. Deploy the Spark service component on the XianDian big data platform, open a Linux shell, start the spark-shell terminal, and submit the process information printed at startup.
Answer:
[root@master ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://172.24.2.110:4040
Spark context available as 'sc' (master = local[*], app id = local-1519375873795).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
31. Log in to spark-shell, define i as 1 and sum as 0, use a while loop to add the integers from 1 to 100, and print sum with Scala's standard output function.
Answer:
scala> var i=1
i: Int = 1
scala> var sum=0
sum: Int = 0
scala> while(i<=100){
     | sum+=i
     | i=i+1
     | }
scala> println(sum)
5050
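The while loop is what the exercise asks for, but the same total can be cross-checked without mutable variables. The sketch below is only an illustration; SumDemo, sumTo, and closedForm are hypothetical names introduced here.

```scala
object SumDemo {
  // Same result as the while loop, using a range instead of mutable state.
  def sumTo(n: Int): Int = (1 to n).sum

  // Closed form n(n+1)/2, useful as an independent sanity check.
  def closedForm(n: Int): Int = n * (n + 1) / 2

  def main(args: Array[String]): Unit = {
    println(sumTo(100))      // 5050
    println(closedForm(100)) // 5050
  }
}
```

Both functions agree with the 5050 printed by the loop above.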
32. Log in to spark-shell, define a list (1,2,3,4,5,6,7,8,9), and use the map function to multiply each element of the list by 2.
Answer:
scala> import scala.math._
import scala.math._
scala> val nums=List(1,2,3,4,5,6,7,8,9)
nums: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> nums.map(x=>x*2)
res18: List[Int] = List(2, 4, 6, 8, 10, 12, 14, 16, 18)
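Two equivalent spellings of the same transformation may be worth knowing; the sketch below (MapDemo is a hypothetical name) also shows that map returns a new list and leaves nums unchanged. The scala.math import in the answer is not needed for any of this.

```scala
object MapDemo {
  val nums = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

  // Placeholder syntax: shorthand for x => x * 2.
  val doubled: List[Int] = nums.map(_ * 2)

  // A for-comprehension with yield is rewritten by the compiler into map.
  val viaFor: List[Int] = for (x <- nums) yield x * 2

  def main(args: Array[String]): Unit = {
    println(doubled)
    println(viaFor)
    println(nums) // the original list is immutable and still holds 1 to 9
  }
}
```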
33. Log in to spark-shell, define a list ("Hadoop","Java","Spark"), and use the flatMap function to split the list into individual letters and convert them to uppercase.
Answer:
scala> val data = List("Hadoop","Java","Spark")
data: List[String] = List(Hadoop, Java, Spark)
scala> println(data.flatMap(_.toUpperCase))
List(H, A, D, O, O, P, J, A, V, A, S, P, A, R, K)
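The result is a list of characters because flatMap treats each upper-cased String as a sequence of Char and concatenates those sequences. The sketch below contrasts it with map and with an explicit map-then-flatten; FlatMapDemo is a hypothetical name used only for this illustration.

```scala
object FlatMapDemo {
  val data = List("Hadoop", "Java", "Spark")

  // map keeps one result per element: a list of three upper-cased strings.
  val mapped: List[String] = data.map(_.toUpperCase)

  // flatMap views each upper-cased String as characters and concatenates them.
  val flattened: List[Char] = data.flatMap(_.toUpperCase)

  // Equivalent two-step form: map each string to its characters, then flatten.
  val twoStep: List[Char] = data.map(_.toUpperCase.toList).flatten

  def main(args: Array[String]): Unit = {
    println(mapped)
    println(flattened)
    println(flattened == twoStep)
  }
}
```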
34. Log in to the master node of the big data cloud host and create a file abc.txt in the /root directory with the following content:
hadoop hive
solr redis
kafka hadoop
storm flume
sqoop docker
spark spark
hadoop spark
elasticsearch hbase
hadoop hive
spark hive
hadoop spark
Then log in to spark-shell. First count the number of lines in abc.txt, then count the occurrences of each word in abc.txt and sort the results in ascending order of the words' first letters, and finally count the number of rows in the result.
Answer:
scala> val words=sc.textFile("file:///root/abc.txt").count
words: Long = 11
scala> val words=sc.textFile("file:///root/abc.txt").flatMap(_.split("\\W+")).map(x=>(x,1)).reduceByKey(_+_).sortByKey().collect
words: Array[(String, Int)] = Array((docker,1), (elasticsearch,1), (flume,1), (hadoop,5), (hbase,1), (hive,3), (kafka,1), (redis,1), (solr,1), (spark,5), (sqoop,1), (storm,1))
scala> val words=sc.textFile("file:///root/abc.txt").flatMap(_.split("\\W+")).map(x=>(x,1)).reduceByKey(_+_).count
words: Long = 12
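The three results can be cross-checked without a cluster by running the same steps on plain Scala collections. The sketch below is an illustration under that assumption, not the Spark job itself: WordCountDemo is a hypothetical name, groupBy stands in for reduceByKey, and the filter for empty tokens is an extra safeguard that the answer above does not include.

```scala
object WordCountDemo {
  // The eleven lines of abc.txt as given in the exercise.
  val lines = List(
    "hadoop hive", "solr redis", "kafka hadoop", "storm flume", "sqoop docker",
    "spark spark", "hadoop spark", "elasticsearch hbase", "hadoop hive",
    "spark hive", "hadoop spark")

  // Split into words, drop empty tokens, count per word, sort by word.
  val counts: List[(String, Int)] =
    lines.flatMap(_.split("\\W+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    println(lines.size)  // number of lines
    println(counts)      // (word, count) pairs in ascending word order
    println(counts.size) // number of distinct words
  }
}
```

Sorting by the whole word, as sortByKey does in the answer, also orders the words by first letter, so it satisfies the requirement in the question.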
35. Log in to spark-shell, define List(1,2,3,3,4,4,5,5,6,6,6,8,9), and use a built-in Spark function to deduplicate the list.
Answer:
scala> val l = List(1,2,3,3,4,4,5,5,6,6,6,8,9)
l: List[Int] = List(1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 8, 9)
scala> l.distinct
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 8, 9)
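Note that l.distinct in this answer is the Scala collections method, which keeps the first occurrence of each element in its original order. An RDD offers a method of the same name (for example sc.parallelize(l).distinct().collect() in spark-shell), but the order of its result is not guaranteed. The sketch below stays with plain Scala collections so it runs without Spark; DistinctDemo is a hypothetical name.

```scala
object DistinctDemo {
  val l = List(1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 8, 9)

  // distinct keeps first occurrences and preserves the list order.
  val viaDistinct: List[Int] = l.distinct

  // A Set also removes duplicates but defines no order, so sort explicitly.
  val viaSet: List[Int] = l.toSet.toList.sorted

  def main(args: Array[String]): Unit = {
    println(viaDistinct)
    println(viaSet)
  }
}
```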