Hadoop 2.x: Web UV Example

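I. Basic website metrics (statistics computed from the logs that user behavior on the site generates)

1. PV: page views (Page View). Every page load produces one record, regardless of IP. Usually tallied per day (most common), per week, per month, and so on.

2. UV: unique visitors (Unique Visitor). Identified by cookie; a user who visits several times within the same day is counted once.

3. VV: number of visits (Visit View). Identified by session; one visit lasts from the moment a visitor opens the site until they have closed all of its pages.

4. IP: number of distinct IP addresses. Repeated visits from the same IP are counted once.

5. Website traffic usually means the volume of visits to a site, described by metrics such as the number of visiting users and the number of pages they view.

   For a virtual hosting provider, traffic instead means the amount of data transferred while users access the site.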
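
The four counting metrics above can be illustrated with a minimal plain-Java sketch. The record layout (date, cookie ID, session ID, IP, URL), the class name `WebMetricsSketch`, and the sample data are all assumptions made for illustration; they are not the log format used later in this post.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each record is {date, cookieId, sessionId, ip, url}.
class WebMetricsSketch {
    static final int COOKIE = 1, SESSION = 2, IP = 3;

    // PV: every record counts as one page view.
    static int pv(List<String[]> records) {
        return records.size();
    }

    // UV, VV and IP are all "count the distinct values of one column".
    static int distinct(List<String[]> records, int column) {
        Set<String> seen = new HashSet<>();
        for (String[] record : records) {
            seen.add(record[column]);
        }
        return seen.size();
    }

    // Made-up records for a single day.
    static List<String[]> sample() {
        return Arrays.asList(
            new String[]{"2015-08-28", "c1", "s1", "10.0.0.1", "/index"},
            new String[]{"2015-08-28", "c1", "s1", "10.0.0.1", "/list"},
            new String[]{"2015-08-28", "c1", "s2", "10.0.0.1", "/index"},
            new String[]{"2015-08-28", "c2", "s3", "10.0.0.1", "/index"},
            new String[]{"2015-08-28", "c3", "s4", "10.0.0.2", "/index"});
    }

    public static void main(String[] args) {
        List<String[]> records = sample();
        System.out.println("PV=" + pv(records));
        System.out.println("UV=" + distinct(records, COOKIE));
        System.out.println("VV=" + distinct(records, SESSION));
        System.out.println("IP=" + distinct(records, IP));
    }
}
```

With this sample, five page loads from three cookies, four sessions and two IP addresses give PV=5, UV=3, VV=4 and IP=2.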

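II. UV counting example (how many distinct visitors from each province visited the site on each day)

1. Analyzing the requirement

    1> What data do we have? Find what the records have in common, and recall what map, shuffle, and reduce each do

    2> What data do we want? List it out

2. Points to note when planning the implementation

    1> Which delimiter separates the fields, and whether we need a custom data type

    2> Broadly, we need to filter out invalid records,

       combine the fields we need using a custom data type,

       and then accumulate the records by province (the deduplication stage)

    3> A custom data type is optional; concatenating the field values into a Text works too

       Then, in the reduce method, build a HashMap that increments a counter for each date and province

       In cleanup, assemble the entries into the desired output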
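
The plan above (filter, deduplicate, then count per date and province) can be sketched in plain Java without Hadoop. The three-column layout `guid \t trackTime \t provinceId` and the class name `UvPlanSketch` are simplifications assumed for illustration; the real log lines used below have at least 30 tab-separated fields.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the plan; the input layout is guid \t trackTime \t provinceId.
class UvPlanSketch {

    static Map<String, Integer> countUv(String[] lines) {
        Set<String> seen = new HashSet<>();         // dedup step (the shuffle does this in MapReduce)
        Map<String, Integer> uv = new TreeMap<>();  // counter per "date \t provinceId"
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length < 3) {
                continue;                           // malformed record
            }
            String guid = fields[0].trim();
            String trackTime = fields[1].trim();
            if (guid.isEmpty() || trackTime.length() < 10) {
                continue;                           // missing visitor ID or date
            }
            int provinceId;
            try {
                provinceId = Integer.parseInt(fields[2].trim());
            } catch (NumberFormatException e) {
                continue;                           // province ID is blank or not a number
            }
            String group = trackTime.substring(0, 10) + "\t" + provinceId;
            // Count a visitor only the first time it appears in a date/province group.
            if (seen.add(group + "_" + guid)) {
                uv.merge(group, 1, Integer::sum);
            }
        }
        return uv;
    }

    public static void main(String[] args) {
        String[] lines = {
            "g1\t2015-08-28 10:00:00\t1",
            "g1\t2015-08-28 11:00:00\t1",           // same visitor, same day: not counted again
            "g2\t2015-08-28 12:00:00\t1",
            "g3\t2015-08-28 12:00:00\tabc"          // invalid province: filtered out
        };
        System.out.println(countUv(lines));
    }
}
```

In the MapReduce version the `seen` set is unnecessary: emitting the composite string as the map output key lets the shuffle group identical keys, so each distinct visitor reaches reduce exactly once.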

III. UV counting code example

WebUvMr.java

============

package com.bigdata_senior.WebUvMr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WebUvMr {

	//Mapper Class (the name is left over from a WordCount template)
	//Emits one key per valid record: date \t provinceId _ guid
	private static class WordCountMapper extends Mapper<LongWritable, Text, Text, NullWritable>{
		private Text mapOutKey = new Text();

		@Override
		public void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			String lineValue = value.toString();
			String [] strValue = lineValue.split("\t");
			//a valid record has at least 30 tab-separated fields
			if(30 > strValue.length){
				return;
			}
			//field 5: visitor id (guid)
			String guidIdValue = strValue[5];
			if(StringUtils.isBlank(guidIdValue)){
				return;
			}
			//field 17: track time; its first 10 characters are the date
			String trackTimeValue = strValue[17];
			if(StringUtils.isBlank(trackTimeValue) || trackTimeValue.length() < 10){
				return;
			}
			String dateValue = trackTimeValue.substring(0,10);
			//field 23: province id, which must be an integer
			if(StringUtils.isBlank(strValue[23])){
				return;
			}
			Integer provinceIdValue;
			try{
				provinceIdValue = Integer.valueOf(strValue[23]);
			}catch(NumberFormatException e){
				return;
			}
			//identical keys are grouped by the shuffle, which deduplicates visitors
			mapOutKey.set(dateValue+"\t"+provinceIdValue+"_"+guidIdValue);
			context.write(mapOutKey, NullWritable.get());
		}
	}

	//Reduce Class
	//Each reduce() call receives one distinct date/province/guid key, so it adds 1 to that date/province counter.
	//The totals are kept in memory, so the result is only correct with a single reduce task.
	private static class WordCountReduce extends Reducer<Text, NullWritable, Text, LongWritable>{
		private Map<String,Integer> dateMap;
		private Text outputKey = new Text();
		private LongWritable outputvalue = new LongWritable();

		@Override
		protected void setup(Context context) throws IOException,
				InterruptedException {
			dateMap = new HashMap<String,Integer>();
		}

		@Override
		public void reduce(Text key, Iterable<NullWritable> values,Context context)
				throws IOException, InterruptedException {
			//the part before "_" is "date \t provinceId"
			String date = key.toString().split("_")[0];
			if(dateMap.containsKey(date)){
				Integer preUv = dateMap.get(date);
				Integer uv = preUv + 1;
				dateMap.put(date, uv);
			}else{
				dateMap.put(date, 1);
			}
		}

		@Override
		protected void cleanup(Context context) throws IOException,
				InterruptedException {
			//write one line per date/province once all keys have been seen
			Set<String> dateSet = dateMap.keySet();
			for(String date : dateSet){
				Integer uv = dateMap.get(date);
				outputKey.set(date);
				outputvalue.set(uv);
				System.out.println("result:-->key "+outputKey+" value-->"+outputvalue);
				context.write(outputKey, outputvalue);
			}
		}
	}

	//Driver
	public int run(String[] args) throws Exception {
		Configuration configuration = new Configuration();
		Job job = Job.getInstance(configuration, this.getClass().getSimpleName());
		job.setJarByClass(this.getClass());
		//input
		Path inPath = new Path(args[0]);
		FileInputFormat.addInputPath(job,inPath);
		//output
		Path outPath = new Path(args[1]);
		FileOutputFormat.setOutputPath(job, outPath);
		//mapper
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(NullWritable.class);
		//Reduce
		job.setReducerClass(WordCountReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);
		//the reducer totals counts in memory, so every key must reach the same reduce task
		job.setNumReduceTasks(1);
		//submit job
		boolean isSuccess = job.waitForCompletion(true);
		return isSuccess ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		//fall back to the sample HDFS paths only when no paths are given on the command line
		if(args.length < 2){
			args = new String[]{
				"hdfs://hadoop09-linux-01.ibeifeng.com:8020/user/liuwl/tmp/webuv/input",
				"hdfs://hadoop09-linux-01.ibeifeng.com:8020/user/liuwl/tmp/webuv/output4"
			};
		}
		//run job
		int status = new WebUvMr().run(args);
		System.exit(status);
	}
}
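
The reducer in the listing keeps every date/province counter in a HashMap until cleanup. As a possible alternative (not what the code above does), the counters can be emitted as the keys stream past, because reduce sees its keys in sorted order and all keys sharing the prefix `date \t provinceId _` are therefore adjacent. The sketch below simulates that in plain Java; the class name `SortedKeyUvSketch` and the sample keys are assumptions, and it assumes every key contains the `_` separator that the mapper writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: counts distinct, sorted keys of the form "date\tprovinceId_guid"
// while holding only the current group in memory.
class SortedKeyUvSketch {

    static List<String> countSorted(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String current = null;   // the "date\tprovinceId" group being counted
        long count = 0;
        for (String key : sortedKeys) {
            // Assumes the key contains '_', as every key written by the mapper does.
            String group = key.substring(0, key.indexOf('_'));
            if (!group.equals(current)) {
                if (current != null) {
                    out.add(current + "\t" + count);   // previous group is complete
                }
                current = group;
                count = 0;
            }
            count++;
        }
        if (current != null) {
            out.add(current + "\t" + count);           // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "2015-08-28\t1_g1",
            "2015-08-28\t1_g2",
            "2015-08-28\t2_g1",
            "2015-08-29\t1_g4");
        for (String line : countSorted(keys)) {
            System.out.println(line);
        }
    }
}
```

In a real job the final flush would go in cleanup. This removes the in-memory map, but it does not by itself make several reduce tasks safe: that would also need a partitioner that routes keys by their date/province prefix.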
