Uploading data to Hive via Flume

Goal: accept HTTP request messages on port 1084 and store them in Hive. osgi is the name of the database created in Hive and periodic_report6 is the table (note that the worked example below creates a table named period_data; the HDFS sink path must point at the warehouse directory of whichever table you actually use).

1. The Flume configuration is as follows:
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 1084
a1.sources.r1.handler=jkong.test.PlainJSONHandler2
#a1.sources.r1.interceptors=i1 i2
#a1.sources.r1.interceptors.i1.type=regex_filter
#a1.sources.r1.interceptors.i1.regex=\\{.*\\}
#a1.sources.r1.interceptors.i2.type=timestamp
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=1000
a1.channels.c1.keep-alive=30
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path=hdfs://hadoop:9000/user/hive/warehouse/osgi.db/periodic_report6/day=%y-%m-%d/mf=%{manufacture}/sn=%{deviceId}
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollSize=67108864
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=60
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
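
To sanity-check the source end to end, you can POST a sample report to the agent. A minimal sketch (the class name PostTest is hypothetical), assuming the agent listens on localhost:1084 and the body follows the fixed field layout the handler in step 5 expects, namely a 12-character deviceId and a 13-digit actualTime:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostTest {
    public static void main(String[] args) throws Exception {
        // Sample body: deviceId is 12 characters and actualTime a 13-digit
        // millisecond timestamp, matching the fixed offsets used by PlainJSONHandler2.
        String body = "{\"deviceId\":\"ABC123456789\",\"actualTime\":1534204800000}";
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:1084").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}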
2. Creating the table: parsing the JSON requires two extra jars, json-serde-1.3.8-jar-with-dependencies.jar and json-udf-1.3.8-jar-with-dependencies.jar (download link below). For installing Hive itself, see the Hive section of the setup notes.
Link: https://pan.baidu.com/s/1suPzGJmtJlsROC6SVpcztQ  password: zlgg
create table period_data(deviceId STRING, actualTime STRING, manufacture STRING, information STRING)
partitioned by (day string, mf string, sn string)
row format serde "org.openx.data.jsonserde.JsonSerDe"
with serdeproperties (
    "deviceId"="$.deviceId",
    "actualTime"="$.actualTime",
    "manufacture"="$.manufacture",
    "information"="$.information");
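
For reference, each event body written by the handler in step 5 is a flat JSON object of the following shape (values are illustrative), which is exactly what the serde mappings above pick apart; the information column carries the original report as an escaped string:

{"deviceId":"ABC123456789","actualTime":"1534204800000","manufacture":"ABC","information":"{\"deviceId\":\"ABC123456789\",\"actualTime\":1534204800000, ...}"}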
2.1 A variant that also splits the report fields into typed columns (untested; not used for now):
create table periodic_report4(
    id BIGINT,
    deviceId STRING,
    report_time STRING,
    information STRUCT<actualTime:BIGINT, dpiVersionInfo:STRING, subDeviceInfo:STRING, wanTrafficData:STRING, ponInfo:STRING, eventType:STRING, potsInfo:STRING, deviceInfo:STRING, deviceStatus:STRING>)
row format serde "org.openx.data.jsonserde.JsonSerDe"
with serdeproperties (
    "input.invalid.ignore"="true",
    "id"="$.id",
    "deviceId"="$.deviceId",
    "report_time"="$.report_time",
    "requestParams.actualTime"="$.requestParams.actualTime",
    "requestParams.dpiVersionInfo"="$.requestParams.dpiVersionInfo",
    "requestParams.subDeviceInfo"="$.requestParams.subDeviceInfo",
    "requestParams.wanTrafficData"="$.requestParams.wanTrafficData",
    "requestParams.ponInfo"="$.requestParams.ponInfo",
    "requestParams.eventType"="$.requestParams.eventType",
    "requestParams.potsInfo"="$.requestParams.potsInfo",
    "requestParams.deviceInfo"="$.requestParams.deviceInfo",
    "requestParams.deviceStatus"="$.requestParams.deviceStatus");
3. Starting Flume (run from the Flume root directory):

bin/flume-ng agent --conf ./conf/ -f ./conf/flume.conf --name a1 -Dflume.root.logger=DEBUG,console    # foreground, with log output
nohup ./flume-ng agent --conf .././conf/ -f .././conf/flume.conf1 --name a1 &                         # background
4. Starting Hive (run from the Hive bin directory):

./hive                                            # start the Hive CLI
./hive -hiveconf hive.root.logger=DEBUG,console   # CLI with log output
./hiveserver2                                     # start the HiveServer2 service
nohup ./hiveserver2 &                             # HiveServer2 in the background
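
Once HiveServer2 is up, it is worth confirming that it accepts JDBC connections before deploying the handler. A minimal sketch (the class name HiveConnCheck is hypothetical), assuming the same connection URL and hive/hive credentials that the partition-creation code in step 5 uses:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Same connection settings as MyRunnable in step 5.
        try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/osgi", "hive", "hive");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}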
5. The Flume data-filter class. It also connects to Hive to create the partition for each reporting device; the jar must be copied into Flume's lib directory (link: https://pan.baidu.com/s/1GR1xbmXwFT_-t7rJJcPvgA  password: nbv9).
package jkong.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.BidirectionalHTTPSourceHandler;
import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// BidirectionalHTTPSourceHandler is provided by the jar linked above, not by stock Flume.
public class PlainJSONHandler2 implements BidirectionalHTTPSourceHandler {

    private static final Logger LOG = LoggerFactory.getLogger(PlainJSONHandler2.class);

    // Request counter used to sample the incoming stream.
    private static int data_number = 0;

    @Override
    public void configure(Context cont) {
        data_number = 0;
    }

    @Override
    public List<Event> getEvents(HttpServletRequest request, HttpServletResponse response) {
        String readLine = null;
        String deviceSN = null;
        String actualTime = null;
        Map<String, String> headers = null;
        try {
            // Keep one request in 800; reset the counter periodically to avoid overflow.
            if (data_number > 65536) {
                data_number = 0;
            }
            if (data_number++ % 800 != 0) {
                return null;
            }

            BufferedReader reader = request.getReader();
            String charset = request.getCharacterEncoding();
            if (charset != null) {
                LOG.debug("Charset is " + charset);
            }

            readLine = reader.readLine();
            headers = new HashMap<String, String>();
            if (readLine != null) {
                // Fixed-offset extraction: assumes the body contains
                // "deviceId":"<12 characters>" and "actualTime":<13-digit timestamp>.
                int start = readLine.indexOf("deviceId");
                deviceSN = readLine.substring(start + 11, start + 23);
                start = readLine.indexOf("actualTime");
                actualTime = readLine.substring(start + 12, start + 25);
                String manufacture = deviceSN.substring(0, 3);
                headers.put("deviceId", deviceSN);
                headers.put("manufacture", manufacture);

                // Create the Hive partition for this device asynchronously.
                MyRunnable r1 = new MyRunnable(deviceSN);
                r1.start();

                // Wrap the raw report in the flat JSON layout the period_data table expects.
                JSONObject json = new JSONObject();
                json.put("deviceId", deviceSN);
                json.put("actualTime", actualTime);
                json.put("manufacture", manufacture);
                json.put("information", readLine);
                readLine = json.toString();
            }

            // Acknowledge the device so it keeps reporting.
            String result = getResult(deviceSN);
            PrintWriter writer = response.getWriter();
            writer.println(result);
            writer.flush();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return getSimpleEvents(readLine, headers);
    }

    public String getResult(String deviceSN) {
        // Fixed acknowledgement; the timeStamp and report interval are hardcoded.
        return "{\"result\": 0,\"timeStamp\": \"2018-08-14\",\"periodConfigParameter\": {\"uploadConfig\": {\"msgreportInterval\": \"36000\"}}}";
    }

    @Override
    public void onChannelException(HttpServletRequest request, HttpServletResponse response, Exception ex) {
    }

    @Override
    public void onSuccessfulCommit(HttpServletRequest request, HttpServletResponse response) {
    }

    private List<Event> getSimpleEvents(String events, Map<String, String> headers) {
        if (events == null) {
            return null;
        }
        List<Event> newEvents = new ArrayList<Event>();
        newEvents.add(EventBuilder.withBody(events, Charset.forName("UTF-8"), headers));
        System.out.println("info: " + newEvents.toString());
        return newEvents;
    }
}

class MyRunnable implements Runnable {
    private Thread t;
    private String deviceSN;
    private String connUrl = "jdbc:hive2://localhost:10000/osgi";
    private String userName = "hive";
    private String passWord = "hive";
    private Connection conn = null;
    private String tableName = "period_data";
    private boolean isHasPartition = false;

    MyRunnable(String deviceSN) {
        this.deviceSN = deviceSN;
    }

    public void run() {
        // Partition values match the HDFS sink path: day=%y-%m-%d, mf, sn.
        Date date = new Date();
        SimpleDateFormat sd = new SimpleDateFormat("yy-MM-dd");
        String day = sd.format(date);
        String manufacture = deviceSN.substring(0, 3);
        addPartition(day, manufacture, deviceSN);
    }

    public void start() {
        if (t == null) {
            t = new Thread(this, deviceSN);
            t.start();
        }
    }

    // Create the (day, mf, sn) partition if it does not exist yet.
    public void addPartition(String day, String manufacture, String deviceSN) {
        try {
            if (null == conn) {
                conn = getConnect(userName, passWord, connUrl);
            }
            Statement stmt = conn.createStatement();
            String addPartition = "alter table " + tableName + " add partition (day='" + day + "', mf='" + manufacture + "', sn='" + deviceSN + "')";
            System.out.println(addPartition);
            String showPartitions = "show partitions " + tableName;
            System.out.println(showPartitions);
            ResultSet res = stmt.executeQuery(showPartitions);
            while (res.next()) {
                System.out.println("existing partition: " + res.getString(1));
                if (("day=" + day + "/mf=" + manufacture + "/sn=" + deviceSN).equals(res.getString(1))) {
                    isHasPartition = true;
                }
            }
            if (!isHasPartition) {
                System.out.println("creating partition...");
                stmt.executeUpdate(addPartition);
            }
            isHasPartition = false;
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public Connection getConnect(String userName, String passWord, String connUrl) {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Connection conn = null;
        try {
            Class.forName(driverName);
            conn = DriverManager.getConnection(connUrl, userName, passWord);
        } catch (ClassNotFoundException e) {
            System.out.println("Hive JDBC driver class not found");
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return conn;
    }
}