Hadoop custom data types and interacting with relational databases
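Recently I had a requirement where a small part of the modeling data lived in postgres. That data had to be read directly from postgres into hadoop for modeling tests; exporting it to hdfs first was not allowed.
There are two ways to read data from postgres. One is hadoop's DBInputFormat (the hadoop 2.4.1 jars ship DBInputFormat in two packages, org.apache.hadoop.mapreduce.lib.db and org.apache.hadoop.mapred.lib.db; the former is the newer API). The other is the CopyManager class from the postgres JDBC driver.
Let's start with the DBInputFormat approach.
First, create a table in the database and insert a few rows for testing.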
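The rows of this table become the map input Value, so a custom data type is needed.
A custom data type in hadoop must implement the Writable interface. If the type is used as a Key, it must implement WritableComparable instead, including its comparison method. Comparing through WritableComparable means deserializing both objects first, which is cumbersome; the alternative is to extend WritableComparator and compare the serialized byte streams directly.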
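The idea of comparing serialized bytes instead of deserialized objects can be sketched with java.io alone. This is only an illustration under stated assumptions: RawIntCompareDemo and its methods are hypothetical names, and the class does not depend on Hadoop. In a real job the comparison would live in a subclass of WritableComparator, whose compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method has the same parameter shape as compareRaw below.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical demo class; it uses only java.io, so it runs without Hadoop. */
final class RawIntCompareDemo {

    /** Serializes an int key the way DataOutput.writeInt does: 4 bytes, big-endian. */
    static byte[] serialize(int id) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);
        } catch (IOException e) {
            // An in-memory stream is not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes a big-endian int straight from the byte array; no key object is built. */
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    /** Compares two serialized keys by their leading int field. */
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(3);
        byte[] b = serialize(17);
        // A negative result means the first key sorts before the second.
        System.out.println(compareRaw(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

The saving comes from skipping readFields() for every pair of keys compared during the sort; the comparator only looks at the bytes it needs.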
When configuring the input of DBInputFormat, the value type must implement DBWritable, so the custom Value type here implements both the DBWritable and Writable interfaces.
package com.qldhlbs.hadoop.demo0420;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class PgDbWritable implements DBWritable, Writable {

    private Integer call_type_id;
    private String call_type;
    private String remark;

    public PgDbWritable() {
    }

    public PgDbWritable(Integer call_type_id, String call_type, String remark) {
        set(call_type_id, call_type, remark);
    }

    public void set(Integer call_type_id, String call_type, String remark) {
        this.call_type_id = call_type_id;
        this.call_type = call_type;
        this.remark = remark;
    }

    // Read one row from the JDBC result set
    @Override
    public void readFields(ResultSet set) throws SQLException {
        this.call_type_id = set.getInt(1);
        this.call_type = set.getString(2);
        this.remark = set.getString(3);
    }

    // Set the parameters of the prepared statement
    @Override
    public void write(PreparedStatement ps) throws SQLException {
        ps.setInt(1, this.call_type_id);
        ps.setString(2, this.call_type);
        ps.setString(3, this.remark);
    }

    // Deserialize
    @Override
    public void readFields(DataInput in) throws IOException {
        this.call_type_id = in.readInt();
        this.call_type = in.readUTF();
        this.remark = in.readUTF();
    }

    // Serialize
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.call_type_id);
        out.writeUTF(this.call_type);
        out.writeUTF(this.remark);
    }

    public Integer getCall_type_id() {
        return call_type_id;
    }

    public String getCall_type() {
        return call_type;
    }

    public String getRemark() {
        return remark;
    }

    @Override
    public String toString() {
        return call_type_id + "\t" + call_type + "\t" + remark;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((call_type == null) ? 0 : call_type.hashCode());
        result = prime * result + ((call_type_id == null) ? 0 : call_type_id.hashCode());
        result = prime * result + ((remark == null) ? 0 : remark.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        PgDbWritable other = (PgDbWritable) obj;
        if (call_type == null) {
            if (other.call_type != null)
                return false;
        } else if (!call_type.equals(other.call_type))
            return false;
        if (call_type_id == null) {
            if (other.call_type_id != null)
                return false;
        } else if (!call_type_id.equals(other.call_type_id))
            return false;
        if (remark == null) {
            if (other.remark != null)
                return false;
        } else if (!remark.equals(other.remark))
            return false;
        return true;
    }
}
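PgDbWritable holds the three columns of the database table as fields and overrides the four key methods; the comments in the code describe what each one does. It also overrides toString, hashCode and equals.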
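One thing to watch in the serialization pair: DataOutput.writeUTF(null) throws a NullPointerException, and so does unboxing a null Integer for writeInt, so a row with a NULL column would fail in write(DataOutput). A possible workaround, sketched here as an assumption rather than as part of the original class, is to write a presence flag before each nullable field. NullSafeRecord and its helper methods are hypothetical names, and the sketch uses only java.io so it runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for PgDbWritable showing null-safe write/readFields. */
final class NullSafeRecord {
    Integer callTypeId;
    String callType;
    String remark;

    NullSafeRecord() {
    }

    NullSafeRecord(Integer callTypeId, String callType, String remark) {
        this.callTypeId = callTypeId;
        this.callType = callType;
        this.remark = remark;
    }

    /** Serialize: each field is preceded by a boolean saying whether it is present. */
    void write(DataOutput out) throws IOException {
        out.writeBoolean(callTypeId != null);
        if (callTypeId != null) {
            out.writeInt(callTypeId);
        }
        writeNullable(out, callType);
        writeNullable(out, remark);
    }

    /** Deserialize: must read the fields in exactly the order write() emitted them. */
    void readFields(DataInput in) throws IOException {
        callTypeId = in.readBoolean() ? Integer.valueOf(in.readInt()) : null;
        callType = readNullable(in);
        remark = readNullable(in);
    }

    private static void writeNullable(DataOutput out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    private static String readNullable(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    /** Serializes a record to bytes and reads it back into a fresh instance. */
    static NullSafeRecord roundTrip(NullSafeRecord source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            NullSafeRecord copy = new NullSafeRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NullSafeRecord copy = roundTrip(new NullSafeRecord(7, "local", null));
        System.out.println(copy.callTypeId + "\t" + copy.callType + "\t" + copy.remark);
    }
}
```

On the JDBC side a similar caveat applies: ResultSet.getInt returns 0 for a SQL NULL, and ResultSet.wasNull() is the standard way to tell the two apart.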
With the custom data type in place, the next step is to read the data from the database.
package com.qldhlbs.hadoop.demo0420;

import java.io.IOException;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapreducePackageDbApp {

    static class DbReadMapper extends Mapper<LongWritable, PgDbWritable, LongWritable, PgDbWritable> {

        @Override
        protected void map(LongWritable key, PgDbWritable value,
                Mapper<LongWritable, PgDbWritable, LongWritable, PgDbWritable>.Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    static class DbReadReduce extends Reducer<LongWritable, PgDbWritable, LongWritable, PgDbWritable> {

        @Override
        protected void reduce(LongWritable key, Iterable<PgDbWritable> values,
                Reducer<LongWritable, PgDbWritable, LongWritable, PgDbWritable>.Context context)
                throws IOException, InterruptedException {
            for (PgDbWritable value : values) {
                context.write(key, value);
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException, SQLException {

        Configuration conf = new Configuration();

        DBConfiguration.configureDB(conf, "org.postgresql.Driver", "jdbc:postgresql://192.168.0.203/test", "hb", "xxx");
        Job job = Job.getInstance(conf);

        job.setJarByClass(MapreducePackageDbApp.class);
        job.setJobName(MapreducePackageDbApp.class.getSimpleName());

        DistributedCache.addFileToClassPath(new Path("hdfs://192.168.0.201:49000/user/qldhlbs/lib/postgresql-9.3-1101.jdbc3.jar"), conf);

        String[] fields = {"call_type_id", "call_type", "remark"};

        DBInputFormat<PgDbWritable> in = new DBInputFormat<PgDbWritable>();
        in.setConf(conf);

        // Configure DBInputFormat: job, input DBWritable class, table name,
        // query conditions, order by clause, array of column names
        DBInputFormat.setInput(job, PgDbWritable.class, "dim_160_168_call_type", null, null, fields);

        job.setMapperClass(DbReadMapper.class);
        // The reducer is optional: hadoop falls back to a minimal default reducer,
        // which (as the source shows) simply writes out the map output
        job.setReducerClass(DbReadReduce.class);

        job.setOutputKeyClass(LongWritable.class);
        //job.setOutputValueClass(Text.class);
        job.setOutputValueClass(PgDbWritable.class);

        job.setInputFormatClass(DBInputFormat.class);

        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.201:49000/user/qldhlbs/db5"));

        boolean isSuccess = job.waitForCompletion(true);
        System.exit(isSuccess ? 0 : 1);
    }
}
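This is only a demo, so the map function simply emits what it reads. The reduce function likewise just writes out the map output, which is exactly what happens when no reducer is set, so the reduce function can be omitted here.
A few points need attention. First, every import here comes from the mapreduce package rather than mapred; mixing the two causes errors. Second, upload a copy of the postgres driver jar to hdfs. Create a directory on hdfs with hadoop fs -mkdir /user/qldhlbs/lib, then upload the file with hadoop fs -copyFromLocal postgresql-9.3-1101.jdbc3.jar /user/qldhlbs/lib. In the code, DistributedCache.addFileToClassPath(new Path("hdfs://192.168.0.201:49000/user/qldhlbs/lib/postgresql-9.3-1101.jdbc3.jar"), conf) adds the jar to the classpath. Third, configure DBConfiguration; the parameters are, in order, the Configuration, the database driver class, the database url, the user name and the password. After configuring DBConfiguration, create the DBInputFormat and pass it the conf: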
DBInputFormat<PgDbWritable> in = new DBInputFormat<PgDbWritable>();
in.setConf(conf);
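Do not forget the setConf() call. At first I did not call it to hand conf to DBInputFormat and kept getting a NullPointerException. Debugging showed that the connection was never obtained, even though DBConfiguration itself could produce one. Digging further, DBInputFormat had never received a DBConfiguration object, so it had no way to get a connection at all. Reading the source of hadoop-mapreduce-client-core-2.4.1 is what finally resolved the problem.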
public void setConf(Configuration conf)
{
    this.dbConf = new DBConfiguration(conf);
    try
    {
        getConnection();

        DatabaseMetaData dbMeta = this.connection.getMetaData();
        this.dbProductName = dbMeta.getDatabaseProductName().toUpperCase();
    }
    catch (Exception ex) {
        throw new RuntimeException(ex);
    }

    this.tableName = this.dbConf.getInputTableName();
    this.fieldNames = this.dbConf.getInputFieldNames();
    this.conditions = this.dbConf.getInputConditions();
}

public Connection getConnection() {
    try {
        if (null == this.connection)
        {
            this.connection = this.dbConf.getConnection();
            this.connection.setAutoCommit(false);
            // 8 is the value of java.sql.Connection.TRANSACTION_SERIALIZABLE
            this.connection.setTransactionIsolation(8);
        }
    }
    catch (Exception e) {
        throw new RuntimeException(e);
    }
    return this.connection;
}
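This is part of the decompiled source; it shows that the connection is obtained from the DBConfiguration object. Fourth, configure DBInputFormat through setInput; the parameters are the job, the input DBWritable class, the table name, the query conditions, the order by clause and the string array of column names.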
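To make the role of the setInput arguments more concrete, here is a rough sketch of how a table name, conditions, an order by clause and a column array can be combined into a SELECT statement. It is an approximation written for illustration, not code copied from Hadoop: SelectQuerySketch and buildSelect are hypothetical names, and the statement Hadoop actually issues may differ in details (for example, it also has to page through the rows of each input split).

```java
/** Hypothetical helper; approximates how setInput's arguments map onto SQL. */
final class SelectQuerySketch {

    /**
     * Builds "SELECT f1, f2 FROM table [WHERE (conditions)] [ORDER BY orderBy]".
     * Passing null (or an empty string) for conditions or orderBy omits that clause,
     * mirroring the two null arguments used in the job above.
     */
    static String buildSelect(String table, String conditions, String orderBy, String... fields) {
        StringBuilder query = new StringBuilder("SELECT ");
        query.append(String.join(", ", fields));
        query.append(" FROM ").append(table);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            query.append(" ORDER BY ").append(orderBy);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Same table and columns as the job above, with no conditions and no ordering.
        System.out.println(buildSelect("dim_160_168_call_type", null, null,
                "call_type_id", "call_type", "remark"));
    }
}
```

Supplying an order by column is worth considering when the input is split across several map tasks, since paging through a table without a stable order may return overlapping or missing rows.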
With everything in place, the hadoop job can be run.
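The job writes its output file to hdfs, which shows that the data was read from the database into hdfs.
Using the mapred package instead of mapreduce also works. The code is nearly the same, so I won't post it; the only difference is that there is no need to call setConf() to bind the conf.
That is the first approach. The second is to use the org.postgresql.copy.CopyManager class directly.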
// Runs COPY ... TO STDOUT and buffers the whole result in memory
public ByteArrayOutputStream copyToStream(String tableOrQuery, String delimiter) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CopyManager copyManager = new CopyManager((BaseConnection) getConnection());
        String copySql = "COPY " + tableOrQuery + " TO STDOUT";
        if (delimiter != null) {
            copySql = copySql + " WITH DELIMITER AS '" + delimiter + "'";
        }
        copyManager.copyOut(copySql, out);
        return out;
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Reached only when the copy failed; callers must check for null
    return null;
}

// Usage: wrap the copied bytes in an input stream
ByteArrayOutputStream out = copyToStream(sql.toString(), ",");
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());

// Writes the stream to the given hdfs path
public void uploadFile(String hdfsPath, InputStream in) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        FSDataOutputStream out = hdfs.create(new Path(hdfsPath));
        org.apache.hadoop.io.IOUtils.copyBytes(in, out, 4096, false);
        // hflush() replaces the deprecated sync()
        out.hflush();
        out.close();
    } catch (Exception e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
}
Read the stream out, then write it to hdfs with hadoop's built-in IOUtils.copyBytes() method.
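For readers unfamiliar with this kind of helper, the following sketch shows the kind of buffered copy loop that a call such as copyBytes(in, out, 4096, false) performs. It is an illustration written against plain java.io, not Hadoop's implementation: StreamCopySketch and copy are hypothetical names, the sample rows are made up, and a ByteArrayOutputStream stands in for the hdfs output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a buffered stream-to-stream copy. */
final class StreamCopySketch {

    /**
     * Copies everything from in to out using a fixed-size buffer and returns
     * the number of bytes copied. Neither stream is closed, which corresponds
     * to passing false as the last argument in the upload code above.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Made-up rows in the comma-delimited shape that COPY ... TO STDOUT would produce.
        byte[] rows = "1,local,first\n2,long_distance,second\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(rows), target, 4096);
        System.out.println(copied == rows.length);
    }
}
```

Since CopyManager.copyOut accepts any OutputStream, it may also be possible to pass the hdfs output stream to it directly and skip the in-memory buffer, which would matter for large tables; that variant is untested here.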