I. Debugging MR Programs in Local Mode

  1. Preparation

    See the Windows development setup notes in an earlier post: http://www.cnblogs.com/jiangbei/p/8366238.html

  2. Workflow

    The most important step is making the job use the LocalJobRunner (mapreduce.framework.name=local): the MapReduce job then runs as threads inside a single local JVM instead of being submitted to a cluster.

    The input data and the output can live either on the local file system or on HDFS.

  3. Implementation

    Taking wordcount as an example, simply set the following on conf in the original Driver's main() method:

  // run in local mode
  conf.set("mapreduce.framework.name", "local");
  // use the local file system
  conf.set("fs.defaultFS", "file:///");

   Note that these two values are in fact the defaults, so they can be omitted when running locally.

   Of course, running this way means the input and output must point at the local file system:

  // specify the job's input/output directories (these could be taken from arguments instead of being hard-coded)
  FileInputFormat.setInputPaths(job, new Path("F:\\c.log"));
  FileOutputFormat.setOutputPath(job, new Path("F:\\output"));

    You can just as well use HDFS as the file system (remember to change the corresponding paths!):

  // run in local mode
  conf.set("mapreduce.framework.name", "local");
  // choose the file system
  // conf.set("fs.defaultFS", "hdfs://mini1:9000"); // HDFS
  conf.set("fs.defaultFS", "file:///"); // local file system

  FileInputFormat.setInputPaths(job, new Path("/wordcount/input"));
  FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));

     We can also replace these hard-coded Paths with the main method's args, so the paths can be supplied dynamically from IDEA's run configuration, for example:
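
    Below is a minimal Driver sketch of that idea (the class name and the omitted mapper/reducer setup are assumptions for illustration, not taken from the original post):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCountDriver {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // local-mode defaults; both lines can be omitted when running locally
          conf.set("mapreduce.framework.name", "local");
          conf.set("fs.defaultFS", "file:///");

          Job job = Job.getInstance(conf);
          job.setJarByClass(WordCountDriver.class);
          // mapper/reducer and key/value class setup omitted here; without it the
          // default (identity) mapper and reducer are used

          // paths come from IDEA's "Program arguments", e.g.  F:\c.log F:\output
          FileInputFormat.setInputPaths(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }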


   4. Common Problems

    ClassNotFoundException:

      This happens because the relevant jars were not included in the packaging; remove the provided scope from the dependencies (the default scope, compile, is what you want), for example these two:

  <!-- hadoop-common: remove <scope>provided</scope> -->
  <!-- hadoop-mapreduce-client-jobclient: remove <scope>provided</scope> -->

  java.lang.UnsatisfiedLinkError

   This error occurs when Windows is missing hadoop.dll; download hadoop.dll and place it in C:\Windows\System32, and the problem goes away.

  Setting the HDFS user (alternatively, configure -DHADOOP_USER_NAME=hadoop in the run configuration):

  System.setProperty("HADOOP_USER_NAME", "hadoop");
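
  Where exactly this line goes is not spelled out in the original post; the sketch below reflects common practice (an assumption): set it at the very start of main(), before any HDFS access, otherwise the Windows login name may already have been picked up as the HDFS user. Imports are as in the earlier sketch.

  public static void main(String[] args) throws Exception {
      // set the HDFS user first, before any Configuration/FileSystem/Job is used
      System.setProperty("HADOOP_USER_NAME", "hadoop");

      Configuration conf = new Configuration();
      conf.set("fs.defaultFS", "hdfs://mini1:9000");
      Job job = Job.getInstance(conf);
      // ... rest of the driver as before ...
  }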

  For more ways to configure the Hadoop user on Windows, see: http://blog.csdn.net/wyc09/article/details/16338483

  5. Debugging

    Debug in the usual way by setting breakpoints; for debugging in IDEA, see the earlier post: http://www.cnblogs.com/jiangbei/p/7766125.html

II. Running MR Programs in Cluster Mode

  1. Relevant Parameters

    The default values and descriptions of the following three parameters can all be found in the XML configuration files inside the corresponding jars:

  mapreduce-client-core > mapred-default.xml

  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
    <description>The runtime framework for executing MapReduce jobs.
    Can be one of local, classic or yarn.
    </description>
  </property>

  yarn-common > yarn-default.xml

  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>

  hadoop-common > core-default.xml

  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

  With these descriptions in hand, we can set the three parameters required for cluster mode (in the main() method):

  conf.set("mapreduce.framework.name", "yarn");
  conf.set("yarn.resourcemanager.hostname", "mini1");
  conf.set("fs.defaultFS", "hdfs://mini1:9000/");
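
  Putting these together, here is a hedged sketch of a driver submitting from the Windows IDE to the cluster (imports and class skeleton as in the earlier local-mode sketch; the jar path and the commented cross-platform property are assumptions to verify against your Hadoop version — the original post instead patches the Hadoop sources, see below):

  Configuration conf = new Configuration();
  conf.set("mapreduce.framework.name", "yarn");
  conf.set("yarn.resourcemanager.hostname", "mini1");
  conf.set("fs.defaultFS", "hdfs://mini1:9000/");
  // On newer Hadoop releases this property aims to make Windows-submitted job
  // commands cluster-compatible; whether it avoids the source patch below
  // should be verified for your version (assumption):
  // conf.set("mapreduce.app-submission.cross-platform", "true");

  Job job = Job.getInstance(conf);
  // When submitting from the IDE, point the job at a built jar so cluster nodes
  // can localize the classes (example path; adjust to your build output):
  job.setJar("F:\\wordcount\\target\\wordcount-1.0.jar");
  // ... mapper/reducer setup and input/output paths as before ...
  System.exit(job.waitForCompletion(true) ? 0 : 1);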

  The additional configuration below may require modifying the Hadoop source code; for the approach, see http://blog.csdn.net/xie_xiansheng/article/details/74453244

  The modified source files:

  1. /**
  2. * Licensed to the Apache Software Foundation (ASF) under one
  3. * or more contributor license agreements. See the NOTICE file
  4. * distributed with this work for additional information
  5. * regarding copyright ownership. The ASF licenses this file
  6. * to you under the Apache License, Version 2.0 (the
  7. * "License"); you may not use this file except in compliance
  8. * with the License. You may obtain a copy of the License at
  9. *
  10. * http://www.apache.org/licenses/LICENSE-2.0
  11. *
  12. * Unless required by applicable law or agreed to in writing, software
  13. * distributed under the License is distributed on an "AS IS" BASIS,
  14. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  15. * See the License for the specific language governing permissions and
  16. * limitations under the License.
  17. */
  18.  
  19. package org.apache.hadoop.mapred;
  20.  
  21. import java.io.IOException;
  22. import java.nio.ByteBuffer;
  23. import java.util.ArrayList;
  24. import java.util.Collection;
  25. import java.util.HashMap;
  26. import java.util.HashSet;
  27. import java.util.List;
  28. import java.util.Map;
  29. import java.util.Vector;
  30.  
  31. import org.apache.commons.logging.Log;
  32. import org.apache.commons.logging.LogFactory;
  33. import org.apache.hadoop.classification.InterfaceAudience.Private;
  34. import org.apache.hadoop.conf.Configuration;
  35. import org.apache.hadoop.fs.FileContext;
  36. import org.apache.hadoop.fs.FileStatus;
  37. import org.apache.hadoop.fs.Path;
  38. import org.apache.hadoop.fs.UnsupportedFileSystemException;
  39. import org.apache.hadoop.io.DataOutputBuffer;
  40. import org.apache.hadoop.io.Text;
  41. import org.apache.hadoop.ipc.ProtocolSignature;
  42. import org.apache.hadoop.mapreduce.Cluster.JobTrackerStatus;
  43. import org.apache.hadoop.mapreduce.ClusterMetrics;
  44. import org.apache.hadoop.mapreduce.Counters;
  45. import org.apache.hadoop.mapreduce.JobContext;
  46. import org.apache.hadoop.mapreduce.JobID;
  47. import org.apache.hadoop.mapreduce.JobStatus;
  48. import org.apache.hadoop.mapreduce.MRJobConfig;
  49. import org.apache.hadoop.mapreduce.QueueAclsInfo;
  50. import org.apache.hadoop.mapreduce.QueueInfo;
  51. import org.apache.hadoop.mapreduce.TaskAttemptID;
  52. import org.apache.hadoop.mapreduce.TaskCompletionEvent;
  53. import org.apache.hadoop.mapreduce.TaskReport;
  54. import org.apache.hadoop.mapreduce.TaskTrackerInfo;
  55. import org.apache.hadoop.mapreduce.TaskType;
  56. import org.apache.hadoop.mapreduce.TypeConverter;
  57. import org.apache.hadoop.mapreduce.protocol.ClientProtocol;
  58. import org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier;
  59. import org.apache.hadoop.mapreduce.v2.LogParams;
  60. import org.apache.hadoop.mapreduce.v2.api.MRClientProtocol;
  61. import org.apache.hadoop.mapreduce.v2.api.protocolrecords.GetDelegationTokenRequest;
  62. import org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils;
  63. import org.apache.hadoop.mapreduce.v2.util.MRApps;
  64. import org.apache.hadoop.security.Credentials;
  65. import org.apache.hadoop.security.SecurityUtil;
  66. import org.apache.hadoop.security.UserGroupInformation;
  67. import org.apache.hadoop.security.authorize.AccessControlList;
  68. import org.apache.hadoop.security.token.Token;
  69. import org.apache.hadoop.yarn.api.ApplicationConstants;
  70. import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
  71. import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
  72. import org.apache.hadoop.yarn.api.records.ApplicationId;
  73. import org.apache.hadoop.yarn.api.records.ApplicationReport;
  74. import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
  75. import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
  76. import org.apache.hadoop.yarn.api.records.LocalResource;
  77. import org.apache.hadoop.yarn.api.records.LocalResourceType;
  78. import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
  79. import org.apache.hadoop.yarn.api.records.ReservationId;
  80. import org.apache.hadoop.yarn.api.records.Resource;
  81. import org.apache.hadoop.yarn.api.records.URL;
  82. import org.apache.hadoop.yarn.api.records.YarnApplicationState;
  83. import org.apache.hadoop.yarn.conf.YarnConfiguration;
  84. import org.apache.hadoop.yarn.exceptions.YarnException;
  85. import org.apache.hadoop.yarn.factories.RecordFactory;
  86. import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
  87. import org.apache.hadoop.yarn.security.client.RMDelegationTokenSelector;
  88. import org.apache.hadoop.yarn.util.ConverterUtils;
  89.  
  90. import com.google.common.annotations.VisibleForTesting;
  91. import com.google.common.base.CaseFormat;
  92.  
  93. /**
  94. * This class enables the current JobClient (0.22 hadoop) to run on YARN.
  95. */
  96. @SuppressWarnings("unchecked")
  97. public class YARNRunner implements ClientProtocol {
  98.  
  99. private static final Log LOG = LogFactory.getLog(YARNRunner.class);
  100.  
  101. private final RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
  102. private ResourceMgrDelegate resMgrDelegate;
  103. private ClientCache clientCache;
  104. private Configuration conf;
  105. private final FileContext defaultFileContext;
  106.  
  107. /**
  108. * Yarn runner incapsulates the client interface of yarn
  109. *
  110. * @param conf
  111. * the configuration object for the client
  112. */
  113. public YARNRunner(Configuration conf) {
  114. this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
  115. }
  116.  
  117. /**
  118. * Similar to {@link #YARNRunner(Configuration)} but allowing injecting
  119. * {@link ResourceMgrDelegate}. Enables mocking and testing.
  120. *
  121. * @param conf
  122. * the configuration object for the client
  123. * @param resMgrDelegate
  124. * the resourcemanager client handle.
  125. */
  126. public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate) {
  127. this(conf, resMgrDelegate, new ClientCache(conf, resMgrDelegate));
  128. }
  129.  
  130. /**
  131. * Similar to
  132. * {@link YARNRunner#YARNRunner(Configuration, ResourceMgrDelegate)} but
  133. * allowing injecting {@link ClientCache}. Enable mocking and testing.
  134. *
  135. * @param conf
  136. * the configuration object
  137. * @param resMgrDelegate
  138. * the resource manager delegate
  139. * @param clientCache
  140. * the client cache object.
  141. */
  142. public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate, ClientCache clientCache) {
  143. this.conf = conf;
  144. try {
  145. this.resMgrDelegate = resMgrDelegate;
  146. this.clientCache = clientCache;
  147. this.defaultFileContext = FileContext.getFileContext(this.conf);
  148. } catch (UnsupportedFileSystemException ufe) {
  149. throw new RuntimeException("Error in instantiating YarnClient", ufe);
  150. }
  151. }
  152.  
  153. @Private
  154. /**
  155. * Used for testing mostly.
  156. * @param resMgrDelegate the resource manager delegate to set to.
  157. */
  158. public void setResourceMgrDelegate(ResourceMgrDelegate resMgrDelegate) {
  159. this.resMgrDelegate = resMgrDelegate;
  160. }
  161.  
  162. @Override
  163. public void cancelDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
  164. throw new UnsupportedOperationException("Use Token.renew instead");
  165. }
  166.  
  167. @Override
  168. public TaskTrackerInfo[] getActiveTrackers() throws IOException, InterruptedException {
  169. return resMgrDelegate.getActiveTrackers();
  170. }
  171.  
  172. @Override
  173. public JobStatus[] getAllJobs() throws IOException, InterruptedException {
  174. return resMgrDelegate.getAllJobs();
  175. }
  176.  
  177. @Override
  178. public TaskTrackerInfo[] getBlacklistedTrackers() throws IOException, InterruptedException {
  179. return resMgrDelegate.getBlacklistedTrackers();
  180. }
  181.  
  182. @Override
  183. public ClusterMetrics getClusterMetrics() throws IOException, InterruptedException {
  184. return resMgrDelegate.getClusterMetrics();
  185. }
  186.  
  187. @VisibleForTesting
  188. void addHistoryToken(Credentials ts) throws IOException, InterruptedException {
  189. /* check if we have a hsproxy, if not, no need */
  190. MRClientProtocol hsProxy = clientCache.getInitializedHSProxy();
  191. if (UserGroupInformation.isSecurityEnabled() && (hsProxy != null)) {
  192. /*
  193. * note that get delegation token was called. Again this is hack for
  194. * oozie to make sure we add history server delegation tokens to the
  195. * credentials
  196. */
  197. RMDelegationTokenSelector tokenSelector = new RMDelegationTokenSelector();
  198. Text service = resMgrDelegate.getRMDelegationTokenService();
  199. if (tokenSelector.selectToken(service, ts.getAllTokens()) != null) {
  200. Text hsService = SecurityUtil.buildTokenService(hsProxy.getConnectAddress());
  201. if (ts.getToken(hsService) == null) {
  202. ts.addToken(hsService, getDelegationTokenFromHS(hsProxy));
  203. }
  204. }
  205. }
  206. }
  207.  
  208. @VisibleForTesting
  209. Token<?> getDelegationTokenFromHS(MRClientProtocol hsProxy) throws IOException, InterruptedException {
  210. GetDelegationTokenRequest request = recordFactory.newRecordInstance(GetDelegationTokenRequest.class);
  211. request.setRenewer(Master.getMasterPrincipal(conf));
  212. org.apache.hadoop.yarn.api.records.Token mrDelegationToken;
  213. mrDelegationToken = hsProxy.getDelegationToken(request).getDelegationToken();
  214. return ConverterUtils.convertFromYarn(mrDelegationToken, hsProxy.getConnectAddress());
  215. }
  216.  
  217. @Override
  218. public Token<DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException {
  219. // The token is only used for serialization. So the type information
  220. // mismatch should be fine.
  221. return resMgrDelegate.getDelegationToken(renewer);
  222. }
  223.  
  224. @Override
  225. public String getFilesystemName() throws IOException, InterruptedException {
  226. return resMgrDelegate.getFilesystemName();
  227. }
  228.  
  229. @Override
  230. public JobID getNewJobID() throws IOException, InterruptedException {
  231. return resMgrDelegate.getNewJobID();
  232. }
  233.  
  234. @Override
  235. public QueueInfo getQueue(String queueName) throws IOException, InterruptedException {
  236. return resMgrDelegate.getQueue(queueName);
  237. }
  238.  
  239. @Override
  240. public QueueAclsInfo[] getQueueAclsForCurrentUser() throws IOException, InterruptedException {
  241. return resMgrDelegate.getQueueAclsForCurrentUser();
  242. }
  243.  
  244. @Override
  245. public QueueInfo[] getQueues() throws IOException, InterruptedException {
  246. return resMgrDelegate.getQueues();
  247. }
  248.  
  249. @Override
  250. public QueueInfo[] getRootQueues() throws IOException, InterruptedException {
  251. return resMgrDelegate.getRootQueues();
  252. }
  253.  
  254. @Override
  255. public QueueInfo[] getChildQueues(String parent) throws IOException, InterruptedException {
  256. return resMgrDelegate.getChildQueues(parent);
  257. }
  258.  
  259. @Override
  260. public String getStagingAreaDir() throws IOException, InterruptedException {
  261. return resMgrDelegate.getStagingAreaDir();
  262. }
  263.  
  264. @Override
  265. public String getSystemDir() throws IOException, InterruptedException {
  266. return resMgrDelegate.getSystemDir();
  267. }
  268.  
  269. @Override
  270. public long getTaskTrackerExpiryInterval() throws IOException, InterruptedException {
  271. return resMgrDelegate.getTaskTrackerExpiryInterval();
  272. }
  273.  
  274. @Override
  275. public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts) throws IOException, InterruptedException {
  276.  
  277. addHistoryToken(ts);
  278.  
  279. // Construct necessary information to start the MR AM
  280. ApplicationSubmissionContext appContext = createApplicationSubmissionContext(conf, jobSubmitDir, ts);
  281.  
  282. // Submit to ResourceManager
  283. try {
  284. ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);
  285.  
  286. ApplicationReport appMaster = resMgrDelegate.getApplicationReport(applicationId);
  287. String diagnostics = (appMaster == null ? "application report is null" : appMaster.getDiagnostics());
  288. if (appMaster == null || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
  289. throw new IOException("Failed to run job : " + diagnostics);
  290. }
  291. return clientCache.getClient(jobId).getJobStatus(jobId);
  292. } catch (YarnException e) {
  293. throw new IOException(e);
  294. }
  295. }
  296.  
  297. private LocalResource createApplicationResource(FileContext fs, Path p, LocalResourceType type) throws IOException {
  298. LocalResource rsrc = recordFactory.newRecordInstance(LocalResource.class);
  299. FileStatus rsrcStat = fs.getFileStatus(p);
  300. rsrc.setResource(ConverterUtils.getYarnUrlFromPath(fs.getDefaultFileSystem().resolvePath(rsrcStat.getPath())));
  301. rsrc.setSize(rsrcStat.getLen());
  302. rsrc.setTimestamp(rsrcStat.getModificationTime());
  303. rsrc.setType(type);
  304. rsrc.setVisibility(LocalResourceVisibility.APPLICATION);
  305. return rsrc;
  306. }
  307.  
  308. public ApplicationSubmissionContext createApplicationSubmissionContext(Configuration jobConf, String jobSubmitDir, Credentials ts) throws IOException {
  309. ApplicationId applicationId = resMgrDelegate.getApplicationId();
  310.  
  311. // Setup resource requirements
  312. Resource capability = recordFactory.newRecordInstance(Resource.class);
  313. capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
  314. capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));
  315. LOG.debug("AppMaster capability = " + capability);
  316.  
  317. // Setup LocalResources
  318. Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
  319.  
  320. Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);
  321.  
  322. URL yarnUrlForJobSubmitDir = ConverterUtils.getYarnUrlFromPath(defaultFileContext.getDefaultFileSystem().resolvePath(defaultFileContext.makeQualified(new Path(jobSubmitDir))));
  323. LOG.debug("Creating setup context, jobSubmitDir url is " + yarnUrlForJobSubmitDir);
  324.  
  325. localResources.put(MRJobConfig.JOB_CONF_FILE, createApplicationResource(defaultFileContext, jobConfPath, LocalResourceType.FILE));
  326. if (jobConf.get(MRJobConfig.JAR) != null) {
  327. Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));
  328. LocalResource rc = createApplicationResource(FileContext.getFileContext(jobJarPath.toUri(), jobConf), jobJarPath, LocalResourceType.PATTERN);
  329. String pattern = conf.getPattern(JobContext.JAR_UNPACK_PATTERN, JobConf.UNPACK_JAR_PATTERN_DEFAULT).pattern();
  330. rc.setPattern(pattern);
  331. localResources.put(MRJobConfig.JOB_JAR, rc);
  332. } else {
  333. // Job jar may be null. For e.g, for pipes, the job jar is the
  334. // hadoop
  335. // mapreduce jar itself which is already on the classpath.
  336. LOG.info("Job jar is not present. " + "Not adding any jar to the list of resources.");
  337. }
  338.  
  339. // TODO gross hack
  340. for (String s : new String[] { MRJobConfig.JOB_SPLIT, MRJobConfig.JOB_SPLIT_METAINFO }) {
  341. localResources.put(MRJobConfig.JOB_SUBMIT_DIR + "/" + s, createApplicationResource(defaultFileContext, new Path(jobSubmitDir, s), LocalResourceType.FILE));
  342. }
  343.  
  344. // Setup security tokens
  345. DataOutputBuffer dob = new DataOutputBuffer();
  346. ts.writeTokenStorageToStream(dob);
  347. ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
  348.  
  349. // Setup the command to run the AM
  350. List<String> vargs = new ArrayList<String>(8);
  351. // vargs.add(MRApps.crossPlatformifyMREnv(jobConf,
  352. // Environment.JAVA_HOME)
  353. // + "/bin/java");
  354. // Changed: TODO ----modified by angelababy's boyfriend------- if there are any problems, contact angelababy
  355. System.out.println(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
  356. System.out.println("$JAVA_HOME/bin/java");
  357. vargs.add("$JAVA_HOME/bin/java");
  358.  
  359. // TODO: why do we use 'conf' some places and 'jobConf' others?
  360. long logSize = jobConf.getLong(MRJobConfig.MR_AM_LOG_KB, MRJobConfig.DEFAULT_MR_AM_LOG_KB) << 10;
  361. String logLevel = jobConf.get(MRJobConfig.MR_AM_LOG_LEVEL, MRJobConfig.DEFAULT_MR_AM_LOG_LEVEL);
  362. int numBackups = jobConf.getInt(MRJobConfig.MR_AM_LOG_BACKUPS, MRJobConfig.DEFAULT_MR_AM_LOG_BACKUPS);
  363. MRApps.addLog4jSystemProperties(logLevel, logSize, numBackups, vargs, conf);
  364.  
  365. // Check for Java Lib Path usage in MAP and REDUCE configs
  366. warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map", MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
  367. warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""), "map", MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);
  368. warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""), "reduce", MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
  369. warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""), "reduce", MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);
  370.  
  371. // Add AM admin command opts before user command opts
  372. // so that it can be overridden by user
  373. String mrAppMasterAdminOptions = conf.get(MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
  374. warnForJavaLibPath(mrAppMasterAdminOptions, "app master", MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.MR_AM_ADMIN_USER_ENV);
  375. vargs.add(mrAppMasterAdminOptions);
  376.  
  377. // Add AM user command opts
  378. String mrAppMasterUserOptions = conf.get(MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
  379. warnForJavaLibPath(mrAppMasterUserOptions, "app master", MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
  380. vargs.add(mrAppMasterUserOptions);
  381.  
  382. if (jobConf.getBoolean(MRJobConfig.MR_AM_PROFILE, MRJobConfig.DEFAULT_MR_AM_PROFILE)) {
  383. final String profileParams = jobConf.get(MRJobConfig.MR_AM_PROFILE_PARAMS, MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS);
  384. if (profileParams != null) {
  385. vargs.add(String.format(profileParams, ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + TaskLog.LogName.PROFILE));
  386. }
  387. }
  388.  
  389. vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
  390. vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT);
  391. vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);
  392.  
  393. Vector<String> vargsFinal = new Vector<String>(8);
  394. // Final command
  395. StringBuilder mergedCommand = new StringBuilder();
  396. for (CharSequence str : vargs) {
  397. mergedCommand.append(str).append(" ");
  398. }
  399. vargsFinal.add(mergedCommand.toString());
  400.  
  401. LOG.debug("Command to launch container for ApplicationMaster is : " + mergedCommand);
  402.  
  403. // Setup the CLASSPATH in environment
  404. // i.e. add { Hadoop jars, job jar, CWD } to classpath.
  405. Map<String, String> environment = new HashMap<String, String>();
  406. MRApps.setClasspath(environment, conf);
  407.  
  408. // Shell
  409. environment.put(Environment.SHELL.name(), conf.get(MRJobConfig.MAPRED_ADMIN_USER_SHELL, MRJobConfig.DEFAULT_SHELL));
  410.  
  411. // Add the container working directory at the front of LD_LIBRARY_PATH
  412. MRApps.addToEnvironment(environment, Environment.LD_LIBRARY_PATH.name(), MRApps.crossPlatformifyMREnv(conf, Environment.PWD), conf);
  413.  
  414. // Setup the environment variables for Admin first
  415. MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV), conf);
  416. // Setup the environment variables (LD_LIBRARY_PATH, etc)
  417. MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV), conf);
  418.  
  419. // Parse distributed cache
  420. MRApps.setupDistributedCache(jobConf, localResources);
  421.  
  422. Map<ApplicationAccessType, String> acls = new HashMap<ApplicationAccessType, String>(2);
  423. acls.put(ApplicationAccessType.VIEW_APP, jobConf.get(MRJobConfig.JOB_ACL_VIEW_JOB, MRJobConfig.DEFAULT_JOB_ACL_VIEW_JOB));
  424. acls.put(ApplicationAccessType.MODIFY_APP, jobConf.get(MRJobConfig.JOB_ACL_MODIFY_JOB, MRJobConfig.DEFAULT_JOB_ACL_MODIFY_JOB));
  425.  
  426. // Changed: TODO BY DHT
  427. for (String key : environment.keySet()) {
  428. String org = environment.get(key);
  429. String linux = getLinux(org);
  430. environment.put(key, linux);
  431. }
  432. // Setup ContainerLaunchContext for AM container
  433. ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(localResources, environment, vargsFinal, null, securityTokens, acls);
  434.  
  435. Collection<String> tagsFromConf = jobConf.getTrimmedStringCollection(MRJobConfig.JOB_TAGS);
  436.  
  437. // Set up the ApplicationSubmissionContext
  438. ApplicationSubmissionContext appContext = recordFactory.newRecordInstance(ApplicationSubmissionContext.class);
  439. appContext.setApplicationId(applicationId); // ApplicationId
  440. appContext.setQueue( // Queue name
  441. jobConf.get(JobContext.QUEUE_NAME, YarnConfiguration.DEFAULT_QUEUE_NAME));
  442. // add reservationID if present
  443. ReservationId reservationID = null;
  444. try {
  445. reservationID = ReservationId.parseReservationId(jobConf.get(JobContext.RESERVATION_ID));
  446. } catch (NumberFormatException e) {
  447. // throw exception as reservationid as is invalid
  448. String errMsg = "Invalid reservationId: " + jobConf.get(JobContext.RESERVATION_ID) + " specified for the app: " + applicationId;
  449. LOG.warn(errMsg);
  450. throw new IOException(errMsg);
  451. }
  452. if (reservationID != null) {
  453. appContext.setReservationID(reservationID);
  454. LOG.info("SUBMITTING ApplicationSubmissionContext app:" + applicationId + " to queue:" + appContext.getQueue() + " with reservationId:" + appContext.getReservationID());
  455. }
  456. appContext.setApplicationName( // Job name
  457. jobConf.get(JobContext.JOB_NAME, YarnConfiguration.DEFAULT_APPLICATION_NAME));
  458. appContext.setCancelTokensWhenComplete(conf.getBoolean(MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN, true));
  459. appContext.setAMContainerSpec(amContainer); // AM Container
  460. appContext.setMaxAppAttempts(conf.getInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, MRJobConfig.DEFAULT_MR_AM_MAX_ATTEMPTS));
  461. appContext.setResource(capability);
  462. appContext.setApplicationType(MRJobConfig.MR_APPLICATION_TYPE);
  463. if (tagsFromConf != null && !tagsFromConf.isEmpty()) {
  464. appContext.setApplicationTags(new HashSet<String>(tagsFromConf));
  465. }
  466.  
  467. return appContext;
  468. }
  469.  
  470. private String getLinux(String org) {
  471. StringBuilder sb = new StringBuilder();
  472. int c = 0;
  473. for (int i = 0; i < org.length(); i++) {
  474. if (org.charAt(i) == '%') {
  475. c++;
  476. if (c % 2 == 1) {
  477. sb.append("$");
  478. }
  479. } else {
  480. switch (org.charAt(i)) {
  481. case ';':
  482. sb.append(":");
  483. break;
  484.  
  485. case '\\':
  486. sb.append("/");
  487. break;
  488. default:
  489. sb.append(org.charAt(i));
  490. break;
  491. }
  492. }
  493. }
  494. return (sb.toString());
  495. }
  496.  
  497. @Override
  498. public void setJobPriority(JobID arg0, String arg1) throws IOException, InterruptedException {
  499. resMgrDelegate.setJobPriority(arg0, arg1);
  500. }
  501.  
  502. @Override
  503. public long getProtocolVersion(String arg0, long arg1) throws IOException {
  504. return resMgrDelegate.getProtocolVersion(arg0, arg1);
  505. }
  506.  
  507. @Override
  508. public long renewDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
  509. throw new UnsupportedOperationException("Use Token.renew instead");
  510. }
  511.  
  512. @Override
  513. public Counters getJobCounters(JobID arg0) throws IOException, InterruptedException {
  514. return clientCache.getClient(arg0).getJobCounters(arg0);
  515. }
  516.  
  517. @Override
  518. public String getJobHistoryDir() throws IOException, InterruptedException {
  519. return JobHistoryUtils.getConfiguredHistoryServerDoneDirPrefix(conf);
  520. }
  521.  
  522. @Override
  523. public JobStatus getJobStatus(JobID jobID) throws IOException, InterruptedException {
  524. JobStatus status = clientCache.getClient(jobID).getJobStatus(jobID);
  525. return status;
  526. }
  527.  
  528. @Override
  529. public TaskCompletionEvent[] getTaskCompletionEvents(JobID arg0, int arg1, int arg2) throws IOException, InterruptedException {
  530. return clientCache.getClient(arg0).getTaskCompletionEvents(arg0, arg1, arg2);
  531. }
  532.  
  533. @Override
  534. public String[] getTaskDiagnostics(TaskAttemptID arg0) throws IOException, InterruptedException {
  535. return clientCache.getClient(arg0.getJobID()).getTaskDiagnostics(arg0);
  536. }
  537.  
  538. @Override
  539. public TaskReport[] getTaskReports(JobID jobID, TaskType taskType) throws IOException, InterruptedException {
  540. return clientCache.getClient(jobID).getTaskReports(jobID, taskType);
  541. }
  542.  
  543. private void killUnFinishedApplication(ApplicationId appId) throws IOException {
  544. ApplicationReport application = null;
  545. try {
  546. application = resMgrDelegate.getApplicationReport(appId);
  547. } catch (YarnException e) {
  548. throw new IOException(e);
  549. }
  550. if (application.getYarnApplicationState() == YarnApplicationState.FINISHED || application.getYarnApplicationState() == YarnApplicationState.FAILED || application.getYarnApplicationState() == YarnApplicationState.KILLED) {
  551. return;
  552. }
  553. killApplication(appId);
  554. }
  555.  
  556. private void killApplication(ApplicationId appId) throws IOException {
  557. try {
  558. resMgrDelegate.killApplication(appId);
  559. } catch (YarnException e) {
  560. throw new IOException(e);
  561. }
  562. }
  563.  
  564. private boolean isJobInTerminalState(JobStatus status) {
  565. return status.getState() == JobStatus.State.KILLED || status.getState() == JobStatus.State.FAILED || status.getState() == JobStatus.State.SUCCEEDED;
  566. }
  567.  
  568. @Override
  569. public void killJob(JobID arg0) throws IOException, InterruptedException {
  570. /* check if the status is not running, if not send kill to RM */
  571. JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
  572. ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();
  573.  
  574. // get status from RM and return
  575. if (status == null) {
  576. killUnFinishedApplication(appId);
  577. return;
  578. }
  579.  
  580. if (status.getState() != JobStatus.State.RUNNING) {
  581. killApplication(appId);
  582. return;
  583. }
  584.  
  585. try {
  586. /* send a kill to the AM */
  587. clientCache.getClient(arg0).killJob(arg0);
  588. long currentTimeMillis = System.currentTimeMillis();
  589. long timeKillIssued = currentTimeMillis;
  590. while ((currentTimeMillis < timeKillIssued + 10000L) && !isJobInTerminalState(status)) {
  591. try {
  592. Thread.sleep(1000L);
  593. } catch (InterruptedException ie) {
  594. /** interrupted, just break */
  595. break;
  596. }
  597. currentTimeMillis = System.currentTimeMillis();
  598. status = clientCache.getClient(arg0).getJobStatus(arg0);
  599. if (status == null) {
  600. killUnFinishedApplication(appId);
  601. return;
  602. }
  603. }
  604. } catch (IOException io) {
  605. LOG.debug("Error when checking for application status", io);
  606. }
  607. if (status != null && !isJobInTerminalState(status)) {
  608. killApplication(appId);
  609. }
  610. }
  611.  
  612. @Override
  613. public boolean killTask(TaskAttemptID arg0, boolean arg1) throws IOException, InterruptedException {
  614. return clientCache.getClient(arg0.getJobID()).killTask(arg0, arg1);
  615. }
  616.  
  617. @Override
  618. public AccessControlList getQueueAdmins(String arg0) throws IOException {
  619. return new AccessControlList("*");
  620. }
  621.  
  622. @Override
  623. public JobTrackerStatus getJobTrackerStatus() throws IOException, InterruptedException {
  624. return JobTrackerStatus.RUNNING;
  625. }
  626.  
  627. @Override
  628. public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int clientMethodsHash) throws IOException {
  629. return ProtocolSignature.getProtocolSignature(this, protocol, clientVersion, clientMethodsHash);
  630. }
  631.  
  632. @Override
  633. public LogParams getLogFileParams(JobID jobID, TaskAttemptID taskAttemptID) throws IOException {
  634. return clientCache.getClient(jobID).getLogFilePath(jobID, taskAttemptID);
  635. }
  636.  
  637. private static void warnForJavaLibPath(String opts, String component, String javaConf, String envConf) {
  638. if (opts != null && opts.contains("-Djava.library.path")) {
  639. LOG.warn("Usage of -Djava.library.path in " + javaConf + " can cause " + "programs to no longer function if hadoop native libraries " + "are used. These values should be set as part of the " + "LD_LIBRARY_PATH in the " + component + " JVM env using " + envConf
  640. + " config settings.");
  641. }
  642. }
  643. }

YARNRunner.java (the full modified file above; the author's changes are marked with "Changed: TODO"). The second modified file, shown below, is NativeIO.java, where Windows.access() is changed to always return true so the Windows native access check is bypassed:

  1. /**
  2. * Licensed to the Apache Software Foundation (ASF) under one
  3. * or more contributor license agreements. See the NOTICE file
  4. * distributed with this work for additional information
  5. * regarding copyright ownership. The ASF licenses this file
  6. * to you under the Apache License, Version 2.0 (the
  7. * "License"); you may not use this file except in compliance
  8. * with the License. You may obtain a copy of the License at
  9. *
  10. * http://www.apache.org/licenses/LICENSE-2.0
  11. *
  12. * Unless required by applicable law or agreed to in writing, software
  13. * distributed under the License is distributed on an "AS IS" BASIS,
  14. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  15. * See the License for the specific language governing permissions and
  16. * limitations under the License.
  17. */
  18. package org.apache.hadoop.io.nativeio;
  19.  
  20. import java.io.File;
  21. import java.io.FileDescriptor;
  22. import java.io.FileInputStream;
  23. import java.io.FileOutputStream;
  24. import java.io.IOException;
  25. import java.io.RandomAccessFile;
  26. import java.lang.reflect.Field;
  27. import java.nio.ByteBuffer;
  28. import java.nio.MappedByteBuffer;
  29. import java.nio.channels.FileChannel;
  30. import java.util.Map;
  31. import java.util.concurrent.ConcurrentHashMap;
  32.  
  33. import org.apache.hadoop.classification.InterfaceAudience;
  34. import org.apache.hadoop.classification.InterfaceStability;
  35. import org.apache.hadoop.conf.Configuration;
  36. import org.apache.hadoop.fs.CommonConfigurationKeys;
  37. import org.apache.hadoop.fs.HardLink;
  38. import org.apache.hadoop.io.IOUtils;
  39. import org.apache.hadoop.io.SecureIOUtils.AlreadyExistsException;
  40. import org.apache.hadoop.util.NativeCodeLoader;
  41. import org.apache.hadoop.util.Shell;
  42. import org.apache.hadoop.util.PerformanceAdvisory;
  43. import org.apache.commons.logging.Log;
  44. import org.apache.commons.logging.LogFactory;
  45.  
  46. import sun.misc.Unsafe;
  47.  
  48. import com.google.common.annotations.VisibleForTesting;
  49.  
  50. /**
  51. * JNI wrappers for various native IO-related calls not available in Java.
  52. * These functions should generally be used alongside a fallback to another
  53. * more portable mechanism.
  54. */
  55. @InterfaceAudience.Private
  56. @InterfaceStability.Unstable
  57. public class NativeIO {
  58. public static class POSIX {
  59. // Flags for open() call from bits/fcntl.h
  60. public static final int O_RDONLY = 00;
  61. public static final int O_WRONLY = 01;
  62. public static final int O_RDWR = 02;
  63. public static final int O_CREAT = 0100;
  64. public static final int O_EXCL = 0200;
  65. public static final int O_NOCTTY = 0400;
  66. public static final int O_TRUNC = 01000;
  67. public static final int O_APPEND = 02000;
  68. public static final int O_NONBLOCK = 04000;
  69. public static final int O_SYNC = 010000;
  70. public static final int O_ASYNC = 020000;
  71. public static final int O_FSYNC = O_SYNC;
  72. public static final int O_NDELAY = O_NONBLOCK;
  73.  
  74. // Flags for posix_fadvise() from bits/fcntl.h
  75. /* No further special treatment. */
  76. public static final int POSIX_FADV_NORMAL = 0;
  77. /* Expect random page references. */
  78. public static final int POSIX_FADV_RANDOM = 1;
  79. /* Expect sequential page references. */
  80. public static final int POSIX_FADV_SEQUENTIAL = 2;
  81. /* Will need these pages. */
  82. public static final int POSIX_FADV_WILLNEED = 3;
  83. /* Don't need these pages. */
  84. public static final int POSIX_FADV_DONTNEED = 4;
  85. /* Data will be accessed once. */
  86. public static final int POSIX_FADV_NOREUSE = 5;
  87.  
  88. /* Wait upon writeout of all pages
  89. in the range before performing the
  90. write. */
  91. public static final int SYNC_FILE_RANGE_WAIT_BEFORE = 1;
  92. /* Initiate writeout of all those
  93. dirty pages in the range which are
  94. not presently under writeback. */
  95. public static final int SYNC_FILE_RANGE_WRITE = 2;
  96.  
  97. /* Wait upon writeout of all pages in
  98. the range after performing the
  99. write. */
  100. public static final int SYNC_FILE_RANGE_WAIT_AFTER = 4;
  101.  
  102. private static final Log LOG = LogFactory.getLog(NativeIO.class);
  103.  
  104. private static boolean nativeLoaded = false;
  105. private static boolean fadvisePossible = true;
  106. private static boolean syncFileRangePossible = true;
  107.  
  108. static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY =
  109. "hadoop.workaround.non.threadsafe.getpwuid";
  110. static final boolean WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT = true;
  111.  
  112. private static long cacheTimeout = -1;
  113.  
  114. private static CacheManipulator cacheManipulator = new CacheManipulator();
  115.  
  116. public static CacheManipulator getCacheManipulator() {
  117. return cacheManipulator;
  118. }
  119.  
  120. public static void setCacheManipulator(CacheManipulator cacheManipulator) {
  121. POSIX.cacheManipulator = cacheManipulator;
  122. }
  123.  
  124. /**
  125. * Used to manipulate the operating system cache.
  126. */
  127. @VisibleForTesting
  128. public static class CacheManipulator {
  129. public void mlock(String identifier, ByteBuffer buffer,
  130. long len) throws IOException {
  131. POSIX.mlock(buffer, len);
  132. }
  133.  
  134. public long getMemlockLimit() {
  135. return NativeIO.getMemlockLimit();
  136. }
  137.  
  138. public long getOperatingSystemPageSize() {
  139. return NativeIO.getOperatingSystemPageSize();
  140. }
  141.  
  142. public void posixFadviseIfPossible(String identifier,
  143. FileDescriptor fd, long offset, long len, int flags)
  144. throws NativeIOException {
  145. NativeIO.POSIX.posixFadviseIfPossible(identifier, fd, offset,
  146. len, flags);
  147. }
  148.  
  149. public boolean verifyCanMlock() {
  150. return NativeIO.isAvailable();
  151. }
  152. }
  153.  
  154. /**
  155. * A CacheManipulator used for testing which does not actually call mlock.
  156. * This allows many tests to be run even when the operating system does not
  157. * allow mlock, or only allows limited mlocking.
  158. */
  159. @VisibleForTesting
  160. public static class NoMlockCacheManipulator extends CacheManipulator {
  161. public void mlock(String identifier, ByteBuffer buffer,
  162. long len) throws IOException {
  163. LOG.info("mlocking " + identifier);
  164. }
  165.  
  166. public long getMemlockLimit() {
  167. return 1125899906842624L;
  168. }
  169.  
  170. public long getOperatingSystemPageSize() {
  171. return 4096;
  172. }
  173.  
  174. public boolean verifyCanMlock() {
  175. return true;
  176. }
  177. }
  178.  
  179. static {
  180. if (NativeCodeLoader.isNativeCodeLoaded()) {
  181. try {
  182. Configuration conf = new Configuration();
  183. workaroundNonThreadSafePasswdCalls = conf.getBoolean(
  184. WORKAROUND_NON_THREADSAFE_CALLS_KEY,
  185. WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT);
  186.  
  187. initNative();
  188. nativeLoaded = true;
  189.  
  190. cacheTimeout = conf.getLong(
  191. CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY,
  192. CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT) *
  193. 1000;
  194. LOG.debug("Initialized cache for IDs to User/Group mapping with a " +
  195. " cache timeout of " + cacheTimeout/1000 + " seconds.");
  196.  
  197. } catch (Throwable t) {
  198. // This can happen if the user has an older version of libhadoop.so
  199. // installed - in this case we can continue without native IO
  200. // after warning
  201. PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
  202. }
  203. }
  204. }
  205.  
  206. /**
  207. * Return true if the JNI-based native IO extensions are available.
  208. */
  209. public static boolean isAvailable() {
  210. return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
  211. }
  212.  
  213. private static void assertCodeLoaded() throws IOException {
  214. if (!isAvailable()) {
  215. throw new IOException("NativeIO was not loaded");
  216. }
  217. }
  218.  
  219. /** Wrapper around open(2) */
  220. public static native FileDescriptor open(String path, int flags, int mode) throws IOException;
  221. /** Wrapper around fstat(2) */
  222. private static native Stat fstat(FileDescriptor fd) throws IOException;
  223.  
  224. /** Native chmod implementation. On UNIX, it is a wrapper around chmod(2) */
  225. private static native void chmodImpl(String path, int mode) throws IOException;
  226.  
  227. public static void chmod(String path, int mode) throws IOException {
  228. if (!Shell.WINDOWS) {
  229. chmodImpl(path, mode);
  230. } else {
  231. try {
  232. chmodImpl(path, mode);
  233. } catch (NativeIOException nioe) {
  234. if (nioe.getErrorCode() == 3) {
  235. throw new NativeIOException("No such file or directory",
  236. Errno.ENOENT);
  237. } else {
  238. LOG.warn(String.format("NativeIO.chmod error (%d): %s",
  239. nioe.getErrorCode(), nioe.getMessage()));
  240. throw new NativeIOException("Unknown error", Errno.UNKNOWN);
  241. }
  242. }
  243. }
  244. }
  245.  
  246. /** Wrapper around posix_fadvise(2) */
  247. static native void posix_fadvise(
  248. FileDescriptor fd, long offset, long len, int flags) throws NativeIOException;
  249.  
  250. /** Wrapper around sync_file_range(2) */
  251. static native void sync_file_range(
  252. FileDescriptor fd, long offset, long nbytes, int flags) throws NativeIOException;
  253.  
  254. /**
  255. * Call posix_fadvise on the given file descriptor. See the manpage
  256. * for this syscall for more information. On systems where this
  257. * call is not available, does nothing.
  258. *
  259. * @throws NativeIOException if there is an error with the syscall
  260. */
  261. static void posixFadviseIfPossible(String identifier,
  262. FileDescriptor fd, long offset, long len, int flags)
  263. throws NativeIOException {
  264. if (nativeLoaded && fadvisePossible) {
  265. try {
  266. posix_fadvise(fd, offset, len, flags);
  267. } catch (UnsupportedOperationException uoe) {
  268. fadvisePossible = false;
  269. } catch (UnsatisfiedLinkError ule) {
  270. fadvisePossible = false;
  271. }
  272. }
  273. }
  274.  
  275. /**
  276. * Call sync_file_range on the given file descriptor. See the manpage
  277. * for this syscall for more information. On systems where this
  278. * call is not available, does nothing.
  279. *
  280. * @throws NativeIOException if there is an error with the syscall
  281. */
  282. public static void syncFileRangeIfPossible(
  283. FileDescriptor fd, long offset, long nbytes, int flags)
  284. throws NativeIOException {
  285. if (nativeLoaded && syncFileRangePossible) {
  286. try {
  287. sync_file_range(fd, offset, nbytes, flags);
  288. } catch (UnsupportedOperationException uoe) {
  289. syncFileRangePossible = false;
  290. } catch (UnsatisfiedLinkError ule) {
  291. syncFileRangePossible = false;
  292. }
  293. }
  294. }
  295.  
  296. static native void mlock_native(
  297. ByteBuffer buffer, long len) throws NativeIOException;
  298.  
  299. /**
  300. * Locks the provided direct ByteBuffer into memory, preventing it from
  301. * swapping out. After a buffer is locked, future accesses will not incur
  302. * a page fault.
  303. *
  304. * See the mlock(2) man page for more information.
  305. *
  306. * @throws NativeIOException
  307. */
  308. static void mlock(ByteBuffer buffer, long len)
  309. throws IOException {
  310. assertCodeLoaded();
  311. if (!buffer.isDirect()) {
  312. throw new IOException("Cannot mlock a non-direct ByteBuffer");
  313. }
  314. mlock_native(buffer, len);
  315. }
  316.  
  317. /**
  318. * Unmaps the block from memory. See munmap(2).
  319. *
  320. * There isn't any portable way to unmap a memory region in Java.
  321. * So we use the sun.nio method here.
  322. * Note that unmapping a memory region could cause crashes if code
  323. * continues to reference the unmapped code. However, if we don't
  324. * manually unmap the memory, we are dependent on the finalizer to
  325. * do it, and we have no idea when the finalizer will run.
  326. *
  327. * @param buffer The buffer to unmap.
  328. */
  329. public static void munmap(MappedByteBuffer buffer) {
  330. if (buffer instanceof sun.nio.ch.DirectBuffer) {
  331. sun.misc.Cleaner cleaner =
  332. ((sun.nio.ch.DirectBuffer)buffer).cleaner();
  333. cleaner.clean();
  334. }
  335. }
  336.  
  337. /** Linux only methods used for getOwner() implementation */
  338. private static native long getUIDforFDOwnerforOwner(FileDescriptor fd) throws IOException;
  339. private static native String getUserName(long uid) throws IOException;
  340.  
  341. /**
  342. * Result type of the fstat call
  343. */
  344. public static class Stat {
  345. private int ownerId, groupId;
  346. private String owner, group;
  347. private int mode;
  348.  
  349. // Mode constants
  350. public static final int S_IFMT = 0170000; /* type of file */
  351. public static final int S_IFIFO = 0010000; /* named pipe (fifo) */
  352. public static final int S_IFCHR = 0020000; /* character special */
  353. public static final int S_IFDIR = 0040000; /* directory */
  354. public static final int S_IFBLK = 0060000; /* block special */
  355. public static final int S_IFREG = 0100000; /* regular */
  356. public static final int S_IFLNK = 0120000; /* symbolic link */
  357. public static final int S_IFSOCK = 0140000; /* socket */
  358. public static final int S_IFWHT = 0160000; /* whiteout */
  359. public static final int S_ISUID = 0004000; /* set user id on execution */
  360. public static final int S_ISGID = 0002000; /* set group id on execution */
  361. public static final int S_ISVTX = 0001000; /* save swapped text even after use */
  362. public static final int S_IRUSR = 0000400; /* read permission, owner */
  363. public static final int S_IWUSR = 0000200; /* write permission, owner */
  364. public static final int S_IXUSR = 0000100; /* execute/search permission, owner */
  365.  
  366. Stat(int ownerId, int groupId, int mode) {
  367. this.ownerId = ownerId;
  368. this.groupId = groupId;
  369. this.mode = mode;
  370. }
  371.  
  372. Stat(String owner, String group, int mode) {
  373. if (!Shell.WINDOWS) {
  374. this.owner = owner;
  375. } else {
  376. this.owner = stripDomain(owner);
  377. }
  378. if (!Shell.WINDOWS) {
  379. this.group = group;
  380. } else {
  381. this.group = stripDomain(group);
  382. }
  383. this.mode = mode;
  384. }
  385.  
  386. @Override
  387. public String toString() {
  388. return "Stat(owner='" + owner + "', group='" + group + "'" +
  389. ", mode=" + mode + ")";
  390. }
  391.  
  392. public String getOwner() {
  393. return owner;
  394. }
  395. public String getGroup() {
  396. return group;
  397. }
  398. public int getMode() {
  399. return mode;
  400. }
  401. }
  402.  
  403. /**
  404. * Returns the file stat for a file descriptor.
  405. *
  406. * @param fd file descriptor.
  407. * @return the file descriptor file stat.
  408. * @throws IOException thrown if there was an IO error while obtaining the file stat.
  409. */
  410. public static Stat getFstat(FileDescriptor fd) throws IOException {
  411. Stat stat = null;
  412. if (!Shell.WINDOWS) {
  413. stat = fstat(fd);
  414. stat.owner = getName(IdCache.USER, stat.ownerId);
  415. stat.group = getName(IdCache.GROUP, stat.groupId);
  416. } else {
  417. try {
  418. stat = fstat(fd);
  419. } catch (NativeIOException nioe) {
  420. if (nioe.getErrorCode() == 6) {
  421. throw new NativeIOException("The handle is invalid.",
  422. Errno.EBADF);
  423. } else {
  424. LOG.warn(String.format("NativeIO.getFstat error (%d): %s",
  425. nioe.getErrorCode(), nioe.getMessage()));
  426. throw new NativeIOException("Unknown error", Errno.UNKNOWN);
  427. }
  428. }
  429. }
  430. return stat;
  431. }
  432.  
  433. private static String getName(IdCache domain, int id) throws IOException {
  434. Map<Integer, CachedName> idNameCache = (domain == IdCache.USER)
  435. ? USER_ID_NAME_CACHE : GROUP_ID_NAME_CACHE;
  436. String name;
  437. CachedName cachedName = idNameCache.get(id);
  438. long now = System.currentTimeMillis();
  439. if (cachedName != null && (cachedName.timestamp + cacheTimeout) > now) {
  440. name = cachedName.name;
  441. } else {
  442. name = (domain == IdCache.USER) ? getUserName(id) : getGroupName(id);
  443. if (LOG.isDebugEnabled()) {
  444. String type = (domain == IdCache.USER) ? "UserName" : "GroupName";
  445. LOG.debug("Got " + type + " " + name + " for ID " + id +
  446. " from the native implementation");
  447. }
  448. cachedName = new CachedName(name, now);
  449. idNameCache.put(id, cachedName);
  450. }
  451. return name;
  452. }
  453.  
  454. static native String getUserName(int uid) throws IOException;
  455. static native String getGroupName(int uid) throws IOException;
  456.  
  457. private static class CachedName {
  458. final long timestamp;
  459. final String name;
  460.  
  461. public CachedName(String name, long timestamp) {
  462. this.name = name;
  463. this.timestamp = timestamp;
  464. }
  465. }
  466.  
  467. private static final Map<Integer, CachedName> USER_ID_NAME_CACHE =
  468. new ConcurrentHashMap<Integer, CachedName>();
  469.  
  470. private static final Map<Integer, CachedName> GROUP_ID_NAME_CACHE =
  471. new ConcurrentHashMap<Integer, CachedName>();
  472.  
  473. private enum IdCache { USER, GROUP }
  474.  
  475. public final static int MMAP_PROT_READ = 0x1;
  476. public final static int MMAP_PROT_WRITE = 0x2;
  477. public final static int MMAP_PROT_EXEC = 0x4;
  478.  
  479. public static native long mmap(FileDescriptor fd, int prot,
  480. boolean shared, long length) throws IOException;
  481.  
  482. public static native void munmap(long addr, long length)
  483. throws IOException;
  484. }
  485.  
  486. private static boolean workaroundNonThreadSafePasswdCalls = false;
  487.  
  488. public static class Windows {
  489. // Flags for CreateFile() call on Windows
  490. public static final long GENERIC_READ = 0x80000000L;
  491. public static final long GENERIC_WRITE = 0x40000000L;
  492.  
  493. public static final long FILE_SHARE_READ = 0x00000001L;
  494. public static final long FILE_SHARE_WRITE = 0x00000002L;
  495. public static final long FILE_SHARE_DELETE = 0x00000004L;
  496.  
  497. public static final long CREATE_NEW = 1;
  498. public static final long CREATE_ALWAYS = 2;
  499. public static final long OPEN_EXISTING = 3;
  500. public static final long OPEN_ALWAYS = 4;
  501. public static final long TRUNCATE_EXISTING = 5;
  502.  
  503. public static final long FILE_BEGIN = 0;
  504. public static final long FILE_CURRENT = 1;
  505. public static final long FILE_END = 2;
  506.  
  507. public static final long FILE_ATTRIBUTE_NORMAL = 0x00000080L;
  508.  
  509. /** Wrapper around CreateFile() on Windows */
  510. public static native FileDescriptor createFile(String path,
  511. long desiredAccess, long shareMode, long creationDisposition)
  512. throws IOException;
  513.  
  514. /** Wrapper around SetFilePointer() on Windows */
  515. public static native long setFilePointer(FileDescriptor fd,
  516. long distanceToMove, long moveMethod) throws IOException;
  517.  
  518. /** Windows only methods used for getOwner() implementation */
  519. private static native String getOwner(FileDescriptor fd) throws IOException;
  520.  
  521. /** Supported list of Windows access right flags */
  522. public static enum AccessRight {
  523. ACCESS_READ (0x0001), // FILE_READ_DATA
  524. ACCESS_WRITE (0x0002), // FILE_WRITE_DATA
  525. ACCESS_EXECUTE (0x0020); // FILE_EXECUTE
  526.  
  527. private final int accessRight;
  528. AccessRight(int access) {
  529. accessRight = access;
  530. }
  531.  
  532. public int accessRight() {
  533. return accessRight;
  534. }
  535. };
  536.  
  537. /** Windows only method used to check if the current process has requested
  538. * access rights on the given path. */
  539. private static native boolean access0(String path, int requestedAccess);
  540.  
  541. /**
  542. * Checks whether the current process has desired access rights on
  543. * the given path.
  544. *
  545. * Longer term this native function can be substituted with JDK7
  546. * function Files#isReadable, isWritable, isExecutable.
  547. *
  548. * @param path input path
  549. * @param desiredAccess ACCESS_READ, ACCESS_WRITE or ACCESS_EXECUTE
  550. * @return true if access is allowed
  551. * @throws IOException I/O exception on error
  552. */
  553. public static boolean access(String path, AccessRight desiredAccess)
  554. throws IOException {
  555. return true;
  556. // return access0(path, desiredAccess.accessRight());
  557. }
  558.  
  559. /**
  560. * Extends both the minimum and maximum working set size of the current
  561. * process. This method gets the current minimum and maximum working set
  562. * size, adds the requested amount to each and then sets the minimum and
  563. * maximum working set size to the new values. Controlling the working set
  564. * size of the process also controls the amount of memory it can lock.
  565. *
  566. * @param delta amount to increment minimum and maximum working set size
  567. * @throws IOException for any error
  568. * @see POSIX#mlock(ByteBuffer, long)
  569. */
  570. public static native void extendWorkingSetSize(long delta) throws IOException;
  571.  
  572. static {
  573. if (NativeCodeLoader.isNativeCodeLoaded()) {
  574. try {
  575. initNative();
  576. nativeLoaded = true;
  577. } catch (Throwable t) {
  578. // This can happen if the user has an older version of libhadoop.so
  579. // installed - in this case we can continue without native IO
  580. // after warning
  581. PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
  582. }
  583. }
  584. }
  585. }
  586.  
  587. private static final Log LOG = LogFactory.getLog(NativeIO.class);
  588.  
  589. private static boolean nativeLoaded = false;
  590.  
  591. static {
  592. if (NativeCodeLoader.isNativeCodeLoaded()) {
  593. try {
  594. initNative();
  595. nativeLoaded = true;
  596. } catch (Throwable t) {
  597. // This can happen if the user has an older version of libhadoop.so
  598. // installed - in this case we can continue without native IO
  599. // after warning
  600. PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
  601. }
  602. }
  603. }
  604.  
  605. /**
  606. * Return true if the JNI-based native IO extensions are available.
  607. */
  608. public static boolean isAvailable() {
  609. return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
  610. }
  611.  
  612. /** Initialize the JNI method ID and class ID cache */
  613. private static native void initNative();
  614.  
  615. /**
  616. * Get the maximum number of bytes that can be locked into memory at any
  617. * given point.
  618. *
  619. * @return 0 if no bytes can be locked into memory;
  620. * Long.MAX_VALUE if there is no limit;
  621. * The number of bytes that can be locked into memory otherwise.
  622. */
  623. static long getMemlockLimit() {
  624. return isAvailable() ? getMemlockLimit0() : 0;
  625. }
  626.  
  627. private static native long getMemlockLimit0();
  628.  
  629. /**
  630. * @return the operating system's page size.
  631. */
  632. static long getOperatingSystemPageSize() {
  633. try {
  634. Field f = Unsafe.class.getDeclaredField("theUnsafe");
  635. f.setAccessible(true);
  636. Unsafe unsafe = (Unsafe)f.get(null);
  637. return unsafe.pageSize();
  638. } catch (Throwable e) {
  639. LOG.warn("Unable to get operating system page size. Guessing 4096.", e);
  640. return 4096;
  641. }
  642. }
  643.  
  644. private static class CachedUid {
  645. final long timestamp;
  646. final String username;
  647. public CachedUid(String username, long timestamp) {
  648. this.timestamp = timestamp;
  649. this.username = username;
  650. }
  651. }
  652. private static final Map<Long, CachedUid> uidCache =
  653. new ConcurrentHashMap<Long, CachedUid>();
  654. private static long cacheTimeout;
  655. private static boolean initialized = false;
  656.  
  657. /**
  658. * The Windows logon name has two parts, the NetBIOS domain name and
  659. * user account name, of the format DOMAIN\UserName. This method
  660. * will remove the domain part of the full logon name.
  661. *
  662. * @param name the full principal name containing the domain
  663. * @return name with domain removed
  664. */
  665. private static String stripDomain(String name) {
  666. int i = name.indexOf('\\');
  667. if (i != -1)
  668. name = name.substring(i + 1);
  669. return name;
  670. }
  671.  
  672. public static String getOwner(FileDescriptor fd) throws IOException {
  673. ensureInitialized();
  674. if (Shell.WINDOWS) {
  675. String owner = Windows.getOwner(fd);
  676. owner = stripDomain(owner);
  677. return owner;
  678. } else {
  679. long uid = POSIX.getUIDforFDOwnerforOwner(fd);
  680. CachedUid cUid = uidCache.get(uid);
  681. long now = System.currentTimeMillis();
  682. if (cUid != null && (cUid.timestamp + cacheTimeout) > now) {
  683. return cUid.username;
  684. }
  685. String user = POSIX.getUserName(uid);
  686. LOG.info("Got UserName " + user + " for UID " + uid
  687. + " from the native implementation");
  688. cUid = new CachedUid(user, now);
  689. uidCache.put(uid, cUid);
  690. return user;
  691. }
  692. }
  693.  
  694. /**
  695. * Create a FileInputStream that shares delete permission on the
  696. * file opened, i.e. other process can delete the file the
  697. * FileInputStream is reading. Only Windows implementation uses
  698. * the native interface.
  699. */
  700. public static FileInputStream getShareDeleteFileInputStream(File f)
  701. throws IOException {
  702. if (!Shell.WINDOWS) {
  703. // On Linux the default FileInputStream shares delete permission
  704. // on the file opened.
  705. //
  706. return new FileInputStream(f);
  707. } else {
  708. // Use Windows native interface to create a FileInputStream that
  709. // shares delete permission on the file opened.
  710. //
  711. FileDescriptor fd = Windows.createFile(
  712. f.getAbsolutePath(),
  713. Windows.GENERIC_READ,
  714. Windows.FILE_SHARE_READ |
  715. Windows.FILE_SHARE_WRITE |
  716. Windows.FILE_SHARE_DELETE,
  717. Windows.OPEN_EXISTING);
  718. return new FileInputStream(fd);
  719. }
  720. }
  721.  
  722. /**
  723. * Create a FileInputStream that shares delete permission on the
  724. * file opened at a given offset, i.e. other process can delete
  725. * the file the FileInputStream is reading. Only Windows implementation
  726. * uses the native interface.
  727. */
  728. public static FileInputStream getShareDeleteFileInputStream(File f, long seekOffset)
  729. throws IOException {
  730. if (!Shell.WINDOWS) {
  731. RandomAccessFile rf = new RandomAccessFile(f, "r");
  732. if (seekOffset > 0) {
  733. rf.seek(seekOffset);
  734. }
  735. return new FileInputStream(rf.getFD());
  736. } else {
  737. // Use Windows native interface to create a FileInputStream that
  738. // shares delete permission on the file opened, and set it to the
  739. // given offset.
  740. //
  741. FileDescriptor fd = NativeIO.Windows.createFile(
  742. f.getAbsolutePath(),
  743. NativeIO.Windows.GENERIC_READ,
  744. NativeIO.Windows.FILE_SHARE_READ |
  745. NativeIO.Windows.FILE_SHARE_WRITE |
  746. NativeIO.Windows.FILE_SHARE_DELETE,
  747. NativeIO.Windows.OPEN_EXISTING);
  748. if (seekOffset > 0)
  749. NativeIO.Windows.setFilePointer(fd, seekOffset, NativeIO.Windows.FILE_BEGIN);
  750. return new FileInputStream(fd);
  751. }
  752. }
  753.  
  754. /**
  755. * Create the specified File for write access, ensuring that it does not exist.
  756. * @param f the file that we want to create
  757. * @param permissions we want to have on the file (if security is enabled)
  758. *
  759. * @throws AlreadyExistsException if the file already exists
  760. * @throws IOException if any other error occurred
  761. */
  762. public static FileOutputStream getCreateForWriteFileOutputStream(File f, int permissions)
  763. throws IOException {
  764. if (!Shell.WINDOWS) {
  765. // Use the native wrapper around open(2)
  766. try {
  767. FileDescriptor fd = NativeIO.POSIX.open(f.getAbsolutePath(),
  768. NativeIO.POSIX.O_WRONLY | NativeIO.POSIX.O_CREAT
  769. | NativeIO.POSIX.O_EXCL, permissions);
  770. return new FileOutputStream(fd);
  771. } catch (NativeIOException nioe) {
  772. if (nioe.getErrno() == Errno.EEXIST) {
  773. throw new AlreadyExistsException(nioe);
  774. }
  775. throw nioe;
  776. }
  777. } else {
  778. // Use the Windows native APIs to create equivalent FileOutputStream
  779. try {
  780. FileDescriptor fd = NativeIO.Windows.createFile(f.getCanonicalPath(),
  781. NativeIO.Windows.GENERIC_WRITE,
  782. NativeIO.Windows.FILE_SHARE_DELETE
  783. | NativeIO.Windows.FILE_SHARE_READ
  784. | NativeIO.Windows.FILE_SHARE_WRITE,
  785. NativeIO.Windows.CREATE_NEW);
  786. NativeIO.POSIX.chmod(f.getCanonicalPath(), permissions);
  787. return new FileOutputStream(fd);
  788. } catch (NativeIOException nioe) {
  789. if (nioe.getErrorCode() == 80) {
  790. // ERROR_FILE_EXISTS
  791. // 80 (0x50)
  792. // The file exists
  793. throw new AlreadyExistsException(nioe);
  794. }
  795. throw nioe;
  796. }
  797. }
  798. }
  799.  
  800. private synchronized static void ensureInitialized() {
  801. if (!initialized) {
  802. cacheTimeout =
  803. new Configuration().getLong("hadoop.security.uid.cache.secs",
  804. 4*60*60) * 1000;
  805. LOG.info("Initialized cache for UID to User mapping with a cache" +
  806. " timeout of " + cacheTimeout/1000 + " seconds.");
  807. initialized = true;
  808. }
  809. }
  810.  
  811. /**
  812. * A version of renameTo that throws a descriptive exception when it fails.
  813. *
  814. * @param src The source path
  815. * @param dst The destination path
  816. *
  817. * @throws NativeIOException On failure.
  818. */
  819. public static void renameTo(File src, File dst)
  820. throws IOException {
  821. if (!nativeLoaded) {
  822. if (!src.renameTo(dst)) {
  823. throw new IOException("renameTo(src=" + src + ", dst=" +
  824. dst + ") failed.");
  825. }
  826. } else {
  827. renameTo0(src.getAbsolutePath(), dst.getAbsolutePath());
  828. }
  829. }
  830.  
  831. public static void link(File src, File dst) throws IOException {
  832. if (!nativeLoaded) {
  833. HardLink.createHardLink(src, dst);
  834. } else {
  835. link0(src.getAbsolutePath(), dst.getAbsolutePath());
  836. }
  837. }
  838.  
  839. /**
  840. * A version of renameTo that throws a descriptive exception when it fails.
  841. *
  842. * @param src The source path
  843. * @param dst The destination path
  844. *
  845. * @throws NativeIOException On failure.
  846. */
  847. private static native void renameTo0(String src, String dst)
  848. throws NativeIOException;
  849.  
  850. private static native void link0(String src, String dst)
  851. throws NativeIOException;
  852.  
  853. /**
  854. * Unbuffered file copy from src to dst without tainting OS buffer cache
  855. *
  856. * In POSIX platform:
  857. * It uses FileChannel#transferTo() which internally attempts
  858. * unbuffered IO on OS with native sendfile64() support and falls back to
  859. * buffered IO otherwise.
  860. *
  861. * It minimizes the number of FileChannel#transferTo calls by passing the
  862. * src file size directly instead of a smaller size as the 3rd parameter.
  863. * This saves the number of sendfile64() system call when native sendfile64()
  864. * is supported. In the two fall back cases where sendfile is not supported,
  865. * FileChannel#transferTo already has its own batching of size 8 MB and 8 KB,
  866. * respectively.
  867. *
  868. * In Windows Platform:
  869. * It uses its own native wrapper of CopyFileEx with COPY_FILE_NO_BUFFERING
  870. * flag, which is supported on Windows Server 2008 and above.
  871. *
  872. * Ideally, we should use FileChannel#transferTo() across both POSIX and Windows
  873. * platform. Unfortunately, the wrapper(Java_sun_nio_ch_FileChannelImpl_transferTo0)
  874. * used by FileChannel#transferTo for unbuffered IO is not implemented on Windows.
  875. * Based on OpenJDK 6/7/8 source code, Java_sun_nio_ch_FileChannelImpl_transferTo0
  876. * on Windows simply returns IOS_UNSUPPORTED.
  877. *
  878. * Note: This simple native wrapper does minimal parameter checking before copy and
  879. * consistency check (e.g., size) after copy.
  880. * It is recommended to use wrapper function like
  881. * the Storage#nativeCopyFileUnbuffered() function in hadoop-hdfs with pre/post copy
  882. * checks.
  883. *
  884. * @param src The source path
  885. * @param dst The destination path
  886. * @throws IOException
  887. */
  888. public static void copyFileUnbuffered(File src, File dst) throws IOException {
  889. if (nativeLoaded && Shell.WINDOWS) {
  890. copyFileUnbuffered0(src.getAbsolutePath(), dst.getAbsolutePath());
  891. } else {
  892. FileInputStream fis = null;
  893. FileOutputStream fos = null;
  894. FileChannel input = null;
  895. FileChannel output = null;
  896. try {
  897. fis = new FileInputStream(src);
  898. fos = new FileOutputStream(dst);
  899. input = fis.getChannel();
  900. output = fos.getChannel();
  901. long remaining = input.size();
  902. long position = 0;
  903. long transferred = 0;
  904. while (remaining > 0) {
  905. transferred = input.transferTo(position, remaining, output);
  906. remaining -= transferred;
  907. position += transferred;
  908. }
  909. } finally {
  910. IOUtils.cleanup(LOG, output);
  911. IOUtils.cleanup(LOG, fos);
  912. IOUtils.cleanup(LOG, input);
  913. IOUtils.cleanup(LOG, fis);
  914. }
  915. }
  916. }
  917.  
  918. private static native void copyFileUnbuffered0(String src, String dst)
  919. throws NativeIOException;
  920. }

NativeIO
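
  The essential modification in this NativeIO copy is the access() method near the top of the listing: it now returns true unconditionally and the call to the native access0() check is commented out, so the Windows file-permission check that would otherwise fail when submitting from the IDE is simply bypassed.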

三、More MR programming examples

    1. Implementing a join with MapReduce

      Requirement:

        Join an order table (order_id, date, p_id, amount) with a product table (p_id, pname, category_id, price) on the product id p_id, so that every order record ends up carrying the matching product's name, category and price.

      Idea:

        Use the join condition (the product id) as the key of the map output, and emit every record of both tables that satisfies the join condition together with a tag recording which file it came from; matching records are then sent to the same reduce task, where they are stitched together. A small hypothetical sample is sketched below.
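
      To make this concrete, here is a purely hypothetical sample (the file names and all values are made up for illustration; the only assumption taken from the code below is that the order file's name starts with "1", which is how JoinMapper tells the two inputs apart):

order file 1_orders.txt (order_id,date,p_id,amount):
    1001,20180205,P0001,2
    1002,20180205,P0002,3

product file product.txt (p_id,pname,category_id,price):
    P0001,xiaomi,1000,1999.0
    P0002,apple,1000,5999.0

expected reduce output (one InfoBean per order, as formatted by toString()):
    order_id=1001, dateString=20180205, p_id=P0001, amount=2, pname=xiaomi, category_id=1000, price=1999.0, flag=0
    order_id=1002, dateString=20180205, p_id=P0002, amount=3, pname=apple, category_id=1000, price=5999.0, flag=0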

      Code:

package com.mr.join;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * bean
 *
 * @author zcc ON 2018/2/5
 **/
public class InfoBean implements Writable {
    private int order_id;
    private String dateString;
    private String p_id;
    private int amount;
    private String pname;
    private int category_id;
    private float price;
    /**
     * flag "0" marks an order record, "1" marks a product record
     */
    private String flag;

    public InfoBean() {
    }

    public InfoBean(int order_id, String dateString, String p_id, int amount, String pname, int category_id, float price, String flag) {
        this.order_id = order_id;
        this.dateString = dateString;
        this.p_id = p_id;
        this.amount = amount;
        this.pname = pname;
        this.category_id = category_id;
        this.price = price;
        this.flag = flag;
    }

    public void set(int order_id, String dateString, String p_id, int amount, String pname, int category_id, float price, String flag) {
        this.order_id = order_id;
        this.dateString = dateString;
        this.p_id = p_id;
        this.amount = amount;
        this.pname = pname;
        this.category_id = category_id;
        this.price = price;
        this.flag = flag;
    }

    public int getOrder_id() {
        return order_id;
    }

    public void setOrder_id(int order_id) {
        this.order_id = order_id;
    }

    public String getDateString() {
        return dateString;
    }

    public void setDateString(String dateString) {
        this.dateString = dateString;
    }

    public String getP_id() {
        return p_id;
    }

    public void setP_id(String p_id) {
        this.p_id = p_id;
    }

    public int getAmount() {
        return amount;
    }

    public void setAmount(int amount) {
        this.amount = amount;
    }

    public String getPname() {
        return pname;
    }

    public void setPname(String pname) {
        this.pname = pname;
    }

    public int getCategory_id() {
        return category_id;
    }

    public void setCategory_id(int category_id) {
        this.category_id = category_id;
    }

    public float getPrice() {
        return price;
    }

    public void setPrice(float price) {
        this.price = price;
    }

    public String getFlag() {
        return flag;
    }

    public void setFlag(String flag) {
        this.flag = flag;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(order_id);
        out.writeUTF(dateString);
        out.writeUTF(p_id);
        out.writeInt(amount);
        out.writeUTF(pname);
        out.writeInt(category_id);
        out.writeFloat(price);
        out.writeUTF(flag);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.order_id = in.readInt();
        this.dateString = in.readUTF();
        this.p_id = in.readUTF();
        this.amount = in.readInt();
        this.category_id = in.readInt();
        this.price = in.readFloat();
        this.flag = in.readUTF();
    }

    @Override
    public String toString() {
        return "order_id=" + order_id +
                ", dateString=" + dateString +
                ", p_id=" + p_id +
                ", amount=" + amount +
                ", pname=" + pname +
                ", category_id=" + category_id +
                ", price=" + price +
                ", flag=" + flag;
    }
}

InfoBean

package com.mr.join;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.io.IOException;

/**
 * mapper
 *
 * @author zcc ON 2018/2/5
 **/
public class JoinMapper extends Mapper<LongWritable, Text, Text, InfoBean> {
    InfoBean bean = new InfoBean();
    Text k = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String p_id;
        // read the current split from the context; the input is a file, so it can be cast to FileSplit
        // (note this context API)
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        String fileName = inputSplit.getPath().getName();
        if (fileName.startsWith("1")) {
            String[] fields = line.split(",");
            p_id = fields[2];
            // give default values to the fields this record does not have,
            // so that write() etc. in the bean do not throw NullPointerException
            bean.set(Integer.parseInt(fields[0]), fields[1], p_id, Integer.parseInt(fields[3]), "", 0, 0, "0");
        } else {
            String[] fields = line.split(",");
            p_id = fields[0];
            // give default values to the fields this record does not have,
            // so that write() etc. in the bean do not throw NullPointerException
            bean.set(0, "", p_id, 0, fields[1], Integer.parseInt(fields[2]), Float.parseFloat(fields[3]), "1");
        }
        k.set(p_id);
        // emit with p_id as the key
        context.write(k, bean);
    }
}

JoinMapper

package com.mr.join;

import org.apache.commons.beanutils.BeanUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * reducer
 *
 * @author zcc ON 2018/2/5
 **/
public class JoinReducer extends Reducer<Text, InfoBean, InfoBean, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<InfoBean> values, Context context) throws IOException, InterruptedException {
        InfoBean pBean = new InfoBean();
        List<InfoBean> orderBeanList = new ArrayList<>();
        for (InfoBean value : values) {
            if ("1".equals(value.getFlag())) {
                try {
                    BeanUtils.copyProperties(pBean, value);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            } else {
                InfoBean oBean = new InfoBean();
                try {
                    BeanUtils.copyProperties(oBean, value);
                    orderBeanList.add(oBean);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        // stitch the beans together to form the final result
        for (InfoBean orderBean : orderBeanList) {
            orderBean.setPname(pBean.getPname());
            orderBean.setCategory_id(pBean.getCategory_id());
            orderBean.setPrice(pBean.getPrice());
            context.write(orderBean, NullWritable.get());
        }
    }
}

JoinReducer

package com.mr.join;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * driver
 *
 * @author zcc ON 2018/2/5
 **/
public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /*conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "mini1");
        conf.set("fs.defaultFS", "hdfs://mini1:9000/");*/
        Job job = Job.getInstance(conf);
        // locate the jar that contains this driver class
        job.setJarByClass(JoinDriver.class);
        // the mapper/reducer classes this job uses
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReducer.class);
        // the map output key/value types (needed because of the pluggable serialization mechanism)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(InfoBean.class);
        // the final (reduce) output key/value types (optional, since some jobs have no reduce phase)
        job.setOutputKeyClass(InfoBean.class);
        job.setOutputValueClass(NullWritable.class);
        // the job's input/output directories (taken from the command-line args instead of being hard-coded)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // submit (ships the job parameters and the jar containing the Java classes to YARN)
        // job.submit();
        // wait for completion and report the cluster's feedback
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}

JoinDriver

    Package the jar, upload it to the server, and run it with the following command:

hadoop jar zk04.jar com.mr.join.JoinDriver /join/input /join/output
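
    Once the job finishes, the joined records land in the output directory passed on the command line; they can be inspected with the HDFS shell (part-r-00000 is just the usual default name of the single reducer's output file and may differ in your run):

hadoop fs -cat /join/output/part-r-00000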

    If you need to step through the code in a debugger to inspect the business logic, use the local-mode approach described in section one.
