今天,在做canopy算法实例时,遇到这个问题,所以记录下来。下面是源码:

  

package czx.com.mahout;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VectorWritable; public class TextVecWrite extends AbstractJob { public static void main(String[] args) throws Exception {
ToolRunner.run(new Configuration(), new TextVecWrite(), args);
} /**
* TextVecWriterMapper
* @author czx
*;
*/
public static class TextVecWriterMapper extends Mapper<LongWritable, Text, LongWritable,VectorWritable >{
@SuppressWarnings("unchecked")
@Override
protected void map(LongWritable key, Text value,
@SuppressWarnings("rawtypes") org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("\\s{1,}");
RandomAccessSparseVector vector = new RandomAccessSparseVector(split.length);
for(int i=0;i<split.length;++i){
vector.set(i, Double.parseDouble(split[i]));
}
VectorWritable vectorWritable = new VectorWritable(vector);
context.write(key, vectorWritable);
}
} /**
* TextVectorWritableReducer
* @author czx
*
*/
public static class TextVectorWritableReducer extends Reducer<LongWritable, VectorWritable, LongWritable , VectorWritable >{
@Override
protected void reduce(LongWritable arg0, Iterable<VectorWritable> arg1,
Context arg2)
throws IOException, InterruptedException {
for(VectorWritable v:arg1){
arg2.write(arg0, v);
}
}
} @Override
public int run(String[] arg0) throws Exception {
addInputOption();
addOutputOption();
if(parseArguments(arg0)==null){
return -1;
}
Path input = getInputPath();
Path output = getOutputPath();
Configuration conf = getConf(); Job job = new Job(conf,"textvectorWritable with input:"+input.getName());
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setMapperClass(TextVecWriterMapper.class);
job.setReducerClass(TextVectorWritableReducer.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setOutputValueClass(VectorWritable.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(VectorWritable.class);
job.setJarByClass(TextVecWrite.class); FileInputFormat.addInputPath(job, input);
SequenceFileOutputFormat.setOutputPath(job, output); if(!job.waitForCompletion(true)){
throw new InterruptedException("Canopy Job failed processing "+input);
}
return 0;
}
}

  将程序编译打包成JAR并运行如下:

  

 hadoop jar ClusteringUtils.jar czx.com.mahout.TextVecWrite -i /user/hadoop/testdata/synthetic_control.data -o /home/czx/1

  但出现如下错误:

  

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli2/Option
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli2.Option
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 3 more

  最后,发现是将mahout根目录下的相应的jar包复制到hadoop-2.4.1/share/hadoop/common/lib文件夹下时,少复制了mahout-core-0.9-job.jar,于是复制mahout-core-0.9-job.jar后,重新启动hadoop即可。

  

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli2/Option的更多相关文章

  1. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

    学习架构探险,从零开始写Java Web框架时,在学习到springAOP时遇到一个异常: "C:\Program Files\Java\jdk1.7.0_40\bin\java" ...

  2. MyEclipse8.5集成Tomcat7时的启动错误:Exception in thread “main” java.lang.NoClassDefFoundError org/apache/commons/logging/LogFactory

    今天,安装Tomcat7.0.21后,单独用D:\apache-tomcat-7.0.21\bin\startup.bat启动web服务正常.但在MyEclipse8.5中集成配置Tomcat7后,在 ...

  3. MyEclipse8.5集成Tomcat7时的启动错误:Exception in thread “main” java.lang.NoClassDefFoundError org/apache/commons/logging/LogFactory

    今天,安装Tomcat7.0.21后,单独用D:\apache-tomcat-7.0.21\bin\startup.bat启动web服务正常.但 在MyEclipse8.5中集成配置Tomcat7后, ...

  4. 【转】MyEclipse8.5集成Tomcat7时的启动错误:Exception in thread “main” java.lang.NoClassDefFoundError org/apache/commons/logging/LogFactory

    http://www.cnblogs.com/newsouls/p/4021198.html 今天,安装Tomcat7.0.21后,单独用D:\apache-tomcat-7.0.21\bin\sta ...

  5. shiro报错SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".和Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

    未能加载类"org.slf4j.impl.StaticLoggerBinder" 解决方案: <dependency> <groupId>org.slf4j ...

  6. spark-shell报错:Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

    环境: openSUSE42.2 hadoop2.6.0-cdh5.10.0 spark1.6.0-cdh5.10.0 按照网上的spark安装教程安装完之后,启动spark-shell,出现如下报错 ...

  7. 报错:Exception in thread "main" java.lang.NoClassDefFoundError: Lorg/apache/hadoop/fs/FileSystem

    报错现象: Exception in thread "main" java.lang.NoClassDefFoundError: Lorg/apache/hadoop/fs/Fil ...

  8. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/CanUnbuffer

    在执行spark on hive 的时候在  sql.show()处报错 : Exception in thread "main" java.lang.NoClassDefFoun ...

  9. Exception in thread main java.lang.NoClassDefFoundError: org/apache/juli/logging/LogFacto

    报错: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/juli/logging/Log ...

随机推荐

  1. API网关原理

    1.API网关介绍 API网关是一个服务器,是系统的唯一入口.从面向对象设计的角度看,它与外观模式类似.API网关封装了系统内部架构,为每个客户端提供一个定制的API.它可能还具有其它职责,如身份验证 ...

  2. jquery header选择器 语法

    jquery header选择器 语法 作用::header 选择器选取所有标题元素(h1 - h6).广州大理石机械构件 语法:$(":header") jquery heade ...

  3. POJ 2289 多重二分匹配+二分 模板

    题意:在通讯录中有N个人,每个人能可能属于多个group,现要将这些人分组m组,设各组中的最大人数为max,求出该最小的最大值 下面用的是朴素的查找,核心代码find_path复杂度是VE的,不过据说 ...

  4. Anaconda安装PyTorch

    Anaconda是一个Python语言管理器,支持安装基于Python的开发包,例如tensorflow.Pytorch等,以及各种基于Python的IDE. https://www.jb51.net ...

  5. Android中播放声音

    在Android系统中,有两种播放声音的方式,一种是通过MediaPlayer,另外一种是通过SoundPool.前者主要用于播放长时间的音乐,而后者用于播放小段小段的音效,像按键音这种,其优点是资源 ...

  6. 采用.bat批处理命令快速设置Java环境变量

    背景: java课程培训,每次到机房需要重新安装JDK,每次都采用图形界面进行操作比较麻烦(慢),于是在网上查了一下CMD命令设置系统环境变量的方法,再次记录下来. 设置方法: 1.找到JDK安装路径 ...

  7. Httpwatch抓包

    一.下载Httpwatch 二.抓包 1.启动Httpwatch 打开浏览器-选择工具-Httpwatch professional(仅适用于IE和火狐40及以下浏览器) 2.开始抓包 点击“Reco ...

  8. 阿里第一天——maven学习

    详见该博客的讲解 http://www.cnblogs.com/dcba1112/archive/2011/05/01/2033805.html 几个重要的命令: 1,mvn的作用 其他的构建工具包括 ...

  9. [LeetCode]-011-Integer_to_Roman

    Given an integer, convert it to a roman numeral. Input is guaranteed to be within the range from 1 t ...

  10. SQL Server CDC最佳实践

    企业核心业务系统oltp的数据需要通过ETL同步到数据仓库,原始的ETL流程通过定制化从SQL Server中进行数据抽取,经过生产环境的监控,发现ETL过程的query会对生产系统造成额外负载.于是 ...