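Whether you study machine learning algorithms in a lab or do R&D at a company, sooner or later you need to improve an algorithm yourself. This post describes how to add your own improved machine learning algorithm to Weka.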

  Part 1: Steps for adding a classification algorithm

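  1 The classifier you write must extend Classifier or one of its subclasses (from Weka 3.7 on, Classifier is an interface and the base class to extend is AbstractClassifier); the fairly simple ZeroR is used as the example below;

  2 Override buildClassifier, one of the main methods; it builds the classifier, i.e. trains the model;

  3 Override classifyInstance, which predicts the class of a single instance (the index of the predicted label, or the predicted value for a numeric class); or implement distributionForInstance, which returns the probability distribution over all classes;

  4 Override getCapabilities, which declares the kinds of data (attribute and class types) the classifier can handle; the GUI uses it to decide whether the classifier can be selected for the loaded data, otherwise it is greyed out;

  5 Provide set/get methods for the options (parameters);

  6 Provide globalInfo and tip-text methods such as seedTipText, which supply the descriptions shown in the GUI;

  7 See Part 2 for how to register the classifier in the Weka application;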
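
  Step 3 names two prediction methods. As far as I understand the base class, a classifier that only implements distributionForInstance gets its nominal prediction by taking the index with the highest probability. A minimal plain-Java sketch of that relationship follows; PredictionSketch and classify are hypothetical names for illustration, not Weka API.

```java
/** Hypothetical helper, not part of Weka: maps a class distribution to a label index. */
final class PredictionSketch {

  /** Returns the index of the largest probability (the first one on ties). */
  static int classify(double[] distribution) {
    int best = 0;
    for (int i = 1; i < distribution.length; i++) {
      if (distribution[i] > distribution[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[] distribution = {0.2, 0.5, 0.3};
    System.out.println(classify(distribution)); // prints 1
  }
}
```

  So in practice you usually implement only one of the two methods and let the other follow from it.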

  ZeroR.java source:

  

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 * ZeroR.java
 * Copyright (C) 1999 Eibe Frank
 *
 */

package weka.classifiers.rules;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import java.io.*;
import java.util.*;
import weka.core.*;

/**
 * Class for building and using a 0-R classifier. Predicts the mean
 * (for a numeric class) or the mode (for a nominal class).
 *
 * @author Eibe Frank (eibe@cs.waikato.ac.nz)
 * @version $Revision: 1.11 $
 */
public class ZeroR extends Classifier implements WeightedInstancesHandler {

  /** The class value 0R predicts. */
  private double m_ClassValue;

  /** The number of instances in each class (null if class numeric). */
  private double [] m_Counts;

  /** The class attribute. */
  private Attribute m_Class;

  /**
   * Returns a string describing classifier
   * @return a description suitable for
   * displaying in the explorer/experimenter gui
   */
  public String globalInfo() {
    return "Class for building and using a 0-R classifier. Predicts the mean "
      + "(for a numeric class) or the mode (for a nominal class).";
  }

  /**
   * Generates the classifier.
   *
   * @param instances set of instances serving as training data
   * @exception Exception if the classifier has not been generated successfully
   */
  public void buildClassifier(Instances instances) throws Exception {

    double sumOfWeights = 0;

    m_Class = instances.classAttribute();
    m_ClassValue = 0;
    switch (instances.classAttribute().type()) {
    case Attribute.NUMERIC:
      m_Counts = null;
      break;
    case Attribute.NOMINAL:
      // every class count starts at 1, so the total starts at numClasses()
      m_Counts = new double [instances.numClasses()];
      for (int i = 0; i < m_Counts.length; i++) {
        m_Counts[i] = 1;
      }
      sumOfWeights = instances.numClasses();
      break;
    default:
      throw new Exception("ZeroR can only handle nominal and numeric class"
                          + " attributes.");
    }
    // accumulate weighted class counts (nominal) or the weighted sum (numeric)
    Enumeration enu = instances.enumerateInstances();
    while (enu.hasMoreElements()) {
      Instance instance = (Instance) enu.nextElement();
      if (!instance.classIsMissing()) {
        if (instances.classAttribute().isNominal()) {
          m_Counts[(int)instance.classValue()] += instance.weight();
        } else {
          m_ClassValue += instance.weight() * instance.classValue();
        }
        sumOfWeights += instance.weight();
      }
    }
    if (instances.classAttribute().isNumeric()) {
      // weighted mean of the class values
      if (Utils.gr(sumOfWeights, 0)) {
        m_ClassValue /= sumOfWeights;
      }
    } else {
      // mode of the class values, plus the normalized class distribution
      m_ClassValue = Utils.maxIndex(m_Counts);
      Utils.normalize(m_Counts, sumOfWeights);
    }
  }

  /**
   * Classifies a given instance.
   *
   * @param instance the instance to be classified
   * @return index of the predicted class
   */
  public double classifyInstance(Instance instance) {

    return m_ClassValue;
  }

  /**
   * Calculates the class membership probabilities for the given test instance.
   *
   * @param instance the instance to be classified
   * @return predicted class probability distribution
   * @exception Exception if class is numeric
   */
  public double [] distributionForInstance(Instance instance)
    throws Exception {

    if (m_Counts == null) {
      double[] result = new double[1];
      result[0] = m_ClassValue;
      return result;
    } else {
      return (double []) m_Counts.clone();
    }
  }

  /**
   * Returns a description of the classifier.
   *
   * @return a description of the classifier as a string.
   */
  public String toString() {

    if (m_Class == null) {
      return "ZeroR: No model built yet.";
    }
    if (m_Counts == null) {
      return "ZeroR predicts class value: " + m_ClassValue;
    } else {
      return "ZeroR predicts class value: " + m_Class.value((int) m_ClassValue);
    }
  }

  /**
   * Main method for testing this class.
   *
   * @param argv the options
   */
  public static void main(String [] argv) {

    try {
      System.out.println(Evaluation.evaluateModel(new ZeroR(), argv));
    } catch (Exception e) {
      System.err.println(e.getMessage());
    }
  }
}
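
  To make the training step easier to follow, here is a rough plain-Java sketch of what buildClassifier in the listing appears to compute, with the Weka types replaced by arrays. ZeroRSketch and its method names are hypothetical, and the arrays stand in for Instances, so treat it as an illustration rather than the Weka implementation.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the ZeroR training logic; uses arrays instead of Weka types. */
final class ZeroRSketch {

  /** Weighted mean of numeric class values; the raw sum is returned if the total weight is not positive. */
  static double weightedMean(double[] values, double[] weights) {
    double sum = 0;
    double sumOfWeights = 0;
    for (int i = 0; i < values.length; i++) {
      sum += weights[i] * values[i];
      sumOfWeights += weights[i];
    }
    return sumOfWeights > 0 ? sum / sumOfWeights : sum;
  }

  /** Normalized class distribution where every class count starts at 1. */
  static double[] distribution(int[] labels, double[] weights, int numClasses) {
    double[] counts = new double[numClasses];
    Arrays.fill(counts, 1.0);
    double sumOfWeights = numClasses;
    for (int i = 0; i < labels.length; i++) {
      counts[labels[i]] += weights[i];
      sumOfWeights += weights[i];
    }
    for (int c = 0; c < numClasses; c++) {
      counts[c] /= sumOfWeights;
    }
    return counts;
  }

  /** Index of the most frequent class (the first one on ties). */
  static int mode(double[] distribution) {
    int best = 0;
    for (int c = 1; c < distribution.length; c++) {
      if (distribution[c] > distribution[best]) {
        best = c;
      }
    }
    return best;
  }
}
```

  For labels {0, 1, 1} with unit weights and two classes, the counts become 2 and 3, so the distribution is 0.4 and 0.6 and class 1 is predicted for every instance.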

  Part 2: Steps for adding a fuzzy clustering algorithm

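  1. Write a fuzzy clustering algorithm against the Weka interfaces; the source, FuzzyCMeans.java, is listed at the end;

  2. Copy the source file into the weka.clusterers package directory;

  3. Edit weka.gui.GenericObjectEditor.props: below the comment "# Lists the Clusterers I want to choose from", in the entry weka.clusterers.Clusterer=\, add weka.clusterers.FuzzyCMeans;

  4. Update weka.gui.GenericPropertiesCreator.props accordingly. No change is needed in this case, because the package weka.clusterers is already listed there; if you add a new package, you must add it to this file.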
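
  For steps 3 and 4, the edited entries might look roughly like the fragment below. The neighbouring clusterer names are only illustrative, since the real list depends on your Weka version; the one line taken from the steps above is weka.clusterers.FuzzyCMeans.

```properties
# weka.gui.GenericObjectEditor.props
# Lists the Clusterers I want to choose from
weka.clusterers.Clusterer=\
 weka.clusterers.EM,\
 weka.clusterers.SimpleKMeans,\
 weka.clusterers.FuzzyCMeans

# weka.gui.GenericPropertiesCreator.props
# (touch this only when the algorithm lives in a new package)
weka.clusterers.Clusterer=\
 weka.clusterers
```

  Note that every entry except the last one ends with ",\" so that the list continues on the next line.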

FuzzyCMeans.java source:

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 * FCM.java
 * Copyright (C) 2007 Wei Xiaofei
 *
 */
package weka.clusterers;

import weka.classifiers.rules.DecisionTableHashKey;
import weka.core.Capabilities;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Option;
import weka.core.Utils;
import weka.core.WeightedInstancesHandler;
import weka.core.Capabilities.Capability;
import weka.core.matrix.Matrix;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

import java.util.Enumeration;
import java.util.HashMap;
import java.util.Random;
import java.util.Vector;

/**
 <!-- globalinfo-start -->
 * Cluster data using the Fuzzy C means algorithm
 * <p/>
 <!-- globalinfo-end -->
 *
 <!-- options-start -->
 * Valid options are: <p/>
 *
 * <pre> -N &lt;num&gt;
 * number of clusters.
 * (default 2).</pre>
 *
 * <pre> -F &lt;num&gt;
 * exponent.
 * (default 2).</pre>
 *
 * <pre> -S &lt;num&gt;
 * Random number seed.
 * (default 10)</pre>
 *
 <!-- options-end -->
 *
 * @author Wei Xiaofei
 * @version 1.03
 * @see RandomizableClusterer
 */
public class FuzzyCMeans
  extends RandomizableClusterer
  implements NumberOfClustersRequestable, WeightedInstancesHandler {

  /** for serialization */
  static final long serialVersionUID = -2134543132156464L;

  /**
   * replace missing values in training instances
   */
  private ReplaceMissingValues m_ReplaceMissingFilter;

  /**
   * number of clusters to generate
   */
  private int m_NumClusters = 2;

  /**
   * D: d(i,j) = ||c(i) - x(j)|| is the Euclidean distance between the
   * i-th cluster centroid and the j-th data point
   */
  private Matrix D;

  // private Matrix U;

  /**
   * holds the fuzzifier (the weighting exponent m)
   */
  private double m_fuzzifier = 2;

  /**
   * holds the cluster centroids
   */
  private Instances m_ClusterCentroids;

  /**
   * Holds the standard deviations of the numeric attributes in each cluster
   */
  private Instances m_ClusterStdDevs;

  /**
   * For each cluster, holds the frequency counts for the values of each
   * nominal attribute
   */
  private int [][][] m_ClusterNominalCounts;

  /**
   * The number of instances in each cluster
   */
  private int [] m_ClusterSizes;

  /**
   * attribute min values
   */
  private double [] m_Min;

  /**
   * attribute max values
   */
  private double [] m_Max;

  /**
   * Keep track of the number of iterations completed before convergence
   */
  private int m_Iterations = 0;

  /**
   * Holds the squared errors for all clusters
   */
  private double [] m_squaredErrors;

  /**
   * the default constructor
   */
  public FuzzyCMeans () {
    super();

    m_SeedDefault = 10; // default random seed
    setSeed(m_SeedDefault);
  }

  /**
   * Returns a string describing this clusterer
   * @return a description of the evaluator suitable for
   * displaying in the explorer/experimenter gui
   */
  public String globalInfo() {
    return "Cluster data using the Fuzzy C means algorithm";
  }

  /**
   * Returns default capabilities of the clusterer.
   *
   * @return the capabilities of this clusterer
   */
  public Capabilities getCapabilities() {
    Capabilities result = super.getCapabilities();

    result.disableAll();
    result.enable(Capability.NO_CLASS);

    // attributes
    result.enable(Capability.NUMERIC_ATTRIBUTES);
    result.enable(Capability.MISSING_VALUES);

    return result;
  }

  /**
   * Generates a clusterer. Has to initialize all fields of the clusterer
   * that are not being set via options.
   *
   * @param data set of instances serving as training data
   * @throws Exception if the clusterer has not been
   * generated successfully
   */
  public void buildClusterer(Instances data) throws Exception {

    // can clusterer handle the data?
    getCapabilities().testWithFail(data);

    m_Iterations = 0;

    m_ReplaceMissingFilter = new ReplaceMissingValues();
    Instances instances = new Instances(data); // working copy of the data
    instances.setClassIndex(-1);
    m_ReplaceMissingFilter.setInputFormat(instances);
    instances = Filter.useFilter(instances, m_ReplaceMissingFilter);

    m_Min = new double [instances.numAttributes()];
    m_Max = new double [instances.numAttributes()];
    for (int i = 0; i < instances.numAttributes(); i++) {
      m_Min[i] = m_Max[i] = Double.NaN; // NaN marks "not set yet"
    }

    m_ClusterCentroids = new Instances(instances, m_NumClusters); // cluster centroids
    int[] clusterAssignments = new int [instances.numInstances()];

    for (int i = 0; i < instances.numInstances(); i++) {
      updateMinMax(instances.instance(i)); // update attribute min/max values
    }

    Random RandomO = new Random(getSeed()); // random number generator
    int instIndex;
    HashMap initC = new HashMap();
    DecisionTableHashKey hk = null;
    /* pick distinct random instances as initial centroids;
       the hash key is used to skip duplicate instances */
    for (int j = instances.numInstances() - 1; j >= 0; j--) {
      instIndex = RandomO.nextInt(j+1);
      hk = new DecisionTableHashKey(instances.instance(instIndex),
                                    instances.numAttributes(), true);
      if (!initC.containsKey(hk)) {
        m_ClusterCentroids.add(instances.instance(instIndex));
        initC.put(hk, null);
      }
      instances.swap(j, instIndex);

      if (m_ClusterCentroids.numInstances() == m_NumClusters) {
        break;
      }
    }

    // number of clusters = number of centroids actually found
    m_NumClusters = m_ClusterCentroids.numInstances();

    // distances from every centroid to every instance
    D = new Matrix(solveD(instances).getArray());

    int i, j;
    int n = instances.numInstances();
    Instances [] tempI = new Instances[m_NumClusters];
    m_squaredErrors = new double [m_NumClusters];
    m_ClusterNominalCounts = new int [m_NumClusters][instances.numAttributes()][];

    Matrix U = new Matrix(solveU(instances).getArray()); // initial membership matrix U
    double q = 0; // previous value of the objective function
    while (true) {
      m_Iterations++;
      for (i = 0; i < instances.numInstances(); i++) {
        Instance toCluster = instances.instance(i);
        // hard assignment: which cluster is this instance closest to?
        int newC = clusterProcessedInstance(toCluster, true);

        clusterAssignments[i] = newC;
      }

      // update centroids
      m_ClusterCentroids = new Instances(instances, m_NumClusters);
      for (i = 0; i < m_NumClusters; i++) {
        tempI[i] = new Instances(instances, 0);
      }
      for (i = 0; i < instances.numInstances(); i++) {
        tempI[clusterAssignments[i]].add(instances.instance(i));
      }

      for (i = 0; i < m_NumClusters; i++) {

        double[] vals = new double[instances.numAttributes()];
        for (j = 0; j < instances.numAttributes(); j++) {

          double sum1 = 0, sum2 = 0;
          for (int k = 0; k < n; k++) {
            // each instance is weighted by U(i,k)^m (equal to U(i,k)^2 for the default m = 2)
            double w = Math.pow(U.get(i, k), getFuzzifier());
            sum1 += w * instances.instance(k).value(j);
            sum2 += w;
          }
          vals[j] = sum1 / sum2;

        }
        m_ClusterCentroids.add(new Instance(1.0, vals));

      }

      D = new Matrix(solveD(instances).getArray());
      U = new Matrix(solveU(instances).getArray()); // new membership matrix U
      double q1 = 0; // new value of the objective function
      for (i = 0; i < m_NumClusters; i++) {
        for (j = 0; j < n; j++) {
          /* objective function: q1 += U(i,j)^m * d(i,j)^2 */
          q1 += Math.pow(U.get(i, j), getFuzzifier()) * D.get(i, j) * D.get(i, j);
        }
      }

      /* stop once the objective changes by less than a threshold
         (machine precision, 2.2204e-16); the absolute value is needed
         because the objective normally decreases between iterations */
      if (Math.abs(q1 - q) < 2.2204e-16) {
        break;
      }
      q = q1;
    }

    /* compute the standard deviations, as in k-means */
    m_ClusterStdDevs = new Instances(instances, m_NumClusters);
    m_ClusterSizes = new int [m_NumClusters];
    for (i = 0; i < m_NumClusters; i++) {
      double [] vals2 = new double[instances.numAttributes()];
      for (j = 0; j < instances.numAttributes(); j++) {
        if (instances.attribute(j).isNumeric()) { // only numeric attributes have a std dev
          vals2[j] = Math.sqrt(tempI[i].variance(j));
        } else {
          vals2[j] = Instance.missingValue();
        }
      }
      m_ClusterStdDevs.add(new Instance(1.0, vals2)); // 1.0 is the weight, vals2 the attribute values
      m_ClusterSizes[i] = tempI[i].numInstances();
    }
  }

  /**
   * clusters an instance that has been through the filters.
   * Computes the distance from the instance to every cluster centroid
   * and assigns the instance to the cluster with the nearest centroid.
   *
   * @param instance the instance to assign a cluster to
   * @param updateErrors if true, update the within clusters sum of errors
   * @return a cluster number
   */
  private int clusterProcessedInstance(Instance instance, boolean updateErrors) {
    double minDist = Integer.MAX_VALUE;
    int bestCluster = 0;
    for (int i = 0; i < m_NumClusters; i++) {
      double dist = distance(instance, m_ClusterCentroids.instance(i));
      if (dist < minDist) {
        minDist = dist;
        bestCluster = i;
      }
    }
    if (updateErrors) {
      m_squaredErrors[bestCluster] += minDist;
    }
    return bestCluster;
  }

  /**
   * Classifies a given instance by calling clusterProcessedInstance().
   *
   * @param instance the instance to be assigned to a cluster
   * @return the number of the assigned cluster as an integer
   * if the class is enumerated, otherwise the predicted value
   * @throws Exception if instance could not be classified
   * successfully
   */
  public int clusterInstance(Instance instance) throws Exception {
    m_ReplaceMissingFilter.input(instance);
    m_ReplaceMissingFilter.batchFinished();
    Instance inst = m_ReplaceMissingFilter.output();

    return clusterProcessedInstance(inst, false);
  }

  /**
   * Computes the matrix D, where d(i,j) = ||c(i) - x(j)||
   */
  private Matrix solveD(Instances instances) {
    int n = instances.numInstances();
    Matrix D = new Matrix(m_NumClusters, n);
    for (int i = 0; i < m_NumClusters; i++) {
      for (int j = 0; j < n; j++) {
        D.set(i, j, distance(instances.instance(j), m_ClusterCentroids.instance(i)));
        // avoid a zero distance, which would cause a division by zero in solveU
        if (D.get(i, j) == 0) {
          D.set(i, j, 0.000000000001);
        }
      }
    }

    return D;
  }

  /**
   * Computes the membership matrix U, where
   * U(i,j) = 1 / sum_k (d(i,j) / d(k,j))^(2/(m-1))
   */
  private Matrix solveU(Instances instances) {
    int n = instances.numInstances();
    int i, j;
    Matrix U = new Matrix(m_NumClusters, n);

    for (i = 0; i < m_NumClusters; i++) {
      for (j = 0; j < n; j++) {
        double sum = 0;
        for (int k = 0; k < m_NumClusters; k++) {
          // (d(i,j) / d(k,j))^(2/(m-1))
          sum += Math.pow(D.get(i, j) / D.get(k, j), 2/(getFuzzifier() - 1));
        }
        U.set(i, j, Math.pow(sum, -1));
      }
    }
    return U;
  }

  /**
   * Calculates the Euclidean distance between two instances
   *
   * @param first the first instance
   * @param second the second instance
   * @return the distance between the two given instances
   */
  private double distance(Instance first, Instance second) {

    double val1;
    double val2;
    double dist = 0.0;

    for (int i = 0; i < first.numAttributes(); i++) {
      val1 = first.value(i);
      val2 = second.value(i);

      dist += (val1 - val2) * (val1 - val2);
    }
    dist = Math.sqrt(dist);
    return dist;
  }

  /**
   * Updates the minimum and maximum values for all the attributes
   * based on a new instance (same as the function in k-means).
   *
   * @param instance the new instance
   */
  private void updateMinMax(Instance instance) {

    for (int j = 0; j < m_ClusterCentroids.numAttributes(); j++) {
      if (!instance.isMissing(j)) {
        if (Double.isNaN(m_Min[j])) {
          m_Min[j] = instance.value(j);
          m_Max[j] = instance.value(j);
        } else {
          if (instance.value(j) < m_Min[j]) {
            m_Min[j] = instance.value(j);
          } else {
            if (instance.value(j) > m_Max[j]) {
              m_Max[j] = instance.value(j);
            }
          }
        }
      }
    }
  }

  /**
   * Returns the number of clusters.
   *
   * @return the number of clusters generated for a training dataset.
   * @throws Exception if number of clusters could not be returned
   * successfully
   */
  public int numberOfClusters() throws Exception {
    return m_NumClusters;
  }

  /**
   * Returns the fuzzifier, i.e. the weighting exponent.
   *
   * @return the weighting exponent
   * @throws Exception if the weighting exponent could not be returned
   */
  public double fuzzifier() throws Exception {
    return m_fuzzifier;
  }

  /**
   * Returns an enumeration describing the available options.
   *
   * @return an enumeration of all the available options.
   */
  public Enumeration listOptions () {
    Vector result = new Vector();

    result.addElement(new Option(
      "\tnumber of clusters.\n"
      + "\t(default 2).",
      "N", 1, "-N <num>"));

    result.addElement(new Option(
      "\texponent.\n"
      + "\t(default 2.0).",
      "F", 1, "-F <num>"));

    Enumeration en = super.listOptions();
    while (en.hasMoreElements())
      result.addElement(en.nextElement());

    return result.elements();
  }

  /**
   * Returns the tip text for this property
   * @return tip text for this property suitable for
   * displaying in the explorer/experimenter gui
   */
  public String numClustersTipText() {
    return "set number of clusters";
  }

  /**
   * set the number of clusters to generate
   *
   * @param n the number of clusters to generate
   * @throws Exception if number of clusters is not positive
   */
  public void setNumClusters(int n) throws Exception {
    if (n <= 0) {
      throw new Exception("Number of clusters must be > 0");
    }
    m_NumClusters = n;
  }

  /**
   * gets the number of clusters to generate
   *
   * @return the number of clusters to generate
   */
  public int getNumClusters() {
    return m_NumClusters;
  }

  /**
   * Returns the tip text for this property
   * @return tip text for this property suitable for
   * displaying in the explorer/experimenter gui
   */
  public String fuzzifierTipText() {
    return "set fuzzifier";
  }

  /**
   * set the fuzzifier
   *
   * @param f fuzzifier
   * @throws Exception if the exponent is not greater than 1
   */
  public void setFuzzifier(double f) throws Exception {
    if (f <= 1) {
      throw new Exception("F must be > 1");
    }
    m_fuzzifier = f;
  }

  /**
   * get the fuzzifier
   *
   * @return m_fuzzifier
   */
  public double getFuzzifier() {
    return m_fuzzifier;
  }

  /**
   * Parses a given list of options. <p/>
   *
   <!-- options-start -->
   * Valid options are: <p/>
   *
   * <pre> -N &lt;num&gt;
   * number of clusters.
   * (default 2).</pre>
   *
   * <pre> -F &lt;num&gt;
   * fuzzifier.
   * (default 2.0).</pre>
   *
   * <pre> -S &lt;num&gt;
   * Random number seed.
   * (default 10)</pre>
   *
   <!-- options-end -->
   *
   * @param options the list of options as an array of strings
   * @throws Exception if an option is not supported
   */
  public void setOptions (String[] options)
    throws Exception {

    String optionString = Utils.getOption('N', options);

    if (optionString.length() != 0) {
      setNumClusters(Integer.parseInt(optionString));
    }

    optionString = Utils.getOption('F', options);

    if (optionString.length() != 0) {
      setFuzzifier(Double.parseDouble(optionString));
    }
    super.setOptions(options);
  }

  /**
   * Gets the current settings of FuzzyCMeans
   *
   * @return an array of strings suitable for passing to setOptions()
   */
  public String[] getOptions () {
    int i;
    Vector result;
    String[] options;

    result = new Vector();

    result.add("-N");
    result.add("" + getNumClusters());

    result.add("-F");
    result.add("" + getFuzzifier());

    options = super.getOptions();
    for (i = 0; i < options.length; i++)
      result.add(options[i]);

    return (String[]) result.toArray(new String[result.size()]);
  }

  /**
   * return a string describing this clusterer (the result output)
   *
   * @return a description of the clusterer as a string
   */
  public String toString() {
    // width needed to print the largest centroid value
    int maxWidth = 0;
    for (int i = 0; i < m_NumClusters; i++) {
      for (int j = 0; j < m_ClusterCentroids.numAttributes(); j++) {
        if (m_ClusterCentroids.attribute(j).isNumeric()) {
          double width = Math.log(Math.abs(m_ClusterCentroids.instance(i).value(j))) /
            Math.log(10.0);
          width += 1.0;
          if ((int)width > maxWidth) {
            maxWidth = (int)width;
          }
        }
      }
    }
    StringBuffer temp = new StringBuffer();
    String naString = "N/A";
    for (int i = 0; i < maxWidth+2; i++) {
      naString += " ";
    }
    temp.append("\nFuzzy C-means\n======\n");
    temp.append("\nNumber of iterations: " + m_Iterations+"\n");
    temp.append("Within cluster sum of squared errors: " + Utils.sum(m_squaredErrors));

    temp.append("\n\nCluster centroids:\n");
    for (int i = 0; i < m_NumClusters; i++) {
      temp.append("\nCluster "+i+"\n\t");
      temp.append("\n\tStd Devs: ");
      for (int j = 0; j < m_ClusterStdDevs.numAttributes(); j++) {
        if (m_ClusterStdDevs.attribute(j).isNumeric()) {
          temp.append(" "+Utils.doubleToString(m_ClusterStdDevs.instance(i).value(j),
                                               maxWidth+5, 4));
        } else {
          temp.append(" "+naString);
        }
      }
    }
    temp.append("\n\n");
    return temp.toString();
  }

  /**
   * Gets the cluster centroids
   *
   * @return the cluster centroids
   */
  public Instances getClusterCentroids() {
    return m_ClusterCentroids;
  }

  /**
   * Gets the standard deviations of the numeric attributes in each cluster
   *
   * @return the standard deviations of the numeric attributes
   * in each cluster
   */
  public Instances getClusterStandardDevs() {
    return m_ClusterStdDevs;
  }

  /**
   * Returns for each cluster the frequency counts for the values of each
   * nominal attribute
   *
   * @return the counts
   */
  public int [][][] getClusterNominalCounts() {
    return m_ClusterNominalCounts;
  }

  /**
   * Gets the squared error for all clusters
   *
   * @return the squared error
   */
  public double getSquaredError() {
    return Utils.sum(m_squaredErrors);
  }

  /**
   * Gets the number of instances in each cluster
   *
   * @return The number of instances in each cluster
   */
  public int [] getClusterSizes() {
    return m_ClusterSizes;
  }

  /**
   * Main method for testing this class.
   *
   * @param argv should contain the following arguments: <p>
   * -t training file [-N number of clusters]
   */
  public static void main (String[] argv) {
    runClusterer(new FuzzyCMeans (), argv);
  }
}
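
The two update rules at the heart of the listing, the membership matrix from solveU and the weighted centroid update inside buildClusterer, can be sketched in plain Java without Weka. This is only an illustration under my own assumptions: FcmSketch, memberships and centroids are hypothetical names, plain arrays replace Matrix and Instances, and all distances are assumed to be strictly positive (the listing guards this in solveD).

```java
/** Hypothetical plain-Java sketch of the fuzzy c-means update rules; not part of Weka. */
final class FcmSketch {

  /**
   * Membership update: u[i][j] = 1 / sum_k (d[i][j] / d[k][j])^(2/(m-1)),
   * where d[i][j] is the distance from centroid i to point j and m > 1.
   */
  static double[][] memberships(double[][] d, double m) {
    int c = d.length;
    int n = d[0].length;
    double exponent = 2.0 / (m - 1.0);
    double[][] u = new double[c][n];
    for (int i = 0; i < c; i++) {
      for (int j = 0; j < n; j++) {
        double sum = 0;
        for (int k = 0; k < c; k++) {
          sum += Math.pow(d[i][j] / d[k][j], exponent);
        }
        u[i][j] = 1.0 / sum;
      }
    }
    return u;
  }

  /**
   * Centroid update: centroid i is the average of all points x[j],
   * each weighted by u[i][j]^m.
   */
  static double[][] centroids(double[][] u, double[][] x, double m) {
    int c = u.length;
    int n = x.length;
    int dims = x[0].length;
    double[][] centers = new double[c][dims];
    for (int i = 0; i < c; i++) {
      double totalWeight = 0;
      for (int j = 0; j < n; j++) {
        double w = Math.pow(u[i][j], m);
        totalWeight += w;
        for (int a = 0; a < dims; a++) {
          centers[i][a] += w * x[j][a];
        }
      }
      for (int a = 0; a < dims; a++) {
        centers[i][a] /= totalWeight;
      }
    }
    return centers;
  }
}
```

With two centroids, m = 2 and distances {{1, 3}, {3, 1}}, the first point gets memberships 0.9 and 0.1, so the memberships of each point sum to 1. Alternating these two updates until the objective stops changing is the loop that buildClusterer runs.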
