K-Means Algorithm Implementation

Environment: Ubuntu + Code::Blocks (G++)

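K-means: randomly select k objects from the data set D; each one serves as the initial mean (center) of a cluster. Assign each remaining object to the cluster whose center is nearest in Euclidean distance. Then recompute each cluster's center as the mean of its members, and repeat the assignment until the centers stop changing. (K-means is not guaranteed to converge to the global optimum and often terminates at a local optimum. A common remedy is to run it several times with different initial cluster centers.)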
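For reference, the "optimum" mentioned above is usually measured by the within-cluster sum of squared errors. The symbols below ($E$, $C_j$, $\mu_j$) are notation introduced for this note and do not appear in the original post; $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its center.

```latex
E = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

Neither the assignment step nor the mean update can increase $E$, so the iteration settles, but the value it settles at depends on the initial centers.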

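The code below uses 3 clusters and takes the first three input points as the initial cluster centers. (It was written in June or July and is posted here as is.)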

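 #include <iostream>
 #include <vector>
 #include <string>
 #include <cmath>
 #include <fstream>
 #include <cstdlib>
 #define OP ','      // separator between x and y on each input line
 #define SET_SIZE 3  // number of clusters
 using namespace std;
 // Convergence threshold: clustering stops once no center moves more than EXP along either axis.
 // The exponent was lost in the source text; 1e-6 is an assumed value.
 const double EXP = 1e-6;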

 // A 2-D point
 typedef struct point
 {
     double xAxle;
     double yAxle;
     point()
     {
         xAxle = 0;
         yAxle = 0;
     }
     point(double _x,double _y)
     {
         xAxle = _x;
         yAxle = _y;
     }
 }Point;

 // Read the points from the file into numSave
 void readFile(ifstream &inFile,const string &fileName,vector<Point> &numSave)
 {
     inFile.clear();
     inFile.open(fileName.c_str());
     if(!inFile)
     {
         cout << "Cannot open the input file!" << endl;
         return;
     }
     // Read line by line; each line holds one point written as "x,y"
     string temp;
     while(getline(inFile,temp))
     {
         string::size_type pos = temp.find(OP);
         if(pos == string::npos)
         {
             continue; // skip lines that have no separator
         }
         double x = atof(temp.substr(0,pos).c_str());
         double y = atof(temp.substr(pos + 1).c_str());
         numSave.push_back(Point(x,y));
     }
     inFile.close();
 }

 // Euclidean distance between two points
 double calDistance(const Point &a,const Point &b)
 {
     return sqrt((a.xAxle - b.xAxle) * (a.xAxle - b.xAxle) + (a.yAxle - b.yAxle) * (a.yAxle - b.yAxle));
 }

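 // Average the points of one cluster to get its new center
 void calAverage(const vector<Point> &num,double &xValueAver,double &yValueAver)
 {
     if(num.empty())
     {
         return; // an empty cluster keeps its previous center (avoids dividing by zero)
     }
     // Accumulate in double; an int accumulator would truncate the fractional parts
     double xValue = 0,yValue = 0;
     for(unsigned int i = 0;i < num.size();i ++)
     {
         xValue += num[i].xAxle;
         yValue += num[i].yAxle;
     }
     // Take the mean
     xValueAver = xValue/num.size();
     yValueAver = yValue/num.size();
 }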

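 // Assign every point to its nearest cluster center; getSetValue holds the members of each cluster
 void getSet(vector<Point> &numSave,vector<Point> &setCentre,vector<Point> (&getSetValue)[SET_SIZE])
 {
     for(unsigned int i = 0;i < numSave.size();i ++)
     {
         // Start with the distance to the first center, then look for a smaller one
         double temp = calDistance(numSave[i],setCentre[0]);
         // Index of the center closest to numSave[i]
         unsigned int k = 0;
         for(unsigned int j = 1;j < SET_SIZE;j ++)
         {
             // Compute the distance
             double dis = calDistance(numSave[i],setCentre[j]);
             if(temp > dis)
             {
                 temp = dis;
                 k = j; // remember the index of the nearest center
             }
         }
         // Put the point into the cluster with that index
         getSetValue[k].push_back(numSave[i]);
     }
 }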

 // Run k-means on numSave and write the centers of every iteration to os
 void k_average(vector<Point> &numSave,ofstream &os)
 {
     // Every cluster needs an initial center, so at least SET_SIZE points are required
     if(numSave.size() < SET_SIZE)
     {
         cout << "Too few input points for " << SET_SIZE << " clusters." << endl;
         return;
     }
     vector<Point> setCentre;
     vector<Point> getSetValue[SET_SIZE];
     vector<Point> tempCentre; // centers of the previous iteration, kept for comparison
     // Use the first SET_SIZE points as the initial centers
     for(unsigned int i = 0;i < SET_SIZE;i ++)
     {
         setCentre.push_back(numSave[i]);
     }
     while(true)
     {
         for(unsigned int i = 0;i < SET_SIZE;i ++)
         {
             getSetValue[i].clear();
         }
         // Assign every point to its nearest center
         getSet(numSave,setCentre,getSetValue);
         tempCentre = setCentre;
         bool flag = true;
         for(unsigned int i = 0;i < SET_SIZE;i ++)
         {
             // Output the current center
             os << setCentre[i].xAxle << " and " << setCentre[i].yAxle << endl;
             /*
             // Optionally output the members of this cluster
             for(unsigned int j = 0;j < getSetValue[i].size();j ++)
             {
                 os << getSetValue[i][j].xAxle << " " << getSetValue[i][j].yAxle << "----";
             }
             os << endl;
             */
             // Recompute the center from the new members
             calAverage(getSetValue[i],setCentre[i].xAxle,setCentre[i].yAxle);
         }
         // Keep iterating if any center moved more than EXP from its previous position
         for(unsigned int i = 0;i < SET_SIZE;i ++)
         {
             if(fabs(setCentre[i].xAxle - tempCentre[i].xAxle) > EXP || fabs(setCentre[i].yAxle - tempCentre[i].yAxle) > EXP)
             {
                 flag = false;
                 break;
             }
         }
         os << endl;
         // Stop once no center changes any more
         if(flag)
         {
             break;
         }
     }
 }

 int main()
 {
     ifstream inFile;
     vector<Point> numSave;
     readFile(inFile,"input.txt",numSave);
     ofstream outFile;
     outFile.open("output.txt");
     if(!outFile)
     {
         cout << "Cannot open the output file. Please check the file!" << endl;
         return 1;
     }
     k_average(numSave,outFile);
     return 0;
 }
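The introduction suggests rerunning k-means with different initial centers. A minimal sketch of that idea follows, assuming C++11. The names `Pt`, `sqDist`, `nearest`, `runOnce`, and `bestOfRestarts` are hypothetical and are not part of the program above; the iteration cap and the movement tolerance are assumed values. Each restart picks k random input points as initial centers, and the run with the smallest sum of squared distances is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical stand-in for the Point type used in the program above.
struct Pt { double x; double y; };

// Squared Euclidean distance between two points.
double sqDist(const Pt &a, const Pt &b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Index of the center nearest to p (ties go to the lower index).
std::size_t nearest(const Pt &p, const std::vector<Pt> &centers) {
    std::size_t best = 0;
    for (std::size_t j = 1; j < centers.size(); ++j)
        if (sqDist(p, centers[j]) < sqDist(p, centers[best])) best = j;
    return best;
}

// One k-means run from the given centers (updated in place).
// Returns the final sum of squared distances. maxIter is an assumed cap.
double runOnce(const std::vector<Pt> &pts, std::vector<Pt> &centers, int maxIter = 100) {
    for (int it = 0; it < maxIter; ++it) {
        std::vector<Pt> sum(centers.size(), Pt{0.0, 0.0});
        std::vector<std::size_t> cnt(centers.size(), 0);
        for (const Pt &p : pts) {
            std::size_t j = nearest(p, centers);
            sum[j].x += p.x;
            sum[j].y += p.y;
            ++cnt[j];
        }
        bool moved = false;
        for (std::size_t j = 0; j < centers.size(); ++j) {
            if (cnt[j] == 0) continue;  // an empty cluster keeps its old center
            Pt next{sum[j].x / cnt[j], sum[j].y / cnt[j]};
            if (sqDist(next, centers[j]) > 1e-12) moved = true;  // assumed tolerance
            centers[j] = next;
        }
        if (!moved) break;
    }
    double sse = 0.0;
    for (const Pt &p : pts) sse += sqDist(p, centers[nearest(p, centers)]);
    return sse;
}

// Runs k-means `restarts` times from random initial centers and returns the
// centers of the best run; bestSse receives that run's sum of squared distances.
std::vector<Pt> bestOfRestarts(const std::vector<Pt> &pts, std::size_t k,
                               int restarts, unsigned seed, double &bestSse) {
    assert(k > 0 && k <= pts.size());
    std::mt19937 rng(seed);
    std::vector<Pt> best;
    bestSse = std::numeric_limits<double>::max();
    for (int r = 0; r < restarts; ++r) {
        // Shuffle a copy and take the first k points as initial centers.
        std::vector<Pt> shuffled = pts;
        std::shuffle(shuffled.begin(), shuffled.end(), rng);
        std::vector<Pt> centers(shuffled.begin(),
                                shuffled.begin() + static_cast<std::ptrdiff_t>(k));
        double sse = runOnce(pts, centers);
        if (sse < bestSse) {
            bestSse = sse;
            best = centers;
        }
    }
    return best;
}
```

Calling `bestOfRestarts(points, 3, 10, 1234u, sse)` would try ten random initializations for three clusters and leave the lowest sum of squared distances in `sse`. Fixing the seed keeps the result reproducible between runs.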

