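Machine Learning in Action 4: AdaBoost Boosting: Horse Colic Example + Imbalanced Classification
AdaBoost is one of two especially practical algorithms in machine learning, the other being the support vector machine (SVM), and interviewers often ask about how it works. This post also covers ways to handle imbalanced classification, another frequent interview topic: in a credit card dataset, for example, defaulters are a small minority, so how do you classify accurately when the ratio is 5:10000?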
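I Introduction
1 Meta-algorithms (ensemble methods): a combination of multiple weak classifiers. A weak classifier has low accuracy, only slightly better than random guessing (50%).
The combination can use different algorithms, the same algorithm with different settings, or different parts of the dataset assigned to different classifiers.
2 Bagging: randomly sample the original dataset with replacement to build S new datasets, each the same size as the original (duplicates are allowed), train one classifier on each of the S datasets, and combine their results by voting.
Representative: random forest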
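The resampling and voting steps of bagging can be sketched as follows. This is a minimal illustration using only the standard library; the function names `bootstrap_samples` and `majority_vote`, the toy data, and the fixed seed are assumptions made for the example, not part of the original post.

```python
import random

def bootstrap_samples(data, num_sets, seed=0):
    """Draw num_sets resampled copies of data, each the same size, with replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(num_sets)]

def majority_vote(predictions):
    """Combine +1/-1 predictions with equal weight; a tie is resolved as +1 here."""
    return 1 if sum(predictions) >= 0 else -1

data = list(range(10))                       # hypothetical toy dataset
samples = bootstrap_samples(data, num_sets=3)
for s in samples:
    print(s)                                 # same length as data, duplicates possible
print(majority_vote([1, 1, -1]))             # two of three classifiers vote +1
```

Each resampled set would be used to train its own classifier; the equal-weight vote is what distinguishes bagging from the weighted vote used by boosting.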
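3 Boosting: each new classifier is obtained by focusing on the data misclassified by the existing classifiers.
Representative: AdaBoost
Bagging and boosting are similar: both build several classifiers of the same type from a single dataset (which is especially useful when data are limited). The difference is that bagging trains its classifiers independently on resampled datasets and gives them equal weight in the vote, whereas boosting trains them serially, each new classifier building on the previous ones by giving more weight to the samples they misclassified, and the classifiers' votes are weighted unequally.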
II AdaBoost (adaptive boosting)
How it works: assign every sample an equal initial weight ($D_i = 1/n$) and train the first weak classifier on this dataset. Compute its weighted error rate $\varepsilon$, which determines the weight $\alpha$ of this classifier in the final vote: $\alpha = \frac{1}{2}\ln\left(\frac{1-\varepsilon}{\varepsilon}\right)$. The weight of each misclassified sample is then increased, $D_i^{(t+1)} = \frac{D_i^{(t)} e^{\alpha}}{\mathrm{Sum}(D)}$, and the weight of each correctly classified sample is decreased, $D_i^{(t+1)} = \frac{D_i^{(t)} e^{-\alpha}}{\mathrm{Sum}(D)}$. The second weak classifier is trained on the reweighted dataset, and the iteration continues until the error rate reaches 0 or the specified number of weak classifiers has been built.
[Figure: schematic of AdaBoost with three weak classifiers, their alpha values, and the weighted vote]
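As the figure shows, every sample has equal weight for the first classifier, whose error rate gives alpha = 0.69. The sample weights are then adjusted so that misclassified samples weigh more, and the second classifier gets alpha = 0.97; in the same way the third classifier gets alpha = 0.90. In the final vote the combined result is 0.69*D1 + 0.97*D2 + 0.90*D3, where D1, D2 and D3 here stand for the outputs of the three classifiers (not the sample weight vector D), and the predicted class is the sign of this sum.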
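The weighted vote can be checked with a few lines of arithmetic. In this sketch the alpha values are the ones quoted above, while the individual classifier outputs (+1 or -1) are made-up inputs and `weighted_vote` is a hypothetical helper name.

```python
def weighted_vote(alphas, predictions):
    """Return (class, score): the class is the sign of the alpha-weighted sum."""
    score = sum(a * h for a, h in zip(alphas, predictions))
    # a score of exactly 0 is mapped to -1 in this sketch
    return (1 if score > 0 else -1), score

alphas = [0.69, 0.97, 0.90]

# classifiers 1 and 3 say +1, classifier 2 says -1
label_a, score_a = weighted_vote(alphas, [1, -1, 1])
# only classifier 1 says +1
label_b, score_b = weighted_vote(alphas, [1, -1, -1])

print(label_a, score_a)   # +1, since 0.69 - 0.97 + 0.90 is positive
print(label_b, score_b)   # -1, since 0.69 - 0.97 - 0.90 is negative
```

Note that the classifier with the largest alpha does not decide alone: two classifiers with smaller weights can outvote it when their weights sum to more.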
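(1) Weak classifier: this post uses a single-level decision tree, also called a decision stump, which is the simplest kind of decision tree.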
from numpy import *  # NumPy >= 2.0 removed the mat alias; use asmatrix there

def stumpClassify(dataMatrix,dimen,threshVal,threshIneq):#just classify the data
    retArray = ones((shape(dataMatrix)[0],1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:,dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:,dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr,classLabels,D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m,n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m,1)))
    minError = inf #init error sum, to +infinity
    for i in range(n):#loop over all dimensions
        rangeMin = dataMatrix[:,i].min(); rangeMax = dataMatrix[:,i].max()
        stepSize = (rangeMax-rangeMin)/numSteps
        for j in range(-1,int(numSteps)+1):#loop over all range in current dimension
            for inequal in ['lt', 'gt']: #go over less than and greater than
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix,i,threshVal,inequal)#call stump classify with i, j, lessThan
                errArr = mat(ones((m,1)))
                errArr[predictedVals == labelMat] = 0 #1 where wrong, 0 where right
                weightedError = D.T*errArr #calc total error multiplied by D
                # print("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump,minError,bestClasEst
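How it works: loop over every feature, step through candidate thresholds at a fixed step size, and try both the "less than" and "greater than" directions, looking for the line perpendicular to a coordinate axis that separates the samples with the lowest weighted error;
For example, for an instance ins = (a, b, c), the weak classifier found is a line perpendicular to one axis, such as a = 1, b = 2, or c = 3;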
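To see what one such axis-perpendicular split does, here is a small self-contained sketch that applies a single stump to five 2-D points. The name `stump_predict`, the points, and the threshold 1.5 are hypothetical and only illustrate the idea; it uses plain NumPy arrays rather than the matrix type used in the article's code.

```python
import numpy as np

def stump_predict(X, dim, thresh, ineq):
    """Label samples -1 on one side of the threshold and +1 on the other."""
    pred = np.ones(X.shape[0])
    if ineq == 'lt':
        pred[X[:, dim] <= thresh] = -1.0
    else:
        pred[X[:, dim] > thresh] = -1.0
    return pred

# Hypothetical 2-D points, not taken from the horse colic data
X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

# The stump "feature 0 <= 1.5 means class -1" is the vertical line x0 = 1.5
pred = stump_predict(X, dim=0, thresh=1.5, ineq='lt')
errors = int((pred != y).sum())
print(pred, errors)
```

This particular line misclassifies one of the five points. No single axis-perpendicular line separates this toy set perfectly, which is exactly the situation where combining several stumps helps.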
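(2) Code for training the AdaBoost classifier;
As introduced above, training produces the weak classifier parameters dim, thresh, ineq, and alpha. The first three (dim, thresh, ineq) are the decision stump's own parameters; the last one, alpha, is the weight used when combining the results of the weak classifiers;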
def adaBoostTrainDS(dataArr,classLabels,numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m,1))/m) #init D to all equal
    aggClassEst = mat(zeros((m,1)))
    for i in range(numIt):
        bestStump,error,classEst = buildStump(dataArr,classLabels,D)#build Stump
        print('error',error)
        #print("D:",D.T)
        alpha = float(0.5*log((1.0-error)/max(error,1e-16)))#calc alpha, throw in max(error,eps) to account for error=0
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump) #store Stump Params in Array
        print("classEst: ",classEst.T)
        expon = multiply(-1*alpha*mat(classLabels).T,classEst) #exponent for D calc, getting messy
        D = multiply(D,exp(expon)) #Calc New D for next iteration
        D = D/D.sum()
        print('D',D)
        #calc training error of all classifiers, if this is 0 quit for loop early (use break)
        aggClassEst += alpha*classEst
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T,ones((m,1)))
        errorRate = aggErrors.sum()/m
        print("total error: ",errorRate)
        if errorRate == 0.0: break
    return weakClassArr,aggClassEst
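The re-weighting step inside the loop is easier to follow on a tiny case. The sketch below runs a single round by hand with plain NumPy arrays; `update_weights`, the labels, and the predictions are hypothetical and chosen so that exactly one of five samples is misclassified.

```python
import numpy as np

def update_weights(D, alpha, y, pred):
    """One AdaBoost re-weighting step: D_i * exp(-alpha * y_i * h_i), then normalize."""
    D_new = D * np.exp(-alpha * y * pred)
    return D_new / D_new.sum()

y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])      # hypothetical labels
pred = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])  # weak classifier output, sample 0 is wrong
D = np.full(5, 0.2)                            # equal initial weights, 1/n

error = D[pred != y].sum()                     # weighted error of this round
alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))
D_next = update_weights(D, alpha, y, pred)
print(alpha, D_next)
```

After the update, the misclassified sample holds half of the total weight and the four correct samples share the other half, so the next weak classifier is pushed to get that sample right.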
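(3) Code for testing AdaBoost:
Using the parameters obtained from training, each weak classifier is configured and applied to the test samples, and the individual results are combined using the alpha weights;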
def adaClassify(datToClass,classifierArr):
    dataMatrix = mat(datToClass)#do stuff similar to last aggClassEst in adaBoostTrainDS
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m,1)))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix,classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])#call stump classify
        aggClassEst += classifierArr[i]['alpha']*classEst
        print(aggClassEst)
    return sign(aggClassEst) #final label is the sign of the weighted vote
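The weighted vote at the end of adaClassify can be checked by hand. This sketch reuses the three alpha values quoted earlier (0.69, 0.97, 0.90); the vote matrix is hypothetical and only shows how disagreeing classifiers are resolved.

```python
import numpy as np

alphas = np.array([0.69, 0.97, 0.90])   # the three alpha values quoted earlier
# Hypothetical votes: rows are weak classifiers, columns are samples
votes = np.array([[ 1.0,  1.0, -1.0],
                  [-1.0,  1.0, -1.0],
                  [-1.0, -1.0,  1.0]])

agg = alphas @ votes          # weighted sum of the votes for each sample
labels = np.sign(agg)         # final class is the sign of that sum
print(agg, labels)
```

For the first sample, two classifiers vote -1 and outweigh the single +1 vote, so the ensemble predicts -1. The magnitude of each aggregated score also indicates how confident the ensemble is, which is what the ROC code later in this article relies on.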
III Horse colic dataset example
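#loadDataSet is not shown in this article; it is assumed to return (features, labels) with labels in {+1,-1}
datArr,labelArr = loadDataSet('HorseTraining2.txt')
classifierArr,aggClassEst = adaBoostTrainDS(datArr,labelArr,50) #50 weak classifiers, the count discussed below
#NOTE: as written this reloads the training file; point it at a held-out test file to measure test error
testArr,testLabelArr = loadDataSet('HorseTraining2.txt')
prediction = adaClassify(testArr,classifierArr)
error = mat(ones((len(testLabelArr),1)))
print(error[prediction != mat(testLabelArr).T].sum()/len(testLabelArr)) #fraction of misclassified samples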
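This example simply calls the AdaBoost functions above. Note that the horse colic dataset is the same one used in the previous article on logistic regression, where the error rate was 0.3 because the dataset has many missing values and is hard to predict; AdaBoost with 50 weak classifiers reaches an error rate of only 0.21;
Note: the number of weak classifiers matters. Too few tends to underfit and too many tends to overfit, so a moderate number works best. In the classic plot with the number of weak classifiers on the horizontal axis, the training error keeps falling while the test error traces a check-mark shape (falling first, then rising); the count at the turning point is the best choice, neither overfitting nor underfitting.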
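One way to locate that turning point is to compute the ensemble's error after each added weak classifier and take the count with the lowest held-out error. The sketch below assumes the per-round predictions on a test set are already available; `staged_error` and all the numbers are hypothetical, built so that the third round makes things worse.

```python
import numpy as np

def staged_error(alphas, stage_preds, y):
    """Error rate of the ensemble after 1, 2, ..., T weak classifiers."""
    agg = np.cumsum(alphas[:, None] * stage_preds, axis=0)  # running weighted vote
    return (np.sign(agg) != y).mean(axis=1)

y = np.array([1.0, 1.0, -1.0, -1.0])                 # hypothetical test labels
alphas = np.array([0.7, 0.8, 0.5])                   # hypothetical classifier weights
stage_preds = np.array([[ 1.0, -1.0, -1.0, -1.0],    # round 1 predictions
                        [ 1.0,  1.0, -1.0, -1.0],    # round 2 predictions
                        [-1.0, -1.0,  1.0,  1.0]])   # round 3 predictions

errors = staged_error(alphas, stage_preds, y)
best = int(np.argmin(errors)) + 1                    # number of weak classifiers to keep
print(errors, best)
```

Here the test error goes down after the second round and back up after the third, so two weak classifiers would be kept. With real data the same curve would come from applying each trained stump to a held-out set.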
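IV Imbalanced classification
An imbalanced problem is one where the ratio of positive to negative examples is heavily skewed, for example predicting whether a credit card account defaults, with 5 positives and 5,000 negatives;
1 Solutions
1) Preprocessing level: oversampling, undersampling, and hybrid sampling;
Sampling can be done at random or according to a specified rule:
(1) Oversampling: duplicate positive samples to increase their count, or add new samples that are similar to the positives;
(2) Undersampling: delete negative samples that lie far from the decision boundary; in the example above, reaching a 1:1 balance means deleting 4,995 negatives;
(3) Hybrid: combine oversampling and undersampling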
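The two sampling strategies can be sketched with NumPy as follows. This is the purely random variant, not the boundary-aware deletion described above; the function names, the seed, and the synthetic data with a 5 : 5000 ratio are assumptions for illustration.

```python
import numpy as np

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority samples until both classes have equal size."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    keep = np.concatenate([maj_idx, min_idx, extra])
    return X[keep], y[keep]

def random_undersample(X, y, minority=1, seed=0):
    """Drop randomly chosen majority samples until both classes have equal size."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    kept_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([kept_maj, min_idx])
    return X[keep], y[keep]

# Synthetic data with 5 positives and 5000 negatives
y = np.array([1] * 5 + [-1] * 5000)
X = np.arange(len(y), dtype=float).reshape(-1, 1)

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print((y_over == 1).sum(), (y_over == -1).sum())
print((y_under == 1).sum(), (y_under == -1).sum())
```

Oversampling keeps all the data but repeats the same few positives many times, which can encourage overfitting to them; undersampling throws away most of the negatives. A hybrid approach trades off between the two.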
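2) Algorithm level: cost-sensitive learning;
An example shows what a cost-sensitive classifier is:
Cost matrix for a binary classifier:
Actual \ Predicted | +1 | -1 |
+1 | -5 | 1 |
-1 | 50 | 0 |
Using the cost matrix, compute the total cost of each candidate prediction and choose the class with the lowest cost as the final prediction.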
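As a rough sketch of how such a matrix turns into a decision, the code below computes the expected cost of predicting +1 and of predicting -1 from an estimated probability that the sample is truly positive. The function `min_cost_label` and the probability input are assumptions for illustration; the article does not say how the probability would be obtained.

```python
import numpy as np

# Rows: actual class (+1, -1); columns: predicted class (+1, -1)
cost = np.array([[-5.0, 1.0],
                 [50.0, 0.0]])

def min_cost_label(p_pos):
    """Pick the prediction with the lowest expected cost given P(actual = +1)."""
    p = np.array([p_pos, 1.0 - p_pos])   # probabilities of actual +1 and actual -1
    expected = p @ cost                  # expected cost of predicting +1, then -1
    label = 1 if expected[0] < expected[1] else -1
    return label, expected

for p_pos in (0.5, 0.95):
    print(p_pos, min_cost_label(p_pos))
```

Because a false positive costs 50 in this matrix, a sample with a 50% chance of being positive is still labelled -1; the prediction only flips to +1 once the positive class is very likely.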
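2 AUC calculation code:
Accuracy alone cannot serve as the evaluation metric for an imbalanced problem. Take 100 samples with 90 positives and 10 negatives: crudely labelling all 100 as positive already reaches 90% accuracy, which is clearly not the result we want. Recall helps here because it measures how many samples of a class were classified correctly: it is 100% for the positive class but 0% for the negative class.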
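The 90/10 example can be verified in a few lines. The variable names are made up for illustration; the data simply reproduces the counts from the paragraph above.

```python
import numpy as np

y_true = np.array([1] * 90 + [-1] * 10)   # 90 positives, 10 negatives
y_pred = np.ones_like(y_true)             # naive rule: predict +1 for everything

accuracy = (y_pred == y_true).mean()
recall_pos = (y_pred[y_true == 1] == 1).mean()     # share of positives found
recall_neg = (y_pred[y_true == -1] == -1).mean()   # share of negatives found
print(accuracy, recall_pos, recall_neg)
```

The accuracy looks respectable even though not a single negative sample is recognized, which is why per-class recall or a ranking-based metric is the more honest measure here.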
AUC is the most suitable metric here (it can be computed from the ranking of positive-negative pairs):
def plotROC(predStrengths, classLabels):
    import matplotlib.pyplot as plt
    from numpy import array
    cur = (1.0,1.0) #cursor, starts at the top-right corner
    ySum = 0.0 #variable to calculate AUC
    numPosClas = sum(array(classLabels)==1.0)
    yStep = 1/float(numPosClas); xStep = 1/float(len(classLabels)-numPosClas)
    sortedIndicies = predStrengths.argsort()#get sorted index, it's reverse
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    #loop through all the values, drawing a line segment at each point
    for index in sortedIndicies.tolist()[0]:
        if classLabels[index] == 1.0:
            delX = 0; delY = yStep
        else:
            delX = xStep; delY = 0
            ySum += cur[1] #add the height of each rectangle under the curve
        #draw line from cur to (cur[0]-delX,cur[1]-delY)
        ax.plot([cur[0],cur[0]-delX],[cur[1],cur[1]-delY], c='b')
        cur = (cur[0]-delX,cur[1]-delY)
    ax.plot([0,1],[0,1],'b--')
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0,1,0,1])
    plt.show()
    print("the Area Under the Curve is: ",ySum*xStep)
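The pairwise-ranking view of AUC mentioned above can also be computed directly, without drawing the curve: AUC is the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The sketch below is one straightforward way to do this; `pairwise_auc` and the scores are hypothetical, and ties are counted as half a correct pair, which is a common convention rather than something stated in this article.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels != 1]
    diff = pos[:, None] - neg[None, :]   # every positive score minus every negative score
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Hypothetical prediction strengths and labels
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 1, -1, -1]
print(pairwise_auc(scores, labels))
```

In this toy case five of the six positive-negative pairs are ordered correctly. The comparison table grows with the product of the two class sizes, so for large datasets a sort-based computation like the one in plotROC is the cheaper route.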
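V Summary
Pros: relatively high accuracy, and almost no parameters to tune (mainly the number of weak classifiers);
Cons: sensitive to outliers;
Data types: numeric and nominal values;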