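Tuning XGBoost hyperparameters with a genetic algorithm
Choosing the fitness function for the genetic algorithm:
In machine learning, the fitness can be any performance metric: accuracy, precision, recall, F1 score, and so on. Based on the fitness values, we select the best-performing parents ("survival of the fittest") as the surviving population.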
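As a rough illustration of the selection step, the sketch below ranks a small population by fitness and keeps the best individuals. The function name `select_parents`, the example individuals, and the F1 scores are all made up for illustration; they are not part of the module described later.

```python
def select_parents(population, fitness, num_keep):
    """Return the num_keep individuals with the highest fitness."""
    # Rank individual indices from highest to lowest fitness.
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in ranked[:num_keep]]


# Hypothetical individuals and hypothetical F1 scores, one score per individual.
population = [
    {"learning_rate": 0.10, "max_depth": 3},
    {"learning_rate": 0.30, "max_depth": 6},
    {"learning_rate": 0.05, "max_depth": 4},
    {"learning_rate": 0.80, "max_depth": 2},
]
fitness = [0.91, 0.95, 0.93, 0.88]

survivors = select_parents(population, fitness, num_keep=2)
print(survivors)
```

Sorting the indices rather than the individuals keeps the fitness list and the population aligned and leaves both inputs unmodified.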
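Mating:
The parents in the surviving population produce offspring by mating, which combines two steps: crossover (recombination) and mutation.
Crossover: the genes (parameters) of the mating parents are recombined to produce offspring, with each child inheriting some genes (parameters) from both parents;
Mutation: the values of some genes (parameters) are changed to maintain genetic diversity, which usually helps the genetic algorithm reach better solutions.
Note: we keep the surviving parents so that the best-fitness parameters are retained in case the offspring have worse fitness than their parents.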
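The mating step can be sketched on plain Python lists as follows. This is a simplified illustration under assumptions, not the module's implementation: the helper names `uniform_crossover` and `mutate`, the three-gene individuals, the bounds, and the 10% mutation scale are all hypothetical.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Copy each gene from parent_a or parent_b with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(child, bounds, rng, scale=0.1):
    """Perturb one randomly chosen gene and clip it to its allowed range."""
    mutated = list(child)
    gene = rng.randrange(len(mutated))
    low, high = bounds[gene]
    step = rng.uniform(-scale, scale) * (high - low)
    mutated[gene] = min(high, max(low, mutated[gene] + step))
    return mutated


rng = random.Random(0)
# Hypothetical genes: learning_rate, max_depth, gamma (with made-up bounds).
bounds = [(0.01, 1.0), (1, 10), (0.0, 10.0)]
parents = [[0.1, 3, 1.0], [0.3, 6, 5.0]]

children = [
    mutate(uniform_crossover(parents[0], parents[1], rng), bounds, rng)
    for _ in range(2)
]
# Keep the parents next to the children so the best known solution is never lost.
next_generation = parents + children
print(next_generation)
```

Note that a mutated integer gene such as max_depth becomes a float here, so it would need to be cast back to an integer before being passed to a model.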
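Genetic algorithm module for XGBoost hyperparameter search:
The module provides functions for the following four steps: population initialisation, selection, crossover, and mutation.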
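To show how the four steps fit together before any XGBoost detail is added, here is a minimal sketch of the loop on a toy objective. Everything in it is an assumption made for illustration: the function `run_ga`, the Gaussian mutation, the toy fitness (negative squared distance to a fixed target), and the default sizes.

```python
import random


def run_ga(fitness_fn, bounds, pop_size=8, num_keep=4, generations=20, seed=0):
    """Toy genetic algorithm: initialise, select, cross over, mutate."""
    rng = random.Random(seed)
    # Step 1: initialise the population uniformly inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: selection, keep the num_keep fittest individuals as parents.
        ranked = sorted(population, key=fitness_fn, reverse=True)
        history.append(fitness_fn(ranked[0]))
        parents = ranked[:num_keep]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Step 3: uniform crossover, each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Step 4: mutation, perturb one gene and clip it to its range.
            gene = rng.randrange(len(child))
            lo, hi = bounds[gene]
            child[gene] = min(hi, max(lo, child[gene] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        # The parents survive unchanged alongside their children.
        population = parents + children
    best = max(population, key=fitness_fn)
    return best, history


target = (0.3, 5.0)  # hypothetical optimum of the toy problem


def toy_fitness(individual):
    """Higher is better: negative squared distance to the target."""
    return -sum((x - t) ** 2 for x, t in zip(individual, target))


best, history = run_ga(toy_fitness, bounds=[(0.01, 1.0), (1.0, 10.0)])
print(best, toy_fitness(best))
```

Because the parents are carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.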
import random

import numpy as np
import xgboost
from sklearn.metrics import f1_score


class GeneticXgboost:
    def __init__(self, num_parents=None):
        """
        param num_parents: number of individuals in the population
        """
        self.num_parents = num_parents

    def initilialize_poplulation(self):
        """
        Initialise the population, i.e. generate the genes for the required number of individuals.
        Genes, in column order: learning_rate, n_estimators, max_depth, min_child_weight,
        gamma, subsample, colsample_bytree
        return: array, shape=[self.num_parents, num_gene]
        """
        learningRate = np.empty([self.num_parents, 1])
        # int64 rather than uint8: n_estimators can reach 1485, which does not fit in uint8
        nEstimators = np.empty([self.num_parents, 1], dtype=np.int64)
        maxDepth = np.empty([self.num_parents, 1], dtype=np.int64)
        minChildWeight = np.empty([self.num_parents, 1])
        gammaValue = np.empty([self.num_parents, 1])
        subSample = np.empty([self.num_parents, 1])
        colSampleByTree = np.empty([self.num_parents, 1])
        for i in range(self.num_parents):
            # generate each individual
            learningRate[i] = round(np.random.uniform(0.01, 1), 2)
            nEstimators[i] = random.randrange(10, 1500, 25)
            maxDepth[i] = random.randrange(1, 10)
            minChildWeight[i] = round(random.uniform(0.01, 10.0), 2)
            gammaValue[i] = round(random.uniform(0.01, 10.0), 2)
            subSample[i] = round(random.uniform(0.01, 1.0), 2)
            colSampleByTree[i] = round(random.uniform(0.01, 1.0), 2)
        population = np.concatenate((learningRate, nEstimators, maxDepth, minChildWeight,
                                     gammaValue, subSample, colSampleByTree), axis=1)
        return population

    def fitness_function(self, y_true, y_pred):
        """
        Fitness function: weighted F1 score, rounded to 4 decimal places
        """
        fitness = round(f1_score(y_true, y_pred, average='weighted'), 4)
        return fitness

    def fitness_compute(self, population, dMatrixTrain, dMatrixtest, y_test):
        """
        Compute the fitness values
        param population: the population
        param dMatrixTrain: training data, an xgboost.DMatrix built from (X, y)
        param dMatrixtest: test data, an xgboost.DMatrix built from (X, y)
        param y_test: test labels
        return: list with the fitness value of each individual in the population
        """
        f1_Score = []
        for i in range(population.shape[0]):  # iterate over every individual
            param = {'objective': 'binary:logistic',
                     'learning_rate': population[i][0],
                     'max_depth': int(population[i][2]),
                     'min_child_weight': population[i][3],
                     'gamma': population[i][4],
                     'subsample': population[i][5],
                     'colsample_bytree': population[i][6],
                     'seed': 24}
            # xgboost.train does not read 'n_estimators' from the parameter dict,
            # so the n_estimators gene is passed as the number of boosting rounds
            num_round = int(population[i][1])
            model = xgboost.train(param, dMatrixTrain, num_round)
            preds = model.predict(dMatrixtest)
            preds = preds > 0.5
            f1 = self.fitness_function(y_test, preds)
            f1_Score.append(f1)
        return f1_Score

    def parents_selection(self, population, fitness, num_store):
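        """
        Select which individuals to keep, based on their fitness values
        param population: the population, shape=[self.num_parents, num_gene]
        param fitness: fitness values, array-like
        param num_store: number of individuals to keep
        return: the best individuals in the population, shape=[num_store, num_gene]
        """
        # work on a copy so the caller's fitness values are not overwritten
        fitness = np.array(fitness, dtype=float)
        # storage for the individuals to keep
        selectedParents = np.empty((num_store, population.shape[1]))
        for parentId in range(num_store):
            # index of the highest remaining fitness value
            bestFitnessId = np.argmax(fitness)
            # store the genes of that individual
            selectedParents[parentId, :] = population[bestFitnessId, :]
            # set the extracted fitness to -1 so it is not picked again
            fitness[bestFitnessId] = -1
        return selectedParents

    def crossover_uniform(self, parents, childrenSize):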
        """
        Crossover
        We use uniform crossover: each parameter of a child is taken from one of the
        two parents, with the source parent chosen at random
        param parents: the selected parents, shape=[num_store, num_gene]
        param childrenSize: shape of the children array, (num_children, num_gene)
        return: the children, shape=childrenSize
        """
        numGenes = int(childrenSize[1])
        crossoverPointIndex = np.arange(numGenes)
        # gene positions taken from parent 1: numGenes // 2 random draws (duplicates are merged)
        crossoverPointIndex1 = np.unique(np.random.randint(0, numGenes, numGenes // 2))
        # gene positions taken from parent 2: all remaining positions
        crossoverPointIndex2 = np.setdiff1d(crossoverPointIndex, crossoverPointIndex1)
        children = np.empty(childrenSize)
        # cross each pair of neighbouring parents
        for i in range(childrenSize[0]):
            # index of parent 1
            parent1_index = i % parents.shape[0]
            # index of parent 2
            parent2_index = (i + 1) % parents.shape[0]
            # copy the randomly selected genes from parent 1
            children[i, crossoverPointIndex1] = parents[parent1_index, crossoverPointIndex1]
            # copy the remaining genes from parent 2
            children[i, crossoverPointIndex2] = parents[parent2_index, crossoverPointIndex2]
        return children

    def mutation(self, crossover, num_param):
        '''
        Mutation
        Randomly pick one parameter and change its value by a random amount,
        to introduce diversity among the children
        param crossover: the population to mutate
        param num_param: number of parameters
        return: the mutated population
        '''
        # allowed minimum and maximum for each parameter
        minMaxValue = np.zeros((num_param, 2))
        minMaxValue[0, :] = [0.01, 1.0]   # min/max learning_rate
        minMaxValue[1, :] = [10, 2000]    # min/max n_estimators
        minMaxValue[2, :] = [1, 15]       # min/max max_depth
        minMaxValue[3, :] = [0, 10.0]     # min/max min_child_weight
        minMaxValue[4, :] = [0.01, 10.0]  # min/max gamma
        minMaxValue[5, :] = [0.01, 1.0]   # min/max subsample
        minMaxValue[6, :] = [0.01, 1.0]   # min/max colsample_bytree

        # the mutation changes one randomly chosen gene in every child
        mutationValue = 0
        parameterSelect = np.random.randint(0, 7)
        if parameterSelect == 0:
            # learning_rate
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)
        if parameterSelect == 1:
            # n_estimators
            mutationValue = np.random.randint(-200, 200)
        if parameterSelect == 2:
            # max_depth
            mutationValue = np.random.randint(-5, 5)
        if parameterSelect == 3:
            # min_child_weight (the original range (5, 5) always produced exactly 5)
            mutationValue = round(np.random.uniform(-5, 5), 2)
        if parameterSelect == 4:
            # gamma
            mutationValue = round(np.random.uniform(-2, 2), 2)
        if parameterSelect == 5:
            # subsample
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)
        if parameterSelect == 6:
            # colsample_bytree
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)

        # apply the mutation; values outside the allowed range are set to max or min
        for idx in range(crossover.shape[0]):
            crossover[idx, parameterSelect] = crossover[idx, parameterSelect] + mutationValue
            if crossover[idx, parameterSelect] > minMaxValue[parameterSelect, 1]:
                crossover[idx, parameterSelect] = minMaxValue[parameterSelect, 1]
            if crossover[idx, parameterSelect] < minMaxValue[parameterSelect, 0]:
                crossover[idx, parameterSelect] = minMaxValue[parameterSelect, 0]
        return crossover


###################### Parameter search test ######################
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

xgDMatrixTrain = xgboost.DMatrix(X_train, y_train)
xgbDMatrixTest = xgboost.DMatrix(X_test, y_test)

number_of_parents = 8         # initial population size
number_of_generations = 4     # number of generations, i.e. iterations
number_of_parameters = 7      # number of parameters to optimise
number_of_parents_mating = 4  # number of individuals kept in each generation

gx = GeneticXgboost(num_parents=number_of_parents)

# shape of the population
populationSize = (number_of_parents, number_of_parameters)

# initial population
population = gx.initilialize_poplulation()
# array that stores the fitness history
FitnessHistory = np.empty([number_of_generations + 1, number_of_parents])
# array that stores the parameter values of every individual in every generation
populationHistory = np.empty([(number_of_generations + 1) * number_of_parents,
                              number_of_parameters])
# insert the initial parameter values into the history
populationHistory[0:number_of_parents, :] = population

# training
for generation in range(number_of_generations):
    print("This is generation number %s" % generation)
    # train on the dataset and obtain the fitness values
    FitnessValue = gx.fitness_compute(population=population,
                                      dMatrixTrain=xgDMatrixTrain,
                                      dMatrixtest=xgbDMatrixTest,
                                      y_test=y_test)
    FitnessHistory[generation, :] = FitnessValue
    print('Best F1 score in the iteration = {}'.format(np.max(FitnessHistory[generation, :])))
    # parents that are kept
    parents = gx.parents_selection(population=population,
                                   fitness=FitnessValue,
                                   num_store=number_of_parents_mating)
    # children that are generated
    children = gx.crossover_uniform(parents=parents,
                                    childrenSize=(populationSize[0] - parents.shape[0],
                                                  number_of_parameters))
    # add mutation to create genetic diversity
    children_mutated = gx.mutation(children, number_of_parameters)
    # build the new population from the parents selected by fitness and the generated children
    population[0:parents.shape[0], :] = parents
    population[parents.shape[0]:, :] = children_mutated
    populationHistory[(generation + 1) * number_of_parents:
                      (generation + 2) * number_of_parents, :] = population

# best solution of the final iteration
fitness = gx.fitness_compute(population=population,
                             dMatrixTrain=xgDMatrixTrain,
                             dMatrixtest=xgbDMatrixTest,
                             y_test=y_test)
# store the final fitness values in the last row of the history
FitnessHistory[number_of_generations, :] = fitness
bestFitnessIndex = int(np.argmax(fitness))
print("Best fitness is =", fitness[bestFitnessIndex])

print("Best parameters are:")
print('learning_rate=', population[bestFitnessIndex][0])
print('n_estimators=', int(population[bestFitnessIndex][1]))
print('max_depth=', int(population[bestFitnessIndex][2]))
print('min_child_weight=', population[bestFitnessIndex][3])
print('gamma=', population[bestFitnessIndex][4])
print('subsample=', population[bestFitnessIndex][5])
print('colsample_bytree=', population[bestFitnessIndex][6])
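The script above prints the best genes one at a time by column index. As a possible convenience, the sketch below maps one population row to a named dictionary using the same column order. The helper `decode_individual`, the constants, and the example row are hypothetical and the values are made up; they are not results from the run above.

```python
# Column order used by the population array in this article.
GENE_NAMES = ["learning_rate", "n_estimators", "max_depth", "min_child_weight",
              "gamma", "subsample", "colsample_bytree"]
# Genes that must be integers when handed to a model.
INTEGER_GENES = {"n_estimators", "max_depth"}


def decode_individual(row):
    """Turn one population row into a {name: value} dict, casting integer genes."""
    if len(row) != len(GENE_NAMES):
        raise ValueError("expected %d genes, got %d" % (len(GENE_NAMES), len(row)))
    return {
        name: int(value) if name in INTEGER_GENES else float(value)
        for name, value in zip(GENE_NAMES, row)
    }


# Made-up example row, not an actual search result.
best_row = [0.17, 285.0, 4.0, 2.35, 0.6, 0.81, 0.74]
best_params = decode_individual(best_row)
for name, value in best_params.items():
    print(f"{name} = {value}")
```

Keeping the gene names in one list means the column order is written down in a single place instead of being repeated as numeric indices.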
Reposted from: https://www.toutiao.com/i6602143792273293837/