This is my own detailed record of tuning a LightGBM (LGB) model for a competition. It consists of the following six steps:

  1. With a high learning rate, determine the number of estimators: n_estimators/num_iterations/num_round/num_boost_round
  2. Tune num_leaves and max_depth
  3. Tune min_data_in_leaf
  4. Tune bagging_fraction + bagging_freq and feature_fraction
  5. Tune the L1/L2 regularization parameters reg_alpha and reg_lambda
  6. Lower the learning rate

[One thing has to be said up front: LightGBM has an awful lot of parameter aliases, with many different names meaning exactly the same thing. In this article, aliases are separated by "/".]

0. Parallel optimization

There are two main modes, feature parallel and data parallel. I don't know the exact mechanics myself, since I don't have multiple CPUs to play with (broke).

  • feature parallel: every worker holds the full training data but trains on only a subset of the features; the workers then exchange their local best features and split points and pick the global best one.
  • data parallel: every worker holds a subset of the samples and builds local feature histograms; after exchanging and merging them, training proceeds on the global histograms.

[Even without knowing the exact mechanics, the most important takeaway is: use feature parallel for small data and data parallel for large data. A small parameter sketch follows.]
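For reference, here is a minimal sketch (my own illustration, not from the original post) of the native-API parameter that selects the parallel mode; a real distributed run would also need num_machines, a machine list, ports and so on, which are omitted here:

    # assumed: a regression task; these dicts only show the tree_learner switch
    params_feature_parallel = {
        'objective': 'regression',
        'tree_learner': 'feature',   # feature parallel: features are split across workers
    }
    params_data_parallel = {
        'objective': 'regression',
        'tree_learner': 'data',      # data parallel: rows are split across workers
    }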

1. Number of estimators

Whatever else we do, let's first fix the learning rate at a fairly high value, here learning_rate = 0.1, and then settle the type of booster (boosting/boost/boosting_type); the default choice is gbdt, which is what we use.

This also shows that although LGB and XGB are often compared against GBDT, at their core they are still built on GBDT's boosting idea.

To determine the number of estimators, which is the number of boosting iterations, or equivalently the number of residual trees, the parameter is n_estimators/num_iterations/num_round/num_boost_round. We can set it to a large value first and then read the optimal iteration count off the cv results, as the code below shows.

Before that, we have to give the other important parameters some initial values. The initial values themselves don't matter much; they are only there to make tuning the remaining parameters easier. Here they are:

The following are determined by the requirements of the specific project:

  1. 'boosting_type'/'boosting': 'gbdt'
  2. 'objective': 'regression'
  3. 'metric': 'rmse'

The following are the initial values I chose; pick your own according to your situation:

  1. 'max_depth': 6 ### depends on the problem; my dataset is not very large, so I picked a moderate value, and anything in 4-10 would be fine.
  2. 'num_leaves': 50 ### LightGBM grows trees leaf-wise; the official advice is to keep this below 2^max_depth.
  3. 'subsample'/'bagging_fraction': 0.8 ### row (sample) subsampling
  4. 'colsample_bytree'/'feature_fraction': 0.8 ### feature subsampling

Below I use LightGBM's cv function to demonstrate:

    import lightgbm as lgb

    params = {
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'learning_rate': 0.1,
        'num_leaves': 50,
        'max_depth': 6,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
    }

    data_train = lgb.Dataset(df_train, y_train, silent=True)
    cv_results = lgb.cv(
        params, data_train, num_boost_round=1000, nfold=5, stratified=False, shuffle=True, metrics='rmse',
        early_stopping_rounds=50, verbose_eval=50, show_stdv=True, seed=0)

    print('best n_estimators:', len(cv_results['rmse-mean']))
    print('best cv score:', cv_results['rmse-mean'][-1])

The output is:

    [50] cv_agg's rmse: 1.38497 + 0.0202823
    best n_estimators: 43
    best cv score: 1.3838664241

So there is our result: with a learning rate of 0.1, the model does best with 43 estimators. We have now settled one parameter: n_estimators/num_iterations/num_round/num_boost_round = 43.

[If your hardware can afford it, a smaller learning rate is still better.]

2. Improving the fit

These are the most important parameters for improving accuracy.

  • max_depth: the maximum tree depth; the deeper the tree, the more likely it is to overfit.
  • num_leaves: because LightGBM grows trees leaf-wise, tree complexity is controlled with num_leaves rather than max_depth. The rough correspondence is num_leaves = 2^(max_depth), but the value you actually set should be smaller than 2^(max_depth), otherwise it may cause overfitting.

[The relationship between num_leaves and max_depth given here is not strict; being roughly in that neighborhood is enough. A tiny worked example follows.]
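A tiny worked example of that rule of thumb (my own illustration):

    max_depth = 6
    num_leaves_cap = 2 ** max_depth   # 64: a depth-6 tree has at most 2^6 leaves,
                                      # so num_leaves is usually set somewhat below this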

Next we tune these two parameters together, using GridSearchCV() from sklearn to run a grid search. Bayesian search is also an option; I covered Bayesian optimization on my personal blog before and will port it to the public account when I find the time.

That said, this search is very time-consuming and quite draining. For large datasets I would suggest Bayesian search, or just a rough manual adjustment, since the room for improvement from this kind of tuning is usually limited. (A cheaper alternative is sketched right below, before the full grid search.)
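Since the full grid is expensive, here is a hedged sketch of a cheaper alternative using sklearn's RandomizedSearchCV (my own substitution, not the Bayesian search the post refers to). It samples a fixed number of combinations from similar ranges; df_train and y_train are the training data used throughout:

    from sklearn.model_selection import RandomizedSearchCV
    import lightgbm as lgb

    model_lgb = lgb.LGBMRegressor(objective='regression', learning_rate=0.1, n_estimators=43)
    param_dist = {
        'max_depth': list(range(3, 9)),
        'num_leaves': list(range(20, 150, 10)),
    }
    rsearch = RandomizedSearchCV(estimator=model_lgb, param_distributions=param_dist,
                                 n_iter=15, scoring='neg_mean_squared_error',
                                 cv=5, random_state=0, n_jobs=4, verbose=1)
    rsearch.fit(df_train, y_train)
    print(rsearch.best_params_, rsearch.best_score_)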

    from sklearn.model_selection import GridSearchCV

    ### build the sklearn wrapper of LightGBM with the (learning rate, number of estimators) chosen above
    model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=50,
                                  learning_rate=0.1, n_estimators=43, max_depth=6,
                                  metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)

    params_test1 = {
        'max_depth': range(3, 8, 2),
        'num_leaves': range(50, 170, 30)
    }

    gsearch1 = GridSearchCV(estimator=model_lgb, param_grid=params_test1,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch1.fit(df_train, y_train)
    # note: grid_scores_ was removed in newer scikit-learn; use cv_results_ there
    gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_

Let's look at the results:

    Fitting 5 folds for each of 12 candidates, totalling 60 fits
    [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.0min
    [Parallel(n_jobs=4)]: Done 60 out of 60 | elapsed: 3.1min finished
    ([mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 50},
      mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 80},
      mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 110},
      mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 140},
      mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 50},
      mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 80},
      mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 110},
      mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 140},
      mean: -1.89254, std: 0.10904, params: {'max_depth': 7, 'num_leaves': 50},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 110},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 140}],
     {'max_depth': 7, 'num_leaves': 80},
     -1.8602436718814157)

Twelve parameter combinations were run here. The best one is max_depth = 7 with num_leaves = 80, at a score of -1.860.

A note on this: the scoring options in sklearn's model evaluation all follow the convention that higher return values are better than lower return values.

However, my evaluation metric is RMSE, where lower is better, so sklearn provides neg_mean_squared_error, which returns the negated MSE; in those terms, the less negative the value, the better.

So the best score of -1.860 corresponds to an RMSE of np.sqrt(-(-1.8602)) ≈ 1.3639 (step 1 reported RMSE, the square root of MSE, hence taking the root here), which is noticeably better than the step-1 score. A minimal sketch of the conversion follows.
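A minimal sketch of that conversion (using the gsearch1 object fitted above):

    import numpy as np
    print(np.sqrt(-gsearch1.best_score_))   # sqrt(1.8602...) ≈ 1.3639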

With that, we carry this step's best values into step 3. Note that this was only a coarse search; to do better you could try several more values of max_depth around 7 and num_leaves around 80. Don't be put off by the hassle, tedious as it genuinely is.

    params_test2 = {
        'max_depth': [6, 7, 8],
        'num_leaves': [68, 74, 80, 86, 92]
    }
    gsearch2 = GridSearchCV(estimator=model_lgb, param_grid=params_test2,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch2.fit(df_train, y_train)
    gsearch2.grid_scores_, gsearch2.best_params_, gsearch2.best_score_

    Fitting 5 folds for each of 15 candidates, totalling 75 fits
    [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.8min
    [Parallel(n_jobs=4)]: Done 75 out of 75 | elapsed: 5.1min finished
    ([mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 68},
      mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 74},
      mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 80},
      mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 86},
      mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 92},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 68},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 74},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 86},
      mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 92},
      mean: -1.88197, std: 0.11295, params: {'max_depth': 8, 'num_leaves': 68},
      mean: -1.89117, std: 0.12686, params: {'max_depth': 8, 'num_leaves': 74},
      mean: -1.86390, std: 0.12259, params: {'max_depth': 8, 'num_leaves': 80},
      mean: -1.86733, std: 0.12159, params: {'max_depth': 8, 'num_leaves': 86},
      mean: -1.86665, std: 0.12174, params: {'max_depth': 8, 'num_leaves': 92}],
     {'max_depth': 7, 'num_leaves': 68},
     -1.8602436718814157)

So max_depth = 7 holds up; but looking at the details, with max_depth = 7 the number of leaves has essentially no effect on the score.

3. Reducing overfitting

Having come this far, it's time to reduce overfitting.

  • min_data_in_leaf is a very important parameter, also called min_child_samples. Its value depends on the number of training samples and on num_leaves. Setting it larger helps avoid growing an overly deep tree, but it may lead to underfitting.

  • min_sum_hessian_in_leaf: also called min_child_weight, the minimum sum of hessians in one leaf to allow a split; quite a mouthful. Higher values potentially decrease overfitting.

As for the second parameter, I honestly don't understand it very well, since I'm not that familiar with hessian values and the Hessian matrix. I'll find time to study it properly later; that's how I learn, patching gaps as I find them. (A short note on the squared-error case follows.) Follow the public account so you don't miss any of these posts.
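A hedged aside of my own (not from the original post): for the squared-error regression objective the per-sample hessian is constant, so the sum of hessians in a leaf is just the number of samples in it, and min_sum_hessian_in_leaf then behaves roughly like a minimum leaf size:

    l(y_i, \hat{y}_i) = \tfrac{1}{2}(y_i - \hat{y}_i)^2, \qquad
    h_i = \frac{\partial^2 l}{\partial \hat{y}_i^2} = 1, \qquad
    \sum_{i \in \mathrm{leaf}} h_i = \#\{\text{samples in the leaf}\}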

We proceed with the same method as above:

    params_test3 = {
        'min_child_samples': [18, 19, 20, 21, 22],
        'min_child_weight': [0.001, 0.002]
    }
    model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                                  learning_rate=0.1, n_estimators=43, max_depth=7,
                                  metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)
    gsearch3 = GridSearchCV(estimator=model_lgb, param_grid=params_test3,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch3.fit(df_train, y_train)
    gsearch3.grid_scores_, gsearch3.best_params_, gsearch3.best_score_

The results:

    Fitting 5 folds for each of 10 candidates, totalling 50 fits
    [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.9min
    [Parallel(n_jobs=4)]: Done 50 out of 50 | elapsed: 3.3min finished
    ([mean: -1.88057, std: 0.13948, params: {'min_child_samples': 18, 'min_child_weight': 0.001},
      mean: -1.88057, std: 0.13948, params: {'min_child_samples': 18, 'min_child_weight': 0.002},
      mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.001},
      mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.002},
      mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.001},
      mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.002},
      mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.001},
      mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.002},
      mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.001},
      mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.002}],
     {'min_child_samples': 20, 'min_child_weight': 0.001},
     -1.8602436718814157)

This is the result of a coarse pass followed by a finer one. As you can see, the best value of min_data_in_leaf is 20, while min_sum_hessian_in_leaf has almost no effect on the final score. And after this round of tuning, the best score is still -1.86024, so no improvement.

4. Subsampling to reduce overfitting

Both of these parameters exist to reduce overfitting.

feature_fraction subsamples the features. It helps prevent overfitting and also speeds up training.

bagging_fraction and bagging_freq have to be set together. bagging_fraction is the row-subsampling ratio (the equivalent of subsample); it makes bagging run faster and also reduces overfitting. bagging_freq defaults to 0 and controls how often bagging happens: 0 means bagging is disabled, and k means bagging is performed every k iterations. (A small sketch of how these look in the native params dict follows.)
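For clarity, a small sketch of my own (assuming a regression task with an rmse metric) of how these would appear in the native-API params dict; note that bagging_fraction only takes effect when bagging_freq > 0:

    params_with_sampling = {
        'objective': 'regression',
        'metric': 'rmse',
        'feature_fraction': 0.8,   # use 80% of the features for each tree
        'bagging_fraction': 0.8,   # use 80% of the rows ...
        'bagging_freq': 5,         # ... re-sampled every 5 iterations (0 would disable bagging)
    }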

Different parameters, same method:

    params_test4 = {
        'feature_fraction': [0.5, 0.6, 0.7, 0.8, 0.9],
        'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 1.0]
    }
    model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                                  learning_rate=0.1, n_estimators=43, max_depth=7,
                                  metric='rmse', bagging_freq=5, min_child_samples=20)
    gsearch4 = GridSearchCV(estimator=model_lgb, param_grid=params_test4,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch4.fit(df_train, y_train)
    gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_

    Fitting 5 folds for each of 25 candidates, totalling 125 fits
    [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.6min
    [Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed: 7.1min finished
    ([mean: -1.90447, std: 0.15841, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.5},
      mean: -1.90846, std: 0.13925, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.6},
      mean: -1.91695, std: 0.14121, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.7},
      mean: -1.90115, std: 0.12625, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.8},
      mean: -1.92586, std: 0.15220, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.9},
      mean: -1.88031, std: 0.17157, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.5},
      mean: -1.89513, std: 0.13718, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.6},
      mean: -1.88845, std: 0.13864, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.7},
      mean: -1.89297, std: 0.12374, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.8},
      mean: -1.89432, std: 0.14353, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.9},
      mean: -1.88088, std: 0.14247, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.5},
      mean: -1.90080, std: 0.13174, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.6},
      mean: -1.88364, std: 0.14732, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.7},
      mean: -1.88987, std: 0.13344, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.8},
      mean: -1.87752, std: 0.14802, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.9},
      mean: -1.88348, std: 0.13925, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.5},
      mean: -1.87472, std: 0.13301, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.6},
      mean: -1.88656, std: 0.12241, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.7},
      mean: -1.89029, std: 0.10776, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.8},
      mean: -1.88719, std: 0.11915, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.9},
      mean: -1.86170, std: 0.12544, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.5},
      mean: -1.87334, std: 0.13099, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.6},
      mean: -1.85412, std: 0.12698, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
      mean: -1.86024, std: 0.11364, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.8},
      mean: -1.87266, std: 0.12271, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.9}],
     {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
     -1.8541224387666373)

From this we can see that the ideal values of bagging_fraction and feature_fraction are 1.0 and 0.7 respectively. A big part of the reason is that my sample count is small (4000+) while the feature count is large (1000+). So next we take a smaller step size and search feature_fraction more finely.

A bit of fine-tuning below:

    params_test5 = {
        'feature_fraction': [0.62, 0.65, 0.68, 0.7, 0.72, 0.75, 0.78]
    }
    model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                                  learning_rate=0.1, n_estimators=43, max_depth=7,
                                  metric='rmse', min_child_samples=20)
    gsearch5 = GridSearchCV(estimator=model_lgb, param_grid=params_test5,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch5.fit(df_train, y_train)
    gsearch5.grid_scores_, gsearch5.best_params_, gsearch5.best_score_

    Fitting 5 folds for each of 7 candidates, totalling 35 fits
    [Parallel(n_jobs=4)]: Done 35 out of 35 | elapsed: 2.3min finished
    ([mean: -1.86696, std: 0.12658, params: {'feature_fraction': 0.62},
      mean: -1.88337, std: 0.13215, params: {'feature_fraction': 0.65},
      mean: -1.87282, std: 0.13193, params: {'feature_fraction': 0.68},
      mean: -1.85412, std: 0.12698, params: {'feature_fraction': 0.7},
      mean: -1.88235, std: 0.12682, params: {'feature_fraction': 0.72},
      mean: -1.86329, std: 0.12757, params: {'feature_fraction': 0.75},
      mean: -1.87943, std: 0.12107, params: {'feature_fraction': 0.78}],
     {'feature_fraction': 0.7},
     -1.8541224387666373)

Alright then, feature_fraction it is: 0.7.

5. L1 and L2

The regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda) are, unsurprisingly, there to reduce overfitting; they correspond to L1 and L2 regularization respectively. Let's try these two as well. (A sketch of the penalized objective follows, then the search.)
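As a hedged sketch of what these penalties do (the usual XGBoost-style formulation, which as far as I know LightGBM applies to its leaf output values w_j; treat the exact form as my assumption), the objective gains extra terms:

    \mathrm{Obj} = \sum_i l(y_i, \hat{y}_i) + \lambda_{l1} \sum_j |w_j| + \tfrac{1}{2}\,\lambda_{l2} \sum_j w_j^2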

    params_test6 = {
        'reg_alpha': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5],
        'reg_lambda': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5]
    }
    model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                                  learning_rate=0.1, n_estimators=43, max_depth=7,
                                  metric='rmse', min_child_samples=20, feature_fraction=0.7)
    gsearch6 = GridSearchCV(estimator=model_lgb, param_grid=params_test6,
                            scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
    gsearch6.fit(df_train, y_train)
    gsearch6.grid_scores_, gsearch6.best_params_, gsearch6.best_score_
    Fitting 5 folds for each of 49 candidates, totalling 245 fits
    [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.8min
    [Parallel(n_jobs=4)]: Done 192 tasks | elapsed: 10.6min
    [Parallel(n_jobs=4)]: Done 245 out of 245 | elapsed: 13.3min finished
    ([mean: -1.85412, std: 0.12698, params: {'reg_alpha': 0, 'reg_lambda': 0},
      mean: -1.85990, std: 0.13296, params: {'reg_alpha': 0, 'reg_lambda': 0.001},
      mean: -1.86367, std: 0.13634, params: {'reg_alpha': 0, 'reg_lambda': 0.01},
      mean: -1.86787, std: 0.13881, params: {'reg_alpha': 0, 'reg_lambda': 0.03},
      mean: -1.87099, std: 0.12476, params: {'reg_alpha': 0, 'reg_lambda': 0.08},
      mean: -1.87670, std: 0.11849, params: {'reg_alpha': 0, 'reg_lambda': 0.3},
      mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0, 'reg_lambda': 0.5},
      mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0},
      mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0.001},
      mean: -1.86515, std: 0.14116, params: {'reg_alpha': 0.001, 'reg_lambda': 0.01},
      mean: -1.86908, std: 0.13668, params: {'reg_alpha': 0.001, 'reg_lambda': 0.03},
      mean: -1.86852, std: 0.12289, params: {'reg_alpha': 0.001, 'reg_lambda': 0.08},
      mean: -1.88076, std: 0.11710, params: {'reg_alpha': 0.001, 'reg_lambda': 0.3},
      mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0.001, 'reg_lambda': 0.5},
      mean: -1.87480, std: 0.13889, params: {'reg_alpha': 0.01, 'reg_lambda': 0},
      mean: -1.87284, std: 0.14138, params: {'reg_alpha': 0.01, 'reg_lambda': 0.001},
      mean: -1.86030, std: 0.13332, params: {'reg_alpha': 0.01, 'reg_lambda': 0.01},
      mean: -1.86695, std: 0.12587, params: {'reg_alpha': 0.01, 'reg_lambda': 0.03},
      mean: -1.87415, std: 0.13100, params: {'reg_alpha': 0.01, 'reg_lambda': 0.08},
      mean: -1.88543, std: 0.13195, params: {'reg_alpha': 0.01, 'reg_lambda': 0.3},
      mean: -1.88076, std: 0.13502, params: {'reg_alpha': 0.01, 'reg_lambda': 0.5},
      mean: -1.87729, std: 0.12533, params: {'reg_alpha': 0.03, 'reg_lambda': 0},
      mean: -1.87435, std: 0.12034, params: {'reg_alpha': 0.03, 'reg_lambda': 0.001},
      mean: -1.87513, std: 0.12579, params: {'reg_alpha': 0.03, 'reg_lambda': 0.01},
      mean: -1.88116, std: 0.12218, params: {'reg_alpha': 0.03, 'reg_lambda': 0.03},
      mean: -1.88052, std: 0.13585, params: {'reg_alpha': 0.03, 'reg_lambda': 0.08},
      mean: -1.87565, std: 0.12200, params: {'reg_alpha': 0.03, 'reg_lambda': 0.3},
      mean: -1.87935, std: 0.13817, params: {'reg_alpha': 0.03, 'reg_lambda': 0.5},
      mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0},
      mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0.001},
      mean: -1.87911, std: 0.12027, params: {'reg_alpha': 0.08, 'reg_lambda': 0.01},
      mean: -1.86978, std: 0.12478, params: {'reg_alpha': 0.08, 'reg_lambda': 0.03},
      mean: -1.87217, std: 0.12159, params: {'reg_alpha': 0.08, 'reg_lambda': 0.08},
      mean: -1.87573, std: 0.14137, params: {'reg_alpha': 0.08, 'reg_lambda': 0.3},
      mean: -1.85969, std: 0.13109, params: {'reg_alpha': 0.08, 'reg_lambda': 0.5},
      mean: -1.87632, std: 0.12398, params: {'reg_alpha': 0.3, 'reg_lambda': 0},
      mean: -1.86995, std: 0.12651, params: {'reg_alpha': 0.3, 'reg_lambda': 0.001},
      mean: -1.86380, std: 0.12793, params: {'reg_alpha': 0.3, 'reg_lambda': 0.01},
      mean: -1.87577, std: 0.13002, params: {'reg_alpha': 0.3, 'reg_lambda': 0.03},
      mean: -1.87402, std: 0.13496, params: {'reg_alpha': 0.3, 'reg_lambda': 0.08},
      mean: -1.87032, std: 0.12504, params: {'reg_alpha': 0.3, 'reg_lambda': 0.3},
      mean: -1.88329, std: 0.13237, params: {'reg_alpha': 0.3, 'reg_lambda': 0.5},
      mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0},
      mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0.001},
      mean: -1.88222, std: 0.14735, params: {'reg_alpha': 0.5, 'reg_lambda': 0.01},
      mean: -1.86618, std: 0.14006, params: {'reg_alpha': 0.5, 'reg_lambda': 0.03},
      mean: -1.88579, std: 0.12398, params: {'reg_alpha': 0.5, 'reg_lambda': 0.08},
      mean: -1.88297, std: 0.12307, params: {'reg_alpha': 0.5, 'reg_lambda': 0.3},
      mean: -1.88148, std: 0.12622, params: {'reg_alpha': 0.5, 'reg_lambda': 0.5}],
     {'reg_alpha': 0, 'reg_lambda': 0},
     -1.8541224387666373)

Ha, it looks like that was unnecessary after all; no regularization is still the best.

6. Lowering learning_rate

We go back to step 1, except this time we use the parameter values that have already been tuned:

    params = {
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'learning_rate': 0.005,
        'num_leaves': 80,
        'max_depth': 7,
        'min_data_in_leaf': 20,
        'subsample': 1,
        'colsample_bytree': 0.7,
    }

    data_train = lgb.Dataset(df_train, y_train, silent=True)
    cv_results = lgb.cv(
        params, data_train, num_boost_round=10000, nfold=5, stratified=False, shuffle=True, metrics='rmse',
        early_stopping_rounds=50, verbose_eval=100, show_stdv=True)

    print('best n_estimators:', len(cv_results['rmse-mean']))
    print('best cv score:', cv_results['rmse-mean'][-1])

    [100] cv_agg's rmse: 1.52939 + 0.0261756
    [200] cv_agg's rmse: 1.43535 + 0.0187243
    [300] cv_agg's rmse: 1.39584 + 0.0157521
    [400] cv_agg's rmse: 1.37935 + 0.0157429
    [500] cv_agg's rmse: 1.37313 + 0.0164503
    [600] cv_agg's rmse: 1.37081 + 0.0172752
    [700] cv_agg's rmse: 1.36942 + 0.0177888
    [800] cv_agg's rmse: 1.36854 + 0.0180575
    [900] cv_agg's rmse: 1.36817 + 0.0188776
    [1000] cv_agg's rmse: 1.36796 + 0.0190279
    [1100] cv_agg's rmse: 1.36783 + 0.0195969
    best n_estimators: 1079
    best cv score: 1.36772351783
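To wrap things up, a hedged sketch of my own (not part of the original post) that fits a final model with the settings found above and the best iteration count from this last cv run; df_test is a hypothetical hold-out set:

    final_model = lgb.LGBMRegressor(objective='regression', boosting_type='gbdt',
                                    learning_rate=0.005, n_estimators=1079,
                                    num_leaves=80, max_depth=7, min_child_samples=20,
                                    subsample=1.0, colsample_bytree=0.7)
    final_model.fit(df_train, y_train)
    # y_pred = final_model.predict(df_test)   # df_test: hypothetical test-set features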
