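ML.NET Machine Learning Framework Study Notes [9]: Automated Learning (AutoML)

I. Overview

In this post we first build a wine-quality prediction program with a regression algorithm, then rebuild it with AutoML. Comparing the two implementations shows how AutoML is used.

The data is the UCI Wine Quality Dataset hosted on the competition site kaggle.com: https://www.kaggle.com/c/uci-wine-quality-dataset/data

The inputs are physicochemical measurements of each wine, such as alcohol content, and the output is the score given by wine tasters. The fields are described as follows: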

Data fields

Input variables (based on physicochemical tests):

1 - fixed acidity

2 - volatile acidity

3 - citric acid

4 - residual sugar

5 - chlorides

6 - free sulfur dioxide

7 - total sulfur dioxide

8 - density

9 - pH

10 - sulphates

11 - alcohol

Output variable (based on sensory data):

12 - quality (score between 0 and 10)

Other:

13 - id (unique ID for each sample, needed for submission)

II. Code

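using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Regression_WineQuality
{
    public class WineData
    {
        // The column indices were lost from the original listing. They are restored here
        // as zero-based positions that follow the field order listed above.
        [LoadColumn(0)]
        public float FixedAcidity;

        [LoadColumn(1)]
        public float VolatileAcidity;

        [LoadColumn(2)]
        public float CitricACID;

        [LoadColumn(3)]
        public float ResidualSugar;

        [LoadColumn(4)]
        public float Chlorides;

        [LoadColumn(5)]
        public float FreeSulfurDioxide;

        [LoadColumn(6)]
        public float TotalSulfurDioxide;

        [LoadColumn(7)]
        public float Density;

        [LoadColumn(8)]
        public float PH;

        [LoadColumn(9)]
        public float Sulphates;

        [LoadColumn(10)]
        public float Alcohol;

        // The value to predict; exposed to the pipeline as "Label"
        [LoadColumn(11)]
        [ColumnName("Label")]
        public float Quality;

        [LoadColumn(12)]
        public float Id;
    }

    public class WinePrediction
    {
        // Regression trainers write their output to the "Score" column
        [ColumnName("Score")]
        public float PredictionQuality;
    }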

    class Program
    {
        static readonly string ModelFilePath = Path.Combine(Environment.CurrentDirectory, "MLModel", "model.zip");

        static void Main(string[] args)
        {
            Train();
            Prediction();

            Console.WriteLine("Hit any key to finish the app");
            Console.ReadKey();
        }

        public static void Train()
        {
            // The seed value is missing from the original listing; 1 is an assumed placeholder.
            MLContext mlContext = new MLContext(seed: 1);

            // Prepare the data: load the full CSV and hold out 20% for testing
            string TrainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "winequality-data-full.csv");
            var fulldata = mlContext.Data.LoadFromTextFile<WineData>(path: TrainDataPath, separatorChar: ',', hasHeader: true);

            var trainTestData = mlContext.Data.TrainTestSplit(fulldata, testFraction: 0.2);
            var trainData = trainTestData.TrainSet;
            var testData = trainTestData.TestSet;

            // Build the learning pipeline and fit the model on the training data
            var dataProcessPipeline = mlContext.Transforms.DropColumns("Id")
                .Append(mlContext.Transforms.NormalizeMeanVariance(nameof(WineData.FreeSulfurDioxide)))
                .Append(mlContext.Transforms.NormalizeMeanVariance(nameof(WineData.TotalSulfurDioxide)))
                .Append(mlContext.Transforms.Concatenate("Features",
                    nameof(WineData.FixedAcidity),
                    nameof(WineData.VolatileAcidity),
                    nameof(WineData.CitricACID),
                    nameof(WineData.ResidualSugar),
                    nameof(WineData.Chlorides),
                    nameof(WineData.FreeSulfurDioxide),
                    nameof(WineData.TotalSulfurDioxide),
                    nameof(WineData.Density),
                    nameof(WineData.PH),
                    nameof(WineData.Sulphates),
                    nameof(WineData.Alcohol)));

            var trainer = mlContext.Regression.Trainers.LbfgsPoissonRegression(labelColumnName: "Label", featureColumnName: "Features");
            var trainingPipeline = dataProcessPipeline.Append(trainer);
            var trainedModel = trainingPipeline.Fit(trainData);

            // Evaluate on the held-out test data
            var predictions = trainedModel.Transform(testData);
            var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Label", scoreColumnName: "Score");
            // PrintRegressionMetrics is a console helper that this listing does not include.
            PrintRegressionMetrics(trainer.ToString(), metrics);

            // Save the model (make sure the target folder exists first)
            Console.WriteLine("====== Save model to local file =========");
            Directory.CreateDirectory(Path.GetDirectoryName(ModelFilePath));
            mlContext.Model.Save(trainedModel, trainData.Schema, ModelFilePath);
        }

        static void Prediction()
        {
            // The seed value is missing from the original listing; 1 is an assumed placeholder.
            MLContext mlContext = new MLContext(seed: 1);

            // Load the saved model and wrap it in a single-row prediction engine
            ITransformer loadedModel = mlContext.Model.Load(ModelFilePath, out var modelInputSchema);
            var predictor = mlContext.Model.CreatePredictionEngine<WineData, WinePrediction>(loadedModel);

            WineData wineData = new WineData
            {
                FixedAcidity = 7.6f,
                VolatileAcidity = 0.33f,
                CitricACID = 0.36f,
                ResidualSugar = 2.1f,
                Chlorides = 0.034f,
                FreeSulfurDioxide = 26f,
                TotalSulfurDioxide = 172f,
                Density = 0.9944f,
                PH = 3.42f,
                Sulphates = 0.48f,
                Alcohol = 10.5f
            };

            var wineQuality = predictor.Predict(wineData);
            Console.WriteLine($"Wine Data  Quality is:{wineQuality.PredictionQuality} ");
        }
    }
}

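Poisson regression was already covered in the earlier post on rating the attractiveness of faces. This program introduces nothing new, so it is not explained again; it is here mainly as a baseline to compare with the AutoML code below.

III. Automated Learning

Most machine learning workflows look roughly alike: prepare the data, define the features, choose an algorithm, train, and so on. Along the way the same questions keep coming up: which algorithm should we choose, and how should its parameters be configured? Automated learning addresses exactly this. The framework repeats the cycle of data selection, algorithm selection, parameter tuning, and evaluation many times, and returns the model that evaluates best.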
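To make the idea concrete, here is a minimal sketch of such a search loop in plain C#. It is a toy, not how ML.NET's AutoML is implemented: the two "trainers" (a mean predictor and a least-squares line on a single feature), the class name `ModelSearchSketch`, and the sample numbers are all assumptions made up for illustration. Each candidate is trained on the training split, scored by R-squared on a validation split, and the highest-scoring one is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model search. Every name here is hypothetical; this is not ML.NET's AutoML.
public static class ModelSearchSketch
{
    // A trainer turns training data (x, y) into a prediction function.
    public delegate Func<double, double> Trainer(double[] x, double[] y);

    // R-squared on held-out data: 1 - SS_res / SS_tot.
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double ssRes = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        double ssTot = actual.Sum(a => (a - mean) * (a - mean));
        return 1.0 - ssRes / ssTot;
    }

    // Candidate 1: always predict the mean training label.
    public static Func<double, double> TrainMean(double[] x, double[] y)
    {
        double mean = y.Average();
        return _ => mean;
    }

    // Candidate 2: ordinary least-squares line through the training points.
    public static Func<double, double> TrainLine(double[] x, double[] y)
    {
        double mx = x.Average(), my = y.Average();
        double sxy = x.Zip(y, (a, b) => (a - mx) * (b - my)).Sum();
        double sxx = x.Sum(a => (a - mx) * (a - mx));
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        return v => intercept + slope * v;
    }

    public static Dictionary<string, Trainer> DefaultCandidates()
    {
        return new Dictionary<string, Trainer>
        {
            ["Mean"] = TrainMean,
            ["Line"] = TrainLine,
        };
    }

    // Train every candidate, score it on the validation split, keep the best.
    public static (string Name, double Score) FindBest(
        IDictionary<string, Trainer> candidates,
        double[] trainX, double[] trainY, double[] validX, double[] validY)
    {
        string bestName = null;
        double bestScore = double.NegativeInfinity;
        foreach (var candidate in candidates)
        {
            Func<double, double> predict = candidate.Value(trainX, trainY);
            double score = RSquared(validY, validX.Select(predict).ToArray());
            if (score > bestScore)
            {
                bestName = candidate.Key;
                bestScore = score;
            }
        }
        return (bestName, bestScore);
    }

    // Example run on made-up data that lies exactly on y = 2x.
    public static void Demo()
    {
        var best = FindBest(DefaultCandidates(),
            new double[] { 1, 2, 3, 4 }, new double[] { 2, 4, 6, 8 },
            new double[] { 5, 6 }, new double[] { 10, 12 });
        Console.WriteLine($"Best: {best.Name}, R-squared = {best.Score:F4}");
    }
}
```

A real AutoML run differs mainly in scale: many more trainers, a hyperparameter search per trainer, and a time budget that decides when to stop.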

The full code is as follows:

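using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

namespace Regression_WineQuality
{
    public class WineData
    {
        // The column indices were lost from the original listing. They are restored here
        // as zero-based positions that follow the field order listed above.
        [LoadColumn(0)]
        public float FixedAcidity;

        [LoadColumn(1)]
        public float VolatileAcidity;

        [LoadColumn(2)]
        public float CitricACID;

        [LoadColumn(3)]
        public float ResidualSugar;

        [LoadColumn(4)]
        public float Chlorides;

        [LoadColumn(5)]
        public float FreeSulfurDioxide;

        [LoadColumn(6)]
        public float TotalSulfurDioxide;

        [LoadColumn(7)]
        public float Density;

        [LoadColumn(8)]
        public float PH;

        [LoadColumn(9)]
        public float Sulphates;

        [LoadColumn(10)]
        public float Alcohol;

        // The value to predict; exposed to the pipeline as "Label"
        [LoadColumn(11)]
        [ColumnName("Label")]
        public float Quality;

        // Note: nothing in this listing excludes ID, so AutoML may treat it as a feature.
        [LoadColumn(12)]
        public float ID;
    }

    public class WinePrediction
    {
        // Regression trainers write their output to the "Score" column
        [ColumnName("Score")]
        public float PredictionQuality;
    }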

    class Program
    {
        static readonly string ModelFilePath = Path.Combine(Environment.CurrentDirectory, "MLModel", "model.zip");
        static readonly string TrainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "winequality-data-train.csv");
        static readonly string TestDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "winequality-data-test.csv");

        static void Main(string[] args)
        {
            TrainAndSave();
            LoadAndPrediction();

            Console.WriteLine("Hit any key to finish the app");
            Console.ReadKey();
        }

        public static void TrainAndSave()
        {
            // The seed value is missing from the original listing; 1 is an assumed placeholder.
            MLContext mlContext = new MLContext(seed: 1);

            // Prepare the data
            var trainData = mlContext.Data.LoadFromTextFile<WineData>(path: TrainDataPath, separatorChar: ',', hasHeader: true);
            var testData = mlContext.Data.LoadFromTextFile<WineData>(path: TestDataPath, separatorChar: ',', hasHeader: true);

            var progressHandler = new RegressionExperimentProgressHandler();

            // Time budget in seconds. The value is missing from the original listing;
            // 200 is an assumption, chosen to roughly match the total run time reported below.
            uint ExperimentTime = 200;

            ExperimentResult<RegressionMetrics> experimentResult = mlContext.Auto()
                .CreateRegressionExperiment(ExperimentTime)
                .Execute(trainData, "Label", progressHandler: progressHandler);

            Debugger.PrintTopModels(experimentResult);

            RunDetail<RegressionMetrics> best = experimentResult.BestRun;
            ITransformer trainedModel = best.Model;

            // Evaluate BestRun on the test data
            var predictions = trainedModel.Transform(testData);
            var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Label", scoreColumnName: "Score");
            // Debugger.PrintRegressionMetrics is a console helper that this post does not list.
            Debugger.PrintRegressionMetrics(best.TrainerName, metrics);

            // Save the model (make sure the target folder exists first)
            Console.WriteLine("====== Save model to local file =========");
            Directory.CreateDirectory(Path.GetDirectoryName(ModelFilePath));
            mlContext.Model.Save(trainedModel, trainData.Schema, ModelFilePath);
        }

        static void LoadAndPrediction()
        {
            // The seed value is missing from the original listing; 1 is an assumed placeholder.
            MLContext mlContext = new MLContext(seed: 1);

            // Load the saved model and wrap it in a single-row prediction engine
            ITransformer loadedModel = mlContext.Model.Load(ModelFilePath, out var modelInputSchema);
            var predictor = mlContext.Model.CreatePredictionEngine<WineData, WinePrediction>(loadedModel);

            WineData wineData = new WineData
            {
                FixedAcidity = 7.6f,
                VolatileAcidity = 0.33f,
                CitricACID = 0.36f,
                ResidualSugar = 2.1f,
                Chlorides = 0.034f,
                FreeSulfurDioxide = 26f,
                TotalSulfurDioxide = 172f,
                Density = 0.9944f,
                PH = 3.42f,
                Sulphates = 0.48f,
                Alcohol = 10.5f
            };

            var wineQuality = predictor.Predict(wineData);
            Console.WriteLine($"Wine Data  Quality is:{wineQuality.PredictionQuality} ");
        }
    }
}

IV. Code Walkthrough

1. The AutoML run

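            var progressHandler = new RegressionExperimentProgressHandler();

            // Time budget in seconds. The value is missing from the original listing; 200 is an assumption.
            uint ExperimentTime = 200;

            ExperimentResult<RegressionMetrics> experimentResult = mlContext.Auto()
                .CreateRegressionExperiment(ExperimentTime)
                .Execute(trainData, "Label", progressHandler: progressHandler);

            Debugger.PrintTopModels(experimentResult); // print the metrics of every model

ExperimentTime is the time budget for the experiment (CreateRegressionExperiment takes it in seconds), and progressHandler is a reporter: each time one training run completes, the framework calls its Report method once.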

    public class RegressionExperimentProgressHandler : IProgress<RunDetail<RegressionMetrics>>
    {
        private int _iterationIndex;

        // Called once for every completed run
        public void Report(RunDetail<RegressionMetrics> iterationResult)
        {
            _iterationIndex++;
            Console.WriteLine($"Report index:{_iterationIndex},TrainerName:{iterationResult.TrainerName},RuntimeInSeconds:{iterationResult.RuntimeInSeconds}");
        }
    }

The debug output is:

Report index:1,TrainerName:SdcaRegression,RuntimeInSeconds:12.5244426

Report index:2,TrainerName:LightGbmRegression,RuntimeInSeconds:11.2034988

Report index:3,TrainerName:FastTreeRegression,RuntimeInSeconds:14.810409

Report index:4,TrainerName:FastTreeTweedieRegression,RuntimeInSeconds:14.7338553

Report index:5,TrainerName:FastForestRegression,RuntimeInSeconds:15.6224459

Report index:6,TrainerName:LbfgsPoissonRegression,RuntimeInSeconds:11.1668197

Report index:7,TrainerName:OnlineGradientDescentRegression,RuntimeInSeconds:10.5353

Report index:8,TrainerName:OlsRegression,RuntimeInSeconds:10.8905459

Report index:9,TrainerName:LightGbmRegression,RuntimeInSeconds:10.5703296

Report index:10,TrainerName:FastTreeRegression,RuntimeInSeconds:19.4470509

Report index:11,TrainerName:FastTreeTweedieRegression,RuntimeInSeconds:63.638882

Report index:12,TrainerName:LightGbmRegression,RuntimeInSeconds:10.7710518
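The handler relies on nothing more than the standard `IProgress<T>` contract: whoever runs the work calls `Report` once per finished item. The sketch below shows that contract in isolation, without ML.NET. `RunInfo`, `CollectingHandler`, and `RunAll` are hypothetical stand-ins invented for this example (`RunInfo` plays the role of `RunDetail<RegressionMetrics>`); the two sample runs in `Demo` are taken from the log above.

```csharp
using System;
using System.Collections.Generic;

public static class ProgressSketch
{
    // Hypothetical stand-in for RunDetail<RegressionMetrics>.
    public sealed class RunInfo
    {
        public RunInfo(string trainerName, double runtimeInSeconds)
        {
            TrainerName = trainerName;
            RuntimeInSeconds = runtimeInSeconds;
        }

        public string TrainerName { get; }
        public double RuntimeInSeconds { get; }
    }

    // Same shape as the handler in the post, but it also keeps the lines it produced.
    public sealed class CollectingHandler : IProgress<RunInfo>
    {
        private int _iterationIndex;

        public List<string> Lines { get; } = new List<string>();

        public void Report(RunInfo value)
        {
            _iterationIndex++;
            string line = $"Report index:{_iterationIndex},TrainerName:{value.TrainerName},RuntimeInSeconds:{value.RuntimeInSeconds}";
            Lines.Add(line);
            Console.WriteLine(line);
        }
    }

    // Hypothetical producer: reports each finished run to the handler, if one was supplied.
    public static void RunAll(IEnumerable<RunInfo> finishedRuns, IProgress<RunInfo> progress)
    {
        foreach (var run in finishedRuns)
        {
            progress?.Report(run);
        }
    }

    public static void Demo()
    {
        var handler = new CollectingHandler();
        RunAll(new[]
        {
            new RunInfo("SdcaRegression", 12.5244426),
            new RunInfo("LightGbmRegression", 11.2034988),
        }, handler);
    }
}
```

Because the handler is an ordinary object, it can just as well write to a log file or update a UI instead of the console.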

When the experiment finishes, Debugger.PrintTopModels prints the metrics of every model:

    public class Debugger
    {
        // The original values are missing from the listing. Width = 114 here and "width - 2"
        // in CreateRow are assumptions that reproduce the 112-character rows in the output below.
        private const int Width = 114;

        public static void PrintTopModels(ExperimentResult<RegressionMetrics> experimentResult)
        {
            // Keep only runs with a valid R-squared, best first
            var topRuns = experimentResult.RunDetails
                .Where(r => r.ValidationMetrics != null && !double.IsNaN(r.ValidationMetrics.RSquared))
                .OrderByDescending(r => r.ValidationMetrics.RSquared)
                .ToList();

            Console.WriteLine("Top models ranked by R-Squared --");
            PrintRegressionMetricsHeader();

            for (var i = 0; i < topRuns.Count; i++)
            {
                var run = topRuns[i];
                PrintIterationMetrics(i + 1, run.TrainerName, run.ValidationMetrics, run.RuntimeInSeconds);
            }
        }

        public static void PrintRegressionMetricsHeader()
        {
            CreateRow($"{"",-4} {"Trainer",-35} {"RSquared",8} {"Absolute-loss",13} {"Squared-loss",12} {"RMS-loss",8} {"Duration",9}", Width);
        }

        public static void PrintIterationMetrics(int iteration, string trainerName, RegressionMetrics metrics, double? runtimeInSeconds)
        {
            CreateRow($"{iteration,-4} {trainerName,-35} {metrics?.RSquared ?? double.NaN,8:F4} {metrics?.MeanAbsoluteError ?? double.NaN,13:F2} {metrics?.MeanSquaredError ?? double.NaN,12:F2} {metrics?.RootMeanSquaredError ?? double.NaN,8:F2} {runtimeInSeconds.Value,9:F1}", Width);
        }

        // Pads the message to a fixed width and frames it with "|" on both sides
        public static void CreateRow(string message, int width)
        {
            Console.WriteLine("|" + message.PadRight(width - 2) + "|");
        }
    }

CreateRow only handles layout. The output is:

Top models ranked by R-Squared --

|     Trainer                             RSquared Absolute-loss Squared-loss RMS-loss  Duration                 |

|1    FastTreeTweedieRegression             0.4731          0.46         0.41     0.64      63.6                 |

|2    FastTreeTweedieRegression             0.4431          0.49         0.43     0.65      14.7                 |

|3    FastTreeRegression                    0.4386          0.54         0.49     0.70      19.4                 |

|4    LightGbmRegression                    0.4177          0.52         0.45     0.67      10.8                 |

|5    FastTreeRegression                    0.4102          0.51         0.45     0.67      14.8                 |

|6    LightGbmRegression                    0.3944          0.52         0.46     0.68      11.2                 |

|7    LightGbmRegression                    0.3501          0.60         0.57     0.75      10.6                 |

|8    FastForestRegression                  0.3381          0.60         0.58     0.76      15.6                 |

|9    OlsRegression                         0.2829          0.56         0.53     0.73      10.9                 |

|10   LbfgsPoissonRegression                0.2760          0.62         0.63     0.80      11.2                 |

|11   SdcaRegression                        0.2746          0.58         0.56     0.75      12.5                 |

|12   OnlineGradientDescentRegression       0.0593          0.69         0.81     0.90      10.5                 |

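The results show that some algorithms were tried more than once. Each repeat of the same algorithm used different configuration parameters, such as thresholds or tree depth.

2. Getting the best model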
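For readers who want to check the metric columns by hand, the sketch below computes them from a list of actual and predicted values, assuming the usual textbook definitions: absolute loss as mean absolute error, squared loss as mean squared error, RMS loss as its square root, and R-squared as 1 - SS_res / SS_tot. The class name `RegressionMetricsSketch` is made up for illustration and is not part of ML.NET.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // Returns mean absolute error, mean squared error, root mean squared error and R-squared.
    public static (double Mae, double Mse, double Rmse, double RSquared) Compute(
        double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");

        int n = actual.Length;
        double mean = actual.Average();
        double absSum = 0, sqSum = 0, totSum = 0;

        for (int i = 0; i < n; i++)
        {
            double error = actual[i] - predicted[i];
            absSum += Math.Abs(error);                          // for the absolute loss
            sqSum += error * error;                             // residual sum of squares
            totSum += (actual[i] - mean) * (actual[i] - mean);  // total sum of squares
        }

        double mse = sqSum / n;
        return (absSum / n, mse, Math.Sqrt(mse), 1.0 - sqSum / totSum);
    }

    // Example with made-up scores: two of three predictions are off by one point.
    public static void Demo()
    {
        var m = Compute(new double[] { 5, 6, 7 }, new double[] { 5, 7, 6 });
        Console.WriteLine($"MAE={m.Mae:F2} MSE={m.Mse:F2} RMSE={m.Rmse:F2} R2={m.RSquared:F2}");
    }
}
```

The relationship between the columns is visible in the table itself: in the first row the squared loss is 0.41 and the RMS loss is 0.64, which is its square root.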

            RunDetail<RegressionMetrics> best = experimentResult.BestRun;

            ITransformer trainedModel = best.Model;

Once we have the best model, evaluating and saving it works exactly as in the earlier code. Evaluating it on the test data gives:

*************************************************

*       Metrics for FastTreeTweedieRegression regression model

*------------------------------------------------

*       LossFn:        0.67

*       R2 Score:      0.34

*       Absolute loss: .63

*       Squared loss:  .67

*       RMS loss:      .82

*************************************************

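With an R2 score of only 0.34 and an RMS loss of 0.82 on the test data (roughly a 70% hit rate by a loose estimate), this model cannot be used in production. The likely problem is that we have not found the key features that determine wine quality.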
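The PrintRegressionMetrics helper that printed the block above is not listed in this post. The sketch below is one way to produce output of that shape. The format strings are inferred from the printed values ("0.##" keeps the leading zero, "#.##" drops it, which would explain ".63"), and the helper takes plain doubles so that it runs without ML.NET. In the real program the values would presumably come from the LossFunction, RSquared, MeanAbsoluteError, MeanSquaredError and RootMeanSquaredError properties of RegressionMetrics. The name `MetricsPrinterSketch` is hypothetical.

```csharp
using System;
using System.Globalization;

public static class MetricsPrinterSketch
{
    // Builds the report lines; format strings are inferred from the output shown in the post.
    public static string[] FormatRegressionMetrics(
        string name, double lossFn, double rSquared, double mae, double mse, double rmse)
    {
        // Invariant culture so the decimal separator is always "."
        var c = CultureInfo.InvariantCulture;
        return new[]
        {
            "*************************************************",
            "*       Metrics for " + name + " regression model",
            "*------------------------------------------------",
            "*       LossFn:        " + lossFn.ToString("0.##", c),
            "*       R2 Score:      " + rSquared.ToString("0.##", c),
            "*       Absolute loss: " + mae.ToString("#.##", c),
            "*       Squared loss:  " + mse.ToString("#.##", c),
            "*       RMS loss:      " + rmse.ToString("#.##", c),
            "*************************************************",
        };
    }

    public static void PrintRegressionMetrics(
        string name, double lossFn, double rSquared, double mae, double mse, double rmse)
    {
        foreach (string line in FormatRegressionMetrics(name, lossFn, rSquared, mae, mse, rmse))
        {
            Console.WriteLine(line);
        }
    }
}
```

Calling `PrintRegressionMetrics("FastTreeTweedieRegression", 0.67, 0.34, 0.63, 0.67, 0.82)` reproduces the values in the block above.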

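V. Summary

This post concludes the ML.NET study notes series. The original code used throughout the series comes mainly from https://github.com/dotnet/machinelearning-samples .

That repository contains further examples of other algorithms, including clustering, matrix factorization, and anomaly detection. Their overall workflow is much the same, so with this series as a foundation, interested readers can explore them on their own.

VI. Resources

Source code: https://github.com/seabluescn/Study_ML.NET

Regression project name: Regression_WineQuality

AutoML project name: Regression_WineQuality_AutoML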
