Assignment download: [link]

Problem 1

Overview: perform anomaly detection on a set of network server data.

Step 1: load the data file and estimate the Gaussian parameters μ and σ²:

    % The following command loads the dataset. You should now have the
    % variables X, Xval, yval in your environment
    load('ex8data1.mat');

    % Estimate mu and sigma2
    [mu sigma2] = estimateGaussian(X);

The Gaussian-parameter estimation function estimateGaussian:

    function [mu sigma2] = estimateGaussian(X)

    % Useful variables
    [m, n] = size(X);

    % You should return these values correctly
    mu = zeros(n, 1);
    sigma2 = zeros(n, 1);

    % Mean and variance of each feature; var(X, 1) normalizes by m, not m - 1
    mu = mean(X);
    sigma2 = var(X, 1);
    % mu = mu';
    % sigma2 = sigma2';

    end
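
For reference, estimateGaussian computes the standard per-feature maximum-likelihood estimates (which is why var(X, 1) is used, as it normalizes by m rather than m - 1):

    \mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad
    \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \bigl( x_j^{(i)} - \mu_j \bigr)^2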

Step 2: compute the probability density p(x):

    % Returns the density of the multivariate normal at each data point (row)
    % of X
    p = multivariateGaussian(X, mu, sigma2);

The density function multivariateGaussian:

    function p = multivariateGaussian(X, mu, Sigma2)

    k = length(mu);

    % If Sigma2 is a vector, treat it as the diagonal of the covariance matrix
    if (size(Sigma2, 2) == 1) || (size(Sigma2, 1) == 1)
        Sigma2 = diag(Sigma2);
    end

    X = bsxfun(@minus, X, mu(:)');
    p = (2 * pi) ^ (- k / 2) * det(Sigma2) ^ (-0.5) * ...
        exp(-0.5 * sum(bsxfun(@times, X * pinv(Sigma2), X), 2));

    end
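
This evaluates the multivariate normal density; passing sigma2 as a vector puts it on the diagonal of Σ, which makes the density equal to a product of independent univariate Gaussians:

    p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{k/2}\, \lvert \Sigma \rvert^{1/2}}
        \exp\!\Bigl( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Bigr)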

Step 3: visualize the data and plot the density contours:

    % Visualize the fit
    visualizeFit(X, mu, sigma2);
    xlabel('Latency (ms)');
    ylabel('Throughput (mb/s)');

The visualizeFit function:

    function visualizeFit(X, mu, sigma2)

    [X1,X2] = meshgrid(0:.5:35);
    Z = multivariateGaussian([X1(:) X2(:)], mu, sigma2);
    Z = reshape(Z, size(X1));

    plot(X(:, 1), X(:, 2), 'bx');
    hold on;
    % Do not plot if there are infinities
    if (sum(isinf(Z)) == 0)
        contour(X1, X2, Z, 10.^(-20:3:0)');
    end
    hold off;

    end

Result (figure): the dataset plotted with the Gaussian density contour lines.

Step 4: use the cross-validation set to select the best threshold ε:

    pval = multivariateGaussian(Xval, mu, sigma2);

    [epsilon F1] = selectThreshold(yval, pval);
    fprintf('Best epsilon found using cross-validation: %e\n', epsilon);
    fprintf('Best F1 on Cross Validation Set: %f\n', F1);

The selectThreshold function:

    function [bestEpsilon bestF1] = selectThreshold(yval, pval)

    bestEpsilon = 0;
    bestF1 = 0;
    F1 = 0;

    stepsize = (max(pval) - min(pval)) / 1000;
    for epsilon = min(pval):stepsize:max(pval)
        predictions = pval < epsilon;
        tp = sum(predictions .* yval);   % true positives
        prec = tp / sum(predictions);    % precision = tp / (tp + fp)
        rec = tp / sum(yval);            % recall    = tp / (tp + fn)
        F1 = 2 * prec * rec / (prec + rec);

        if F1 > bestF1
            bestF1 = F1;
            bestEpsilon = epsilon;
        end
    end

    end
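
Each candidate ε classifies an example as anomalous when p(x) < ε; the F1 score computed inside the loop is the usual harmonic mean of precision and recall:

    \text{prec} = \frac{tp}{tp + fp}, \qquad
    \text{rec} = \frac{tp}{tp + fn}, \qquad
    F_1 = \frac{2 \cdot \text{prec} \cdot \text{rec}}{\text{prec} + \text{rec}}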

Result: the best ε and the corresponding F1 on the cross-validation set are printed.

Step 5: find the anomalies and mark them on the plot:

    % Find the outliers in the training set and plot them
    outliers = find(p < epsilon);

    % Draw a red circle around those outliers
    hold on
    plot(X(outliers, 1), X(outliers, 2), 'ro', 'LineWidth', 2, 'MarkerSize', 10);
    hold off

Result (figure): the detected anomalies circled in red on the scatter plot.

Problem 2

Overview: implement a movie recommender system.

Step 1: load the data files (using only a small subset of the data so it runs faster):

    % Load data
    load('ex8_movies.mat');

    % Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by
    % 943 users
    %
    % R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
    % rating to movie i

    % Load pre-trained weights (X, Theta, num_users, num_movies, num_features)
    load('ex8_movieParams.mat');

    % Reduce the data set size so that this runs faster
    num_users = 4; num_movies = 5; num_features = 3;
    X = X(1:num_movies, 1:num_features);
    Theta = Theta(1:num_users, 1:num_features);
    Y = Y(1:num_movies, 1:num_users);
    R = R(1:num_movies, 1:num_users);

Step 2: compute the cost function and gradients:

    J = cofiCostFunc([X(:) ; Theta(:)], Y, R, num_users, num_movies, ...
                     num_features, 1.5);

The cofiCostFunc function:

    function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
                                      num_features, lambda)

    % Unfold the U and W matrices from params
    X = reshape(params(1:num_movies*num_features), num_movies, num_features);
    Theta = reshape(params(num_movies*num_features+1:end), ...
                    num_users, num_features);

    % You need to return the following values correctly
    J = 0;
    X_grad = zeros(size(X));
    Theta_grad = zeros(size(Theta));

    % Prediction errors, masked so only rated entries (R(i,j) = 1) contribute
    cost = (X * Theta' - Y) .* R;
    J = 1 / 2 * sum(sum(cost .^ 2));
    J = J + lambda / 2 * (sum(sum(Theta .^ 2)) + sum(sum(X .^ 2)));

    X_grad = cost * Theta;
    X_grad = X_grad + lambda * X;

    Theta_grad = X' * cost;
    Theta_grad = Theta_grad' + lambda * Theta;

    grad = [X_grad(:); Theta_grad(:)];

    end
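
The quantities computed above correspond to the regularized collaborative-filtering cost and its gradients, where the fitting term runs only over pairs with r(i,j) = 1 (enforced by the .* R mask):

    J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \bigl( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \bigr)^2
        + \frac{\lambda}{2} \sum_{j} \sum_{k} \bigl( \theta_k^{(j)} \bigr)^2
        + \frac{\lambda}{2} \sum_{i} \sum_{k} \bigl( x_k^{(i)} \bigr)^2

    \frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:\, r(i,j)=1} \bigl( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \bigr) \theta_k^{(j)} + \lambda x_k^{(i)}, \qquad
    \frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:\, r(i,j)=1} \bigl( (\theta^{(j)})^{\top} x^{(i)} - y^{(i,j)} \bigr) x_k^{(i)} + \lambda \theta_k^{(j)}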

Step 3: perform gradient checking:

    % Check gradients by running checkCostFunction
    checkCostFunction(1.5);

The checkCostFunction function:

    function checkCostFunction(lambda)

    % Set lambda
    if ~exist('lambda', 'var') || isempty(lambda)
        lambda = 0;
    end

    %% Create small problem
    X_t = rand(4, 3);
    Theta_t = rand(5, 3);

    % Zap out most entries
    Y = X_t * Theta_t';
    Y(rand(size(Y)) > 0.5) = 0;
    R = zeros(size(Y));
    R(Y ~= 0) = 1;

    %% Run Gradient Checking
    X = randn(size(X_t));
    Theta = randn(size(Theta_t));
    num_users = size(Y, 2);
    num_movies = size(Y, 1);
    num_features = size(Theta_t, 2);

    numgrad = computeNumericalGradient( ...
        @(t) cofiCostFunc(t, Y, R, num_users, num_movies, ...
                          num_features, lambda), [X(:); Theta(:)]);

    [cost, grad] = cofiCostFunc([X(:); Theta(:)], Y, R, num_users, ...
                                num_movies, num_features, lambda);

    disp([numgrad grad]);
    fprintf(['The above two columns you get should be very similar.\n' ...
             '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

    diff = norm(numgrad-grad)/norm(numgrad+grad);
    fprintf(['If your cost function implementation is correct, then \n' ...
             'the relative difference will be small (less than 1e-9). \n' ...
             '\nRelative Difference: %g\n'], diff);

    end

The computeNumericalGradient function:

    function numgrad = computeNumericalGradient(J, theta)

    numgrad = zeros(size(theta));
    perturb = zeros(size(theta));
    e = 1e-4;
    for p = 1:numel(theta)
        % Set perturbation vector
        perturb(p) = e;
        loss1 = J(theta - perturb);
        loss2 = J(theta + perturb);
        % Compute Numerical Gradient
        numgrad(p) = (loss2 - loss1) / (2*e);
        perturb(p) = 0;
    end

    end
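
computeNumericalGradient approximates each partial derivative with a symmetric (central) difference, where e_p is the p-th unit vector:

    \frac{\partial J}{\partial \theta_p} \approx
        \frac{J(\theta + \varepsilon\, e_p) - J(\theta - \varepsilon\, e_p)}{2 \varepsilon},
        \qquad \varepsilon = 10^{-4}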

  

Step 4: to make predictions for a new user, initialize that user's ratings:

    movieList = loadMovieList();

    % Initialize my ratings
    my_ratings = zeros(1682, 1);

    my_ratings(1) = 4;
    my_ratings(98) = 2;
    my_ratings(7) = 3;
    my_ratings(12) = 5;
    my_ratings(54) = 4;
    my_ratings(64) = 5;
    my_ratings(66) = 3;
    my_ratings(69) = 5;
    my_ratings(183) = 4;
    my_ratings(226) = 5;
    my_ratings(355) = 5;
The loadMovieList function:

    function movieList = loadMovieList()

    %% Read the fixed movie list
    fid = fopen('movie_ids.txt');

    % Store all movies in cell array movieList{}
    n = 1682; % Total number of movies

    movieList = cell(n, 1);
    for i = 1:n
        % Read line
        line = fgets(fid);
        % Movie index (can ignore since it will be = i)
        [idx, movieName] = strtok(line, ' ');
        % Movie name
        movieList{i} = strtrim(movieName);
    end
    fclose(fid);

    end
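
As a quick sanity check, the ratings entered in Step 4 can be printed back by name. This is a minimal sketch (not part of the listings above), assuming my_ratings and movieList from Step 4 are in the workspace:

    % List the movies rated above, by name
    for i = 1:length(my_ratings)
        if my_ratings(i) > 0
            fprintf('Rated %d for %s\n', my_ratings(i), movieList{i});
        end
    end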

Step 5: add the new user to the dataset:

    % Load data
    load('ex8_movies.mat');

    % Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by
    % 943 users
    %
    % R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
    % rating to movie i

    % Add our own ratings to the data matrix
    Y = [my_ratings Y];
    R = [(my_ratings ~= 0) R];

Step 6: mean normalization:

    % Normalize Ratings
    [Ynorm, Ymean] = normalizeRatings(Y, R);

The normalizeRatings function:

    function [Ynorm, Ymean] = normalizeRatings(Y, R)

    [m, n] = size(Y);
    Ymean = zeros(m, 1);
    Ynorm = zeros(size(Y));
    for i = 1:m
        idx = find(R(i, :) == 1);
        Ymean(i) = mean(Y(i, idx));
        Ynorm(i, idx) = Y(i, idx) - Ymean(i);
    end

    end
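
For each movie i, the mean is taken only over the users who actually rated it and is then subtracted from those ratings (unrated entries of Ynorm stay 0):

    \mu_i = \frac{1}{\lvert \{ j : r(i,j)=1 \} \rvert} \sum_{j:\, r(i,j)=1} y^{(i,j)}, \qquad
    y_{\text{norm}}^{(i,j)} = y^{(i,j)} - \mu_i \quad \text{for } r(i,j)=1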

Step 7: train the model by minimizing the cost with fmincg:

    % Useful Values
    num_users = size(Y, 2);
    num_movies = size(Y, 1);
    num_features = 10;

    % Set Initial Parameters (Theta, X)
    X = randn(num_movies, num_features);
    Theta = randn(num_users, num_features);

    initial_parameters = [X(:); Theta(:)];

    % Set options for fmincg
    options = optimset('GradObj', 'on', 'MaxIter', 100);

    % Set Regularization
    lambda = 10;
    theta = fmincg(@(t)(cofiCostFunc(t, Ynorm, R, num_users, num_movies, ...
                                     num_features, lambda)), ...
                   initial_parameters, options);

    % Unfold the returned theta back into U and W
    X = reshape(theta(1:num_movies*num_features), num_movies, num_features);
    Theta = reshape(theta(num_movies*num_features+1:end), ...
                    num_users, num_features);
Step 8: generate recommendations:

    p = X * Theta';
    my_predictions = p(:,1) + Ymean;

    movieList = loadMovieList();

    [r, ix] = sort(my_predictions, 'descend');
    fprintf('\nTop recommendations for you:\n');
    for i = 1:10
        j = ix(i);
        fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), ...
                movieList{j});
    end
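
Because training used the mean-normalized ratings, the per-movie mean is added back at prediction time; the predicted rating of user j for movie i is:

    \hat{y}^{(i,j)} = (\theta^{(j)})^{\top} x^{(i)} + \mu_i

Column 1 of p corresponds to the newly added user (my_ratings was prepended to Y), so p(:,1) + Ymean gives that user's predictions.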

Result: the top 10 recommended movies and their predicted ratings are printed.
