simple-libfm-example-part1
Original post: https://thierrysilbermann.wordpress.com/2015/02/11/simple-libfm-example-part1/
I often get emails from people asking me how to run libFM because they have trouble understanding the whole pipeline. If you are versed in machine learning, a good first step is to read Steffen Rendle's paper “Factorization Machines” and this one too: “Factorization Machines with libFM”.
I'll try to explain how to train different kinds of models with the 4 different learning algorithms that libFM provides, and how to use libFM features such as grouping and relations.
But first, here is a toy example of what each file should look like (originally posted in the libFM Google group).
A simple example with 2 users and 3 items: our training set has 2 users and 3 items, and now we want to test on the same 2 users, but with 4 items (the same 3 from training plus one new item).
Each user has a categorical feature age [“18-25”, “26-40”, “40-60”] and each item has a numerical feature price.
I one-hot encoded the users:
0 is User1
1 is User2
Same thing for the items:
2 is Item1
3 is Item2
4 is Item3
5 is Item4
The categorical feature age needs to be one-hot encoded too:
6 is the category “18-25”
7 is the category “26-40”
8 is the category “40-60”
And finally, the numerical feature price of the item:
9 represents the price feature
One sample can be:
5 0:1 3:1 6:1 9:20
#User1, who is 23 years old, gives a rating of 5 to Item2, which costs 20 euros
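The index layout above can be turned into a small encoder that writes one line per rating. This is a minimal Python sketch, not part of libFM: the dictionaries `USER_INDEX`, `ITEM_INDEX`, `AGE_INDEX` and the function `encode` are hypothetical names, and the sketch assumes the age group is already known as one of the three categories. An item whose price is unknown simply gets no price feature.

```python
# Hypothetical index layout mirroring the one-hot scheme described above.
USER_INDEX = {"User1": 0, "User2": 1}
ITEM_INDEX = {"Item1": 2, "Item2": 3, "Item3": 4, "Item4": 5}
AGE_INDEX = {"18-25": 6, "26-40": 7, "40-60": 8}
PRICE_INDEX = 9  # numerical feature: the value is the price itself


def encode(target, user, item, age_group, price=None):
    """Build one sparse line: '<target> <index>:<value> ...'."""
    features = [
        (USER_INDEX[user], 1),      # one-hot user
        (ITEM_INDEX[item], 1),      # one-hot item
        (AGE_INDEX[age_group], 1),  # one-hot age category
    ]
    if price is not None:  # unknown price: leave the feature out entirely
        features.append((PRICE_INDEX, price))
    features.sort()  # keep indices in ascending order, as libSVM expects
    return " ".join([str(target)] + [f"{i}:{v}" for i, v in features])


print(encode(5, "User1", "Item2", "18-25", 20))  # 5 0:1 3:1 6:1 9:20
```

Calling `encode(0, "User1", "Item4", "18-25")` without a price produces the second test line, `0 0:1 5:1 6:1`.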
We can then construct examples like this and create a training set and a test set.
train.libfm
5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20
num_features = 10 #Computed from the highest integer index that represents a feature (here 9, for the item price) + 1 (because indices start at 0)
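A minimal sketch of how this number can be computed from the lines of a file, assuming each line has the form `<target> <index>:<value> ...`. The helper `num_features` is a hypothetical name for illustration, not a libFM option.

```python
TRAIN_LINES = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
]


def num_features(lines):
    """Highest feature index used in any line, plus one (indices start at 0)."""
    highest = -1
    for line in lines:
        for token in line.split()[1:]:  # skip the target value
            index, _value = token.split(":")
            highest = max(highest, int(index))
    return highest + 1


print(num_features(TRAIN_LINES))  # 10
```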
test.libfm
0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1
num_features = 10 #Computed from the highest integer index that represents a feature (here 9, for the item price) + 1 (because indices start at 0)
For the test, I have two samples I want predictions for. The 0 target has no real effect in testing: it is only useful if you know the true value, in which case libFM outputs the RMSE on it, but it never uses that value to train the model.
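To make the RMSE remark concrete, here is a sketch of the root-mean-square error computed in plain Python (the function name `rmse` is hypothetical). With placeholder targets of 0, the number it returns says nothing about model quality, which is why the target column only matters when you know the true ratings.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between predictions and true target values."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need one prediction per target, and at least one")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([4.0, 2.0], [5.0, 1.0]))  # 1.0
```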
Just to be sure, here is the meaning of those two samples in test:
0 1:1 4:1 8:1 9:78
#Here User2, who is 41 years old, rates Item3, which costs 78 euros; we put a rating of 0 because we don't know the real rating yet
0 0:1 5:1 6:1
#We want to know which rating User1, who is 23 years old, will give to Item4, which has not been seen yet and whose price we don't know
This format is the same as the one used by libSVM.
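Because the format is sparse, any index that does not appear on a line has value 0. That is how the second test sample expresses "price unknown". A minimal parser makes this explicit; `parse_line` is a hypothetical helper, and it assumes whitespace-separated `index:value` pairs with no `#` comments inside the actual files (the comments here are only explanations).

```python
def parse_line(line):
    """Split a libSVM/libFM line into (target, {index: value})."""
    target, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return float(target), features


target, feats = parse_line("0 0:1 5:1 6:1")
print(target, feats)      # 0.0 {0: 1.0, 5: 1.0, 6: 1.0}
print(feats.get(9, 0.0))  # 0.0: the missing price feature is implicitly zero
```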
From here you have two files: train.libfm and test.libfm (the extension doesn’t matter)
You can then run libFM like this for regression (predicting ratings):
./libfm -task r -method mcmc -train train.libfm -test test.libfm -iter 10 -dim '1,1,2' -out output.libfm
So the model was trained with MCMC (-method mcmc) for 10 iterations (-iter 10), using a global bias, a linear term, and a factorization with 2 latent factors (-dim '1,1,2').
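For intuition about what `-dim '1,1,2'` switches on, here is a sketch of the second-order factorization machine equation from Rendle's paper, ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, with latent vectors of length k = 2. The parameter values below are made up for illustration and are not learned by libFM; also, with `-method mcmc` libFM averages predictions over sampled parameters, so this only shows a single evaluation of the model. `fm_predict` uses the standard O(k · nnz) reformulation of the pairwise term, and `fm_predict_naive` sums over all pairs as a cross-check.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction for a sparse sample x = {index: value}.

    w0: global bias (first '1' in -dim), w: per-feature weights (second '1'),
    V: per-feature latent vectors of length k (the '2' in -dim '1,1,2').
    """
    k = len(V[0])
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)  # equals sum over i<j of v_if v_jf x_i x_j
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, summing <v_i, v_j> x_i x_j over every pair i < j explicitly."""
    items = sorted(x.items())
    total = w0 + sum(w[i] * xi for i, xi in items)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            (i, xi), (j, xj) = items[a], items[b]
            total += sum(vi * vj for vi, vj in zip(V[i], V[j])) * xi * xj
    return total


# Made-up parameters for the 10 features of the toy example (NOT learned).
NUM_FEATURES = 10
w0 = 3.0
w = [0.1] * NUM_FEATURES
V = [[0.1 * (i + 1), -0.05 * i] for i in range(NUM_FEATURES)]

x = {0: 1.0, 3: 1.0, 6: 1.0, 9: 20.0}  # the sample "5 0:1 3:1 6:1 9:20"
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Note how the raw price of 20 multiplies every interaction it takes part in, which is one reason the Discussion below suggests transforming it.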
The command then prints some output to the terminal, and the predictions are written to the file 'output.libfm'.
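To read the predictions back next to the samples they belong to, a small helper is enough. This sketch assumes that the output file holds one prediction per line, in the same order as the rows of the test file; `pair_predictions` is a hypothetical name, and the prediction values in the usage note are illustrative only.

```python
def pair_predictions(test_lines, prediction_lines):
    """Attach each prediction (one number per line) to its test row, in file order."""
    test_rows = [line.strip() for line in test_lines if line.strip()]
    predictions = [float(p) for p in prediction_lines if p.strip()]
    if len(test_rows) != len(predictions):
        raise ValueError("expected exactly one prediction per test row")
    return list(zip(test_rows, predictions))
```

With the real files, something like `pair_predictions(open("test.libfm"), open("output.libfm"))` returns a list of `(test row, prediction)` tuples.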
Discussions
This is of course a toy example, but it shows what you need in order to train a model with libFM.
I wouldn't recommend using the price feature as is; it is better to apply a transformation such as a log to avoid having a feature with large values. But I hope you get the point.
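One way to follow this advice, sketched under the assumption that the price sits at index 9 as in this toy example: replace each price with log(1 + price) via `math.log1p`, which also stays defined for a price of 0. The use of log1p, the four decimal places, and the helper name `log_price` are illustrative choices, not something libFM requires.

```python
import math

PRICE_INDEX = 9  # index of the price feature in this toy example


def log_price(line, price_index=PRICE_INDEX):
    """Rewrite one sparse line, replacing the price value with log(1 + price)."""
    target, *pairs = line.split()
    out = [target]
    for pair in pairs:
        index, value = pair.split(":")
        if int(index) == price_index:
            value = f"{math.log1p(float(value)):.4f}"
        out.append(f"{index}:{value}")
    return " ".join(out)


print(log_price("5 0:1 3:1 6:1 9:20"))  # 5 0:1 3:1 6:1 9:3.0445
```

Lines without a price, such as `0 0:1 5:1 6:1`, pass through unchanged.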