What is the difference between libFM and libFFM?

https://www.kaggle.com/users/25112/steffen-rendle/forum

Congratulations to Yu-Chin, Wei-Sheng, Yong and Michael!

There have been several questions about the relationship between FM and FFM. Here are my thoughts about the differences and similarities.

Notation

m categorical variables (="fields")
k is the factorization dimension of FM
k' is the factorization dimension of FFM

Models (slightly simplified)

FM is defined as

y(x) = sum_i sum_j>i 〈v_i,v_j〉 x_i x_j

FFM is defined as

y(x) = sum_i sum_j>i 〈v^J_i,v^I_j〉 x_i x_j

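Here I and J denote the fields of variables i and j, and v^J_i is the factor vector that variable i uses when it interacts with a variable from field J. The difference between the two models is that FFM assumes that the factors of a variable in different interactions (e.g. v_i of (I,J) and v_i of (I,L)) are independent, whereas FM uses a shared parameter space: the same v_i appears in every interaction of variable i.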
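The two definitions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (one active feature per field, no bias or linear terms); the function names and the nested-list layout `v[i][F]` are made up for this sketch and are not taken from libFM or libFFM.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def fm_score(v, x):
    """FM: one latent vector v[i] per feature, shared across all pairs."""
    n = len(x)
    return sum(dot(v[i], v[j]) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def ffm_score(v, field, x):
    """FFM: v[i][F] is the vector feature i uses against a partner from field F."""
    n = len(x)
    return sum(dot(v[i][field[j]], v[j][field[i]]) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

random.seed(0)
m, k = 3, 2                      # 3 fields, one active feature per field
field = [0, 1, 2]                # field[i] = field of feature i
x = [1.0, 1.0, 1.0]              # indicators of the active features
shared = [[random.gauss(0, 1) for _ in range(k)] for _ in range(m)]

# An FFM whose field-specific vectors are all copies of the FM vectors
# collapses to the FM: this is the "shared parameter space" case.
tied = [[shared[i] for _ in range(m)] for i in range(m)]
fm_value = fm_score(shared, x)
ffm_value = ffm_score(tied, field, x)
```

With tied vectors both scores agree; as soon as `v[i][F]` is allowed to differ per partner field F, the FFM has more freedom than the FM of the same dimension.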

Number of parameters and costs

FFM has k' * (m-1) parameters per predictor variable.
FM has k parameters per predictor variable.
FFM has a runtime complexity of k' * m * (m-1) / 2 = O(k' * m^2) per training example
FM has a runtime complexity of k * m = O(k * m) per training example (because the nested sums can be decomposed due to parameter sharing).

That means that, from a cost point of view, an FFM with dimensionality k' should be compared to an FM with an m times larger dimension, i.e. k=k'*m. With this choice FFM and FM have about the same number of parameters per predictor variable (k'*(m-1) versus k'*m, i.e. similar memory costs) and the same order of runtime complexity, O(k' * m^2).
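The remark that the nested sums of FM can be decomposed can be made concrete. The sketch below compares the naive double loop with the well-known reformulation sum_{i<j} <v_i,v_j> x_i x_j = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]. Function names are illustrative only, and the parameter-count helpers simply restate the formulas above.

```python
import random

def fm_pairwise_naive(v, x):
    """Pairwise FM term by explicit double loop: O(k * m^2)."""
    n, k = len(x), len(v[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(v[i][f] * v[j][f] for f in range(k)) * x[i] * x[j]
    return total

def fm_pairwise_fast(v, x):
    """Same quantity in O(k * m), using the shared vectors v[i]."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))            # sum_i v_if x_i
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))  # sum_i (v_if x_i)^2
        total += 0.5 * (s * s - s_sq)
    return total

def ffm_params_per_variable(k_prime, m):
    """k' * (m - 1): one vector per partner field."""
    return k_prime * (m - 1)

def fm_params_per_variable(k):
    """k: a single shared vector."""
    return k

random.seed(0)
m, k = 6, 4
v = [[random.gauss(0, 1) for _ in range(k)] for _ in range(m)]
x = [random.random() for _ in range(m)]
naive, fast = fm_pairwise_naive(v, x), fm_pairwise_fast(v, x)
```

The decomposition relies on every pair using the same v_i; with field-specific vectors, as in FFM, it is not available, which is where the extra factor of m in the runtime comes from.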

Expressiveness

FFM and FM make different assumptions about the interaction matrix V. But given a large enough k and k', both can represent any possible second-order polynomial model.

The motivation of FM and FFM is to approximate the (unobserved) pairwise interaction matrix W of polynomial regression by a low-rank solution V*V^t = W. FM and FFM make different assumptions about what V looks like. FFM assumes that V has a block structure:

         | v^2_1  v^1_2  0      0      0     |
         | v^3_1  0      v^1_3  0      0     |
         | v^4_1  0      0      v^1_4  0     |
V(FFM) = | v^5_1  0      0      0      v^1_5 |
         | 0      v^3_2  v^2_3  0      0     |
         | 0      v^4_2  0      v^2_4  0     |
         | ...                               |

FM does not assume such a structure:

V(FM) = | v_1 v_2 v_3 v_4 v_5 |

(Note that v are not scalars but vectors of length k' (for FFM) or of length k (for FM). Also to shorten notation, one entry v in the matrices above represents all the v vectors of a "field"/ categorical variable.)

If the assumption of FFM holds, then FFM needs fewer parameters than FM to describe V because FM would need parameters to capture the 0s.

If the assumption of FM holds, then FM needs fewer parameters than FFM to describe V because FFM would need to repeat values of vectors as it requires separate parameters.
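The block structure can be checked numerically on a small case. The sketch below assumes one variable per field and reads the FFM matrix above with one block row per pair of fields and one column per field, so that the interaction weight of fields I and J is the inner product of columns I and J (the transposed layout corresponds to the V*V^t notation). The layout `v[I][J]` and all names are choices made for this illustration.

```python
import itertools
import random

random.seed(1)
m, k_prime = 4, 2   # number of fields and FFM dimension k'

# v[I][J]: vector of field I used when interacting with field J (v^J_I).
v = [[[random.gauss(0, 1) for _ in range(k_prime)] for _ in range(m)]
     for _ in range(m)]

# Build the block matrix: each pair (I, J) contributes k' rows that are zero
# everywhere except in columns I and J.
pairs = list(itertools.combinations(range(m), 2))
rows = []
for I, J in pairs:
    for f in range(k_prime):
        row = [0.0] * m
        row[I] = v[I][J][f]
        row[J] = v[J][I][f]
        rows.append(row)

def column_inner_product(rows, a, b):
    """Entry (a, b) of the Gram matrix of the columns."""
    return sum(r[a] * r[b] for r in rows)

# Every off-diagonal Gram entry should equal the FFM weight <v^J_I, v^I_J>.
max_error = max(
    abs(column_inner_product(rows, I, J)
        - sum(p * q for p, q in zip(v[I][J], v[J][I])))
    for I, J in pairs
)
nonzeros = sum(1 for r in rows for value in r if value != 0.0)
```

The number of non-zero entries is k' * m * (m-1), i.e. k' * (m-1) per field, which matches the parameter count given earlier; a dense FM matrix with the same number of rows would have to spend parameters on the positions that FFM fixes to 0.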

==============================

You are very welcome!

(2) They are similar but not the same. The FM model in the paper you provided
is field-unaware. The difference between equation 1 in the paper and the
formula on page 14 of our slides is that our w is not only indexed by j1 and
j2, but also indexed by f1 and f2. Consider the example on page 15: if
Rendle's FM is applied, it becomes:

  w_376^T w_248 x_376 x_248 + w_376^T w_571 x_376 x_571 + w_376^T w_942 x_376 x_942
+ w_248^T w_571 x_248 x_571 + w_248^T w_942 x_248 x_942
+ w_571^T w_942 x_571 x_942

BTW, we use k = 4, not k = 2.

Please let me know if you have more questions. :)

Inspector wrote:

1) I think this helped a lot. I was confused what the hashing trick was doing. I was thinking perhaps the value of a feature, say 5a9ed9b0, was REPLACED by an integer. So, I understand now that one-hot-encoding is still being used, it is just the indexing of the data which is improved (memory wise) when hashed.

2) So this is essentially the same model as shown in the Rendle paper http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf (equation 1) , where you used k=2 (the number of factors)? Is this correct?

Thanks very much!!
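The point about the hashing trick in 1) can be illustrated with a small sketch: the categorical value is not replaced by a number, it only determines which position of a fixed-size one-hot vector is set to 1. The hash function (MD5), the field names and the number of bins below are assumptions chosen for illustration, not what the winning solution used.

```python
import hashlib

def hashed_index(field_name, value, num_bins):
    """Map a (field, value) pair to a column index in [0, num_bins)."""
    key = f"{field_name}={value}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()   # deterministic across runs
    return int(digest, 16) % num_bins

def encode_row(row, num_bins):
    """Sparse one-hot encoding: sorted column indices whose value is 1."""
    return sorted({hashed_index(name, value, num_bins)
                   for name, value in row.items()})

# Hypothetical record with two categorical fields.
record = {"C1": "5a9ed9b0", "C2": "a73ee510"}
active_columns = encode_row(record, num_bins=2 ** 20)
```

No dictionary from values to indices has to be stored, which is where the memory saving comes from; the price is that two different values can collide in the same column.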

Some hints about the usage of libFM:

@Kapil: The order of features in the design matrix has no effect on the model, but you should of course use the same ordering in the training and test sets and in each line of each file. Theoretically there might be a difference because the learning algorithm iterates from the first to the last feature, so changing the order might change the convergence slightly.

about K2: The larger K2, the more complex the model gets. Usually, the larger K2, the better, but too large values can also overfit. So start with small values of K2 and increase it (e.g. double it) until you get the best quality (on your holdout set). Runtime depends linearly on K2.
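The doubling strategy for K2 could look like the sketch below. `holdout_error` is a hypothetical callback that trains a model with the given K2 and returns its error on the holdout set; how libFM is invoked is deliberately left out, and `toy_error` is a made-up curve used only to exercise the loop.

```python
def search_k2(holdout_error, start=2, max_k2=256):
    """Double K2 until the holdout error stops improving."""
    best_k2, best_error = start, holdout_error(start)
    k2 = start * 2
    while k2 <= max_k2:
        error = holdout_error(k2)
        if error >= best_error:      # no improvement: larger K2 starts to overfit
            break
        best_k2, best_error = k2, error
        k2 *= 2
    return best_k2, best_error

def toy_error(k2):
    """Made-up holdout curve: improves up to K2=16, then gets worse."""
    return abs(k2 - 16) / 16 + 0.4

best = search_k2(toy_error)
```

Since runtime grows linearly with K2, each step of the search costs about twice as much as the previous one, so stopping at the first non-improving value keeps the total cost bounded.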

about generating libFM files: If your data is purely categorical and in some kind of CSV or TSV format, you can also use the Perl-script in the "script/"-folder of libFM to generate libFM-compatible files.
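For readers who prefer not to use Perl, the conversion can be sketched in Python. The sketch assumes the usual sparse text format (one example per line: the target followed by `index:value` pairs) and a CSV file with a header row; the function name, the column names and the 0-based index assignment are illustrative assumptions and this is not a port of the script shipped with libFM.

```python
import csv
import io

def csv_to_libfm(lines, target_column, index_of=None):
    """Convert purely categorical CSV rows to 'target idx:1 idx:1 ...' lines.

    index_of maps (column, value) pairs to feature indices; pass the mapping
    built on the training file when converting the test file.
    """
    index_of = {} if index_of is None else index_of
    output = []
    for record in csv.DictReader(lines):
        indices = []
        for column, value in record.items():
            if column == target_column:
                continue
            # Assign the next free index the first time a pair is seen.
            indices.append(index_of.setdefault((column, value), len(index_of)))
        features = " ".join(f"{i}:1" for i in sorted(indices))
        output.append(f"{record[target_column]} {features}")
    return output, index_of

sample = io.StringIO("click,site,device\n1,a.com,phone\n0,b.com,phone\n")
converted, mapping = csv_to_libfm(sample, "click")
```

Reusing `mapping` for the test file is what guarantees that training and test data agree on the feature indices.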

about "linear regression" and libFM: A factorization machine (=FM) includes linear regression. E.g. if you choose K2=0, then libFM does exactly the same as linear regression. If you choose K2>0, then an FM is "linear regression + second order polynomial regression with factorized pairwise interactions".
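The relationship to linear regression can be seen in a small sketch of the full model equation, y(x) = w0 + sum_i w_i x_i + sum_i sum_j>i <v_i,v_j> x_i x_j. The function and variable names are illustrative; with K2=0 each v_i is empty and only the linear part remains.

```python
def fm_predict(w0, w, v, x):
    """Bias + linear terms + factorized pairwise interactions."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # zip over empty vectors (K2=0) yields an inner product of 0
            inner = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += inner * x[i] * x[j]
    return linear + pairwise

w0, w, x = 0.5, [1.0, -2.0, 3.0], [1.0, 0.0, 2.0]

# K2 = 0: no factors, the model is plain linear regression.
linear_only = fm_predict(w0, w, [[] for _ in x], x)

# K2 = 1: the pair (x_0, x_2) contributes <v_0, v_2> * x_0 * x_2 = 0.5 * 2.
with_factors = fm_predict(w0, w, [[1.0], [2.0], [0.5]], x)
```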
