https://www.kaggle.com/users/25112/steffen-rendle/forum

Congratulations to Yu-Chin, Wei-Sheng, Yong and Michael!

There have been several questions about the relationship between FM and FFM. Here are my thoughts about the differences and similarities.

Notation

  • m categorical variables (="fields")
  • k is the factorization dimension of FM
  • k' is the factorization dimension of FFM

Models (slightly simplified)

  • FM is defined as

      y(x) = sum_i sum_j>i 〈v_i,v_j〉 x_i x_j

  • FFM is defined as

      y(x) = sum_i sum_j>i 〈v^J_i,v^I_j〉 x_i x_j

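The difference between the two models is that FFM assumes that the factors used in different interactions (e.g. v_i of (I,J) and v_i of (I,L)) are independent, whereas FM uses a shared parameter space. Here I, J and L denote fields: v^J_i is the factor vector that variable i (from field I) uses when it interacts with a variable from field J.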
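To make the two definitions concrete, here is a minimal NumPy sketch of both pairwise terms. The array layouts (`V` of shape (n, k) for FM, `V` of shape (n, m, k') plus a `field` lookup for FFM) and the function names are my own assumptions for illustration, not the layout used by libFM or libFFM.

```python
import numpy as np

def fm_pairwise(x, V):
    """FM: sum over i<j of <V[i], V[j]> * x[i] * x[j]; V has shape (n, k)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total

def ffm_pairwise(x, field, V):
    """FFM: variable i uses the vector it keeps for the field of j, and vice versa.

    V has shape (n, m, k'); field[i] is the field index of variable i.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(V[i, field[j]] @ V[j, field[i]]) * x[i] * x[j]
    return total
```

If every variable stores the same vector for all fields, the FFM term collapses to the FM term, which is one way to see that FM is the parameter-sharing case.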

Number of parameters and costs

  • FFM has k' * (m-1) parameters per predictor variable.
  • FM has k parameters per predictor variable.
  • FFM has a runtime complexity of k' * m * (m-1) / 2 = O(k' * m^2) per training example
  • FM has a runtime complexity of k * m = O(k * m) per training example (because the nested sums can be decomposed due to parameter sharing).

That means from a cost point of view, an FFM with dimensionality k' should be compared to an FM with an m times larger dimension, i.e. k=k'*m. With this choice FFM and FM have roughly the same number of parameters (k' * (m-1) versus k' * m per predictor variable, i.e. the same memory costs) and the same order of runtime complexity (O(k' * m^2) per training example).
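The O(k * m) cost of FM comes from a standard rewriting of the nested sum: sum_i sum_j>i 〈v_i,v_j〉 x_i x_j = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]. The sketch below checks this identity numerically and spells out the per-variable parameter counts from the list above. Function names and array shapes are assumptions of mine.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """Nested-sum form, quadratic in the number of variables."""
    n = len(x)
    return sum(float(V[i] @ V[j]) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def fm_pairwise_fast(x, V):
    """Decomposed form: linear in the number of non-zero variables times k."""
    s = V.T @ x                      # shape (k,): sum_i v_if * x_i
    s_sq = (V.T ** 2) @ (x ** 2)     # shape (k,): sum_i v_if^2 * x_i^2
    return 0.5 * float(np.sum(s * s - s_sq))

def ffm_params_per_variable(m, k_ffm):
    """k' * (m - 1): one k'-vector for each other field."""
    return k_ffm * (m - 1)

def fm_params_per_variable(k_fm):
    """k: a single shared vector."""
    return k_fm

rng = np.random.default_rng(0)
x = rng.random(8)
V = rng.normal(size=(8, 5))
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))  # the two values agree
```

For example, with m = 10 fields and k' = 4, FFM stores 36 values per variable, and the cost-matched FM uses k = k' * m = 40.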

Expressiveness

FFM and FM have different assumptions on the interaction matrix V. But given a large enough k and k', both can represent any possible second order polynomial model.

The motivation of FM and FFM is to approximate the (unobserved) pairwise interaction matrix W of polynomial regression by a low rank solution V*V^t = W. FM and FFM make different assumptions about what V looks like. FFM assumes that V has a block structure:


         | v^2_1  v^1_2  0      0      0     |
         | v^3_1  0      v^1_3  0      0     |
         | v^4_1  0      0      v^1_4  0     |
V(FFM) = | v^5_1  0      0      0      v^1_5 |
         | 0      v^3_2  v^2_3  0      0     |
         | 0      v^4_2  0      v^2_4  0     |
         | ...                               |

FM does not assume such a structure:

V(FM) = | v_1  v_2  v_3 v_4 v_5 |

(Note that v are not scalars but vectors of length k' (for FFM) or of length k (for FM). Also to shorten notation, one entry v in the matrices above represents all the v vectors of a "field"/ categorical variable.)

If the assumption of FFM holds, then FFM needs fewer parameters than FM to describe V because FM would need parameters to capture the 0s.

If the assumption of FM holds, then FM needs fewer parameters than FFM to describe V because FFM would need to repeat values of vectors as it requires separate parameters.
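The block structure can be checked numerically. The sketch below is a simplification of my own: it assumes one variable per field, stacks one block row of height k' per field pair (I,J) as in the V(FFM) layout above, and verifies that the inner product of two columns reproduces the FFM interaction weight. With rows indexed by field pairs, the product is taken as V^t * V.

```python
import itertools
import numpy as np

def ffm_block_matrix(v):
    """Build V(FFM) for one variable per field.

    v[i, j] is the k'-vector that field i keeps for interactions with field j;
    v has shape (m, m, k'). The result has one block row per field pair.
    """
    m, _, k = v.shape
    pairs = list(itertools.combinations(range(m), 2))
    V = np.zeros((len(pairs) * k, m))
    for r, (i, j) in enumerate(pairs):
        V[r * k:(r + 1) * k, i] = v[i, j]   # v^J_I in column I
        V[r * k:(r + 1) * k, j] = v[j, i]   # v^I_J in column J
    return V

rng = np.random.default_rng(0)
m, k = 5, 2
v = rng.normal(size=(m, m, k))
V = ffm_block_matrix(v)
W = V.T @ V  # off-diagonal W[i, j] equals <v[i, j], v[j, i]>
```

Only two of the m columns are non-zero in each block row. Those zeros are what an FM would have to spend parameters on to represent the same V.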

==============================

You are very welcome!

(2) They are similar but not the same. The FM model in the paper you provided is field-unaware. The difference between equation 1 in the paper and the formula on page 14 of our slides is that our w is indexed not only by j1 and j2, but also by f1 and f2. Consider the example on page 15: if Rendle's FM is applied, it becomes:

  w_376^T w_248 x_376 x_248 + w_376^T w_571 x_376 x_571 + w_376^T w_942 x_376 x_942
+ w_248^T w_571 x_248 x_571 + w_248^T w_942 x_248 x_942
+ w_571^T w_942 x_571 x_942
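The six terms above are simply all pairs of the four active features. A small sketch that enumerates them, assuming one-hot inputs (every active x equals 1) and random placeholder vectors with k = 4; the function names are hypothetical:

```python
import itertools
import numpy as np

def fm_pairs(active):
    """All feature pairs (j1, j2) of a field-unaware FM, in input order."""
    return list(itertools.combinations(active, 2))

def fm_score(active, w):
    """Sum of w_j1^T w_j2 over all active pairs, with x_j = 1 for active j."""
    return sum(float(w[j1] @ w[j2]) for j1, j2 in fm_pairs(active))

active = [376, 248, 571, 942]
rng = np.random.default_rng(0)
w = {j: rng.normal(size=4) for j in active}  # placeholder latent vectors, k = 4
for j1, j2 in fm_pairs(active):
    print(f"w_{j1}^T w_{j2} x_{j1} x_{j2}")
```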

BTW, we use k = 4, not k = 2.

Please let me know if you have more questions. :)

Inspector wrote:

1) I think this helped a lot. I was confused what the hashing trick was doing. I was thinking perhaps the value of a feature, say 5a9ed9b0, was REPLACED by an integer. So, I understand now that one-hot-encoding is still being used, it is just the indexing of the data which is improved (memory wise) when hashed.

2) So this is essentially the same model as shown in the Rendle paper http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf (equation 1), where you used k=2 (the number of factors)? Is this correct?

Thanks very much!!
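On point 1 of the quoted post, a minimal sketch of the hashing trick may help: the categorical value is not replaced by a number, it is only mapped to a column index, and the feature is still one-hot (value 1 at that index). The hash function (MD5), the table size and the field names below are assumptions chosen for illustration, not what any particular solution used.

```python
import hashlib

def hashed_index(field_name, value, num_bins):
    """Map a (field, value) pair to a column index in [0, num_bins)."""
    key = f"{field_name}={value}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % num_bins

def encode_row(row, num_bins):
    """row: dict field -> categorical value. Returns the active column indices.

    Each returned index carries the value 1 (one-hot); collisions are possible
    and become more likely as num_bins shrinks.
    """
    return sorted({hashed_index(f, v, num_bins) for f, v in row.items()})

row = {"C1": "5a9ed9b0", "C2": "68fd1e64"}  # hypothetical field names and values
print(encode_row(row, num_bins=2 ** 20))
```

The memory saving comes from not having to keep a dictionary from every distinct value to its index.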

Some hints about the usage of libFM:

@Kapil: The order of features in the design matrix has no effect on the model; of course you should use the same ordering in the training and test sets and in each line of each file. Theoretically there might be a difference because the learning algorithm iterates from the first to the last feature, so changing the order might change the convergence slightly.

about K2: The larger K2, the more complex the model gets. Usually, the larger K2, the better, but too large values can also overfit. So start with small values of K2 and increase it (e.g. double it) until you get the best quality (on your holdout set). Runtime depends linearly on K2.
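The doubling strategy can be sketched as a small loop. Here `evaluate` is a hypothetical callable that you would supply yourself: it trains a model with the given K2 and returns the holdout error (lower is better). It is not part of libFM, and stopping at the first non-improvement is my simplification.

```python
def search_k2(evaluate, start=2, max_k2=256):
    """Double K2 until the holdout error stops improving.

    evaluate(k2) -> holdout error (lower is better); supplied by the caller.
    Returns the best K2 seen and its error.
    """
    best_k2, best_err = start, evaluate(start)
    k2 = start * 2
    while k2 <= max_k2:
        err = evaluate(k2)
        if err >= best_err:      # no improvement: larger K2 starts to overfit
            break
        best_k2, best_err = k2, err
        k2 *= 2
    return best_k2, best_err
```

Since runtime grows linearly with K2, each step of this loop costs about as much as all previous steps combined.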

about generating libFM files: If your data is purely categorical and in some kind of CSV or TSV format, you can also use the Perl-script in the "script/"-folder of libFM to generate libFM-compatible files.
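If you prefer not to use the Perl script, the conversion is easy to sketch in Python. This is not a port of that script, only an illustration under two assumptions: the input is a list of purely categorical rows, and the output is the libSVM-style text format ("target index:value ...") that libFM reads.

```python
def to_libfm_lines(rows, targets, ids=None):
    """Convert categorical rows to libSVM-style lines.

    rows: list of lists of categorical values (same column order in every row).
    targets: list of target values.
    ids: optional existing mapping (column, value) -> feature index; pass the
         mapping built on the training set when converting the test set.
    """
    ids = {} if ids is None else ids
    lines = []
    for row, y in zip(rows, targets):
        feats = []
        for col, val in enumerate(row):
            key = (col, val)
            if key not in ids:
                ids[key] = len(ids)   # next free feature index
            feats.append(ids[key])
        lines.append(" ".join([str(y)] + [f"{i}:1" for i in sorted(feats)]))
    return lines, ids

lines, ids = to_libfm_lines([["u1", "m1"], ["u2", "m1"]], [5, 3])
print("\n".join(lines))
```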

about "linear regression" and libFM: A factorization machine (=FM) includes linear regression. E.g. if you choose K2=0, then libFM does exactly the same as linear regression. If you choose K2>0, then an FM is "linear regression + second order polynomial regression with factorized pairwise interactions".
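The relationship to linear regression can be seen directly in a small sketch of the full model equation. The parameter names `w0`, `w` and `V` are my own notation for the bias, the linear weights and the (n, K2) factor matrix; this is an illustration, not libFM's implementation.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Bias + linear terms + factorized pairwise interactions.

    V has shape (n, K2). With K2 = 0 the pairwise part is an empty sum,
    so the prediction is exactly that of linear regression.
    """
    linear = w0 + float(w @ x)
    s = V.T @ x                                   # shape (K2,)
    pairwise = 0.5 * float(np.sum(s * s - (V.T ** 2) @ (x ** 2)))
    return linear + pairwise

rng = np.random.default_rng(0)
n = 5
x, w, w0 = rng.random(n), rng.normal(size=n), 0.3
print(fm_predict(x, w0, w, np.zeros((n, 0))))      # K2 = 0: linear regression
print(fm_predict(x, w0, w, rng.normal(size=(n, 8))))  # K2 = 8: adds interactions
```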
