simrank python实现
1、数据
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
2、simrank 的python实现
import numpy as np
from numpy import matrix with open('sample1 (1).txt','r') as log_fp:
logs = [log.strip() for log in log_fp.readlines()]
# print(logs)
logs_tuple = [tuple(log.split(",")) for log in logs]
# print (logs_tuple) queries = list(set([log[0] for log in logs_tuple]))
# print(queries) #['digital camera', 'flower', 'pc', 'camera', 'tv']
ads = list(set([log[1] for log in logs_tuple]))
# print(ads)#['hp.com', 'teleflora.com', 'bestbuy.com', 'orchids.com'] graph = np.matrix(np.zeros([len(queries),len(ads)]))
# print(graph) #6行4列的0矩阵 for log in logs_tuple:
query = log[0]
ad = log[1]
q_i = queries.index(query)
a_j = ads.index(ad)
graph[q_i,a_j] +=1
print(graph) query_sim = matrix(np.identity(len(queries)))
print(query_sim)
ad_sim = matrix(np.identity(len(ads)))
print(ad_sim) def get_ads_num(query):
q_i = queries.index(query)
return graph[q_i] def get_queries_num(ad):
a_j = ads.index(ad)
return graph.transpose()[a_j] def get_ads(query):
series = get_ads_num(query).tolist()[0]
return [ads[x] for x in range(len(series)) if series[x] > 0] def get_queries(ad):
series = get_queries_num(ad).tolist()[0]
return [queries[x] for x in range(len(series)) if series[x] > 0] def query_simrank(q1,q2,c):
if q1 == q2 :
return 1
prefix = c/(get_ads_num(q1).sum() *get_ads_num(q2).sum())
postfix = 0
for ad_i in get_ads(q1):
for ad_j in get_ads(q2):
i = ads.index(ad_i)
j = ads.index(ad_j)
postfix += ad_sim[i,j]
return prefix*postfix def ad_simrank(a1,a2,c):
if a1 == a2 :
return 1
prefix = c/(get_queries_num(a1).sum()*get_queries_num(a2).sum())
postfix = 0
for query_i in get_queries(a1):
for query_j in get_queries(a2):
i = queries.index(query_i)
j = queries.index(query_j)
postfix += query_sim[i,j]
return prefix*postfix def simrank(c=0.8,times = 1):
global query_sim,ad_sim for run in range(times):
new_query_sim = matrix(np.identity(len(queries)))
for qi in queries:
for qj in queries:
i = queries.index(qi)
j = queries.index(qj)
new_query_sim[i,j] =query_simrank(qi,qj,c) new_ad_sim = matrix(np.identity(len(ads)))
for ai in ads:
for aj in ads :
i = ads.index(ai)
j = ads.index(aj)
new_ad_sim[i,j] =ad_simrank(ai,aj,c) query_sim = new_query_sim
ad_sim = new_ad_sim if __name__ == '__main__':
print (queries)
print(ads)
simrank()
print(query_sim)
print(ad_sim)
[[15. 0. 0. 0.]
[ 0. 0. 10. 0.]
[ 5. 0. 20. 0.]
[ 7. 0. 30. 0.]
[ 0. 16. 0. 15.]]
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
['tv', 'pc', 'camera', 'digital camera', 'flower']
['bestbuy.com', 'teleflora.com', 'hp.com', 'orchids.com']
[[1. 0. 0.00213333 0.00144144 0. ]
[0. 1. 0.0032 0.00216216 0. ]
[0.00213333 0.0032 1. 0.00172973 0. ]
[0.00144144 0.00216216 0.00172973 1. 0. ]
[0. 0. 0. 0. 1. ]]
[[1.00000000e+00 0.00000000e+00 9.87654321e-04 0.00000000e+00]
[0.00000000e+00 1.00000000e+00 0.00000000e+00 3.33333333e-03]
[9.87654321e-04 0.00000000e+00 1.00000000e+00 0.00000000e+00]
[0.00000000e+00 3.33333333e-03 0.00000000e+00 1.00000000e+00]]
simrank python实现的更多相关文章
- Python中的多进程与多线程(一)
一.背景 最近在Azkaban的测试工作中,需要在测试环境下模拟线上的调度场景进行稳定性测试.故而重操python旧业,通过python编写脚本来构造类似线上的调度场景.在脚本编写过程中,碰到这样一个 ...
- Python高手之路【六】python基础之字符串格式化
Python的字符串格式化有两种方式: 百分号方式.format方式 百分号的方式相对来说比较老,而format方式则是比较先进的方式,企图替换古老的方式,目前两者并存.[PEP-3101] This ...
- Python 小而美的函数
python提供了一些有趣且实用的函数,如any all zip,这些函数能够大幅简化我们得代码,可以更优雅的处理可迭代的对象,同时使用的时候也得注意一些情况 any any(iterable) ...
- JavaScript之父Brendan Eich,Clojure 创建者Rich Hickey,Python创建者Van Rossum等编程大牛对程序员的职业建议
软件开发是现时很火的职业.据美国劳动局发布的一项统计数据显示,从2014年至2024年,美国就业市场对开发人员的需求量将增长17%,而这个增长率比起所有职业的平均需求量高出了7%.很多人年轻人会选择编 ...
- 可爱的豆子——使用Beans思想让Python代码更易维护
title: 可爱的豆子--使用Beans思想让Python代码更易维护 toc: false comments: true date: 2016-06-19 21:43:33 tags: [Pyth ...
- 使用Python保存屏幕截图(不使用PIL)
起因 在极客学院讲授<使用Python编写远程控制程序>的课程中,涉及到查看被控制电脑屏幕截图的功能. 如果使用PIL,这个需求只需要三行代码: from PIL import Image ...
- Python编码记录
字节流和字符串 当使用Python定义一个字符串时,实际会存储一个字节串: "abc"--[97][98][99] python2.x默认会把所有的字符串当做ASCII码来对待,但 ...
- Apache执行Python脚本
由于经常需要到服务器上执行些命令,有些命令懒得敲,就准备写点脚本直接浏览器调用就好了,比如这样: 因为线上有现成的Apache,就直接放它里面了,当然访问安全要设置,我似乎别的随笔里写了安全问题,这里 ...
- python开发编译器
引言 最近刚刚用python写完了一个解析protobuf文件的简单编译器,深感ply实现词法分析和语法分析的简洁方便.乘着余热未过,头脑清醒,记下一点总结和心得,方便各位pythoner参考使用. ...
随机推荐
- python3.0笔记
python文件头 #!/usr/bin/env python # -*- coding: utf- -*- ''' Created on 2017年5月9日 @author: Administrat ...
- Mysql数据库密码忘记的解决办法
密码忘记——破解密码 跳过授权方式,直接登录!! 1.以管理员身份打开cmd 2.停掉mysql服务端 C:\WINDOWS\system32>net stop mysql MySQL 服务正在 ...
- EntityFrameworkCore.MySql
1.点击“工具”->“NuGet包管理器”->“程序包管理器控制台” 分别安装以下几个包 Mysql 版本: Install-Package MySql.Data.EntityFramew ...
- Series.str方法
1 对dataframe的某一列用str处理后,其类型是<class 'pandas.core.strings.StringMethods'>.可以对df.['列名'].str直接进行切片 ...
- 16/7/7_PHP-面向对象关键词
转载地址: http://blog.sina.com.cn/s/blog_5182b171010092fb.html PHP5 是一具备了大部分面向对象语言的特性的语言,比PHP4 有了很多的面向对象 ...
- Mac018--VisualBox & ubuntu 安装
一.安装虚拟机VMware 参考博客:https://blog.csdn.net/u013142781/article/details/50529030 Step1:下载ubuntu镜像 注:选择Ub ...
- python3爬虫之urllib初探
urllib主要包含request(请求模块).error(异常处理模块).parse(工具模块).robotparser(识别网站的robots.txt文件,是否允许爬取). request(请求模 ...
- Java 实现Excel的简单读取操作
JAVA实现Excel表单的简单读取操作 实现Excel表单的简单读取操作,首先要导入相关的jar包: 如图所示: 此处贴上代码: public static List<List<Stri ...
- postman从body,headers,data中获取token后回写做全局变量
设置全局变量
- display:table
display:table的CSS声明能够让一个HTML元素和它的子节点像table元素一样.使用基于表格的CSS布局,使我们能够轻松定义一个单元格的边界.背景等样式,而不会产生因为使用了table那 ...