【346】TF-IDF

>>> from sklearn.feature_extraction.text import TfidfTransformer
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = ["I come to China to travel",
...           "This is a car polupar in China",
...           "I love tea and Apple ",
...           "The work is to write some papers in science"]
>>> vectorizer = CountVectorizer()
>>> transformer = TfidfTransformer()
>>> tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))
>>> print(tfidf)

  (0, 16)	0.4424621378947393

  (0, 15)	0.697684463383976

  (0, 4)	0.4424621378947393

  (0, 3)	0.348842231691988

  (1, 14)	0.45338639737285463

  (1, 9)	0.45338639737285463

  (1, 6)	0.3574550433419527

  (1, 5)	0.3574550433419527

  (1, 3)	0.3574550433419527

  (1, 2)	0.45338639737285463

  (2, 12)	0.5

  (2, 7)	0.5

  (2, 1)	0.5

  (2, 0)	0.5

  (3, 18)	0.3565798233381452

  (3, 17)	0.3565798233381452

  (3, 15)	0.2811316284405006

  (3, 13)	0.3565798233381452

  (3, 11)	0.3565798233381452

  (3, 10)	0.3565798233381452

  (3, 8)	0.3565798233381452

  (3, 6)	0.2811316284405006

  (3, 5)	0.2811316284405006

>>> print(vectorizer.get_feature_names_out().tolist())    # get_feature_names() was removed in scikit-learn 1.2

['and', 'apple', 'car', 'china', 'come', 'in', 'is', 'love', 'papers', 'polupar', 'science', 'some', 'tea', 'the', 'this', 'to', 'travel', 'work', 'write']

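Note: in the output of print(tfidf), (0, 16) refers to the first document (row 0) and the term with index 16, which is "travel"; the remaining entries are read the same way.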
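The index-to-term mapping can be reproduced without scikit-learn. The sketch below assumes CountVectorizer's default settings (lowercasing, plus the token pattern `\b\w\w+\b`, which keeps only tokens of two or more characters). The helper name `tokenize` is made up for illustration. It also shows why "I" and "a" never appear in the vocabulary.

```python
import re

corpus = ["I come to China to travel",
          "This is a car polupar in China",
          "I love tea and Apple ",
          "The work is to write some papers in science"]

def tokenize(text):
    # Hypothetical helper mimicking CountVectorizer's defaults:
    # lowercase, then keep tokens of at least two word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

# Sorted unique tokens; a term's position in this list is its column index.
vocabulary = sorted({tok for doc in corpus for tok in tokenize(doc)})

print(len(vocabulary))               # number of columns in the matrix
print(vocabulary.index("travel"))    # column index of "travel"
```

Single-character tokens are dropped by the pattern, so the four sentences yield 19 distinct terms rather than 21.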

Continuing from the above, retrieve the TF-IDF value of each term. The tfidf variable holds a 4 × 19 matrix: one row per sentence and one column per term.

>>> tfidf_array = tfidf.toarray()    # dense array; each row is converted to a list below
>>> names_list = vectorizer.get_feature_names_out().tolist()    # list of feature names
>>> for i in range(len(corpus)):
...     print(corpus[i], '\n')
...     tmp_list = tfidf_array[i].tolist()
...     for j in range(len(names_list)):
...         if tmp_list[j] != 0:
...             # Use one tab for long names and two for short ones so the values line up
...             if len(names_list[j]) >= 7:
...                 print(names_list[j], '\t', tmp_list[j])
...             else:
...                 print(names_list[j], '\t\t', tmp_list[j])
...     print('')
...

I come to China to travel 

china 		 0.348842231691988

come 		 0.4424621378947393

to 		 0.697684463383976

travel 		 0.4424621378947393

This is a car polupar in China 

car 		 0.45338639737285463

china 		 0.3574550433419527

in 		 0.3574550433419527

is 		 0.3574550433419527

polupar 	 0.45338639737285463

this 		 0.45338639737285463

I love tea and Apple  

and 		 0.5

apple 		 0.5

love 		 0.5

tea 		 0.5

The work is to write some papers in science 

in 		 0.2811316284405006

is 		 0.2811316284405006

papers 		 0.3565798233381452

science 	 0.3565798233381452

some 		 0.3565798233381452

the 		 0.3565798233381452

to 		 0.2811316284405006

work 		 0.3565798233381452

write 		 0.3565798233381452

>>>
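The values listed above can be checked by hand. The sketch below is not scikit-learn's implementation. It assumes the TfidfTransformer defaults (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`), under which idf = ln((1 + n) / (1 + df)) + 1 and each row is then divided by its Euclidean length. The helpers `tokenize` and `tfidf_row` are hypothetical names used only here.

```python
import math
import re
from collections import Counter

corpus = ["I come to China to travel",
          "This is a car polupar in China",
          "I love tea and Apple ",
          "The work is to write some papers in science"]

def tokenize(text):
    # Assumed to match CountVectorizer's default tokenisation.
    return re.findall(r"\b\w\w+\b", text.lower())

docs = [Counter(tokenize(doc)) for doc in corpus]   # raw term counts per document
n_docs = len(docs)

# Document frequency: in how many documents each term occurs.
df = Counter(term for counts in docs for term in counts)

def tfidf_row(counts):
    # tf * smoothed idf, followed by L2 normalisation of the row.
    weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

rows = [tfidf_row(counts) for counts in docs]
for term in sorted(rows[0]):
    print(term, rows[0][term])
```

In the third sentence all four terms occur once and in no other document, so they share the same weight and the normalised row is 0.5 in every position. In the first sentence "to" occurs twice, which is why its weight is double that of "china" even though both have the same document frequency.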

Getting the TF (Term Frequency)

>>> X = vectorizer.fit_transform(corpus)

>>> X.toarray()

array([[0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0],

       [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],

       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],

       [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1]],

      dtype=int64)

>>> vector_array = X.toarray()
>>> for i in range(len(corpus)):
...     print(corpus[i], '\n')
...     tmp_list = vector_array[i].tolist()
...     for j in range(len(names_list)):
...         if tmp_list[j] != 0:
...             if len(names_list[j]) >= 7:
...                 print(names_list[j], '\t', tmp_list[j])
...             else:
...                 print(names_list[j], '\t\t', tmp_list[j])
...     print('')
...

I come to China to travel 

china 		 1

come 		 1

to 		 2

travel 		 1

This is a car polupar in China 

car 		 1

china 		 1

in 		 1

is 		 1

polupar 	 1

this 		 1

I love tea and Apple  

and 		 1

apple 		 1

love 		 1

tea 		 1

The work is to write some papers in science 

in 		 1

is 		 1

papers 		 1

science 	 1

some 		 1

the 		 1

to 		 1

work 		 1

write 		 1

>>>
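The count matrix shown by `X.toarray()` is just a per-document tally of tokens laid out in vocabulary order. A minimal standard-library sketch follows, again assuming CountVectorizer's default tokenisation; `tokenize` is a hypothetical helper, not part of scikit-learn.

```python
import re
from collections import Counter

corpus = ["I come to China to travel",
          "This is a car polupar in China",
          "I love tea and Apple ",
          "The work is to write some papers in science"]

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())

counts = [Counter(tokenize(doc)) for doc in corpus]

# Columns follow the sorted vocabulary; a Counter returns 0 for missing terms.
vocabulary = sorted({term for c in counts for term in c})
matrix = [[c[term] for term in vocabulary] for c in counts]

for row in matrix:
    print(row)
```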
