WordNet with NLTK

Synonym and antonym functions for English words

    # -*- coding: utf-8 -*-
    """
    Synonym and antonym functions for English words
    """

    import nltk
    from nltk.corpus import wordnet

    syns = wordnet.synsets('program')
    '''
    syns
    Out[11]:
    [Synset('plan.n.01'),
     Synset('program.n.02'),
     Synset('broadcast.n.02'),
     Synset('platform.n.02'),
     Synset('program.n.05'),
     Synset('course_of_study.n.01'),
     Synset('program.n.07'),
     Synset('program.n.08'),
     Synset('program.v.01'),
     Synset('program.v.02')]
    '''

    print(syns[0].name())
    '''
    plan.n.01
    '''

    # Just the word: the name of the first lemma
    print(syns[0].lemmas()[0].name())
    '''
    plan
    '''

    # Example sentences that use the word
    print(syns[0].examples())
    '''
    ['they drew up a six-step plan', 'they discussed plans for a new bond issue']
    '''

    '''
    synonyms = []
    antonyms = []

    list_good = wordnet.synsets("good")
    for syn in list_good:
        for l in syn.lemmas():
            # print('l.name()', l.name())
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())

    print(set(synonyms))
    print(set(antonyms))
    '''

    word = "good"

    # Return the set of synonyms and the set of antonyms of a word
    def Word_synonyms_and_antonyms(word):
        synonyms = []
        antonyms = []
        list_good = wordnet.synsets(word)
        for syn in list_good:
            for l in syn.lemmas():
                # print('l.name()', l.name())
                synonyms.append(l.name())
                # Keep only the first antonym of each lemma, if it has any
                if l.antonyms():
                    antonyms.append(l.antonyms()[0].name())
        return (set(synonyms), set(antonyms))

    # Return the set of synonyms of a word
    def Word_synonyms(word):
        list_synonyms_and_antonyms = Word_synonyms_and_antonyms(word)
        return list_synonyms_and_antonyms[0]

    # Return the set of antonyms of a word
    def Word_antonyms(word):
        list_synonyms_and_antonyms = Word_synonyms_and_antonyms(word)
        return list_synonyms_and_antonyms[1]

    '''
    Word_synonyms("evil")
    Out[43]:
    {'evil',
     'evilness',
     'immorality',
     'iniquity',
     'malefic',
     'malevolent',
     'malign',
     'vicious',
     'wickedness'}

    Word_antonyms('evil')
    Out[44]: {'good', 'goodness'}
    '''

WordNet is a lexical database for the English language. It was created at Princeton University and is available as one of the NLTK corpora.

You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.

First, you're going to need to import wordnet:

    from nltk.corpus import wordnet

Then, we're going to use the term "program" to find synsets (sets of synonyms) like so:

    syns = wordnet.synsets("program")

An example of a synset:

    print(syns[0].name())

plan.n.01
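
The name printed above follows NLTK's `lemma.pos.number` convention: `plan.n.01` is the first noun sense filed under "plan". A minimal sketch of splitting such a name into its parts follows; `parse_synset_name` is a hypothetical helper written for illustration and is not part of NLTK.

```python
def parse_synset_name(name):
    """Split a synset name such as 'plan.n.01' into (lemma, pos, sense_number).

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    # Split from the right so that a lemma containing dots stays intact.
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, pos, int(number)


print(parse_synset_name("plan.n.01"))     # ('plan', 'n', 1)
print(parse_synset_name("program.v.02"))  # ('program', 'v', 2)
```

The part-of-speech letter is `n` for nouns, `v` for verbs, `a` or `s` for adjectives, and `r` for adverbs.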

Just the word:

    print(syns[0].lemmas()[0].name())

plan

Definition of that first synset:

    print(syns[0].definition())

a series of steps to be carried out or goals to be accomplished

Examples of the word in use:

    print(syns[0].examples())

['they drew up a six-step plan', 'they discussed plans for a new bond issue']
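
The snippets above inspect only `syns[0]`, but a word usually has several senses. As a rough sketch, the same three calls can be applied to every synset in the list. `summarize_senses` is a hypothetical helper name. It assumes only that each object offers the `name()`, `definition()` and `examples()` methods used above, so it does not import NLTK itself.

```python
def summarize_senses(synsets):
    """Return one (name, definition, examples) tuple per synset.

    Hypothetical helper; it relies only on the name(), definition() and
    examples() methods shown in the snippets above.
    """
    return [(s.name(), s.definition(), s.examples()) for s in synsets]


def format_senses(synsets):
    """Render one 'name: definition' line per sense."""
    return "\n".join(
        f"{name}: {definition}"
        for name, definition, _examples in summarize_senses(synsets)
    )
```

With NLTK and the WordNet corpus available, `print(format_senses(wordnet.synsets("program")))` would list every sense of "program" next to its definition.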

Next, how might we find the synonyms and antonyms of a word? The lemmas of the word's synsets serve as its synonyms, and you can call .antonyms() on each lemma to find that lemma's antonyms. As such, we can populate some lists like:

    synonyms = []
    antonyms = []

    for syn in wordnet.synsets("good"):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())

    print(set(synonyms))
    print(set(antonyms))

{'beneficial', 'just', 'upright', 'thoroughly', 'in_force', 'well', 'skilful', 'skillful', 'sound', 'unspoiled', 'expert', 'proficient', 'in_effect', 'honorable', 'adept', 'secure', 'commodity', 'estimable', 'soundly', 'right', 'respectable', 'good', 'serious', 'ripe', 'salutary', 'dear', 'practiced', 'goodness', 'safe', 'effective', 'unspoilt', 'dependable', 'undecomposed', 'honest', 'full', 'near', 'trade_good'}
{'evil', 'evilness', 'bad', 'badness', 'ill'}

As you can see, we got many more synonyms than antonyms. Only some lemmas have antonyms recorded, and the loop keeps just the first antonym of each of those. You could easily balance this by also running the exact same process for the term "bad."
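
The loop above stores only the first antonym of each lemma (`l.antonyms()[0]`). A minimal variant that keeps every antonym is sketched below. `synonyms_and_all_antonyms` is a hypothetical name. It takes the synsets as an argument and assumes only the `lemmas()`, `name()` and `antonyms()` methods already used in this post.

```python
def synonyms_and_all_antonyms(synsets):
    """Collect every lemma name and every antonym name found in `synsets`.

    Hypothetical helper: unlike the loop above, it keeps all antonyms of a
    lemma rather than only the first one.
    """
    synonyms = set()
    antonyms = set()
    for syn in synsets:
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # lemma.antonyms() returns a list of lemmas, which may be empty.
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return synonyms, antonyms
```

With NLTK and the WordNet corpus available, it would be called as `synonyms_and_all_antonyms(wordnet.synsets("good"))`. Working with sets from the start avoids the final `set(...)` conversion.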

Comparing word similarity

Next, we can also easily use WordNet to compare the similarity of two word senses, using the Wu and Palmer method for semantic relatedness.

Let's compare the nouns "ship" and "boat":

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('boat.n.01')
    print(w1.wup_similarity(w2))

0.9090909090909091

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('car.n.01')
    print(w1.wup_similarity(w2))

0.6956521739130435

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('cat.n.01')
    print(w1.wup_similarity(w2))

0.38095238095238093
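
To see where scores like these come from, it can help to compute the Wu and Palmer measure by hand. It is defined as 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest ancestor the two senses share. The sketch below applies that formula to a tiny, made-up taxonomy. The `PARENT` table and all function names are assumptions for illustration. This is not NLTK's implementation, and the numbers differ from the WordNet scores above because the real hierarchy is much deeper.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "vessel": "vehicle",
    "ship": "vessel",
    "boat": "vessel",
    "car": "vehicle",
    "animal": "entity",
    "cat": "animal",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def toy_wup_similarity(a, b):
    """Wu-Palmer score on the toy taxonomy: 2*depth(lcs) / (depth(a)+depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(toy_wup_similarity("ship", "boat"))  # 0.75
print(toy_wup_similarity("ship", "car"))
print(toy_wup_similarity("ship", "cat"))
```

The ordering matches the WordNet results: "boat" shares a deep ancestor with "ship", "car" shares a shallower one, and "cat" shares only the root.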

Next, we're going to pick things up a bit and begin to cover the topic of Text Classification.
