Third-Order Markov Chain for Natural Language Processing in Python

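1. Introduction:

Every group of three consecutive words is treated as a single unit during training: for each three-word group, the model records the three words that follow it.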
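As a rough illustration of this pairing, the sketch below lists each three-word group together with the words that follow it for one short sentence. The function name `three_word_pairs` is made up for this example, and the text is split on whitespace.

```python
def three_word_pairs(text):
    """Hypothetical helper: pair each three-word group with the words after it."""
    words = text.lower().split()
    pairs = []
    for i in range(len(words) - 3):
        current = " ".join(words[i:i + 3])
        # Near the end of the text fewer than three words remain.
        following = " ".join(words[i + 3:i + 6])
        pairs.append((current, following))
    return pairs

pairs = three_word_pairs("my dream is that I can be a bird")
for current, following in pairs:
    print(repr(current), "->", repr(following))
```

The first pair is `'my dream is' -> 'that i can'`, and the last pair, `'can be a' -> 'bird'`, has only one following word because the sentence ends there.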

An example:

input:

my dream is that I can be an engineer, so I design more applications for people to use.

my dream is that I can be a bird, so I can fly to everywhere I want.

it is also my dream that I can be a house, so I can warm you in the cold winter.

The generated Markov chain:

{'START': ['my dream is'], 'my dream is': ['that i can'], 'dream is that': ['i can be'], 'is that i': ['can be a'], 'that i can': ['be a house,'], 'i can be': ['a house, so'], 'can be an': ['engineer, so i'], 'be an engineer,': ['so i design'], 'an engineer, so': ['i design more'], 'engineer, so i': ['design more applications'], 'so i design': ['more applications for'], 'i design more': ['applications for people'], 'design more applications': ['for people to'], 'more applications for': ['people to use.\nmy'], 'applications for people': ['to use.\nmy dream'], 'for people to': ['use.\nmy dream is'], 'people to use.\nmy': ['dream is that'], 'to use.\nmy dream': ['is that i'], 'use.\nmy dream is': ['that i can'], 'can be a': ['house, so i'], 'be a bird,': ['so i can'], 'a bird, so': ['i can fly'], 'bird, so i': ['can fly to'], 'so i can': ['warm you in'], 'i can fly': ['to everywhere i'], 'can fly to': ['everywhere i want.\nit'], 'fly to everywhere': ['i want.\nit is'], 'to everywhere i': ['want.\nit is also'], 'everywhere i want.\nit': ['is also my'], 'i want.\nit is': ['also my dream'], 'want.\nit is also': ['my dream that'], 'is also my': ['dream that i'], 'also my dream': ['that i can'], 'my dream that': ['i can be'], 'dream that i': ['can be a'], 'be a house,': ['so i can'], 'a house, so': ['i can warm'], 'house, so i': ['can warm you'], 'i can warm': ['you in the'], 'can warm you': ['in the cold'], 'warm you in': ['the cold winter.'], 'you in the': ['cold winter.'], 'in the cold': ['winter.'], 'END': ['the cold winter.', 'winter.', 'cold winter.']}
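Keys such as 'people to use.\nmy' appear in the chain because the text is split on single spaces, so a line break stays attached to the neighbouring words. The short sketch below compares `str.split(' ')` with `str.split()` on a fragment of the input; switching to `split()` is only a suggestion here, not something the code in this post does.

```python
text = "for people to use.\nmy dream is"

# Splitting on a single space keeps the newline inside one token.
by_space = text.split(' ')

# Splitting with no argument treats any run of whitespace as a separator.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```

With `split()` the words 'use.' and 'my' become separate tokens, so the end of one sentence and the start of the next are no longer fused into a single word.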

The generated text:

my dream is that i can be a house, so i can warm you in the cold winter.
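This sentence can be traced by hand through the chain shown above. The sketch below copies only the entries that the walk visits (the rest of the dictionary is left out) and follows them from 'START' until a group listed under 'END' is reached.

```python
import random

# Subset of the chain printed above: only the entries visited by the walk.
model = {
    'START': ['my dream is'],
    'my dream is': ['that i can'],
    'that i can': ['be a house,'],
    'be a house,': ['so i can'],
    'so i can': ['warm you in'],
    'warm you in': ['the cold winter.'],
    'END': ['the cold winter.', 'winter.', 'cold winter.'],
}

generated = [random.choice(model['START'])]
while generated[-1] not in model['END']:
    # Every list here has a single entry, so the choice is forced.
    generated.append(random.choice(model[generated[-1]]))

sentence = " ".join(generated)
print(sentence)
```

Because each visited key in this chain has exactly one successor, the walk always produces the same sentence.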

Code:

 # Split the input text on spaces and use a dictionary to map each group of three words to the three words that follow it.
 # Also store the first three words under 'START', and the last three, two, and one words under 'END'.
 # To generate the output: begin at 'START', repeatedly pick one of the next three-word groups at random, and stop once a group listed under 'END' is reached.

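 import random

 # Read the whole training text; the file is closed when the with block ends.
 with open("E:\\a2.txt", 'r', encoding='UTF-8') as fhand:
     dataset_file = fhand.read()
 # For a quick test, an inline string works too:
 # dataset_file='my friend makes the best raspberry pies'
 # Lowercase, then split on single spaces; line breaks stay inside tokens (e.g. 'use.\nmy').
 dataset_file = dataset_file.lower().split(' ')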

 model = {}
 total = len(dataset_file)
 for i in range(total - 2):
     # The current state: three consecutive words joined by spaces.
     key = " ".join(dataset_file[i:i + 3])
     if i == 0:
         # The first three words are the only starting state.
         model['START'] = model.get('START', []) + [key]
     if i == total - 3:
         # The last three, one, and two words all mark the end of the text.
         model['END'] = model.get('END', []) + [key, dataset_file[i + 2], dataset_file[i + 1] + " " + dataset_file[i + 2]]
     else:
         # Up to three following words; the slice is shorter near the end of the text.
         following = " ".join(dataset_file[i + 3:i + 6])
         # Use the full three-word key in the lookup so a repeated group accumulates every successor.
         # (A lookup by the single token `word` never matches a key, and then each key keeps only
         # its last successor, as in the sample chain above.)
         model[key] = model.get(key, []) + [following]
 print(model)

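 generated = []
 while True:
     if not generated:
         # First step: choose among the starting groups.
         words = model['START']
     elif generated[-1] in model['END']:
         # The last group closes the text, so stop.
         break
     else:
         # Otherwise follow the chain from the last generated group.
         words = model[generated[-1]]
     generated.append(random.choice(words))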

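 # Append the generated text to the output file and echo it to the console.
 # The backslash is doubled because "\o" is not a valid escape sequence.
 with open("E:\\output.txt", 'a', encoding='UTF-8') as fhand:
     for word in generated:
         fhand.write(word + " ")
         print(word, end=' ')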
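The successor lists can also be built with `collections.defaultdict`, which makes it straightforward to keep every successor of a repeated three-word group. The following is a minimal sketch rather than the code above: `build_model` is a hypothetical helper name, it splits with `str.split()` instead of on single spaces, and it uses the three sample sentences from the introduction as its input.

```python
import random
from collections import defaultdict

SAMPLE = (
    "my dream is that I can be an engineer, so I design more applications for people to use.\n"
    "my dream is that I can be a bird, so I can fly to everywhere I want.\n"
    "it is also my dream that I can be a house, so I can warm you in the cold winter."
)

def build_model(text):
    """Hypothetical helper: map each three-word group to all groups seen after it."""
    words = text.lower().split()  # assumption: split on any whitespace
    model = defaultdict(list)
    model['START'].append(" ".join(words[:3]))
    for i in range(len(words) - 3):
        key = " ".join(words[i:i + 3])
        # The slice is shorter than three words near the end of the text.
        model[key].append(" ".join(words[i + 3:i + 6]))
    # The last three, two and one words all mark the end.
    model['END'] = [" ".join(words[-k:]) for k in (3, 2, 1)]
    return dict(model)

model = build_model(SAMPLE)
successors = model['that i can']
print(successors)

rng = random.Random(0)  # seeded only so repeated runs are comparable
print(rng.choice(successors))
```

With all three successors of 'that i can' kept, `random.choice` has a real choice at that point, so different runs can produce different sentences.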
