1. Word segmentation

For Chinese text, jieba is a commonly used word segmentation library.
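To make the idea of segmentation concrete, here is a minimal sketch of dictionary-based forward maximum matching in plain Python. This is not jieba's algorithm, and the function name `forward_max_match` and the tiny `VOCAB` set are hypothetical, chosen only so the example sentence splits cleanly. With jieba installed, `jieba.lcut(text)` would return a token list of the same shape, though its exact output may differ.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink. A single
        # character is always accepted as the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, just enough for the example sentence.
VOCAB = {"今天", "晚上", "回锅肉"}

tokens = forward_max_match("我今天晚上吃了两盘回锅肉", VOCAB)
print(tokens)
```

Each token then becomes one row of the NER training or test file.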

2. Named Entity Recognition

  1. Train your own model

      

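Stanford NER is trained from the command line with a properties file. The Stanford NER FAQ describes the setup under "How can I train my own NER model?":

https://nlp.stanford.edu/software/crf-faq.html#a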
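The training file is a tab-separated file with one token per line followed by its label. The sketch below shows one way to produce such a file from already-segmented, hand-labeled sentences. The helper name `to_ner_tsv` is hypothetical, the label set (O, TIME, QUANTITY, UNIT, FOOD) is the one reported in the training log, and separating sentences with a blank line is an assumption about how the documents should be delimited.

```python
def to_ner_tsv(sentences):
    """Render labeled sentences as token<TAB>label rows.

    `sentences` is a list of sentences; each sentence is a list of
    (token, label) pairs. Sentences are separated by a blank line
    (an assumption about the expected document delimiter).
    """
    blocks = []
    for sentence in sentences:
        rows = [f"{token}\t{label}" for token, label in sentence]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


# Illustrative sentence, labeled the same way as the evaluation output.
SENTENCES = [
    [("我", "O"), ("今天", "O"), ("晚上", "TIME"), ("吃", "O"), ("了", "O"),
     ("两", "QUANTITY"), ("盘", "UNIT"), ("回锅肉", "FOOD")],
]

tsv_text = to_ner_tsv(SENTENCES)
print(tsv_text)

# To save it under the name used in the properties file, for example:
# from pathlib import Path
# Path("chinese.meal.fpp.tsv").write_text(tsv_text, encoding="utf-8")
```

A real training set needs many more sentences than this single example.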

C:\my_study\ML\NLP\stanford-ner--->java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop chinese.meal.fpp.prop
Invoked on Thu Mar :: CST with arguments: -prop chinese.meal.fpp.prop
usePrevSequences=true
useClassFeature=true
useTypeSeqs2=true
useSequences=true
wordShape=chris2useLC
useTypeySequences=true
useDisjunctive=true
noMidNGrams=true
serializeTo=ner-model.ser.gz
maxNGramLeng=
useNGrams=true
usePrev=true
useNext=true
maxLeft=
trainFile=chinese.meal.fpp.tsv
map=word=,answer=
useWord=true
useTypeSeqs=true
numFeatures =
Time to convert docs to feature indices: 0.0 seconds
numClasses: [=O,=TIME,=QUANTITY,=UNIT,=FOOD]
numDocuments:
numDatums:
numFeatures:
Time to convert docs to data/labels: 0.0 seconds
numWeights:
QNMinimizer called on double function of variables, using M = .
An explanation of the output:
Iter The number of iterations
evals The number of function evaluations
SCALING <D> Diagonal scaling was used; <I> Scaled Identity
LINESEARCH [## M steplength] Minpack linesearch
-Function value was too high
-Value ok, gradient positive, positive curvature
-Value ok, gradient negative, positive curvature
-Value ok, gradient negative, negative curvature
[.. B] Backtracking
VALUE The current function value
TIME Total elapsed time
|GNORM| The current norm of the gradient
{RELNORM} The ratio of the current to initial gradient norms
AVEIMPROVE The average improvement / current value
EVALSCORE The last available eval score
Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE EVALSCORE
Iter evals <D> [M 1.000E-1] 9.068E2 .04s |4.550E1| {4.995E-1} 0.000E0 -
Iter evals <D> [M 1.000E0] 6.222E2 .05s |3.525E1| {3.870E-1} 2.287E-1 -
Iter evals <D> [M 1.000E0] 2.386E2 .07s |5.406E1| {5.935E-1} 9.334E-1 -
Iter evals <D> [M 1.000E0] 9.082E1 .08s |1.571E1| {1.724E-1} 2.246E0 -
Iter evals <D> [M 1.000E0] 7.031E1 .10s |1.181E1| {1.297E-1} 2.379E0 -
Iter evals <D> [M 1.000E0] 5.308E1 .11s |1.025E1| {1.125E-1} 2.681E0 -
Iter evals <D> [1M 2.740E-1] 2.988E1 .14s |7.586E0| {8.328E-2} 4.193E0 -
Iter evals <D> [1M 1.292E-1] 2.234E1 .16s |6.471E0| {7.105E-2} 4.949E0 -
Iter evals <D> [1M 1.801E-1] 1.615E1 .18s |5.573E0| {6.118E-2} 6.127E0 -
Iter evals <D> [1M 1.815E-1] 1.218E1 .24s |4.477E0| {4.915E-2} 7.346E0 -
Iter evals <D> [1M 3.119E-1] 8.873E0 .30s |4.694E0| {5.154E-2} 6.912E0 -
Iter evals <D> [1M 4.760E-1] 6.621E0 .31s |2.092E0| {2.296E-2} 3.504E0 -
Iter evals <D> [M 1.000E0] 6.093E0 .32s |1.906E0| {2.092E-2} 1.390E0 -
Iter evals <D> [M 1.000E0] 5.844E0 .33s |9.067E-1| {9.955E-3} 1.103E0 -
Iter evals <D> [M 1.000E0] 5.721E0 .33s |5.774E-1| {6.339E-3} 8.279E-1 -
Iter evals <D> [M 1.000E0] 5.660E0 .34s |3.535E-1| {3.881E-3} 4.279E-1 -
Iter evals <D> [M 1.000E0] 5.640E0 .35s |1.946E-1| {2.137E-3} 2.961E-1 -
Iter evals <D> [M 1.000E0] 5.632E0 .36s |7.832E-2| {8.599E-4} 1.868E-1 -
Iter evals <D> [M 1.000E0] 5.631E0 .38s |3.559E-2| {3.907E-4} 1.163E-1 -
Iter evals <D> [M 1.000E0] 5.631E0 .39s |2.149E-2| {2.359E-4} 5.758E-2 -
Iter evals <D> [M 1.000E0] 5.631E0 .41s |1.027E-2| {1.128E-4} 1.758E-2 -
Iter evals <D> [M 1.000E0] 5.631E0 .42s |3.631E-3| {3.986E-5} 8.218E-3 -
Iter evals <D> [M 1.000E0] 5.631E0 .44s |1.629E-3| {1.789E-5} 3.791E-3 -
Iter evals <D> [M 1.000E0] 5.631E0 .45s |9.548E-4| {1.048E-5} 1.596E-3 -
Iter evals <D> [M 1.000E0] 5.631E0 .45s |5.724E-4| {6.284E-6} 5.196E-4 -
Iter evals <D> [M 1.000E0] 5.631E0 .47s |1.578E-4| {1.732E-6} 1.686E-4 -
QNMinimizer terminated due to average improvement: | newest_val - previous_val | / |newestVal| < TOL
Total time spent in optimization: .49s
CRFClassifier training ... done [0.6 sec].
Serializing classifier to ner-model.ser.gz... done.

  2. Evaluate the trained model on a test file to see how well it performs.

C:\my_study\ML\NLP\stanford-ner--->java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
Invoked on Thu Mar :: CST with arguments: -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
testFile=chinese.meal.fpp.test.tsv
loadClassifier=ner-model.ser.gz
Loading classifier from ner-model.ser.gz ... done [0.1 sec].
我 O O
今天 O O
晚上 TIME TIME
吃 O O
了 O O
两 QUANTITY QUANTITY
盘 UNIT UNIT
回锅肉 FOOD FOOD
CRFClassifier tagged words in documents at 88.89 words per second.
Entity    P       R       F1      TP  FP  FN
FOOD      1.0000  1.0000  1.0000
QUANTITY  1.0000  1.0000  1.0000
TIME      1.0000  1.0000  1.0000
UNIT      1.0000  1.0000  1.0000
Totals    1.0000  1.0000  1.0000

Not bad! Every token in the test sentence gets the expected label.
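The three-column output above (token, gold label, predicted label) is enough to recompute precision, recall, and F1 by hand. The sketch below does this at the token level, treating `O` as the non-entity label. The function name `token_level_scores` is hypothetical, and the tool's own report may count whole entities rather than single tokens, so the two can disagree on multi-token entities.

```python
from collections import Counter


def token_level_scores(rows, outside="O"):
    """Per-label (precision, recall, F1) from (token, gold, predicted) rows."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for _token, gold, predicted in rows:
        if predicted == gold:
            if gold != outside:
                tp[gold] += 1
        else:
            if predicted != outside:
                fp[predicted] += 1  # predicted an entity that is not there
            if gold != outside:
                fn[gold] += 1       # missed (or mislabeled) a gold entity
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        p = tp[label] / predicted_total if predicted_total else 0.0
        r = tp[label] / gold_total if gold_total else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f1)
    return scores


# The tagged test sentence, copied from the evaluation output.
OUTPUT = """\
我 O O
今天 O O
晚上 TIME TIME
吃 O O
了 O O
两 QUANTITY QUANTITY
盘 UNIT UNIT
回锅肉 FOOD FOOD
"""

# split() accepts both spaces and tabs between the columns.
rows = [line.split() for line in OUTPUT.splitlines() if line.strip()]
scores = token_level_scores(rows)
for label, (p, r, f1) in scores.items():
    print(f"{label}\t{p:.4f}\t{r:.4f}\t{f1:.4f}")
```

A perfect score on a single sentence says little about generalization, so a larger held-out test file gives a more meaningful picture.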

Ref:

1. Stanford NLP NER: https://nlp.stanford.edu/software/CRF-NER.html
