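A year ago I never dreamed I would end up here writing technical summaries. By a string of accidents I landed at a university in southwest Shanghai as an engineering guy in a humanities program, and now, apart from trading memes, I spend every day cramming CS. My advisor works in computational linguistics, so the first order of business is to teach myself natural language processing and build a solid foundation for research (serious face). On to the main topic: I found a copy of "Natural Language Processing with Python" (reprint edition) in the library. This is what the book looks like. The authors are Steven Bird, Ewan Klein, and Edward Loper. Here is a Douban link for reference: https://book.douba…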
spaCy is a library for advanced natural language processing in Python and Cython. spaCy is built on the very latest research, but it isn't researchware. It was designed from day one to be used in real products. spaCy currently supports English, Germa…
- <Natural Language Processing with Python> Link: https://pan.baidu.com/s/1_oalRiUEw6bXbm2dy5q_0Q Password: r318…
Spoken input (top left) is analyzed, words are recognized, sentences are parsed and interpreted in context, application-specific actions take place (top right); a response is planned, realized as a syntactic structure, then to suitably inflected word…
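Plotting with Enthought Canopy really is convenient. Yesterday I kept hitting errors saying the pylab module could not be recognized, and today I finally got it working. Here are today's plots:…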
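[Explanation] The dimension of word vectors is usually smaller than the size of the vocabulary. The most common sizes for word vectors range between 50 and 400. [Explanation] Words are visualized with the t-SNE algorithm. What t-SNE does is map this n-dimensional data onto a 2-D plane in a nonlinear way; the mapping t-SNE uses is very complex and highly nonlinear. [Explanation] Yes, word v…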
Speech and Natural Language Processing, obtained from this link: https://github.com/edobashira/speech-language-processing A curated list of speech and natural language processing resources. Other lists can be found in this list. If you want to contribut…
https://www.programmableweb.com/news/how-5-natural-language-processing-apis-stack/analysis/2014/07/28 The world is awash in digital data. The challenge: making sense of that data. To tackle that challenge, a growing number of companies are turning to…
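Week 2: Natural Language Processing and Word Embeddings. Word Representation. Last week we learned about RNNs, GRU units, and LSTM units. This week you will see how to apply that knowledge to NLP, a field that deep learning has revolutionized. One of the key concepts is word embeddings, a way of representing language that lets an algorithm automatically understand analogous words, such as man to woman, or king to queen,…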
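The analogy idea in the excerpt above (man is to woman as king is to queen) can be sketched with vector arithmetic. This is only a toy illustration: the four feature dimensions (gender, royal, age, food) and every number in `EMBEDDINGS` are hypothetical hand-picked values, not learned embeddings, and `complete_analogy` is a helper name invented for this example.

```python
import math

# Hypothetical hand-made features: (gender, royal, age, food).
# Real embeddings are learned from data and are rarely this interpretable.
EMBEDDINGS = {
    "man":    [-1.00, 0.01, 0.03, 0.09],
    "woman":  [1.00, 0.02, 0.02, 0.01],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def complete_analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word d that best fits 'a is to b as c is to d'."""
    # The target vector is e_c - e_a + e_b.
    target = [
        ec - ea + eb
        for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Exclude the three input words, then pick the most similar remaining word.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(complete_analogy("man", "woman", "king"))
```

With these toy values the call prints `queen`, because the difference between `man` and `woman` lies almost entirely along the gender dimension, and the same shift applied to `king` lands next to `queen`.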
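Week 2: Natural Language Processing and Word Embeddings. 2.1 Word Representation. So far we have represented words using a vocabulary; the vocabulary mentioned last week might contain 10000 words, and we have been representing each word as a one-hot vector. A major drawback of this representation is that it treats every word in isolation, so the algorithm generalizes poorly across related words. A different representation works better: instead of one-hot, use a featurized representation for each word, man,w…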
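The drawback of one-hot vectors described above can be checked directly: the inner product of any two different one-hot vectors is zero, so "apple" looks no closer to "orange" than to "king". The sketch below assumes the 10000-word vocabulary from the excerpt; the specific index assigned to each word in `WORD_INDEX` is a hypothetical placeholder.

```python
VOCAB_SIZE = 10000

# Hypothetical positions of a few words in the vocabulary.
WORD_INDEX = {"man": 5391, "woman": 9853, "king": 4914, "apple": 456, "orange": 6257}


def one_hot(word):
    """Return a VOCAB_SIZE-long vector with a single 1 at the word's index."""
    vec = [0] * VOCAB_SIZE
    vec[WORD_INDEX[word]] = 1
    return vec


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Every pair of distinct words has inner product 0, related or not.
print(dot(one_hot("apple"), one_hot("orange")))
print(dot(one_hot("apple"), one_hot("king")))
# A word only matches itself.
print(dot(one_hot("apple"), one_hot("apple")))
```

The first two lines print `0` and the last prints `1`. Nothing in the representation tells the algorithm that apple and orange are both fruit, which is the motivation for switching to a featurized representation.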