SA: Sentiment Analysis Resources (Corpus, Dictionary)

The first part is mainly excerpted from a Chinese survey:
http://wenku.baidu.com/view/0c33af946bec0975f465e277.html

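4.2 Resource construction for sentiment analysis

4.2.1 Corpora for sentiment analysis
Besides the corpora released by the three international/domestic evaluation campaigns described in Section 4.1, a number of research groups and individuals have also released corpora of reasonable size.
1. Movie review dataset from Cornell University (http://www.cs.cornell.edu/people/pabo/movie-review-data/): it consists of movie reviews, 1,000 positive and 1,000 negative. It also contains 5,331 sentences each labeled positive and negative, and 5,000 sentences each labeled subjective and objective. The movie review corpus is now widely used in sentiment analysis research at every granularity: word, sentence, and document level.
2. Product review corpus from Hu and Liu at the University of Illinois at Chicago (UIC): mainly online reviews of five electronic products downloaded from Amazon and CNET (digital cameras from two brands, a mobile phone, an MP3 player, and a DVD player). The reviews are annotated sentence by sentence with the opinion targets and the polarity and strength of each opinionated sentence, so the corpus suits research on opinion target extraction, sentence-level subjectivity detection, and sentiment classification. Liu has also contributed a corpus for research on comparative sentences [74].
3. The MPQA (Multi-Perspective Question Answering) corpus developed by Janyce Wiebe et al.: 535 news articles written from different perspectives, with deep annotation. For each clause, annotators manually marked sentiment information such as the opinion holder, the opinion target, subjective expressions, and their polarity and intensity. Reference [75] describes the full annotation process. The MPQA corpus suits research on the news domain.
4. Multi-aspect restaurant review corpus built by Barzilay et al. at MIT: 4,488 reviews in total, each rated on a scale of 1 to 5 along five aspects (food, ambience, service, price, overall experience). It provides a research platform for aspect-based sentiment summarization of single documents.
5. Large Chinese hotel review corpus from Dr. Tan Songbo at the Institute of Computing Technology, Chinese Academy of Sciences: about 10,000 reviews labeled positive or negative, providing a platform for document-level Chinese sentiment classification.

4.2.2 Lexical resources for sentiment analysis
Over the development of the field, researchers have compiled a number of sentiment resources, most of them lexicons of evaluative (opinion) words.
1. GI (General Inquirer) lexicon (English, http://www.wjh.harvard.edu/~inquirer/): 1,914 positive and 2,293 negative words, each tagged with labels for polarity, intensity, part of speech and so on, which makes it flexible to apply in sentiment analysis tasks.
2. NTU sentiment dictionary (Traditional Chinese): compiled by National Taiwan University, with 2,812 positive and 8,276 negative words [76].
3. Subjectivity lexicon (English, http://www.cs.pitt.edu/mpqa/): its subjective words come from the OpinionFinder system. It contains 8,221 subjective words, each annotated with part of speech, lemmatization (stemming) information, and sentiment polarity.
4. HowNet sentiment lexicon (Simplified Chinese and English, http://www.keenage.com/html/e_index.html): 9,193 Chinese and 9,142 English evaluative words/phrases, divided into positive and negative. Because it includes evaluative phrases, not just single words, it offers a richer sentiment resource.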
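The Hu and Liu product-review corpus annotates each sentence with its opinion targets and their polarity and strength. The sketch below reads one such line, assuming the commonly seen layout `target[+n], target[-n]##sentence text` (targets before `##`, signed strength in brackets); the regex, the `AnnotatedSentence` class, and the "sum the strengths" rule are illustrative assumptions, not part of the dataset's documentation, so check them against the README shipped with the data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One opinion target such as "picture quality[+3]": target text followed by a
# signed strength in square brackets. Extra flags after the strength (e.g. "[u]")
# are skipped by this pattern. The layout itself is an assumption.
TARGET_RE = re.compile(r"\s*([^,\[\]]+?)\s*\[([+-]\d)\]")


@dataclass
class AnnotatedSentence:  # hypothetical container, not from the dataset
    targets: List[Tuple[str, int]]  # (opinion target, signed strength)
    text: str


def parse_line(line: str) -> Optional[AnnotatedSentence]:
    """Parse one line in the assumed 'targets##sentence' layout.

    Lines without '##' (for example review titles) return None.
    """
    if "##" not in line:
        return None
    head, text = line.split("##", 1)
    targets = [(t.strip(), int(s)) for t, s in TARGET_RE.findall(head)]
    return AnnotatedSentence(targets, text.strip())


def sentence_polarity(sent: AnnotatedSentence) -> int:
    """Sum of target strengths: > 0 positive, < 0 negative, 0 no opinion."""
    return sum(strength for _, strength in sent.targets)
```

Summing strengths is only one way to collapse target-level labels into a sentence label; keeping the per-target pairs is what makes the corpus useful for opinion target extraction.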

Next, here is what I summarized last time:
http://site.douban.com/204776/widget/notes/12599608/note/284723117/
##Datasets for SA:
###Lexicons:
[1]
The General Inquirer Lexicon
•Homepage: http://www.wjh.harvard.edu/~inquirer
•Categories
–Positive (1,915 words) and Negative (2,291 words)
–Strong vs Weak, Active vs Passive, Overstated versus Understated
–Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
•Free for research use
Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.

[2]
LIWC (Linguistic Inquiry and Word Count)
•Homepage: http://www.liwc.net/
•2,300 words, > 70 classes
–Affective Processes
•negative emotion (bad, weird, hate, problem, tough)
•positive emotion (love, nice, sweet)
–Cognitive Processes
•Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
–Pronouns, Negation (no, never), Quantifiers (few, many)
•$30 or $90 fee
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007.
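LIWC-style analysis reports, for each category, the share of words in a text that fall into it. The sketch below assumes a hypothetical mini-dictionary built only from the example words listed above (the real LIWC dictionary is licensed and much larger) and assumes LIWC-style entries where a trailing `*` matches any word with that prefix.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical mini-dictionary from the examples above; NOT the real LIWC lexicon.
# A trailing '*' marks a prefix entry (e.g. "hate*" matches "hated", "hates").
MINI_DICT: Dict[str, List[str]] = {
    "negemo": ["bad", "weird", "hate*", "problem*", "tough*"],
    "posemo": ["love*", "nice*", "sweet*"],
    "tentat": ["maybe", "perhaps", "guess*"],
    "negate": ["no", "never"],
}


def matches(token: str, entry: str) -> bool:
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text: str, dictionary=MINI_DICT) -> Dict[str, float]:
    """Return, for each category, the percentage of tokens that match it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat, entries in dictionary.items():
            if any(matches(tok, e) for e in entries):
                counts[cat] += 1  # one token may count toward several categories
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {cat: 100.0 * counts[cat] / total for cat in dictionary}
```

For "I love it, but maybe the problems are bad." the nine tokens give one positive-emotion hit, two negative-emotion hits, and one tentative hit.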

[3]
MPQA Subjectivity Cues Lexicon
•Homepage: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
•6,885 words from 8,221 lemmas
–2,718 positive
–4,912 negative
•Each word annotated for intensity (strong, weak)
•GNU GPL
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003
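The MPQA subjectivity lexicon is distributed as one clue per line in `key=value` form, e.g. `type=weaksubj len=1 word1=abandoned pos1=adj stemmed1=n priorpolarity=negative`. The sketch below parses lines in that assumed layout and scores a token list; the weights (strong clues count double) are a hypothetical choice for illustration, not something the lexicon prescribes.

```python
from typing import Dict, Iterable, List


def parse_clue(line: str) -> Dict[str, str]:
    """Split one whitespace-separated 'key=value' line into a dict.

    Tokens without '=' are skipped defensively.
    """
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


# Hypothetical weights: strong clues count double. Not part of the lexicon.
WEIGHTS = {"strongsubj": 2.0, "weaksubj": 1.0}
SIGN = {"positive": 1, "negative": -1}  # "neutral"/"both" map to 0 below


def build_lexicon(lines: Iterable[str]) -> Dict[str, float]:
    """Map each word to a signed weight. If a word appears several times
    (e.g. with different parts of speech), the last entry wins here."""
    lex: Dict[str, float] = {}
    for line in lines:
        clue = parse_clue(line)
        if "word1" not in clue:
            continue
        sign = SIGN.get(clue.get("priorpolarity", ""), 0)
        lex[clue["word1"]] = sign * WEIGHTS.get(clue.get("type", ""), 1.0)
    return lex


def score(tokens: List[str], lex: Dict[str, float]) -> float:
    """Sum of prior-polarity weights; ignores context such as negation."""
    return sum(lex.get(t, 0.0) for t in tokens)
```

Because this uses only prior polarity, it ignores exactly the contextual effects (negation, intensifiers) that the Wilson et al. paper cited above is about.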

[4]
Opinion Lexicon
•Homepage: http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
•6,786 words
–2,006 positive
–4,783 negative
•Bing Liu's Page on Opinion Mining
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004
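The Opinion Lexicon comes as two plain word lists (positive and negative). A minimal sketch of loading such lists and labeling a tokenized text by counting hits follows; it assumes one word per line with header comment lines starting with `;`, which should be checked against the downloaded files.

```python
from typing import Iterable, List, Set


def load_word_list(lines: Iterable[str]) -> Set[str]:
    """Collect one-word-per-line entries, skipping blank lines and
    header comment lines (assumed to start with ';')."""
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word.lower())
    return words


def classify(tokens: List[str], pos_words: Set[str], neg_words: Set[str]) -> str:
    """Label a tokenized text by comparing positive and negative lexicon hits."""
    pos = sum(tok in pos_words for tok in tokens)
    neg = sum(tok in neg_words for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

With the downloaded archive one might pass an opened file, e.g. `load_word_list(open(path, encoding="latin-1"))`; the file names and the Latin-1 encoding are assumptions to verify.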

[5]
SentiWordNet
•Homepage: http://sentiwordnet.isti.cnr.it/
•All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
–[estimable(J,3)] “may be computed or estimated”
•Pos 0 Neg 0 Obj 1
–[estimable(J,1)] “deserving of respect or high regard”
•Pos .75 Neg 0 Obj .25
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
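In SentiWordNet the three scores of a synset sum to 1, so the objectivity score follows from the other two: Obj = 1 - (Pos + Neg), as in the two `estimable` examples above. The sketch below checks that arithmetic and adds a crude word-level score by averaging (Pos - Neg) over a word's senses; uniform averaging over senses is an assumed heuristic, not part of SentiWordNet.

```python
from typing import Dict, Iterable, Tuple

# The two example synsets quoted above, as (Pos, Neg) scores.
EXAMPLES: Dict[str, Tuple[float, float]] = {
    "estimable(J,3)": (0.0, 0.0),
    "estimable(J,1)": (0.75, 0.0),
}


def objectivity(pos: float, neg: float) -> float:
    """Pos + Neg + Obj = 1 for every synset, so Obj = 1 - (Pos + Neg)."""
    return 1.0 - (pos + neg)


def word_score(senses: Iterable[Tuple[float, float]]) -> float:
    """Crude word-level polarity: mean of (Pos - Neg) over all senses.
    Weighting every sense equally is an assumption; sense frequency is ignored."""
    senses = list(senses)
    if not senses:
        return 0.0
    return sum(p - n for p, n in senses) / len(senses)
```

Averaging the two `estimable` senses gives 0.375, which shows how a purely objective sense dilutes the polarity of a word that also has a strongly positive sense.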

Sentiment Classification of Reviews Using SentiWordNet
http://arrow.dit.ie/cgi/viewcontent.cgi?article=1000&context=ittpapnin

###Corpus and Reviews:
[1]
Movie reviews
–Internet Movie Database (IMDb)
•http://www.cs.cornell.edu/people/pabo/movie-review-data/
•http://reviews.imdb.com/Reviews/
–700 positive / 700 negative
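A minimal loader for a movie-review polarity set, assuming the archive unpacks to a root directory with `pos/` and `neg/` subfolders holding one review per `.txt` file (verify the actual folder names after download). The split keeps each label's train/test ratio, so a balanced 700/700 set stays balanced; the function names are hypothetical.

```python
import random
from pathlib import Path
from typing import List, Tuple

Example = Tuple[str, str]  # (review text, label)


def load_polarity_dir(root: str) -> List[Example]:
    """Return (text, label) pairs from root/pos/*.txt and root/neg/*.txt.
    The pos/neg layout is an assumption about how the archive unpacks."""
    data: List[Example] = []
    for label in ("pos", "neg"):
        for path in sorted(Path(root, label).glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            data.append((text, label))
    return data


def stratified_split(data: List[Example], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle each label separately and hold out the same fraction of each."""
    rng = random.Random(seed)
    train: List[Example] = []
    test: List[Example] = []
    for label in sorted({y for _, y in data}):
        items = [d for d in data if d[1] == label]
        rng.shuffle(items)
        k = int(len(items) * test_fraction)
        test += items[:k]
        train += items[k:]
    return train, test
```

A fixed seed keeps the split reproducible across runs, which matters when comparing classifiers on such a small set.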

[2]
MOVIEREVIEWSET (Pang and Lee 2004)
[3]
MPQACORPUS (Wiebe et al. 2005)
[4]
PRODUCTREVIEWSET (Yi et al. 2003)

[2]-[4]
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
http://www.cs.pitt.edu/mpqa/
http://ai.stanford.edu/~amaas/data/sentiment
http://people.csail.mit.edu/jrennie/20Newsgroups

[5]
BOOKREVIEWSET (Aue and Gamon, 2005)
[6]
SENTENCESET (Kim and Hovy 2004)

[7]
The J.D. Power and Associates Sentiment Corpus
http://verbs.colorado.edu/jdpacorpus/
The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. They have been manually annotated for named, nominal, and pronominal mentions of entities. Entities are marked with the aggregate sentiment expressed toward them in the document. Mentions of each entity are marked as co-referential. Mentions are assigned semantic types consisting of the Automatic Content Extraction (ACE) mention types and additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions which convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiments as well as the mentions they target. The following modifiers are annotated; these may target other modifiers or sentiment expressions:

negators (expressions which invert the polarity of a sentiment expression or modifier)
neutralizers (expressions that do not commit the speaker to the truth of the target sentiment expression or modifier)
committers (expressions which shift the commitment of the speaker toward the truth of a sentiment expression or modifier)
intensifiers (expressions which shift the intensity of a sentiment expression or modifier)
Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension.
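To make the modifier types above concrete, here is a toy model of how they might turn a prior polarity into a contextual one, applying modifiers from the innermost outward. It is a hypothetical simplification, not the JDPA annotation guidelines: negators flip the sign, neutralizers cancel the sentiment, intensifiers scale its magnitude by an assumed factor, and committers are treated as polarity-preserving.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Modifier:
    kind: str            # "negator", "neutralizer", "committer", or "intensifier"
    factor: float = 1.0  # only used by intensifiers; the scale is hypothetical


def contextual_polarity(prior: float, modifiers: List[Modifier]) -> float:
    """Apply modifiers to a prior polarity, innermost (closest) first.

    Toy model, not the JDPA guidelines: negators flip the sign, neutralizers
    cancel the sentiment, intensifiers scale its magnitude, and committers
    (which shift speaker commitment) leave the polarity value unchanged here.
    """
    value = prior
    for m in modifiers:
        if m.kind == "negator":
            value = -value
        elif m.kind == "neutralizer":
            value = 0.0
        elif m.kind == "intensifier":
            value *= m.factor
        elif m.kind == "committer":
            pass  # assumption: commitment does not change the polarity score
        else:
            raise ValueError(f"unsupported modifier kind: {m.kind}")
    return value
```

For example, "very bad" could be modeled as a prior of -1.0 with `Modifier("intensifier", 2.0)`, giving -2.0; real annotations also record which expression each modifier targets, which a flat list like this does not capture.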

The data, organized into training and testing sets, consists of 515 documents (blog posts) covering 330,762 tokens which make up 19,322 sentences. 87,532 mentions, 15,637 sentiment expressions, and 22,662 relations between entities (co-reference groups) are annotated.

Please see the included README file for more information about this data. For a more detailed explanation of the preparation of the corpus, please read The JDPA Sentiment Corpus Annotation Guidelines or The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain.

##Packages and APIs for SA:
http://stackoverflow.com/questions/10233087/sentiment-analysis-using-r
https://sites.google.com/site/miningtwitter/questions/sentiment

##Apps for SA:
Twitteratr
Tweetfeel
Twitter sentiment / Sentiment140
