Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper
the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.
hyphenator:
h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]
the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.
语言:葡萄牙语
one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)
3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.
hypotaxis 从属结构
parataxis 并列结构
4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.
名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长
Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章
- DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)
两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...
- Measuring Text Difficulty Using Parse-Tree Frequency
https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...
- OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper
这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...
- hiho一下 第一百零七周 Give My Text Back(微软笔试题)
题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...
- Give My Text Back
Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...
- Text Style Transfer论文笔记
Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...
- Official Program for CVPR 2015
From: http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...
- SAP常用命令及BASIS操作
Pfcg 角色,权限参数文件配置Su53 查看权限对象 st01 跟踪St22 看dump,以分析错误 eg.找到ABAP程序出错的地方,找出fou ...
- SAP ECC FI配置文档
SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...
随机推荐
- Java 8 默认方法(Default Methods)
Java 8 默认方法(Default Methods) Posted by Ebn Zhang on December 20, 2015 Java 8 引入了新的语言特性——默认方法(Default ...
- 把旧系统迁移到.Net Core 2.0 日记(9) -- T4 Template
想着用T4 Template 自动生成代码,省了功夫. 发现T4 Template 挺笨的. 我开始这样写是会报错的 <# var modualName = "CRM" ...
- Win10系列:VC++媒体播放
媒体播放包括视频播放和音频播放,在开发Windows应用商店应用的过程中可以使用MediaElement控件来播放视频文件和音频文件.本节将通过一个具体的示例介绍如何使用MediaElement控件来 ...
- 怎样判断JS对象中的属性
// 如何在不访问属性值的情况下判断对象中是否存在这个属性 var obj = { a: 2 }; Object.defineProperty( obj, 'b', // 让 b 不可枚举 { enu ...
- what’s this?
jdk,jre,jvm三者区别:JDK: (Java Development ToolKit) java开发工具包.JDK是整个java的核心! 包括了java运行环境 JRE(Java Runtim ...
- 【原创】<Debug> “duplicate connection name”
[Problem] duplicate connection name [Solution] 在Qt上使用SQLite的时候,如果第二次使用QSqlDatabase::addDatabase()方式时 ...
- 稀疏 部分 Checkout
To easily select only the items you want for the checkout and force the resulting working copy to ke ...
- MYSQL基础知识小盲区
MYSQL必会的知识 命令行 启动mysql: mysql -u用户名 -p密码 显示表中的各列详细信息: show columns form tablename 等价于 desc ...
- 初始JSP
什么是JSP 1.JSP(Java Server Pages):在HTML中嵌入Java脚本代码 静态内容是JSP页面中的静态文本,其基本是HTML文本,与Java和JSP语法无关. 例子: 运行结果 ...
- jmeter4.0 源码编译 二次开发
准备: 1.jmeter4.0源码 - apache-jmeter-4.0_src.zip 2.IDE Eclipse - Oxygen.3 Release (4.7.3) 3.JDK - 1.8.0 ...