Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper
The system uses existing Natural Language Processing (NLP) tools (a parser and a hyphenator) and two corpora previously annotated by readability level.
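Hyphenator example (English, with the PyHyphen package; this is not the Portuguese hyphenator used in the paper):
from hyphen import Hyphenator
h_en = Hyphenator('en_US')
h_en.pairs('beautiful')
# [['beau', 'tiful'], ['beauti', 'ful']]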
The system extracts 52 features, grouped into 7 groups: part-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.
Language: European Portuguese
Two experiments:
one based on a five-level scale (A1, A2, B1, B2, C1);
a second based on a simplified three-level scale (A, B and C).
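If the three-level labels are obtained by collapsing the five levels, the mapping could look like the sketch below. It assumes A1/A2 become A, B1/B2 become B and C1 becomes C; these notes do not say how the paper built the simplified scale, and `to_three_level` is a hypothetical helper.

```python
# Assumed mapping from the five-level scale to the simplified three-level scale.
FIVE_TO_THREE = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}


def to_three_level(label: str) -> str:
    """Collapse a five-level label (A1..C1) into A, B or C."""
    return FIVE_TO_THREE[label]


labels = ["A1", "B2", "C1", "A2"]
print([to_three_level(lab) for lab in labels])  # ['A', 'B', 'C', 'A']
```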
3 NLP tools
STRING: roughly the Portuguese equivalent of NLTK
The YAH Hyphenator: a rule-based system that applies various word-division rules.
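YAH itself is not shown here. To give a rough idea of what a rule-based word-division step does, the sketch below applies a few simplified Portuguese syllable rules: V-CV, VC-CV, and keeping digraphs such as `lh`, `nh`, `ch` and onset clusters such as `br`, `pr`, `tr` together. It is a toy under stated assumptions. Vowel sequences are treated as one nucleus, so hiatuses such as the one in `saúde` are not split, the cluster list is incomplete, and `syllabify` / `avg_syllables` are hypothetical functions, not YAH's interface.

```python
# Toy rule-based syllabifier for Portuguese (NOT the YAH Hyphenator).
VOWELS = set("aeiouáéíóúâêôãõàü")
# Consonant pairs kept together at the start of a syllable (assumed, incomplete list).
INSEPARABLE = {"ch", "lh", "nh", "bl", "br", "cl", "cr", "dr",
               "fl", "fr", "gl", "gr", "pl", "pr", "tr", "vr"}


def syllabify(word: str) -> list[str]:
    w = word.lower()
    # 1. Find the vowel nuclei as maximal runs of vowels, stored as (start, end).
    nuclei = []
    i = 0
    while i < len(w):
        if w[i] in VOWELS:
            j = i
            while j < len(w) and w[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if len(nuclei) <= 1:
        return [word]
    # 2. Place one cut in the consonants between each pair of neighbouring nuclei.
    cuts = []
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cons = w[prev_end:next_start]
        if len(cons) >= 2 and cons[-2:] in INSEPARABLE:
            cuts.append(next_start - 2)  # cluster or digraph opens the next syllable
        else:
            cuts.append(next_start - 1)  # V-CV or VC-CV
    bounds = [0] + cuts + [len(w)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def avg_syllables(words: list[str]) -> float:
    """Average number of syllables per word, one possible syllable feature."""
    return sum(len(syllabify(w)) for w in words) / len(words)


print(syllabify("escola"))  # ['es', 'co', 'la']
print(syllabify("livro"))   # ['li', 'vro']
print(avg_syllables(["casa", "carro", "sol"]))  # 1.6666666666666667
```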
hypotaxis: subordinate structures (subordination)
parataxis: coordinate structures (coordination)
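One crude way to turn these two notions into features without a parser is to count coordinating conjunctions (a rough signal of parataxis) and subordinating conjunctions (a rough signal of hypotaxis). The word lists below are small hypothetical samples, not taken from the paper. A system with a parser would presumably use its syntactic analysis instead, and ambiguous words such as `que` or `como` are left out because they would need disambiguation.

```python
# Hypothetical, incomplete Portuguese conjunction lists (illustration only).
COORDINATING = {"e", "mas", "ou", "porém", "contudo", "todavia"}
SUBORDINATING = {"porque", "embora", "quando", "enquanto", "caso"}


def conjunction_counts(sentence: str) -> dict[str, int]:
    """Count coordinating vs subordinating conjunctions as parataxis/hypotaxis proxies."""
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return {
        "parataxis": sum(t in COORDINATING for t in tokens),
        "hypotaxis": sum(t in SUBORDINATING for t in tokens),
    }


print(conjunction_counts("Fiquei em casa porque chovia, mas li um livro."))
# {'parataxis': 1, 'hypotaxis': 1}
```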
4 Features
The set of 52 features extracted by the system consists of: (i) part-of-speech (POS) tag, chunk, word and sentence features; (ii) verb features and different metrics involving averages and frequencies; (iii) several metrics involving syllables; and (iv) extra features.
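Nouns and named entities (named-entity recognition) are important for text comprehension.
Syntactic structures: noun phrases, prepositional phrases.
Auxiliary verbs can form longer, more complex verb chains.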
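Assuming the parser output has been reduced to a list of (token, tag) pairs, verb-chain length can be measured as the longest run of consecutive verb tags. The tag names `VAUX` and `V` are hypothetical placeholders, not STRING's actual tag set, and a real chain detector would also need to handle intervening prepositions, adverbs or clitics (e.g. the `a` in `está a estudar`).

```python
# Hypothetical tags: "VAUX" for auxiliaries, "V" for main verbs.
VERB_TAGS = {"VAUX", "V"}


def longest_verb_chain(tagged: list[tuple[str, str]]) -> int:
    """Length of the longest run of consecutive verb tokens in a sentence."""
    best = current = 0
    for _token, tag in tagged:
        current = current + 1 if tag in VERB_TAGS else 0
        best = max(best, current)
    return best


sentence = [("O", "DET"), ("livro", "N"), ("tinha", "VAUX"),
            ("sido", "VAUX"), ("lido", "V")]
print(longest_verb_chain(sentence))  # 3
```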
Word frequency: unigram-based, with Laplace smoothing.
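A minimal sketch of unigram probabilities with Laplace (add-one) smoothing, P(w) = (count(w) + 1) / (N + V), where N is the number of tokens in a reference corpus and V the vocabulary size. These notes do not say which corpus the paper counts over; reserving one extra vocabulary slot for unseen words (so the probabilities sum to 1) is an assumption of this sketch, and `laplace_unigram` is a hypothetical helper.

```python
from collections import Counter


def laplace_unigram(corpus_tokens: list[str]):
    """Return a function giving add-one smoothed unigram probabilities."""
    counts = Counter(corpus_tokens)
    n = len(corpus_tokens)
    v = len(counts) + 1  # vocabulary size plus one slot for unseen words (assumption)

    def prob(word: str) -> float:
        return (counts[word] + 1) / (n + v)

    return prob


p = laplace_unigram("o gato viu o cão".split())
print(p("o"))     # (2 + 1) / (5 + 5) = 0.3
print(p("lobo"))  # (0 + 1) / (5 + 5) = 0.1, an unseen word still gets mass
```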
Verb and noun ratios; sentence length.
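With POS-tagged sentences as input, these ratio and length features reduce to simple counts. The sketch assumes hypothetical tags `N`, `V` and `VAUX` and reads "ratio" as the proportion of all tokens; the paper's exact definitions (for example a verb-to-noun ratio, or whether punctuation counts as a token) may differ.

```python
# Hypothetical tag names; every token counts toward the denominator.
NOUN_TAGS = {"N"}
VERB_TAGS = {"V", "VAUX"}


def ratio_features(sentences: list[list[tuple[str, str]]]) -> dict[str, float]:
    tags = [tag for sent in sentences for _tok, tag in sent]
    total = len(tags)
    return {
        "noun_ratio": sum(t in NOUN_TAGS for t in tags) / total,
        "verb_ratio": sum(t in VERB_TAGS for t in tags) / total,
        "avg_sentence_length": total / len(sentences),  # in tokens
    }


doc = [
    [("O", "DET"), ("gato", "N"), ("dorme", "V")],
    [("A", "DET"), ("menina", "N"), ("leu", "V"), ("o", "DET"), ("livro", "N")],
]
print(ratio_features(doc))
# {'noun_ratio': 0.375, 'verb_ratio': 0.25, 'avg_sentence_length': 4.0}
```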