the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.

hyphenator:

h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]

the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

语言:葡萄牙语

one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)

3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.

hypotaxis 从属结构
parataxis 并列结构

4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.

名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长

Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章

  1. DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)

    两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...

  2. Measuring Text Difficulty Using Parse-Tree Frequency

    https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...

  3. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper

    这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...

  4. hiho一下 第一百零七周 Give My Text Back(微软笔试题)

    题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...

  5. Give My Text Back

    Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...

  6. Text Style Transfer论文笔记

    Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...

  7. Official Program for CVPR 2015

    From:  http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...

  8. SAP常用命令及BASIS操作

    Pfcg         角色,权限参数文件配置Su53        查看权限对象  st01  跟踪St22         看dump,以分析错误  eg.找到ABAP程序出错的地方,找出fou ...

  9. SAP ECC FI配置文档

    SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...

随机推荐

  1. Java 8 默认方法(Default Methods)

    Java 8 默认方法(Default Methods) Posted by Ebn Zhang on December 20, 2015 Java 8 引入了新的语言特性——默认方法(Default ...

  2. 把旧系统迁移到.Net Core 2.0 日记(9) -- T4 Template

    想着用T4 Template 自动生成代码,省了功夫. 发现T4 Template 挺笨的. 我开始这样写是会报错的  <#  var modualName = "CRM"  ...

  3. Win10系列:VC++媒体播放

    媒体播放包括视频播放和音频播放,在开发Windows应用商店应用的过程中可以使用MediaElement控件来播放视频文件和音频文件.本节将通过一个具体的示例介绍如何使用MediaElement控件来 ...

  4. 怎样判断JS对象中的属性

    // 如何在不访问属性值的情况下判断对象中是否存在这个属性 var obj = { a: 2 }; Object.defineProperty( obj, 'b', // 让 b 不可枚举 { enu ...

  5. what’s this?

    jdk,jre,jvm三者区别:JDK: (Java Development ToolKit) java开发工具包.JDK是整个java的核心! 包括了java运行环境 JRE(Java Runtim ...

  6. 【原创】<Debug> “duplicate connection name”

    [Problem] duplicate connection name [Solution] 在Qt上使用SQLite的时候,如果第二次使用QSqlDatabase::addDatabase()方式时 ...

  7. 稀疏 部分 Checkout

    To easily select only the items you want for the checkout and force the resulting working copy to ke ...

  8. MYSQL基础知识小盲区

    MYSQL必会的知识 命令行 启动mysql:     mysql  -u用户名 -p密码 显示表中的各列详细信息:    show columns form tablename  等价于  desc ...

  9. 初始JSP

    什么是JSP 1.JSP(Java Server Pages):在HTML中嵌入Java脚本代码 静态内容是JSP页面中的静态文本,其基本是HTML文本,与Java和JSP语法无关. 例子: 运行结果 ...

  10. jmeter4.0 源码编译 二次开发

    准备: 1.jmeter4.0源码 - apache-jmeter-4.0_src.zip 2.IDE Eclipse - Oxygen.3 Release (4.7.3) 3.JDK - 1.8.0 ...