the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.

hyphenator:

h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]

the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

语言:葡萄牙语

one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)

3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.

hypotaxis 从属结构
parataxis 并列结构

4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.

名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长

Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章

  1. DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)

    两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...

  2. Measuring Text Difficulty Using Parse-Tree Frequency

    https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...

  3. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper

    这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...

  4. hiho一下 第一百零七周 Give My Text Back(微软笔试题)

    题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...

  5. Give My Text Back

    Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...

  6. Text Style Transfer论文笔记

    Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...

  7. Official Program for CVPR 2015

    From:  http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...

  8. SAP常用命令及BASIS操作

    Pfcg         角色,权限参数文件配置Su53        查看权限对象  st01  跟踪St22         看dump,以分析错误  eg.找到ABAP程序出错的地方,找出fou ...

  9. SAP ECC FI配置文档

    SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...

随机推荐

  1. 正则表达式test()和exec()、 search() 和 replace()用法实例

    <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8&quo ...

  2. 关于datetimepicker只显示年、月、日的设置

    如下是只显示月的sample code: <link rel="stylesheet" href="css/datetimepicker/bootstrap-dat ...

  3. flask-security(一)快速入门

    很多例程都是基于flask-sqlalchemy的. 但是我使用sqlalchemy,并没有使用sqlalchemy,看中的也就是flask的灵活性. 暂时写flask的程序,但是为了以后写别的程序方 ...

  4. selenium(七)webdriverwait,高级等待,替代sleep

    #coding=utf-8 from selenium import webdriver from selenium.webdriver.common.by import By from seleni ...

  5. C++基础知识:继承

    1.继承的概念 面向对象中的继承指类之间的父子关系子类拥有父类的所有成员变量和成员函数子类就是一种特殊的父类子类对象可以当作父类对象使用子类可以拥有父类没有的方法和属性 2.C++中的访问级别与继承p ...

  6. DevExpress v18.1新版亮点——CodeRush for VS篇(二)

    用户界面套包DevExpress v18.1日前正式发布,本站将以连载的形式为大家介绍各版本新增内容.本文将介绍了CodeRush for Visual Studio v18.1 的新功能,快来下载试 ...

  7. Java基础-语法定义

    Java三个体系 Java SE(Java Platform,Standard Edition).Java SE 以前称为 J2SE.它允许开发和部署在桌面.服务器.嵌入式环境和实时环境中使用的 Ja ...

  8. 通过泛型获得继承类的类原型getGenericSuperclass

    首先贴上代码 package com; import java.lang.reflect.ParameterizedType; import java.lang.reflect.Type; /** * ...

  9. win8 下面 listen 的队列长度貌似无效了 上c/s 代码 并附截图,有图有真相

    #include <WinSock2.h> #include <stdio.h> #include <windows.h> DWORD ServerRoutine( ...

  10. Problem B: 重载函数:max

    Description 编写两个名为max的函数,它们是重载函数 ,用于求两个整数或实数的最大值.它们的原型分别是: int max(int a,int b); double max(double a ...