Measuring Text Difficulty Using Parse-Tree Frequency

https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf

In previous work, we conducted a preliminary corpus study of grammar frequency which showed that difficult texts use a wider variety of high-level grammatical structures (Kauchak et al., 2012). However, because of the large number of structural variations possible, no clear indication was found showing specific structures predominantly appearing in either easy or difficult documents.

In this work, we propose a much more fine-grained analysis. We propose a measure of text difficulty based on grammatical frequency and show how it can be used to identify sentences with difficult syntactic structures. In particular, the grammatical difficulty of a sentence is measured based on the frequency of occurrence of the top-level parse tree structure of the sentence in a large corpus.

By analogy with term familiarity, the paper introduces the concept of grammar familiarity:

Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree.
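
To make "3rd level parse tree" concrete, here is a minimal sketch that truncates a bracketed constituency parse to its top three levels and drops the words. It assumes the sentence node counts as level 1 and that the parse is given as a Penn Treebank style string; the function names are hypothetical and the paper's exact convention (for example whether a ROOT wrapper is counted) may differ.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append((tokens[pos], []))  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read_node()


def truncate(node, depth):
    """Render the tree keeping only constituent labels down to `depth` levels."""
    label, children = node
    # Keep only children that themselves have children, i.e. drop the words.
    kids = [c for c in children if c[1]]
    if depth == 1 or not kids:
        return label
    return "(" + label + " " + " ".join(truncate(k, depth - 1) for k in kids) + ")"


def level3_structure(bracketed):
    """Assumed definition: the parse truncated to three levels, without words."""
    return truncate(parse_bracketed(bracketed), 3)


a = level3_structure(
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .))"
)
b = level3_structure(
    "(S (NP (DT A) (NN dog)) (VP (VBD slept) (PP (IN under) (NP (DT the) (NN bed)))) (. .))"
)
print(a)
```

Both toy sentences map to the same structure even though they share almost no words, which is the property that lets the structure be counted across a corpus.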

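Experiment:

Wikipedia sentences were divided into 11 bins according to their 3rd level parse tree structure, and the frequency of each bin was computed.

Then 20 sentences were randomly selected from each bin (controlling for sentence length and term familiarity), and some thirty participants were recruited to rate the sentences (5-point scale) and to complete a Cloze test.

The result: even for sentences that look no different on the surface, the more frequent a sentence's 3rd level parse tree structure, the easier it was rated and the less time it took.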

Hypothesis:

examine how grammatical frequency impacts the difficulty of a sentence and introduce a new measure of sentence-level text difficulty based on the grammatical structure of the sentence.

Sentence difficulty is divided into perceived and actual difficulty.

Our work here makes a step towards better simplification tools by 1) introducing a sentence-level, data-driven approach for measuring the grammatical difficulty of a sentence and 2) specifically measuring the impact of this measure using both how difficult a sentence looks (perceived difficulty) as well as how difficult a sentence is to understand (actual difficulty).

Function words and verbs are more prevalent in simple texts

simple texts use simpler words, fewer overall words and words that are more general (Coster & Kauchak, 2011; Napoles & Dredze, 2010; Zhu, Bernhard, & Gurevych, 2010). Certain types of words have also been found to be more prevalent in simpler texts including function words and verbs (Kauchak, Leroy, & Coster, 2012; Leroy & Endicott, 2011).

The Role of Syntax in Simplification

The syntax or grammar of a language dictates how words and phrases interact to form sentences.

Splitting long sentences has been shown to improve Cloze scores (Kandula, Curtis, & Zeng-Treitler, 2010) and additive and causal connectors (those expressing cause or purpose) were easier to fill in than adversative or sequential connectors (those expressing contrast or temporal order) (Goldman & Murray, 1992). It has been suggested that grammatical difficulty is particularly important for L2 learners since they are still trying to learn appropriate grammatical structures for the language (Callan & Eskenazi, 2007; Clahsen & Felser, 2006).

(Note: LOGICAL CONNECTORS https://staff.washington.edu/marynell/grammar/logicalconnectors.html)

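Simple texts use a higher proportion of verbs, function words, and adverbs, while difficult texts use a higher proportion of adjectives and nouns and longer noun phrases. In medical texts, simple texts are more likely to use the active voice. Subject-verb-object versus object-subject-verb ordering also differs.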

Some initial success has been achieved by automated simplification systems that perform syntactic transformations, for example reducing prepositional phrases and infinitives, or changing verb tense.

How the parse tree structure was chosen

We chose to focus on the 3rd level since it represents a compromise between generality and specificity.

45% of sentences in the corpus (2.47M) have unique 4th level parse tree structures, often because the 4th level regularly includes lexical components. Once words are included, a structure is hard to generalize to other sentences.

To remove anomalous data and likely misparses, we ignored any structure that had only been seen once among the 5.4 million sentences. After filtering, this results in 139,969 unique 3rd level structures.
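
The counting and filtering step can be sketched as follows. This is only an illustration on a toy list of structure strings; `rank_structures` and `min_count` are hypothetical names, and the real pipeline ran over the parsed Wikipedia sentences.

```python
from collections import Counter


def rank_structures(structures, min_count=2):
    """Count structures, drop those seen fewer than `min_count` times, rank by frequency."""
    counts = Counter(structures)
    kept = [(s, c) for s, c in counts.items() if c >= min_count]
    # Sort by descending count; ties are broken alphabetically for reproducibility.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy corpus: one structure per sentence (made-up data, not from the paper).
toy = ["(S NP VP .)"] * 5 + ["(S (NP DT NN) VP .)"] * 2 + ["(FRAG NP .)"]
ranked = rank_structures(toy)
for structure, count in ranked:
    print(count, structure)
```

The structure seen only once is discarded, mirroring the filter described above; the ranked list is what the frequency bins are later cut from.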

Table 1:

two sentences that have the same 3rd level structure, but that have varying frequency, ordered from most frequent to least. Because we focus on the high-level structure, the length of the sentences with the same structure also can vary widely.

Figure 2:

grammatical frequency follows a Zipf-like distribution, with the most common structures occurring very frequently and many structures occurring infrequently.

This approach for measuring the grammatical difficulty of text represents a generalized and data-driven approach that goes beyond specific, theory-based grammatical components of text difficulty (e.g. active vs. passive voice, self-embedded clauses, etc. (Meyer & Rice, 1984)) and provides a generic framework for measuring grammatical difficulty. (That is, it goes beyond theory-based grammatical components.)

Evaluation:

To minimize confounding factors that might influence sentence difficulty we control for sentence length and term familiarity.

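1) We ranked the 139,939 unique 3rd level structures and divided them into 11 frequency bins. The first bin holds the structures in the top 1% by frequency, and the ten bins after it each take a further 10% in turn.

2) Each of the 5.4 million Wikipedia sentences can be mapped to one of the 11 frequency bins and we selected a subset of these for our study.

3) Only sentences near the mean length are kept: assuming sentence length is uniformly distributed, the longest sixth and the shortest sixth are removed, leaving two thirds of the sentences.

4) 20 sentences are randomly selected from each bin to examine how grammar frequency relates to sentence length and term familiarity:

Sentence length: the remaining sentences are split into 3 levels by length; of the sentences selected in each bin, 10 come from the longest level and 10 from the shortest.

Term familiarity: using the Google Web Corpus, the mean familiarity of the words in each sentence is computed and the remaining sentences are split into 3 levels by familiarity; of the sentences selected in each bin, 10 come from the highest-familiarity level and 10 from the lowest.

=>

This process resulted in a sample of 220 sentences in 11 frequency bins with each bin containing 5 long sentences with high familiarity, 5 long with low familiarity, 5 short with high familiarity, and 5 short with low familiarity.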
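
The 11 x 2 x 2 design with 5 sentences per cell can be sketched as a stratified sample. The candidate records, their field names (`bin`, `length`, `familiarity`, `text`) and the function name are assumptions made for illustration; the sketch presumes each sentence has already been assigned to a frequency bin and to a length and familiarity level.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, per_cell=5, seed=0):
    """Draw `per_cell` sentences from every (bin, length, familiarity) cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for c in candidates:
        cells[(c["bin"], c["length"], c["familiarity"])].append(c)
    sample = []
    for key in sorted(cells):
        pool = cells[key]
        if len(pool) < per_cell:
            raise ValueError(f"cell {key} has only {len(pool)} candidates")
        sample.extend(rng.sample(pool, per_cell))
    return sample


# Synthetic candidate pool: 8 placeholder sentences per cell (made-up data).
candidates = [
    {"bin": b, "length": ln, "familiarity": fam, "text": f"sentence {b}-{ln}-{fam}-{i}"}
    for b in range(1, 12)
    for ln in ("short", "long")
    for fam in ("high", "low")
    for i in range(8)
]
sample = stratified_sample(candidates)
print(len(sample))  # 11 bins x 2 lengths x 2 familiarity levels x 5 sentences
```

Sampling within cells rather than from the whole pool is what keeps length and familiarity balanced across the frequency bins.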

For each of the 220 sentences, we recruited 30 participants for a total of N=6,600 samples. To ensure the quality and accuracy of the data, participants were restricted to be within the United States and to have a previous approval rating of 95%.

Crowdsourcing: MTurk is a crowdsourcing tool where requesters can upload tasks to be accomplished by a set of workers for a fee.

Results:

A paired-samples t-test showed our two control variables to be effective, with length significantly different between short and long sentences (t(10) = -60.47, p < 0.001) and word frequency significantly different between the high and low group (t(10) = -38.47, p < 0.001).
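
For reference, a paired-samples t statistic is the mean of the pairwise differences divided by its standard error, with n - 1 degrees of freedom. The reported t(10) is what 11 pairs would give, presumably one pair per frequency bin, although the notes do not state this. The sketch below uses made-up numbers, not the study's data.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Toy check: differences are 1, 2, 3, so t = 2 / (1 / sqrt(3)).
t_toy, df_toy = paired_t([2, 4, 6], [1, 2, 3])

# Hypothetical mean lengths of short vs. long sentences in 11 bins (invented values).
short = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1, 11.9]
long_ = [24.0, 23.5, 24.6, 23.9, 24.1, 24.3, 23.8, 23.7, 24.2, 24.0, 23.6]
t_len, df_len = paired_t(short, long_)
print(round(t_toy, 4), df_toy, df_len)
```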

1) Actual difficulty: To measure actual difficulty (first dependent variable) we used a Cloze test. The basic Cloze test involves replacing every nth word in a text with a blank. Participants are then asked to fill in the blanks and are scored based on how many of their answers matched the original text (Taylor, 1953).

We employed a multiple-choice Cloze test. For each sentence, four nouns were randomly selected and replaced with blanks. For each sentence, we create five multiple-choice options containing the four removed words in different random orders, one of which is the correct ordering.
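
A multiple-choice Cloze item of this kind could be generated roughly as below. This is a sketch under assumptions: the noun positions are supplied by the caller (in practice they would come from a POS tagger), the four nouns are distinct so that enough different orderings exist, and all names are hypothetical.

```python
import itertools
import random


def make_cloze_item(tokens, noun_positions, n_blanks=4, n_options=5, seed=0):
    """Blank out `n_blanks` nouns and build `n_options` orderings, one of them correct."""
    rng = random.Random(seed)
    blanks = sorted(rng.sample(noun_positions, n_blanks))
    answer = tuple(tokens[i] for i in blanks)          # nouns in original order
    blank_set = set(blanks)
    blanked = " ".join("____" if i in blank_set else t for i, t in enumerate(tokens))
    # Every other distinct ordering of the removed nouns is a possible distractor.
    distractors = sorted(set(itertools.permutations(answer)) - {answer})
    if len(distractors) < n_options - 1:
        raise ValueError("not enough distinct orderings for the requested options")
    options = rng.sample(distractors, n_options - 1) + [answer]
    rng.shuffle(options)
    return blanked, options, options.index(answer)


tokens = "The doctor gave the patient a prescription for the infection at the clinic".split()
nouns = [1, 4, 6, 9, 12]   # doctor, patient, prescription, infection, clinic
blanked, options, correct = make_cloze_item(tokens, nouns)
print(blanked)
for i, option in enumerate(options):
    print(i, ", ".join(option))
```

Because every option contains the same four words, a participant cannot answer by recognizing vocabulary alone and has to work out where each noun fits in the sentence.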

2) To measure perceived difficulty (second dependent variable), participants were asked to rate the sentences on a 5-point Likert scale with higher numbers representing more difficult sentences.

Each condition (11 x 2 x 2) had 5 sentences and for each sentence we gathered 30 responses, resulting in a dataset of N=6,600.

An ANOVA shows these differences to be significant (F(10,6556)= 3.453, p < 0.001), for grammar frequency and sentence length, and (F(10,6556)= 1.870, p = 0.044), for grammar frequency and term familiarity. In addition, the interaction between all three variables is also significant (F(10, 6556) = 4.650, p < 0.001).

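(Note: ANOVA is short for Analysis of Variance.)

1. An independent-samples t-test generally only compares whether two groups of data differ and how significant the difference is, for example comparing the height or weight of two groups of people. The two groups are independent of each other; the test only asks whether there is a statistical difference between them.

2. One-way ANOVA (single-factor analysis of variance) examines whether the different levels of one controlled variable have a significant effect on an observed variable. In plain terms, it tests how significantly a change in x affects y, so it is used where one variable is expected to influence another. Note that a standard one-way ANOVA assumes independent groups; paired or within-subject data call for a repeated-measures ANOVA.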
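
As a worked illustration of note 2, the one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The sketch below is the simple one-way case on toy numbers; the paper itself reports a factorial ANOVA with interactions, which this does not reproduce. As a side check, the reported error degrees of freedom fit the design: 6,600 responses minus 11 x 2 x 2 = 44 cells gives 6,556.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: deviations of each value from its group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (not from the paper): three groups of three observations.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```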

Rarer structures are harder (one-tailed Pearson correlation coefficient):

To complete this analysis and understand the strength of the effect on actual difficulty, we calculated a one-tailed Pearson correlation coefficient between the grammar frequency and the actual difficulty (percentage correct) for both the raw scores and scores aggregated by frequency bin. There was a negative correlation between grammar frequency and the actual difficulty of the sentence (raw scores: N = 6,600, r = -0.053, p < 0.01; bin averages: N = 11, r = -0.596, p < 0.05) indicating that sentences that used less frequent structures were harder to understand.
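
The gap between the raw and the bin-averaged correlation is expected: averaging within a bin removes response-level noise, so the same trend shows up as a much larger |r| over far fewer points. A small sketch with invented data (bin index against a 0/1 correctness score) illustrates the two computations; the function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def bin_means(bins, scores):
    """Average the scores within each bin; returns (sorted bins, mean scores)."""
    grouped = defaultdict(list)
    for b, s in zip(bins, scores):
        grouped[b].append(s)
    keys = sorted(grouped)
    return keys, [sum(grouped[k]) / len(grouped[k]) for k in keys]


# Invented responses: higher bin index = rarer structure, score 1 = correct answer.
bins = [1, 1, 2, 2, 3, 3]
scores = [1, 1, 1, 0, 0, 0]
r_raw = pearson_r(bins, scores)
r_binned = pearson_r(*bin_means(bins, scores))
print(round(r_raw, 4), round(r_binned, 4))
```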

At medium structure frequency, long and short sentences have about the same accuracy:

In contrast to actual difficulty, we also find a main effect of the sentence length on perceived difficulty with longer sentences seen as more difficult (averaged 2.2) than the shorter sentences (averaged 2.0). Surprisingly, there was no effect of the average term frequency on perceived difficulty.

The effect of grammar frequency on perceived difficulty is smaller in shorter sentences and those with lower term frequency.

Both high and low frequency sentences show a jump in difficulty, though it occurs earlier (bin 7) for low frequency sentences than for high frequency sentences (bin 8).

We found a significant correlation between how well readers performed on the Cloze test and how difficult they thought a sentence was. Lower accuracy correlated with higher difficulty scores (N = 11, r = -0.574, p < 0.05; N = 6600, r = -0.203, p < 0.01).

Actual and perceived difficulty as measured in our user study for the 220 sentences binned by the Flesch-Kincaid grade level:

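Even though the Flesch-Kincaid formula shows very little difference in difficulty across the 220 sentences, the differences in perceived difficulty are in fact large.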
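
For context, the Flesch-Kincaid grade level depends only on average sentence length and average syllables per word, so it cannot see parse structure at all. A minimal sketch of the standard formula follows; it takes counts as input so that no syllable-counting heuristic has to be assumed, and the function name is hypothetical.

```python
def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level from raw counts of words, sentences and syllables."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Example counts (invented): 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
print(round(grade, 2))
```

Two sentences of equal length with similarly sized words get almost the same grade, whatever their syntax, which is consistent with the formula separating the 220 length-controlled sentences so poorly.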

GRAMMAR FAMILIARITY AS AN ANALYSIS TOOL:

Corpus:

Each of the texts was tokenized and split into sentences using the Stanford CoreNLP toolkit and then parsed using the Berkeley Parser (the same preprocessing as the frequency bins).

Summary:

1. The paper describes the mismatch, in existing research, between actual readability and perceived difficulty.

2. It demonstrates the correlation between level-3 parse tree frequency and both actual and perceived difficulty, and hence the validity of the measure.

3. In short sentences, actual difficulty is barely affected by grammar, because shorter sentences are easy to understand and any effect of grammar is difficult to detect (ceiling effect).

Similarly, in sentences with low term familiarity (i.e. more difficult words) the grammar familiarity doesn’t impact text difficulty since users are struggling with the lexical difficulty.

However, in sentences with very familiar terms, which are easier to understand, grammar frequency does have an impact on actual difficulty; only in sentences where the words are more familiar does the grammatical frequency have a strong effect. Interestingly, there was very little impact overall of term frequency on actual difficulty.

Based on these observations, we hypothesize that there is a relation between grammatical frequency and term frequency. Future studies are required to fully validate these hypotheses. Our study has limitations. Text comprehension was measured with individual
