(1)What is Sentence Centrality and Centroid-based Summarization ?

  Extractive summarization works by choosing a subset of the sentences in the original documents. This process can be viewed as identifying the most central sentences in a (multi-document) cluster that give the necessary and sufficient amount of information related to the main theme of the cluster.

  The centroid of a cluster is a pseudo-document which consists of words that have tf×idf scores above a predefined threshold, where tf is the frequency of a word in the cluster, and idf values are typically computed over a much larger and similar genre data set.

  In centroid-based summarization (Radev, Jing, & Budzikowska, 2000), the sentences that contain more words from the centroid of the cluster are considered as central. This is a measure of how close the sentence is to the centroid of the cluster.

(2)Centrality-based Sentence Salience:

  All of our approaches are based on the concept of prestige in social networks. A social network is a mapping of relationships between interacting entities (e.g. people, organizations, computers). Social networks are represented as graphs, where the nodes represent the entities and the links represent the relations between the nodes.

  A cluster of documents can be viewed as a network of sentences that are related to each other. We hypothesize that the sentences that are similar to many of the other sentences in a cluster are more central (or salient) to the topic.

  There are two points to clarify in this definition of centrality:

  1.How to define similarity between two sentences.

  2.How to compute the overall centrality of a sentence given its similarity to other sentences.

  To define similarity, we use the bag-of-words model to represent each sentence as an N-dimensional vector, where N is the number of all possible words in the target language. For each word that occurs in a sentence, the value of the corresponding dimension in the vector representation of the sentence is the number of occurrences of the word in the sentence times the idf of the word. The similarity between two sentences is then defined by the cosine between two corresponding vectors:

  A cluster of documents may be represented by a cosine similarity matrix where each entry in the matrix is the similarity between the corresponding sentence pair.

  Figure 1 shows a subset of a cluster used in DUC 2004, and the corresponding cosine similarity matrix. Sentence ID dXsY indicates the Y th sentence in the Xth document.

                                             Figure 1: Intra-sentence cosine similarities in a subset of cluster d1003t from DUC 2004.

  This matrix can also be represented as a weighted graph where each edge shows the cosine similarity between a pair of sentence (Figure 2).

Figure 2: Weighted cosine similarity graph for the cluster in Figure 1.

(3)Degree Centrality:

  Since we are interested in significant similarities, we can eliminate some low values in this matrix by defining a threshold so that the cluster can be viewed as an (undirected) graph.

  Figure 3 shows the graphs that correspond to the adjacency matrices derived by assuming the pair of sentences that have a similarity above 0.1, 0.2, and 0.3, respectively, in Figure 1 are similar to each other. Note that there should also be self links for all of the nodes in the graphs since every sentence is trivially similar to itself. Although we omit the self links for readability, the arguments in the following sections assume that they exist.

                                                             -----------------------------------------------------------------------

                                                               -----------------------------------------------------------------------

Figure 3: Similarity graphs that correspond to thresholds 0.1, 0.2, and 0.3, respectively, for the cluster in Figure 1.

  A simple way of assessing sentence centrality by looking at the graphs in Figure 3 is to count the number of similar sentences for each sentence. We define degree centrality of a sentence as the degree of the corresponding node in the similarity graph. As seen in Table 1, the choice of cosine threshold dramatically influences the interpretation of centrality. Too low thresholds may mistakenly take weak similarities into consideration while too high thresholds may lose many of the similarity relations in a cluster.

Table 1: Degree centrality scores for the graphs in Figure 3. Sentence d4s1 is the most central sentence for thresholds 0.1 and 0.2.

JRSmith©2014 - Feedback

Learning LexRank——Graph-based Centrality as Salience in Text Summarization(一)的更多相关文章

  1. Deep Learning of Graph Matching 阅读笔记

    Deep Learning of Graph Matching 阅读笔记 CVPR2018的一篇文章,主要提出了一种利用深度神经网络实现端到端图匹配(Graph Matching)的方法. 该篇文章理 ...

  2. Learning Context Graph for Person Search

    Learning Context Graph for Person Search 2019-06-24 09:14:03 Paper:http://openaccess.thecvf.com/cont ...

  3. Learning Conditioned Graph Structures for Interpretable Visual Question Answering

    Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2019-05-29 00:29:4 ...

  4. 《Deep Learning of Graph Matching》论文阅读

    1. 论文概述 论文首次将深度学习同图匹配(Graph matching)结合,设计了end-to-end网络去学习图匹配过程. 1.1 网络学习的目标(输出) 是两个图(Graph)之间的相似度矩阵 ...

  5. DAG-GNN: DAG Structure Learning with Graph Neural Networks

    目录 概 主要内容 代码 Yu Y., Chen J., Gao T. and Yu M. DAG-GNN: DAG structure learning with graph neural netw ...

  6. Graph Based SLAM 基本原理

    作者 | Alex 01 引言 SLAM 基本框架大致分为两大类:基于概率的方法如 EKF, UKF, particle filters 和基于图的方法 .基于图的方法本质上是种优化方法,一个以最小化 ...

  7. 论文解读( N2N)《Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization》

    论文信息 论文标题:Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximiz ...

  8. 论文解读(GMT)《Accurate Learning of Graph Representations with Graph Multiset Pooling》

    论文信息 论文标题:Accurate Learning of Graph Representations with Graph Multiset Pooling论文作者:Jinheon Baek, M ...

  9. Learning Latent Graph Representations for Relational VQA

    The key mechanism of transformer-based models is cross-attentions, which implicitly form graphs over ...

随机推荐

  1. 关于RGB转换YUV的探讨与实现

    最近在Android手机上使用相机识别条形码工作取得了比较理想的进展,自动识别功能基本完成,然而在手动识别指定条形码图片时遇到困难,由于Zxing开源Jar包识别图片的颜色编码式为YUV,而普通的图片 ...

  2. 服装销售系统数据库课程设计(MVC)

    <数据库课程设计> 名称:Jia服装销售网站 姓名:陈文哲 学号:…… 班级:11软件工程 指导老师:索剑 目录 目录 1 需求分析 3 一:销售部门机构情况 3 二:销售部门的业务活动情 ...

  3. 第三篇——第二部分——第五文 配置SQL Server镜像——域环境SQL Server镜像日常维护

    本文接上面两篇搭建镜像的文章: 第三篇--第二部分--第三文 配置SQL Server镜像--域环境:http://blog.csdn.net/dba_huangzj/article/details/ ...

  4. LVM物理卷命令

    1. 物理卷命令  一般维护命令:  #pvscan //在系统的全部磁盘中搜索已存在的物理卷  #pvdisplay 物理卷全路径名称 //用于显示指定物理卷的属性. #pvdata 物理卷全路径名 ...

  5. 深入理解Fsync----JBD内核调试 专业打杂程序员 @github yy哥

    http://hustcat.github.io/ http://www.cnblogs.com/hustcat/p/3283955.html http://blog.sina.com.cn/s/ar ...

  6. Windows与Linux下文件操作监控的实现

    一.需求分析: 随着渲染业务的不断进行,数据传输渐渐成为影响业务时间最大的因素.究其原因就是因为数据传输耗费较长的时间.于是,依托于渲染业务的网盘开发逐渐成为迫切需要解决的需求.该网盘的实现和当前市场 ...

  7. mongnodb 启动脚本

    开始用mongodb建立一套监控体系,安装解压即可.附上编写的mongodb启动管理脚本. 建议 mkdir sbin 目录,放到sbin目录下.废话少说,代码如下: #!/bin/bash MONG ...

  8. MySQL数字类型中的三种常用种类

    数字类型 MySQL数字类型按照我的分类方法分为三类:整数类.小数类和数字类. MySQL数字类型之一我所谓的“数字类” 就是指 DECIMAL 和 NUMERIC,它们是同一种类型.它严格的说不是一 ...

  9. Spring Framework jar官方直接下载路径

    SPRING官方网站改版后,建议都是通过 Maven和Gradle下载,对不使用Maven和Gradle开发项目的,下载就非常麻烦,下给出Spring Framework jar官方直接下载路径: h ...

  10. 在fragment中调用SharedPreferences

    [o] Activity中调用SharedPreferences的方式: String prefsName = "mysetting"; SharedPreferences pre ...