[NLP News 2013.06.16] Representative Reviewing
Original English article: http://nlp.hivefire.com/articles/share/40221/
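Note: I translate NLP news only to practice technical English and broaden my horizons, so please forgive any flaws in the translation.
(I really could not follow this one, and my translation is a mess... If anyone understands the gist of this article, please do leave a comment. I would be very grateful!)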
When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always discussed only in the specific context of a paper (e.g. your rejected paper), or at most an area (e.g. what a “good paper” looks like for that area), rather than in terms of general principles. Neither individual papers nor areas are sufficiently general for a large conference: every paper differs in the details, and what if you want to build a new area and/or cross areas?
An unavoidable reason for reviewing is that the research community is too large. In particular, it is not possible for a researcher to read every paper which someone thinks might be of interest. This reason for reviewing exists independent of constraints on rooms or scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term: growing conferences simply use more rooms and/or change formats to accommodate the growth.
This suggests that a generic test for paper acceptance should be “Are there a significant number of people who will be interested?” This question could theoretically be answered by sending the paper to every person who might be interested and simply asking them. In practice, this would be an intractable use of people’s time: we must query far fewer people and achieve an approximate answer to this question. Our goal, then, should be to minimize the approximation error for some fixed amount of reviewing work.
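For a rough feel of the approximation error involved, here is a small worked formula. It rests on a simplifying assumption that is not in the original article: a fraction p of the community would be interested in the paper, and the n reviewers behave like an independent random sample of that community, each answering x_i = 1 (interested) or x_i = 0 (not interested).

```latex
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
% Example: p = 0.5, n = 3 gives SE = \sqrt{0.25/3} \approx 0.29
```

Under this idealized model, three reviewers give an estimate that is off by roughly 0.29 on a 0 to 1 scale, and halving that error would take about four times as many reviews. Real reviewers are not a random sample, which is why the failure modes discussed in the rest of the article matter.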
Viewed from this perspective, the first way that things can go wrong is misassignment of reviewers to papers, for which there are two easy failure modes.
- When reviewer/paper assignment is automated based on an affinity graph, the affinity graph may be low quality, or the constraint on the maximum number of papers per reviewer can easily leave some papers, those with low affinity to all reviewers, orphaned.
- When reviewer/paper assignments are done by one person, that person may choose reviewers who are all like-minded, simply because this is the crowd that they know. I’ve seen this happen at the beginning of the reviewing process, but the more insidious case is when it happens at the end, where people are pressed for time and low quality judgements can become common.
An interesting way to address the orphaning caused by the constraint would be to optimize a different objective, such as the product of affinities rather than the sum, since a product penalizes any single very low affinity far more heavily than a sum does. I’ve seen no experimentation of this sort.
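The difference between the sum and product objectives can be seen on a toy example. The sketch below is only an illustration: the affinity values are made up, the "maximum papers per reviewer" constraint is simplified to exactly one paper per reviewer, and the search is brute force, which is not how a real assignment system would work.

```python
from itertools import permutations
from math import prod

# Hypothetical affinity scores: affinity[paper][reviewer], higher is better.
affinity = [
    [1.0, 0.5],   # paper 0 fits both reviewers reasonably well
    [0.5, 0.05],  # paper 1 really only fits reviewer 0
]


def best_assignment(affinity, objective):
    """Brute-force the one-paper-per-reviewer assignment maximizing `objective`.

    Returns a tuple in which entry i is the reviewer assigned to paper i.
    Assumes a square matrix (as many reviewers as papers).
    """
    n = len(affinity)
    return max(
        permutations(range(n)),
        key=lambda perm: objective(affinity[p][r] for p, r in enumerate(perm)),
    )


by_sum = best_assignment(affinity, sum)
by_product = best_assignment(affinity, prod)

print("sum objective:    ", by_sum)      # (0, 1): paper 1 is left with affinity 0.05
print("product objective:", by_product)  # (1, 0): both papers get affinity 0.5
```

Here the sum objective accepts one nearly orphaned paper in exchange for one excellent match, while the product objective prefers two moderate matches. Maximizing a product of positive affinities is equivalent to maximizing the sum of their logarithms, so in principle it can be fed to the same kind of solver as the sum objective.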
For ICML (the International Conference on Machine Learning), there are about 3 levels of “reviewer”: the program chair who is responsible for all papers, the area chair (AC) who is responsible for organizing reviewing on a subset of papers, and the program committee member/reviewer who has primary responsibility for reviewing. In 2012, we tried to avoid these failure modes in a least-system-effort way using a blended approach. We used bidding to get a higher quality affinity matrix. We used a constraint system to assign the first reviewer and two area chairs to each paper. Then, we asked each area chair to find one reviewer for each paper. This obviously dealt with the one-area-chair failure mode. It also helps substantially with low quality assignments from the constrained system, since (a) the first reviewer chosen is typically higher quality than the last, because that choice is the least constrained; (b) misassignments to area chairs are diagnosed at the beginning of the process by ACs trying to find reviewers; and (c) ACs can reach outside of the initial program committee to find reviewers, which existing automated systems cannot do.
The next way that reviewing can go wrong is via biased reviewing.
- Author name bias is a famous one. In my experience it is real: well-known authors automatically have their papers taken seriously, which particularly matters when time is short. Furthermore, I’ve seen instances where well-known authors can slide by with proof sketches that no one fully understands.
- Review anchoring is a very significant problem if it occurs. This does not happen in the standard review process, because reviews are not visible to other reviewers until they are complete.
- A more subtle form of bias is when one reviewer is simply much louder or more charismatic than others. Reviewing without an in-person meeting is actually helpful here, as it reduces this problem substantially.
Reviewing can also be low quality. A primary issue here is time: most reviewers will submit a review within a time constraint, but it may not be high quality due to limits on time. Minimizing average reviewer load is quite important here. Staggered deadlines for reviews are almost certainly also helpful. A more subtle measure is discouraging low quality submissions. My favored approach here is to publish all submissions nonanonymously after some initial period of time.
Another significant issue in reviewer quality is motivation. Making reviewers not anonymous to each other helps with motivation, as poor reviews will at least be known to some. Author feedback also helps with motivation, as reviewers know that authors will be able to point out poor reviewing. It is easy to imagine that further improvements in reviewer motivation would be helpful.
A third form of low quality review is based on miscommunication. Maybe there is a silly typo in a paper? Maybe something was confusing? Being able to communicate with the author can greatly reduce ambiguities.
The last problem is dictatorship at decision time, for which I’ve seen several variants. Sometimes this comes in the form of giving each area chair a budget of papers to “champion”. Sometimes this comes in the form of an area chair deciding to override all reviews and either accept or, more likely, reject a paper. Sometimes this comes in the form of a program chair doing this as well. The power of dictatorship is often available, but it should not be used: the wiser course is keeping things representative.
At ICML 2012, we tried to deal with this via a defined power approach. When reviewers agreed on the accept/reject decision, that was the decision. If the reviewers disagreed, we asked the two area chairs to make decisions, and if they agreed, that was the decision. It was only when the ACs disagreed that the program chairs would become involved in the decision.
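The escalation rule described above can be written down as a short function. This is a minimal sketch of the rule as stated in the paragraph, not the actual ICML 2012 tooling; the function names, the string labels "accept"/"reject", and the use of a callback for the program chairs are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def unanimous(votes: Sequence[str]) -> bool:
    """True when there is at least one vote and all votes are identical."""
    return len(set(votes)) == 1


def decide(
    reviewer_votes: Sequence[str],
    area_chair_votes: Sequence[str],
    program_chair_decision: Callable[[], str],
) -> Tuple[str, str]:
    """Return (decision, level at which the decision was made)."""
    # Level 1: agreeing reviewers settle the matter.
    if unanimous(reviewer_votes):
        return reviewer_votes[0], "reviewers"
    # Level 2: the two area chairs decide if they agree with each other.
    if unanimous(area_chair_votes):
        return area_chair_votes[0], "area chairs"
    # Level 3: the program chairs are consulted only as a last resort.
    return program_chair_decision(), "program chairs"


print(decide(["accept", "accept", "accept"], ["reject", "reject"], lambda: "reject"))
print(decide(["accept", "reject", "accept"], ["reject", "reject"], lambda: "accept"))
print(decide(["accept", "reject", "accept"], ["accept", "reject"], lambda: "accept"))
```

Passing the program chairs' decision as a callback mirrors the point of the defined power approach: higher levels are not even asked unless the level below them disagrees.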
The above provides an understanding of how to create a good reviewing process for a large conference. With this in mind, we can consider various proposals at the peer review workshop and elsewhere.
- Double Blind Review. This reduces bias, at the cost of decreasing reviewer motivation. Overall, I think it’s a significant long term positive for a conference, as “insiders” naturally become more concerned with review quality and “outsiders” are more prone to submit.
- Better paper/reviewer matching. A pure win, with the only caveat that you should be familiar with failure modes and watch out for them.
- Author feedback. This improves review quality by placing a check on unfair reviews and reducing miscommunication, at some cost in time.
- Allowing an appendix or ancillary materials. This allows authors to better communicate complex ideas, at the potential cost of reviewer time. A standard compromise is to make reading an appendix optional for reviewers.
- Open reviews. Open reviews mean that people can learn from other reviews, and that authors can respond more naturally than in single-round author feedback.
It’s important to note that none of the above are inherently contradictory. This is not necessarily obvious, as proponents of open review and double blind review have found themselves in opposition at times. These approaches can be accommodated by simply hiding authors’ names for a fixed period of 2 months while the initial review process is ongoing.
Representative reviewing seems like the real difficult goal. If a paper is rejected in a representative reviewing process, then perhaps it is just not of sufficient interest. Similarly, if a paper is accepted, then perhaps it is of real and meaningful interest. And if the reviewing process is not representative, then perhaps we should fix the failure modes.