Ten Tips for Writing CS Papers, Part 2
This continues the first part on tips to write computer science papers.
6. Ideal Structure of a Paragraph
A paper has different levels of formal structure: sections, subsections, paragraphs, sentences. It is important to ensure that the structure of the content aligns well with the formal structure because the formal structure is readily perceived by the reader, whereas the structure of the content is not. With a good alignment we make it easy for the reader to have the right mental model for the organization of the content, which enables better navigation and recall of the content.
An important benefit of a well-organized paper is that it minimizes surprise for the reader. In general you may want to surprise readers with how amazing your method or achievements are, but not through the organization of the paper.
How to align the content with the formal structure? There is more to say about this and I recommend the references at the end of this article, but here I want to focus on the structure of one or multiple paragraphs. The basic rules are:
- One paragraph should contain only a single idea or a single point of argumentation.
- The beginning and the end of a paragraph glue the paragraph into the surrounding content.
There is an ambiguity as to what constitutes a separate idea and indeed paragraphs may be of quite different lengths.
To achieve a good structure, here is a recipe that works for me. For a section I would like to write I make a list of bullet points of things I want to say, with one bullet point being a single idea or important point. Each point may have one or more dependencies on other points and I use the dependencies to order the list. Finally, I write one paragraph for each item on the list and I may add an additional paragraph at the beginning and end of the section to connect the section to the surrounding content.
I found that this recipe also makes my job as a writer easier because it overcomes my writing inhibition in two ways. First, I can start by simply making a list and this does not feel like writing. Second, once the ordering of ideas is clear, the actual writing becomes a lot simpler.
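The ordering step of the recipe can be sketched in a few lines of Python. This is only an illustration, assuming the bullet points and their dependencies are written down as a dictionary; the point names below are hypothetical and `order_points` is a helper invented for this sketch. It uses the standard library's `graphlib` module (Python 3.9 or later) to produce an order in which every point appears after the points it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical bullet points for a method section. Each point maps to the
# set of points that must be introduced before it.
dependencies = {
    "notation": set(),
    "problem statement": {"notation"},
    "related approach": {"problem statement"},
    "our algorithm": {"problem statement", "notation"},
    "comparison": {"related approach", "our algorithm"},
}


def order_points(deps):
    """Return the points in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


paragraph_order = order_points(dependencies)
for number, point in enumerate(paragraph_order, start=1):
    print(f"Paragraph {number}: {point}")
```

If two points depend on each other, `static_order` raises a `CycleError`, which is a useful hint that the two points probably belong in the same paragraph or need to be split differently.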
Here is an example of a less-than-ideal paragraph from Section 2.3 in (Gehler and Nowozin, 2008).
"As already mentioned to our knowledge (Argyriou et al., 2006) were the first to note the possibility of an infinite set of base kernels and they also stated the subproblem (Problem 1). We will defer the discussion of the subproblem to the next section and shortly comment on the differences of the Algorithm of (Argyriou et al., 2006) and the IKL Algorithm. We denote with g the objective value of a standard SVM classifier with loss function L."
Let us reverse engineer the content of this paragraph, then restructure it. The paragraph makes two points: first, a connection to the work of (Argyriou et al., 2006). Second, it establishes some notation. So it should perhaps be split into two paragraphs.
For the first point, the beginning is also less than ideal: "as already mentioned to our knowledge"; it is a bit redundant and apologetic to point out that we already mentioned it and that we may not know better. The second point, the notation, is okay by itself, but it is unclear why it follows the first: is it done in order to enable the comparison between approaches? We would need to read ahead to find out. (This is indeed the case.) Here is a proposed improvement:
(Argyriou et al., 2006) first recognized the possibility of an infinite set of base kernels and we now discuss the connection to our work.
To make the connection explicit we first establish the notation we will use throughout the paper. We use g to denote the objective value of a standard SVM classifier, where L is the loss function.
It is simpler to read and makes it clear why we introduce the notation. Also note the end and beginning of the two short paragraphs: the end of the first paragraph tells you what comes next ("the connection to our work"), the beginning of the second paragraph tells you how this is done (through notation). The flow between the two paragraphs is natural now and they could almost be merged into one again with the single point of the resulting paragraph being "the connection between (Argyriou et al.) and our work".
7. Avoid Ambiguous Relative Pronouns (This, These, That, Which)
When used properly, a relative pronoun, such as "this", "these", "that", "which", can effectively refer to a previously mentioned noun, and that has to be remembered by the reader.
In the previous sentence, which entity did "that" refer to? Is it "a previously mentioned noun"? Or is it "a relative pronoun"? Or is it the proper use?
Ambiguities of relative pronouns are common because the writer does not experience the ambiguity. After all, it is clear to the writer what he refers to. Train yourself to recognize any potentially ambiguous relative pronoun, ideally by using a highlighter to mark them in a printout.
To resolve the ambiguity the easiest solution is simply to add the noun it refers to. For the above example, "that" would become "that noun".
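The highlighter pass can also be approximated on the source text. The following is a minimal sketch, assuming plain text input and the four words named in this tip; `highlight_pronouns` is a hypothetical helper, not part of any tool. It marks every occurrence, including harmless ones (for example "that" used as a conjunction), so deciding which ones are ambiguous remains a manual step.

```python
import re

# The four words discussed in this tip; extend the tuple as needed.
PRONOUNS = ("this", "these", "that", "which")
PATTERN = re.compile(r"\b(" + "|".join(PRONOUNS) + r")\b", re.IGNORECASE)


def highlight_pronouns(text):
    """Wrap each candidate pronoun in ** markers and count the matches."""
    marked, count = PATTERN.subn(lambda m: f"**{m.group(0)}**", text)
    return marked, count


sample = "We use an algorithm which is efficient, and that has to be remembered."
marked, count = highlight_pronouns(sample)
print(marked)
print(f"{count} pronoun(s) to check")
```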
(Another issue I ran into frequently is using "which" in cases where "that" should have been used, such as in "We use an algorithm which is efficient." I remember annoying a former American colleague of mine by using "which" a bit too often. Some advice is available.)
Here is a real example from an ICDM 2008 paper of mine. I highlight all relative pronouns.
Extracting such geometric patterns from molecular 3D structures is one of the central topic in computational biology, and numerous approaches have been proposed. Most of them are optimization methods, which detect one pattern at a time by minimizing a loss function (e.g., [14, 15, 6]). They are different from our approach enumerating all patterns satisfying a certain geometric criterion. In particular, they do not have a minimum support constraint. Instead they try to find a motif that matches all graphs.
This is not the worst example but can be improved nevertheless. The first "which" is best removed, the other relative pronouns are best clarified. Here is a proposed improvement:
Extracting such geometric patterns from molecular 3D structures is one of the central topics in computational biology, and numerous approaches have been proposed. Most of them are optimization methods, detecting one pattern at a time by minimizing a loss function (e.g., [14, 15, 6]). These optimization methods are different from our approach enumerating all patterns satisfying a certain geometric criterion. In particular, other methods do not have a minimum support constraint and instead try to find a motif that matches all graphs.
8. Provide Continuation Markers
Continuation markers are sentences or paragraphs, typically at the beginning of sections, that tell the reader what will be presented next and how it is relevant or how it relates to what has been presented already. They provide structure and flow, connecting the different parts of the paper.
Here is an example, from an ICCV 2015 paper:
"3. Method
We now describe our model for tracking fast moving objects. While the motion model is standard, the observation model for raw ToF captures is a novel contribution."
Note two elements here: first, there is an explicit statement of what will be presented next (the model for tracking fast moving objects). Second, we establish relevance with respect to the contribution.
There are two reasons why natural continuation markers are important. First, they enable navigation through the paper by allowing the reader to skip sections more efficiently. Second, without the necessary background it may take a reader multiple readings to fully understand the paper. If you have lost the reader, a natural re-entry point makes it easier to continue reading the paper despite a lack of understanding of some parts.
Both reasons are especially important for reviewers, a special type of reader. Ideally the reviewer is an expert in the field already, so we would like to make it easy for him to quickly navigate to relevant parts of the paper. Less ideally, the reviewer is working under time pressure or without keen interest in the work; in this case we would like to minimize misunderstanding or missing important points during reading.
It is important to co-locate the continuation markers with the actual text itself. It is not sufficient to provide a mini table-of-contents as part of the introduction ("In Section 2 we present related work. In Section 3 we present our method. etc.").
9. Multiple Authors
It is a reality that most computer science papers are written by multiple authors. Coordinating the writing between multiple authors can be challenging both in terms of content and in terms of technology.
In terms of content, in my experience a recipe for disaster is to divide the paper into parts and agree that "Author A will write the introduction, author B will write the method, etcetera". The resulting draft will be incoherent and everyone has an excuse for delaying their part due to perceived dependencies ("I will write the method once the notation is defined in the introduction", "I will write the introduction when we have results").
Also, when dividing up work this way the draft can be poorly balanced, because sub-authors tend to be assigned the parts they have contributed to the most. This gives them an incentive to describe their own contribution in too much detail. For example, senior authors writing the introduction will fill it with a discussion of the past research agenda that led to this work; the author writing about the implementation will want to go into detail because it was really difficult to get it to work and people may miss just how difficult it was.
It is better to assign responsibility to a single author to write a full draft, then iterate together over this draft. There are two reasons why it is better: first, clear responsibility gets stuff done; second, the draft will be more coherent with a more linear flow of arguments.
The single author draft works best if the draft writer is an experienced author because iterating on a poorly organized draft may take more effort than a complete rewrite. When iterating on a draft it is important to distinguish substantial from minor changes. Minor changes are changes that fix issues locally, such as adding a sentence for clarification, changes of word order, typos, etc. These changes are important but not urgent. Most accomplished authors I know prefer to make these changes in passes through the full paper, much like polishing the paper with each reading.
Substantial changes are things like adding or removing sections, changing the order of the presentation, or enlarging or shrinking the claimed contribution. Such changes can have large implications for other parts of the paper, which then need to be addressed. They are therefore both important and urgent: they require less time if made early.
In terms of technology, I frequently experienced problems due to the diversity of authors and their working style. Often some authors will be senior authors with a proven but dated work setup, for example, not using basic version control systems and being stuck in an inflexible editor that mangles LaTeX every time it opens a file. To be fair, these authors are often most essential in terms of providing feedback on the content of the paper and they may have little time available to stay up to date with the latest tools. For addressing this problem with technology, my recommendations are the following:
- Use a version control system: this should almost go without saying. Even if you are the sole author of a paper it is best to use a version control system because it provides a simple method to back up your work. With multiple authors, coordinating the writing of a paper without a version control system is simply a waste of everyone's time and nerves.
- Use a friendly version control system that provides a simple web interface; Bitbucket is my favorite for paper writing because it offers free private git repositories and allows you to view changes in a neat timeline in the browser. While hardly surprising to any git user, this feature is readily appreciated by everyone. Also, for minor changes Bitbucket actually allows editing from within the browser.
- For yourself: when writing LaTeX write one sentence in a line and use a line break after each sentence. This makes merging conflicts easier and leads to fewer surprises with strange editors breaking long lines. (I also found that this helps me to improve the organization of a paragraph because every sentence now starts at the beginning of a line.)
- When you need only high level feedback from your coauthors, sending them a PDF for annotation via email may still be the most efficient way.
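The one-sentence-per-line convention can be applied to an existing hard-wrapped paragraph with a small script. The following is a rough sketch under stated assumptions: sentences end in ".", "!" or "?" and the next sentence starts with an uppercase letter or a LaTeX command. Abbreviations followed by a capitalized word (such as "Dr. Smith") would be split incorrectly, so the output should be checked by hand. The function name and the citation key are hypothetical.

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace and then an
# uppercase letter or a backslash (start of a LaTeX command). This is an
# assumption of the sketch, not a full sentence parser.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph):
    """Join a hard-wrapped paragraph, then put each sentence on its own line."""
    joined = " ".join(paragraph.split())
    return "\n".join(SENTENCE_BOUNDARY.split(joined))


# Hypothetical LaTeX paragraph, wrapped by an editor at a fixed width.
wrapped = (
    "We denote with $g$ the objective value of a\n"
    "standard SVM classifier. The loss function is $L$.\n"
    "\\cite{argyriou2006} first noted this possibility."
)
print(one_sentence_per_line(wrapped))
```

With each sentence on its own line, a change to one sentence shows up as a one-line difference in version control, which is what makes merge conflicts easier to resolve.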
10. Authorship and Author Ordering
Apart from the writing itself, another common problem with multiple authors is the discussion about authorship and author ordering. While not related to writing papers per se, I do want to share some remarks on this topic. Debates about authorship and author ordering arise in only a few common situations. Here are some examples, with the more common cases first:
- A small contributor or someone involved in early discussions wants to be a co-author, but other authors disagree based on the amount of time they contributed.
- There is a PhD student, a post-doc, and a faculty author and in most computer science venues the recognition is strongest for the first and last author position. The post-doc feels he guided the student the most and so deserves to be recognized, but the faculty member may feel differently based on seniority or being the source of funding.
- Two or more students contributed to a piece of work and see their contribution as the strongest; this happens sometimes when a student postpones a line of work and another student is continuing with the work, directed by a joint supervisor.
- Two or more senior authors feel that they started or guided the project the most.
Obviously there is no "right way" to handle all circumstances, and indeed computer science handles authorship differently from, say, mathematics. Of course everyone agrees that scientific authorship should imply substantial contributions to the work, but that is about as ambiguous a statement as can be made. To be more concrete, here are some observations.
First, some conflicts can be anticipated, for example the case of two students. Here, it is best to discuss a possible publication and authorship as soon as the second student gets involved. This discussion should be summarized via email for future reference. Likewise for the case of the small contributor, as soon as it is clear the work will end up in a publication a discussion should help to set expectations, for example to offer authorship only if additional work is invested.
Second, as a young PhD student one naturally underestimates the implicit future benefits that arise from co-authorship. For example the senior co-authors may present the work at venues otherwise inaccessible, or the work will lead to substantial future collaborations with the original co-authors.
Third, when considering whether to include a small contributor as co-author, the problem is most often not the co-authorship itself, but possible future actions by the contributor after the paper is published (for example, giving seminar talks about the paper). The other authors may then feel that the credit and opportunities are taken away from them. These problems can be avoided by discussing early not just the co-authorship itself but also which future paper-related actions are done by whom. For example, all authors may agree that seminar and job talks about the work should only be presented by the lead author.
Recommended Reading
I have bought many books on writing, especially when I started my PhD. But there is one that stands above all others, and if you are writing papers I can recommend it to you, no matter whether you are just starting out or have been writing for decades.
This book is "Scientific Writing: A Reader and Writer's Guide" by Jean-Luc Lebrun.
Acknowledgements. Thanks to Jonathan Strahl for corrections to the article.