Code is not literature
http://www.gigamonkeys.com/code-reading/
I have started code reading groups at the last two companies I’ve worked at, Etsy and Twitter, and some folks have asked for my advice about code reading and running code reading groups. Tl;dr: don’t start a code reading group. What you should start instead I’ll get to in a moment but first I need to explain how I arrived at my current opinion.
As a former English major and a sometimes writer, I had always been drawn to the idea that code is like literature and that we ought to learn to write code the way we learn to write English: by reading good examples. And I’m certainly not the only one to have taken this point of view—Donald Knuth, in addition to his work on The Art of Computer Programming and TeX, has long been a proponent of what he calls Literate Programming and has published several of his large programs as books.
On the other hand, long before I got to Etsy and started my first code reading group I had in hand several pieces of evidence that should have suggested to me that this was the wrong way to look at things.
First, when I did my book of interviews with programmers, Coders at Work, I asked pretty much everyone about code reading. And while most of them said it was important and that programmers should do it more, when I asked them about what code they had read recently, very few of them had any great answers. Some of them had done some serious code reading as young hackers but almost no one seemed to have a regular habit of reading code. Knuth, the great synthesizer of computer science, does seem to read a lot of code and Brad Fitzpatrick was able to talk about several pieces of open source code that he had read just for the heck of it. But they were the exceptions.
If that wasn’t enough, after I finished Coders I had a chance to interview Hal Abelson, the famous MIT professor and co-author of the Structure and Interpretation of Computer Programs. The first time I talked to him I asked my usual question about reading code and he gave the standard answer—that it was important and we should do it more. But he too failed to name any code he had read recently other than code he was obliged to: reviewing co-workers’ code at Google where he was on sabbatical and grading student code at MIT. Later I asked him about this disconnect:
Seibel: I’m still curious about this split between what people say and what they actually do. Everyone says, “People should read code” but few people seem to actually do it. I’d be surprised if I interviewed a novelist and asked them what the last novel they had read was, and they said, “Oh, I haven’t really read a novel since I was in grad school.” Writers actually read other writers but it doesn’t seem that programmers really do, even though we say we should.
Abelson: Yeah. You’re right. But remember, a lot of times you crud up a program to make it finally work and do all of the things that you need it to do, so there’s a lot of extraneous stuff around there that isn’t the core idea.
Seibel: So basically you’re saying that in the end, most code isn’t worth reading?
Abelson: Or it’s built from an initial plan or some kind of pseudocode. A lot of the code in books, they have some very cleaned-up version that doesn’t do all the stuff it needs to make it work.
Seibel: I’m thinking of the preface to SICP, where it says, “programs must be written for people to read and only incidentally for machines to execute.” But it seems the reality you just described is that in fact, most programs are written for machines to execute and only incidentally, if at all, for people to read.
Abelson: Well, I think they start out for people to read, because there’s some idea there. You explain stuff. That’s a little bit of what we have in the book. There are some fairly significant programs in the book, like the compiler. And that’s partly because we think the easiest way to explain what it’s doing is to express that in code.
Yet somehow even this explicit acknowledgement that most real code isn’t actually in a form that can be simply read wasn’t enough to lead me to abandon the literature seminar model when I got to Etsy. For our first meeting I picked Jeremy Ashkenas’s backbone.js because many of the Etsy developers would be familiar with JavaScript and because I know Jeremy is particularly interested in writing readable code. I still envisioned something like a literature seminar but I figured that a lot of people wouldn’t actually have done the reading in advance (well, maybe not so different from a literature seminar) so I decided to start things off by presenting the code myself before the group discussion.
As I prepared my presentation, I found myself falling into my usual pattern when trying to really understand a piece of code—in order to grok it I have to essentially rewrite it. I’ll start by renaming a few things so they make more sense to me and then I’ll move things around to suit my ideas about how to organize code. Pretty soon I’ll have gotten deep into the abstractions (or lack thereof) of the code and will start making bigger changes to the structure of the code. Once I’ve completely rewritten the thing I usually understand it pretty well and can even go back to the original and understand it too. I have always felt kind of bad about this approach to code reading but it's the only thing that's ever worked for me.
My presentation to the code reading group started with stock backbone.js and then walked through the changes I would make to it to make it, by my lights, more understandable. At one point I asked if people thought we should move on to the group discussion but nobody seemed very interested. Hopefully seeing my refactoring gave people some of the same insights into the underlying structure of the original that I had obtained by doing the refactoring.
The second meeting of the Etsy code reading group featured Avi Bryant demonstrating how to use the code browsing capabilities of Smalltalk to navigate through some code. In that case, because few of the Etsy engineers had any experience with Smalltalk, we had no expectation that folks would read the code in advance. But the presentation was an awesome chance for folks to get exposed to the power of the Smalltalk development environment and for me to heckle Avi about Smalltalk vs Lisp differences.
When I got to Twitter I inexplicably still had the literature seminar model in mind even though neither of the two meetings of the Etsy reading group—which folks seemed to like pretty well—followed that model at all. When I sent out the email inviting Twitter engineers to join a code reading group the response was pretty enthusiastic. The first meeting was, yet again, a presentation of some code, in this case the internals of the Scala implementation of Future that is used throughout Twitter’s many services, presented by Marius Eriksen, who wrote most of it.
It was sometime after that presentation that I finally realized the obvious: code is not literature. We don’t read code, we decode it. We examine it. A piece of code is not literature; it is a specimen. Knuth said something that should have pointed me down this track when I asked him about his own code reading:
Knuth: But it’s really worth it for what it builds in your brain. So how do I do it? There was a machine called the Bunker Ramo 300 and somebody told me that the Fortran compiler for this machine was really amazingly fast, but nobody had any idea why it worked. I got a copy of the source-code listing for it. I didn’t have a manual for the machine, so I wasn’t even sure what the machine language was.
But I took it as an interesting challenge. I could figure out BEGIN and then I would start to decode. The operation codes had some two-letter mnemonics and so I could start to figure out “This probably was a load instruction, this probably was a branch.” And I knew it was a Fortran compiler, so at some point it looked at column seven of a card, and that was where it would tell if it was a comment or not.
After three hours I had figured out a little bit about the machine. Then I found these big, branching tables. So it was a puzzle and I kept just making little charts like I’m working at a security agency trying to decode a secret code. But I knew it worked and I knew it was a Fortran compiler—it wasn’t encrypted in the sense that it was intentionally obscure; it was only in code because I hadn’t gotten the manual for the machine.
Eventually I was able to figure out why this compiler was so fast. Unfortunately it wasn’t because the algorithms were brilliant; it was just because they had used unstructured programming and hand optimized the code to the hilt.
It was just basically the way you solve some kind of an unknown puzzle—make tables and charts and get a little more information here and make a hypothesis. In general when I’m reading a technical paper, it’s the same challenge. I’m trying to get into the author’s mind, trying to figure out what the concept is. The more you learn to read other people’s stuff, the more able you are to invent your own in the future, it seems to me.
He’s not describing reading literature; he’s describing a scientific investigation. So now I have a new model for how people should get together to gain insights from code, which I explained to the Twitter code reading group like this:
Preparing for the talk I’m going to give to the Girls who Code cohort, I started thinking about what to tell them about code reading and code they should read. And once again it struck me that for all the lip service we pay to the idea of reading code, most programmers really don’t read that much code, at least not just for the sake of reading it. As a simple proof: name me one piece of code that you’ve read and that you can be reasonably sure that most other good programmers will have read or will at least have heard of. Not many, right? Probably none.
But then it hit me. Code is not literature and we are not readers. Rather, interesting pieces of code are specimens and we are naturalists. So instead of trying to pick out a piece of code and reading it and then discussing it like a bunch of Comp Lit. grad students, I think a better model is for one of us to play the role of a 19th century naturalist returning from a trip to some exotic island to present to the local scientific society a discussion of the crazy beetles they found: “Look at the antennae on this monster! They look incredibly ungainly but the male of the species can use these to kill small frogs in whose carcass the females lay their eggs.”
The point of such a presentation is to take a piece of code that the presenter has understood deeply and for them to help the audience understand the core ideas by pointing them out amidst the layers of evolutionary detritus (a.k.a. kluges) that are also part of almost all code. One reasonable approach might be to show the real code and then to show a stripped down reimplementation of just the key bits, kind of like a biologist staining a specimen to make various features easier to discern.
The ideal presentation should be aimed at an audience of gentleman and lady programmers who are smart and generally capable but do not necessarily have any specific knowledge of the domain from which the code comes. Presentations should provide enough context for the audience to understand what the code is and should explain any details of the implementation language that may be obscure to the average programmer.
Since I had my epiphany we’ve had several meetings of the code reading group, now known as the Royal Society of Twitter for Improving Coding Knowledge, along the new lines. We’re still learning about the best ways to present code but the model feels very right. Also, I no longer feel bad about my dissection-based approach to reading code.
The biggest lesson so far is that code is very dense. A half hour presentation is just enough time to present maybe a dozen meaty lines of code and one main idea. It is also almost certainly the case that the presenters, who have to actually really dig down into a piece of code, get more out of it than anybody. But it does seem that a good presentation can at least expose people to the main ideas and maybe give them a head start if they do decide to read the code themselves.