I researched a lot about storage system classes given at good universities this year. This had two reasons: The first was this
post of a researcher at NetApp, about the missing of a good storage or file system class book and secondly our
own storage systems class where I was the TA.
In this post I want to give a short overview about the various different courses, their focus, and other things. Please note, the following text might contain errors or misconceptions on my part. I also might have missed other storage courses at these universities.
University of California, Santa Cruz:
Let's begin with the course of the University of California in Santa Cruz. Storage is a huge at UCSC with the Storage Systems Research Center that partners with nearly very everyone. The
ceph file system and the
crush hash function are two outcomes of their research.
The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures are about file systems beginning with uniprocessor filesystems, performance analysis and (very fast) to distributed filesystems. They also cover fault tolerance and other advanced topics. Their reading material consists of 37 papers from classics like "
File System Design for an NFS File System Appliance" to state of the art research papers like "
An Analysis of Data Corruption in the Storage Stack" (FAST 2008) that come about two weeks before.
I miss some important basics that IMHO are important for understanding storage system design, like properties of modern hard disks and I am not that into archival storage (my boss is), but it is a really good designed course. Unfortunately, the lecture slides are not available online.
Columbia University, New York
I may have missed one, but the last storage related course at Columbia University had been in 2004 by Kostas Magoutis. The course is focused on network storage and probably relies on basics from an Operating Systems class or a basis storage class. The lectures had been one per week with one to three papers are reading material per week.
Really nice is that the lecturer has posted notes how the read the papers with questions and annotations to some of the material. Interestingly, data deduplication is covered with the LBFS, the Venti paper, and Henson'sCompare-By-Hash papers.
There are three books recommended for the course "UNIX Internals (1996)", "The Design and the Implementation of the 4.4 BSD Operating System (1996)" and "NFS Illustrated (1999)".
Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009:
At the Cornel University, I found the course and advanced distributed storage systems by Hakim Westherspoon (has taken part in the
OceanStore project). The lectures, given two per week, handle "Cloud Computing, "Network File Systems", the important topics of Consistency, Availability, Replication, and Scalability.
I think the major strength of this course is that it seems to focus much more than the other courses and the important concepts needed for storage system design, implementation and research than the focus on standards, products, and storage management issues. The major weakness is that the individual lectures are very focused on the research papers, whose content is presented. Even to the point that there is no single presentation scheme. I think the overall consistency of the lecture is weakened this way.
One interesting aspect of the course is that the students have to write and hand-in short summaries of the reading material papers consisting a summary (3-4 sentences), two or three major strength points, two or three weaknesses and one question of future work that should be followed in the option of the student.
The have to projects as part of the course: In the first the students have to develop a distributed file system based on Amazon Web Service infrastructure. the second is a research project, the students have to come up with by themselves.
John Hopkins University
Storage Systems, Fall 2007:
At the John Hopkins University -- where our professors of Christian Scheideler and my advisor Andre Brinkmann (as visiting PhD student) had formerly been -- I found the Storage Systems course by Randal Burns.
As usual the course consists of a lecture series (2 lectures as 50min per week), home works, and a project. I like that the course some basics like disk drive architecture that a essential to understand the design of storage systems. On the other side it is a bit short on distributed file systems.
University of Notre Dame:
The University of Notre Dame offered in 2005 the course "Distributed Storage" by Surendar Chandra.
As usual the course consists of a series of lectures (2 per week) and a project. The lectures topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of not less than 40 papers. My impression is that the collection of reading material differs much from the material of the other courses covered here, e.g. the well-known "classical" papers are not linked.
Technion:
Technion is the "Israel Institute of Technology" in Haifa and I said
before: I am pretty envy to the students there. However, not especially because of the "Filesystems" course.
The lecture series consists of an short introduction on disk drive architecture, RAID, sequential data processing on tapes (hey, I infer here from the pictures in the slides only), disk-based sorting, B-Trees, Hashing, concurrency and transactions as well as recovery.
The course recommends five books: "
File Structures and Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". None of these books are directly filesystem related. The books match exactly to the lectures, mostly related to the basics shared between databases and storage systems, but nothing directly related to file systems.
The assignments seem to be pretty similar to ours. It seems to consist of multiple assignments about an easy filesystem implementation. However, the assignments are given also in Hebrew, so I don't understand them. I expected more from a Technion course.
University of Wisconsin in Madison:
The advanced storage systems class given at the University of Wisconsin seems to be a nicely structures class with interesting topics: It begins with local storage systems, but moves very quickly (3. topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability as well as caching, replication and consistency are discussed. The reading material is a nice list of now classics like the
WAFL paper, the
AutoRAID paper, the
GoogleFS and
MapReduce, but also
Row Diagonal Parity and the
"soft update" paper.
What universities are missing:
The University of California, Berkeley is missing: The home of BSD (and therefore the Fast File System), RAID, and a lot of early work in P2P storage seems to have no course focussed on storage or file systems. I could not find classes in Stanford, Harvard, MIT, and Carnegie Mellon.
Summary
To sum these courses up a bit: Most courses have large amounts of reading material. This is unusual in Germany (or at least at
UPB). I had enough courses (especially in the SE part) without any reading material: We followed this "US style" in our course, but only with 12 papers. Most courses have a project assignment for the students where the students have to come up with an own topic. I really like this, too.
Our own courses
"Our" own storage systems course consists a lecture series with 15 lectures a 90 min and 6 assignments.
The lecture starts very slow, with "Magnetic Storage Systems" (week 1), Disk Scheduling (week 2), an introduction in MEMS and Flash storage (week 3), and RAID (week 4, 5). Next came filesystems (6,7) and storage connection technologies like SCSI (week 8) to SANS (week 9). Network and parallel file systems are treated in week 10 - 12.
The assignments consisted of programming small FUSE filesystem in C (step-by-step).
In the last third of the lecture, the courses treated advanced storage topics that are interesting for our current research project like Long Term Archiving, HPC IO (MPI IO), Contentious Data Protection (CDP), Data Deduplication and P2P Storage.
Last words:
I really liked studying and comparing the storage system lectures. These lecture provide a pretty good overview about the classical (I should call them "essential") research papers of our field and an overview about related books as long as a real storage system course book is missing.
I am impressed that so many universities have "project" assignments where the students have to come up with a topic by themselves. These lectures show want is possible on good (mainly US-) universities, with motivated students, and with the right foundations.
This blog is copied from: http://dirkmeister.blogspot.com/2009/12/storage-system-and-file-system-courses.html
- PatentTips – EMC Virtual File System
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention generally relates to net ...
- Extension of write anywhere file system layout
A file system layout apportions an underlying physical volume into one or more virtual volumes (vvol ...
- Extensible File System
An extensible file system format for portable storage media is provided. The extensible file system ...
- FUSE and File System
FUSE: File system in USErspace. So what is a file system? A file system maps file paths to file cont ...
- Union File System
目录 Union File System AUFS Docker是如何使用AUFS的 image layer 和 AUFS (docker版本不同可能会有区别,我的是在/var/lib/docker下 ...
- Low-overhead enhancement of reliability of journaled file system using solid state storage and de-duplication
A mechanism is provided in a data processing system for reliable asynchronous solid-state device bas ...
- filebench - File system and storage benchmark - 模拟生成各种各样的应用的负载 - A Model Based File System Workload Generator
兼容posix 接口的文件系统中我们不仅要测试 posix 接口是否兼容.随机读,随机写,顺序读,顺序写等读写模式下的性能.我们还要测试在不同工作负载条件下的文件系统的性能的情况:Filebench ...
- Design and Implementation of the Sun Network File System
Introduction The network file system(NFS) is a client/service application that provides shared file ...
- 谷歌三大核心技术(一)The Google File System中文版
谷歌三大核心技术(一)The Google File System中文版 The Google File System中文版 译者:alex 摘要 我们设计并实现了Google GFS文件系统,一个 ...
随机推荐
- 在OneNote中快速插入当前日期和时间
做笔记,难免有时需要记录当时的时间,记住这个快捷键会让记笔记的效率提升一点. To insert the current date and time, press Alt+Shift+F. To in ...
- SpringBoot学习:整合shiro(身份认证和权限认证),使用EhCache缓存
项目下载地址:http://download.csdn.NET/detail/aqsunkai/9805821 (一)在pom.xml中添加依赖: <properties> <shi ...
- 根据ip地址获取用户所在地
java代码: package com.henu.controller; import java.io.BufferedReader; import java.io.IOException; impo ...
- 【Scala】Scala-case-参考资料
Scala-case-参考资料 scala case_百度搜索 Scala School - 基础知识(续) scala case匹配值 - CSDN博客 scala入门教程:scala中的match ...
- libxml2的安装及使用[总结]
1.前言 xml广泛应用于网络数据交换,配置文件.Web服务等等.近段时间项目中做一些配置文件,原来是用ini,现在改用xml.xml相对来说可视性更为直观,很容易看出数据之间的层次关系.关于xml的 ...
- 转: Xshell鼠标选中,终端立即中断(CTRL-C)的问题
转自: https://nkcoder.github.io/2014/05/05/xshell-select-interrupt-dict/ Xshell选中文字复制时中断 在Xshell中设置了“自 ...
- 调试工具DebugView
可以用来跟踪如下函数的输出: Win32 OutputDebugString Kernel-mode DbgPrint All kernel-mode variants of DbgPrint i ...
- jchat-windows-master 编译输出日志
第一个项目成功生成的输出日志 >------ 已启动全部重新生成: 项目: QxOrm, 配置: Debug x64 ------ >Moc'ing IxModel.h... >Mo ...
- FireDAC中的SQLite(二)
我们接下来将要使用FDDemo.sdb数据库进行访问,开始我们的第一个SQLite访问例子. 我们的FDDemo.sdb存放目录在:C:\Program Files (x86)\Embarcadero ...
- (算法)等概率选出m个整数
题目: 从大小为n的整数数组A中随机选出m个整数,要求每个元素被选中的概率相同. 思路: n选m,等概率情况下,每个数被选中的概率为m/n. 方法: 初始化:从A中选择前m个元素作为初始数组: 随机选 ...