This year I spent a lot of time researching storage system classes taught at good universities. There were two reasons: first, this post by a researcher at NetApp about the lack of a good textbook for storage or file system classes, and second, our own storage systems class, for which I was the TA.
 
In this post I want to give a short overview of the various courses, their focus, and other details. Please note that the following text might contain errors or misconceptions on my part. I might also have missed other storage courses at these universities.
 
University of California, Santa Cruz: 
 
Let's begin with the course at the University of California, Santa Cruz. Storage is huge at UCSC, where the Storage Systems Research Center partners with nearly everyone. The Ceph file system and the CRUSH hash function are two outcomes of their research.
 
The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures cover file systems, beginning with uniprocessor file systems and performance analysis and moving (very quickly) to distributed file systems. They also cover fault tolerance and other advanced topics. The reading material consists of 37 papers, from classics like "File System Design for an NFS File Server Appliance" to state-of-the-art research papers like "An Analysis of Data Corruption in the Storage Stack" (FAST 2008), which had come out only about two weeks before.

I miss some basics that IMHO are important for understanding storage system design, like the properties of modern hard disks, and I am not that into archival storage (my boss is), but it is a really well-designed course. Unfortunately, the lecture slides are not available online.

 
Columbia University, New York
Advanced Topics in Network Storage Systems, Spring 2004:
http://www1.cs.columbia.edu/~magoutis/cs699810-spring04/index.html
 
I may have missed one, but the last storage-related course at Columbia University was given in 2004 by Kostas Magoutis. The course is focused on network storage and probably relies on basics from an Operating Systems class or a basic storage class. There was one lecture per week, with one to three papers as reading material per week.

It is really nice that the lecturer has posted notes on how to read the papers, with questions and annotations for some of the material. Interestingly, data deduplication is covered with the LBFS paper, the Venti paper, and Henson's Compare-By-Hash papers.

Three books are recommended for the course: "UNIX Internals" (1996), "The Design and Implementation of the 4.4BSD Operating System" (1996), and "NFS Illustrated" (1999).

 
Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009: 
 
At Cornell University, I found the course on advanced distributed storage systems by Hakim Weatherspoon (who took part in the OceanStore project). The lectures, given two per week, cover "Cloud Computing", "Network File Systems", and the important topics of Consistency, Availability, Replication, and Scalability.
 
I think the major strength of this course is that it seems to focus much more than the other courses on the important concepts needed for storage system design, implementation, and research, rather than on standards, products, and storage management issues. The major weakness is that the individual lectures are very focused on the research papers whose content is presented, even to the point that there is no single presentation scheme. I think the overall consistency of the lecture series is weakened this way.
 
One interesting aspect of the course is that the students have to write and hand in short reviews of the papers in the reading material, each consisting of a summary (3-4 sentences), two or three major strengths, two or three weaknesses, and one question for future work that should be pursued in the opinion of the student.
 
There are two projects as part of the course: in the first, the students have to develop a distributed file system based on Amazon Web Services infrastructure. The second is a research project that the students have to come up with by themselves.
 
Six books are recommended for the course: two books by Richard Stevens ("UNIX Network Programming" and "Advanced Programming in the UNIX Environment"), two books by Tanenbaum ("Modern Operating Systems" and "Distributed Systems"), "The Design and Implementation of the 4.4BSD Operating System", and "The C++ Programming Language" by Stroustrup.
 
Johns Hopkins University
Storage Systems, Fall 2007:
 
At Johns Hopkins University, where our professor Christian Scheideler and my advisor Andre Brinkmann (as a visiting PhD student) had formerly been, I found the Storage Systems course by Randal Burns.
 
As usual, the course consists of a lecture series (two lectures of 50 minutes each per week), homework, and a project. I like that the course covers some basics, like disk drive architecture, that are essential for understanding the design of storage systems. On the other hand, it is a bit short on distributed file systems.
 
 
University of Notre Dame:
 
In 2005, the University of Notre Dame offered the course "Distributed Storage" by Surendar Chandra.
 
As usual, the course consists of a series of lectures (two per week) and a project. The lecture topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of no fewer than 40 papers. My impression is that the collection of reading material differs considerably from that of the other courses covered here; for example, the well-known "classical" papers are not linked.
 
Technion
 
Technion is the "Israel Institute of Technology" in Haifa, and as I said before: I am pretty envious of the students there. However, not especially because of the "Filesystems" course.
 
The lecture series consists of a short introduction to disk drive architecture, RAID, sequential data processing on tapes (hey, I am inferring this only from the pictures in the slides), disk-based sorting, B-trees, hashing, concurrency and transactions, as well as recovery.
 
The course recommends five books: "File Structures: An Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". The books match the lectures exactly: they mostly cover the basics shared between databases and storage systems, but none of them is directly related to file systems.
 
The assignments seem to be pretty similar to ours: they appear to consist of multiple steps toward a simple file system implementation. However, the assignments are also given in Hebrew, so I don't understand them. I expected more from a Technion course.
 
University of Wisconsin in Madison: 
Advanced Storage Systems, Spring 2006:
http://pages.cs.wisc.edu/~remzi/Classes/738/Spring2006/
 
The advanced storage systems class given at the University of Wisconsin seems to be a nicely structured class with interesting topics: it begins with local storage systems, but moves very quickly (by the third topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability, as well as caching, replication, and consistency are discussed. The reading material is a nice list of what are now classics, like the WAFL paper, the AutoRAID paper, and the Google File System and MapReduce papers, but also the Row-Diagonal Parity and the "soft updates" papers.

Which universities are missing:
The University of California, Berkeley is missing: the home of BSD (and therefore the Fast File System), RAID, and a lot of early work in P2P storage seems to have no course focused on storage or file systems. I also could not find classes at Stanford, Harvard, MIT, or Carnegie Mellon.

 
Summary
To sum these courses up a bit: most courses have large amounts of reading material. This is unusual in Germany (or at least at UPB). I had plenty of courses (especially in the SE part) without any reading material. We followed this "US style" in our course, but with only 12 papers. Most courses have a project assignment for which the students have to come up with their own topic. I really like this, too.
 
Our own courses
Storage Systems (German), University of Paderborn, Spring 2009:
http://pc2.uni-paderborn.de/teaching/lectures/speichersysteme/
 
"Our" own storage systems course consists of a lecture series with 15 lectures of 90 minutes each and 6 assignments.
 
The lecture series starts very slowly, with "Magnetic Storage Systems" (week 1), disk scheduling (week 2), an introduction to MEMS and flash storage (week 3), and RAID (weeks 4 and 5). Next come file systems (weeks 6 and 7) and storage connection technologies, from SCSI (week 8) to SANs (week 9). Network and parallel file systems are treated in weeks 10 to 12.
 
The assignments consisted of programming a small FUSE file system in C, step by step.
 
In the last third of the lecture series, the course treated advanced storage topics that are interesting for our current research projects, like Long Term Archiving, HPC I/O (MPI-IO), Continuous Data Protection (CDP), Data Deduplication, and P2P Storage.
 
In addition to the reading material, we referred to the book "Linux Device Drivers".

Our professor, Andre Brinkmann, also gave a short course (6 lectures) called "Theoretical Aspects of Storage Systems Research" at the Politechnika Wrocławska in Poland, which is a very condensed version of our course focused on the theoretical aspects.

 
Last words:
I really liked studying and comparing the storage system lectures. As long as a real storage systems textbook is missing, these lectures provide a pretty good overview of the classical (I should call them "essential") research papers of our field and of related books.

I am impressed that so many universities have "project" assignments for which the students have to come up with a topic by themselves. These lectures show what is possible at good (mainly US) universities, with motivated students and with the right foundations.

 
This blog is copied from: http://dirkmeister.blogspot.com/2009/12/storage-system-and-file-system-courses.html
