Global UNIX file system cylinder group cache
A global cylinder group (CG) cache is stored in file server memory and shared by a plurality of file systems supported by the file server. The global CG cache comprises a number of CG entries which are pre-allocated in memory. As different file systems are accessed, global CG entries in the CG cache are used to store CG block information for the accesses. With such an arrangement, a file server may support multiple file systems using a single global CG cache without starvation and the other adverse performance impacts of the prior art. According to one aspect of the invention, the global CG cache is periodically scanned to reclaim memory. In contrast to the prior art, where multiple CG caches were each periodically scanned for memory reclamation, the use of a single CG cache minimizes the impact of CG cache maintenance on file server performance.
FIELD OF THE INVENTION
This invention relates generally to the field of file systems and more particularly to caching cylinder group information.
BACKGROUND OF THE INVENTION
FIG. 1 illustrates several components that may be included in a typical UNIX file system (UFS). Available space of a disk storage system 3 is generally shared by one or more file systems such as file system A (8) and file system B (9). File systems define the use of data storage devices that offer access to an array of fixed-size blocks, or sectors. The file system software is responsible for organizing these sectors into files and directories, and keeping track of which sectors belong to which file and which are not being used.
A UNIX file system (UFS) is a file system used by many UNIX and Unix-like operating systems. Each UNIX file system is composed of a group of boot blocks, a superblock and a collection of cylinder groups (CGs). The boot block stores code which is used to initialize the file system. The superblock stores statistical, geometric and behavioral tuning parameters associated with the file system. Each cylinder group (CG) has the following components: a backup copy of the superblock, a CG block which stores information relating to the particular cylinder group, a collection of inode data structures that store metadata associated with the cylinder group data blocks, and the cylinder group data blocks themselves. Each CG block entry includes a first portion characterizing the state of the associated cylinder, and a second portion including the cylinder group header of the associated cylinder. The state of the associated cylinder may include a cylinder group number, the block number within the cylinder where the cylinder block resides, and state and other cylinder group counters. The cylinder group block stores attribute and inode information for the cylinder, and will generally vary in size from one file system to another based on the inode density of the cylinder group.
A file server such as file server 2 accesses storage devices associated with the file systems. File servers generally include a memory 4, a portion of which is allocated to the temporary storage of data retrieved from the disk device to increase application performance by reducing the delays associated with disk I/O access. To support a typical UFS file system, the file server allocates a portion of memory to each file system to store a cylinder group cache (CG cache 5, 6) for the file system. Each CG cache entry stores an in-memory representation of a cylinder block for the associated file system. For example, in FIG. 1 the memory includes FSA CG cache 5 for storing CG blocks for file system A, and FSB CG cache 6 for storing CG blocks for file system B. A CG cache is assembled in memory when each UFS is first mounted, and remains in memory until the associated file system is unmounted.
Storing the CG blocks in memory as described above allows the file server to more quickly retrieve cylinder information, resulting in faster allocation of and access to data blocks. However, as file systems are mounted and storage for the associated CG caches is reserved in memory, memory can quickly become full. Larger file systems, having concomitantly larger CG caches, will receive a larger allocation of memory, effectively starving out other file systems and reducing the performance of the file server.
One technique that is used to reclaim memory to reduce starvation involves searching the CG block entries of each individual CG cache to determine which CGs are least recently accessed. The cylinder block information associated with the least recently used (LRU) cylinders can be deleted, allowing at least a portion of the memory to be reclaimed for use by other file systems. Generally the file systems control the scanning of the CG caches during a synch period allocated to each file system. CG entries which are determined to be 'aged' during the synch period are deleted from memory. While this reclaiming scheme helps to reduce the impact that larger file systems have on file server resources, in heavily loaded storage systems the reclamation may not be sufficient to eliminate memory starvation. In addition, the frequent scanning of CG caches by multiple file systems in the reclaiming scheme reduces the performance of the file server. It would be desirable to identify a CG caching scheme that overcomes these problems of the prior art.
SUMMARY OF THE INVENTION
According to one aspect of the invention, a global cylinder group (CG) cache is stored in a memory of a file server and shared by a plurality of file systems supported by the file server. The global CG cache comprises a number of CG entries which are pre-allocated in memory. As different file systems are accessed, global CG entries in the CG cache are used to store CG block information for the accesses. Thus space in the global CG cache is reserved only for CG blocks associated with recently accessed, or 'active', file system cylinders. The global CG cache is indexed using a combination of a file system identifier and cylinder group number. With such an arrangement, a file server may support multiple file systems using a single global CG cache without starvation and the other adverse performance impacts of the prior art. According to another aspect of the invention, the global CG cache is periodically scanned to reclaim memory. In contrast to the prior art, where multiple CG caches were each periodically scanned for memory reclamation, the use of a single CG cache minimizes the impact of CG cache maintenance on file server performance.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is a block diagram of a storage system that utilizes a prior art UNIX file system (UFS) cylinder cache arrangement;
FIG. 2 is a block diagram of a storage system that may include a file server including a global cylinder cache of the present invention;
FIG. 3 is a block diagram provided to illustrate one embodiment of a cache controller and global cylinder group cache that may operate according to the principles of the present invention;
FIG. 4 is a diagram provided to illustrate several fields that may be included in a cylinder group block entry of the present invention;
FIG. 5 is a flow diagram provided to illustrate several exemplary steps that may be performed by the cache controller of FIG. 3 to allocate cylinder group block entries and generate a list of cylinder group block entries;
FIG. 6 is a flow diagram which illustrates several steps that may be performed by the cache controller to populate a cylinder group cache of the present invention in response to file server access requests;
FIG. 7 is a flow diagram provided to illustrate several steps that may be performed in a cylinder group entry aging process of the present invention;
FIG. 8 is a flow diagram provided to illustrate several steps that may be performed in a cylinder group entry reclaim process of the present invention;
FIG. 9 is a flow diagram provided to illustrate several steps that may be performed by the cache controller of FIG. 3 to reclaim entries in the global cylinder group cache of the present invention; and
FIG. 10 illustrates an alternate embodiment of the invention which uses a bit map to identify available CG entries in the global CG cache of the present invention.
DETAILED DESCRIPTION
Referring now to FIG. 2, a file server 10 of the present invention is a device that includes hardware and software components that are used to control the access to file systems 24 and 26 by file server applications. The file server applications may operate to control access to storage of data for clients such as clients 11, 13 and 15. The file server 10 may include a client interface for communicating with the clients, where the interface may be a Local Area Network (LAN) or Wide Area Network (WAN) interface. For example, where storage 12 is a Network Attached Storage (NAS) device, the client interface may be an Internet Protocol (IP) interface.
The file server also includes a disk interface 18 for communicating with storage 12. According to one aspect of the invention, the file server maintains a global Cylinder Group (CG) cache 17 in memory 16. The global CG cache 17 is a single cache that is simultaneously shared by multiple file systems. Thus global CG cache 17 provides temporary storage of CG information for a plurality of file systems, such as file systems 24 and 26. The CG cache 17 is managed by cache controller 20. According to one aspect of the invention, the cache controller allocates memory locations in the CG cache in a manner that maximizes the amount of active file system information that is stored in the CG cache. Rather than reserving space in the memory to store CG entries that are infrequently accessed, CG entries are allocated to file systems on an as-needed basis. Allocating CG entries of the global CG cache on an as-needed basis allows multiple file systems to be serviced with increased performance. The cache controller 20 may be implemented in software, hardware or a combination thereof. Although the cache controller will be shown and described as having various components for performing various functions, it will be appreciated that the present invention is not limited to the components or structures described herein.
FIG. 3 provides a more detailed diagram of the cache controller 20 and global CG cache 17. The cache controller 20 may include a cylinder group (CG) cache wrapper 30, a Least Recently Used (LRU) list 32, a global hash table 34 and hash logic 33. The CG cache wrapper 30 may be implemented as object oriented software, with the cylinder group (CG) entries being represented as objects having associated data structures, although this is not a requirement. The CG wrapper controls the initialization, population and reclamation of entries in the global CG cache.
The CG cache 17 is shown in more detail to include a predetermined number of entries, such as entry 35, for storing CG blocks. The number of entries that are allocated to the CG cache is a matter of design choice, taking into account other demands on file server memory. In one embodiment, blocks which are allocated to the CG cache may be intelligently pre-empted for use by other applications using techniques similar to those described in patent application Ser. No. 11/427,526, entitled "Method and Apparatus for Intelligent Pre-emption of Buffer Cache locations", by Bhayana et al., filed on Jun. 29, 2006 and incorporated herein by reference (referred to herein as the Bhayana application).
In one embodiment the CG cache 17 is advantageously implemented as a Hash Q that is indexed using a hash index. The hash index is generated by hashing the file system identifier and CG number of a file system reference using hash logic 33. The present invention is not limited to the use of any particular hash function, although the hash function should be one that ensures that there are few collisions in the hash memory address space for an identified number of file systems and cylinder groups. A hash table 34 maps a memory address of the CG block to the hash index. In this description the terms 'global CG cache' and 'Hash Q' may be used interchangeably.
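By way of illustration, the hash index might be computed along the following lines in C. This is only a sketch: the table size, the mixing constant and the function name are assumptions made for this example and are not specified by the invention.

#include <stdint.h>

#define CG_HASH_BUCKETS 4096U   /* assumed table size; a power of two keeps masking cheap */

/*
 * Combine a file system identifier and a cylinder group number into a
 * single hash index.  Any mixing function with a reasonably uniform
 * distribution over (fsid, cg) pairs would do; the multiplier below is
 * the usual Fibonacci-hashing constant.
 */
static uint32_t
cg_hash_index(uint32_t fsid, uint32_t cg_number)
{
        uint64_t key = ((uint64_t)fsid << 32) | cg_number;

        key *= 0x9E3779B97F4A7C15ULL;           /* golden-ratio multiplier */
        return (uint32_t)(key >> 32) & (CG_HASH_BUCKETS - 1);
}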
List 32 stores pointers to the memory locations that are pre-allocated for storing CG block entries. The list 32 is advantageously arranged as a linked list in Least Recently Used (LRU) order, so that memory locations associated with the least recently accessed cylinder block entries are at the top of the list. A list entry may include, for example, a pointer to a cylinder block entry in memory, as well as a pointer to the next LRU cylinder block entry. When the UFS accesses cylinders that are not already cached, a CG entry is allocated to the access. The Hash Q is indexed using a combination of the file system identifier and CG identifier included in the access. With this arrangement, storage for CG entries is only allocated when the CG is originally accessed by the mounted file system. As a result, memory is allocated only to accessed, active portions of the file systems, thereby allowing a greater number of file systems to be supported by the file server.
The use of a single global CG cache also increases file server performance because only one CG cache needs to be scanned to quickly identify available memory. Such an arrangement removes the need to perform the multiple scans used in the prior art to analyze the different CG caches associated with the different file systems. According to one aspect of the invention, available CG entries are identified by tracking the time period since access (referred to herein as 'recency' of access) of the CG using a counter in the CG entry. The counter is initialized each time the cylinder is accessed and decremented during each subsequent synchronization, or 'aging', period. In one implementation, the counter is referred to as a 'dirty' counter and is maintained as part of the CG entry stored in the Hash Q.
FIG. 4 illustrates several exemplary fields that may be included in a CG entry 35. The CG entry 35 includes a CG number 36; a block number 37, identifying the on-disk block number for the associated CG block; and a CG state 38, which may identify a state of the CG entry, such as whether the CG entry is free (for allocation) or valid (i.e., storing valid CG data). The CG entry 35 is also shown to include a dirty counter 39 and a CG block 40. The CG block holds an in-memory copy of the on-disk CG block that describes the current CG state and is generally implemented as an object including free inode and block maps. Other implementations which use a subset of the above fields, or include additional fields, are intended to be equivalents to the illustrated CG entry, and thus the present invention is not limited to the inclusion of specific fields in the CG entry.
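A minimal C sketch of a CG entry carrying the fields of FIG. 4 might look as follows. The type names, the fsid and use_count fields, and the use of the BSD <sys/queue.h> linkage for list 32 are assumptions added for illustration; they are not mandated by the invention.

#include <stdint.h>
#include <sys/queue.h>          /* BSD TAILQ macros, used here for list 32 */

/* Assumed CG entry states; the patent names at least 'free' and 'valid'. */
enum cg_state {
        CG_FREE,                /* entry is available for allocation         */
        CG_VALID                /* entry holds valid CG block data           */
};

struct cg_entry {
        uint32_t        fsid;           /* owning file system (assumed field)        */
        uint32_t        cg_number;      /* field 36: cylinder group number           */
        uint64_t        block_number;   /* field 37: on-disk block of the CG block   */
        enum cg_state   state;          /* field 38: free or valid                   */
        int             dirty_counter;  /* field 39: recency-of-access counter       */
        void           *cg_block;       /* field 40: in-memory copy of the CG block  */
        unsigned        use_count;      /* assumed: reference count consulted by the
                                           FIG. 9 reclaim path                       */
        TAILQ_ENTRY(cg_entry) lru_link; /* linkage on the LRU list (list 32)         */
};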
The dirty counter 39 is used in the present invention to track CG entry recency of access. The dirty counter is initialized with an integer value when the CG is accessed. At each subsequent synch interval the counter is decremented. In one embodiment, because the dirty counter 39 is used to 'age' the CG entries, it may also be used by the CG wrapper to maintain LRU list order. When the dirty counter has been decremented to a pre-determined value, the portion of memory holding the CG block can be released by deleting the CG block associated with the CG entry.
FIGS. 5-9 are flow diagrams which illustrate several exemplary steps that may be taken by the CG wrapper or cache controller to populate and maintain the global CG cache of the present invention. It should be understood that the flow diagrams are meant to be illustrative only, and the present invention is not limited to the specific steps shown in the flow diagrams, but rather encompasses any equivalent processes that support the use of a global UFS CG cache.
FIG. 5 illustrates steps that may be performed in an exemplary process 40 that initially allocates CG entries in memory to build a global CG cache of the present invention. At step 42 a number N of cache entries is selected by the file server operating system or network administrator. A number of bytes to be reserved for each of the CG entries is also determined, and memory addresses of the CG entries for the global CG cache are identified. The selected values are a matter of design choice. At step 44 pointers to the memory addresses of the CG entries in the global CG cache are stored in the list 32, and the list is available for use.
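Continuing the sketch, the pre-allocation of FIG. 5 might be expressed as below, reusing the cg_entry structure and hash index shown earlier. The cache structure, the single-entry-per-bucket hash table and the function names are assumptions; the number of entries remains a design choice.

#include <stdlib.h>
#include <string.h>

/* Head of list 32 plus hash table 34; one entry per bucket for brevity. */
TAILQ_HEAD(cg_lru_head, cg_entry);

struct cg_cache {
        struct cg_lru_head  lru;                        /* list 32 in LRU order */
        struct cg_entry    *hash[CG_HASH_BUCKETS];      /* table 34             */
        unsigned            nentries;
};

/*
 * FIG. 5: pre-allocate N CG entries, mark each one free and place a
 * pointer to each on list 32.  N is a design choice made by the operating
 * system or the administrator.
 */
static int
cg_cache_init(struct cg_cache *cache, unsigned nentries)
{
        TAILQ_INIT(&cache->lru);
        memset(cache->hash, 0, sizeof(cache->hash));
        cache->nentries = nentries;

        for (unsigned i = 0; i < nentries; i++) {
                struct cg_entry *e = calloc(1, sizeof(*e));

                if (e == NULL)
                        return -1;              /* allocation failed */
                e->state = CG_FREE;
                TAILQ_INSERT_TAIL(&cache->lru, e, lru_link);
        }
        return 0;
}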
FIG. 6 is a flow diagram that illustrates exemplary steps that may be performed by the cache controller or CG wrapper in a process 50 for accessing the global CG cache. After a file system access is detected at step 52, the process proceeds to step 53 where a hash index is generated using a file system identifier (such as a file system number) and a CG number of the access. At step 54 it is determined whether the CG entry associated with the hash index is a valid CG entry, for example by examining the CG state field 38 of the CG entry. If the CG entry is valid, then the CG block information is already present in the global CG cache. The process continues to step 56, where the CG entry is updated to reflect changes to the CG status that result from the access.
If, however, it was determined at step 54 that the CG entry associated with the hash index is not valid, then memory has not yet been allocated for the CG of the file system. At step 55, a CG entry pointer is popped from the list 32, and the hash table is updated to map the popped CG entry address to the hash index. At step 56, the CG entry is updated to reflect any changes to the CG that result from the access. At step 58, the CG entry is moved to the end of the LRU list 32, and the dirty counter for the CG block entry is updated to reflect the recent access of the block.
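The lookup-and-populate path of FIG. 6 might then be sketched as follows, again building on the structures above. Hash collision handling and the disk read of the CG block are deliberately omitted, and the initial dirty-counter value is a placeholder.

#define CG_DIRTY_INIT 4         /* assumed initial value for the dirty counter */

/*
 * FIG. 6: look up, or populate, the CG entry for an access to cylinder
 * group 'cg_number' of file system 'fsid'.
 */
static struct cg_entry *
cg_cache_access(struct cg_cache *cache, uint32_t fsid, uint32_t cg_number)
{
        uint32_t idx = cg_hash_index(fsid, cg_number);          /* step 53 */
        struct cg_entry *e = cache->hash[idx];

        if (e == NULL || e->state != CG_VALID ||
            e->fsid != fsid || e->cg_number != cg_number) {
                /* Step 55: not cached -- take the entry at the head of list 32. */
                e = TAILQ_FIRST(&cache->lru);
                if (e == NULL)
                        return NULL;
                if (e->state == CG_VALID) {
                        /* Un-hash the entry being evicted and drop its CG block. */
                        uint32_t old = cg_hash_index(e->fsid, e->cg_number);

                        if (cache->hash[old] == e)
                                cache->hash[old] = NULL;
                        free(e->cg_block);
                        e->cg_block = NULL;
                }
                e->fsid = fsid;
                e->cg_number = cg_number;
                e->state = CG_VALID;
                /* (Reading the on-disk CG block into e->cg_block is omitted.) */
                cache->hash[idx] = e;   /* map the entry's address to the hash index */
        }

        /* Steps 56-58: record the access and move the entry to the MRU end. */
        e->dirty_counter = CG_DIRTY_INIT;
        TAILQ_REMOVE(&cache->lru, e, lru_link);
        TAILQ_INSERT_TAIL(&cache->lru, e, lru_link);
        return e;
}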
FIG. 7 is a flow diagram that illustrates exemplary steps that may be performed in a process 60 that flushes CG blocks. In one embodiment, a timer-initiated synch period occurs at pre-defined intervals (such as 30 seconds). At each synch interval, each CG entry is accessed and the dirty counter for the entry is evaluated, as illustrated in steps 64 and 65 of FIG. 7. If the dirty counter is determined to be less than or equal to a pre-defined threshold, then at step 66 the CG block memory corresponding to the CG entry is deleted. If the dirty counter is not less than or equal to the threshold value, then at step 67 the dirty counter is decremented. The process continues to step 69 where the next CG entry is 'aged', until all CG block entries in the global CG cache have been evaluated for recency.
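A sketch of the aging pass of FIG. 7 follows, using the same structures. The flush threshold is an assumed value, and writing dirty CG blocks back to disk before releasing them is omitted.

#define CG_AGE_THRESHOLD 0      /* assumed threshold at which the CG block is released */

/*
 * FIG. 7: at each synch interval, walk every CG entry.  Entries whose
 * dirty counter has reached the threshold have their CG block memory
 * released (step 66); all others are aged by decrementing the counter
 * (step 67).
 */
static void
cg_cache_age(struct cg_cache *cache)
{
        struct cg_entry *e;

        TAILQ_FOREACH(e, &cache->lru, lru_link) {
                if (e->state != CG_VALID)
                        continue;
                if (e->dirty_counter <= CG_AGE_THRESHOLD) {
                        free(e->cg_block);      /* release the CG block memory */
                        e->cg_block = NULL;
                } else {
                        e->dirty_counter--;
                }
        }
}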
FIG. 8 illustrates an exemplary process that may be performed to flush entries of the global CG cache when a file system is unmounted. When it is determined at step 72 that a file system is to be unmounted, at step 73 all the CG entries corresponding to that file system are processed. The dirty CG entries associated with the file system are flushed to disk. The CG state of each CG block entry is set to 'free', and at step 76 the pointers to the CG block entries are placed at the top of list 32. The flush process continues until it is determined at step 77 that the file system flush is completed.
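The unmount flush of FIG. 8 might be sketched as below. The helper write_cg_block_to_disk() is a hypothetical placeholder for the file server's actual write path.

/* Hypothetical helper: writes a CG block back to its on-disk location. */
extern void write_cg_block_to_disk(uint32_t fsid, uint32_t cg_number,
    const void *cg_block);

/*
 * FIG. 8: on unmount, flush the file system's CG entries, mark them free
 * and return them to the head of list 32 so they are the first candidates
 * for reuse.
 */
static void
cg_cache_unmount(struct cg_cache *cache, uint32_t fsid)
{
        struct cg_entry *e = TAILQ_FIRST(&cache->lru);

        while (e != NULL) {
                struct cg_entry *next = TAILQ_NEXT(e, lru_link);

                if (e->state == CG_VALID && e->fsid == fsid) {
                        if (e->cg_block != NULL) {
                                write_cg_block_to_disk(fsid, e->cg_number, e->cg_block);
                                free(e->cg_block);
                                e->cg_block = NULL;
                        }
                        cache->hash[cg_hash_index(fsid, e->cg_number)] = NULL;
                        e->state = CG_FREE;
                        TAILQ_REMOVE(&cache->lru, e, lru_link);
                        TAILQ_INSERT_HEAD(&cache->lru, e, lru_link);    /* step 76 */
                }
                e = next;
        }
}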
Referring now to FIG. 9, when all entries are in use, one of the entries from the LRU list is reclaimed and reused to hold the newly requested CG header. As mentioned above, one method of identifying an entry to be reclaimed is described in the Bhayana application. FIG. 9 illustrates several steps in a process that reclaims blocks based on a use count.
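One possible reading of the FIG. 9 reclaim path is sketched below: the least recently used entry that is not currently referenced is un-hashed and returned for reuse. The use_count test merely stands in for the use-count mechanism described in the Bhayana application, whose details are not reproduced here.

/*
 * FIG. 9 (sketch): when every entry is in use, walk list 32 from the LRU
 * end and reclaim the first entry that is not currently referenced.
 * Maintenance of use_count by callers is not shown.
 */
static struct cg_entry *
cg_cache_reclaim(struct cg_cache *cache)
{
        struct cg_entry *e;

        TAILQ_FOREACH(e, &cache->lru, lru_link) {
                if (e->state == CG_VALID && e->use_count == 0) {
                        cache->hash[cg_hash_index(e->fsid, e->cg_number)] = NULL;
                        free(e->cg_block);
                        e->cg_block = NULL;
                        e->state = CG_FREE;
                        return e;       /* reuse this entry for the new CG */
                }
        }
        return NULL;    /* every entry is pinned; the caller must wait and retry */
}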
Referring now to FIG. 10, although the above has described the use of a linked list for identifying potential CG entries for allocation, the use of a linked list, ordered or unordered, is not a limitation of the present invention. For example, in an alternate embodiment a bitmap such as bitmap 125 may advantageously be used to quickly identify available CG entries for allocation to file system cylinder accesses. In such an embodiment, the bitmap may simply store a bit to indicate whether the block is available. In an alternate embodiment, the bitmap may be augmented in the manner described in the Bhayana application described above to include a link to the CG entry, to allow CG state information to be considered when determining which CG entry to select for reclamation purposes. In an alternate embodiment, CG state (such as Free, Valid, Swapped, etc.) may be included directly in the bitmap to facilitate CG block entry allocation.
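A sketch of the bitmap alternative of FIG. 10 follows; the cache size and word handling are assumptions, and only the simple available/not-available form of the bitmap is shown.

#include <limits.h>

#define CG_NENTRIES   1024U                     /* assumed number of pre-allocated entries */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per CG entry: 1 = available for allocation, 0 = in use. */
static unsigned long cg_free_bitmap[(CG_NENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD];

static void
cg_bitmap_mark(unsigned idx, int is_free)
{
        unsigned long mask = 1UL << (idx % BITS_PER_WORD);

        if (is_free)
                cg_free_bitmap[idx / BITS_PER_WORD] |= mask;
        else
                cg_free_bitmap[idx / BITS_PER_WORD] &= ~mask;
}

/* Return the index of the first available CG entry, or -1 if none is free. */
static int
cg_bitmap_find_free(void)
{
        for (unsigned i = 0; i < CG_NENTRIES; i++)
                if (cg_free_bitmap[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
                        return (int)i;
        return -1;
}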
Accordingly, a global CG cache that is shared by multiple file systems has been shown and described, where the global CG cache is comprised of pre-allocated buffers which are allocated to store CG block information for file system cylinders on an as-needed basis. Global CG cache entry availability may be determined using access recency information to maintain an ordered list of entries. Alternatively, availability may be tracked by maintaining a bit map of the CG block entries. Such an arrangement allows a file server to support multiple file systems using a single global CG cache without starvation and the other adverse performance impacts of the prior art. In addition, a single periodic scan of the global CG cache enables cache maintenance to be performed with only a minimal impact on file server performance.