C# List Source Code Analysis (Part 1)

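Some background: while I was writing code at work, my mentor suggested setting a List's capacity when it is created, because that is friendlier to the GC. That advice led to this article, which looks at how List grows its capacity automatically.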

List<T> implements IList<T>, the non-generic IList, and IReadOnlyList<T>:

// C# source

public class List<T> : IList<T>, System.Collections.IList, IReadOnlyList<T>
{
    private const int _defaultCapacity = 4;

    private T[] _items;
    [ContractPublicPropertyName("Count")]
    private int _size;
    private int _version;
    [NonSerialized]
    private Object _syncRoot;

    static readonly T[]  _emptyArray = new T[0];

    // other members
}

The type hierarchy is much like Java's: List<T> implements IList<T>, and one level further up is ICollection<T>.

Default capacity

The code shows a default capacity of 4, but, as in Java, no memory is allocated for the backing array right away.

// Java code
    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        // This field is initialized as:
        // private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

// C# code
    // Constructs a List. The list is initially empty and has a capacity
    // of zero. Upon adding the first element to the list the capacity is
    // increased to 16, and then increased in multiples of two as required.
    public List() {
        _items = _emptyArray;
    }

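Interestingly, Microsoft's own comment says the capacity grows to 16 on the first add and then doubles as needed.

Judging by the code, that comment is out of date: the constant used for the first allocation is clearly 4, even though the comment says 16.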

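using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class BasicTest
{
    [TestMethod]
    public void TestMethod1()
    {
        List<int> list = new List<int>();
        // Assert.AreEqual takes the expected value first.
        Assert.AreEqual(0, list.Capacity);   // nothing allocated yet

        list.Add(1);
        Assert.AreEqual(4, list.Capacity);   // first Add grows to _defaultCapacity
    }
}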

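A quick unit test confirms it: the initial capacity is 0, and the first Add raises it to 4. Code really is easier to keep up to date than comments.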

How automatic growth is implemented

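Java and C# differ in a few places here. Add returns void in C#, whereas Java's add returns boolean. I prefer the C# version; I have never used the return value of add in Java. Reading the source, I could not find a case where ArrayList's add returns false either: it always returns true, and the boolean is there because the Collection.add contract lets collections such as Set report that nothing was added.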

Here is the C# Add code from Microsoft's reference source.

// C# Code
// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public void Add(T item) {
    if (_size == _items.Length)
        EnsureCapacity(_size + 1);
    _items[_size++] = item;
    _version++;
}

// Ensures that the capacity of this list is at least the given minimum
// value. If the current capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

// Gets and sets the capacity of this list.  The capacity is the size of
// the internal array used to hold items.  When set, the internal
// array of the list is reallocated to the given capacity.
//
public int Capacity {
    get {
        Contract.Ensures(Contract.Result<int>() >= 0);
        return _items.Length;
    }
    set {
        if (value < _size) {
            ThrowHelper.ThrowArgumentOutOfRangeException(
                ExceptionArgument.value, ExceptionResource.ArgumentOutOfRange_SmallCapacity);
        }
        Contract.EndContractBlock();

        if (value != _items.Length) {
            if (value > 0) {
                T[] newItems = new T[value];
                if (_size > 0) {
                    Array.Copy(_items, 0, newItems, 0, _size);
                }
                _items = newItems;
            }
            else {
                _items = _emptyArray;
            }
        }
    }
}

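A big reason every introductory resource recommends List is that it resizes itself. Before storing a new element, Add checks whether the backing array is full and, if it is, calls EnsureCapacity so that there is room for the element.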

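The source also makes clear that the code is not thread-safe. Avoid adding to a shared List from several threads without synchronization: under concurrency the order in which _size++ executes is undefined, so elements can be overwritten or lost.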
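One way to share a List safely is to serialize every Add with a lock. The following is only a minimal sketch of that idea; SharedListDemo, AddConcurrently and the gate object are hypothetical names made up for illustration and do not come from the List source.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SharedListDemo
{
    // Hypothetical helper: adds `total` integers to one shared list from
    // several threads. Every Add runs under the same lock, so the
    // "_items[_size++] = item" step is never executed by two threads at once.
    public static int AddConcurrently(int total)
    {
        var list = new List<int>();
        var gate = new object();   // assumption: one private lock guards the list

        Parallel.For(0, total, i =>
        {
            lock (gate)
            {
                list.Add(i);
            }
        });

        return list.Count;
    }
}
```

With the lock in place the final Count should equal the number of Add calls. Without it, the count may come up short or Add may throw, depending on how the threads interleave.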

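Properties are one C# feature I find hard to reason about. The book CLR via C# argues against them, and I think it has a point: a property looks just like a field at the call site, but it is really a method.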

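As the setter shows, assigning Capacity allocates a new array for _items and copies the existing elements into it, so every resize costs an allocation plus a copy.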

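This is probably what "GC-friendly" refers to. If the capacity is set at construction time, the list never has to reallocate and call Array.Copy while it fills up, so it does not leave a trail of discarded arrays behind.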

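Because the default capacity of a C# List is really 4, it pays to set the capacity up front. Consider a list that ends up holding 129 elements: the Capacity setter runs repeatedly as the array grows 4->8->16->32->64->128->256. The final array wastes 127 of its 256 slots, and the six earlier arrays, 4+8+16+32+64+128 = 252 element slots in total, are left for the GC to collect. That is a lot of waste.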
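The growth sequence can be observed directly through the public Capacity property. The sketch below is a hypothetical helper (CapacityDemo and TraceCapacities are names invented here) that records each distinct capacity seen while elements are added. It assumes the doubling behaviour shown in EnsureCapacity above.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Hypothetical helper: adds `count` items to `list` and records every
    // distinct Capacity value observed after an Add.
    public static List<int> TraceCapacities(List<int> list, int count)
    {
        var seen = new List<int>();
        int last = -1;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != last)
            {
                // Capacity changed, so the backing array was just reallocated
                // (or this is the first observation).
                last = list.Capacity;
                seen.Add(last);
            }
        }
        return seen;
    }
}
```

Calling CapacityDemo.TraceCapacities(new List<int>(), 129) should report 4, 8, 16, 32, 64, 128, 256, while passing new List<int>(129) should report only 129, because the array is allocated once by the constructor and never replaced.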

Remove in List

Every Remove on a List has to compact the backing array, so it is a fairly expensive operation. The code is as follows:

// Removes the element at the given index. The size of the list is
// decreased by one.
public bool Remove(T item) {
    int index = IndexOf(item);
    if (index >= 0) {
        RemoveAt(index);
        return true;
    }
    return false;
}

// Returns the index of the first occurrence of a given value in a range of
// this list. The list is searched forwards from beginning to end.
// The elements of the list are compared to the given value using the
// Object.Equals method.
//
// This method uses the Array.IndexOf method to perform the
// search.
//
public int IndexOf(T item) {
    Contract.Ensures(Contract.Result<int>() >= -1);
    Contract.Ensures(Contract.Result<int>() < Count);
    return Array.IndexOf(_items, item, 0, _size);
}

// Array.cs
public static int IndexOf<T>(T[] array, T value, int startIndex, int count) {
    if (array==null) {
        throw new ArgumentNullException("array");
    }

    if (startIndex < 0 || startIndex > array.Length ) {
        throw new ArgumentOutOfRangeException("startIndex",
            Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }

    if (count < 0 || count > array.Length - startIndex) {
        throw new ArgumentOutOfRangeException("count",
            Environment.GetResourceString("ArgumentOutOfRange_Count"));
    }
    Contract.Ensures(Contract.Result<int>() < array.Length);
    Contract.EndContractBlock();

    return EqualityComparer<T>.Default.IndexOf(array, value, startIndex, count);
}

// Removes the element at the given index. The size of the list is
// decreased by one.
public void RemoveAt(int index) {
    if ((uint)index >= (uint)_size) {
        ThrowHelper.ThrowArgumentOutOfRangeException();
    }
    Contract.EndContractBlock();
    _size--;
    if (index < _size) {
        Array.Copy(_items, index + 1, _items, index, _size - index);
    }
    _items[_size] = default(T);
    _version++;
}

As the code shows, Remove first tries to find the element. If it is found, it is removed and the method returns true; otherwise it returns false.

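Lookup in C# is a little involved, so I followed the call into EqualityComparer to see how the search works. This is another place where C# differs from Java: Java generics cannot be instantiated with primitive types, because primitives do not inherit from Object and therefore have no equals method to call.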

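C# does support value types as type arguments, so EqualityComparer<T>.Default inspects the type T and picks a matching Equals implementation.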
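A small sketch can make the comparer selection concrete. Point and IndexOfDemo below are hypothetical types written only for this illustration. The assumption is the documented behaviour of EqualityComparer<T>.Default: when T implements IEquatable<T>, the default comparer uses that strongly typed Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type used only for illustration.
public struct Point : IEquatable<Point>
{
    public int X;
    public int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed equality; no boxing is needed to compare two Points.
    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is Point && Equals((Point)obj);
    }

    public override int GetHashCode()
    {
        return (X * 397) ^ Y;
    }
}

public static class IndexOfDemo
{
    // List<T>.IndexOf searches with the default equality comparer for Point,
    // so two Points with equal coordinates are treated as the same element.
    public static int Find(List<Point> points, Point target)
    {
        return points.IndexOf(target);
    }
}
```

For a list holding (0,0) and (1,2), IndexOfDemo.Find with a freshly constructed Point(1,2) should return 1, and a point that is not in the list should yield -1.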

Now back to RemoveAt. It also calls Array.Copy, which tells you what a deletion costs: removing one element shifts n/2 elements on average, so the complexity is O(n).

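RemoveAll, by contrast, handles the whole list in a single O(n) pass, so avoid calling Remove inside a loop, which costs O(n) per call, and prefer RemoveAll when removing many elements.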
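The two approaches can be compared side by side. RemoveDemo and its method names are hypothetical, chosen for this sketch, and the "remove even numbers" predicate is just an example condition.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveDemo
{
    // Loop-based removal: each RemoveAt shifts the tail of the array,
    // so the whole loop is O(n^2) in the worst case.
    public static void RemoveEvensOneByOne(List<int> list)
    {
        // Walk backwards so that removing an element does not skip the next one.
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] % 2 == 0)
            {
                list.RemoveAt(i);
            }
        }
    }

    // Single call: RemoveAll returns the number of elements removed.
    public static int RemoveEvensInOnePass(List<int> list)
    {
        return list.RemoveAll(x => x % 2 == 0);
    }
}
```

Both methods leave the same elements in the list. The difference is only in how many times the remaining elements are moved.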

The source code is available here:

List source

Array source