Understanding the managed heap
Understanding the managed heap
Another common problem faced by many Unity developers is the unexpected expansion of the managed heap. In Unity, the managed heap expands much more readily than it shrinks. Furthermore, Unity’s garbage collection strategy tends to fragment memory, which can prevent a large heap from shrinking.
Technical details: how the managed heap operates and why it expands
The “managed heap” is a section of memory that is automatically managed by the memory manager of a Project’s scripting runtime (Mono or IL2CPP). All objects created in managed code must be allocated on the managed heap(2) (Note: Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap).
In the above diagram, the white box represents a quantity of memory apportioned to the managed heap, and the colored boxes within it represent data values stored within the managed heap’s memory space. When additional values are needed, more space is allocated from within the managed heap.
The garbage collector runs periodically(3) (Note: The exact timing is platform-dependent). This sweeps through all objects on the heap, marking for deletion any objects that are no longer referenced. Unreferenced objects are then deleted, freeing up memory.
Crucially, Unity’s garbage collection – which uses the Boehm GC algorithm – is non-generational and non-compacting. “Non-generational” means that the GC must sweep through the entire heap when performing a collection pass, and its performance therefore degrades as the heap expands. “Non-compacting” means that objects in memory are not relocated in order to close gaps between objects.
The above diagram shows an example of memory fragmentation. When an object is released, its memory is freed. However, the freed space does not become part of a single large pool of “free memory”. The objects on either side of the freed object may still be in use. Because of this, the freed space is a “gap” between other segments of memory (this gap is indicated by the red circle in the diagram). The newly-freed space can therefore only be used to store data of identical or lesser size than the freed object.
When allocating an object, remember that the object must always occupy a contiguous block of space in memory.
This leads to the core problem of memory fragmentation: while the overall amount of space available in the heap may be substantial, it is possible that some or all of that space is in small “gaps” between allocated objects. In this case, even though there may be enough total space to accommodate a certain allocation, the managed heap cannot find a large enough block of contiguous memory in which to fit the allocation.
However, if a large object is allocated and there is insufficient contiguous free space to accommodate the object, as illustrated above, the Unity memory manager performs two operations.
First, if it has not already done so, the garbage collector runs. This attempts to free up enough space to fulfill the allocation request.
If, after the GC runs, there is still not enough contiguous space to fit the requested amount of memory, the heap must expand. The specific amount that the heap expands is platform-dependent; however, most Unity platforms double the size of the managed heap.
Key problems with the heap
The core issues with managed heap expansion are twofold:
Unity does not often release the memory pages allocated to the managed heap when it expands; it optimistically retains the expanded heap, even if a large portion of it is empty. This is to prevent the need to re-expand the heap should further large allocations occur.
On most platforms, Unity eventually releases the pages used by empty portions of the managed heap back to the operating system. The interval at which this occurs is not guaranteed and should not be relied upon.
The address space used by the managed heap is never returned to the operating system.
For 32-bit programs, this can lead to address space exhaustion if the managed heap expands and contracts many times. If a program’s available memory address space is exhausted, the operating system will terminate the program.
For 64-bit programs, the address space is sufficiently large that this is extremely unlikely to occur for programs whose running time does not exceed the average human lifespan.
Temporary allocations
Many Unity projects are found to operate with several tens or hundreds of kilobytes of temporary data being allocated to the managed heap each frame. This is often extremely detrimental to a project’s performance. Consider the following math:
If a program allocates one kilobyte (1kb) of temporary memory each frame, and is running at 60 frames per second, then it must allocate 60 kilobytes of temporary memory per second. Over the course of a minute, this adds up to 3.6 megabytes of garbage in memory. Invoking the garbage collector once per second is likely to be detrimental to performance, but allocating 3.6 megabytes per minute is problematic when attempting to run on low-memory devices.
Further, consider loading operations. If a large number of temporary objects are generated during a heavy Asset-loading operation, and those objects are referenced until the operation completes, then the garbage collector is unable to release those temporary objects and the managed heap needs to expand – even though many of the objects it contains will be released a short time later.
Keeping track of managed memory allocations is relatively simple. In Unity’s CPU Profiler, the Overview has a “GC Alloc” column. This column displays the number of bytes allocated on the managed heap in a specific frame (4) (Note: Note that this is not identical to the number of bytes temporarily allocated during a given frame. The profile displays the number of bytes allocated in a specific frame, even if some/all of the allocated memory is reused in subsequent frames). With the “Deep Profiling” option enabled, it’s possible to track down the method in which these allocations occur.
Note that some script methods cause allocations when running in the Editor, but do not produce allocations after the project has been built. GetComponent
is the most common example; this method always allocates when executed in the Editor, but not in a built project.
In general, it is strongly recommended that all developers minimize managed heap allocations whenever the project is in an interactive state. Allocations during non-interactive operations, such as Scene loading, are less problematic.
Basic memory conservation
There are a handful of relatively simple techniques that can be employed to reduce managed heap allocations.
Collection and array reuse
When using C#’s Collection classes or Arrays, consider reusing or pooling the allocated Collection or Array whenever possible. The Collection classes expose a Clear method which eliminates the Collection’s values but does not release the memory allocated to the Collection.
void Update() {
List<float> nearestNeighbors = new List<float>();
findDistancesToNearestNeighbors(nearestNeighbors);
nearestNeighbors.Sort();
// … use the sorted list somehow …
}
This is particularly useful when allocating temporary “helper” Collections for complex computations. A very simple example might be the following code:
In this example, the nearestNeighbors
List is allocated once per frame in order to collect a set of data points. It’s very simple to hoist this List out of the method and into the containing class, which avoids allocating a new List each frame:
List<float> m_NearestNeighbors = new List<float>();
void Update() {
m_NearestNeighbors.Clear();
findDistancesToNearestNeighbors(NearestNeighbors);
m_NearestNeighbors.Sort();
// … use the sorted list somehow …
}
In this version, the List’s memory is retained and reused across multiple frames. New memory is only allocated when the List needs to expand.
Closures and anonymous methods
There are two points to consider when using closures and anonymous methods.
First, all method references in C# are reference types, and are therefore allocated on the heap. Temporary allocations can be easily created by passing a method reference as an argument. This allocation occurs regardless of whether the method being passed is an anonymous method or a predefined one.
Second, converting an anonymous method to a closure significantly increases the amount of memory required to pass the closure to method receiving it.
Consider the following code:
List<float> listOfNumbers = createListOfRandomNumbers();
listOfNumbers.Sort( (x, y) =>
(int)x.CompareTo((int)(y/2))
);
This snippet uses a simple anonymous method to control the sorting order of the list of numbers created on the first line. However, if a programmer wished to make this snippet reusable, it is tempting to substitute the constant 2
for a variable in local scope, like so:
List<float> listOfNumbers = createListOfRandomNumbers();
int desiredDivisor = getDesiredDivisor();
listOfNumbers.Sort( (x, y) =>
(int)x.CompareTo((int)(y/desiredDivisor))
);
The anonymous method now requires the method to be able to access the state of a variable outside of the method’s scope, and so has become a closure. The desiredDivisor
variable must be passed into the closure somehow so that it can be used by the actual code of the closure.
To do this, C# generates an anonymous class that can retain the externally-scoped variables needed by the closure. A copy of this class is instantiated when the closure is passed to the Sort
method, and the copy is initialized with the value of the desiredDivisor
integer.
Because executing the closure requires instantiation of a copy of its generated class, and all classes are reference types in C#, then executing the closure requires allocation of an object on the managed heap.
In general, it is best to avoid closures in C# whenever possible. Anonymous methods and method references should be minimized in performance-sensitive code, and especially in code that executes on a per-frame basis.
Anonymous methods under IL2CPP
Currently, inspection of code generated by IL2CPP reveals that the simple declaration and assignment of a variable of type System.Function
allocates a new object. This is true whether the variable is explicit (declared in a method/class) or implicit (declared as an argument to another method).
As such, any use of anonymous methods under the IL2CPP scripting backend allocates managed memory. This is not the case under the Mono scripting backend.
Furthermore, IL2CPP displays dramatically different levels of managed memory allocation depending on the way in which a method argument is declared. Closures, as expected, allocate the most memory per call.
Unintuitively, predefined methods allocate nearly as much memory as closures when passed as arguments under the IL2CPP scripting backend. Anonymous methods generate the least amount of transient garbage on the heap, by one or more orders of magnitude.
Therefore, if a project is intended to ship on the IL2CPP scripting backend, there are three key recommendations:
Prefer coding styles that do not require passing methods as arguments.
When unavoidable, prefer anonymous methods over predefined methods.
Avoid closures, regardless of scripting backend.
Boxing
Boxing is one of the most common sources of unintended temporary memory allocations found in Unity projects. It occurs whenever a value-typed value is utilized as a reference type; this most often occurs when passing primitive value-typed variables (such as int
and float
) to object-typed methods.
In this extremely simple example, the integer in x is boxed in order to be passed to the object.Equals
method, because the Equals
method on object
requires that an object
be passed to it.
int x = 1;
object y = new object();
y.Equals(x);
C# IDEs and compilers generally do not issue warnings about boxing, even though it leads to unintended memory allocations. This is because the C# language was developed with the assumption that small temporary allocations would be efficiently handled by generational garbage collectors and allocation-size-sensitive memory pools.
While Unity’s allocator does use different memory pools for small and large allocations, Unity’s garbage collector is not
generational and therefore cannot efficiently sweep out the small, frequent temporary allocations generated by boxing.
Boxing should be avoided wherever possible when writing C# code for Unity runtimes.
Identifying boxing
Boxing shows up in CPU traces as calls to one of a few methods, depending on the scripting backend in use. These generally take one of the following forms, where <some class>
is the name of some other class or struct, and …
is some number of arguments:
<some class>::Box(…)
Box(…)
<some class>_Box(…)
It can also be located by searching the output of a decompiler or IL viewer, such as the IL viewer tool built into ReSharper or the dotPeek decompiler. The IL instruction is “box”.
Dictionaries and enums
One common cause of boxing is the use of enum
types as keys for Dictionaries. Declaring an enum
creates a new value type that is treated like an integer behind the scenes, but enforces type-safety rules at compile time.
By default, a call to Dictionary.add(key, value)
results in a call to Object.getHashCode(Object)
. This method is used to obtain the appropriate hash code for the Dictionary’s key, and is used in all methods that accept a key: Dictionary.tryGetValue, Dictionary.remove
, etc.
The Object.getHashCode
method is reference-typed, but enum
values are always value types. Therefore, for enum-keyed Dictionaries, every method call results in the key being boxed at least once.
The following code snippet illustrates a simple example that demonstrates this boxing problem:
enum MyEnum { a, b, c };
var myDictionary =
new Dictionary<MyEnum, object>();
myDictionary.Add(MyEnum.a, new object());
To solve this problem, it is necessary to write a custom class that implements the IEqualityComparer
interface and assign an instance of that class as the Dictionary’s comparer (Note: This object is usually stateless, and therefore can be reused with different Dictionary instances to save memory).
The following is a simple example of an IEqualityComparer for the above code snippet.
public class MyEnumComparer : IEqualityComparer<MyEnum> {
public bool Equals(MyEnum x, MyEnum y) {
return x == y;
}
public int GetHashCode(MyEnum x) {
return (int)x;
}
}
An instance of the above class could be passed to the Dictionary’s constructor.
Foreach loops
In Unity’s version of the Mono C# compiler, use of the foreach
loop forces Unity to box a value each time the loop terminates (Note: The value is boxed once each time the loop as a whole finishes executing. It does not box once per iteration of the loop, so memory usage remains the same regardless of whether the loop runs two times or 200 times). This is because the IL generated by Unity’s C# compiler constructs a generic value-type Enumerator in order to iterate over the value collection.
This Enumerator implements the IDisposable
interface, which must be called when the loop terminates. However, calling interface methods on value-typed objects (such as structs and Enumerators) requires boxing them.
Examine the following very simple example code:
int accum = 0;
foreach(int x in myList) {
accum += x;
}
The above, when run through Unity’s C# compiler, produces the following Intermediate Language:
.method private hidebysig instance void
ILForeach() cil managed
{
.maxstack 8
.locals init (
[0] int32 num,
[1] int32 current,
[2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_2
)
// [67 5 - 67 16]
IL_0000: ldc.i4.0
IL_0001: stloc.0 // num
// [68 5 - 68 74]
IL_0002: ldarg.0 // this
IL_0003: ldfld class [mscorlib]System.Collections.Generic.List`1<int32> test::myList
IL_0008: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0/*int32*/> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
IL_000d: stloc.2 // V_2
.try
{
IL_000e: br IL_001f
// [72 9 - 72 41]
IL_0013: ldloca.s V_2
IL_0015: call instance !0/*int32*/ valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()
IL_001a: stloc.1 // current
// [73 9 - 73 23]
IL_001b: ldloc.0 // num
IL_001c: ldloc.1 // current
IL_001d: add
IL_001e: stloc.0 // num
// [70 7 - 70 36]
IL_001f: ldloca.s V_2
IL_0021: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()
IL_0026: brtrue IL_0013
IL_002b: leave IL_003c
} // end of .try
finally
{
IL_0030: ldloc.2 // V_2
IL_0031: box valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>
IL_0036: callvirt instance void [mscorlib]System.IDisposable::Dispose()
IL_003b: endfinally
} // end of finally
IL_003c: ret
} // end of method test::ILForeach
} // end of class test
The most relevant code is the __finally { … }__
block near the bottom. The callvirt
instruction discovers the location of the IDisposable.Dispose
method in memory before invoking the method, and requires that the Enumerator be boxed.
In general, foreach
loops should be avoided in Unity. Not only do they box, but the method-call cost of iterating over collections via Enumerators is generally much slower than manual iteration via a for
or while
loop.
Note that the C# compiler upgrade in Unity 5.5 significantly improves Unity’s ability to generate IL. In particular, the boxing operations has been eliminated from foreach
loops. This eliminates the memory overhead associated with foreach
loops. However, the CPU performance difference compared to equivalent Array-based code remains, due to method-call overhead.
Array-valued Unity APIs
A more pernicious and less-visible cause of spurious array allocation is the repeated accessing of Unity APIs that return arrays. All Unity APIs that return arrays create a new copy of the array each time they are accessed. It is extremely non-optimal to access an array-valued Unity API more often than necessary.
As an example, the following code spuriously creates four copies of the vertices
array per loop iteration. The allocations are occur each time the .vertices
property is accessed.
for(int i = 0; i < mesh.vertices.Length; i++)
{
float x, y, z;
x = mesh.vertices[i].x;
y = mesh.vertices[i].y;
z = mesh.vertices[i].z;
// ...
DoSomething(x, y, z);
}
This can be trivially refactored into a single array allocation, regardless of the number of loop iterations, by capturing the vertices
array before entering the loop:
var vertices = mesh.vertices;
for(int i = 0; i < vertices.Length; i++)
{
float x, y, z;
x = vertices[i].x;
y = vertices[i].y;
z = vertices[i].z;
// ...
DoSomething(x, y, z);
}
While the CPU cost of accessing a property once is not very high, repeated accesses within tight loops create CPU performance hotspots. Further, repeated accesses unnecessarily expand the managed heap.
This problem is extremely common on mobile, because the Input.touches
API behaves similarly to the above. It is extremely common for projects to contain code similar to the following, where an allocation occurs each time the .touches
property is accessed.
for ( int i = 0; i < Input.touches.Length; i++ )
{
Touch touch = Input.touches[i];
// …
}
This can, of course, be trivially improved by hoisting the array allocation out of the loop condition:
Touch[] touches = Input.touches;
for ( int i = 0; i < touches.Length; i++ )
{
Touch touch = touches[i];
// …
}
However, there are now versions of many Unity APIs that do not cause memory allocations. These should generally be favored, when they’re available.
int touchCount = Input.touchCount;
for ( int i = 0; i < touchCount; i++ )
{
Touch touch = Input.GetTouch(i);
// …
}
Converting the above example to the allocation-less Touch API is simple:
Note that the property access (Input.touchCount
) is still kept outside the loop condition in order to save the CPU cost of invoking the property’s get
method.
Empty array reuse
Some teams prefer to return empty arrays instead of null
when an array-valued method needs to return an empty set. This coding pattern is common in many managed languages, particularly C# and Java.
In general, when returning a zero-length array from a method, it is considerably more efficient to return a pre-allocated singleton instance of the zero-length array than to repeatedly create empty arrays(5) (Note: Naturally, an exception should be made when the array is resized after being returned).
Footnotes
(1) This is because, on most platforms, readback from GPU memory is extremely slow. Reading a Texture from GPU memory into a temporary buffer for use by CPU code (e.g.
Texture.GetPixel
) would be very nonperformant.(2) Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap.
(3) The exact timing is platform-dependent.
(4) Note that this is not identical to the number of bytes temporarily allocated during a given frame. The profile displays the number of bytes allocated in a specific frame, even if some/all of the allocated memory is reused in subsequent frames.
(5) Naturally, an exception should be made when the array is resized after being returned.
Understanding the managed heap的更多相关文章
- Windbg Extension NetExt 使用指南 【3】 ---- 挖掘你想要的数据 Managed Heap
摘要 : NetExt中有两个比较常用的命令可以用来分析heap上面的对象. 一个是!wheap, 另外一个是!windex. !wheap 这个命令可以用于打印出heap structure信息. ...
- .NET:CLR via C#The Managed Heap and Garbage Collection
Allocating Resources from the Managed Heap The CLR requires that all objects be allocated from the m ...
- Advanced .NET Debugging: Managed Heap and Garbage Collection(转载,托管堆查内存碎片问题解决思路)
原文地址:http://www.informit.com/articles/article.aspx?p=1409801&seqNum=4 Debugging Managed Heap Fra ...
- [No0000147]深入浅出图解C#堆与栈 C# Heap(ing) VS Stack(ing)理解堆与栈4/4
前言 虽然在.Net Framework 中我们不必考虑内在管理和垃圾回收(GC),但是为了优化应用程序性能我们始终需要了解内存管理和垃圾回收(GC).另外,了解内存管理可以帮助我们理解在每一个程 ...
- C# 堆栈和堆 Heap & Stack
首先堆栈和堆(托管堆)都在进程的虚拟内存中.(在32位处理器上每个进程的虚拟内存为4GB) 堆栈stack 堆栈中存储值类型. 堆栈实际上是向下填充,即由高内存地址指向低内存地址填充. 堆栈的工作方式 ...
- 基于托管的C++来使用WPF - Using WPF with Managed C++
基于托管的C++来使用WPF - Using WPF with Managed C++ Posted by Zeeshan Amjad This article was originally publ ...
- .NET 框架(转自wiki)
.NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primari ...
- The Truth About .NET Objects And Sharing Them Between AppDomains
From http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx I have written already some ti ...
- Garbage Collectors – Serial vs. Parallel vs. CMS vs. G1 (and what’s new in Java 8)
转自:http://blog.takipi.com/garbage-collectors-serial-vs-parallel-vs-cms-vs-the-g1-and-whats-new-in-ja ...
随机推荐
- sas 变量类型转换
data b2: set b1; newbl=put(oldbl,10.); run; 根据转换后的类型灵活填写
- SQL-33 创建一个actor表,包含如下列信息
题目描述 创建一个actor表,包含如下列信息 列表 类型 是否为NULL 含义 actor_id smallint(5) not null 主键id first_name varchar(45) ...
- 进程创建过程详解 CreateProcess
转载请您注明出处:http://www.cnblogs.com/lsh123/p/7405796.html 0x01 CreateProcessW CreateProcess的使用有ANSI版本的Cr ...
- mybatis动态sql #和$的区别
$和#都支持动态sql:就是你传什么它就是什么 区别: 1.#可以防止sql注入在sql执行时显示 '?' 比$安全 SELECT * FROM table WHERE id = ? 2.在使用#传入 ...
- Android开发 ---Fragment片段布局
前言 Fragment想必大家不陌生吧,在日常开发中,对于Fragment的使用也很频繁,现在主流的APP中,基本的架构也都是一个主页,然后每个Tab项用Fragment做布局,不同选项做切换,使用起 ...
- 百度地图API示例:鼠标绘制点线面 控件修改
需求 :在使用地图API时,绘制工具栏控件想自己选择哪些要,哪些不要. 可以查看相应的类:官网地址: http://api.map.baidu.com/library/DrawingManager/1 ...
- 相册 垂直居中; 水平居中. 1)宽度 大于高度, 宽度 100%; 2) 高度 大于 宽度 , 高度100%; getimagesize , list --->line-height , text-align, vertical-align, max-height, max-width
一: 效果: 1) 黑色 部分是 相框. 2) 图片 要实现 水平居中, 垂直居中 3) 如果 宽度 大于 高度 ,那么 宽度 100% ,如图1 , 高度 自适应 ,同时不能超过黑色相框的 高度 ; ...
- php优秀框架codeigniter学习系列——异常和错误处理机制
这篇介绍下CI框架的异常和错误处理机制. 在入口文件index.php中,根据设置的环境参数设置error_reporting的范围,和是否显示错误. 在CI初始化程序CodeIgniter.php中 ...
- PHP中session_start 函数详解使用方法
一.官方 session_status() 返回值为: PHP_SESSION_DISABLED 会话是被禁用的. PHP_SESSION_NONE 会话是启用的,但不存在当前会话. PHP_SESS ...
- 在Linux系统下mail命令的用法
在Linux系统下mail命令的测试 1. 最简单的一个例子: mail -s test admin@aispider.com 这条命令的结果是发一封标题为test的空信给后面的邮箱,如果你有mta并 ...