Computer Architecture - Memory Tuning: vm + OOM

http://www.cnblogs.com/dkblog/archive/2011/09/06/2168721.html
https://www.kernel.org/doc/Documentation/vm/

Location of the memory tuning parameters:
[root@server1 vm]# pwd
/proc/sys/vm

[root@server1 vm]# ls
block_dump                 extfrag_threshold           memory_failure_recovery  numa_zonelist_order       stat_interval
compact_memory             extra_free_kbytes           min_free_kbytes          oom_dump_tasks            swappiness
dirty_background_bytes     hugepages_treat_as_movable  min_slab_ratio           oom_kill_allocating_task  unmap_area_factor
dirty_background_ratio     hugetlb_shm_group           min_unmapped_ratio       overcommit_memory         vfs_cache_pressure
dirty_bytes                laptop_mode                 mmap_min_addr            overcommit_ratio          would_have_oomkilled
dirty_expire_centisecs     legacy_va_layout            nr_hugepages             page-cluster              zone_reclaim_mode
dirty_ratio                lowmem_reserve_ratio        nr_hugepages_mempolicy   panic_on_oom
dirty_writeback_centisecs  max_map_count               nr_overcommit_hugepages  percpu_pagelist_fraction
drop_caches                memory_failure_early_kill   nr_pdflush_threads       scan_unevictable_pages

Per-process OOM settings:

[root@server1 ]# pwd
/proc/
[root@server1 ]# ls |grep oom
oom_adj
oom_score
oom_score_adj


/proc/slabinfo
/proc/buddyinfo
/proc/zoneinfo
/proc/meminfo

[root@monitor /]# slabtop

 Active / Total Objects (% used)    : 347039 / 361203 (96.1%)
 Active / Total Slabs (% used)      : 24490 / 24490 (100.0%)
 Active / Total Caches (% used)     : 88 / 170 (51.8%)
 Active / Total Size (% used)       : 98059.38K / 99927.38K (98.1%)
 Minimum / Average / Maximum Object : 0.02K / 0.28K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
115625 115344  99%    0.10K   3125       37     12500K buffer_head
 73880  73437  99%    0.19K   3694       20     14776K dentry
 42184  42180  99%    0.99K  10546        4     42184K ext4_inode_cache
 20827  20384  97%    0.06K    353       59      1412K size-64
 16709  13418  80%    0.05K    217       77       868K anon_vma_chain
 15792  15708  99%    0.03K    141      112       564K size-32
 11267  10323  91%    0.20K    593       19      2372K vm_area_struct
 10806  10689  98%    0.64K   1801        6      7204K proc_inode_cache
  9384   5232  55%    0.04K    102       92       408K anon_vma
  7155   7146  99%    0.07K    135       53       540K selinux_inode_security
  7070   7070 100%    0.55K   1010        7      4040K radix_tree_node
  6444   6443  99%    0.58K   1074        6      4296K inode_cache
  5778   5773  99%    0.14K    214       27       856K sysfs_dir_cache
  3816   3765  98%    0.07K     72       53       288K Acpi-Operand
  2208   2199  99%    0.04K     24       92        96K Acpi-Namespace
  1860   1830  98%    0.12K     62       30       248K size-128
  1440   1177  81%    0.19K     72       20       288K size-192
  1220    699  57%    0.19K     61       20       244K filp
   660    599  90%    1.00K    165        4       660K size-1024

[root@monitor xx]# cat /proc/meminfo |grep HugePage

AnonHugePages:       kB

HugePages_Total:

HugePages_Free:

HugePages_Rsvd:

HugePages_Surp:          0

1.vi /etc/sysctl.conf
  Add the line:
  vm.nr_hugepages = 10

2.sysctl -p
[root@monitor /]#  cat /proc/meminfo |grep Huge
AnonHugePages:      2048 kB
HugePages_Total:      10
HugePages_Free:       10
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

3.Make the huge pages available to applications
[root@monitor /]# mkdir /hugepages
[root@monitor /]# mount -t  hugetlbfs  none  /hugepages

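[root@monitor /]# dd if=/dev/zero of=/hugepages/a.out bs=1M count=5
(Note: the kernel's hugetlbpage documentation states that write system calls are not supported on hugetlbfs files, so this dd can fail; applications normally map such files with mmap instead.)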
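
As a rough sizing aid for the vm.nr_hugepages setting, the number of huge pages to reserve can be derived from the amount of memory an application needs and the Hugepagesize reported in /proc/meminfo. The sketch below is only an illustration; the function names hugepage_size_kb and hugepages_needed are made up for this example and are not part of any tool.

```shell
# Print the Hugepagesize value (in kB) from meminfo-formatted text on stdin.
hugepage_size_kb() {
    awk '/^Hugepagesize:/ { print $2 }'
}

# hugepages_needed SIZE_KB PAGE_KB
# Print how many huge pages of PAGE_KB are needed to cover SIZE_KB (rounded up).
hugepages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, hugepages_needed 5120 "$(hugepage_size_kb < /proc/meminfo)" reports how many pages a 5 MB mapping would need on the current machine.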

Hugetlb pages:

Hugetlbfs support is built on top of the multiple page size support that is provided by most modern 
architectures.

Users can use the huge page support in the Linux kernel by either using the mmap system call or 
the standard SysV shared memory system calls (shmget, shmat).

cat /proc/meminfo | grep HugePage

Improving TLB performance:

Kernel must usually flush TLB entries upon a context switch

Use free, contiguous physical pages

    Automatically via the buddy allocator

    /proc/buddyinfo

Manually via hugepages (not pageable)

    Linux supports large sized pages through the hugepages mechanism

    Sometimes known as bigpages, largepages or the hugetlbfs filesystem

Consequences

    TLB cache hit more likely

    Reduces PTE visit count

Tuning TLB performance

Check size of hugepages

     x86info -a | grep "Data TLB"

     dmesg

     cat /proc/meminfo

Enable hugepages

    1.In /etc/sysctl.conf

       vm.nr_hugepages = n

    2.Kernel parameter     // passed to the kernel at boot time

        hugepages=n

Configure hugetlbfs if needed by application

     mmap system call requires that hugetlbfs is mounted

     mkdir  /hugepages

     mount -t  hugetlbfs  none  /hugepages

     shmat and shmget system calls do not require hugetlbfs

Trace every system call made by a program

strace  -o  /tmp/strace.out  -p  PID

grep  mmap  /tmp/strace.out

Summarize system calls

strace  -c  -p PID   or

strace  -c  COMMAND
strace command
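
The per-call summary that strace -c prints can also be approximated offline from a trace saved with -o. The following is a minimal sketch under stated assumptions: count_syscalls is a hypothetical helper name, and it only understands trace lines that begin with the system call name.

```shell
# Count system calls by name in strace output read from stdin.
# Assumes each line starts with the syscall name, as produced by
# "strace -o FILE -p PID" without -f, -t or -r (those add a prefix).
count_syscalls() {
    awk '/^[a-z_0-9]+\(/ { name = $0; sub(/\(.*/, "", name); count[name]++ }
         END { for (name in count) print count[name], name }' |
        sort -rn
}
```

Running count_syscalls < /tmp/strace.out lists the most frequent calls first, which is a quick way to see whether mmap dominates a trace.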

Other uses

Investigate lock contentions

Identify problems caused by improper file permissions

Pinpoint IO problems

Strategies for using memory

1.Reduce overhead for tiny memory objects

  Slab cache
  cat /proc/slabinfo

  echo 'ext4_inode_cache 108 54 8' >/proc/slabinfo 

2.Reduce or defer service time for slower subsystems

    Filesystem metadata: buffer cache ,  slab cache   // caches file metadata

    Disk IO: page cache                              // caches file data

    Interprocess communications: shared memory       // shared memory

    Network IO: buffer cache, arp cache, connection tracking

3.Considerations when tuning memory

    How should pages be reclaimed to avoid pressure?

    Larger writes are usually more efficient due to re-sorting

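Memory parameter settings:
vm.min_free_kbytes:
1.If memory is exhausted, the system can crash.
2.The kernel therefore keeps some memory free. When a process requests memory and not enough is available, other pages are swapped out to make room for the request.
    Tuning vm.min_free_kbytes should only be necessary when an application regularly needs to allocate a large block of memory and then frees that same memory.

    Usage scenario:
   It may well be the case that 
    the system has too little disk bandwidth, 
    too little CPU power, or
    too little memory to handle its load.

    Linux provides the min_free_kbytes parameter to set the threshold at which the system starts reclaiming memory, which controls how much memory stays free. The higher the value, the earlier the kernel starts reclaiming and the more memory is kept free.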
    http://www.cnblogs.com/itfriend/archive/2011/12/14/2287160.html

Consequences

    Reduces service time for demand paging

    Memory is not available for other usage

    Can cause pressure on ZONE_NORMAL

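Memory usage on a Linux server exceeded its threshold and triggered an alert.

Troubleshooting

First, check the system's memory usage with the free command:

total       used       free     shared    buffers     cached

Mem:

-/+ buffers/cache:

Swap:

This shows 24675796 KB of memory in total, with 22617644 KB used and only 2058152 KB free.

Next, run top and press Shift+M to sort by memory usage. The largest process uses only 18 GB, and the others are small enough to ignore.

So where has the remaining memory of nearly 4 GB (22617644 KB - 18 GB, about 4 GB) gone?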
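
The gap in this case study is plain arithmetic and can be checked in the shell. This is a small sketch; unaccounted_kb is a hypothetical helper name, and 1 GB is taken as 1024 * 1024 KB.

```shell
# unaccounted_kb USED_KB TOP_GB
# Memory reported as used by free (in KB) minus what the largest process
# accounts for in top (in GB, taken here as 1 GB = 1024 * 1024 KB).
unaccounted_kb() {
    echo $(( $1 - $2 * 1024 * 1024 ))
}

unaccounted_kb 22617644 18   # prints 3743276 (KB), roughly 3.6 GB
```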

Looking further, cat /proc/meminfo shows nearly 4 GB of Slab memory:

......

Mapped:           kB

Slab:           kB

PageTables:       kB

......

Slab holds caches of kernel data structures. Use the slabtop command to see how this memory is being used:

OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

  %    .21K           3494744K dentry_cache

   %    .09K               33404K buffer_head

   %    .74K              120832K ext3_inode_cache

Most of it (about 3.5 GB) is used by dentry_cache.
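
When slabtop is not available, a similar ranking can be estimated directly from /proc/slabinfo. The sketch below assumes the usual column order (name, active_objs, num_objs, objsize) and estimates each cache as num_objs * objsize, so its figures will not match slabtop's CACHE SIZE column exactly; top_slab_caches is a hypothetical helper name.

```shell
# Read /proc/slabinfo-formatted text on stdin and print the five largest
# caches as "<approx KB> <name>", estimating size as num_objs * objsize.
top_slab_caches() {
    awk '$1 != "slabinfo" && $1 != "#" { print int($3 * $4 / 1024), $1 }' |
        sort -rn | head -n 5
}
```

Typical use would be top_slab_caches < /proc/slabinfo, which usually requires root.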

Resolution

drop_caches

To free pagecache:

	echo 1 > /proc/sys/vm/drop_caches      [include buffer cache and page cache]

To free reclaimable slab objects (includes dentries and inodes):

	echo 2 > /proc/sys/vm/drop_caches      [i.e. dentries and inodes are not part of the buffer cache or page cache]

To free slab objects and pagecache:            [frees everything]

	echo 3 > /proc/sys/vm/drop_caches

http://www.kernel.org/doc/Documentation/sysctl/vm.txt

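Note: run sync to write data to disk before dropping the caches.

The echo method above requires root privileges. If you are not root
but have sudo rights, you can set the value with the sysctl command: 
$sync 
$sudo sysctl -w vm.drop_caches=3

$sudo sysctl -w vm.drop_caches= #restore drop_caches

After the operation, check that the setting took effect with sudo sysctl -a | grep drop_caches.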

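Overcommitting physical memory relies on swap:    // avoid it on database servers, because being swapped out is very slow

vm.overcommit_memory

     0 = heuristic overcommit  // the kernel decides whether to allow each overcommit

     1 = always overcommit     // allocation requests are always granted

     2 = commit all swap plus a percentage of RAM (the percentage may be > 100)
            CommitLimit = SWAP + RAM * overcommit_ratio / 100
            [with a ratio below 100, RAM * overcommit_ratio / 100 is smaller than the physical RAM]

        vm.overcommit_ratio:   // percentage of physical memory that counts toward the commit limit; generally no more than 50%

             Specifies the percentage of physical memory allowed to be overcommitted
             when vm.overcommit_memory is set to 2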


View Committed_AS in /proc/meminfo

An estimate of how much RAM is required to avoid an out of memory (OOM) condition 
for the current workload on a system
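
To make the relationship between Committed_AS and the commit limit concrete, here is a minimal sketch. It assumes the limit used with vm.overcommit_memory=2 is swap plus overcommit_ratio percent of RAM, as the kernel documentation describes, and it ignores huge pages; commit_limit_kb and meminfo_kb are hypothetical helper names.

```shell
# commit_limit_kb RAM_KB SWAP_KB RATIO
# Approximate commit limit for vm.overcommit_memory=2:
# swap plus RATIO percent of RAM (huge pages are ignored in this sketch).
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# meminfo_kb FIELD
# Print the kB value of FIELD from meminfo-formatted text on stdin.
meminfo_kb() {
    awk -v field="$1:" '$1 == field { print $2 }'
}
```

On a live system, meminfo_kb Committed_AS < /proc/meminfo can be compared with meminfo_kb CommitLimit < /proc/meminfo to see how close the workload is to the limit.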

Slab cache

Tiny kernel objects are stored in the slab

    Extra overhead of tracking is better than using one page per object

    Example: filesystem metadata (dentry and inode caches )

Monitoring

   /proc/slabinfo

   slabtop

   vmstat -m

Tuning a particular slab cache

    echo  "cache_name  limit   batchcount  shared" > /proc/slabinfo

    limit   the maximum number of objects that will be cached for each CPU

    batchcount  the maximum number of global cache objects that will be transferred to the per-CPU cache when it becomes empty

    shared  the sharing behavior for Symmetric MultiProcessing (SMP) systems
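
Before writing new tunables it helps to read the current ones. The sketch below assumes the slabinfo 2.x line layout, in which the three values follow the word "tunables"; slab_tunables is a hypothetical helper name. On kernels built with an allocator that does not expose tunables, the values may all read as 0 and the write shown above may be rejected.

```shell
# slab_tunables CACHE_NAME
# Print "limit batchcount shared" for CACHE_NAME from slabinfo text on stdin.
# Assumes the three values follow the word "tunables" on the cache's line.
slab_tunables() {
    awk -v cache="$1" '$1 == cache {
        for (i = 2; i <= NF - 3; i++)
            if ($i == "tunables") { print $(i + 1), $(i + 2), $(i + 3); exit }
    }'
}
```

For example, slab_tunables ext4_inode_cache < /proc/slabinfo prints the three current values for that cache.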

arp cache

ARP entries map hardware addresses to protocol addresses

   1. Cached in /proc/net/arp

    By default, the cache is limited to 512 entries as a soft limit and 1024 entries as a hard limit; entries beyond 512 are pruned automatically

　　
   2. Garbage collection removes stale or older entries

[root@server1 proc]# cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
112.74.75.247    0x1         0x2         70:f9:6d:ee:67:af     *        eth1
10.24.223.247    0x1         0x2         70:f9:6d:ee:67:af     *        eth0

Insufficient ARP cache leads to

    Intermittent timeouts between hosts

    ARP thrashing

Too much ARP cache puts pressure on ZONE_NORMAL

List entries             // show the cache entries

    ip neighbor list

Flush cache             // clear the cache entries

    ip neighbor flush dev ethX

Adjust where the gc will leave arp table alone

    net.ipv4.neigh.default.gc_thresh1

    default 128                       // below 128 entries the GC removes nothing, expired or not

Soft upper limit                     // soft limit: above 512 entries, the excess is removed after 5 seconds

    net.ipv4.neigh.default.gc_thresh2

    default 512

    Becomes hard limit after 5 seconds

Hard upper limit                        // hard limit

    net.ipv4.neigh.default.gc_thresh3

Garbage collection frequency in seconds   // cleanup runs every few seconds and removes expired entries once there are more than 128 (entries expire after 5 minutes)

    net.ipv4.neigh.default.gc_interval
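
To judge whether the thresholds need adjusting, the current entry count can be compared with the three gc_thresh values. This is a rough sketch: arp_entry_count and arp_gc_state are hypothetical helper names, and the state labels are simplifications of the behaviour described above rather than kernel terminology.

```shell
# Count the entries in /proc/net/arp-formatted text on stdin (header skipped).
arp_entry_count() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# arp_gc_state COUNT THRESH1 THRESH2 THRESH3
# Print a label for where COUNT falls relative to the gc_thresh values.
arp_gc_state() {
    if [ "$1" -lt "$2" ]; then
        echo "gc-idle"            # below gc_thresh1: table left alone
    elif [ "$1" -le "$3" ]; then
        echo "gc-active"          # up to gc_thresh2: stale entries collected
    elif [ "$1" -lt "$4" ]; then
        echo "over-soft-limit"    # above gc_thresh2: excess removed quickly
    else
        echo "at-hard-limit"      # gc_thresh3 reached
    fi
}
```

A possible invocation is arp_gc_state "$(arp_entry_count < /proc/net/arp)" 128 512 1024, substituting the values that sysctl reports on the machine.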

vm.lowmem_reserve_ratio

For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone

Linux page allocator has a mechanism which prevents allocations which could use highmem from using too much lowmem

The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is in defending these lower zones

If you have a machine which uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap then you probably should change the lowmem_reserve_ratio setting

page cache:

A large percentage of paging activity is due to I/O

    File reads: each page of file read from disk into memory

    These pages form the page cache

    Page cache is always checked for IO requests

Directory reads

    Reading and writing regular files

    Reading and writing via block device files, DISK IO

    Accessing memory mapped files, mmap

    Accessing swapped out pages

Pages in the page cache are associated with file data

Tuning page cache:

View page cache allocation in /proc/meminfo

Tune length/size of memory

    vm.lowmem_reserve_ratio

    vm.vfs_cache_pressure

Tune arrival/completion rate

    vm.page-cluster

    vm.zone_reclaim_mode

vfs_cache_pressure

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects

1.At the default value of vfs_cache_pressure=100
  the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim

2.Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches

3.When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions


4.Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes

0: dentries and inodes are never reclaimed
1-99: prefer to retain dentries and inodes
100: dentries and inodes are reclaimed at a fair rate alongside page cache and swap cache
100+: prefer to reclaim dentries and inodes

page-cluster

    1.page-cluster controls the number of pages which are written to swap in a single attempt

    2.It is a logarithmic value - setting it to zero means "1 page", setting it to 1 means "2 pages",
       setting it to 2 means "4 pages", etc

    3.The default value is three (eight pages at a time)

    4.There may be some small benefits in tuning this to a different value if your workload is swap-intensive

2^n pages are swapped at a time; this matters when the system relies heavily on swap (virtualization, cloud computing environments)
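
Because the setting is a base-2 logarithm, the page count is a left shift of 1 by the configured value. A one-line sketch, with page_cluster_pages as a hypothetical helper name:

```shell
# page_cluster_pages N
# Print the number of pages handled per swap attempt for page-cluster=N (2^N).
page_cluster_pages() {
    echo $(( 1 << $1 ))
}
```

For instance, page_cluster_pages "$(cat /proc/sys/vm/page-cluster)" shows the page count for the current setting.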

zone_reclaim_mode: 

    zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory
      when a zone runs out of memory

   If it is set to zero then no zone reclaim occurs

   Allocations will be satisfied from other zones / nodes in the system

This is value ORed together of

 1 = Zone reclaim on

 2 = Zone reclaim writes dirty pages out

 4 = Zone reclaim swaps pages
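
Since the value is a bitmask, a small decoder makes a setting such as 3 or 5 easier to read. The sketch assumes the bit values 1, 2 and 4 from the kernel's vm sysctl documentation; zone_reclaim_flags is a hypothetical helper name.

```shell
# zone_reclaim_flags MODE
# Print the meaning of each bit set in a zone_reclaim_mode value, one per line.
# Bit values (1, 2, 4) are taken from the kernel's vm sysctl documentation.
zone_reclaim_flags() {
    [ $(( $1 & 1 )) -ne 0 ] && echo "zone reclaim on"
    [ $(( $1 & 2 )) -ne 0 ] && echo "write dirty pages out"
    [ $(( $1 & 4 )) -ne 0 ] && echo "swap pages"
    [ "$1" -eq 0 ] && echo "zone reclaim off"
    return 0
}
```

It can be fed the live setting with zone_reclaim_flags "$(cat /proc/sys/vm/zone_reclaim_mode)".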

Anonymous pages：
Anonymous pages can be another large consumer of data

Are not associated with a file, but instead contain:

    Program data – arrays, heap allocations, etc     // opened files, by contrast, live in the page cache

    Anonymous memory regions

    Dirty memory mapped process private pages

    IPC shared memory region pages

View summary usage

   grep Anon  /proc/meminfo

   cat  /proc/PID/statm

   Anonymous pages = RSS - Shared

Anonymous pages are eligible for swap: they have no backing file, so swap is the only place they can be paged out to
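
The RSS - Shared estimate can be computed from a statm line directly. A minimal sketch, assuming the documented statm field order (size, resident, shared, text, lib, data, dt), all counted in pages; anon_pages is a hypothetical helper name.

```shell
# Print the estimated anonymous pages (resident - shared) from a
# /proc/PID/statm line on stdin. All statm fields are counted in pages.
anon_pages() {
    awk '{ print $2 - $3 }'
}
```

For the current shell, anon_pages < /proc/$$/statm gives the figure in pages; multiplying by the output of getconf PAGESIZE converts it to bytes.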
