[Repost] Process Memory and cgroup Memory Accounting in Linux
From: http://hustcat.github.io/about/
In the Linux kernel, the accounting of memory used by a process and of memory charged to a cgroup is partly the same and partly different.
Process memory accounting
Generally speaking, the memory used by a service process falls into the following categories:
(1) Anonymous pages in user-mode address spaces, such as memory allocated with malloc or with mmap using MAP_ANONYMOUS; when the system is short of memory, the kernel can swap these pages out.
(2) Mapped file pages in user-mode address spaces, covering both mapped regular files and mapped tmpfs files; the former come from mmap of a given file, the latter from, for example, SysV IPC shared memory. When the system is short of memory, the kernel can reclaim these pages, but it may first have to write their contents back to the file.
(3) Page cache pages of disk files, created when a program reads or writes files through ordinary read/write calls. When the system is short of memory, the kernel can reclaim these pages, but it may first have to write dirty data back to the file.
(4) Buffer pages, which also belong to the page cache, e.g. pages used when reading block device files.
Categories (1) and (2) count toward the process's RSS, while (3) and (4) belong to the page cache.
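To make the categories concrete, here is a minimal user-space sketch (it is not from the original article, and the file path /etc/hosts is just an arbitrary readable file): malloc touches anonymous pages, an mmap of a regular file touches mapped file pages, and read() only populates the page cache.
/*
 * Hypothetical illustration of categories (1)-(3). The anonymous pages and
 * the mapped file page count toward this process's RSS; the pages pulled in
 * by read() stay in the page cache and are not part of the RSS.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* (1) anonymous memory: malloc and touch it */
    size_t anon_len = 16 * 1024 * 1024;
    char *anon = malloc(anon_len);
    if (!anon) return 1;
    memset(anon, 0, anon_len);

    /* (2) file-backed mapping: mmap a regular file and fault in one page */
    int fd = open("/etc/hosts", O_RDONLY);    /* any existing, readable file */
    if (fd < 0) { perror("open"); return 1; }
    char *map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    volatile char c = map[0];
    (void)c;

    /* (3) ordinary read: the data lands in the page cache, not in RSS */
    char buf[4096];
    if (read(fd, buf, sizeof(buf)) < 0) perror("read");

    printf("pid %d: inspect VmRSS in /proc/self/status now\n", getpid());
    pause();
    return 0;
}
While the program sleeps, the anonymous and mapped pages appear in its RSS, whereas the data brought in by read() is only visible in the system-wide page cache.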
Several files are related to per-process memory statistics:
- /proc/[pid]/stat
(23) vsize %lu
Virtual memory size in bytes.
(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out.
How RSS is computed:
It corresponds to the RSS column in top; see do_task_stat:
#define get_mm_rss(mm) \
    (get_mm_counter(mm, file_rss) + get_mm_counter(mm, anon_rss))
That is, RSS = file_rss + anon_rss.
- /proc/[pid]/statm
Provides information about memory usage, measured in pages.
The columns are:
size (1) total program size
(same as VmSize in /proc/[pid]/status)
resident (2) resident set size
(same as VmRSS in /proc/[pid]/status)
share (3) shared pages (i.e., backed by a file)
text (4) text (code)
lib (5) library (unused in Linux 2.6)
data (6) data + stack
dt (7) dirty pages (unused in Linux 2.6)
See the function proc_pid_statm, which ends up in task_statm:
int task_statm(struct mm_struct *mm, int *shared, int *text,
               int *data, int *resident)
{
    *shared = get_mm_counter(mm, file_rss);
    *text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
                >> PAGE_SHIFT;
    *data = mm->total_vm - mm->shared_vm;
    *resident = *shared + get_mm_counter(mm, anon_rss);
    return mm->total_vm;
}
The SHR column in top equals file_rss. Note that shared memory used by a process is also counted in file_rss, because shared memory is backed by tmpfs.
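For a quick user-space cross-check, a process can parse its own /proc/self/statm and convert the page counts to bytes. This is a minimal sketch (not kernel code) that reads only the first three columns; resident corresponds to file_rss + anon_rss and share to file_rss, i.e. top's RES and SHR.
/* Minimal sketch: report VSZ, RSS and SHR of the current process in bytes. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned long size, resident, share;

    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) { perror("fopen"); return 1; }
    if (fscanf(f, "%lu %lu %lu", &size, &resident, &share) != 3) {
        fprintf(stderr, "unexpected /proc/self/statm format\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    long page = sysconf(_SC_PAGESIZE);    /* statm values are in pages */
    printf("VSZ %lu bytes, RSS %lu bytes, SHR %lu bytes\n",
           size * page, resident * page, share * page);
    return 0;
}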
How anon_rss and file_rss are counted
static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
        unsigned long address, pmd_t *pmd,
        pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
{
    ...
    if (flags & FAULT_FLAG_WRITE) {
        if (!(vma->vm_flags & VM_SHARED)) {
            anon = 1;    /* anon page */
    ...
    if (anon) {
        inc_mm_counter(mm, anon_rss);
        page_add_new_anon_rmap(page, vma, address);
    } else {
        inc_mm_counter(mm, file_rss);
        page_add_file_rmap(page);
    }
    ...
}
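The branch above means that a write fault on a private (non-VM_SHARED) file mapping produces an anonymous copy-on-write page, while a fault on a shared file mapping stays file-backed. The following user-space sketch is a hypothetical illustration of that difference (the file path is an assumption); after the faults, the private copy is accounted as anon_rss and the shared page as file_rss.
/*
 * Sketch: writing into a MAP_PRIVATE file mapping takes the
 * FAULT_FLAG_WRITE && !VM_SHARED path above (anon_rss); reading a
 * MAP_SHARED mapping of the same file keeps the page in file_rss.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hosts", O_RDONLY);    /* any existing, non-empty file */
    if (fd < 0) { perror("open"); return 1; }

    long len = sysconf(_SC_PAGESIZE);

    char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    char *shrd = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (priv == MAP_FAILED || shrd == MAP_FAILED) { perror("mmap"); return 1; }

    priv[0] = 'x';                /* write fault -> anonymous CoW page (anon_rss) */
    volatile char c = shrd[0];    /* read fault  -> file-backed page (file_rss)   */
    (void)c;

    printf("pid %d: compare resident and share in /proc/self/statm\n", getpid());
    pause();
    return 0;
}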
cgroup memory accounting
stat file
memory.stat file includes following statistics
# per-memory cgroup local status
cache - # of bytes of page cache memory.
rss - # of bytes of anonymous and swap cache memory (includes
transparent hugepages).
rss_huge - # of bytes of anonymous transparent hugepages.
mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
pgpgin - # of charging events to the memory cgroup. The charging
event happens each time a page is accounted as either mapped
anon page(RSS) or cache page(Page Cache) to the cgroup.
pgpgout - # of uncharging events to the memory cgroup. The uncharging
event happens each time a page is unaccounted from the cgroup.
swap - # of bytes of swap usage
dirty - # of bytes that are waiting to get written back to the disk.
writeback - # of bytes of file/anon cache that are queued for syncing to
disk.
inactive_anon - # of bytes of anonymous and swap cache memory on inactive
LRU list.
active_anon - # of bytes of anonymous and swap cache memory on active
LRU list.
inactive_file - # of bytes of file-backed memory on inactive LRU list.
active_file - # of bytes of file-backed memory on active LRU list.
unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
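For reference, these counters can be read directly from the cgroup's memory.stat file. Below is a minimal sketch; the cgroup v1 mount point and the group name "test" are assumptions that have to be adjusted for the actual system.
/* Sketch: print a few fields of a memory cgroup's memory.stat (values are in bytes). */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *path = "/sys/fs/cgroup/memory/test/memory.stat";    /* assumed path */
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    char key[64];
    unsigned long long val;
    while (fscanf(f, "%63s %llu", key, &val) == 2) {
        if (!strcmp(key, "cache") || !strcmp(key, "rss") ||
            !strcmp(key, "mapped_file") || !strcmp(key, "inactive_anon"))
            printf("%-14s %llu\n", key, val);
    }
    fclose(f);
    return 0;
}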
Related code
static void
mem_cgroup_get_local_stat(struct mem_cgroup *mem, struct mcs_total_stat *s)
{
    s64 val;

    /* per cpu stat */
    val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_CACHE);
    s->stat[MCS_CACHE] += val * PAGE_SIZE;
    val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_RSS);
    s->stat[MCS_RSS] += val * PAGE_SIZE;
    val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_FILE_MAPPED);
    s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
    val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGIN_COUNT);
    s->stat[MCS_PGPGIN] += val;
    val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGOUT_COUNT);
    s->stat[MCS_PGPGOUT] += val;
    if (do_swap_account) {
        val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_SWAPOUT);
        s->stat[MCS_SWAP] += val * PAGE_SIZE;
    }

    /* per zone stat */
    val = mem_cgroup_get_local_zonestat(mem, LRU_INACTIVE_ANON);
    s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
    val = mem_cgroup_get_local_zonestat(mem, LRU_ACTIVE_ANON);
    s->stat[MCS_ACTIVE_ANON] += val * PAGE_SIZE;
    val = mem_cgroup_get_local_zonestat(mem, LRU_INACTIVE_FILE);
    s->stat[MCS_INACTIVE_FILE] += val * PAGE_SIZE;
    val = mem_cgroup_get_local_zonestat(mem, LRU_ACTIVE_FILE);
    s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
    val = mem_cgroup_get_local_zonestat(mem, LRU_UNEVICTABLE);
    s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
}
Data structures
struct mem_cgroup {
    ...
    /*
     * statistics. This must be placed at the end of memcg.
     */
    struct mem_cgroup_stat stat;    /* statistics counters */
};

/*
 * Statistics for memory cgroup.
 */
enum mem_cgroup_stat_index {
    /*
     * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
     */
    MEM_CGROUP_STAT_CACHE,          /* # of pages charged as cache */
    MEM_CGROUP_STAT_RSS,            /* # of pages charged as anon rss */
    MEM_CGROUP_STAT_FILE_MAPPED,    /* # of pages charged as file rss */
    MEM_CGROUP_STAT_PGPGIN_COUNT,   /* # of pages paged in */
    MEM_CGROUP_STAT_PGPGOUT_COUNT,  /* # of pages paged out */
    MEM_CGROUP_STAT_EVENTS,         /* sum of pagein + pageout for internal use */
    MEM_CGROUP_STAT_SWAPOUT,        /* # of pages, swapped out */

    MEM_CGROUP_STAT_NSTATS,
};

struct mem_cgroup_stat_cpu {
    s64 count[MEM_CGROUP_STAT_NSTATS];
} ____cacheline_aligned_in_smp;

struct mem_cgroup_stat {
    struct mem_cgroup_stat_cpu cpustat[0];
};
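The cpustat[] array shows that each counter is kept per CPU and summed up when it is read (this is what mem_cgroup_read_stat does). The sketch below imitates that aggregation pattern in plain user-space C; every name in it (demo_*, NR_CPUS_DEMO) is illustrative and not part of the kernel.
/* Sketch of per-CPU counters that are summed on read. */
#include <stdio.h>

#define NR_CPUS_DEMO 4

enum demo_stat_index { DEMO_STAT_CACHE, DEMO_STAT_RSS, DEMO_STAT_NSTATS };

struct demo_stat_cpu {
    long long count[DEMO_STAT_NSTATS];
};

/* Readers sum over all per-CPU slots, as mem_cgroup_read_stat does. */
static long long demo_read_stat(struct demo_stat_cpu *stat, enum demo_stat_index idx)
{
    long long sum = 0;
    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        sum += stat[cpu].count[idx];
    return sum;
}

int main(void)
{
    struct demo_stat_cpu stat[NR_CPUS_DEMO] = {0};
    stat[0].count[DEMO_STAT_RSS] += 3;    /* CPU 0 charges 3 anon pages */
    stat[2].count[DEMO_STAT_RSS] += 5;    /* CPU 2 charges 5 anon pages */
    printf("rss pages = %lld\n", demo_read_stat(stat, DEMO_STAT_RSS));
    return 0;
}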
rss and cache
cache - # of bytes of page cache memory.
rss - # of bytes of anonymous and swap cache memory (includes transparent hugepages).
static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
                                         struct page_cgroup *pc,
                                         long size)
{
    ...
    cpustat = &stat->cpustat[cpu];
    if (PageCgroupCache(pc))
        __mem_cgroup_stat_add_safe(cpustat,
                MEM_CGROUP_STAT_CACHE, numpages);
    else
        __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS,
                numpages);
    ...
}
static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
                                       struct page_cgroup *pc,
                                       enum charge_type ctype,
                                       int page_size)
{
    ...
    switch (ctype) {
    case MEM_CGROUP_CHARGE_TYPE_CACHE:
    case MEM_CGROUP_CHARGE_TYPE_SHMEM:    /* file cache + shm */
        SetPageCgroupCache(pc);
        SetPageCgroupUsed(pc);
        break;
    case MEM_CGROUP_CHARGE_TYPE_MAPPED:
        ClearPageCgroupCache(pc);
        SetPageCgroupUsed(pc);
        break;
    default:
        break;
    }

    /* update the statistics */
    mem_cgroup_charge_statistics(mem, pc, page_size);
    ...
}
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
                            gfp_t gfp_mask)
{
    ...
    if (page_is_file_cache(page))
        return mem_cgroup_charge_common(page, mm, gfp_mask,
                MEM_CGROUP_CHARGE_TYPE_CACHE, NULL);    /* file cache */

    /* shmem */
    if (PageSwapCache(page)) {
        ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
        if (!ret)
            __mem_cgroup_commit_charge_swapin(page, mem,
                    MEM_CGROUP_CHARGE_TYPE_SHMEM);
    } else
        ret = mem_cgroup_charge_common(page, mm, gfp_mask,    /* shm memory */
                MEM_CGROUP_CHARGE_TYPE_SHMEM, mem);
    ...
}
As the code shows, cache covers both shared memory and the file cache.
mapped_file
mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
void mem_cgroup_update_file_mapped(struct page *page, int val)
{
    ...
    __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED, val);
    ...
}
The call chain is __do_fault -> page_add_file_rmap -> mem_cgroup_update_file_mapped.
inactive_anon
inactive_anon - # of bytes of anonymous and swap cache memory on inactive LRU list.
static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
        struct page **pagep, enum sgp_type sgp, gfp_t gfp, int *fault_type)
{
    ...
    lru_cache_add_anon(page);
    ...
}

/**
 * lru_cache_add: add a page to the page lists
 * @page: the page to add
 */
static inline void lru_cache_add_anon(struct page *page)
{
    __lru_cache_add(page, LRU_INACTIVE_ANON);
}
This shows that shared memory (shmem/tmpfs) pages are added to the INACTIVE_ANON list.
inactive_file
inactive_file - # of bytes of file-backed memory on inactive LRU list. This is the page cache used by regular files (it does not include shared memory).
static inline void lru_cache_add_file(struct page *page)
{
    __lru_cache_add(page, LRU_INACTIVE_FILE);
}
The call chain is add_to_page_cache_lru -> lru_cache_add_file.
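To see this path in action, it is enough to read a file with plain read(2) from inside a cgroup and watch cache and inactive_file grow in memory.stat. The sketch below is a minimal illustration; the default file path is only an assumption.
/* Sketch: read a file so its pages enter the page cache (cache/inactive_file). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/var/log/messages";    /* assumed default */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return 1; }

    char buf[1 << 16];
    ssize_t n;
    long long total = 0;
    while ((n = read(fd, buf, sizeof(buf))) > 0)    /* pages land in the page cache */
        total += n;
    close(fd);

    printf("read %lld bytes from %s; compare memory.stat before and after\n",
           total, path);
    return 0;
}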
Example program
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define BUF_SIZE 4000000000
#define MYKEY 26

int main(int argc, char **argv)
{
    int shmid;
    char *shmptr;

    if ((shmid = shmget(MYKEY, BUF_SIZE, IPC_CREAT)) == -1) {
        fprintf(stderr, "Create Share Memory Error: %s\n\a", strerror(errno));
        exit(1);
    }

    if ((shmptr = shmat(shmid, 0, 0)) == (void *)-1) {
        printf("shmat error!\n");
        exit(1);
    }

    memset(shmptr, '\0', 1000000000);

    printf("sleep...\n");
    while (1)
        sleep(1);

    exit(0);
}
The cgroup's memory.stat before and after running the program:
- Before:
# cat memory.stat
cache 1121185792
rss 23678976
rss_huge 0
mapped_file 14118912
inactive_anon 1002643456
active_anon 23687168
inactive_file 46252032
active_file 72282112
- After:
# cat memory.stat
cache 2121187328
rss 23760896
rss_huge 0
mapped_file 1014124544
inactive_anon 2002608128
active_anon 23736320
inactive_file 46247936
active_file 72286208
# ipcs -m
0x0000001a 229380 root 0 4000000000 1
As the numbers show, in the cgroup the shared memory is counted under cache, mapped_file, and inactive_anon.
Summary
(1) Difference between process RSS and cgroup RSS
A process's RSS covers all physical memory the process uses (file_rss + anon_rss), i.e. anonymous pages plus mapped pages (including shared memory). The cgroup rss counter covers only anonymous and swap cache memory and does not include shared memory. Neither of the two includes the file cache.
(2) The cgroup cache counter includes both the file cache and shared memory.
References
http://man7.org/linux/man-pages/man5/proc.5.html
https://www.kernel.org/doc/Documentation/cgroups/memory.txt