mem alloc

page

Notes:

  1. There are two kinds of page: virtual pages and physical pages.
  2. struct page describes a physical memory page, not a virtual one!

    struct page {
        unsigned long flags;   /* page status, e.g. whether the page is dirty */
        atomic_t _count;       /* reference count of the page */
        atomic_t _mapcount;
        ...
        void *virtual;         /* kernel virtual address of the physical page */
    };

Related function:

  1. _count should not be read directly; use page_count(). If page_count() returns 0, the page is not in use.

zone

  1. The usual zones are ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_HIGHMEM.
  2. x86_64 can map all of its physical memory directly, so it has no ZONE_HIGHMEM; its memory falls into ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL.

    struct zone {
        unsigned long watermark[NR_WMARK];
        ...
        const char *name;   /* zone name, set up during boot in mm/page_alloc.c */
    };

alloc page related functions

alloc pages

    struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);
    unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
    struct page *alloc_page(gfp_t gfp_mask);
    unsigned long __get_free_page(gfp_t gfp_mask);
    void *page_address(struct page *page);
    /* Get a single page filled with zeros. */
    unsigned long get_zeroed_page(gfp_t gfp_mask);

free pages

    void __free_pages(struct page *page, unsigned int order);
    void free_pages(unsigned long addr, unsigned int order);
    void free_page(unsigned long addr);

Example

Get 8 pages (order 3, i.e. 2^3 pages)

    unsigned long page;

    page = __get_free_pages(GFP_KERNEL, 3);
    if (!page)
        return -ENOMEM;
    /* ... use the 8 pages starting at address 'page' ... */
    free_pages(page, 3);
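
The order argument is a power-of-two exponent, which is why order 3 yields 8 pages. The following is a small user-space sketch of that arithmetic, not kernel code. It assumes a 4 KiB page size, and the helper names pages_for_order() and order_for_size() are made up for illustration (inside the kernel, get_order() performs the size-to-order conversion).

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Number of pages covered by an allocation of the given order. */
static unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/* Smallest order whose block holds at least `size` bytes (size > 0). */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

For instance, a 20000-byte buffer needs order 3 under these assumptions, so 8 pages are handed out even though 5 would cover it. That rounding is one reason byte-sized requests are better served by kmalloc().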

kmalloc()

If you need one, two or more whole pages, __get_free_pages() is probably the better fit.

kmalloc() is suited to byte-sized allocations:

    struct dog *p;

    p = kmalloc(sizeof(struct dog), GFP_KERNEL);
    if (!p)
        return -ENOMEM;

kfree()

    #include <linux/slab.h>

    void kfree(const void *ptr);

    char *buf;

    buf = kmalloc(BUF_SIZE, GFP_ATOMIC);
    if (!buf)
        return -ENOMEM;   /* allocation failed, nothing to free */
    /* ... use buf ... */
    kfree(buf);

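gfp_mask flags

  1. Action modifiers
  2. Zone modifiers

    __GFP_DMA
    __GFP_DMA32
    __GFP_HIGHMEM

These are the only zone modifiers. There is no __GFP_NORMAL, because an allocation without a zone modifier is normally served from the normal zone.

  1. __GFP_HIGHMEM cannot be passed to __get_free_pages() or kmalloc(), because both return a logical address rather than a struct page.
  2. Only alloc_pages() can allocate high memory. In practice, most callers do not need a zone modifier at all; the normal zone is enough.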

vmalloc()

    void *vmalloc(unsigned long size);
    void vfree(const void *addr);

vmalloc(): the allocated area is contiguous in virtual address space, but the physical pages behind it do not need to be contiguous!

kmalloc(): the allocated memory is physically contiguous, and therefore virtually contiguous as well.

Hardware usually needs physically contiguous memory, because devices sit outside the kernel's memory management and know nothing about virtual addresses.

Prefer kmalloc() over vmalloc():

Although kmalloc() has to find physically contiguous memory, it has real advantages, above all lower overhead: vmalloc() must set up page table entries for pages it gathers one by one. So kmalloc() is the normal choice, and vmalloc() is used only when really necessary, for example to obtain a very large region.
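
To make the difference concrete, here is a toy user-space model (not kernel code) of a vmalloc()-style area: consecutive virtual pages are mapped to scattered physical frames through a small table. The frame numbers, the 4 KiB page size and the name toy_virt_to_phys() are all assumptions chosen for illustration.

```c
#define SKETCH_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NPAGES     4

/* Toy page table: index = virtual page number inside the area,
 * value = physical frame number. The frames are deliberately scattered. */
static const unsigned long frame_of[SKETCH_NPAGES] = { 7, 2, 9, 4 };

/* Translate an offset inside the toy virtual area to a toy physical address.
 * The caller must keep voff below SKETCH_NPAGES * SKETCH_PAGE_SIZE. */
static unsigned long toy_virt_to_phys(unsigned long voff)
{
    unsigned long vpn = voff >> SKETCH_PAGE_SHIFT;       /* virtual page number */
    unsigned long off = voff & (SKETCH_PAGE_SIZE - 1);   /* offset within page  */

    return (frame_of[vpn] << SKETCH_PAGE_SHIFT) | off;
}
```

In this model offsets 4095 and 4096 are neighbours in the virtual area but land in frames 7 and 2. A device that only sees physical addresses could not treat the area as one buffer.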

slab

There are two main structures in the slab subsystem:

    struct kmem_cache;
    struct slab;

Each slab is in one of three states:

  1. full
  2. partial
  3. empty

A kmem_cache corresponds to one type of object, such as "struct inode". The kernel has many small structures that are allocated and freed very frequently, so Sun designed the slab concept to solve this problem. It is really a cache: memory is allocated in advance and then used like a pool.

"struct kmem_cache" and "struct slab" are easy to confuse. It helps to introduce a new term, "A High Cache", which corresponds to one "struct kmem_cache". Each "struct kmem_cache" corresponds to ONE type of structure. A slab is a subset of a kmem_cache: each slab is a chunk of memory (one or more pages) holding objects of that type.

    struct kmem_cache {
        unsigned int object_size;  /* The original size of the object */
        unsigned int size;         /* The aligned/padded/added on size */
        unsigned int align;        /* Alignment as calculated */
        slab_flags_t flags;        /* Active flags on the slab */
        const char *name;          /* Slab name for sysfs */
        int refcount;              /* Use counter */
        void (*ctor)(void *);      /* Called on object slot creation */
        struct list_head list;     /* List of all slab caches on the system */
    };

How to create A High Cache?

    struct kmem_cache *kmem_cache_create(const char *name,
                                         size_t size,
                                         size_t align,
                                         unsigned long flags,
                                         void (*ctor)(void *));

How to destroy A High Cache?

    void kmem_cache_destroy(struct kmem_cache *cachep);

  * Before destroying a High Cache, make sure all of its slabs are empty.
  * On current kernels the function returns void; older kernels declared it as int, with 0 meaning success.
  * It is typically called when a module is unloaded.

How to allocate an object from A High Cache?

A High Cache holds more than one slab. When a small object is requested, it is taken from a partial slab if one exists, otherwise from an empty slab. If the cache has neither, a new slab must be allocated first.

    void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);

To free an object, which marks objp as unused in its slab:

    void kmem_cache_free(struct kmem_cache *cachep, void *objp);
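
The slab states and the allocation order can be sketched with a toy user-space pool. This is only an illustration under simplifying assumptions (a fixed number of slabs, four objects per slab, objects of type long), not the kernel's implementation, and every toy_* name is hypothetical. The cache must start zero-initialized.

```c
#include <stddef.h>

#define OBJS_PER_SLAB 4
#define NR_SLABS      2

enum toy_state { TOY_EMPTY, TOY_PARTIAL, TOY_FULL };

struct toy_slab {
    int  in_use[OBJS_PER_SLAB];   /* 1 if the slot is handed out */
    long objs[OBJS_PER_SLAB];     /* storage for the objects */
};

struct toy_cache {
    struct toy_slab slabs[NR_SLABS];
};

/* Classify a slab by how many of its slots are in use. */
static enum toy_state toy_slab_state(const struct toy_slab *s)
{
    int i, used = 0;

    for (i = 0; i < OBJS_PER_SLAB; i++)
        used += s->in_use[i];
    if (used == 0)
        return TOY_EMPTY;
    return used == OBJS_PER_SLAB ? TOY_FULL : TOY_PARTIAL;
}

/* Hand out the first free slot of a slab, or NULL if it is full. */
static long *toy_slab_take(struct toy_slab *s)
{
    int i;

    for (i = 0; i < OBJS_PER_SLAB; i++) {
        if (!s->in_use[i]) {
            s->in_use[i] = 1;
            return &s->objs[i];
        }
    }
    return NULL;
}

/* Prefer a partial slab, then fall back to an empty one. */
static long *toy_cache_alloc(struct toy_cache *c)
{
    int pass, i;

    for (pass = 0; pass < 2; pass++) {
        enum toy_state want = (pass == 0) ? TOY_PARTIAL : TOY_EMPTY;

        for (i = 0; i < NR_SLABS; i++)
            if (toy_slab_state(&c->slabs[i]) == want)
                return toy_slab_take(&c->slabs[i]);
    }
    return NULL;   /* a real allocator would allocate a new slab here */
}

/* Mark the slot that holds obj as unused again. */
static void toy_cache_free(struct toy_cache *c, long *obj)
{
    int i, j;

    for (i = 0; i < NR_SLABS; i++)
        for (j = 0; j < OBJS_PER_SLAB; j++)
            if (&c->slabs[i].objs[j] == obj)
                c->slabs[i].in_use[j] = 0;
}
```

Filling partial slabs first keeps objects packed together, so slabs that become empty can be given back to the page allocator as a whole.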

A SLAB EXAMPLE

Let's analyse an example: "struct task_struct".

Well, this is a very famous struct, right?

    /* Naming convention: the _cachep suffix marks a pointer to a struct kmem_cache. */
    struct kmem_cache *task_struct_cachep;

    task_struct_cachep = kmem_cache_create("task_struct",
                                           sizeof(struct task_struct),
                                           ARCH_MIN_TASKALIGN,
                                           SLAB_PANIC | SLAB_NOTRACK,
                                           NULL);

As shown above, kmem_cache_create() returns a pointer to the new struct kmem_cache. So we can say we created a High Cache referenced by "task_struct_cachep", and the type of object stored in it is "struct task_struct".

When fork() is executed, a new "struct task_struct" must be created; the main work is done in do_fork():

    struct task_struct *tsk;

    tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
    if (!tsk)
        return NULL;
    ...
    ...
    kmem_cache_free(task_struct_cachep, tsk);   /* return tsk to task_struct_cachep */

    /* Returns void on current kernels; older kernels returned an int error code. */
    kmem_cache_destroy(task_struct_cachep);

How the kernel abstracts memory

A global struct page array: mem_map[]

If you have 76 GB of memory and 4 KB pages, the page count is 76 * 1024 * 1024 KB / 4 KB = 19922944, so the mem_map[] array has 19922944 entries.
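
The same arithmetic as a tiny user-space helper. The function name nr_pages_for_ram() is hypothetical, and the page size is passed in rather than taken from any kernel constant.

```c
/* Number of pages needed to cover `gib` GiB of RAM when each page is
 * `page_kib` KiB large. */
static unsigned long long nr_pages_for_ram(unsigned long long gib,
                                           unsigned long long page_kib)
{
    unsigned long long total_kib = gib * 1024ULL * 1024ULL;

    return total_kib / page_kib;
}
```

Calling nr_pages_for_ram(76, 4) reproduces the 19922944 entries computed above.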

NODE

  1. On NUMA systems, a node is represented by struct pglist_data, usually referred to by its typedef name pg_data_t.
  2. The nodes are linked into the list pgdat_list through pg_data_t->node_next.

    typedef struct pglist_data {
        struct zone node_zones[MAX_NR_ZONES];
        struct zonelist node_zonelists[MAX_ZONELISTS];
        int nr_zones;
    #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
        struct page *node_mem_map;
    #ifdef CONFIG_PAGE_EXTENSION
        struct page_ext *node_page_ext;
    #endif
    #endif
    #ifndef CONFIG_NO_BOOTMEM
        struct bootmem_data *bdata;
    #endif
    #ifdef CONFIG_MEMORY_HOTPLUG
        /*
         * Must be held any time you expect node_start_pfn, node_present_pages
         * or node_spanned_pages stay constant. Holding this will also
         * guarantee that any pfn_valid() stays that way.
         *
         * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
         * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
         *
         * Nests above zone->lock and zone->span_seqlock
         */
        spinlock_t node_size_lock;
    #endif
        unsigned long node_start_pfn;
        unsigned long node_present_pages; /* total number of physical pages */
        unsigned long node_spanned_pages; /* total size of physical page
                                             range, including holes */
        int node_id;
        wait_queue_head_t kswapd_wait;
        wait_queue_head_t pfmemalloc_wait;
        struct task_struct *kswapd; /* Protected by
                                       mem_hotplug_begin/end() */
        int kswapd_order;
        enum zone_type kswapd_classzone_idx;
        int kswapd_failures; /* Number of 'reclaimed == 0' runs */
    #ifdef CONFIG_COMPACTION
        int kcompactd_max_order;
        enum zone_type kcompactd_classzone_idx;
        wait_queue_head_t kcompactd_wait;
        struct task_struct *kcompactd;
    #endif
    #ifdef CONFIG_NUMA_BALANCING
        /* Lock serializing the migrate rate limiting window */
        spinlock_t numabalancing_migrate_lock;
        /* Rate limiting time interval */
        unsigned long numabalancing_migrate_next_window;
        /* Number of pages migrated during the rate limiting time interval */
        unsigned long numabalancing_migrate_nr_pages;
    #endif
        /*
         * This is a per-node reserve of pages that are not available
         * to userspace allocations.
         */
        unsigned long totalreserve_pages;
    #ifdef CONFIG_NUMA
        /*
         * zone reclaim becomes active if more unmapped pages exist.
         */
        unsigned long min_unmapped_pages;
        unsigned long min_slab_pages;
    #endif /* CONFIG_NUMA */
        /* Write-intensive fields used by page reclaim */
        ZONE_PADDING(_pad1_)
        spinlock_t lru_lock;
    } pg_data_t;

mem reclaim

mem writeback

  1. Memory cache
  2. Memory management

Important structures in the memory cache mechanism:

    struct page {
        unsigned long flags;
        union {
            struct address_space *mapping;
        };
        union {
            pgoff_t index;
        };
        ...
    };

If the owner of a page in the page cache is a file, the address_space object is embedded in the i_data field of the VFS inode object. The i_mapping field always points to the address_space object that owns the pages holding the inode's data, and the host field of the address_space object points back to the inode that owns it.

    struct address_space {
        struct inode *host;
        struct radix_tree_root page_tree;
        const struct address_space_operations *a_ops;
        ...
    };

    struct address_space_operations {
        ...
    };

    struct inode {
        struct address_space *i_mapping;
        struct address_space i_data;    /* embedded, not a pointer */
    };
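
The relationship between the inode, its embedded i_data and the i_mapping and host pointers can be modelled with two toy structures. This is a user-space sketch for illustration only: the toy_* types and toy_inode_init() are hypothetical and keep just the fields discussed here.

```c
struct toy_inode;                          /* forward declaration */

struct toy_address_space {
    struct toy_inode *host;                /* inode that owns this mapping */
};

struct toy_inode {
    struct toy_address_space *i_mapping;   /* mapping used for this inode's pages */
    struct toy_address_space  i_data;      /* mapping embedded in the inode */
};

/* Wire up an inode the way an ordinary file's inode is described above:
 * i_mapping points at the embedded i_data, and host points back. */
static void toy_inode_init(struct toy_inode *inode)
{
    inode->i_data.host = inode;
    inode->i_mapping   = &inode->i_data;
}
```

After toy_inode_init(), following i_mapping and then host leads back to the same inode, which is the round trip the page cache relies on when it goes from a page's mapping to the owning file.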

http://oenhan.com/linux-cache-writeback
