1. The migrate_pages system call

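Linux provides the migrate_pages system call. It reads the source memory nodes from old_nodes and the destination memory nodes from new_nodes, then passes the mm_struct of the target process (the task selected by pid) to do_migrate_pages, which performs the migration.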

SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
        const unsigned long __user *, old_nodes,
        const unsigned long __user *, new_nodes)
{
    const struct cred *cred = current_cred(), *tcred;
    struct mm_struct *mm = NULL;
    struct task_struct *task;
    nodemask_t task_nodes;
    int err;
    nodemask_t *old;
    nodemask_t *new;
    NODEMASK_SCRATCH(scratch);

    if (!scratch)
        return -ENOMEM;

    old = &scratch->mask1;
    new = &scratch->mask2;

    /* Copy the source node mask from user space. */
    err = get_nodes(old, old_nodes, maxnode);
    if (err)
        goto out;

    /* Copy the destination node mask from user space. */
    err = get_nodes(new, new_nodes, maxnode);
    if (err)
        goto out;
...
    mm = get_task_mm(task);
    put_task_struct(task);

    if (!mm) {
        err = -EINVAL;
        goto out;
    }

    err = do_migrate_pages(mm, old, new,
        capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
...
}

do_migrate_pages ultimately hands the work to migrate_pages: do_migrate_pages-->migrate_to_node-->migrate_pages.
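From user space, the system call described in this section can be reached through the raw syscall interface. The sketch below is a minimal, Linux-only illustration, not a reference implementation: MAX_NODES, nodemask_set and move_task_pages are hypothetical names introduced here, and the choice of passing one more than the number of mask bits as maxnode is an assumption about how the kernel's get_nodes() counts bits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_NODES 64    /* illustrative limit for this sketch, not a kernel constant */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((MAX_NODES + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Set the bit for NUMA node `node` in a node bitmask (hypothetical helper). */
void nodemask_set(unsigned long *mask, int node)
{
    assert(mask != NULL);
    assert(node >= 0 && node < MAX_NODES);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/*
 * Ask the kernel to move the pages of task `pid` that live on node `from`
 * to node `to`. Returns the raw syscall result: -1 with errno set on error.
 */
long move_task_pages(pid_t pid, int from, int to)
{
    unsigned long old_nodes[MASK_WORDS];
    unsigned long new_nodes[MASK_WORDS];

    memset(old_nodes, 0, sizeof(old_nodes));
    memset(new_nodes, 0, sizeof(new_nodes));
    nodemask_set(old_nodes, from);
    nodemask_set(new_nodes, to);

    /* Assumption: maxnode is one more than the number of bits in the masks. */
    return syscall(SYS_migrate_pages, pid, (unsigned long)MAX_NODES + 1,
                   old_nodes, new_nodes);
}
```

Calling move_task_pages(0, 0, 1) would request migration of the calling process from node 0 to node 1; on a machine without NUMA support or without a second node the call is expected to fail with an errno value rather than move anything.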

2. migrate_pages

migrate_pages-------------------------------------core page migration function
    unmap_and_move
        get_new_page------------------------------allocate the new page
        __unmap_and_move--------------------------migrate the page to the new page
            move_to_new_page
                page_mapping----------------------find the address space the page belongs to
                migrate_page----------------------move the old page's state to the new page
                    migrate_page_copy
                remove_migration_ptes-------------use reverse mapping to find every PTE that mapped the old page
                    remove_migration_pte----------handle one of those virtual addresses

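from is the list of pages to migrate; get_new_page is a function pointer that allocates the destination page; put_new_page is a function pointer that frees the destination page when migration fails; private is the argument passed to get_new_page; mode is the migration mode; reason records why the migration was requested.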

int migrate_pages(struct list_head *from, new_page_t get_new_page,
        free_page_t put_new_page, unsigned long private,
        enum migrate_mode mode, int reason)
{
    int retry = 1;
    int nr_failed = 0;
    int nr_succeeded = 0;
    int pass = 0;
    struct page *page;
    struct page *page2;
    int swapwrite = current->flags & PF_SWAPWRITE;
    int rc;

    if (!swapwrite)
        current->flags |= PF_SWAPWRITE;

    /*
     * Up to 10 passes: take each page from the from list and call
     * unmap_and_move() to migrate it. MIGRATEPAGE_SUCCESS means the
     * page was migrated.
     */
    for(pass = 0; pass < 10 && retry; pass++) {
        retry = 0;

        list_for_each_entry_safe(page, page2, from, lru) {
            cond_resched();

            if (PageHuge(page))
                rc = unmap_and_move_huge_page(get_new_page,
                        put_new_page, private, page,
                        pass > 2, mode);
            else
                rc = unmap_and_move(get_new_page, put_new_page,
                        private, page, pass > 2, mode);

            switch(rc) {
            case -ENOMEM:
                goto out;
            case -EAGAIN:
                retry++;
                break;
            case MIGRATEPAGE_SUCCESS:
                nr_succeeded++;
                break;
            default:
                /*
                 * Permanent failure (-EBUSY, -ENOSYS, etc.):
                 * unlike -EAGAIN case, the failed page is
                 * removed from migration page list and not
                 * retried in the next outer loop.
                 */
                nr_failed++;
                break;
            }
        }
    }
    rc = nr_failed + retry;
out:
    if (nr_succeeded)
        count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
    if (nr_failed)
        count_vm_events(PGMIGRATE_FAIL, nr_failed);
    trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);

    if (!swapwrite)
        current->flags &= ~PF_SWAPWRITE;

    return rc;
}
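The retry policy of migrate_pages can be tried out in user space with a small model. The sketch below is only an illustration of the control flow: struct toy_page, toy_unmap_and_move and toy_migrate_pages are hypothetical stand-ins, an array replaces the kernel's linked list, and a done flag replaces removing a page from the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SUCCESS 0      /* stands in for MIGRATEPAGE_SUCCESS */
#define MAX_PASSES 10

/* Hypothetical stand-in for a page on the `from` list. */
struct toy_page {
    int busy_passes;   /* attempts that report -EAGAIN before succeeding */
    int pinned;        /* non-zero: every attempt fails permanently (-EBUSY) */
    int done;          /* non-zero once the page has left the list */
};

/* Stand-in for unmap_and_move(): one migration attempt for one page. */
int toy_unmap_and_move(struct toy_page *p, int force)
{
    (void)force;   /* the kernel uses force to decide whether it may block */
    if (p->pinned)
        return -EBUSY;
    if (p->busy_passes > 0) {
        p->busy_passes--;
        return -EAGAIN;
    }
    return TOY_SUCCESS;
}

/* Mirrors the loop structure of migrate_pages(): returns failed + still-busy pages. */
int toy_migrate_pages(struct toy_page *pages, size_t n, int *nr_succeeded)
{
    int retry = 1;
    int nr_failed = 0;

    assert(pages != NULL && nr_succeeded != NULL);
    *nr_succeeded = 0;

    for (int pass = 0; pass < MAX_PASSES && retry; pass++) {
        retry = 0;
        for (size_t i = 0; i < n; i++) {
            int rc;

            if (pages[i].done)
                continue;               /* already off the list */
            rc = toy_unmap_and_move(&pages[i], pass > 2);
            switch (rc) {
            case -ENOMEM:
                return -ENOMEM;         /* give up immediately */
            case -EAGAIN:
                retry++;                /* stays on the list for the next pass */
                break;
            case TOY_SUCCESS:
                (*nr_succeeded)++;
                pages[i].done = 1;
                break;
            default:
                nr_failed++;            /* permanent failure, never retried */
                pages[i].done = 1;
                break;
            }
        }
    }
    return nr_failed + retry;
}
```

A page that keeps returning -EAGAIN is counted once in the return value after the tenth pass, which is why the caller sees a positive result even though nr_failed stays at zero.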

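newpage is the page allocated by get_new_page; __unmap_and_move tries to migrate page to the newly allocated newpage.

__unmap_and_move is called by unmap_and_move. Its force argument is the value of pass > 2 in migrate_pages, so it becomes 1 from the fourth pass onward.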

static int __unmap_and_move(struct page *page, struct page *newpage,
                int force, enum migrate_mode mode)
{
    int rc = -EAGAIN;
    int page_was_mapped = 0;
    struct anon_vma *anon_vma = NULL;

    /*
     * Try to lock the page. false means another task already holds the
     * page lock; true means we acquired it.
     */
    if (!trylock_page(page)) {
        /*
         * The lock was not acquired: skip this page unless migration
         * is forced and the mode is not MIGRATE_ASYNC.
         */
        if (!force || mode == MIGRATE_ASYNC)
            goto out;

        /*
         * It's not safe for direct compaction to call lock_page.
         * For example, during page readahead pages are added locked
         * to the LRU. Later, when the IO completes the pages are
         * marked uptodate and unlocked. However, the queueing
         * could be merging multiple pages for one bio (e.g.
         * mpage_readpages). If an allocation happens for the
         * second or third page, the process can end up locking
         * the same page twice and deadlocking. Rather than
         * trying to be clever about what pages can be locked,
         * avoid the use of lock_page for direct compaction
         * altogether.
         */
        /*
         * We may be on the direct compaction path, where sleeping on
         * the page lock is unsafe, so skip this page.
         */
        if (current->flags & PF_MEMALLOC)
            goto out;

        lock_page(page);
    }

    if (PageWriteback(page)) {
        /*
         * Only in the case of a full synchronous migration is it
         * necessary to wait for PageWriteback. In the async case,
         * the retry loop is too short and in the sync-light case,
         * the overhead of stalling is too much
         */
        if (mode != MIGRATE_SYNC) {
            rc = -EBUSY;
            goto out_unlock;
        }
        if (!force)
            goto out_unlock;
        wait_on_page_writeback(page);
    }
    /*
     * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
     * we cannot notice that anon_vma is freed while we migrates a page.
     * This get_anon_vma() delays freeing anon_vma pointer until the end
     * of migration. File cache pages are no problem because of page_lock()
     * File Caches may use write_page() or lock_page() in migration, then,
     * just care Anon page here.
     */
    if (PageAnon(page) && !PageKsm(page)) {
        /*
         * Only page_lock_anon_vma_read() understands the subtleties of
         * getting a hold on an anon_vma from outside one of its mms.
         */
        anon_vma = page_get_anon_vma(page);
        if (anon_vma) {
            /*
             * Anon page
             */
        } else if (PageSwapCache(page)) {
            /*
             * We cannot be sure that the anon_vma of an unmapped
             * swapcache page is safe to use because we don't
             * know in advance if the VMA that this page belonged
             * to still exists. If the VMA and others sharing the
             * data have been freed, then the anon_vma could
             * already be invalid.
             *
             * To avoid this possibility, swapcache pages get
             * migrated but are not remapped when migration
             * completes
             */
        } else {
            goto out_unlock;
        }
    }

    if (unlikely(isolated_balloon_page(page))) {
        /*
         * A ballooned page does not need any special attention from
         * physical to virtual reverse mapping procedures.
         * Skip any attempt to unmap PTEs or to remap swap cache,
         * in order to avoid burning cycles at rmap level, and perform
         * the page migration right away (protected by page lock).
         */
        rc = balloon_page_migrate(newpage, page, mode);
        goto out_unlock;
    }

    /*
     * Corner case handling:
     * 1. When a new swap-cache page is read into, it is added to the LRU
     * and treated as swapcache but it has no rmap yet.
     * Calling try_to_unmap() against a page->mapping==NULL page will
     * trigger a BUG.  So handle it here.
     * 2. An orphaned page (see truncate_complete_page) might have
     * fs-private metadata. The page can be picked up due to memory
     * offlining.  Everywhere else except page reclaim, the page is
     * invisible to the vm, so the page can not be migrated.  So try to
     * free the metadata, so the page can be freed.
     */
    if (!page->mapping) {
        VM_BUG_ON_PAGE(PageAnon(page), page);
        if (page_has_private(page)) {
            try_to_free_buffers(page);
            goto out_unlock;
        }
        goto skip_unmap;
    }

    /* Establish migration ptes or remove ptes */
    /*
     * The page is mapped by PTEs: try_to_unmap() tears down every
     * mapping, replacing the PTEs with migration entries.
     */
    if (page_mapped(page)) {
        try_to_unmap(page,
            TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
        page_was_mapped = 1;
    }

skip_unmap:
    /* All mappings are gone: migrate the page to the newly allocated newpage. */
    if (!page_mapped(page))
        rc = move_to_new_page(newpage, page, page_was_mapped, mode);

    /*
     * A non-zero rc means the migration failed: remove_migration_ptes(page, page)
     * points the migration entries back at the old page.
     */
    if (rc && page_was_mapped)
        remove_migration_ptes(page, page);

    /* Drop an anon_vma reference if we took one */
    if (anon_vma)
        put_anon_vma(anon_vma);

out_unlock:
    unlock_page(page);
out:
    return rc;
}
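The early exits at the top of __unmap_and_move amount to two small decision tables, one for the page lock and one for pages under writeback. The sketch below restates them as pure functions so the combinations can be checked in isolation; enum toy_mode, enum toy_action, lock_decision and writeback_decision are hypothetical names, and the enum only mirrors the three migration modes used in the listing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrors MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT and MIGRATE_SYNC. */
enum toy_mode { TOY_ASYNC, TOY_SYNC_LIGHT, TOY_SYNC };

enum toy_action { TOY_PROCEED, TOY_SKIP, TOY_WAIT_FOR_LOCK };

/* What happens after trylock_page(): proceed, skip the page, or sleep in lock_page(). */
enum toy_action lock_decision(bool trylock_ok, bool force,
                              enum toy_mode mode, bool in_memalloc)
{
    assert(mode == TOY_ASYNC || mode == TOY_SYNC_LIGHT || mode == TOY_SYNC);

    if (trylock_ok)
        return TOY_PROCEED;
    if (!force || mode == TOY_ASYNC)
        return TOY_SKIP;            /* not allowed to block for this page */
    if (in_memalloc)
        return TOY_SKIP;            /* PF_MEMALLOC: sleeping on the lock is unsafe */
    return TOY_WAIT_FOR_LOCK;
}

/*
 * What happens when the locked page is under writeback:
 * 0 means wait for writeback to finish, a negative errno means give up.
 */
int writeback_decision(bool force, enum toy_mode mode)
{
    assert(mode == TOY_ASYNC || mode == TOY_SYNC_LIGHT || mode == TOY_SYNC);

    if (mode != TOY_SYNC)
        return -EBUSY;              /* permanent failure for this call */
    if (!force)
        return -EAGAIN;             /* rc keeps its initial value, so the page is retried */
    return 0;
}
```

Read together with the retry loop, this suggests that only a forced, fully synchronous migration outside the memory allocation path ever sleeps on a page.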

move_to_new_page tries to move the contents of page to newpage; mode selects asynchronous or synchronous migration.

static int move_to_new_page(struct page *newpage, struct page *page,
                int page_was_mapped, enum migrate_mode mode)
{
    struct address_space *mapping;
    int rc;

    /*
     * Block others from accessing the page when we get around to
     * establishing additional references. We are the only one
     * holding a reference to the new page at this point.
     */
    /* Failing to lock newpage means someone else locked it, which is treated as a bug. */
    if (!trylock_page(newpage))
        BUG();

    /* Prepare mapping for the new page.*/
    newpage->index = page->index;
    newpage->mapping = page->mapping;
    if (PageSwapBacked(page))
        SetPageSwapBacked(newpage);

    /*
     * Look up the page's address_space: NULL for slab pages and for
     * anonymous pages, swap_address_space for swap cache pages, and
     * page->mapping for other page cache pages.
     */
    mapping = page_mapping(page);
    if (!mapping)
        /* Slab or anonymous page: migrate_page() moves the old page's state to the new page. */
        rc = migrate_page(mapping, newpage, page, mode);
    else if (mapping->a_ops->migratepage)
        /*
         * Most pages have a mapping and most filesystems provide a
         * migratepage callback. Anonymous pages are part of swap
         * space which also has its own migratepage callback. This
         * is the most common path for page migration.
         */
        /* The address_space provides a migratepage callback: use it. */
        rc = mapping->a_ops->migratepage(mapping,
                        newpage, page, mode);
    else
        rc = fallback_migrate_page(mapping, newpage, page, mode);

    if (rc != MIGRATEPAGE_SUCCESS) {
        newpage->mapping = NULL;
    } else {
        mem_cgroup_migrate(page, newpage, false);
        if (page_was_mapped)
            remove_migration_ptes(page, newpage);
        page->mapping = NULL;
    }

    unlock_page(newpage);

    return rc;
}
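The three-way dispatch in move_to_new_page is a plain function-pointer pattern. The sketch below reproduces only that selection logic in user space; every type and function with a toy_ prefix is hypothetical, and each migration routine simply reports which path was taken instead of moving any data.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { int id; };
struct toy_mapping;

typedef int (*toy_migrate_fn)(struct toy_mapping *mapping,
                              struct toy_page *newpage, struct toy_page *page);

/* Stand-ins for address_space_operations and address_space. */
struct toy_a_ops { toy_migrate_fn migratepage; };
struct toy_mapping { const struct toy_a_ops *a_ops; };

enum toy_path { PATH_MIGRATE_PAGE = 1, PATH_CALLBACK, PATH_FALLBACK };

/* Generic path, used when the page has no address_space. */
int toy_migrate_page(struct toy_mapping *m, struct toy_page *n, struct toy_page *p)
{
    (void)m; (void)n; (void)p;
    return PATH_MIGRATE_PAGE;
}

/* A filesystem-specific callback, as a filesystem might register it. */
int toy_fs_migratepage(struct toy_mapping *m, struct toy_page *n, struct toy_page *p)
{
    (void)m; (void)n; (void)p;
    return PATH_CALLBACK;
}

/* Used when the address_space has no migratepage callback. */
int toy_fallback_migrate_page(struct toy_mapping *m, struct toy_page *n, struct toy_page *p)
{
    (void)m; (void)n; (void)p;
    return PATH_FALLBACK;
}

/* Same selection order as move_to_new_page(). */
int toy_move_to_new_page(struct toy_mapping *mapping,
                         struct toy_page *newpage, struct toy_page *page)
{
    assert(newpage != NULL && page != NULL);

    if (!mapping)
        return toy_migrate_page(mapping, newpage, page);
    if (mapping->a_ops->migratepage)
        return mapping->a_ops->migratepage(mapping, newpage, page);
    return toy_fallback_migrate_page(mapping, newpage, page);
}
```

Keeping the callback in the operations table lets each filesystem decide how its own metadata follows the page, while the caller stays unaware of the details.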

The migrate_page function performs the page copy.

int migrate_page(struct address_space *mapping,
        struct page *newpage, struct page *page,
        enum migrate_mode mode)
{
    int rc;

    BUG_ON(PageWriteback(page));    /* Writeback must be complete */

    /*
     * For an anonymous page (mapping == NULL) there is no mapping to
     * update: the call only checks the reference count and returns.
     */
    rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);

    if (rc != MIGRATEPAGE_SUCCESS)
        return rc;

    /* Copy page into the new page newpage. */
    migrate_page_copy(newpage, page);
    return MIGRATEPAGE_SUCCESS;
}
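The two steps of migrate_page, a precondition check followed by the copy, can be modelled in a few lines. The sketch below is a user-space simplification under a stated assumption: for a page without a mapping, the move-mapping step is taken to reduce to verifying that only the migrating caller holds a reference. struct toy_page and the toy_ functions are hypothetical, and the 64-byte page size is chosen only for readability.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_PAGE_SIZE 64    /* tiny "page" to keep the sketch readable */

struct toy_page {
    int refcount;
    unsigned flags;
    unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Stand-in for migrate_page_move_mapping() with mapping == NULL.
 * Assumption: the only valid state is a single reference, held by the caller.
 */
int toy_move_mapping(const struct toy_page *page)
{
    return page->refcount == 1 ? 0 : -EAGAIN;
}

/* Stand-in for migrate_page_copy(): copy contents and state bits. */
void toy_page_copy(struct toy_page *newpage, const struct toy_page *page)
{
    memcpy(newpage->data, page->data, TOY_PAGE_SIZE);
    newpage->flags = page->flags;
}

/* Check first, copy second: a failed check leaves newpage untouched. */
int toy_migrate_page(struct toy_page *newpage, struct toy_page *page)
{
    int rc;

    assert(newpage != NULL && page != NULL && newpage != page);

    rc = toy_move_mapping(page);
    if (rc != 0)
        return rc;

    toy_page_copy(newpage, page);
    return 0;
}
```

Returning -EAGAIN rather than a permanent error fits the retry loop in migrate_pages: an extra reference is often transient, so a later pass may find the page movable.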

remove_migration_ptes uses the RMAP reverse mapping system to find every PTE that mapped the old page, then points each of them at the new page.

static void remove_migration_ptes(struct page *old, struct page *new)
{
    struct rmap_walk_control rwc = {
        .rmap_one = remove_migration_pte,
        .arg = old,
    };

    rmap_walk(new, &rwc);
}

static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
                 unsigned long addr, void *old)
{
    struct mm_struct *mm = vma->vm_mm;
    swp_entry_t entry;
    pmd_t *pmd;
    pte_t *ptep, pte;
    spinlock_t *ptl;

    if (unlikely(PageHuge(new))) {
        ptep = huge_pte_offset(mm, addr);
        if (!ptep)
            goto out;
        ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
    } else {
        pmd = mm_find_pmd(mm, addr);
        if (!pmd)
            goto out;

        /* Locate the page table entry for addr in mm. */
        ptep = pte_offset_map(pmd, addr);

        /*
         * Peek to check is_swap_pte() before taking ptlock?  No, we
         * can race mremap's move_ptes(), which skips anon_vma lock.
         */

        ptl = pte_lockptr(mm, pmd);
    }

    spin_lock(ptl);
    pte = *ptep;
    if (!is_swap_pte(pte))
        goto unlock;

    entry = pte_to_swp_entry(pte);

    if (!is_migration_entry(entry) ||
        migration_entry_to_page(entry) != old)
        goto unlock;

    get_page(new);
    pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
    if (pte_swp_soft_dirty(*ptep))
        pte = pte_mksoft_dirty(pte);

    /* Recheck VMA as permissions can change since migration started  */
    if (is_write_migration_entry(entry))
        pte = maybe_mkwrite(pte, vma);

#ifdef CONFIG_HUGETLB_PAGE
    if (PageHuge(new)) {
        pte = pte_mkhuge(pte);
        pte = arch_make_huge_pte(pte, vma, new, 0);
    }
#endif
    flush_dcache_page(new);
    /* Install the new PTE, which points at the new page, to re-establish the mapping. */
    set_pte_at(mm, addr, ptep, pte);

    /* Add the new page to the RMAP reverse mapping system. */
    if (PageHuge(new)) {
        if (PageAnon(new))
            hugepage_add_anon_rmap(new, vma, addr);
        else
            page_dup_rmap(new);
    } else if (PageAnon(new))
        page_add_anon_rmap(new, vma, addr);
    else
        page_add_file_rmap(new);

    /* No need to invalidate - it was non-present before */
    /* Update the MMU cache for this address. */
    update_mmu_cache(vma, addr, ptep);
unlock:
    pte_unmap_unlock(ptep, ptl);
out:
    return SWAP_AGAIN;
}
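The core of remove_migration_pte is a check followed by a rewrite: the slot must hold a migration entry that refers to the old page, and only then is it replaced by a present PTE for the new page. The sketch below models that on a made-up 64-bit PTE layout; the bit positions, the toy_ names and the use of page frame numbers in place of struct page pointers are all assumptions made for illustration and do not match any real architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical PTE layout for this sketch only. */
#define TOY_PRESENT   0x1ull
#define TOY_WRITE     0x2ull
#define TOY_MIGRATION 0x4ull   /* meaningful only when TOY_PRESENT is clear */
#define TOY_PFN_SHIFT 3

/* Build a non-present migration entry that remembers the page and write permission. */
toy_pte_t toy_make_migration_entry(uint64_t pfn, bool write)
{
    return (pfn << TOY_PFN_SHIFT) | TOY_MIGRATION | (write ? TOY_WRITE : 0);
}

bool toy_is_migration_entry(toy_pte_t pte)
{
    return !(pte & TOY_PRESENT) && (pte & TOY_MIGRATION);
}

uint64_t toy_pte_pfn(toy_pte_t pte)
{
    return pte >> TOY_PFN_SHIFT;
}

/*
 * Model of remove_migration_pte() for one slot. Returns true if the slot
 * was rewritten to map new_pfn, false if it was left alone.
 */
bool toy_remove_migration_pte(toy_pte_t *ptep, uint64_t old_pfn,
                              uint64_t new_pfn, bool vma_writable)
{
    toy_pte_t pte, fresh;

    assert(ptep != NULL);
    pte = *ptep;

    /* Leave anything that is not a migration entry for the old page. */
    if (!toy_is_migration_entry(pte) || toy_pte_pfn(pte) != old_pfn)
        return false;

    fresh = (new_pfn << TOY_PFN_SHIFT) | TOY_PRESENT;
    /* Like maybe_mkwrite(): restore write access only if the VMA still allows it. */
    if ((pte & TOY_WRITE) && vma_writable)
        fresh |= TOY_WRITE;

    *ptep = fresh;
    return true;
}
```

Calling toy_remove_migration_pte with the same value for old_pfn and new_pfn models the rollback path, where remove_migration_ptes(page, page) restores the original mapping after a failed migration.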
