Linux 3.10.0 Block I/O Subsystem Flow (7) -- Request Completion

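In contrast to request submission, request completion starts at the low-level driver. Completion is split into two parts: a top half and a bottom half. It always begins in interrupt context, where the main task is to put the completed request on a queue and then raise a softirq so that the "bottom half" can take over; this is the usual practice. The bottom half then processes each completed request on the queue in turn.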
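The split described above can be modeled in user space. The following is only a minimal sketch, not kernel code: `fake_request`, `done_queue`, `top_half_complete` and `bottom_half_run` are hypothetical names, and a plain flag stands in for the raised softirq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct fake_request {
    int id;
    int completed;                 /* set by the bottom half */
    struct fake_request *next;
};

struct done_queue {
    struct fake_request *head, *tail;
    int softirq_pending;           /* stands in for a raised softirq */
};

/* "Top half": would run in interrupt context, so it only enqueues at the tail. */
static void top_half_complete(struct done_queue *q, struct fake_request *rq)
{
    assert(q != NULL && rq != NULL);
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
    q->softirq_pending = 1;
}

/* "Bottom half": detaches the queue, then handles each request in order.
 * Returns the number of requests handled. */
static int bottom_half_run(struct done_queue *q)
{
    int handled = 0;
    struct fake_request *rq = q->head;

    q->head = q->tail = NULL;
    q->softirq_pending = 0;
    while (rq) {
        struct fake_request *next = rq->next;

        rq->next = NULL;
        rq->completed = 1;         /* the real handler would finish the I/O here */
        handled++;
        rq = next;
    }
    return handled;
}
```

In this model the top half does a constant amount of work per request, while all per-request processing is deferred to `bottom_half_run`.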

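When discussing SCSI command dispatch we mentioned scsi_done. When the low-level driver initializes its hardware, it registers an interrupt handler. When a hardware interrupt is raised, that handler is called; if the interrupt is a response to a SCSI command, the handler looks up the corresponding scsi_cmnd descriptor. Once the low-level driver has finished with the request, it calls the scsi_done function stored in the descriptor, handing the command back to the SCSI core.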

 /**
  * scsi_done - Enqueue the finished SCSI command into the done queue.
  * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
  * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
  *
  * Description: This function is the mid-level's (SCSI Core) interrupt routine,
  * which regains ownership of the SCSI command (de facto) from a LLDD, and
  * enqueues the command to the done queue for further processing.
  *
  * This is the producer of the done queue who enqueues at the tail.
  *
  * This function is interrupt context safe.
  */
 static void scsi_done(struct scsi_cmnd *cmd)
 {
     trace_scsi_dispatch_cmd_done(cmd);
     blk_complete_request(cmd->request);
 }

 /**
  * blk_complete_request - end I/O on a request
  * @req:      the request being processed
  *
  * Description:
  *     Ends all I/O on a request. It does not handle partial completions,
  *     unless the driver actually implements this in its completion callback
  *     through requeueing. The actual completion happens out-of-order,
  *     through a softirq handler. The user must have registered a completion
  *     callback through blk_queue_softirq_done().
  *
  *     If the kernel was built with the FAIL_IO_TIMEOUT option, errors can be
  *     injected at request completion. The Linux kernel contains a good deal
  *     of code for "injecting" errors; the idea is to simulate failures so
  *     that we can check whether failure handling is complete.
  *
  *     The completion path calls blk_mark_rq_complete to atomically set the
  *     REQ_ATOM_COMPLETE flag of the block-layer request. This keeps the
  *     error-recovery timer from trying to "grab" the same request at the
  *     same time.
  **/
 void blk_complete_request(struct request *req)
 {
     if (unlikely(blk_should_fake_timeout(req->q)))
         return;
     if (!blk_mark_rq_complete(req))
         __blk_complete_request(req);
 }
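The "whoever sets the flag first owns the request" idea can be illustrated in user space with C11 atomics. This is a sketch under assumptions, not the kernel implementation: `tracked_request` and `fake_mark_complete` are hypothetical names, and an `atomic_flag` stands in for the REQ_ATOM_COMPLETE bit. The return convention follows the call site above, where a zero return means the caller may go on to complete the request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a request carrying a "complete" flag. */
struct tracked_request {
    atomic_flag complete;
};

/* Atomically set the flag and report whether it was already set.
 * Returns 0 if this caller claimed the request, 1 if another path
 * (for example a timeout handler) claimed it first. */
static int fake_mark_complete(struct tracked_request *rq)
{
    assert(rq != NULL);
    return atomic_flag_test_and_set(&rq->complete) ? 1 : 0;
}
```

If both the completion path and a timeout path call `fake_mark_complete` on the same request, exactly one of them sees 0 and proceeds; the other backs off.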

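In general, Linux softirqs follow the rule that whoever raises a softirq also runs it. There is one case we need to think about, though. On an SMP (symmetric multiprocessing) system, suppose a process running on one CPU issues a file read. The operation works its way down layer by layer until it reaches the block I/O layer and then the disk driver. Once it reaches the hardware, the CPU is no longer involved: the reading process has to wait on a wait queue and goes to sleep, and the operation is left to the disk hardware, which reports progress through interrupts. Completion is reported through an interrupt as well, but is the interrupted CPU the same CPU that the reading process was running on? There is no guarantee of that.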

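We know that I/O completion runs in a softirq, and that completing the I/O amounts to waking up the original process. If the CPU interrupted by the disk raises the I/O completion softirq, then by the "whoever raises it runs it" rule that same CPU runs the softirq. In effect, this CPU wakes up a process sleeping on a different CPU, and waking a process on another CPU is expensive: it involves migration, accounting, load balancing and similar details.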

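All we need to do is remember the CPU on which the sleeping process originally ran. Then, when the hardware interrupt has been handled and the softirq is about to be raised, the softirq can be routed to that remembered CPU. The final step is then a softirq waking a process that sleeps on the current CPU, which is cheap.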

With this in mind, look at the following code:

 void __blk_complete_request(struct request *req)
 {
     int ccpu, cpu;
     struct request_queue *q = req->q;
     unsigned long flags;
     bool shared = false;

     BUG_ON(!q->softirq_done_fn);

     local_irq_save(flags);
     cpu = smp_processor_id();

     /*
      * Select completion CPU
      */
     if (req->cpu != -1) {
         ccpu = req->cpu;
         if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
             shared = cpus_share_cache(cpu, ccpu);
     } else
         ccpu = cpu;

     /*
      * If current CPU and requested CPU share a cache, run the softirq on
      * the current CPU. One might concern this is just like
      * QUEUE_FLAG_SAME_FORCE, but actually not. blk_complete_request() is
      * running in interrupt handler, and currently I/O controller doesn't
      * support multiple interrupts, so current CPU is unique actually. This
      * avoids IPI sending from current CPU to the first CPU of a group.
      */
     if (ccpu == cpu || shared) {
         struct list_head *list;
 do_local:
         list = &__get_cpu_var(blk_cpu_done);
         list_add_tail(&req->csd.list, list);

         /*
          * if the list only contains our just added request,
          * signal a raise of the softirq. If there are already
          * entries there, someone already raised the irq but it
          * hasn't run yet.
          */
         if (list->next == &req->csd.list)
             raise_softirq_irqoff(BLOCK_SOFTIRQ);    /* raise the softirq; it is bound to blk_done_softirq */
     } else if (raise_blk_irq(ccpu, req))
         goto do_local;

     local_irq_restore(flags);
 }
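The CPU selection in the function above can be condensed into a small pure function. This is only an illustrative sketch: `pick_completion_cpu` and its parameters are hypothetical names, `submit_cpu == -1` is assumed to mean that no submitting CPU was recorded, and the fallback to the local CPU when the remote raise fails is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns the CPU whose done list receives the request.
 *   irq_cpu        - CPU that took the hardware interrupt
 *   submit_cpu     - CPU recorded at submission, or -1 if none was recorded
 *   force_same_cpu - models QUEUE_FLAG_SAME_FORCE being set on the queue
 *   share_cache    - whether irq_cpu and submit_cpu share a cache */
static int pick_completion_cpu(int irq_cpu, int submit_cpu,
                               bool force_same_cpu, bool share_cache)
{
    assert(irq_cpu >= 0);

    if (submit_cpu == -1)
        return irq_cpu;            /* nothing recorded: complete locally */
    if (submit_cpu == irq_cpu)
        return irq_cpu;            /* already on the right CPU */
    if (!force_same_cpu && share_cache)
        return irq_cpu;            /* close enough: avoid a cross-CPU interrupt */
    return submit_cpu;             /* route to the submitting CPU */
}
```

For example, with the interrupt on CPU 2 and the request submitted on CPU 5, the sketch returns 2 when the two CPUs share a cache and the force flag is clear, and 5 otherwise.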

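The BLOCK_SOFTIRQ softirq is initialized in blk_softirq_init, which does the following:

1. Initializes a list for each CPU to record completed requests.

2. Registers the softirq.

3. Registers a notifier whose main purpose is, when a CPU goes offline, to move the entries on that CPU's completed-request list to the current CPU's list and raise the softirq to process them.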

 static __init int blk_softirq_init(void)
 {
     int i;

     for_each_possible_cpu(i)
         INIT_LIST_HEAD(&per_cpu(blk_cpu_done, i));

     open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
     register_hotcpu_notifier(&blk_cpu_notifier);
     return 0;
 }


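The softirq handler is shown below. It first moves every entry on the CPU's completed-request list to a local list. The point is to disturb the per-CPU list as little as possible while processing, so that newly completed requests can still be added to it. It then loops over the local list, removing each entry and passing it to the request queue's softirq completion callback.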

 /*
  * Softirq action handler - move entries to local list and loop over them
  * while passing them to the queue registered handler.
  */
 static void blk_done_softirq(struct softirq_action *h)
 {
     struct list_head *cpu_list, local_list;

     local_irq_disable();
     cpu_list = &__get_cpu_var(blk_cpu_done);
     list_replace_init(cpu_list, &local_list);
     local_irq_enable();

     while (!list_empty(&local_list)) {
         struct request *rq;

         rq = list_entry(local_list.next, struct request, csd.list);
         list_del_init(&rq->csd.list);
         rq->q->softirq_done_fn(rq);
     }
 }

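The softirq completion callback depends on the request queue. For SCSI devices it is set to scsi_softirq_done; this happens when the request queue is allocated for the SCSI device, see scsi_alloc_queue.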

 static void scsi_softirq_done(struct request *rq)
 {
     struct scsi_cmnd *cmd = rq->special;
     unsigned long wait_for = (cmd->allowed + 1) * rq->timeout;
     int disposition;

     INIT_LIST_HEAD(&cmd->eh_entry);

     /*
      * First update the statistics counters of the owning SCSI device:
      * increment the completed-command counter iodone_cnt and, if the
      * command returned an error result, the failed-command counter ioerr_cnt.
      */
     atomic_inc(&cmd->device->iodone_cnt);
     if (cmd->result)
         atomic_inc(&cmd->device->ioerr_cnt);

     /*
      * scsi_decide_disposition decides how to handle this command:
      * SUCCESS: finish with scsi_finish_command (analyzed later).
      * NEEDS_RETRY, ADD_TO_MLQUEUE: both put the command back on the
      *   request queue; the former retries immediately, the latter
      *   retries after a delay.
      * Any other value: call scsi_eh_scmd_add to enter error recovery.
      *   It returns 1 if the command entered the error recovery flow, in
      *   which case nothing more is done with it here; if it returns 0,
      *   the only option is to finish with scsi_finish_command.
      */
     disposition = scsi_decide_disposition(cmd);
     if (disposition != SUCCESS &&
         time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
         sdev_printk(KERN_ERR, cmd->device,
                 "timing out command, waited %lus\n",
                 wait_for/HZ);
         disposition = SUCCESS;
     }

     scsi_log_completion(cmd, disposition);

     switch (disposition) {
         case SUCCESS:
             scsi_finish_command(cmd);
             break;
         case NEEDS_RETRY:
             scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
             break;
         case ADD_TO_MLQUEUE:
             scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
             break;
         default:
             if (!scsi_eh_scmd_add(cmd, 0))
                 scsi_finish_command(cmd);
     }
 }
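The timeout check in the function above relies on a wraparound-safe comparison of tick counters. The sketch below shows the idea in user space; `tick_before` mirrors the technique behind the kernel's time_before() macro, while `retry_budget_exhausted` and its parameters are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is before b" for a free-running tick counter:
 * the unsigned difference is reinterpreted as signed. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical helper: has a command allocated at alloc_tick used up its
 * budget of (allowed + 1) attempts, each lasting timeout ticks? */
static bool retry_budget_exhausted(unsigned long alloc_tick, unsigned long now,
                                   unsigned int allowed, unsigned long timeout)
{
    unsigned long wait_for;

    assert(timeout > 0);
    wait_for = (allowed + 1UL) * timeout;
    return tick_before(alloc_tick + wait_for, now);
}
```

When the budget is exhausted and the disposition is not SUCCESS, the kernel code above stops retrying and finishes the command instead.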

scsi_finish_command

 /**
  * scsi_finish_command - cleanup and pass command back to upper layer
  * @cmd: the command
  *
  * Description: Pass command off to upper layer for finishing of I/O
  *              request, waking processes that are waiting on results,
  *              etc.
  */
 void scsi_finish_command(struct scsi_cmnd *cmd)
 {
     struct scsi_device *sdev = cmd->device;
     struct scsi_target *starget = scsi_target(sdev);
     struct Scsi_Host *shost = sdev->host;
     struct scsi_driver *drv;
     unsigned int good_bytes;

     scsi_device_unbusy(sdev);

     /*
      * Clear the flags which say that the device/host is no longer
      * capable of accepting new commands.  These are set in scsi_queue.c
      * for both the queue full condition on a device, and for a
      * host full condition on the host.
      *
      * XXX(hch): What about locking?
      */
     shost->host_blocked = 0;
     starget->target_blocked = 0;
     sdev->device_blocked = 0;

     /*
      * If we have valid sense information, then some kind of recovery
      * must have taken place.  Make a note of this.
      */
     if (SCSI_SENSE_VALID(cmd))
         cmd->result |= (DRIVER_SENSE << 24);

     SCSI_LOG_MLCOMPLETE(4, sdev_printk(KERN_INFO, sdev,
                 "Notifying upper driver of completion "
                 "(result %x)\n", cmd->result));

     /*
      * To complete the command we first need the number of bytes that
      * SCSI transferred successfully; scsi_bufflen reads this from the
      * SCSI data buffer.
      * If the request does not come from the SCSI common service layer,
      * it must come from the upper layers, which means the device
      * handling it is bound to an upper-level driver. If that driver
      * defines a done callback, call it; for the SCSI disk driver this
      * is sd_done, which returns the adjusted number of completed bytes.
      * With the completed byte count in hand, call scsi_io_completion.
      */
     good_bytes = scsi_bufflen(cmd);
     if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
         int old_good_bytes = good_bytes;

         drv = scsi_cmd_to_driver(cmd);
         if (drv->done)
             good_bytes = drv->done(cmd);

         /*
          * USB may not give sense identifying bad sector and
          * simply return a residue instead, so subtract off the
          * residue if drv->done() error processing indicates no
          * change to the completion length.
          */
         if (good_bytes == old_good_bytes)
             good_bytes -= scsi_get_resid(cmd);
     }

     scsi_io_completion(cmd, good_bytes);
 }
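The byte-count adjustment at the end of the function above, for requests that go through an upper-level driver, can be written as a small pure function. This is a sketch with hypothetical names (`final_good_bytes`, `buflen`, `driver_bytes`, `resid`); passing `driver_bytes == buflen` also models the case where the driver has no done callback.

```c
#include <assert.h>

/* Hypothetical helper modeling the good_bytes adjustment.
 *   buflen       - length of the data buffer (what scsi_bufflen would report)
 *   driver_bytes - value returned by the upper-level driver's done callback,
 *                  or buflen if the driver has no such callback
 *   resid        - residue reported by the low-level driver */
static unsigned int final_good_bytes(unsigned int buflen,
                                     unsigned int driver_bytes,
                                     unsigned int resid)
{
    assert(resid <= buflen);

    /* The driver left the length unchanged: trust the residue instead. */
    if (driver_bytes == buflen)
        return buflen - resid;

    /* The driver already adjusted the length from sense data. */
    return driver_bytes;
}
```

With a 4096-byte buffer and a 512-byte residue, an unchanged driver result yields 3584 good bytes, while a driver result of 1024 is passed through as is.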

scsi_io_completion……
