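Source code and NVMe specification versions

  • SPDK : spdk-17.07.1
  • DPDK : dpdk-17.08
  • NVMe Spec: 1.2.1

Basic analysis method

  • 01 - Download spdk-17.07.1.tar.gz from the official site http://www.spdk.io/
  • 02 - Download dpdk-17.08.tar.xz from the official site http://www.dpdk.org/
  • 03 - Create the directory nvme/src, extract spdk-17.07.1.tar.gz and dpdk-17.08.tar.xz into it, then use OpenGrok to build a browsable web version of the source tree
  • 04 - Read the SPDK NVMe driver source code, consulting NVMeDirect and the Linux kernel NVMe driver along the way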

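1. How to identify an NVMe SSD

An NVMe SSD is a PCIe device, so how is this kind of device identified? There are two methods.

Method 1: by Device ID + Vendor ID

Method 2: by Class Code

The Linux kernel NVMe driver's PCI ID table lists specific Vendor ID + Device ID pairs (largely to apply device-specific quirks) alongside a class-code entry, whereas SPDK, when built against a recent DPDK, relies on the class code alone. Here is the code: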

  • src/spdk-17.07.1/include/spdk/pci_ids.h
  52 /**
  53  * PCI class code for NVMe devices.
  54  *
  55  * Base class code 01h: mass storage
  56  * Subclass code 08h: non-volatile memory
  57  * Programming interface 02h: NVM Express
  58  */
  59 #define SPDK_PCI_CLASS_NVME 0x010802

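The NVMe Specification defines this Class Code (0x010802) in the CC register at PCI header offset 09h: Base Class Code (bits 23:16) = 01h, Sub Class Code (bits 15:08) = 08h, Programming Interface (bits 07:00) = 02h.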
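To make the layout of the 24-bit class code concrete, here is a minimal sketch that splits 0x010802 into its three bytes. The `demo_` names are hypothetical helpers written for this article, not part of SPDK or DPDK; only the value 0x010802 comes from pci_ids.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the value mirrors SPDK_PCI_CLASS_NVME from pci_ids.h. */
#define DEMO_PCI_CLASS_NVME 0x010802u

/* Bits 23:16 hold the base class code (01h = mass storage). */
static inline uint8_t demo_pci_base_class(uint32_t class_code)
{
    assert(class_code <= 0xFFFFFFu); /* a class code is only 24 bits wide */
    return (uint8_t)((class_code >> 16) & 0xFFu);
}

/* Bits 15:8 hold the subclass code (08h = non-volatile memory). */
static inline uint8_t demo_pci_subclass(uint32_t class_code)
{
    return (uint8_t)((class_code >> 8) & 0xFFu);
}

/* Bits 7:0 hold the programming interface (02h = NVM Express). */
static inline uint8_t demo_pci_prog_if(uint32_t class_code)
{
    return (uint8_t)(class_code & 0xFFu);
}

/* A device is treated as NVMe when all three bytes match. */
static inline int demo_is_nvme(uint32_t class_code)
{
    return class_code == DEMO_PCI_CLASS_NVME;
}
```

Comparing the whole 24-bit value at once, as `demo_is_nvme()` does, is equivalent to checking the three fields one by one.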

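2. Hello World

When learning a new language or development kit, "Hello World" is always the starting point, and SPDK is no exception. Let's start from hello_world.c and see how main() uses the SPDK NVMe driver API, which reveals the main logic for using NVMe SSDs: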

  • src/spdk-17.07.1/examples/nvme/hello_world/hello_world.c
  306 int main(int argc, char **argv)
  307 {
  308     int rc;
  309     struct spdk_env_opts opts;
  310
  311     /*
  312      * SPDK relies on an abstraction around the local environment
  313      * named env that handles memory allocation and PCI device operations.
  314      * This library must be initialized first.
  315      *
  316      */
  317     spdk_env_opts_init(&opts);
  318     opts.name = "hello_world";
  319     opts.shm_id = 0;
  320     spdk_env_init(&opts);
  321
  322     printf("Initializing NVMe Controllers\n");
  323
  324     /*
  325      * Start the SPDK NVMe enumeration process. probe_cb will be called
  326      * for each NVMe controller found, giving our application a choice on
  327      * whether to attach to each controller. attach_cb will then be
  328      * called for each controller after the SPDK NVMe driver has completed
  329      * initializing the controller we chose to attach.
  330      */
  331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
  332     if (rc != 0) {
  333         fprintf(stderr, "spdk_nvme_probe() failed\n");
  334         cleanup();
  335         return 1;
  336     }
  337
  338     if (g_controllers == NULL) {
  339         fprintf(stderr, "no NVMe controllers found\n");
  340         cleanup();
  341         return 1;
  342     }
  343
  344     printf("Initialization complete.\n");
  345     hello_world();
  346     cleanup();
  347     return 0;
  348 }

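The processing flow of main() is:

  001 - 317 spdk_env_opts_init(&opts);
  002 - 320 spdk_env_init(&opts);
  003 - 331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
  004 - 345 hello_world();
  005 - 346 cleanup();

  • 001-002: initialize the SPDK runtime environment
  • 003: call spdk_nvme_probe() to actively discover NVMe SSD devices. Clearly, spdk_nvme_probe() is the key function to analyze next.
  • 004: call hello_world() to perform a simple read and write
  • 005: call cleanup() to release memory, detach the NVMe SSD devices, and so on.

Before analyzing the key function spdk_nvme_probe(), let's settle two questions:

  • Question 1: every NVMe SSD contains a controller, so how are all the discovered NVMe SSDs (that is, NVMe controllers) organized?
  • Question 2: every NVMe SSD can be divided into multiple namespaces (similar in concept to logical partitions), so how are these namespaces organized?

For an experienced C programmer the answer to both questions is easy: a linked list. hello_world.c does exactly that. Look at the code: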

  39 struct ctrlr_entry {
  40     struct spdk_nvme_ctrlr *ctrlr;
  41     struct ctrlr_entry *next;
  42     char name[1024];
  43 };
  44
  45 struct ns_entry {
  46     struct spdk_nvme_ctrlr *ctrlr;
  47     struct spdk_nvme_ns *ns;
  48     struct ns_entry *next;
  49     struct spdk_nvme_qpair *qpair;
  50 };
  51
  52 static struct ctrlr_entry *g_controllers = NULL;
  53 static struct ns_entry *g_namespaces = NULL;

Here,

  • g_controllers is the head of the global linked list that manages all NVMe SSDs (i.e. NVMe controllers).
  • g_namespaces is the head of the global linked list that manages all namespaces.
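The following is a minimal sketch of how a singly linked list like g_controllers can be grown by pushing each new entry at the head. It is an illustration under assumptions, not the actual hello_world.c code: the attach callback that fills the real list is not shown in this article, `struct demo_ctrlr_entry` and the `demo_` functions are hypothetical, and the opaque controller handle is reduced to an integer id.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ctrlr_entry. */
struct demo_ctrlr_entry {
    int ctrlr_id;                   /* stands in for struct spdk_nvme_ctrlr * */
    struct demo_ctrlr_entry *next;
    char name[64];
};

/* Push a new entry at the head; returns 0 on success, -1 if allocation fails. */
static int demo_register_ctrlr(struct demo_ctrlr_entry **head, int ctrlr_id,
                               const char *name)
{
    struct demo_ctrlr_entry *entry;

    assert(head != NULL && name != NULL);
    entry = calloc(1, sizeof(*entry));
    if (entry == NULL) {
        return -1;
    }
    entry->ctrlr_id = ctrlr_id;
    snprintf(entry->name, sizeof(entry->name), "%s", name);
    entry->next = *head;            /* old head becomes the second element */
    *head = entry;
    return 0;
}

/* Walk the list and count its entries. */
static size_t demo_count_ctrlrs(const struct demo_ctrlr_entry *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}

/* Free every entry and reset the head to NULL, as a cleanup step would. */
static void demo_free_ctrlrs(struct demo_ctrlr_entry **head)
{
    while (*head != NULL) {
        struct demo_ctrlr_entry *next = (*head)->next;

        free(*head);
        *head = next;
    }
}
```

With this scheme an empty list is simply a NULL head pointer, which is why a NULL check on the head is enough to tell whether anything was found.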

With that, lines 338-342 of main() are easy to understand: if g_controllers is still NULL after the probe, no NVMe SSD was found, so the program cleans up and exits.

  338     if (g_controllers == NULL) {
  339         fprintf(stderr, "no NVMe controllers found\n");
  340         cleanup();
  341         return 1;
  342     }

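Now let's see how hello_world.c uses spdk_nvme_probe():

  331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);

Clearly, probe_cb and attach_cb are two callback functions (there is also remove_cb, which line 331 passes as NULL):

  • probe_cb: called when an NVMe device is enumerated
  • attach_cb: called once an NVMe device has been attached to the userspace NVMe driver

The definitions related to probe_cb, attach_cb and remove_cb are as follows: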

  • src/spdk-17.07.1/include/spdk/nvme.h
  268 /**
  269  * Callback for spdk_nvme_probe() enumeration.
  270  *
  271  * \param opts NVMe controller initialization options. This structure will be populated with the
  272  * default values on entry, and the user callback may update any options to request a different
  273  * value. The controller may not support all requested parameters, so the final values will be
  274  * provided during the attach callback.
  275  * \return true to attach to this device.
  276  */
  277 typedef bool (*spdk_nvme_probe_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
  278         struct spdk_nvme_ctrlr_opts *opts);
  279
  280 /**
  281  * Callback for spdk_nvme_probe() to report a device that has been attached to the userspace NVMe driver.
  282  *
  283  * \param opts NVMe controller initialization options that were actually used. Options may differ
  284  * from the requested options from the probe call depending on what the controller supports.
  285  */
  286 typedef void (*spdk_nvme_attach_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
  287         struct spdk_nvme_ctrlr *ctrlr,
  288         const struct spdk_nvme_ctrlr_opts *opts);
  289
  290 /**
  291  * Callback for spdk_nvme_probe() to report that a device attached to the userspace NVMe driver
  292  * has been removed from the system.
  293  *
  294  * The controller will remain in a failed state (any new I/O submitted will fail).
  295  *
  296  * The controller must be detached from the userspace driver by calling spdk_nvme_detach()
  297  * once the controller is no longer in use. It is up to the library user to ensure that
  298  * no other threads are using the controller before calling spdk_nvme_detach().
  299  *
  300  * \param ctrlr NVMe controller instance that was removed.
  301  */
  302 typedef void (*spdk_nvme_remove_cb)(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
  303
  304 /**
  305  * \brief Enumerate the bus indicated by the transport ID and attach the userspace NVMe driver
  306  * to each device found if desired.
  307  *
  308  * \param trid The transport ID indicating which bus to enumerate. If the trtype is PCIe or trid is NULL,
  309  * this will scan the local PCIe bus. If the trtype is RDMA, the traddr and trsvcid must point at the
  310  * location of an NVMe-oF discovery service.
  311  * \param cb_ctx Opaque value which will be passed back in cb_ctx parameter of the callbacks.
  312  * \param probe_cb will be called once per NVMe device found in the system.
  313  * \param attach_cb will be called for devices for which probe_cb returned true once that NVMe
  314  * controller has been attached to the userspace driver.
  315  * \param remove_cb will be called for devices that were attached in a previous spdk_nvme_probe()
  316  * call but are no longer attached to the system. Optional; specify NULL if removal notices are not
  317  * desired.
  318  *
  319  * This function is not thread safe and should only be called from one thread at a time while no
  320  * other threads are actively using any NVMe devices.
  321  *
  322  * If called from a secondary process, only devices that have been attached to the userspace driver
  323  * in the primary process will be probed.
  324  *
  325  * If called more than once, only devices that are not already attached to the SPDK NVMe driver
  326  * will be reported.
  327  *
  328  * To stop using the the controller and release its associated resources,
  329  * call \ref spdk_nvme_detach with the spdk_nvme_ctrlr instance returned by this function.
  330  */
  331 int spdk_nvme_probe(const struct spdk_nvme_transport_id *trid,
  332         void *cb_ctx,
  333         spdk_nvme_probe_cb probe_cb,
  334         spdk_nvme_attach_cb attach_cb,
  335         spdk_nvme_remove_cb remove_cb);
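The contract between the two callbacks can be sketched with a toy enumerator that needs no SPDK at all. Everything named `demo_` below is hypothetical; the sketch only mirrors the documented behaviour (probe_cb decides per device, attach_cb is invoked for accepted devices, already-attached devices are not reported again) and collapses the real scan and controller-initialization phases into one loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record standing in for an NVMe controller. */
struct demo_device {
    int id;
    bool attached;
};

typedef bool (*demo_probe_cb)(void *cb_ctx, const struct demo_device *dev);
typedef void (*demo_attach_cb)(void *cb_ctx, struct demo_device *dev);

/* Toy enumerator: ask probe_cb about each new device, attach the accepted ones. */
static int demo_probe(struct demo_device *devs, size_t ndevs, void *cb_ctx,
                      demo_probe_cb probe_cb, demo_attach_cb attach_cb)
{
    size_t i;

    if (devs == NULL || probe_cb == NULL || attach_cb == NULL) {
        return -1;
    }
    for (i = 0; i < ndevs; i++) {
        if (devs[i].attached) {
            continue;               /* already attached: not reported again */
        }
        if (!probe_cb(cb_ctx, &devs[i])) {
            continue;               /* the application declined this device */
        }
        devs[i].attached = true;    /* stands in for controller initialization */
        attach_cb(cb_ctx, &devs[i]);
    }
    return 0;
}

/* Example probe callback: accept only even-numbered devices. */
static bool demo_accept_even(void *cb_ctx, const struct demo_device *dev)
{
    (void)cb_ctx;
    return dev->id % 2 == 0;
}

/* Example attach callback: cb_ctx points at an int counter of attachments. */
static void demo_count_attach(void *cb_ctx, struct demo_device *dev)
{
    (void)dev;
    (*(int *)cb_ctx)++;
}
```

The opaque cb_ctx pointer is what lets an application carry its own state, here a counter, through both callbacks without global variables.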

To avoid being sidetracked by probe_cb, attach_cb and remove_cb, let's look next at struct spdk_nvme_transport_id and the main logic of spdk_nvme_probe().

  • src/spdk-17.07.1/include/spdk/nvme.h
  142 /**
  143  * NVMe transport identifier.
  144  *
  145  * This identifies a unique endpoint on an NVMe fabric.
  146  *
  147  * A string representation of a transport ID may be converted to this type using
  148  * spdk_nvme_transport_id_parse().
  149  */
  150 struct spdk_nvme_transport_id {
  151     /**
  152      * NVMe transport type.
  153      */
  154     enum spdk_nvme_transport_type trtype;
  155
  156     /**
  157      * Address family of the transport address.
  158      *
  159      * For PCIe, this value is ignored.
  160      */
  161     enum spdk_nvmf_adrfam adrfam;
  162
  163     /**
  164      * Transport address of the NVMe-oF endpoint. For transports which use IP
  165      * addressing (e.g. RDMA), this should be an IP address. For PCIe, this
  166      * can either be a zero length string (the whole bus) or a PCI address
  167      * in the format DDDD:BB:DD.FF or DDDD.BB.DD.FF
  168      */
  169     char traddr[SPDK_NVMF_TRADDR_MAX_LEN + 1];
  170
  171     /**
  172      * Transport service id of the NVMe-oF endpoint. For transports which use
  173      * IP addressing (e.g. RDMA), this field shoud be the port number. For PCIe,
  174      * this is always a zero length string.
  175      */
  176     char trsvcid[SPDK_NVMF_TRSVCID_MAX_LEN + 1];
  177
  178     /**
  179      * Subsystem NQN of the NVMe over Fabrics endpoint. May be a zero length string.
  180      */
  181     char subnqn[SPDK_NVMF_NQN_MAX_LEN + 1];
  182 };

For NVMe over PCIe, only the "NVMe transport type" field matters:

  154     enum spdk_nvme_transport_type trtype;

Two transport types are currently supported, PCIe and RDMA.

  130 enum spdk_nvme_transport_type {
  131     /**
  132      * PCIe Transport (locally attached devices)
  133      */
  134     SPDK_NVME_TRANSPORT_PCIE = 256,
  135
  136     /**
  137      * RDMA Transport (RoCE, iWARP, etc.)
  138      */
  139     SPDK_NVME_TRANSPORT_RDMA = SPDK_NVMF_TRTYPE_RDMA,
  140 };

RDMA is left aside for now, since the focus here is NVMe over PCIe.

Next, the code of spdk_nvme_probe():

  • src/spdk-17.07.1/lib/nvme/nvme.c
  396 int
  397 spdk_nvme_probe(const struct spdk_nvme_transport_id *trid, void *cb_ctx,
  398         spdk_nvme_probe_cb probe_cb, spdk_nvme_attach_cb attach_cb,
  399         spdk_nvme_remove_cb remove_cb)
  400 {
  401     int rc;
  402     struct spdk_nvme_ctrlr *ctrlr;
  403     struct spdk_nvme_transport_id trid_pcie;
  404
  405     rc = nvme_driver_init();
  406     if (rc != 0) {
  407         return rc;
  408     }
  409
  410     if (trid == NULL) {
  411         memset(&trid_pcie, 0, sizeof(trid_pcie));
  412         trid_pcie.trtype = SPDK_NVME_TRANSPORT_PCIE;
  413         trid = &trid_pcie;
  414     }
  415
  416     if (!spdk_nvme_transport_available(trid->trtype)) {
  417         SPDK_ERRLOG("NVMe trtype %u not available\n", trid->trtype);
  418         return -1;
  419     }
  420
  421     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
  422
  423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
  424
  425     if (!spdk_process_is_primary()) {
  426         TAILQ_FOREACH(ctrlr, &g_spdk_nvme_driver->attached_ctrlrs, tailq) {
  427             nvme_ctrlr_proc_get_ref(ctrlr);
  428
  429             /*
  430              * Unlock while calling attach_cb() so the user can call other functions
  431              * that may take the driver lock, like nvme_detach().
  432              */
  433             nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
  434             attach_cb(cb_ctx, &ctrlr->trid, ctrlr, &ctrlr->opts);
  435             nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
  436         }
  437
  438         nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
  439         return 0;
  440     }
  441
  442     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
  443     /*
  444      * Keep going even if one or more nvme_attach() calls failed,
  445      * but maintain the value of rc to signal errors when we return.
  446      */
  447
  448     rc = nvme_init_controllers(cb_ctx, attach_cb);
  449
  450     return rc;
  451 }

The processing flow of spdk_nvme_probe() is:

  001 405: rc = nvme_driver_init();
  002 410-414: if trid is NULL, default it to the PCIe transport
  003 416: check that the transport type is available via spdk_nvme_transport_available(trid->trtype)
  004 423: nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
  005 425: if this is not the primary process, call attach_cb for each controller already attached and return (lines 426-440)
  006 448: rc = nvme_init_controllers(cb_ctx, attach_cb);

Next, let's look at nvme_transport_ctrlr_scan():

  423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);

  /* src/spdk-17.07.1/lib/nvme/nvme_transport.c#92 */

  91 int
  92 nvme_transport_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
  93         void *cb_ctx,
  94         spdk_nvme_probe_cb probe_cb,
  95         spdk_nvme_remove_cb remove_cb)
  96 {
  97     NVME_TRANSPORT_CALL(trid->trtype, ctrlr_scan, (trid, cb_ctx, probe_cb, remove_cb));
  98 }

The macro NVME_TRANSPORT_CALL is defined as:

  /* src/spdk-17.07.1/lib/nvme/nvme_transport.c#60 */
  52 #define TRANSPORT_PCIE(func_name, args) case SPDK_NVME_TRANSPORT_PCIE: return nvme_pcie_ ## func_name args;
  ..
  60 #define NVME_TRANSPORT_CALL(trtype, func_name, args) \
  61     do { \
  62         switch (trtype) { \
  63         TRANSPORT_PCIE(func_name, args) \
  64         TRANSPORT_FABRICS_RDMA(func_name, args) \
  65         TRANSPORT_DEFAULT(trtype) \
  66         } \
  67         SPDK_UNREACHABLE(); \
  68     } while (0)
  ..
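The macro relies on the preprocessor's token-pasting operator (##) to build a function name from a transport prefix and a generic operation name. The standalone sketch below reproduces the same pattern with hypothetical `demo_`/`DEMO_` names and dummy scan functions; the enum values and return values are illustrative only.

```c
#include <assert.h>

/* Hypothetical transport types; the numeric values are illustrative. */
enum demo_transport_type {
    DEMO_TRANSPORT_RDMA = 1,
    DEMO_TRANSPORT_PCIE = 256
};

/* One implementation of the generic operation "ctrlr_scan" per transport. */
static int demo_pcie_ctrlr_scan(int arg) { return 1000 + arg; }
static int demo_rdma_ctrlr_scan(int arg) { return 2000 + arg; }

/* Each helper emits one case label whose target name is built with ##. */
#define DEMO_TRANSPORT_PCIE_CASE(func_name, args) \
    case DEMO_TRANSPORT_PCIE: return demo_pcie_ ## func_name args;
#define DEMO_TRANSPORT_RDMA_CASE(func_name, args) \
    case DEMO_TRANSPORT_RDMA: return demo_rdma_ ## func_name args;

/* Dispatch on the transport type; args is a parenthesized argument list. */
#define DEMO_TRANSPORT_CALL(trtype, func_name, args) \
    do { \
        switch (trtype) { \
        DEMO_TRANSPORT_PCIE_CASE(func_name, args) \
        DEMO_TRANSPORT_RDMA_CASE(func_name, args) \
        default: assert(0 && "unknown transport"); \
        } \
    } while (0)

/* Expands to: case PCIE: return demo_pcie_ctrlr_scan(arg); and so on. */
static int demo_transport_ctrlr_scan(enum demo_transport_type trtype, int arg)
{
    DEMO_TRANSPORT_CALL(trtype, ctrlr_scan, (arg));
    return -1; /* reached only for an unknown transport with assertions off */
}
```

Passing the argument list as a single parenthesized macro argument is what allows one macro to forward calls with any number of parameters.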

So, for NVMe over PCIe, the call to nvme_transport_ctrlr_scan() becomes a call to nvme_pcie_ctrlr_scan():

  /* src/spdk-17.07.1/lib/nvme/nvme_pcie.c#620 */
  619 int
  620 nvme_pcie_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
  621         void *cb_ctx,
  622         spdk_nvme_probe_cb probe_cb,
  623         spdk_nvme_remove_cb remove_cb)
  624 {
  625     struct nvme_pcie_enum_ctx enum_ctx = {};
  626
  627     enum_ctx.probe_cb = probe_cb;
  628     enum_ctx.cb_ctx = cb_ctx;
  629
  630     if (strlen(trid->traddr) != 0) {
  631         if (spdk_pci_addr_parse(&enum_ctx.pci_addr, trid->traddr)) {
  632             return -1;
  633         }
  634         enum_ctx.has_pci_addr = true;
  635     }
  636
  637     if (hotplug_fd < 0) {
  638         hotplug_fd = spdk_uevent_connect();
  639         if (hotplug_fd < 0) {
  640             SPDK_TRACELOG(SPDK_TRACE_NVME, "Failed to open uevent netlink socket\n");
  641         }
  642     } else {
  643         _nvme_pcie_hotplug_monitor(cb_ctx, probe_cb, remove_cb);
  644     }
  645
  646     if (enum_ctx.has_pci_addr == false) {
  647         return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
  648     } else {
  649         return spdk_pci_nvme_device_attach(pcie_nvme_enum_cb, &enum_ctx, &enum_ctx.pci_addr);
  650     }
  651 }
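Lines 630-635 parse a non-empty traddr into a PCI address, so that only one device is attached instead of the whole bus being enumerated. As a rough illustration of that step, the sketch below parses the colon-separated form (domain:bus:device.function, hexadecimal). It is not the SPDK implementation: `demo_pci_addr` and `demo_pci_addr_parse()` are hypothetical, and the dot-separated variant mentioned in the header comment is not handled. Like the call on line 631, it returns 0 on success and non-zero on failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counterpart of a PCI address: domain, bus, device, function. */
struct demo_pci_addr {
    uint16_t domain;
    uint8_t bus;
    uint8_t dev;
    uint8_t func;
};

/* Parse "DDDD:BB:DD.F" (hex fields). Returns 0 on success, -1 on bad input. */
static int demo_pci_addr_parse(struct demo_pci_addr *addr, const char *bdf)
{
    unsigned int domain, bus, dev, func;
    char trailing;

    assert(addr != NULL && bdf != NULL);

    /* Exactly four fields must convert; a fifth match means trailing junk. */
    if (sscanf(bdf, "%x:%x:%x.%x%c", &domain, &bus, &dev, &func, &trailing) != 4) {
        return -1;
    }
    /* PCI allows 32 devices per bus and 8 functions per device. */
    if (domain > 0xFFFFu || bus > 0xFFu || dev > 0x1Fu || func > 0x7u) {
        return -1;
    }
    addr->domain = (uint16_t)domain;
    addr->bus = (uint8_t)bus;
    addr->dev = (uint8_t)dev;
    addr->func = (uint8_t)func;
    return 0;
}
```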

Next, focus on spdk_pci_nvme_enumerate(), called on line 647, since our goal is to understand how the Class Code is used to discover SSD devices.

  647         return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);

  /* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

  81 int
  82 spdk_pci_nvme_enumerate(spdk_pci_enum_cb enum_cb, void *enum_ctx)
  83 {
  84     return spdk_pci_enumerate(&g_nvme_pci_drv, enum_cb, enum_ctx);
  85 }

Note: the first argument on line 84 is the address of a global variable, g_nvme_pci_drv. (Spotting a global struct variable is always exciting :-) )

  /* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

  38 static struct rte_pci_id nvme_pci_driver_id[] = {
  39 #if RTE_VERSION >= RTE_VERSION_NUM(16, 7, 0, 1)
  40     {
  41         .class_id = SPDK_PCI_CLASS_NVME,
  42         .vendor_id = PCI_ANY_ID,
  43         .device_id = PCI_ANY_ID,
  44         .subsystem_vendor_id = PCI_ANY_ID,
  45         .subsystem_device_id = PCI_ANY_ID,
  46     },
  47 #else
  48     {RTE_PCI_DEVICE(0x8086, 0x0953)},
  49 #endif
  50     { .vendor_id = 0, /* sentinel */ },
  51 };
  ..
  53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
  54     .driver = {
  55         .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
  56         .id_table = nvme_pci_driver_id,
  ..
  66     },
  67
  68     .cb_fn = NULL,
  69     .cb_arg = NULL,
  70     .mtx = PTHREAD_MUTEX_INITIALIZER,
  71     .is_registered = false,
  72 };

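Aha! The Class Code (SPDK_PCI_CLASS_NVME = 0x010802) finally enters the picture. The global variable g_nvme_pci_drv is defined at line 53, and g_nvme_pci_drv.driver.id_table points to the table defined at line 38. Note that the class code is used only when DPDK is version 16.07 or newer; for older versions the table falls back to a single Vendor ID + Device ID pair (line 48).

  38 static struct rte_pci_id nvme_pci_driver_id[] = {
  ..
  41         .class_id = SPDK_PCI_CLASS_NVME,
  ..
  53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
  54     .driver = {
  ..
  56         .id_table = nvme_pci_driver_id,
  ..

So we only need to dig further into spdk_pci_enumerate() to find out how SSD devices are discovered...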

  /* src/spdk-17.07.1/lib/env_dpdk/pci.c#150 */

  149 int
  150 spdk_pci_enumerate(struct spdk_pci_enum_ctx *ctx,
  151         spdk_pci_enum_cb enum_cb,
  152         void *enum_ctx)
  153 {
  ...
  168
  169 #if RTE_VERSION >= RTE_VERSION_NUM(17, 05, 0, 4)
  170     if (rte_pci_probe() != 0) {
  171 #else
  172     if (rte_eal_pci_probe() != 0) {
  173 #endif
  ...
  184     return 0;
  185 }

Some code is omitted here; the line to focus on next is line 170:

  170     if (rte_pci_probe() != 0) {

From the implementation of rte_pci_probe() onward we are inside DPDK. The code is as follows:

  /* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#413 */

  407 /*
  408  * Scan the content of the PCI bus, and call the probe() function for
  409  * all registered drivers that have a matching entry in its id_table
  410  * for discovered devices.
  411  */
  412 int
  413 rte_pci_probe(void)
  414 {
  415     struct rte_pci_device *dev = NULL;
  416     size_t probed = 0, failed = 0;
  417     struct rte_devargs *devargs;
  418     int probe_all = 0;
  419     int ret = 0;
  420
  421     if (rte_pci_bus.bus.conf.scan_mode != RTE_BUS_SCAN_WHITELIST)
  422         probe_all = 1;
  423
  424     FOREACH_DEVICE_ON_PCIBUS(dev) {
  425         probed++;
  426
  427         devargs = dev->device.devargs;
  428         /* probe all or only whitelisted devices */
  429         if (probe_all)
  430             ret = pci_probe_all_drivers(dev);
  431         else if (devargs != NULL &&
  432             devargs->policy == RTE_DEV_WHITELISTED)
  433             ret = pci_probe_all_drivers(dev);
  434         if (ret < 0) {
  435             RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
  436                 " cannot be used\n", dev->addr.domain, dev->addr.bus,
  437                 dev->addr.devid, dev->addr.function);
  438             rte_errno = errno;
  439             failed++;
  440             ret = 0;
  441         }
  442     }
  443
  444     return (probed && probed == failed) ? -1 : 0;
  445 }
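One detail of the listing that is easy to miss is the return expression on line 444: the function reports failure only when at least one device was visited and every visited device failed. A device for which no driver matched (a positive return from pci_probe_all_drivers()) is not counted as a failure. The tiny sketch below isolates that rule; `demo_probe_status()` is a hypothetical helper, not a DPDK function.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the expression on line 444: fail only if every probed device failed. */
static int demo_probe_status(size_t probed, size_t failed)
{
    assert(failed <= probed); /* cannot fail more devices than were probed */
    return (probed && probed == failed) ? -1 : 0;
}
```

Under this rule an empty bus, or a bus where a single device initialized correctly, still counts as a successful probe.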

Line 430 is the one to focus on:

  430             ret = pci_probe_all_drivers(dev);

pci_probe_all_drivers() is implemented as follows:

  /* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#307 */

  301 /*
  302  * If vendor/device ID match, call the probe() function of all
  303  * registered driver for the given device. Return -1 if initialization
  304  * failed, return 1 if no driver is found for this device.
  305  */
  306 static int
  307 pci_probe_all_drivers(struct rte_pci_device *dev)
  308 {
  309     struct rte_pci_driver *dr = NULL;
  310     int rc = 0;
  311
  312     if (dev == NULL)
  313         return -1;
  314
  315     /* Check if a driver is already loaded */
  316     if (dev->driver != NULL)
  317         return 0;
  318
  319     FOREACH_DRIVER_ON_PCIBUS(dr) {
  320         rc = rte_pci_probe_one_driver(dr, dev);
  321         if (rc < 0)
  322             /* negative value is an error */
  323             return -1;
  324         if (rc > 0)
  325             /* positive value means driver doesn't support it */
  326             continue;
  327         return 0;
  328     }
  329     return 1;
  330 }
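The three-way return convention in this loop (negative for an error, positive for "this driver does not handle the device", zero for success) can be sketched in isolation. In the sketch below the drivers are plain function pointers and a device is reduced to its class code; all `demo_` names are hypothetical and the class codes other than 0x010802 are only examples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver probe: <0 error, >0 device not supported, 0 claimed. */
typedef int (*demo_probe_fn)(int device_class);

/* Try each driver in turn, following the convention of the listing above. */
static int demo_probe_all_drivers(const demo_probe_fn *drivers, size_t ndrivers,
                                  int device_class)
{
    size_t i;
    int rc;

    if (drivers == NULL) {
        return -1;
    }
    for (i = 0; i < ndrivers; i++) {
        rc = drivers[i](device_class);
        if (rc < 0) {
            return -1;      /* a driver matched but failed to initialize */
        }
        if (rc > 0) {
            continue;       /* not this driver's device, try the next one */
        }
        return 0;           /* first driver that claims the device wins */
    }
    return 1;               /* no registered driver supports the device */
}

/* Example drivers keyed on the class code. */
static int demo_nic_driver(int c)    { return c == 0x020000 ? 0 : 1; }
static int demo_nvme_driver(int c)   { return c == 0x010802 ? 0 : 1; }
static int demo_broken_driver(int c) { (void)c; return -5; }
```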

Line 320 is the one to focus on:

  320         rc = rte_pci_probe_one_driver(dr, dev);

  /* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#200 */

  195 /*
  196  * If vendor/device ID match, call the probe() function of the
  197  * driver.
  198  */
  199 static int
  200 rte_pci_probe_one_driver(struct rte_pci_driver *dr,
  201         struct rte_pci_device *dev)
  202 {
  203     int ret;
  204     struct rte_pci_addr *loc;
  205
  206     if ((dr == NULL) || (dev == NULL))
  207         return -EINVAL;
  208
  209     loc = &dev->addr;
  210
  211     /* The device is not blacklisted; Check if driver supports it */
  212     if (!rte_pci_match(dr, dev))
  213         /* Match of device and driver failed */
  214         return 1;
  215
  216     RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
  217         loc->domain, loc->bus, loc->devid, loc->function,
  218         dev->device.numa_node);
  219
  220     /* no initialization when blacklisted, return without error */
  221     if (dev->device.devargs != NULL &&
  222         dev->device.devargs->policy ==
  223         RTE_DEV_BLACKLISTED) {
  224         RTE_LOG(INFO, EAL, " Device is blacklisted, not"
  225             " initializing\n");
  226         return 1;
  227     }
  228
  229     if (dev->device.numa_node < 0) {
  230         RTE_LOG(WARNING, EAL, " Invalid NUMA socket, default to 0\n");
  231         dev->device.numa_node = 0;
  232     }
  233
  234     RTE_LOG(INFO, EAL, " probe driver: %x:%x %s\n", dev->id.vendor_id,
  235         dev->id.device_id, dr->driver.name);
  236
  237     if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
  238         /* map resources for devices that use igb_uio */
  239         ret = rte_pci_map_device(dev);
  240         if (ret != 0)
  241             return ret;
  242     }
  243
  244     /* reference driver structure */
  245     dev->driver = dr;
  246     dev->device.driver = &dr->driver;
  247
  248     /* call the driver probe() function */
  249     ret = dr->probe(dr, dev);
  250     if (ret) {
  251         dev->driver = NULL;
  252         dev->device.driver = NULL;
  253         if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
  254             /* Don't unmap if device is unsupported and
  255              * driver needs mapped resources.
  256              */
  257             !(ret > 0 &&
  258                 (dr->drv_flags & RTE_PCI_DRV_KEEP_MAPPED_RES)))
  259             rte_pci_unmap_device(dev);
  260     }
  261
  262     return ret;
  263 }

Line 212 is the one to focus on:

  212     if (!rte_pci_match(dr, dev))

rte_pci_match() is implemented as follows:

  /* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#163 */

  151 /*
  152  * Match the PCI Driver and Device using the ID Table
  153  *
  154  * @param pci_drv
  155  *   PCI driver from which ID table would be extracted
  156  * @param pci_dev
  157  *   PCI device to match against the driver
  158  * @return
  159  *   1 for successful match
  160  *   0 for unsuccessful match
  161  */
  162 static int
  163 rte_pci_match(const struct rte_pci_driver *pci_drv,
  164         const struct rte_pci_device *pci_dev)
  165 {
  166     const struct rte_pci_id *id_table;
  167
  168     for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
  169          id_table++) {
  170         /* check if device's identifiers match the driver's ones */
  171         if (id_table->vendor_id != pci_dev->id.vendor_id &&
  172             id_table->vendor_id != PCI_ANY_ID)
  173             continue;
  174         if (id_table->device_id != pci_dev->id.device_id &&
  175             id_table->device_id != PCI_ANY_ID)
  176             continue;
  177         if (id_table->subsystem_vendor_id !=
  178             pci_dev->id.subsystem_vendor_id &&
  179             id_table->subsystem_vendor_id != PCI_ANY_ID)
  180             continue;
  181         if (id_table->subsystem_device_id !=
  182             pci_dev->id.subsystem_device_id &&
  183             id_table->subsystem_device_id != PCI_ANY_ID)
  184             continue;
  185         if (id_table->class_id != pci_dev->id.class_id &&
  186             id_table->class_id != RTE_CLASS_ANY_ID)
  187             continue;
  188
  189         return 1;
  190     }
  191
  192     return 0;
  193 }

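Here we finally see how an SSD device is discovered. Lines 185-187 are the three lines we were looking for: because every other field of SPDK's table entry is PCI_ANY_ID, the class code comparison is the only check that can reject a device.

  185         if (id_table->class_id != pci_dev->id.class_id &&
  186             id_table->class_id != RTE_CLASS_ANY_ID)
  187             continue;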

The structures struct rte_pci_id, struct rte_pci_device and struct rte_pci_driver are defined as:

  /* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#100 */

  96 /**
  97  * A structure describing an ID for a PCI driver. Each driver provides a
  98  * table of these IDs for each device that it supports.
  99  */
  100 struct rte_pci_id {
  101     uint32_t class_id;            /**< Class ID (class, subclass, pi) or RTE_CLASS_ANY_ID. */
  102     uint16_t vendor_id;           /**< Vendor ID or PCI_ANY_ID. */
  103     uint16_t device_id;           /**< Device ID or PCI_ANY_ID. */
  104     uint16_t subsystem_vendor_id; /**< Subsystem vendor ID or PCI_ANY_ID. */
  105     uint16_t subsystem_device_id; /**< Subsystem device ID or PCI_ANY_ID. */
  106 };

  /* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#120 */

  120 /**
  121  * A structure describing a PCI device.
  122  */
  123 struct rte_pci_device {
  124     TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
  125     struct rte_device device;           /**< Inherit core device */
  126     struct rte_pci_addr addr;           /**< PCI location. */
  127     struct rte_pci_id id;               /**< PCI ID. */
  128     struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
  129                                         /**< PCI Memory Resource */
  130     struct rte_intr_handle intr_handle; /**< Interrupt handle */
  131     struct rte_pci_driver *driver;      /**< Associated driver */
  132     uint16_t max_vfs;                   /**< sriov enable if not zero */
  133     enum rte_kernel_driver kdrv;        /**< Kernel driver passthrough */
  134     char name[PCI_PRI_STR_SIZE+1];      /**< PCI location (ASCII) */
  135 };

  /* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#178 */

  175 /**
  176  * A structure describing a PCI driver.
  177  */
  178 struct rte_pci_driver {
  179     TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
  180     struct rte_driver driver;          /**< Inherit core driver. */
  181     struct rte_pci_bus *bus;           /**< PCI bus reference. */
  182     pci_probe_t *probe;                /**< Device Probe function. */
  183     pci_remove_t *remove;              /**< Device Remove function. */
  184     const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
  185     uint32_t drv_flags;                /**< Flags contolling handling of device. */
  186 };

At this point, SSD device discovery can be summarized as follows:

  • 01 - The Class Code (0x010802) is the criterion for discovering SSD devices.
  • 02 - Discovery crosses from SPDK into DPDK. The function call stack is:

  00 hello_world.c
  01 -> main()
  02 --> spdk_nvme_probe()
  03 ---> nvme_transport_ctrlr_scan()
  04 ----> nvme_pcie_ctrlr_scan()
  05 -----> spdk_pci_nvme_enumerate()
  06 ------> spdk_pci_enumerate(&g_nvme_pci_drv, ...)            | SPDK |
  =========================================================================
  07 -------> rte_pci_probe()                                     | DPDK |
  08 --------> pci_probe_all_drivers()
  09 ---------> rte_pci_probe_one_driver()
  10 ----------> rte_pci_match()

  • 03 - rte_pci_match(), a function in DPDK's Environment Abstraction Layer (EAL), is the key to discovering SSD devices.
  • 04 - Within the DPDK architecture, the EAL is the base layer on which the other DPDK libraries are built.

Your greatness is measured by your horizons.
