


0 Paper

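The paper is from 2018 and was published in the medical journal Circulation: "Fully Automated Echocardiogram Interpretation in Clinical Practice". This is a review of what I learned from it, and it can serve as a reading guide. The algorithms are not difficult: the classification model is VGG, the segmentation model is U-Net, and both the loss function and the image processing are conventional, so this is essentially an established method applied to the medical domain. This post contains three kinds of content: the original English text of the paper, the Baidu Translate rendering of some passages (set in SimSun in the original post), and my own understanding and condensed notes (set in bold).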

1 Overview

Using 14 035 echocardiograms spanning a 10-year period, we trained and evaluated convolutional neural network models for multiple tasks, including automated identification of 23 viewpoints and segmentation of cardiac chambers across 5 common views. The segmentation output was used to quantify chamber volumes and left ventricular mass, determine ejection fraction, and facilitate automated determination of longitudinal strain through speckle tracking. Results were evaluated through comparison to manual segmentation and measurements from 8666 echocardiograms obtained during the routine clinical workflow. Finally, we developed models to detect 3 diseases: hypertrophic cardiomyopathy, cardiac amyloid, and pulmonary arterial hypertension.


Convolutional neural networks accurately identified views (eg, 96% for parasternal long axis), including flagging partially obscured cardiac chambers, and enabled the segmentation of individual cardiac chambers. The resulting cardiac structure measurements agreed with study report values (eg, median absolute deviations of 15% to 17% of observed values for left ventricular mass, left ventricular diastolic volume, and left atrial volume). In terms of function, we computed automated ejection fraction and longitudinal strain measurements (within 2 cohorts), which agreed with commercial software-derived values (for ejection fraction, median absolute deviation=9.7% of observed, N=6407 studies; for strain, median absolute deviation=7.5%, n=419, and 9.0%, n=110) and demonstrated applicability to serial monitoring of patients with breast cancer for trastuzumab cardiotoxicity. Overall, we found automated measurements to be comparable or superior to manual measurements across 11 internal consistency metrics (eg, the correlation of left atrial and ventricular volumes). Finally, we trained convolutional neural networks to detect hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary arterial hypertension with C statistics of 0.93, 0.87, and 0.85, respectively.
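The agreement figures above are quoted as a median absolute deviation in percent of the observed value. A minimal sketch of one plausible reading of that metric follows, assuming it means the median of |automated - manual| / manual; the paper's exact definition may differ, and the function name and sample values are made up for illustration.

```python
from statistics import median


def median_abs_deviation_pct(automated, manual):
    """Median of |automated - manual| / manual, in percent.

    Assumed reading of "median absolute deviation as % of observed";
    not taken from the paper's code.
    """
    if len(automated) != len(manual):
        raise ValueError("inputs must have the same length")
    rel_errors = [abs(a - m) / m * 100.0 for a, m in zip(automated, manual)]
    return median(rel_errors)


# Made-up left ventricular mass values in grams (not from the paper).
manual_values = [150.0, 200.0, 180.0, 120.0]
automated_values = [165.0, 190.0, 207.0, 120.0]
print(median_abs_deviation_pct(automated_values, manual_values))
```

Using the median rather than the mean keeps a handful of grossly wrong database entries from dominating the summary.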


2 Pipeline


Preprocessing entailed automated downloading of echocardiograms in Digital Imaging and Communications in Medicine format, separating videos from still images, extracting metadata (eg, frame rate, heart rate), converting them into numeric arrays for matrix computations, and deidentifying images by overwriting patient health information. We next used convolutional neural networks (described later) for automatically determining echocardiographic views. Based on the identified views, videos were routed to specific segmentation models (parasternal long axis [PLAX], parasternal short axis, apical 2-chamber [A2c], apical 3-chamber, and apical 4-chamber [A4c]), and the output was used to derive chamber measurements, including lengths, areas, volumes, and mass estimates. Next, we generated 2 commonly used automated measures of left ventricular (LV) function: ejection fraction and longitudinal strain. Finally, we derived models to detect 3 diseases: hypertrophic cardiomyopathy, pulmonary arterial hypertension, and cardiac amyloidosis.
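Two steps of this pipeline are easy to make concrete: routing a video to a segmentation model by its predicted view, and turning segmented LV volumes into an ejection fraction. The sketch below is illustrative only. The view keys and function names are hypothetical, and the ejection fraction uses the standard definition EF = (EDV - ESV) / EDV x 100 rather than anything specific to the paper's code.

```python
# Views that have a dedicated segmentation model according to the text.
# The short keys are hypothetical labels chosen for this sketch.
SEGMENTED_VIEWS = {"PLAX", "PSAX", "A2c", "A3c", "A4c"}


def route_to_segmentation(predicted_view):
    """Return the segmentation model key for a view, or None if not segmented."""
    return predicted_view if predicted_view in SEGMENTED_VIEWS else None


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (mL)."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0


print(route_to_segmentation("A4c"))
print(round(ejection_fraction(120.0, 50.0), 1))
```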


3 Technical details

Specifically, 277 echocardiograms collected over a 10-year period were used to derive a view classification model (Table II in the online-only Data Supplement). The image segmentation model was trained from 791 images divided over 5 separate views (Table III in the online-only Data Supplement). Comparison of automated and manual measurements was made against 8666 echocardiograms, with the majority of measurements made from 2014 to 2017 (Table IV in the online-only Data Supplement). For this purpose, we used all studies where these measurements were available (ie, there was no selection bias). The number of images used for training the different segmentation models was not planned in advance, and models were retrained as more data accrued over time. From initial testing, we recognized that at least 60 images would be needed, and we allocated more training data and resources to A2c and A4c views because these were more central to measurements for both structure and function.



3.1 Preprocessing

We identified 260 patients at UCSF who met guideline-based criteria for hypertrophic cardiomyopathy: “unexplained left ventricular (LV) hypertrophy (maximal LV wall thickness ≥ 15 mm) associated with nondilated ventricular chambers in the absence of another cardiac or systemic disease that itself would be capable of producing the magnitude of hypertrophy evident in a given patient.” [9] These patients were selected from 2 sources: the UCSF Familial Cardiomyopathy Clinic and the database of clinical echocardiograms. Patients had a variety of thickening patterns, including upper septal hypertrophy, concentric hypertrophy, and predominantly apical hypertrophy. A subset of patients underwent genetic testing. Overall, 18% of all patients had pathogenic or likely pathogenic mutations. We downloaded all echocardiograms within the UCSF database corresponding to these patients and confirmed evidence of hypertrophy. We excluded bicycle, treadmill, and dobutamine stress echocardiograms because these tend to include slightly modified views or image annotations that could have confounding effects on models trained for disease detection. We also excluded studies of patients conducted after septal myectomy or alcohol septal ablation and studies of patients with pacemakers or implantable defibrillators. Control patients were also selected from the UCSF echocardiographic database. For each hypertrophic cardiomyopathy (HCM) case study, ≤5 matched control studies were selected, with matching by age (in 10-year bins), sex, year of study, ultrasound device manufacturer, and model. This process was simplified by organizing all of our studies in a nested format in a python dictionary so we can look up studies by these characteristics. Given that the marginal cost of analyzing additional samples is minimal in our automated system, we did not perform a greedy search for matched controls. Case, control, and study characteristics are described in Table V in the online-only Data Supplement.

We did not require that controls were disease-free, only that they did not have HCM.
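The control matching described above (up to 5 controls per case, matched on 10-year age bin, sex, study year, device manufacturer, and model) can be sketched with a dictionary lookup. This is a simplified illustration, not the authors' code: it uses a flat dictionary keyed by a tuple instead of the nested dictionary the paper mentions, and every field name and sample value is hypothetical.

```python
from collections import defaultdict

MAX_CONTROLS_PER_CASE = 5


def match_key(study):
    """Matching characteristics; age is reduced to the start of its 10-year bin."""
    age_bin = study["age"] // 10 * 10
    return (age_bin, study["sex"], study["year"],
            study["manufacturer"], study["model"])


def build_index(control_studies):
    """Group control studies by matching key for constant-time lookup."""
    index = defaultdict(list)
    for study in control_studies:
        index[match_key(study)].append(study)
    return index


def select_controls(case, index, limit=MAX_CONTROLS_PER_CASE):
    """Return at most `limit` controls sharing the case's matching key."""
    return index.get(match_key(case), [])[:limit]


# Hypothetical records; device names are placeholders.
controls = [
    {"id": i, "age": 52 + i % 3, "sex": "F", "year": 2015,
     "manufacturer": "VendorA", "model": "X1"}
    for i in range(7)
]
case = {"id": "hcm-1", "age": 57, "sex": "F", "year": 2015,
        "manufacturer": "VendorA", "model": "X1"}
print([c["id"] for c in select_controls(case, build_index(controls))])
```

In this sketch the same control can be returned for several cases, and no attempt is made to find the best possible assignment, which is in the spirit of the paper's remark that no greedy search was performed.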

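We identified 260 patients at the University of California, San Francisco (UCSF) who met the guideline criteria for hypertrophic cardiomyopathy: "unexplained left ventricular (LV) hypertrophy (maximal LV wall thickness ≥ 15 mm) with nondilated ventricular chambers, in the absence of another cardiac or systemic disease capable of producing that degree of hypertrophy." These patients came from 2 sources: the UCSF Familial Cardiomyopathy Clinic and the clinical echocardiogram database. They showed a variety of thickening patterns, including upper septal hypertrophy, concentric hypertrophy, and predominantly apical hypertrophy. A subset underwent genetic testing. Overall, 18% of patients had pathogenic or likely pathogenic mutations.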




(Supplementary material) Additionally, each echocardiogram contains periphery information unique to different output settings on ultrasound machines used to collect the data. This periphery information details additional details collected (i.e. electrocardiogram, blood pressure, etc.). To improve generalizability across institutions, we wanted the classification of views to use ultrasound data and not metadata presented in the periphery. To address this issue, every image is randomly cropped between 0-20 pixels from each edge and resized to 224x224 during training. This provides variation in the periphery information, which guides the network to target more relevant features and improves the overall robustness of our view classification models.
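The crop-and-resize augmentation can be sketched in a few lines. This is an illustration under stated assumptions: images are plain 2D lists of pixel values, each edge gets its own independent random crop, and nearest-neighbour sampling stands in for the linear interpolation (via scikit-image) that the paper reports. The function names are hypothetical.

```python
import random


def random_edge_crop(image, max_crop=20, rng=random):
    """Crop an independent random 0..max_crop pixels from each edge of a 2D image."""
    height, width = len(image), len(image[0])
    top, bottom, left, right = (rng.randint(0, max_crop) for _ in range(4))
    if top + bottom >= height or left + right >= width:
        raise ValueError("image too small for the requested crop range")
    return [row[left:width - right] for row in image[top:height - bottom]]


def resize_nearest(image, out_h=224, out_w=224):
    """Nearest-neighbour resize; a stand-in for linear interpolation."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Synthetic 300x400 "frame" whose pixel value encodes its position.
frame = [[r * 400 + c for c in range(400)] for r in range(300)]
augmented = resize_nearest(random_edge_crop(frame))
print(len(augmented), len(augmented[0]))
```

Because the crop amounts change on every call, text and ECG traces near the border land at slightly different positions each time, which makes them a less reliable cue than the ultrasound sector itself.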


(Supplementary material) Training data comprised of 10 random frames from each manually labeled echocardiographic video. We trained our network on approximately 70,000 preprocessed images. For stochastic optimization, we used the ADAM optimizer [2] with an initial learning rate of 1e-5 and mini-batch size of 64. For regularization, we applied a weight decay of 1e-8 on all network weights and dropout with probability 0.5 on the fully connected layers. We ran our tests for 20 epochs or ~20,000 iterations, which takes ~3.5 hours on a Nvidia GTX 1080. Runtime per video was 600 ms on average.
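As a quick sanity check, the reported "20 epochs or ~20,000 iterations" is consistent with roughly 70,000 images and a mini-batch size of 64. The sketch below collects the stated hyperparameters in a dictionary (the key names are my own) and does the arithmetic, assuming one iteration per mini-batch and a final partial batch per epoch.

```python
import math

# Hyperparameters as stated in the supplement; key names are hypothetical.
TRAINING_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-5,
    "batch_size": 64,
    "weight_decay": 1e-8,
    "dropout": 0.5,
    "epochs": 20,
    "num_images": 70_000,  # approximate figure from the text
}


def total_iterations(num_images, batch_size, epochs):
    """Number of mini-batch updates, counting a final partial batch per epoch."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


print(total_iterations(TRAINING_CONFIG["num_images"],
                       TRAINING_CONFIG["batch_size"],
                       TRAINING_CONFIG["epochs"]))
```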

Accuracy was assessed by 5-fold cross-validation at the individual image level. When deploying the model, we would average the prediction probabilities for 10 randomly selected images from each video.
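The deployment rule (average the per-frame probabilities over the sampled frames, then take the most probable view) is simple enough to write out. The sketch below assumes each frame's prediction is already a list of class probabilities; the function name and the three-class example are hypothetical, whereas the real model has 23 classes and samples 10 frames.

```python
def predict_video_view(frame_probs, view_names):
    """Average per-frame class probabilities and return (best view, mean probabilities)."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    n_frames = len(frame_probs)
    mean_probs = [
        sum(probs[i] for probs in frame_probs) / n_frames
        for i in range(len(view_names))
    ]
    best = max(range(len(view_names)), key=mean_probs.__getitem__)
    return view_names[best], mean_probs


# Toy example with 3 views and 2 frames.
views = ["PLAX", "A2c", "A4c"]
per_frame = [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
print(predict_video_view(per_frame, views))
```

Averaging lets a confident frame outvote an ambiguous one: in the toy example the first frame alone would favor A2c, but the video-level prediction is A4c.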




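  1. Bicycle, treadmill, and dobutamine stress echocardiograms were excluded, because they tend to include slightly modified views or image annotations that could confound the models;
  2. Studies performed after septal myectomy or alcohol septal ablation were also excluded, as were studies of patients with pacemakers or implantable defibrillators.
  3. Control patients were also selected from the UCSF echocardiographic database.
  4. To improve robustness across machines, a random 0 to 20 pixels is cropped from each edge of every image, and the image is resized to 224x224 during training.
  5. Ten frames are sampled from each labeled video as training inputs, which gives about 70,000 inputs. The convolutions are 2D, applied to individual frames. At inference, the mean of the predictions over the 10 frames is used as the prediction for the video.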

3.2 Convolutional network

We first developed a model for view classification. Typical echocardiograms consist of ≥70 separate videos representing multiple viewpoints. Furthermore, with rotation and adjustment of the zoom level of the ultrasound probe, sonographers actively focus on substructures within an image, thus creating many variations of these views. Unfortunately, none of these views is labeled explicitly. Thus, the first learning step involves teaching the machine to recognize individual echocardiographic views. Models are trained using manual labels assigned to individual images. Using the 277 studies described earlier, we assigned 1 of 30 labels to each video (eg, parasternal long axis or subcostal view focusing on the abdominal aorta). Because discrimination of all views (subcostal, hepatic vein versus subcostal, inferior vena cava) was not necessary for our downstream analyses, we ultimately used only 23 view classes for our final model (Table IX in the online-only Data Supplement). The training data consisted of 7168 individually labeled videos.



3.3 VGG classification network architecture


The VGG network [1] takes a fixed-sized input of grayscale images with dimensions 224x224 pixels (we use scikit-image to resize by linear interpolation). Each image is passed through ten convolution layers, five max-pool layers, and three fully connected layers. (We experimented with a larger number of convolution layers but saw no improvement for our task). All convolutional layers consist of 3x3 filters with stride 1 and all max-pooling is applied over a 2x2 window with stride 2. The convolution layers consist of 5 groups of 2 convolution layers, which are each followed by 1 max pool layer. The stack of convolutions is followed by two fully connected layers, each with 4096 hidden units, and a final fully connected layer with 23 output units. The output is fed into a 23-way softmax layer to represent 23 different echocardiographic views. This final step represents a standard multinomial logistic regression with 23 mutually exclusive classes. The predictors in this model are the output nodes of the neural network. The view with the highest probability was selected as the predicted view.

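The VGG network [1] takes a fixed-size 224x224 pixel grayscale image as input (resized with scikit-image using linear interpolation). Each image passes through ten convolutional layers, five max-pooling layers, and three fully connected layers. (The authors tried a larger number of convolutional layers but saw no improvement on this task.) All convolutional layers use 3x3 filters with stride 1, and all max-pooling uses a 2x2 window with stride 2. The convolutional layers are arranged as 5 groups of 2 layers, each group followed by 1 max-pooling layer. After the convolutions come two fully connected layers with 4096 hidden units each, and a final fully connected layer with 23 output units. The output is fed into a 23-way softmax layer representing the 23 echocardiographic views. This last step is a standard multinomial logistic regression with 23 mutually exclusive classes, whose predictors are the output nodes of the network. The view with the highest probability is selected as the predicted view.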
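The layer description above fixes the spatial size of the feature map at each stage, which can be traced with a little arithmetic. The sketch below assumes the 3x3 convolutions use a padding of 1 so that they preserve spatial size (standard for VGG, but not stated in the excerpt), and it includes a numerically stable softmax for the 23-way output. Channel widths are not given in the excerpt, so they are left out.

```python
import math


def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution; padding=1 is an assumption, not from the text."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output size of a max-pool layer."""
    return (size - window) // stride + 1


def trace_feature_map(size=224, groups=5, convs_per_group=2):
    """Spatial size at the input and after each conv group followed by its pool."""
    sizes = [size]
    for _ in range(groups):
        for _ in range(convs_per_group):
            size = conv_out(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(trace_feature_map())
print(round(sum(softmax([0.0] * 23)), 6))
```

Under that padding assumption, the five pooling stages halve the 224-pixel input five times, so the first fully connected layer sees a 7x7 feature map per channel.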

3.4 Image segmentation

To train image segmentation models, we derived a CNN based on the U-net architecture described by Ronneberger et al. [3] The U-net-based network we used accepts a 384x384 pixel fixed-sized image as input, and is composed of a contracting path and an expanding path with a total of 23 convolutional layers. The contracting path is composed of twelve convolutional layers with 3x3 filters followed by a rectified linear unit and four max pool layers each using a 2x2 window with stride 2 for down-sampling. The expanding path is composed of ten convolutional layers with 3x3 filters followed by a rectified linear unit, and four 2x2 up-convolution layers. Every up-convolution in the expansion path is concatenated with a feature map from the contracting path with same dimension. This is performed to recover the loss of pixel and feature locality due to downsampling images, which in turn enables pixel-level classification. The final layer uses a 1x1 convolution to map each feature vector to the output classes. Separate U-net CNN networks were trained to perform segmentation on images from PLAX, PSAX (at the level of the papillary muscle), A4c, A3c, and A2c views. Training data was derived for each class of echocardiographic view via manual segmentation. We performed data augmentation techniques including cropping and blacking out random areas of the echocardiographic image in order to improve model performance in the setting of a limited amount of training data. The rationale is that models that are robust to such variation are likely to generalize better to unseen data. Training data underwent varying degrees of cropping (or no cropping) at random amounts for each edge of the image. Similarly, circular areas of random size set at random locations in the echocardiographic image were set to 0-pixel intensity to achieve "blackout". This U-net architecture and the data augmentation techniques enabled highly efficient training, achieving accurate segmentation from a relatively low number of training examples. Finally, in addition to pixelwise cross-entropy loss, we included a distance-based loss penalty for misclassified pixels. The loss function was based on the distance from the closest pixel with the same misclassified class in the ground truth image. This helped mitigate erroneous pixel predictions across the images. We used an Intersection Over Union (IoU) metric for assessment of results. The IoU takes the number of pixels which overlap between the ground truth and automated segmentation (for a given class, such as left atrial blood pool) and divides them by the total number of pixels assigned to that class by either method. It ranges between 0 and 100.
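The IoU metric described at the end of this paragraph can be computed directly from two label masks. The sketch below is a minimal pure-Python version on nested lists, scaled to the 0 to 100 range used in the text; the function name and the toy masks are my own.

```python
def iou_percent(truth, pred, label):
    """Intersection over union (0-100) for one class between two label masks."""
    intersection = 0
    union = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            in_truth = t == label
            in_pred = p == label
            intersection += in_truth and in_pred
            union += in_truth or in_pred
    if union == 0:
        raise ValueError("label is absent from both masks")
    return 100.0 * intersection / union


# Toy 2x3 masks: 1 marks the class of interest (e.g. a blood pool), 0 is background.
ground_truth = [[1, 1, 0],
                [0, 1, 0]]
prediction = [[1, 0, 0],
              [0, 1, 1]]
print(iou_percent(ground_truth, prediction, label=1))
```

Here 2 pixels are labeled 1 in both masks and 4 pixels are labeled 1 in at least one of them, so the score is half of the maximum.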









4 Problems encountered

During the training process, we found that our CNN models readily segmented the LV across a wide range of videos from hundreds of studies, and we were thus interested in understanding the origin of the extreme outliers in our Bland-Altman plots (Figure 4). We undertook a formal analysis of the 20 outlier cases where the discrepancy between manual and automated measurements for LV end diastolic volume was highest (>99.5th percentile). This included 10 studies where the automated value was estimated to be much higher than manual (DiscordHI) and 10 where the reverse was seen (DiscordLO). For each study, we repeated the manual LV end diastolic volume measurement. For every 1 of the 10 studies in DiscordHI, we determined that the automated result was in fact correct (median absolute deviation=8.6% of the repeat manual value), whereas the prior manual measurement was markedly inaccurate (median absolute deviation=70%). It is unclear why these incorrect values had been entered into our clinical database. For DiscordLO (ie, much lower automated value), the results were mixed. For 2 of the 10 studies, the automated value was correct and the previous manual value erroneous; for 3 of the 10, the repeated value was intermediate between automated and manual. For 5 of the 10 studies in DiscordLO, there were clear problems with the automated segmentation. In 2 of the 5, intravenous contrast had been used in the study, but the segmentation algorithm, which had not been trained on these types of data, attempted to locate a black blood pool. The third poorly segmented study involved a patient with complex congenital heart disease with a double outlet right ventricle and membranous ventricular septal defect. The fourth study involved a mechanical mitral valve with strong acoustic shadowing and reverberation artifact. Finally, the fifth poorly segmented study had a prominent calcified false tendon in the LV combined with a moderately sized pericardial effusion. This outlier analysis thus highlighted the presence of inaccuracies in our clinical database as well as the types of studies that remain challenging for our automated segmentation algorithms.
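The outlier selection can be sketched as follows: compute the automated minus manual difference per study, keep the studies whose absolute difference exceeds the 99.5th percentile, and split them by sign into DiscordHI and DiscordLO. This is only an illustration of the idea. The paper does not say exactly how the percentile was computed or how the two groups were balanced at 10 each, so the nearest-rank percentile and the function name are assumptions.

```python
import math


def discordant_studies(measurements, percentile=99.5):
    """Split extreme disagreements into (automated too high, automated too low).

    measurements: iterable of (study_id, manual_value, automated_value).
    Uses a nearest-rank percentile of |automated - manual| as the cutoff.
    """
    diffs = [(study_id, auto - manual) for study_id, manual, auto in measurements]
    abs_sorted = sorted(abs(d) for _, d in diffs)
    rank = max(1, math.ceil(percentile * len(abs_sorted) / 100))
    threshold = abs_sorted[rank - 1]
    discord_hi = [sid for sid, d in diffs if abs(d) > threshold and d > 0]
    discord_lo = [sid for sid, d in diffs if abs(d) > threshold and d < 0]
    return discord_hi, discord_lo


# Synthetic LV end-diastolic volumes in mL: 996 close pairs plus 4 gross disagreements.
data = [(f"s{i}", 100.0, 100.0 + (i % 5 - 2)) for i in range(996)]
data += [("hi1", 100.0, 180.0), ("hi2", 100.0, 190.0),
         ("lo1", 170.0, 100.0), ("lo2", 160.0, 100.0)]
print(discordant_studies(data))
```

As the paper's analysis shows, landing in either list does not mean the automated value is the wrong one, so flagged studies still need a repeat manual measurement.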





