其实就是另一种形式的打分。

个人点评这种方法:

这篇文章发表在nature上,有点奇怪,个人感觉创新性和重要性还不够格,工具很多,但是本文基本都是自己开发的算法(毕竟satji就是搞统计出身的)。

但是仔细看了代码之后,发现这些方法确实是有点artificial!

而且算法原创性不高,多半是基于现有的一些工具的二次开发。

Identifying a maturation trajectory.

To assign each cell a maturation score that is proportional to the developmental progress, we first performed dimensionality reduction as described above using all genes that were detected in at least 2% of the cells (8,014 genes). This resulted in four significant dimensions. We then fit a principal curve (R package princurve, smoother= ‘lowess’, f= 1/3) through the data. The maturation score of a cell is then the arc-length from the beginning of the curve to the point at which the cell projects onto the curve.

The resulting curve is directionless, so we assign the ‘beginning’ of the curve so that the expression of Nes is negatively correlated with maturation. Nes is a known ventricular zone marker and therefore should only be highly expressed early in the trajectory. Maturation scores are normalized to the interval [0, 1]. In an independent analysis, we also used Monocle2 to order cells along a pseudo-time. We used Monocle version 2.3.6 with expression response variable set to negative binomial. We estimated size factors and dispersion using the default functions.

For ordering cells, we reduced the set of genes based on results of the monocle dispersion Table function, and only considered 718 genes with mean expression≥0.01 and an empirical dispersion at least twice as large as the fitted dispersion. Dimensionality reduction was carried out using the default method (DDRTree)

为了打分,先是降维, 然后拟合了一个principal curve,根据长度来将分数normalize到0~1,其中只利用了一个key gene(Nes),这相当的人为了。

Defining mitotic and post mitotic populations.

We observed a sharp transition point along the maturation trajectory at which cells uniformly transitioned into a postmitotic state, corresponding to the loss of proliferation potential and exit from the cell cycle (Fig. 1f, Extended Data Fig. 1).

We therefore subdivided the maturation trajectory into a mitotic and postmitotic phase to facilitate downstream analyses. We defined cells with a high phase-specific enrichment score (score >2, see section ‘Removal of cell cycle effect’) as being in the S or the G2/M phase.

We then fitted a smooth curve (loess, span=0.33, degree=2) to number of cells in S, G2/M phases as a function of maturation score. The point where this curve falls below half the global average marks the dividing threshold (Fig. 1f).

我对这种方法有着深深的怀疑!

# for maturation trajectory
# fit maturation trajectory
maturation.trajectory <- function(cm, md, expr, pricu.f=1/3) {
cat('Fitting maturation trajectory\n')
genes <- apply(cm[rownames(expr), ] > 0, 1, mean) >= 0.02 & apply(cm[rownames(expr), ] > 0, 1, sum) >= 3
rd <- dim.red(expr[genes, ], max.dim=50, ev.red.th=0.04)
# for a consisten look use Nes expression to orient each axis
for (i in 1:ncol(rd)) {
if (cor(expr['Nes', ], rd[, i]) > 0) {
rd[, i] <- -rd[, i]
}
} md <- md[, !grepl('^DMC', colnames(md))]
md <- cbind(md, rd) pricu <- principal.curve(rd, smoother='lowess', trace=TRUE, f=pricu.f, stretch=333)
# two DMCs
pc.line <- as.data.frame(pricu$s[order(pricu$lambda), ])
# lambda, for each point, its arc-length from the beginning of the curve. The curve is parametrized approximately by arc-length, and hence is unit-speed.
md$maturation.score <- pricu$lambda/max(pricu$lambda) # orient maturation score using Nes expression
if (cor(md$maturation.score, expr['Nes', ]) > 0) {
md$maturation.score <- -(md$maturation.score - max(md$maturation.score))
} # use 1% of neighbor cells to smooth maturation score
md$maturation.score.smooth <- nn.smooth(md$maturation.score, rd[, 1:2], round(ncol(expr)*0.01, 0)) # pick maturation score cutoff to separate mitotic from post-mitotic cells
md$in.cc.phase <- md$cc.phase != 0
fit <- loess(as.numeric(md$in.cc.phase) ~ md$maturation.score.smooth, span=0.5, degree=2)
md$cc.phase.fit <- fit$fitted
# pick MT threshold based on drop in cc.phase cells
# ignore edges of MT because of potential outliers
mt.th <- max(subset(md, cc.phase.fit > mean(md$in.cc.phase)/2 & maturation.score.smooth >= 0.2 & maturation.score.smooth <= 0.8)$maturation.score.smooth) md$postmitotic <- md$maturation.score.smooth > mt.th
return(list(md=md, pricu=pricu, pc.line=pc.line, mt.th=mt.th))
} # for smoothing maturation score
nn.smooth <- function(y, coords, k) {
knn.out <- FNN::get.knn(coords, k)
w <- 1 / (knn.out$nn.dist+.Machine$double.eps)
w <- w / apply(w, 1, sum)
v <- apply(knn.out$nn.index, 2, function(i) y[i])
return(apply(v*w, 1, sum))
}

  

  

单细胞数据高级分析之构建成熟路径 | Identifying a maturation trajectory的更多相关文章

  1. 单细胞数据高级分析之初步降维和聚类 | Dimensionality reduction | Clustering

    个人的一些碎碎念: 聚类,直觉就能想到kmeans聚类,另外还有一个hierarchical clustering,但是单细胞里面都用得不多,为什么?印象中只有一个scoring model是用kme ...

  2. 单细胞数据高级分析之消除细胞周期因素 | Removal of cell cycle effect

    The normalization method described above aims to reduce the effect of technical factors in scRNA-seq ...

  3. PLUTO平台是由美林数据技术股份有限公司下属西安交大美林数据挖掘研究中心自主研发的一款基于云计算技术架构的数据挖掘产品,产品设计严格遵循国际数据挖掘标准CRISP-DM(跨行业数据挖掘过程标准),具备完备的数据准备、模型构建、模型评估、模型管理、海量数据处理和高纬数据可视化分析能力。

    http://www.meritdata.com.cn/article/90 PLUTO平台是由美林数据技术股份有限公司下属西安交大美林数据挖掘研究中心自主研发的一款基于云计算技术架构的数据挖掘产品, ...

  4. 第二篇:智能电网(Smart Grid)中的数据工程与大数据案例分析

    前言 上篇文章中讲到,在智能电网的控制与管理侧中,数据的分析和挖掘.可视化等工作属于核心环节.除此之外,二次侧中需要对数据进行采集,数据共享平台的搭建显然也涉及到数据的管理.那么在智能电网领域中,数据 ...

  5. 在HDInsight中从Hadoop的兼容BLOB存储查询大数据的分析

    在HDInsight中从Hadoop的兼容BLOB存储查询大数据的分析 低成本的Blob存储是一个强大的.通用的Hadoop兼容Azure存储解决方式无缝集成HDInsight.通过Hadoop分布式 ...

  6. 个人永久性免费-Excel催化剂功能第93波-地图数据挖宝之两点距离的路径规划

    在日常手机端,网页端的向地图发出两点距离的行程规划,相信绝大多数人都有用到过,但毕竟是个体单一行为,若某些时候需要用到批量性的操作,就显得很不现实了,同时,数据只是在应用或网页内,非结构化的数据,也是 ...

  7. Lakehouse: 统一数据仓库和高级分析的新一代开放平台

    1. 摘要 数仓架构在未来一段时间内会逐渐消亡,会被一种新的Lakehouse架构取代,该架构主要有如下特性 基于开放的数据格式,如Parquet: 机器学习和数据科学将被作为头等公民支持: 提供卓越 ...

  8. 机器学习建模高级用法!构建企业级AI建模流水线 ⛵

    作者:韩信子@ShowMeAI 机器学习实战系列: http://www.showmeai.tech/tutorials/41 本文地址:http://www.showmeai.tech/articl ...

  9. 《Wireshark数据包分析实战》 - http背后,tcp/ip抓包分析

    作为网络开发人员,使用fiddler无疑是最好的选择,方便易用功能强. 但是什么作为爱学习的同学,是不应该止步于http协议的,学习wireshark则可以满足这方面的需求.wireshark作为抓取 ...

随机推荐

  1. 【Python047-魔法方法:定制序列】

    一.协议是什么 1.协议(protocols)与其他编程语言中的接口很相似,它规定你那些方法必须要定义.然而在Python中协议就显的不那么正式,事实上,在Python中,协议更像是一种指南 2.容器 ...

  2. Delphi XE5 for Android (十一)

    以下内容是根据Delphi的帮助文件进行试验的,主要测试Android下的消息提醒. 首先建立一个空白的Android工程,然后在窗体中加入一个TNotificationCenter控件,如下图: 再 ...

  3. linux 子shell subshell和函数

    关于子shell, subshell 参考:http://blog.csdn.net/sosodream/article/details/5683515 系统引导时的进程为 "原始进程&qu ...

  4. 【做题】agc003E - Sequential operations on Sequence——经典结论

    题意:有一个序列,初始是从\(1\)到\(n\)的\(n\)个数.有\(q\)次操作,每次操作给出\(q_i\),把当前的序列重复无数遍,然后截取最前面的\(q_i\)个元素作为新序列.要求输出完成所 ...

  5. sed 替换换行回车

    A carriage return linefeed (CRLF) is a special sequence of characters, used by DOS/Windows, to signi ...

  6. P3317 [SDOI2014]重建

    思路 变元矩阵树定理可以统计最小生成树边权积的和,将A矩阵变为边权,D变为与该点相连的边权和,K=D-A,求K的行列式即可 把式子化成 \[ \begin{align}&\sum_{T}\pr ...

  7. (zhuan) Deep Deterministic Policy Gradients in TensorFlow

          Deep Deterministic Policy Gradients in TensorFlow AUG 21, 2016 This blog from: http://pemami49 ...

  8. LeetCode - 198 简单动态规划 打家劫舍

    你是一个专业的小偷,计划偷窃沿街的房屋.每间房内都藏有一定的现金,影响你偷窃的唯一制约因素就是相邻的房屋装有相互连通的防盗系统,如果两间相邻的房屋在同一晚上被小偷闯入,系统会自动报警. 给定一个代表每 ...

  9. WijmoJS 使用Web Workers技术,让前端 PDF 导出效率更高效

    概述 Web Workers是一种Web标准技术,允许在后台线程中执行脚本处理. WijmoJS 的2018v3版本引入了Web Workers技术,以便在生成PDF时提高应用程序的运行速度. 一般来 ...

  10. sap hana 数据库 EBS

    SAP实时数据平台详解 ************************************************************ EBS是Oracle 公司对原有应用产品整合后的一个产 ...