SC3 clustering | Laplacian matrix | graph theory

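The Laplacian transformation and PCA seem to be the same kind of method: both re-express the data in a new coordinate system via an eigendecomposition. The Laplacian simply belongs to graph theory, so its terminology is more specialized.

If you read a paper, read it in full, then work out what is worth borrowing from it. Only what you have summarized and understood yourself is truly yours.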

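Questions:

1. What highlights got this paper into Nature Methods? Single-cell data, a consensus approach, relatively good performance, and a layer of specialized terminology that makes it look sophisticated, although the amount of genuinely new work is small.

2. What is the essence of the consensus method?

3. What criteria are used to evaluate the tool? The adjusted Rand index (ARI) and the silhouette index.

4. What is SC3's biggest drawback? It is too slow: beyond about 1,000 cells it becomes very expensive in both computation and memory.

5. Can I follow the logic of the SC3 R package? The core is just four steps: several distance metrics, transformations, k-means clustering, and consensus.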

The main sc3 method explained above is a wrapper that calls several other SC3 methods in the following order:

sc3_prepare
sc3_estimate_k - Tracy-Widom theory - random matrix theory (RMT)
sc3_calc_dists
sc3_calc_transfs
sc3_kmeans
sc3_calc_consens
sc3_calc_biology

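6. Many questions go unanswered; the paper is mostly engineering. At its core it is k-means in an elaborate wrapper.

What difference does the choice of distance metric make?
Why apply two transformations, PCA and the Laplacian?
Why k-means? Are the authors unaware of its inherent weaknesses?
What is the theoretical basis for the consensus step? Why should the result be better after taking a consensus?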
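For question 3, it helps me to see what the ARI actually measures. Below is a minimal base-R sketch that computes it from the contingency table of two labelings. The function name `adjusted_rand_index` and the toy labels are my own, for illustration only; this is not how SC3 or the paper computes it.

```r
# Adjusted Rand index (ARI) from the contingency table of two labelings.
# Hypothetical helper for illustration; not part of SC3.
adjusted_rand_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- unclass(table(x, y))
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))           # pairs grouped together in both labelings
  sum_rows  <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  sum_cols  <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected  <- sum_rows * sum_cols / choose(n, 2)
  maximum   <- (sum_rows + sum_cols) / 2
  # Note: undefined (0/0) if both labelings put every cell in a single cluster
  (sum_cells - expected) / (maximum - expected)
}

truth       <- c("a", "a", "b", "b", "c", "c")
pred_same   <- c(2, 2, 3, 3, 1, 1)   # same partition, different label names
ari_same    <- adjusted_rand_index(truth, pred_same)
ari_crossed <- adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

The ARI depends only on which cells are grouped together, not on the label names, so a relabeled copy of the truth still scores as a perfect match. For the silhouette index, the `silhouette()` function in the `cluster` package is one existing option.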

I have recently been reading the SC3 clustering paper, and SC3 uses this tool (the graph Laplacian).

SC3: consensus clustering of single-cell RNA-seq data

All distance matrices are then transformed using either principal component analysis (PCA) or by calculating the eigenvectors of the associated graph Laplacian ($L = I - D^{-1/2} A D^{-1/2}$, where $I$ is the identity matrix, $A$ is a similarity matrix ($A = e^{-A'/\max(A')}$, where $A'$ is a distance matrix) and $D$ is the degree matrix of $A$, a diagonal matrix that contains the row-sums of $A$ on the diagonal ($D_{ii} = \sum_j A_{ij}$)). The columns of the resulting matrices are then sorted in ascending order by their corresponding eigenvalues.
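To check my reading of the formula, here is a minimal base-R sketch of the Laplacian transformation. The name `norm_laplacian_sketch` and the random toy data are my own assumptions; SC3's own `norm_laplacian` is compiled code, and I have not verified that this matches it numerically.

```r
# Base-R sketch of the formula quoted above; not SC3's own implementation.
norm_laplacian_sketch <- function(dist_mat) {
  A <- exp(-dist_mat / max(dist_mat))   # similarity matrix A = exp(-A' / max(A'))
  d_inv_sqrt <- 1 / sqrt(rowSums(A))    # diagonal entries of D^{-1/2}
  # D^{-1/2} A D^{-1/2} scales entry (i, j) by d_i^{-1/2} * d_j^{-1/2}
  diag(nrow(A)) - A * outer(d_inv_sqrt, d_inv_sqrt)
}

set.seed(1)
cells    <- matrix(rnorm(40), nrow = 8)   # 8 toy "cells", 5 features each
dist_mat <- as.matrix(dist(cells))        # 8 x 8 Euclidean distance matrix
L        <- norm_laplacian_sketch(dist_mat)

eig       <- eigen(L, symmetric = TRUE)
embedding <- eig$vectors[, order(eig$values)]   # columns in ascending eigenvalue order
```

The smallest eigenvalue of this normalized Laplacian is 0 (up to rounding), because $D^{1/2}\mathbf{1}$ is always an eigenvector; the informative coordinates are the next few columns of `embedding`.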

First, look at what the tool does: SC3 package manual

Run the standard code:

library(SingleCellExperiment)
library(SC3)
library(scater)

# `yan` (expression matrix) and `ann` (cell annotations) are example data shipped with SC3
head(ann)
yan[1:3, 1:3]

# create a SingleCellExperiment object
sce <- SingleCellExperiment(
  assays = list(
    counts = as.matrix(yan),
    logcounts = log2(as.matrix(yan) + 1)
  ),
  colData = ann
)

# define feature names in feature_symbol column
rowData(sce)$feature_symbol <- rownames(sce)
# remove features with duplicated names
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]
# define spike-ins
# (isSpike<- was deprecated and later removed in newer SingleCellExperiment releases; skip this line there)
isSpike(sce, "ERCC") <- grepl("ERCC", rowData(sce)$feature_symbol)

plotPCA(sce, colour_by = "cell_type1")

# run the whole pipeline in one call
sce <- sc3(sce, ks = 2:4, biology = TRUE)
# sc3_interactive(sce)
# sc3_export_results_xls(sce)

######################################
# the same pipeline, one step at a time
sce <- sc3_prepare(sce)
sce <- sc3_estimate_k(sce)

sce <- sc3_calc_dists(sce)
names(metadata(sce)$sc3$distances)

sce <- sc3_calc_transfs(sce)
names(metadata(sce)$sc3$transformations)
# the distances are expected to be NULL here: the transformation step drops them
metadata(sce)$sc3$distances

sce <- sc3_kmeans(sce, ks = 2:4)
names(metadata(sce)$sc3$kmeans)
col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])

sce <- sc3_calc_consens(sce)
names(metadata(sce)$sc3$consensus)
names(metadata(sce)$sc3$consensus$`3`)
col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])

sce <- sc3_calc_biology(sce, ks = 2:4)
sce <- sc3_run_svm(sce, ks = 2:4)
col_data <- colData(sce)
head(col_data[ , grep("sc3_", colnames(col_data))])

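Next, I will try to take the tool apart.

How do you take it apart?

A well-packaged R package like this is fairly hard to dissect. Normally, typing a function name prints its R code, but here typing a function name such as sc3_calc_dists only prints the following S4 generic wrapper: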

new("nonstandardGenericFunction", .Data = function (object)
{
    standardGeneric("sc3_calc_dists")
}, generic = structure("sc3_calc_dists", package = "SC3"), package = "SC3",
    group = list(), valueClass = character(0), signature = "object",
    default = NULL, skeleton = (function (object)
    stop("invalid call in method dispatch to 'sc3_calc_dists' (no default method)",
        domain = NA))(object))

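I am not yet familiar with this form (an S4 generic), so for now I can only search GitHub by function name.

GitHub is excellent for this: you can search inside files and quickly locate sc3_calc_dists.

Combined with this directory-tree extension, it is much more efficient: https://www.octotree.io/?utm_source=lite&utm_medium=extension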
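As an alternative to browsing GitHub, the `methods` package can return the implementation behind an S4 generic directly in the R session. The sketch below uses a made-up generic, `toy_calc_dists`, so that it runs without SC3 installed; the names are hypothetical and only mirror the generic/method split seen in the printout above.

```r
library(methods)

# Hypothetical toy generic: printing it shows only the dispatcher,
# just like printing sc3_calc_dists does
setGeneric("toy_calc_dists", function(object) standardGeneric("toy_calc_dists"))

# The actual implementation, registered for objects of class "matrix"
toy_calc_dists.matrix <- function(object) {
  as.matrix(dist(t(object)))   # Euclidean distances between columns
}
setMethod("toy_calc_dists", signature(object = "matrix"), toy_calc_dists.matrix)

# getMethod() retrieves the implementation registered for a given class
impl <- getMethod("toy_calc_dists", signature(object = "matrix"))
print(impl)

m <- matrix(c(0, 0, 3, 4), nrow = 2)   # two columns: (0, 0) and (3, 4)
d <- toy_calc_dists(m)
```

With SC3 loaded, the analogous call would presumably be `getMethod("sc3_calc_dists", "SingleCellExperiment")`, and non-exported helpers can usually be reached with the triple-colon operator, e.g. `SC3:::calculate_distance`. I have not checked these against every SC3 version.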

Here is the code behind the wrapper:

#' Calculate distances between the cells.
#'
#' This function calculates distances between the cells. It
#' creates and populates the following items of the \code{sc3} slot of the \code{metadata(object)}:
#' \itemize{
#'   \item \code{distances} - contains a list of distance matrices corresponding to
#'   Euclidean, Pearson and Spearman distances.
#' }
#'
#' @name sc3_calc_dists
#' @aliases sc3_calc_dists, sc3_calc_dists,SingleCellExperiment-method
#'
#' @param object an object of \code{SingleCellExperiment} class
#'
#' @return an object of \code{SingleCellExperiment} class
#'
#' @importFrom doRNG %dorng%
#' @importFrom foreach foreach %dopar%
#' @importFrom parallel makeCluster stopCluster
#' @importFrom doParallel registerDoParallel
sc3_calc_dists.SingleCellExperiment <- function(object) {
    dataset <- get_processed_dataset(object)

    # check whether in the SVM regime
    if (!is.null(metadata(object)$sc3$svm_train_inds)) {
        dataset <- dataset[, metadata(object)$sc3$svm_train_inds]
    }

    # NULLing the variables to avoid notes in R CMD CHECK
    i <- NULL

    distances <- c("euclidean", "pearson", "spearman")

    message("Calculating distances between the cells...")

    if (metadata(object)$sc3$n_cores > length(distances)) {
        n_cores <- length(distances)
    } else {
        n_cores <- metadata(object)$sc3$n_cores
    }

    cl <- parallel::makeCluster(n_cores, outfile = "")
    doParallel::registerDoParallel(cl, cores = n_cores)

    # calculate distances in parallel
    dists <- foreach::foreach(i = distances) %dorng% {
        try({
            calculate_distance(dataset, i)
        })
    }

    # stop local cluster
    parallel::stopCluster(cl)

    names(dists) <- distances

    metadata(object)$sc3$distances <- dists
    return(object)
}

#' @rdname sc3_calc_dists
#' @aliases sc3_calc_dists
setMethod("sc3_calc_dists", signature(object = "SingleCellExperiment"), sc3_calc_dists.SingleCellExperiment)

The generic and this implementation are linked together by setMethod.

Along the way I also found the underlying functions:

#' Calculate a distance matrix
#'
#' Distance between the cells, i.e. columns, in the input expression matrix are
#' calculated using the Euclidean, Pearson and Spearman metrics to construct
#' distance matrices.
#'
#' @param data expression matrix
#' @param method one of the distance metrics: 'spearman', 'pearson', 'euclidean'
#' @return distance matrix
#'
#' @importFrom stats cor dist
#'
#' @useDynLib SC3
#' @importFrom Rcpp sourceCpp
#'
calculate_distance <- function(data, method) {
    return(if (method == "spearman") {
        as.matrix(1 - cor(data, method = "spearman"))
    } else if (method == "pearson") {
        as.matrix(1 - cor(data, method = "pearson"))
    } else {
        ED2(data)
    })
}

#' Distance matrix transformation
#'
#' All distance matrices are transformed using either principal component
#' analysis (PCA) or by calculating the
#' eigenvectors of the graph Laplacian (Spectral).
#' The columns of the resulting matrices are then sorted in
#' descending order by their corresponding eigenvalues.
#'
#' @param dists distance matrix
#' @param method transformation method: either 'pca' or
#' 'laplacian'
#' @return transformed distance matrix
#'
#' @importFrom stats prcomp cmdscale
#'
transformation <- function(dists, method) {
    if (method == "pca") {
        t <- prcomp(dists, center = TRUE, scale. = TRUE)
        return(t$rotation)
    } else if (method == "laplacian") {
        L <- norm_laplacian(dists)
        l <- eigen(L)
        # sort eigenvectors by their eigenvalues
        return(l$vectors[, order(l$values)])
    }
}

#' Calculate consensus matrix
#'
#' Consensus matrix is calculated using the Cluster-based Similarity
#' Partitioning Algorithm (CSPA). For each clustering solution a binary
#' similarity matrix is constructed from the corresponding cell labels:
#' if two cells belong to the same cluster, their similarity is 1, otherwise
#' the similarity is 0. A consensus matrix is calculated by averaging all
#' similarity matrices.
#'
#' @param clusts a matrix containing clustering solutions in columns
#' @return consensus matrix
#'
#' @useDynLib SC3
#' @importFrom Rcpp sourceCpp
#' @export
consensus_matrix <- function(clusts) {
    res <- consmx(clusts)
    colnames(res) <- as.character(c(1:nrow(clusts)))
    rownames(res) <- as.character(c(1:nrow(clusts)))
    return(res)
}

Distance calculation
Transformation
Consensus

It is all here.

ED2 is a Euclidean-distance routine that their lab wrote themselves in Rcpp.
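Since ED2 is compiled code, a base-R stand-in makes it easier to experiment with the three distances. The function below, `calculate_distance_sketch`, is my own hypothetical substitute: it assumes ED2 returns plain Euclidean distances between columns (cells), which I have not verified numerically against the package.

```r
# Base-R stand-in for the three cell-to-cell distances; not SC3's own code.
calculate_distance_sketch <- function(data, method = c("euclidean", "pearson", "spearman")) {
  method <- match.arg(method)
  if (method == "euclidean") {
    # dist() compares rows, so transpose to compare columns (cells)
    as.matrix(dist(t(data)))
  } else {
    # correlation distance: 0 for perfectly correlated cells, up to 2 for anti-correlated
    1 - cor(data, method = method)
  }
}

set.seed(1)
expr <- matrix(rexp(60), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("cell", 1:6)))

methods_used <- c("euclidean", "pearson", "spearman")
dists <- sapply(methods_used,
                function(m) calculate_distance_sketch(expr, m),
                simplify = FALSE)
```

Each element of `dists` is a symmetric cell-by-cell matrix with zeros on the diagonal. The Euclidean entries are unbounded, while the correlation-based entries lie between 0 and 2, which is one concrete difference between the metrics.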

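transformation takes a symmetric distance matrix as input (rows and columns are both cells), runs PCA on it, and returns rotation. What is the point of doing this?

It turns out that PCA-style processing of a distance or similarity matrix does exist, namely MDS. The goal is dimensionality reduction, because k-means clustering comes next.

k-means is then run on each of the reduced matrices.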
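To make these two steps concrete, here is a small sketch on toy data: PCA on a distance matrix as in the transformation function above, followed by k-means on the first few columns. The data, the values in `d_values`, and `centers = 2` are illustrative assumptions, not SC3's defaults. The `cmdscale` call is classical MDS, shown only for comparison; it is related to, but not the same as, running `prcomp` on the distance matrix.

```r
set.seed(42)
# 20 toy cells in two groups, 10 genes each (columns are cells)
expr <- cbind(matrix(rnorm(100, mean = 0), nrow = 10),
              matrix(rnorm(100, mean = 5), nrow = 10))
dists <- as.matrix(dist(t(expr)))   # 20 x 20 cell-to-cell distances

# PCA on the distance matrix: the columns are cells, so each row of
# `rotation` corresponds to one cell and each column to one component
pca    <- prcomp(dists, center = TRUE, scale. = TRUE)
transf <- pca$rotation

# k-means on the first d columns, for several illustrative values of d
d_values <- 2:4
clusterings <- sapply(d_values, function(d) {
  kmeans(transf[, 1:d, drop = FALSE], centers = 2, nstart = 10)$cluster
})
colnames(clusterings) <- paste0("d", d_values)

# Classical MDS on the same distances, for comparison
mds <- cmdscale(dists, k = 2)
```

Each column of `clusterings` is one clustering solution for the same 20 cells. A stack of solutions like this, across distances, transformations, and values of d, is what the consensus step then has to reconcile.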

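The consensus step uses the Cluster-based Similarity Partitioning Algorithm (CSPA). What is the point of it? The input is the set of multiple clustering results for each cell, which it merges into a single consensus.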
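The CSPA description in the roxygen comment above is simple enough to sketch in base R. `consensus_matrix_sketch` is a hypothetical stand-in for the compiled `consmx`, and the three toy clusterings are made up. The final hierarchical-clustering step reflects my recollection of the paper and should be treated as an assumption.

```r
# Base-R sketch of CSPA: average the binary co-membership matrices
# of all clustering solutions. Not SC3's own (compiled) implementation.
consensus_matrix_sketch <- function(clusts) {
  n   <- nrow(clusts)
  res <- matrix(0, n, n)
  for (j in seq_len(ncol(clusts))) {
    labels <- clusts[, j]
    res <- res + outer(labels, labels, "==")   # 1 where two cells share a cluster
  }
  res / ncol(clusts)
}

# Three toy clusterings of four cells; run2 is run1 with the labels swapped
clusts <- cbind(run1 = c(1, 1, 2, 2),
                run2 = c(2, 2, 1, 1),
                run3 = c(1, 2, 2, 2))
cons <- consensus_matrix_sketch(clusts)

# Assumed final step: hierarchical clustering of the consensus matrix
final <- cutree(hclust(as.dist(1 - cons), method = "complete"), k = 2)
```

This partly answers the question above: cluster labels from different runs are arbitrary (run1 and run2 describe the same partition), so they cannot be averaged directly, whereas co-membership can. Each entry of `cons` is the fraction of runs in which two cells were clustered together.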

