From the Tutte Institute for Mathematics and Computing

Problem: dimension reduction

Theoretical foundations:

At a high level, UMAP uses local manifold approximations and patches together their local fuzzy simplicial set representations to construct a topological representation of the high dimensional data. Given some low dimensional representation of the data, a similar process can be used to construct an equivalent topological representation. UMAP then optimizes the layout of the data representation in the low dimensional space, to minimize the cross-entropy between the two topological representations.

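Explanation: Using local manifold approximations and their local fuzzy simplicial set representations, UMAP builds a topological representation of the data in the high-dimensional space. It then builds an equivalent topological representation in the low-dimensional space and uses cross-entropy as the objective function to measure the difference between the two representations, adjusting the low-dimensional layout so that this difference is minimized.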
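The objective can be made concrete. For two fuzzy sets on the same edges, with high-dimensional memberships $\mu(e)$ and low-dimensional memberships $\nu(e)$, the UMAP paper writes the cross-entropy as $C = \sum_e \big[\mu(e)\log\frac{\mu(e)}{\nu(e)} + (1-\mu(e))\log\frac{1-\mu(e)}{1-\nu(e)}\big]$. The sketch below evaluates it for two membership arrays; the function name `fuzzy_cross_entropy` and the clipping constant `eps` (used only to keep the logarithms finite) are our own assumptions.

```python
import numpy as np

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Cross-entropy between two fuzzy sets given as membership arrays.

    mu:  memberships in the high-dimensional representation
    nu:  memberships in the low-dimensional representation
    eps: hypothetical clipping constant that keeps log() finite
    """
    mu = np.clip(np.asarray(mu, dtype=float), eps, 1.0 - eps)
    nu = np.clip(np.asarray(nu, dtype=float), eps, 1.0 - eps)
    attract = mu * np.log(mu / nu)                         # large when nu << mu
    repel = (1.0 - mu) * np.log((1.0 - mu) / (1.0 - nu))   # large when nu >> mu
    return float(np.sum(attract + repel))

high = [0.9, 0.5, 0.1]
print(fuzzy_cross_entropy(high, high))             # identical sets: 0.0
print(fuzzy_cross_entropy(high, [0.1, 0.5, 0.9]))  # mismatched sets: positive
```

The first term pulls together points that are close in the high-dimensional representation; the second pushes apart points that are not, which is why UMAP's layout step has both attractive and repulsive forces.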

Construction of fuzzy topological representations:

1. approximating a manifold on which the data is assumed to lie;

2. constructing a fuzzy simplicial set representation of the approximated manifold.

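Explanation:

Question: Where exactly does a set of high-dimensional data lie, and in which space should it be measured: Euclidean space, a topological space, a Riemannian space, or some other space? Or do different choices of space all lead to similar results?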

1. Approximating a manifold on which the data is assumed to lie:

Suppose the manifold is not known in advance and we wish to approximate geodesic distance on it. Let the input data be $X = \{X_1, \ldots, X_N\}$.
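The paper's way around the unknown manifold, as we read it, is to assume the data are uniformly distributed on it and to approximate geodesic distances from $X_i$ to its neighbours by normalising them by the distance from $X_i$ to its k-th nearest neighbour, so every local ball has the same radius. A minimal sketch, assuming Euclidean ambient distances computed by brute force; `local_normalised_distances` is a hypothetical helper name.

```python
import numpy as np

def local_normalised_distances(X, k):
    """For each point, return its k nearest neighbours and the distances to
    them divided by the distance to its k-th nearest neighbour.
    Brute-force O(N^2) Euclidean distances are an illustrative assumption."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]        # indices of the k nearest neighbours
    knn = np.take_along_axis(dist, idx, axis=1)  # their distances, ascending
    radius = knn[:, -1:]                         # distance to the k-th neighbour
    return idx, knn / radius                     # every local ball now has radius 1

rng = np.random.default_rng(0)
# One tight cluster and one spread-out cluster: very different raw scales.
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 2.0, (5, 2))])
idx, d = local_normalised_distances(X, k=3)
print(d[:, -1])  # the k-th neighbour always sits at normalised distance 1
```

After normalisation the dense and sparse clusters are described on the same local scale, which is the point of the uniform-distribution assumption.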

A Computational view of UMAP:

Two phases.

In the first phase, a particular weighted k-neighbour graph is constructed. In the second phase, a low-dimensional layout of this graph is computed.

1. weighted k-neighbour graph construction

Use the nearest-neighbor descent algorithm of [1].
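Once the k nearest neighbours are known, the UMAP paper weights each directed edge as $w(x_i, x_{i_j}) = \exp\big(-\max(0, d(x_i, x_{i_j}) - \rho_i)/\sigma_i\big)$, where $\rho_i$ is the distance from $x_i$ to its nearest neighbour and $\sigma_i$ is chosen so that the weights around $x_i$ sum to $\log_2 k$. The sketch below swaps in exact brute-force neighbours for nearest-neighbor descent (fine only for small data) and finds $\sigma_i$ by plain binary search; the function names, iteration count and Euclidean metric are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def knn_brute_force(X, k):
    """Exact k-NN by brute force: a stand-in for NN-descent on small data."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def find_sigma(d, rho, target, n_iter=64):
    """Binary-search sigma so that sum(exp(-max(0, d - rho) / sigma)) == target."""
    lo, hi = 0.0, np.inf
    sigma = 1.0
    for _ in range(n_iter):
        total = np.exp(-np.maximum(0.0, d - rho) / sigma).sum()
        if total > target:        # weights too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                     # weights too small: grow sigma
            lo = sigma
            sigma = sigma * 2.0 if hi == np.inf else (lo + hi) / 2.0
    return sigma

def knn_graph_weights(X, k):
    """Directed weight matrix W, with W[i, j] > 0 only for the k-NN of point i."""
    X = np.asarray(X, dtype=float)
    idx, knn = knn_brute_force(X, k)
    target = np.log2(k)
    W = np.zeros((len(X), len(X)))
    for i in range(len(X)):
        rho = knn[i, 0]                          # distance to the nearest neighbour
        sigma = find_sigma(knn[i], rho, target)
        W[i, idx[i]] = np.exp(-np.maximum(0.0, knn[i] - rho) / sigma)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
W = knn_graph_weights(X, k=4)
print(W.sum(axis=1))  # each row sums to roughly log2(4) = 2
```

Subtracting $\rho_i$ guarantees every point is connected to at least one neighbour with weight 1, so no point is left isolated. The resulting W is not symmetric.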

2. low dimensional layout

Compute a force-directed layout of the graph in the low-dimensional space.
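A force-directed layout can be sketched as stochastic gradient descent on the cross-entropy. Each edge pulls its endpoints together, and randomly sampled vertices are pushed away (negative sampling). To keep the sketch short it fixes the low-dimensional similarity to $q = 1/(1 + \|y_i - y_j\|^2)$, i.e. UMAP's curve $1/(1 + a d^{2b})$ with $a = b = 1$ (an assumption). The function name `layout`, the learning rate, epoch count, number of negative samples and clipping value are illustrative choices, not the library's defaults.

```python
import numpy as np

def layout(W, n_components=2, n_epochs=100, lr=1.0, n_neg=5, seed=0):
    """Sketch of a force-directed layout of a weighted graph W (n x n, weights in [0, 1]).

    Attraction along edges: gradient of -log(q), q = 1 / (1 + d^2).
    Repulsion from random vertices: gradient of -log(1 - q).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    Y = rng.uniform(-10, 10, size=(n, n_components))  # random initial embedding
    heads, tails = np.nonzero(W)
    weights = W[heads, tails]
    for epoch in range(n_epochs):
        alpha = lr * (1.0 - epoch / n_epochs)   # linearly decaying step size
        for i, j, w in zip(heads, tails, weights):
            if rng.random() > w:                # use each edge with probability w
                continue
            diff = Y[i] - Y[j]
            d2 = diff @ diff
            grad = 2.0 * diff / (1.0 + d2)      # d/dy_i of -log(q)
            Y[i] -= alpha * np.clip(grad, -4, 4)   # move y_i towards y_j
            Y[j] += alpha * np.clip(grad, -4, 4)   # move y_j towards y_i
            for m in rng.integers(0, n, size=n_neg):
                if m == i:
                    continue
                diff = Y[i] - Y[m]
                d2 = diff @ diff
                # d/dy_i of -log(1 - q); 1e-3 avoids division by zero
                grad = -2.0 * diff / ((d2 + 1e-3) * (1.0 + d2))
                Y[i] -= alpha * np.clip(grad, -4, 4)   # push y_i away from y_m
    return Y

# Usage: two disconnected 5-cliques with unit edge weights (toy data).
W = np.zeros((10, 10))
W[:5, :5] = 1.0
W[5:, 5:] = 1.0
np.fill_diagonal(W, 0.0)
Y = layout(W)
print(Y.shape)  # (10, 2)
```

Sampling edges in proportion to their weight, instead of scaling every gradient by the weight, keeps each update cheap; this is the same idea UMAP uses to scale to large graphs.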

Implementation and hyper-parameters:

Supplementary knowledge:

1. Simplicial sets

In mathematics, a simplicial set is an object made up of "simplices" in a specific way. Simplicial sets are higher-dimensional generalizations of directed graphs, partially ordered sets, and categories.

Simplex:

In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions.

For example, a 0-simplex is a point, a 1-simplex is a line segment, a 2-simplex is a triangle, and a 3-simplex is a tetrahedron.
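An n-simplex with vertices $v_0, \ldots, v_n$ has every subset of $k+1$ of its vertices as a k-dimensional face, so it has $\binom{n+1}{k+1}$ faces of dimension k. The sketch below treats a simplex simply as a tuple of vertex labels and enumerates its faces with `itertools.combinations`; the helper name `faces` is our own.

```python
from itertools import combinations

def faces(simplex, k):
    """All k-dimensional faces of a simplex given as a tuple of vertex labels."""
    return list(combinations(simplex, k + 1))

triangle = ("a", "b", "c")          # a 2-simplex
tetrahedron = ("a", "b", "c", "d")  # a 3-simplex

print(faces(triangle, 1))          # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(len(faces(tetrahedron, 2)))  # 4 triangular faces
print(len(faces(tetrahedron, 1)))  # 6 edges
```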

2. Hadamard product / pointwise product
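The Hadamard (pointwise) product multiplies two matrices of the same shape entry by entry, $(A \circ B)_{ij} = A_{ij} B_{ij}$. In NumPy this is `*`, as opposed to `@` for the ordinary matrix product. In UMAP it appears when the directed weight matrix $A$ is symmetrised with the probabilistic fuzzy union $B = A + A^\top - A \circ A^\top$. A small example with made-up weights:

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.5],
              [0.2, 0.0, 0.0],
              [0.0, 0.8, 0.0]])  # directed edge weights (toy values)

hadamard = A * A.T               # entry-wise product, not A @ A.T
B = A + A.T - hadamard           # fuzzy union of A and its transpose

print(B)
```

An edge keeps weight 1 if either direction had weight 1. If both directions have weights $a$ and $b$, the result is $a + b - ab$, the probability that at least one of two independent events occurs.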

3. What is an n-skeleton?
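The n-skeleton of a simplicial complex (or simplicial set) is the part made of all its simplices of dimension at most n. The 1-skeleton, for example, keeps only vertices and edges, so it is a graph. In practice UMAP works with the 1-skeleton of its fuzzy simplicial set, which is why the computation reduces to a weighted graph. The sketch below uses a complex given by its maximal simplices; the function name `skeleton` and the input format are our own choices.

```python
from itertools import combinations

def skeleton(maximal_simplices, n):
    """All simplices of dimension <= n in the complex generated by the
    given maximal simplices (each a tuple of vertex labels)."""
    result = set()
    for simplex in maximal_simplices:
        top = min(n, len(simplex) - 1)          # cannot exceed the simplex's own dimension
        for k in range(top + 1):
            for face in combinations(sorted(simplex), k + 1):
                result.add(face)
    return sorted(result, key=lambda s: (len(s), s))

complex_ = [("a", "b", "c"), ("c", "d")]  # a filled triangle plus an extra edge
print(skeleton(complex_, 0))  # vertices only
print(skeleton(complex_, 1))  # vertices and edges: a graph
```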

4. Mathematical concepts

Convergent sequence

Space:

The concept of a space is an extremely general and important mathematical construct. Members of the space obey certain addition properties. Spaces which have been investigated and found to be of interest are usually named after one or more of their investigators.

The everyday type of space familiar to most people is called Euclidean space. In Einstein's theory of Special Relativity, Euclidean three-space plus time (the "fourth dimension") are unified into the so-called Minkowski space. One of the most general types of mathematical spaces is the topological space.

Metric space:

A metric space is a set $S$ with a global distance function (the metric $g$) that, for every two points $x, y$ in $S$, gives the distance between them as a nonnegative real number $g(x, y)$. A metric space must also satisfy

1. $g(x, y) = 0$ iff $x = y$,

2. $g(x, y) = g(y, x)$,

3. The triangle inequality $g(x, y) + g(y, z) \ge g(x, z)$.
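On a finite sample of points the axioms can be checked mechanically. The sketch below uses a hypothetical helper `is_metric` to test them for the Euclidean distance and for squared Euclidean distance, which is not a metric because it fails the triangle inequality. The tolerance `tol` is an assumption that absorbs floating-point error.

```python
import itertools
import math

def is_metric(points, d, tol=1e-12):
    """Check the metric axioms for the distance function d on a finite list of distinct points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                          # nonnegativity
            return False
        if (d(x, y) <= tol) != (x == y):         # d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:    # triangle inequality
            return False
    return True

euclidean = math.dist
squared = lambda p, q: math.dist(p, q) ** 2

pts = [(0.0,), (1.0,), (2.0,), (0.5,)]
print(is_metric(pts, euclidean))  # True
print(is_metric(pts, squared))    # False: 4 > 1 + 1 for the points 0, 1, 2
```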

Euclidean space:

Euclidean $n$-space, sometimes called Cartesian space or simply $n$-space, is the space of all $n$-tuples of real numbers, $(x_1, x_2, \ldots, x_n)$. Such $n$-tuples are sometimes called points, although other nomenclature may be used. The totality of $n$-space is commonly denoted $\mathbb{R}^n$.

Topological space:

A topological space, also called an abstract topological space, is a set $X$ together with a collection $T$ of open subsets of $X$ that satisfies the four conditions:

1. The empty set $\emptyset$ is in $T$.

2. $X$ is in $T$.

3. The intersection of a finite number of sets in $T$ is also in $T$.

4. The union of an arbitrary number of sets in $T$ is also in $T$.
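For a finite set $X$ the four conditions can be checked directly. Because $T$ is then finite, any union or intersection of its members can be built up pairwise, so closure under pairwise unions and intersections is enough. A sketch with a hypothetical helper `is_topology`, representing open sets as Python frozensets:

```python
from itertools import combinations

def is_topology(X, T):
    """Check whether the collection T of subsets of the finite set X is a topology on X."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    if frozenset() not in T or X not in T:      # conditions 1 and 2
        return False
    if any(not s <= X for s in T):              # open sets must be subsets of X
        return False
    for a, b in combinations(T, 2):
        if a & b not in T or a | b not in T:    # conditions 3 and 4, pairwise
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # a nested chain: True
print(is_topology(X, [set(), {1}, {2}, X]))     # {1} | {2} is missing: False
```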

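Triangle inequality:

Let $\mathbf{x}$ and $\mathbf{y}$ be vectors. Then the triangle inequality is given by

$$|\mathbf{x}| - |\mathbf{y}| \le |\mathbf{x} + \mathbf{y}| \le |\mathbf{x}| + |\mathbf{y}|.$$

Equivalently, for complex numbers $z_1$ and $z_2$,

$$|z_1| - |z_2| \le |z_1 + z_2| \le |z_1| + |z_2|.$$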

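5. The difference between Euclidean space and Riemannian space

Riemann unified the spherical geometry, hyperbolic geometry (i.e., Lobachevsky geometry), and Euclidean geometry of two-dimensional surfaces in the following Riemannian metric:

$$ds = \frac{\sqrt{dx^2 + dy^2}}{1 + \frac{\alpha}{4}\left(x^2 + y^2\right)}$$

In this expression for the arc-length differential $ds$, $\alpha$ is the Gaussian curvature of the two-dimensional surface. When $\alpha = +1$, the metric describes spherical geometry, in which the interior angles of a triangle sum to more than 180° ($E > 180°$); when $\alpha = -1$, it describes hyperbolic geometry, in which the sum $E$ is less than 180°; when $\alpha = 0$, it reduces to ordinary Euclidean geometry (Figure 2). By introducing the concept of a metric, Riemann unified the three geometries and gave non-Euclidean geometry new vitality.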
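The statement about angle sums can be checked numerically on the unit sphere ($\alpha = +1$). The sketch below computes the interior angles of the geodesic triangle whose vertices lie on the three coordinate axes (one octant of the sphere), using tangent vectors along its great-circle sides; the helper name `spherical_angle` is our own. The angles sum to 270°, and the excess of $\pi/2$ equals the triangle's area ($4\pi/8$), as the Gauss-Bonnet theorem predicts for curvature +1.

```python
import numpy as np

def spherical_angle(a, b, c):
    """Interior angle at vertex a of the geodesic triangle abc on the unit sphere."""
    # Tangent vectors at a pointing along the great circles towards b and c.
    tb = b - np.dot(a, b) * a
    tc = c - np.dot(a, c) * a
    cos_angle = np.dot(tb, tc) / (np.linalg.norm(tb) * np.linalg.norm(tc))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])
C = np.array([0.0, 0.0, 1.0])

angles = [spherical_angle(A, B, C), spherical_angle(B, C, A), spherical_angle(C, A, B)]
total = np.degrees(sum(angles))
print(total)                              # about 270 degrees, more than 180
excess = sum(angles) - np.pi
print(np.isclose(excess, 4 * np.pi / 8))  # True: excess equals the octant's area
```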

Reference

1. Efficient k-nearest neighbor graph construction for generic similarity measures

2. 欧氏空间与黎曼空间 (Euclidean space and Riemannian space)
