Dimensionality and high dimensional data: definition, examples, curse of..
Dimensionality in statistics refers to how many attributes a dataset has. For example, healthcare data is notorious for having vast amounts of variables (e.g. blood pressure, weight, cholesterol level). In an ideal world, this data could be represented in a spreadsheet, with one column representing each dimension. In practice, this is difficult to do, in part because many variables are inter-related (like weight and blood pressure).
Note: Dimensionality means something slightly different in other areas of mathematics and science. For example, in physics, dimensionality can usually be expressed in terms of fundamental dimensions like mass, time, or length. Inmatrix algebra, two units of measure have the same dimensionality if both statements are true:
- A function exists that maps one variable onto another variable.
- The inverse of the function in (1) does the reverse.
High Dimensional Data
High Dimensional means that the number of dimensions is staggeringly惊人地 high — so high that calculations become extremely difficult. With high dimensional data, the number of features can exceed the number of observations. For example, microarrays, which measure gene expression, can contain tens of hundreds of samples. Each sample can contain tens of thousands of genes.
1. What is the dimension of time series.
Classification of time series is a somewhat tricky matter. Most classification algorithms have an implicit assumption that the data you are classifying are stationary, and they usually work in vector spaces.
So there are two "things" that can be multidimensional here: your original time series and the result of your preprocessing before feeding data to a classifier.
Supplementary knowledge:
1. downsample.降采样
2. curse of dimensionality维度灾难
当维数提高时,空间的体积提高太快,因而可用数据变得很稀疏。稀疏性对于任何要求有统计学意义的方法而言都是一个问题,为了获得在统计学上正确并且有可靠的结果,用来支撑这一结果所需要的数据量通常随着维数的提高而呈指数级增长。
3. 缩写iid: independent and identically distributed random variables. 独立同分布.
Reference:
2. What is meant by 'high dimensional' time series?
3. 万物皆Embedding,从经典的word2vec到深度学习基本操作item2vec
Dimensionality and high dimensional data: definition, examples, curse of..的更多相关文章
- CREATE TABLE——数据定义语言 (Data Definition Language, DDL)
Sql语句分为三大类: 数据定义语言,负责创建,修改,删除表,索引和视图等对象: 数据操作语言,负责数据库中数据的插入,查询,删除等操作: 数据控制语言,用来授予和撤销用户权限. 数据定义语言 (Da ...
- How to Delete XML Publisher Data Definition Template
DECLARE -- Change the following two parameters VAR_TEMPLATECODE VARCHAR2(100) := 'CUX_CHANGE_RPT1 ...
- Hive 5、Hive 的数据类型 和 DDL Data Definition Language)
官方帮助文档:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL Hive的数据类型 -- 扩展数据类型data_t ...
- sql基础之DDL(Data Definition Languages)
好久没写SQL语句了,复习一下. DDL数据定义语言,DDL定义了不同的数据段.数据库.表.列.索引等数据库对象的定义.经常使用的DDL语句包含create.drop.alter等等. 登录数据:my ...
- 02-2--数据库MySQL:DDL(Data Definition Language:数据库定义语言)操作数据库中的表(二)
DDL对数据库的操作:http://blog.csdn.net/baidu_37107022/article/details/72334560 DDL对数据库中表的操作 1)方法概览 2)演示 //创 ...
- 数据定义语言(DDL Data Definition Language)基础学习笔记
创建数据库 create database if not exists STUDY character set utf8 ; 查看新建数据库的语句 SHOW CREATE DATABASE STUDY ...
- MySQL中的DDL(Data Definition Language,数据定义语言)
create(创建表) 标准的建表语句: create table [模式名.]表名 ( #可以有多个列定义 columnName1 dataType [default expr(这是默认值)], . ...
- mysql数据库-mysql数据定义语言DDL (Data Definition Language)归类(六)
0x01 创建数据库并指定字符集和排序规则 -- 三种实例写法 create database temptab2 character set utf8 collate utf8_general_ci; ...
- Seven Techniques for Data Dimensionality Reduction
Seven Techniques for Data Dimensionality Reduction Seven Techniques for Data Dimensionality Reductio ...
随机推荐
- 用sort实现对struct的排序
用sort 排序 struct +++ //method 1 struct node{ int k,s; }p[5005]; bool cmp1(node x,node y){ return x.s& ...
- Vue中echarts的使用
1.安装 npm install echarts --save 2. 导入并挂载 <template> <!-- 1. 为ECharts准备一个具备大小(宽高)的Dom --&g ...
- 设置datagridview隔行变色
/// <summary> /// 设置datagridview隔行变色 /// </summary> /// <param name="e"> ...
- 手写mybatis框架笔记
MyBatis 手写MyBatis流程 架构流程图 封装数据 封装到Configuration中 1.封装全局配置文件,包含数据库连接信息和mappers信息 2.封装*mapper.xml映射文件 ...
- P3292 [SCOI2016]幸运数字 [线性基+倍增]
线性基+倍增 // by Isaunoya #include <bits/stdc++.h> using namespace std; #define rep(i, x, y) for ( ...
- 【1】Logistic回归
Logistic回归 在Logistic回归中,损失函数L定义为 成本函数 J 损失函数是单个训练样本的误差,而成本函数是所有训练样本误差的平均值. 之所以选择这个损失函数,是因为该损失函数L与w ...
- Docker常用命令和功能介绍
可以搜索 dockerfile 定制创建一个redis镜像image 表示镜像docker search 搜索镜像的名称和标签docker 所在目录/var/lib/dockerdocker的镜像文件 ...
- pythonCSV内置模块应用
一.Python内置模块CSV CSV,即逗号分隔值(也称字符分隔值,因为分隔符可以不是逗号),是一种常用的文本格式,用以存储表格数据,包括数字或者字符.如下图所示: CSV类似于Excel格式 很多 ...
- 解决linux 终端UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
vi /etc/locale.conf 修改LANG="zh_CN.gbk" 最后执行source /etc/locale.conf 即可永久生效,下次登录,中文就不会乱码了.
- JAVA并发同步互斥实现方式总结
大家都知道加锁是用来在并发情况防止同一个资源被多方抢占的有效手段,加锁其实就是同步互斥(或称独占)也行,即:同一时间不论有多少并发请求,只有一个能处理,其余要么排队等待,要么放弃执行.关于锁的实现网上 ...