Dimensionality and high dimensional data: definition, examples, curse of..
Dimensionality in statistics refers to how many attributes a dataset has. For example, healthcare data is notorious for having vast amounts of variables (e.g. blood pressure, weight, cholesterol level). In an ideal world, this data could be represented in a spreadsheet, with one column representing each dimension. In practice, this is difficult to do, in part because many variables are inter-related (like weight and blood pressure).
Note: Dimensionality means something slightly different in other areas of mathematics and science. For example, in physics, dimensionality can usually be expressed in terms of fundamental dimensions like mass, time, or length. Inmatrix algebra, two units of measure have the same dimensionality if both statements are true:
- A function exists that maps one variable onto another variable.
- The inverse of the function in (1) does the reverse.
High Dimensional Data
High Dimensional means that the number of dimensions is staggeringly惊人地 high — so high that calculations become extremely difficult. With high dimensional data, the number of features can exceed the number of observations. For example, microarrays, which measure gene expression, can contain tens of hundreds of samples. Each sample can contain tens of thousands of genes.
1. What is the dimension of time series.
Classification of time series is a somewhat tricky matter. Most classification algorithms have an implicit assumption that the data you are classifying are stationary, and they usually work in vector spaces.
So there are two "things" that can be multidimensional here: your original time series and the result of your preprocessing before feeding data to a classifier.
Supplementary knowledge:
1. downsample.降采样
2. curse of dimensionality维度灾难
当维数提高时,空间的体积提高太快,因而可用数据变得很稀疏。稀疏性对于任何要求有统计学意义的方法而言都是一个问题,为了获得在统计学上正确并且有可靠的结果,用来支撑这一结果所需要的数据量通常随着维数的提高而呈指数级增长。
3. 缩写iid: independent and identically distributed random variables. 独立同分布.
Reference:
2. What is meant by 'high dimensional' time series?
3. 万物皆Embedding,从经典的word2vec到深度学习基本操作item2vec
Dimensionality and high dimensional data: definition, examples, curse of..的更多相关文章
- CREATE TABLE——数据定义语言 (Data Definition Language, DDL)
Sql语句分为三大类: 数据定义语言,负责创建,修改,删除表,索引和视图等对象: 数据操作语言,负责数据库中数据的插入,查询,删除等操作: 数据控制语言,用来授予和撤销用户权限. 数据定义语言 (Da ...
- How to Delete XML Publisher Data Definition Template
DECLARE -- Change the following two parameters VAR_TEMPLATECODE VARCHAR2(100) := 'CUX_CHANGE_RPT1 ...
- Hive 5、Hive 的数据类型 和 DDL Data Definition Language)
官方帮助文档:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL Hive的数据类型 -- 扩展数据类型data_t ...
- sql基础之DDL(Data Definition Languages)
好久没写SQL语句了,复习一下. DDL数据定义语言,DDL定义了不同的数据段.数据库.表.列.索引等数据库对象的定义.经常使用的DDL语句包含create.drop.alter等等. 登录数据:my ...
- 02-2--数据库MySQL:DDL(Data Definition Language:数据库定义语言)操作数据库中的表(二)
DDL对数据库的操作:http://blog.csdn.net/baidu_37107022/article/details/72334560 DDL对数据库中表的操作 1)方法概览 2)演示 //创 ...
- 数据定义语言(DDL Data Definition Language)基础学习笔记
创建数据库 create database if not exists STUDY character set utf8 ; 查看新建数据库的语句 SHOW CREATE DATABASE STUDY ...
- MySQL中的DDL(Data Definition Language,数据定义语言)
create(创建表) 标准的建表语句: create table [模式名.]表名 ( #可以有多个列定义 columnName1 dataType [default expr(这是默认值)], . ...
- mysql数据库-mysql数据定义语言DDL (Data Definition Language)归类(六)
0x01 创建数据库并指定字符集和排序规则 -- 三种实例写法 create database temptab2 character set utf8 collate utf8_general_ci; ...
- Seven Techniques for Data Dimensionality Reduction
Seven Techniques for Data Dimensionality Reduction Seven Techniques for Data Dimensionality Reductio ...
随机推荐
- 首次使用Lambda表达式-sunziren
需要将List<Apple> list = new ArrayList<Apple>(); 按照Apple对象中的price属性从大到小排序. 第一个念头闪过的是冒泡排序,转念 ...
- windows 2008r2+php5.6.28搭建详细过程
安装IIS7 1.打开服务器管理器(开始-计算机-右键-管理-也可以打开),添加角色 直接下一步 勾选Web服务器(IIS),下一步,有个注意事项继续下一步(这里我就不截图了) 勾选ASP.NET会弹 ...
- 分库分表技术演进&最佳实践
每个优秀的程序员和架构师都应该掌握分库分表,这是我的观点. 移动互联网时代,海量的用户每天产生海量的数量,比如: 用户表 订单表 交易流水表 以支付宝用户为例,8亿:微信用户更是10亿.订单表更夸张, ...
- PHP中根据二维数组中某个字段实现排序
想要实现二维数组中根据某个字段排序,一般可以通过数组循环对比的方式实现.这里介绍一种更简单的方法,直接通过PHP函数实现.array_multisort() :可以用来一次对多个数组进行排序,或者根据 ...
- [译]C# 7系列,Part 10: Span<T> and universal memory management Span<T>和统一内存管理
原文:https://blogs.msdn.microsoft.com/mazhou/2018/03/25/c-7-series-part-10-spant-and-universal-memory- ...
- MVC开发模式以及Smarty模板引擎的使用
Linux 全局安装 composer 将目录切换到/usr/local/bin/目录 cd /usr/local/bin/ 在 bin 目录中下载 composer curl -sS https:/ ...
- JS:获取标签的6个方法+获取html+获取body
JS 获取标签的方法 通过class: document.getElementsByClassName('class名');返回数组通过name: document.getElementsByName ...
- PAT (Advanced Level) Practice 1005 Spell It Right (20 分) (switch)
Given a non-negative integer N, your task is to compute the sum of all the digits of N, and output e ...
- Linux connect: Network is unreachable
在虚拟机中ping,发现网络不通: [root@node01 ~]# ping 114.114.114.114 connect: Network is unreachable 发生此问题时,环境如下: ...
- arm的字节对齐问题总结(转)
问题由来:pc的lsb总是0,因为代码至少要字对齐.cm3的指令至少是半字对齐的(16) 一.啥是字对齐?为啥要字对齐? 现代计算机中内存空间都是按照byte划分的,从理论上讲似乎对任何类型的变量的访 ...