Categorical Data
This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.
In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, ...) are not possible.
All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.
The categorical data type is useful in the following cases:
- A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory, see here.
- The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order, see here.
- As a signal to other python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
概括:Categorical Data数据类型就类似“性别”、“血型”、“班级”等,只能是一些固定的“值“。Categorical Data可以有不同级别,但是不能用于数值计算。
Categorical Data的更多相关文章
- Pandas的Categorical Data
http://liao.cpython.org/pandas15/ Docs » Pandas的Categorical Data类型 15. Pandas的Categorical Data panda ...
- Pandas的Categorical Data类型
pandas从0.15版开始提供分类数据类型,用于表示统计学里有限且唯一性数据集,例如描述个人信息的性别一般就男和女两个数据常用'm'和'f'来描述,有时也能对应编码映射为0和1.血型A.B.O和AB ...
- [论文]A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
http://www.cnblogs.com/Azhu/p/4137131.html 这篇论文建议先看了上面这一遍,两篇作者是一样的,方法也一样,这一片论文与上面的不同点在于,使用的数据集是目录数据, ...
- Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization
factoextra is an R package making easy to extract and visualize the output of exploratory multivaria ...
- pandas入门10分钟——serries其实就是data frame的一列数据
10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...
- MAT 4378 – MAT 5317, Analysis of categorical
MAT 4378 – MAT 5317, Analysis of categorical data, Assignment 3 1MAT 4378 – MAT 5317, Analysis of ca ...
- Data Visualization and D3.js 笔记(1)
课程地址: https://classroom.udacity.com/courses/ud507 什么是数据可视化? 高效传达一个故事/概念,探索数据的pattern 通过颜色.尺寸.形式在视觉上表 ...
- 第七章 人工智能,7.6 DNN在搜索场景中的应用(作者:仁重)
7.6 DNN在搜索场景中的应用 1. 背景 搜索排序的特征分大量的使用了LR,GBDT,SVM等模型及其变种.我们主要在特征工程,建模的场景,目标采样等方面做了很细致的工作.但这些模型的瓶颈也非常的 ...
- 10分钟学习pandas
10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...
随机推荐
- Retrofit RestAdapter 配置说明
RestAdapter.Builder builder = new RestAdapter.Builder(); builder.setEndpoint(ip地址 ...
- 用Vue来实现音乐播放器(十五):处理得到的歌手数据
之前得到的歌手数据是用forEach遍历添加的 没有顺序性 我们希望得到的数据是title:"热门"的数据在最上面 title为字母的数据按字母从低到高顺序排列 var ho ...
- 解决ubuntu18.04使用vi编辑器方向键错乱
1.编辑 vimrc.tiny 文件 vi /etc/vim/vimrc.tiny 2.修改下述内容 修改 set compatible 为 set nocompatible 添加 set backs ...
- error: exportArchive: You don’t have permission to save the file “HelloWorld.ipa” in the folder “HelloWorld”.
成功clean环境和生成archive文件之后,最后一步导出ipa包,遇到了权限问题: you don’t have permission to save the file “HelloWorld.i ...
- c# Thread——1.为什么Abort中断线程是不可靠的
Thread.Abort 方法在c#中用作强制中断线程的执行,大多用于线程内部满足某个特定条件而自己调用关闭自身,比如下面的代码在i自增到3的时候就会停止打印. class Program { sta ...
- MySQL-mysql 8.0.11安装教程 windows
网上的教程有很多,基本上大同小异.但是安装软件有时就可能因为一个细节安装失败.我也是综合了很多个教程才安装好的,所以本教程可能也不是普遍适合的. 安装环境:win7 1.下载zip安装包: MySQL ...
- Policy Improvement and Policy Iteration
From the last post, we know how to evaluate a policy. But that's not enough, because the purpose of ...
- Linux apt-get命令的基本使用
学习笔记,如有侵权,立即删除! 什么是apt-get ? Ubuntu源自Debian Linux.Debian使用dpkg打包系统.包装系统是一种为安装提供程序和应用程序的方法.这样,您就不必从源代 ...
- “希希敬敬对”队软件工程第九次作业-beta冲刺第一次随笔
队名: “希希敬敬对” 龙江腾(队长) 201810775001 杨希 201810812008 何敬上 201810812004 今日讨论会议照片一张: 每个人 ...
- 多线程07-Monitor.TryEnter
); Console.WriteLine())) { Console.WriteLine ...