This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.

In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, ...) are not possible.

All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.

The categorical data type is useful in the following cases:

  • A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory, see here.
  • The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order, see here.
  • As a signal to other python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).

概括:Categorical Data数据类型就类似“性别”、“血型”、“班级”等,只能是一些固定的“值“。Categorical Data可以有不同级别,但是不能用于数值计算。

Categorical Data的更多相关文章

  1. Pandas的Categorical Data

    http://liao.cpython.org/pandas15/ Docs » Pandas的Categorical Data类型 15. Pandas的Categorical Data panda ...

  2. Pandas的Categorical Data类型

    pandas从0.15版开始提供分类数据类型,用于表示统计学里有限且唯一性数据集,例如描述个人信息的性别一般就男和女两个数据常用'm'和'f'来描述,有时也能对应编码映射为0和1.血型A.B.O和AB ...

  3. [论文]A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

    http://www.cnblogs.com/Azhu/p/4137131.html 这篇论文建议先看了上面这一遍,两篇作者是一样的,方法也一样,这一片论文与上面的不同点在于,使用的数据集是目录数据, ...

  4. Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization

    factoextra is an R package making easy to extract and visualize the output of exploratory multivaria ...

  5. pandas入门10分钟——serries其实就是data frame的一列数据

    10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...

  6. MAT 4378 – MAT 5317, Analysis of categorical

    MAT 4378 – MAT 5317, Analysis of categorical data, Assignment 3 1MAT 4378 – MAT 5317, Analysis of ca ...

  7. Data Visualization and D3.js 笔记(1)

    课程地址: https://classroom.udacity.com/courses/ud507 什么是数据可视化? 高效传达一个故事/概念,探索数据的pattern 通过颜色.尺寸.形式在视觉上表 ...

  8. 第七章 人工智能,7.6 DNN在搜索场景中的应用(作者:仁重)

    7.6 DNN在搜索场景中的应用 1. 背景 搜索排序的特征分大量的使用了LR,GBDT,SVM等模型及其变种.我们主要在特征工程,建模的场景,目标采样等方面做了很细致的工作.但这些模型的瓶颈也非常的 ...

  9. 10分钟学习pandas

    10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...

随机推荐

  1. 清北学堂2019NOIP提高储备营DAY4

    今天只有一上午,讲的东西不多,这里就整理一下高精的东西,数论部分请见my blog 高精度: 先讲一讲进制问题:十进制的二进制表示:以10为例, 10的二进制表示为1010 10的三进制表示为101 ...

  2. (转)flexpaper 参数

    本文转载自:http://blog.csdn.net/z69183787/article/details/18659913 Flexpaper可能用到如下参数   SwfFile (String) 需 ...

  3. 阶段1 语言基础+高级_1-3-Java语言高级_06-File类与IO流_01 File类_7_File类创建删除功能的方法

    createNewFile() createNewFile抛出了异常 抛出了一个IO异常 所有我们调用方法的时候必须处理异常 throws这个异常 返回结果为true 最终创建好的文件 再次执行代码. ...

  4. Week 9 - 638.Shopping Offers - Medium

    638.Shopping Offers - Medium In LeetCode Store, there are some kinds of items to sell. Each item has ...

  5. vi, Java, Ant, Junit自学报告 - 实训week1

    vi, Java, Ant, Junit自学报告 2017软件工程实训 15331023 陈康怡 vi Vi是linux系统的标准文本编辑器,采用指令的方式进行操作,此处仅记录部分常用的指令. vi模 ...

  6. adbl连接不上 daemon not running. starting it now on port 5037 ADB server didn't ACK

     http://blog.csdn.net/prettyice2005/article/details/38682443 adbl连接不上 daemon not running. starting i ...

  7. C语言1-2019级秋季作业第一周作业

    1.你对软件工程专业或者计算机科学与技术专业了解是怎样? 软件工程专业是指对计算机的软件方面灵活掌控,开发软件的工程.软件工程其中会用到计算机科学.数学方面构建模型与算法:软件工程的目标就是开发出能够 ...

  8. linux之shell脚本

    1) 如何向脚本传递参数 ? ./script argument 例子: 显示文件名称脚本 ? 1 2 3 4 ./show.sh file1.txt cat show.sh #!/bin/bash ...

  9. idea 社区版本创建javaweb项目 使用tomcat

    1.创建maven  webapp项目 2.pom文件添加依赖及tomcat7-maven-plugin插件 <dependencies> <dependency> <g ...

  10. CodeForces.1174D.EhabandtheExpectedXORProblem(构造前缀异或和数组)

    题目链接 这道题比赛的时候没做出来,赛后补题的时候发现其实可以构造一个前缀异或和数组,然后根据初始化的第一个值进行填数,但是作为菜鸡的我虽然坚信自己的想法是正确的却想了很久也没有能够构造出来所谓的前缀 ...