What is the difference between categorical, ordinal and interval variables?

In talking about variables, sometimes you hear variables being described as categorical (or sometimes nominal), or ordinal, or interval.  Below we will define these terms and explain why they are important.

Categorical

A categorical variable (sometimes called a nominal variable) is one that has two or more categories, but there is no intrinsic ordering to the categories.  For example, gender is a categorical variable having two categories (male and female) and there is no intrinsic ordering to the categories.  Hair color is also a categorical variable having a number of categories (blonde, brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to lowest.  A purely categorical variable is one that simply allows you to assign categories but you cannot clearly order the variables.  If the variable has a clear ordering, then that variable would be an ordinal variable, as described below.

Ordinal

An ordinal variable is similar to a categorical variable.  The difference between the two is that there is a clear ordering of the variables.  For example, suppose you have a variable, economic status, with three categories (low, medium and high).  In addition to being able to classify people into these three categories, you can order the categories as low, medium and high. Now consider a variable like educational experience (with values such as elementary school graduate, high school graduate, some college and college graduate). These also can be ordered as elementary school, high school, some college, and college graduate.  Even though we can order these from lowest to highest, the spacing between the values may not be the same across the levels of the variables.  Say we assign scores 1, 2, 3 and 4 to these four levels of educational experience and we compare the difference in education between categories one and two with the difference in educational experience between categories two and three, or the difference between categories three and four. The difference between categories one and two (elementary and high school) is probably much bigger than the difference between categories two and three (high school and some college).  In this example, we can order the people in level of educational experience but the size of the difference between categories is inconsistent (because the spacing between categories one and two is bigger than categories two and three).  If these categories were equally spaced, then the variable would be an interval variable.

Interval

An interval variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.  For example, suppose you have a variable such as annual income that is measured in dollars, and we have three people who make $10,000, $15,000 and $20,000. The second person makes $5,000 more than the first person and $5,000 less than the third person, and the size of these intervals  is the same.  If there were two other people who make $90,000 and $95,000, the size of that interval between these two people is also the same ($5,000).

Why does it matter whether a variable is categoricalordinal or interval?

Statistical computations and analyses assume that the variables have a specific levels of measurement.  For example, it would not make sense to compute an average hair color.  An average of a categorical variable does not make much sense because there is no intrinsic ordering of the levels of the categories.  Moreover, if you tried to compute the average of educational experience as defined in the ordinal section above, you would also obtain a nonsensical result.  Because the spacing between the four levels of educational experience is very uneven, the meaning of this average would be very questionable.  In short, an average requires a variable to be interval.  Sometimes you have variables that are "in between" ordinal and interval, for example, a five-point likert scale with values "strongly agree", "agree", "neutral", "disagree" and "strongly disagree".  If we cannot be sure that the intervals between each of these five values are the same, then we would not be able to say that this is an interval variable, but we would say that it is an ordinal variable.  However, in order to be able to use statistics that assume the variable is interval, we will assume that the intervals are equally spaced.

Does it matter if my dependent variable is normally distributed?

When you are doing a t-test or ANOVA, the assumption is that the distribution of the sample means are normally distributed.  One way to guarantee this is for the distribution of the individual observations from the sample to be normal.  However, even if the distribution of the individual observations is not normal, the distribution of the sample means will be normally distributed if your sample size is about 30 or larger.  This is due to the "central limit theorem" that shows that even when a population is non-normally distributed, the distribution of the "sample means" will be normally distributed when the sample size is 30 or more, for example see Central limit theorem demonstration .

If you are doing a regression analysis, then the assumption is that your residuals are normally distributed.  One way to make it very likely to have normal residuals is to have a dependent variable that is normally distributed and predictors that are all normally distributed, however this is not necessary for your residuals to be normally distributed.  You can see Regression with SAS: Chapter 2 - Regression DiagnosticsRegression with SAS: Chapter 2 - Regression Diagnostics, or Regression with SAS: Chapter 2 - Regression Diagnostics

【转】The difference between categorical(Nominal ), ordinal and interval variables的更多相关文章

  1. CATEGORICAL, ORDINAL AND INTERVAL VARIABLES

    WHAT IS THE DIFFERENCE BETWEEN CATEGORICAL, ORDINAL AND INTERVAL VARIABLES? In talking about variabl ...

  2. 关于使用sklearn进行数据预处理 —— 归一化/标准化/正则化

    一.标准化(Z-Score),或者去除均值和方差缩放 公式为:(X-mean)/std  计算时对每个属性/每列分别进行. 将数据按期属性(按列进行)减去其均值,并处以其方差.得到的结果是,对于每个属 ...

  3. Readability Assessment for Text Simplification -paper

    https://pdfs.semanticscholar.org/e43a/3c3c032cf3c70875c4193f8f8818531857b2.pdf 1.introduction在Brazil ...

  4. Omnibus test

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: FINAL

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  6. SAS-决策树模型

    决策树是日常建模中使用最普遍的模型之一,在SAS中,除了可以通过EM模块建立决策树模型外,还可以通过SAS代码实现.决策树模型在SAS系统中对应的过程为Proc split或Proc hpsplit, ...

  7. [数据可视化之一]Pandas单变量画图

    Pandas单变量画图 Bar Chat Line Chart Area Chart Histogram df.plot.bar() df.plot.line() df.plot.area() df. ...

  8. Parametric Statistics

    1.What are “Parametric Statistics”? 统计中的参数指的是总体的一个方面,而不是统计中的一个方面,后者指的是样本的一个方面.例如,总体均值是一个参数,而样本均值是一个统 ...

  9. Memcached缓存瓶颈分析

    Memcached缓存瓶颈分析 获取Memcached的统计信息 Shell: # echo "stats" | nc 127.0.0.1 11211 PHP: $mc = new ...

随机推荐

  1. webform组合查询和分页

    1.组合查询(1)数据访问类 //参数1:SQL语句 参数2:哈希表public List<Users> chas(string s,Hashtable has) { List<Us ...

  2. lucene中模糊搜索的应用场景

    模糊搜索:wildCardQuery的时候,是用* 表示全部,?表示一个字符,那么直接用*搜索,就能查到当前索引文件的全部数据个数 这里搜索查到的个数和用工具查到的个数是一致的...

  3. 第2章 C#中的泛型

    2.1 理解泛型2.1.1 为什么要有泛型 并不一定要使用字符T作为类型参数的名称,也可以使用其他的字符,但习惯上使用T. 2.1.2 类型参数约束什么是“向下的强制转换(downcast)”?因为O ...

  4. MySql表名的大小写问题

    MySQL在Linux下数据库名.表名.列名.别名大小写规则是这样的: 1.数据库名与表名是严格区分大小写的: 2.表的别名是严格区分大小写的: 3.列名与列的别名在所有的情况下均是忽略大小写的: 4 ...

  5. 揭开HTTP网络协议神秘面纱系列(二)

    HTTP报文内的HTTP信息 HTTP协议交互的信息被称为HTTP报文,请求端的HTTP报文叫做请求报文,响应端的叫做响应报文. HTTP为了提升传输速率,其在传输数据时,按照数据原样进行压缩传输,相 ...

  6. TF-IDF 相关概念

    概念 TF-IDF是一种统计方法,用以评估一字词对于一个文件集或一个语料库中的其中一份文件的重要程度. TF-IDF加权的各种形式常被搜索引擎应用,作为文件与用户查询之间相关程度的度量或评级. 词频( ...

  7. ubuntu工具积累

    1.sudo apt-get install terminator一款可以切分终端窗口的工具 a.在系统>键盘>快捷键修该ctrl+alt+t快捷应用为terminator,其他的快捷键同 ...

  8. Html/Css(新手入门第二篇)

    一.在实际工作中,都是一个团队在做项目,不是一个人在工作.多人协作,就是每个团队都有自己 的命名习惯.1.css选择符命名,规范.2.都有命名规范文档. 二.css选择符作用:指定css样式所作用对象 ...

  9. Linux部署apache

    一.我们使用源码安装 官网:https://httpd.apache.org/文档:https://httpd.apache.org/docs/2.4/ 下载源码包 httpd-2.4.20.tar. ...

  10. eclipse 导入Maven项目的问题

      http://my.oschina.net/wiselyming/blog/164470