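Data measurement levels (Measurement Level) are usually divided into four types: nominal scale, ordinal scale, interval scale, and ratio scale.
Data on the first two scales are also called qualitative data, and data on the last two scales are also called quantitative data.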
Image source: https://github.com/antvis/g2/wiki/g2-data
Translated from: http://www.stat.wmich.edu/s216/book/node4.html
Measurement Levels of Data

It is useful to distinguish between four levels of measurement for data, from weakest to strongest:
1. Nominal (no ordering)
2. Ordinal (ordering exists, but not distance)
3. Interval (distance exists, but not ratios)
4. Ratio (ratios exist)

Sex is a nominal variable, since "Male" and "Female" are just names of categories. There is no intrinsic ordering between them. A student's level of standing (freshman, sophomore, junior, or senior) is ordinal; these are also names of categories but, unlike sex, they are rank-ordered. However, subtraction cannot be done and distances do not make sense. GPA is an interval measurement; subtraction can be done and distances make sense. For example, the distance from 2.3 to 2.4 is the same as the distance from 3.7 to 3.8. However, ratios do not make sense; is 4.0 "twice as high" as 2.0? The answer is no. The grading system would work just as well on the scale (A, B, C)=(5.0, 4.0, 3.0) instead of (4.0, 3.0, 2.0). Finally, number of credit hours is a ratio measurement. A student who has completed 90 credit hours has twice as many as a student with 45 credit hours, and 3 times as many as a student with 30 credit hours.

It is useful to recognize a hierarchy of information, in the sense that a measurement level contains an amount of information greater than or equal to the level below it. At lower levels of measurement, data analyses tend to be less sensitive and sophisticated. A statistical study should aim for the highest level of measurement possible or affordable.

Interval and ratio variables together are often called numerical variables because they provide a number which measures "quantity" (how much, how many) of something. Nominal and ordinal variables together are often called categorical variables because they classify into categories rather than count or measure. It is tempting to think of categorical variables as "non-numerical", but sometimes they do consist of numbers. For example, a "social security number" consists of numbers, but it is used as a label rather than as a quantity.
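
The hierarchy described above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the names `Level`, `NEW_OPERATION`, `allowed_operations`, and `is_numerical` are hypothetical, and the choice of "equality", "ordering", "difference", and "ratio" as the operation each level adds is one common way to summarize the four levels. The last lines replay the GPA argument: shifting the grading scale by one point keeps differences intact but changes ratios.

```python
from enum import IntEnum


class Level(IntEnum):
    """Measurement levels, ordered from weakest to strongest (hypothetical names)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The operation that first becomes meaningful at each level (an illustrative summary).
NEW_OPERATION = {
    Level.NOMINAL: "equality",
    Level.ORDINAL: "ordering",
    Level.INTERVAL: "difference",
    Level.RATIO: "ratio",
}


def allowed_operations(level):
    """Operations meaningful at `level`, including those inherited from weaker levels."""
    return [NEW_OPERATION[lv] for lv in Level if lv <= level]


def is_numerical(level):
    """Interval and ratio variables are numerical; nominal and ordinal are categorical."""
    return level >= Level.INTERVAL


# GPA example: move the scale from (A, C) = (4.0, 2.0) to (5.0, 3.0).
old_a, old_c = 4.0, 2.0
new_a, new_c = old_a + 1.0, old_c + 1.0

difference_preserved = (old_a - old_c) == (new_a - new_c)  # distances survive the shift
ratio_preserved = (old_a / old_c) == (new_a / new_c)       # ratios do not

print(allowed_operations(Level.INTERVAL))
print(difference_preserved, ratio_preserved)
```

Because `Level` is an `IntEnum`, the comparison `lv <= level` directly encodes the idea that each level carries at least as much information as the levels below it.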