Descriptive Statistics

Statistics are used by researchers to describe data and relationships among data. For example, if one asks how a class of students did on a test, it would be inefficient to communicate each and every score. Rather, a more efficient method is to provide the average to give a general indication of how the students performed on the test. Below are various methods of describing data (Descriptive Statistics) and of modeling relationships among variables (Inferential Statistics).

Descriptive statistics are used to describe data in a concise, understandable way. Descriptive statistics are summary indicators of larger groups of data. The example above illustrates how descriptive statistics may be used to reduce large amounts of information into a few summary indicators--thus reducing class scores to a class average. Two important summary methods for data are measures of central tendency (typical or average scores) and measures of dispersion (variability or spread of scores).

Measures of central tendency are indicators of average or typical scores one might find in a distribution of scores. The three most common measures of central tendency are mode, median, and mean.

(a) Mode (Symbolized as Mo): This is the most frequent score in a distribution. In the following set of scores (1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5), there are two scores of 1, three scores of 2, five scores of 3, one score of 4, and four scores of 5. The mode for this set is 3.

If in a class there are 7 male and 13 female students, then for the variable sex, females would be the mode.
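As a minimal sketch, the mode for both examples can be found by counting how often each value occurs. The list of scores is rebuilt from the counts given in the text, and the variable names are only illustrative.

```python
from collections import Counter

# Scores rebuilt from the counts given in the text.
scores = [1] * 2 + [2] * 3 + [3] * 5 + [4] * 1 + [5] * 4

# Count how often each score occurs, then keep the most frequent one.
frequencies = Counter(scores)
mode_score, mode_count = frequencies.most_common(1)[0]
print(mode_score, mode_count)

# The mode also works for nominal data such as sex.
sexes = ["male"] * 7 + ["female"] * 13
mode_sex = Counter(sexes).most_common(1)[0][0]
print(mode_sex)
```

Because the mode only counts occurrences, it is the one measure of central tendency here that can be used with nominal data.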

(b) Median (Symbolized as Md, Mdn, or X₅₀): The score directly in the middle for all scores in rank order; the point at which 50% of the scores are above, and 50% are below. For example, for the scores 5, 3, 7, 6, 9, 1, 4, first place them in rank order (1, 3, 4, 5, 6, 7, 9). The median is the score that falls in the middle of the distribution, which is 5 in this example.

If one has an even number of scores, the median is the mean (arithmetic average) of the two middle scores. For these scores, 2, 1, 3, 10, the two middle scores in rank order (1, 2, 3, 10) are 2 and 3, so the median is

Mdn = (2 + 3) / 2 = 2.5

so 2.5 is the median; exactly 50% of the scores fall below this score and 50% fall above it.

The median is a good measure for ordinal data, or for interval/ratio data when the distribution is highly skewed (e.g., income in the U.S. is positively skewed, so use Mdn). Skew means that there are a few very high scores or a few very low scores, and these extreme scores often affect the mean.
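The two median rules described above (odd and even numbers of scores) can be sketched in a few lines of Python. The function name is illustrative, and the scores are the two examples from the text.

```python
def median(scores):
    """Return the middle score, or the mean of the two middle scores."""
    ranked = sorted(scores)      # put the scores in rank order
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:               # odd number of scores: take the middle one
        return ranked[middle]
    # even number of scores: average the two middle scores
    return (ranked[middle - 1] + ranked[middle]) / 2

median_odd = median([5, 3, 7, 6, 9, 1, 4])
median_even = median([2, 1, 3, 10])
print(median_odd, median_even)
```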

(c) Mean (Symbolized as M): The arithmetic average of the scores. The formula is

M = ΣXi / n

where Xi represents the raw scores, n is the sample size (the number of scores), and Σ means to sum the scores (to add all the scores together). In words, the mean is simply the sum of all scores divided by the number of scores. For example, for this set of scores (1, 2, 3, 10) the mean is:

M = (1 + 2 + 3 + 10) / 4 = 16 / 4 = 4

This measure is best used for ratio or interval data, but is often okay with ordinal data. It is not appropriate for nominal data since the mean assumes rank and nominal data do not have rank.
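A short sketch of the mean for the scores 1, 2, 3, 10, shown next to the median of the same scores. It illustrates how a single extreme score (the 10) pulls the mean above the median; the variable names are illustrative.

```python
scores = [1, 2, 3, 10]

# Mean: sum of all scores divided by the number of scores.
mean_score = sum(scores) / len(scores)

# Median of the same four scores: average the two middle scores in rank order.
ranked = sorted(scores)
median_score = (ranked[1] + ranked[2]) / 2

print(mean_score, median_score)
```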

For the scores given in the frequency display above (see Table 1: Frequency Distribution for Test Scores), the mean, median, and mode are:

A measure of variability provides some indication of the dispersion or spread of scores in a distribution. Note that central tendency indicates typical or average scores, and variability indicates spread of scores. For example, consider the following two sets of scores:

Both sets have the same mean and median (M = 80, Mdn = 80), yet they have very different spread or dispersion. Set A has no variability at all, while Set B has much variability (no two scores are the same).

There are several measures of variability that help to show differences in variability like that found in Sets A and B above. Several of these measures are provided below.

The range is the quickest and easiest measure of dispersion to calculate. It is simply the difference between the largest and smallest scores in the distribution of scores, i.e.,

R = Xmax - Xmin

For example, with Set B, Xmax = 100, Xmin = 60, so the range is R = 100 - 60 = 40, or a 40 point spread. The range for Set A is R = 80 - 80 = 0, or no spread.
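The range calculation can be sketched as follows. The individual scores of Sets A and B are not listed here, so the values below are assumptions chosen only to match their description (Set A has all scores equal to 80; Set B runs from 60 to 100 with a mean and median of 80).

```python
def score_range(scores):
    """Range: largest score minus smallest score."""
    return max(scores) - min(scores)

# Illustrative values only: chosen to match the description of Sets A and B.
set_a = [80, 80, 80, 80, 80]
set_b = [60, 70, 80, 90, 100]

range_a = score_range(set_a)
range_b = score_range(set_b)
print(range_a, range_b)
```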

The problem with the range is that it only considers two numbers in the distribution, the highest and lowest score. Does the range adequately address variability for the following two sets?

Note that for both Sets C and D, the range is R = 80 - 70 = 10, or 10 points, yet the numbers suggest that Set D has more variability because no two numbers are the same while in Set C, there are only three unique numbers, 70, 75, and 80.

What is needed is a measure of variability that takes into account all numbers in the data, not just the two extreme numbers.

The standard deviation is more complex than the range and it provides a more useful indication of variability in a set of scores. The formula will not be discussed, but you should note that the standard deviation, like the range, cannot be less than zero (i.e., 0.00), and the larger the standard deviation, the greater the variability in a set of scores.

Using Sets C and D from above, the standard deviations are SD = 3.16 for Set C, and SD = 3.74 for Set D. Set D has the larger standard deviation, and this indicates that scores in Set D have more variability than scores in Set C.

As another example, listed below are two sets of scores with the same measures of central tendency, but with different measures of variability.

As this table shows, the measures of central tendency are identical for both boys and girls, but vary dramatically in terms of measures of variability.
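As a rough check, the summary values for the girls' and boys' scores can be reproduced with Python's standard statistics module. The scores are taken from the Girls/Boys table in this document; the function name `summarize` is illustrative.

```python
import statistics

girls = [83, 84, 85, 85, 85, 86, 87]
boys = [75, 80, 85, 85, 85, 90, 95]

def summarize(scores):
    """Return the measures of central tendency and variability for one set."""
    return {
        "M": statistics.mean(scores),
        "Mdn": statistics.median(scores),
        "Mo": statistics.mode(scores),
        "SD": round(statistics.stdev(scores), 2),  # sample SD (divides by n - 1)
        "R": max(scores) - min(scores),
    }

girls_summary = summarize(girls)
boys_summary = summarize(boys)
print(girls_summary)
print(boys_summary)
```

Both groups share the same mean, median, and mode, while the SD and range differ, which is the point of the comparison.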

For this course one does not need to know how to calculate SD, but one should know that when two sets of scores taken from the same measuring device (e.g., SAT scores) are compared, the larger the SD, the more variability. SDs from differing measuring devices (e.g., IQ scores compared to SAT scores) are not comparable: to learn which of two sets of scores has more variability by comparing their SDs, both sets must come from the same measuring device. With IQ scores, the practical range falls between 50 and 150 points, while for a sub-scale of the SAT (such as verbal SAT) the range falls between 200 and 800 points. As these ranges show, the two instruments (IQ and SAT) differ greatly in their possible scores and are therefore not directly comparable. Thus, for IQ scores an SD = 15 may show just as much variation as an SD = 200 for SAT scores.

Consider two sets of scores, Set E (1, 3, 3, 3, 5) and Set F (1, 2, 3, 4, 5). If we use range as a measure of variability, the two sets have the same range: Set E range = 5 - 1 = 4 and Set F range = 5 - 1 = 4. We know, however, that the scores appear to show more variability in Set F than in Set E, because three of the scores for Set E are exactly the same while for Set F all scores are different. Here is where SD helps show the greater variability for Set F. To show this, below are worked tables showing calculation of SD for each set.

For Set E, the next step is to sum the squared deviation scores (4 + 0 + 0 + 0 + 4 = 8) and divide this sum by n - 1 (sample size minus 1). Here the sample size is 5 observations, so n - 1 = 5 - 1 = 4. Dividing gives 8 / 4 = 2.

This value of 2 is called the variance, a measure of variability that is important in advanced statistics.

To obtain SD, take the square root of the variance, thus SD = √ 2 = 1.41, so the SD for Set E = 1.41.

For Set F, the next step is again to sum the squared deviation scores (4 + 1 + 0 + 1 + 4 = 10) and divide this sum by n - 1 (sample size minus 1). Here the sample size is 5 observations, so n - 1 = 5 - 1 = 4. Dividing gives 10 / 4 = 2.5.

To obtain SD, take the square root of the variance, thus SD = √ 2.5 = 1.58, so the SD for Set F = 1.58.

These results demonstrate that the SD is a more sensitive measure of variability than the range since it takes into account all available scores, not just the maximum and minimum scores. Set F has more variability than Set E.
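The steps worked through above (deviation scores, squared deviation scores, division by n - 1, square root) can be sketched as a small Python function. The function name is illustrative, and the scores are Sets E and F from the text.

```python
import math

def variance_and_sd(scores):
    """Return the sample variance and standard deviation (dividing by n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    # Deviation score for each X is X - M; square each one.
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / (n - 1)
    return variance, math.sqrt(variance)

set_e = [1, 3, 3, 3, 5]
set_f = [1, 2, 3, 4, 5]

variance_e, sd_e = variance_and_sd(set_e)
variance_f, sd_f = variance_and_sd(set_f)
print(variance_e, round(sd_e, 2))
print(variance_f, round(sd_f, 2))
```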

Relative position refers to the location in a distribution of a given score relative to other scores. Relative position indicates how well one performed on a test relative to others.

The only measure of relative position discussed here will be Percentile Rank (PR). A percentile rank indicates the proportion or percentage of individuals who scored less than a given score. For example, if you receive a PR of 75, then this means that 75% of those who took the test scored less than you. It does not mean that you got 75% of the items correct on the test. If you had a PR of 4, this means you scored better than 4% of those who took the test.

Note also that some define percentile rank as representing the percentage who scored at or below a given score. Thus, a PR of 75 means that one scored the same as or better than 75% of test takers.

Both ways of defining percentile rank ([a] scored better than or [b] scored equal to or better than) are used and commonly found in education.
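The two definitions of percentile rank can be compared with a brief sketch. The ten scores below are hypothetical and are used only for illustration; the function name and the `inclusive` parameter are likewise assumptions.

```python
def percentile_rank(scores, score, inclusive=False):
    """Percentage of scores below `score`, or at or below it if inclusive."""
    if inclusive:
        count = sum(1 for x in scores if x <= score)   # definition [b]
    else:
        count = sum(1 for x in scores if x < score)    # definition [a]
    return 100 * count / len(scores)

# Hypothetical scores for ten test takers.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

pr_below = percentile_rank(scores, 85)
pr_at_or_below = percentile_rank(scores, 85, inclusive=True)
print(pr_below, pr_at_or_below)
```

With a small group the two definitions can give noticeably different values for the same score, so it helps to know which definition a test report uses.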

Another efficient method for presenting and describing data is the use of graphs. Graphs provide pictorial displays that enable one to understand the distribution of scores more readily. A few commonly used graphs are described below.

Frequency distributions are used to indicate how many times a particular value was obtained in a set of scores. For example, consider the following scores obtained on a test:

X (raw score)	F (frequency of score)
88	2
87	1
86	1
85	3
84	1
83	2

As this frequency distribution shows, the most common score was 85, and the least frequent scores were 84, 86, and 87. Frequency distributions will work with data from any type of variable (nominal, ordinal, interval, or ratio). To illustrate, one could make a frequency distribution for the sexes enrolled in a course. If there are 11 women and 5 men, the frequency distribution would list Female with a frequency of 11 and Male with a frequency of 5.
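A frequency distribution like the ones described above can be built by counting values. In this sketch the raw scores are expanded from the frequency table of test scores, and the variable names are illustrative.

```python
from collections import Counter

# Raw scores expanded from the frequency table of test scores.
scores = [88, 88, 87, 86, 85, 85, 85, 84, 83, 83]
frequency = Counter(scores)

# Print from highest to lowest score, tab-separated like the table.
print("X\tF")
for score in sorted(frequency, reverse=True):
    print(f"{score}\t{frequency[score]}")

# The same approach works for a nominal variable such as sex.
sex_frequency = Counter(["Female"] * 11 + ["Male"] * 5)
print(sex_frequency)
```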

As another example, consider these final course average grades in introductory educational research.

The statistical program Stata (www.stata.com) was used to create the following displays.

While this frequency display shows all scores, it may still be easier to comprehend these scores if they are grouped by letter grade ranges, thus 60 to 69 is D, 70 to 79 is C, and so on. This grouped frequency display allows one to quickly determine performance levels of the class by grade ranges. For example, 5 students earned A's (5 students in the average grade range of 90 to 99), 10 students earned B's, etc.

As this grouped frequency display shows, it is now easier to see student performance by general grades. About 37% earned B, 18% earned A, and the rest earned C and D.
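Grouping scores by letter grade range can be sketched as below. The course averages are hypothetical (they are not the class data described above), and treating scores below 60 as F is an assumption, since the text does not discuss that range.

```python
from collections import Counter

def letter_grade(score):
    """Map a course average to a letter grade using ten-point ranges."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"  # assumption: the text does not discuss scores below 60

# Hypothetical course averages, not the class data described in the text.
averages = [64, 65, 67, 69, 72, 75, 78, 81, 83, 85, 86, 88, 91, 95]

grouped = Counter(letter_grade(a) for a in averages)
percent = {grade: round(100 * count / len(averages))
           for grade, count in grouped.items()}
print(grouped)
print(percent)
```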

Similar to frequency displays are stem-and-leaf displays. These displays show frequency by building leaves from the raw scores. The longer the leaf, the greater the frequency associated with that set of scores. Reading the display below, note that one student scored a 64, which is displayed as 6*| 4, and three students scored between 65 and 69. Their specific scores were 65, 67, and 69, and these are displayed as 6. | 579.

This version shows scores grouped by letter grade ranges (e.g., 60 to 69, 70 to 79, etc.).
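A stem-and-leaf display grouped by tens can be sketched in Python as follows. The scores 64, 65, 67, and 69 come from the text; the remaining scores are hypothetical, and the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group two-digit scores by tens digit (stem); ones digits are the leaves."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        leaves[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves[stem])}" for stem in sorted(leaves)]

# 64, 65, 67, and 69 come from the text; the other scores are hypothetical.
scores = [64, 65, 67, 69, 72, 75, 78, 81, 83, 85, 86, 88, 91, 95]

display = stem_and_leaf(scores)
for row in display:
    print(row)
```

Each row's length reflects the frequency for that range of scores, which is why the display can be read much like a sideways bar chart.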

Other commonly used graphical tools are bar charts and histograms. Both are similar; the only difference is that a bar chart is used for qualitative data (so the bars usually do not touch, thus indicating a lack of continuity), while the histogram is used with quantitative data (the bars may touch). Examples of both are provided below.

Below is another example of a bar chart showing sex distribution for educational research.

For comparison purposes, these data are shown as a frequency display in the figure below.

Similar to bar charts and stem-and-leaf displays, histograms may be used to show frequency information for quantitative variables. The primary difference between histograms and bar charts is that histograms are designed for quantitative data, so the bars are allowed to touch when consecutive scores are presented (although folks sometimes don't create histograms with touching bars). When gaps are present between bars in a histogram, that signals a frequency of zero for that particular score. Below is an example of a histogram for student grades. Smooth histograms are often used to present distributional shapes such as normal, F, t, chi-square, etc.

Pie charts are another commonly used tool to display distributions. Pie charts can be used to display both qualitative and quantitative data. The first example below shows course letter grade distribution and the second shows sex distribution.

Box-and-whisker plots (box plots) are designed to show performance ranges for groups and several summary indicators of data.

In the example given in the figure above, for females (indicated by the left box and whisker), the bottom of the box shows the score at the 25th percentile (symbolized as P₂₅ [for percentile 25] or Q₁, which is roughly equivalent to a score of 74 in this sample); the top of the box is the 75th percentile (P₇₅ or Q₃, a score in this sample of about 88); and the thick line in the middle of the box represents the median (50th percentile, P₅₀ or Q₂). Note that the box is designed to describe the middle 50% of scores in the distribution.

Comparing the two boxes, we can see that males have typical performance that is below that of comparable females. The median (50th percentile) for males is a score of less than 80, while for females it is greater than 80. Similar interpretations exist for the other percentile markers.

The whiskers extending from the box may represent several different things depending upon how they are implemented in a given software package. For the example listed above, the whiskers appear to show the upper and lower range for the sample of scores. The bottom whisker shows the lower range for the distribution of scores (a lowest score of about 64), and the top whisker shows the upper range for the distribution of scores (a top score of about 96).

In some software applications, whiskers extend to P₁₀ and P₉₀, and any scores beyond this range are represented as dots. The second box plot below illustrates a score, denoted by the black dot, that extends below the range of P₁₀. This score would be considered an outlier (an extreme score that deviates from other scores).
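The summary values that a box plot displays (lowest score, Q₁, median, Q₃, highest score) can be sketched with Python's standard statistics module. The scores below are the boys' scores from the Girls/Boys table in this document, not the data behind the box plot figures. Software packages compute quartiles in slightly different ways; this sketch uses the default method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(scores):
    """Return the values a box plot with range whiskers would display."""
    # n=4 splits the data into quartiles; the default method is "exclusive".
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return {"min": min(scores), "Q1": q1, "Mdn": q2, "Q3": q3, "max": max(scores)}

# Boys' scores from the Girls/Boys table (not the box plot data).
boys = [75, 80, 85, 85, 85, 90, 95]

summary = five_number_summary(boys)
print(summary)
```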

A time-series graph displays figures over time. Time-series graphs are often used in single-subject research. For more examples, see the section Single Subject Research.

Scatterplots are useful for displaying the nature of the relation between two quantitative variables. As the first scatterplot shows, there is a positive, linear trend between scores from tests 1 and 2 in educational research during the summer of 2003. Students who did well on test 1 tended also to perform well on test 2; similarly, those who performed poorly on test 1 also tended to perform poorly on test 2. There are, however, several exceptions to this trend. Note the dot highlighted by the arrow. This student scored just under 60 for test 1, but scored over 80 for test 2.

The next scatterplot, displayed below, shows information pertaining to student performance on a test in educational research. The two variables considered are the average number of seconds spent per item completing the test and test score. The scatter of data to the right of the graph shows a slight positive relation between time spent on items and test score. Generally speaking, those students who spent more time per item tended to perform better on the test, although this pattern is not strong. Most students took between 120 and 195 seconds to answer each item (that is, 2 to 3.25 minutes per item). There is one very clear exception to this pattern: a student who spent an average of 38 seconds per item and scored 98% correct on this test. This student's performance represents what is known as an outlier, an observation that is clearly discrepant from other observations (data) in the distribution of scores.

The next scatterplot displays data from an agricultural experiment in which grapefruit were treated with two types of fungicides and with varying amounts of the active ingredient (copper). The outcome of interest is the severity of the infection on grapefruit. The line in the graph represents a prediction line and can be used to estimate the change in severity of infection according to differing amounts of copper used.

Set E: calculation of deviation scores and squared deviation scores
Scores (X)	Mean (M)	Score - Mean (X - M); these are called deviation scores	Squared deviation scores: (X - M)²
1	3	1 - 3 = -2	-2 * -2 = 4
3	3	3 - 3 = 0	0 * 0 = 0
3	3	3 - 3 = 0	0 * 0 = 0
3	3	3 - 3 = 0	0 * 0 = 0
5	3	5 - 3 = 2	2 * 2 = 4
Sum			8

Set F: calculation of deviation scores and squared deviation scores
Scores (X)	Mean (M)	Score - Mean (X - M); these are called deviation scores	Squared deviation scores: (X - M)²
1	3	1 - 3 = -2	-2 * -2 = 4
2	3	2 - 3 = -1	-1 * -1 = 1
3	3	3 - 3 = 0	0 * 0 = 0
4	3	4 - 3 = 1	1 * 1 = 1
5	3	5 - 3 = 2	2 * 2 = 4
Sum			10

Scores	Range (R)	Standard Deviation (SD)
Set E: 1, 3, 3, 3, 5	4	1.41
Set F: 1, 2, 3, 4, 5	4	1.58

Sex	Frequency
Female	11
Male	5

	Girls	Boys
Scores	83	75
	84	80
	85	85
	85	85
	85	85
	86	90
	87	95
M	85	85
Mdn	85	85
Mo	85	85
SD	1.29	6.45
R	4	20