# Overview

The goal in measuring central tendency is to describe a group of individuals' scores by a single measurement that is most representative of all the scores.

Central Tendency
Central tendency is a statistical measure that identifies a single score as representative of an entire distribution of scores. The goal of central tendency is to find the single score that is most typical or most representative of the entire distribution.

Unfortunately, there is no single, standard procedure for determining central tendency. The problem is that there is no single measure that will always produce a central, representative value in every situation. For example, where are the "centers" of the following distributions?

Compute
Compute the ViSta visualization and summary statistics for these distributions. Can you get the histograms to look like these figures? They were made with ViSta! Hint: Click on the Histogram window, then use the Histogram menu to change the number of "bins" (bars).

Measures of Central Tendency
There are three main measures of central tendency:
• The Mean (and the Weighted Mean)
• The Median
• The Mode
For the distributions shown above, ViSta reports the following summary statistics:

Note that the Mean and Median values are not equal for all three distributions, and that the mode is not reported.

We discuss these three measures in this lecture.

# The Mean

Arithmetic Average
The mean is commonly known as the arithmetic average.
Level of measurement:
The mean can only be used for variables at the interval or ratio levels of measurement.
Definition:
The mean is the sum of all the scores in the distribution divided by the total number of scores in the distribution:
`mean = ΣX / N`
Example:
The mean of [2 6 2 10] is
(2 + 6 + 2 + 10)/4 = 20/4 = 5
Balance Point
You can think of the mean as the balance point of a distribution (the center of gravity). It balances the distances of observations to the mean.

That is, the sum of the distances that observations are below the mean equals the sum of the distances that observations are above the mean.

In the example,
X = 2 is 3 points below the mean of 5
X = 2 is 3 points below the mean of 5
X = 6 is 1 point above the mean of 5
X = 10 is 5 points above the mean of 5
So the sum of the distances that observations are below the mean is 6, as is the sum of the distances that observations are above the mean.
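
The balance-point property can be checked with a few lines of Python. This is only a sketch using the example scores above; the variable names are chosen for illustration.

```python
scores = [2, 6, 2, 10]

# Mean: sum of all scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Total distance of the scores below the mean and above the mean.
below = sum(mean - x for x in scores if x < mean)
above = sum(x - mean for x in scores if x > mean)

print(mean)          # 5.0
print(below, above)  # 6.0 6.0
```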

# The Weighted Mean

Definition:
The weighted mean is the sum of all the scores in the distribution, as weighted by their frequency, divided by the total number of scores in the distribution:
`weighted mean = Σ(f*X) / Σf`
Level of measurement:
The mean can only be used for variables at the interval or ratio levels of measurement.

Example:
Consider the following frequency distribution table:
```
_________________
X   f   cf    c%
5   1   20  100%
4   5   19   95%
3   8   14   70%
2   4    6   30%
1   2    2   10%
```
The weighted mean is:
(5*1 + 4*5 + 3*8 + 2*4 + 1*2) / (1 + 5 + 8 + 4 + 2)
which is
(5 + 20 + 24 + 8 + 2)/20 = 59/20 = 2.95
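
The same calculation can be sketched in Python. The dictionary `freq` is an assumed representation of the X and f columns of the table, not something ViSta produces.

```python
# Frequency table from the example: score X -> frequency f.
freq = {5: 1, 4: 5, 3: 8, 2: 4, 1: 2}

total = sum(x * f for x, f in freq.items())  # each score weighted by its frequency
n = sum(freq.values())                       # total number of scores

weighted_mean = total / n
print(total, n)        # 59 20
print(weighted_mean)   # 2.95
```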

Compute
Compute the ViSta Summary Report for these data. It should look like this:

# The Median

Definition:
The median is the score that divides the distribution of scores exactly in half.

Exactly 50% of the individual scores in the distribution are at or below the median (and 50% are at or above the median).

The median is also the 50th percentile.

Level of measurement:
The median can be used for variables at the ordinal, interval or ratio levels of measurement.
Calculating the Median:
There is no formula to learn here! We follow these rules:
• When N is Odd: Find the middle score.
• When N is Even: Find the middle two scores.
• If the middle score(s) are not tied with other scores: The Median is either the middle score or the average (mean) of the middle two scores.
• If the middle score(s) are tied with other scores: Use interpolation to calculate the 50th percentile. The interpolation process is explained in chapter 2 of the textbook on pages 54-57. See the graphical approach on page 84 of the textbook.
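
For the untied case, the first three rules can be sketched as a small Python function. The name `simple_median` and the sample scores are made up for illustration; the function does not handle the tied case, which needs interpolation.

```python
def simple_median(scores):
    """Median for untied middle scores: middle score, or mean of the middle two."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # N is odd: the single middle score.
        return ordered[mid]
    # N is even: average the middle two scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(simple_median([3, 1, 7]))     # 3
print(simple_median([3, 1, 7, 5]))  # 4.0
```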
Example:
Consider calculating the median of the scores in the following frequency distribution table:
```
_________________
X   f   cf    c%
5   1   20  100%
4   5   19   95%
3   8   14   70%
2   4    6   30%
1   2    2   10%
```
There are 20 scores, so we find the middle two scores. These scores are both a score of 3. Since they are tied (both with themselves and with other scores), the third approach to calculating the median must be used. This involves finding the 50th percentile.

We see that the 50th percentile (the median) is in the interval between the cumulative percentages of 30% and 70%. This interval has real limits of 2.5 and 3.5, respectively.

The 50th percentile is 20 percentile points from the top edge of the interval, which is 40 percentile points wide. Thus, the 50th percentile is 20/40 = 1/2 of the total distance below the upper real limit.

The interval is exactly 1.00 units wide, so the 50th percentile is located at

```
[(70%-50%)/(70%-30%)]*1.00 = (20/40) * 1.00 = .50
```
units below the upper real limit, which is 1.00 - .50 = .50 units above the lower real limit. Thus, the median for the data in the table above is
`median = 2.5 + .5 = 3.0`
Note that we can see this just by looking at the frequency table: There are 6 scores below the central interval, and 6 above, so the median must be the middle of the central interval, which is just 3!
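
The interpolation can also be sketched in Python. The function `interpolated_median` is a hypothetical helper written for this example; it measures the distance up from the lower real limit, using counts rather than percentages, and assumes intervals one unit wide. The standard library function `statistics.median_grouped` is used only as a cross-check.

```python
import statistics

def interpolated_median(freq, width=1.0):
    """Interpolated 50th percentile from a {score: frequency} table."""
    n = sum(freq.values())
    half = n / 2
    below = 0  # number of scores below the current interval
    for x in sorted(freq):
        f = freq[x]
        if below + f >= half:
            lower_limit = x - width / 2
            # Fraction of the interval needed to reach the 50% point.
            return lower_limit + (half - below) / f * width
        below += f
    raise ValueError("empty frequency table")

freq = {5: 1, 4: 5, 3: 8, 2: 4, 1: 2}
median = interpolated_median(freq)
print(median)  # 3.0

# Cross-check against the raw scores expanded from the table.
raw = [x for x, f in freq.items() for _ in range(f)]
print(statistics.median_grouped(raw, interval=1))
```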

# The Mode

Definition:
The mode is the score or category of scores in a frequency distribution that has the greatest frequency. The mode is used to describe the typical score.
Level of measurement:
The mode can be used for variables at any level of measurement (nominal, ordinal, interval or ratio).
Calculating the Mode:
There is no formula to learn here! The rule is simply to find the score or category with the largest frequency.

Sometimes a distribution has more than one mode. Such a distribution is called multimodal. A distribution with two modes is called bimodal. Note that the modes do not have to have the same frequencies. The tallest peak is called the major mode; the other peaks are called minor modes.

Some distributions do not have modes. A rectangular distribution has no mode because it is flat. Some distributions have many peaks and valleys. They, too, do not have a mode.
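
Finding the mode is a matter of looking up the largest frequency, as this Python sketch shows. The `freq` table reuses the frequency distribution from the earlier examples, and the `bimodal` scores are made up. Note that `statistics.multimode` only reports scores tied for the greatest frequency, so it will not identify minor modes, and for a flat distribution it returns every score rather than reporting "no mode".

```python
import statistics

# Frequency table from the earlier examples: score X -> frequency f.
freq = {5: 1, 4: 5, 3: 8, 2: 4, 1: 2}

# The mode is the score with the largest frequency.
mode = max(freq, key=freq.get)
print(mode)  # 3

# Hypothetical scores with two equally frequent values (bimodal).
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))  # [1, 3]
```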

# Selecting A Central Tendency Measure

The Mean (weighted mean)
The mean is usually preferred because it uses every score in the distribution. Also, it is closely related to measures of variation we will be talking about next.
The Mode
The mode can be used with weaker levels of measurement (i.e., nominal and ordinal) than the mean.
The Median
The Median should be used, rather than the mean, when:
• There are some extreme scores in the distribution. The mean can be misleading when there are "outlying" scores, whereas the median is not affected by extreme scores. Consider, for example, salary levels.
• Some scores have undetermined values (missing data or data are at limits, such as response times).
• There is an open-ended distribution (such as when one has family sizes of, say, "5 or more").
• The data are at the ordinal level of measurement.
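
The effect of an extreme score can be illustrated with a short Python sketch. The salary figures are invented for this example (in thousands of dollars) and do not come from any real data set.

```python
import statistics

# Hypothetical salaries in thousands of dollars (illustrative values only).
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]  # one extreme salary added

# Without the outlier, the mean and median agree.
print(statistics.mean(typical), statistics.median(typical))

# With the outlier, the mean jumps while the median barely moves.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Here the single extreme salary raises the mean from 35 to nearly 96, while the median only moves from 35 to 36.5.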

# Central Tendency and Distribution Shape

Symmetric Distributions
Symmetric distributions (those where one side is the mirror image of the other) have a mean and median that have the same value. If the distribution is symmetric and unimodal, the mode also has the same value as the mean and median.
Skewed Distributions
Skewed distributions have different values for the mean, median and mode. For unimodal skewed distributions, the mean is pulled toward the tail, and the median is between the mean and mode.
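
The ordering of the three measures in a unimodal skewed distribution can be seen in a small Python sketch. The scores below are made up to have a long tail toward the high values (positive skew).

```python
import statistics

# Hypothetical positively skewed scores (long tail toward high values).
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score
mean = statistics.mean(scores)      # pulled toward the tail

print(mode, median, mean)
print(mode < median < mean)  # True
```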

Check out the HyperStat site. Pay particular attention to the first two chapters, especially the one on Describing Univariate Data.

Use the Histogram Explorer to get a better understanding of histograms and distributions. Follow the Basic Instructions given there. Use the Practice Guessing.

Try this interactive demonstration of the median as the minimizer of the sum of absolute errors and the mean as the minimizer of the sum of squared errors.
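
For readers without access to the demonstration, the same idea can be sketched in Python with a simple grid search. This is not the demonstration's own code: the scores, the candidate grid, and the function names `sse` and `sae` are all assumptions made for illustration.

```python
# Hypothetical scores with mean 5 and median 3.
scores = [1, 2, 3, 7, 12]

def sse(c):
    """Sum of squared errors when c is used to represent every score."""
    return sum((x - c) ** 2 for x in scores)

def sae(c):
    """Sum of absolute errors when c is used to represent every score."""
    return sum(abs(x - c) for x in scores)

# Candidate central values 0.0, 0.1, ..., 13.0.
candidates = [i / 10 for i in range(131)]

best_sse = min(candidates, key=sse)
best_sae = min(candidates, key=sae)

print(best_sse)  # 5.0, the mean
print(best_sae)  # 3.0, the median
```

An odd number of scores is used so that the value minimizing the sum of absolute errors is unique; with an even number of scores, every value between the middle two scores gives the same minimum.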