- Overview
The goal in measuring central tendency is to describe a group
of individuals' scores by a single measurement that is most
representative of all the scores.
- Central Tendency
- Central tendency is a statistical measure that identifies
a single score as representative of an entire distribution
of scores. The goal of central tendency is to find the
single score that is most typical or most representative
of the entire distribution.
Unfortunately, there is no single, standard procedure
for determining central tendency. The problem is that
there is no single measure that will always produce
a central, representative value in every situation.
For example, where are the "centers" of the following
distributions?
- Compute
- Compute the ViSta visualization and summary statistics
for these distributions. Can you get the histograms to
look like these figures? They were made with ViSta! Hint:
Click on the Histogram window, then use the Histogram
menu to change the number of "bins" (bars).
- Measures of Central Tendency
- There are three main measures of central tendency:
- The Mean (and the Weighted Mean)
- The Median
- The Mode
For the distributions shown above, ViSta reports the following
summary statistics:
Note that the Mean and Median values are not equal for
all three distributions, and that the mode is not reported.
We discuss these three measures in this lecture.
- The Mean
- Arithmetic Average
- The mean is commonly known as the arithmetic average.
- Level of measurement:
- The mean can only be used for variables at the interval
or ratio levels of measurement.
- Definition:
- The mean is the sum of all the scores in the distribution
divided by the total number of scores in the distribution:
$$\bar{X} = \frac{\sum X}{N}$$
- Example:
- The mean of [2 6 2 10] is
(2 + 6 + 2 + 10)/4 = 20/4 = 5
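A minimal Python sketch of the definition, applied to the example scores. The function name `mean` is our own; Python's standard library also offers `statistics.mean`, which computes the same quantity.

```python
def mean(scores):
    """Arithmetic mean: the sum of all scores divided by the number of scores."""
    if len(scores) == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


print(mean([2, 6, 2, 10]))  # 5.0
```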
- Balance Point
- You can think of the mean as the balance point
of a distribution (the center of gravity). It balances
the distances of observations to the mean.
That is, the sum of the distances that observations
are below the mean equals the sum of the distances that
observations are above the mean.
In the example,
X = 2 is 3 points below the mean of 5
X = 2 is 3 points below the mean of 5
X = 6 is 1 point above the mean of 5
X = 10 is 5 points above the mean of 5
So the sum of the distances that observations are below
the mean is 6, as is the sum of the distances that observations
are above the mean.
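The balance-point property can be checked numerically. This sketch (the helper name `deviations` is hypothetical) computes each score's signed distance from the mean and totals the distances below and above it for the example scores.

```python
def deviations(scores):
    """Signed distance of each score from the mean (negative = below the mean)."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]


scores = [2, 6, 2, 10]
devs = deviations(scores)
below = -sum(d for d in devs if d < 0)  # total distance of scores below the mean
above = sum(d for d in devs if d > 0)   # total distance of scores above the mean
print(devs)          # [-3.0, 1.0, -3.0, 5.0]
print(below, above)  # 6.0 6.0
```

Because the two totals balance, the signed deviations always sum to zero.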
- The Weighted Mean
- Definition:
- The weighted mean is the sum of all the scores in the
distribution, each weighted by its frequency, divided
by the total number of scores in the distribution:
$$\bar{X} = \frac{\sum fX}{\sum f}$$
- Level of measurement:
- The weighted mean can only be used for variables at the interval
or ratio levels of measurement.
- Example:
- Consider the following frequency distribution table:
X    f    cf    c%
---  ---  ---  ----
5    1    20   100%
4    5    19    95%
3    8    14    70%
2    4     6    30%
1    2     2    10%
(X = score, f = frequency, cf = cumulative frequency,
c% = cumulative percentage)
The weighted mean is:
(5*1 + 4*5 + 3*8 + 2*4 + 1*2) / (1 + 5 + 8 + 4 + 2)
which is
(5 + 20 + 24 + 8 + 2)/20 = 59/20 = 2.95
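A minimal Python sketch of the weighted mean for the table above (the function name `weighted_mean` is our own). As a cross-check, it also expands the table back into the 20 individual scores and takes their ordinary mean, which must give the same answer.

```python
def weighted_mean(values, freqs):
    """Sum of each score multiplied by its frequency, divided by the total frequency."""
    total = sum(freqs)  # total number of scores, N
    if total == 0:
        raise ValueError("the frequencies sum to zero")
    return sum(x * f for x, f in zip(values, freqs)) / total


X = [5, 4, 3, 2, 1]
f = [1, 5, 8, 4, 2]
print(weighted_mean(X, f))  # 2.95

# Cross-check: the ordinary mean of the 20 individual scores is the same.
raw = [x for x, k in zip(X, f) for _ in range(k)]
print(sum(raw) / len(raw))  # 2.95
```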
- Compute
- Compute the ViSta Summary Report for these data. It
should look like this:
- The Median
- Definition:
- The median is the score that divides the distribution
of scores exactly in half.
Exactly 50% of the individual scores in the distribution
are at or below the median (and 50% are at or above
the median).
The median is also the 50th percentile.
- Level of measurement:
- The median can be used for variables at the ordinal,
interval or ratio levels of measurement.
- Calculating the Median:
- There is no formula to learn here! We follow these rules:
- When N is Odd: Find the middle score.
- When N is Even: Find the middle two scores.
- If the middle score(s) are not tied with other
scores: The Median is either the middle score
or the average (mean) of the middle two scores.
- If the middle score(s) are tied with other scores:
Use interpolation to calculate the 50th percentile.
The interpolation process is explained in chapter
2 of the textbook on pages 54-57. See the graphical
approach on page 84 of the textbook.
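The odd/even rule for untied middle scores can be sketched in Python as follows. The function name and the example scores are hypothetical; Python's `statistics.median` applies the same rule. This sketch does not apply the interpolation rule for tied middle scores.

```python
def median(scores):
    """Median by the simple rule: the middle score when N is odd, or the
    mean of the two middle scores when N is even.

    Does not apply the interpolation rule used for tied middle scores.
    """
    s = sorted(scores)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:  # N odd: a single middle score
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # N even: average the middle two


print(median([7, 1, 4]))      # 4
print(median([7, 1, 4, 10]))  # 5.5
```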
- Example:
- Consider calculating the median of the scores in the
following frequency distribution table:
X    f    cf    c%
---  ---  ---  ----
5    1    20   100%
4    5    19    95%
3    8    14    70%
2    4     6    30%
1    2     2    10%
There are 20 scores, so we find the middle two
scores (the 10th and 11th). Both of them are 3. Since they
are tied (both with each other and with other scores),
the interpolation rule for calculating the median must be used.
This involves finding the 50th percentile.
We see that the 50th percentile (the median) is in
the interval between the cumulative percentages of 30%
and 70%. This interval has real limits of 2.5 and 3.5,
respectively.
The 50th percentile is 20 percentile points below the
top edge of the interval (70%), and the interval is 40 percentile
points wide (70% - 30%). Thus, the 50th percentile lies 20/40 = 1/2
of the total distance below the upper real limit, which is also
1/2 of the distance above the lower real limit.
The interval is exactly 1.00 unit wide, so the 50th
percentile is located at
[(50%-30%)/(70%-30%)]*1.00 =
(20/40) * 1.00 = .50
units above the lower real limit. Thus, the median
for the data in the table above is
median = 2.5 + .50 = 3.0
Note that we can see this just by looking
at the frequency table: There are 6 scores below the central
interval, and 6 above, so the median must be the middle
of the central interval, which is just 3!
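The interpolation can be sketched in Python using cumulative frequencies instead of cumulative percentages; for these data it performs the computation 2.5 + 1.00 × (10 − 6)/8, where 10 is half of N, 6 is the number of scores below the interval, and 8 is the interval's frequency. The function name `interpolated_median` and the assumption that each whole-number score X spans the real limits X − 0.5 to X + 0.5 are ours. Python's `statistics.median_grouped` uses the same style of interpolation.

```python
def interpolated_median(values, freqs, width=1.0):
    """50th percentile by linear interpolation within the interval that
    contains the middle of the distribution.

    Assumes each score X spans the real limits X - width/2 to X + width/2
    (width=1.0 for whole-number scores).
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cf_below = 0  # cumulative frequency below the current interval
    for x, f in sorted(zip(values, freqs)):  # lowest score first
        if f > 0 and cf_below + f >= half:
            lower = x - width / 2  # lower real limit of this interval
            # Move up from the lower limit by the fraction of this
            # interval's scores needed to reach the halfway point.
            return lower + width * (half - cf_below) / f
        cf_below += f


X = [5, 4, 3, 2, 1]
f = [1, 5, 8, 4, 2]
print(interpolated_median(X, f))  # 3.0
```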
- The Mode
- Definition
- The mode is the score or category of scores in a frequency
distribution that has the greatest frequency. The mode
is used to describe the typical score.
- Level of measurement:
- The mode can be used for variables at any level of measurement
(nominal, ordinal, interval or ratio).
- Calculating the Mode:
- There is no formula to learn here! The rule is simply
to find the score or category with the largest frequency.
Sometimes a distribution has more than one mode. Such
a distribution is called multimodal. A distribution
with two modes is called bimodal. Note that the
modes do not have to have the same frequencies. The
tallest peak is called the major mode, other
peaks are called minor modes.
Some distributions do not have modes. A rectangular
distribution has no mode: it is flat. Some distributions
have many peaks and valleys. They, too, don't have a
mode.
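A minimal Python sketch of the rule "find the score or category with the largest frequency." The function name `modes` is hypothetical, and so is the convention of returning an empty list for a flat (rectangular) distribution. The sketch reports only the highest-frequency value(s), so it lists every mode of a distribution whose peaks are equally tall; finding minor modes (lower peaks) would need a separate peak-finding step that is not shown.

```python
from collections import Counter


def modes(scores):
    """Score(s) or category(ies) with the greatest frequency.

    Returns a list so that equally tall peaks are all reported. Returns an
    empty list when every distinct value has the same frequency (a flat,
    rectangular distribution), which has no mode.
    """
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    winners = [x for x, c in counts.items() if c == top]
    if len(counts) > 1 and len(winners) == len(counts):
        return []  # every value is equally frequent: flat, no mode
    return sorted(winners)


print(modes([1, 2, 2, 3, 3, 3, 4]))            # [3]
print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 1, 2, 3, 3]))                  # [1, 3]
print(modes([1, 2, 3, 4]))                     # []
```

Because the function only counts values, it also works with nominal categories such as colors, which matches the mode's use at any level of measurement.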
- Selecting A Central Tendency Measure
- The Mean (weighted mean)
- The mean is usually preferred because it uses every
score in the distribution. Also, it is closely related
to measures of variation we will be talking about next.
- The Mode
- The mode can be used with weaker levels of measurement
(i.e., nominal and ordinal) than the mean.
- The Median
- The Median should be used, rather than the mean, when:
- There are some extreme scores in the distribution.
The mean can be misleading when there are "outlying"
scores, whereas the median is not affected by how extreme
those scores are. Consider, for example, salary levels.
- Some scores have undetermined values (missing data,
or data that are cut off at a limit, such as response times).
- The distribution is open-ended (for example, family
sizes recorded with a top category of "5 or more").
- The data are at the ordinal level of measurement.
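The salary point can be illustrated with Python's `statistics.mean` and `statistics.median`. The salary figures are hypothetical, chosen only to show what happens when one extreme score appears.

```python
from statistics import mean, median

# Hypothetical annual salaries, in thousands of dollars.
salaries = [40, 42, 45, 48, 50]
print(mean(salaries), median(salaries))  # mean 45, median 45

# Same group, but the top earner now makes 500 (an extreme score).
with_outlier = [40, 42, 45, 48, 500]
print(mean(with_outlier), median(with_outlier))  # mean 135, median 45
```

One extreme salary triples the mean, while the median stays at 45 because it depends only on the middle score's position.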
- Central Tendency and Distribution Shape
- Symmetric Distributions
- Symmetric distributions (those where one side is the
mirror image of the other) have a mean and median with
the same value. If the distribution is symmetric
and unimodal, the mode also has the same value as the
mean and median.
- Skewed Distributions
- Skewed distributions have different values for the mean,
median and mode. For unimodal skewed distributions, the
mean is pulled toward the tail, and the median is between
the mean and mode.
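A small Python check of these relationships, using the standard `statistics` module. Both data sets are hypothetical and were made up for illustration: one is symmetric and unimodal, and the other has a long tail toward high scores.

```python
from statistics import mean, median, mode

# Hypothetical scores, chosen for illustration.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]         # mirror image around 3
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # long tail toward high scores

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, mean(data), median(data), mode(data))
# symmetric: mean 3, median 3, mode 3
# right-skewed: mean 4, median 3, mode 2 (mean pulled toward the tail)
```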
Check out the HyperStat
site. Pay particular attention to the first two chapters,
especially the one on Describing
Univariate Data.
Use the Histogram
Explorer to get a better understanding of histograms and
distributions. Follow the Basic Instructions given there.
Use the Practice Guessing.
Try this interactive demonstration of the median
as the minimizer of the sum of absolute errors and the mean
as the minimizer of the sum of squared errors.
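The same idea can be checked without the interactive demonstration. This sketch uses a crude grid search: it tries candidate centers from 0 to 12 in steps of 0.5 and keeps the one with the smallest total error. The helper names, the grid, and the scores are all hypothetical choices made for illustration.

```python
from statistics import mean, median


def sum_abs_errors(scores, c):
    """Sum of absolute distances from a candidate center c."""
    return sum(abs(x - c) for x in scores)


def sum_sq_errors(scores, c):
    """Sum of squared distances from a candidate center c."""
    return sum((x - c) ** 2 for x in scores)


scores = [1, 2, 3, 4, 10]                   # hypothetical, skewed by the 10
candidates = [i / 2 for i in range(0, 25)]  # 0.0, 0.5, ..., 12.0

best_abs = min(candidates, key=lambda c: sum_abs_errors(scores, c))
best_sq = min(candidates, key=lambda c: sum_sq_errors(scores, c))
print(best_abs, median(scores))  # both equal 3
print(best_sq, mean(scores))     # both equal 4
```

With an even number of scores, every point between the two middle scores gives the same sum of absolute errors, so an odd-sized example keeps the minimizer unique.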