Types of Measures of Central Tendency
There are several different ways of computing the “average” for a set of values. These are often referred to as measures of central tendency.
Review the measures of central tendency and respond to the following questions:
- What are the three types of averages that can be used to describe a set of scores?
- What are the relative advantages and disadvantages of each measure of central tendency? Which data type is each most suited for?
- Suppose you have data on the socioeconomic indicators for members of one gang. In particular, you have information on their age, race, education, and income. Which measure of central tendency is most suited for each type of data? Why are the other measures not appropriate?
NOTES BELOW.
This Java applet shows the relationship between the mean and the median and illustrates several aspects of these measures of central tendency.
The applet begins by showing a histogram of 9 numbers. As can be seen in the histogram, the numbers are 3, 4, 4, 5, 5, 5, 6, 6, and 7. The mean and median of the numbers are both 5.0. The mean is shown on the histogram as a small blue line; the median is shown as a small purple line. The standard deviation is 1.15. Note that the standard deviation is calculated using N rather than N-1, since the numbers are considered to be the entire population of interest. A red line extends one standard deviation in each direction from the mean. You can change the values of the data set by “painting” the histogram with the mouse.
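The summary values quoted for the starting histogram can be checked outside the applet. The sketch below assumes Python's standard `statistics` module; `pstdev` divides by N (matching the applet), and `stdev`, which divides by N-1, is included only for contrast.

```python
import statistics

# The nine scores shown in the applet's starting histogram.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]

mean = statistics.mean(scores)        # 45 / 9 = 5
median = statistics.median(scores)    # middle (5th) value of the sorted scores
pop_sd = statistics.pstdev(scores)    # divides by N, as the applet does
sample_sd = statistics.stdev(scores)  # divides by N - 1, shown for comparison

print(f"mean={mean}, median={median}")
print(f"population sd={pop_sd:.2f}, sample sd={sample_sd:.2f}")
```

The population value rounds to the 1.15 reported above; the N-1 version is slightly larger because it divides the same sum of squared deviations by a smaller number.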
Below the histogram are shown:
The means of these 5 quantities are displayed at the bottom of the screen.
Mean and Median as Measures of Central Tendency
The mean and median are both measures of central tendency.
The mean and the median are the same for symmetric distributions. In general, the mean will be higher than the median for positively skewed distributions and less than the median for negatively skewed distributions. Change the above distribution’s skew and see the effect on the relative size of the mean and median.
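To see the skew rule without the applet, here is a small sketch using Python's `statistics` module and made-up data sets: one symmetric, one with a long right tail, and its mirror image with a long left tail.

```python
import statistics

# Hypothetical data sets, made up for illustration.
symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]
positive_skew = [1, 2, 2, 3, 3, 3, 4, 9, 15]         # long tail to the right
negative_skew = [16 - x for x in positive_skew]      # mirror image: long tail to the left

for name, data in [("symmetric", symmetric),
                   ("positive skew", positive_skew),
                   ("negative skew", negative_skew)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>13}: mean={mean:.2f}, median={median}")
```

The tail pulls the mean toward it while the median stays with the bulk of the scores, so the mean lands above the median for the positive skew and below it for the negative skew.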
The mean is more affected by extreme scores than the median is, and is therefore not a good measure of central tendency for extremely skewed distributions.
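A quick way to see this sensitivity is to take the applet's starting scores and replace the largest one with a single extreme value. The value 70 below is hypothetical, chosen only to exaggerate the effect.

```python
import statistics

scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]
# Replace the largest score (7) with a hypothetical extreme score of 70.
with_outlier = scores[:-1] + [70]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:>12}: mean={mean:.2f}, median={median}")
```

One changed score moves the mean from 5 to 12, more than doubling it, while the median stays at 5 because the middle of the sorted list has not changed.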
The Mean as Center of Gravity
Think of the X axis of the histogram as a teeter totter. Assume that the X axis and the Y axis themselves have no weight.
You can think of each point’s deviation from the mean as the influence the point exerts on the tilt of the teeter totter. Positive deviations push down on the right side; negative deviations push down on the left side. The farther a point is from the fulcrum, the more influence it has.
Change the shape of the distribution and notice that if the fulcrum is placed at the mean (portrayed as a small blue line under the X-axis), then the teeter totter will be in balance.
Note that the mean deviation of the scores from the mean is always zero. That is why the teeter totter is in balance when the fulcrum is at the mean. This makes the mean the center of gravity for all the data points.
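The balance argument can be checked numerically. This sketch uses a hypothetical, deliberately lopsided data set and Python's `fractions.Fraction` so the arithmetic is exact. It adds up the total push on each side of the mean and the average signed deviation.

```python
from fractions import Fraction

# Hypothetical, lopsided scores. Fractions keep the arithmetic exact, so the
# balance shows up as exactly zero rather than a tiny rounding error.
scores = [Fraction(x) for x in [1, 2, 2, 3, 3, 3, 4, 9, 15]]
mean = sum(scores) / len(scores)  # 42 / 9 = 14/3

deviations = [x - mean for x in scores]
left_push = -sum(d for d in deviations if d < 0)   # total push on the left side
right_push = sum(d for d in deviations if d > 0)   # total push on the right side

print("mean =", mean)
print("left push =", left_push, " right push =", right_push)
print("mean deviation from the mean =", sum(deviations) / len(deviations))
```

Two far-right scores (9 and 15) balance seven scores sitting closer to the fulcrum on the left, which is the teeter totter picture described above.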
Error of Prediction
A group of 20 people takes a memory span test. You are told that the mean memory span is 7.5 and are asked to predict each person’s score. Since you know nothing about the individuals except that they are members of a group with a mean of 7.5, the best you can do is predict that each person will have a memory span of 7.5. Therefore, for each person in the group, the predicted score will be 7.5 and the error of prediction will be their score minus 7.5.
The errors of prediction will always have a mean of 0. One way to measure the accuracy of prediction is to compute the mean squared error of prediction. This measure of accuracy will be smaller when the mean is used as the prediction than it would be if any other value (such as the median) were used as the predicted score.
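The claim that the mean minimizes the mean squared error can be tried on a small example. The memory-span scores below are hypothetical (not the 20 scores from the example), chosen so their mean is also 7.5. The sketch compares the mean with the median and then scans a grid of candidate predictions.

```python
import statistics

def mean_squared_error(data, prediction):
    """Average of (score - prediction)**2 when one number predicts every score."""
    return sum((x - prediction) ** 2 for x in data) / len(data)

# Hypothetical memory-span scores, made up for illustration (mean 7.5).
spans = [5, 6, 6, 7, 7, 7, 8, 8, 9, 12]
mean = statistics.mean(spans)
median = statistics.median(spans)

print(f"MSE using mean   ({mean}): {mean_squared_error(spans, mean):.3f}")
print(f"MSE using median ({median}): {mean_squared_error(spans, median):.3f}")

# Scan candidate predictions from 5.0 to 12.0 in steps of 0.1.
candidates = [5 + i / 10 for i in range(71)]
best = min(candidates, key=lambda c: mean_squared_error(spans, c))
print("best candidate on the grid:", best)
```

A grid scan is not a proof, but it shows the squared error bottoming out exactly at the mean.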
Each individual score can be thought of as consisting of the sum of two parts: the predicted score and the error of prediction. For example, if the mean were 4.0 and a particular score were 5.0, then the predicted score would be 4.0 and the error of prediction would be 5.0 - 4.0 = 1.0.
How do the mean and median compare? Although predictions based on the median never have a smaller average squared error than predictions based on the mean, the average absolute error of prediction is smallest when the median is used as the prediction.
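The mirror-image result for absolute errors can be tried the same way. This sketch uses hypothetical memory-span scores whose mean (7.5) and median (7.0) differ, and scans a grid of candidate predictions for the one with the smallest mean absolute error.

```python
import statistics

def mean_absolute_error(data, prediction):
    """Average of |score - prediction| when one number predicts every score."""
    return sum(abs(x - prediction) for x in data) / len(data)

# Hypothetical memory-span scores, made up for illustration.
spans = [5, 6, 6, 7, 7, 7, 8, 8, 9, 12]
mean = statistics.mean(spans)
median = statistics.median(spans)

print(f"MAE using mean   ({mean}): {mean_absolute_error(spans, mean):.2f}")
print(f"MAE using median ({median}): {mean_absolute_error(spans, median):.2f}")

# Scan candidate predictions from 5.0 to 12.0 in steps of 0.1.
candidates = [5 + i / 10 for i in range(71)]
best = min(candidates, key=lambda c: mean_absolute_error(spans, c))
print("best candidate on the grid:", best)
```

Here the single high score of 12 pulls the mean up and raises its absolute error, while the median ignores how far away that score is.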
Change the distribution in the applet and compare the errors of prediction and the squared errors of prediction for the mean and median.
1. For the starting data, note that the mean deviation from the mean and the mean deviation from the median are both zero. Change the data and see if this is still true.
2. Change the distribution so that it has a positive skew. Which is bigger, the mean or the median?
3. Change the distribution so that it has a negative skew. Now which is bigger, the mean or the median?
4. Investigate the relative size of the mean squared deviation from the mean and the mean squared deviation from the median. Which is smaller? When are they the same?
5. Although not shown in the applet, the median minimizes the mean absolute value of deviations.
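Exercises 1 and 4 can be explored with a short script. It relies on a standard identity: the mean squared deviation about any point c equals the population variance plus (mean - c)^2. It follows that the median's extra squared error is (mean - median)^2, and the mean deviation from the median is simply mean - median. The sketch uses the applet's starting scores and a hypothetical skewed variant, with exact fractions.

```python
import statistics
from fractions import Fraction

def mean_deviation(data, c):
    """Average signed deviation of the scores from the point c."""
    return sum(x - c for x in data) / len(data)

def mean_squared_deviation(data, c):
    """Average squared deviation of the scores from the point c."""
    return sum((x - c) ** 2 for x in data) / len(data)

# The applet's starting scores and a hypothetical positively skewed variant.
datasets = {
    "symmetric": [3, 4, 4, 5, 5, 5, 6, 6, 7],
    "skewed": [3, 4, 4, 5, 5, 5, 6, 6, 16],
}

for name, raw in datasets.items():
    data = [Fraction(x) for x in raw]  # exact arithmetic, no rounding
    mean = statistics.mean(data)
    median = statistics.median(data)
    extra = mean_squared_deviation(data, median) - mean_squared_deviation(data, mean)
    print(name,
          "| mean deviation from median:", mean_deviation(data, median),
          "| extra squared error of median:", extra,
          "| (mean - median)**2:", (mean - median) ** 2)
```

For the symmetric scores every printed quantity is zero; for the skewed scores (mean 6, median 5) the mean deviation from the median is 1 and the median's squared error exceeds the mean's by exactly 1.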
Module Notes: Descriptive Statistics
In Module 1, we saw that the first step of descriptive statistics is to organize and visualize data so that we can get a general idea of the patterns. In this module, we will study the next two steps, which are calculating measures of central tendency and variability.
Measures of Central Tendency
Once we have organized our data, we need to begin analyzing it in order to turn data into information we can use. The simplest form of data analysis is descriptive statistics. We begin by trying to describe the group as a whole, summarizing it with a single number that represents the entire group of data. Measures of central tendency help us do this by calculating a ‘central’ value around which the data cluster. For instance, if we have collected information on the number of gang homicides in a month in 25 of the largest cities in the U.S., measures of central tendency help us answer the question ‘In general, how many gang homicides are there in these cities?’ The answer gives us an estimate of the level of gang activity in the group. For example, an average of 30, compared to an average of 5, leads us to very different conclusions about how big a problem gangs are in these cities. There are three measures of central tendency: the mean, the median, and the mode. Each has its strengths and weaknesses, and some are only appropriate for certain data types. (See the module readings and presentation for a discussion of their strengths and weaknesses.)
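One concrete case of “appropriate for certain data types”: for nominal (purely categorical) data, only the mode is defined. The sketch below uses hypothetical neighborhood labels and Python's `statistics` module. `statistics.mode` returns the most common label, while `statistics.mean` raises a `TypeError` because text labels cannot be averaged.

```python
import statistics

# Hypothetical nominal data: a made-up neighborhood label for each gang member.
neighborhood = ["North", "East", "North", "South", "North", "East"]

# The mode only needs counts, so it works for category labels.
print("mode:", statistics.mode(neighborhood))

# The mean needs numeric values; statistics.mean raises TypeError for text labels.
try:
    statistics.mean(neighborhood)
except TypeError as err:
    print("mean is not defined for nominal labels:", err)
```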
Measures of Variability
Measures of central tendency provide very little information beyond summarizing what is typical of our data, and relying on them alone can give a distorted view of the data. Going back to the gang homicide example above, imagine two cities that both averaged 10 gang homicides a month. Based only on the average, we might assume that they are similar. Suppose we look further and find that City A had very similar levels (roughly 9 to 11 homicides a month) throughout the year. The number of homicides per month in City B, on the other hand, varied dramatically over the year: in most months there were none, but a gang war broke out in the summer and there were 30 to 50 gang homicides a month. Would you say these two cities have the same type of gang problem? Probably not. City A seems to have a long-standing, stable gang problem, whereas City B seems to have an acute, emerging problem.
Without looking at how the data is scattered around the central value, we would have assumed that City A and City B are very similar. Measures of variability tell you how much the values ‘vary’ from that central value, and provide additional information about how your data is distributed. The measures of variability are range, variance, and standard deviation.
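The two-city comparison can be made concrete with made-up monthly counts. The numbers below are hypothetical, chosen only so that both cities average exactly 10 gang homicides a month. The sketch uses Python's `statistics` module with population formulas (`pvariance`, `pstdev`, dividing by N); if the months were treated as a sample, `statistics.variance` and `statistics.stdev` (dividing by N-1) would be used instead.

```python
import statistics

# Hypothetical monthly gang homicide counts, January to December.
# Both cities were made up to average exactly 10 homicides a month.
city_a = [10, 9, 11, 10, 10, 9, 11, 10, 10, 11, 9, 10]  # steady all year
city_b = [0, 0, 0, 0, 0, 30, 50, 40, 0, 0, 0, 0]         # summer gang war

for name, counts in [("City A", city_a), ("City B", city_b)]:
    value_range = max(counts) - min(counts)
    variance = statistics.pvariance(counts)  # population variance (divides by N)
    sd = statistics.pstdev(counts)           # square root of the variance
    print(f"{name}: mean={statistics.mean(counts)}, range={value_range}, "
          f"variance={variance:.2f}, sd={sd:.2f}")
```

The means are identical, but the range (2 versus 50) and the standard deviation (under 1 versus nearly 18) show how differently the two cities' counts are spread around that shared average.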
Application:
Once we have understood the concepts behind central tendency and variability, we will learn how to calculate these measures. In the last module, we learned how to interpret graphs and charts to visualize data. In this module, you will learn how to create graphs and charts using Excel. You will also learn how to calculate measures of central tendency and variability.
View and listen to this module’s presentation for more in-depth discussion of module topics.
- Module 2 Presentation