Provide Summaries beyond Averages and Medians

Everyone who conducts surveys in the legal industry should understand and correctly use the most common summary statistics. Statisticians call them statistics because they are calculated from the responses a survey collects. Each of the following describes one characteristic of the data in a single figure. Some, the measures of central tendency, tell you where the data is “centered.” Others, the measures of spread such as the inter-quartile range and the range, tell you how widely the figures are scattered.

Mean/Average: The mean (a statistician’s term for average) of a group of numbers is its arithmetic average, and it is the most common measure of central tendency. Add up the number of lawyers in each of the ten law firms you pay the most and divide that sum by ten: the result is the average number of lawyers per firm. With a small number of observations, a single abnormally high or low value can badly distort the mean.
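
As a minimal sketch of that distortion, the Python snippet below uses the standard library’s statistics module and ten made-up lawyer counts (hypothetical figures, not survey data) in which one firm is far larger than the rest.

```python
from statistics import mean

# Hypothetical lawyer counts for ten firms; the last firm is an outlier.
lawyers = [120, 95, 110, 130, 105, 98, 115, 102, 125, 1500]

avg_all = mean(lawyers)                   # pulled upward by the 1,500-lawyer firm
avg_without_outlier = mean(lawyers[:-1])  # the other nine firms on their own

print(avg_all)              # 250
print(avg_without_outlier)  # about 111.1
```

One large firm more than doubles the average, even though nine of the ten firms have between 95 and 130 lawyers.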

Trimmed Mean: A trimmed mean shaves off extreme values before averaging. A common choice is to omit the lowest two percent and the highest two percent of the numbers in a sorted list. Thus, if you have revenue for 50 companies sorted high to low, two percent of 50 is one, so you drop the single smallest and the single largest figure. Then calculate the mean of the 48 that remain.
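
Here is a rough Python sketch of a trimmed mean. The function name `trimmed_mean` and the revenue figures are hypothetical. The sketch assumes the number of values dropped from each end is rounded down when the percentage does not come out to a whole number.

```python
from statistics import mean

def trimmed_mean(values, proportion=0.02):
    """Mean after dropping `proportion` of the sorted values at each end.

    The number dropped from each end is rounded down, so trimming
    2 percent from 50 values drops one value from each end.
    """
    data = sorted(values)
    k = int(len(data) * proportion)  # count to drop from each end
    if 2 * k >= len(data):
        raise ValueError("proportion is too large for this many values")
    return mean(data[k:len(data) - k])

# Hypothetical revenues for 50 companies: 49 ordinary values plus one extreme one.
revenues = list(range(1, 50)) + [10_000]
print(trimmed_mean(revenues))  # drops 1 and 10_000, leaving 2..49 -> 25.5
```

If SciPy is available, `scipy.stats.trim_mean(a, proportiontocut)` computes the same kind of statistic and also rounds the cut down.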

Inter-Quartile Mean: Another technique for blunting abnormally high or low values might be called the average of the middle. After sorting the data from high to low, calculate the average of the middle fifty percent. You can think of this as a heavily trimmed mean, one that removes the top and bottom quarters of the values, where highly unusual or incorrect outliers lurk.
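
A minimal Python sketch of the average of the middle follows. The function name `interquartile_mean` and the sample figures are hypothetical. For simplicity it drops a whole number of values from each end, so it matches the middle fifty percent exactly only when the count is a multiple of four.

```python
from statistics import mean

def interquartile_mean(values):
    """Average of the middle half of the sorted values.

    Drops n // 4 values from each end. When the count is not a multiple
    of four this only approximates the middle 50 percent; a stricter
    version would give the boundary values fractional weights.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    k = n // 4
    return mean(data[k:n - k])

# Hypothetical data with one suspicious outlier (100).
print(interquartile_mean([7, 1, 100, 4, 3, 6, 2, 5]))  # middle four are 3, 4, 5, 6 -> 4.5
```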

Median: If you have sorted all your figures from high to low, the middle one is the median. If you have an even number of figures, you average the two in the middle (so the median may not be one of the actual figures). The median isn’t influenced by how extreme the highest and lowest figures are. [One might imagine a “trimmed median” or an “inter-quartile median,” computed by lopping off an equal portion at each end and then taking the median. But removing the same number of figures from each end leaves the middle where it was, so both equal the ordinary median.] One note of caution: Do not do math with medians! In general, averaging the medians of two groups does not give the median of the combined group, and adding medians does not give the median of a total.
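
The short Python sketch below uses `statistics.median` from the standard library with hypothetical figures. It shows the even-count case and why averaging medians is a trap.

```python
from statistics import median

# Hypothetical figures for two groups.
group_a = [1, 2, 3]
group_b = [10, 20, 30, 40, 50]

print(median(group_a))           # 2
print(median([1, 2, 3, 1_000]))  # even count: (2 + 3) / 2 -> 2.5, not one of the figures

# Averaging medians does not give the median of the combined data.
avg_of_medians = (median(group_a) + median(group_b)) / 2  # (2 + 30) / 2 -> 16.0
combined_median = median(group_a + group_b)               # (10 + 20) / 2 -> 15.0
print(avg_of_medians, combined_median)
```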

Inter-Quartile Range: In a sorted list, the first quartile sits halfway between the lowest figure and the median, and the third quartile sits halfway between the median and the highest figure. Halfway here means by position in the list, not by value: the first quartile is, roughly, the median of the lower half of the figures. Subtract the first quartile from the third quartile and you have the inter-quartile range (IQR), a measure of how spread out the middle half of the data is. The IQR avoids the unusual and possibly misleading data points at either end of an ordered list.
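
Quartile conventions differ between software packages. The Python sketch below adopts one of them, taking each quartile as the median of a half of the sorted data and leaving the overall median out of both halves when the count is odd. The function names `quartiles` and `iqr` and the sample figures are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    With an odd count, the overall median is left out of both halves.
    This is one of several conventions; spreadsheets and statistics
    packages may use others and report slightly different quartiles.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(upper)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

# Hypothetical figures; 30 is an outlier that the IQR ignores.
sample = [8, 2, 30, 4, 6, 9, 4, 7, 5]
print(quartiles(sample))  # (4.0, 8.5)
print(iqr(sample))        # 4.5
```

Python 3.8 and later also provide `statistics.quantiles(data, n=4)`, which returns the three quartile cut points; its `method` parameter (`"exclusive"` by default, or `"inclusive"`) can give results that differ slightly from this sketch.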

Winsorize: Winsorizing, named after Charles Winsor, sets a percentage of the values at each extreme equal to a specified cutoff, such as the mean plus or minus four standard deviations, or the 5th percentile value at the low end and the 95th percentile value at the high end. Here’s how it’s done. For a 90 percent Winsorization, the bottom five percent of the values are set equal to the value at the 5th percentile, while the top five percent are set equal to the value at the 95th percentile. Winsorizing leaves the median unchanged but usually changes the average. Unlike trimming, it drops no observations; think of Winsorizing as smoothing the extreme values.
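
A minimal Python sketch of a 90 percent Winsorization follows. The function name `winsorize` and the data are hypothetical. The sketch rounds the number of replaced values down and clamps each extreme to the nearest value that is kept.

```python
from statistics import mean, median

def winsorize(values, proportion=0.05):
    """Clamp the lowest and highest `proportion` of values to the nearest kept value.

    proportion=0.05 gives a 90 percent Winsorization. The number of values
    replaced at each end is rounded down, and the original order is preserved.
    """
    data = sorted(values)
    n = len(data)
    k = int(n * proportion)  # values to replace at each end
    if 2 * k >= n:
        raise ValueError("proportion is too large for this many values")
    low, high = data[k], data[n - 1 - k]
    return [min(max(v, low), high) for v in values]

# Hypothetical data: 1 through 19 plus one extreme value.
raw = list(range(1, 20)) + [1_000]
smoothed = winsorize(raw)  # 1 becomes 2, and 1_000 becomes 19

print(median(raw), median(smoothed))  # 10.5 10.5  (median unchanged)
print(mean(raw), mean(smoothed))      # 59.5 10.5  (average changes)
```

SciPy users may prefer `scipy.stats.mstats.winsorize`, whose `limits` argument takes the proportions to clip at each end.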

Mode: The mode is simply the number that occurs most often in a set of numbers. For example, among the office counts of hundreds of law firms, the mode will be one: more law firms have a single office than any other number of offices.
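
In Python, the standard library’s statistics module computes the mode directly, as this sketch with hypothetical office counts shows. Note that `multimode` requires Python 3.8 or later.

```python
from statistics import mode, multimode

# Hypothetical office counts for twelve firms.
offices = [1, 3, 1, 2, 1, 12, 1, 2, 1, 5, 1, 2]
print(mode(offices))  # 1

# When several values tie for most common, multimode() lists them all.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```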

Range: The range is the difference between the largest and the smallest number in a set. Because it depends entirely on those two extreme values, a single outlier can inflate it. One could calculate a range after trimming or Winsorizing, but that would be an unusual measure of spread.
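
Here is a small Python sketch, using hypothetical figures, that shows how one extreme value dominates the range and how much smaller the range becomes once the smallest and largest figures are trimmed. The function is named `value_range` (a hypothetical name) so that it does not shadow Python’s built-in `range`.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical figures with one extreme high value.
figures = [12, 3, 8, 200, 5, 7]
print(value_range(figures))                # 197
print(value_range(sorted(figures)[1:-1]))  # 7 after dropping the smallest and largest
```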

These math tools succinctly describe many characteristics of a set of data. The descriptions let you understand the data, communicate these important facets, and compare them to other sets as well as to your own figures.