There is more than one type of average (or mean).
UPDATE: In fact, there’s a generalized way of finding the mean. It’s called the generalized mean (or power mean), and all of the means below are special cases of it.
The three best known are the Pythagorean means:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
In statistical terms, there are different ways to measure the central tendency of a distribution, and each is used in a different situation depending on the demands of the context.
The arithmetic mean is what most of us are taught in school and is the most widely used: add up the values and divide by the number of values.
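As a rough illustration of the generalized mean mentioned in the update above, here is a minimal Python sketch. The function name `generalized_mean` and the sample values are my own choices for illustration; the exponent `p` picks the member of the family (1 for arithmetic, the limit at 0 for geometric, -1 for harmonic), and the inputs are assumed to be positive.

```python
import math

def generalized_mean(values, p):
    """Power mean of positive numbers with exponent p (illustrative helper)."""
    n = len(values)
    if p == 0:
        # The limit as p -> 0 is the geometric mean, computed here via logs.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1 / p)

values = [2.0, 8.0]
arithmetic = generalized_mean(values, 1)   # (2 + 8) / 2
geometric = generalized_mean(values, 0)    # square root of 2 * 8
harmonic = generalized_mean(values, -1)    # 2 / (1/2 + 1/8)
```

For positive inputs the three results always come out in the order harmonic <= geometric <= arithmetic.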
So what exactly is the geometric mean, and where and why is it useful?
The definition is rather simple.
To find the geometric mean of n positive numbers, you multiply them all together and take the nth root of the resulting product. Now, why would that be useful, and what’s the point of doing it? The restriction to positive numbers is not really a big difficulty in real life (in the worst case we can shift/translate the origin of an axis that has negative numbers, keeping in mind that the shift changes the resulting mean, so it has to be applied consistently to everything being compared).
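The definition above translates almost directly into code. This is a minimal sketch, assuming Python 3.8+ (for `math.prod` and `statistics.geometric_mean`); the helper names and the sample values are mine. The log-based variant is a common alternative when the raw product could overflow.

```python
import math
import statistics

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values must be positive")
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """Same quantity, computed as exp of the average log to avoid huge products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

sample = [1, 3, 9]
gm = geometric_mean(sample)                  # cube root of 27
gm_logs = geometric_mean_logs(sample)        # same value up to rounding
gm_stdlib = statistics.geometric_mean(sample)
```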
OK, now imagine you have a standard graph whose axes have very different limits, e.g. the x axis runs from 0 to 0.5 while the y axis runs from 0 to 100.
Now suppose you want to compare two (or more) objects/distributions, each of which has measures along x and y. You can plot the points in different colours (one per object) on x and y, and then try to make a decision based on what you want to pick.
That sounds fine as long as there are only a few (say 5-10) objects to compare. What if you have, say, 100 laptops and 10 features you want to compare them across?
Ah, now we’re in real trouble. How do we tell which is which among 100 colours? On top of that, you have 5 graphs (for 10 features, two per graph).
What we need is a way to combine these axes into one axis. Then we can go back to simple bar charts.
Here the geometric mean comes to the rescue. If you look at the definition, it multiplies the feature values, which gives us an area (for 2 features), a volume (for 3) or an n-dimensional volume.
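We can’t simply use this product as it is, because its magnitude is set mostly by the features that have a larger range of values, and it grows with every feature we add.
E.g. in the previous example, the y axis, which ranges from 0 to 100, determines the size of the product, which makes it hard to read the differences in x off the result.
So we take the (2nd or 3rd or) nth root of this value. In effect, we have normalized away the axis range itself: scaling any one feature by a factor k scales the result by the nth root of k, whatever that feature’s range is.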
Note: the cool part here is that we don’t need to know anything about the actual ranges. The nature of the operations on the values (i.e. product and nth root) itself ensures that every feature carries equal weight in the final geometric mean.
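To make the note concrete, here is a small sketch with two made-up objects measured on the x (0 to 0.5) and y (0 to 100) axes from earlier. All numbers and helper names are hypothetical. It re-expresses y in different units (dividing by 100) and checks what happens to each comparison.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

# Hypothetical objects measured as (x, y): x lies in 0-0.5, y in 0-100.
a = (0.10, 60.0)
b = (0.40, 40.0)

def rescale_y(obj, factor):
    """Express y in different units without touching x."""
    return (obj[0], obj[1] * factor)

a_small, b_small = rescale_y(a, 0.01), rescale_y(b, 0.01)

# The arithmetic mean changes its mind about which object scores higher.
am_prefers_a_before = arithmetic_mean(a) > arithmetic_mean(b)             # True
am_prefers_a_after = arithmetic_mean(a_small) > arithmetic_mean(b_small)  # False

# The ratio between the two geometric means does not depend on the units of y.
gm_ratio_before = geometric_mean(b) / geometric_mean(a)
gm_ratio_after = geometric_mean(b_small) / geometric_mean(a_small)
```

With these particular numbers the arithmetic-mean ranking flips when the units of y change, while the geometric means keep the same ratio, so the comparison between the two objects is unaffected.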
For an example, I’ll pick laptop CPU speed, hard disk size and RAM; here’s a link.
If you look at it closely, in the examples I have picked, none of the three Pythagorean means changes the ordinality/ranking of the laptops being compared, but the arithmetic mean gets dominated/boosted simply by raising hard disk space.
The geometric mean, on the other hand, doesn’t get raised as much simply by raising the attribute with the higher values.
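The linked data is not reproduced here, so the following sketch uses one made-up laptop (2.5 GHz CPU, 500 GB disk, 8 GB RAM) purely to show the effect. The spec values and helper names are assumptions. It doubles one attribute at a time and records how much each mean grows.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

# Hypothetical laptop: CPU speed in GHz, hard disk in GB, RAM in GB.
base = {"cpu_ghz": 2.5, "disk_gb": 500.0, "ram_gb": 8.0}

def with_doubled(spec, key):
    """Copy of the spec with one attribute doubled."""
    changed = dict(spec)
    changed[key] *= 2
    return changed

def growth(mean, spec, key):
    """Factor by which the given mean grows when one attribute is doubled."""
    return mean(list(with_doubled(spec, key).values())) / mean(list(spec.values()))

am_disk = growth(arithmetic_mean, base, "disk_gb")  # close to 2: disk dominates
am_ram = growth(arithmetic_mean, base, "ram_gb")    # barely above 1
gm_disk = growth(geometric_mean, base, "disk_gb")   # cube root of 2
gm_ram = growth(geometric_mean, base, "ram_gb")     # cube root of 2 as well
```

Doubling the disk nearly doubles the arithmetic mean while doubling the RAM hardly moves it; the geometric mean grows by the same factor (the cube root of 2, about 1.26) in both cases.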
It’s not really surprising, since the geometric mean is multiplicative (it is the arithmetic mean taken on a logarithmic scale) while the arithmetic mean is additive/linear.
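Written out, the difference looks like this. The symbols are my own notation: x_1, ..., x_n are the positive feature values and k is the factor by which one of them, x_j, is scaled.

```latex
\[
\mathrm{GM}(x_1,\dots,x_n)
  = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
  = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n}\ln x_i\Bigr),
\qquad
\mathrm{AM}(x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
\[
x_j \to k\,x_j
\quad\Longrightarrow\quad
\mathrm{GM} \to k^{1/n}\,\mathrm{GM},
\qquad
\mathrm{AM} \to \mathrm{AM} + \frac{(k-1)\,x_j}{n}.
\]
```

The change in the geometric mean depends only on k and n, whereas the change in the arithmetic mean is proportional to the size of x_j itself, which is why the attribute with the largest values dominates it.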
You can ignore the harmonic mean for now, as it’s not at all clear what’s common among the laptops. I’ll make another post/update later detailing how the harmonic mean can be used for this case.
One case where it is used is the F-score, the harmonic mean of precision and recall, for comparing predictive algorithms, statistical tests, etc.
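As a quick sketch of that use, the balanced F-score (F1) is the harmonic mean of precision and recall. The precision and recall figures below are made up for illustration, and the function names are mine.

```python
import math

def harmonic_mean(values):
    """n divided by the sum of reciprocals of n positive numbers."""
    return len(values) / sum(1 / v for v in values)

def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.9, 0.5   # hypothetical classifier results
f1 = f1_score(precision, recall)
same_thing = harmonic_mean([precision, recall])
plain_average = (precision + recall) / 2
```

The harmonic mean is pulled towards the smaller of the two values, so `f1` comes out below `plain_average`; a classifier cannot hide poor recall behind high precision.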
UPDATE: Harmonic mean post is here.
UPDATE 2: One way to approach and/or defend against a confusopoly is to choose a good measure to normalize the values of the features against, such as the geometric mean (https://softwaremechanic.wordpress.com/2016/07/18/geometric-mean/). However, note that this assumes you have worked out which features are comparable and how meaningful and interchangeable they are. That’s not trivial and needs deep domain expertise.