Statistical moments

Inspired by this blog from the PayPal team. Moment is a physics concept (or at least I first encountered it in physics, but it looks like it has been generalized in math to apply to other fields).

If you followed that Wikipedia math link above, you’ll know the formula for the moment is
\mu_n = \int\limits_{-\infty}^{+\infty} (x-c)^n f(x)\, dx
where
x: the value of the variable
f(x): the probability density of x
n: order of the moment (aka the nth moment, we’ll get to that shortly)
c: center, or the value around which to calculate the moment.
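
To make the formula concrete, here is a minimal sketch in Python that approximates the nth moment around c from a list of samples, replacing the integral with an average over the data. The function name `moment` and its parameters are my own choices for illustration, not something taken from the linked pages.

```python
def moment(samples, n, c=0.0):
    """Sample estimate of the nth moment around c.

    Averages (x - c)**n over the samples, which stands in for the
    integral of (x - c)^n f(x) dx when only observed data is available.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum((x - c) ** n for x in samples) / len(samples)


data = [1, 2, 3, 4]
first_about_zero = moment(data, 1)           # the ordinary mean
second_about_mean = moment(data, 2, c=2.5)   # spread around the mean
```

With c left at 0 and n = 1 this is just the mean; with c set to the mean and n = 2 it is the (population-style) variance.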

However, if you look at a few other pages and links, they drop the c and, of course, use the summation symbol.**

The reason they leave out ‘c’ is that they assume the moment is taken around the value 0. As we’ll see below, this is well and good in some cases, but not always.
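
One way to see why the choice of 0 matters is the standard identity for the second moment. Here E[X] is the mean and Var(X) the variance; that notation is mine, not from the linked pages.

E[(X-c)^2] = \mathrm{Var}(X) + (E[X]-c)^2

With c = 0 the second moment is the variance plus the squared mean, so it measures spread on its own only when the data is already centered at 0. With c = E[X] the extra term vanishes and the moment is exactly the variance.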

The other part, n, the order of the moment, is an interesting concept. It’s just raising the difference (x - c) to the nth power. To begin with, if n is even, the negative signs caused by the differences go away. So every term is non-negative, and the running sum becomes a monotonically non-decreasing function.
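
A small sketch of the even-versus-odd behaviour, assuming a symmetric toy data set and c = 0. The helper `moment_terms` is a hypothetical name used only for this illustration.

```python
from itertools import accumulate


def moment_terms(samples, n, c=0.0):
    """Return the individual terms (x - c)**n, before summing or averaging."""
    return [(x - c) ** n for x in samples]


data = [-2, -1, 0, 1, 2]

odd_terms = moment_terms(data, 3)    # signs survive, terms can cancel
even_terms = moment_terms(data, 2)   # every term is non-negative

# Running totals as each term is added in turn.
odd_running = list(accumulate(odd_terms))
even_running = list(accumulate(even_terms))
```

For the odd order the negative and positive terms cancel on this symmetric data, while for the even order the running total never decreases.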

I would usually argue that ‘c’ should be a measure of central tendency such as the mean, median, or mode, and that one sign of a fat-tailed (as opposed to thin-tailed) distribution is that the moments change wildly when you choose a different c.
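
A rough way to probe that hunch, not a proof of it: compare the third moment around the mean with the third moment around the median for two made-up samples. The data sets and the `moment` helper are assumptions for illustration, and a single outlier is only a crude stand-in for a genuinely fat-tailed distribution.

```python
from statistics import mean, median


def moment(samples, n, c=0.0):
    """Sample estimate of the nth moment around c."""
    return sum((x - c) ** n for x in samples) / len(samples)


thin = [1, 2, 3, 4, 5]      # symmetric, no extreme values
fat = [1, 2, 3, 4, 100]     # one extreme value in the tail

results = {}
for name, data in (("thin", thin), ("fat", fat)):
    results[name] = {
        "around_mean": moment(data, 3, c=mean(data)),
        "around_median": moment(data, 3, c=median(data)),
    }
```

In the first sample the mean and median coincide, so the choice of c makes no difference. In the second, the single large value pulls the mean away from the median and the third moment differs by roughly a factor of two between the two centers.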

The statsblog I linked above mentions something different.

Higher-order terms(above the 4th) are difficult to estimate and equally difficult to describe in layman’s terms. You’re unlikely to come across any of them in elementary stats. For example, the 5th order is a measure of the relative importance of tails versus center (mode, shoulders) in causing skew. For example, a high 5th means there is a heavy tail with little mode movement and a low 5th means there is more change in the shoulders.

Hmm.. I wonder how or why? I can’t figure out how it can be an indication of fat tails (referred to by the phrase “importance of tails” in the quote above) with the formula they are using, i.e. when the formula doesn’t mention anything about ‘c’.

** That would be the notation for discrete variables as opposed to continuous variables, but given that most real-world applications of statistics use sampling at discrete intervals, it’s understandable to use this notation instead of the integral sign.
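
For reference, a sketch of the discrete counterpart of the integral formula, assuming the x_i are the possible values and p(x_i) their probabilities (these symbols are mine):

\mu_n = \sum_{i} (x_i - c)^n p(x_i)

For a sample of N equally weighted observations, p(x_i) becomes 1/N and this reduces to

\mu_n = \frac{1}{N} \sum_{i=1}^{N} (x_i - c)^n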

