Based on a small post found here.
One of the standard problems in ML with meta-modelling algorithms (algorithms that run multiple statistical models over the given data and pick the best-fitting one, e.g. random forests or the rarely practical genetic algorithms) is that they can favour overly complex models that overfit the training data but perform poorly on live/test data.
These meta-modelling algorithms work by minimizing an objective function (usually the RMS error of the stats/sub model on the data): whichever candidate model yields the lowest objective value gets picked. So we can just add a complexity penalty (one obvious idea is the degree of the polynomial the model uses to fit, but how does that compare against, say, exponential functions?), and the objective function becomes RMS(Error) + Complexity_penalty(model).
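A minimal sketch of that penalized objective, assuming each candidate model is summarized by just its training RMS error and an integer complexity (the candidate names, error values, and the penalty weight `lam` are all made up for illustration):

```python
# Penalized objective: RMS(Error) + lam * Complexity_penalty(model).
def penalized_score(rms_error, complexity, lam=0.2):
    return rms_error + lam * complexity

# (training RMS error, complexity) for three hypothetical candidates;
# "complexity" here is just the polynomial degree.
candidates = {
    "linear":   (0.90, 1),
    "cubic":    (0.40, 3),
    "degree_9": (0.35, 9),  # slightly better fit, much more complex
}

best = min(candidates, key=lambda name: penalized_score(*candidates[name]))
print(best)  # → cubic: degree 9 fits marginally better but is priced out
```

With `lam=0.2` the scores come out 1.10, 1.00 and 2.15 respectively, so the cubic wins: the tiny fit improvement of the degree-9 model no longer covers its complexity cost.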
Now, with the right choice of error function and complexity penalty, this can select models that perform worse than more complex models on the training data but perform better in the live scenario.
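To make that concrete, here is a small sketch (data, degrees, and split are invented for the demo) comparing a line against a degree-9 polynomial. The higher-degree fit can never lose on the data it was fit to, which is exactly why raw training RMS error is a misleading objective:

```python
import numpy as np

# Noisy linear data; the "true" model is just 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=40)
y = 2 * x + 1 + rng.normal(scale=0.5, size=40)

# Hold out half the points; fit only on the training half.
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

def rms(degree, xs, ys):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    return float(np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2)))

train_rms = {d: rms(d, x_train, y_train) for d in (1, 9)}
test_rms = {d: rms(d, x_test, y_test) for d in (1, 9)}

print(train_rms)  # degree 9 is always at least as good here...
print(test_rms)   # ...but may pay for that flexibility on held-out points
```

The degree-9 model's training RMS error is guaranteed to be no worse than the line's (its hypothesis space contains every line), so only the held-out score, or a complexity-penalized objective, can expose the overfit.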
The idea of a complexity penalty itself is not new. I don’t dare say ML borrowed it from scientific experimentation methods or something, but the idea that a more complex theory or model should be penalized relative to a simpler one is very old. Here’s a better-written post on it.