Overfitting. When models fall off the catwalk



The power of modelling to get value from data is well accepted, as is its place in the arsenals of marketers, financial risk assessors, business strategists and people responsible for the customer experience.

But why does a model work brilliantly against one dataset yet perform so disastrously against another? 

The problem could well be model overfitting, and it can make your predictions turn to custard.

Model overfitting occurs when you build a model that’s too specific to the data set you’re using to build it.  In effect, the model becomes too tight. It is not generalisable enough to provide reliable predictions when new data comes along.

Think of it this way:
Suppose you’re a designer (a.k.a. Analyst) of fashionable clothes.  A client (a.k.a. Data) comes in one day wanting a bespoke pair of pants (a.k.a. The Model), which you decide to carefully craft from cling film.

Fitting the pants is complex, involving precise wrapping of cling film until you achieve a perfectly tight fit with not a crease or wrinkle to be seen.

Months later, a friend (a.k.a. New data) of the client borrows the pants.  But the friend struggles mightily to fit into them because they aren’t stretchy enough to accommodate a taller and fuller form.

Those pants look dreadful.  If only they had been made from an elasticated material that allowed them to be worn by people other than the person for whom they were made.

How good is that movie?

Now let’s demonstrate overfitting with a real-world example.

We’re using a dataset taken from the Internet Movie Database (IMDb) website – the world’s most comprehensive information resource dedicated to movies, television and video games.

Our aim is to build a model that predicts IMDb movie scores, and to show how overfitting can produce a model that works brilliantly against the data used to build it but gives terrible predictions when we apply it to new data.

Let’s take a look at the variable we are trying to predict, IMDb score.

Figure 1 shows the number of movies for each IMDb score.  As you can see, most have a score between 5.5 and 7.5, with only a few very high or very low scores.

If we plotted actual IMDb scores against a model’s predicted IMDb scores and got it 100% right, our points would follow the perfect line in Figure 2 exactly.

First up, we’ll start creating our model based on movies from the 90s and earlier.

Let’s make it as comprehensive as possible by allowing the model to fit very tightly to the data.

How well does our model perform? The scatterplot in Figure 3 shows our model has done a fine job in making predictions because almost all the points follow the perfect line closely.


But what if we want to make some IMDb score predictions for movies made after the 90s?

This is where it all turns pear-shaped because we end up with a random collection of points, as seen in Figure 4.  If the model predicts a movie score of 5, its actual IMDb score could be anywhere between 2 and 9 – as good as useless.

So, what went wrong? Our overfit model is too tight.  It has been overfit to the movies made during and before the 90s, and is less generalisable for predicting scores on movies from outside that period.
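
To see the same effect on a small scale, here is a minimal Python sketch.  It does not use the IMDb data or the model behind the figures: the movies are synthetic, and the "too tight" model is a simple nearest-neighbour lookup chosen purely for illustration.  The names make_movies, predict_1nn and rmse are made up for this example.

```python
import math
import random


def rmse(actual, predicted):
    """Typical distance of the points from the perfect line (actual == predicted)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def make_movies(n, rng):
    """Synthetic stand-in for movie data: one feature and one noisy score (an assumption)."""
    movies = []
    for _ in range(n):
        x = rng.random()                           # hypothetical feature scaled to 0..1
        score = 5.0 + 2.0 * x + rng.gauss(0, 1.0)  # weak signal plus plenty of noise
        movies.append((x, score))
    return movies


def predict_1nn(train, x):
    """Overfit model: copy the score of the single most similar training movie."""
    nearest = min(train, key=lambda movie: abs(movie[0] - x))
    return nearest[1]


rng = random.Random(0)
old_movies = make_movies(200, rng)  # plays the role of "90s and earlier"
new_movies = make_movies(200, rng)  # plays the role of "after the 90s"

train_rmse = rmse([s for _, s in old_movies],
                  [predict_1nn(old_movies, x) for x, _ in old_movies])
new_rmse = rmse([s for _, s in new_movies],
                [predict_1nn(old_movies, x) for x, _ in new_movies])

print(f"RMSE on the movies used to build the model: {train_rmse:.2f}")
print(f"RMSE on new movies: {new_rmse:.2f}")
```

Because the lookup simply memorises the movies it was built on, its error on those movies is zero.  On movies it has never seen, it repeats the noise it memorised and the error jumps.
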
Now let’s try a different model where we control for overfitting.  This time we’re optimising for generalisability by testing the fit on new data as we’re building the model.

In Figure 5, you can see that the points for movies made after the 90s are much closer to the red line.  The predictions aren’t perfect, but we’ve minimised overfitting to produce a generalisable model that is much better at making predictions on new data than our first model.
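
One way to "test the fit on new data as we're building the model" is to hold some movies back and pick the model setting that predicts them best.  The sketch below is an assumption about how that could look, not the method used for Figure 5: it uses synthetic movies again, and the stretchiness of the model is controlled by k, the number of similar movies whose scores are averaged.  The names make_movies, predict_knn and holdout_rmse are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Typical distance of the points from the perfect line (actual == predicted)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def make_movies(n, rng):
    """Synthetic stand-in for movie data: one feature and one noisy score (an assumption)."""
    return [(x, 5.0 + 2.0 * x + rng.gauss(0, 1.0))
            for x in (rng.random() for _ in range(n))]


def predict_knn(train, x, k):
    """Average the scores of the k most similar training movies."""
    nearest = sorted(train, key=lambda movie: abs(movie[0] - x))[:k]
    return sum(score for _, score in nearest) / k


rng = random.Random(1)
movies = make_movies(300, rng)
build, holdout = movies[:200], movies[200:]  # the holdout set is never used for fitting

# Try several levels of tightness and score each one on the held-back movies only.
holdout_rmse = {}
for k in (1, 3, 10, 30, 100):
    predictions = [predict_knn(build, x, k) for x, _ in holdout]
    holdout_rmse[k] = rmse([s for _, s in holdout], predictions)

best_k = min(holdout_rmse, key=holdout_rmse.get)

for k, error in holdout_rmse.items():
    print(f"k={k:<3} RMSE on held-back movies: {error:.2f}")
print(f"Most generalisable setting here: k={best_k}")
```

Here k=1 is the cling-film fit and larger values of k are stretchier.  Whichever value wins, it is chosen by how well it predicts movies the model has not seen, not by how tightly it hugs the movies used to build it.
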

We can also create a measure of model fit to compare different models and choose the one that performs best.  Figure 6 is a visual comparison of ‘model fit’ for our uncontrolled and controlled models.  We don’t want the most complex model – we want the model that is most generalisable.
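
The article does not say which measure of fit sits behind Figure 6.  One common choice is R-squared, which is 1 for predictions that sit exactly on the perfect line and falls below 0 when the model does worse than always guessing the average score.  The sketch below assumes that measure, and the scores in it are invented purely to show the arithmetic.

```python
def r_squared(actual, predicted):
    """1.0 means every point is on the perfect line; below 0 is worse than guessing the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # distance from perfect line
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # spread of the actual scores
    return 1 - ss_res / ss_tot


# Invented scores for four new movies, for illustration only.
actual_scores = [5.0, 6.0, 7.0, 8.0]
predictions = {
    "uncontrolled": [8.0, 5.0, 5.0, 6.0],  # tight model, scattered on new data
    "controlled": [5.5, 6.0, 6.5, 7.5],    # generalisable model, close to the line
}

fit = {name: r_squared(actual_scores, preds) for name, preds in predictions.items()}
best_model = max(fit, key=fit.get)

for name, score in fit.items():
    print(f"{name}: R-squared on new data = {score:.2f}")
# uncontrolled: R-squared on new data = -2.60
# controlled: R-squared on new data = 0.85
print(f"Best on new data: {best_model}")
```

The point of the comparison is that the measure is calculated on new data, so it rewards the model that generalises rather than the one that fits its own build data most tightly.
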


Building a model for making reliable predictions is complex but entirely doable.  Just be clear about how you want the model to be applied in the future.

Tight is not right. A generalisable model is more useful than one that’s been overfit to a specific dataset.

You need a combination of skills.  It’s important to engage the right data expertise, but it’s equally important to work with a data partner who ‘gets’ the business.

And never borrow a pair of pants from someone smaller than you.