

How bad is your data - and is it fixable?


Datamine Partner Matt Wilkins has spent a decade helping organisations get more value from business data, and he often meets with companies that are uncertain about jumping into data analytics due to the poor quality of their data.  Here's his take on the issue of data hygiene and its relationship with analytics.



I really love the feeling that comes after giving the house a good clean – there’s something so intrinsically satisfying about knowing everything is tidy and in the right place.  And on the other hand, there is something weirdly unsettling about having clutter everywhere, not being able to find things you need and watching the mess build and build out of control.

This feeling of growing panic extends beyond just disorganised houses – multiply one mess by multiple millions and you’ll begin to understand what businesses feel when they think about cleaning up their ever-increasing tangle of data.  People today understand the importance of good data management, which makes the idea of having ‘dirty data’ even more terrifying.  Data hygiene is a hot topic with both our clients and prospective clients, and I’m often asked, ‘How bad is our data and is it fixable?’

This question isn’t always the right one.  I think what they should really be asking is this:


What data do I need to clean in order to do 'X'?


Trying to tackle (or even just imagining) the cleaning of an entire database is often enough to paralyse even the most competent organisations.  We typically see two different reactions to the growing threat of disorganised data:

1.  Companies feel paralysed by the belief that their data is ugly or messy, and they’re therefore hesitant to jump into any projects that they think could make the whole system fall apart
2.  Companies are determined to clean their whole database before doing any value-adding analytics work, and what starts as preliminary data hygiene often becomes its own never-ending project that’s difficult to ever derive value from

Instead of taking either of these approaches, you need to work backwards, focusing first on where your business will see the most value from analytics.  From there, you can determine what data must be cleaned and organised to enable those specific value-adding analytics projects.  Homing in on very clear use cases for any data hygiene projects allows you to carve off and capture pockets of value over time, making measurable headway without getting overwhelmed or wasting time and resource.

For example, let’s say your ideal commercial outcome is to improve marketing ROI.  In order to do this, you need to have a better understanding of your customers - who they are, how they behave, what they want from you.  And in order to get this understanding, you need to analyse demographic and transactional data.  So for the time being, put aside the rest of the mess and focus all efforts on cleaning the data you need right now.
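To make that concrete, here is a minimal sketch in Python of cleaning only the fields a marketing analysis would need.  The record layout, the field names and the validation rules are all assumptions made up for illustration, not a description of any particular system.

```python
from datetime import datetime

# Hypothetical raw customer records; the field names are assumptions.
RAW_CUSTOMERS = [
    {"customer_id": "001", "email": " Jane@Example.COM ", "birth_date": "1985-04-12", "fax": "n/a"},
    {"customer_id": "002", "email": "not-an-email", "birth_date": "12/04/1985", "fax": ""},
    {"customer_id": "001", "email": "jane@example.com", "birth_date": "1985-04-12", "fax": "n/a"},
]


def clean_email(value):
    """Trim and lowercase an email; return None if it has no plausible shape."""
    email = value.strip().lower()
    local, at, domain = email.partition("@")
    return email if at and local and "." in domain else None


def clean_birth_date(value):
    """Accept ISO dates only; return None for anything else rather than guessing."""
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date().isoformat()
    except ValueError:
        return None


# Only the fields this analysis needs get a cleaner; the rest (e.g. "fax")
# are deliberately left alone for now.
CLEANERS = {
    "customer_id": str.strip,
    "email": clean_email,
    "birth_date": clean_birth_date,
}


def clean_for_marketing(records):
    """Keep the needed fields, normalise them, and drop duplicate customer IDs."""
    cleaned = {}
    for record in records:
        row = {field: cleaner(record[field]) for field, cleaner in CLEANERS.items()}
        # The first occurrence of each customer wins; later duplicates are ignored.
        cleaned.setdefault(row["customer_id"], row)
    return list(cleaned.values())


marketing_customers = clean_for_marketing(RAW_CUSTOMERS)
```

Values that fail a check are set to None rather than repaired, so the analysis can see how much of the needed data is actually usable before anyone invests in fixing it.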


How do I actually go about getting my data into better shape?

This is a two-part process.  The first part is the cleaning itself: once you’ve identified a section of data that needs cleaning before a value-adding analytics project can begin, someone has to actually do that cleaning, which is unfamiliar territory for many organisations (even ones with capable IT teams).  I typically recommend an external data audit in the context of the objectives your business wants to achieve, as a third-party expert will likely get your data into a digestible and usable format better, and more quickly, than your internal team will.

The second part of the process is investing in the long-term health of your data.  At some point, you’ll need to apply some thinking around why the data is messy in the first place and move the focus to the systematic root of the problem – otherwise you’ll just continue cleaning up different pockets of data as they’re perpetually created.  In order to do this, you’ll need to get an understanding of where and why the data is becoming messy, as well as a solution for improving this moving forward.

One of the best ways to identify, clean and isolate the data you need for specific commercial objectives is through the use of a datamart.  These data repositories exist to overcome many of the typical hygiene issues that cause large databases and warehouses to become so messy.  As you can imagine, it’s difficult to fix data issues in a large operational system (e.g. a PoS or underwriting system), but it’s easy to clean the data you need and move it into a separate datamart where it can be added to other cleaned, relevant data.
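As a rough illustration of the datamart idea, the Python sketch below reads rows from a pretend operational extract, keeps only the ones that pass a simple check, and loads them into a separate table that analysis can run against.  The table name `mart_sales`, the sample rows and the rule that amounts must be positive are all assumptions for the example, and an in-memory SQLite database stands in for whatever platform a real datamart would use.

```python
import sqlite3

# Hypothetical extract from an operational system (e.g. a PoS export):
# (customer_id, sale_date, amount as raw text).
POS_SALES = [
    ("001", "2024-03-01", "19.99"),
    ("002", "2024-03-01", ""),        # missing amount
    ("001", "2024-03-02", " 5.00 "),
    ("003", "2024-03-02", "-4.50"),   # negative amount, treated here as invalid
]


def parse_amount(raw):
    """Return a positive float, or None if the value is missing or invalid."""
    try:
        amount = float(raw.strip())
    except ValueError:
        return None
    return amount if amount > 0 else None


def build_datamart(conn, sales):
    """Load only the rows that pass cleaning into a separate datamart table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mart_sales ("
        "customer_id TEXT NOT NULL, sale_date TEXT NOT NULL, amount REAL NOT NULL)"
    )
    clean_rows = []
    for customer_id, sale_date, raw_amount in sales:
        amount = parse_amount(raw_amount)
        if amount is not None:
            clean_rows.append((customer_id, sale_date, amount))
    conn.executemany("INSERT INTO mart_sales VALUES (?, ?, ?)", clean_rows)
    conn.commit()
    return len(clean_rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
loaded = build_datamart(conn, POS_SALES)

# Analysis queries now run against the clean datamart, not the source system.
totals = conn.execute(
    "SELECT customer_id, ROUND(SUM(amount), 2) FROM mart_sales "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
```

The operational data is never modified here; the messy rows simply don't make it into the datamart, which is what keeps the downstream analysis clean.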





It might all seem overwhelming, but luckily Datamine has got a heap of experience doing this for clients, and we can be called on to help on three different fronts:

1.  Data cleaning - whether you’re hoping to clean up a pocket of useful data or do a general cleanse of your database, we’ve got the skills and expertise to help you do this
2.  Improving data collection methods – if you’ve got messy data, it started somewhere.  We can help identify the source of the problem and collaborate with you to find a solution that will reduce overtime and cleaning moving forward
3.  Unlocking value in your data – if you’re hoping to get involved with analytics but aren’t sure where to begin, we can help you identify key areas where you’ll be able to quickly see added value through analysis


Click here if you’re keen to chat with me or another one of our data experts about any of the above solutions we offer.




Matt is a passionate advocate for using data-driven intelligence to identify and address business challenges.  A big supporter of implementing analytics in Marketing, Matt has the expertise to balance the technical, commercial and cultural considerations required to derive value from analytics.

