Big Data Notes 013: Data dictionary

Attempting to define the term data dictionary could conjure up imagery of a dog chasing its tail. Nevertheless, when it comes to understanding and analysing your data, the data dictionary plays a key role. Big Data Notes explains more.

Right, give it to me in a nutshell.

Helpfully, IBM’s ‘Dictionary of Computing’ sums it up quite nicely. It defines a data dictionary as ‘a centralised repository of information about data such as meaning, relationships to other data, origin, usage, and format’.

Essentially, a data dictionary is an important tool for anyone looking to analyse or make use of their data in some way – programmers and people creating algorithms will find it particularly useful. It is also integral to database management systems (DBMS).

Sounds handy, tell me more. How does it work?

Ok, the data dictionary contains a list of everything that is stored within a database. It doesn’t contain any data itself, just information about what’s stored in the database.
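To make that a little more concrete, here is a minimal sketch using SQLite, which keeps its own built-in catalogue of what a database holds. The `customers` and `orders` tables and the `describe_database` helper are invented purely for illustration; other database systems expose the same kind of information through their own system tables.

```python
import sqlite3


def describe_database(conn):
    """Return {table: [(column, declared_type), ...]} from SQLite's catalogue."""
    # sqlite_master lists the objects in the database, not the rows they hold.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Hypothetical example schema, created in memory so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

catalog = describe_database(conn)
for table, columns in catalog.items():
    print(table, columns)
```

Note that no customer or order has been inserted at any point: the catalogue describes the shape of the data, not the data itself.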

The first step in how it improves your understanding of said database is through data modelling. This is the process of assessing how streams, sets, or pieces of data – otherwise referred to as ‘objects’ – interact with one another. Having done so, each object is described by its relationship with other objects, thus giving you a more comprehensive understanding of your data and how it can be used. Basically, a data dictionary stores metadata, in other words data about data.
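As a rough sketch of what one of those descriptions might look like, the snippet below records the attributes named in the IBM definition (meaning, relationships, origin, usage and format) for two made-up objects. The `DictionaryEntry` class, its field names and the sample entries are all assumptions for the sake of the example, not a standard layout.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """Metadata about one object; it holds no actual data values."""
    name: str
    meaning: str
    data_format: str
    origin: str
    usage: str
    related_to: list = field(default_factory=list)


# Hypothetical entries for two columns in an imagined sales database.
entries = {
    "customers.id": DictionaryEntry(
        name="customers.id",
        meaning="Unique identifier for a customer",
        data_format="integer",
        origin="Assigned at sign-up",
        usage="Joining customer records to other tables",
    ),
    "orders.customer_id": DictionaryEntry(
        name="orders.customer_id",
        meaning="The customer who placed the order",
        data_format="integer",
        origin="Copied from customers.id at checkout",
        usage="Grouping orders by customer",
        related_to=["customers.id"],
    ),
}


def related_objects(entries, name):
    """Return the names linked to `name` in either direction, sorted."""
    outgoing = set(entries[name].related_to)
    incoming = {other for other, entry in entries.items()
                if name in entry.related_to}
    return sorted(outgoing | incoming)


print(related_objects(entries, "customers.id"))
```

Looking relationships up in both directions is a deliberate choice here: it means an object shows its connections even when only the other side recorded the link.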

It may seem like an extra complication to collect data about your data, but it is extremely important in helping an organisation to understand what it’s got to work with. Moreover, when it comes to running queries on your data, having it all indexed and catalogued makes matters far easier. Furthermore, when something goes wrong with your data – perhaps the analysis or the algorithms prove faulty – a data dictionary will allow you to see where an error may have occurred.
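One way that error-hunting could work in practice is to compare incoming records against what the dictionary says they should look like. The sketch below assumes a very simple dictionary that maps field names to expected Python types; the field names and the `find_problems` helper are hypothetical.

```python
# Hypothetical dictionary: field name -> the type the data is expected to have.
EXPECTED_TYPES = {"customer_id": int, "total": float, "currency": str}


def find_problems(record, expected_types=EXPECTED_TYPES):
    """List every way a record disagrees with the dictionary."""
    problems = []
    for field_name, expected in expected_types.items():
        if field_name not in record:
            problems.append(f"{field_name}: missing")
        elif not isinstance(record[field_name], expected):
            actual = type(record[field_name]).__name__
            problems.append(
                f"{field_name}: expected {expected.__name__}, got {actual}")
    # Fields nobody has documented are worth flagging too.
    for field_name in record:
        if field_name not in expected_types:
            problems.append(f"{field_name}: not in dictionary")
    return problems


bad_record = {"customer_id": "42", "total": 9.5, "colour": "red"}
for problem in find_problems(bad_record):
    print(problem)
```

A real check would go further (value ranges, allowed codes, units), but even this much points to the field where something went wrong rather than leaving you to guess.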

So, in the age of big data, how important is this?

Well, needless to say, with the arrival of big data and all the things encompassed within that term, we are sitting on ever-expanding mounds of data. Thanks to the Internet of Things, mobile devices, apps, remote monitoring, sensor-laden products and social media – to name but a few – organisations are now generating and storing more and more data.

Ultimately, the more data you have, the more important a data dictionary is going to be. Let’s put it another way: if we spoke a language with only a few choice words and turns of phrase being tirelessly recycled – much like that of the average footballer – there would be far less need for the Oxford English Dictionary; it would not be difficult to know and understand every word within the language. But naturally the more words or, to return to the point, the more data you have, the more crucial it becomes to have a means of understanding what the data is, where it comes from and how it can be used, particularly in relation to all the other data you have stowed away.

Admittedly a data dictionary usually comes in handy when you have a relational database filled with data and therefore may not be entirely applicable to all big data projects. Nevertheless, it demonstrates the value, as already mentioned, of understanding your assets, how they relate to each other, and how you could therefore best make use of them.

What if I don’t create a ‘data dictionary’?

One of the biggest challenges many organisations face when trying to embark on a big data project is knowing what data they have at their disposal. Such is the range of sources and the volume of data flying into databases that it is often hard to keep track of what you’ve got. And, if you can’t create sense and order for big data, you are destined to drown in a sea of numbers.

To effectively analyse and, more importantly, cross-analyse your data – this is often where the most insightful results come from – you need to have a rigorous knowledge of what data you have. You could be sat on a wealth of information waiting to be tapped into but, because you are not sure where certain data sets have come from or what to do with them, they are left gathering dust rather than adding value to your organisation.

Many organisations will be familiar with data dictionaries, even if it is only something that falls under the interest and remit of the data boffins they work with. Many others, however, will only recently have started to see or look into the value they can unlock from their data. For them it is a highly recommended tool to acquire and implement to help with data management.

The first step to succeeding with big data is knowing what you’ve got and knowing what you want to do with it in order to help your organisation. A data dictionary will help with this task by giving you insight into the data you have and how it can be put to good use. It is certainly something you ought to implement sooner rather than later if you haven’t already.