Everyone is excited about ‘big data’ being THE NEXT BIG THING!
What does it really mean to everyday business?
The most commonly used definition of big data is:
Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
But that isn’t much help!
These days information comes from everywhere: data flows in from sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS signals, to name a few. The collection and storage of information is increasing at an exponential rate, and 85% of the data in the world today has been created in the last two years alone – data is getting bigger.
This is an IBM schematic on where big data comes from:
So, what is big data really?
In very simple terms big data is any block of information that is too big for your computer to handle as it is!
Considering it a little differently: billions of computers all over the world are collecting information line by line, field by field, input by input – and saving all these values in massive data tables with rows and columns. These data tables form the basis of what is known as big data – they have become too big to look at as a whole and so are now known collectively as “Big Data”.
Obviously what constitutes Big Data is a bit subjective, hence the general confusion as to what big data is… For an SME this definition includes any amount of data that is bigger than, say, Excel can manage; but for a university or IBM with massive processing power, it is a much larger set of information. To make it even harder to pin down, Big Data sizes are a constantly moving target as technology improves and with it the ability to handle lots of information.
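To make the SME threshold concrete: a modern Excel worksheet holds at most 1,048,576 rows. The short Python sketch below is one rough way to check whether an export would fit. The function name `fits_in_excel` and the sample data are made up for illustration, and row count is only one of several limits you might hit in practice.

```python
import csv
import io

# Excel's worksheet limit: 1,048,576 rows (this count includes any header row).
EXCEL_MAX_ROWS = 1_048_576

def fits_in_excel(csv_file, max_rows=EXCEL_MAX_ROWS):
    """Return True if every row of the CSV would fit on one Excel worksheet."""
    row_count = 0
    for _ in csv.reader(csv_file):
        row_count += 1
        if row_count > max_rows:
            return False  # stop early; no need to read the rest of the file
    return True

# Tiny in-memory stand-in for a real export (hypothetical data).
sample = io.StringIO("date,amount\n2024-01-01,10\n2024-01-02,12\n")
print(fits_in_excel(sample))              # True: 3 rows is well under the limit
sample.seek(0)
print(fits_in_excel(sample, max_rows=2))  # False: pretend the limit is 2 rows
```

For a real file you would pass `open("export.csv", newline="")` instead of the in-memory sample. If the answer is False, you are, by this article's working definition, dealing with big data.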
This trend towards super-large data sets, instead of multiple smaller, more accessible tables, has arisen because putting everything in one spot gives more ways to cross-reference the information – slicing and dicing in more ways is easier when you start with a bigger block in the first place.
i.e. the more columns you have, the longer each row becomes… and therefore the more information there is to interrogate, cross-reference, report on and analyse.
I think this graphic is a really nice summation:
Consequently, because it is so BIG, the challenges around big data include capture, curation, storage, search, sharing, transfer, and analysis.
We have an overwhelming number of bits of information locked up in tables so big we can’t open them – what now?
Amassing huge numbers of variables in one spot allows correlations to be found that help to:
* determine quality of research
* link legal citations
* determine real-time roadway traffic conditions
However, don’t expect too much of Big Data itself. It is, after all, only “data” – values of qualitative or quantitative variables, belonging to a set of items.
Using Big Data requires four key steps:
- defining – clarifying what decisions need to be made and therefore what is to be investigated
- querying – isolating a sub-set of data to be investigated
- reporting – creating and providing the subset from the database
- analysis – review and extrapolation as to causes and future impacts etc.
How useful Big Data is depends on the process of using it, the skill of the analyst, and the expectations of the audience and owners; together these determine whether it provides just more data or whether it enables decisions.
For example, you could query a database as to the number of beans available, and the colour of those beans, at several points in time. Big Data will happily reveal: “This month I have 8 beans, half of them red, the rest blue. Last month there were 9 beans, all of them red.”
This is simply more data; it doesn’t address any real questions:
* are red better than blue?
* is that change good or bad?
* what should I be aiming for?
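The bean query can be sketched in a few lines of Python. This is only an illustration: the record layout and the function name `count_by_colour` are assumptions invented for the example, not part of any real system.

```python
from collections import Counter

# Hypothetical records: one (month, colour) pair per bean, matching the example.
beans = (
    [("last month", "red")] * 9
    + [("this month", "red")] * 4
    + [("this month", "blue")] * 4
)

def count_by_colour(records, month):
    """Query and report: how many beans of each colour were there in a month?"""
    return Counter(colour for m, colour in records if m == month)

last = count_by_colour(beans, "last month")
this = count_by_colour(beans, "this month")

print(dict(last))  # {'red': 9}
print(dict(this))  # {'red': 4, 'blue': 4}

# The change in totals is still just another number, not a judgment.
change = sum(this.values()) - sum(last.values())
print(change)      # -1
```

The code covers the querying and reporting steps perfectly well. Nothing in it can say whether losing one bean, or gaining four blue ones, is good or bad; that part still belongs to the defining and analysis steps.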
The interrelation between humans and big data is where it all starts falling to pieces. Big Data is only as good as the four key steps applied to using it:
– Without human vision and intuition, how can we work out what to look for and define the investigation well?
– Without a skilled data warehouse manager how can we hold the data accurately and ensure it is queried and reported correctly?
– Without good analysts how can we be confident in the stories the data is telling?
Despite this, Big Data is expected to:
* pick a winner
* prevent diseases
* combat crime
A bunch of numbers in a table is, in my opinion, unlikely to actively achieve these goals.
Big Data cannot deliver what is expected without a shift to demanding knowledge instead of information.
Big Data can’t deliver vision.
Big Data doesn’t determine, or necessarily highlight, ambiguity.
Big Data can’t report values not yet collected.
Human judgment can be more than just binary – yes/no. Sometimes we need maybe! … or even maybe?
As Tim Leberecht so eloquently put it in his blog last week:
Data can give us the illusion of objective truth, yes, but at the end of the day, our employees and customers are not interested in the truth, they seek experiences that feel true.
Now you know what it is, don’t be afraid of big data – start to use it well to inform your decision-making, bearing in mind all four key steps.
What is one question you will put to your data this week?
About the Author
Often described as the small business answer to The Supernanny, our principal Eve Blackall owned a Tax Accounting Practice for 15 years, worked as Financial Controller and is currently an advisor to various ASX listed companies and large non-profits.
With Smart Accounting you get to work one-to-one with Eve, and not many consultants have her experience and expertise. For over 25 years she has been helping businesses be more efficient and effective, and ensuring business owners sell for the highest prices.