The perils of big data

A shorter version of this article first appeared in issue 238 of .net magazine – the world's best-selling magazine for web designers and developers.

In case you missed it, Barack Obama won the US election a while back and will be in power for another four years. The news seemed to come as a shock to many, though not to Nate Silver of the New York Times who, along with a handful of others, predicted the result with 100% accuracy. And it wasn't a fluke; he declared himself 90% confident before the election that Obama didn't need to pack up his toaster. His secret? Big data.

At the 1992 UK general election, the Conservative party landed back in office with 8% more of the vote than polls had predicted (if the polls had been accurate they would have lost by 1%). The phenomenon is well known as the “Shy Tory Factor” or the “Bradley Effect” – people telling pollsters that they'll vote for the candidate they perceive to be more socially acceptable, when in truth they intend to vote for someone else. As Robert L. Glass might have said, if he hadn't been talking about reuse: data-in-the-large remains a mostly unsolved problem.

Small amounts of data can certainly be misleading: a sample population of one that's inaccurate is 100% inaccurate. But big data sets provide a whole new scope for delusive conclusions. Data collected about human beings can be infused with bias and self-conscious moderation. Data analysed by humans can be tainted with politics, subjectivity, confusion over correlation and causation, and subtle shifts that lead us inexorably to the very answers we wanted at the beginning.
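To make the point about bias concrete, here is a minimal Python sketch of a poll in which some supporters of one party misreport their intention, as in the Shy Tory effect. The function name run_poll and the figures for true_support and shy_rate are invented for illustration; they are not taken from the 1992 election or any real poll. What it suggests is that a bigger sample only makes the wrong answer more stable.

```python
import random


def run_poll(sample_size, true_support=0.42, shy_rate=0.15, seed=0):
    """Return the share of respondents who *say* they back party A.

    true_support and shy_rate are made-up illustrative values,
    not measurements from any real election.
    """
    rng = random.Random(seed)
    declared = 0
    for _ in range(sample_size):
        backs_a = rng.random() < true_support
        # A "shy" supporter backs party A but tells the pollster otherwise.
        is_shy = backs_a and rng.random() < shy_rate
        if backs_a and not is_shy:
            declared += 1
    return declared / sample_size


if __name__ == "__main__":
    # Larger samples reduce random noise, but not the systematic bias.
    for n in (100, 10_000, 200_000):
        print(f"sample of {n}: declared support {run_poll(n):.3f}")
```

Under these assumptions the declared figure should settle near 0.42 × 0.85 ≈ 0.36 as the sample grows, rather than near the true 0.42. More data narrows the spread around the biased number; it does nothing to move it towards the truth.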

With all this confusion over how to integrate systems it was pretty hard for businesses to get a consolidated picture of just what was going on. The architecture team's answer to that was Enterprise Data Warehousing (EDW) – an all-knowing repository based on a build-it-and-they-will-come mentality, promoting the extraction and standardisation of data from disparate business systems. Again, not a bad idea in principle, as long as the cost of doing so is lower than the value of the insight obtained. It isn't. But CIOs and vendors need not worry because “Big Data”, the latest in a procession of over-hyped initiatives, is here to stick another layer of obfuscation in the way of those pesky financial controllers.

Businesses are generating more data than ever. It's cheaper to store, and with the rise of NoSQL we're finally breaking free of the dominion of the RDBMS. Any product manager worth their salary knows that insight into market forces, the social web, consumer behaviour and the competition is critical to success. Ideal conditions, then, for vendors, and architects coming up for their annual performance review, to espouse the seductive promise of perspicacity. Thus the Enterprise Data Warehouse team becomes the Big Data team; the vendors throw in some Hadoop integration and they're off and running again, with befuddled financial controllers in their wake.

Point-to-point integration between applications can cause serious issues, but EAI didn't make that better; it made it worse. SOA was a valuable concept with a great deal of promise, but it was hijacked by vendors and large-systems integrators, and all but destroyed. Businesses that are complex unfortunately have to face the fact that they will also have complex IT infrastructures. The best they will ever get is an infrastructure that is only as complex as their business model. Businesses with complex, but appropriate, IT might be looking at a sign they need to simplify their model.

That's not to say big data is a crazy idea. It's a natural progression that well-run businesses should be able to take in their stride – not use to explain away the failure of their Enterprise Data Warehouse. Large and seemingly unconnected data sets, coupled with careful data analysis, can tell us remarkable things, but the investment has to be targeted at the questions we need answered so that we don't lose sight of the goal in the pursuit of the solution.



