Seven dirty secrets of data visualisation

Data visualisation - and in particular, web-based data visualisation - is having its moment. JavaScript libraries like D3.js, Raphaël, and Paper.js, building on modern browser support for Canvas and SVG, have made it easier than ever to produce complex visualisations that, until recently, were the province of computer scientists and a handful of specialist designers.

Visualisation is the new 'must-have' element in project proposals and personal portfolios, and startups like Platfora, Datameer, and our own employers ClearStory Data and Chartio are raising millions for analytics platforms with browser-based visualisation interfaces.

To some extent, the buzz is justified. Data visualisation is a wonderful way of exploring data, finding new insights, and telling a compelling story. But what are the real challenges visualisation developers face - and what don't they want you to know about their work?

Most data visualisation tutorials start with a pleasant fantasy: a pristine data set. Whether you’re learning to build a basic bar chart or a force-directed network graph, you’re presented with clean, normalised, well-formatted base data. This perfect JSON or CSV file is the digital analog of the neatly prepped mise en place in a televised cooking show: the refined result of tedious, painstaking work presented as raw ingredients. In practice, when dealing with most real-world data sets, expect to spend up to 80 per cent of your time finding, acquiring, loading, cleaning and transforming your data.

Some of this process can be done with automated tools, but almost any data cleaning involving two or more data sets will require some level of manual work. A wide variety of tools can convert XLS to XML or timestamps to other date formats, but nothing can automagically map one company’s internal sales categories to those of its competitors, or deal reliably with data entry typos, incompatible character encodings, or (shudder) poor OCR.

Budget significant time in any visualisation project for data cleanup. Increase your estimate (in some cases exponentially) for multiple data sources, manually entered or OCR data, divergent categorisation schemes, and non-standard formats
Google Refine is a great data cleanup workhorse, though it has limitations, particularly for non-tabular data. Other cleanup-specific tools include Data Wrangler and Mr. Data Converter. However, many tasks still require basic proficiency in a scripting language like Python or manual work in Excel. Save your scripts - you’ll use them again
Eat your own dog food if you can: visualisation is a great tool for identifying data problems. Use scatter plots and histograms to find and fix suspicious outliers

One of the first questions to ask when considering a potential visualisation design is “Why is this better than a bar chart?” If you’re visualising a single quantitative measure over a single categorical dimension, there is rarely a better option. Likewise, time-based data is usually best displayed on a line chart, and scatterplots are often best for exploring correlations between two linear measures. At the risk of sounding regressive, there are good reasons these charts have been in continuous use since the 18th century. Bar charts are one of the best tools available for facilitating visual comparisons, leveraging our innate ability to precisely compare side-by-side lengths.

The corollary to bar chart superiority, and perhaps the dirtiest secret in this article, is that the coolest-looking visualisations are often the least useful. The novelty and aesthetic appeal of custom visualisations comes at a cost: the clarity of the data. Most bar chart alternatives ask the viewer to compare differences we have a harder time discerning: areas, angles, hues, or opacities. At best, such visualisations make comparison difficult; at worst, they distort the data entirely, leading viewers to false conclusions.

Don’t dismiss traditional visualisation choices if they represent the best option for your data. Start with bar and line charts, and look further only when the data requires it
Have a good rationale for choosing other options. Compared to bar charts, bubble charts support more data points with a wider range of values; pies and doughnuts clearly indicate part-whole relationships; treemaps support hierarchical categories
Bar charts have the added bonus of being one of the easiest visualisations to make - you can hand-code an effective bar chart in HTML using nothing but CSS and minimal JavaScript, or make one in Excel with a single function

Cleaning and formatting a single data set is hard enough, but what if you’re building a live visualisation that will run with many different datasets? Maybe you have to build a visualisation for use in multiple departments within one company, where every department has its own database, and you don’t have time to manually clean each dataset. Your first instinct may be to grab some demo data and use that to build your visualisation; your visualisation library may even come with standard sample data.

Unfortunately, there is no substitute for real data. Demo data tends to have a normal distribution and a manageable number of records; it’s designed to show visualisations in their best light. A bar chart doesn’t just have the prerequisite bars, it looks like an ideal bar chart. It doesn’t help you plan for data discrepancies, null values, outliers, or other real-world problems. If you rely too much on demo data, when you plug in real data you may find that your visualisation isn’t the best one suited for your data to begin with.

Ideally use several random samples of real data if you cannot access an entire dataset
Invalid and missing data is a guarantee. If your data won’t be cleaned before being graphed, do not clean your sample data
Real data may be so large as to overwhelm your visualisation or the system generating it. Be sure that if you use a sample of data you correctly scale up the sample size (or reduce it appropriately) before creating a final visualisation

Designing the labels, legends and axes for your visualisation is often an afterthought to the initial visualisation. But these elements are crucially important to the visualisation, and can be difficult and time-consuming to get right, especially when you can’t predict the data ahead of time.

When laying out your visualisation, leave significant rendering space for any additional marks you may need, often including relatively wide margins around the graphical part of your visualisation. Axis labels should be spaced such that they do not occlude each other and are easily readable. Use rotate or reposition labels if necessary for legibility. If a particular area is overcrowded with labels, but you need them for clarity, consider moving the labels farther from the elements they reference and connect them with an indicating line. Another technique is to group crowded labels together in a single tooltip-like group. Consider the space you’ve allowed and the length of the longer labels. If the labels won’t fit you might need to shorten them with ellipses, or simply truncate the text at a fixed length.

Similarly, legends require advance planning to render well. One easy option is to reserve some space for the legend to one side of the graphic. Unfortunately, this means that you’ll need to reduce the size of the graphical portion of your visualisation. In order to preserve some space you may be able to place the legend in an empty part of the graphic, or make the legend draggable so the viewer can access any graphics underneath.

Plan space around your graphic for labels, axes and legends
Designate a maximum character length for labels, truncating if needed to prevent crowding. Group nearby labels together, revealing them in response to user actions
Consider scrolling or accordion-style expansion for long legends
Whatever you do, don’t leave these elements out. Labels may seem like a secondary concern when you’re focused on the graphic elements, but they are incredibly important to your viewers

As a visualisation author, it’s often tempting to add animations into your final product. Animations are a powerful way of connecting data to changes in state and trends. However, animations can also lead to confusing or misleading interpretations of your data. You should carefully plan for how it will affect your entire output and not simply add it at the end of your work. Animations work best when they can reveal data relationships showing how data groups together between different states, how the data changes over time, or how data points are directly related.

In general, make your animations simple, predictable and re-playable. Allow users to view the animation multiple times so they can track where objects start and end. Avoid occluding objects in a transition with other objects, which makes tracking more difficult and do not transition objects along unpredictable paths. With complex animations, research suggests that viewers’ comprehension improves when the animation is broken into simple 'staged' transitions. A stage pauses the animation with the objects in a transitioning state and provides the viewer a moment to reflect on the state of each object.

It's a central tenet of the field that data visualisation can yield meaningful insight. While there’s a great deal of truth to this, it’s important to remember that visualisation is a tool to aid analysis, not a substitute for analytical skill. It’s also not a substitute for statistics: your chart may highlight differences or correlations between data points, but to reliably draw conclusions from these insights often requires a more rigorous statistical approach. (The reverse can also be true - as Anscombe’s Quartet demonstrates, visualisations can reveal differences statistics hide.) Really understanding your data generally requires a combination of analytical skills, domain expertise, and effort. Don’t expect your visualisations to do this work for you, and make sure you manage the expectations of your clients and your CEO when creating or commissioning visualisations.

Unless you’re a data analyst, be very careful about promising real insight. Consider working with a statistician or a domain expert if you need to offer reliable conclusions
Small design decisions - the colour palette you use, or how you represent a particular variable - can skew the conclusions a visualisation suggests. If you’re using visualisations for analysis, try a variety of options, rather than relying on a single view
Stephen Few’s Now You See It offers a good practical introduction to using visualisation for business analysis, including suggestions for developers on how to design analytically-valid visualisation tools

The range of libraries and tutorials now available make it easier than ever to produce production-quality web-based visualisations without specialised expertise. But creating visualisations that offer real insight or tell a compelling story still requires a particularly wide range of real skills in addition to coding, including graphic design, data analysis, and an understanding of interaction design and human perception. No library or technology can substitute for knowing what you’re doing.

But the flip side of this secret is that you don’t need to know that much - especially if you use well-established visualisations and interaction principles. Learn enough about the field to avoid newbie mistakes (always zero-base your bar charts and never set a circle radius with a linear scale), keep things simple (no 3D, limited animation, no drop shadows), base your work on solid examples and you can create great visualisations.

Nick Rabinowitz is the Senior Data Visualisation Developer at ClearStory Data. He has over 15 years experience working on web and visualisation projects, primarily for non-profit, academic, and public sector clients.

Nate Agrin is the Director of Visualization at Looker. He studied information and visualisation theory at UC Berkeley and has worked at companies like Splunk and Twitter, contributing to their web-based interfaces.

Thank you for reading 5 articles this month* Join now for unlimited access

Enjoy your first month for just £1 / $1 / €1

*Read 5 free articles per month without a subscription

Join now for unlimited access

Try first month for just £1 / $1 / €1

TOPICS

The Creative Bloq team is made up of a group of art and design enthusiasts, and has changed and evolved since Creative Bloq began back in 2012. The current website team consists of eight full-time members of staff: Editor Georgia Coggan, Deputy Editor Rosie Hilder, Ecommerce Editor Beren Neale, Senior News Editor Daniel Piper, Editor, Digital Art and 3D Ian Dean, Tech Reviews Editor Erlingur Einarsson, Ecommerce Writer Beth Nicholls and Staff Writer Natalie Fear, as well as a roster of freelancers from around the world. The ImagineFX magazine team also pitch in, ensuring that content from leading digital art publication ImagineFX is represented on Creative Bloq.

Recommended reading

Seven dirty secrets of data visualisation

Secret #01: Real data is ugly

Tools and strategies

Secret #02: A bar chart is usually better

Tools and strategies

Secret #03: There’s no substitute for real data

Tools and strategies

Secret #04: The devil is in the details

Tools and strategies

Secret #05: Animate only when appropriate

Tools and strategies

Secret #06: Visualisation is not analysis

Tools and strategies

Secret #07: Data visualisation takes more than code

Please wait...