Four Rules for Better Data Visualization in Science

The first thing I do when I come across an interesting paper is look at the figures. For me, they offer the quickest means of entry into a story. If a figure captures my interest, I go on to read the results, and then hopefully get captivated enough to read the entire text.

Apparently, I’m not the only one who does this. It turns out that a lot of academics take the same approach, whether consciously or not, when they’re sifting through dozens of articles trying to find the few that are rewarding to read. But why do so many readers filter their searches by figures rather than, for example, by the abstract?

When our time is limited, we try to absorb as much information as quickly as possible. Carefully crafted figures allow us to do just that: they are condensed information. Figures are a powerful tool for guiding readers through the authors’ story, and publishers realize this too. In top-tier journals you’ll find many clear, intuitive visualizations that a broad audience can grasp (and these journals demand well-thought-out, refined figures from their authors).

Effective visualization is a crucial element of scientific storytelling and, in my opinion, a skill worth developing early in your scientific career.

I developed an interest in data visualization not only because it has become an asset for scientists, but also because of the satisfying feeling you get when your graphics are both visually pleasing and effective at conveying your message. My enthusiasm paid off: it helped me win two best poster awards and quite a few compliments from the people who had to sit through my presentations.

So, with that, here are my four rules for good data visualization (and a few additional tips):

1. A picture is worth a thousand words: Visualize concepts, ideas, and data with graphics

This phrase was coined in the early twentieth century, during advertising’s revolutionary transition from text to pictures. As in advertising, in science communication we are trying to excite people and successfully sell our research. There is nothing duller than a massive chunk of text that could easily be replaced with a single picture. That is why the top spot on my list goes to replacing (or accompanying) text with figures in presentations and manuscripts.

First, which kinds of scientific text can be converted into figures?

Graphical abstracts

A graphical abstract summarizes an entire paper in visual form. It should let the viewer grasp the ‘take-home message’ without reading any of the text. Some journals require a graphical abstract for submitted manuscripts (and provide guidelines and examples for making one effectively). Don’t be afraid to make a graphical abstract before your project is complete! It can be a great replacement for the summary section of a poster or the summary slide of a presentation.

Visualizing experimental procedures

Methods sections can get very complex and elaborate, making it time-consuming to fish out a project’s essential procedures. The key elements of a computational algorithm or an experimental procedure can often be shown much more effectively in a methods summary graphic. Ideally, such a graphic lets the reader understand the main approach without having to refer to the text.

A great example is this figure from a paper published by Ravarani and colleagues.

Adapted from Ravarani et al., 2018

As this figure shows, numbered segments can represent the experimental steps. The numbers correspond to parts of the Methods section and therefore serve as a ‘visual reading guide’.

Tip: Remember that most readers (at least of left-to-right scripts) naturally read left-to-right and then top-to-bottom, so arrange your panels and steps in that order.

Turn tables into figures whenever possible

For most people, it is much easier to spot trends and compare values when data are presented as a graph rather than a table. So ask yourself whether each table you’ve included could be visualized as a plot.

I’ve adapted the table in this Nature publication and plotted the same values as bar plots. While it takes a few minutes to parse the table, the comparison is crystal clear in the plot.
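If you work in Python, a small table can be turned into grouped bars with a few lines of Matplotlib. This is a minimal sketch, assuming Matplotlib is installed; the model names and scores are hypothetical placeholders (not the values from the Nature table), and `table_to_grouped_bars` is a helper name made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive or inline output
import matplotlib.pyplot as plt

# Hypothetical table: one row per model, one column per metric.
# Placeholder numbers, NOT the values from the Nature paper.
table = {
    "Model A": {"Precision": 0.81, "Recall": 0.74},
    "Model B": {"Precision": 0.86, "Recall": 0.69},
}


def table_to_grouped_bars(table):
    """Draw one group of bars per metric (column), one bar per model (row)."""
    rows = list(table)
    metrics = list(table[rows[0]])
    width = 0.8 / len(rows)  # share each group's slot among the rows
    fig, ax = plt.subplots(figsize=(4, 3))
    for i, row in enumerate(rows):
        xs = [m + i * width for m in range(len(metrics))]
        heights = [table[row][metric] for metric in metrics]
        ax.bar(xs, heights, width=width, label=row)
    # Put each metric's label under the middle of its group of bars.
    ax.set_xticks([m + width * (len(rows) - 1) / 2 for m in range(len(metrics))])
    ax.set_xticklabels(metrics)
    ax.set_ylim(0, 1)  # scores live in [0, 1]; keep the zero baseline
    ax.set_ylabel("Score")
    ax.legend(frameon=False)
    return fig, ax


fig, ax = table_to_grouped_bars(table)
```

To export, call something like `fig.savefig("table_as_bars.png", dpi=300, bbox_inches="tight")`. The y-axis is pinned at zero on purpose, for the reasons given in the tip on cutting the scale.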

2. Increase the signal-to-noise (or ‘data to ink’) ratio

Now that I’ve talked about the importance of making graphics, let’s talk about how we can enhance those graphics.

First and foremost, it’s crucial to distinguish signal from noise: which parts of the graph contribute to the message (signal) and which are irrelevant (noise). Every line, dot, piece of text, and color choice that isn’t essential for conveying the message should be removed. This draws the viewer’s attention directly to our findings, without distractions. It is the most important rule of data visualization, and it lays the foundation for all the other tools and tips I’ll share.

Here are a couple of examples showing how the signal-to-noise ratio can be increased by removing unnecessary boxes, grid lines, and axis anchor points.
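In Matplotlib, most of this clutter can be stripped with a small helper applied after plotting. This is a minimal sketch, assuming Matplotlib; `declutter` is a hypothetical helper name, and I have read "anchor points" as tick marks, which may not be exactly what the examples above removed.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def declutter(ax):
    """Strip common 'non-data ink' from a Matplotlib Axes, in place."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove the box around the plot
    ax.grid(False)                          # remove grid lines
    ax.tick_params(length=0)                # remove tick marks, keep tick labels
    legend = ax.get_legend()
    if legend is not None:
        legend.set_frame_on(False)          # remove the frame around the legend
    return ax


# A deliberately cluttered example: grid on, framed legend, full box.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [2, 3, 5, 4], label="treatment")
ax.grid(True)
ax.legend()
declutter(ax)
```

The left and bottom spines are kept because they still anchor the data; whether to drop them too is a judgment call for each plot.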

While I’ve always found color to be a powerful tool for conveying a visual message, I’ve also included an example where that’s not necessarily the case. When chosen wisely, colors draw attention to groups or values; when used without a clear purpose, they distract.

3. Choose the correct plotting format

The first step to making a pleasing graphic is choosing the right visual form. Here is a simplified guide on how to choose the best visualization scheme based on the dimensionality of the data and the message of the plot.

Dimensionality describes how many variables are recorded for each data point. For example, the heights of individuals in a population form a one-dimensional dataset. If the weights of the same individuals are also included, the dataset becomes two-dimensional.

One-dimensional data
  Comparison: bar plot
  Distribution: histogram
  Comparing several distributions: box plot

Two-dimensional data
  Trends and progression (e.g. over time): line graph
  Density and distribution: kernel density estimate (KDE) plots, drawn as lines, color-coded scatter plots, or heatmaps
  Relationship between quantitative variables: scatter plot
  Relationship between qualitative variables: heatmap

Three-dimensional data (advisable only for simple trends)
  Surfaces (z = f(x, y)): 3D plots
  Relationships between data points: scatter plots with the third dimension encoded as color or size

More dimensions
  You can try introducing a fourth dimension with animation or a slider, but given current human limitations, it is not encouraged in this universe 🤷‍♀️.
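The guide above is essentially a lookup table, and writing it down as one makes the decision explicit. A minimal sketch in plain Python; the message keys are my own shorthand for the categories above, and `suggest_plot` is a hypothetical helper, not part of any library.

```python
# Hypothetical encoding of the guide above: (number of variables, message) -> plot.
PLOT_GUIDE = {
    (1, "comparison"): "bar plot",
    (1, "distribution"): "histogram",
    (1, "several distributions"): "box plot",
    (2, "trend"): "line graph",
    (2, "density"): "KDE plot (lines, color scatter, or heatmap)",
    (2, "quantitative relationship"): "scatter plot",
    (2, "qualitative relationship"): "heatmap",
    (3, "surface"): "3D plot",
    (3, "relationship"): "scatter plot with color or size as the third dimension",
}


def suggest_plot(n_variables: int, message: str) -> str:
    """Look up a plot type for data with n_variables dimensions and a given message."""
    if n_variables > 3:
        return "not encouraged: consider animation or a slider, with caution"
    try:
        return PLOT_GUIDE[(n_variables, message)]
    except KeyError:
        options = sorted(msg for (n, msg) in PLOT_GUIDE if n == n_variables)
        raise ValueError(f"unknown message {message!r}; options: {options}") from None
```

For the height example, `suggest_plot(1, "distribution")` gives a histogram; adding weight and asking how the two relate, `suggest_plot(2, "quantitative relationship")` gives a scatter plot.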

Tip: Be careful when cutting the scale!

For bar plots in particular, the y-axis should normally start at zero, because the length of each bar encodes its value. Zooming into a specific range can create the illusion of a large difference between two values when there isn’t one.

These figures are based on the first example I gave (the plot adapted from a Nature paper). Stretching a narrow slice of the y-axis across the whole plot can mislead the viewer into thinking that the two models differ substantially in performance.
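For bar plots the distortion is easy to quantify: a bar’s drawn length is its value minus the axis baseline, so moving the baseline up toward the smaller value inflates the apparent ratio between bars. A minimal sketch; the two accuracies are hypothetical, and `apparent_ratio` is an illustrative helper.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths (longer / shorter) when the y-axis starts at `baseline`.

    A bar's visible length is its value minus the baseline, so moving the
    baseline up toward the smaller value inflates the apparent difference.
    """
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (max(a, b) - baseline) / (min(a, b) - baseline)


# Two hypothetical model accuracies, two percentage points apart.
honest = apparent_ratio(0.90, 0.92)                 # axis starts at 0
zoomed = apparent_ratio(0.90, 0.92, baseline=0.89)  # axis starts at 0.89
```

With the axis starting at zero, the taller bar is only about 2% longer; starting the axis at 0.89 makes it three times as long as the shorter one, although the data have not changed.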

4. Use colors conservatively but generously

Color is a powerful tool for conveying relationships within data and helping the viewer notice specific values. However, as I mentioned earlier, color should be used with caution. Recently there have been many efforts to design better colormaps: ones whose color order is visually intuitive, whose successive colors are clearly distinguishable, and which are colorblind-friendly (you can learn more about these colormaps here).

For instance, ColorBrewer offers professionally designed color schemes for qualitative, sequential, and diverging data. Here are a few examples:
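Matplotlib bundles many ColorBrewer schemes under their original names (for example `Set2`, `Blues`, and `RdBu`), so choosing a scheme by data type can be a simple lookup. For diverging data it also matters where the neutral middle color falls, so the sketch below computes color limits that are symmetric around a midpoint (zero by default). The scheme-to-type pairing and the `diverging_limits` helper are my own illustrative choices, not ColorBrewer recommendations.

```python
# Matplotlib ships these ColorBrewer schemes under their ColorBrewer names.
# The pairing of scheme to data type is one reasonable default, not a rule.
COLORMAP_FOR = {
    "qualitative": "Set2",  # unordered categories
    "sequential": "Blues",  # ordered values, low to high
    "diverging": "RdBu",    # values above or below a meaningful midpoint
}


def diverging_limits(values, midpoint=0.0):
    """Return (vmin, vmax) placed symmetrically around `midpoint`.

    Symmetric limits put the neutral middle color of a diverging colormap
    exactly on the midpoint, so equal deviations get equally strong colors.
    """
    half_range = max(abs(v - midpoint) for v in values)
    return midpoint - half_range, midpoint + half_range


log_fold_changes = [-0.4, 0.1, 1.8, 2.5]  # hypothetical data
vmin, vmax = diverging_limits(log_fold_changes)
```

The results can be passed as `cmap=COLORMAP_FOR["diverging"], vmin=vmin, vmax=vmax` to Matplotlib plotting functions such as `imshow` or `scatter`; `matplotlib.colors.CenteredNorm` is a built-in alternative for the centering step.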

Tip: Make sure your plots are colorblind-friendly

Keep in mind that roughly one in twenty-five people has a color vision deficiency, with red-green color blindness being the most common type. To make sure your graphics remain perceivable by all viewers, you can run a simple color-blindness simulation test.
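A proper simulation models how each type of deficiency transforms colors, and dedicated simulator tools do that better than a few lines of code. A cruder check that still catches many problems is lightness contrast: colors that differ mainly in hue but not in lightness, such as a typical red and green, are the ones most likely to merge. The sketch below is that swapped-in proxy, not a color-blind simulation; it computes relative luminance and contrast ratio as defined in the WCAG 2.x accessibility guidelines, and the helper names are my own.

```python
def srgb_to_linear(channel: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    # sRGB-standard threshold; WCAG's text cites 0.03928, a negligible difference.
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of an sRGB color given as '#rrggbb'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black vs white."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Matplotlib's default red and green: clearly different hues, similar lightness.
print(round(contrast_ratio("#d62728", "#2ca02c"), 2))
```

For this red and green the ratio comes out below 2, a warning sign given that WCAG 2.1 asks for at least 3:1 for meaningful graphical elements. A low ratio does not prove a pair is indistinguishable, and a high one does not replace an actual simulation.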

Conclusion

For better or worse, we get a lot of our credit from our manuscripts and our presentations. Therefore, data visualization is inarguably becoming a fundamental part of every scientist’s work, and an important skill to hone.

To summarize, four key rules for making better visualizations are:

  1. Turn text into graphics whenever possible
  2. Remove noise from plots
  3. Choose the right plotting style
  4. Use color to convey your message on multiple channels

And once you’ve applied all of these rules to make marvelous schemes and plots, it’s time to put those plots together into an effective, coherent figure. Choose the graphs that together convey each point in your storyline. Each figure should convince your readers of a single piece of your conclusion, and together the figures should (ideally) tell the main points of your entire story without any need to glance at the text.
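If you assemble figures in Matplotlib, a grid of panels with bold letters is a common layout, and labeling the panels row by row matches the left-to-right, top-to-bottom reading order mentioned earlier. A minimal sketch, assuming Matplotlib; `labeled_panels` and the label offsets are hypothetical choices that you will likely need to tune for your own figures.

```python
import string

import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def labeled_panels(nrows, ncols, **fig_kw):
    """Create an nrows x ncols grid of Axes labeled A, B, C, ... in reading order."""
    fig, axes = plt.subplots(nrows, ncols, squeeze=False, **fig_kw)
    # axes.flat walks the grid row by row: left-to-right, then top-to-bottom.
    for letter, ax in zip(string.ascii_uppercase, axes.flat):
        ax.text(-0.15, 1.05, letter, transform=ax.transAxes,
                fontsize=12, fontweight="bold", ha="right", va="bottom")
    return fig, axes


fig, axes = labeled_panels(2, 2, figsize=(6, 5))
```

Each panel can then hold one plot that supports one step of the figure’s argument, and the letters give the caption and the main text a stable way to refer to them.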

I hope that, like me, you’ll find data visualization rewarding and a skill well worth developing!

Hungry for more? Check out the following links for further reading on this topic:

“Fundamentals of Data Visualization”, Claus O. Wilke (https://serialmentor.com/dataviz/)

“Mistakes, we’ve drawn a few”, Sarah Leo (https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368)

If you have additional tips for better data visualization, let me know in the comments section below!

 

Salma Sohrabi-Jahromi

PhD candidate, Research Group Quantitative and Computational Biology, MPIbpc
