1.4. Bar plots

The bar plot is another univariate plot on a two-dimensional axis. The two axes are not called x- or y-axes. Instead, one axis is called the category axis showing the category name, while the other, the value axis, shows the numeric value of that category, given by the length of the bar.


Here is some advice for bar plots:

  • Use a bar plot when there are many categories and interpretation of the plot does not differ if the category axis is reshuffled. (It might be easier to interpret the plot with a particular ordering; however, the interpretation won’t be different with a different ordering of the categories.)

  • Do not use a bar plot when the categories have a time-based ordering; use a time-series plot instead, because with time-ordered data you usually want to show a trend.


    Use this R code to draw the figures:

    labels = c("2008 Q1", "Q2", "Q3", "Q4", "2009 Q1", "Q2", "Q3", "Q4")
    profit = c(45, 32, 67, 23, 42, 56, 64, 92) + 40

    # Draw a bar plot and print each value just above its bar
    bp = barplot(profit, names.arg=labels, axisnames=TRUE,
                 ylab="Quarterly profit ($ '000)", border=TRUE)
    text(bp, profit+3, labels=format(profit), xpd=TRUE, col="black")

    # Now rather use a line plot.
    # Graph profit, but turn off the default x-axis and the annotations
    plot(profit, type="b", axes=TRUE, ann=FALSE, xaxt="n")

    # Show the x-axis using our labels
    axis(1, at=1:8, labels=labels)

    # Label the y-axis
    title(ylab="Quarterly profit ($ '000)")

    or this Python code:

    import pandas as pd
    import matplotlib.pyplot as plt

    labels = ["2008 Q1", "Q2", "Q3", "Q4", "2009 Q1", "Q2", "Q3", "Q4"]
    profit = (
        pd.DataFrame(
            data=[45, 32, 67, 23, 42, 56, 64, 92],
            index=labels,
            columns=["Quarterly profit ($ '000)"],
        )
        + 40
    )

    # Draw a bar plot
    ax = profit.plot.bar(color="lightgrey")
    ax.set_ylabel("Quarterly profit ($ '000)")
    plt.show()

    # Now rather use a line plot.
    ax = profit.plot.line(marker="o")
    ax.set_ylabel("Quarterly profit ($ '000)")
    plt.show()

  • Bar plots can be wasteful as each data point is repeated several times:

    1. Left edge (line) of each bar

    2. Right edge (line) of each bar

    3. The height of the colour in the bar

    4. The number’s position (up and down along the value axis)

    5. The top edge of each bar, just below the number

    6. The number itself

    To this end, Tufte defines the data-ink ratio as:

    \[\begin{split}\text{Data-ink ratio} &= \frac{\text{total ink for data}}{\text{total ink for graphics}} \\ &= 1 - \text{proportion of ink that can be erased without loss of data information}\end{split}\]

    The heuristic is to maximize this ratio as far as possible by using the ink (pixels) only for the data.

  • For a handful of data points, use a table rather than a bar plot.

  • Don’t use cross-hatching, textures or unusual shading in the plots. This creates distracting visual vibrations.

  • Use horizontal bars if

    • there is some ordering to the categories (it is often easier to read the category labels from top-to-bottom), or

    • the labels do not fit side-by-side: don’t make the reader rotate the page to interpret the plot; rotate the plot for the reader.

  • You can place the labels inside the bars.

  • Start the value axis at zero: the length (and so the area) of each bar shows its magnitude, so starting the bars at a nonzero value distorts the comparison between categories.
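
The last point, on starting the value axis at zero, can be checked with a little arithmetic. The sketch below is only an illustration: it reuses the quarterly profit values from the plotting code above, and the helper name `drawn_length_ratio` and the baseline of 60 are assumptions chosen for this example, not part of any library.

```python
# Quarterly profit ($ '000): the same values used in the plotting code above.
profit = [85, 72, 107, 63, 82, 96, 104, 132]


def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the
    value axis starts at `baseline` (hypothetical helper)."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


largest, smallest = max(profit), min(profit)

# Value axis starting at zero: bar lengths are proportional to the values.
true_ratio = drawn_length_ratio(largest, smallest)

# Value axis starting at 60 (an assumed, illustrative choice).
truncated_ratio = drawn_length_ratio(largest, smallest, baseline=60)

print(f"Ratio of values (axis from 0):       {true_ratio:.2f}")
print(f"Ratio of bar lengths (axis from 60): {truncated_ratio:.2f}")
```

With these numbers the best quarter is about 2.1 times the worst, yet with the axis starting at 60 its bar would be drawn 24 times as long.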