1.7. Tables as a form of data visualization¶
The data table is an efficient format for comparative data analysis on categorical objects. Usually, the items being compared are placed in a column, while the categorical objects are in the rows. The quantitative value is then placed at the intersection of the row and column, called the cell. The following examples demonstrate data tables.
This table compares monthly payments for buying or leasing various cars (categories). The first two columns are being compared; the other columns contain additional, secondary information.
The next table compares defect types (number of defects) for different product grades (categories).
This particular table raises more questions:
- Which defects cost us the most money?
- Which defects occur most frequently? The table does not contain any information about production rate. For example, if there were 1850 lots of grade A4636 (first row) produced, then defect A occurs at a rate of 37/1850 = 1/50. And if 250 lots of grade A2610 (last row) were produced, then, again, defect A occurs at a rate of 1/50. Redrawing the table on a production-rate basis would be useful if we are making changes to the process and want to target the most problematic defect.
Three common pitfalls to avoid:
Avoid using pie charts when tables will do
Pie charts are tempting when we want to graphically break down a quantity into components. I have used them erroneously myself (here is an example on a website that I helped with: http://www.macc.mcmaster.ca/gradstudies.php). We won’t go into details here, but I strongly suggest you read the convincing evidence of Stephen Few in: “Save the pies for dessert”. The key problem is that the human eye cannot adequately decode angles; however, we have no problem with linear data.
Avoid arbitrary ordering along the first column; usually, alphabetically or in time order
Listing the car types alphabetically is trivial: instead, list them by some other third criterion of interest, perhaps minimum down payment required, typical lease duration, or total amount of interest paid on the loan. That way you get some extra context to the table for free.
Avoid using excessive grid lines
Tabular data should avoid vertical grid lines, except when the columns are so close that mistakes will be made. The human eye will use the visual white space between the numbers to create its own columns.
To wrap up this section is a demonstration of tabular data in a different format, based on an idea of Tufte in The Visual Display of Quantitative Information, p. 158. Here we compare the corrosion resistance and roughness of a steel surface for two different types of coatings, A and B.
A layout that you expect to see in a standard engineering report:
Product Corrosion resistance Surface roughness Coating A Coating B Coating A Coating B K135 0.30 0.22 30 42 K136 0.45 0.39 86 31 P271 0.22 0.24 24 73 P275 0.40 0.44 74 52 S561 0.56 0.36 70 75 S567 0.76 0.51 63 70
And the layout advocated by Tufte:
Note how the slopes carry the information about the effect of changing the coating type. The rearranged row ordering shows the changes as well. This idea is effective for two treatments but could be extended to three or four treatments by adding extra “columns.”