Software tutorial/Histograms

From Statistics for Engineering
← Plots with multiple series, colour, and legends (previous step) Tutorial index Next step: Annotating plots: grid lines, arrows, lines, and identifying interesting points →

Use the hist(...) command to both calculate and plot the histogram for a univariate data sequence. This section demonstrates both aspects.

rm <- read.csv('http://openmv.net/file/raw-material-properties.csv')

# Plot the histogram for the "density2" variable in the data:
hist(rm$density2)

You will get this plot:

[Figure: default histogram of the density2 variable]

You can change the axis labels and the main title by using the usual plot arguments described earlier.
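As a minimal sketch of those arguments applied to hist(...): the title and label text below are hypothetical, and the vector x is a stand-in rebuilt from the bin midpoints and counts printed later in this tutorial, not the real density2 column. With the data loaded as above you would pass rm$density2 instead of x.

```r
# Stand-in data (an assumption for illustration): each bin midpoint is
# repeated by its count, so the histogram matches the one in this tutorial.
x <- rep(c(10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5),
         times = c(1, 2, 8, 8, 3, 2, 1, 1))

# main, xlab and ylab behave as they do for plot(...);
# col sets the fill colour of the bars.
h <- hist(x,
          main = "Histogram of raw material density",  # hypothetical title
          xlab = "density2",                            # hypothetical label
          ylab = "Number of samples",                   # hypothetical label
          col  = "lightgrey")
```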

In addition to drawing the plot, the hist(...) command returns an object holding the numbers behind it. The object is returned invisibly, so assign it to a variable in order to inspect it:

density2.hist <- hist(rm$density2)
density2.hist
$breaks
[1] 10 11 12 13 14 15 16 17 18

$counts
[1] 1 2 8 8 3 2 1 1

$intensities
[1] 0.03846153 0.07692308 0.30769231 0.30769231 0.11538462 0.07692308 0.03846154 0.03846154

$density
[1] 0.03846153 0.07692308 0.30769231 0.30769231 0.11538462 0.07692308 0.03846154 0.03846154

$mids
[1] 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5

$xname
[1] "rm$density2"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

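The output above shows the automatically calculated bin edges (breaks) and bin midpoints (mids), and the number of entries in each bin (counts). The density value is counts/(N × bin width), where N is the total number of observations; because every bin here has a width of 1, it equals counts/N, the relative frequency. The intensities entry simply repeats density. You can access any of these components directly, for example the counts: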

density2.hist$counts
[1] 1 2 8 8 3 2 1 1
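The relationship between counts, density, and bin width can be checked numerically. hist(...) computes density as counts divided by (N times the bin width), which reduces to the relative frequency when the bins have unit width, as they do here. The sketch below assumes a stand-in vector x rebuilt from the midpoints and counts printed above (not the real data), and uses plot = FALSE so that only the calculation is done.

```r
# Stand-in data (an assumption): bin midpoints repeated by their counts.
x <- rep(c(10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5),
         times = c(1, 2, 8, 8, 3, 2, 1, 1))

# plot = FALSE returns the histogram object without drawing anything.
h <- hist(x, breaks = 10:18, plot = FALSE)

n        <- sum(h$counts)      # total number of observations
widths   <- diff(h$breaks)     # width of each bin (all 1 here)
rel_freq <- h$counts / n       # relative frequency per bin

# With unit-width bins, density and relative frequency coincide.
print(all.equal(h$density, rel_freq / widths))

# With bins of width 2 the two quantities differ by a factor of 2.
h2        <- hist(x, breaks = seq(10, 18, by = 2), plot = FALSE)
rel_freq2 <- h2$counts / sum(h2$counts)
print(rbind(rel_freq = rel_freq2, density = h2$density))
```

The second part shows why the distinction matters: once the bins are wider or narrower than 1, the density values are no longer the relative frequencies.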

Summary

  • The frequency histogram: just use hist(...)
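  • The relative frequency (density) histogram, whose bars have a total area of 1: hist(rm$density2, freq=FALSE). The bar heights equal the relative frequencies only when the bins have unit width, as they do here.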
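The two forms in the summary can be drawn side by side, and the unit-area property confirmed from the returned object. This is a sketch under the same assumption as before: x is a stand-in vector rebuilt from the bin midpoints and counts shown in this tutorial, and the plot titles are illustrative only.

```r
# Stand-in data (an assumption): bin midpoints repeated by their counts.
x <- rep(c(10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5),
         times = c(1, 2, 8, 8, 3, 2, 1, 1))

# Draw both histograms in one row; par() returns the old settings.
op <- par(mfrow = c(1, 2))
h_freq <- hist(x, breaks = 10:18, main = "freq = TRUE (default)")
h_dens <- hist(x, breaks = 10:18, freq = FALSE, main = "freq = FALSE")
par(op)  # restore the previous layout

# Total bar area of the density histogram: sum of height times width.
area <- sum(h_dens$density * diff(h_dens$breaks))
print(area)
```

The freq argument only changes what is drawn on the vertical axis; the returned object contains both counts and density in either case.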
