Software tutorial/Histograms

Revision as of 03:24, 15 January 2013 by Kevin Dunn

<rst>
<rst-options: 'toc' = False/>
Use the ``hist(...)`` command to both *calculate* and *plot* the histogram for a univariate data sequence. This section demonstrates both aspects.

.. code-block:: s

    rm <- read.csv('http://datasets.connectmv.com/file/raw-material-properties.csv')

    # Plot the histogram for the "density2" variable in the data:
    hist(rm$density2)

You will get this plot:

</rst>
[[Image:default-histogram-density2.jpg|500px|center]]
<rst>
<rst-options: 'toc' = False/>
You can change the axis labels and the main title by using the :ref:`usual plot arguments <r-other-plot-options>` described earlier.

The ``hist(...)`` command also returns a whole lot more information, in addition to drawing the plot, but only if you first create a variable:

.. code-block:: s

    density2.hist <- hist(rm$density2)
    density2.hist
    $breaks
    [1] 10 11 12 13 14 15 16 17 18

    $counts
    [1] 1 2 8 8 3 2 1 1

    $intensities
    [1] 0.03846153 0.07692308 0.30769231 0.30769231 0.11538462 0.07692308 0.03846154 0.03846154

    $density
    [1] 0.03846153 0.07692308 0.30769231 0.30769231 0.11538462 0.07692308 0.03846154 0.03846154

    $mids
    [1] 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5

    $xname
    [1] "rm$density2"

    $equidist
    [1] TRUE

    attr(,"class")
    [1] "histogram"

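The above output shows where the bin edges (``breaks``) and bin midpoints (``mids``) were automatically calculated, and the number of entries (``counts``) in each bin. The ``density`` value is ``counts / (N * binwidth)``, where ``N`` is the total number of observations; because every bin here has a width of 1, it reduces to ``counts/N``, in other words, the relative frequency. The ``intensities`` entry simply repeats ``density``. You could access the count data, for example, directly as: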

.. code-block:: s

    density2.hist$counts
    [1] 1 2 8 8 3 2 1 1

.. rubric:: Summary

* The frequency histogram: just use ``hist(...)``
* The *relative* frequency histogram, which is normalized to unit area: ``hist(rm$density2, freq=FALSE)``

</rst>
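<rst>
<rst-options: 'toc' = False/>
The link between ``counts`` and ``density`` can be checked by hand. The following is only a minimal sketch and not part of the original tutorial: the vector ``x`` holds made-up values standing in for ``rm$density2`` (so the example runs without downloading the data), the names ``h``, ``N``, ``widths`` and ``manual.density`` are illustrative, and ``plot=FALSE`` is an argument of ``hist(...)`` that calculates the histogram without drawing it.

.. code-block:: s

    # Hypothetical stand-in data; substitute rm$density2 to use the tutorial's data
    x <- c(10.4, 11.2, 11.8, 12.1, 12.5, 12.9, 13.3, 13.6, 14.2, 15.7)

    # Calculate the histogram without plotting it
    h <- hist(x, plot=FALSE)

    N <- sum(h$counts)        # total number of observations
    widths <- diff(h$breaks)  # width of each bin

    # Density is the count divided by (N * bin width)
    manual.density <- h$counts / (N * widths)
    all.equal(manual.density, h$density)

    # The total area under the relative frequency histogram should be 1
    sum(h$density * widths)

If the bins all have unit width, as in the ``density2`` example, ``manual.density`` is the same as ``h$counts / N``.
</rst>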