Software tutorial/Dealing with factors (categorical variables)

From Statistics for Engineering

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/> This section introduces how R handles factors. Factors are categorical variables: their values are category labels, e.g. ``male`` and ``female``, or the days of the week: ``Monday, Tuesday, ..., Sunday``.

Not all of the raw data in the website dataset you loaded is numeric (take a look inside the CSV file). The ``DayOfWeek`` column is text, so R treats it as a factor: it finds all the unique values in that column (the names of the 7 days of the week in this case) and codes the column as a factor variable. (Since R 4.0, ``read.csv`` keeps text columns as character strings by default; the ``factor()`` call below works in either case.) By default the levels are sorted alphabetically, ``Friday, Monday, ..., Wednesday``. If you want them in a different order, use the ``levels`` input option to tell R your preferred order:

.. code-block:: s

    # Desired order of the factor levels
    day.names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")
    # Recode the column as a factor with the levels in that order
    days <- factor(website$DayOfWeek, levels=day.names)
    boxplot(website$Visits ~ days)
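
The effect of ``levels`` can also be checked without the website data. The sketch below uses a small made-up vector, ``sample.days``, which is an assumption for illustration only and is not part of the website dataset. It compares the default alphabetical ordering with an explicit ordering:

.. code-block:: s

    # Hypothetical sample data (not from the website CSV file)
    sample.days <- c("Wednesday", "Monday", "Friday", "Monday", "Sunday")

    # Default behaviour: levels are the unique values, sorted alphabetically
    default.factor <- factor(sample.days)
    levels(default.factor)    # "Friday" "Monday" "Sunday" "Wednesday"

    # Explicit ordering: levels appear in the order given
    day.names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")
    ordered.factor <- factor(sample.days, levels=day.names)
    levels(ordered.factor)    # the 7 day names, Monday first

    # Counts per level, reported in level order
    table(ordered.factor)

Note that ``table()`` also reports the days that never occur in ``sample.days``, with a count of zero, because they were listed in ``levels``.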

The boxplot from ``boxplot(website$Visits ~ days)`` is now ordered in a way that makes the weekly trends easier to see. The ``c()`` command combines its arguments into a vector, and the ``factor()`` command creates a factor variable. </rst>

Website-traffic-boxplot-ordered.jpg