Assignment 1 - 2014

From Statistics for Engineering
Revision as of 23:32, 10 January 2014 by Kevin Dunn
Due date(s): 16 January 2014, in class
Assignment questions (PDF)

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/>

.. rubric:: Assignment objectives: creating and interpreting data visualizations

.. question:: :grading: 3

Which types of features can the human eye easily pick out of a time series plot?

.. question:: :grading: 4

*Final exam, 2013*: Why is the principle of minimizing "data ink" so important in an effective visualization? Give an engineering example of why this is important.

.. question:: :grading: 10

Reproduce the box plot for board thickness that was discussed in class. The board thickness data set is available from `the dataset website <http://datasets.connectmv.com/info/six-point-board-thickness>`_.

#. Reproduce the figure that was shown in class, using the first 100 rows from the data set. See R code in the course notes.

#. Create a new box plot using rows 3100 to 3300. Interpret any interesting observations from this box plot. Superimpose a target line of 1680 mils.

#. Explain why the thick center line in the box plot is not symmetrical with the outer edges of the box.

This question is to ensure you can install R and use the course dataset site.
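
The mechanics of selecting rows, drawing one box per position and adding a target line can be sketched as follows. This is only a sketch on simulated data: the data frame ``boards``, its column names ``Pos1`` to ``Pos6`` and the simulated values are assumptions for illustration, not the course data set. Replace ``boards`` with the data frame you read from the dataset website.

.. code-block:: s

    # Simulated stand-in for the board thickness data (hypothetical values, in mils)
    set.seed(1)
    boards <- as.data.frame(matrix(rnorm(600, mean = 1680, sd = 20), ncol = 6))
    names(boards) <- paste0("Pos", 1:6)

    # Select a range of rows; here rows 1 to 100
    subset.rows <- boards[1:100, ]

    # One box per column, i.e. one box per measurement position
    boxplot(subset.rows, ylab = "Thickness [mils]", xlab = "Position on the board")

    # Superimpose a horizontal target line at 1680 mils
    abline(h = 1680, col = "red", lty = 2)

Changing the row range in ``boards[1:100, ]`` is all that is needed to draw the box plot for a different part of the data.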

.. answer:: :fullinclude: no

REDO ANSWER

This question was mainly to get you warmed up to R again, which you encountered in your statistics prerequisite course. The R code below generates the following two figures:

.. image:: ../figures/visualization/boxplot-for-two-by-six-boards-assign1-2013.png :align: left :width: 700px :scale: 90

*Left*: rows 1 to 100 and *right*: rows 4800 to 4900.

.. raw:: latex

\newpage

.. literalinclude:: ../figures/visualization/board-thickness-boxplot-assignment-2013.R :language: s

Some observations noted:

* The second box plot shows the data are more symmetrical for all positions than in the first box plot (except positions 2 and 4, which are skewed toward higher thicknesses).
* All positions tend to have outliers above and below the median in the second box plot.
* There is one below-average outlier at position 3 in the second set of data. If you look closely in a hardware store, you will often see what is called `wane at the edge of a board <http://www.decks.com/deckmaterials/Wane>`_. This is an example of that, since position 3 (as well as 1, 4 and 6) is at the tip of the board.

.. question:: :grading: 5

Describe what the main difference(s) between a bar chart and a histogram are.

.. answer:: :fullinclude: no

The solution is directly from: http://www.forbes.com/sites/naomirobbins/2012/01/04/a-histogram-is-not-a-bar-chart/

* Histograms are used to show distributions of variables while bar charts are used to compare variables.
* Histograms plot quantitative data with ranges of the data grouped into bins or intervals while bar charts plot categorical data.
* Bars can be reordered in bar charts but not in histograms.
* There are no spaces between the bars of a histogram since there are no gaps between the bins. An exception would occur if there were no values in a given bin but in that case the value is zero rather than a space. On the other hand, there are spaces between the variables of a bar chart.
* The bars of bar charts typically have the same width. The widths of the bars in a histogram need not be the same as long as the total area is one hundred percent if percents are used or the total count if counts are used. Therefore, values in bar charts are given by the length of the bar while values in histograms are given by areas.
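
These differences are easy to see by drawing both plot types next to each other. The sketch below uses made-up data: the variable names ``thickness`` and ``grade`` and all the values are hypothetical and serve only to illustrate the two plot types.

.. code-block:: s

    # Hypothetical data, for illustration only
    set.seed(2)
    thickness <- rnorm(200, mean = 1680, sd = 20)   # quantitative variable
    grade <- c(A = 12, B = 30, C = 18)              # counts per category

    # Two plots side by side
    par(mfrow = c(1, 2))

    # Histogram: bins are adjacent intervals on a numeric axis, so bars touch
    h <- hist(thickness, breaks = 10, main = "Histogram", xlab = "Thickness")

    # Bar chart: one bar per category, separated by spaces; bars may be reordered
    barplot(sort(grade, decreasing = TRUE), main = "Bar chart", ylab = "Count")

The bins of the histogram have a fixed order along the numeric axis, while the bars of the bar chart were freely reordered here from largest to smallest count.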


.. question:: :grading: 8

The final exam in ``4M3`` included an open-ended question. The `data values are the grades <http://datasets.connectmv.com/info/systematic-method>`_ achieved for the answer to that question, broken down by whether or not the student used a systematic method. No grades were given for using a systematic method; grades were awarded only for answering the question.

A systematic method is any method that assists the student with problem solving (e.g. define the problem, identify knowns/unknowns and assumptions, explore alternatives, plan a strategy, implement the strategy and then check the solution).

Draw two box plots next to each other that compare the two data sets. Also comment on any features you notice in the comparison.
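
Drawing box plots side by side for two groups can be sketched as below. The data frame ``grades``, its column names ``Grade`` and ``Systematic``, and the values are hypothetical; the actual file may be laid out differently (for example, one column per group), in which case ``boxplot`` can be given the data frame directly.

.. code-block:: s

    # Hypothetical grades, for illustration only; not the course data
    grades <- data.frame(
      Grade = c(5, 7, 8, 6, 9, 4, 6, 5, 3, 7),
      Systematic = rep(c("Yes", "No"), each = 5)
    )

    # Formula interface: one box per level of the grouping variable
    boxplot(Grade ~ Systematic, data = grades,
            ylab = "Grade", xlab = "Used a systematic method?")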

.. question:: :grading: 8

Consider this plot we saw in class (it is also available on-line, with `some additional context <http://www.economist.com/blogs/freeexchange/2013/09/working-hours>`_):

.. image:: ../figures/visualization/scatterplot-GDP-working-hours.png

#. What is the plot's author trying to convey with this scatter plot?
#. Do you believe this is an effective and complete message (i.e. could you improve it somehow)?
#. Is there a causal mechanism at play between the two variables?
#. How would you confirm or disprove the message the plot's author is making?

.. question:: :grading: 10

At the start of the class several people indicated they wanted to learn about visualizing more than 3 variables. In class we saw a way to visualize at least 5 variables.

Here's another method that you can investigate. Read up about scatterplot matrices, and draw one for the `Food texture data set <http://datasets.connectmv.com/info/food-texture>`_. See the ``car`` library in R to create an effective scatterplot matrix with the ``scatterplotMatrix`` function.

Give a couple of bullet-points interpreting the plot.
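
If the ``car`` library is not installed, base R's ``pairs`` function draws a plainer scatterplot matrix. The sketch below uses R's built-in ``iris`` data purely as a stand-in, which is an assumption for illustration; substitute the numeric columns of the food texture data.

.. code-block:: s

    # Built-in data set used as a stand-in for the food texture data
    X <- iris[, 1:4]      # keep only the numeric columns

    # Base R scatterplot matrix: every variable plotted against every other
    pairs(X)

    # Pairwise correlations, useful when interpreting the panels
    round(cor(X), 2)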

.. A scatterplot matrix can be calculated::

    food <- read.csv('http://datasets.connectmv.com/file/food-texture.csv')
    library(car)
    scatterplotMatrix(food[,2:6])  # don't need the non-numeric first column

SHOW PLOT For this data set we can see the following correlations:

- Give correlations

.. question:: :grading: 0

Read the short, clearly written article by Stephen Few on the pitfalls of pie charts: `Save the pies for dessert, http://www.perceptualedge.com/articles/08-21-07.pdf <http://www.perceptualedge.com/articles/08-21-07.pdf>`_.

I do recommend you read this. The article presents an easy-to-read argument against pie charts that will hopefully convince you.

Here's a `great example <http://www.cra-arc.gc.ca/nwsrm/t1stts-eng.html>`_ from the CRA. </rst>