Software tutorial/Calculating statistics from a data sample

From Statistics for Engineering
← Dealing with factors (categorical variables) (previous step) Tutorial index Next step: Dealing with distributions →


<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>
Load a data set, for example the `Website traffic <http://openmv.net/info/website-traffic>`_ data:


.. code-block:: s

    # Over the internet
    website <- read.csv('http://openmv.net/file/website-traffic.csv')

    # or from your hard drive
    website <- read.csv('C:/StatsCourse/Data/website-traffic.csv')

    # Take a quick look at the data to make sure it's what we expect ...
    summary(website)
         DayOfWeek       MonthDay        Year          Visits
     Friday   :30   August 1 :  1   Min.   :2009   Min.   : 3.00
     Monday   :31   August 10:  1   1st Qu.:2009   1st Qu.:16.25
     Saturday :30   August 11:  1   Median :2009   Median :22.00
     Sunday   :30   August 12:  1   Mean   :2009   Mean   :22.23
     Thursday :31   August 13:  1   3rd Qu.:2009   3rd Qu.:27.75
     Tuesday  :31   August 14:  1   Max.   :2009   Max.   :48.00
     Wednesday:31   (Other)  :208

    # Calculate the mean of the "Visits" column:
    visits <- website$Visits
    visits.mean <- mean(visits)
    visits.mean
    [1] 22.23364

    # The standard deviation: use sd(...)
    visits.sd <- sd(visits)
    visits.sd
    [1] 8.331826

    # How do the robust equivalents compare?
    visits.median <- median(visits)
    visits.mad <- mad(visits)
    c(visits.median, visits.mad)
    [1] 22.0000  8.8956
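
The value returned by ``mad(...)`` is not the raw median absolute deviation. By default R multiplies it by the constant 1.4826, so that for normally distributed data it estimates the standard deviation. This is why 8.8956 sits close to the standard deviation above (8.8956 = 1.4826 × 6, so the raw median absolute deviation is 6 visits). The sketch below reproduces that calculation by hand and then illustrates why the median and MAD are called robust. The vector ``x`` and the outlier value 250 are made up for illustration; they are not part of the website data.

.. code-block:: s

    # A small made-up sample (hypothetical values, not the website data)
    x <- c(12, 15, 17, 18, 20, 22, 25)

    # MAD by hand: 1.4826 times the median of the absolute deviations from the median
    mad.by.hand <- 1.4826 * median(abs(x - median(x)))
    all.equal(mad.by.hand, mad(x))    # TRUE

    # Replace the largest value with an outlier and compare the four summaries
    x.out <- replace(x, which.max(x), 250)
    rbind(clean   = c(mean = mean(x),     sd = sd(x),     median = median(x),     mad = mad(x)),
          outlier = c(mean = mean(x.out), sd = sd(x.out), median = median(x.out), mad = mad(x.out)))

In this small example the single outlier leaves the median and the MAD unchanged, while the mean and the standard deviation both shift substantially.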


You can use these additional R commands to compute other summaries of interest for a sequence of data:

.. code-block:: s

    # The sum
    sum(visits)
    [1] 4758

    # The minimum and maximum
    c(min(visits), max(visits))
    [1]  3 48

    # Or just use the range(...) command to get the same result
    range(visits)
    [1]  3 48

    # The summary(...) command we saw earlier gives all this, as well as the
    # 1st and 3rd quartiles. Here's another way to summarize a variable:
    quantile(visits)
       0%   25%   50%   75%  100%
     3.00 16.25 22.00 27.75 48.00

    # It gives the sample quantiles at probabilities 0, 0.25, 0.50, 0.75
    # and 1.00. If you want to specify your own probability:
    quantile(visits, 0.32)
    32%
     18

    # So about 32% of the observations in this data set recorded a value of 18
    # or fewer visits to the website.
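
    # A quick way to check that interpretation (a sketch, not part of the
    # original tutorial): compute the fraction of observations at or below 18
    # directly. The result need not equal 0.32 exactly, because of tied values
    # and the interpolation that quantile(...) uses by default.
    mean(visits <= 18)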

    # Recall the interquartile range is the distance from the 3rd to the 1st quartile:
    visits.iqr <- quantile(visits, 0.75) - quantile(visits, 0.25)    # 11.5

    # or, you can calculate it more directly using the IQR(...) function:
    visits.iqr <- IQR(visits)    # 11.5
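
    # A rough sketch for putting the IQR next to sd(...) and mad(...): for
    # normally distributed data, IQR(x) / 1.349 estimates the standard
    # deviation (this scaling is described in help(IQR)). With the values
    # above, 11.5 / 1.349 is about 8.52, next to sd = 8.33 and mad = 8.90.
    c(sd = sd(visits), mad = mad(visits), iqr.scaled = IQR(visits) / 1.349)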

    # Type help(IQR) to see how to compare the IQR to the 2 other measures of spread:
    # the standard deviation and the median absolute deviation (MAD)
</rst>
