Software tutorial/About the course software

From Statistics for Engineering

R is a software package for statistical computing and graphics. It has become a standard among statistical programming languages.

  • Many companies already use it as a standard package (as reported in a New York Times article, R is used by Google, Shell, Pfizer, Merck, Bank of America, and others).
  • It can run on Windows, Linux and Mac computers.
  • Commercial software support is available from third parties.
  • The software can be installed on a local desktop, or in a networked environment and run remotely.
  • It is free (both for academic and commercial use), so it can be used in any academic or corporate environment.
  • Installation is straightforward.
  • The open-source license is not restrictive: you can legally modify and improve the software.
  • There are excellent add-on libraries for almost anything related to data analysis.
  • It promotes the good statistical practice of writing a code file and then running it (as in MATLAB). The code file documents what you have done, so you can always repeat your analysis on a new data set or share the code with colleagues. Other software packages tend to promote a point-and-click approach, so you can't always retrace your steps.
  • Multicore and 64-bit versions of R are available for processing large data sets and for parallel data processing.
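As a small illustration of the code-file workflow mentioned in the list above, here is a minimal sketch of what such a file might look like. The file name analysis.R, the function name summarize_data, and the temperature values are all made up for this example; they do not come from the course material.

```r
# analysis.R: a hypothetical script file (name and data are illustrative only)

# Made-up measurements; in practice you would read these from your own data file
temperature <- c(21.5, 22.1, 19.8, 23.4, 20.9, 22.7)

# Collect a few summary statistics in one place so the same steps
# can be repeated unchanged on a new data set
summarize_data <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

result <- summarize_data(temperature)
print(result)
```

Saving these lines in a file and running it with source("analysis.R") from inside R, or with Rscript analysis.R from a terminal, repeats exactly the same steps each time. To analyze a new data set, you would change only the line that defines the data.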
Roger Peng also explains these topics in more depth in a YouTube video.

Alternatives

Alternatives you might consider are Minitab, MATLAB, or Python. Microsoft Excel is not recommended: it cannot perform some of the calculations or produce some of the plots that you will need to analyze data properly, and its statistical accuracy is very poor (the referenced article documents several Excel functions that produce incorrect results).