Difference between revisions of "Software tutorial/Reading data into R"
Kevin Dunn (talk | contribs) (Created page with "The most interesting data to analyze is always your own. How do we read your own data files into R? We will look at when the data file is on your computer, or when the dat...") |
<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>
The most interesting data to analyze is always your own. How do you read your own data files into R? We will look at two cases: the data file is on your computer, or the data is available somewhere on the internet.

.. note:: For now we only consider comma-separated values (CSV) files. R can also read other file types, such as XML files, and can read directly from databases and other sources. All the `details are here <http://cran.r-project.org/doc/manuals/R-data.html>`_.

Data on your hard drive
---------------------------

Go to the `datasets website <http://datasets.connectmv.com>`_ and download any data set, for example the **Website traffic** data set. Save the file and remember its location, for example: ``C:/Courses/ConnectMV/data/website-traffic.csv``

.. note:: Use the "``/``" character in R to separate directories (folders), not a single "``\``" character, even in Windows, because R treats "``\``" inside a string as the start of an escape sequence.

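
As a small illustration of the note above, the sketch below writes the same Windows location in three ways. The folder names are just the tutorial's example location, and ``file.path()`` is a base R helper that is not otherwise used in this tutorial.

```r
# The tutorial's example location, written with forward slashes.
path_forward <- "C:/Courses/ConnectMV/data/website-traffic.csv"

# A single backslash starts an escape sequence inside an R string,
# so every literal backslash has to be doubled.
path_escaped <- "C:\\Courses\\ConnectMV\\data\\website-traffic.csv"

# file.path() joins the pieces with "/" for you.
path_built <- file.path("C:", "Courses", "ConnectMV", "data",
                        "website-traffic.csv")

print(path_built)
print(identical(path_forward, path_built))
```

Either the forward-slash form or the doubled-backslash form can be passed to ``read.csv``; the forward-slash form is usually easier to read and to type.
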
We will use the ``read.csv`` command to read these comma-separated values (CSV) files. If you look inside the ``website-traffic.csv`` file you will see how the data is stored: each column is separated by a comma, and each row is on a new line.

.. code-block:: s

    website <- read.csv('C:/Courses/ConnectMV/data/website-traffic.csv')

Linux and Mac users will have something like:

.. code-block:: s

    website <- read.csv('/home/yourname/ConnectMV/data/website-traffic.csv')

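
To make the comma-separated layout concrete, here is a minimal sketch that writes a tiny CSV file to a temporary location and reads it back. The two data rows copy the first two rows shown in this tutorial, but the exact layout of the real ``website-traffic.csv`` file is an assumption, and ``tmp``, ``csv_lines`` and ``small`` are illustrative names.

```r
# A tiny CSV file in an assumed layout: a header line, then one row per line.
csv_lines <- c(
  "DayOfWeek,MonthDay,Year,Visits",
  "Monday,June 1,2009,27",
  "Tuesday,June 2,2009,31"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# The raw text: columns separated by commas, one row per line.
print(readLines(tmp))

# read.csv() turns that text into a data frame; the first line
# becomes the column names.
small <- read.csv(tmp)
print(small)
print(names(small))
```
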
You will get **NO** output to the screen if the data are read in successfully; you will only see something if an error occurs.

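
The most common error is a wrong file location. The sketch below, which uses a deliberately non-existent file name (``no-such-file.csv`` is made up for this illustration), shows one way to check the location first and to catch the error instead of stopping.

```r
# A path that should not exist; used only to trigger the error.
missing_path <- file.path(tempdir(), "no-such-file.csv")

# Check the location before trying to read it.
print(file.exists(missing_path))

# Catch the error, and silence the accompanying warning,
# so that the script can carry on.
result <- tryCatch(
  suppressWarnings(read.csv(missing_path)),
  error = function(e) {
    message("Could not read the file: ", conditionMessage(e))
    NULL
  }
)
print(is.null(result))
```
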
The ``<-`` operator means *assign the result of the expression on the right to the variable name on the left*. To see what the variable ``website`` looks like, just type ``website`` at the R command line:

.. code-block:: s

    website
    #     DayOfWeek    MonthDay Year Visits
    # 1      Monday      June 1 2009     27
    # 2     Tuesday      June 2 2009     31
    # 3   Wednesday      June 3 2009     38
    # 4    Thursday      June 4 2009     38
    # ...
    # 211    Monday December 28 2009     24
    # 212   Tuesday December 29 2009     18
    # 213 Wednesday December 30 2009     10
    # 214  Thursday December 31 2009      7

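
Printing all 214 rows is rarely what you want. As a rough sketch, the base R functions below give a quicker overview. So that the example runs without the file, ``website`` here is a stand-in built from the first four rows printed above, not the full data set.

```r
# Stand-in for the real data: the first four rows shown in this tutorial.
website <- data.frame(
  DayOfWeek = c("Monday", "Tuesday", "Wednesday", "Thursday"),
  MonthDay  = c("June 1", "June 2", "June 3", "June 4"),
  Year      = c(2009, 2009, 2009, 2009),
  Visits    = c(27, 31, 38, 38)
)

print(head(website, 3))          # first three rows only
print(dim(website))              # number of rows and columns
str(website)                     # column names and types
print(summary(website$Visits))   # summary statistics for one column
```
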
Reading data from the internet
------------------------------

You can read the data directly from the internet. Go to the `datasets website <http://datasets.connectmv.com>`_ again and right-click on the CSV link for the data set you want to download. Your web browser should have a right-click option such as :menuselection:`Copy Link Location`, :menuselection:`Copy Shortcut`, or something similar.
This will copy the address of the data set to your clipboard. Then in R, type:

.. code-block:: s

    website <- read.csv('http://datasets.connectmv.com/file/website-traffic.csv')

where the part between quotation marks is the web address you copied. Paste the address from the clipboard, rather than retyping it, to avoid typing errors.

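
Reading from the internet can fail for reasons outside your control, for example no network connection or an address that has moved. The following is a minimal sketch, assuming the address given in this tutorial, that reports the failure instead of stopping; ``csv_url`` is an illustrative variable name.

```r
# Address taken from this tutorial; it may no longer be reachable.
csv_url <- "http://datasets.connectmv.com/file/website-traffic.csv"

# Try to read the file; on failure, print a message and return NULL.
website <- tryCatch(
  suppressWarnings(read.csv(csv_url)),
  error = function(e) {
    message("Could not read from the internet: ", conditionMessage(e))
    NULL
  }
)

# Only look at the data if the read succeeded.
if (!is.null(website)) {
  print(head(website))
}
```
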
Getting help
-------------

Before continuing further: if you ever need help with an R command, type ``help("name of command")``. For example:

.. code-block:: s

    help(read.csv)

This opens the help page (in a new window or browser tab, depending on your setup), which tells you what ``read.csv`` does and *shows examples* of how to use it.

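
A few other base R ways to reach the same information are sketched below; they are standard R features, though this tutorial does not otherwise use them.

```r
# Shorthand for help("read.csv").
?read.csv

# Show the argument list and default values without opening the help page.
print(args(read.csv))

# List just the argument names.
print(names(formals(read.csv)))

# To search the help system when you do not know the command name,
# type ??csv (shorthand for help.search("csv")) at the R command line.
```
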
</rst>
Revision as of 01:49, 15 January 2013