Launch RStudio. By default the panes should look something like this -
Note that you can customize all of them. Initially we will be primarily working with the console before moving to the editor.
Assignments are of the form:
Object <- Object_Value
which can be construed to express “object gets the object’s value”.
For instance,
x <- 5+13
x
## [1] 18
shows that object x gets the value 18. Lines beginning with the double hash ## show the result as it should appear in your console (the ## prefix itself is not printed).
We will be using lots of assignments. While the = sign also works for assignment, it is best to use <- as this will avoid lots of confusion later on. If typing <- feels tedious, note that RStudio provides lots of shortcuts. For instance, the assignment operator <- has the shortcut Option+- on Mac and Alt+- on Windows. For a comprehensive list, refer to the RStudio keyboard shortcuts list. Moreover, Option+Shift+K (on Mac) in RStudio will bring up the keyboard shortcut reference card.
Tip: Note that RStudio automatically surrounds the assignment operator with spaces when you use the shortcut. It is good coding practice to use spaces to enhance readability.
Caveat: Object names cannot start with certain characters (such as a comma or a space) or with a number. Snake case and camel case are good naming conventions.
Whatever convention you prefer, maintain consistency. Consistency and accuracy are critical when dealing with computers and programming. Whenever you run into errors, the first thing to check for is typos. Case matters, typos matter. In my experience, almost all the errors made by beginners can be attributed to typographical errors. There is no other way than getting better at typing and checking/re-checking.
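As a small illustration of the naming conventions and of case sensitivity, here is a minimal sketch. The object names exam_score and examScore are hypothetical, chosen only for this example.

```r
# Two common naming styles (both names are made up for illustration)
exam_score <- 90   # snake case
examScore  <- 85   # camel case

# R is case sensitive, so these are two different objects
exam_score   # 90
examScore    # 85

# A name that differs only in case does not exist unless you create it
exists("Exam_Score")   # FALSE
```

Picking one style and sticking with it makes typos of this kind much easier to spot.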
You can assign strings to objects using quotation marks. For example, the following code assigns the string “R rocks” to object r.
r <- "R rocks"
r
## [1] "R rocks"
Classwork/Homework: Assign the string hello world to object h and print the result.
Whatever we do in the console can be saved as an R script in the editor and run in RStudio. To do this choose:
File -> New File -> R Script, type the script, save it and click Run in the editor.
Classwork/Homework: Do the above classwork as an R script and run it.
During an R session, objects are created and stored by name. The R command:
objects()
## [1] "r" "x"
will display the objects that are currently stored in the session. The collection of all such objects constitutes the workspace of R. For removing objects, the function rm can be handy. For example,
rm(r)
objects()
## [1] "x"
will remove the object r created above. To remove multiple objects, we just list them with a comma delimiter, like rm(r,x).
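To make the multi-object form concrete, here is a minimal sketch. The object names tmp_one, tmp_two and tmp_three are hypothetical. It also uses ls(), which lists the workspace just like objects().

```r
# Create three throwaway objects (names are made up for illustration)
tmp_one   <- 1
tmp_two   <- "two"
tmp_three <- TRUE

# Remove two of them in a single call
rm(tmp_one, tmp_two)

# ls() lists the workspace, just like objects()
"tmp_three" %in% ls()   # TRUE
"tmp_one" %in% ls()     # FALSE
```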
R has a wide range of built-in functions that are generally of the form:
functionName(arg1 = val1, arg2 = val2, ...)
Note: Not all functions have arguments. For instance, the function date() that prints the current date and time does not take any argument.
date()
## [1] "Sun Oct 8 08:39:41 2017"
One extremely helpful feature of R is the ? operator. You can use it to look up the documentation of a function, including its usage and arguments. Question the function to know more about the function in question. For instance,
?date
would describe the function date() in the Help tab of the IDE.
Tip: If you type da in the console and hit TAB, RStudio will try to autocomplete the function for you, suggesting possible built-in functions. Also, if you type an opening parenthesis, RStudio will supply the closing parenthesis.
Note that if you type the function name without any parentheses, R will print the source code of the function, which is probably not what you want.
Classwork/Homework: Consider the seq function. What does it do?
Observe that:
seq(from=1,to=10)
## [1] 1 2 3 4 5 6 7 8 9 10
is the same as
seq(1,10)
## [1] 1 2 3 4 5 6 7 8 9 10
This demonstrates how function arguments are resolved in R. We can always specify from = value and to = value. But if we do not, R matches the arguments by position. So in the above code, it is assumed that we want a sequence from = 1 that goes to = 10. Although R can resolve arguments on its own, best practice is to name them.
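The following sketch illustrates the difference between matching by name and matching by position. It additionally uses the by argument of seq(), which sets the step size. The variable names are only for illustration.

```r
# Named arguments can be given in any order
named     <- seq(from = 1, to = 10, by = 3)
reordered <- seq(to = 10, by = 3, from = 1)

# Positional arguments are matched in the order from, to, by
positional <- seq(1, 10, 3)

named                          # 1 4 7 10
identical(named, reordered)    # TRUE
identical(named, positional)   # TRUE

# Swapping positional arguments silently changes the meaning
swapped <- seq(10, 1)          # counts down from 10 to 1
swapped
```

Naming the arguments protects against the last kind of mistake, since the order then no longer matters.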
The function c() combines values into a vector or a list. To set up a vector named x, say, consisting of eight numbers, namely 12, 20.1, 53.6, 2, 7.43, 24, 8.2, and 6, the assignment statement:
x <- c(12, 20.1, 53.6, 2, 7.43, 24, 8.2, 6)
will assign the vector to the object x. We can form mathematical expressions using x like any other variable. For example,
sin(x)
## [1] -0.5365729 0.9491246 -0.1917303 0.9092974 0.9114582 -0.9055784
## [7] 0.9407306 -0.2794155
sqrt(x)
## [1] 3.464102 4.483302 7.321202 1.414214 2.725803 4.898979 2.863564 2.449490
will list the sine and square root of each number in the vector x. We can also combine the vector with itself:
c(x,x)
## [1] 12.00 20.10 53.60 2.00 7.43 24.00 8.20 6.00 12.00 20.10 53.60
## [12] 2.00 7.43 24.00 8.20 6.00
or include some numbers in between, as in c(x,0,x).
One of the greatest advantages of R (as opposed to, say, Matlab) is that we can combine vectors of different lengths in one expression. For example, the following code:
y <- c(x,0,x)
v <- 2*x + y + 1
v
## [1] 37.00 61.30 161.80 7.00 23.29 73.00 25.60 19.00 25.00 53.20
## [11] 128.30 58.60 17.86 56.43 41.40 21.20 31.00
is equivalent to the following addition:
y  :    12  20.1  53.6     2  7.43    24   8.2     6     0    12  20.1  53.6     2  7.43    24   8.2     6
2x :    24  40.2 107.2     4 14.86    48  16.4    12    24  40.2 107.2     4 14.86    48  16.4    12    24
1  :     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
----------------------------------------------------------------------------------------------------------
sum:    37  61.3 161.8     7 23.29    73  25.6    19    25  53.2 128.3  58.6 17.86 56.43  41.4  21.2    31
----------------------------------------------------------------------------------------------------------
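so the expression 2*x is extended to the length of the longest vector (here y) by repeating its elements from the start. This is called recycling. Because the length of y (17) is not a multiple of the length of x (8), R also issues a warning.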
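A smaller example may make recycling easier to follow. This is a minimal sketch; the vector names long_vec, short_vec and odd_vec are hypothetical.

```r
# The shorter vector is repeated until it matches the longer one
long_vec  <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

recycled <- long_vec + short_vec
recycled   # 11 22 13 24 15 26

# When the longer length is not a multiple of the shorter one,
# R still recycles but also issues a warning
odd_vec <- c(1, 2, 3)
partial <- odd_vec + short_vec
partial    # 11 22 13
```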
Several functions work as usual, like mean(), max(), min(), sort(), etc. Note that the square root of a negative number, like sqrt(-19), will return NaN and a warning, but sqrt(-19+0i) will work and return a complex number.
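As a quick check of these functions, the sketch below applies them to the same vector x used earlier (redefined here so the example stands on its own).

```r
x <- c(12, 20.1, 53.6, 2, 7.43, 24, 8.2, 6)

mean(x)   # 16.66625
max(x)    # 53.6
min(x)    # 2
sort(x)   # 2 6 7.43 8.2 12 20.1 24 53.6

# Real input: NaN plus a warning
real_root <- sqrt(-19)

# Complex input: a purely imaginary result
complex_root <- sqrt(-19 + 0i)
complex_root
```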
R allows manipulation of logical quantities. Logical quantities can be TRUE, FALSE or NA. Logical vectors are generated by conditions. This assignment:
logical_vector <- x > 13
generates a logical vector that is TRUE for the elements of x that are greater than 13 and FALSE for the rest.
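Here is what that looks like for the eight-number vector x defined earlier (redefined so the sketch is self-contained). The call to sum() is an addition for illustration; it counts the TRUE values because TRUE is treated as 1.

```r
x <- c(12, 20.1, 53.6, 2, 7.43, 24, 8.2, 6)

logical_vector <- x > 13
logical_vector        # FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE

# Count how many elements satisfy the condition
sum(logical_vector)   # 3
```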
# Create a numeric vector
v <- c(2,15,5,7)
# Create a naming vector
n <- c("two","fifteen","five","seven")
# Assign the names to the vector
names(v) <- n
v
## two fifteen five seven
## 2 15 5 7
Alternatively, one can name the vectors as follows:
v <- c(two=2,fifteen=15,five=5,seven=7)
v
## two fifteen five seven
## 2 15 5 7
Classwork/Homework: What happens if you run names(n) <- v?
# Subsetting by index
v[c(1,2)]
## two fifteen
## 2 15
# Subsetting by name
v[c("two","seven")]
## two seven
## 2 7
# Subset all but some
v[-c(1,2)]
## five seven
## 5 7
# Subset using logicals
v[c(FALSE,TRUE,TRUE,FALSE)]
## fifteen five
## 15 5
Classwork/Homework: What does v[c("two","three")] return? What happens when you subset v with a logical vector shorter than v (remember recycling)?
The function is.na(x)
gives a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA. The function is.na() also returns the value TRUE for NaN. To differentiate these, R also provides a function is.nan() that returns TRUE only for NaN.
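The difference between the two functions can be seen in a short sketch. The vector z is hypothetical and contains both a missing value and a NaN.

```r
# A vector with one missing value and one NaN
z <- c(1, NA, NaN, 4)

na_flags  <- is.na(z)
nan_flags <- is.nan(z)

na_flags    # FALSE TRUE TRUE FALSE   (NA and NaN both flagged)
nan_flags   # FALSE FALSE TRUE FALSE  (only NaN flagged)
```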
Classwork/Homework: Create some vectors with NA in them and test for missing values using the is.na() function.
Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings.
paste(c("X","Y"), "ab,b")
## [1] "X ab,b" "Y ab,b"
Note that by default the arguments are separated in the result by a single blank character, but this can be changed by the named argument sep=string, which changes the separator to string, possibly empty. Thus,
paste(c("X","Y"), "ab,b", sep="")
## [1] "Xab,b" "Yab,b"
will join the arguments without any separator.
Classwork/Homework: Play with the sep argument to include other types of delimiters (like a comma).
Vectors are the most important type of objects in R, but there are several other objects that we will encounter frequently.
Here is a list of other important objects.
matrices or more generally arrays are multi-dimensional generalizations of vectors. In fact, they are vectors that can be indexed by two or more indices and will be printed in special ways.
factors provide compact ways to handle categorical data.
lists are a general form of vector in which the various elements need not be of the same type, and are often themselves vectors or lists. Lists provide a convenient way to return the results of a statistical computation.
data frames are matrix-like structures, in which the columns can be of different types. One can think of data frames as ‘data matrices’ with one row per observational unit but with (possibly) both numerical and categorical variables. Many experiments are best described by data frames: the treatments are categorical but the response is numeric.
functions are themselves objects in R which can be stored in the project’s workspace. This provides a simple and convenient way to extend R.
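To give a feel for the object types listed above, the sketch below creates one small example of each and asks R for its class. All names and values are hypothetical and chosen only for illustration.

```r
# One small example of each object type (values are made up)
m  <- matrix(1:4, nrow = 2)
f  <- factor(c("low", "high", "low"))
l  <- list(name = "trial", values = c(1.5, 2.5), ok = TRUE)
df <- data.frame(treatment = c("A", "B", "A"),
                 response  = c(4.2, 5.1, 3.9))
sq <- function(n) n^2   # a user-defined function

class(m)    # "matrix" (plus "array" in R 4.0 and later)
class(f)    # "factor"
class(l)    # "list"
class(df)   # "data.frame"
class(sq)   # "function"

sq(3)       # 9
```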
The function class() can be used to reveal the data type. Basic data types are:
Logical: TRUE, FALSE or NA
Numeric: 2.5, 17, etc.
Integer: a whole number with the letter L appended to it: 12L, 17L, -2L, etc.
Character: “R data types rock, etc.”
Other atomic data types: double, complex and raw.
# Reveal the class of the logical "TRUE"
class(TRUE)
## [1] "logical"
# Reveal the class of the number 2.5
class(2.5)
## [1] "numeric"
# Reveal the class of the number 2
class(2)
## [1] "numeric"
# Reveal the class of the integer 2L
class(2L)
## [1] "integer"
# Reveal the class of the character "R data types rock, etc."
class("R data types rock, etc.")
## [1] "character"
Note 1: By default, a whole number typed without the L suffix has the data type numeric, as illustrated by class(2).
Note 2: One can also use functions of the form is.datatype(y), with datatype replaced by the name of a type, to test the underlying data type of the variable y. For example, is.logical(x) will return TRUE if x is a logical variable. Otherwise, it will return FALSE.
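A few concrete instances of the is.datatype() pattern are sketched below. Each call uses a function that exists in base R; the result names are only for illustration.

```r
is_lgl  <- is.logical(TRUE)    # TRUE
is_num  <- is.numeric(2L)      # TRUE: integers also count as numeric
is_int  <- is.integer(2)       # FALSE: 2 without the L suffix is not an integer
is_chr  <- is.character("2")   # TRUE
chr_num <- is.numeric("2")     # FALSE: a quoted number is a character string

c(is_lgl, is_num, is_int, is_chr, chr_num)
```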
One can force a data type to another through what is known as coercion. A function of the form as.datatype(y) will coerce y into the specified data type.
# Coerce "TRUE" into integer
as.integer(TRUE)
## [1] 1
# Coerce "FALSE" into integer
as.integer(FALSE)
## [1] 0
# Coerce 2.5 as a character
as.character(2.5)
## [1] "2.5"
# Coerce the character "2" to numeric
as.numeric("2")
## [1] 2
# Coerce the character "4.5" as an integer
as.integer("4.5")
## [1] 4
Note: Information is lost when coercing non-integer values to integer values, as revealed by the last example. Also, coercion is not always possible. For instance, coercing the character “hello” into numeric/integer will result in a warning and NA.
as.numeric("hello")
## Warning: NAs introduced by coercion
## [1] NA
A matrix is a rectangular array of rows and columns.
# Create a matrix specifying the number of rows
matrix(1:6, nrow=2)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
# Create a matrix specifying the number of columns
matrix(1:6, ncol=2)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# Create a matrix specifying the number of rows &
# distributing the numbers through rows first
matrix(1:6, nrow=2, byrow=TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
# Create a matrix specifying both rows and columns
matrix(1:6, nrow=2, ncol=3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
We can paste rows or columns together using the functions cbind() (for column binding) and rbind() (for row binding). These are really important; you will see them in lots of applications.
# Combine as columns
cbind(1:3,1:3)
## [,1] [,2]
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
# Combine as rows
rbind(1:3,1:3)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 1 2 3
Naming a matrix can be done using the rownames() and colnames() functions, just like we used the names() function for vectors. Matrices can be combined using the cbind() and rbind() functions.
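Naming works along the same lines as for vectors. The following is a minimal sketch; the matrix scores and its row and column labels are hypothetical.

```r
# A 2 x 3 matrix filled column by column
scores <- matrix(1:6, nrow = 2)

# Attach row and column names (labels are made up)
rownames(scores) <- c("r1", "r2")
colnames(scores) <- c("a", "b", "c")

scores

# Names can then be used to pick out an element
scores["r2", "c"]   # 6
```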
Classwork/Homework: Experiment with matrix(), cbind() and rbind(). What happens with cbind() (and rbind()) if one of the columns (or rows) has more numbers than the other?
# Create a matrix
M <- matrix(1:9,nrow=3)
# Subsetting through indices
# Print the element in row 2 and column 3
M[2,3]
## [1] 8
# Print all the elements in column 2
M[,2]
## [1] 4 5 6
# Print all the elements in row 3
M[3,]
## [1] 3 6 9
We can also subset multiple elements. To do this we use the combine function c() along with index notation. Thus, M[2,c(2,3)] will fetch the elements in row 2 and in columns 2 and 3. Subsetting also works through column names and row names, and these can be combined with indices. Further, subsetting with logicals works the same way as it does for vectors.
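The sketch below walks through these variants on the same 3 x 3 matrix. The row and column names are hypothetical and are added only so that name-based subsetting can be shown.

```r
M <- matrix(1:9, nrow = 3)

# Multiple elements by index: row 2, columns 2 and 3
by_index <- M[2, c(2, 3)]
by_index                      # 5 8

# Add names (made up for illustration) and subset by name
rownames(M) <- c("a", "b", "c")
colnames(M) <- c("p", "q", "r")
M["b", c("q", "r")]           # same elements, now labelled q and r

# Names and indices can be mixed
M["b", 2:3]

# Logical subsetting: keep only the second row
by_logical <- M[c(FALSE, TRUE, FALSE), ]
by_logical                    # 2 5 8, labelled p q r
```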
Classwork/Homework: What does M[5] print? Specify the rationale behind this. What does M[c(2,3),c(1,2)] return?
The rowSums() and colSums() functions provide the sums of rows and columns. Basic arithmetic operations involving a matrix and a scalar hold just like for vectors. For example, dividing a matrix by a number will divide each element by that number. How about matrix and vector operations? Recycling:
# Create a matrix
M <- matrix(1:9,nrow=3)
# Make a vector
v <- c(10,20,25)
# Matrix minus the vector
M-v
## [,1] [,2] [,3]
## [1,] -9 -6 -3
## [2,] -18 -15 -12
## [3,] -22 -19 -16
Note 1: The result is the same as M - matrix(v, nrow=3, ncol=3). Although recycling provides a convenient way to carry out matrix-vector operations, it is important to use such expressions with caution. Whenever possible, try to convert the vector into a matrix and work with matrices.
Note 2: Matrix multiplication with the * operator is element-wise, unlike the standard way of multiplying matrices (as in linear algebra), for which R provides the %*% operator. Matrices and vectors are very similar. The principles of coercion and recycling work almost the same.
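To see the difference between element-wise multiplication and the linear-algebra product, here is a minimal sketch. It uses base R's %*% operator for the matrix product; the matrices A and B are hypothetical.

```r
A <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

# Element-wise product: each entry of A times the matching entry of B
elementwise <- A * B
elementwise
##      [,1] [,2]
## [1,]    5   21
## [2,]   12   32

# Matrix product in the linear-algebra sense
matprod <- A %*% B
matprod
##      [,1] [,2]
## [1,]   23   31
## [2,]   34   46
```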
Classwork/Homework: What happens if you subtract/add/multiply two matrices of different size?