s1: Fundamentals of R

STA141A: Fundamentals of Statistical Data Science

Akira Horiguchi

Fundamentals of R

R as a calculator

5 + 10
[1] 15
10 * 3
[1] 30
(5 + 10) * 3
[1] 45
2^5
[1] 32
(5 + 10) * 3 + 2^5
[1] 77
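
The results above depend on R's operator precedence, which follows the usual arithmetic rules. A minimal sketch using only base R operators (the variable names are chosen here for illustration):

```r
# ^ binds tighter than * and /, which bind tighter than + and -
a <- 5 + 10 * 3      # multiplication first: 5 + 30
b <- (5 + 10) * 3    # parentheses force the addition first: 15 * 3
c2 <- 2 * 3^2        # power first: 2 * 9
d <- -2^2            # ^ also binds tighter than unary minus: -(2^2)
print(c(a, b, c2, d))  # 35 45 18 -4
```

When in doubt, add parentheses; they cost nothing and make the intended order explicit.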

Assignment operator

  • We often want to assign values to variables in the workspace to access them again.
  • The assignment operators are <- and =. The Tidyverse Style Guide prefers <-, and the Google R Style Guide, which is a fork of the Tidyverse guide, follows it. Most other programming languages use =.
x <- 10  # Assigns 10 to x
x + 3   # Result is 13
[1] 13
y <- 3   # Assigns 3 to y
x + y
[1] 13
z <- x + y  # Assigns the result of x + y to z
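
An assignment prints nothing, so the value of z is not shown above. The following sketch prints it, checks that = assigns the same value at the top level, and shows that z keeps its value when x is later changed (the name w is introduced here for illustration):

```r
x <- 10
y <- 3
z <- x + y        # stores 13 in z; nothing is printed
print(z)          # [1] 13

w = x + y         # `=` assigns as well
identical(z, w)   # [1] TRUE

x <- 20           # reassigning x afterwards ...
print(z)          # ... leaves z unchanged: [1] 13
```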

Objects: Functions vs Data Objects - 1

From a task-oriented perspective, using R and most other programming languages consists of:

  • Creating or loading data (inputs, material, values)
  • Calling or evaluating functions (verbs, actions)
  • Computing and constructing new data (results, outputs, etc.)

You can think of:

  • Data objects as matter (i.e., material or values) that is being measured, manipulated, or processed.
  • Functions as procedures (i.e., actions, operations, or verbs) that measure, manipulate, or process data.
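
The three steps can be sketched in a few lines. The heights below are made-up values used only for illustration, and the variable names are arbitrary:

```r
# 1. Create data (a data object)
heights_cm <- c(172, 165, 180, 158)

# 2. Call a function on it (mean() is the "verb")
avg <- mean(heights_cm)

# 3. Compute new data from the result
centered <- heights_cm - avg

print(avg)       # [1] 168.75
print(centered)  # deviations from the mean
```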

Objects: Functions vs Data Objects - 2

We will talk about functions later on in detail, but we’ll start making use of functions straight away. Here it is useful to review some basic concepts.

  • A function is ‘called’ by specifying the function’s name and appropriate data objects as its so-called arguments/parameters.
  • Function arguments are enclosed in (round) parentheses and best specified in a ‘name = value’ notation.
  • Some arguments are essential for the function, while others are optional.
x <- 5
log(x)  # This is log with base e (i.e. ln(x))
[1] 1.609438
log(x, base = 10)  # Here, base=10 is an optional argument
[1] 0.69897
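
Arguments can be matched by name or by position. A short sketch comparing several equivalent ways to compute the base-10 logarithm (the variable names are chosen here for illustration):

```r
x <- 5
a <- log(x, base = 10)       # optional argument given by name
b <- log(x, 10)              # same call, matched by position (base is the 2nd argument)
c2 <- log(base = 10, x = x)  # named arguments may appear in any order
d <- log10(x)                # base R shortcut for base 10

print(c(a, b, c2, d))        # all four values agree
```

Naming the optional arguments makes a call easier to read and protects against mixing up their order.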

Basic data types in R

Basic data types:

  • Numeric (235.22)
  • Integer (1)
  • Character / string ("You are here")
  • Logical / boolean (TRUE, FALSE)
  • Missing (NA; a placeholder that can stand in for a value of any of the types above, not a type of its own)

Use class() to get the data type of a value

x <- 10.5
class(x)
[1] "numeric"
x <- 1000L # The L suffix makes the number an integer
class(x)
[1] "integer"
x <- "I love programming"
class(x)
[1] "character"
x <- TRUE
class(x)
[1] "logical"
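
A few further checks on the types listed above, as a sketch using base R only. typeof() reports how a value is stored internally, and is.numeric() tests whether a value is a number:

```r
class(235.22)          # "numeric"
typeof(235.22)         # "double": numeric values are stored as doubles
class(1)               # "numeric": without the L suffix, 1 is not an integer
class(1L)              # "integer"
is.numeric(1L)         # TRUE: integers also count as numbers
class("You are here")  # "character"
class(NA)              # "logical": a plain NA is a logical constant
class(NA_integer_)     # "integer": typed NA constants also exist
```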

Inspect data

You can inspect data by entering the variable name into the console or by calling str().

x <- 7
x
[1] 7
str(x)
 num 7
y <- 7L
y
[1] 7
str(y)
 int 7

What is the difference between

  • the output of x and y?
  • the output of str(x) and str(y)?

Missing values

  • When an element or value is “not available” (a “missing value”), its place within a vector can be reserved by assigning it the special value NA.
  • In general, an operation involving an NA returns NA.
  • Many basic R functions, such as mean(), have an argument (na.rm) for dealing with NAs.
  • NaN (“not a number”) results from undefined operations such as sqrt(-1); na.rm treats it as missing too.
x <- c(4,16,9,-1)
y <- sqrt(x)  # square root; sqrt(-1) yields NaN (with a warning)
y
[1]   2   4   3 NaN
y+1
[1]   3   5   4 NaN
mean(y)
[1] NaN
mean(y, na.rm=TRUE)
[1] 3
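
The example above produces NaN rather than NA. The following sketch shows the same behaviour with an explicit NA and how is.na() and is.nan() tell the two apart (the vectors v and w are made up for illustration):

```r
v <- c(4, NA, 9)                 # NA marks a missing value
is.na(v)                         # FALSE  TRUE FALSE
v_plus <- v + 1                  # 5 NA 10: the NA propagates
m_all <- mean(v)                 # NA
m_obs <- mean(v, na.rm = TRUE)   # 6.5: mean of 4 and 9

w <- suppressWarnings(sqrt(c(4, -1)))  # 2 NaN (warning silenced here)
is.na(w)                         # FALSE  TRUE: NaN counts as missing
is.nan(w)                        # FALSE  TRUE
is.nan(v)                        # FALSE FALSE FALSE: NA is not NaN
```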