2.2 R code execution

STA141A: Fundamentals of Statistical Data Science

Akira Horiguchi

Functions

General concepts

  • Functions are modules of code that accomplish a specific task.
  • Functions take an input (a value, vector, data frame, etc.), process it, and return an output.
  • Common built-in R functions include sum() and mean(), where the input is a vector and the output is a number.

Components of a function

  • Function name: the function is stored in the R environment as an object with this name.
  • Argument(s): When calling a function, you pass a value or values to the argument(s).
    • …can be required or optional.
    • …can have default values.
  • Function body: The sequence of commands that are executed when the function is called.
  • Return value: The output of the function.
square <- function(x) {
  y <- x^2
  return(y)
}
square(3)
[1] 9
  1. The function name is square.
  2. The function has only one argument; here it is called x.
  3. The function body is the two lines of code between the curly braces { and }.
  4. The return value is y.
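The components listed above can also be inspected from R itself. A minimal sketch using base R's formals() and body(); square_short is a hypothetical name introduced here to show that, when return() is omitted, R returns the value of the last evaluated expression.

```r
# Same definition as above
square <- function(x) {
  y <- x^2
  return(y)
}

names(formals(square))  # the argument names: "x"
body(square)            # the function body (the code between the braces)

# Hypothetical variant without return(): the last expression is returned
square_short <- function(x) x^2
square_short(3)         # 9
```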

Passing arguments

When calling a function, you can specify the arguments by:

  • position
mean(1:10, 0.2, TRUE)
[1] 5.5
  • complete name
mean(x = 1:10, trim = 0.2, na.rm = TRUE)
[1] 5.5
  • partial name (does not work when the abbreviation is ambiguous)
mean(x = 1:10, n = TRUE, t = 0.2)
[1] 5.5
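A sketch of how partial matching breaks down, assuming a hypothetical function describe() whose two arguments share the prefix "va". The function is made up for illustration; the error message comes from R itself.

```r
# Hypothetical function: both argument names begin with "va"
describe <- function(value, varname = "x") {
  paste(varname, "=", value)
}

describe(2, varname = "y")  # position + complete name: "y = 2"
describe(val = 2)           # "val" uniquely matches `value`: "x = 2"

# "va" matches both `value` and `varname`, so R refuses to guess
msg <- tryCatch(describe(va = 2), error = function(e) conditionMessage(e))
msg
```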

Customized functions – why are they useful?

  • Make code easier to understand through an evocative name.
  • Help avoid repeating code.
  • Help reduce the chance of making mistakes when you copy and paste.
    • (e.g., updating a variable name in one place, but not in another).

Customized functions – Writing your own function

How to write your own function:

FunctionName <- function(arg1, arg2, ...) {
  # what the function does with the arguments, and the output
}

Example

square_with_offset <- function(x, offset=0) {
  y <- x^2
  return(y + offset)
}
square_with_offset(3)  # default value of offset is 0
[1] 9
square_with_offset(3, -6)
[1] 3
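Two more ways to call the function above: naming the optional argument, and passing a vector (this works because ^ and + are vectorized). A short sketch; the comments give the values these calls should return.

```r
square_with_offset <- function(x, offset = 0) {
  y <- x^2
  return(y + offset)
}

# Naming the optional argument makes the call self-documenting
square_with_offset(3, offset = -6)   # 3

# A vector input is squared and offset element by element
square_with_offset(1:3, offset = 1)  # 2 5 10
```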

Some statistical functions

Let x and y be numeric vectors of the same length. We can calculate:

  • The mean of x by mean(x);
  • The variance of x by var(x);
  • The standard deviation of x by sd(x);
  • The covariance of x and y using cov(x, y);
  • The correlation of x and y using cor(x, y).
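A small worked example, using made-up vectors where y is exactly 2 * x so the results are easy to check by hand. The last line illustrates that cor() equals cov() divided by the product of the two standard deviations.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)   # y = 2 * x, a perfect linear relationship

mean(x)      # 3
var(x)       # 2.5 (uses the n - 1 denominator)
sd(x)        # sqrt(2.5), about 1.581
cov(x, y)    # 5
cor(x, y)    # 1

# Correlation is the covariance rescaled by both standard deviations
cov(x, y) / (sd(x) * sd(y))
```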

Loops

Repetitive execution: for loop

Template

for (variable in vector) {
  # commands to be repeated
}

Example for loop: element-wise squaring

y <- sample(1:10, 5)
y
[1]  1 10  8  7  3
z <- numeric(length(y))  # creates a numeric vector of five zeros to hold the results
for (i in 1:5) {
  z[i] <- y[i]^2
}
z
[1]   1 100  64  49   9
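A variant of the loop above that avoids hard-coding the length 5 by using seq_along(), followed by the vectorized equivalent. The vector y is fixed to the values printed above so the sketch is reproducible.

```r
y <- c(1, 10, 8, 7, 3)        # the values drawn above

z <- numeric(length(y))       # preallocate a vector of zeros
for (i in seq_along(y)) {     # seq_along(y) is 1, 2, ..., length(y)
  z[i] <- y[i]^2
}

z_vec <- y^2                  # arithmetic is vectorized in R
identical(z, z_vec)           # TRUE
```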

Example for loop: cumulative sum.

y
[1]  1 10  8  7  3
z <- 0
for (i in 1:5) {
  z <- z + y[i]  # uses previous iteration's value of z
  print(paste("the cumulative sum of the vector y at index", i, "is:", z))
}
[1] "the cumulative sum of the vector y at index 1 is: 1"
[1] "the cumulative sum of the vector y at index 2 is: 11"
[1] "the cumulative sum of the vector y at index 3 is: 19"
[1] "the cumulative sum of the vector y at index 4 is: 26"
[1] "the cumulative sum of the vector y at index 5 is: 29"
z
[1] 29
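For comparison, base R's cumsum() returns every intermediate value of the running total at once; its last element matches the final z from the loop. A sketch using the same y values printed above.

```r
y <- c(1, 10, 8, 7, 3)
cumsum(y)   # 1 11 19 26 29
```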

Example for loop: compute \(\sum_{n=1}^5 n!\)

z <- 0
for (i in 1:5) {
  y <- factorial(i)  # the factorial() function is built into R
  print(paste0("the value of  ", i, "!  is: ", y))
  z <- z + y  # uses previous iteration's value of z
}
[1] "the value of  1!  is: 1"
[1] "the value of  2!  is: 2"
[1] "the value of  3!  is: 6"
[1] "the value of  4!  is: 24"
[1] "the value of  5!  is: 120"
z
[1] 153

Q: what happens if we omit z <- 0 at line 1?
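For comparison, the same sum can be computed without an explicit loop, since factorial() is vectorized:

```r
factorial(1:5)        # 1 2 6 24 120
sum(factorial(1:5))   # 153
```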

Repetitive execution: while loop

Useful when we don’t know in advance how many times the commands need to be executed.

while (condition is true) {
  # commands to be repeated 
}

Example while loop (random walk)

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size 1"

Example while loop (random walk): another set of random steps

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"

Example while loop (random walk): fix the set of “random” steps

set.seed(42)  # for reproducibility; fixes all subsequent "random" results
x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
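A sketch that wraps the random walk in a function, so that the exit position and the number of steps are returned rather than printed. random_walk_exit() and its arguments are hypothetical names chosen for illustration. Starting outside the bounds returns immediately with zero steps.

```r
# Hypothetical helper: runs the walk and reports where and when it stopped
random_walk_exit <- function(start = 0, lower = -2, upper = 2) {
  x <- start
  n_steps <- 0
  while (lower <= x && x <= upper) {
    x <- x + sample(c(-1, 1), size = 1)
    n_steps <- n_steps + 1
  }
  list(position = x, n_steps = n_steps)
}

set.seed(42)
random_walk_exit()
random_walk_exit(start = 4)   # condition is FALSE at once: 0 steps
```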

The body of a while() loop is never executed if the condition is FALSE from the start.

x <- 4  # this will not satisfy the condition in the following while loop
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

Comments on loops

Loops perform commands sequentially.

  • Pro: helpful when commands depend on values from the previous iteration
  • Con: storing results in a vector or list takes extra bookkeeping (preallocating the object and indexing into it)

Often we want to perform the same set of (complicated) commands on different chunks of data.

  • We can use a for loop, but because it is so flexible, the intent of the code can be hard to see
  • We can instead use the apply() family of functions

apply() family of functions

Example: calculate group means of grades (a list of three numeric vectors named group1, group2, group3)

# for loop
for (i in 1:3) { 
  print(mean(grades[[i]])) 
}
[1] 4.44
[1] 8.13
[1] 12.54
# for-each loop
for (x in grades){ 
  print(mean(x)) 
}
[1] 4.44
[1] 8.13
[1] 12.54
lapply(grades, mean)
$group1
[1] 4.44

$group2
[1] 8.13

$group3
[1] 12.54
sapply(grades, mean)
group1 group2 group3 
  4.44   8.13  12.54 

Very clean!
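Because the grades list itself is not shown, this sketch builds a small hypothetical version with made-up numbers (so its means differ from those printed above). It contrasts the bookkeeping a for loop needs to store results with sapply() and vapply(); vapply() additionally checks that each result is a single number.

```r
# Hypothetical data, not the grades used above
grades <- list(
  group1 = c(4, 5, 4.5),
  group2 = c(8, 9, 7),
  group3 = c(12, 13)
)

# for loop: results must be stored by hand in a preallocated, named vector
means_loop <- numeric(length(grades))
names(means_loop) <- names(grades)
for (i in seq_along(grades)) {
  means_loop[i] <- mean(grades[[i]])
}

means_sapply <- sapply(grades, mean)              # simplifies to a named vector
means_vapply <- vapply(grades, mean, numeric(1))  # also checks each result is one number

means_loop   # group1 4.5, group2 8, group3 12.5
```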

Example: anonymous functions

Calculate group means using…

  • a named function
mean_v2 <- function(x) sum(x)/length(x)
sapply(grades, mean_v2)
group1 group2 group3 
  4.44   8.13  12.54 
  • an anonymous function (useful when the function is needed only once)
sapply(grades, function(x) sum(x)/length(x))
group1 group2 group3 
  4.44   8.13  12.54 
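Anonymous functions are often used just to pass an extra argument. A sketch with a hypothetical grades list containing a missing value: sapply() forwards extra arguments (here na.rm = TRUE) to the function it applies, which gives the same result as wrapping the call in an anonymous function.

```r
# Hypothetical data with a missing grade in group3
grades_na <- list(group1 = c(4, 5, 4.5), group2 = c(8, 9, 7), group3 = c(12, 13, NA))

means_default <- sapply(grades_na, mean)                     # group3 is NA
means_narm <- sapply(grades_na, mean, na.rm = TRUE)          # extra argument passed to mean()
means_anon <- sapply(grades_na, function(x) mean(x, na.rm = TRUE))  # same, via an anonymous function
```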

Example: functions as objects

R supports functional programming: functions are objects that can be stored in lists, passed to other functions, and so on!

funlist <- list(sum, mean, var, sd)
dat <- runif(10)

# Use a for loop to apply every function in funlist to dat
for (f in funlist) {
  print(f(dat))  # prints values
}
[1] 4.108169
[1] 0.4108169
[1] 0.116695
[1] 0.3416066
# Use sapply to apply every function in funlist to dat
sapply(funlist, \(f) f(dat))  # also stores values in a vector; \(f) is shorthand for function(f) (R >= 4.1.0)
[1] 4.1081692 0.4108169 0.1166950 0.3416066
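If the list of functions is named, sapply() keeps the names, so each result is labeled. A sketch with a small made-up vector whose summaries are easy to verify by hand; the list names are our own choice.

```r
# A named list of functions (the names are arbitrary labels)
funlist_named <- list(sum = sum, mean = mean, var = var, sd = sd)
dat_small <- c(1, 2, 3)

summaries <- sapply(funlist_named, function(f) f(dat_small))
summaries   # sum 6, mean 2, var 1, sd 1
```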

Differences between for() and apply()

Beyond aesthetic differences…

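  • for() executes commands sequentially.
  • The apply() family also runs sequentially by default, but because each call is independent of the others, the work is easy to parallelize (e.g., with parallel::mclapply() or parallel::parLapply()).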
myfun <- function(x) {
  # commands that take a long time to execute
}

# Takes how much time?
for (x in grades) {
  myfun(x)
}

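# Takes how much time if 3 cores are used?
# (lapply() itself is sequential; mclapply() from the base parallel package
#  spreads the calls across cores, but does not support Windows)
parallel::mclapply(grades, myfun, mc.cores = 3)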
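A runnable sketch of the timing question above, assuming a hypothetical slow_mean() that pauses 0.2 seconds per call and made-up data. parallel::mclapply() relies on forking, which Windows does not support, so the sketch falls back to one core there; the timings in the comments are rough and depend on the machine.

```r
library(parallel)

# Hypothetical slow function: pauses briefly, then returns the mean
slow_mean <- function(x) {
  Sys.sleep(0.2)
  mean(x)
}

groups <- list(a = 1:10, b = 11:20, c = 21:30)   # hypothetical data

# Forking is unavailable on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 3L

res_seq <- lapply(groups, slow_mean)                        # roughly 3 x 0.2 seconds
res_par <- mclapply(groups, slow_mean, mc.cores = n_cores)  # roughly 0.2 seconds with 3 cores
identical(res_seq, res_par)                                 # same results, same names
```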

reduce

purrr::reduce()

Repeatedly applies a binary function to the elements of a vector or list.

  • (a binary function is a function with two arguments)
  • The base R version is Reduce(f, x), which takes the function first; the purrr version takes the data first and comes with companion functions such as accumulate().
  • reduce(<list or vector>, <binary function>)
library(purrr)
xvec <- c(1,3,5,4,2)
accumulate(xvec, `*`)  # collects intermediate steps of reduce()
[1]   1   3  15  60 120
reduce(xvec, `*`)  # typically just want the final result 
[1] 120
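The same computation with base R's Reduce() (function first, then data), plus an explicit loop showing what reduce() does step by step. For multiplication specifically, prod() and cumprod() are the built-in shortcuts.

```r
xvec <- c(1, 3, 5, 4, 2)

Reduce(`*`, xvec)                     # 120
Reduce(`*`, xvec, accumulate = TRUE)  # 1 3 15 60 120

# What reduce() does, written as an explicit loop
result <- xvec[1]
for (i in 2:length(xvec)) {
  result <- result * xvec[i]   # combine the running result with the next element
}
result                                # 120

prod(xvec)                            # 120
cumprod(xvec)                         # 1 3 15 60 120
```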

purrr::reduce(): example using paste()

paste('eek', 'a', 'bear')
[1] "eek a bear"
paste0('eek', 'a', 'bear')
[1] "eekabear"
paste('eek', 'a', 'bear', sep='...')
[1] "eek...a...bear"

paste() also does element-wise pasting

paste(c("a", "b", "c"), 1:3, sep='-')
[1] "a-1" "b-2" "c-3"

So what if you want to paste together all elements of a character vector?

svec <- c('a', 'p', 'p', 'l', 'e')
accumulate(svec, paste)
[1] "a"         "a p"       "a p p"     "a p p l"   "a p p l e"
reduce(svec, paste)
[1] "a p p l e"
accumulate(svec, paste0)
[1] "a"     "ap"    "app"   "appl"  "apple"
reduce(svec, paste0)
[1] "apple"
reduce(svec, paste, sep='-')
[1] "a-p-p-l-e"
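For this particular task, paste() can also do it directly through its collapse argument, which joins all elements of one vector into a single string:

```r
svec <- c('a', 'p', 'p', 'l', 'e')
paste(svec, collapse = '')    # "apple"
paste(svec, collapse = '-')   # "a-p-p-l-e"
```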

purrr::reduce(): example using set intersection

lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by=b))
lst
[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
 [1]  0  3  6  9 12 15 18 21 24 27 30

[[3]]
 [1]  0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30
accumulate(lst, intersect)
[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
[1]  0 15 30

[[3]]
[1]  0 30

The final element of accumulate(lst, intersect) is the output of reduce(lst, intersect).

  • Same as intersect(intersect(lst[[1]], lst[[2]]), lst[[3]])
  • intersect() combines two sets at a time, so reduce() is what lets us intersect three or more.
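The same result with base R's Reduce(), checked against the hand-nested version from the bullet above. A sketch; lst is rebuilt so the block runs on its own.

```r
lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by = b))

Reduce(intersect, lst)                               # 0 30
intersect(intersect(lst[[1]], lst[[2]]), lst[[3]])   # same result, nested by hand
```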

purrr::reduce(): example of stacking data frames

MakeDF <- function(a) {
  data.frame(param = a, u = runif(4, min=a))
}
(av <- -1 * rexp(10))
 [1] -0.2992672 -0.2799575 -0.2311141 -1.2870888 -0.5693856 -3.0185564
 [7] -0.4975908 -0.3545537 -1.7564090 -0.7374990
(df_list <- lapply(av, MakeDF))
[[1]]
       param         u
1 -0.2992672 0.3690927
2 -0.2992672 0.5785272
3 -0.2992672 0.9776749
4 -0.2992672 0.6875838

[[2]]
       param           u
1 -0.2799575  0.44512359
2 -0.2799575  0.80760922
3 -0.2799575 -0.03743895
4 -0.2799575  0.06727781

[[3]]
       param          u
1 -0.2311141  0.7884435
2 -0.2311141  0.6223001
3 -0.2311141  0.0650239
4 -0.2311141 -0.1781900

[[4]]
      param          u
1 -1.287089 -0.9658007
2 -1.287089 -0.7921962
3 -1.287089 -0.1906617
4 -1.287089 -0.8355938

[[5]]
       param           u
1 -0.5693856  0.55956111
2 -0.5693856 -0.55701136
3 -0.5693856  0.01990297
4 -0.5693856  0.23791847

[[6]]
      param          u
1 -3.018556 -3.0122451
2 -3.018556 -0.6813479
3 -3.018556 -2.3840054
4 -3.018556 -1.5757809

[[7]]
       param          u
1 -0.4975908  0.4693016
2 -0.4975908  0.6642751
3 -0.4975908  0.3465215
4 -0.4975908 -0.1475987

[[8]]
       param          u
1 -0.3545537 -0.2326702
2 -0.3545537 -0.2385875
3 -0.3545537  0.0588810
4 -0.3545537  0.5495114

[[9]]
      param          u
1 -1.756409 -1.7557505
2 -1.756409 -1.1815049
3 -1.756409  0.8154147
4 -1.756409  0.7950465

[[10]]
      param          u
1 -0.737499  0.5379891
2 -0.737499 -0.1587868
3 -0.737499  0.1574230
4 -0.737499  0.5551562
reduce(df_list, rbind)
        param           u
1  -0.2992672  0.36909267
2  -0.2992672  0.57852718
3  -0.2992672  0.97767495
4  -0.2992672  0.68758376
5  -0.2799575  0.44512359
6  -0.2799575  0.80760922
7  -0.2799575 -0.03743895
8  -0.2799575  0.06727781
9  -0.2311141  0.78844348
10 -0.2311141  0.62230012
11 -0.2311141  0.06502390
12 -0.2311141 -0.17819002
13 -1.2870888 -0.96580066
14 -1.2870888 -0.79219616
15 -1.2870888 -0.19066173
16 -1.2870888 -0.83559384
17 -0.5693856  0.55956111
18 -0.5693856 -0.55701136
19 -0.5693856  0.01990297
20 -0.5693856  0.23791847
21 -3.0185564 -3.01224508
22 -3.0185564 -0.68134793
23 -3.0185564 -2.38400545
24 -3.0185564 -1.57578093
25 -0.4975908  0.46930157
26 -0.4975908  0.66427514
27 -0.4975908  0.34652154
28 -0.4975908 -0.14759872
29 -0.3545537 -0.23267022
30 -0.3545537 -0.23858752
31 -0.3545537  0.05888100
32 -0.3545537  0.54951137
33 -1.7564090 -1.75575051
34 -1.7564090 -1.18150490
35 -1.7564090  0.81541467
36 -1.7564090  0.79504652
37 -0.7374990  0.53798910
38 -0.7374990 -0.15878679
39 -0.7374990  0.15742300
40 -0.7374990  0.55515619
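reduce(df_list, rbind) calls rbind() nine times, building a larger intermediate data frame each time. A common alternative is do.call(rbind, df_list), which passes all ten data frames to a single rbind() call. A sketch comparing the two, with base Reduce() standing in for purrr::reduce() and an arbitrary seed, so the numbers differ from those above:

```r
MakeDF <- function(a) {
  data.frame(param = a, u = runif(4, min = a))
}

set.seed(1)                # arbitrary seed for reproducibility
av <- -1 * rexp(10)
df_list <- lapply(av, MakeDF)

stacked_reduce <- Reduce(rbind, df_list)   # pairwise, like reduce(df_list, rbind)
stacked_once <- do.call(rbind, df_list)    # one rbind() call with all 10 data frames

dim(stacked_once)          # 40 rows, 2 columns
```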

Conditional execution

if and else

One condition: if/else statement

if (condition1) {
  # statement 1
} else {
  # statement 2
}

Example (one coin toss)

x <- sample(0:1, 1)
x
[1] 1
if (x==0) {
  print('x is heads')
} else {
  print('x is tails')
}
[1] "x is tails"
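In R, if/else is an expression that returns a value, so the result can be assigned directly instead of printed inside each branch. A minimal sketch with x fixed to 1:

```r
x <- 1
label <- if (x == 0) 'x is heads' else 'x is tails'
label   # "x is tails"
```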

if and else: vectorized version

ifelse(test, yes, no)  # element-wise: yes where test is TRUE, no where it is FALSE

Example (five coin tosses)

y <- sample(0:1, 5, replace=TRUE)
y
[1] 1 1 0 1 0
z <- ifelse(y==0, 'y is heads', 'y is tails')
z
[1] "y is tails" "y is tails" "y is heads" "y is tails" "y is heads"
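Why ifelse() is needed for vectors: if() expects a single TRUE or FALSE. Since R 4.2.0 a longer condition is an error (earlier versions gave a warning and used only the first element), so this sketch catches either outcome. It uses the tosses printed above.

```r
y <- c(1, 1, 0, 1, 0)   # the tosses shown above

# if() with a length-5 condition: an error in R >= 4.2.0, a warning before that
outcome <- tryCatch(
  if (y == 0) 'heads' else 'tails',
  error = function(e) 'error',
  warning = function(w) 'warning'
)
outcome

# ifelse() evaluates the condition element by element instead
ifelse(y == 0, 'y is heads', 'y is tails')
```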

else if

More than two conditions:

if (condition_1) {
  # statement 1
} else if (condition_2) {
  # statement 2
} else if (...) {
  # ... (as many else if branches as needed)
} else if (condition_n) {
  # statement n
} else {
  # else statement
}

Example

x <- sample(-4:4, size=1)
x
[1] 3
if (x < 0) {
  print('squaring x to make it positive')
  x <- x^2
} else if (x > 0) {
  print('x is already positive')
} else {
  print('adding 1 to make it positive')
  x <- x+1
}
[1] "x is already positive"
x
[1] 3
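The same three-way logic wrapped in a hypothetical function make_positive(), which returns the value instead of printing a message, so it can be applied to several inputs with sapply():

```r
# Hypothetical function version of the example above
make_positive <- function(x) {
  if (x < 0) {
    x^2        # square to make it positive
  } else if (x > 0) {
    x          # already positive
  } else {
    x + 1      # x is 0
  }
}

sapply(-2:2, make_positive)   # 4 1 1 1 2
```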