2.2 R code execution

STA141A: Fundamentals of Statistical Data Science

Akira Horiguchi

Functions

General concepts

Functions are modules of code that accomplish a specific task.
Functions take some data structure (value, vector, data frame, etc.) as input, process it, and return an output.
Common built-in R functions include sum() and mean(), where the input is a vector and the output is a number.

Components of a function

Function name: The function is stored in the R environment as an object with this name.
Argument(s): When calling a function, you pass a value or values to the argument(s).
- …can be required or optional.
- …can have default values.
Function body: The sequence of commands that are executed when the function is called.
Return value: The output of the function.

square <- function(x) {
  y <- x^2
  return(y)
}
square(3)

[1] 9

The function name is square.
The function has only one argument; here it is called x.
The function body consists of the two lines of code between the curly braces { and }.
The return value is y.
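A minimal sketch of these components in base R: because `square` is stored as an object, its argument names and body can be inspected with `formals()` and `body()`. It also shows that `return()` is optional, since an R function returns the value of the last expression it evaluates. `square_v2` is a hypothetical name used only for illustration.

```r
square <- function(x) {
  y <- x^2
  return(y)
}

names(formals(square))  # the argument names: "x"
body(square)            # the commands between { and }

# Hypothetical variant without return(): the last evaluated value is returned
square_v2 <- function(x) x^2
square_v2(3)            # 9, same as square(3)
```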

Passing arguments

When calling a function, you can specify the arguments by:

position

mean(1:10, 0.2, TRUE)

[1] 5.5

complete name

mean(x = 1:10, trim = 0.2, na.rm = TRUE)

[1] 5.5

partial name (does not work when the abbreviation is ambiguous)

mean(x = 1:10, n = TRUE, t = 0.2)

[1] 5.5
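To see when partial names break, here is a sketch with a hypothetical function `describe()` whose two arguments both start with "v". The abbreviation `verb` is unambiguous, but `v` could mean either argument, so R refuses the call (in English locales the message reads "argument 1 matches multiple formal arguments"). `tryCatch()` is used so the error is captured instead of stopping the script.

```r
# Hypothetical function: both argument names start with "v"
describe <- function(value, verbose = FALSE) {
  if (verbose) paste("the value is", value) else value
}

describe(3, verbose = TRUE)  # complete name
describe(3, verb = TRUE)     # partial name: "verb" matches only "verbose"

# "v" is ambiguous (value or verbose?), so the call fails;
# tryCatch() returns the error object instead of stopping
res <- tryCatch(describe(v = 3), error = function(e) e)
conditionMessage(res)
```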

Customized functions – why are they useful?

They make code easier to understand through an evocative name.
They help avoid repeating code.
They reduce the chance of making mistakes when you copy and paste.
- (e.g., updating a variable name in one place, but not in another).

Customized functions – Writing your own function

How to write your own function:

FunctionName <- function(arg1, arg2, ...) {
  # what the function does with the arguments, and the output
}

Example

square_with_offset <- function(x, offset=0) {
  y <- x^2
  return(y + offset)
}
square_with_offset(3)  # default value of offset is 0

[1] 9

square_with_offset(3, -6)

[1] 3
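The template above also lists `...` among the arguments. A hedged sketch of one common use: collecting extra arguments and passing them on unchanged to another function. `mean_rounded` is a hypothetical helper, not a built-in.

```r
# Hypothetical helper: arguments not matched to x or digits land in `...`
# and are forwarded unchanged to mean()
mean_rounded <- function(x, digits = 2, ...) {
  round(mean(x, ...), digits)
}

v <- c(1, 2, NA, 4)
mean_rounded(v)                # NA, because mean() keeps its default na.rm = FALSE
mean_rounded(v, na.rm = TRUE)  # na.rm = TRUE is forwarded to mean(): 2.33
```

The design choice here is that `mean_rounded` does not need to know every argument `mean()` accepts; it just passes them through.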

Some statistical functions

Let x and y be numeric vectors of the same length. We can calculate:

The mean of x by mean(x);
The variance of x by var(x);
The standard deviation of x by sd(x);
The covariance of x and y using cov(x, y);
The correlation of x and y using cor(x, y).
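A small worked example with made-up vectors chosen so the results are easy to check by hand. Note that `var()` and `sd()` use the sample versions, dividing by n - 1.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)   # y = 2 * x, a perfect linear relationship

mean(x)                                  # 3
var(x)                                   # 2.5 (sample variance, divides by n - 1)
sum((x - mean(x))^2) / (length(x) - 1)   # the same value computed by hand
sd(x)                                    # sqrt(2.5), about 1.581
cov(x, y)                                # 5
cor(x, y)                                # 1, since y is an increasing linear function of x
```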

Loops

Repetitive execution: `for` loop

Template

for (variable in vector) {
  # commands to be repeated
}

Example for loop: element-wise squaring

y <- sample(1:10, 5)
y

[1]  1 10  8  7  3

z <- numeric(length(y))  # creates a numeric vector of length(y) = 5 zeros
for (i in 1:5) {
  z[i] <- y[i]^2
}
z

[1]   1 100  64  49   9
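Because arithmetic in R is vectorized, the loop can be cross-checked against `y^2`. This sketch hard-codes the sampled values shown above so it is reproducible, and it uses `seq_along(y)` instead of `1:5`, which keeps the loop correct if the length of `y` changes.

```r
y <- c(1, 10, 8, 7, 3)      # the values sampled above, fixed for reproducibility

z <- numeric(length(y))     # pre-allocate a vector of zeros
for (i in seq_along(y)) {   # seq_along(y) is 1, 2, ..., length(y)
  z[i] <- y[i]^2
}

z_vec <- y^2                # vectorized: squares every element at once
identical(z, z_vec)         # TRUE
```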

Example for loop: cumulative sum.

y  # the same vector y as in the previous example

[1]  1 10  8  7  3

z <- 0
for (i in 1:5) {
  z <- z + y[i]  # uses previous iteration's value of z
  print(paste("the cumulative sum of the vector y at index", i, "is:", z))
}

[1] "the cumulative sum of the vector y at index 1 is: 1"
[1] "the cumulative sum of the vector y at index 2 is: 11"
[1] "the cumulative sum of the vector y at index 3 is: 19"
[1] "the cumulative sum of the vector y at index 4 is: 26"
[1] "the cumulative sum of the vector y at index 5 is: 29"

z

[1] 29
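Base R's `cumsum()` computes every running total of the loop above in one call, which is a convenient way to check it (values fixed to those shown above):

```r
y <- c(1, 10, 8, 7, 3)   # same values as above

cumsum(y)                # all running totals: 1 11 19 26 29
tail(cumsum(y), 1)       # the final total, 29, matches z from the loop
```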

Example for loop: compute \(\sum_{n=1}^5 n!\)

z <- 0
for (i in 1:5) {
  y <- factorial(i)  # the factorial() function is built into R
  print(paste0("the value of  ", i, "!  is: ", y))
  z <- z + y  # uses previous iteration's value of z
}

[1] "the value of  1!  is: 1"
[1] "the value of  2!  is: 2"
[1] "the value of  3!  is: 6"
[1] "the value of  4!  is: 24"
[1] "the value of  5!  is: 120"

z

[1] 153

Q: what happens if we omit z <- 0 at line 1?
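As a cross-check, the same sum can be computed without a loop, since `factorial()` works element-wise on a vector:

```r
factorial(1:5)        # 1 2 6 24 120
sum(factorial(1:5))   # 153, the same result as the loop
```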

Repetitive execution: `while` loop

Useful when we don’t know in advance how many times the commands need to be executed.

while (condition is true) {
  # commands to be repeated 
}

Example while loop (random walk)

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size 1"
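The random walk stops after a different number of steps on each run. A deterministic sketch of the same idea, where the number of iterations is never written down in advance: count how many doublings it takes for a value starting at 1 to exceed 1000.

```r
value <- 1
n_doublings <- 0
while (value <= 1000) {        # keep going until value exceeds 1000
  value <- value * 2
  n_doublings <- n_doublings + 1
}
n_doublings   # 10, because 2^9 = 512 <= 1000 but 2^10 = 1024 > 1000
value         # 1024
```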

Example while loop (random walk): another set of random steps

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"

Example while loop (random walk): fix the set of “random” steps

set.seed(42)  # for reproducibility; fixes all subsequent "random" results
x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
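To see that `set.seed()` really fixes the path, the loop can be wrapped in a function that records every step. `random_walk()` is a hypothetical helper written for this sketch; the exact path depends on R's random number generator, so only properties that hold for any seed are claimed in the comments.

```r
random_walk <- function(seed) {
  set.seed(seed)              # fix the random number stream
  x <- 0
  steps <- numeric(0)         # record each step taken
  while (-2 <= x && x <= 2) {
    curr_step <- sample(c(-1, 1), size = 1)
    steps <- c(steps, curr_step)
    x <- x + curr_step
  }
  list(steps = steps, final = x)
}

walk_a <- random_walk(42)
walk_b <- random_walk(42)
identical(walk_a, walk_b)     # TRUE: same seed, same path
walk_a$final                  # always -3 or 3, the first values outside [-2, 2]
```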

If the condition is false from the start, the body of a while() loop is never executed.

x <- 4  # this does not satisfy the condition in the following while loop
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

Comments on loops

Performs commands sequentially

Pro: Helpful if commands depend on the values from the previous iteration’s commands
Con: Storing results in a vector/list takes extra bookkeeping (pre-allocating and indexing)

Often we will want to perform the same set of (complicated) commands on different chunks of data.

We can use a for loop, but its flexibility can make the code harder to understand
We can instead use the apply() family of functions

`apply()` family of functions

`apply()` and related functions

lapply(X, FUN, ...): returns a list containing the result of the function FUN applied to all the elements of the list/vector X.
sapply(X, FUN, ...): essentially does lapply(X, FUN, ...) first and then tries to simplify the output to a vector (or matrix) when possible.

grades <- list(group1 = sample(seq(0, 10, 0.1), 10), 
               group2 = sample(seq(5, 10, 0.1), 10), 
               group3 = sample(seq(10, 15, 0.1), 5))
grades

$group1
 [1] 2.4 7.3 1.7 4.8 4.6 2.3 7.0 8.8 3.6 1.9

$group2
 [1] 7.5 9.9 9.6 5.2 9.0 7.4 7.6 8.5 8.6 8.0

$group3
[1] 14.4 10.4 11.9 13.3 12.7

Example: calculate group means of `grades`

# for loop
for (i in 1:3) { 
  print(mean(grades[[i]])) 
}

[1] 4.44
[1] 8.13
[1] 12.54

# for-each loop
for (x in grades){ 
  print(mean(x)) 
}

[1] 4.44
[1] 8.13
[1] 12.54

lapply(grades, mean)

$group1
[1] 4.44

$group2
[1] 8.13

$group3
[1] 12.54

sapply(grades, mean)

group1 group2 group3 
  4.44   8.13  12.54

Very clean!
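The grades example shows `lapply()` returning a list and `sapply()` a named vector. A short sketch that makes the return types explicit; it also shows `apply()` itself, which works over the rows (`MARGIN = 1`) or columns (`MARGIN = 2`) of a matrix. The matrix `m` is made up for illustration.

```r
sq_list <- lapply(1:3, function(i) i^2)  # a list of length 3
sq_vec  <- sapply(1:3, function(i) i^2)  # simplified to the vector 1 4 9
class(sq_list)   # "list"
class(sq_vec)    # "numeric"

m <- matrix(1:6, nrow = 2)  # 2 x 3 matrix, filled column by column
apply(m, 1, sum)            # row sums: 9 12
apply(m, 2, sum)            # column sums: 3 7 11
```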

Example: anonymous functions

Calculate group means using…

a named function

mean_v2 <- function(x) sum(x)/length(x)
sapply(grades, mean_v2)

group1 group2 group3 
  4.44   8.13  12.54

an anonymous function (useful for single-use execution)

sapply(grades, function(x) sum(x)/length(x))

group1 group2 group3 
  4.44   8.13  12.54
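R 4.1.0 introduced `\(x)` as a shorthand for `function(x)`; the functions-as-objects example uses it. A sketch showing that both spellings give the same result, with a small made-up list `g` standing in for `grades`:

```r
g <- list(a = c(1, 2, 3), b = c(10, 20))  # small made-up list

long_form  <- sapply(g, function(x) sum(x) / length(x))
short_form <- sapply(g, \(x) sum(x) / length(x))  # requires R >= 4.1.0
identical(long_form, short_form)  # TRUE; both give a = 2, b = 15
```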

Example: functions as objects

R is a functional programming language: functions can be used as objects!

funlist <- list(sum, mean, var, sd)
dat <- runif(10)

# Use a for loop to apply each function in funlist to dat
for (f in funlist) {
  print(f(dat))  # prints values
}

[1] 4.108169
[1] 0.4108169
[1] 0.116695
[1] 0.3416066

# Use sapply to apply each function in funlist to dat
sapply(funlist, \(f) f(dat))  # also stores values in a vector

[1] 4.1081692 0.4108169 0.1166950 0.3416066
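When the list of functions is named, `sapply()` carries the names into the result, which makes the output easier to read. A sketch with a fixed vector instead of `runif()` so the values are predictable:

```r
funlist_named <- list(sum = sum, mean = mean, var = var, sd = sd)
dat_fixed <- c(2, 4, 6, 8)   # fixed data instead of random draws

stats_out <- sapply(funlist_named, \(f) f(dat_fixed))
stats_out
# sum = 20, mean = 5, var = 20/3 (about 6.667), sd = sqrt(20/3) (about 2.582)
```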

Differences between `for()` and `apply()`

Beyond aesthetic differences…

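for() executes commands sequentially.
apply() family functions process each element independently, so they are easy to parallelize, e.g., with parallel::mclapply(), a parallel counterpart of lapply(). (Base lapply() and sapply() themselves run sequentially.)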

myfun <- function(x) {
  # commands that take a long time to execute
}

# Takes how much time?
for (x in grades) {
  myfun(x)
}

# Takes how much time if a parallel version of lapply() uses 3 cores?
lapply(grades, myfun)
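A hedged sketch of the parallel case using the base `parallel` package: `mclapply()` has the same interface as `lapply()` but can spread the elements over several cores by forking. Forking is not available on Windows, where `mc.cores` must be 1, so the sketch picks the core count accordingly. `slow_mean` and `groups` are hypothetical stand-ins for `myfun` and `grades`.

```r
library(parallel)

# Hypothetical stand-in for a slow function
slow_mean <- function(x) {
  Sys.sleep(0.1)   # pretend this takes a while
  mean(x)
}

groups <- list(group1 = c(1, 2, 3), group2 = c(4, 5, 6), group3 = c(7, 8, 9))

# mc.cores > 1 is not supported on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 3L

res_seq <- lapply(groups, slow_mean)                        # one after another
res_par <- mclapply(groups, slow_mean, mc.cores = n_cores)  # up to 3 at once
identical(res_seq, res_par)   # same results; only the wall time may differ
```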

reduce

`purrr::reduce()`

Repeatedly applies a binary function to the elements of a vector or list.

(a binary function is a function with two arguments)
The base R version is Reduce(f, x); the purrr version reduce(x, f) takes the data first and passes extra arguments on to the function (e.g., sep= in the paste() example).
reduce(<list or vector>, <binary function>)

library(purrr)
xvec <- c(1,3,5,4,2)
accumulate(xvec, `*`)  # collects intermediate steps of reduce()

[1]   1   3  15  60 120

reduce(xvec, `*`)  # typically just want the final result

[1] 120
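To make the idea concrete, here is a minimal sketch of a left-to-right reduce written as a `for` loop. `my_reduce` is a hypothetical helper, not part of purrr; base R's `Reduce()` is used for comparison so the sketch runs without purrr installed.

```r
my_reduce <- function(x, f) {
  result <- x[[1]]                 # start with the first element
  for (i in seq_along(x)[-1]) {    # then fold in elements 2, 3, ...
    result <- f(result, x[[i]])    # combine running result with the next element
  }
  result
}

xvec <- c(1, 3, 5, 4, 2)
my_reduce(xvec, `*`)                   # ((((1 * 3) * 5) * 4) * 2) = 120
Reduce(`*`, xvec)                      # base R equivalent: 120
Reduce(`*`, xvec, accumulate = TRUE)   # like accumulate(): 1 3 15 60 120
```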

`purrr::reduce()`: example using `paste()`

paste('eek', 'a', 'bear')

[1] "eek a bear"

paste0('eek', 'a', 'bear')

[1] "eekabear"

paste('eek', 'a', 'bear', sep='...')

[1] "eek...a...bear"

paste() also does element-wise pasting

paste(c("a", "b", "c"), 1:3, sep='-')

[1] "a-1" "b-2" "c-3"

So what if you want to paste together all elements of a character vector?

svec <- c('a', 'p', 'p', 'l', 'e')
accumulate(svec, paste)

[1] "a"         "a p"       "a p p"     "a p p l"   "a p p l e"

reduce(svec, paste)

[1] "a p p l e"

accumulate(svec, paste0)

[1] "a"     "ap"    "app"   "appl"  "apple"

reduce(svec, paste0)

[1] "apple"

reduce(svec, paste, sep='-')

[1] "a-p-p-l-e"
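For this particular task base R also offers a direct route: the `collapse` argument of `paste()` joins all elements of a single character vector. A sketch comparing it with base `Reduce()`:

```r
svec <- c('a', 'p', 'p', 'l', 'e')
paste(svec, collapse = '')    # "apple"
paste(svec, collapse = '-')   # "a-p-p-l-e"
Reduce(paste0, svec)          # "apple", the base R counterpart of reduce(svec, paste0)
```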

`purrr::reduce()`: example using set intersection

lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by=b))
lst

[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
 [1]  0  3  6  9 12 15 18 21 24 27 30

[[3]]
 [1]  0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30

accumulate(lst, intersect)

[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
[1]  0 15 30

[[3]]
[1]  0 30

The final element of accumulate(lst, intersect) is the output of reduce(lst, intersect).

This is the same as intersect(intersect(lst[[1]], lst[[2]]), lst[[3]]).
intersect() takes exactly two arguments, so reduce() is needed to intersect three or more sets.
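The same computation in base R, without purrr. Numbers in 0 to 30 divisible by 5, 3 and 2 are exactly the multiples of 30, which explains the result.

```r
lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by = b))

Reduce(intersect, lst)   # 0 30, same as reduce(lst, intersect)
```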

`purrr::reduce()`: example of stacking data frames

MakeDF <- function(a) {
  data.frame(param = a, u = runif(4, min=a))
}
(av <- -1 * rexp(10))

 [1] -0.2992672 -0.2799575 -0.2311141 -1.2870888 -0.5693856 -3.0185564
 [7] -0.4975908 -0.3545537 -1.7564090 -0.7374990

(df_list <- lapply(av, MakeDF))

[[1]]
       param         u
1 -0.2992672 0.3690927
2 -0.2992672 0.5785272
3 -0.2992672 0.9776749
4 -0.2992672 0.6875838

[[2]]
       param           u
1 -0.2799575  0.44512359
2 -0.2799575  0.80760922
3 -0.2799575 -0.03743895
4 -0.2799575  0.06727781

[[3]]
       param          u
1 -0.2311141  0.7884435
2 -0.2311141  0.6223001
3 -0.2311141  0.0650239
4 -0.2311141 -0.1781900

[[4]]
      param          u
1 -1.287089 -0.9658007
2 -1.287089 -0.7921962
3 -1.287089 -0.1906617
4 -1.287089 -0.8355938

[[5]]
       param           u
1 -0.5693856  0.55956111
2 -0.5693856 -0.55701136
3 -0.5693856  0.01990297
4 -0.5693856  0.23791847

[[6]]
      param          u
1 -3.018556 -3.0122451
2 -3.018556 -0.6813479
3 -3.018556 -2.3840054
4 -3.018556 -1.5757809

[[7]]
       param          u
1 -0.4975908  0.4693016
2 -0.4975908  0.6642751
3 -0.4975908  0.3465215
4 -0.4975908 -0.1475987

[[8]]
       param          u
1 -0.3545537 -0.2326702
2 -0.3545537 -0.2385875
3 -0.3545537  0.0588810
4 -0.3545537  0.5495114

[[9]]
      param          u
1 -1.756409 -1.7557505
2 -1.756409 -1.1815049
3 -1.756409  0.8154147
4 -1.756409  0.7950465

[[10]]
      param          u
1 -0.737499  0.5379891
2 -0.737499 -0.1587868
3 -0.737499  0.1574230
4 -0.737499  0.5551562

reduce(df_list, rbind)

        param           u
1  -0.2992672  0.36909267
2  -0.2992672  0.57852718
3  -0.2992672  0.97767495
4  -0.2992672  0.68758376
5  -0.2799575  0.44512359
6  -0.2799575  0.80760922
7  -0.2799575 -0.03743895
8  -0.2799575  0.06727781
9  -0.2311141  0.78844348
10 -0.2311141  0.62230012
11 -0.2311141  0.06502390
12 -0.2311141 -0.17819002
13 -1.2870888 -0.96580066
14 -1.2870888 -0.79219616
15 -1.2870888 -0.19066173
16 -1.2870888 -0.83559384
17 -0.5693856  0.55956111
18 -0.5693856 -0.55701136
19 -0.5693856  0.01990297
20 -0.5693856  0.23791847
21 -3.0185564 -3.01224508
22 -3.0185564 -0.68134793
23 -3.0185564 -2.38400545
24 -3.0185564 -1.57578093
25 -0.4975908  0.46930157
26 -0.4975908  0.66427514
27 -0.4975908  0.34652154
28 -0.4975908 -0.14759872
29 -0.3545537 -0.23267022
30 -0.3545537 -0.23858752
31 -0.3545537  0.05888100
32 -0.3545537  0.54951137
33 -1.7564090 -1.75575051
34 -1.7564090 -1.18150490
35 -1.7564090  0.81541467
36 -1.7564090  0.79504652
37 -0.7374990  0.53798910
38 -0.7374990 -0.15878679
39 -0.7374990  0.15742300
40 -0.7374990  0.55515619
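A common base R alternative for stacking many data frames is `do.call(rbind, df_list)`, which calls `rbind()` once with all data frames as arguments instead of combining them pairwise. A sketch under a fixed seed, with `MakeDF` as defined above and `Reduce()` standing in for `purrr::reduce()`:

```r
MakeDF <- function(a) {
  data.frame(param = a, u = runif(4, min = a))
}
set.seed(1)
av <- -1 * rexp(10)
df_list <- lapply(av, MakeDF)

stacked_reduce <- Reduce(rbind, df_list)    # pairwise, like reduce(df_list, rbind)
stacked_docall <- do.call(rbind, df_list)   # a single rbind() call
dim(stacked_docall)                         # 40 rows, 2 columns
identical(stacked_reduce$u, stacked_docall$u)  # TRUE: same values in the same order
```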

Conditional execution

`if` and `else`

One condition: an if statement, optionally followed by else

if (condition1) {
  # statement 1
} else {
  # statement 2
}

Example (one coin toss)

x <- sample(0:1, 1)
x

[1] 1

if (x==0) {
  print('x is heads')
} else {
  print('x is tails')
}

[1] "x is tails"
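`if` expects a single `TRUE`/`FALSE`. Passing a longer logical vector is an error in R 4.2.0 and later (older versions only warned and used the first element), which is what motivates `ifelse()`. A sketch that captures what R does instead of stopping:

```r
tosses <- c(1, 1, 0)   # three coin tosses at once

# if() needs a length-one condition; tryCatch() records what R does instead
outcome <- tryCatch(
  if (tosses == 0) 'heads' else 'tails',
  error   = function(e) 'error',    # R >= 4.2.0
  warning = function(w) 'warning'   # older versions of R
)
outcome
```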

`if` and `else`: vectorized version

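ifelse(condition, value_if_true, value_if_false)

ifelse() evaluates the condition element-wise and returns a vector of the same length as the condition, taking each element from value_if_true or value_if_false.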

Example (five coin tosses)

y <- sample(0:1, 5, replace=TRUE)
y

[1] 1 1 0 1 0

z <- ifelse(y==0, 'y is heads', 'y is tails')
z

[1] "y is tails" "y is tails" "y is heads" "y is tails" "y is heads"

`else if`

More than two conditions:

if (condition_1) {
  # statement 1
} else if (condition_2) {
  # statement 2
} else if (...) {  # add as many else if branches as needed
  # ...
} else if (condition_n) {
  # statement n
} else {
  # else statement
}

Example

x <- sample(-4:4, size=1)
x

[1] 3

if (x < 0) {
  print('squaring x to make it positive')
  x <- x^2
} else if (x > 0) {
  print('x is already positive')
} else {
  print('adding 1 to make it positive')
  x <- x+1
}

[1] "x is already positive"

x

[1] 3
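The `else if` chain above handles one number at a time. For a whole vector, one option is to nest `ifelse()` calls; a sketch applying the same three rules to several values at once (the printed messages are dropped, only the resulting values are kept):

```r
x <- c(-3, 0, 3)
x_new <- ifelse(x < 0, x^2,          # negative: square it
                ifelse(x > 0, x,     # positive: keep as is
                       x + 1))       # zero: add 1
x_new   # 9 1 3
```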
