2.2 R code execution

STA141A: Fundamentals of Statistical Data Science

Akira Horiguchi

Functions

General concepts

Functions are modules of code that accomplish a specific task.
Functions take an input (a value, vector, data frame, etc.), process it, and return an output.
Common built-in R functions include sum() and mean(), where the input is a vector and the output is a number.

Components of a function

Function name: the function is stored in the R environment as an object with this name.
Argument(s): When calling a function, you pass a value or values to the argument(s).
- …can be required or optional.
- …can have default values.
Function body: The sequence of commands that are executed when the function is called.
Return value: The output of the function.

square <- function(x) {
  y <- x^2
  return(y)
}

The function name is square.
The function has only one argument; here it is called x.
The function body consists of the lines of code between the curly braces { and }.
The return value is y.
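To see that a function really is an object built from these components, base R lets us inspect the pieces directly. A minimal sketch, reusing the square() definition above; square2 is a hypothetical name introduced only for comparison.

```r
square <- function(x) {
  y <- x^2
  return(y)
}

is.function(square)      # the name `square` refers to a function object
names(formals(square))   # the argument names: "x"
body(square)             # the function body, i.e., the code between { and }

# R returns the value of the last evaluated expression, so return() is optional
square2 <- function(x) x^2
square2(3)               # same result as square(3)
```

Writing return() explicitly is a style choice; many R programmers omit it in short functions.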

square(3)

[1] 9

Passing arguments

When calling a function, you can specify the arguments by:

position

mean(1:10, 0.2, TRUE)

[1] 5.5

complete name

mean(x = 1:10, trim = 0.2, na.rm = TRUE)

[1] 5.5

partial name (does not work when the abbreviation is ambiguous)

mean(x = 1:10, n = TRUE, t = 0.2)

[1] 5.5
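Here is a small sketch of the three ways of passing arguments, including a case where a partial name is ambiguous. The function subtract() is hypothetical, chosen so that both argument names start with "val".

```r
# Hypothetical function: both argument names share the prefix "val"
subtract <- function(value1, value2) value1 - value2

subtract(5, 2)                    # by position: 3
subtract(value2 = 2, value1 = 5)  # by complete name, in any order: 3
subtract(value2 = 2, 5)           # named first, the rest filled by position: 3

# "val" is a prefix of both value1 and value2, so partial matching fails
res <- tryCatch(subtract(val = 5, 2), error = function(e) conditionMessage(e))
res   # the error message R produced
```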

Some useful statistical functions

Let x and y be numeric vectors of the same length. We can calculate:

The mean of x by mean(x);
The variance of x by var(x);
The standard deviation of x by sd(x);
The covariance of x and y using cov(x, y);
The correlation of x and y using cor(x, y).
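A quick check of these functions on two small made-up vectors (the values are arbitrary, picked so the results are easy to verify by hand). The last line illustrates that the correlation is the covariance rescaled by both standard deviations.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

mean(x)     # 3
var(x)      # 2.5 (sample variance: divides by n - 1)
sd(x)       # square root of var(x)
cov(x, y)   # 1.5
cor(x, y)

# Correlation = covariance / (sd of x * sd of y)
all.equal(cor(x, y), cov(x, y) / (sd(x) * sd(y)))   # TRUE
```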

Customized functions – why are they useful?

Make code easier to understand through an evocative name.
Avoid code repetition.
Reduce the chance of making mistakes when you copy and paste
- (e.g., updating a variable name in one place, but not in another).

Customized functions – Writing your own function

How to write your own function:

FunctionName <- function(arg1, arg2, ...) {
  # what the function does with the arguments, and the output
}

Example

square_with_offset <- function(x, offset=0) {
  y <- x^2
  return(y + offset)
}
square_with_offset(3)  # default value of offset is 0

[1] 9

square_with_offset(3, -6)

[1] 3
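The same function also shows the difference between optional and required arguments: offset has a default value, x does not. A short sketch:

```r
square_with_offset <- function(x, offset = 0) {
  y <- x^2
  return(y + offset)
}

square_with_offset(3, offset = -6)     # optional argument passed by name: 3
square_with_offset(offset = 1, x = 2)  # named arguments can come in any order: 5

# x has no default value, so it is required; leaving it out raises an error
msg <- tryCatch(square_with_offset(offset = 1),
                error = function(e) conditionMessage(e))
msg
```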

Repetitive execution: Loops

for loop and while loop

Repetitive execution: `for` loop

We often want to apply the same operation to every element of a vector or list.

Example: square every element in the vector y

y <- sample(1:10, 4)
y

[1] 2 3 6 5

for (i in 1:4) {
  print(y[i]^2)
}

[1] 4
[1] 9
[1] 36
[1] 25

Template

for (variable in vector) {
  # commands to be repeated
}
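A common variation of this template stores the results instead of printing them, and loops over seq_along(y) rather than a hard-coded 1:4. A sketch using fixed values in place of sample(1:10, 4):

```r
y <- c(2, 3, 6, 5)   # fixed values instead of sample(1:10, 4)

squares <- numeric(length(y))   # pre-allocate one slot per element
for (i in seq_along(y)) {       # seq_along(y) is 1, 2, ..., length(y)
  squares[i] <- y[i]^2
}
squares                          # 4 9 36 25

# seq_along() is safer than 1:length(y) when the vector might be empty
empty <- numeric(0)
1:length(empty)      # c(1, 0): a loop over this would (wrongly) run twice
seq_along(empty)     # integer(0): the loop body is skipped
```

For this particular task the vectorized expression y^2 gives the same result without a loop.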

Repetitive execution: `for` loop

Example for loop: cumulative sum.

y

[1] 2 3 6 5

z <- 0
for (i in 1:4) {
  z <- z + y[i]  # uses previous iteration's value of z
  print(paste("the cumulative sum of the vector y at index", i, "is:", z))
}

[1] "the cumulative sum of the vector y at index 1 is: 2"
[1] "the cumulative sum of the vector y at index 2 is: 5"
[1] "the cumulative sum of the vector y at index 3 is: 11"
[1] "the cumulative sum of the vector y at index 4 is: 16"

z  # final cumulative sum

[1] 16
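The loop is useful for seeing how each iteration builds on the previous one, but base R also has a built-in vectorized function for cumulative sums. A sketch with the same values of y:

```r
y <- c(2, 3, 6, 5)
cumsum(y)            # 2 5 11 16: all cumulative sums at once
tail(cumsum(y), 1)   # 16, the final value of z in the loop
```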

Repetitive execution: `for` loop

Example for loop: compute \(\sum_{n=1}^4 n!\)

z <- 0
for (i in 1:4) {
  y <- factorial(i)  # the factorial() function is built into R
  print(paste0("the value of  ", i, "!  is: ", y))
  z <- z + y  # uses previous iteration's value of z
}

[1] "the value of  1!  is: 1"
[1] "the value of  2!  is: 2"
[1] "the value of  3!  is: 6"
[1] "the value of  4!  is: 24"

z  # 1! + 2! + 3! + 4!

[1] 33

Q: what happens if we omit z <- 0 (the first line of the code above)?
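For this specific sum, a loop is not strictly needed: factorial() is vectorized, so it can be applied to 1:4 directly and the results summed. A minimal sketch:

```r
factorial(1:4)        # 1 2 6 24
sum(factorial(1:4))   # 33, matching z from the loop
```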

Repetitive execution: `while` loop

Useful when we don’t know in advance how many times the commands need to be executed.

while (condition) {
  # commands to be repeated as long as condition is TRUE
}

Example while loop (random walk): x starts at 0 and moves by a random step of -1 or +1 until it leaves the interval [-2, 2]

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size -1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size -1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size 1"
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"

Repetitive execution: `while` loop

Example while loop (random walk): another set of random steps

x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"

Repetitive execution: `while` loop

Example while loop (random walk): fix the set of “random” steps

set.seed(42)  # for reproducibility; fixes all subsequent "random" results
x <- 0
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}

[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
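To see what set.seed() guarantees, a sketch: resetting the generator to the same seed reproduces exactly the same "random" draws.

```r
set.seed(42)
first <- sample(c(-1, 1), size = 5, replace = TRUE)

set.seed(42)   # reset the generator to the same state
second <- sample(c(-1, 1), size = 5, replace = TRUE)

identical(first, second)   # TRUE: same seed, same "random" steps
```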

Repetitive execution: `while` loop

The body of a while() loop may never be executed: if the condition is FALSE from the start, R skips the loop entirely.

x <- 4  # this does not satisfy the condition of the following while loop
while (-2 <= x && x <= 2) {
  curr_step <- sample(c(-1, 1), size=1)
  print(paste0("moving x=", x, " by step of size ", curr_step))
  x <- x + curr_step  # uses previous iteration's value of x
}
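Because we don't know in advance how many steps the random walk takes, a while loop is often paired with a counter and a record of the values visited. A sketch, with the variable names path and n_steps chosen only for illustration; growing path with c() is fine for short loops like this one.

```r
set.seed(1)        # any seed works; fixed here for reproducibility
x <- 0
path <- x          # record every position visited
n_steps <- 0       # count how many times the loop body runs
while (-2 <= x && x <= 2) {
  x <- x + sample(c(-1, 1), size = 1)
  path <- c(path, x)
  n_steps <- n_steps + 1
}
n_steps            # number of steps taken (depends on the seed)
path               # starts at 0 and ends at -3 or 3
```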

Comments on loops

Loops perform commands sequentially.

Pro: helpful when commands depend on values computed in the previous iteration.
Con: somewhat clunky when we want to store results in a vector or list.

We often want to perform the same set of (possibly complicated) commands on different chunks of data.

A for loop can do this, but its flexibility can make the code harder to understand.
Alternatively, we can use the apply() family of functions.

Repetitive execution: `apply()` family

Motivation

We often want to apply the same function to many objects and then store the outputs for later use. Example: calculate group means of grades.

grades <- list(group1 = sample(seq(0, 10, 0.1), 10), 
               group2 = sample(seq(5, 10, 0.1), 10), 
               group3 = sample(seq(10, 15, 0.1), 5))
grades

$group1
 [1] 2.4 7.3 1.7 4.8 4.6 2.3 7.0 8.8 3.6 1.9

$group2
 [1] 7.5 9.9 9.6 5.2 9.0 7.4 7.6 8.5 8.6 8.0

$group3
[1] 14.4 10.4 11.9 13.3 12.7

# One approach using a for loop
grades_group_means <- numeric(length = length(grades))  # pre-allocate a numeric vector of length 3 (filled with zeros)
for (i in 1:3) { 
  grades_group_means[i] <- mean(grades[[i]]) 
}

We can simplify the above code using the apply family of functions.

`apply` family of functions

For now, we introduce just two functions in this family:

lapply(X, FUN, ...): returns a list containing the result of applying the function FUN to each element of the list/vector X.

... indicates optional additional arguments to be passed to FUN.

Don’t worry about this until we see an example on a later slide.

Example: calculate group means of `grades`

# for loop
# output will be printed out but not stored anywhere
for (i in 1:3) { 
  print(mean(grades[[i]])) 
}

[1] 4.44
[1] 8.13
[1] 12.54

# for-each loop
# output will be printed out but not stored anywhere
for (x in grades){ 
  print(mean(x)) 
}

[1] 4.44
[1] 8.13
[1] 12.54

lapply(grades, mean)  # output is a list

$group1
[1] 4.44

$group2
[1] 8.13

$group3
[1] 12.54

sapply(grades, mean)  # output is a vector

group1 group2 group3 
  4.44   8.13  12.54

Very clean!

`sapply()`

sapply(X, FUN, ...): essentially does lapply(X, FUN, ...) first and then tries to simplify the output into a vector (or into a matrix, if every result has the same length greater than one).
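How far sapply() can simplify depends on the length of each result. A sketch on a small hypothetical version of grades with fixed values:

```r
# Small hypothetical version of `grades` with fixed values
grades_small <- list(group1 = c(2, 4, 6),
                     group2 = c(5, 7),
                     group3 = c(10, 12, 14, 16))

sapply(grades_small, mean)    # each result has length 1 -> named vector
sapply(grades_small, range)   # each result has length 2 -> 2 x 3 matrix
sapply(grades_small, function(x) x[x > 5])  # different lengths -> stays a list
```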

Example: anonymous functions

Calculate group means using…

a named function

mean_v2 <- function(x) sum(x)/length(x)
sapply(grades, mean_v2)

group1 group2 group3 
  4.44   8.13  12.54

an anonymous function (useful for single-use execution)

sapply(grades, function(x) sum(x)/length(x))

group1 group2 group3 
  4.44   8.13  12.54

Since R 4.1.0, we can use \ as a shorthand for the keyword function. Example:

sapply(grades, \(x) sum(x)/length(x))  # same as above

group1 group2 group3 
  4.44   8.13  12.54

Example: functions as objects

R is a functional programming language: functions can be used as objects!

funlist <- list(sum, mean, var, sd)
dat <- runif(10)

# Use a for loop to apply every function in funlist to dat
for (f in funlist) {
  print(f(dat))  # prints values
}

[1] 4.108169
[1] 0.4108169
[1] 0.116695
[1] 0.3416066

# Use sapply to apply every function in funlist to dat
sapply(funlist, \(f) f(dat))  # also stores the values in a vector

[1] 4.1081692 0.4108169 0.1166950 0.3416066
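If the list of functions is named, sapply() carries the names into its output, which makes the result easier to read. A sketch; funlist_named is a hypothetical name, and the seed is fixed only for reproducibility.

```r
set.seed(1)
dat <- runif(10)

funlist_named <- list(sum = sum, mean = mean, var = var, sd = sd)
res <- sapply(funlist_named, function(f) f(dat))
res   # named vector: sum, mean, var, sd
```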

General comment about examples

The R documentation for a function usually includes many examples of how to use it.

Try running ?lapply to pull up the documentation for the function lapply().

Differences between `for()` and `apply` family of functions

Beyond aesthetic differences…

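for() executes commands sequentially.
The calls made by the apply family are independent of one another, so they can be executed in parallel (e.g., with parallel::mclapply()); base lapply() and sapply() themselves run sequentially.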

myfun <- function(x) {
  # commands that take a long time to execute
}

# Takes how much time?
for (x in grades) {
  myfun(x)
}

# Takes how much time (if 3 cores are used)?
lapply(grades, myfun)
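One way to actually spread the calls over several cores is mclapply() from the parallel package that ships with R. A sketch under stated assumptions: slow_mean() is a hypothetical stand-in for a slow function, the timings in the comments are rough, and forked workers are not supported on Windows, so the code falls back to a single core there.

```r
library(parallel)   # ships with base R

slow_mean <- function(x) {   # hypothetical stand-in for a slow function
  Sys.sleep(0.5)            # pretend each call takes about half a second
  mean(x)
}
grades_small <- list(group1 = c(2, 4, 6), group2 = c(5, 7), group3 = c(10, 12))

# Sequential: roughly 1.5 seconds in total
system.time(res_seq <- lapply(grades_small, slow_mean))

# mc.cores > 1 is not supported on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 3L
# Parallel: roughly 0.5 seconds in total when 3 cores are available
system.time(res_par <- mclapply(grades_small, slow_mean, mc.cores = n_cores))

all.equal(unname(unlist(res_seq)), unname(unlist(res_par)))   # same values
```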

`apply()`

The apply() function enables row-wise or column-wise repetitive operations on a matrix, array, or data frame. Examples:

Row Means: apply(my_matrix, 1, mean) calculates the average for every row.
Column Sums: apply(my_matrix, 2, sum) calculates the total for every column.
Custom Functions: apply(my_matrix, 1, function(x) max(x) - min(x)) calculates the range for each row.

The basic syntax is apply(X, MARGIN, FUN, ...)

X: The input data object (matrix, array, or data frame).
MARGIN: Specifies whether the function is applied to rows or columns:
- 1: Applies the function to rows.
- 2: Applies the function to columns.
- c(1, 2): Applies the function to every individual cell.
FUN: The function to be applied (e.g., mean, sum, or a custom function).
...: Optional additional arguments to be passed to FUN

Example of additional arguments: apply(my_matrix, 1, mean, trim = 0.1, na.rm=TRUE) calculates a trimmed average for every row.
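A sketch of the examples above on a small matrix whose results are easy to check by hand, plus an extra argument (na.rm) passed through ... to FUN. rowMeans() and colSums() are the base R shortcuts for the two most common cases.

```r
my_matrix <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

apply(my_matrix, 1, mean)                          # row means: 3 4
apply(my_matrix, 2, sum)                           # column sums: 3 7 11
apply(my_matrix, 1, function(x) max(x) - min(x))   # row ranges: 4 4
apply(my_matrix, c(1, 2), function(x) x * 10)      # applied to every cell

# Dedicated base R functions for the most common cases
rowMeans(my_matrix)
colSums(my_matrix)

# Extra arguments after FUN are passed on to FUN
m_na <- rbind(c(1, NA, 3, 100), c(2, 4, 6, 8))
apply(m_na, 1, mean, na.rm = TRUE)   # NA is dropped before averaging
```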

`apply()`: Notice the shape of the output!

my_matrix <- rbind(-2, -5, -c(4:1, 2:5))
my_matrix

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   -2   -2   -2   -2   -2   -2   -2   -2
[2,]   -5   -5   -5   -5   -5   -5   -5   -5
[3,]   -4   -3   -2   -1   -2   -3   -4   -5

Suppose we want to compute the row-wise or the column-wise mean.

apply(my_matrix, 1, mean)  # row-wise mean

[1] -2 -5 -3

apply(my_matrix, 2, mean)  # column-wise mean

[1] -3.666667 -3.333333 -3.000000 -2.666667 -3.000000 -3.333333 -3.666667
[8] -4.000000

Both outputs are vectors.

`apply()`: Notice the shape of the output!

my_matrix

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   -2   -2   -2   -2   -2   -2   -2   -2
[2,]   -5   -5   -5   -5   -5   -5   -5   -5
[3,]   -4   -3   -2   -1   -2   -3   -4   -5

Suppose we want to compute the row-wise or the column-wise mean, but also count the number of unique elements in that row or column.

MeanAndNumUnique <- function(x) c(mean(x), length(unique(x)))  # computes mean and number of unique elements in argument
apply(my_matrix, 1, MeanAndNumUnique)  # row-wise mean and number of unique elements

     [,1] [,2] [,3]
[1,]   -2   -5   -3
[2,]    1    1    5

apply(my_matrix, 2, MeanAndNumUnique)  # column-wise mean and number of unique elements

          [,1]      [,2] [,3]      [,4] [,5]      [,6]      [,7] [,8]
[1,] -3.666667 -3.333333   -3 -2.666667   -3 -3.333333 -3.666667   -4
[2,]  3.000000  3.000000    2  3.000000    2  3.000000  3.000000    2

Both outputs are a matrix, but do the shapes make sense?
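One way to read these shapes: when FUN returns a vector of length k, apply() stores each result as a column, so the output has k rows and one column per row (or column) of the input. A sketch showing how t() gives one row per row of my_matrix; the column names are added only for readability.

```r
my_matrix <- rbind(-2, -5, -c(4:1, 2:5))
MeanAndNumUnique <- function(x) c(mean(x), length(unique(x)))

# Each row's length-2 result becomes a column: a 2 x 3 matrix
row_out <- apply(my_matrix, 1, MeanAndNumUnique)
dim(row_out)   # 2 3

# Transposing gives one row per row of my_matrix: a 3 x 2 matrix
row_summary <- t(row_out)
colnames(row_summary) <- c("mean", "n_unique")
row_summary
```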

Repetitive execution: reduce

`purrr::reduce()`

Repeatedly applies a binary function to the elements of a vector or list, combining them from left to right into a single result.

(a binary function is a function with two arguments)
The base R version is Reduce(), but the version from the purrr package has a more pipe-friendly interface (the vector or list comes first).
reduce(<list or vector>, <binary function>)

library(purrr)
xvec <- c(1,3,5,4,2)
accumulate(xvec, `*`)  # collects intermediate steps of reduce()

[1]   1   3  15  60 120

reduce(xvec, `*`)  # typically just want the final result

[1] 120
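For comparison, a sketch of the same computation with base R's Reduce(); note that the function comes first and the vector second, and accumulate = TRUE plays the role of purrr::accumulate().

```r
xvec <- c(1, 3, 5, 4, 2)

Reduce(`*`, xvec)                     # 120, like reduce(xvec, `*`)
Reduce(`*`, xvec, accumulate = TRUE)  # 1 3 15 60 120, like accumulate(xvec, `*`)
```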

`purrr::reduce()`: example using `paste()`

paste('eek', 'a', 'bear')

[1] "eek a bear"

paste0('eek', 'a', 'bear')

[1] "eekabear"

paste('eek', 'a', 'bear', sep='...')

[1] "eek...a...bear"

paste() also does element-wise pasting

svec <- c('u', 'b', 'e')
paste(svec, 1:3, sep='-')

[1] "u-1" "b-2" "e-3"

So what if you want to paste together all elements of svec?

paste0(svec)  # not what we wanted

[1] "u" "b" "e"

paste0(svec[1], svec[2], svec[3])  # okay but clunky and does not generalize

[1] "ube"

reduce(svec, paste0)

[1] "ube"

accumulate(svec, paste0)

[1] "u"   "ub"  "ube"

reduce(svec, paste)

[1] "u b e"

accumulate(svec, paste)

[1] "u"     "u b"   "u b e"

reduce(svec, paste, sep='-')

[1] "u-b-e"
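For this particular task, paste() can also do it in one call through its collapse argument. reduce() is the more general tool, since it works with any binary function, not just string concatenation. A sketch:

```r
svec <- c('u', 'b', 'e')
paste(svec, collapse = '')    # "ube"
paste(svec, collapse = ' ')   # "u b e"
paste(svec, collapse = '-')   # "u-b-e"
```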

`purrr::reduce()`: example using set intersection

lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by=b))
lst

[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
 [1]  0  3  6  9 12 15 18 21 24 27 30

[[3]]
 [1]  0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30

accumulate(lst, intersect)

[[1]]
[1]  0  5 10 15 20 25 30

[[2]]
[1]  0 15 30

[[3]]
[1]  0 30

The final element of accumulate(lst, intersect) is the output of reduce(lst, intersect).

This is the same as intersect(intersect(lst[[1]], lst[[2]]), lst[[3]]).
reduce() is needed here because intersect() takes exactly two arguments.
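To make the nesting concrete, here is a bare-bones left-to-right reduce written with a for loop. my_reduce() is a hypothetical helper, not purrr's implementation: it has no .init argument and assumes the input is non-empty.

```r
# Hypothetical helper: a simple left-to-right reduce using a for loop
my_reduce <- function(x, f) {
  result <- x[[1]]            # start from the first element
  for (el in x[-1]) {         # fold in the remaining elements one at a time
    result <- f(result, el)   # f must be a binary function
  }
  result
}

lst <- lapply(c(5, 3, 2), function(b) seq(0, 30, by = b))
my_reduce(lst, intersect)          # 0 30
my_reduce(c(1, 3, 5, 4, 2), `*`)   # 120
```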

`purrr::reduce()`: example of stacking data frames

MakeDF <- function(a) {
  data.frame(param = a, u = runif(10, min=a))
}
(av <- -1 * rexp(4))

[1] -0.2992672 -0.2799575 -0.2311141 -1.2870888

(df_list <- lapply(av, MakeDF))

[[1]]
        param           u
1  -0.2992672  0.72025839
2  -0.2992672 -0.24867826
3  -0.2992672  0.67361809
4  -0.2992672  0.58069638
5  -0.2992672 -0.07674904
6  -0.2992672  0.03995586
7  -0.2992672  0.36909267
8  -0.2992672  0.57852718
9  -0.2992672  0.97767495
10 -0.2992672  0.68758376

[[2]]
        param            u
1  -0.2799575  0.445123590
2  -0.2799575  0.807609222
3  -0.2799575 -0.037438947
4  -0.2799575  0.067277809
5  -0.2799575  0.780050158
6  -0.2799575  0.607315197
7  -0.2799575  0.027929514
8  -0.2799575 -0.224933705
9  -0.2799575 -0.100150263
10 -0.2799575 -0.002993396

[[3]]
        param           u
1  -0.2311141  0.35908022
2  -0.2311141  0.01192053
3  -0.2311141  0.65449501
4  -0.2311141 -0.22140712
5  -0.2311141  0.23115687
6  -0.2311141  0.40218047
7  -0.2311141 -0.22918060
8  -0.2311141  0.48490677
9  -0.2311141 -0.03671480
10 -0.2311141  0.21089069

[[4]]
       param            u
1  -1.287089  0.189528629
2  -1.287089  0.487288118
3  -1.287089  0.002021567
4  -1.287089 -0.752588395
5  -1.287089 -1.081295393
6  -1.287089 -1.091286429
7  -1.287089 -0.589027304
8  -1.287089  0.239374898
9  -1.287089 -1.286542449
10 -1.287089 -0.810070808

reduce(df_list, rbind)

        param            u
1  -0.2992672  0.720258394
2  -0.2992672 -0.248678257
3  -0.2992672  0.673618095
4  -0.2992672  0.580696383
5  -0.2992672 -0.076749041
6  -0.2992672  0.039955857
7  -0.2992672  0.369092672
8  -0.2992672  0.578527185
9  -0.2992672  0.977674950
10 -0.2992672  0.687583763
11 -0.2799575  0.445123590
12 -0.2799575  0.807609222
13 -0.2799575 -0.037438947
14 -0.2799575  0.067277809
15 -0.2799575  0.780050158
16 -0.2799575  0.607315197
17 -0.2799575  0.027929514
18 -0.2799575 -0.224933705
19 -0.2799575 -0.100150263
20 -0.2799575 -0.002993396
21 -0.2311141  0.359080216
22 -0.2311141  0.011920531
23 -0.2311141  0.654495006
24 -0.2311141 -0.221407118
25 -0.2311141  0.231156870
26 -0.2311141  0.402180468
27 -0.2311141 -0.229180600
28 -0.2311141  0.484906775
29 -0.2311141 -0.036714798
30 -0.2311141  0.210890690
31 -1.2870888  0.189528629
32 -1.2870888  0.487288118
33 -1.2870888  0.002021567
34 -1.2870888 -0.752588395
35 -1.2870888 -1.081295393
36 -1.2870888 -1.091286429
37 -1.2870888 -0.589027304
38 -1.2870888  0.239374898
39 -1.2870888 -1.286542449
40 -1.2870888 -0.810070808
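A common base R alternative is do.call(rbind, df_list), which makes a single rbind() call with all the data frames as arguments instead of combining them pairwise. A sketch, with the seed fixed only so the example is reproducible:

```r
set.seed(1)   # fixed only for reproducibility
MakeDF <- function(a) {
  data.frame(param = a, u = runif(10, min = a))
}
av <- -1 * rexp(4)
df_list <- lapply(av, MakeDF)

stacked_reduce <- Reduce(rbind, df_list)   # base R analogue of reduce(df_list, rbind)
stacked_docall <- do.call(rbind, df_list)  # one call: rbind(df1, df2, df3, df4)

nrow(stacked_docall)                           # 40
identical(stacked_reduce$u, stacked_docall$u)  # TRUE
```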

Conditional execution

`if` and `else`

Only one condition: if statement

if (condition1) {
  # statement 1 (runs if condition1 is TRUE)
} else {
  # statement 2 (runs otherwise)
}

Example (one coin toss)

x <- sample(0:1, 1)
x

[1] 1

if (x==0) {
  print('x is heads')
} else {
  print('x is tails')
}

[1] "x is tails"

`if` and `else`: vectorized version

ifelse(condition1, value1, value2)  # value1 where condition1 is TRUE, value2 where it is FALSE

Example (five coin tosses)

y <- sample(0:1, 5, replace=TRUE)
y

[1] 1 1 0 1 1

z <- ifelse(y==0, 'y is heads', 'y is tails')
z

[1] "y is tails" "y is tails" "y is heads" "y is tails" "y is tails"
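Why a separate vectorized version? if() expects a single TRUE or FALSE (since R 4.2.0, a condition of length greater than one is an error). ifelse() instead works element by element. A sketch comparing it with applying a scalar if/else to each element; the tosses are fixed values instead of sample() output.

```r
y <- c(1, 1, 0, 1, 1)   # fixed tosses instead of sample(0:1, 5, replace = TRUE)

# ifelse() handles the whole vector at once ...
z1 <- ifelse(y == 0, 'y is heads', 'y is tails')

# ... which matches applying a scalar if/else to each element in turn
z2 <- sapply(y, function(v) if (v == 0) 'y is heads' else 'y is tails')

identical(z1, z2)   # TRUE
```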

Example (five dice rolls)

d <- sample(1:6, 5, replace=TRUE)
d

[1] 2 2 1 2 5

ifelse((d%%2)==0, 'roll is even', 'roll is odd')

[1] "roll is even" "roll is even" "roll is odd"  "roll is even" "roll is odd"

`else if`

More than two conditions:

if (condition_1) {
  # statement 1
} else if (condition_2) {
  # statement 2
} else if (...) {  # as many further branches as needed
  # ...
} else if (condition_n) {
  # statement n
} else {
  # else statement
}

Example

x <- sample(-4:4, size=1)
x

[1] 3

if (x < 0) {
  print('squaring x to make it positive')
  x <- x^2
} else if (x > 0) {
  print('x is already positive')
} else {
  print('adding 1 to make it positive')
  x <- x+1
}

[1] "x is already positive"

x

[1] 3
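Wrapping the if / else if / else logic in a function makes it easy to apply to many values at once. make_positive() is a hypothetical helper that mirrors the example above without the print() messages.

```r
# Hypothetical helper wrapping the if / else if / else logic above
make_positive <- function(x) {
  if (x < 0) {
    x <- x^2        # negative: square it
  } else if (x > 0) {
    # already positive: leave x unchanged
  } else {
    x <- x + 1      # zero: add 1
  }
  x                 # return the (possibly updated) value
}

sapply(-2:2, make_positive)   # 4 1 1 1 2
```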
