STA141A: Fundamentals of Statistical Data Science
square <- function(x) {
  y <- x^2
  return(y)
}
R built-in functions are, e.g., sum() or mean(), where the input is a vector and the output is a number.
Once defined, a function is stored in the R environment as an object under its name.
When calling a function, you can specify the arguments by position or by name.
Let x and y be numeric vectors of the same length. We can calculate:
the mean of x by mean(x);
the variance of x by var(x);
the standard deviation of x by sd(x);
the covariance of x and y using cov(x, y);
the correlation of x and y using cor(x, y).
How to write your own function:
Example
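As a minimal sketch of the pattern, a user-defined function can wrap the built-in functions listed above, and its arguments can be passed by position or by name. The function name coef_var and the data vector v are illustrative assumptions, not part of the course material.

```r
# Hypothetical example: coefficient of variation built from sd() and mean()
coef_var <- function(x, na.rm = FALSE) {
  s <- sd(x, na.rm = na.rm)    # sample standard deviation
  m <- mean(x, na.rm = na.rm)  # sample mean
  return(s / m)
}

v <- c(2, 4, 4, 6)                           # illustrative data
cv_pos  <- coef_var(v)                       # argument matched by position
cv_name <- coef_var(x = v)                   # same call, argument matched by name
cv_na   <- coef_var(c(v, NA), na.rm = TRUE)  # optional argument passed by name
print(c(cv_pos, cv_name, cv_na))             # all three values agree
```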
for loop and while loop
for loop
We will often want to apply the same operation to all objects in a vector or list.
Example: square every element in the vector y
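One way this example might look is sketched below. The values in y are an assumption, borrowed from the vector printed in the cumulative-sum example.

```r
y <- c(2, 3, 6, 5)               # illustrative vector (assumed values)
y_squared <- numeric(length(y))  # pre-allocate the output vector

for (i in seq_along(y)) {
  y_squared[i] <- y[i]^2         # square one element per iteration
}
print(y_squared)
```

In practice the vectorized expression y^2 gives the same result without a loop. The loop is shown only to illustrate the mechanics.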
Template
Example for loop: cumulative sum.
[1] 2 3 6 5
[1] "the cumulative sum of the vector y at index 1 is: 2"
[1] "the cumulative sum of the vector y at index 2 is: 5"
[1] "the cumulative sum of the vector y at index 3 is: 11"
[1] "the cumulative sum of the vector y at index 4 is: 16"
[1] 16
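The code that produced this output is not shown. A sketch consistent with the printed lines, with the variable names y, z, and i as assumptions, could be:

```r
y <- c(2, 3, 6, 5)
print(y)

z <- 0                          # running total
for (i in seq_along(y)) {
  z <- z + y[i]                 # add the i-th element to the total
  print(paste0("the cumulative sum of the vector y at index ", i, " is: ", z))
}
print(z)
```

The built-in cumsum(y) returns all the intermediate totals at once.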
Example for loop: compute \(\sum_{n=1}^4 n!\)
[1] "the value of 1! is: 1"
[1] "the value of 2! is: 2"
[1] "the value of 3! is: 6"
[1] "the value of 4! is: 24"
[1] 33
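A sketch that would print the lines above is given here, with the accumulator z initialised on the first line. The variable names are assumptions, and the original code may have computed the factorial differently.

```r
z <- 0
for (n in 1:4) {
  f <- factorial(n)             # n! via the built-in factorial()
  print(paste0("the value of ", n, "! is: ", f))
  z <- z + f                    # accumulate the sum of factorials
}
print(z)
```

If z is not defined before the loop, the first evaluation of z + f fails with an "object 'z' not found" error, or silently reuses a stale z left in the workspace.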
Q: what happens if we omit z <- 0 at line 1?
while loop
Useful when we don’t know in advance how many times we want to execute commands.
Example while loop (random walk)
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size -1"
[1] "moving x=1 by step of size 1"
[1] "moving x=2 by step of size -1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size 1"
[1] "moving x=1 by step of size -1"
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size 1"
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
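The loop behind this output is not shown. The printed walk keeps moving while x is at 2 or -2 and ends after a step away from -2, which is consistent with the stopping rule |x| < 3. That rule is an assumption in the sketch below, and the steps are random, so each run prints a different walk.

```r
x <- 0
while (abs(x) < 3) {                    # assumed rule: stop once |x| reaches 3
  step <- sample(c(-1, 1), size = 1)    # one random step of -1 or +1
  print(paste0("moving x=", x, " by step of size ", step))
  x <- x + step
}
print(x)                                # final position, either -3 or 3
```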
Example while loop (random walk): another set of random steps
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
Example while loop (random walk): fix the set of “random” steps
[1] "moving x=0 by step of size -1"
[1] "moving x=-1 by step of size -1"
[1] "moving x=-2 by step of size -1"
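Fixing the "random" steps is usually done with set.seed(). The sketch below wraps the walk in a hypothetical helper random_walk() to show that the same seed reproduces the same path. The seed value and the bound of 3 are assumptions.

```r
# Hypothetical helper: returns the whole path of a walk started at 0
random_walk <- function(seed, bound = 3) {
  set.seed(seed)                         # fix the random number stream
  x <- 0
  path <- x
  while (abs(x) < bound) {
    x <- x + sample(c(-1, 1), size = 1)  # one random step
    path <- c(path, x)                   # record the new position
  }
  path
}

walk_a <- random_walk(seed = 141)
walk_b <- random_walk(seed = 141)
print(identical(walk_a, walk_b))         # same seed, same walk
```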
It is possible that the body of a while() loop will never be executed.
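A small illustration with made-up values: the condition is checked before the first iteration, so when it is already FALSE the body is skipped entirely.

```r
x <- 5
n_iter <- 0                 # counts how many times the body runs

while (x < 3) {             # FALSE on the very first check
  x <- x + 1
  n_iter <- n_iter + 1
}
print(n_iter)               # the body never ran, so this is still 0
```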
apply() family
We will often want to apply the same function to many objects and then store the outputs for later use.
Example: calculate group means of grades
$group1
[1] 2.4 7.3 1.7 4.8 4.6 2.3 7.0 8.8 3.6 1.9
$group2
[1] 7.5 9.9 9.6 5.2 9.0 7.4 7.6 8.5 8.6 8.0
$group3
[1] 14.4 10.4 11.9 13.3 12.7
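The loop-based code for this example is not shown. One possible version is sketched below, using the values printed above. The name group_means is an assumption.

```r
grades <- list(
  group1 = c(2.4, 7.3, 1.7, 4.8, 4.6, 2.3, 7.0, 8.8, 3.6, 1.9),
  group2 = c(7.5, 9.9, 9.6, 5.2, 9.0, 7.4, 7.6, 8.5, 8.6, 8.0),
  group3 = c(14.4, 10.4, 11.9, 13.3, 12.7)
)

group_means <- numeric(length(grades))   # pre-allocate one slot per group
names(group_means) <- names(grades)

for (g in names(grades)) {
  group_means[g] <- mean(grades[[g]])    # mean of one group per iteration
}
print(group_means)
```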
We can simplify the above code using the apply family of functions.
apply family of functions
For now, we introduce just two functions in this family:
lapply(X, FUN, ...): returns a list containing the result of the function FUN applied to each element of the list/vector X.
... indicates optional additional arguments to be passed to FUN.
sapply()
sapply(X, FUN, ...): essentially does lapply(X, FUN, ...) first and then tries to simplify the output into a vector (or a matrix, when every result has the same length greater than one).
Calculate group means using…
group1 group2 group3
4.44 8.13 12.54
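For comparison, here are the two calls side by side on the grades list printed earlier. The result names means_list and means_vec are assumptions.

```r
grades <- list(
  group1 = c(2.4, 7.3, 1.7, 4.8, 4.6, 2.3, 7.0, 8.8, 3.6, 1.9),
  group2 = c(7.5, 9.9, 9.6, 5.2, 9.0, 7.4, 7.6, 8.5, 8.6, 8.0),
  group3 = c(14.4, 10.4, 11.9, 13.3, 12.7)
)

means_list <- lapply(grades, mean)   # a named list with one mean per group
means_vec  <- sapply(grades, mean)   # the same values as a named numeric vector
print(means_vec)
```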
Instead of the keyword function, we can use the shorthand \ (available since R 4.1.0); for instance, \(x) x^2 means the same as function(x) x^2. Example:
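A small sketch with illustrative inputs. It requires R 4.1.0 or later, because older versions cannot parse the backslash form.

```r
squares_long  <- sapply(1:4, function(x) x^2)  # anonymous function, long form
squares_short <- sapply(1:4, \(x) x^2)         # same function, shorthand form
print(identical(squares_long, squares_short))
```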
R is a functional programming language: functions can be used as objects!
[1] 4.108169
[1] 0.4108169
[1] 0.116695
[1] 0.3416066
[1] 4.1081692 0.4108169 0.1166950 0.3416066
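The code for this output is not shown. The printed numbers are consistent with the sum, mean, variance, and standard deviation of ten values, since the second is one tenth of the first and the third is the square of the fourth. A sketch along those lines, with the data and seed as assumptions, stores the functions themselves in a list and applies each one to the same vector:

```r
set.seed(1)                            # assumed seed, so the numbers differ from those above
x <- runif(10)                         # ten illustrative values

summaries <- list(sum, mean, var, sd)  # functions stored as ordinary objects
results <- sapply(summaries, function(f) f(x))
print(results)
```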
R documentation usually has many examples of how to apply that function.
Use ?lapply to pull up the documentation for the function lapply().
for() and apply family of functions
Beyond aesthetic differences…
for() executes commands sequentially.
Calls made by the apply family are independent of one another, so they can be executed in parallel (e.g., with parallel::mclapply()), but they are not by default.
apply()
The apply() function enables row-wise or column-wise repetitive operations on a matrix, array, or data frame. Examples:
apply(my_matrix, 1, mean) calculates the average for every row.
apply(my_matrix, 2, sum) calculates the total for every column.
apply(my_matrix, 1, function(x) max(x) - min(x)) calculates the range for each row.
The basic syntax is apply(X, MARGIN, FUN, ...)
X: The input data object (matrix, array, or data frame).
MARGIN: Specifies whether the function is applied to rows or columns:
  1: Applies the function to rows.
  2: Applies the function to columns.
  c(1, 2): Applies the function to every individual cell.
FUN: The function to be applied (e.g., mean, sum, or a custom function).
...: Optional additional arguments to be passed to FUN.
Example of additional arguments: apply(my_matrix, 1, mean, trim = 0.1, na.rm = TRUE) calculates a trimmed average for every row.
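The calls above can be tried on a small matrix. The contents of my_matrix below are an assumption chosen so the results are easy to check by hand.

```r
my_matrix <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

row_means  <- apply(my_matrix, 1, mean)                         # 3 4
col_totals <- apply(my_matrix, 2, sum)                          # 3 7 11
row_ranges <- apply(my_matrix, 1, function(x) max(x) - min(x))  # 4 4

# Extra arguments after FUN are forwarded to FUN, here to mean()
row_trimmed <- apply(my_matrix, 1, mean, trim = 0.1, na.rm = TRUE)
```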
apply(): Notice the shape of the output!
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   -2   -2   -2   -2   -2   -2   -2   -2
[2,]   -5   -5   -5   -5   -5   -5   -5   -5
[3,]   -4   -3   -2   -1   -2   -3   -4   -5
Suppose we want to compute the row-wise or the column-wise mean.
[1] -2 -5 -3
[1] -3.666667 -3.333333 -3.000000 -2.666667 -3.000000 -3.333333 -3.666667
[8] -4.000000
Both outputs are vectors.
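A sketch that rebuilds the printed matrix and computes both sets of means. The name m and the use of rbind() to construct the matrix are assumptions.

```r
m <- rbind(
  rep(-2, 8),
  rep(-5, 8),
  c(-4, -3, -2, -1, -2, -3, -4, -5)
)

row_means <- apply(m, 1, mean)   # one value per row: a vector of length 3
col_means <- apply(m, 2, mean)   # one value per column: a vector of length 8
print(row_means)
print(col_means)
```

For plain means, rowMeans(m) and colMeans(m) give the same values.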
Suppose we want to compute the row-wise or the column-wise mean, but also count the number of unique elements in that row or column.
     [,1] [,2] [,3]
[1,]   -2   -5   -3
[2,]    1    1    5
          [,1]      [,2] [,3]      [,4] [,5]      [,6]      [,7] [,8]
[1,] -3.666667 -3.333333   -3 -2.666667   -3 -3.333333 -3.666667   -4
[2,]  3.000000  3.000000    2  3.000000    2  3.000000  3.000000    2
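Both outputs are matrices, but do the shapes make sense? apply() stores the result of each call as a column, so the row-wise summary is 2 × 3 (one column per row of the input) rather than 3 × 2.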
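A sketch of how these shapes arise, using the matrix printed earlier. The helper name mean_and_unique is an assumption. When FUN returns a vector of length 2, apply() places each result in a column, so transposing with t() recovers a one-row-per-input-row layout.

```r
m <- rbind(
  rep(-2, 8),
  rep(-5, 8),
  c(-4, -3, -2, -1, -2, -3, -4, -5)
)

# Hypothetical helper: the mean and the number of distinct values
mean_and_unique <- function(v) c(mean(v), length(unique(v)))

by_row <- apply(m, 1, mean_and_unique)   # 2 x 3: one column per row of m
by_col <- apply(m, 2, mean_and_unique)   # 2 x 8: one column per column of m
by_row_t <- t(by_row)                    # 3 x 2: one row per row of m

print(dim(by_row))
print(dim(by_col))
print(dim(by_row_t))
```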
purrr::reduce()
Repeatedly applies a binary function to the elements of a vector or list.
Base R has Reduce(), but the version from the purrr package has nicer functionality.
reduce(<list or vector>, <binary function>)
purrr::reduce(): example using paste()
[1] "eek a bear"
[1] "eekabear"
[1] "eek...a...bear"
paste() also does element-wise pasting
So what if you want to paste together all elements of svec?
[1] "u" "b" "e"
[1] "ube"
[1] "ube"
[1] "u" "ub" "ube"
[1] "u b e"
[1] "u" "u b" "u b e"
[1] "u-b-e"
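The calls behind these outputs are not shown. The sketch below gives one set of calls that yields the same strings. To stay runnable without extra packages it uses base R's Reduce(), and the purrr counterparts named in the comments assume purrr is installed. The variable names other than svec are assumptions.

```r
# paste() with different separators
p_space <- paste("eek", "a", "bear")               # "eek a bear"
p_none  <- paste("eek", "a", "bear", sep = "")     # "eekabear"
p_dots  <- paste("eek", "a", "bear", sep = "...")  # "eek...a...bear"

svec <- c("u", "b", "e")

# Element-wise pasting: one output string per element of svec
elementwise <- paste(svec, svec, sep = "")         # "uu" "bb" "ee"

# Collapsing all elements into a single string
collapsed <- paste(svec, collapse = "")            # "ube"

# Folding with a binary function; purrr counterpart: purrr::reduce(svec, paste0)
reduced <- Reduce(paste0, svec)                    # "ube"

# Keeping intermediate results; purrr counterpart: purrr::accumulate(svec, paste0)
running <- Reduce(paste0, svec, accumulate = TRUE) # "u" "ub" "ube"

# Custom separator; purrr counterpart: purrr::reduce(svec, paste, sep = "-")
dashed <- Reduce(function(a, b) paste(a, b, sep = "-"), svec)  # "u-b-e"
```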
purrr::reduce(): example using set intersection
[[1]]
[1] 0 5 10 15 20 25 30
[[2]]
[1] 0 3 6 9 12 15 18 21 24 27 30
[[3]]
[1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
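A sketch of the intersection step, with the three sets rebuilt from the printed values using seq(). It uses base R's Reduce() so that it runs without extra packages. With purrr installed, purrr::reduce(sets, intersect) is the counterpart. The names sets and common are assumptions.

```r
sets <- list(
  seq(0, 30, by = 5),
  seq(0, 30, by = 3),
  seq(0, 30, by = 2)
)

# intersect() takes two sets; Reduce() chains it across the whole list:
# intersect(intersect(sets[[1]], sets[[2]]), sets[[3]])
common <- Reduce(intersect, sets)
print(common)   # multiples of 5, 3, and 2 between 0 and 30: 0 and 30
```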
purrr::reduce(): example of stacking data frames
[1] -0.2992672 -0.2799575 -0.2311141 -1.2870888
[[1]]
param u
1 -0.2992672 0.72025839
2 -0.2992672 -0.24867826
3 -0.2992672 0.67361809
4 -0.2992672 0.58069638
5 -0.2992672 -0.07674904
6 -0.2992672 0.03995586
7 -0.2992672 0.36909267
8 -0.2992672 0.57852718
9 -0.2992672 0.97767495
10 -0.2992672 0.68758376
[[2]]
param u
1 -0.2799575 0.445123590
2 -0.2799575 0.807609222
3 -0.2799575 -0.037438947
4 -0.2799575 0.067277809
5 -0.2799575 0.780050158
6 -0.2799575 0.607315197
7 -0.2799575 0.027929514
8 -0.2799575 -0.224933705
9 -0.2799575 -0.100150263
10 -0.2799575 -0.002993396
[[3]]
param u
1 -0.2311141 0.35908022
2 -0.2311141 0.01192053
3 -0.2311141 0.65449501
4 -0.2311141 -0.22140712
5 -0.2311141 0.23115687
6 -0.2311141 0.40218047
7 -0.2311141 -0.22918060
8 -0.2311141 0.48490677
9 -0.2311141 -0.03671480
10 -0.2311141 0.21089069
[[4]]
param u
1 -1.287089 0.189528629
2 -1.287089 0.487288118
3 -1.287089 0.002021567
4 -1.287089 -0.752588395
5 -1.287089 -1.081295393
6 -1.287089 -1.091286429
7 -1.287089 -0.589027304
8 -1.287089 0.239374898
9 -1.287089 -1.286542449
10 -1.287089 -0.810070808
param u
1 -0.2992672 0.720258394
2 -0.2992672 -0.248678257
3 -0.2992672 0.673618095
4 -0.2992672 0.580696383
5 -0.2992672 -0.076749041
6 -0.2992672 0.039955857
7 -0.2992672 0.369092672
8 -0.2992672 0.578527185
9 -0.2992672 0.977674950
10 -0.2992672 0.687583763
11 -0.2799575 0.445123590
12 -0.2799575 0.807609222
13 -0.2799575 -0.037438947
14 -0.2799575 0.067277809
15 -0.2799575 0.780050158
16 -0.2799575 0.607315197
17 -0.2799575 0.027929514
18 -0.2799575 -0.224933705
19 -0.2799575 -0.100150263
20 -0.2799575 -0.002993396
21 -0.2311141 0.359080216
22 -0.2311141 0.011920531
23 -0.2311141 0.654495006
24 -0.2311141 -0.221407118
25 -0.2311141 0.231156870
26 -0.2311141 0.402180468
27 -0.2311141 -0.229180600
28 -0.2311141 0.484906775
29 -0.2311141 -0.036714798
30 -0.2311141 0.210890690
31 -1.2870888 0.189528629
32 -1.2870888 0.487288118
33 -1.2870888 0.002021567
34 -1.2870888 -0.752588395
35 -1.2870888 -1.081295393
36 -1.2870888 -1.091286429
37 -1.2870888 -0.589027304
38 -1.2870888 0.239374898
39 -1.2870888 -1.286542449
40 -1.2870888 -0.810070808
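The code that generated this output is not shown. The sketch below follows the same structure: a vector of parameters, one data frame per parameter, then a fold with rbind. The seed and the way u is drawn are assumptions, so the numbers will not match those printed above. It uses base R's Reduce(), and purrr::reduce(frames, rbind) is the counterpart when purrr is installed.

```r
set.seed(141)                      # assumed seed
params <- rnorm(4)                 # four illustrative parameters

# One data frame per parameter; the distribution of u is an assumption
frames <- lapply(params, function(p) {
  data.frame(param = p, u = runif(10, min = p, max = p + 1))
})

# rbind() stacks two data frames; Reduce() chains it across the list
stacked <- Reduce(rbind, frames)
print(dim(stacked))                # 40 rows, 2 columns
```

do.call(rbind, frames) produces the same stacked data frame in a single call.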
if and else
Only one condition: if statement
Example (one coin toss)
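A sketch of a single coin toss. The coding of 1 as tails and 0 as heads follows the labels printed in the five-toss example. Drawing the toss with rbinom() and the name result are assumptions.

```r
y <- rbinom(1, size = 1, prob = 0.5)   # one fair coin toss, coded 0 or 1

# Only one condition: nothing happens when the condition is FALSE
if (y == 1) {
  print("y is tails")
}

# Two branches: else handles the remaining case
if (y == 1) {
  result <- "y is tails"
} else {
  result <- "y is heads"
}
print(result)
```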
if and else: vectorized version
ifelse(condition1, statement1, statement2)
Example (five coin tosses)
[1] 1 1 0 1 1
[1] "y is tails" "y is tails" "y is heads" "y is tails" "y is tails"
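Using the five tosses printed above as fixed input, one ifelse() call that reproduces the labels could look as follows. The name toss_labels is an assumption.

```r
y <- c(1, 1, 0, 1, 1)                  # the five tosses shown above

# The condition is evaluated element by element
toss_labels <- ifelse(y == 1, "y is tails", "y is heads")
print(toss_labels)
```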
Example (five dice rolls)
else if
More than two conditions:
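A sketch with three branches, applied to five dice rolls. The function name classify_roll and the cut-offs for "low", "middle", and "high" are illustrative assumptions.

```r
# Hypothetical helper: classify a single die roll into three categories
classify_roll <- function(roll) {
  if (roll <= 2) {
    "low"
  } else if (roll <= 4) {
    "middle"
  } else {
    "high"
  }
}

rolls <- sample(1:6, size = 5, replace = TRUE)   # five dice rolls
roll_labels <- sapply(rolls, classify_roll)      # if/else is not vectorized, so apply it per roll
print(rolls)
print(roll_labels)
```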
Comments on loops
Performs commands sequentially
Often we will want to perform the same set of (complicated) commands on different chunks of data.
for loop, but it can be difficult to understand because it is so flexible
apply() family of functions