x <- c(4,2,7,8) # concatenates the numbers 4, 2, 7, and 8
mean(x)
[1] 5.25
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00    3.50    5.50    5.25    7.25    8.00
STA141A: Fundamentals of Statistical Data Science
R is a programming language for statistical computing and data visualization.
Broadly speaking
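Logical values are TRUE and FALSE.
Character strings are enclosed in double " " or single ' ' quotes, e.g., "apple" or 'banana'.
The paste() function is often used for creating character strings.
A factor has character-like labels (e.g., "apple" and 'banana'), and is used to store categorical data.
x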
bear cat dog
4 2 3
[1] dog dog cat dog cat bear bear bear bear
Levels: dog cat bear
x
dog cat bear
3 2 4
x
bear cat dog
4 2 3
[1] dog dog cat dog cat bear bear bear bear
Levels: cat dog bear
x
cat dog bear
2 3 4
x
bear cat dog
4 2 3
[1] dog dog cat dog cat bear bear bear bear
Levels: bear dog cat
x
bear dog cat
4 3 2
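The outputs above differ only in the order of the factor levels. A minimal sketch of how such orderings might be produced, assuming the data are the nine animal labels shown in the printed factor (the variable names `animals`, `f_default`, and `f_custom` are hypothetical):

```r
# Hypothetical data matching the printed factor above
animals <- c("dog", "dog", "cat", "dog", "cat", "bear", "bear", "bear", "bear")

# By default, factor() sorts the levels alphabetically
f_default <- factor(animals)
levels(f_default)   # "bear" "cat" "dog"
table(f_default)

# An explicit levels argument fixes the order used by print() and table()
f_custom <- factor(animals, levels = c("dog", "cat", "bear"))
levels(f_custom)    # "dog" "cat" "bear"
table(f_custom)
```

The counts are the same in both tables; only the order of the columns follows the level order.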
Create “regular” vectors.
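The original examples for this point are not shown. Assuming "regular" refers to evenly spaced or repeated patterns, a sketch using the base R functions `:`, `seq()`, and `rep()` could look like this (the variable names are hypothetical):

```r
# Consecutive integers with the colon operator
v1 <- 1:5                          # 1 2 3 4 5

# Evenly spaced values with a fixed step
v2 <- seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00

# Evenly spaced values with a fixed length
v3 <- seq(2, 10, length.out = 5)   # 2 4 6 8 10

# Repeat the whole vector, or repeat each element
v4 <- rep(c("a", "b"), times = 3)  # "a" "b" "a" "b" "a" "b"
v5 <- rep(c("a", "b"), each = 2)   # "a" "a" "b" "b"
```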
c() will sometimes coerce different data types into the same type.
int [1:2] 4 6
num [1:2] 4 6
chr [1:2] "3" "haha"
num [1:2] 1 7
num [1:2] 0 7
Roughly, the coercion order is logical < integer < numeric < character.
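The calls that produced the str() output above are not shown. As a sketch, the following calls would give output of the same shape (the variable names are hypothetical):

```r
# Each c() call coerces its arguments to the "highest" type present
ints  <- c(4L, 6L)      # integer stays integer
nums  <- c(4, 6)        # plain numbers are numeric (double)
chars <- c(3, "haha")   # numeric + character -> character
mix1  <- c(TRUE, 7)     # logical + numeric -> numeric, TRUE becomes 1
mix0  <- c(FALSE, 7)    # logical + numeric -> numeric, FALSE becomes 0

str(ints)    # int [1:2] 4 6
str(nums)    # num [1:2] 4 6
str(chars)   # chr [1:2] "3" "haha"
str(mix1)    # num [1:2] 1 7
str(mix0)    # num [1:2] 0 7
```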
A vector of random values can be generated using, e.g., sample().
[1] "bug" "ant" "dog"
[1] "bug" "cat" "cat" "cat" "bug" "bug"
[1] 0.6920607 0.9312297 0.2116521 0.3092926 0.2847530
[1] 0.04980434 1.56936624 1.41340558 -1.43368174 -0.31236580
[1] 10000
int [1:10000] 2 1 1 0 6 3 1 4 0 5 ...
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.000 2.000 2.003 3.000 10.000
x
0 1 2 3 4 5 6 7 8 9 10
1336 2734 2685 1826 890 352 125 40 10 1 1
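The code behind the output above is not shown. The sketch below shows one way output of this kind might be generated; the seed, the label sets, and the choice of rpois() with lambda = 2 are assumptions, so the values will not match those printed above.

```r
set.seed(1)  # hypothetical seed, only to make the sketch reproducible

# Sampling without replacement (the default) and with replacement
animals <- sample(c("ant", "bug", "cat", "dog"), size = 3)
repeats <- sample(c("bug", "cat"), size = 6, replace = TRUE)

# Continuous random values
u <- runif(5)   # uniform on [0, 1]
z <- rnorm(5)   # standard normal

# A large vector of random counts (assumed Poisson with mean 2)
counts <- rpois(10000, lambda = 2)
length(counts)
str(counts)
summary(counts)
table(counts)
```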
Square brackets [ ] are used for indexing (i.e., accessing elements of) a vector, matrix, array, list, or data frame.
Using a vector of positive integers, the corresponding elements of the vector are selected and concatenated, in that order. A vector of negative integers specifies the values to be excluded rather than included.
[1] 0.6490791 0.1544339 0.3551344 0.5247749 0.4393103 0.3635327 0.4119610
[1] 0.1544339
[1] 0.4393103
[1] 0.6490791 0.1544339 0.5247749 0.4393103 0.3635327 0.4119610
[1] NA
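A sketch of integer indexing on a small vector with fixed, hypothetical values, so that the results are easy to check by eye:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)   # hypothetical values

second   <- x[2]         # 20: a single positive index
picked   <- x[c(5, 1)]   # 50 10: elements are returned in the order requested
dropped  <- x[-3]        # 10 20 40 50 60 70: a negative index excludes
past_end <- x[10]        # NA: an index beyond the length of the vector
```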
Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
[1] 0.6186568 0.8780956 0.2644407 0.4669568 0.5955775 0.9808410 0.9834137
[1] 0.6186568 0.8780956 0.9834137
[1] 0.2644407
[1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[1] 0.2644407 0.4669568
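A sketch of logical indexing with fixed, hypothetical values (rounded, so they do not match the output above exactly):

```r
x <- c(0.62, 0.88, 0.26, 0.47, 0.60, 0.98, 0.99)   # hypothetical values

# An explicit logical index vector
keep   <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
kept   <- x[keep]       # 0.62 0.88 0.99

# A comparison produces a logical vector of the same length
below  <- x < 0.5       # FALSE FALSE TRUE TRUE FALSE FALSE FALSE

# The comparison can be used directly as the index
small  <- x[x < 0.5]    # 0.26 0.47
tiny   <- x[x < 0.3]    # 0.26
```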
If a vector has a names attribute to identify its components, a sub-vector of the names vector may be used to select the elements.
a b c d
0.42191630 0.12014876 -0.01132619 0.92015369
Named num [1:4] 0.4219 0.1201 -0.0113 0.9202
- attr(*, "names")= chr [1:4] "a" "b" "c" "d"
b a
0.1201488 0.4219163
To get the names of a vector, use names(x).
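A sketch of setting, using, and retrieving names, with hypothetical values:

```r
x <- c(0.42, 0.12, -0.01, 0.92)       # hypothetical values
names(x) <- c("a", "b", "c", "d")     # assign the names attribute

by_name <- x[c("b", "a")]   # select by name, in the order requested
nm      <- names(x)         # retrieve the names: "a" "b" "c" "d"

str(x)                      # shows a named numeric vector
```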
Types can be converted with as.logical(), as.numeric(), as.character().
Types can be checked with is.character(), is.logical(), etc.
Q: What will be the result of the following code?
Q: What is the sum of the first three values of the vector x?
Q: What is the mean of the positive values of the vector x?
Q: How many negative values are there in the vector x?
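The vector x for these questions is not shown. As a sketch of the kind of indexing involved, here is one possible approach on a small hypothetical vector:

```r
x <- c(3, -1, 4, -5, 2)   # hypothetical vector, not the one from the exercise

first_three_sum <- sum(x[1:3])     # 3 + (-1) + 4 = 6
positive_mean   <- mean(x[x > 0])  # mean of 3, 4, 2 = 3
negative_count  <- sum(x < 0)      # TRUE counts as 1, so this counts negatives: 2
```

Summing a logical vector works because TRUE is coerced to 1 and FALSE to 0.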
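The all.equal function
Vectors can be compared elementwise with ==.
Near-equality of numeric values can be tested with the all.equal function.
Syntax: all.equal(vector1, vector2, ...).
You can think of a matrix as a collection of vectors of the same type and length.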
A matrix is created with the matrix() function, where the number of rows and/or columns is specified.
Indexing is similar to vectors: each element is indexed by row and column.
Logical indexing is similar to vectors: values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
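A sketch of creating and indexing a small matrix (the name `m` and its values are hypothetical):

```r
# matrix() fills column by column unless byrow = TRUE is given
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

one_cell <- m[2, 3]    # 6: row 2, column 3
row_one  <- m[1, ]     # 1 3 5: leaving an index empty selects everything
col_two  <- m[, 2]     # 3 4
large    <- m[m > 3]   # 4 5 6: logical indexing returns a vector
```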
Functions can be applied over the rows or columns of a matrix (see the apply() function).
Arrays are a generalization of matrices.
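As a sketch of how an array extends a matrix to more than two dimensions, consider a small three-dimensional example (the name `a` and the dimension labels are hypothetical):

```r
# A 2 x 3 x 4 array, filled in column-major order
a <- array(1:24, dim = c(2, 3, 4))

size       <- dim(a)       # 2 3 4
last_cell  <- a[2, 3, 4]   # 24: one index per dimension
first_slab <- a[, , 1]     # a 2 x 3 matrix holding 1:6

# Names can be attached along each dimension and then used for indexing
dimnames(a) <- list(c("r1", "r2"), c("c1", "c2", "c3"), paste0("k", 1:4))
named_cell <- a["r1", "c2", "k1"]   # 3
```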
Use dim() to find the size of an array.
Use dimnames() to assign names along each dimension.
List elements can be of any type.
Create a list with list().
Subset a list with the square brackets operator [ ] or the double square bracket operator [[ ]].
Access elements with [ ], [[ ]], or the $ operator.
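A sketch of the difference between the three operators on a small list (the name `person` and its contents are hypothetical):

```r
# List elements may have different types and lengths
person <- list(name = "Ada", scores = c(90, 85), passed = TRUE)

sub_list <- person["scores"]     # [ ] returns a list of length 1
scores   <- person[["scores"]]   # [[ ]] returns the element itself: 90 85
who      <- person$name          # $ also returns the element, selected by name
second   <- person[[2]]          # [[ ]] also accepts a position
```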
Why might one want to use a vector instead of a list?
Data frames are a convenient way to store datasets.
Built-in examples: USArrests, PlantGrowth, ToothGrowth, mtcars.
A data frame is a list with class “data.frame”.
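A sketch that checks this description on the built-in mtcars dataset (the variable names are hypothetical):

```r
cls      <- class(mtcars)      # "data.frame"
is_a_lst <- is.list(mtcars)    # TRUE: each column is one list element
n_cols   <- length(mtcars)     # 11: the length of the list is the number of columns
n_rows   <- nrow(mtcars)       # 32
cols     <- names(mtcars)[1:3] # "mpg" "cyl" "disp"
```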
Use the read.table() function to read an entire data frame from an external file.
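A self-contained sketch of reading a data frame from a file. Since no external file is provided here, the example first writes a few rows of the built-in PlantGrowth dataset to a temporary file; the file and the variable names are hypothetical.

```r
# Write a small example file to a temporary location
tmp <- tempfile(fileext = ".txt")
write.table(head(PlantGrowth), file = tmp, row.names = FALSE)

# header = TRUE tells read.table() that the first line holds column names
pg <- read.table(tmp, header = TRUE)
str(pg)

unlink(tmp)  # remove the temporary file
```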
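A related function is read.csv().
Elements can be accessed with [, [[ or $, and they return a vector or a data frame.
Subsetting:
Depending on the kind of index, use [ or [[; when selecting by name, [, [[, or $ can be used.
A tibble is a modern, opinionated version of a data frame.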
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows
Subsetting syntax is the same as with a data frame.
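A sketch of the subsetting operators on a base data frame, using only base R so that it runs without extra packages (the variable names are hypothetical). If the tibble package is installed, tibble::as_tibble(mtcars) could be substituted for the data frame, although the class of some results may then differ.

```r
df <- head(mtcars)   # first six rows of the built-in dataset

block    <- df[1:3, c("mpg", "cyl")]   # rows 1-3 and two columns: a data frame
one_col  <- df["mpg"]                  # [ with one name: a one-column data frame
mpg_vec  <- df[["mpg"]]                # [[ returns the column as a vector
mpg_vec2 <- df$mpg                     # $ does the same, by name
six_cyl  <- df[df$cyl == 6, ]          # logical index on the rows
```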