[1] 5.25
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 3.50 5.50 5.25 7.25 8.00
STA141A: Fundamentals of Statistical Data Science
R is a programming language for statistical computing and data visualization.
Broadly speaking
Logical values: TRUE, FALSE.
Character strings are enclosed in double " " or single ' ' quotes, e.g., "apple" or 'banana'.
The paste() function is often used for creating character strings.
A factor takes values from a fixed set of levels (e.g., "apple" and 'banana'), and is used to store categorical data.
x <- c("dog", "dog", "cat", "dog", "cat", "bear", "bear", "bear", "bear")
table(x) # default order is alphabetical
x
bear  cat  dog
   4    2    3
x <- factor(x, levels=names(sort(table(x)))) # levels ordered by increasing frequency; don't need to know this code right now
x # vector elements remain the same
[1] dog  dog  cat  dog  cat  bear bear bear bear
Levels: cat dog bear
table(x) # counts now follow the level order
x
 cat  dog bear
   2    3    4
The levels can also be ordered by decreasing frequency:
x <- c("dog", "dog", "cat", "dog", "cat", "bear", "bear", "bear", "bear")
x <- factor(x, levels=names(sort(table(x), decreasing=TRUE))) # don't need to know this code right now
x # vector elements remain the same
[1] dog  dog  cat  dog  cat  bear bear bear bear
Levels: bear dog cat
table(x) # counts now follow the level order
x
bear  dog  cat
   4    3    2
Create “regular” vectors.
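The code for this slide is not reproduced here, so the following is only a sketch. It assumes that "regular" vectors means patterned sequences built with the colon operator, seq(), and rep(); the variable names are illustrative.

```r
# Consecutive integers with the colon operator
a <- 1:5                          # 1 2 3 4 5

# Evenly spaced values with seq()
b <- seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
d <- seq(2, 10, length.out = 5)   # 2 4 6 8 10

# Repeated values with rep()
e <- rep(c("x", "y"), times = 2)  # "x" "y" "x" "y"
f <- rep(c("x", "y"), each = 2)   # "x" "x" "y" "y"
```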
c() will sometimes coerce different data types into the same type.
int [1:2] 4 6
num [1:2] 4 6
chr [1:2] "3" "haha"
num [1:2] 1 7
num [1:2] 0 7
Roughly, order is logical < integer < numeric < character.
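The str() output above is consistent with calls like the ones below. The exact inputs are an assumption, chosen so that each result has the type shown; the variable names are illustrative.

```r
v_int <- c(4L, 6L)      # both integer, so the result stays integer
v_num <- c(4L, 6)       # integer and numeric: promoted to numeric
v_chr <- c(3, "haha")   # numeric and character: everything becomes character
v_t   <- c(TRUE, 7)     # logical and numeric: TRUE becomes 1
v_f   <- c(FALSE, 7)    # logical and numeric: FALSE becomes 0

str(v_int)   # int [1:2] 4 6
str(v_num)   # num [1:2] 4 6
str(v_chr)   # chr [1:2] "3" "haha"
str(v_t)     # num [1:2] 1 7
str(v_f)     # num [1:2] 0 7
```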
A vector of random values can be generated using, e.g., sample().
[1] "cat" "bug" "dog"
[1] "ant" "ant" "cat" "dog" "ant" "bug"
[1] 0.2378454 0.7153129 0.9154155 0.1102262 0.9308610
[1] 0.1165353 1.1950628 -2.1309916 -1.8542192 -0.3959627
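Output of this kind could come from calls like the following. This is a sketch: the population vector animals is an assumption inferred from the printed values, and set.seed() is added here only to make the draws reproducible.

```r
set.seed(1)  # fix the random seed so results can be reproduced

animals <- c("ant", "bug", "cat", "dog")  # assumed population

s1 <- sample(animals, size = 3)                  # 3 draws without replacement
s2 <- sample(animals, size = 6, replace = TRUE)  # 6 draws with replacement
u  <- runif(5)   # 5 draws from the Uniform(0, 1) distribution
z  <- rnorm(5)   # 5 draws from the standard normal distribution
```

Without replace = TRUE, sample() cannot draw more values than the population contains.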
x <- rpois(10000, lambda=2) # randomly generate 10000 values from a Poisson distribution with parameter lambda=2
length(x) # returns length of the variable x
[1] 10000
str(x)
 int [1:10000] 2 0 5 4 2 1 3 2 2 1 ...
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   2.000   2.019   3.000   8.000
table(x)
x
   0    1    2    3    4    5    6    7    8
1307 2693 2720 1821  932  349  132   35   11
Square brackets [ ] are used for indexing (i.e., accessing elements of) a vector, matrix, array, list, or dataframe.
Using a vector of positive integers, the corresponding elements of the vector are selected and concatenated, in that order. A vector of negative integers specifies the values to be excluded rather than included.
[1] 0.5655175 0.7688468 0.7220338 0.5090330 0.3008393 0.2124152 0.4767549
[1] 0.7688468
[1] 0.3008393
[1] 0.5655175 0.7688468 0.5090330 0.3008393 0.2124152 0.4767549
[1] NA
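The indexing rules above can be sketched with a small vector. The values are illustrative, not the random values printed above.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)  # illustrative vector of length 7

x[2]          # second element: 20
x[c(5, 1)]    # elements 5 and 1, in that order: 50 10
x[-3]         # all elements except the third
x[-c(1, 2)]   # all elements except the first two
x[10]         # index beyond length(x): NA
```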
Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
[1] 0.88553721 0.75740886 0.09780963 0.59125508 0.65270475 0.62804784 0.86978383
[1] 0.8855372 0.7574089 0.8697838
[1] 0.09780963
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE
[1] 0.09780963
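A minimal sketch of logical indexing, using illustrative values and a hypothetical threshold of 0.7:

```r
x <- c(0.89, 0.76, 0.10, 0.59, 0.65, 0.63, 0.87)  # illustrative values

big <- x > 0.7       # logical vector: TRUE where the condition holds
x[big]               # elements greater than 0.7: 0.89 0.76 0.87
x == min(x)          # TRUE only at the position of the minimum
x[x == min(x)]       # the minimum itself: 0.1
sum(x > 0.7)         # TRUE counts as 1, so this counts matches: 3
```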
If a vector has a names attribute to identify its components, a sub-vector of the names vector may be used to select the elements.
a b c d
-0.2558558 -0.4432794 0.2562432 0.5204531
Named num [1:4] -0.256 -0.443 0.256 0.52
- attr(*, "names")= chr [1:4] "a" "b" "c" "d"
b a
-0.4432794 -0.2558558
To get the names of a vector, use names(x).
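Indexing by name can be sketched as follows; the vectors and their names are illustrative.

```r
x <- c(a = 1.5, b = -0.4, c = 0.3, d = 2.0)  # names given at creation

names(x)           # "a" "b" "c" "d"
x[c("b", "a")]     # elements named b and a, in that order

y <- c(10, 20, 30)
names(y) <- c("low", "mid", "high")  # names assigned afterwards
y["mid"]           # element named mid, with value 20
```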
Convert between types with as.logical(), as.numeric(), as.character().
Check the type with is.character(), is.logical(), etc.
Q: What will be the result of the following code?
Q: What is the sum of the first three values of the vector x?
Q: What is the mean of the positive values of the vector x?
Q: How many negative values are there in the vector x?
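all.equal function
Numeric values that should be equal can differ slightly because of floating-point rounding, so avoid comparing them with ==.
Instead, test for near equality with the all.equal function.
Usage: all.equal(vector1, vector2, ...).
You can think of a matrix as a collection of vectors of the same type and length.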
A matrix can be created with the matrix() function, where the number of rows and columns must be specified.
Indexing is similar to vectors: each element is indexed by row and column.
Logical indexing is also similar to vectors: values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
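Matrix creation and indexing can be sketched as follows; the matrix m is illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m[2, 3]    # row 2, column 3: 6
m[1, ]     # first row as a vector: 1 3 5
m[, 2]     # second column as a vector: 3 4
m[m > 3]   # logical indexing, returned as a vector: 4 5 6
```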
Functions can be applied over the rows or columns of a matrix (see the apply() function).
Arrays are a generalization of matrices.
Use dim() to find the size of an array.
Use dimnames() to assign names along each dimension.
List elements can be of any type.
Lists are created with list().
Elements can be accessed with the square brackets operator [ ] or the double square bracket operator [[ ]].
Named elements can be accessed with [ ], [[ ]], or the $ operator.
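The difference between the list operators can be sketched with a hypothetical list; person and its elements are illustrative.

```r
person <- list(name = "Ann", scores = c(90, 85), passed = TRUE)

person[1]            # single brackets return a list (here of length 1)
person[[1]]          # double brackets return the element itself: "Ann"
person[["scores"]]   # element by name: 90 85
person$passed        # element by name with $: TRUE
```

Single brackets keep the list wrapper, which is useful for selecting several elements at once; double brackets and $ extract one element.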
Data frames are a convenient way to store datasets.
Built-in examples: USArrests, PlantGrowth, ToothGrowth, mtcars.
A data frame is a list with class “data.frame”.
Use the read.table() function to read an entire data frame from an external file.
CSV files can be read with read.csv().
Columns can be accessed with [, [[ or $, and they return a vector or a data frame.
Subsetting:
Columns can be selected by position (using [ or [[) or by name (using [, [[, or $).
[1] 11 12 13 14 15 16
[1] 11 12 13 14 15 16
  age name
1  11  Bug
2  12  Cat
3  13  Bug
4  14  Bug
5  15  Dog
6  16  Bug
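A data frame matching the printed output can be built and subset as follows. This is a sketch: the name df and the specific subsetting calls are assumptions, since the original code is not shown.

```r
df <- data.frame(
  age  = 11:16,
  name = c("Bug", "Cat", "Bug", "Bug", "Dog", "Bug"),
  stringsAsFactors = FALSE  # keep name as character in all R versions
)

df$age                   # column as a vector: 11 12 13 14 15 16
df[["age"]]              # same vector
df["age"]                # one-column data frame
df[2, "name"]            # row 2 of column name: "Cat"
df[df$name == "Bug", ]   # rows where name is "Bug"
```

Here $ and [[ return the column itself, while single brackets with a column name return a data frame.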