Testing equality with == on floating-point numbers can cause problems. Numbers are stored with finite “precision”: a double keeps a 53-bit significand, i.e. about 15 to 17 significant decimal digits, so most results are rounded slightly.
x <- c((1 / 49) * 49, sqrt(2)^2)
x == c(1, 2)
[1] FALSE FALSE
What’s going on? Let’s look at a more precise representation in R.
print(x, digits=10)
[1] 1 2
print(x, digits=20)
[1] 0.99999999999999988898 2.00000000000000044409
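To get a sense of how small these errors are, a quick sketch can compare them with machine epsilon, .Machine$double.eps (the gap between 1 and the next representable double). The name err is introduced here only for illustration.

```r
x <- c((1 / 49) * 49, sqrt(2)^2)

# Rounding error relative to the values we intended
err <- x - c(1, 2)
err

# Machine epsilon for doubles
.Machine$double.eps

# The errors are tiny: within a few multiples of machine epsilon
abs(err) <= 4 * .Machine$double.eps
```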
dplyr::near() helps with this: it compares elementwise and ignores small differences.
near(x, c(1,2))
[1] TRUE TRUE
all.equal(x, c(1,2)) # returns single value
[1] TRUE
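A base R sketch of the same idea, for when dplyr is not loaded. The tolerance below matches near()'s documented default, .Machine$double.eps^0.5; the names y and tol are made up for this example. Note that all.equal() returns either TRUE or a character description of the difference, so it is usually wrapped in isTRUE() before being used as a condition.

```r
x <- c((1 / 49) * 49, sqrt(2)^2)
y <- c(1, 2)

# Elementwise comparison with an explicit tolerance
tol <- sqrt(.Machine$double.eps)
abs(x - y) < tol

# Single TRUE/FALSE answer for the whole vector
isTRUE(all.equal(x, y))

# A real difference gives FALSE rather than a character message
isTRUE(all.equal(1, 1.1))
```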
Missing values
Almost any operation involving an NA returns NA.
(NA>5)
[1] NA
(10==NA)
[1] NA
What about NA==NA?
NA==NA
[1] NA
Why? Think of this example
# Suppose we don't know Ant's age
age_ant <- NA
# And we also don't know Bug's age
age_bug <- NA
# Then we shouldn't know whether Ant and
# Bug are the same age
age_ant == age_bug
[1] NA
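The word "almost" matters: when the result is the same no matter what the unknown value is, R returns that result instead of NA. A few cases, as a sketch:

```r
# TRUE whatever the missing value is
NA | TRUE

# FALSE whatever the missing value is
NA & FALSE

# Depends on the missing value, so the result is NA
NA & TRUE

# Anything to the power 0 is 1, so this is not NA either
NA^0
```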
A useful function for dealing with NA: is.na()
is.na(x) works with any type of vector and returns TRUE for missing values and FALSE for everything else:
is.na(c(TRUE, NA, FALSE))
[1] FALSE TRUE FALSE
is.na(c(1, NA, 3))
[1] FALSE TRUE FALSE
is.na(c("a", NA, "b"))
[1] FALSE TRUE FALSE
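Because == NA always gives NA, is.na() is the way to test for missingness. A small sketch on a made-up vector (ages is hypothetical) also shows that summing or averaging the result counts the missing values:

```r
ages <- c(31, NA, 27, NA)  # hypothetical data

# Comparing with == never finds the NAs
ages == NA

# is.na() does
is.na(ages)

# TRUE counts as 1, so these give the number and share of missing values
sum(is.na(ages))
mean(is.na(ages))

# Many summary functions can skip NAs with na.rm = TRUE
mean(ages, na.rm = TRUE)
```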
Since is.na() returns logicals, it can be used in filter():
flights %>% filter(is.na(dep_time))
# A tibble: 8,255 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 NA 1630 NA NA 1815
2 2013 1 1 NA 1935 NA NA 2240
...
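filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped silently. A sketch on a tiny made-up table (toy and its values are hypothetical, not taken from flights):

```r
library(dplyr)

toy <- tibble(
  flight   = c("A", "B", "C"),
  dep_time = c(517L, NA, 1630L)
)

# == NA is NA for every row, so nothing is kept
toy %>% filter(dep_time == NA)

# is.na() keeps the row with the missing departure time
toy %>% filter(is.na(dep_time))

# The NA row is dropped here as well
toy %>% filter(dep_time > 1000)

# Ask for the NA rows explicitly to keep them
toy %>% filter(dep_time > 1000 | is.na(dep_time))
```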
We can use is.na() to help identify where NAs come from, e.g., why are there air_time NAs?
Let’s examine how dep_time, dep_delay, and sched_dep_time are related.
# A tibble: 365 × 5
year month day behind n_flight
<int> <int> <int> <dbl> <int>
1 2013 1 1 32.5 842
2 2013 1 2 32.0 943
...
n() gives the total number of flights per group, which is not ideal.
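One possible alternative to a bare n() is to sum a logical vector inside summarise(), which counts only the rows that meet a condition. This is a sketch on made-up data; toy, n_missing, and n_late are names invented here, and the original summary may have been computed differently.

```r
library(dplyr)

toy <- tibble(
  day       = c(1, 1, 1, 2, 2),
  dep_delay = c(10, NA, -3, 25, NA)
)

out <- toy %>%
  group_by(day) %>%
  summarise(
    n_flight  = n(),                            # every row in the group
    n_missing = sum(is.na(dep_delay)),          # rows with no recorded delay
    n_late    = sum(dep_delay > 0, na.rm = TRUE) # rows that left late
  )
out
```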
Conditional transformations
Conditional transformations: if_else()
if_else(CONDITION, TRUE_VAL, FALSE_VAL, MISSING_VAL) is useful when we want to return one value where the condition is TRUE and another value where it is FALSE.
x <- c(-2, -1, 1, 2, NA)
if_else(x > 0, "yay", "boo")
[1] "boo" "boo" "yay" "yay" NA
The fourth argument of if_else() specifies the value to use where the condition is NA:
if_else(x >0, "yay", "boo", "idk how i feel yet")
[1] "boo" "boo" "yay"
[4] "yay" "idk how i feel yet"
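The TRUE and FALSE values do not have to be constants; they can be vectors of the same length as the condition. A sketch, which also passes the fourth argument by its name, missing:

```r
library(dplyr)

x <- c(-2, -1, 1, 2, NA)

# Absolute value by hand: flip the sign only where x is negative.
# The NA stays NA because no missing value is supplied.
if_else(x < 0, -x, x)

# Naming the fourth argument makes the intent easier to read
if_else(x > 0, "yay", "boo", missing = "idk how i feel yet")
```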