babynames # n is the total number of people of that sex with that name born in that year
# A tibble: 1,924,665 × 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# ℹ 1,924,655 more rows
Pattern matching
str_view()
str_view(string, pattern = NULL) prints the underlying representation of a string and, when a pattern is supplied, shows how that pattern matches.
pattern is interpreted as a regular expression (regex), so it can also contain character classes such as [aeiou].
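As a small illustration of the description above, the sketch below calls str_view() on a made-up vector (`words` is hypothetical, not part of babynames). It assumes stringr 1.5.0 or later, where str_view() prints to the console; older versions may display the result differently.

```r
library(stringr)

# Hypothetical example strings, chosen only for illustration
words <- c("apple", "banana", "pear")

# No pattern: show the underlying representation of each string
str_view(words)

# Literal pattern: highlights every occurrence of "an"
str_view(words, "an")

# Character class: highlights each lowercase vowel
str_view(words, "[aeiou]")

# Regex metacharacter: "." matches any single character after an "a"
str_view(words, "a.")
```

Comparing the calls side by side is a quick way to check what a pattern really matches before using it inside filter() or mutate().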
babynames |>
  filter(str_detect(name, "x")) |>
  count(name, wt = n, sort = TRUE)
# A tibble: 974 × 2
  name           n
  <chr>      <int>
1 Alexander 665492
2 Alexis    399551
3 Alex      278705
4 Alexandra 232223
5 Max       148787
...
str_detect()
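You can also use str_detect() together with group_by(), summarize(), etc. Because str_detect() returns a logical vector, TRUE counts as 1 and FALSE as 0:
sum(str_detect(x, pattern)) returns the number of strings that match the pattern
mean(str_detect(x, pattern)) returns the proportion of strings that match the pattern
E.g. the proportion of names per year that contain an “x”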
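The per-year summary described above might look like the following sketch. To keep it self-contained it runs on a tiny made-up table (`toy` and its values are hypothetical, not taken from babynames) and assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for babynames with only the columns needed here
toy <- tibble(
  year = c(1880, 1880, 1881, 1881),
  name = c("Max", "Mary", "Alex", "Rex")
)

toy_summary <- toy |>
  group_by(year) |>
  summarize(
    n_x    = sum(str_detect(name, "x")),   # number of names containing "x"
    prop_x = mean(str_detect(name, "x"))   # proportion of names containing "x"
  )

toy_summary
```

On the real data the same pipeline would start from babynames instead of toy. Note that this gives the share of distinct names containing an "x", not the share of babies; weighting by n would be a different calculation.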
str_count() works well with mutate(), e.g. to compute the number of vowels and consonants in each baby name:
babynames |>
  count(name) |>
  mutate(
    vowels = str_count(name, "[aeiou]"),      # pattern matching is case sensitive, so "A" isn't counted
    consonants = str_count(name, "[^aeiou]")  # for the same reason, an uppercase vowel such as "A" is counted here
  )
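Since the pattern is case sensitive, the counts above are off for names that start with a vowel. The sketch below shows two common ways to work around that, using a short made-up vector (`example_names` is hypothetical): wrapping the pattern in regex(ignore_case = TRUE), or lower-casing the names first with str_to_lower().

```r
library(stringr)

# Hypothetical names, picked because they start with an uppercase vowel
example_names <- c("Aaban", "Eve")

# Case-sensitive count misses the leading "A" and "E"
vowels_sensitive <- str_count(example_names, "[aeiou]")

# Option 1: tell the regex engine to ignore case
vowels_ignore_case <- str_count(example_names, regex("[aeiou]", ignore_case = TRUE))

# Option 2: convert to lower case before counting
vowels_lowered <- str_count(str_to_lower(example_names), "[aeiou]")

# Consonants with case ignored, so uppercase vowels are no longer counted
consonants_ignore_case <- str_count(example_names, regex("[^aeiou]", ignore_case = TRUE))
```

Either option could replace the pattern inside the mutate() call; which one reads better is mostly a matter of taste.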