MBA 6361
Data Science for Managers
Lecture 3 v1

Peter Rabinovitch

2023-01-24

Bad plot

Stuff

About the project

How to do slides in knitr

Assignment 2 - Comments about the ones I have seen so far

One advantage to submitting early is that if I have time, I can have a look and provide feedback before it is due.

Assignments in general

How to learn this stuff

How to ask a question

Example: can’t figure out how to exclude rows with filter

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ purrr   1.0.0     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
df <- read_excel("statementofvotescastoctober242018.xls",skip=11)
## New names:
## • `` -> `...3`
df %>% tail()
## # A tibble: 6 × 4
##   Precinct                                               Registe…¹ ...3  Cards…²
##   <chr>                                                      <dbl> <lgl>   <dbl>
## 1 Spc Adv 4 99-002 - City Hall                                   0 NA        487
## 2 Spc Adv 4 99-003 - Greenboro Community Centre                  0 NA        515
## 3 Spc Adv 4 99-004 - Ben Franklin Place                          0 NA        600
## 4 Spc Adv 4 99-005 - Minto Recreation Complex-Barrhaven          0 NA        620
## 5 Spc Adv 4 99-006 - Richcraft Recreation Complex-Kanata         0 NA        501
## 6 City / Ville - Total                                      633946 NA     269772
## # … with abbreviated variable names ¹​`Registered\nVoters`, ²​`Cards Cast`
# want to get rid of "City" line
df <- tribble(~precinct, ~votes, #input
  "99-002 - City Hall",0,
  "99-003 - Greenboro Community Centre", 515,
  "99-006 - Richcraft Recreation Complex-Kanata",501,
  "City / Ville - Total", 633946
  )

# want
#   precinct                                     votes
# 1 99-002 - City Hall                               0
# 2 99-003 - Greenboro Community Centre            515
# 3 99-006 - Richcraft Recreation Complex-Kanata   501

df %>% filter(str_detect(precinct, 'city'))
## # A tibble: 0 × 2
## # … with 2 variables: precinct <chr>, votes <dbl>
df %>% filter(str_detect(precinct, 'City'))
## # A tibble: 2 × 2
##   precinct              votes
##   <chr>                 <dbl>
## 1 99-002 - City Hall        0
## 2 City / Ville - Total 633946
# ok, realized I need !
df %>% filter(!str_detect(precinct, 'City'))
## # A tibble: 2 × 2
##   precinct                                     votes
##   <chr>                                        <dbl>
## 1 99-003 - Greenboro Community Centre            515
## 2 99-006 - Richcraft Recreation Complex-Kanata   501
# but how to get back the 'City Hall' row?
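One possible answer to the question above, as a sketch rather than the only fix: the pattern 'City' is too broad, so it also matches "City Hall". Anchoring the pattern to the start of the string, or comparing against the full label, keeps that row. The names `kept` and `kept_exact` are made up for this illustration.

```r
library(dplyr)
library(stringr)

# Same toy input as in the example question
df <- tibble::tribble(
  ~precinct, ~votes,
  "99-002 - City Hall", 0,
  "99-003 - Greenboro Community Centre", 515,
  "99-006 - Richcraft Recreation Complex-Kanata", 501,
  "City / Ville - Total", 633946
)

# "^" anchors the regular expression to the start of the string,
# so "99-002 - City Hall" no longer matches and is kept
kept <- df %>% filter(!str_detect(precinct, "^City / Ville"))

# Alternative: drop the row by exact comparison instead of a pattern
kept_exact <- df %>% filter(precinct != "City / Ville - Total")

print(kept)
```

Both versions leave the three precinct rows and drop only the total line.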

Note: frequently, the act of reducing your problem to a minimal reproducible example will help you figure out what the problem is.

Also: if you have to compress your example, use a format that can be decompressed for free with common tools (e.g. zip). Do not require your helper to install or buy software.
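For the input part of a minimal reproducible example, base R's `dput()` is one way to turn a small data frame into text that a helper can paste straight into their own session, with no attachment needed. This is a sketch; the data frame `small` is a made-up stand-in for a few rows of your real data.

```r
# A few rows that are enough to show the problem (hypothetical values)
small <- data.frame(
  precinct = c("99-002 - City Hall", "City / Ville - Total"),
  votes = c(0, 633946),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object;
# copy that printed text into your question
dput(small)
```

The helper can then run `df <- ` followed by the pasted text to get the same data you have.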

RStudio Projects

How to hide stuff

{r, warning=FALSE, message=FALSE, error=TRUE, eval=TRUE, out.height="300px"}
See https://kbroman.org/knitr_knutshell/pages/Rmarkdown.html

tibble(x=rnorm(100))%>%
  ggplot(aes( x= x))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

tibble(x=rnorm(100))%>%
  ggplot(aes( x= x))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
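The chunk options shown in this section can also be set once for the whole document instead of being repeated in every chunk header. A minimal sketch, assuming it is placed in a setup chunk near the top of the .Rmd file; which options to set is a choice, not a requirement.

```r
library(knitr)

# Defaults applied to every later chunk unless a chunk overrides them:
# warning = FALSE hides warnings, message = FALSE hides messages
# such as the `stat_bin()` note printed under the histograms
opts_chunk$set(warning = FALSE, message = FALSE)

# Check what is currently set
opts_chunk$get("message")
```

An individual chunk can still override a default in its own header, for example with `message=TRUE`. Two other commonly used options are `echo=FALSE`, which hides the code but keeps its output, and `include=FALSE`, which runs the chunk but shows neither code nor output.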

Homework (Individual)

Tonight

open code_walkthrough.R

Stats

open Stats_New_1.Rmd

open Stats_Coin_Tossing.R

Watch

https://www.youtube.com/watch?v=5Dnw46eC-0o

Less than 15 minutes