Homework 5
Due: 2020-09-15, 11:59pm
For this homework you should submit a ZIP archive called firstnameLastnameHW5.zip. When unzipped there should be a single directory/folder called firstnameLastnameHW5 and all files should be within that directory. For this homework that directory should contain:
- A single document with the answers to all the following items in HTML or IPYNB format. Make sure you include plain English blocks in between the code and its output, to interpret what R is giving you. There should also be comments in the R code blocks prefixed with #.
- Code file used to generate the HTML file in RMD format (not needed if using IPYNB).
In this homework, we will practice logical operations, handling missing data, and calculating data summaries. In this and future homeworks, we will be using datasets introduced in the example datasets notebook.
1. Emergency department triage data (50%)
Read in the Emergency department triage dataset and answer the following questions. You may want to read the background material to familiarize yourself with the context. Briefly, the dataset measured short-term (30-day) mortality in patients presenting in an emergency department. The objective was to examine whether routine blood tests can predict mortality.
- How many variables does the dataset have, and what are the variable names? How many patients were studied?
- Does the dataset have any missing data? If so, which variables have missing data, and how many values are missing for each of those variables?
- How many patients have any missing data?
- What proportion of patients died in 30 days (use the mean of mort30)?
- What was the median CRP (c-reactive protein) level in this cohort (use the variable crp, taking care to remove any missing values)? How many patients were missing CRP levels?
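As a rough illustration of the operations these questions call for, here is a minimal sketch on a small made-up data frame. The object name triage_toy, the age column, and all values are hypothetical and are not taken from the actual dataset; only the variable names mort30 and crp come from the questions above. You will need to read in the real data yourself and adapt the code.

```r
# Toy stand-in for the triage data (hypothetical values, for illustration only)
triage_toy <- data.frame(
  mort30 = c(0, 1, 0, 0, 1, 0),              # 1 = died within 30 days
  crp    = c(5.2, NA, 12.0, 3.1, NA, 48.5),  # two missing CRP values
  age    = c(34, 71, NA, 58, 80, 45)         # hypothetical extra variable
)

# Number of rows (patients) and columns (variables), and the variable names
dim(triage_toy)
names(triage_toy)

# is.na() returns a logical matrix; colSums() counts the TRUEs in each column
n_missing_by_var <- colSums(is.na(triage_toy))
n_missing_by_var[n_missing_by_var > 0]  # keep only variables with missing data

# complete.cases() is TRUE for rows with no missing values,
# so negating it flags patients with at least one missing value
n_incomplete <- sum(!complete.cases(triage_toy))

# The mean of a 0/1 variable is the proportion of 1s
prop_died <- mean(triage_toy$mort30)

# median() returns NA if any value is missing unless na.rm = TRUE
median_crp    <- median(triage_toy$crp, na.rm = TRUE)
n_missing_crp <- sum(is.na(triage_toy$crp))
```

Note that mean() and median() both return NA when the input contains missing values and na.rm is left at its default of FALSE, so it is worth checking for missing data before summarizing any variable.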
2. Soccer data (50%)
In this dataset, the variables FTHG and FTAG refer to the number of goals scored at full time by the home and away teams, respectively. The variables HomeTeam and AwayTeam denote the home and away teams, respectively.
- How many games were played in the league that year?
- How many games did each team play (tabulate the HomeTeam and AwayTeam variables)?
- Tabulate the FTHG and FTAG variables; in how many games did the home team fail to score?
- How many games were scoreless (add the home and away team goals; tabulate)?
- What was the mean number of goals scored in the league that year?
- What was the mean number of goals scored in games where the home team scored (subset by home team scoring at least once; average)?
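The tabulating and subsetting steps above can be sketched on a small made-up league as follows. The object name soccer_toy, the team labels, and the scores are hypothetical; only the column names FTHG, FTAG, HomeTeam, and AwayTeam come from the description above. The sketch also assumes that "mean number of goals" means total goals per game, which is one reasonable reading.

```r
# Toy stand-in for the soccer data (hypothetical teams and scores)
soccer_toy <- data.frame(
  HomeTeam = c("A", "B", "C", "A", "B", "C"),
  AwayTeam = c("B", "C", "A", "C", "A", "B"),
  FTHG     = c(2, 0, 1, 0, 3, 0),  # full-time home goals
  FTAG     = c(1, 0, 1, 2, 0, 0),  # full-time away goals
  stringsAsFactors = FALSE
)

# One row per game
n_games <- nrow(soccer_toy)

# Games per team: pool home and away appearances, then tabulate
games_per_team <- table(c(soccer_toy$HomeTeam, soccer_toy$AwayTeam))

# Distribution of home and away goals
table(soccer_toy$FTHG)
table(soccer_toy$FTAG)

# A logical comparison gives TRUE/FALSE; sum() counts the TRUEs
n_home_blank <- sum(soccer_toy$FTHG == 0)

# Total goals per game, then count the games with zero goals
total_goals <- soccer_toy$FTHG + soccer_toy$FTAG
table(total_goals)
n_scoreless <- sum(total_goals == 0)

# Mean goals per game (assumed interpretation), overall and
# restricted to games where the home team scored at least once
mean_goals             <- mean(total_goals)
mean_goals_home_scored <- mean(total_goals[soccer_toy$FTHG > 0])
```

Logical indexing with soccer_toy$FTHG > 0 is used for the subset here; subset() or dplyr::filter() would work equally well if you prefer them.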
3. Acknowledgements
Cite any resources or individuals that helped you.