Homework 5

Due: 2020-09-15, 11:59pm

For this homework you should submit a ZIP archive called firstnameLastnameHW5.zip. When unzipped, it should produce a single directory/folder called firstnameLastnameHW5, with all files inside that directory. The directory should contain:

  • A single document with the answers to all the following items in HTML or IPYNB format. Make sure you include plain English blocks in between the code and its output, to interpret what R is giving you. There should also be comments in the R code blocks prefixed with #.
  • The code file, in RMD format, used to generate the HTML file (not needed if using IPYNB).

In this homework, we will practice logical operations, handling missing data, and calculating data summaries. In this and future homeworks, we will be using datasets introduced in the example datasets notebook.

1. Emergency department triage data (50%)

Read in the Emergency department triage dataset and answer the following questions. You may want to read the background material to familiarize yourself with the context. Briefly, the dataset measured short-term (30-day) mortality in patients presenting in an emergency department. The objective was to examine whether routine blood tests can predict mortality.

  • How many variables does the dataset have, and what are the variable names? How many patients were studied?
  • Does the dataset have any missing data? If so, which variables have missing data, and how many values are missing for each of those variables?
  • How many patients have any missing data?
  • What proportion of patients died within 30 days (use the mean of mort30)?
  • What was the median CRP (C-reactive protein) level in this cohort (use the variable crp, taking care to remove any missing values)? How many patients were missing CRP levels?
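
As a reminder of the tools these questions call for, here is a minimal sketch on a made-up data frame. The object toy and its columns (id, died, lab, age) are hypothetical and are not the triage data; you will need to adapt the idea to the real variable names yourself.

```r
# Toy data frame for illustration only; names and values are invented.
toy <- data.frame(
  id   = 1:6,
  died = c(0, 1, 0, 0, 1, 0),
  lab  = c(2.5, NA, 10.1, 4.0, NA, 7.2),
  age  = c(34, 71, NA, 58, 80, 45)
)

dim(toy)    # number of rows (subjects) and columns (variables)
names(toy)  # variable names

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
n_missing <- colSums(is.na(toy))
n_missing[n_missing > 0]  # keep only the variables that have missing values

# complete.cases() is TRUE for rows with no NA, so negate it to count
# rows with at least one missing value
rows_with_na <- sum(!complete.cases(toy))

# The mean of a 0/1 variable is the proportion of 1s
prop_died <- mean(toy$died)

# median() returns NA if any value is missing unless na.rm = TRUE
med_lab <- median(toy$lab, na.rm = TRUE)
```

Note the difference between counting missing values per variable (colSums) and counting rows with any missing value (complete.cases); the two questions above ask for different things.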

2. Soccer data (50%)

In this dataset, the variables FTHG and FTAG refer to the number of goals scored at full time by the home and away teams, respectively. The variables HomeTeam and AwayTeam denote the home and away teams, respectively.

  • How many games were played in the league that year?
  • How many games did each team play (tabulate the HomeTeam and AwayTeam variables)?
  • Tabulate the FTHG and FTAG variables; in how many games did the home team fail to score?
  • How many games were scoreless (add the home and away team goals; tabulate)?
  • What was the mean number of goals scored per game in the league that year?
  • What was the mean number of goals scored in games where the home team scored (subset by home team scoring at least once; average)?
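
The tabulating and subsetting steps hinted at above can be sketched on a small invented schedule. The object toy_games and its columns (home, away, hg, ag) are hypothetical stand-ins, not the soccer dataset, and the sketch assumes that "mean goals" means total goals per game.

```r
# Toy schedule for illustration only; teams and scores are invented.
toy_games <- data.frame(
  home = c("A", "B", "C", "A", "B", "C"),
  away = c("B", "C", "A", "C", "A", "B"),
  hg   = c(2, 0, 1, 0, 3, 0),  # home goals
  ag   = c(1, 0, 1, 2, 0, 0),  # away goals
  stringsAsFactors = FALSE
)

n_games <- nrow(toy_games)  # one row per game

# Games per team: pool home and away appearances, then tabulate
games_per_team <- table(c(toy_games$home, toy_games$away))

# Tabulate home goals; a logical comparison counts the zero-goal games directly
table(toy_games$hg)
home_blank <- sum(toy_games$hg == 0)

# Total goals per game, then tabulate to see how many games had 0 goals
total_goals <- toy_games$hg + toy_games$ag
table(total_goals)
n_scoreless <- sum(total_goals == 0)

# Mean goals per game (assumption: home plus away goals, averaged over games)
mean_goals <- mean(total_goals)

# Subset with a logical vector: only games where the home team scored
home_scored <- toy_games$hg > 0
mean_goals_home_scored <- mean(total_goals[home_scored])
```

Summing a logical vector, as in sum(toy_games$hg == 0), works because R treats TRUE as 1 and FALSE as 0.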

3. Acknowledgements

Cite resources or individuals helping you.