Homework 6

Due: 2020-09-22, 11:59pm

For this homework you should submit a ZIP archive called firstnameLastnameHW6.zip. When unzipped there should be a single directory/folder called firstnameLastnameHW6 and all files should be within that directory. For this homework that directory should contain:

  • A single document with the answers to all the following items in HTML or IPYNB format. Make sure you include plain English blocks in between the code and its output, to interpret what R is giving you. There should also be comments in the R code blocks prefixed with #.
  • The RMD source file used to generate the HTML file (not needed if submitting IPYNB).

In this homework we will practice reading and manipulating data, calculating summary statistics, and making graphical displays.

1. Agren data (50%)

Read in the Agren data and make the following plots. All plots should be properly labeled. Tweak the defaults to make the plots effective. Comment on each plot and on what it adds (or doesn’t).

a. Histogram

Make a histogram of fitness in Italy in 2011.

b. Stem and leaf plot

Make a stem and leaf plot of fitness in Italy in 2011.
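
As a rough illustration of parts a and b, here is a minimal sketch in base R. The values are simulated, and the vector name fitness is a hypothetical stand-in, not a column from the actual Agren file.

```r
# Simulated stand-in for one fitness column (assumption: the real data differ)
set.seed(1)
fitness <- rgamma(200, shape = 4, rate = 0.5)

# Histogram with explicit labels and more bins than the default
h <- hist(fitness, breaks = 20,
          main = "Simulated fitness values",
          xlab = "Fitness", ylab = "Count",
          col = "grey80", border = "white")

# Stem-and-leaf display of the same values; scale = 2 stretches the display
stem(fitness, scale = 2)
```

Changing breaks (histogram) or scale (stem and leaf) is one way to "tweak the defaults" and see how the apparent shape of the distribution changes.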

c. Boxplot

Make a boxplot of fitness in Italy in 2011, separated by FLC genotype.

d. Dotplot

Make a dotplot of fitness in Italy in 2011, separated by FLC genotype.

e. Stripplot

Make a stripplot of fitness in Italy in 2011, separated by FLC genotype.
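
Parts c through e all compare a quantitative variable across groups. The sketch below uses base R on simulated data; the data frame sim, its columns, and the genotype labels "AA" and "BB" are assumptions made for illustration only.

```r
# Simulated two-group data (assumption: names and values are hypothetical)
set.seed(2)
sim <- data.frame(
  genotype = factor(rep(c("AA", "BB"), each = 50)),
  fitness  = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 12, sd = 2))
)
sim$fitness_rounded <- round(sim$fitness)

# Boxplot by group
b <- boxplot(fitness ~ genotype, data = sim,
             xlab = "Genotype", ylab = "Fitness",
             main = "Boxplot by genotype (simulated)")

# Dot plot: rounded values are stacked so ties pile up side by side
stripchart(fitness_rounded ~ genotype, data = sim,
           method = "stack", vertical = TRUE, pch = 16,
           xlab = "Genotype", ylab = "Fitness (rounded)",
           main = "Stacked dot plot (simulated)")

# Strip plot: raw values with horizontal jitter to reduce overplotting
stripchart(fitness ~ genotype, data = sim,
           method = "jitter", jitter = 0.15, vertical = TRUE, pch = 1,
           xlab = "Genotype", ylab = "Fitness",
           main = "Jittered strip plot (simulated)")
```

The lattice functions bwplot, dotplot, and stripplot are an alternative to these base R calls; either approach should be acceptable as long as the plots are labeled.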

f. Scatterplot

Make a scatterplot of fitness in Italy and Sweden in 2009.

g. Scatterplot for two groups

Make a scatterplot of fitness in Italy and Sweden in 2009, distinguishing points by FLC genotype. Take a log transformation, if helpful.
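
One possible way to distinguish two groups on a scatterplot, with both axes on a log scale, is sketched below. The data are simulated, and the names sim2, italy, sweden, and genotype are hypothetical placeholders.

```r
# Simulated positive, right-skewed data (assumption: not the Agren data)
set.seed(3)
n <- 100
sim2 <- data.frame(
  genotype = factor(sample(c("AA", "BB"), n, replace = TRUE)),
  italy    = rlnorm(n, meanlog = 2, sdlog = 0.5)
)
sim2$sweden <- sim2$italy * rlnorm(n, meanlog = 0, sdlog = 0.3)

# One color per genotype level, looked up by name
cols <- c(AA = "steelblue", BB = "darkorange")

# log = "xy" draws both axes on a log scale without changing the data
plot(sim2$italy, sim2$sweden, log = "xy",
     col = cols[as.character(sim2$genotype)], pch = 16,
     xlab = "Fitness in Italy (log scale)",
     ylab = "Fitness in Sweden (log scale)",
     main = "Simulated fitness by genotype")
legend("topleft", legend = names(cols), col = cols, pch = 16,
       title = "Genotype")
```

Log axes require strictly positive values, so zeros or missing values in the real data would need to be handled before plotting this way.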

2. Soccer data (20%)

Read in the soccer data.

a. Histogram

Make a histogram of the number of goals at full-time by the home team (FTHG).

b. Mosaic plot

  • Categorize the number of goals into three categories (0 goals, 1 goal, and 2 or more goals).

  • Make a mosaic plot of the number of full-time goals by the home team against the full-time goals by the away team.
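
The categorization and mosaic plot can be sketched as follows in base R. The goal counts are simulated from a Poisson distribution, and the helper goal_cat is a hypothetical name introduced for this example; in the real data the home goals are in FTHG.

```r
# Simulated goal counts (assumption: stand-ins for the soccer data)
set.seed(4)
home_goals <- rpois(300, lambda = 1.5)
away_goals <- rpois(300, lambda = 1.1)

# Hypothetical helper: bin counts into 0, 1, and 2 or more.
# With the default right-closed intervals the bins are
# (-Inf, 0], (0, 1], and (1, Inf).
goal_cat <- function(g) {
  cut(g, breaks = c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))
}

# Cross-tabulate the two categorized variables and plot the table
tab <- table(Home = goal_cat(home_goals), Away = goal_cat(away_goals))
mosaicplot(tab, color = TRUE,
           main = "Home vs. away goals (simulated)")
```

In a mosaic plot the area of each tile is proportional to the count in that cell of the table, so departures from independence show up as tiles that do not line up across columns.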

3. Emergency data (30%)

Read in the Emergency data (see the reading data notebook for how to do this).

  • Make a histogram of CRP (crp); then log transform it and make a histogram. Comment on the difference between the two histograms.
  • Make a scatterplot of albumin (alb) against creatinine (crea); make the scatterplot again after taking logarithms of both variables. Comment on the difference.
  • Make a violin plot of creatinine levels by mortality status (mort30). Repeat after taking logarithms and comment on the difference.
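
The effect of a log transformation on a skewed variable can be sketched as below. The data are simulated, and the names sim3, marker, and died are hypothetical stand-ins for a lab measurement and mort30. Base R has no violin plot function, so the violin plot here assumes the ggplot2 package is installed and is skipped otherwise.

```r
# Simulated right-skewed marker and a binary outcome (assumption: hypothetical)
set.seed(5)
sim3 <- data.frame(
  marker = rlnorm(300, meanlog = 3, sdlog = 1),
  died   = factor(sample(c("no", "yes"), 300, replace = TRUE,
                         prob = c(0.8, 0.2)))
)

# Side-by-side histograms on the raw and log scales
op <- par(mfrow = c(1, 2))
hist(sim3$marker, breaks = 30, main = "Raw scale", xlab = "Marker")
hist(log(sim3$marker), breaks = 30, main = "Log scale", xlab = "log(Marker)")
par(op)

# Violin plot by outcome with a log10 y-axis (only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim3, aes(x = died, y = marker)) +
    geom_violin() +
    scale_y_log10() +
    labs(x = "Died within 30 days", y = "Marker (log scale)")
  print(p)
}
```

Dropping scale_y_log10() gives the raw-scale version for comparison. Real measurements may contain zeros or missing values, which need attention before taking logarithms.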

Visual assessment of association with mortality (optional)

The main outcome measure here is mortality in 30 days (mort30). A key question is whether routine measures can predict mortality.

Make appropriate plots of every variable in the dataset against mortality (see the Readme file on the website for information on all the data fields). In general, if the variable is quantitative you will want to make a violin plot; you may need to transform the variable by a log transformation. If the variable is categorical, try a mosaic plot. If you choose to do something different, please justify the choice.

4. Acknowledgements

Cite any resources or individuals that helped you.