Homework 6
Due: 2020-09-22, 11:59pm
For this homework you should submit a ZIP archive called firstnameLastnameHW6.zip. When unzipped, there should be a single directory/folder called firstnameLastnameHW6, and all files should be within that directory. For this homework that directory should contain:
- A single document with the answers to all the following items in HTML or IPYNB format. Make sure you include plain English blocks in between the code and its output, to interpret what R is giving you. There should also be comments in the R code blocks prefixed with #.
- Code file used to generate the HTML file in RMD format (not needed if using IPYNB).
In this homework we will practice reading and manipulating data, calculating summary statistics, and making graphical displays.
1. Agren data (50%)
Read in the Agren data and make the following plots. All plots should be properly labeled. Tweak the defaults to make the plots effective. Comment on each plot and on what it adds (or doesn’t).
a. Histogram
Make a histogram of fitness in Italy in 2011.
b. Stem and leaf plot
Make a stem and leaf plot of fitness in Italy in 2011.
c. Boxplot
Make a boxplot of fitness in Italy in 2011, separated by FLC genotype.
d. Dotplot
Make a dotplot of fitness in Italy in 2011, separated by FLC genotype.
e. Stripplot
Make a stripplot of fitness in Italy in 2011, separated by FLC genotype.
f. Scatterplot
Make a scatterplot of fitness in Italy and Sweden in 2009.
g. Scatterplot for two groups
Make a scatterplot of fitness in Italy and Sweden in 2009, distinguishing points by FLC genotype. Take a log transformation, if helpful.
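As a rough illustration of how points can be distinguished by group in base R, here is a minimal sketch on simulated data. The data frame agren, its column names (flc, italy09, sweden09), and the genotype labels AA and BB are all hypothetical stand-ins; the real Agren data will have its own names and values.

```r
# Simulated stand-in for the Agren data; all column names and labels are hypothetical.
set.seed(3)
n <- 100
agren <- data.frame(
  flc = factor(sample(c("AA", "BB"), n, replace = TRUE), levels = c("AA", "BB")),
  italy09 = rlnorm(n, meanlog = 2, sdlog = 0.5)
)
agren$sweden09 <- agren$italy09 * rlnorm(n, meanlog = 0, sdlog = 0.3)

# One colour and one plotting symbol per genotype level.
group_col <- c("steelblue", "tomato")[as.integer(agren$flc)]
group_pch <- c(1, 17)[as.integer(agren$flc)]

# Scatterplot on the log scale, with a legend identifying the groups.
plot(log(agren$italy09), log(agren$sweden09),
     col = group_col, pch = group_pch,
     xlab = "log fitness, Italy 2009", ylab = "log fitness, Sweden 2009",
     main = "Fitness in Italy and Sweden, 2009, by FLC genotype")
legend("topleft", legend = levels(agren$flc),
       col = c("steelblue", "tomato"), pch = c(1, 17), title = "FLC")
```

Indexing a vector of colours by as.integer() of the factor is one common idiom; whether the log scale helps depends on how skewed the real fitness values are.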
2. Soccer data (20%)
Read in the soccer data.
a. Histogram
Make a histogram of the number of goals at full-time by the home team (FTHG).
b. Mosaic plot
- Categorize the number of goals into three categories (0 goals, 1 goal, and 2 or more goals).
- Make a mosaic plot of the number of full-time goals by the home team against the full-time goals by the away team.
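The categorization step can be sketched with cut() followed by table() and mosaicplot(). The example below uses simulated Poisson counts rather than the real soccer data. The away-goals column name FTAG and the helper goal_category() are assumptions made for illustration.

```r
# Simulated stand-in for the soccer data; the real file is not read here.
set.seed(1)
soccer <- data.frame(
  FTHG = rpois(200, lambda = 1.5),  # full-time home goals
  FTAG = rpois(200, lambda = 1.1)   # full-time away goals (column name assumed)
)

# Hypothetical helper: bin a goal count into 0, 1, or 2 or more.
goal_category <- function(goals) {
  cut(goals, breaks = c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))
}

soccer$home_cat <- goal_category(soccer$FTHG)
soccer$away_cat <- goal_category(soccer$FTAG)

# Cross-tabulate the two categorized variables and draw the mosaic plot.
goal_table <- table(Home = soccer$home_cat, Away = soccer$away_cat)
mosaicplot(goal_table, main = "Full-time goals: home vs. away", color = TRUE)
```

With the default right-closed intervals, the breaks place 0 in the first bin, 1 in the second, and everything larger in the third.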
3. Emergency data (30%)
Read in the Emergency data (use the reading data notebook).
- Make a histogram of CRP (crp); then log transform it and make a histogram. Comment on the difference between the two histograms.
- Make a scatterplot of albumin (alb) against creatinine (crea); make the scatterplot again after taking logarithms of both variables. Comment on the difference.
- Make a violin plot of creatinine levels by mortality status (mort30). Repeat after taking logarithms and comment on the difference.
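To show the kind of before-and-after comparison these items ask for, here is a minimal sketch that draws a raw and a log-scale histogram side by side. The values are simulated from a log-normal distribution as a stand-in for crp; they are not the Emergency data, and the real variable may behave differently (for example, zeros would need handling before taking logs).

```r
# Simulated right-skewed values standing in for crp; not the real Emergency data.
set.seed(2)
crp <- rlnorm(300, meanlog = 3, sdlog = 1)

log_crp <- log(crp)  # natural log; requires strictly positive values

# Draw the raw and log-scale histograms side by side.
old_par <- par(mfrow = c(1, 2))
hist(crp, main = "CRP (raw)", xlab = "crp")
hist(log_crp, main = "CRP (log scale)", xlab = "log(crp)")
par(old_par)  # restore the previous plotting layout
```

The same pattern (transform, then plot both versions with matching labels) carries over to the scatterplot and violin plot items.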
Visual assessment of association with mortality (optional)
The main outcome measure here is mortality in 30 days (mort30). A key question is whether routine measures can predict mortality.
Make appropriate plots of every variable in the dataset against mortality (see the Readme file on the website for information on all the data fields). In general, if the variable is quantitative you will want to make a violin plot; you may need to transform the variable by a log transformation. If the variable is categorical, try a mosaic plot. If you choose to do something different, please justify the choice.
4. Acknowledgements
Cite any resources or individuals that helped you.