Homework 10

Due: 2019-11-10, 11:59pm

For this homework you should submit a ZIP archive called firstnameLastnameHW10.zip. When unzipped, it should produce a single directory/folder called firstnameLastnameHW10, with all files inside that directory. For this homework that directory should contain:

  • A single document with the answers to all the following items in HTML or IPYNB format. Make sure you include plain English blocks in between the code and its output, to interpret what R is giving you. There should also be comments in the R code blocks prefixed with #.
  • The code file, in RMD format, used to generate the HTML file (not needed if using IPYNB).

The goal of this homework is to analyze a dataset you have not seen before, and to practice some linear regression. This homework has less precise instructions, and you may have to read the supporting material cited below to understand the data better.

1. Frog abnormalities in Alaska (100%)

We will use the frog abnormalities dataset collected by Reeves et al. (2010). Specifically, we will use a subset of their data, which gives details of over 9000 frogs sampled in Alaska. The researchers' objective was to study potential causes of amphibian abnormalities. To load the file, use the following.

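# URL of the data file on Dryad
frogURL <- "https://datadryad.org/stash/downloads/file_stream/98621"
# Temporary file to hold the download; fileext needs the leading dot
tmpfile <- tempfile(fileext = ".csv")
download.file(frogURL, tmpfile)
# Read the CSV into a data frame
frog <- read.csv(tmpfile)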

More details about the data are in the README.

We will focus on just four variables: the snout to vent length (SVL), the Gosner (developmental) stage of the animal (GOSNER_STAGE), whether or not the frog had an abnormality (ABNORMAL), and whether or not the animal showed signs of a protozoan infection (Perkensus).
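
Before starting, it may help to confirm that the four columns are present after loading. The following is only a minimal sketch: check_columns is a hypothetical helper, the toy data frame is a made-up stand-in for frog, and the column names are taken as written in this assignment (the real file may spell or code them differently).

```r
# Hypothetical helper: return the expected column names that are absent from df.
check_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Column names as written in this assignment (assumed to match the file).
vars <- c("SVL", "GOSNER_STAGE", "ABNORMAL", "Perkensus")

# Toy stand-in for `frog`; the values are invented for illustration only.
toy <- data.frame(SVL = c(20.1, 35.4), GOSNER_STAGE = c(42, 46),
                  ABNORMAL = c("No", "Yes"), Perkensus = c("No", "No"))

missing_cols <- check_columns(toy, vars)  # character(0) means all four are present
str(toy[, vars])                          # type and first values of each variable
```

With the real data, the same call would be check_columns(frog, vars).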

Our main goal is to examine if the snout to vent length is associated with abnormalities, accounting for the Gosner developmental stage and possible protozoan infection.

  • Perform a descriptive analysis of the four variables of interest. Calculate appropriate summary statistics for each variable.
  • Are there any missing data? If so, how many, and for which variables?
  • Use graphical summaries to examine the association of the snout to vent length with the other variables. Which variable appears to show the strongest association?
  • Use linear regression to examine the association between snout to vent length and Gosner stage. What is the meaning of the slope and intercept of the fitted line?
  • How much of the variation in snout to vent length is explained by variation in Gosner stage?
  • Use a t-test to compare snout to vent length in animals with and without an abnormality.
  • Build a linear regression model for snout to vent length and whether or not the animal was abnormal adjusting for Gosner stage and protozoan infection status. Interpret the output of the regression model.
  • Check model fit using a residual vs fitted plot. Does it look satisfactory?
  • Compare the result of the regression model with that of the t-test. Do they lead to the same or different conclusions?
  • What is your overall conclusion regarding the relationship between snout to vent length and frog abnormalities?
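
The R functions typically used for the steps above can be sketched on simulated data. This is only an illustration under stated assumptions: the data frame sim is randomly generated and is not the Reeves et al. data, the "No"/"Yes" coding and the value ranges are invented, and the real columns may need recoding before these calls apply.

```r
set.seed(1)  # reproducible simulated data; NOT the Reeves et al. data

# Simulated stand-in for `frog`; value ranges and coding are assumptions.
n <- 200
sim <- data.frame(
  GOSNER_STAGE = sample(40:46, n, replace = TRUE),
  ABNORMAL     = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Perkensus    = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
sim$SVL <- 10 + 0.5 * sim$GOSNER_STAGE + rnorm(n)
sim$SVL[c(3, 17)] <- NA  # introduce two missing values on purpose

# Descriptive summaries and the number of missing values per variable
summary(sim)
n_missing <- colSums(is.na(sim))

# Graphical summaries: scatterplot for a numeric predictor, boxplot for a factor
plot(SVL ~ GOSNER_STAGE, data = sim)
boxplot(SVL ~ ABNORMAL, data = sim)

# Simple linear regression; R-squared is the share of SVL variance explained
fit1 <- lm(SVL ~ GOSNER_STAGE, data = sim)
r2 <- summary(fit1)$r.squared

# Two-sample t-test of SVL by abnormality status (Welch by default)
tt <- t.test(SVL ~ ABNORMAL, data = sim)

# Multiple regression adjusting for Gosner stage and infection status
fit2 <- lm(SVL ~ ABNORMAL + GOSNER_STAGE + Perkensus, data = sim)
summary(fit2)

# Residuals vs fitted values (the first of the standard lm diagnostic plots)
plot(fit2, which = 1)
```

Note that the t-test compares unadjusted group means, whereas the ABNORMAL coefficient in fit2 is the difference in mean SVL between the groups with Gosner stage and infection status held fixed, so the two need not agree.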

2. Acknowledgements

Cite any resources or individuals that helped you.