Homework 5
Due: 2025-03-11, 11:59pm
Submission instructions
For this homework you should submit a ZIP archive called BIOE806-HW03-FirstnameLastname.zip of a folder named BIOE806-HW03-FirstnameLastname. (Make the appropriate substitutions for your name.) The folder should contain:
- A single document with the answers to all the following items, in HTML format only. Make sure you include plain-English text between the code blocks and their output to interpret what R is giving you.
- Code file used to generate the answers (RMD format). There should be comments in the code blocks.
- Please remember to mix your comments with code and output.
- Do not forget acknowledgements.
In this homework we will work on generalized linear models.
1. Link functions (50%)
Use the frog abnormalities data for this part. The ABNORMAL variable denotes whether or not a frog had an abnormality; the Perkensus variable marks with a 1 the animals diagnosed with a protozoan pathogen; and the SVL variable denotes animal length not including the tail.
- Fit a logistic regression model for the association between frog abnormalities and the protozoan infection. Calculate the odds ratio of association and its 95% confidence interval.
- Compare the odds ratio calculated using logistic regression to that calculated directly by tabulating the two variables.
- Instead of using the logit link function, use the log link function to assess the association between abnormalities and infection using the glm function. What does the coefficient corresponding to Perkensus represent?
- Fit a logistic and a probit regression model for abnormalities using the animal length as the predictor. Calculate the predicted (fitted) probabilities for each animal under both models using the predict function. Make a scatterplot comparing the predictions obtained from the probit and logistic regressions, and comment.
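To make the role of the link function concrete, here is a minimal sketch on made-up data. It does not use the frog data set: the data frames toy and sim, their column names, and all counts and coefficients are assumptions chosen for illustration. With a single binary predictor, the exponentiated logit-link slope should match the cross-product ratio of the 2x2 table, while the exponentiated log-link slope is a ratio of probabilities rather than of odds.

```r
# Hypothetical toy data built from a fixed 2x2 table (NOT the frog data set).
# Counts: infected/abnormal = 30, infected/normal = 70,
#         uninfected/abnormal = 20, uninfected/normal = 180.
toy <- data.frame(
  infected = rep(c(1, 1, 0, 0), times = c(30, 70, 20, 180)),
  abnormal = rep(c(1, 0, 1, 0), times = c(30, 70, 20, 180))
)

# Logit link: the exponentiated slope is an odds ratio.
fit_logit <- glm(abnormal ~ infected, family = binomial(link = "logit"), data = toy)
or_glm <- unname(exp(coef(fit_logit)["infected"]))

# Wald 95% interval, computed on the log-odds scale and then exponentiated.
ci_wald <- exp(confint.default(fit_logit)["infected", ])

# The same odds ratio from the cross-product of the 2x2 table.
tab <- table(toy$infected, toy$abnormal)
or_tab <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

# Log link: the exponentiated slope is a ratio of probabilities (risk ratio).
fit_log <- glm(abnormal ~ infected, family = binomial(link = "log"), data = toy)
rr_glm <- unname(exp(coef(fit_log)["infected"]))
rr_tab <- (30 / 100) / (20 / 200)  # risk among infected / risk among uninfected

# Logit vs probit with a continuous predictor (simulated, assumed coefficients).
set.seed(1)
sim <- data.frame(len = runif(300, 20, 80))
sim$y <- rbinom(300, 1, plogis(-3 + 0.05 * sim$len))
fit_l <- glm(y ~ len, family = binomial(link = "logit"), data = sim)
fit_p <- glm(y ~ len, family = binomial(link = "probit"), data = sim)

# type = "response" returns fitted probabilities instead of the linear predictor.
p_logit <- predict(fit_l, type = "response")
p_probit <- predict(fit_p, type = "response")
```

In this toy table the odds ratio is (30 x 180) / (70 x 20), about 3.86, and the risk ratio is 3. Something like plot(p_logit, p_probit); abline(0, 1) would draw the comparison of the two sets of fitted probabilities. The log-link binomial fit can fail to converge on other data, in which case starting values may be needed.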
2. Contingency tables (50%)
Use the emergency visit data for this part. We will see how to test the association between a binary variable and a quantitative variable (age) using log-linear models of summarized counts. We will do this by dividing the range of age into decades and making a categorical variable out of that. Then we can use the methods of contingency tables.
- Create a categorical variable by splitting age into decades of age; the highest age group should be 90+ (put the centenarians in this category). Plot the average 30-day mortality by sex and decade of age.
- Use logistic regression to assess whether the association between 30-day mortality and age (coded as a categorical variable) depends on sex.
- Tabulate 30-day mortality, sex and age (as categorical variable by decade). Use log linear models on this contingency table to answer the same question above.
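The correspondence between the logistic and log-linear approaches can be sketched on simulated data. Nothing here comes from the emergency visit data: the data frame sim, its columns (age, sex, died, age_grp), the sample size, and the coefficients are all assumptions for illustration. The sketch bins age by decade with cut, tests the sex-by-age interaction in a logistic model, and then tests the three-way interaction in a Poisson log-linear model of the cell counts.

```r
# Simulated stand-in for the real data (all names and values are hypothetical).
set.seed(806)
n <- 5000
sim <- data.frame(
  age = sample(0:104, n, replace = TRUE),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)
sim$died <- rbinom(n, 1, plogis(-1.5 + 0.02 * sim$age + 0.3 * (sim$sex == "M")))

# Decade bins [0,10), [10,20), ..., [80,90), and an open-ended 90+ bin.
sim$age_grp <- cut(
  sim$age,
  breaks = c(seq(0, 90, by = 10), Inf),
  right = FALSE,
  labels = c(paste0(seq(0, 80, by = 10), "-", seq(9, 89, by = 10)), "90+")
)

# Average mortality by sex and decade (the quantity one would plot).
mort <- aggregate(died ~ sex + age_grp, data = sim, FUN = mean)

# Logistic regression: does the age effect depend on sex?
fit_add <- glm(died ~ sex + age_grp, family = binomial, data = sim)
fit_int <- glm(died ~ sex * age_grp, family = binomial, data = sim)
lr_logistic <- deviance(fit_add) - deviance(fit_int)

# Log-linear models on the 2 x 2 x 10 table of counts.
counts <- as.data.frame(table(died = sim$died, sex = sim$sex, age_grp = sim$age_grp))
fit_no3 <- glm(Freq ~ (died + sex + age_grp)^2, family = poisson, data = counts)
fit_sat <- glm(Freq ~ died * sex * age_grp, family = poisson, data = counts)
lr_loglin <- deviance(fit_no3) - deviance(fit_sat)

# Both likelihood-ratio statistics have (2 - 1) * (10 - 1) = 9 degrees of freedom.
p_value <- pchisq(lr_loglin, df = 9, lower.tail = FALSE)
```

The two likelihood-ratio statistics should agree up to numerical tolerance, because dropping the three-way term in the log-linear model corresponds to dropping the sex-by-age interaction in the logistic model. Empty cells in the table can make the saturated Poisson fit unstable, which is one reason the simulated ages here cover every decade.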
3. Acknowledgements
Cite resources or individuals helping you.