Homework 2
Due: 2024-01-30, 11:59pm
Submission instructions
For this homework you should submit a ZIP archive containing:
- A single document, in HTML format only, with the answers to all of the following items. Between the code and its output, include plain-English text interpreting what R is giving you.
- The code file used to generate the answers (RMD format), with comments in the code blocks.
- A Jupyter notebook (IPYNB) is also acceptable in place of the RMD file.
- Please remember to mix your comments with code and output.
- Do not forget acknowledgements.
In this homework we will practice working with factors.
1. Creating a factor from a numeric variable (100%)
We will use data from the model plant Arabidopsis for this exercise. The data and associated information are here: https://github.com/sens/smalldata/tree/master/arabidopsis
The CSV file can be accessed directly here: https://raw.githubusercontent.com/sens/smalldata/master/arabidopsis/agren2013.csv
Creating factors from numeric variables can be helpful when you expect non-linear effects. Although that is not the case for these data, we use them as an example.
- Read in the data; report the types of all variables.
- Make histograms of the fitness measurements (6 in all). If they are positively skewed, apply a log2 transformation and plot the histograms again.
- Calculate the quintiles of (log2) fitness in Sweden in 2009 using the quantile command. Quintiles are the 20th, 40th, 60th and 80th percentiles.
- Use the cut command to make a factor variable from the 2009 Sweden fitness. Label the categories q1 through q5; there should be five categories. Use the table command to tabulate this variable.
- Perform a linear regression of (log2) fitness in Sweden in 2010 on the quintile categories of fitness in 2009. Is the association positive or negative?
- Using the relevel command, make the middle quintile (q3) the reference category. In many situations, it is more natural to have the middle category as the reference. Repeat the linear regression, and explain the new regression coefficients using the previous regression output.
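The quantile, cut, table, lm and relevel steps above can be sketched in R as follows. This is a minimal sketch on simulated data, not the agren2013 file: fit2009 and fit2010 are hypothetical stand-ins for the Sweden 2009 and 2010 fitness columns (look up the real column names after reading the CSV), and the simulation settings are arbitrary assumptions.

```r
# Reading the real data would look like this (not run here):
# dat <- read.csv("https://raw.githubusercontent.com/sens/smalldata/master/arabidopsis/agren2013.csv")
# str(dat)   # shows the type of every variable

# HYPOTHETICAL simulated stand-ins for the 2009 and 2010 Sweden fitness values
set.seed(1)
n <- 200
fit2009 <- rlnorm(n, meanlog = 0, sdlog = 1)              # positively skewed
fit2010 <- fit2009 * rlnorm(n, meanlog = 0, sdlog = 0.5)  # related to 2009

# log2 transform the skewed measurements
lfit2009 <- log2(fit2009)
lfit2010 <- log2(fit2010)

# Quintile cut points: the 20th, 40th, 60th and 80th percentiles
# (add na.rm = TRUE if the real column has missing values)
qs <- quantile(lfit2009, probs = c(0.2, 0.4, 0.6, 0.8))

# cut() also needs the outer limits; -Inf and Inf make every value
# fall into exactly one of the five intervals
q2009 <- cut(lfit2009, breaks = c(-Inf, qs, Inf), labels = paste0("q", 1:5))
print(table(q2009))

# Regression with q1 (the first level) as the reference category
fit_q1 <- lm(lfit2010 ~ q2009)
print(coef(fit_q1))

# Make q3 the reference category and refit
q2009_mid <- relevel(q2009, ref = "q3")
fit_q3 <- lm(lfit2010 ~ q2009_mid)
print(coef(fit_q3))
```

Comparing coef(fit_q1) with coef(fit_q3) is one way to see how the coefficients shift when only the reference category changes; the fitted group means are the same in both models.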
2. Acknowledgements
Cite any resources or individuals that helped you.