Homework 2

Due: 2024-01-30, 11:59pm

Submission instructions

For this homework you should submit a ZIP archive containing:

  • A single HTML document with your answers to all of the following items. Include plain-English explanations between the code and its output that interpret what R is giving you.
  • The code file used to generate the answers (R Markdown, .Rmd). Include comments in the code blocks.

  • A Jupyter notebook (.ipynb) is also acceptable.
  • Remember to interleave your comments with the code and output.
  • Do not forget the acknowledgements.

In this homework we will practice working with factors.

1. Creating a factor from a numeric variable (100%)

We will use data from the model plant Arabidopsis for this exercise. The data and associated information are available here: https://github.com/sens/smalldata/tree/master/arabidopsis

The CSV file can be accessed directly here: https://raw.githubusercontent.com/sens/smalldata/master/arabidopsis/agren2013.csv

Creating factors from numeric variables can be helpful when you expect non-linear effects. Although that is not the case for these data, we use them here as an example.

  • Read in the data; report the types of all variables.
  • Make histograms of the fitness measurements (6 in all). If they are positively skewed, try a log2 transformation and plot the histograms again.
  • Calculate the quintiles of (log2) fitness in Sweden in 2009 using the quantile command. The quintile cut points are the 20th, 40th, 60th, and 80th percentiles.
  • Use the cut command to make a factor variable from the 2009 Sweden fitness, with five categories labeled q1 through q5. Use the table command to tabulate this variable.
  • Perform a linear regression of (log2) fitness in Sweden in 2010 on the quintile categories of fitness in 2009. Is the association positive or negative?
  • Using the relevel command, make the middle quintile (q3) the reference category. In many situations, it is more natural to have the middle category as the reference. Repeat the linear regression, and explain the new regression coefficients using the previous regression output.
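The quantile, cut, table, lm, and relevel steps above can be sketched in R as follows. This is a minimal sketch on simulated data, not a solution: fit09 and fit10 are hypothetical names standing in for log2 fitness in Sweden in 2009 and 2010, and you should look up the actual column names in agren2013.csv with names() or str(). The commented read.csv() line shows that the file can be read directly from its URL.

```r
# Sketch on SIMULATED data; fit09 and fit10 are hypothetical stand-ins
# for log2 fitness in Sweden in 2009 and 2010. With the real file you
# would start from something like:
#   dat <- read.csv("https://raw.githubusercontent.com/sens/smalldata/master/arabidopsis/agren2013.csv")
#   sapply(dat, class)   # type of each variable
set.seed(1)
n <- 200
fit09 <- rnorm(n)                    # simulated 2009 values
fit10 <- 0.5 * fit09 + rnorm(n)      # simulated 2010 values, positively related

# Quintile cut points: the 20th, 40th, 60th and 80th percentiles
qs <- quantile(fit09, probs = c(0.2, 0.4, 0.6, 0.8))

# -Inf and Inf as outer breaks so the minimum and maximum are not dropped;
# cut() uses right-closed intervals by default, e.g. (q20, q40]
fit09_q <- cut(fit09, breaks = c(-Inf, qs, Inf),
               labels = paste0("q", 1:5))
table(fit09_q)

# Regression on the factor; q1 (the first level) is the reference
m1 <- lm(fit10 ~ fit09_q)
coef(m1)

# Make q3 the reference level and refit
fit09_q3 <- relevel(fit09_q, ref = "q3")
m2 <- lm(fit10 ~ fit09_q3)
coef(m2)

# The new intercept is the q3 group mean, i.e. the old intercept
# plus the old q3 coefficient
coef(m1)["(Intercept)"] + coef(m1)["fit09_qq3"]
```

Because both models estimate the same five group means, every coefficient in the releveled model is a difference of coefficients from the first one (for example, the new q1 coefficient is minus the old q3 coefficient). Two practical caveats for the real data: quantile() stops with an error on missing values unless you pass na.rm = TRUE, and cut() stops with an error if tied values make the breaks non-unique.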

2. Acknowledgements

Cite any resources or individuals that helped you.