Homework 4

Due: 2020-09-08, 11:59pm

In this homework you will practice organizing your work. Use the same organization for all future homework and projects.

1. Organize your folders and files (25%)

For this homework you should submit a ZIP archive called firstnameLastnameHW4.zip whose directory/folder structure is as follows:

firstnameLastnameHW4
     primaryData
         README
         myrecipe.md
         E0.csv
     processedData
         myrecipe.csv
     analysis
         HW4.Rmd
         HW4.html
         HW4.ipynb
  • All files should be inside the directories.
  • Make sure you only use relative path names in your code.
  • Put a README file in the primary data directory that describes where the data came from and what they are.
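The layout above could be set up from R with something like the following. This is only a minimal sketch: the root folder name is the placeholder from the assignment, and the variable names (root, subdirs, epl_path) are made up for illustration. It also shows what a relative path looks like when your notebook lives in the analysis folder.

```r
# Placeholder root name from the assignment; replace with your own name.
root <- "firstnameLastnameHW4"
subdirs <- c("primaryData", "processedData", "analysis")

# Create each subdirectory under the root (no error if it already exists).
for (d in subdirs) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# A relative path, as seen from a notebook saved in the analysis folder:
# go up one level, then down into primaryData.
epl_path <- file.path("..", "primaryData", "E0.csv")
print(epl_path)
```

A relative path like this keeps working when the whole folder is zipped and unpacked on another computer, whereas an absolute path such as one starting with C:/Users or /home would not.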

Your homework file will be in the analysis directory and can be in either RMarkdown (RMD/HTML) or Jupyter notebook (IPYNB) format. Each item below should be a separate section in the homework file.

2. Read in your recipe (25%)

Repeat the process of reading in your recipe in R, and writing out a CSV file of the ingredients, amounts, and units. This time, organize it properly, paying attention to the following:

  • The recipe file should be in the primaryData folder. Add information to the README file in the primaryData folder describing the recipe file, including how you obtained it.
  • The output CSV file should be in the processedData folder.
  • The homework files should be in the analysis folder.
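One possible shape for this step is sketched below. It is not the required solution: it assumes, purely for illustration, that each ingredient line in myrecipe.md looks like "- 2 cups flour" (a dash, an amount, a unit, then the ingredient). Your recipe file may be laid out differently, so adapt the pattern. The function names parse_ingredients and process_recipe are hypothetical.

```r
# ASSUMPTION: ingredient lines look like "- <amount> <unit> <ingredient>",
# e.g. "- 2 cups flour". Lines that do not match are ignored.
parse_ingredients <- function(lines) {
  pattern <- "^-[[:space:]]+([0-9./]+)[[:space:]]+([^[:space:]]+)[[:space:]]+(.+)$"
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 4]  # full match plus three captured groups
  data.frame(
    ingredient = vapply(hits, `[`, character(1), 4),
    amount     = vapply(hits, `[`, character(1), 2),
    unit       = vapply(hits, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# Read the recipe from in_path and write the ingredient table to out_path.
process_recipe <- function(in_path, out_path) {
  ingredients <- parse_ingredients(readLines(in_path))
  write.csv(ingredients, out_path, row.names = FALSE)
  invisible(ingredients)
}

# From a notebook in the analysis folder, using relative paths only:
# process_recipe(file.path("..", "primaryData", "myrecipe.md"),
#                file.path("..", "processedData", "myrecipe.csv"))
```

Amounts are kept as text here so that fractions such as 1/2 survive unchanged. Convert them to numbers only if your recipe uses plain decimals.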

3. Read in English Premier League data (25%)

You will use the small datasets archive on Blackboard for this work. Go to the soccer directory (folder) that contains data for all games in the English Premier League for one season.

  • Copy the file E0.csv into the primaryData folder. Update the README file to indicate the source of the data.
  • Read in the data using read.csv, and print out the first 10 lines using head.
  • How many games were played in that season (use nrow)?
  • Which season are the data for (look at the original README)?
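The reading steps above might look roughly like this. It is a sketch, assuming your notebook is in the analysis folder and E0.csv has already been copied into primaryData. The helper name read_games is made up for illustration, and you could equally run the three calls directly in a notebook cell.

```r
# Read a CSV of games, print its first 10 rows, and return the number of rows.
read_games <- function(path) {
  games <- read.csv(path)
  print(head(games, 10))  # head() shows only 6 rows unless n is given
  nrow(games)             # one row per game
}

# From a notebook in the analysis folder (relative path):
# n_games <- read_games(file.path("..", "primaryData", "E0.csv"))
# n_games
```

Note that read.csv treats the first line of the file as column names by default, so that line is not counted by nrow.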

4. Project (25%)

In this item you will be developing your project idea a bit further than last time. You should have the answers to the following items as subsections (you may reuse some of this content when you write the formal report).

  • Background: Give a one-paragraph scientific background of the project.
  • Research question: State the research question for this specific project. For example, “We want to examine whether FLC genotype and fitness in previous years are predictive of future fitness of these recombinant inbred lines.” It should suggest a specific direction for your analysis.
  • Data: Briefly describe the origin and content of the data you are proposing to use. State any issues with obtaining and sharing the data. What is the expected format of the data (CSV files, Excel files, database, etc.)? How large is the dataset (number of files, variables, and observation units)?

If you do not have a project idea, and need assistance, please say so. The ideal project is something that is closely related to your research. You can use the class to make research progress.

It is okay for your project idea to drift a bit as you refine it.

5. Acknowledgements

Cite any resources or individuals that helped you.