Homework 3

Due: 2022-09-04, 11:59pm

All homework must be submitted via Blackboard. Your answers must be in MS Word (DOCX) or PDF format. Your submitted document should have sections corresponding to those in this homework.

Please make sure that you have watched the videos and have done the readings.

Except as noted, you should include the Stata output to support your answers. The results should be in fixed width (typewriter) font.

Include graphs as images in your document. Use the lecture notes as a guide.

1. Anderson data exploration (20%)

The purpose of this section is to ensure that you can replicate what was done in the lecture.

  • Retrace the steps taken in the lecture using the Anderson data. Are you able to replicate what was done? Did you observe any discrepancies, errors, or anything noteworthy you would like to share? Do not include any output unless you wish to support a comment.
  • Tabulate the flu test positivity rate by month of visit. What month has the highest positivity rate? (Include supporting output.)
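
As a hint for the tabulation above, here is a minimal sketch on made-up toy data. The variable names month and flupos are hypothetical and will not match the names in the Anderson data, so substitute the names you see in describe. It shows two ways to get a rate by group: row percentages in a two-way table, and the mean of a 0/1 variable.

```stata
* Toy data only: month and flupos are hypothetical variable names.
clear
input byte month byte flupos
1 1
1 1
1 0
2 1
2 0
2 0
3 0
3 0
end

* Two-way table with row percentages: the percentage in the
* flupos == 1 column is the positivity rate for that month.
tabulate month flupos, row

* Alternative: the mean of a 0/1 variable is the proportion positive.
tabulate month, summarize(flupos)
```

Either form is acceptable as supporting output, as long as the month with the highest rate can be read off directly.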

2. Bos-Touwen data exploration (60%)

Read the introduction to the dataset in the README file (also in Chapter 3 of BCB). Read the abstract of the accompanying paper.

The dataset is in SAV format (which is used by SPSS). Import the data into Stata using the import command (or the point-and-click interface). (If you already have a dataset imported, you will have to use the command clear to clear the previously imported dataset from memory.)
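
A minimal sketch of the import step, assuming Stata 16 or later (where import spss is available). The file name bostouwen.sav is a placeholder, not the actual name of the file; use the path where you saved the dataset.

```stata
* "bostouwen.sav" is a placeholder file name; replace it with your own path.
* The clear option drops any dataset already in memory before importing.
import spss using "bostouwen.sav", clear

* Quick check that the import worked: number of observations and variables.
describe, short
```

The clear option on import does the same job as running clear first, so you only need one of the two.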

The rest of this exercise is similar to, but slightly different from, what was done with the Anderson data. The idea is to see if you can apply the ideas from the lecture to a new dataset. Make sure to include supporting output and graphs.

  • Drop the questionnaire variables using drop PAM* HAD* SF* IP* SSU*. How many variables and observations do you have?
  • Use describe to get an overview of the variables. Are there any duplicates in the data?
  • Use misstable to get an idea of the missing data patterns. What do you see?
  • What fraction of the subjects never smoked? Break this down by gender. Use tabulate.
  • Make a histogram of body length, adjusting the defaults so that the bins are 5 cm wide, starting at 135 cm.
  • Make a dotplot of body length by gender, so that we can compare the distributions for men and women. Comment.
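
The commands named in the steps above can be tried out on a small toy dataset first. This is only a sketch: the variable names female, smoke, and length and all values are invented for illustration and are not the names or codings in the Bos-Touwen data. The drop step is left out because the toy data has no questionnaire variables.

```stata
* Toy data only: variable names and values are hypothetical.
clear
input byte female byte smoke double length
0 0 178
0 1 182.5
1 0 165
1 0 .
1 2 160
0 2 175
end

* Count observations that are exact copies across all variables.
duplicates report

* Missing values: counts per variable, then the patterns across variables.
misstable summarize
misstable patterns

* Smoking status by gender, with percentages within each gender.
tabulate smoke female, column

* Histogram with 5-unit bins whose first bin starts at 135.
histogram length, width(5) start(135) frequency

* Dotplot of length, one column of dots per gender.
dotplot length, over(female)
```

Note that start() must be no larger than the smallest observed value, so check the range with summarize before setting it on the real data.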

3. Post on “Study habits” discussion forum (20%)

How are you using the study materials from this class (e.g., videos vs. lecture notes vs. textbook)? What other resources are you finding helpful?
Please share with the class what you feel comfortable sharing.

If you don’t want to share, state so in your post.

In the homework file, please enter the date of your posting.

4. Acknowledgements

Please acknowledge individuals who helped you or resources that were helpful in completing the homework.