Midterm 1

Due: 2022-09-18, 11:59pm

The purpose of the midterm is to review and test the topics we have covered in the class so far. In this test you will organize the coin spin data that everyone posted in the Blackboard discussion and then do a basic analysis. You will also do an exploratory analysis of a new dataset you have not seen before.

This work must be submitted via Blackboard. The answers must be in MS Word (DOCX) or PDF format. Your submitted document should have sections corresponding to those in this exam.

Please make sure that you have watched the videos and have done the readings.

You should include the Stata output/figures to support your answers. The Stata output should be in fixed width (typewriter) font.

Include graphs as images in your document. Use the lecture notes as a guide.

1. Organize and explore coin spin data (30%)

Create a dataset from the student postings in the class discussion forum on the coin spins. Then, explore the data following the instructions that follow.

  • Create a spreadsheet file (CSV or XLSX) from the student postings. The data should have 5 columns in this order: (a) last name of student, (b) string of H’s and T’s reporting the outcome of the spins with heads facing the spinner, (c) string reporting the outcome of the spins with tails facing the spinner, (d) number of heads in first 10 spins, and (e) number of heads in second 10 spins.

  • Write a README file that describes the experiment, and provides enough information for a future analyst to understand and analyze the data.

  • Read the data into Stata and report the number of variables and observations.

  • Report the mean, median, variance, and IQR of the number of heads from the 10 spins each student made when spinning with the heads facing them.

  • Make a histogram, stem and leaf plot, dotplot, and boxplot of the number of heads with heads facing the spinner. Comment on what you learned from them and the pros/cons of each plot in helping you understand the data.

  • Make a scatterplot of the number of heads from the first 10 spins (heads facing the spinner, on the X-axis) vs the second 10 spins (tails facing the spinner, on the Y-axis). Repeat using dotplots. Comment on what you learned from the plots.
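
As a starting point, the Stata steps in this section might look like the sketch below. It assumes the spreadsheet was saved as coinspin.csv with a header row, and that the two count columns are named heads1 and heads2. These names are placeholders, not required names, so substitute your own.

     * Assumed: coinspin.csv has a header row; heads1 and heads2 are placeholder names
     import delimited using "coinspin.csv", varnames(1) clear
     describe

     * mean, median, variance, and IQR in one table
     tabstat heads1, statistics(mean median variance iqr)

     histogram heads1, discrete frequency
     stem heads1
     dotplot heads1
     graph box heads1

     * the Y variable is listed first, then the X variable
     scatter heads2 heads1
     dotplot heads1 heads2

Each graph command replaces the previous graph, so save each one (for example with graph export) before drawing the next.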

2. Analyze Mulugeta metabolic syndrome data (50%)

Mulugeta and colleagues studied predictors of metabolic syndrome among adults visiting a hospital in Ethiopia. Dryad link

  • Download the data and README file from Dryad, unzip the download, and put it in a folder called mulugeta-metabolic-syndrome. Do the README file and Dryad website have enough information for you to understand the data? Do we need anyone’s permission to use this data? Please post a note on the discussion board if you have a question.

  • Read the data into Stata; how many variables and observations are in the dataset?

  • Make a list of all variables in the dataset, and classify them.

     quantitative - integer
                  - continuous
     categorical  - dichotomous
                  - ordinal
                  - nominal
    
  • Are there any duplicates or missing values in the data?

  • Tabulate age category by sex. What percent of males are more than 60 years old? What percent of 60+ year olds are male?

  • Make a spineplot of age category against sex. Are the age distributions comparable – any sex trending older?

  • Label the values of sex and agenewcat to be more readable, e.g. male/female for sex. Redo the table and spineplot.

  • Make a histogram of systolic blood pressure. Compare the distributions of systolic blood pressure between men and women using a dotplot. Comment.

  • Use the summarize command (or other ways) to get the mean, median, standard deviation, and IQR of systolic blood pressure. Can you reconcile the numerical measures with the histogram? State your reasons.

  • Make a scatterplot of total cholesterol against waist circumference. Color the plotting character by sex. You may find the sepscatter package useful (ssc install sepscatter to install; to color by sex use the option separate(sex) msymbol(O)). Comment.
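
The commands for this section might be sketched as follows, assuming the dataset is already loaded. Only sex and agenewcat are variable names taken from this exam. The names sbp, chol, and waist are placeholders for systolic blood pressure, total cholesterol, and waist circumference, and the 1/2 coding of sex is an assumption, so check the README for the real names and codes.

     * sex and agenewcat are named in the exam; sbp, chol, and waist are placeholders
     describe
     duplicates report
     misstable summarize

     * row gives percentages within each age category, column within each sex
     tabulate agenewcat sex, row column

     * spineplot and sepscatter are user-written; install them once from SSC
     ssc install spineplot
     ssc install sepscatter
     spineplot agenewcat sex

     * the 1/2 coding is an assumption; confirm it in the README before labeling
     label define sexlbl 1 "male" 2 "female"
     label values sex sexlbl

     histogram sbp
     dotplot sbp, over(sex)

     * detail adds the percentiles; IQR is the 75th minus the 25th percentile
     summarize sbp, detail

     sepscatter chol waist, separate(sex) msymbol(O)

Value labels can only be attached to numeric variables. If sex or agenewcat is stored as a string, it would need to be converted first (for example with encode).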

3. Exam/report organization (20%)

This exam should be submitted as a ZIP file; call it yourlastname-midterm1.zip. To do this, follow these steps.

  • Put your report document (DOCX/PDF), coin spin data file (XLSX, TXT, CSV), and coin spin README file (TXT, DOCX) in the midterm1 folder located in the exams folder. The result should look something like this.

     exams/
     ├── final
     ├── midterm1
     │   ├── coinspin-README.txt
     │   ├── coinspin.csv
     │   └── report.pdf
     └── midterm2
    
  • Zip the midterm1 folder; (re)name it yourlastname-midterm1.zip. (Replace yourlastname with your last name.)
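
Any zip tool will do for this step. If you prefer to stay in Stata, one possible way is the zipfile command, sketched here on the assumption that Stata's working directory is the exams folder.

     * run from inside the exams folder; replace yourlastname with your last name
     zipfile midterm1, saving(yourlastname-midterm1.zip, replace)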

This approach will ensure that I get all your files in an organized fashion and can distinguish student submissions more easily.

Make sure that your report file (PDF or DOCX) has all the answers to the questions.

4. Acknowledgements

Please acknowledge individuals who helped you or resources that were helpful in completing the exam.