Homework 2

Due: 2022-08-28, 11:59pm

All homework must be submitted via Blackboard. Your answers must be in MS Word (DOCX) or PDF format. Your submitted document should have sections corresponding to the four sections in this homework.

Please make sure that you have watched the videos and have done the readings.

1. Read Anderson data (40%)

Read the introduction to the Anderson data in the README file (also in Chapter 3 of BCB). Read the abstract of the paper and navigate to the Dryad page of the dataset.

Read the Anderson data into Stata using either the point-and-click interface or by pasting commands into the command window.

a. In what country was the data collected (read abstract)?
b. When was the data uploaded into Dryad (examine Dryad page)?
c. Give an example of a binary or dichotomous variable in the dataset.
d. Is there a quantitative (numerical) variable in the dataset? If so, what is that variable?

2. Read Bos-Touwen data (40%)

Read the introduction to the dataset in the README file (also in Chapter 3 of BCB). Read the abstract of the accompanying paper.

The dataset is in SAV format (which is used by SPSS). Import the data into Stata using the import command (or the point-and-click interface). (If you already have a dataset imported, you will have to use the command clear to clear the previously imported dataset from memory.)
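As a minimal sketch of the import step, the commands below assume Stata 16 or later (where import spss is available) and a hypothetical file name, bos_touwen.sav, located in the current working directory. Replace the file name with the actual SAV file from the data folder.

```stata
* Assumption: "bos_touwen.sav" is a placeholder, not the real file name.
* Drop any dataset currently in memory before importing a new one.
clear

* Read the SPSS (SAV) file into Stata.
import spss using "bos_touwen.sav"
```

The two steps can also be combined by adding the clear option to the import command: import spss using "bos_touwen.sav", clear.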

a. How many variables and observations are in the dataset?
b. Use the codebook command to create a summary of the data (don’t paste it into the HW).
c. How is the financial distress variable coded? What are the three possible values (Stata uses . to denote missing data)?
d. Give an example of an ordinal variable and a categorical (non-ordinal) variable in the dataset.

3. Post on “Dataset documentation” discussion forum (20%)

To promote reproducible and rigorous science and to accelerate discoveries, NIH has developed policies for sharing data. In practice, shared data are often difficult to understand without good documentation, which makes them unusable.

The datasets we shared in the data folder each have some documentation on the data being shared in the README file.

Please post on the “Dataset documentation” discussion forum a brief opinion on the README file for the Anderson data. Was it sufficient to understand the dataset and the variables? How would you compare the documentation of the Bos-Touwen dataset to the Anderson data?

A posting or reply in the forum is mandatory.

In the homework file, please enter the date of your posting.

4. Acknowledgements

Please acknowledge individuals who helped you or resources that were helpful in completing the homework.