Homework 7

Due: 2022-10-17, 11:59pm

All homework must be submitted via Blackboard. Your answers must be in MS Word (DOCX) or PDF format. Your submitted document should have sections corresponding to those in this homework.

Please make sure that you have watched the videos and have done the readings. Everyone should do this independently; you can discuss the process, but the answers are expected to be different.

Include graphs as images in your document. Use the lecture notes as a guide.

Attach the video to the homework; if there are problems, create a ZIP archive with the video and the PDF/DOCX document and attach that.

1. Flu diagnostic test (50%)

Use the Anderson flu dataset for this problem. We will examine the impact of diagnostic criteria on performance characteristics of a test.

Consider the variables sec3_cough, sec3_runny, and sec3_chill, which indicate whether the individual exhibited cough, runny nose, and chills, respectively. We will use this information to build different versions of a diagnostic test and examine their performance characteristics.

  • Check the levels of each variable, e.g. for the chills variable:
        levels sec3_chill
    If there are spaces, replace them with empty strings (essentially missing values) using something like:
        replace sec3_chill = "" if sec3_chill == " "
    Then tabulate all three variables, one at a time, against the flu diagnosis variable.
  • Create a new variable, say coughrunnychills, that is a composite of the three variables using egen (see previous lecture). Then tabulate flu status against this composite variable. The composite variable should have 8 levels (2x2x2=8). If we define a test that is positive when an individual has all three symptoms, what are the sensitivity and specificity of that test?
  • If we define a measure that counts the number of symptoms present (out of cough, runny nose, and chills), it will vary from 0 to 3. Using the table from the previous exercise, make a table of counts of the number of symptoms against flu diagnosis.
  • If we define a test to be positive based on a cutoff for the number of symptoms of 1, 2, or 3 (that is, positive when at least that many symptoms are present), what are the sensitivities and specificities corresponding to those choices?
  • Plot an ROC curve by varying the cutoff for the number of symptoms. One way to do this is to create a CSV or XLS file with the sensitivity and specificity from the previous answer, and then plot sensitivity against 1 - specificity using twoway line. To make it look nicer, add a row where sensitivity and 1 - specificity are both zero, and another where they are both one. Comment on the figure and the potential usefulness of a test based on these commonly observed symptoms.
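
As a rough sketch of the arithmetic behind the cutoff-based sensitivities and specificities, the snippet below works through a table of made-up counts (not the Anderson data). Python is used only to illustrate the calculation; the homework itself is expected to be done in Stata. The names counts, sens_spec, and roc_points are hypothetical, and the test is assumed to be positive when the number of symptoms is at least the cutoff.

```python
# Hypothetical counts (NOT the Anderson data):
# number of symptoms -> (people with flu, people without flu)
counts = {0: (5, 60), 1: (15, 50), 2: (30, 30), 3: (50, 10)}

total_flu = sum(flu for flu, _ in counts.values())
total_no_flu = sum(no_flu for _, no_flu in counts.values())


def sens_spec(cutoff):
    """Sensitivity and specificity when the test is positive for >= cutoff symptoms."""
    true_pos = sum(flu for k, (flu, _) in counts.items() if k >= cutoff)
    true_neg = sum(no_flu for k, (_, no_flu) in counts.items() if k < cutoff)
    return true_pos / total_flu, true_neg / total_no_flu


# Cutoff 4 is never positive and cutoff 0 is always positive; these two
# extremes supply the ROC endpoints (0, 0) and (1, 1).
roc_points = []
for cutoff in (4, 3, 2, 1, 0):
    sensitivity, specificity = sens_spec(cutoff)
    roc_points.append((1 - specificity, sensitivity))  # (x, y) on the ROC plot
    print(f"cutoff >= {cutoff}: sensitivity = {sensitivity:.2f}, "
          f"specificity = {specificity:.2f}")
```

Each pair in roc_points is one point on the ROC curve, with 1 - specificity on the horizontal axis and sensitivity on the vertical axis. The same rows could be saved to a CSV file and plotted in Stata.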

2. Video explanation (50%)

Make a short video (4-6 minutes) based on the previous exercise. Imagine you are explaining to a lay audience the use of the commonly used symptoms for deciding whether a family member should stay at home or not.

Please note that I will share the videos with the class.

  • Introduce yourself.
  • Summarize the findings from the Anderson data (using the exercise above). Explain what sensitivity and specificity are in the context of a test that is positive when all three symptoms are present. Explain how sensitivity and specificity vary as you count the number of symptoms.
  • Suppose flu is uncommon in the population (say 5%). Explain the chance that an individual has the flu if they have all three symptoms. If they have two or fewer of the symptoms, what is the chance that they have the flu?
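
The prevalence question comes down to Bayes' rule. Below is a minimal sketch in Python, assuming the 5% prevalence mentioned above together with placeholder sensitivity and specificity values (0.50 and 0.90 are made up, not results from the Anderson data). The function names are hypothetical and Python is used only to show the arithmetic.

```python
def prob_flu_given_positive(sensitivity, specificity, prevalence):
    """P(flu | test positive), also called the positive predictive value."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def prob_flu_given_negative(sensitivity, specificity, prevalence):
    """P(flu | test negative), which is 1 minus the negative predictive value."""
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return false_neg / (false_neg + true_neg)


# Placeholder inputs: substitute the sensitivity and specificity of the
# "all three symptoms" test estimated from the data.
sens, spec, prev = 0.50, 0.90, 0.05
print(f"P(flu | all three symptoms) = {prob_flu_given_positive(sens, spec, prev):.3f}")
print(f"P(flu | two or fewer)       = {prob_flu_given_negative(sens, spec, prev):.3f}")
```

With these placeholder numbers the chance of flu is roughly 21% for someone with all three symptoms and roughly 3% for someone with two or fewer, which shows how strongly a low prevalence pulls down the chance of flu even after a positive test.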

Some thoughts for making the video:

  • Make sure you have the written homework (part 1) done before you make the video. That written piece is the raw material for the video.
  • The easiest way to make a video is to have a meeting with yourself on Zoom and to hit record. You can share a screen if you want to show visuals. You could also print out and hold an image or message in front of the camera. If it helps, invite a friend to the Zoom meeting to keep you company.
  • Do a dry run at least once before hitting record. Make sure the video is within time limits (4-6 minutes).
  • Don’t overthink. Relax, and have fun. This is not a filmmaking class, and ultimately it is more important to be factually clear and correct than to have fancy effects.

The default grade will be 1.5 points (out of 2.5 points for the video). Points will be added for creativity, clarity, or special insights contributed. Points will be deducted for not following directions (e.g. going over or under time, not answering the questions posed) and for unclear or incorrect explanations.

3. Acknowledgements

Please acknowledge individuals who helped you or resources that were helpful in completing the homework.