

Objectives
This assessment addresses Unit Learning Outcomes 1, 2, 3 & 4:
• Explain how statistical choices in analysis link directly to the design of the research study that generated the data and to the type of data,
• Explain the rationale behind hypothesis testing, and the concept of Type I and II errors,
• Differentiate the most appropriate descriptive and inferential statistics to use for common types of health data,
• Analyse health data using a statistical software package, and interpret the results.
Task
The outbreak of COVID-19 is now widespread across the world and has resulted in a large number of infections and serious health outcomes. The outbreak has infected more than 9,000 people in South Korea to date, and the country's public health response has been praised for quickly reducing the number of infections and casualties. Your task is to answer a set of research questions using current data from the Korea Centers for Disease Control & Prevention (sent by e-mail). These data were obtained at a single point in time (cross-sectional) from reported cases and contain a variety of clinical and epidemiological variables. Details of the dataset and the variables are provided below.
Description of variables
Table 1. Coding manual for variables in the dataset

Variable name | Description | Values
patient_id | Unique patient identification number | Numeric value; NA = missing
global_num | Cumulative case number | Numeric value from 1 to 10000; NA = missing
sex | Gender of patient | 1 = Male; 2 = Female; NA = missing
birth_year | Year of birth | Numeric value from 1901 to 2020; NA = missing
country | Country of contact | China, France, Korea, etc.; NA = missing
province | Province of South Korea | Busan, Seoul, Gangwon-do, etc. (provinces represented will depend on the dataset); NA = missing
city | City in South Korea | Andong-si, Ansan-si, etc.; NA = missing
disease | Pre-existing disease identified | TRUE/FALSE; NA = missing
infection_case | Source of infection | Overseas inflow, contact with patient, etc.; NA = missing
infection_order | Order of infection | Numeric value from 1 to 6; NA = missing
infected_by | Patient ID of the source, if known | Numeric value; NA = missing
contact_number | Number of possible contacts | Numeric value from 1 to 1160; NA = missing
symptom_onset_date | Date of symptom onset | Date; NA = missing
confirmed_date | Date of positive COVID-19 test, if known | Date; NA = missing
released_date | Date of release from isolation or hospital | Date; NA = missing
deceased_date | Date of death | Date; NA = missing
state | Current status of COVID-19 patient | 1 = Isolated; 2 = Released; 3 = Deceased; NA = missing
days_rel_conf | Number of days from confirmed test to release | Numeric value from 1 to 30; NA = missing
days_death_conf | Number of days from confirmed test to death | Numeric value from 1 to 30; NA = missing
gender_deaths | Gender of deceased patients | 1 = Male; 2 = Female; NA = missing
A subset of the original dataset (which is unique to you) has been sent to you by e-mail, and you will need to use this version for the assignment.
In your analytical report you are required to answer all four of the following research questions (Q1, Q2, Q3 and Q4):
Q1: Is there an association between deaths and gender? (Note: you will need to recode the current status of each COVID-19 patient into one of two groups: alive or deceased.)
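The recoding described in the Q1 note can be illustrated with a short Python sketch. It assumes the Table 1 codes for `state` (1 = Isolated, 2 = Released, 3 = Deceased), treats both isolated and released patients as alive, and represents NA as `None`. The names `recode_vital_status` and `cross_tabulate` are hypothetical and the sketch only shows the logic; the analysis itself is to be carried out in Jamovi.

```python
from collections import Counter
from typing import Optional

# Assumed mapping based on Table 1: isolated (1) and released (2) are alive, deceased (3) is deceased.
STATE_TO_STATUS = {1: "alive", 2: "alive", 3: "deceased"}


def recode_vital_status(state: Optional[int]) -> Optional[str]:
    """Collapse the three-level 'state' variable into alive/deceased.

    Missing (None) or unrecognised codes stay missing instead of being
    silently assigned to a group.
    """
    return STATE_TO_STATUS.get(state)


def cross_tabulate(records):
    """Count (sex, vital status) pairs from (sex, state) tuples.

    Rows with a missing sex or a missing/unrecognised state are skipped.
    """
    counts = Counter()
    for sex, state in records:
        status = recode_vital_status(state)
        if sex is None or status is None:
            continue
        counts[(sex, status)] += 1
    return counts
```

For example, `cross_tabulate([(1, 3), (2, 1), (1, 2)])` yields one count each for (1, "deceased"), (2, "alive") and (1, "alive"): the cells of the sex by vital status table that Q1 examines.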
Q2: Is there a difference in age between males and females among deceased patients? At the end of your summary, critically reflect on how these results provide important information for answering Question 1. (Note: you will need to create a new variable, age, calculated as the difference between the year 2020 and the patient's birth year. As the sample size for this analysis is small, you can use a threshold of 20% to assess normality. You can also use a significance level of 0.10 and a 90% confidence level for the statistical results.)
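The age variable defined in the Q2 note is simple arithmetic, sketched below in Python. The sketch assumes records arrive as hypothetical (sex, birth_year, state) tuples with `None` for NA and uses the Table 1 codes (state 3 = Deceased; sex 1 = Male, 2 = Female). `compute_age` and `deceased_ages_by_sex` are illustrative names, not part of the assignment materials.

```python
from typing import Dict, List, Optional

REFERENCE_YEAR = 2020  # year specified in the Q2 note


def compute_age(birth_year: Optional[int]) -> Optional[int]:
    """Age as defined in Q2: 2020 minus birth year; a missing birth year stays missing."""
    if birth_year is None:
        return None
    return REFERENCE_YEAR - birth_year


def deceased_ages_by_sex(records) -> Dict[int, List[int]]:
    """Group ages of deceased patients (state == 3) by sex code.

    Rows that are not deceased, have an unknown sex code, or lack a birth
    year are excluded.
    """
    groups: Dict[int, List[int]] = {1: [], 2: []}
    for sex, birth_year, state in records:
        age = compute_age(birth_year)
        if state != 3 or sex not in groups or age is None:
            continue
        groups[sex].append(age)
    return groups
```

Because only birth year is recorded, this value is the age a patient reaches during 2020, so it may overstate actual age by up to one year for patients whose 2020 birthday had not yet passed.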
Q3: Are there differences in the number of days from confirmed test to release for the
provinces identified in your data?
Q4: Are age and global number (cumulative case number) significant predictors of the number of days from confirmed test to release? First, describe the relationship between each of the independent variables and the dependent variable, and then identify which of the variables explains the largest amount of variation in the number of days from confirmed test to release. If researchers are mostly interested in the association between global number and the number of days from confirmed test to release, why is the effect of age being examined (provide details relevant to your data)? (Note: use the new age variable you created for Question 2.)
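As a hedged illustration of what "variation explained" means for a single predictor, the Python sketch below computes r², the squared Pearson correlation, which equals the R² of a simple linear regression with an intercept. It assumes paired lists with `None` for NA and drops incomplete pairs; `r_squared` and `complete_pairs` are hypothetical helpers. It does not replace the multiple regression Q4 asks for: when age and global number are correlated with each other, their separate r² values will not simply add up to the model R².

```python
from typing import List, Optional, Sequence, Tuple


def complete_pairs(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Keep only pairs where both values are present (listwise deletion)."""
    kept = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in kept], [b for _, b in kept]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Proportion of variance in y explained by a straight-line fit on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length and at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return (sxy * sxy) / (sxx * syy)
```

For instance, `r_squared([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 because the points lie exactly on a line; computing it once with age and once with global_num against days_rel_conf shows which predictor alone accounts for more variation.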
For each research question (Q1 to Q4) you are required to fully detail an analytical plan, similar to that used in the PUB561 Activity Workbook, Week 5 (pages 6 & 7). Please use the marking guide on page 6 to guide the extent of the analysis and answers presented for each question.
Each question section should include, at a minimum, the following:
1. State the question
2. Develop and clearly articulate an analysis plan that will allow you to answer the question
3. Implement the analysis plan using Jamovi and report all relevant output. If you need to modify or create new variables to implement the plan, describe these new or modified variables and how they were calculated.
4. Interpret the results of the analysis
5. Write a summary paragraph describing the question, the data and the results. Graphics should be incorporated if relevant.
6. Tables and figures in the report should be professionally presented with clear numbering and titles, and should be appropriately referenced in the written sections of the report, e.g. Table 1.1 shows the results from a … test examining the association between …
Formatting and word limits
Your report should contain a title page clearly identifying the unit code, your name and your student number. You should also state the word count for each section of your report (as outlined below), as well as the file name and number of the dataset you used. Failure to do so will result in the assignment not being marked.
Each research question should be treated as a separate section in your report and it is expected that you
will use appropriate headings within each section.
You are not required to provide a formal introduction, search any literature or provide references
in your analytical report.
The report must:
1) use 12 pt font,
2) have a minimum of 1.5 line spacing, and
3) have page margins no smaller than 2cm.
It is expected that the report will be well written in professional language and free from grammatical and spelling errors. The written sections of the report should be no longer than 4,000 words, excluding the analysis plans, section headings and tables/figures. The valid word count for each question should be stated on the title page of the report.
Submission
Your assignment MUST be submitted to TURNITIN in the Assessment 2 section of Blackboard.
The submission deadline for Question 1 is 11:59pm Sunday 3rd May 2020 (end of Week 8). Questions 2, 3 & 4 are due 11:59pm Sunday 7th June 2020.
PLEASE NOTE: All requests for extensions must be submitted prior to the due date using the QUT
online application (http://external-apps.qut.edu.au/studentservices/concession/ ).
Assessment submitted after the due date without an approved extension will not be marked and will
receive a grade of 1 or 0%. Please read the instructions for an extension carefully.
Marking criteria
The analytical report will be marked out of a total of 270 marks according to the criteria on page 6 (last
page). Please ensure you review the criteria prior to submitting your assessment.
Feedback
Feedback will be provided via the criteria marking sheet and written comments on the
assessment.
Marking criteria
Element | Max. marks
Question 1 | 50
• Clear & comprehensive analytical plan to answer the question that is technically correct, including scientific hypothesis, statistical test & assumptions | 10
• Clearly documented evidence that all test assumptions have been tested for validity | 10
• Concise & accurate written summary describing the data | 10
• Comprehensive and correct interpretation and reporting of statistical results | 20
Question 2 | 60
• Clear & comprehensive analytical plan to answer the question that is technically correct, including scientific hypothesis, statistical test & assumptions | 10
• Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 10
• Concise & accurate written summary describing the data | 20
• Comprehensive and correct interpretation and reporting of statistical results | 20
Question 3 | 55
• Clear & comprehensive analytical plan to answer the question that is technically correct, including scientific hypothesis, statistical test & assumptions | 10
• Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 15
• Concise & accurate written summary describing the data | 10
• Comprehensive and correct interpretation and reporting of statistical results | 20
Question 4 | 85
• Clear & comprehensive analytical plan to answer the question that is technically correct, including scientific hypothesis, statistical test & assumptions | 20
• Clear description of univariate & bivariate analysis undertaken | 20
• Clearly documented evidence that all test assumptions have been tested for validity, that all relevant correlations have been tested, and that significant single linear relationships have been described using regression and multiple regression | 30
• Concise & accurate written summary describing the data | 15
Overall Report | 20
• Written report contains all of the required information and adheres to formatting requirements, including the maximum prescribed length | 10
• Written report uses professional language to clearly articulate meaning, with minimal typographic and grammatical errors | 10
TOTAL (this will be converted to a final mark out of 50) | 270