
Working with biological data in R


You are working in collaboration with clinical teams in three hospitals, caring for individuals living with RandoVirus infection. Some of these individuals are asymptomatic (i.e. they have no symptoms relating to RandoVirus), whereas others are affected by the clinical condition of Randesease. With the help of the clinical teams, you have been compiling a dataset of the subjects of your study, noting demographic and clinical information.

You would like to prepare a report/presentation describing this cohort that will be shared with your collaborators, guide follow-up research and provide a basis for your manuscript. Give your answers meaningful headers. If a question can be answered by simple text, calculate the answer inline.

Link to the dataset

  1. Start an RMarkdown script. Load the tidyverse package in the setup chunk.

  2. Download the dataset using the link above. The data is saved in tab-separated text format, so it will need to be read with the appropriate readr function. (Hint: look up the available functions on https://readr.tidyverse.org)

  3. How many people are included in the study? Show a summary table describing how many subjects are included from each hospital.

  4. How many people in the study have each clinical status? How does that compare between hospitals? Display the result using a bar graph (hint: first calculate the required values using count(), then plot using geom_bar() with the stat = "identity" option).

  5. Is the subject age known for all participants? Is any data missing?
    Is there a correlation between proviral load and subject age? Evaluate using geom_point().

  6. Is there a difference between the proviral load of subjects with Randesease and the proviral load of asymptomatic people living with RandoVirus?

  7. Is there a difference between the proviral load of subjects with existing comorbidities and the proviral load of subjects with none of the comorbidities investigated?

  8. Advanced: install the patchwork package. Combine the plot from the previous question with a similar plot looking for differences in proviral load on the basis of known coinfections. Produce a single figure with two panels.
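
As a starting point for steps 2 to 4, here is a minimal sketch of the read, count and plot pattern on a toy table. It is only an illustration under stated assumptions: the toy tibble, its column names (`subject_id`, `hospital`, `clinical_status`) and all of its values are invented here, and the real dataset may be organised differently. The toy table is written to a temporary tab-separated file so that the reading step can be shown without the real download.

```r
library(tidyverse)

# Toy stand-in for the real dataset. The column names and values are
# invented for illustration and are NOT taken from the study data.
toy <- tibble(
  subject_id      = 1:6,
  hospital        = c("A", "A", "B", "B", "C", "C"),
  clinical_status = c("Asymptomatic", "Randesease", "Asymptomatic",
                      "Asymptomatic", "Randesease", "Randesease")
)

# Write the toy table to a temporary tab-separated file, then read it
# back with the readr function for tab-separated text.
tsv_path <- tempfile(fileext = ".tsv")
write_tsv(toy, tsv_path)
cohort <- read_tsv(tsv_path, show_col_types = FALSE)

# One row per hospital and clinical status, with the number of subjects in n.
status_counts <- cohort %>%
  count(hospital, clinical_status)

# The counts are already calculated, so stat = "identity" tells geom_bar()
# to use n as the bar height instead of counting rows again.
status_plot <- ggplot(status_counts,
                      aes(x = hospital, y = n, fill = clinical_status)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Hospital", y = "Number of subjects", fill = "Clinical status")

status_plot
```

With your own data, replace `tsv_path` with the path of the downloaded file and the toy column names with the ones you find in the dataset. The choice of `position = "dodge"` places the bars for each clinical status side by side within a hospital; leaving it out would stack them instead.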