Biomedical Data Science
Since 2016 I have been developing and teaching a 10-credit (20-hour) course on the analysis of biomedical data using the R statistical software, as part of the MSc in Operational Research (with Data Science) and the MSc in Statistics.
The course covers the following topics over 5 lectures (10 hours in total):
Introduction to biomedical data
- Typical research questions: association, causation, discovery and prediction
- Types of biomedical data: routine data (consented and unconsented), phenotypic biomarkers, genetic data, derived data
- Identifying problems in real-world data
- Data cleaning, alignment, imputation and exploration
- Mechanisms of missing data
 
Discovering associations
- Covariance and correlation
- Statistical inference and linear regression
- Solving the least squares problem
- Linear algebra considerations and collinearity
- Hypothesis testing
- Power considerations
- Assessing the fit of the model
 
Logistic regression and predictive models
- Case-control studies
- Generalized linear models
- Logistic regression
- Odds ratio and interpretation of results
- Likelihood and model comparison
- Measures of discrimination and calibration performance
- Predictive models and cross-validation
 
Biomarker discovery and high-dimensional datasets
- High-throughput data (proteomics, metabolomics, lipidomics, glycomics)
- Biomarkers and biomarker discovery
- Dimensionality reduction: clustering and PCA
- Multiple testing
- Subset selection approaches
- Penalised regression: LASSO, ridge regression, elastic nets
 
Prediction from genetic data
- Causality, confounding and stratification
- Introduction to genetic data
- Genetic variation
- Genome-wide association studies
- GWAS meta-analysis
- Approaches for genotypic prediction and genetic risk scores
 
The course is accompanied by self-guided lab material for learning and practising how to perform analyses in R (10 hours in total):
Lab 1: Introduction to R
- Interactive terminal and workspaces
- Object types and data structures
- Basic functions and operators
 
Lab 2: Data preparation and linear regression
- Merging and simple imputations
- Statistical summaries and plots
- Writing functions and loops
- Fitting linear regression models
 
Lab 3: Logistic regression and predictive models
- Using R packages
- Fitting logistic regression models
- Making predictions on withheld data
 
Lab 4: High-dimensional datasets
- Correlation plots and PCA
- Subset selection in R
- Regularisation approaches
 
Lab 5: Prediction from genetic data
- Performing genome-wide association studies
- Computing genetic risk scores
- Prediction from genetic scores
- Performing a GWAS meta-analysis