This course offers a practical and structured approach to data visualization, biostatistical analysis, statistical modeling, and machine learning in R. It is designed for students, researchers, and professionals who want to apply data analysis techniques to real-world datasets.
Using a hands-on, learning-by-doing approach, you will work with real data from the beginning. You will learn how to import data into R using RStudio, explore data structures, and create professional visualizations with ggplot2. By the end, you will be able to produce clear, publication-ready figures suitable for reports, theses, and research work.
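To give a flavor of the visualization work described above, here is a minimal sketch. It assumes the ggplot2 package is installed and uses the built-in `mtcars` dataset as a stand-in for real course data; the object names `cars` and `p` and the output filename are illustrative only.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# mtcars ships with R; it stands in here for an imported dataset
# (a CSV file would typically be loaded with read.csv())
cars <- mtcars
str(cars)  # inspect the data structure

# Recode transmission (0 = automatic, 1 = manual) as a labelled factor
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Scatter plot of fuel economy against weight, coloured by transmission
p <- ggplot(cars, aes(x = wt, y = mpg, colour = transmission)) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon",
       colour = "Transmission") +
  theme_minimal()

# Save at 300 dpi; the filename is a placeholder
ggsave("mpg_by_weight.png", plot = p, width = 6, height = 4, dpi = 300)
```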
The course then moves into data management and descriptive analysis. You will learn how to clean datasets, handle missing values, create new variables, and perform exploratory data analysis. These steps build a strong foundation for statistical modeling. You will perform regression analysis in R, including linear and logistic regression, and learn how to interpret model results correctly. Special emphasis is placed on creating clear statistical summaries and publication-ready tables.
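The workflow in this part of the course might look roughly like the sketch below. It uses only base R and the built-in `airquality` dataset, which is an assumption made for illustration and not necessarily the data used in the course. The variable names `temp_c` and `high_ozone`, and the median cut-off used to create a binary outcome, are hypothetical choices.

```r
# airquality ships with R and has missing values in Ozone and Solar.R
aq <- airquality

# Count missing values per column
colSums(is.na(aq))

# Keep rows that are complete for the variables used in the models
aq_clean <- aq[complete.cases(aq[, c("Ozone", "Temp", "Wind")]), ]

# New variable: temperature in Celsius (Temp is recorded in Fahrenheit)
aq_clean$temp_c <- (aq_clean$Temp - 32) * 5 / 9

# Descriptive summary of the analysis variables
summary(aq_clean[, c("Ozone", "temp_c", "Wind")])

# Linear regression: ozone as a function of temperature and wind
fit_lm <- lm(Ozone ~ temp_c + Wind, data = aq_clean)
summary(fit_lm)$coefficients

# Logistic regression on a binary outcome
# (hypothetical definition: ozone above the sample median)
aq_clean$high_ozone <- as.integer(aq_clean$Ozone > median(aq_clean$Ozone))
fit_glm <- glm(high_ozone ~ temp_c + Wind, data = aq_clean,
               family = binomial)

# Exponentiated coefficients are interpreted as odds ratios
exp(coef(fit_glm))
```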
After building a solid statistical base, the course introduces machine learning in R. You will learn key concepts such as train-test split, model evaluation, and overfitting. Practical models such as decision trees and random forests are implemented and compared with traditional regression approaches. You will also evaluate model performance using confusion matrices and ROC curves, helping you understand when machine learning methods are appropriate.
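As a rough illustration of the train-test split and confusion matrix mentioned above, the sketch below uses base R only. It fits a logistic regression as the classifier instead of a decision tree or random forest, so that no extra packages are needed. The built-in `iris` data, the 70/30 split, the 0.5 probability threshold, and the variable names are all assumptions made for this example.

```r
set.seed(42)  # make the random split reproducible

# Two-class problem: versicolor vs. virginica from the built-in iris data
dat <- droplevels(subset(iris, Species != "setosa"))
dat$is_virginica <- as.integer(dat$Species == "virginica")

# 70/30 train-test split
n <- nrow(dat)
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit on the training set only
fit <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
           data = train, family = binomial)

# Predict on the held-out test set
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$is_virginica, levels = c(0, 1))

# Confusion matrix and overall accuracy on unseen data
conf_mat <- table(Predicted = pred, Actual = actual)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
conf_mat
accuracy
```

Comparing accuracy on the training set with accuracy on the test set is one simple way to spot overfitting: a large gap suggests the model has memorized the training data.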
What you will learn
In this course, you will learn how to create professional data visualizations using ggplot2 in R and how to clean and manage datasets for analysis. You will perform descriptive and statistical analysis, build and interpret regression models, and generate publication-ready tables. You will also apply machine learning techniques in R and evaluate models using confusion matrices, ROC curves, and AUC. In addition, you will learn how to compare statistical and predictive modeling approaches in practical research settings.
Who this course is for
This course is designed for students, including MPH, MSc, and PhD candidates, as well as public health and research professionals. It is also suitable for data analysts and beginners in data science. Anyone with basic R knowledge who wants to advance their skills in data visualization and machine learning will benefit from this course.