Summary and Schedule

This is a new lesson built with The Carpentries Workbench.

Setup Instructions: Download files required for the lesson

Duration: 00h 00m
1. Introduction
What are the main types of functional enrichment analysis approaches, and how do they differ?
When should you choose one enrichment strategy over another for RNA-seq data?

Duration: 00h 12m
2. Gene Ontology testing with clusterProfiler
What are the different types of GO terms (BP, MF, CC)?
How do we perform ORA using the enrichGO() function?
How can we run GSEA-style functional class scoring with the gseGO() function?

Duration: 00h 24m
3. KEGG enrichment analysis with clusterProfiler
How can we perform pathway analysis using KEGG?
What insights can KEGG enrichment provide about differentially expressed genes?

Duration: 00h 36m
4. Gene set enrichment analysis with fgsea
What is Gene Set Enrichment Analysis (GSEA) and when should I use it?
How does fgsea perform fast, ranked-list GSEA?
How do I interpret enrichment scores, p-values, and leading-edge genes?
How does fgsea differ from the GSEA functions in clusterProfiler?

Duration: 00h 48m
5. Analysis with RegEnrich
How can we use RegEnrich to identify key transcriptional regulators from RNA-seq data?
What inputs does RegEnrich need (expression matrix, metadata, list of regulators)?
Why do we need mouse-specific transcription factor (TF) information instead of the built-in human TFs?

Duration: 01h 00m
6. Interaction networks with STRINGdb
How can we use STRINGdb to visualise protein–protein interaction networks for our DE genes?
How do we map our gene identifiers to the IDs used by STRING?
What information does STRING functional enrichment add beyond standard GO/KEGG analysis?

Duration: 01h 12m
7. Conclusion
What have we learned about functional enrichment and pathway analysis?
How do different methods complement one another when interpreting RNA-seq results?

Duration: 01h 24m
Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

This workshop provides a practical introduction to functional enrichment analysis following differential expression in RNA-seq studies. We will compare two major enrichment strategies, over-representation analysis (ORA) and functional class scoring (FCS), and discuss when each approach is most appropriate. You will learn how to implement these methods in R using packages including clusterProfiler, fgsea, RegEnrich and STRINGdb, drawing on pathway and gene-set resources such as Gene Ontology, the KEGG Pathway Database and the Molecular Signatures Database. By the end of this workshop, you will have a clear understanding of how to interpret enriched pathways in RNA-seq data.
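
To make the contrast between the two strategies concrete, here is a minimal base-R sketch on toy data. All gene names, sizes and statistics below are made up for illustration. The ORA part uses the standard hypergeometric (one-sided Fisher) test; the FCS part uses a Wilcoxon rank-sum test as a simple stand-in and is not the running-sum statistic used by fgsea or gseGO().

```r
# Toy data (hypothetical): 1000 background genes and one 50-gene set
universe <- paste0("gene", 1:1000)
gene_set <- universe[1:50]

## ORA: starts from a thresholded list of DE genes
# Hypothetical DE list of 100 genes, 12 of which belong to the gene set
de_genes <- c(universe[1:12], universe[101:188])

N <- length(universe)                       # background size
K <- length(gene_set)                       # genes in the set
n <- length(de_genes)                       # DE genes
k <- length(intersect(de_genes, gene_set))  # DE genes in the set

# Probability of an overlap of at least k under the hypergeometric model
ora_p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on a 2x2 table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
fisher_p <- fisher.test(tab, alternative = "greater")$p.value

## FCS: uses a statistic for every gene, with no DE cutoff
set.seed(1)
gene_stat <- rnorm(N)                       # hypothetical per-gene statistics
names(gene_stat) <- universe
gene_stat[gene_set] <- gene_stat[gene_set] + 0.5  # shift the set upwards

in_set <- names(gene_stat) %in% gene_set
# Rank-based comparison of genes in the set against all other genes
fcs_p <- wilcox.test(gene_stat[in_set], gene_stat[!in_set],
                     alternative = "greater")$p.value

c(ORA = ora_p, Fisher = fisher_p, FCS = fcs_p)
```

ORA only sees which genes passed the cutoff, whereas the rank-based test uses the statistic of every gene, which is why FCS methods can pick up coordinated but modest shifts.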

This workshop is largely based on the "RNA-seq genes to pathways" workshop on the Galaxy Training Network, which uses Galaxy.

Prerequisites

R and RStudio installed
Basic R knowledge
Completed ‘Intro to R for Biologists’ and ‘RNA-seq: From reads to counts to genes’, or equivalent

R Packages & Datasets

In this workshop, we will learn how to use the clusterProfiler, fgsea, RegEnrich and STRINGdb packages, along with the related dependencies org.Mm.eg.db, impute and preprocessCore.

Please install the following packages:

R


## Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

## List of Bioconductor packages
## (impute is distributed through Bioconductor, not CRAN)
bioc_packages <- c(
    "edgeR",
    "goseq",
    "fgsea",
    "EGSEA",
    "clusterProfiler",
    "org.Mm.eg.db",
    "enrichplot",
    "pathview",
    "preprocessCore",
    "impute",
    "RegEnrich",
    "STRINGdb"
)

## Install Bioconductor packages
BiocManager::install(bioc_packages, ask = FALSE, update = TRUE)

## Install CRAN packages
cran_packages <- c(
    "ggplot2"
)

install.packages(cran_packages)
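
After installation, it can help to confirm that every package can actually be found. The following is a minimal sketch; `missing_packages()` is a hypothetical helper written for this check, not part of any package.

```r
# Packages used in this workshop
required_packages <- c(
    "edgeR", "goseq", "fgsea", "EGSEA", "clusterProfiler", "org.Mm.eg.db",
    "enrichplot", "pathview", "preprocessCore", "RegEnrich", "STRINGdb",
    "ggplot2", "impute"
)

# Hypothetical helper: return the packages that cannot be loaded
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

still_missing <- missing_packages(required_packages)
if (length(still_missing) == 0) {
    message("All workshop packages are installed.")
} else {
    message("Still missing: ", paste(still_missing, collapse = ", "))
}
```

Any package reported as missing can be passed to BiocManager::install() again, which also handles CRAN packages.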

The original dataset is from Fu et al., 2015 and described in this tutorial.

We have provided the data we will be using within the GitHub repo. To access them:

The code assumes that these files are in a folder called “data” but you can adjust the code to the correct download path as needed.

You can load some of the data directly from Zenodo into your RStudio environment with the following code:

R


# To download files from Zenodo

dataurl <- "https://zenodo.org/record/2596382/files/"

debasal <- read.csv(paste0(dataurl,"limma-voom_basalpregnant-basallactate"), header = TRUE, sep = "\t")
deluminal <- read.csv(paste0(dataurl,"limma-voom_luminalpregnant-luminallactate"), header = TRUE, sep = "\t")
seqdata <- read.csv(paste0(dataurl,"seqdata"), header = TRUE, sep = "\t")
load(url(paste0(dataurl,"mouse_hallmark_sets"))) #loads as Mm.H; load() needs a connection to read from a URL
factordata <- read.table(paste0(dataurl,"factordata"), header = TRUE, sep = "\t")
filteredcounts <- read.csv(paste0(dataurl,"limma-voom_filtered_counts"), header = TRUE, sep = "\t")
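
If you would rather keep local copies in the “data” folder mentioned above, the files can be downloaded once and reused. This is a minimal sketch: `fetch_if_missing()` is a hypothetical helper, and it assumes the Zenodo file names match those used in the code above.

```r
dataurl <- "https://zenodo.org/record/2596382/files/"

# Hypothetical helper: download `filename` into `dest_dir` unless a copy
# already exists there, and return the local path
fetch_if_missing <- function(filename, base_url = dataurl, dest_dir = "data") {
    dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
    dest <- file.path(dest_dir, filename)
    if (!file.exists(dest)) {
        # mode = "wb" keeps binary files (such as the saved gene sets) intact
        download.file(paste0(base_url, filename), destfile = dest, mode = "wb")
    }
    dest
}

# Example usage (requires an internet connection):
# seqdata <- read.delim(fetch_if_missing("seqdata"))
# load(fetch_if_missing("mouse_hallmark_sets"))  # loads as Mm.H
```

Because the helper skips files that already exist, rerunning a script does not download the data again.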

Summary

Checklist

Attendees are required to bring their own laptop computers. Please ensure you have installed:

Chrome or Firefox
R (Download and install the latest version of R using the UniMelb mirror)
RStudio
R packages required for this workshop (see above)
Datasets required for this workshop (see above)

Citations

Maria Doyle, Belinda Phipson. 3: RNA-seq genes to pathways (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-genes-to-pathways/tutorial.html Online; accessed Tue Nov 25 2025.
Hiltemann, Saskia, Rasche, Helena et al., 2023. Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology. doi:10.1371/journal.pcbi.1010752
Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. doi:10.1016/j.cels.2018.05.012