Summary and Schedule
This is a new lesson built with The Carpentries Workbench.
| Duration | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction | What are the main types of functional enrichment analysis approaches, and how do they differ? When should you choose one enrichment strategy over another for RNA-seq data? |
| 00h 12m | 2. Gene Ontology testing with clusterProfiler | What are the different types of GO terms (BP, MF, CC)? How do we perform ORA using the enrichGO() function? How can we run GSEA-style functional class scoring with the gseGO() function? |
| 00h 24m | 3. KEGG enrichment analysis with clusterProfiler | How can we perform pathway analysis using KEGG? What insights can KEGG enrichment provide about differentially expressed genes? |
| 00h 36m | 4. Gene set enrichment analysis with fgsea | What is Gene Set Enrichment Analysis (GSEA) and when should I use it? How does fgsea perform fast, ranked-list GSEA? How do I interpret enrichment scores, p-values, and leading-edge genes? How does fgsea differ from the GSEA functions in clusterProfiler? |
| 00h 48m | 5. Analysis with RegEnrich | How can we use RegEnrich to identify key transcriptional regulators from RNA-seq data? What inputs does RegEnrich need (expression matrix, metadata, list of regulators)? Why do we need mouse-specific transcription factor (TF) information instead of the built-in human TFs? |
| 01h 00m | 6. Interaction networks with STRINGdb | How can we use STRINGdb to visualise protein–protein interaction networks for our DE genes? How do we map our gene identifiers to the IDs used by STRING? What information does STRING functional enrichment add beyond standard GO/KEGG analysis? |
| 01h 12m | 7. Conclusion | What have we learned about functional enrichment and pathway analysis? How do different methods complement one another when interpreting RNA-seq results? |
| 01h 24m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This workshop provides a practical introduction to functional
enrichment analysis following differential expression in RNA-seq
studies. We will compare two major enrichment strategies,
over-representation analysis (ORA) and
functional class scoring (FCS), and discuss when each
approach is most appropriate. Participants will learn how to implement
these methods in R using packages including
clusterProfiler, fgsea,
RegEnrich and STRINGdb, drawing on pathway
and gene-set resources such as Gene Ontology, the
KEGG Pathway Database and the Molecular Signatures
Database. By the end of this workshop, you will have a clear
understanding of how to interpret enriched pathways in RNA-seq data.
This workshop is largely adapted from the Galaxy tutorial RNA-seq genes to pathways on the Galaxy Training Network.
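To make the contrast between ORA and FCS concrete, here is a minimal base-R sketch on made-up data. The toy table, its column names (`gene`, `t`, `adj.P.Val`), the 0.05 cut-off and the gene set are all assumptions for illustration and are not the workshop data. ORA first thresholds the genes and then tests for overlap with a gene set (here with a hypergeometric test), whereas FCS keeps every gene and works from a ranked list.

```r
# Toy differential expression table (made-up values, assumed column names)
de <- data.frame(
  gene      = paste0("g", 1:10),
  t         = c(6.5, 5.2, 4.8, 3.9, 0.6, 0.3, -0.2, -0.5, -4.1, -5.7),
  adj.P.Val = c(0.001, 0.004, 0.008, 0.02, 0.7, 0.8, 0.9, 0.75, 0.015, 0.002),
  stringsAsFactors = FALSE
)

# Hypothetical gene set to test
gene_set <- c("g1", "g2", "g3", "g7")

## ORA: threshold first, then test the overlap with the gene set
universe  <- de$gene
sig_genes <- de$gene[de$adj.P.Val < 0.05]

k <- length(sig_genes)              # number of significant genes
m <- sum(universe %in% gene_set)    # gene-set members in the universe
n <- length(universe) - m           # non-members in the universe
q <- sum(sig_genes %in% gene_set)   # gene-set members among significant genes

# Probability of seeing q or more members by chance (hypergeometric upper tail)
ora_p <- phyper(q - 1, m, n, k, lower.tail = FALSE)

## FCS: no threshold; every gene contributes through its rank
ranked_stats <- sort(setNames(de$t, de$gene), decreasing = TRUE)

ora_p
head(ranked_stats)
```

The named, sorted vector built in the last step is the general shape of input that ranked-list methods expect, while the ORA branch only ever sees the list of significant gene identifiers and the background universe.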
Prerequisites
- R and RStudio installed
- Basic R knowledge
- Completed ‘Intro to R for Biologists’ and ‘RNA-seq: From reads to counts to genes’, or equivalent
R Packages & Datasets
In this workshop, we will learn how to use the
clusterProfiler, fgsea, RegEnrich
and STRINGdb packages, along with the related dependencies
org.Mm.eg.db, impute and
preprocessCore.
Please install the following packages:
R
## Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
## List of Bioconductor packages
## (impute is distributed through Bioconductor, so it is installed here rather than from CRAN)
bioc_packages <- c(
  "edgeR",
  "goseq",
  "fgsea",
  "EGSEA",
  "clusterProfiler",
  "org.Mm.eg.db",
  "enrichplot",
  "pathview",
  "preprocessCore",
  "impute",
  "RegEnrich",
  "STRINGdb"
)
## Install Bioconductor packages
BiocManager::install(bioc_packages, ask = FALSE, update = TRUE)
## Install CRAN packages
cran_packages <- c(
  "ggplot2"
)
install.packages(cran_packages)
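After installation, it can help to confirm that every package is actually available before the workshop starts. The following is a minimal sketch: `missing_packages()` is a hypothetical helper written for this check (it is not part of any package), and it only tests whether a package directory can be found, not whether the package loads without errors.

```r
# Packages named in the installation step above
workshop_packages <- c(
  "edgeR", "goseq", "fgsea", "EGSEA", "clusterProfiler", "org.Mm.eg.db",
  "enrichplot", "pathview", "preprocessCore", "RegEnrich", "STRINGdb",
  "ggplot2", "impute"
)

# Hypothetical helper: return the packages that are not found in the
# current library paths (system.file() returns "" for a missing package)
missing_packages <- function(pkgs) {
  found <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!found]
}

not_installed <- missing_packages(workshop_packages)
if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
} else {
  message("All workshop packages were found.")
}
```

If any names are reported as missing, re-running the installation step for just those packages is usually the quickest fix.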
The original dataset is from Fu et al., 2015 and is described in this tutorial.
We have provided the data files we will be using within the GitHub repo. To access them:
- limma-voom_basalpregnant-basallactate
- limma-voom_luminalpregnant-luminallactate
- seqdata
- mouse_hallmark_sets
- factordata
- filteredcounts
- mouseTFs
The code assumes that these files are in a folder called “data” but you can adjust the code to the correct download path as needed.
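As a minimal sketch of how the download location can be kept in one place, the folder can be stored in a variable and every path built from it. `read_workshop_table()` is a hypothetical helper, and the example assumes the downloaded files are tab-separated and keep the names listed above; adjust both if your copies differ.

```r
# Folder holding the downloaded files; change this if yours live elsewhere
data_dir <- "data"

# Hypothetical helper: read one tab-separated workshop file from a folder,
# stopping with a clear message if the file is not where we expect it
read_workshop_table <- function(name, dir = data_dir) {
  path <- file.path(dir, name)
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  read.delim(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

# Example usage, assuming the file has been downloaded into data_dir:
# debasal <- read_workshop_table("limma-voom_basalpregnant-basallactate")
```

Building paths with `file.path()` rather than pasting strings keeps the code portable across operating systems.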
You can load some of the data directly from Zenodo into your RStudio environment with the following code:
R
# To download files from Zenodo
dataurl <- "https://zenodo.org/record/2596382/files/"
# The tables are tab-separated text files
debasal <- read.csv(paste0(dataurl, "limma-voom_basalpregnant-basallactate"), header = TRUE, sep = "\t")
deluminal <- read.csv(paste0(dataurl, "limma-voom_luminalpregnant-luminallactate"), header = TRUE, sep = "\t")
seqdata <- read.csv(paste0(dataurl, "seqdata"), header = TRUE, sep = "\t")
# load() needs a connection to read from a URL, so wrap the address in url()
load(url(paste0(dataurl, "mouse_hallmark_sets"))) # loads as Mm.H
factordata <- read.table(paste0(dataurl, "factordata"), header = TRUE, sep = "\t")
filteredcounts <- read.csv(paste0(dataurl, "limma-voom_filtered_counts"), header = TRUE, sep = "\t")
Summary
Citations
Maria Doyle, Belinda Phipson. 3: RNA-seq genes to pathways (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-genes-to-pathways/tutorial.html Online; accessed Tue Nov 25 2025.
Hiltemann, Saskia, Rasche, Helena et al., 2023. Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology. 10.1371/journal.pcbi.1010752
Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012