RNA-seq pathway analysis: Key Points

Introduction

GO terms are divided into Biological Process (BP), Molecular Function (MF) and Cellular Component (CC), which can be analysed separately or together depending on the biological question.
The enrichGO() and gseGO() functions in clusterProfiler perform over-representation analysis (ORA) and gene set enrichment analysis (GSEA), respectively, directly against the GO database.
GO enrichment results highlight terms that are over-represented among your genes of interest, which helps interpret sets of up-regulated or down-regulated genes.
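
To make "over-represented" concrete, here is a minimal sketch of the hypergeometric test that ORA is generally based on. It is written in plain Python rather than with clusterProfiler, the function name ora_p_value is hypothetical, and the counts are toy numbers chosen only for illustration.

```python
from math import comb

def ora_p_value(universe_size, set_size, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(overlap >= n_overlap).

    universe_size: number of background genes
    set_size:      background genes annotated to the term
    n_selected:    genes of interest (e.g. DEGs)
    n_overlap:     genes of interest annotated to the term
    """
    total = comb(universe_size, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers (assumed): 20 background genes, 5 annotated to the term,
# 4 genes of interest, 3 of which are annotated to the term.
p = ora_p_value(universe_size=20, set_size=5, n_selected=4, n_overlap=3)
print(f"ORA p-value: {p:.4f}")
```

A small p-value means the overlap is larger than expected by chance given the background. In practice, packages also correct these p-values for the many terms tested.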

KEGG pathway analysis helps link DEGs to functional biological pathways.
ORA (enrichKEGG()) and GSEA (gseKEGG()) provide complementary insights: ORA tests a pre-selected list of DEGs, whereas GSEA uses the full ranked gene list.
pathview enables visual interpretation of pathway-level expression changes.

GSEA evaluates enrichment across a ranked list of all genes, not just a subset of significant ones.
The fgsea package provides a fast implementation of GSEA suitable for large RNA-seq datasets.
A positive NES indicates enrichment among up-regulated genes, while a negative NES indicates enrichment among down-regulated genes.
plotGseaTable() and plotEnrichment() help visualise how pathways behave across the ranked gene list.
Compared with clusterProfiler's GSEA functions, fgsea focuses on speed and flexibility, while clusterProfiler provides tighter integration with specific databases (e.g., GO, KEGG) and additional plotting helpers.
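
The sign of the enrichment score can be illustrated with a small running-sum sketch. This is a simplified, stdlib-only Python version of the GSEA-style statistic, not the fgsea implementation. The function name enrichment_score and the gene names and scores are made up for illustration. It returns the raw score only; normalisation to NES rescales the score but keeps its sign.

```python
def enrichment_score(ranked_genes, stats, gene_set):
    """Signed maximum deviation of a weighted running sum.

    ranked_genes and stats must be sorted from most up-regulated to most
    down-regulated. Assumes at least one gene inside and one outside the set.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_weight = sum(abs(s) for s, h in zip(stats, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, best = 0.0, 0.0
    for s, h in zip(stats, hits):
        # Step up at a set member (weighted by its score), down otherwise.
        running += abs(s) / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (assumed values), most up-regulated first.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
stats = [3.0, 2.5, 1.0, 0.5, -0.5, -1.0, -2.0, -3.0]

es_up = enrichment_score(genes, stats, {"g1", "g2"})    # set at the top
es_down = enrichment_score(genes, stats, {"g7", "g8"})  # set at the bottom
print(es_up > 0, es_down < 0)
```

A set concentrated at the top of the list pushes the running sum up early, giving a positive score. A set concentrated at the bottom lets the sum fall first, giving a negative score.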

RegEnrich helps identify potential regulatory drivers (e.g. TFs) behind observed gene expression changes.
The package’s built-in TF dataset (data(TFs)) is human-specific and not suitable for mouse RNA-seq analysis.
For mouse data, a mouse-specific TF list (e.g. from TcoF-DB) must be supplied via the reg argument.
A RegenrichSet object requires: an expression matrix, sample metadata, a regulator list, and a design/contrast specification.
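
One way to see why a human regulator list does not suit mouse data is to check how many regulators actually occur among the row identifiers of the expression matrix. The sketch below is plain Python, not RegEnrich code. The helper usable_regulators is hypothetical, and it assumes regulators are matched to expression rows by exact identifier string.

```python
def usable_regulators(expression_gene_ids, regulator_ids):
    """Return regulators that also appear among the expression matrix row IDs."""
    return sorted(set(expression_gene_ids) & set(regulator_ids))

# Toy example (assumed data): a few mouse gene symbols as expression row names.
mouse_expr_ids = ["Trp53", "Stat3", "Myc", "Actb"]

# Human symbols differ from their mouse counterparts (e.g. TP53 vs Trp53).
human_tfs = ["TP53", "STAT3", "MYC"]
mouse_tfs = ["Trp53", "Stat3", "Myc", "Sox2"]

print(usable_regulators(mouse_expr_ids, human_tfs))  # no matches
print(usable_regulators(mouse_expr_ids, mouse_tfs))  # three matches
```

Running a check like this before building the object makes an identifier or species mismatch visible early, instead of surfacing later as an empty or misleading regulator ranking.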

STRINGdb links your genes to protein–protein interaction networks from the STRING database.
Mapping gene IDs (e.g. Entrez IDs) to STRING IDs is a required first step, because the interaction network is indexed by STRING identifiers.
Network visualisation can reveal modules of interconnected DE genes that may not be obvious from lists or tables.
STRING provides its own functional enrichment, which can complement results from clusterProfiler and fgsea.
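
The idea of "modules of interconnected genes" can be sketched with a simple graph traversal. The example below is plain Python and uses connected components as a stand-in for module detection. It is not the clustering method used by STRINGdb, and the function name connected_modules and the edge list are made up for illustration.

```python
from collections import defaultdict

def connected_modules(edges):
    """Group nodes of an undirected interaction network into connected components."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(adjacency[node] - module)
        seen |= module
        modules.append(sorted(module))
    return modules

# Toy interaction pairs (assumed): A-B-C form one module, D-E another.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
modules = connected_modules(toy_edges)
print(modules)
```

In a flat list of DE genes, A, B, C, D and E would appear as five unrelated entries. The network view groups them into two sets of interacting proteins.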

Enrichment methods help translate gene-level changes into biological meaning.
Different tools (ORA, GSEA, network-based methods) answer different but complementary questions.
Combining methods provides stronger and more interpretable biological insights.
Functional enrichment is an essential component of any RNA-seq analysis workflow.