Introduction


Gene Ontology testing with clusterProfiler


  • GO terms are divided into Biological Process (BP), Molecular Function (MF) and Cellular Component (CC), which can be analysed separately or together depending on the biological question.
  • The enrichGO() and gseGO() functions in clusterProfiler perform over-representation analysis (ORA) and gene set enrichment analysis (GSEA), respectively, against GO annotations.
  • GO testing results highlight GO terms that are over-represented (ORA) or enriched (GSEA) in your dataset, which helps interpret what sets of up-regulated or down-regulated genes have in common.
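
As a rough illustration of what ORA tests for a single GO term, the sketch below computes an upper-tail hypergeometric p-value. It is written in Python purely to show the statistic; clusterProfiler is an R package, and this is not its implementation. The function name `ora_p_value` and all numbers are hypothetical, and a real analysis would also correct for multiple testing across terms.

```python
from math import comb


def ora_p_value(universe_size, term_size, selected_size, overlap):
    """Probability of seeing at least `overlap` term genes among the
    selected genes if the selection were random (hypergeometric upper tail).

    universe_size: number of genes in the background
    term_size:     background genes annotated to the term
    selected_size: genes in the list of interest (e.g. significant genes)
    overlap:       selected genes that are annotated to the term
    """
    max_overlap = min(term_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (made up): 20 background genes, 5 annotated to the term,
# 4 genes of interest, 3 of which carry the annotation.
p = ora_p_value(universe_size=20, term_size=5, selected_size=4, overlap=3)
print(f"ORA p-value: {p:.4f}")
```

The choice of background (`universe_size`) strongly affects the result, which is why the set of genes treated as "testable" should be chosen deliberately.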

KEGG enrichment analysis with clusterProfiler


  • KEGG pathway analysis helps link differentially expressed genes (DEGs) to functional biological pathways.

  • Both ORA (enrichKEGG) and GSEA-style (gseKEGG) methods provide complementary insights.

  • pathview enables visual interpretation of pathway-level expression changes.

Gene set enrichment analysis with fgsea


  • GSEA evaluates enrichment across a ranked list of all genes, not just a subset of significant ones.
  • The fgsea package provides a fast implementation of GSEA suitable for large RNA-seq datasets.
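  • A positive normalised enrichment score (NES) indicates enrichment among up-regulated genes, while a negative NES indicates enrichment among down-regulated genes, assuming the genes are ranked by a signed statistic such as log2 fold change.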
  • plotGseaTable() and plotEnrichment() help visualise how pathways behave across the ranked gene list.
  • Compared with clusterProfiler's GSEA functions, fgsea focuses on speed and flexibility, while clusterProfiler provides tighter integration with specific databases (e.g., GO, KEGG) and additional plotting helpers.
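
To make the sign of the enrichment score concrete, here is a minimal Python sketch of the classic GSEA running-sum statistic. It is an illustration only: fgsea is an R package with its own, faster algorithm, and this sketch returns the raw enrichment score (ES), not the NES, because it skips the normalisation against a null distribution. The function name `enrichment_score` and the toy gene names and statistics are hypothetical.

```python
def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Walk down a ranked gene list and return the running-sum value
    with the largest absolute deviation from zero.

    ranked_genes: gene names sorted from most up- to most down-regulated
    stats:        ranking statistic for each gene, in the same order
    gene_set:     set of gene names belonging to the pathway
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [
        abs(stat) ** weight if hit else 0.0 for stat, hit in zip(stats, in_set)
    ]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for hit_weight, hit in zip(hit_weights, in_set):
        # Step up at pathway genes (weighted by their statistic),
        # step down by a constant amount at all other genes.
        running += hit_weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (made up), ordered by a signed statistic.
genes = ["a", "b", "c", "d", "e", "f"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(genes, scores, {"a", "b"})     # set at the top
es_bottom = enrichment_score(genes, scores, {"e", "f"})  # set at the bottom
print(es_top, es_bottom)
```

A gene set concentrated at the top of the list yields a positive score and one concentrated at the bottom yields a negative score, which is the pattern the NES sign summarises.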

Analysis with RegEnrich


  • RegEnrich helps identify potential regulatory drivers, such as transcription factors (TFs), behind observed gene expression changes.
  • The package’s built-in TF dataset (data(TFs)) is human-specific and not suitable for mouse RNA-seq analysis.
  • For mouse data, a mouse-specific TF list (e.g. from TcoF-DB) must be supplied via the reg argument.
  • A RegenrichSet object requires: an expression matrix, sample metadata, a regulator list, and a design/contrast specification.

Interaction networks with STRINGdb


  • STRINGdb links your genes to protein–protein interaction networks from the STRING database.

  • Mapping from gene IDs (e.g. ENTREZ) to STRING IDs is a crucial first step.

  • Network visualisation can reveal modules of interconnected differentially expressed genes that may not be obvious from lists or tables.

  • STRING provides its own functional enrichment, which can complement results from clusterProfiler and fgsea.
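
The two ideas above, mapping identifiers first and then looking for connected modules, can be sketched in a few lines of Python. This is a conceptual illustration, not the STRINGdb R API: the helper names `map_ids` and `connected_modules`, the identifiers, the edges and the scores are all made up, and the cut-off of 400 on a 0 to 1000 score scale is an assumption chosen for the example.

```python
from collections import defaultdict


def map_ids(gene_ids, id_map):
    """Split gene IDs into those with a network ID and those without."""
    mapped = {g: id_map[g] for g in gene_ids if g in id_map}
    unmapped = [g for g in gene_ids if g not in id_map]
    return mapped, unmapped


def connected_modules(nodes, edges, min_score=400):
    """Group nodes into connected components, keeping only edges whose
    combined score is at least `min_score`. Largest module comes first."""
    node_set = set(nodes)
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a in node_set and b in node_set:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    modules = []
    for start in nodes:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        modules.append(component)
    return sorted(modules, key=len, reverse=True)


# Toy inputs (all hypothetical): gene IDs, an ID lookup, scored edges.
de_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_id_map = {"g1": "p1", "g2": "p2", "g3": "p3", "g4": "p4"}
toy_edges = [("p1", "p2", 900), ("p2", "p3", 700), ("p3", "p4", 150)]

mapped, unmapped = map_ids(de_genes, toy_id_map)
modules = connected_modules(sorted(set(mapped.values())), toy_edges)
print("unmapped genes:", unmapped)
print("modules:", modules)
```

Reporting the unmapped genes explicitly is deliberate: genes that fail the mapping step silently drop out of every downstream network plot and enrichment result.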

Conclusion


  • Enrichment methods help translate gene-level changes into biological meaning.
  • Different tools (ORA, GSEA, network-based methods) answer different but complementary questions.
  • Combining methods provides stronger and more interpretable biological insights.
  • Functional enrichment is an essential component of any RNA-seq analysis workflow.