KEGG enrichment analysis with clusterProfiler

Last updated on 2025-12-02

Overview

Questions

  • How can we perform pathway analysis using KEGG?
  • What insights can KEGG enrichment provide about differentially expressed genes?

Objectives

  • Learn how to run KEGG over-representation and GSEA-style analysis in R.
  • Understand how to interpret pathway-level results.
  • Generate and visualise KEGG pathway figures.

Introduction


The KEGG (Kyoto Encyclopedia of Genes and Genomes) database links genes to curated biological pathways, which makes it a useful foundation for interpreting cellular functions at a systems level. clusterProfiler allows us to access KEGG and apply both over-representation analysis (ORA, using the enrichKEGG function) and gene set enrichment analysis (GSEA, using the gseKEGG function) to extract pathway-level insights from our RNA-seq data.

KEGG analysis


Before running enrichment, we need to confirm the correct KEGG organism code for mouse (mmu). We can verify it with search_kegg_organism():

R

kegg_organism <- "mmu"

search_kegg_organism(kegg_organism, by='kegg_code')

OUTPUT

     kegg_code               scientific_name                   common_name
29        mmur            Microcebus murinus              gray mouse lemur
34         mmu                  Mus musculus                   house mouse
9090      mmuc Mycolicibacterium mucogenicum Mycolicibacterium mucogenicum
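
The search matches partial codes, so more than one organism can come back, as in the output above. The following is a minimal sketch of narrowing the hits to an exact match, assuming the result behaves like an ordinary data frame. The `hits` table is typed in by hand from the output above rather than fetched from KEGG.

```r
# Hand-copied version of the search result shown above (not a live KEGG query).
hits <- data.frame(
  kegg_code       = c("mmur", "mmu", "mmuc"),
  scientific_name = c("Microcebus murinus", "Mus musculus",
                      "Mycolicibacterium mucogenicum"),
  common_name     = c("gray mouse lemur", "house mouse",
                      "Mycolicibacterium mucogenicum"),
  stringsAsFactors = FALSE
)

# Keep only the row whose code matches exactly.
exact_hit <- hits[hits$kegg_code == "mmu", ]
print(exact_hit$scientific_name)
```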

Over-representation analysis with enrichKEGG


To run ORA using the KEGG database, we need to specify the gene list, the KEGG organism code and a p-value cut-off. In this example, we take the top 500 genes from the ranked gene list debasal_genelist, specify the organism code mmu (stored in kegg_organism) and use 0.05 as the p-value cut-off.

We can use the head() function to briefly inspect the results of enrichKEGG.

R

kk <- enrichKEGG(gene         = names(debasal_genelist)[1:500],
                 organism     = kegg_organism,
                 pvalueCutoff = 0.05)

OUTPUT

Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...

OUTPUT

Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...

R

head(kk)

OUTPUT

                                     category
mmu04110                   Cellular Processes
mmu04060 Environmental Information Processing
mmu05323                       Human Diseases
mmu04061 Environmental Information Processing
mmu04062                   Organismal Systems
mmu04914                   Organismal Systems
                                 subcategory       ID
mmu04110               Cell growth and death mmu04110
mmu04060 Signaling molecules and interaction mmu04060
mmu05323                      Immune disease mmu05323
mmu04061 Signaling molecules and interaction mmu04061
mmu04062                       Immune system mmu04062
mmu04914                    Endocrine system mmu04914
                                                           Description
mmu04110                                                    Cell cycle
mmu04060                        Cytokine-cytokine receptor interaction
mmu05323                                          Rheumatoid arthritis
mmu04061 Viral protein interaction with cytokine and cytokine receptor
mmu04062                                   Chemokine signaling pathway
mmu04914                       Progesterone-mediated oocyte maturation
         GeneRatio   BgRatio RichFactor FoldEnrichment   zScore       pvalue
mmu04110    19/247 157/10644 0.12101911       5.215091 8.200826 3.563172e-09
mmu04060    24/247 294/10644 0.08163265       3.517806 6.747644 8.088296e-08
mmu05323    13/247  87/10644 0.14942529       6.439201 7.851470 9.190595e-08
mmu04061    12/247  95/10644 0.12631579       5.443341 6.704900 1.853385e-06
mmu04062    16/247 194/10644 0.08247423       3.554072 5.533530 1.118165e-05
mmu04914    10/247  93/10644 0.10752688       4.633669 5.424584 5.627105e-05
             p.adjust       qvalue
mmu04110 9.976881e-07 8.026513e-07
mmu04060 8.577889e-06 6.901008e-06
mmu05323 8.577889e-06 6.901008e-06
mmu04061 1.297369e-04 1.043748e-04
mmu04062 6.261723e-04 5.037627e-04
mmu04914 2.468240e-03 1.985727e-03
                                                                                                                                                     geneID
mmu04110                               20877/434175/12235/77011/12236/76464/17218/12534/71988/268697/12428/17216/67849/17215/18817/17219/67052/105988/12532
mmu04060 12978/16878/77125/20311/29820/20308/20297/20305/12977/21948/17082/16182/232983/21942/18829/21926/20310/20309/16181/330122/14563/20296/12985/230405
mmu05323                                                                    110935/20311/20297/12977/14960/21926/14961/15001/68775/20310/330122/22339/20296
mmu04061                                                                           12978/20311/20308/20297/20305/12977/16182/18829/21926/20310/330122/20296
mmu04062                                                  22324/20311/20308/20297/20305/15162/18829/18751/432530/20310/20309/94176/330122/11513/18796/20296
mmu04914                                                                                    434175/12235/110033/12534/268697/432530/12428/18817/11513/12532
         Count
mmu04110    19
mmu04060    24
mmu05323    13
mmu04061    12
mmu04062    16
mmu04914    10
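
To see where the ratio columns come from, the sketch below recomputes them for the Cell cycle row (mmu04110) from its GeneRatio (19/247) and BgRatio (157/10644). It assumes the p-value is a one-sided hypergeometric test, which is the usual definition of ORA. The variable names are chosen here for illustration. Note that the denominator is 247 rather than 500, which suggests that only genes with a KEGG annotation are counted.

```r
# Counts read from the mmu04110 (Cell cycle) row above.
k <- 19      # genes from our list annotated to the pathway
n <- 247     # genes from our list with any KEGG annotation
M <- 157     # background genes annotated to the pathway
N <- 10644   # background genes with any KEGG annotation

gene_ratio      <- k / n
bg_ratio        <- M / N
rich_factor     <- k / M
fold_enrichment <- gene_ratio / bg_ratio

# One-sided hypergeometric test: probability of drawing k or more
# pathway genes when sampling n genes from the background.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(c(RichFactor = rich_factor, FoldEnrichment = fold_enrichment))
print(p_value)
```

The printed RichFactor and FoldEnrichment can be compared with the corresponding columns in the table, and the p-value with the pvalue column.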

GSEA-style KEGG enrichment with gseKEGG


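As in the earlier enrichment analysis with the GO database, we can also perform a GSEA-style enrichment using the KEGG database. To do so, we use gseKEGG and supply the entire ranked gene list (debasal_genelist) rather than a subset chosen by an arbitrary cut-off. In this example, we test KEGG pathways containing between 3 and 800 genes, using 10,000 permutations and NCBI Gene IDs. Results are filtered using a p-value cut-off of 0.05. Because pAdjustMethod is set to "none", no multiple-testing correction is applied, so the p.adjust column equals the raw p-value. The warnings in the output indicate that the nPerm argument is no longer recommended.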

R

kk2 <- gseKEGG(geneList     = debasal_genelist,
               organism     = kegg_organism,
               nPerm        = 10000,
               minGSSize    = 3,
               maxGSSize    = 800,
               pvalueCutoff = 0.05,
               pAdjustMethod = "none",
               keyType       = "ncbi-geneid")

OUTPUT

Reading KEGG annotation online: "https://rest.kegg.jp/conv/ncbi-geneid/mmu"...

OUTPUT

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

OUTPUT

preparing geneSet collections...

OUTPUT

GSEA analysis...

WARNING

Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
minGSSize, : We do not recommend using nPerm parameter incurrent and future
releases

WARNING

Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
= minGSSize, : You are trying to run fgseaSimple. It is recommended to use
fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
in the fgsea function call.

WARNING

Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.98% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.

OUTPUT

leading edge analysis...

OUTPUT

done...
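
gseKEGG expects geneList to be a named numeric vector sorted in decreasing order, and the warning above notes that tied values are ordered arbitrarily. The sketch below shows one way to build such a vector and to gauge how many values are tied. The `de_table` data frame, its column names and its values are made up for illustration, and the tie percentage is not necessarily computed the same way as in fgsea.

```r
# Hypothetical differential expression results; IDs and values are made up.
de_table <- data.frame(
  entrez_id = c("11111", "22222", "33333", "44444", "55555"),
  log2fc    = c(0.4, -1.2, 2.5, 0.4, -0.3),
  stringsAsFactors = FALSE
)

# Named numeric vector, sorted from most up- to most down-regulated.
ranked <- setNames(de_table$log2fc, de_table$entrez_id)
ranked <- sort(ranked, decreasing = TRUE)

# Share of genes whose statistic is tied with at least one other gene.
tied <- duplicated(ranked) | duplicated(ranked, fromLast = TRUE)
tie_percent <- 100 * mean(tied)

print(ranked)
print(tie_percent)
```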

Visualising enriched pathways


Dotplot

Before we look at individual pathways in detail, we can visualise the overall enrichment results using dotplot().
This dotplot summarises which KEGG pathways are enriched, how many genes contribute to each pathway, and how significant each one is.

R

dotplot(kk2, showCategory = 10, title = "Enriched Pathways" , split=".sign") + facet_grid(.~.sign)
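
The split = ".sign" argument separates pathways by the direction of their normalised enrichment score (NES). The sketch below mimics that grouping in base R, assuming positive NES values are labelled "activated" and negative ones "suppressed". The pathway names and NES values are made up.

```r
# Made-up NES values for illustration; real values come from the gseKEGG result.
gsea_table <- data.frame(
  Description = c("Pathway A", "Pathway B", "Pathway C"),
  NES         = c(2.1, -1.8, 1.6),
  stringsAsFactors = FALSE
)

# Label each pathway by the sign of its normalised enrichment score.
gsea_table$direction <- ifelse(gsea_table$NES > 0, "activated", "suppressed")

# Number of pathways per direction, i.e. per facet of the dotplot.
print(table(gsea_table$direction))
```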

Similarity-based network plots

Next, we can explore how the enriched pathways relate to one another.
The enrichment map groups pathways that share many genes, helping us see broader biological themes rather than isolated pathways. The pairwise_termsim() function calculates the similarity between enriched KEGG pathways and returns a similarity matrix that quantifies their relationships. emapplot() then uses this matrix to draw the enriched pathways as a network, with nodes representing pathways and edges reflecting their similarity.

R

kk3 <- pairwise_termsim(kk2)

emapplot(kk3)

WARNING

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggtangle package.
  Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
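
To make "similarity" concrete, the sketch below computes a Jaccard index (shared genes divided by all genes in either set) for two pathways, assuming this is the measure pairwise_termsim() uses by default. For brevity it uses the geneID strings of mmu04060 and mmu04061 from the enrichKEGG output, whereas pairwise_termsim(kk2) works on the gseKEGG result. The helper names are chosen here for illustration.

```r
# geneID strings copied from the enrichKEGG output earlier in this episode.
cytokine_receptor <- "12978/16878/77125/20311/29820/20308/20297/20305/12977/21948/17082/16182/232983/21942/18829/21926/20310/20309/16181/330122/14563/20296/12985/230405"
viral_cytokine    <- "12978/20311/20308/20297/20305/12977/16182/18829/21926/20310/330122/20296"

# Split a "/"-separated ID string into a character vector.
split_ids <- function(x) strsplit(x, "/", fixed = TRUE)[[1]]

# Jaccard index: size of the intersection over size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

sim <- jaccard(split_ids(cytokine_receptor), split_ids(viral_cytokine))
print(sim)
```

A high value means the two pathways are supported by largely the same genes, which is why they would sit close together in the enrichment map.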

We can also use cnetplot() to understand which genes drive these enriched pathways. This plot links genes to pathways they belong to and highlights genes that appear in multiple pathways.

R

cnetplot(kk3, categorySize="pvalue")

WARNING

Warning: ggrepel: 160 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
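
The genes that connect several pathways in such a network can also be listed directly. The sketch below counts pathway membership per gene for three rows of the enrichKEGG output, used here only as a small worked example. The variable names are chosen for illustration.

```r
# geneID strings copied from three rows of the enrichKEGG output above.
pathway_genes <- list(
  mmu05323 = "110935/20311/20297/12977/14960/21926/14961/15001/68775/20310/330122/22339/20296",
  mmu04061 = "12978/20311/20308/20297/20305/12977/16182/18829/21926/20310/330122/20296",
  mmu04914 = "434175/12235/110033/12534/268697/432530/12428/18817/11513/12532"
)
gene_sets <- lapply(pathway_genes,
                    function(x) strsplit(x, "/", fixed = TRUE)[[1]])

# Count in how many of the pathways each gene appears.
membership <- table(unlist(gene_sets))

# Genes found in more than one pathway.
shared <- names(membership[membership > 1])
print(shared)
```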

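Ridge plot

We can also compare pathways with ridgeplot(). It draws overlapping density curves of the ranking values (for example fold changes) of each pathway's core enriched genes, which shows whether a pathway's genes sit mostly at the up- or down-regulated end of the ranked gene list. ridgeplot() depends on the ggridges package; if it is not installed, the call stops with the error shown below.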

R

ridgeplot(kk3) + labs(x = "enrichment distribution")

ERROR

Error in `ridgeplot.gseaResult()` at enrichplot/R/ridgeplot.R:6:15:
! The package "ggridges" is required for `ridgeplot()`.
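
The error above means that ggridges is not installed. The sketch below shows one way to check for required packages before plotting. `missing_packages` is a hypothetical helper written for this example, not part of enrichplot.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggridges"))
if (length(to_install) > 0) {
  message("Install before calling ridgeplot(): ",
          paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```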

R

head(kk3)

OUTPUT

               ID                             Description setSize
mmu05171 mmu05171          Coronavirus disease - COVID-19     216
mmu03010 mmu03010                                Ribosome     188
mmu04060 mmu04060  Cytokine-cytokine receptor interaction     177
mmu04110 mmu04110                              Cell cycle     153
mmu04530 mmu04530                          Tight junction     147
mmu04080 mmu04080 Neuroactive ligand-receptor interaction     146
         enrichmentScore      NES       pvalue     p.adjust      qvalue rank
mmu05171       0.5006706 1.946263 0.0001153802 0.0001153802 0.003343085 3724
mmu03010       0.5814136 2.226791 0.0001177302 0.0001177302 0.003343085 4733
mmu04060       0.5334229 2.030917 0.0001182313 0.0001182313 0.003343085 2003
mmu04110       0.5682774 2.130646 0.0001213298 0.0001213298 0.003343085 1287
mmu04530       0.4668123 1.743425 0.0001218918 0.0001218918 0.003343085 2221
mmu04080       0.4495919 1.678626 0.0001219066 0.0001219066 0.003343085 2287
                           leading_edge
mmu05171 tags=59%, list=24%, signal=46%
mmu03010 tags=67%, list=30%, signal=48%
mmu04060 tags=36%, list=13%, signal=32%
mmu04110  tags=22%, list=8%, signal=21%
mmu04530 tags=18%, list=14%, signal=16%
mmu04080 tags=36%, list=14%, signal=31%
core_enrichment
mmu05171 12266/12262/12260/12259/666501/21926/18751/12268/13058/15200/20296/12985/24088/16176/664969/50908/20344/317677/14962/17174/16785/56040/269261/667277/625018/20084/99571/19982/68436/20963/225215/22186/50528/78294/619883/16451/67186/67097/26419/20085/67671/16193/671641/20055/19951/11837/100503670/20115/27367/243302/100040416/20116/54217/27370/11421/50909/621697/100042335/76808/629595/20103/270106/268449/20088/19896/67025/68052/20090/75617/432725/20054/27050/54127/26961/67115/67891/67945/114641/22121/19946/20091/19899/20042/66489/100039532/100040298/100502825/16194/67427/66480/66481/15945/65019/19921/100043695/20068/432502/19988/19933/76846/21898/267019/665562/20102/20044/27207/100043813/670832/19981/19942/71586/19941/57294/66475/19944/66483/27176/57808/16898/22371/625281/20848/19934/110954/433745/12263/68193
mmu03010  666501/664969/16785/56040/269261/20084/56282/19982/66973/68436/225215/22186/78294/619883/67186/67097/20085/67671/671641/20055/19951/11837/100503670/20115/14694/68836/27367/243302/100040416/20116/54217/27370/621697/100042335/76808/629595/20103/270106/268449/20088/19896/67025/68052/20090/75617/432725/69163/20054/27050/54127/26961/67115/67891/67945/114641/22121/19946/20091/19899/20042/66489/59054/100039532/100040298/100502825/67427/60441/66480/66481/65019/19921/100043695/27397/20068/432502/118451/19988/19933/76846/267019/665562/79044/20102/20044/27207/100043813/78523/670832/19981/19942/66230/19941/57294/66475/19944/94063/66483/27176/57808/16898/625281/66258/19934/110954/433745/28028/68193/75398/67281/619547/319195/50529/26451/14109/19989/20104/64657/64655/68028/66407/20005/94065/216767/67308/19943/100043805
mmu04060                                                                                                                                                                                                                                                                                                                                                                                                                                          12978/16878/77125/20311/29820/20308/20297/20305/12977/21948/17082/16182/232983/21942/18829/21926/20310/20309/16181/330122/14563/20296/12985/230405/93672/20304/16176/12984/16153/14560/83430/16847/215257/20306/16994/16154/16164/16156/20303/16169/110075/12983/20292/16185/326623/21938/17480/19116/16190/20300/14825/16323/16175/320100/21939/12156/21943/18049/12162/245527/69583/20315/16193/13608
mmu04110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  20877/434175/12235/77011/12236/76464/17218/12534/71988/268697/12428/17216/67849/17215/18817/17219/67052/105988/12532/107995/72415/22137/13555/12649/69716/12544/12442/67177/56150/12571/13557/12443/17127/27214
mmu
mmu04080                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       12310/22044/12266/223780/19204/216643/15558/207911/14419/15559/381073/231602/13614/18619/65086/54140/12062/16336/17200/11555/11549/16847/239845/11535/53623/67405/20287/109648/20607/18441/18389/170483/18436/19116/11541/11550/11606/13618/21333/15552/193034/15465/12671/16995/11539/227717/18442/110637/381677/14062/14658/171530/11553
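
The leading_edge column can be related to the core_enrichment column. The sketch below assumes that "tags" is the percentage of a gene set's members that fall in the leading edge, and checks this for mmu04110 (Cell cycle), whose setSize is 153.

```r
# core_enrichment string for mmu04110 (Cell cycle), copied from the output above.
core <- "20877/434175/12235/77011/12236/76464/17218/12534/71988/268697/12428/17216/67849/17215/18817/17219/67052/105988/12532/107995/72415/22137/13555/12649/69716/12544/12442/67177/56150/12571/13557/12443/17127/27214"
set_size <- 153  # setSize column for mmu04110

# Leading-edge genes are stored as a "/"-separated string.
core_genes <- strsplit(core, "/", fixed = TRUE)[[1]]

# Percentage of the gene set that lies in the leading edge.
tags_percent <- round(100 * length(core_genes) / set_size)

print(length(core_genes))
print(tags_percent)
```

The result can be compared with the tags value reported for mmu04110 in the leading_edge column.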

The table lists the top pathways. The ID of the top pathway can be taken from the first entry of the ID column.

R

# Take the ID of the first pathway in the result table.
# Reading the @result slot directly works; as.data.frame(kk3)$ID[1] is an alternative accessor.
kk3@result$ID[1]

OUTPUT

[1] "mmu05171"

KEGG Pathway Diagram

Finally, we can visualise gene expression changes directly onto a KEGG pathway diagram.
pathview colours each component of the pathway according to the values supplied in gene.data, showing which genes are up- or down-regulated.

R

# Produce the native KEGG plot (PNG)
mmu_pathway <- pathview(gene.data=debasal_genelist, pathway.id=kk3@result$ID[1], species = kegg_organism)

This will produce the following files in your working directory:

mmu05171.xml mmu05171.pathview.png mmu05171.png
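
A quick way to confirm that the files were written is sketched below. `expected_pathview_files` is a hypothetical helper that builds the three file names listed above from a pathway ID. It assumes the default naming seen in this example.

```r
# Hypothetical helper: file names pathview wrote for a pathway in this example.
expected_pathview_files <- function(pathway_id) {
  paste0(pathway_id, c(".xml", ".png", ".pathview.png"))
}

files <- expected_pathview_files("mmu05171")

# Report which of the expected files exist in the working directory.
print(data.frame(file = files, exists = file.exists(files)))
```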

Figure: KEGG pathway image produced by pathview.

Key Points
  • KEGG pathway analysis helps link differentially expressed genes (DEGs) to functional biological pathways.
  • ORA (enrichKEGG) and GSEA-style (gseKEGG) methods provide complementary insights.
  • pathview enables visual interpretation of pathway-level expression changes.