Exploring the dataset


  • The dataset used in this tutorial was generated from stool samples collected in a study investigating non-invasive diagnostic methods for inflammatory bowel disease (IBD).
  • We can use the rpx package to download and explore data from PRIDE and other ProteomeXchange Consortium repositories without leaving R.

Processing data with DIA-NN


  • Raw files from a mass spectrometry-based DIA proteomics experiment can be processed using DIA-NN, which performs peptide identification and protein inference.
  • The DIA-NN workflow has two key steps: generating a predicted spectral library, and processing raw files.
  • The DIA-NN documentation contains detailed information about changing default settings as appropriate for your experiment.

Data cleaning with R


  • Key elements of the proteomics data cleaning workflow include quality filtering, removing contaminants, protein quantification, managing batch effects, and normalisation.
  • We can use the limpa package in R to clean our peptide data and generate a complete, analysis-ready protein matrix with corresponding standard error values.
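
To make two of the cleaning steps above concrete, here is a minimal, language-neutral sketch in plain Python. It is not the limpa workflow: it only illustrates what quality filtering, contaminant removal and a simple median normalisation do to the data. The `Cont_` prefix, the 0.01 q-value cutoff, the field names and the toy values are all assumptions made for illustration, and median centring is just one of several possible normalisation choices.

```python
from statistics import median

def quality_filter(rows, max_q=0.01, contaminant_prefix="Cont_"):
    """Keep rows that pass the q-value cutoff and are not flagged contaminants.

    The cutoff and the prefix are illustrative assumptions.
    """
    return [
        r for r in rows
        if r["q_value"] <= max_q
        and not r["protein"].startswith(contaminant_prefix)
    ]

def median_normalise(log2_by_sample):
    """Shift each sample so that all sample medians coincide.

    log2_by_sample maps sample name -> {protein: log2 intensity};
    a protein that is missing in a sample is simply absent from its dict.
    """
    medians = {s: median(v.values()) for s, v in log2_by_sample.items()}
    target = median(medians.values())
    return {
        s: {p: x - medians[s] + target for p, x in values.items()}
        for s, values in log2_by_sample.items()
    }

# Hypothetical toy data, not taken from the tutorial dataset.
rows = [
    {"protein": "P1", "q_value": 0.001},
    {"protein": "P2", "q_value": 0.200},       # fails the q-value cutoff
    {"protein": "Cont_K1", "q_value": 0.001},  # flagged contaminant
]
kept = quality_filter(rows)

normalised = median_normalise({
    "S1": {"A": 10.0, "B": 12.0, "C": 14.0},
    "S2": {"A": 11.0, "B": 13.0, "C": 15.0},
})
```

After normalisation both toy samples share the same median, so a constant offset between runs no longer looks like a biological difference.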

Differential expression analysis


  • We can fit a linear model to our data and conduct differential expression analysis using limpa and limma functions.
  • There are many different ways to visualise differentially expressed proteins, for example volcano plots and heatmaps.
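
As a rough illustration of what fitting a linear model means for a single protein, the sketch below works through the simplest case, a two-group comparison, in plain Python. It is ordinary least squares only. It does not reproduce limma's empirical Bayes moderation or limpa's handling of missing values and standard errors, and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean

def two_group_fit(control, case):
    """Fit y = b0 + b1 * group for one protein (group is 0 or 1).

    Returns the slope b1 (the log2 fold change when the inputs are
    log2 intensities), its t-statistic, and the residual degrees of freedom.
    """
    n0, n1 = len(control), len(case)
    m0, m1 = mean(control), mean(case)
    log_fc = m1 - m0  # OLS slope for a 0/1 covariate is the mean difference
    rss = sum((y - m0) ** 2 for y in control) + sum((y - m1) ** 2 for y in case)
    df = n0 + n1 - 2
    se = sqrt(rss / df * (1 / n0 + 1 / n1))
    return log_fc, log_fc / se, df

# Hypothetical log2 intensities for one protein in two conditions.
log_fc, t_stat, df = two_group_fit([10.0, 11.0, 12.0], [13.0, 14.0, 15.0])
```

Running the same fit for every protein gives one fold change and one test statistic per protein, which is the information that plots of differential expression results are built from.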

Network and enrichment analysis


  • Differential expression analysis identifies statistically significant changes in protein abundance between conditions, but does not by itself explain the biological significance of those proteins.
  • Network analysis (STRING) can reveal physical or functional interactions among differentially expressed (DE) proteins.
  • Enrichment analysis (GO/KEGG) can link those proteins to known biological processes or pathways.
  • Combining multiple approaches helps you interpret proteomics data and assess how reliable and consistent the results are.
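
The idea behind enrichment analysis can be sketched with a one-sided hypergeometric test, a common formulation of over-representation analysis. The plain-Python example below is an illustration of that statistic, not the implementation used by any particular GO or KEGG tool; the function name and the toy counts are hypothetical, and real analyses also correct for testing many terms at once.

```python
from math import comb

def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: proteins that could have been detected
    n_annotated:  background proteins annotated to the term or pathway
    n_selected:   differentially expressed proteins
    n_overlap:    selected proteins that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_selected)

# Toy example: 10 background proteins, 3 annotated to a pathway,
# 3 selected as differentially expressed, and all 3 are annotated.
p = enrichment_p_value(10, 3, 3, 3)  # 1/120, about 0.0083
```

A small p-value means the selected proteins carry the annotation more often than a random draw of the same size from the background would.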