Exploring the dataset
- The dataset used in this tutorial includes stool samples from a study investigating non-invasive diagnostic methods for Inflammatory Bowel Disease.
- We can use the `rpx` package to download and explore data from PRIDE or the ProteomeXchange Consortium without leaving R.
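
As a reminder of what this looks like in practice, here is a minimal sketch of browsing a submission with `rpx`. The accession `PXD000001` is only a placeholder and is not the dataset used in this tutorial; substitute the accession of the study you are interested in. The calls are guarded so that the repository is only contacted in an interactive session where `rpx` is installed.

```r
# Sketch: browsing a ProteomeXchange submission with rpx (Bioconductor).
# "PXD000001" is a placeholder accession, NOT this tutorial's dataset.
accession <- "PXD000001"

# Only contact the repository in an interactive session with rpx installed.
if (interactive() && requireNamespace("rpx", quietly = TRUE)) {
  px <- rpx::PXDataset(accession)  # fetch the submission's metadata
  print(rpx::pxtax(px))            # species the samples come from
  print(rpx::pxref(px))            # associated publication(s)
  files <- rpx::pxfiles(px)        # names of the deposited files
  print(head(files))
  # rpx::pxget(px, files[1])       # uncomment to download the first file
}
```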
Processing data with DIA-NN
- Raw files from a mass spectrometry-based DIA proteomics experiment can be processed using DIA-NN, which performs peptide identification and protein inference.
- The DIA-NN workflow has two key steps: generating a predicted spectral library, and processing raw files.
- The DIA-NN documentation contains detailed information about changing default settings as appropriate for your experiment.
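
The two steps can also be driven from the command line. The sketch below only assembles and prints the two commands from R; it does not run DIA-NN. All file names and the executable path are hypothetical, the settings shown are illustrative rather than recommended, and the exact name of the library file that DIA-NN writes, as well as the report format, may differ between versions, so check the DIA-NN log and documentation before relying on them.

```r
# Sketch: assembling the two DIA-NN command lines from R.
# File names and the executable path are hypothetical.
diann <- "diann.exe"  # assumption: DIA-NN executable is on the PATH

# Step 1: generate a predicted spectral library from a protein FASTA file.
library_args <- c(
  "--fasta", "proteome.fasta",
  "--fasta-search",   # digest the FASTA in silico
  "--predictor",      # predict spectra and retention times
  "--gen-spec-lib",
  "--out-lib", "predicted_library.tsv"
)

# Step 2: process the raw files against that library.
raw_files <- c("sample_01.raw", "sample_02.raw")
search_args <- c(
  as.vector(rbind("--f", raw_files)),  # one --f flag per raw file
  # Assumed output name of step 1; confirm it in the DIA-NN log.
  "--lib", "predicted_library.predicted.speclib",
  "--fasta", "proteome.fasta",
  "--qvalue", "0.01",
  "--threads", "4",
  "--out", "report.tsv"
)

cat(diann, library_args, "\n")
cat(diann, search_args, "\n")
# To actually run the steps:
# system2(diann, library_args); system2(diann, search_args)
```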
Data cleaning with R
- Key elements of the proteomics data cleaning workflow include: quality filtering, removing contaminants, protein quantification, managing batch effects, and normalisation.
- We can use the `limpa` package in R to clean our peptide data and generate a complete, analysis-ready protein matrix with corresponding standard error values.
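
To make the individual cleaning steps concrete, here is a base R sketch on a made-up table. It is not the `limpa` workflow: the column names mimic a DIA-NN report, but the values, the `Cont_` contaminant prefix, and the median-based protein summary are assumptions chosen for illustration, and batch-effect handling is left out.

```r
# Toy precursor-level table (values and "Cont_" prefix are made up).
report <- data.frame(
  Protein.Group = c("P1", "P1", "P2", "Cont_KRT1", "P3", "P3"),
  Run           = c("s1", "s2", "s1", "s1", "s1", "s2"),
  Quantity      = c(1000, 2000, 500, 9000, 800, 1600),
  Q.Value       = c(0.001, 0.002, 0.2, 0.001, 0.005, 0.004)
)

# 1. Quality filtering: keep identifications with q-value <= 0.01.
clean <- report[report$Q.Value <= 0.01, ]

# 2. Remove contaminants (flagged here by a hypothetical name prefix).
clean <- clean[!startsWith(clean$Protein.Group, "Cont_"), ]

# 3. Log2-transform the intensities.
clean$log2q <- log2(clean$Quantity)

# 4. Median normalisation: shift every run to a common median.
run_median <- tapply(clean$log2q, clean$Run, median)
clean$log2norm <- clean$log2q -
  as.numeric(run_median[clean$Run]) + mean(run_median)

# 5. Crude protein summary: median of precursors per protein and run.
protein <- aggregate(log2norm ~ Protein.Group + Run, data = clean, FUN = median)
print(protein)
```

Unlike this sketch, which simply drops or ignores missing values, `limpa` is described above as producing a complete protein matrix together with standard errors.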
Differential expression analysis
- We can fit a linear model to our data and conduct differential expression analysis using `limpa` and `limma` functions.
- There are many different ways to visualise differentially expressed proteins.
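
The model-fitting step can be sketched with plain `limma` on simulated data. The matrix, the group labels, and the size of the spiked-in change are all invented for illustration, and this version does not use the standard errors that `limpa` provides, so treat it as an outline of the idea rather than the tutorial's exact analysis.

```r
library(limma)

set.seed(1)
# Simulated log2 protein matrix: 50 proteins x 8 samples (hypothetical).
group <- factor(rep(c("control", "IBD"), each = 4),
                levels = c("control", "IBD"))
y <- matrix(rnorm(50 * 8, mean = 20, sd = 0.3), nrow = 50,
            dimnames = list(paste0("P", 1:50), paste0("s", 1:8)))
# Spike in one true change: P1 is higher in the IBD samples.
y["P1", group == "IBD"] <- y["P1", group == "IBD"] + 3

design <- model.matrix(~ group)   # intercept + IBD-vs-control effect
fit <- eBayes(lmFit(y, design))   # linear model + moderated t-statistics
top <- topTable(fit, coef = "groupIBD", number = 5)
print(top)

# A volcano plot is one common way to visualise the result.
volcanoplot(fit, coef = "groupIBD", highlight = 3, names = rownames(y))
```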
Network and enrichment analysis
- Differential expression analysis identifies statistically significant changes in protein abundance between conditions, but it does not by itself tell us what those proteins mean biologically.
- Network analysis (STRING) can reveal physical or functional interactions among differentially expressed (DE) proteins.
- Enrichment analysis (GO/KEGG) can link those proteins to known biological processes or pathways.
- Using multiple approaches can help you interpret proteomics data and assess the reliability and consistency of your results.
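
The statistic behind a typical over-representation (enrichment) test can be sketched in base R. The counts below are hypothetical; in a real analysis they would come from your DE results and the GO or KEGG annotation of your background proteome.

```r
# Hypothetical counts: 2000 quantified proteins form the background,
# 100 of them are annotated to a pathway, and 15 of our 60 DE proteins
# fall in that pathway.
background <- 2000
in_pathway <- 100
de_total   <- 60
de_in_path <- 15

# Number of pathway proteins expected among the DE set by chance alone.
expected <- de_total * in_pathway / background

# Hypergeometric upper tail: P(X >= de_in_path) for a random draw of
# de_total proteins from the background.
p_hyper <- phyper(de_in_path - 1, in_pathway, background - in_pathway,
                  de_total, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on a 2x2 table.
tab <- matrix(c(de_in_path,              de_total - de_in_path,
                in_pathway - de_in_path,
                background - in_pathway - (de_total - de_in_path)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

cat("expected:", expected, " observed:", de_in_path, "\n")
cat("p (hypergeometric):", p_hyper, " p (Fisher):", p_fisher, "\n")
```

Because many terms or pathways are usually tested at once, the resulting p-values should be corrected for multiple testing, for example with `p.adjust(p_values, method = "BH")`.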