Introduction to Proteomics: Key Points

Exploring the dataset

The dataset used in this tutorial includes stool samples from a study investigating non-invasive diagnostic methods for Inflammatory Bowel Disease.
We can use the rpx package to download and explore data from ProteomeXchange Consortium repositories, such as PRIDE, without leaving R.

Raw files from a mass spectrometry-based DIA proteomics experiment can be processed using DIA-NN, which performs peptide identification and protein inference.
The DIA-NN workflow has two key steps: generating a predicted spectral library, and processing raw files.
The DIA-NN documentation contains detailed information about changing default settings as appropriate for your experiment.

Key elements of the proteomics data cleaning workflow include: quality filtering, removing contaminants, protein quantification, managing batch effects, and normalisation.
We can use the limpa package in R to clean our peptide data and generate a complete, analysis-ready protein matrix with corresponding standard error values.
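To make two of the cleaning steps above concrete, here is a minimal, language-neutral sketch in Python of contaminant removal followed by median normalisation of log2 intensities. It is not how limpa works internally: the function names, the sample names, the protein identifiers and the intensity values are all hypothetical, and median centring is only one common normalisation choice, assumed here for illustration. Missing values are simply skipped, whereas limpa models them explicitly.

```python
from statistics import median


def remove_contaminants(rows, contaminant_ids):
    """Drop rows whose protein identifier is in the contaminant set."""
    return [row for row in rows if row["protein"] not in contaminant_ids]


def median_normalise(rows, samples):
    """Subtract each sample's median log2 intensity, ignoring missing values (None)."""
    medians = {}
    for sample in samples:
        observed = [row["log2"][sample] for row in rows if row["log2"][sample] is not None]
        medians[sample] = median(observed)
    normalised = []
    for row in rows:
        centred = {
            sample: (None if row["log2"][sample] is None
                     else row["log2"][sample] - medians[sample])
            for sample in samples
        }
        normalised.append({"protein": row["protein"], "log2": centred})
    return normalised


# Hypothetical log2 intensities for two samples; None marks a missing value.
samples = ["s1", "s2"]
rows = [
    {"protein": "P12345", "log2": {"s1": 20.0, "s2": 21.0}},
    {"protein": "CONTAM_KERATIN", "log2": {"s1": 25.0, "s2": 25.5}},
    {"protein": "Q67890", "log2": {"s1": 18.0, "s2": None}},
    {"protein": "O11111", "log2": {"s1": 22.0, "s2": 23.0}},
]

filtered = remove_contaminants(rows, contaminant_ids={"CONTAM_KERATIN"})
cleaned = median_normalise(filtered, samples)
```

After these two steps every sample has a median log2 intensity of zero, so differences between samples reflect relative abundance rather than overall loading.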

We can fit a linear model to our data and conduct differential expression analysis using limpa and limma functions.
There are many different ways to visualise differentially expressed proteins.
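As a rough illustration of what fitting a linear model means in the simplest two-group design, the Python sketch below fits ordinary least squares to one protein's log2 abundances with a 0/1 group indicator. The function name and the values are hypothetical, and this is a conceptual sketch only: limpa and limma additionally account for quantification uncertainty and moderate the variance estimates across proteins, which is not shown here.

```python
from statistics import mean


def fit_two_group(log2_abundance, is_treated):
    """Ordinary least squares fit of y = intercept + slope * x, with x coded 0 or 1."""
    x_bar = mean(is_treated)
    y_bar = mean(log2_abundance)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(is_treated, log2_abundance))
    sxx = sum((x - x_bar) ** 2 for x in is_treated)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical log2 abundances for one protein: three controls, three treated samples.
log2_abundance = [10.0, 10.4, 9.8, 12.1, 11.9, 12.3]
is_treated = [0, 0, 0, 1, 1, 1]

intercept, log2_fold_change = fit_two_group(log2_abundance, is_treated)
```

With this coding, the intercept is the mean log2 abundance of the control group and the slope is the difference between the group means, which is the log2 fold change reported in differential expression results.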

Differential expression analysis identifies statistically significant changes in protein abundance between conditions, but does not by itself tell us the biological significance of those proteins.
Network analysis (STRING) can reveal physical or functional interactions among DE proteins.
Enrichment analysis (GO/KEGG) can link those proteins to known biological processes or pathways.
Using multiple approaches helps you interpret proteomics data and assess whether the results are reliable and consistent.
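Enrichment analysis of the kind mentioned above is often framed as an over-representation test: given a background of quantified proteins, is a term's protein set found among the DE proteins more often than chance would predict? The Python sketch below computes a one-sided hypergeometric p-value for that question. The function name and the counts are hypothetical, and real enrichment tools also correct for testing many terms at once, which is omitted here.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_de, n_overlap):
    """Probability of seeing at least n_overlap term members among n_de DE proteins
    drawn without replacement from n_background proteins, n_in_term of which
    belong to the term (upper tail of the hypergeometric distribution)."""
    upper = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 2000 quantified proteins, 40 annotated to the term,
# 100 DE proteins, 8 of which are annotated to the term.
p_value = enrichment_p_value(n_background=2000, n_in_term=40, n_de=100, n_overlap=8)
```

A small p-value suggests the term is over-represented among the DE proteins, but the choice of background matters: using only the proteins actually quantified in the experiment, rather than the whole proteome, avoids inflating the result.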