Summary and Setup
Tutorial overview
This tutorial introduces participants to proteomics data processing, cleaning, and analysis, with a focus on label-free, bottom-up, data-independent acquisition (DIA), mass spectrometry-based quantitative proteomics.
We will begin with a brief overview of the proteomics workflow and an
introduction to the dataset used in this tutorial. This will be followed
by an introduction to DIA-NN, a DIA-focused proteomics data processing
tool that can be run from the command line. Participants will then be
introduced to the limpa package for protein quantification and guided
through the key steps for cleaning proteomics data and converting it to
an analysis-ready format, including contaminant removal and quality
filtering. Finally, we will introduce some basic statistical analyses
for interpreting proteomics data, including differential expression
analysis, enrichment analysis, and protein-protein interaction network
analysis.
LEARNING OBJECTIVES
By the end of this tutorial, you will be able to:
- Describe the proteomics workflow and the types of information that a proteomics experiment can provide.
- List the steps required to process and clean proteomics data, and justify which methods are most appropriate for your data.
- Apply a variety of statistical analyses to a proteomics dataset and interpret their meaning.
Prior to the workshop
REQUIRED KNOWLEDGE
- This workshop assumes participants have a basic understanding of R or have previously attended an Introduction to R workshop. Please review introductory materials here.
REQUIRED SOFTWARE
Attendees are required to bring their own laptop computers. Please ensure you have installed:
- Chrome or Firefox
- R (Download and install the latest version of R using the UniMelb mirror)
- RStudio
- R packages required for this workshop (see below)
Installing required R packages
Please copy and run the code below to install the required R packages before the workshop.
R
# Packages from CRAN
cran_packages <- c(
  "dplyr",    # Data manipulation
  "readxl",   # Read Excel files
  "stringr",  # Manipulate strings
  "curl",     # Download files from URLs
  "pheatmap", # Heatmap visualization
  "arrow"     # Dependency for .parquet reading in limpa
)

# Packages from Bioconductor
# (limpa, EnhancedVolcano and STRINGdb are distributed through Bioconductor,
# not CRAN, so they are installed with BiocManager)
bioc_packages <- c(
  "limpa",           # Proteomics data processing and DE analysis
  "EnhancedVolcano", # Volcano plots
  "clusterProfiler", # Functional enrichment analysis
  "org.Hs.eg.db",    # Human gene annotation (for GO/KEGG)
  "rpx",             # Interface to the ProteomeXchange Repository
  "STRINGdb"         # Interface to the STRING protein-protein interactions database
)

# Function to install missing CRAN packages
install_if_missing <- function(pkgs, repo = "https://cloud.r-project.org") {
  to_install <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repo)
  }
}

# 1. Install CRAN packages
install_if_missing(cran_packages)

# 2. Install Bioconductor manager if needed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager", repos = "https://cloud.r-project.org")
}

# 3. Install Bioconductor packages
for (pkg in bioc_packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    BiocManager::install(pkg, ask = FALSE, update = TRUE)
  }
}
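Once installation has finished, it can be worth confirming that every package actually loads before the workshop. The snippet below is a minimal sketch of one way to do this, not part of the official lesson material: `find_missing_packages()` is a hypothetical helper name introduced here for illustration, and the package list simply repeats the packages named in the install script above.

```r
# Hypothetical helper (not part of the lesson materials): returns the names
# of packages in `pkgs` whose namespace cannot be loaded in this R session.
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(pkg) requireNamespace(pkg, quietly = TRUE),
    logical(1)
  )
  pkgs[!is_installed]
}

# Packages named in the install script for this workshop
workshop_packages <- c(
  "limpa", "dplyr", "readxl", "stringr", "curl", "pheatmap",
  "EnhancedVolcano", "STRINGdb", "arrow",
  "clusterProfiler", "org.Hs.eg.db", "rpx"
)

missing_packages <- find_missing_packages(workshop_packages)

if (length(missing_packages) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(missing_packages, collapse = ", "))
}
```

If any packages are reported as missing, re-running the install script and reading the installation messages for those packages is usually the quickest way to see what went wrong.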
REQUIRED DATA
Please click the links below to download the data required for this workshop:
We have pre-processed a subset of this dataset downloaded from PRIDE, a public repository for mass spectrometry-based proteomics data.
You can read the associated paper here.
Introductory slides
Introductory slides for this workshop can be found here.
If you are attending a live workshop, there is no need to review these slides in advance.