Processing data with DIA-NN

Last updated on 2025-11-14

Overview

Questions

How can I process my output files from a mass spectrometry-based proteomics experiment?

Objectives

Learn how to process proteomics data using DIA-NN.

Processing proteomics data

There are many software options for processing the output of mass spectrometry-based proteomics experiments. These tools match mass spectra against an empirical or predicted spectral library to identify peptides and infer the proteins to which they belong.

Software you may have heard of includes Spectronaut, FragPipe, MaxQuant and DIA-NN. Each tool has slightly different functionality and attributes, and all are frequently updated. In this tutorial, we demonstrate the use of DIA-NN, which is designed to process data-independent acquisition (DIA) proteomics data.

Proteomics software like DIA-NN matches observed spectra against a spectral library to identify peptides and assign them to a protein or protein group. Source: Galaxy Training Network

Why DIA-NN?

Key features of DIA-NN software include:

Neural network–based scoring: Improves identification confidence.
Interference correction: Enhances quantification accuracy.
Library-free analysis: Can work without a spectral library (generating one directly from DIA data).
High speed and efficiency: Suitable for large-scale datasets.
Cross-run normalisation: Ensures consistent quantification across experiments.
Command-line executable: DIA-NN can be run from a graphical user interface (GUI) or directly from the command line, making it easy to integrate into an automated pipeline.

DIA-NN is free to download. Detailed operating instructions and a link to install the latest version of DIA-NN can be found on the GitHub page.

DIA-NN workflow

Callout

Important Note

Due to the amount of time it will take to process the data, you do not need to install or operate DIA-NN for the purposes of this tutorial. Pre-processed data is provided, which has been prepared following the procedures described below.

Part 1: Creating a spectral library

DIA-NN analysis consists of two steps. The first step is to generate a predicted spectral library for your experimental organism.

Download a sequence database from UniProt.
1. Go to https://www.uniprot.org/uniprotkb?query=*
2. Select the relevant organism for your sample and Reviewed (Swiss-Prot) status to filter results.
3. Click Download
4. Change the format to FASTA (canonical and isoform)
5. Select No under Compressed and download the FASTA file
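
Before moving on, it can be worth checking that the downloaded FASTA contains what you expect. The sketch below is not part of DIA-NN; it assumes UniProt-style headers of the form `>sp|ACCESSION|NAME description`, and the function name `summarise_fasta` and the example entries are made up for illustration.

```python
from collections import Counter


def summarise_fasta(lines):
    """Count FASTA entries by database prefix and list duplicated accessions."""
    prefixes = Counter()
    accessions = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumed UniProt layout: ">sp|P12345|NAME_SPECIES description"
        parts = line[1:].strip().split("|")
        if len(parts) >= 3:
            prefixes[parts[0]] += 1  # "sp" = Swiss-Prot (reviewed), "tr" = TrEMBL
            accessions[parts[1]] += 1
        else:
            prefixes["other"] += 1  # header that does not follow the assumed layout
    duplicates = sorted(acc for acc, n in accessions.items() if n > 1)
    return {
        "entries": sum(prefixes.values()),
        "by_prefix": dict(prefixes),
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    # Made-up entries; for a real file use: with open(path) as fh: summarise_fasta(fh)
    example = [
        ">sp|P00001|EXAMPLE1_HUMAN Hypothetical protein 1",
        "MKTAYIAKQR",
        ">sp|P00001-2|EXAMPLE1_HUMAN Isoform 2 of hypothetical protein 1",
        "MKTAYIAK",
        ">tr|A0A000|EXAMPLE2_HUMAN Hypothetical unreviewed protein",
        "MSSQR",
    ]
    print(summarise_fasta(example))
```

If you filtered for Reviewed (Swiss-Prot) entries, any `tr` count above zero suggests the filter was not applied before downloading.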

Callout

FOR SPECIFIC SEARCHES

If you need to search for a variant protein or peptide that differs from the standard organism proteome, you can manually edit the protein sequences in the UniProt FASTA using a text editor.
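
As an alternative to editing by hand, a variant entry can be generated with a short script, which reduces the risk of changing the wrong residue. This is a minimal sketch under stated assumptions: the substitution notation (`K2R`, 1-based position), the accession, the header layout of the new entry and both function names are hypothetical, and whether DIA-NN annotates such an entry as you intend should be checked on your own data.

```python
import re


def apply_substitution(sequence, change):
    """Apply one substitution such as 'K2R' (reference, 1-based position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"Unrecognised substitution: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(sequence):
        raise ValueError(f"Position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(f"Expected {ref} at position {pos}, found {sequence[pos - 1]}")
    return sequence[: pos - 1] + alt + sequence[pos:]


def variant_record(accession, name, sequence, change, width=60):
    """Return FASTA text for the variant, with the change recorded in the header."""
    mutated = apply_substitution(sequence, change)
    # Hypothetical header layout that mimics UniProt; adjust to your own convention.
    header = f">sp|{accession}_{change}|{name} Variant {change}"
    body = "\n".join(mutated[i:i + width] for i in range(0, len(mutated), width))
    return f"{header}\n{body}\n"


if __name__ == "__main__":
    # Made-up accession, name and sequence
    print(variant_record("P00001", "EXAMPLE1_HUMAN", "MKTAYIAKQR", "K2R"), end="")
```

Appending the printed record to a copy of the UniProt FASTA keeps the original sequence in the search alongside the variant.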

Download a contaminant FASTA file. For most experiments, the Universal Contaminants.fasta will be appropriate, but you may prefer to use one of the sample-type specific contaminant FASTAs depending on your experiment. More information about this resource and contaminants will be discussed in the next lesson.
Open the DIA-NN GUI or a command line interface. If multiple versions of DIA-NN are installed on your device, make sure the DIA-NN exe field (or the first line of your command) points to the version you wish to use.
Add the UniProt and Contaminant FASTA files you just downloaded.

Caution

At the time of writing, DIA-NN 2.2.0 cannot read or write files saved on an external drive. Please ensure all input and output files are saved on a local drive to avoid errors.

Examples are included below for the additional settings we recommend changing from the default. Depending on your experiment and research goals, you may wish to change additional settings from the default - read more in the DIA-NN documentation.
Click Run (or execute the command). DIA-NN will write the predicted spectral library as a .speclib file.

Example of the DIA-NN GUI used to generate a predicted spectral library. Settings which have been altered from the default are highlighted in red.

POWERSHELL

# Before running, replace:
#   C:\DIA-NN\2.2.0\diann.exe  with the file path to DIA-NN on your device
#   --out      with the desired file path for the output report.parquet
#   --out-lib  with the desired file path for the output report-lib.parquet
#   --fasta    with the file paths to the UniProt and contaminants FASTA files
#   --threads  with the number of threads to use for this analysis
# In PowerShell, the backtick must be the last character on a line to continue
# the command, so comments cannot be placed after it.
# "K*,R*" is quoted so that PowerShell passes it to DIA-NN as a single argument.
C:\DIA-NN\2.2.0\diann.exe `
--lib "" `
--out "C:\path\to\output\report.parquet" `
--out-lib "C:\path\to\output\report-lib.parquet" `
--fasta "C:\path\to\UniprotFasta\.fasta" `
--fasta "C:\path\to\ContaminantsFasta\.fasta" `
--threads 30 `
--verbose 1 `
--qvalue 0.01 `
--matrices `
--gen-spec-lib `
--predictor `
--reannotate `
--fasta-search `
--min-fr-mz 200 `
--max-fr-mz 1800 `
--met-excision `
--min-pep-len 7 `
--max-pep-len 30 `
--min-pr-mz 300 `
--max-pr-mz 1800 `
--min-pr-charge 1 `
--max-pr-charge 4 `
--cut "K*,R*" `
--missed-cleavages 1 `
--unimod4 `
--rt-profiling
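
If you prefer to launch DIA-NN from a script, the same command can be assembled as a list of arguments, which avoids shell-specific quoting and line-continuation rules. The sketch below reuses only the options shown in the example above; the function names and the placeholder paths are illustrative assumptions, and `run_diann` is defined but deliberately not called.

```python
import subprocess

# Options copied from the library-generation example; adjust for your experiment.
LIBRARY_FLAGS = [
    "--verbose", "1", "--qvalue", "0.01", "--matrices",
    "--gen-spec-lib", "--predictor", "--reannotate", "--fasta-search",
    "--min-fr-mz", "200", "--max-fr-mz", "1800", "--met-excision",
    "--min-pep-len", "7", "--max-pep-len", "30",
    "--min-pr-mz", "300", "--max-pr-mz", "1800",
    "--min-pr-charge", "1", "--max-pr-charge", "4",
    "--cut", "K*,R*", "--missed-cleavages", "1",
    "--unimod4", "--rt-profiling",
]


def build_library_command(diann_exe, fasta_files, out_report, out_lib, threads):
    """Return the argument list for generating a predicted spectral library."""
    cmd = [diann_exe, "--lib", "", "--out", out_report, "--out-lib", out_lib]
    for fasta in fasta_files:
        cmd += ["--fasta", fasta]  # one --fasta option per FASTA file
    cmd += ["--threads", str(threads)]
    return cmd + LIBRARY_FLAGS


def run_diann(cmd):
    """Run DIA-NN and raise an error if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths: replace with the locations on your own device.
    command = build_library_command(
        r"C:\DIA-NN\2.2.0\diann.exe",
        [r"C:\path\to\uniprot.fasta", r"C:\path\to\contaminants.fasta"],
        r"C:\path\to\output\report.parquet",
        r"C:\path\to\output\report-lib.parquet",
        threads=30,
    )
    print(" ".join(command))  # inspect the command before passing it to run_diann
```

Because each argument is a separate list element, values such as `K*,R*` and paths containing spaces need no extra quoting.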

This spectral library can be re-used for all analyses of samples from the same organism. It is recommended to re-download the UniProt FASTA and re-run this step approximately once a month to ensure your library is up-to-date.
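
One way to remember the monthly refresh is to check the age of the FASTA file before reusing the library. This is a small sketch that assumes the file's modification time reflects when it was downloaded; the function names and the 30-day threshold are illustrative choices.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def days_since_modified(path, now=None):
    """Return the age of a file in days, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def library_needs_refresh(fasta_path, max_age_days=30, now=None):
    """Return True if the FASTA is older than max_age_days."""
    return days_since_modified(fasta_path, now) > max_age_days


if __name__ == "__main__":
    # "." is used only so the example runs anywhere; point this at your FASTA file.
    print(library_needs_refresh("."))
```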

Part 2: Analysis

The second step is to search your raw files (mass spectrometry output files) against the spectral library.

Open the DIA-NN GUI or a command line interface. If multiple versions of DIA-NN are installed on your device, make sure the DIA-NN exe field (or the first line of your command) points to the version you wish to use.
Upload your raw files.
- If using the GUI, under Input select Raw then select all of the files you wish to search.
- If using the command line, you can use --f multiple times to list each file individually, or (recommended) use --dir [folder] to instruct DIA-NN to process all raw files saved in a particular folder.
Upload your spectral library.

Callout

Empirical vs predicted spectral library

You may wish to generate an empirical spectral library from your data and use it for future analyses. An empirical library will be smaller than a predicted library, which speeds up analysis; however, it may result in fewer identifications. We only recommend using an empirical library if it was generated from high-quality data.

You can generate an empirical library from your samples by selecting Generate spectral library and specifying the location of its output.

Adjust settings according to your experiment and research goals. Again, examples are included below for the additional settings we recommend changing from the default. You may wish to change additional settings from the default - read more in the DIA-NN documentation.
- In this example, the protease Trypsin was used to generate peptides during sample preparation.
- Leaving Mass accuracy and MS1 accuracy on the default of 0 means DIA-NN will optimise these parameters automatically for the first run in the experiment and then reuse the optimised settings for the other runs. The DIA-NN documentation recommends different settings depending on the type of mass spectrometer used to generate your data. In this example, data were generated on a TripleTOF 6600, so we set Mass accuracy to 20.0 and MS1 accuracy to 12.0.
- Set the number of Threads to be no more than the number of logical cores on your computer.
- In this experiment, we have set the maximum number of variable modifications to 0; however, it is common to allow between 1 and 3.
- DIA-NN’s default setting is to activate match-between-runs. From the documentation: In MBR mode, DIA-NN does two passes over the data. During the first pass, it creates an empirical spectral library from the data. During the second pass, it reanalyses the experiment with this empirical library, which may result in much improved identification numbers, data completeness and quantification accuracy.
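
For the Threads setting above, the number of logical cores can be looked up rather than guessed. A minimal sketch, assuming Python is available on the analysis computer; `choose_threads` is an illustrative helper, not a DIA-NN function.

```python
import os


def choose_threads(requested=None):
    """Cap the requested thread count at the number of logical cores."""
    logical = os.cpu_count() or 1  # os.cpu_count() can return None
    if requested is None:
        return logical
    return max(1, min(requested, logical))


if __name__ == "__main__":
    print("Logical cores:", os.cpu_count())
    print("Threads to pass to --threads:", choose_threads(30))
```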

Example of the DIA-NN GUI used to analyse the data described in this tutorial. Settings which have been altered from the default are highlighted in red.

POWERSHELL

# Before running, replace:
#   C:\DIA-NN\2.2.0\diann.exe  with the file path to DIA-NN on your device
#   --dir      with the file path to the folder containing all raw files to be analysed
#   --lib      with the file path to the spectral library
#   --out      with the desired file path for the output report.parquet
#   --threads  with the number of threads to use for this analysis
# In PowerShell, the backtick must be the last character on a line to continue
# the command, so comments cannot be placed after it.
C:\DIA-NN\2.2.0\diann.exe `
--dir "C:\path\to\folder\of\raw\files" `
--lib "C:\path\to\SpectralLibrary\report-lib.predicted.speclib" `
--out "C:\path\to\output\report.parquet" `
--threads 30 `
--verbose 1 `
--qvalue 0.01 `
--matrices `
--unimod4 `
--mass-acc 20 `
--mass-acc-ms1 12.0 `
--reanalyse `
--rt-profiling
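
When only some of the files in a folder should be searched, listing them individually with `--f` is an alternative to `--dir`. The sketch below builds those arguments from a list of file names; the `.wiff` extension, the `skip` parameter and the example file names are assumptions for illustration.

```python
from pathlib import Path


def raw_file_args(paths, extension=".wiff", skip=()):
    """Return ['--f', path, ...] for each file with the given extension, sorted by path."""
    args = []
    for path in sorted(Path(p) for p in paths):
        if path.suffix.lower() != extension.lower():
            continue  # ignores other files, including companions such as .wiff.scan
        if path.name in skip:
            continue  # e.g. blanks or failed injections you want to leave out
        args += ["--f", str(path)]
    return args


if __name__ == "__main__":
    # Made-up file names; for a real folder use: raw_file_args(Path(folder).iterdir())
    names = ["sample_02.wiff", "sample_01.wiff", "sample_01.wiff.scan", "blank.wiff"]
    print(raw_file_args(names, skip={"blank.wiff"}))
```

The returned list can be spliced into a DIA-NN argument list in place of the `--dir` option.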

Challenge

If you want to run DIA-NN from the command line but are unsure how to write the code, you can update your settings in the GUI, click ‘Run’, and copy the commands output in the log. That’s how we got the example code above!

Can you match each line of code to the corresponding setting in the GUI?

Show me the solution

Check the description of available commands in the DIA-NN documentation

DIA-NN output

DIA-NN produces several output files, including:

Output file	Description
log.txt	A log of what DIA-NN has run. You should always search the log (for example with `Ctrl + F`) for any ‘ERROR’ or ‘WARNING’ messages issued by DIA-NN during your run.
report.parquet	The main output report. This is the file we will use for subsequent analyses in this tutorial.
pg_matrix.tsv	Matrix of protein group quantities generated by DIA-NN.
pr_matrix.tsv	Matrix of precursor quantities generated by DIA-NN.
trends.pdf	A PDF report listing the number of peptides, proteins, and other quality control measures detected for each sample.
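
Checking the log can also be scripted, which is convenient when many analyses are run. This is a minimal sketch that simply looks for the words ERROR and WARNING on each line; the function name and the example log lines are made up and do not reproduce real DIA-NN messages.

```python
def find_log_problems(lines, keywords=("ERROR", "WARNING")):
    """Return (line number, text) for every line containing one of the keywords."""
    problems = []
    for number, line in enumerate(lines, start=1):
        upper = line.upper()
        if any(keyword in upper for keyword in keywords):
            problems.append((number, line.rstrip()))
    return problems


if __name__ == "__main__":
    # Made-up log lines; for a real log use: with open("log.txt") as fh: find_log_problems(fh)
    example_log = [
        "example: starting analysis\n",
        "example: WARNING something may need attention\n",
        "example: finished\n",
    ]
    for number, text in find_log_problems(example_log):
        print(f"line {number}: {text}")
```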

Challenge

If DIA-NN is run through the command line rather than the GUI, the trends.pdf and runs.pdf reports are not automatically generated. Using the DIA-NN documentation, can you work out how to generate these reports via the command line?

Show me the solution

Run the diann-stats.py Python script provided when you download DIA-NN.

BASH

python C:\DIA-NN\2.2.0\diann-stats.py C:\path\to\report.parquet

We have now processed our raw files, identifying the peptides present in our samples and inferring the proteins to which they belong.
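
As a quick check of the result, the number of quantified protein groups per run can be counted from a matrix file such as pg_matrix.tsv. The sketch below assumes a tab-separated layout with a few annotation columns followed by one column per run, and that missing values are empty cells; the column names in the example are placeholders, so inspect the header of your own file first.

```python
import csv
import io


def count_quantified(tsv_text, annotation_columns):
    """Count non-empty values in every column that is not an annotation column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    run_columns = [c for c in reader.fieldnames if c not in annotation_columns]
    counts = {column: 0 for column in run_columns}
    for row in reader:
        for column in run_columns:
            value = (row[column] or "").strip()  # short rows yield None
            if value:
                counts[column] += 1
    return counts


if __name__ == "__main__":
    # Made-up matrix with two annotation columns and two runs
    example = (
        "Protein.Group\tGenes\trun1\trun2\n"
        "P1\tG1\t100.5\t\n"
        "P2\tG2\t\t\n"
        "P3\tG3\t7\t8\n"
    )
    print(count_quantified(example, {"Protein.Group", "Genes"}))
```

For a real file, read its contents with `open(path, newline="").read()` and pass the annotation column names found in its header.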

Learn more about DIA-NN

For more background information, see the original publication:
Demichev, V., et al. (2020). DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods. https://doi.org/10.1038/s41592-019-0638-x.

For further details about the different DIA-NN settings, see the documentation.

If you run into problems, the developers actively respond to issues and discussions on their GitHub.

Key Points

Raw files from a mass spectrometry-based DIA proteomics experiment can be processed using DIA-NN (peptide identification and protein inference).
The DIA-NN workflow has two key steps: generating a predicted spectral library, and processing raw files.
The DIA-NN documentation contains detailed information about changing default settings as appropriate for your experiment.