DIA metaproteomics is ready to take off

With the superior performance in terms of deep proteome coverage and high quantitative accuracy, data-independent acquisition (DIA) mass spectrometry is expected to cut a conspicuous figure in metaproteomics.

Published in Microbiology

Jan 24, 2023

Yi Yang

Research Scientist, ZJU-Hangzhou Global Scientific and Technological Innovation Center

Liked by Evelina Satkevic and 1 other

Read the paper

Metaproteomics has shown the capability to reveal not only taxonomic composition of complex microbial communities, such as gut microbiota, but also deep function information regarding the dynamic host-microbiota interactions. Due to the extreme complexity and heterogeneity of gut metaproteome, accurate protein identification and quantification in metaproteome is a severe challenge compared to the conventional proteomics of a single organism.

During the past years, while most proteomic and metaproteomic studies adopt traditional mass spectrometry (MS) approaches based on data-dependent acquisition (DDA), data-independent acquisition (DIA) techniques have shown outstanding performance in conventional proteomics with increased proteome coverage, reproducibility, and accuracy in quantification¹. In 2020, Aakko et al. pioneered the use of DIA in metaproteomics and reported a proof-of-concept study to demonstrate its technical feasibility². In the same year, our team also reported one of the earliest DIA-based metaproteomic studies, in which more than 30,000 proteins were identified and quantified from gut microbiota in a real clinical cohort³. As an emerging technique, there may be matters of concern about the trustworthiness of the DIA-based metaproteomic methods, which is exactly the main message this paper is expected to convey.

DIA changes how metaproteomic data are generated

In a traditional DDA-based proteomic analysis, MS2 acquisition is triggered by precursor peaks in an MS1 scan (in other words, “dependent” on the MS1 scan). Protein identification is achieved via database searching of peptide fragments in MS2 spectra and thus information of a peptide precursor would be lost if it had not been selected for MS2 fragmentation. What is worse, the stochastic precursor selection can result in the “missing value” problem, which is more significant in metaproteomics compared to conventional single organism proteomics due to the significantly enhanced sample heterogeneity, complexity and dynamic range.

In contrast, DIA performs a sequence of MS2 scans in an acquisition cycle, where precursor ions within defined isolation windows are reproducibly fragmented. This feature allows systematical and unbiased recording of fragmentation information of theoretically all peptides present in a sample. It is even somewhat overdrawn to say that the sample has been digitized for long-term storage. Reanalysis using more advanced informatic solutions in the future may contribute to biological findings that are undetectable currently.

**Figure 1.** Schematic illustration of the DDA and DIA approaches.
Adapted from ref. 1, under CC BY 4.0 license.

DIA deepens metaproteome coverage and improves quantitative accuracy

Popular quantification strategies other than DIA include DDA-based label-free quantification (LFQ) and tandem mass tag (TMT) isobaric labeling quantification. In a typical LFQ-DDA analysis, protein quantification is based on precursor ion intensities. In a TMT analysis, protein quantification is based on the reporter ions generated by the isobaric labels after fragmentation.

In this paper, we have benchmarked DIA and the other two quantification strategies on human fecal metaproteome samples spiked with 6 known bacterial species. DIA outperformed LFQ-DDA with superior proteome coverage and data completeness. Compared with LFQ-DDA, DIA yielded approximately 80% more proteins and 55% more peptides that were validly identified and quantified (with no more than 1/3 missing values).

From the quantified proteins, those uniquely belonging to the 6 species were selected to evaluate the quantitative accuracy of the methods. We demonstrated that DIA outperformed TMT with higher quantitative accuracy. This result is different from those on simpler mixture samples⁴ and simulated microbial communities of up to 12 species, where DIA showed quantitative accuracy slight worse than or close to TMT. (We greatly appreciate the reviewers’ suggestion on mimicking the complexity of real metaproteome samples.) Due to the bias came from superposition of reporter ion intensities of co-isolated and co-fragmented peptides in complex metaproteome samples, TMT suffered from ratio compression that measured fold changes deviated from theoretical values toward a 1:1 ratio. In DIA, protein quantification can be performed based on peptide-specific fragment chromatogram profiles to alleviate interference from other peptides, improving the quantification accuracy.

**Figure 2.** Comparison of quantitative accuracy of DIA, LFQ-DDA, and TMT on the spike-in metaproteome samples.

DIA no longer requires external spectral libraries

In the past, a main drawback of DIA was that its data analysis required a spectral library, which is usually built from fractionated samples by DDA. More recently, Pietilä et al. introduced library-free DIA for metaproteomic analysis⁵. With improvements of informatic solutions by incorporating deep learning⁶, experimental spectral libraries are no longer necessary for DIA analysis.

In this paper, we compared the library-free workflow in Spectronaut, directDIA, with alternative DIA data analysis methods based on experimental and predicted spectral libraries. Our results demonstrated that directDIA outperformed library-based approaches on complex proteome samples of microbial communities.

The only prior knowledge for directDIA data analysis is the protein sequence database. We applied directDIA to characterize human gut microbiota in two real clinical cohorts. Our results demonstrated that directDIA cooperated well with the mainstream metaproteome sequence database choices⁷, i.e., an experimental metagenome of the sample and a “pseudo-metagenome” assembled from publicly available sequences.

This study indicates a promising prospect of DIA in deep characterization of gut microbiota. With further innovations in instrumentation and informatics, DIA metaproteomics is bound to shine forth with all its strength. We hope our work can promote broader adoption of DIA metaproteomics in microbiome research.

References

Ludwig, C. et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Mol. Syst. Biol. 14, e8126 (2018).
Aakko, J. et al. Data-independent acquisition mass spectrometry in metaproteomics of gut microbiota—implementation and computational analysis. J. Proteome Res. 19, 432-436 (2020).
Long, S. et al. Metaproteomics characterizes human gut microbiome function in colorectal cancer. npj Biofilms Microbiomes 6, 14 (2020).
Muntel, J. et al. Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time. J. Proteome Res. 18, 1340-1351 (2019).
Pietilä, S., Suomi, T. & Elo, L. L. Introducing untargeted data-independent acquisition for metaproteomics of complex microbial samples. ISME Commun. 2, 51 (2022).
Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41-44 (2020).
Tanca, A. et al. The impact of sequence database choice on metaproteomic results in gut microbiota studies. Microbiome 4, 51 (2016).

Yi Yang

Research Scientist, ZJU-Hangzhou Global Scientific and Technological Innovation Center

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Subscribe to the Topic

Microbiology

Life Sciences > Biological Sciences > Microbiology

npj Biofilms and Microbiomes

npj Biofilms and Microbiomes

The aim of this journal is to serve as a comprehensive platform to promote biofilms and microbiomes research across a wide spectrum of scientific disciplines.

More about the journal

Related Collections

With collections, you can get published faster and increase your visibility.

Live biotherapeutics and medicinal microbiome products

This Collection welcomes contributions on recent developments in the utilization of live biotherapeutics and medicinal microbiome products for the treatment or prevention of diseases. In many cases, suitable drug treatments are still lacking due to the ineffectiveness of existing standards or the presence of long-term side effects. We are particularly interested in difficult-to-treat non-communicable diseases whose pathogenesis is associated with microbiota.

Publishing Model: Open Access

Deadline: Aug 02, 2024

Explore this collection

Latest Content

Behind the Paper

Research Communities by Springer Nature

DIA metaproteomics is ready to take off