massMap: an efficient two-stage microbial association mapping framework with advanced FDR control

The two-stage microbial association mapping framework massMap provides an efficient solution for microbiome-wide association analysis. By fully exploiting the microbial dependence structures of the taxonomic tree, massMap is much more powerful than existing methods at mapping associations at the lowest available taxonomic rank, such as species or genus. We applied massMap to the American Gut Project data and other datasets and found that it markedly improves on the competing methods by discovering more biologically meaningful taxa.

In microbiome studies, one key problem is to identify which microbial taxa (microbes) are associated with the pathological outcomes (traits) of interest. To solve this problem, researchers usually conduct a microbiome-wide association analysis across the microbial community. Unlike the data in the well-known genome-wide association study, microbiome data can be hierarchically classified into six major ranks via the taxonomic tree, from phylum, the most general rank with few taxonomic groups, to species, the most specific rank with hundreds of members. Among these taxonomic ranks, researchers are most interested in identifying association signals at the lowest available rank, such as species or genus.

The taxonomic tree displays the evolutionary relationship among the microbial taxa; taxa that are closer on the tree tend to have similar responses to environmental shifts. Thus, trait-associated taxa are more likely to be clustered evolutionarily instead of randomly distributed across the community. However, none of the existing microbial association mapping methods utilize the tree information in the testing. This motivates us to develop a new statistical method that can incorporate this valuable information to enhance the statistical power.

We recently published a methodological article in Microbiome. In this paper, we propose a powerful two-stage microbial association mapping framework called massMap, which fully exploits the microbial dependence structures of the taxonomic tree. In the first stage, an upper rank, family, is pre-selected as the ‘screening rank’, and massMap screens the taxonomic groups at that rank for association using the powerful microbial group association test OMiAT [1]. OMiAT is a microbiome-based group association testing method designed to discover various association patterns of a group of taxa at the higher rank. In the second stage, massMap tests the association of each candidate taxon at the ‘target rank’ (usually the lowest available rank), but only within the taxonomic groups found significant at the family rank in the first stage (see the figure below; red indicates significance). Through this ‘screening-target strategy’, massMap screens out less promising taxa and focuses the statistical power on the more promising ones. The power of the group association test OMiAT and the incorporation of advanced FDR-controlling procedures resolve the dependency among taxa and ensure the high efficiency of massMap.
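To make the screening-target strategy concrete, here is a minimal Python sketch of the two-stage logic. It is not the massMap implementation: it assumes the group-level and taxon-level p-values have already been computed by some association test, and it uses the plain Benjamini-Hochberg procedure at both stages as a stand-in for the advanced FDR-controlling procedures used in the paper. The function names, the generic family and species labels, and the p-values are all hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: Dict[str, float], alpha: float) -> List[str]:
    """Return the names rejected by the Benjamini-Hochberg step-up procedure."""
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ordered)
    cutoff = 0  # largest rank k such that p_(k) <= k * alpha / m
    for rank, (_, p) in enumerate(ordered, start=1):
        if p <= rank * alpha / m:
            cutoff = rank
    return [name for name, _ in ordered[:cutoff]]


def two_stage_mapping(
    group_pvalues: Dict[str, float],   # screening rank, e.g. family
    taxon_pvalues: Dict[str, float],   # target rank, e.g. species
    membership: Dict[str, str],        # taxon name -> its group name
    alpha: float = 0.05,
) -> Tuple[List[str], List[str]]:
    # Stage 1: screen the groups at the upper (screening) rank.
    selected_groups = benjamini_hochberg(group_pvalues, alpha)
    # Stage 2: test only the taxa that sit inside a selected group.
    kept = set(selected_groups)
    candidates = {
        taxon: p for taxon, p in taxon_pvalues.items()
        if membership[taxon] in kept
    }
    # Assumption: plain BH again; the paper uses more advanced procedures.
    return selected_groups, benjamini_hochberg(candidates, alpha)


# Hypothetical, made-up p-values for three families and five species.
group_p = {"family_1": 0.001, "family_2": 0.02, "family_3": 0.60}
taxon_p = {"sp_A": 0.004, "sp_B": 0.03, "sp_C": 0.20, "sp_D": 0.01, "sp_E": 0.002}
member = {
    "sp_A": "family_1", "sp_B": "family_1",
    "sp_C": "family_2", "sp_D": "family_2",
    "sp_E": "family_3",
}

groups, taxa = two_stage_mapping(group_p, taxon_p, member, alpha=0.05)
print(groups)  # ['family_1', 'family_2']
print(taxa)    # ['sp_A', 'sp_D', 'sp_B']
```

In this toy run, sp_E is never tested at the second stage even though its own p-value is small, because its family was screened out at the first stage. That is the trade-off of the strategy: fewer second-stage tests, at the cost of missing isolated signals in groups that do not pass screening.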

We have evaluated our method in large-scale simulation studies and applied it to various datasets, including human gut, vaginal and oral microbiome data and murine gut microbiome data. In these applications, massMap made more discoveries than the competing methods.

As an illustration, here we present the microbial association results from massMap for antibiotic history (ABH) and body mass index (BMI) from the American Gut Project (AGP) data analysis. The AGP data include 16S rRNA V4 region sequences of 8,610 fecal samples and 456 descriptive measurements from 7,293 individuals. massMap discovered 15 ABH-associated taxa (FDR = 0.05) and 6 BMI-associated taxa (FDR = 0.10). The results are illustrated on the taxonomic tree in the following figures.

From the figure, we can see that the association mapping of ABH shows a strong clustering pattern across the taxonomic tree: four associated species are clustered in family Lachnospiraceae and two species are clustered in family Micrococcaceae. These observations are consistent with our assumption that evolutionarily closer taxa usually have similar responses to exposures. By utilizing the prior biological information from the taxonomic structure, massMap was able to make a substantial number of discoveries.

In the BMI investigation, massMap discovered six BMI-associated species, while the competing methods made only one discovery or none. More interestingly, a literature search turned up strong biological evidence for the discovered species. Among the six identified species, [Eubacterium] biforme, Bifidobacterium| Other, Catenibacterium| Other and Prevotella stercorea have been reported to be associated with BMI or other metabolic traits in several studies.

In summary, massMap is a highly efficient association mapping tool, and we welcome investigators worldwide to use it in their research projects.

The methodological article is published in Microbiome and can be found here: The accompanying R package ‘massMap’ is available at and

Currently, massMap can examine binary, continuous and survival traits. It is compatible with 16S rRNA amplicon sequence data and will soon be extended to metagenomic shotgun sequence data.


This post has been written by Dr. Jiyuan Hu and Dr. Huilin Li on behalf of all authors.



[1] Koh H, Blaser MJ, Li H. A powerful microbiome-based association test and a microbial taxa discovery framework for comprehensive association mapping. Microbiome. 2017;5:45.

Jiyuan Hu

Postdoctoral Fellow, New York University Langone Health