Predicting the diversity of unknown antibiotic resistance genes

Antibiotic resistance is becoming a global healthcare crisis. Much of the resistance in human pathogens is due to antibiotic resistance genes with origins in the environment. Recruitment of resistance genes from the environment is likely to happen again in the future. However, we don't know what these future resistance genes look like, so we cannot directly investigate which environments would be important sources of novel resistance genes. In a recent study published in Microbiome, I explore whether it is possible to predict the diversity of yet undiscovered resistance genes using data generated from the sequences of those already known to be circulating in pathogens.
The paper in Microbiome is here:

Antibiotics are crucial in modern healthcare and yet we may not fully appreciate their value until they don’t work anymore. Imagine something as simple as an appendix operation being a life-threatening surgical procedure because of the risk of acquiring an untreatable bacterial infection. This could be the reality if the rapid antibiotic resistance development seen in the last decades continues. But why is this development taking place? Undoubtedly, overuse of antibiotics in human healthcare is playing a large role, and the misuse of antibiotics in animal husbandry is probably also an important driver.

In addition, the natural environment is important in the development of resistance in pathogens in two ways. First, it can act as a means for resistant bacteria to be transmitted between persons – for example through sewage treatment plants, into surface waters used for drinking or irrigation, and back to humans. Second, the vast diversity of environmental bacteria has been shown to host an enormous variety of antibiotic resistance genes. Under the right conditions, these genes can make the jump from harmless environmental bacteria into human pathogens, potentially having a devastating impact on our ability to treat infectious diseases.

The roles of the environment in antibiotic resistance development. Reproduced from Bengtsson-Palme et al. (2018), FEMS Microbiol. Rev. doi:, distributed under the CC-BY-NC license.

I have, together with Joakim Larsson, argued that the risks associated with the latter scenario are likely underestimated, for understandable reasons. Risks for resistance acquisition are inherently hard to account for because they correspond to exceedingly rare events with a very wide range of possible consequences for human health. It is also hard to predict where the risks for transfer of environmental antibiotic resistance genes are high. Why? Because we don’t know what these genes look like, so we don’t know what to search for. In contrast, detecting known resistance genes that are already found in human pathogens is easy, which contributes to making dissemination of resistant bacteria seem a greater risk than it – in relative terms – might actually be.

But what if it were possible to predict the abundance and diversity of these yet undiscovered resistance genes using the easily obtainable information on known ones? This question has been nagging me for years. If there were such a relation between known and unknown resistance genes, we would be able to rank environments by the overall risks for both dissemination and acquisition of novel resistance genes. We would be able to monitor environments for known gene variants and use that as a proxy for risks for transfer. On the other hand, if there were no such link, we would essentially be left guessing which environments could be of high and low importance as sources of novel resistance genes.

Soil is thought to be one of the largest reservoirs of novel antibiotic resistance genes.

The problem was finding a way to test the hypothesis that the abundance and/or diversity of known resistance genes could predict the properties of the uncharacterized collection of resistance genes. When Chandan Pal and I worked on resistance gene patterns across more than 800 samples from different environments, I started toying with the idea of using subsets of the genes in the database we were using to predict the results obtained from the full database. In other words, I was pretending that we only knew about a random subset of genes, and then I looked at what that subset could tell us about the overall resistance patterns. The results looked promising, but I felt that this subset-prediction was not enough on its own to establish that there was a link between the known and unknown resistance gene diversity. The results needed a validation that was independent of the gene set used for prediction.
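
The subset idea can be illustrated with a small sketch. This is not the analysis code from the paper: it assumes a hypothetical samples-by-genes count table filled with synthetic numbers, and the function names (`make_synthetic_counts`, `subset_prediction`, `pearson`) and the choice of Pearson correlation are illustrative assumptions. It pretends that only a random fraction of the genes is "known" and checks how well their abundance and diversity track the values from the full gene set.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def make_synthetic_counts(n_samples, n_genes, rng):
    """Synthetic samples-by-genes count table (an assumption, not real data).

    Each sample gets a random overall resistance load that scales the
    counts of all genes in that sample.
    """
    counts = []
    for _ in range(n_samples):
        load = rng.uniform(0.1, 5.0)
        counts.append([int(rng.expovariate(1.0 / load)) for _ in range(n_genes)])
    return counts


def subset_prediction(counts, fraction, rng):
    """Treat a random fraction of genes as 'known' and compare to the full set.

    Returns (abundance correlation, diversity correlation) across samples.
    """
    n_genes = len(counts[0])
    k = max(1, int(n_genes * fraction))
    known = rng.sample(range(n_genes), k)

    # Abundance: total counts per sample; diversity: number of genes detected.
    subset_abundance = [sum(row[g] for g in known) for row in counts]
    full_abundance = [sum(row) for row in counts]
    subset_diversity = [sum(1 for g in known if row[g] > 0) for row in counts]
    full_diversity = [sum(1 for c in row if c > 0) for row in counts]

    return (pearson(subset_abundance, full_abundance),
            pearson(subset_diversity, full_diversity))


if __name__ == "__main__":
    rng = random.Random(42)
    table = make_synthetic_counts(n_samples=50, n_genes=200, rng=rng)
    r_abundance, r_diversity = subset_prediction(table, fraction=0.2, rng=rng)
    print(f"abundance r = {r_abundance:.2f}, diversity r = {r_diversity:.2f}")
```

Because the subset is drawn from the same table it is compared against, a high correlation here is partly built in, which is the reason an independent validation set is needed.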

When the FARME database was published, I almost immediately realized that this was my validation data. The FARME database consists of resistance genes uncovered by cloning random DNA from (mostly) environmental bacteria and expressing these randomly cloned genes in a host that is susceptible to an antibiotic. If the inserted DNA makes the new host resistant, a resistance gene has been identified, and the inserted DNA can be sequenced to determine whether it is known or novel. The team behind FARME collected all these gene sequences from different studies and annotated them, and with some filtering this became the perfect data set to test my predictions.

The abundance and diversity of known antibiotic resistance genes can predict the abundance and diversity of unknown resistance genes. Predictions from known resistance genes are on the x-axis (RESQU resistance gene abundance/diversity) and the FARME abundance/diversity represents the numbers for the uncharacterized resistance genes. Reproduced from Bengtsson-Palme (2018), Microbiome. doi:, distributed under the CC-BY license.

Armed with the FARME data, I got strong (although not perfect) correlations between the abundance and diversity of known resistance genes from pathogenic bacteria and the abundance and diversity of recently uncovered genes from FARME. That implied that the diversity of known resistance genes predicted the diversity of unknown resistance genes. I was thrilled. It meant that if we want to rank risks associated with antibiotic resistance in the environment, the knowledge gained from large-scale quantification of known resistance genes can be used as a proxy for the unknown resistance factors. These findings, just published in Microbiome, can be used to guide monitoring efforts for environmental antibiotic resistance, to design risk ranking strategies, and to choose appropriate measures and methods for describing resistance gene abundance and diversity in the environment.
