Commensal or pathogen? ColV plasmids in E. coli evolution

E. coli ST58 are supposed to be environmental commensals and yet they are a globally emerging cause of sepsis. What can genomic epidemiology reveal about their rise to prominence?

In 2018, our group published a comparative genomic analysis of two near-identical E. coli strains isolated from the urine and blood, respectively, of a patient suffering urosepsis (a blood infection that originates in the urinary tract). These strains were unusual in that they belonged to an atypical and emerging lineage of extra-intestinal pathogenic E. coli (ExPEC) called ST58 (ST = sequence type; a common method of genetic classification). ST58 were considered atypical as they were originally described in wild animals and did not belong to classical groupings of pathogenic E. coli. Furthermore, the two strains in question both carried a ColV type plasmid (a self-transmissible piece of mobile DNA), commonly identified in poultry source E. coli and a proportion of ExPEC infections.

At that time, we knew that ColV plasmids were a diverse group of structurally similar plasmids that carried genes functioning in both commensal gut colonisation and extra-intestinal pathogenicity. Gut commensalism can be considered a ‘natural’ ecological niche of E. coli, whereas extra-intestinal pathogenicity is more akin to an accidental or opportunistic niche for E. coli that are displaced from the gut to non-intestinal body sites such as the urinary tract. Thus, we thought ColV plasmids might act as a double-edged sword for E. coli. What we did not know was how widespread they were in ST58 or in E. coli in general, or whether the documented association with poultry represented their true ecological niche.

As further reports of ST58 emerged – ranging from cases of sepsis in humans to commensal strains from wild animals – we began to wonder if human and animal populations were distinct or linked. Could pathogenic ST58 in humans be a form of zoonosis? And if so, what animals were involved? Was there a role for ColV plasmids or other plasmid types in their emergence?

The genome collection

To address these questions, we collaborated with colleagues in Europe and leveraged the ever-growing wealth of publicly available genomic data to assemble the largest possible collection of ST58 E. coli from diverse sources and geographies. Our final collection consisted of 752 ST58 E. coli genomes from 33 countries and six continents isolated between 1970 and 2019. Sources included food-producing animals, humans, wild animals and water among others.

Before any fancy bioinformatics software could be run in anger, the metadata of the collection revealed an interesting result. The three major food-producing animals (chickens, cattle and pigs) dominated the collection, accounting for 68% of genomes (514/752), whereas genomes from ExPEC infections accounted for only 6% (43/752). This was notable because public genome databases are notoriously biased toward human clinical samples, so the relative proportions in the collection alone suggested that food animals might have had a role to play in the emergence of ST58.
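The proportions quoted above are simple arithmetic on the metadata counts. As a minimal sketch (the dictionary labels and the helper name `percent_of_collection` are ours, chosen for illustration; only the counts come from the text), they can be reproduced like this:

```python
# Counts taken from the text; labels and helper name are illustrative only.
TOTAL_GENOMES = 752

source_counts = {
    "food-producing animals (chickens, cattle, pigs)": 514,
    "ExPEC infections": 43,
}


def percent_of_collection(count, total=TOTAL_GENOMES):
    """Return the share of the collection as a whole-number percentage."""
    return round(100 * count / total)


shares = {source: percent_of_collection(n) for source, n in source_counts.items()}

for source, pct in shares.items():
    print(f"{source}: {pct}% of {TOTAL_GENOMES} genomes")
```

The remaining genomes in the collection come from other sources such as wild animals and water, which is why the two shares do not sum to 100%.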

The analysis

Turning to our favourite bioinformatics software packages, we built a phylogenetic tree to show relationships between the genomes, grouped them into meaningful clusters based on genome sequence similarity, determined the presence of important genetic features such as ColV plasmids, and compared these between clusters.

The phylogenetic tree of ST58 split into six major clusters. The largest cluster (BAP2) drew our interest immediately. It comprised nearly half the genome collection, displayed a broad host range dominated by poultry and pigs, and contained 77% of the sequences from human infections. Astonishingly, 85% of genomes in this cluster carried a ColV plasmid, supporting our ideas about its relationship with host range and pathogenicity. See Figure.

These results indicated linkage between ColV plasmids, pathogenicity, poultry and pigs within ST58, but we wanted to know if this was reflected in E. coli in general. Are poultry and pigs really the key ecological sources of ColV plasmids, or do they spill over from humans?

Beyond ST58

To address this, we downloaded 34,364 E. coli genomes from Enterobase [1], determined the presence of ColV plasmids in each genome and then compared rates of carriage across sources and STs. Compared to the overall ColV carriage rate of 13%, a stunning 55% of poultry source E. coli carried a ColV plasmid. Pig and ExPEC sourced E. coli were also above-average carriers at 20% and 16%, respectively (see Figure below). ColV carriage was generally higher in STs with large proportions of poultry and pig source genomes, and these STs always included a proportion of ExPEC source genomes. Overall, these results were highly reflective of what we observed within ST58 and highlight links between ColV plasmids, pathogenicity, poultry and pig production.
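For readers who want to try this kind of comparison on their own metadata, the sketch below shows one way to compute an overall carriage rate and per-source rates, then flag the sources that sit above the overall rate. It is not the pipeline used in the study: the function name `carriage_rates`, the record layout and the toy records are all assumptions made up for illustration, and the toy numbers are not the study's data.

```python
from collections import defaultdict


def carriage_rates(records):
    """Compute ColV carriage from (source, has_colv) pairs.

    Returns (overall_rate, {source: rate}) with rates as fractions in [0, 1].
    """
    totals = defaultdict(int)    # genomes seen per source
    carriers = defaultdict(int)  # ColV-positive genomes per source
    for source, has_colv in records:
        totals[source] += 1
        if has_colv:
            carriers[source] += 1

    n_genomes = sum(totals.values())
    if n_genomes == 0:
        raise ValueError("no records supplied")

    overall = sum(carriers.values()) / n_genomes
    by_source = {source: carriers[source] / totals[source] for source in totals}
    return overall, by_source


# Toy records invented for illustration only; NOT data from the study.
toy_records = [
    ("poultry", True), ("poultry", True), ("poultry", False), ("poultry", True),
    ("pig", True), ("pig", False), ("pig", False), ("pig", False),
    ("human", False), ("human", False), ("human", True), ("human", False),
]

overall, by_source = carriage_rates(toy_records)

# Sources whose carriage rate exceeds the overall rate.
above_average = sorted(s for s, rate in by_source.items() if rate > overall)

print(f"overall: {overall:.0%}")
for source, rate in sorted(by_source.items()):
    print(f"{source}: {rate:.0%}")
print("above average:", above_average)
```

The same grouping could be repeated with ST in place of source to compare carriage across sequence types.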

Whilst we have developed a greater understanding of ST58 and the role of ColV plasmids in their evolution, many questions remain. As ExPEC infections are typically caused by host faecal E. coli, what is the carriage rate of ColV+ ST58 in the gut of healthy individuals? Are ColV+ E. coli common in the healthy human gut in general? What are the risk factors for infection by ColV+ E. coli that have reservoirs in food animals? Future work will endeavour to resolve these problems and provide a framework that might inform future public health interventions.


In conclusion, our work is significant because it highlights how selection for mobile genetic elements in non-human sources can contribute to the evolution of human pathogens. For E. coli, zoonosis should be understood not as the simple transfer of a pathogen from an animal to a human, but as a complex phenomenon arising from a vast network of interactions between evolutionary lineages, mobile genetic elements and the selective pressures they encounter in both animal and human hosts.

We would like to thank our generous collaborators for sharing ST58 genome sequences with us and for their help refining the manuscript.

1. Enterobase is an incredible curated source of enteric microbial genomes encompassing isolate metadata and a wealth of genotypic information as well as tools for phylogenetic analysis.