Alright stop, Kleborate and listen

Klebsiella pneumoniae has garnered quite a reputation in the last decade as a human pathogen of increasing public health concern in healthcare facilities and some communities worldwide. As with other hospital superbugs, its notoriety largely stems from high levels of resistance to the antibiotics usually relied on to treat infections. This resistance is mostly due to the acquisition and accumulation of antimicrobial resistance (AMR) genes that easily transfer between strains. Of greatest concern, a growing number of K. pneumoniae infections are resistant to the last-line drugs reserved for when all other options have failed, effectively rendering these infections untreatable. Outside the hospital setting, and notably concentrated around the Asian Pacific rim, K. pneumoniae poses an entirely distinct public health threat. A subset of ‘hypervirulent’ strains, enriched with virulence genes that enhance their disease-causing capacity, can cause invasive and life-threatening infections in people who appear to be otherwise healthy and immunocompetent. Like the AMR genes, these virulence genes can also move between strains.

Under the lead of Professor Kathryn Holt, and building upon the early genomic insights gleaned from her 2015 study of 328 diverse K. pneumoniae genomes (Holt et al. 2015), several members of the Holt lab have focused their efforts on investigating the evolution, genetic diversity and distribution of genes that enhance the virulence of K. pneumoniae. Before starting the work that led to this paper, we had conducted a number of studies revealing extensive genetic diversity in the following key K. pneumoniae virulence loci, which are often associated with distinct lineages and/or mobile elements:

  1. The polysaccharide capsule encoded by the K locus (Wyres et al. 2016)
  2. The iron-scavenging siderophore yersiniabactin and the genotoxin colibactin (encoded by the ybt and clb loci, respectively), typically mobilised by a chromosomal integrative and conjugative element, ICEKp (Lam et al. 2018a)
  3. The siderophores aerobactin and salmochelin (encoded by iuc and iro, respectively), typically mobilised by the so-called large virulence plasmids (Lam et al. 2018b)

A meeting was called one afternoon in late 2017 to discuss how we could harness the rich information embedded in the genetic diversity of these loci for tracking purposes, especially given that studies often only report on the presence or absence of virulence loci. Midway through, the conversation took a slight detour and we somehow landed on a game of wordplay, attempting to work ICEKp into the lyrics of Ice Ice Baby. “Alright, stop, Kleborate and listen…” And thus Kleborate was born. As for whether or not we managed to integrate ICEKp (pun intended) into Ice Ice Baby, the proof of our lyrical genius is evident on the Kleborate wiki page.  

Our vision for Kleborate from the get-go was a tool that rapidly extracts key genotyping information for loci of clinical and epidemiological relevance. What this entails has naturally evolved over the years, and the code, logic and output of the initial Kleborate release back in March 2018 look quite different from the version 2.0.0 release used in the paper. Edits to the code often came hand-in-hand with an arduous debugging process (many thanks to Ryan Wick on this front), followed by re-running the updated code on the ever-growing dataset of publicly available Klebsiella genomes, re-analysing the data, re-drawing figures and so on. That cycle was finally (thankfully!) broken in late 2020 when we pre-printed the paper. Today’s version of Kleborate outputs an impressive 106 columns of data encompassing assembly metrics, species prediction, MLST, genotypes of the aforementioned virulence loci as well as the lipopolysaccharide O antigen and the hypermucoidy loci rmpADC and rmpA2, and AMR genes and mutations for 17 different drug classes. The phrase ‘one-stop shop for Klebsiella genomes’ has been thrown around on a number of occasions.

In our paper, we applied Kleborate to a number of datasets to demonstrate its suitability for genomic surveillance. We started with the 2013-14 EuSCAPE surveillance dataset, comprising roughly 1,600 carbapenem-susceptible or non-susceptible K. pneumoniae from across Europe (David et al. 2019). Kleborate not only recapitulated the main findings around country vs. strain (i.e. ST) vs. carbapenemase trends (Figure 1), but also revealed novel insights around the K and O surface antigens and the compounding effects of the Omp porin mutations required for enhanced levels of carbapenem resistance.

Figure 1 - Geographical and lineage distribution of carbapenemase genes from the EuSCAPE surveillance study. 
Data summarises Kleborate results for 927 carbapenem-non-susceptible and 697 carbapenem-susceptible K. pneumoniae genomes. Each circle represents a genome, coloured by carbapenemase as per inset legend. Barplots summarise the number of genomes from each K. pneumoniae lineage (top) and country (right), coloured by carbapenemase.

We next applied Kleborate to a dataset of publicly available Klebsiella whole genomes, which had grown from a relatively measly 3,000 genomes when we first started the Kleborate project in 2017 to a whopping 13,000+ by 2020. Unsurprisingly, the collection was dominated by clinical isolates, but it still revealed some interesting trends in AMR, virulence and overall genetic diversity (Figure 2).

Figure 2 - Summary of genome collection metadata, and Kleborate-derived virulence and antimicrobial resistance (AMR) genotypes, for all publicly available Klebsiella genomes.
Data shown summarises Kleborate results for 11,277 non-redundant Klebsiella genomes publicly available as at 17 July 2020. From left to right: barplots showing source information by geographical region and sample type (coloured as per inset legend); heatmaps showing the prevalence of virulence loci (blue) and predicted AMR drug classes (red) (as per inset scale bars). Genomes are summarised by species, ordered by species complex: KpSC, K. pneumoniae species complex; KoSC, K. oxytoca species complex; and other Klebsiella. In the heatmaps, the total number of genomes in which each type of virulence/AMR determinant was detected are indicated below each column. Column names are as follows: ybt yersiniabactin, clb colibactin, iuc aerobactin, iro salmochelin, rmp hypermucoidy Rmp, rmpA2 hypermucoidy rmpA2, AGly aminoglycosides, Col colistin, Fcyn fosfomycin, Flq fluoroquinolone, Gly glycopeptide, MLS macrolides, Phe phenicols, Rif rifampin, Sul sulfonamides, Tet tetracyclines, Tgc tigecycline, Tmt trimethoprim, Bla β-lactamases, inhR β-lactamase inhibitor, ESBL extended-spectrum β-lactamases, ESBL_inhR extended-spectrum β-lactamase with resistance to β-lactamase inhibitors, Carb carbapenemase, Bla_chr intrinsic chromosomal β-lactamase, SHV mutations in SHV, Omp truncations/mutations in ompK35/ompK36, Col truncations in mgrB/pmrB conferring colistin resistance, Flq mutations in gyrA/parC conferring resistance to fluoroquinolones.

In particular, and worryingly, we identified 600 genomes in which virulence and AMR genes have converged in the same strain (Figure 3) - a potentially deadly combination resulting in infections that are not only invasive due to the hypervirulence determinants (i.e. aerobactin) but also difficult to treat or untreatable thanks to resistance to the most relied-upon drugs, conferred by the presence of ESBL and/or carbapenemase genes.
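To make the idea of convergence concrete, here is a minimal Python sketch of how one might flag convergent genomes in a table of genotyping results: a genome counts as convergent when it carries the aerobactin locus (iuc) together with an ESBL and/or a carbapenemase gene. The table, its column names (`strain`, `ST`, `iuc`, `esbl`, `carbapenemase`) and its rows are made up for illustration; they are not Kleborate's actual output format, and the paper's analysis may apply additional criteria.

```python
import csv
import io

# Hypothetical, simplified genotype table. Column names and rows are
# illustrative only and do not reproduce Kleborate's real output schema.
TABLE = (
    "strain\tST\tiuc\tesbl\tcarbapenemase\n"
    "g1\tST23\tiuc1\t-\t-\n"
    "g2\tST258\t-\t-\tKPC-2\n"
    "g3\tST11\tiuc1\t-\tKPC-2\n"
    "g4\tST231\tiuc5\tCTX-M-15\t-\n"
)


def is_present(value):
    """Treat an empty cell or '-' as 'not detected' (an assumption of this sketch)."""
    return value.strip() not in ("", "-")


def find_convergent(rows):
    """Return strains carrying iuc plus an ESBL and/or carbapenemase gene."""
    return [
        row["strain"]
        for row in rows
        if is_present(row["iuc"])
        and (is_present(row["esbl"]) or is_present(row["carbapenemase"]))
    ]


rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
convergent = find_convergent(rows)
print(convergent)
```

In this toy table, g1 has only the virulence locus and g2 only a carbapenemase, so neither is flagged; g3 and g4 carry both kinds of determinant.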

Figure 3 - Convergence of AMR and virulence determinants in the K. pneumoniae population, identified by Kleborate analysis of public genomes.
A. Geographical and lineage distribution of convergence events. Each circle represents a unique convergence event (i.e. a monophyletic clade harbouring both ESBL/carbapenemase genes and iuc). Circles are scaled by the number of total genomes linked to the event and coloured to indicate whether convergence is inferred to have occurred via (i) acquisition of ESBL or carbapenemase/s, (ii) acquisition of an iuc-encoding plasmid, or (iii) unresolved means as per inset legend. Marginal barplots show the number of convergence events (colour blocks) and genomes (block heights) associated with each lineage (top) or geographical region (right). Lineages were defined on the basis of multi-locus sequence types (STs); the number of convergence events estimated for each is labelled at the top of each bar.
B. Distribution of convergent genomes by location. Countries from which convergent genomes were detected are coloured on the map; circles represent the number of convergent genomes detected in each UN-defined geographical region (indicated by colour, as per inset legend), scaled and labelled with the minimum estimated number of unique convergence events specific to each region (excluding inter-regional convergence events). The total number of convergence events affecting each region, including region-specific and inter-regional convergence events, are given in brackets in the inset legend.

Lastly, we highlighted the potential for Kleborate to genotype strains directly from culture-free, metagenomics-based sequencing data, using the Baby Biome Study dataset as an example (Shao et al. 2019). We anticipate this will be an increasingly important application, as deep sequencing is rapidly being adopted for both clinical investigations and environmental surveillance.

With the tool cited in at least 74 publications at the time of finalising this study, and with its implementation in the Pathogenwatch pipeline (Argimón et al. 2021), the usefulness of Kleborate in genomic analyses and surveillance of Klebsiella has already been widely recognised. We have also developed Kleborate-Viz which, along with the Centre for Genomic Surveillance's Pathogenwatch, is one of two online platforms that facilitate the generation and/or exploration of Kleborate data. Going forward, the Kleborate code will continue to be modified in line with developments in research around AMR and virulence in Klebsiella, for improved reporting and interpretation of clinically relevant features.

The study describing Kleborate, and the insights gleaned from its application to 13,000+ publicly available genomes and to metagenomes from the Baby Biome Study, is now published online in Nature Communications.

Margaret Lam

Postdoctoral research fellow, Monash University