A PhD student at the very beginning of his project on the type three secretion system of Yersinia pseudotuberculosis was approached by his supervisor, who asked: ‘We might have developed a novel model for persistent infection; would you like to work on that?’ Switching from the biochemistry of protein-protein interactions to wild, mysterious, complex, and dynamic in vivo environments, where host and pathogen are both in action at full power, was a challenge he readily accepted.
How could bacteria persist for an unusually long time in a mouse model? What kinds of gene products are involved in establishing persistent infection? How could the pathogen adapt to the harsh host environment? These were the main questions to be answered with in vivo RNA-seq of Y. pseudotuberculosis. After partly overcoming the obstacles of performing bacterial RNA-seq in complex host tissue, we could see that hundreds of genes, beyond those involved in the virulence capacity of the pathogen, were induced for long-term maintenance of infection. What type of genes were they? I can hear many of you saying ‘mostly hypothetical genes’, and yes, it is true! There were many genes encoding hypothetical proteins and others to which we could not assign any functional annotation. That left few genes for a deeper understanding of host-pathogen interactions during infection, apart from those involved in coping with low oxygen, low iron, oxidative stress, and acidic stress. How about the rest? Are they also involved in responses to those stresses, and possibly to other stresses the pathogen encounters in tissue, for example nitrosative stress, nutrient limitation, and bile stress? How can one obtain such information for hundreds of genes with so many unknowns? My dream was to culture Y. pseudotuberculosis under stresses mimicking in vivo environments and annotate those genes by their response to each particular stress condition. Soon after, I realized that even this dream was not big enough to reach my goal. It would need to be expanded to many other pathogens with different infection routes and phylogenetic orders, which meant dozens of human bacterial pathogens under a dozen in vivo-mimicking conditions, at a cost of more than a thousand RNA-seq libraries. This remained just a dream until RNAtag-Seq, which allows up to 52 samples in one library preparation, was introduced.
As opportunities in NGS technologies blossomed, we began asking more comprehensive questions, such as: ‘Despite their diversity in shape and gene content, how can all bacterial pathogens invade the human host and adapt to stressful host environments? Are there specific gene products dedicated to a particular stress condition? Do these gene products exist in all pathogens or only in subgroups of pathogens?’ Answers to these questions could give insight into bacterial plasticity in adapting to stressful host environments, which would aid the development of novel strategies to combat bacterial pathogens. We therefore designed a set of conditions, each mimicking a specific stress in a different compartment of the human host, and targeted a diverse set of human bacterial pathogens that differ in Gram staining and oxygen dependency and include both phylogenetically close and distant species. I visited labs with expertise in particular bacterial species and performed the stress exposures and bacterial growth in their labs with the very kind help of the lab members. This was to ensure that each bacterial culture was grown under its optimal conditions and that the experimental setup was kept the same for every species.
We have revealed the global transcriptomes of 32 human bacterial pathogens under 11 in vivo-mimicking stress conditions plus a control condition, obtained by sequencing 1122 bacterial cultures with RNAtag-Seq. The data set contains the expression levels of 105 088 coding genes, as well as non-coding RNAs, under 12 conditions. It is an excellent resource for artificial intelligence algorithms to uncover hidden information.
We started digging into the data by asking very basic questions, such as: ‘Are the same genes similarly regulated in different bacterial pathogens? Is it possible to predict the potential of a gene to be regulated under a particular stress condition in bacterial species not represented in the data set?’ To answer these questions, we first performed comparative genomics and clustered genes sharing sequence similarity/identity into gene groups. We then realized that we needed a formula to predict the probability that the genes in a group are regulated under each of the stress conditions. After several attempts at formulating different aspects of gene regulation (frequency of regulation, conservation of the gene groups, and their regulation under a particular condition), we arrived at an equation that generates a meaningful score reflecting the ‘probability to be differentially expressed’ (PTDEX score). Condition-specific PTDEX scores provide a simplified evaluation of the condition-specific regulation of a gene that may exist in many other bacterial species.
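The PTDEX equation itself is not spelled out in this post, so the following is only a toy illustration of the kinds of ingredients mentioned (how often members of a gene group are regulated under a condition, and how widely the group is conserved across species). The function `toy_group_score`, the `GroupMember` input layout, and the rule of multiplying the two fractions are our own assumptions; this is not the published PTDEX formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroupMember:
    """One gene in a gene group (hypothetical layout, for illustration only)."""
    species: str
    differentially_expressed: bool  # DE call for this gene under ONE condition


def toy_group_score(members: list[GroupMember], n_species_total: int) -> float:
    """Toy per-condition score for a gene group. NOT the published PTDEX equation.

    regulation_freq: fraction of member genes called DE under the condition
    conservation:    fraction of all species carrying at least one member
    """
    if not members or n_species_total <= 0:
        return 0.0
    regulation_freq = sum(m.differentially_expressed for m in members) / len(members)
    # Paralogs in the same species count once towards conservation.
    conservation = len({m.species for m in members}) / n_species_total
    return regulation_freq * conservation
```

For a group with members in 4 of 32 species, 3 of which are differentially expressed under a given condition, this toy score is 0.75 × 0.125 = 0.09375. Multiplying the two fractions is an arbitrary choice that favours groups that are both widely conserved and consistently regulated.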
Participation of gene products in responses to multiple stresses could be key to their conservation during the evolutionary diversification of species. With this data set, we identified conserved genes with high PTDEX scores for at least 6 stress conditions, which we called ‘universal stress responders’ (USRs). Interestingly, many of these were genes in which mutations have been shown to confer antibiotic resistance. Hence, USRs may include genes with potential as targets for novel antibiotics. Moreover, identifying species-specific USRs offers similar potential for species-specific antibiotic targets, an attractive future approach for precision antimicrobials. We could also show that the in vitro-obtained stress responses were indeed active in vivo by comparing our data set with in vivo expression profiles of two clinically important pathogens, Pseudomonas aeruginosa and Staphylococcus aureus. These findings suggest that mimicking host stress conditions is a relevant approach for studies aiming to understand stress dynamics linked to different infection scenarios. Another exciting finding was the clear involvement of expression from non-coding regions of the genome. We observed very strong expression of some known and unknown ncRNAs under particular stress conditions, indicating that other unknown ncRNAs remain to be discovered.
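The USR criterion (conserved genes with high scores in at least 6 stress conditions) can be sketched as a simple filter over per-condition scores. The cutoff for a ‘high’ score and the way conservation is decided are not given in this post, so both are passed in by the caller as assumptions; the function name `universal_stress_responders` and the nested-dictionary layout are hypothetical.

```python
from __future__ import annotations


def universal_stress_responders(
    scores: dict[str, dict[str, float]],  # gene group -> {stress condition: score}
    conserved: set[str],                  # groups deemed conserved (caller's definition, assumed)
    threshold: float,                     # hypothetical cutoff for a "high" score
    min_conditions: int = 6,              # "at least 6 stress conditions"
) -> set[str]:
    """Return conserved gene groups scoring >= threshold in >= min_conditions conditions."""
    usrs: set[str] = set()
    for group, by_condition in scores.items():
        if group not in conserved:
            continue  # USRs must be conserved, so skip non-conserved groups
        n_high = sum(1 for score in by_condition.values() if score >= threshold)
        if n_high >= min_conditions:
            usrs.add(group)
    return usrs
```

Swapping the `conserved` set for the genes of a single species would give a rough analogue of the species-specific USRs mentioned above, again under the same assumed cutoff.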
All data generated in this study are deposited in the PATHOgenex RNA Atlas (www.pathogenex.org), which has a user-friendly web interface. Users can retrieve expression levels and differential expression of any gene(s) in any of the species investigated. Each gene is linked to its gene group, so users can quickly find similar genes in other species and evaluate their differential expression across stress conditions in a heatmap. All figures generated according to users’ preferences can be downloaded in SVG format, giving publication-quality figures.
You can reach the publication from here.