Publication Server of Robert Koch Institute (edoc)
2023-05-12, Journal article
Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli
Vorimore, Fabien
Jaudou, Sandra
Tran, Mai-Lan
Richard, Hugues
Fach, Patrick
Delannoy, Sabine
Introduction: The objective of this study was to develop, using a genome-wide machine learning approach, an unambiguous model to predict the presence of highly pathogenic STEC in E. coli read assemblies derived from complex samples that may contain multiple E. coli strains. Our approach takes into account the high genomic plasticity of E. coli and uses the stratification of STEC and E. coli pathogroups, classified by serotype and virulence factors, to identify specific combinations of biomarkers for improved characterization of eae-positive STEC (also named EHEC, for enterohemorrhagic E. coli), which are associated with bloody diarrhea and hemolytic uremic syndrome (HUS) in humans.

Methods: A machine learning (ML) approach was applied to a large curated dataset composed of 1,493 E. coli genome sequences and 1,178 coding sequences (CDS). Feature selection was performed using eight classification algorithms, reducing the number of CDS to six. From this reduced dataset, the eight ML models were trained with hyper-parameter tuning and cross-validation steps.

Results and discussion: Remarkably, using only these six genes, EHEC can be clearly identified from E. coli read assemblies obtained from in silico mixtures and from complex samples such as milk metagenomes. These various combinations of discriminative biomarkers can be implemented as novel marker genes for the unambiguous characterization of EHEC in mixtures of different E. coli strains as well as in raw milk metagenomes.
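The abstract does not name the eight algorithms or the selection criterion, so the following is only a minimal sketch of the general workflow it describes: rank CDS by how well their presence or absence separates the classes, keep a handful, then score a classifier with cross-validation. Everything here is an assumption made for illustration: the synthetic presence/absence matrix, information gain as the ranking score, Bernoulli naive Bayes as the stand-in classifier, and all function names. It is not the authors' pipeline and omits hyper-parameter tuning.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy after splitting on a 0/1 presence column."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def select_top_k(X, y, k):
    """Indices of the k columns (hypothetical CDS) with highest information gain."""
    scores = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def train_nb(X, y, features, alpha=1.0):
    """Bernoulli naive Bayes on the chosen columns, with Laplace smoothing."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        probs = {j: (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in features}
        model[label] = (len(rows) / len(y), probs)
    return model


def predict_nb(model, row):
    """Return the label with the highest log posterior for one genome."""
    def log_score(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if row[j] else 1 - p) for j, p in probs.items())
    return max(model, key=log_score)


def cross_validate(X, y, k_features, n_folds=5, seed=0):
    """Mean held-out accuracy; features are re-selected inside each fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for fold in (idx[i::n_folds] for i in range(n_folds)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        model = train_nb(X_tr, y_tr, select_top_k(X_tr, y_tr, k_features))
        accuracies.append(sum(predict_nb(model, X[i]) == y[i] for i in fold) / len(fold))
    return sum(accuracies) / n_folds


def make_toy_data(n_genomes=200, n_cds=30, n_informative=3, seed=42):
    """Synthetic data (assumption): the first columns track the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_genomes):
        label = i % 2  # 1 = "EHEC-like", 0 = other, purely illustrative
        row = []
        for j in range(n_cds):
            p = (0.95 if label else 0.05) if j < n_informative else 0.5
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected columns:", sorted(select_top_k(X, y, 3)))
    print("mean CV accuracy:", cross_validate(X, y, k_features=3))
```

Selecting features inside each fold, rather than once on the full dataset, keeps the held-out genomes from influencing which markers are chosen, so the cross-validated accuracy is not inflated by the selection step.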
Files in this item
Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli.pdf — Adobe PDF — 1.489 Mb
MD5: 729d1c6a4b1db9bc53eb4375518e9bef
License: (CC BY 3.0 DE) Attribution 3.0 Germany