2023-05-12 · Journal article
Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli
dc.contributor.author: Vorimore, Fabien
dc.contributor.author: Jaudou, Sandra
dc.contributor.author: Tran, Mai-Lan
dc.contributor.author: Richard, Hugues
dc.contributor.author: Fach, Patrick
dc.contributor.author: Delannoy, Sabine
dc.date.accessioned: 2023-11-16T14:09:21Z
dc.date.available: 2023-11-16T14:09:21Z
dc.date.issued: 2023-05-12
dc.identifier.other: 10.3389/fmicb.2023.1118158
dc.identifier.uri: http://edoc.rki.de/176904/11362
dc.description.abstract: Introduction: The objective of this study was to develop, using a genome-wide machine learning approach, an unambiguous model to predict the presence of highly pathogenic STEC in E. coli read assemblies derived from complex samples potentially containing multiple E. coli strains. The approach took into account the high genomic plasticity of E. coli and used the stratification of STEC and the E. coli pathogroup classification, based on serotype and virulence factors, to identify specific combinations of biomarkers for improved characterization of eae-positive STEC (also called EHEC, for enterohemorrhagic E. coli), which are associated with bloody diarrhea and hemolytic uremic syndrome (HUS) in humans.
Methods: A machine learning (ML) approach was applied to a large curated dataset of 1,493 E. coli genome sequences and 1,178 coding sequences (CDS). Feature selection was performed using eight classification algorithms, reducing the number of CDS to six. On this reduced dataset, the eight ML models were trained with hyperparameter tuning and cross-validation.
Results and discussion: Remarkably, using only these six genes, EHEC can be clearly identified in E. coli read assemblies obtained from in silico mixtures and from complex samples such as milk metagenomes. These combinations of discriminative biomarkers can be implemented as novel marker genes for unambiguous EHEC characterization in mixtures of different E. coli strains as well as in raw milk metagenomes.
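The abstract describes a two-stage pipeline: shrink a genome-by-CDS feature set (1,178 CDS) to a handful of discriminative genes, then train classifiers with hyperparameter tuning and cross-validation. The minimal Python sketch below illustrates only that general shape, on synthetic presence/absence data. It is not the authors' method: the prevalence-difference ranking is a stand-in for their eight feature-selection algorithms, a single k-nearest-neighbour classifier with Hamming distance stands in for their eight ML models, and every function name and the toy dataset are hypothetical.

```python
"""Hypothetical sketch, not the authors' pipeline: pick a few discriminative CDS
from a genome x CDS presence/absence matrix, then tune and cross-validate a
simple classifier on the reduced features."""
import random
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # rows = genomes, columns = CDS (1 = present, 0 = absent)


def rank_features(X: Matrix, y: Sequence[int], n_keep: int) -> List[int]:
    """Return indices of the n_keep CDS whose prevalence differs most between
    EHEC (label 1) and non-EHEC (label 0) genomes (a stand-in selection score)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p_pos = sum(row[j] for row in pos) / len(pos)
        p_neg = sum(row[j] for row in neg) / len(neg)
        scores.append((abs(p_pos - p_neg), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return sorted(j for _, j in scores[:n_keep])


def knn_predict(train_X: Matrix, train_y: Sequence[int], x: List[int], k: int) -> int:
    """Majority vote of the k training genomes closest to x in Hamming distance."""
    dists = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(X: Matrix, y: Sequence[int], k: int,
                       n_folds: int = 5, seed: int = 0) -> float:
    """Shuffled n-fold cross-validation accuracy of knn_predict."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


def tune_k(X: Matrix, y: Sequence[int], candidates: Tuple[int, ...] = (1, 3, 5)) -> int:
    """Hyperparameter search: best cross-validated k, preferring smaller k on ties."""
    return max(candidates, key=lambda k: (cross_val_accuracy(X, y, k), -k))


def toy_data(n: int = 40, n_cds: int = 10, seed: int = 1) -> Tuple[Matrix, List[int]]:
    """Synthetic data: CDS 2 and 7 track the label, all other CDS are random noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.randint(0, 1) for _ in range(n_cds)]
        row[2] = label
        row[7] = label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    keep = rank_features(X, y, n_keep=2)  # analogous to reducing 1,178 CDS to 6
    X_reduced = [[row[j] for j in keep] for row in X]
    best_k = tune_k(X_reduced, y)
    print("selected CDS:", keep, "best k:", best_k,
          "CV accuracy:", cross_val_accuracy(X_reduced, y, best_k))
```

The sketch follows the order given in the abstract (select features first, then tune and cross-validate). Nesting the selection step inside each fold is a common way to keep cross-validated accuracy estimates from being optimistic.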
dc.language.iso: eng
dc.publisher: Robert Koch-Institut
dc.rights: (CC BY 3.0 DE) Attribution 3.0 Germany
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/de/
dc.subject: machine learning
dc.subject: Shiga toxin-producing Escherichia coli
dc.subject: food safety
dc.subject: metagenomics
dc.subject: raw milk
dc.subject.ddc: 610 Medicine and health
dc.title: Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli
dc.type: article
dc.identifier.urn: urn:nbn:de:0257-176904/11362-0
dc.type.version: publishedVersion
local.edoc.container-title: Frontiers in Microbiology
local.edoc.pages: 13
local.edoc.type-name: Journal article
local.edoc.container-type: periodical
local.edoc.container-type-name: Journal
local.edoc.container-url: https://www.frontiersin.org/journals/microbiology
local.edoc.container-publisher-name: Frontiers Media S. A.
local.edoc.container-volume: 14
local.edoc.container-reportyear: 2023
dc.description.version: Peer Reviewed

Zur Kurzanzeige