
2024-05-24 Journal article
Interpretable molecular encodings and representations for machine learning tasks
dc.contributor.author: Weckbecker, Moritz
dc.contributor.author: Anžel, Aleksandar
dc.contributor.author: Yang, Zewen
dc.contributor.author: Hattab, Georges
dc.date.accessioned: 2026-02-10T09:59:37Z
dc.date.available: 2026-02-10T09:59:37Z
dc.date.issued: 2024-05-24 [none]
dc.identifier.other: 10.1016/j.csbj.2024.05.035
dc.identifier.uri: http://edoc.rki.de/176904/13297
dc.description.abstract: Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis. [eng]
dc.language.iso: eng [none]
dc.publisher: Robert Koch-Institut
dc.rights: (CC BY 3.0 DE) Attribution 3.0 Germany [ger]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/de/
dc.subject: Explainable [eng]
dc.subject: Interpretable [eng]
dc.subject: Molecular encoding [eng]
dc.subject: Representation [eng]
dc.subject: Machine learning [eng]
dc.subject.ddc: 610 Medicine and health [none]
dc.title: Interpretable molecular encodings and representations for machine learning tasks [none]
dc.type: article
dc.identifier.urn: urn:nbn:de:0257-176904/13297-1
dc.type.version: publishedVersion [none]
local.edoc.container-title: Computational and Structural Biotechnology Journal [none]
local.edoc.type-name: Journal article
local.edoc.container-type: periodical
local.edoc.container-type-name: Journal
local.edoc.container-publisher-name: Elsevier B.V. [none]
local.edoc.container-reportyear: 2024 [none]
local.edoc.container-firstpage: 2326 [none]
local.edoc.container-lastpage: 2336 [none]
dc.description.version: Peer Reviewed [none]
