Interpretable detection of novel human viruses from genome sequencing data
dc.contributor.author | Bartoszewicz, Jakub M. | |
dc.contributor.author | Seidel, Anja | |
dc.contributor.author | Renard, Bernhard Y. | |
dc.date.accessioned | 2022-01-31T09:38:27Z | |
dc.date.available | 2022-01-31T09:38:27Z | |
dc.date.issued | 2021-02-01 | none |
dc.identifier.other | 10.1093/nargab/lqab004 | |
dc.identifier.uri | http://edoc.rki.de/176904/9289 | |
dc.description.abstract | Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics. | eng |
dc.language.iso | eng | none |
dc.publisher | Robert Koch-Institut | |
dc.rights | (CC BY 3.0 DE) Namensnennung 3.0 Deutschland | ger |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/de/ | |
dc.subject.ddc | 610 Medicine and health | none |
dc.title | Interpretable detection of novel human viruses from genome sequencing data | none |
dc.type | article | |
dc.identifier.urn | urn:nbn:de:0257-176904/9289-9 | |
dc.identifier.doi | http://dx.doi.org/10.25646/9594 | |
dc.type.version | publishedVersion | none |
local.edoc.container-title | NAR Genomics and Bioinformatics | none |
local.edoc.type-name | Journal article | |
local.edoc.container-type | periodical | |
local.edoc.container-type-name | Journal | |
local.edoc.container-url | https://academic.oup.com/nargab/article/3/1/lqab004/6125551 | none |
local.edoc.container-publisher-name | Oxford University Press | none |
local.edoc.container-volume | 3 | none |
local.edoc.container-issue | 1 | none |
local.edoc.container-year | 2021 | none |
local.edoc.container-firstpage | 1 | none |
local.edoc.container-lastpage | 14 | none |
local.edoc.rki-department | Methodenentwicklung und Forschungsinfrastruktur | none |
dc.description.version | Peer Reviewed | none |