2023-02-16, Master's thesis
Classifying Emergency Department Data to Improve Syndromic Surveillance: From Mixed Data Types to ICD Codes and Syndromes
Wagner, Birte
Syndromic surveillance systems are used to monitor public health and enable timely outbreak
detection. Emergency department (ED) data can serve as an important data source for
syndromic surveillance, but a high proportion of missing diagnosis codes can make analyses
that rely on this information impossible. This study aims to enhance an ED dataset from a
piloted syndromic surveillance system in Germany to enable the monitoring of an influenza-like
illness (ILI) syndrome.
Routinely collected data from one ED containing mixed-type variables are analysed and
two different approaches are implemented to deal with the missing data. Within the first
approach, the missing diagnosis codes are imputed by predicting them from the remaining
variables, using a multi-class naive Bayes classifier and a deep learning imputation package. In
the second approach, a logistic regression model and a binary naive Bayes classifier are used to
predict the ILI syndrome from all variables except the diagnosis code. The resulting ILI cases
are evaluated at the time series level with regard to seasonal patterns.
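The step from per-visit predictions to a time series can be illustrated with a minimal sketch. It assumes that a classifier returns a probability of ILI for each ED visit and that cases are counted per ISO week; the function name `weekly_counts`, the default threshold of 0.5, and the example visits are hypothetical and not taken from the thesis.

```python
from collections import Counter
from datetime import date


def weekly_counts(visits, threshold=0.5):
    """Count predicted ILI cases per ISO week.

    visits: iterable of (visit_date, predicted_probability) pairs.
    A visit counts as an ILI case when its probability is >= threshold.
    Returns a dict mapping (iso_year, iso_week) to the number of cases.
    """
    counts = Counter()
    for visit_date, prob in visits:
        if prob >= threshold:
            iso = visit_date.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical example data, not taken from the study.
visits = [
    (date(2020, 1, 6), 0.81),
    (date(2020, 1, 8), 0.35),
    (date(2020, 1, 9), 0.62),
    (date(2020, 1, 14), 0.55),
]

default_series = weekly_counts(visits)        # threshold 0.5
sensitive_series = weekly_counts(visits, 0.3)  # lower threshold, more cases
```

Lowering the threshold includes more visits as cases, so the same predictions can yield a more or less sensitive weekly series.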
The diagnosis codes were predicted from mixed-type input variables with sufficient predictive
performance (34.37% F1-measure in the best model). By taking into account the hierarchical structure of
the ICD-10 codes, the performance was improved. Predicting the ILI syndrome independently
of the diagnosis code from the remaining variables worked well (39.63% F1-measure in the
best model) and the predictions showed medical similarity with the ILI syndrome. The models
differed in how sensitively they included cases, which can be adjusted by changing the decision
threshold of the classifiers. The resulting ILI cases from all models were positively correlated with the
reference cases on a time series basis (r = 0.865 for the best model) and were comparable with an
external data source, a surveillance of severe acute respiratory infections (SARI) (r = 0.867
for the best model).
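As an illustration of the time series comparison, the following minimal sketch computes a Pearson correlation coefficient r between two equally long series of case counts. It assumes that the reported r values are Pearson coefficients, which the abstract does not state explicitly; the function name `pearson_r` and the example counts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly counts, not taken from the study.
predicted_cases = [3, 5, 9, 14, 8, 4]
reference_cases = [2, 6, 10, 12, 9, 3]
r = pearson_r(predicted_cases, reference_cases)
```

A value of r close to 1 indicates that the predicted and reference series rise and fall together, which is the sense in which the seasonal patterns are compared here.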
The present study showed that the ED dataset can be enhanced to enable the syndromic
surveillance of an ILI syndrome based on the diagnosis codes, even if this variable is missing.
Additionally, a flexible case definition for an ILI syndrome was developed that is independent
of the diagnosis code, and the underlying generic method can be applied to other syndromes
as well.