Classifying Emergency Department Data to Improve Syndromic Surveillance: From Mixed Data Types to ICD Codes and Syndromes
Syndromic surveillance systems are used to monitor public health and enable timely outbreak detection. Emergency department (ED) data can serve as an important data source for syndromic surveillance, but a large number of missing diagnosis codes can make analyses that rely on this information impossible. This study aims to enhance an ED dataset from a piloted syndromic surveillance system in Germany to enable the monitoring of an influenza-like illness (ILI) syndrome. Routinely collected data from one ED, containing mixed-type variables, are analysed, and two different approaches are implemented to deal with the missing data. In the first approach, the missing diagnosis codes are imputed by predicting them from the remaining variables, using a multi-class naive Bayes classifier and a deep learning imputation package. In the second approach, a logistic regression model and a binary naive Bayes classifier are used to predict the ILI syndrome from all variables except the diagnosis code. The resulting ILI cases are evaluated at the time series level with regard to seasonal patterns. The diagnosis codes were predicted from mixed-type input variables with sufficient precision (F1-measure of 34.37% in the best model). Taking into account the hierarchical structure of the ICD-10 codes improved the performance. Predicting the ILI syndrome from the remaining variables, independently of the diagnosis code, worked well (F1-measure of 39.63% in the best model), and the predictions showed medical similarity with the ILI syndrome. The models differed in how sensitively they included cases, which can be adjusted by changing the classification threshold. The resulting ILI cases from all models were positively correlated with the reference cases at the time series level (r = 0.865 for the best model) and were comparable with an external data source, a surveillance system for severe acute respiratory infections (SARI) (r = 0.867 for the best model).
The present study showed that the ED dataset can be enhanced to enable syndromic surveillance of an ILI syndrome based on the diagnosis codes, even when this variable is missing. Additionally, a flexible case definition for an ILI syndrome was developed that is independent of the diagnosis code, and the underlying generic method can be applied to other syndromes as well.
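To make three of the ideas above more concrete (grouping ICD-10 codes by their hierarchy, adjusting the classification threshold, and comparing weekly case counts with a reference series), the following minimal Python sketch may help. It is an illustration under stated assumptions, not the study's implementation: the function names, the truncation to three-character ICD-10 categories, the threshold values, and all numbers are hypothetical toy choices.

```python
from math import sqrt


def icd_category(code: str) -> str:
    """Truncate an ICD-10 code to its three-character category.

    Assumption: using the parent category (e.g. 'J10.1' -> 'J10') is one
    possible way to exploit the hierarchy; the study may have done otherwise.
    """
    return code.strip().upper()[:3]


def weekly_case_counts(scores_by_week, threshold):
    """Count, per week, the visits whose predicted ILI probability
    is at least the given threshold."""
    return [sum(1 for p in scores if p >= threshold) for scores in scores_by_week]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: predicted ILI probabilities for the visits of
# three consecutive weeks, and made-up reference counts for the same weeks.
scores_by_week = [
    [0.10, 0.35, 0.80],
    [0.55, 0.62, 0.91, 0.20],
    [0.05, 0.45],
]
reference_counts = [1, 3, 0]

# A lower threshold includes more visits as ILI cases.
strict_counts = weekly_case_counts(scores_by_week, threshold=0.5)
lenient_counts = weekly_case_counts(scores_by_week, threshold=0.3)

r_strict = pearson_r(strict_counts, reference_counts)
r_lenient = pearson_r(lenient_counts, reference_counts)
```

In this toy setting, lowering the threshold from 0.5 to 0.3 adds cases in the first and third week, which mirrors how the sensitivity of a probabilistic classifier can be tuned before the weekly counts are compared with a reference time series.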