Clinical Narrative Summarization based on the MIMIC III Dataset
AUTHORS
Jugal Shah, Department of Computer Science, Lakehead University, Thunder Bay, Ontario, Canada
Sabah Mohammed*, Department of Computer Science, Lakehead University, Thunder Bay, Ontario, Canada
ABSTRACT
Advances in healthcare technology have substantially increased the amount of clinical data produced for each patient, making it difficult for physicians to review all of the information. The data are also multimodal, which makes them harder to interpret and review. Healthcare professionals commonly document patient health data in unstructured natural language so that specifics can be communicated accurately and without loss of knowledge. However, routinely reading through irrelevant data reduces the effective use of doctor-patient time and increases the risk of error. Text summarization is generally characterized as the creation of a subset of the original data that contains only the relevant information, and it is a significant research topic in the field of NLP. Despite this, little research focuses on summarizing text data collected in the healthcare sector. This paper presents the application of Bio Clinical BERT in conjunction with a BERT extractive summarizer to shorten clinical data. It also implements topic modelling using LDA to assign relevant topics to the summary text. The MIMIC III dataset is used as the source of clinical notes for this project. The paper also presents key concepts related to text summarization and topic modelling.
KEYWORDS
POS, NER, Summarization, TF-IDF, BOW, Topic modeling, BERT