
International Journal of Reliable Information and Assurance

Volume 4 No. 2, 2016, pp 7-12
http://dx.doi.org/10.21742/ijria.2016.4.2.02




A Text-learning Based Method of Detecting Personal Information in Image Files



    Youngkyung Lee, Chaeho Cho and Yoojae Won
    Dept. of Computer Science Engineering, Chungnam National University

    Abstract

    Damage to individuals and companies from the leakage of files containing personal information is increasing, and the leaked personal information itself is being exploited through illegal distribution. This study addresses the problem by developing software that detects personal information embedded in image files through text-based training. The software uses the Tesseract Optical Character Recognition (OCR) engine to convert the characters contained in an image to text, then applies the Levenshtein distance algorithm to measure how closely that text matches personal information. The likelihood that an image file contains personal information is reported as a percentage. Text training improves both the text recognition capability and the accuracy of personal information recognition. With this software, existing personal information detection functions can be extended to cover text files, image files, and other file types. The approach makes it possible to block and prevent leaks of personal information contained in image files stored on personal and corporate PCs.
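
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the OCR text has already been extracted (for example by Tesseract), and the keyword list, the function names, and the choice to normalize the distance by the longer string's length are all illustrative assumptions, since the abstract does not specify how the percentage is computed.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Similarity as a percentage. Normalizing by the longer length
    is an assumption made for this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match_percent(ocr_text: str, keywords: Iterable[str]) -> float:
    """Highest similarity between any whitespace-separated OCR token and
    any keyword from a hypothetical personal-information keyword list."""
    tokens = ocr_text.lower().split()
    lowered = [k.lower() for k in keywords]
    return max(
        (similarity_percent(t, k) for t in tokens for k in lowered),
        default=0.0,
    )
```

Scoring by edit distance rather than exact match lets a token that OCR has slightly misread, such as "narne" for "name", still register as a likely hit.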


 
