Comparison of the digitized pages of business documents by means of recognition

© 2018 E. I. Andreeva, T. V. Manzhikov, O. A. Slavin

Moscow Institute of Physics and Technology (State University) 141701, Moscow Region, Dolgoprudny, Institutskiy per., 9
Federal Research Center “Computer Science and Control” of RAS Institute for System Analysis 117312 Moscow, pr. 60-letiya Oktyabrya, 9
Smart Engines Ltd. 117312 Moscow, pr. 60-letiya Oktyabrya, 9

Received 21 Aug 2017

The paper examines the problem of comparing digitized pages of business documents. The task arises when two copies of a document signed by two parties must be compared in order to find possible modifications introduced by one of the parties. It is of practical importance in banking, where contracts are signed on paper. A recognition-based comparison method is proposed: two sets of words, obtained by recognizing the reference page and the test page, are compared with each other. The described experiments were carried out with the Tesseract OCR engine. The advantages of the proposed method are the versatility of the comparison algorithm and its high comparison accuracy. Its main disadvantage is its dependence on the font and font size used for printing.

Key words: algorithms for comparing digitized copies of documents, automatic text recognition, Levenshtein distance
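
The idea of comparing two recognized word sets with the Levenshtein distance can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given in the abstract. It assumes that OCR has already produced a list of words for each page, and it uses a hypothetical tolerance `max_distance` to absorb small recognition errors. The names `levenshtein` and `find_suspect_words` are introduced here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_suspect_words(reference_words, test_words, max_distance=1):
    """Greedily pair each reference word with its closest unused test word.

    Returns (suspects, unmatched): reference words with no counterpart within
    max_distance, and test words left unpaired. max_distance is an assumed
    tolerance for OCR noise, not a value taken from the paper.
    """
    unused = list(test_words)
    suspects = []
    for word in reference_words:
        if not unused:
            suspects.append(word)
            continue
        best = min(unused, key=lambda candidate: levenshtein(word, candidate))
        if levenshtein(word, best) <= max_distance:
            unused.remove(best)   # treated as the same word, possibly misread
        else:
            suspects.append(word)  # possible modification
    return suspects, unused


# Hypothetical OCR output for a reference page and a test page.
reference = ["The", "loan", "amount", "is", "1000", "rubles"]
test = ["The", "loan", "amounl", "is", "2500", "rubles"]
suspects, unmatched = find_suspect_words(reference, test)
```

In this example the one-character misreading "amounl" falls within the tolerance, while "1000" and "2500" differ by two edits and are reported as a possible modification. The sketch ignores word order and page position, and a tolerance of one edit would miss a single changed digit, so a practical system would need stricter handling of numbers.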

DOI: 10.7868/S0235009218010067

Cite: Andreeva E. I., Manzhikov T. V., Slavin O. A. Sravnenie otsifrovannykh stranits delovykh dokumentov na osnove raspoznavaniya [Comparison of the digitized pages of business documents by means of recognition]. Sensornye sistemy [Sensory systems]. 2018. V. 32(1). P. 35-41 (in Russian). doi: 10.7868/S0235009218010067
