Analysis of a stopping method for text recognition in a video stream using an extended result model with per-character alternatives

In document analysis and recognition on mobile devices, and in object recognition in a video stream generally, deciding when to stop the process is a key problem. The stopping decision affects not only the time spent on recognition and data entry but also the expected accuracy of the result. This work extends a stopping method based on modelling the next integrated result so that it can work with recognition results represented as strings with per-character alternatives. The method and the notes on its extension are described, and the method is evaluated experimentally on the open datasets MIDV-500 and MIDV-2019. It is compared with previously published methods based on clustering of the input observations. The results indicate that the stopping method based on modelling the next integrated result achieves higher accuracy, even compared with the best achievable configuration of the competing methods. However, the method has high computational complexity, and its implementation needs to be optimized.
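The abstract only names the idea of modelling the next integrated result. The sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: each observation is a string with per-character alternatives stored as one probability dictionary per character position; all observations have the same length, so integration is plain per-position averaging with no alignment step; the distance is a Levenshtein distance whose substitution cost is the total variation distance between two character distributions, normalized by the longer length; and the unseen next observation is modelled as a repeat of each previous observation in turn. The function names and the `threshold` parameter are hypothetical, and the estimator used in the paper may differ in its details.

```python
from typing import Dict, List, Sequence

# A per-character distribution: alternative character -> estimated probability.
CharDist = Dict[str, float]
# A string recognition result with per-character alternatives.
Result = List[CharDist]


def integrate(results: Sequence[Result]) -> Result:
    """Average the per-character distributions position by position.

    Assumption: all results have the same length, so no alignment is done.
    """
    length = len(results[0])
    merged: Result = []
    for pos in range(length):
        acc: CharDist = {}
        for r in results:
            for ch, p in r[pos].items():
                acc[ch] = acc.get(ch, 0.0) + p / len(results)
        merged.append(acc)
    return merged


def char_cost(a: CharDist, b: CharDist) -> float:
    """Substitution cost: total variation distance between two distributions."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def distance(x: Result, y: Result) -> float:
    """Levenshtein distance with fractional substitution costs,
    normalized by the longer length (an assumed normalization)."""
    n, m = len(x), len(y)
    if max(n, m) == 0:
        return 0.0
    prev = [float(j) for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1.0,          # deletion
                         cur[j - 1] + 1.0,       # insertion
                         prev[j - 1] + char_cost(x[i - 1], y[j - 1]))  # substitution
        prev = cur
    return prev[m] / max(n, m)


def estimate_next_distance(observations: Sequence[Result]) -> float:
    """Model the next observation as a copy of each previous one and average
    the distance between the current and the modelled integrated results."""
    current = integrate(observations)
    total = 0.0
    for obs in observations:
        modelled = integrate(list(observations) + [obs])
        total += distance(current, modelled)
    return total / len(observations)


def should_stop(observations: Sequence[Result], threshold: float) -> bool:
    """Hypothetical stopping rule: stop once the estimated change is small."""
    return len(observations) >= 2 and estimate_next_distance(observations) <= threshold
```

For two confident but conflicting one-character observations, `[{"A": 1.0}]` and `[{"B": 1.0}]`, the estimate is 1/6, so the rule keeps going at `threshold=0.1` and stops at `threshold=0.2`. Each call performs n + 1 integrations and n distance computations for n observations, so the per-frame cost grows with the number of frames, which is in line with the abstract's remark about computational complexity. With averaging as the integration rule, each modelled result equals (n·R_n + X_i)/(n + 1), where R_n is the current integrated result and X_i the i-th observation, so it could be updated incrementally rather than recomputed.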

Keywords: video stream recognition, mobile OCR, stopping rules, decision making, mobile document recognition, “anytime” algorithms

DOI: 10.31857/S0235009220030026

Citation for a Russian-language reference list: Bulatov K. B., Savelyev B. I., Arlazarov V. V., Fedotova N. V. Analysis of a stopping method for text recognition in a video stream using an extended result model with per-character alternatives. Sensornye sistemy [Sensory systems]. 2020. V. 34. No. 3. P. 217–225. doi: 10.31857/S0235009220030026

Citation for the "References" section: Bulatov K. B., Savelyev B. I., Arlazarov V. V., Fedotova N. V. Analiz metoda ostanova raspoznavaniya teksta v videopotoke s ispolzovaniem rasshirennoi modeli rezultata s posimvolnymi alternativami [Analysis of a stopping method for text recognition in video stream using an extended result model with per-character alternatives]. Sensornye sistemy [Sensory systems]. 2020. V. 34 (3). P. 217–225 (in Russian). doi: 10.31857/S0235009220030026

References:

Polevoy D.V. Ispol’zovanie mobil’nyh ustrojstv dlja vyjavlenija priznakov fabrikacii dokumentov, udostoverjajushhih lichnost’ [Identity documents forgery detection with mobile devices]. Sensornye sistemy [Sensory systems]. 2019. V. 33 (2). P. 142–156 (In Russian).
Slugin D., Arlazarov V.V. Poisk tekstovyh polej dokumenta s pomoshh’ju metodov obrabotki izobrazhenij [Text fields extraction based on image processing]. Trudy ISA RAN [Proc. Institute for Systems Analysis RAS]. 2017. V. 67 (4). P. 65–73 (In Russian).
Arlazarov V.V., Bulatov K., Chernov T., Arlazarov V.L. MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream. Computer Optics. 2019. V. 43 (5). P. 818–824.
Arlazarov V.V., Bulatov K., Manzhikov T., Slavin O., Janiszewski I. Method of determining the necessary number of observations for video stream documents recognition. In Proc. SPIE (ICMV 2017). 2018. V. 10696. https://doi.org/10.1117/12.2310132
Berezovskij B.A., Gnedin A.V. Theory of choice and the problem of optimal stopping at the best entity. Automation and Remote Control. 1981. V. 42. P. 1221–1225.
Bulatov K. A method to reduce errors of string recognition based on combination of several recognition results with per-character alternatives. Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software. 2019a. V. 12 (3). P. 74–88. https://doi.org/10.14529/mmp190307
Bulatov K., Arlazarov V.V., Chernov T., Slavin O., Nikolaev D. Smart IDReader: Document recognition in video stream. In 14th International Conference on Document Analysis and Recognition (ICDAR). 2017. V. 6. P. 39–44. https://doi.org/10.1109/ICDAR.2017.347
Bulatov K., Matalov D., Arlazarov V.V. MIDV-2019: challenges of the modern mobile-based document OCR. Twelfth International Conference on Machine Vision (ICMV 2019). 2020a. V. 11433. P. 717–722. https://doi.org/10.1117/12.2558438
Bulatov K., Razumnyi N., Arlazarov V.V. On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model. International Journal on Document Analysis and Recognition (IJDAR). 2019b. V. 22. P. 303–314. https://doi.org/10.1007/s10032-019-00333-0
Bulatov K., Savelyev B., Arlazarov V.V. Next integrated result modelling for stopping the text field recognition process in a video using a result model with per-character alternatives. Proc. SPIE 11433, Twelfth International Conference on Machine Vision (ICMV 2019). 2020b. V. 114332M. https://doi.org/10.1117/12.2559447
Chernyshova Y., Aliev M., Gushchanskaia E., Sheshkus A. Optical font recognition in smartphone-captured images and its applicability for ID forgery detection. In Proc. SPIE (ICMV 2018). 2019. V. 11041. https://doi.org/10.1117/12.2522955
Chow Y.S., Robbins H. A martingale system theorem and applications. Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability. 1961. V. 1. P. 93–104. University of California Press, Berkeley, CA.
Christensen S., Irle A. The monotone case approach for the solution of certain multidimensional optimal stopping problems. 2019. arXiv:1705.01763
Dangiwa B.A., Kumar S.S. A business card reader application for iOS devices based on Tesseract. 2018 International Conference on Signal Processing and Information Security (ICSPIS). 2018. P. 1–4. https://doi.org/10.1109/CSPIS.2018.8642727
Esser D., Muthmann K., Schuster D. Information extraction efficiency of business documents captured with smartphones and tablets. In Proceedings of the 2013 ACM Symposium on Document Engineering. 2013. P. 111–114. ACM, New York, NY, USA. https://doi.org/10.1145/2494266.2494302
Ferguson T.S. Optimal stopping and applications. 2006. URL: https://www.math.ucla.edu/~tom/Stopping/Contents.html (accessed 03.05.2020).
Ferguson T., Klass M. House-hunting without second moments. Sequential Analysis. 2010. V. 29 (3). P. 236–244. https://doi.org/10.1080/07474946.2010.487423
Fiscus J.G. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). In IEEE Workshop Automatic Speech Recognition and Understanding. 1997. P. 347–354. https://doi.org/10.1109/ASRU.1997.659110
Llobet R., Cerdan-Navarro J., Perez-Cortes J., Arlandis J. OCR post-processing using weighted finite-state transducers. In 2010 20th International Conference on Pattern Recognition. 2010. P. 2021–2024. https://doi.org/10.1109/ICPR.2010.498
Povolotskiy M., Tropin D. Dynamic programming approach to template-based OCR. In Proc. SPIE (ICMV 2018). 2019. V. 11041. https://doi.org/10.1117/12.2522974
Ravneet K. Text recognition applications for mobile devices. Journal of Global Research in Computer Science. 2018. V. 9 (4). P. 20–24.
Skoryukina N., Shemiakina J., Arlazarov V.L., Faradjev I. Document localization algorithms based on feature points and straight lines. In Proc. SPIE (ICMV 2017). 2018. V. 10696. https://doi.org/10.1117/12.2311478
Smith R. An overview of the Tesseract OCR engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). 2007. V. 02. P. 629–633.
Van Phan T., Cong Nguyen K., Nakagawa M. A Nom historical document recognition system for digital archiving. International Journal on Document Analysis and Recognition (IJDAR). 2016. V. 19 (1). P. 49–64. https://doi.org/10.1007/s10032-015-0257-8
Yujian L., Bo L. A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007. V. 29 (6). P. 1091–1095. https://doi.org/10.1109/TPAMI.2007.1078
Zilberstein S. Using anytime algorithms in intelligent systems. AI Magazine. 1996. V. 17 (3). P. 73–83. https://doi.org/10.1609/aimag.v17i3.1232


© 2020 K. B. Bulatov^1,2, B. I. Savelyev^1,2, V. V. Arlazarov^1,2, N. V. Fedotova^2
