In document analysis and recognition with mobile capture devices, and in object recognition in a video stream, an
important problem is deciding when the capturing process should be stopped. Efficient stopping affects not only the
total time spent on recognition and data entry, but also the expected accuracy of the result. This paper extends the
stopping method based on modelling the next integrated recognition result so that it can be used within a string
recognition result model with per-character alternatives. The stopping method and notes on its extension are described,
and an experimental evaluation is performed on the open datasets MIDV-500 and MIDV-2019. The method is compared with
previously published methods based on clustering of input observations. The results indicate that the stopping method
based on modelling the next integrated result achieves higher accuracy, even when compared with the best achievable
configuration of the competing methods. However, the required computations are significant, and further research should
target optimizing its implementation.
Key words:
recognition in video stream, mobile OCR, stopping rules, decision making, mobile document recognition, anytime
algorithms
DOI: 10.31857/S0235009220030026
Cite:
Bulatov K. B., Savelyev B. I., Arlazarov V. V., Fedotova N. V. Analiz metoda ostanova raspoznavaniya teksta v videopotoke s ispolzovaniem rasshirennoi modeli rezultata s posimvolnymi alternativami [Analysis of a stopping method for text recognition in video stream using an extended result model with per-character alternatives]. Sensornye sistemy [Sensory systems]. 2020. V. 34 (3). P. 217–225 (in Russian). doi: 10.31857/S0235009220030026