
Learning a metric feature space with siamese networks for disparity map computation

© 2018 D. O. Okhlopkov, S. A. Gladilin, F. A. Fedorenko

Moscow Institute of Physics and Technology, 141700 Moskovskaya Oblast, Dolgoprudnii, Institutsky pereulok, 9, Russia
Institute for Information Transmission Problems, 127051 Moscow, Bolshoy Karetny pereulok, 19, Russia

Received 21 Aug 2017

In this article we consider the stereo matching problem using the classic dynamic time warping (DTW) algorithm. This method relies on similarity metrics computed over small image fragments. We study how the algorithm's quality depends on the choice of similarity metric. We compare a norm of the point-wise differences between pixel neighborhoods with neural-network-based metrics that use a relatively small number of neurons (about 1000) and an output vector of dimension 64. In the latter case, the similarity metric is a norm of the difference between the networks' output vectors. We show that this modification of DTW achieves better results than the unmodified method based on raw pixel differences. All neural networks in this article were trained on the open Middlebury Stereo Datasets and KITTI.
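
The matching scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes rectified grayscale scanlines, a one-dimensional pixel neighborhood, the Euclidean norm as the distance (the abstract's norm order did not survive extraction), and a fixed occlusion penalty. The names `patch_descriptors`, `dtw_scanline`, and `occlusion_cost` are hypothetical. A learned 64-dimensional embedding, as described in the abstract, could in principle replace `patch_descriptors` without changing the dynamic-programming step.

```python
import numpy as np

def patch_descriptors(row, half):
    """Describe each pixel by its raw (2*half+1)-pixel neighbourhood (edge-padded).
    Assumption: stand-in for either raw patches or a learned embedding."""
    padded = np.pad(row, half, mode="edge")
    return np.stack([padded[i:i + 2 * half + 1] for i in range(len(row))])

def dtw_scanline(desc_left, desc_right, occlusion_cost=1.0):
    """Align two scanlines by dynamic programming and return per-pixel disparity
    for the left scanline (-1 marks pixels left unmatched)."""
    n, m = len(desc_left), len(desc_right)
    # Pairwise Euclidean distance between descriptors (assumed norm).
    cost = np.linalg.norm(desc_left[:, None, :] - desc_right[None, :, :], axis=2)

    acc = np.full((n + 1, m + 1), np.inf)      # accumulated path cost
    acc[0, :] = occlusion_cost * np.arange(m + 1)
    acc[:, 0] = occlusion_cost * np.arange(n + 1)
    move = np.zeros((n + 1, m + 1), dtype=np.int8)  # 0 match, 1 skip left, 2 skip right

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (
                acc[i - 1, j - 1] + cost[i - 1, j - 1],  # match pixel i-1 with j-1
                acc[i - 1, j] + occlusion_cost,          # left pixel unmatched
                acc[i, j - 1] + occlusion_cost,          # right pixel unmatched
            )
            k = int(np.argmin(options))
            acc[i, j] = options[k]
            move[i, j] = k

    # Backtrack along the cheapest path to read off disparities.
    disparity = np.full(n, -1)
    i, j = n, m
    while i > 0 and j > 0:
        if move[i, j] == 0:
            disparity[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif move[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return disparity

# Synthetic example: the right scanline sees the same scene shifted by 2 pixels.
scene = np.arange(20, dtype=float) ** 2
left, right = scene[0:16], scene[2:18]
disp = dtw_scanline(patch_descriptors(left, 1), patch_descriptors(right, 1))
```

In this toy example the interior pixels of the left scanline are assigned a disparity of 2, while pixels near the borders may be marked unmatched because padding distorts their descriptors. The quadratic loop is kept for readability; a practical version would restrict `j` to a bounded disparity range.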

Key words: computer vision, stereoscopy, stereo matching, dynamic time warping, machine learning, Siamese neural networks

DOI: 10.1134/S0235009218030101

Cite: Okhlopkov D. O., Gladilin S. A., Fedorenko F. A. Postroenie metricheskogo priznakovogo prostranstva pri pomoshchi siamskikh neironnykh setei dlya vychisleniya karty disparatnosti [Learning a metric feature space with siamese networks for disparity map computation]. Sensornye sistemy [Sensory systems]. 2018. V. 32(3). P. 253-259 (in Russian). doi: 10.1134/S0235009218030101
