Multi-cameras calibration System Based Deep Learning Approach and Beyond: A Survey
Abstract
Camera calibration is the process of estimating camera parameters in order to infer geometric attributes from captured sequences. It is essential in robotics and computer vision, for both two-dimensional and three-dimensional applications. Traditional calibration methods, however, are time-consuming and require specialist expertise. Recent work has shown that learning-based systems can replace the repetitive manual steps of conventional calibration. These solutions have been explored across a range of learning strategies, network architectures, geometric priors, and datasets. This paper offers a thorough review of learning-based camera calibration systems and assesses their advantages and disadvantages. The methods are grouped into four main categories: the standard pinhole camera model, the distortion camera model, the cross-sensor model, and the cross-view model. These categories align with current research trends and have diverse applications. Because no standard benchmark exists in this field, a large calibration dataset has been created to serve as a public platform for evaluating current methods. The dataset contains both synthetic and real data, including images and videos captured by various cameras in different locations. The difficulties encountered will be analyzed, and directions for further research will be suggested, in the next stage of this project. This survey is the first attempt to review learning-based camera calibration, covering a period of eight years. Our findings indicate that learning-based methods significantly reduce the time and expertise required for calibration while maintaining or improving accuracy compared to traditional methods.
Specifically, our results show a reduction in calibration error of up to 20% and a threefold speedup over traditional methods, along with better adaptability to different camera types and environments.
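To make the terms above concrete, the following is a minimal sketch of the quantities a calibration method estimates under the pinhole and distortion camera models, and of the reprojection error commonly used to score the result. It is an illustration under stated assumptions rather than any surveyed method: the function names `project` and `reprojection_error`, the parameter names `fx`, `fy`, `cx`, `cy`, and the single radial coefficient `k1` are all chosen here for illustration, and the abstract does not state which error metric underlies the reported figures.

```python
import math


def project(point, fx, fy, cx, cy, k1=0.0):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    Assumed model: pinhole intrinsics (fx, fy, cx, cy) plus a one-term
    radial distortion coefficient k1. With k1 = 0 this is the plain
    pinhole model.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division gives normalized image coordinates.
    x, y = X / Z, Y / Z
    # One-term radial distortion: scale by (1 + k1 * r^2).
    r2 = x * x + y * y
    d = 1.0 + k1 * r2
    # Apply focal lengths and principal point.
    return (fx * x * d + cx, fy * y * d + cy)


def reprojection_error(points_3d, pixels, **params):
    """Root-mean-square pixel distance between observed and projected points."""
    total = 0.0
    for p, (u, v) in zip(points_3d, pixels):
        pu, pv = project(p, **params)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(points_3d))
```

In this sketch, a learning-based calibrator would predict values such as `fx`, `fy`, `cx`, `cy`, and `k1` from images, and a lower `reprojection_error` on held-out correspondences would indicate a better estimate.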