Scene and crowd analysis using synthetic data generation with 3D quality improvements and deep network architectures

Khadka, Anish (2021) Scene and crowd analysis using synthetic data generation with 3D quality improvements and deep network architectures. (PhD thesis), Kingston University.

Abstract

This thesis explores scene analysis, focusing mainly on vision-based techniques. Vision-based scene analysis techniques have a wide range of applications, from surveillance and security to agriculture. A vision sensor can provide rich information about the environment, such as colour, depth, shape and size. This information can be processed further to gain in-depth knowledge of the scene, such as the type of environment, the objects present and their distances. The thesis first covers the background on human detection, in particular pedestrian and crowd detection methods, and introduces various vision-based techniques used in human detection. This is followed by a detailed analysis of the use of synthetic data to improve the performance of state-of-the-art Deep Learning techniques, and a multi-purpose synthetic data generation tool is proposed. The tool is a real-time graphics simulator that generates multiple types of synthetic data applicable to pedestrian detection, crowd density estimation, image segmentation, depth estimation and 3D pose estimation. In the second part of the thesis, a novel technique is proposed to improve the quality of the synthetic data. Inter-reflection, also known as global illumination, is a naturally occurring phenomenon and a major problem for 3D scene generation from an image. The proposed method uses a reverted ray-tracing technique to reduce the effect of inter-reflection and increase the quality of the generated data. In addition, a method to improve the quality of the density map is discussed in the following chapter. The density map is the most commonly used technique to estimate crowds. However, the current procedure used to generate the map is not content-aware, that is, the density map does not highlight people's heads according to their size in the image. A novel method to generate a content-aware density map is therefore proposed, and the use of such maps is shown to improve the performance of an existing Deep Learning architecture. In the final part, a Deep Learning architecture is proposed to estimate crowds in the wild. The architecture tackles challenging aspects such as perspective distortion by combining several techniques, namely pyramid-style inputs, a scale aggregation method and a self-attention mechanism, to estimate a crowd density map, and it achieved state-of-the-art results at the time.
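
The difference between a fixed-kernel density map and a size-aware one can be illustrated with a minimal sketch. This is not the method proposed in the thesis, which the abstract does not describe in detail. It assumes each head annotation carries a hypothetical head size in pixels, and it uses an arbitrary rule (sigma = size / 4) to scale the Gaussian kernel. The function names and parameters are illustrative only.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Return a normalised 2D Gaussian kernel of side 2 * radius + 1."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(shape, heads, fixed_sigma=None):
    """Build a crowd density map from head annotations.

    heads: iterable of (row, col, head_size_px); head_size_px is a
        hypothetical annotation assumed for this sketch.
    fixed_sigma: if given, every head gets the same kernel width
        (not content-aware). If None, sigma follows the head size
        using the assumed rule sigma = head_size_px / 4.
    """
    height, width = shape
    dmap = np.zeros((height, width), dtype=np.float64)
    for row, col, size in heads:
        sigma = fixed_sigma if fixed_sigma is not None else max(size / 4.0, 0.5)
        radius = int(np.ceil(3 * sigma))
        kernel = gaussian_kernel(sigma, radius)
        # Clip the kernel window to the image bounds.
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        patch = kernel[r0 - (row - radius):r1 - (row - radius),
                       c0 - (col - radius):c1 - (col - radius)]
        # Renormalise so each head contributes exactly 1 to the total count.
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


# Two heads: a small (distant) one and a large (near) one.
heads = [(20, 20, 4), (60, 60, 24)]
fixed = density_map((100, 100), heads, fixed_sigma=4.0)
aware = density_map((100, 100), heads)
```

In both maps the integral equals the number of annotated heads, so the crowd count is preserved. In the size-aware map the small head produces a sharp, narrow peak and the large head a broad, low one, whereas the fixed-kernel map gives both heads the same footprint regardless of their size in the image.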
