Fast & Efficient Normalizing Flows and Applications of Image Generative Models
About
This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility; 2) a more efficient Quad-coupling layer; 3) a fast and efficient parallel inversion algorithm for kxk convolutional layers; 4) a fast and efficient backpropagation algorithm for the inverse of convolution; 5) Inverse-Flow, which uses the inverse of convolution in the forward pass and is trained with the proposed backpropagation algorithm; and 6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part presents five applications: 1) an automated quality assessment system for agricultural produce that uses conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; 2) an unsupervised geological mapping framework that uses stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; 3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; 4) Stable Diffusion-based image inpainting that replaces detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and 5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.
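For readers unfamiliar with normalizing flow layers, the following is a minimal sketch of a standard affine coupling layer (in the style of RealNVP), not the Quad-coupling layer proposed in the thesis, whose details are not given here. The `scale_net` and `shift_net` functions and the weights `W_s` and `W_t` are hypothetical stand-ins chosen only for illustration. The sketch shows the two properties that flow layers rely on: an exact inverse and a cheap log-determinant of the Jacobian.

```python
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """Affine coupling: the first half passes through, the second half is scaled and shifted."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_s = scale_net(x1)            # log-scale depends only on the untouched half
    t = shift_net(x1)
    y2 = x2 * np.exp(log_s) + t
    log_det = log_s.sum(axis=-1)     # Jacobian is triangular, so log|det| is a sum
    return np.concatenate([x1, y2], axis=-1), log_det

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: recompute log_s and t from the unchanged half, then undo the map."""
    y1, y2 = np.split(y, 2, axis=-1)
    log_s = scale_net(y1)
    t = shift_net(y1)
    x2 = (y2 - t) * np.exp(-log_s)
    return np.concatenate([y1, x2], axis=-1)

# Hypothetical tiny "networks": fixed random linear maps (assumption, for illustration only).
rng = np.random.default_rng(0)
W_s = rng.normal(size=(2, 2)) * 0.1
W_t = rng.normal(size=(2, 2))
scale_net = lambda h: np.tanh(h @ W_s)
shift_net = lambda h: h @ W_t

x = rng.normal(size=(4, 4))          # batch of 4 vectors with 4 dimensions each
y, log_det = coupling_forward(x, scale_net, shift_net)
x_rec = coupling_inverse(y, scale_net, shift_net)
```

Because the inverse only needs to re-evaluate `scale_net` and `shift_net` on the unchanged half, those functions can be arbitrarily complex without affecting invertibility.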
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Density Estimation | CIFAR-10 (test) | Bits/dim | 3.3471 | 134 |
| Density Estimation | ImageNet 32x32 (test) | Bits per Sub-pixel | 4.014 | 66 |
| Density Estimation | ImageNet 64x64 (test) | Bits Per Sub-Pixel | 3.8514 | 62 |
| Image Generation | MNIST (test) | -- | -- | 13 |
| Face Detection | Pvt-IDD real faces (test) | -- | -- | 8 |
| Face Detection | Pvt-IDD anonymized faces (test) | -- | -- | 8 |
| Density Estimation | Galaxy (test) | Bits-per-Dimension | 2.2591 | 3 |
| Object Detection | Missing Traffic Signs Video Dataset (MTSVD) | mAP | 90 | 3 |
| Scene Categorization | Missing Traffic Signs Video Dataset (MTSVD) | Top-1 Accuracy | 60.5 | 3 |
| Clustering | Landsat 8 (ground truth data (30 rock samples)) | -- | -- | 3 |
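The density-estimation rows above report bits per dimension. As a minimal sketch of how such a figure is conventionally derived, the snippet below converts a per-image negative log-likelihood in nats into bits per dimension by dividing by the number of dimensions and by ln 2. The function name `bits_per_dim` is hypothetical, and the `nll_nats` value is back-computed from the CIFAR-10 result in the table purely for illustration; it is not a number reported by the thesis.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# A CIFAR-10 image has 3 channels of 32x32 pixels.
num_dims = 3 * 32 * 32

# Illustrative NLL, back-computed from the table's 3.3471 bits/dim (assumption, not a reported value).
nll_nats = 3.3471 * num_dims * math.log(2)

bpd = bits_per_dim(nll_nats, num_dims)
```

Lower values indicate a better density model, since fewer bits are needed on average to encode each sub-pixel.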