main result

Abstract

We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate FrePolad's state-of-the-art performance in terms of quality, diversity, and computational efficiency.

Network Architecture

network architecture training
FrePolad is structured as a point cloud VAE with an embedded latent DDPM that represents the latent distribution. Two-stage training (left): in the first stage (blue), the VAE is optimized to maximize the FreELBO with a standard Gaussian prior; in the second stage (green), the VAE is frozen and the latent DDPM is trained to model the latent distribution. Generation (right): conditioned on a shape latent sampled from the DDPM, the CNF decoder transforms Gaussian noise input into a synthesized shape.
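The first half of the generation path, drawing a shape latent from the DDPM, can be sketched with standard DDPM ancestral sampling. This is a minimal illustration under stated assumptions, not FrePolad's code: the linear beta schedule, the function names, and the latent dimension are all hypothetical, and the trained noise-prediction network is replaced by the closed-form noise predictor for a toy latent distribution concentrated on a single vector `mu`.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]

def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def point_mass_eps_model(mu, betas):
    """Toy stand-in for the trained network: exact noise predictor when
    every latent equals mu, i.e. z_t = sqrt(ab_t) * mu + sqrt(1 - ab_t) * eps."""
    alpha_bars = cumulative_alphas(betas)
    def eps_model(z, t):
        ab = alpha_bars[t]
        return [(zi - math.sqrt(ab) * mi) / math.sqrt(1.0 - ab)
                for zi, mi in zip(z, mu)]
    return eps_model

def sample_latent(eps_model, dim, betas, rng):
    """DDPM ancestral sampling of one latent vector, from t = T-1 down to 0."""
    alpha_bars = cumulative_alphas(betas)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(z, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(zi - coef * ei) / math.sqrt(1.0 - betas[t])
                for zi, ei in zip(z, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return z

betas = linear_betas(1000)
mu = [0.5, -1.0, 2.0, 0.0]  # made-up 4-dimensional "shape latent"
z = sample_latent(point_mass_eps_model(mu, betas), len(mu), betas, random.Random(0))
```

With the exact noise predictor, the last denoising step lands on `mu`, which makes the sketch easy to check. In the real pipeline the sampled latent would then condition the CNF decoder.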

Results

generation results
Generation with 2048 points for airplane, chair, and car classes. Samples generated by FrePolad have better fidelity and diversity.
plots
Plots: (b) training and (c) generation costs vs. final validation score measured by 1-NNA-CD (↓), (d) learning curves for the first 20 hours of training, and (e) generation cost for synthesizing different numbers of points.
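1-NNA-CD is the 1-nearest-neighbour accuracy computed with Chamfer distance (CD) as the shape-to-shape distance. The sketch below is a small, standard-library version of that metric, intended only to show what is being measured; it is not the evaluation code behind the plots. The mean-based Chamfer normalization and the function names are assumptions.

```python
import random

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    Each cloud is a list of (x, y, z) tuples. Squared Euclidean distances
    are averaged per direction (an assumed normalization).
    """
    def one_way(src, dst):
        return sum(min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
                   for s in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def one_nna(generated, reference, dist=chamfer_distance):
    """Leave-one-out 1-NN classification accuracy on the merged shape set.

    Every shape is labelled by its origin (generated or reference) and
    classified by the label of its nearest other shape.
    """
    samples = [(g, 0) for g in generated] + [(r, 1) for r in reference]
    correct = 0
    for i, (cloud, label) in enumerate(samples):
        nearest = min((j for j in range(len(samples)) if j != i),
                      key=lambda j: dist(cloud, samples[j][0]))
        correct += samples[nearest][1] == label
    return correct / len(samples)

def toy_cloud(rng, center_x, num_points=16):
    """Made-up test shape: a small Gaussian blob around (center_x, 0, 0)."""
    return [(center_x + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))
            for _ in range(num_points)]

rng = random.Random(0)
generated = [toy_cloud(rng, 0.0) for _ in range(4)]
reference = [toy_cloud(rng, 10.0) for _ in range(4)]
score = one_nna(generated, reference)  # far-apart sets are trivially separable
```

A score near 50% means the classifier cannot tell generated shapes from reference shapes, which is why lower values are better in the plots.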
visualization of frequency rectification
A point cloud before and after frequency rectification, together with its representative function in the spherical and frequency domains. Frequency rectification shifts points toward more complex, less smooth regions and increases the relative importance of higher-frequency features, so that the VAE can give them more attention during reconstruction.
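To make the frequency domain concrete, the sketch below projects a function on the unit sphere onto real spherical harmonics up to degree 2 and reports the energy per degree. It is only an illustration under assumptions: how FrePolad converts a point cloud into its representative function is not described here, so `f` is a hypothetical stand-in, and the degree-dependent weight `1 + l` is a made-up example of emphasizing higher frequencies, not the paper's rectification.

```python
import math

def fibonacci_sphere(n):
    """Return n nearly uniform unit directions (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * i
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs

def real_sh(x, y, z):
    """Real spherical harmonics of degrees 0..2 at a unit vector, grouped by degree."""
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        [0.5 * math.sqrt(1.0 / math.pi)],
        [c1 * y, c1 * z, c1 * x],
        [c2 * x * y, c2 * y * z,
         0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),
         c2 * x * z, 0.5 * c2 * (x * x - y * y)],
    ]

def sh_coefficients(f, n_dirs=4000):
    """Estimate c_lm = integral of f * Y_lm over the sphere (equal-weight quadrature)."""
    coeffs = [[0.0], [0.0] * 3, [0.0] * 5]
    weight = 4.0 * math.pi / n_dirs
    for x, y, z in fibonacci_sphere(n_dirs):
        value = f(x, y, z)
        for l, band in enumerate(real_sh(x, y, z)):
            for m, y_lm in enumerate(band):
                coeffs[l][m] += weight * value * y_lm
    return coeffs

def degree_energy(coeffs, weight=lambda l: 1.0):
    """Sum of squared coefficients per degree, optionally reweighted by degree."""
    return [weight(l) * sum(c * c for c in band) for l, band in enumerate(coeffs)]

# Hypothetical representative function: a unit sphere bulging along +z.
coeffs = sh_coefficients(lambda x, y, z: 1.0 + 0.5 * z)
plain = degree_energy(coeffs)
boosted = degree_energy(coeffs, weight=lambda l: 1.0 + l)  # made-up weighting
```

For this smooth test function nearly all energy sits in degrees 0 and 1; a shape with fine detail would carry more energy in higher degrees, which is the content frequency rectification is meant to preserve.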
flexible generation
FrePolad supports flexibility in the cardinality of the synthesized point clouds.
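The reason cardinality is a free choice can be sketched as follows: once a shape latent is fixed, each point is an independent draw from a conditional distribution, so the number of draws can be set at generation time. The decoder below is a deliberately simple, hypothetical affine map rather than a CNF, and the layout of the 6-dimensional latent is invented for illustration.

```python
import math
import random

def decode_point(latent, noise):
    """Hypothetical decoder stand-in (NOT a CNF): an affine map of the noise.

    latent[:3] is read as a centre and latent[3:6] as per-axis log-scales.
    """
    return tuple(latent[k] + math.exp(latent[3 + k]) * noise[k] for k in range(3))

def sample_point_cloud(latent, num_points, rng):
    """Draw num_points independent points from p(x | latent)."""
    return [decode_point(latent, [rng.gauss(0.0, 1.0) for _ in range(3)])
            for _ in range(num_points)]

rng = random.Random(0)
latent = [0.0, 1.0, -1.0, -2.0, -2.0, -2.0]  # made-up shape latent
# The same latent yields the same underlying shape at any resolution.
clouds = {n: sample_point_cloud(latent, n, rng) for n in (256, 2048, 8192)}
```

Because points are sampled independently given the latent, denser or sparser clouds of the same shape need no retraining, only more or fewer decoder evaluations.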
interpolation
Interpolation of shapes in the VAE latent space.
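A latent-space interpolation of this kind can be sketched as a straight-line blend between two shape latents, each of which would then be decoded into a point cloud. The interpolation scheme used for the figure is not stated, so linear interpolation is an assumption here, and the latent values are made up.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors for t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latents from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, k / (steps - 1)) for k in range(steps)]

z_start = [0.0, 1.0, -2.0]  # made-up latent of the first shape
z_end = [4.0, -1.0, 2.0]    # made-up latent of the second shape
path = interpolation_path(z_start, z_end, 5)
```

Each latent in `path` would be passed to the decoder to obtain one intermediate shape.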

Citation

@inproceedings{zhou2023frepolad,
  title={FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation},
  author={Zhou, Chenliang and Zhong, Fangcheng and Hanji, Param and Guo, Zhilin and Fogarty, Kyle and Sztrajman, Alejandro and Gao, Hongyun and Oztireli, Cengiz},
  booktitle={ECCV 2024},
  year={2024}
}

