Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior
arXiv (2024)
Abstract
Existing neural rendering-based text-to-3D-portrait generation methods
typically make use of a human geometry prior and diffusion models for
guidance. However, relying solely on geometry information introduces issues
such as the Janus problem, over-saturation, and over-smoothing. We present
Portrait3D, a neural rendering-based framework with a novel joint
geometry-appearance prior that achieves text-to-3D-portrait generation while
overcoming the aforementioned issues. To accomplish this, we train a 3D portrait
generator, 3DPortraitGAN-Pyramid, as a robust prior. This generator is capable
of producing 360° canonical 3D portraits, serving as a starting point for
the subsequent diffusion-based generation process. To mitigate the "grid-like"
artifact caused by the high-frequency information in the feature-map-based 3D
representation commonly used by most 3D-aware GANs, we integrate a novel
pyramid tri-grid 3D representation into 3DPortraitGAN-Pyramid. To generate 3D
portraits from text, we first project a randomly generated image aligned with
the given prompt into the pre-trained 3DPortraitGAN-Pyramid's latent space. The
resulting latent code is then used to synthesize a pyramid tri-grid. Beginning
with the obtained pyramid tri-grid, we use score distillation sampling to
distill the diffusion model's knowledge into the pyramid tri-grid. Following
that, we utilize the diffusion model to refine the rendered images of the 3D
portrait and then use these refined images as training data to further optimize
the pyramid tri-grid, effectively eliminating issues with unrealistic color and
unnatural artifacts. Our experimental results show that Portrait3D can produce
realistic, high-quality, and canonical 3D portraits that align with the prompt.
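The score distillation sampling (SDS) step described above can be sketched as follows. This is a minimal illustration of the general SDS gradient, not the paper's actual implementation; the function `sds_grad`, the stand-in noise predictor `pred_noise`, and the NumPy setting are all assumptions for exposition.

```python
import numpy as np

def sds_grad(pred_noise, render, alpha_bar, w=1.0, rng=None):
    """Return the SDS gradient with respect to a rendered image.

    pred_noise : stand-in for a diffusion model's noise predictor (hypothetical)
    render     : rendered view of the 3D portrait, an (H, W, C) array
    alpha_bar  : cumulative noise-schedule value for the sampled timestep
    w          : timestep-dependent weighting w(t)
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(render.shape)           # sampled Gaussian noise
    # forward-diffuse the rendering to the sampled timestep
    noisy = np.sqrt(alpha_bar) * render + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = pred_noise(noisy)                       # model's noise estimate
    # SDS gradient: w(t) * (eps_hat - eps). In the real pipeline this is
    # backpropagated through the renderer into the pyramid tri-grid parameters.
    return w * (eps_hat - eps)
```

When the predictor exactly recovers the injected noise, the gradient vanishes, which is why SDS drives the 3D representation toward renderings the diffusion model considers likely under the prompt.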