Hierarchical text-conditional image generation with CLIP latents
