Hierarchical text-conditional image generation with CLIP latents

3 years ago