
Text2Human

2022 / ACM Transactions on Graphics / DOI 10.1145/3528223.3530104

Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu

Generating high-quality and diverse human images is an important yet challenging task in vision and graphics. However, existing generative models often fall short under the high diversity of clothing shapes and textures. Furthermore, the generation process should ideally be intuitively controllable by lay users. In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose in two dedicated steps. 1) Given texts describing the shapes of clothes, the given human pose is first translated to a human parsing map. 2) The final human image is then generated by providing the system with more attributes about the textures of clothes. Specifically, to model the diversity of clothing textures, we build a hierarchical texture-aware codebook that stores multi-scale neural representations for each type of texture. The codebook at the coarse level includes the structural representations of textures, while the codebook at the fine level focuses on the details of textures. To make use of the learned hierarchical codebook to synthesize desired images, a diffusion-based transformer sampler with mixture of experts is first employed to sample indices from the coarsest level of the codebook, which are then used to predict the indices of the codebook at finer levels. The predicted indices at different levels are translated to human images by a decoder learned jointly with the hierarchical codebooks. The use of mixture-of-experts allows the generated image to be conditioned on fine-grained text input. The prediction of finer-level indices refines the quality of clothing textures. Extensive quantitative and qualitative evaluations demonstrate that our proposed Text2Human framework can generate more diverse and realistic human images compared to state-of-the-art methods. Our project page is https://yumingj.github.io/projects/Text2Human.html.
Code and pretrained models are available at https://github.com/yumingj/Text2Human.
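The abstract names two mechanisms that are easier to grasp in miniature: a texture-aware codebook, where each texture type has its own set of codes at each level, and a mixture-of-experts layer that lets a text-derived attribute steer generation. The following is a toy, pure-Python sketch under stated assumptions, not the released Text2Human code. It assumes (a) a feature is matched only against the sub-codebook of its region's texture, at a coarse level and a finer level, and (b) each token is sent to exactly one expert keyed by its texture attribute (hard routing). The texture names, toy vectors, and the helpers `nearest_code`, `quantize_level`, `make_affine_expert`, and `moe_layer` are hypothetical. In the paper, codebooks, experts, encoder, and decoder are learned, and at inference the fine-level indices are predicted from the sampled coarse-level indices rather than obtained by quantizing fine features as done here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


# --- Texture-aware hierarchical codebook (toy version, assumed behavior) ---

def nearest_code(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the code vector closest to x under squared Euclidean distance."""
    best_idx, best_dist = -1, math.inf
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(x, code))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def quantize_level(features, textures, books: Dict[str, List[Vector]]) -> List[int]:
    """Quantize each feature using only the sub-codebook of its texture type."""
    return [nearest_code(f, books[t]) for f, t in zip(features, textures)]


# --- Mixture of experts with hard routing by texture attribute (assumed) ---

Expert = Callable[[List[float]], List[float]]


def make_affine_expert(scale: float, bias: float) -> Expert:
    """Hypothetical stand-in for one expert feed-forward network."""
    return lambda x: [scale * v + bias for v in x]


def moe_layer(tokens, attributes, experts: Dict[str, Expert]):
    """Send each token to the single expert keyed by its attribute label."""
    return [experts[a](tok) for tok, a in zip(tokens, attributes)]


# Toy data: two coarse positions and four fine positions (2x finer grid).
coarse_books = {"denim": [[0.0, 0.0], [1.0, 1.0]], "plaid": [[5.0, 5.0], [6.0, 6.0]]}
fine_books = {"denim": [[0.0], [0.5]], "plaid": [[3.0], [4.0]]}

coarse_idx = quantize_level([[0.9, 1.1], [5.2, 4.8]], ["denim", "plaid"], coarse_books)
fine_idx = quantize_level([[0.1], [0.4], [3.9], [3.2]],
                          ["denim", "denim", "plaid", "plaid"], fine_books)

experts = {"denim": make_affine_expert(2.0, 0.0), "floral": make_affine_expert(1.0, 1.0)}
routed = moe_layer([[1.0, 2.0], [1.0, 2.0]], ["denim", "floral"], experts)

print(coarse_idx, fine_idx)  # [1, 0] [0, 1, 1, 0]
print(routed)                # [[2.0, 4.0], [2.0, 3.0]]
```

The routing example feeds the same token twice and gets different outputs once the attribute label changes, which is the property an attribute-keyed expert layer relies on. Note that indices here are local to each texture's sub-codebook, so a fuller implementation would need either a per-texture index space or a global offset when the indices are handed to a sampler.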

Citations: 122
References: 5
Implementations: none indexed
Repro status: Reusable

Reproducibility Dossier

Reusable. Confidence: editor verified (checked Apr 2026)

GEOMDIGEST treats reproducibility as an evidence trail: public artifacts, documentation, data, packaging, archival stability, and verification checks. Numeric scores are only exposed for audited records; public pages prioritize the evidence itself.

Evidence
Verified: yes
Code: not yet
Data: not yet
Docs: not yet

Build checks
code / verified / editor verified: Detected evidence link
supplementary / verified / editor verified: Detected evidence link
code / verified / editor verified: Code repository discovered from verified project page


Implementation Index

No implementations indexed yet

This paper is in the knowledge graph, but we have not attached a runnable artifact yet.

Citation Lineage

References: 5
2022 AvatarCLIP (200 cites)
2021 StyleFlow: Attribute-conditioned Exploration ... (385 cites)
2021 Pose with style (99 cites)
2021 TryOnGAN (80 cites)
2021 AgileGAN (75 cites)


Cited by: 1
2022 Text2Light (69 cites)