OneHOI: Unifying Human-Object Interaction Generation and Editing

Jiun Tian Hoe1, Weipeng Hu1,2, Xudong Jiang1, Yap-Peng Tan1,4, Chee Seng Chan3

1Nanyang Technological University   2Sun Yat-sen University   3Universiti Malaya   4VinUniversity

CVPR 2026 (Main)

[Figure: OneHOI teaser]

OneHOI unifies Human-Object Interaction (HOI) generation and editing in a single, versatile model. It handles challenging HOI editing, from text-guided changes to novel layout-guided control and multi-HOI edits. For generation, OneHOI synthesises scenes from text, layouts, arbitrary shapes, or mixed conditions, giving fine-grained control over how people and objects relate to each other in an image.

📰 News

  • [2026/02] 🎉 OneHOI is accepted to CVPR 2026!
  • [2026/02] 📦 HOI-Edit-44K dataset is released!
  • [2026/02] 🌐 Project page is live!
  • [2026/04] 📄 Paper is now available!
  • [2026/04] 🚀 Inference code and pretrained model weights are released!

✅ TODO

  • [x] Release paper on arXiv
  • [x] Release inference code and pretrained models
  • [x] Release HOI-Edit-44K dataset
  • [ ] Release training code
  • [ ] Release Multi HOI Editing Benchmark

Abstract

Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as ⟨person, action, object⟩ triplets. Existing approaches split into two disjoint families: HOI generation synthesises scenes from structured triplets and layout, but fails to integrate mixed conditions like HOI and object-only entities; and HOI editing modifies interactions via text, yet struggles to decouple pose from physical contact and scale to multiple interactions. We introduce OneHOI, a unified diffusion transformer framework that consolidates HOI generation and editing into a single conditional denoising process driven by shared structured interaction representations. At its core, the Relational Diffusion Transformer (R-DiT) models verb-mediated relations through role- and instance-aware HOI tokens, layout-based spatial Action Grounding, a Structured HOI Attention to enforce interaction topology, and HOI RoPE to disentangle multi-HOI scenes. Trained jointly with modality dropout on our HOI-Edit-44K, along with HOI and object-centric datasets, OneHOI supports layout-guided, layout-free, arbitrary-mask, and mixed-condition control, achieving state-of-the-art results across both HOI generation and editing.
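To make the conditioning described in the abstract more concrete, here is a minimal sketch of how ⟨person, action, object⟩ triplets, object-only entities, and optional layout boxes might be represented, together with a simple form of modality dropout that removes the layout from a sample. This is an illustration only, not the repository's API or training code: the names `Entity`, `HOI`, `Condition`, and `drop_layout`, the normalised box convention, and the all-or-nothing dropout rule are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical convention: normalised (x0, y0, x1, y1) in [0, 1].
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Entity:
    label: str
    box: Optional[Box] = None  # None means no layout is given for this entity


@dataclass(frozen=True)
class HOI:
    # One <person, action, object> triplet.
    person: Entity
    action: str
    obj: Entity


@dataclass(frozen=True)
class Condition:
    # A mixed condition: text, HOI triplets, and object-only entities.
    prompt: str
    hois: Tuple[HOI, ...] = ()
    objects: Tuple[Entity, ...] = ()


def _strip(entity: Entity) -> Entity:
    """Return a copy of the entity without its box."""
    return replace(entity, box=None)


def drop_layout(cond: Condition, p: float, rng: random.Random) -> Condition:
    """With probability p, remove every box so the sample is layout-free.

    Assumed dropout rule for illustration; the paper's schedule may differ.
    """
    if rng.random() >= p:
        return cond
    return Condition(
        prompt=cond.prompt,
        hois=tuple(HOI(_strip(h.person), h.action, _strip(h.obj)) for h in cond.hois),
        objects=tuple(_strip(o) for o in cond.objects),
    )


if __name__ == "__main__":
    cond = Condition(
        prompt="a person riding a horse next to a dog",
        hois=(HOI(Entity("person", (0.1, 0.1, 0.5, 0.9)), "ride",
                  Entity("horse", (0.2, 0.4, 0.8, 0.95))),),
        objects=(Entity("dog", (0.7, 0.6, 0.9, 0.9)),),
    )
    print(drop_layout(cond, p=0.5, rng=random.Random(0)))
```

Keeping the box optional on each entity lets one structure cover layout-guided, layout-free, and mixed-condition inputs, which is the property the abstract attributes to joint training with modality dropout.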

Run this project

  1. Clone this project
    git clone https://github.com/jiuntian/OneHOI.git
    cd ./OneHOI
  2. We recommend uv for setting up the Python environment for this project. Run the following if you do not have uv installed yet. We are running this on CU
    pip install uv
  3. Install PyTorch.
    uv pip install torch==2.11.0 --torch-backend=auto
  4. Download the pretrained model.
    huggingface-cli download jiuntian/OneHOI --local-dir models/OneHOI
  5. Start with:
    uv run inference.py # this downloads all other deps
    or
    uv sync
    python inference.py
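Before launching inference, it can help to confirm that the steps above took effect. The following is a small, optional pre-flight sketch and is not part of the repository: the function name `preflight` is hypothetical, and the default path simply mirrors the `--local-dir models/OneHOI` argument used in step 4.

```python
import importlib.util
from pathlib import Path


def preflight(model_dir: str = "models/OneHOI") -> dict:
    """Report whether torch is importable, CUDA is visible, and weights exist."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "model_dir_exists": Path(model_dir).is_dir(),
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

Run it from the repository root inside the same environment that will run `inference.py`, so the relative model path and the installed packages match.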

Citation

If you find our code useful, feel free to ⭐ star this repo!

If you use our work in your research, please cite:

@inproceedings{hoe2026onehoi,
  title={OneHOI: Unifying Human-Object Interaction Generation and Editing},
  author={Hoe, Jiun Tian and Hu, Weipeng and Jiang, Xudong and Tan, Yap-Peng and Chan, Chee Seng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

About

[CVPR2026] Official repo for "OneHOI: Unifying Human-Object Interaction Generation and Editing"
