I am a fifth-year Ph.D. student in the Department of Computer Science at Shanghai Jiao Tong University, where I am a member of the SJTU MVIG lab advised by Prof. Cewu Lu. Before that, I received my Bachelor's degree in Computer Science from Huazhong University of Science and Technology in 2019. My research interests center on Computer Vision, 3D Vision, and Embodied AI.
Generating human grasps requires reasoning about both object geometry and semantic cues. This paper introduces SemGrasp, a method that infuses semantic information into grasp generation so that the resulting grasps align with language instructions. Built on a unified semantic framework and a Multimodal Large Language Model (MLLM), SemGrasp is supported by CapGrasp, a dataset featuring detailed captions and diverse grasps. Experiments demonstrate SemGrasp's ability to produce grasps consistent with linguistic intent, surpassing shape-only approaches.
Rearranging objects is a key part of human-environment interaction, and synthesizing natural sequences of such motions is crucial in AR/VR and computer graphics. Our work presents Favor, a dataset that captures full-body virtual object rearrangement motions through motion capture and AR glasses. We also introduce Favorite, a new pipeline for generating lifelike, command-driven digital-human rearrangement motions. Our experiments show that Favor and Favorite produce high-fidelity motion sequences.
We propose Chord, a new method that exploits categorical shape priors to reconstruct the shapes of intra-class objects. In addition, we construct COMIC, a new dataset of category-level hand-object interaction. COMIC encompasses a diverse collection of object instances, materials, hand interactions, and viewing directions.
Learning how humans manipulate objects requires machines to acquire knowledge from two perspectives: one for understanding object affordances and the other for learning human interactions based on those affordances. In this work, we propose OakInk, a multi-modal, richly annotated knowledge repository for the visual and cognitive understanding of hand-object interactions. Check our website for more details!
We propose ArtiBoost, a lightweight online data-enrichment method that boosts articulated hand-object pose estimation from the data perspective. During training, ArtiBoost alternately performs data exploration and synthesis, as sketched below. Even with a simple baseline, ArtiBoost can boost it to outperform the previous SOTA on several hand-object benchmarks.
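A minimal toy sketch of this alternating loop, assuming a discretized space of hand-object configurations whose sampling weights are raised wherever the current model still errs. The sampler, toy "renderer", and linear stand-in estimator below are illustrative, not the paper's implementation:

```python
# Toy sketch of an ArtiBoost-style loop: explore hard configurations,
# synthesize training samples from them, and feed errors back into the
# exploration weights. All components are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
num_configs = 100                      # discretized pose/viewpoint/object grid
weights = np.ones(num_configs)         # exploration weights, updated online

def synthesize(config_ids):
    """Stand-in for the renderer: returns (features, targets) per config."""
    x = rng.normal(size=(len(config_ids), 8))
    y = x.sum(axis=1, keepdims=True)   # toy regression target
    return x, y

w = np.zeros((8, 1))                   # toy linear "pose estimator"
for step in range(200):
    # Exploration: sample configurations proportionally to their weight.
    probs = weights / weights.sum()
    batch_ids = rng.choice(num_configs, size=16, p=probs)
    # Synthesis: render samples for the chosen configurations.
    x, y = synthesize(batch_ids)
    pred = x @ w
    per_sample_loss = ((pred - y) ** 2).mean(axis=1)
    # Gradient step on the synthetic batch.
    w -= 1e-2 * x.T @ (pred - y) / len(x)
    # Feedback: raise the weight of configurations the model still gets wrong.
    for cid, loss in zip(batch_ids, per_sample_loss):
        weights[cid] = 0.9 * weights[cid] + 0.1 * (1.0 + loss)
```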
In this paper, we extend MANO with more Diverse Accessories and Rich Textures, namely DART. DART comprises 325 exquisite hand-crafted texture maps that vary in appearance and cover different kinds of blemishes, make-up, and accessories. We also generate DARTset, a large-scale (800K), diverse, and high-fidelity collection of hand images paired with perfectly aligned 3D labels.
OAKINK2 is a rich dataset of bimanual object manipulation tasks drawn from complex daily activities. It introduces a three-tiered abstraction structure (Affordance, Primitive Task, and Complex Task) to systematically organize task representations. By emphasizing an object-centric approach, the dataset captures multi-view imagery and precise annotations of human and object poses, supporting applications such as interaction reconstruction and motion synthesis. Furthermore, we propose a Complex Task Completion framework that uses Large Language Models to break complex activities down into Primitive Tasks and a Motion Fulfillment Model to generate the corresponding bimanual motions.
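A hypothetical sketch of the decomposition idea: an LLM-backed planner splits a complex activity into Primitive Tasks, and a motion model fulfills each one in turn. `query_llm` and `MotionFulfillmentModel` are placeholders invented for illustration, not the released framework:

```python
# Hypothetical sketch of Complex Task Completion: LLM planning over
# Primitive Tasks, followed by per-task motion generation.
from dataclasses import dataclass

@dataclass
class PrimitiveTask:
    description: str          # e.g. "right hand: pick up the kettle"

def query_llm(complex_task: str) -> list[PrimitiveTask]:
    """Placeholder for an LLM call that plans over object affordances."""
    return [PrimitiveTask("grasp kettle with right hand"),
            PrimitiveTask("hold cup steady with left hand"),
            PrimitiveTask("pour water into the cup")]

class MotionFulfillmentModel:
    def generate(self, task: PrimitiveTask):
        """Placeholder: would return a bimanual hand/body pose sequence."""
        return f"motion for: {task.description}"

def complete(complex_task: str) -> list:
    steps = query_llm(complex_task)
    mfm = MotionFulfillmentModel()
    return [mfm.generate(step) for step in steps]

print(complete("make a cup of tea"))
```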
Color-NeuS focuses on mesh reconstruction with color. We remove view-dependent color while using a relighting network to maintain volume rendering performance. The mesh is extracted from the SDF network, and vertex colors are derived from the global color network. To evaluate Color-NeuS, we conceived an in-hand object scanning task and collected several videos for it.
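A minimal sketch of the extraction step, with an analytic sphere standing in for the learned SDF network and an untrained MLP standing in for the global color network:

```python
# Sketch: mesh from the SDF's zero level set via marching cubes, then
# per-vertex color queried from a global (view-independent) color network.
import numpy as np
import torch
from skimage.measure import marching_cubes

color_net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 3))   # point -> RGB

# Sample a stand-in SDF (sphere of radius 0.5) on a grid over [-1, 1]^3.
n = 64
axis = np.linspace(-1.0, 1.0, n)
xx, yy, zz = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(xx**2 + yy**2 + zz**2) - 0.5

# Mesh from the zero level set (vertices come back in voxel coordinates).
verts, faces, _, _ = marching_cubes(sdf, level=0.0)
verts_world = verts * (2.0 / (n - 1)) - 1.0               # back to [-1, 1]^3

# Per-vertex color from the global color network.
with torch.no_grad():
    rgb = torch.sigmoid(color_net(torch.from_numpy(verts_world).float()))
print(verts_world.shape, faces.shape, rgb.shape)
```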
We highlight contact in the hand-object interaction modeling task by proposing an explicit representation named Contact Potential Field (CPF). In CPF, we treat each contacting hand-object vertex pair as a spring-mass system; hence the whole system forms a potential field with minimal elastic energy at the grasp position.
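To make the spring-mass picture concrete, here is a toy sketch in which each contacting vertex pair acts as a zero-rest-length spring, so the total elastic energy is sum_i 1/2 * k * ||h_i - o_i||^2 and a grasp can be refined by minimizing it. The pair selection, uniform stiffness, and rigid-translation refinement are illustrative simplifications, not CPF's actual formulation:

```python
# Toy spring-mass view of contact: minimize the elastic energy of
# contacting hand-object vertex pairs by translating the hand.
import torch

def elastic_energy(hand_verts, obj_verts, pairs, stiffness=1.0):
    """hand_verts: (H,3), obj_verts: (O,3), pairs: (P,2) index pairs."""
    h = hand_verts[pairs[:, 0]]
    o = obj_verts[pairs[:, 1]]
    return 0.5 * stiffness * ((h - o) ** 2).sum(dim=-1).sum()

hand = torch.randn(10, 3)                 # toy "hand" contact vertices
obj = torch.randn(10, 3)                  # toy "object" contact vertices
pairs = torch.stack([torch.arange(10), torch.arange(10)], dim=1)

t = torch.zeros(3, requires_grad=True)    # rigid hand translation
opt = torch.optim.Adam([t], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    energy = elastic_energy(hand + t, obj, pairs)
    energy.backward()
    opt.step()
print(float(energy))                      # energy after refinement
```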
Misc
Conference reviewer for CVPR 2024, ECCV 2024, ICCV 2023, CVPR 2022, ECCV 2022, and 3DV 2022.