Ablation Study

[Figure: qualitative comparison grid, one sample per row]
Columns (left to right): Input Image, Without Motion Area, Without Hand Refinement Loss, HANDI
Input action descriptions (one per row):
"Pour vinegar into bowl."
"Julienne carrot."
"Stir the pasta."
"Wash fruit."
This page presents an ablation study of our generated Motion Area (MA) and our Hand Refinement Loss (HRL). Each row shows one sample. From left to right, the columns display the input context image (1st), the result from the model without conditioning on MA (2nd), the result from the model trained without HRL (3rd), and the result from our full method, HANDI (4th). These results show that both Motion Area generation and the Hand Refinement Loss help generate clearer, more natural, and more consistent hand structures without distractions in the background, even for dexterous hand motion.