Ablation Study
Input Image
Without Motion Area
Without Hand Refinement Loss
HANDI
Input Action Description: "Pour vinegar into bowl."
Input Action Description: "Julienne carrot."
Input Action Description: "Stir the pasta."
Input Action Description: "Wash fruit."
This page presents an ablation study of the two components of our method: conditioning on the generated Motion Area (MA) and optimizing with the Hand Refinement Loss (HRL). Each row shows one sample. From left to right: the input context image (1st), the result from the model without conditioning on our generated Motion Area (2nd), the result from the model trained without our Hand Refinement Loss (3rd), and the result from our full method, HANDI (4th). These results demonstrate that both Motion Area generation and the Hand Refinement Loss help generate clearer, more natural, and more consistent hand structures without distraction in the background, even for dexterous hand motion.