Robotic failure detection datasets and benchmarks. The data is organized as image-based visual question answering (VQA) with multi-class, fine-grained failure categories.
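
To make the image VQA, multi-class organization concrete, here is a minimal sketch of how one sample could be represented and scored. The record layout, the field names (`image_path`, `question`, `choices`, `answer_index`), and the helpers `format_prompt` and `is_correct` are all hypothetical illustrations, not the datasets' actual schema or tooling; check the released files for the real format.

```python
from dataclasses import dataclass
from typing import List

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class FailureVQASample:
    """Hypothetical record layout; field names are assumptions, not the real schema."""
    image_path: str      # path to the image the question refers to
    question: str        # natural-language question about the image
    choices: List[str]   # candidate failure classes (multi-class)
    answer_index: int    # index of the labeled class within `choices`


def format_prompt(sample: FailureVQASample) -> str:
    """Render the question followed by lettered answer options."""
    lines = [sample.question]
    for letter, choice in zip(LETTERS, sample.choices):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)


def is_correct(sample: FailureVQASample, predicted_letter: str) -> bool:
    """Compare a predicted option letter with the labeled answer."""
    return predicted_letter.strip().upper() == LETTERS[sample.answer_index]


# Placeholder values for illustration only; they are not taken from the datasets.
example = FailureVQASample(
    image_path="example.png",
    question="Which failure, if any, is shown in the image?",
    choices=["no failure", "failure type 1", "failure type 2"],
    answer_index=1,
)
```

Calling `format_prompt(example)` yields the question followed by one lettered line per class, and `is_correct(example, "b")` compares a model's chosen letter with the label. Treating each question as a choice among fixed classes keeps evaluation to a simple exact match.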