Abstract: This work introduces Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps commonly used in autonomous driving. While existing perception systems for ...
Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing ...