Visual-Language-Guided Task Planning for Horticultural Robots
About
Crop monitoring is essential for precision agriculture, but current systems lack high-level reasoning. We introduce a novel, modular framework that uses a Visual Language Model (VLM) to guide robotic task planning, interleaving input queries with action primitives. We contribute a comprehensive benchmark for short- and long-horizon crop monitoring tasks in monoculture and polyculture environments. Our main results show that VLMs perform robustly for short-horizon tasks (comparable to human success), but exhibit significant performance degradation in challenging long-horizon tasks. Critically, the system fails when relying on noisy semantic maps, demonstrating a key limitation in current VLM context grounding for sustained robotic operations. This work offers a deployable framework and critical insights into VLM capabilities and shortcomings for complex agricultural robotics.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multiple Plant - Multiple Target Monitoring | Agricultural Crop Monitoring Dataset | Success Rate | 9.38 | 2 |
| Multiple Plant - Multiple Target Task Planning | Simple Polyculture Environment | Success Rate | 0.00 | 2 |
| Multiple Plant - Single Target Monitoring | Agricultural Crop Monitoring Dataset | Success Rate | 42.11 | 2 |
| Multiple Plant - Single Target Task Planning | Simple Polyculture Environment | Success Rate | 33.33 | 2 |
| Robot Task Planning | Complex Polyculture Environment, Table II (Single Plant - Single Target) | Success Rate | 100 | 2 |
| Robot Task Planning | Complex Polyculture Environment, Table II (Single Plant - Multiple Target) | Success Rate | 0.8095 | 2 |
| Robot Task Planning | Complex Polyculture Environment, Table II (Multiple Plant - Single Target) | Success Rate | 31.58 | 2 |
| Robot Task Planning | Complex Polyculture Environment, Table II (Multiple Plant - Multiple Target) | Success Rate | 0.2727 | 2 |
| Single Plant - Multiple Target Monitoring | Agricultural Crop Monitoring Dataset | Success Rate | 68.42 | 2 |
| Single Plant - Multiple Target Task Planning | Simple Polyculture Environment | Success Rate | 70.59 | 2 |