
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking



¹Zhejiang University

²vivo AI Lab
Corresponding author

Abstract

Demand-driven navigation (DDN) refers to identifying and locating objects based on implicit user needs, even in dynamic and uncertain scenarios where object locations are unknown. Traditional data-driven methods rely on pre-collected data for model training and decision-making, which limits their ability to generalize to unseen scenarios. In this paper, we propose CogDDN, a framework that emulates the human attentional mechanism by selectively focusing on key objects crucial for fulfilling user demands. CogDDN incorporates a dual-process decision-making module that combines a Heuristic Process (System-I) for fast, efficient decisions with an Analytic Process (System-II) that analyzes past errors, accumulates them in a knowledge base, and continuously improves performance. Chain of Thought (CoT) reasoning is employed to strengthen the decision-making process. Extensive closed-loop evaluations in the AI2Thor simulator with the ProcThor dataset show that CogDDN outperforms single-view camera-only methods by 15%, with significant improvements in navigation accuracy and adaptability.

CogDDN Architecture

The monocular 3D object detection module identifies objects from the robot's perception of the environment. The detected objects and the human demand are then used as input prompts for the demand matching module, which identifies the matched objects. This information is fed into the dual-process decision module, which performs scene description, reasoning, and decision-making. If no detected object matches the demand, the Explore module of the Heuristic Process is triggered to output a series of decisions for exploring unknown areas. Conversely, if a matching object is found, the Exploit module, refined through the knowledge base, is activated to approach the target object. Whenever the robot encounters an obstacle, the Analytic Process uses a VLM to analyze the situation, and the corrected information is stored as experience in the knowledge base.
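The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the demand matcher, the Explore and Exploit modules, the action executor, and the VLM analyzer are all passed in as hypothetical callables (`matcher`, `explore`, `exploit`, `execute`, `analyze`), and the knowledge base is reduced to a plain Python list.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Step:
    matched: Optional[str]  # detected object that satisfies the demand, if any
    actions: List[str]      # actions issued for this step
    system: str             # "explore" or "exploit"


def match_demand(demand: str, detected: List[str],
                 matcher: Callable[[str, str], bool]) -> Optional[str]:
    """Demand matching: return the first detected object judged to satisfy the demand."""
    for obj in detected:
        if matcher(demand, obj):
            return obj
    return None


def run_step(demand: str, detected: List[str],
             matcher: Callable[[str, str], bool],
             explore: Callable[[List[str]], List[str]],
             exploit: Callable[[str], str],
             execute: Callable[[str], bool],
             analyze: Callable[[Step, str], Any],
             knowledge_base: List[Any]) -> Step:
    matched = match_demand(demand, detected, matcher)
    if matched is None:
        # No match: the Explore module proposes a sequence of actions.
        step = Step(None, explore(detected), "explore")
    else:
        # Match found: the Exploit module issues a single action toward the target.
        step = Step(matched, [exploit(matched)], "exploit")
    for action in step.actions:
        if not execute(action):  # hypothetical executor returns False when blocked
            # Obstacle: the Analytic Process analyzes the step and stores a correction.
            knowledge_base.append(analyze(step, action))
            break
    return step
```

Keeping every module behind a callable makes the routing logic (explore on no match, exploit on match, analyze on obstruction) testable without a simulator or a VLM.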

Knowledge Base

A random instruction is selected and its target object is identified. The A* algorithm then generates a trajectory of actions from the existing grid map and the target object's position. While these actions are executed, the target object's position is determined from the current viewpoint image. When a matching object is detected, the VLM generates a scene description and reasoning, which are then added to the knowledge base in the experience format.
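A minimal sketch of the trajectory-generation step, assuming a 4-connected occupancy grid in which cells equal to 1 are obstacles, a Manhattan-distance heuristic, and 90-degree turns. The action names `MoveAhead`, `RotateLeft`, and `RotateRight` are modeled on AI2-THOR's discrete navigation actions; the grid encoding, the heading convention, and the helper names `astar` and `path_to_actions` are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
# Headings: 0 = north (row - 1), 1 = east, 2 = south, 3 = west
DELTAS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def astar(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """4-connected A* with a Manhattan heuristic; cells equal to 1 are obstacles."""
    def h(c: Cell) -> int:
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from: Dict[Cell, Cell] = {}
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:  # walk parent links back to the start
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry: a cheaper route was already found
        for dr, dc in DELTAS:
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None  # goal unreachable


def path_to_actions(path: List[Cell], heading: int) -> List[str]:
    """Turn a cell path into rotate/move actions, starting from the given heading."""
    actions: List[str] = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        target = DELTAS.index((r1 - r0, c1 - c0))
        turn = (target - heading) % 4
        if turn == 1:
            actions.append("RotateRight")
        elif turn == 2:
            actions += ["RotateRight", "RotateRight"]
        elif turn == 3:
            actions.append("RotateLeft")
        heading = target
        actions.append("MoveAhead")
    return actions
```

For example, on a 3×3 grid whose middle row is blocked except for the rightmost cell, a robot at the top-left corner facing east reaches the bottom-left corner with two rotations and six forward moves.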

Heuristic Process

When no target object is present, the Explore module directly prompts the VLM with the current viewpoint image, historical information, and obstacle marks to generate a series of actions for exploring unknown areas. In contrast, the Exploit module uses a VLM fine-tuned on knowledge-base data to output a single action at a time, progressively moving toward the target object.
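The difference between the two modules' outputs (a sequence versus a single action) can be illustrated with a small parsing sketch. It assumes the VLM replies in free text that mentions action names; the action vocabulary `ALLOWED`, the prompt wording in `build_explore_prompt`, and the fallback action in `exploit_action` are hypothetical choices, not taken from the paper.

```python
import re
from typing import List, Optional

# Hypothetical action vocabulary for illustration.
ALLOWED = {"MoveAhead", "RotateLeft", "RotateRight", "LookUp", "LookDown"}


def build_explore_prompt(history: List[str], blocked: bool) -> str:
    """Text part of an Explore prompt; the current view image would be sent alongside it."""
    lines = ["No object matching the demand is visible in the current view."]
    lines.append("Previous actions: " + (", ".join(history) if history else "none"))
    if blocked:
        lines.append("Obstacle mark: the last action was blocked.")
    lines.append("Reply with a sequence of actions chosen from: " + ", ".join(sorted(ALLOWED)))
    return "\n".join(lines)


def parse_actions(reply: str, max_len: Optional[int] = None) -> List[str]:
    """Keep only recognized action names, in the order they appear in the reply."""
    actions = [tok for tok in re.findall(r"[A-Za-z]+", reply) if tok in ALLOWED]
    return actions if max_len is None else actions[:max_len]


def explore_actions(reply: str) -> List[str]:
    # Explore: the whole parsed sequence is returned for execution.
    return parse_actions(reply)


def exploit_action(reply: str, fallback: str = "RotateRight") -> str:
    # Exploit: only the first valid action is used; fall back if none is found.
    first = parse_actions(reply, max_len=1)
    return first[0] if first else fallback
```

Filtering against a fixed vocabulary keeps unrecognized words in the VLM's reply from ever reaching the controller.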

Analytic Process

When the Heuristic Process encounters an obstruction, the Analytic Process intervenes by analyzing the previous frame to identify errors and generate corrected samples. These corrected samples are subsequently integrated into the knowledge base, supporting continuous learning.
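One way to picture the correction step is the sketch below, assuming each knowledge-base experience stores the demand, target, scene description, reasoning, and action. The `Experience` fields, the `KnowledgeBase` class, and the `analyze` callable (standing in for the VLM) are hypothetical; the sketch only shows a corrected sample that keeps the original context while overriding the fields the analysis revises.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Experience:
    demand: str
    target: str
    scene_description: str
    reasoning: str
    action: str


@dataclass
class KnowledgeBase:
    samples: List[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.samples.append(exp)

    def for_target(self, target: str) -> List[Experience]:
        """Filter experiences by target object (a hypothetical query helper)."""
        return [e for e in self.samples if e.target == target]


def analytic_process(prev: Experience,
                     analyze: Callable[[Experience], Dict[str, str]],
                     kb: KnowledgeBase) -> Experience:
    """Run after an obstruction: build a corrected sample from the previous frame."""
    fixes = analyze(prev)               # e.g. {"reasoning": ..., "action": ...}
    corrected = replace(prev, **fixes)  # fields not in `fixes` are copied from prev
    kb.add(corrected)
    return corrected
```

Because `Experience` is frozen, the faulty sample is never mutated in place; the knowledge base only accumulates corrected copies.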

Case Studies

BibTeX


@misc{huang2025cogddncognitivedemanddrivennavigation,
  title={CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking},
  author={Yuehao Huang and Liang Liu and Shuangming Lei and Yukai Ma and Hao Su and Jianbiao Mei and Pengxiang Zhao and Yaqing Gu and Yong Liu and Jiajun Lv},
  year={2025},
  eprint={2507.11334},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.11334},
}