SituationalLLM: Proactive language models with scene awareness for dynamic, contextual task guidance
Abstract
Large Language Models (LLMs) have demonstrated remarkable success in text-based reasoning tasks but struggle to provide actionable guidance in real-world physical environments. This limitation arises from their lack of situational awareness—an inability to recognize gaps in their understanding of a user’s physical context, leading to unreliable and overly generic instructions. To address this, we propose SituationalLLM, a novel approach that integrates structured scene representations into LLMs to improve context-aware assistance. SituationalLLM leverages scene graphs—structured representations of objects, attributes, and spatial relationships—to encode real-world environments in a text-based Scene Graph Language. We introduce the Situational Awareness Database for Instruct-Tuning (SAD-Instruct), which pairs diverse scene graphs with multi-agent dialogue, enabling LLMs to iteratively refine their guidance through clarifying questions. A LoRA-adapted LLaMA-3-8B model is fine-tuned on SAD-Instruct to bridge structured knowledge with natural language reasoning, enhancing its ability to recognize missing information and dynamically adjust responses. Qualitative evaluations show that SituationalLLM outperforms state-of-the-art LLMs (GPT-4, LLaMA-3) in providing precise, task-specific, and contextually relevant instructions. The model reduces hallucinations by proactively identifying missing environmental details and requesting clarifications before generating guidance. Through comparative analyses on everyday tasks (e.g., cooking, office assistance), SituationalLLM demonstrates superior adaptability, delivering grounded, user-centered recommendations. By integrating structured scene representations and iterative dialogue-based refinements, SituationalLLM enables more reliable, context-aware AI assistants. This research highlights the significance of bridging structured knowledge with natural language for enhanced real-world task guidance. Future work should focus on expanding scenario diversity and improving real-time scene perception to further enhance situational adaptability.
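As a rough illustration of the encoding idea summarized above, the sketch below shows one plausible way to serialize a scene graph (objects, attributes, and spatial relationships) into plain text that could be placed in an LLM prompt. The data structures, field names, and output syntax are illustrative assumptions for exposition only, not the paper's actual Scene Graph Language.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal, illustrative scene-graph structures; the schema and syntax of the
# paper's Scene Graph Language may differ (assumption for illustration).

@dataclass
class SceneObject:
    name: str
    attributes: List[str] = field(default_factory=list)

@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str

def to_scene_graph_text(objects: List[SceneObject], relations: List[Relation]) -> str:
    """Serialize objects, attributes, and spatial relations into plain text
    suitable for inclusion in an LLM prompt."""
    lines = []
    for o in objects:
        attrs = ", ".join(o.attributes) if o.attributes else "none"
        lines.append(f"object: {o.name} | attributes: {attrs}")
    for r in relations:
        lines.append(f"relation: {r.subject} {r.predicate} {r.obj}")
    return "\n".join(lines)

# Example: a small kitchen scene.
objects = [
    SceneObject("stove", ["off"]),
    SceneObject("pan", ["empty", "clean"]),
    SceneObject("counter"),
]
relations = [Relation("pan", "on top of", "counter")]
print(to_scene_graph_text(objects, relations))
```

A serialization along these lines keeps the structured scene information compact and human-readable, so the model can reference specific objects and relations when asking clarifying questions or issuing task guidance.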
Grant information
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101135724 (Language Augmentation for Humanverse [LUMINOUS]), addressing Topic HORIZON-CL4-2023-HUMAN-01-21.
Acknowledgements
We would like to acknowledge Sankalp Sinhah's invaluable input and deeply appreciate his insights and early discussions, which laid the groundwork for the research presented in this paper.