Tags: Journal Paper · Open Research Europe · Large Language Models · Scene Graphs · Context-Aware Assistance · Situational Awareness

SituationalLLM: Proactive Language Models with Scene Awareness for Dynamic, Contextual Task Guidance

Abstract

Large Language Models (LLMs) have demonstrated remarkable success in text-based reasoning tasks but struggle to provide actionable guidance in real-world physical environments. This limitation arises from their lack of situational awareness—an inability to recognize gaps in their understanding of a user’s physical context, leading to unreliable and overly generic instructions. To address this, we propose SituationalLLM, a novel approach that integrates structured scene representations into LLMs to improve context-aware assistance. SituationalLLM leverages scene graphs—structured representations of objects, attributes, and spatial relationships—to encode real-world environments in a text-based Scene Graph Language. We introduce the Situational Awareness Database for Instruct-Tuning (SAD-Instruct), which pairs diverse scene graphs with multi-agent dialogue, enabling LLMs to iteratively refine their guidance through clarifying questions. A LoRA-adapted LLaMA-3-8B model is fine-tuned on SAD-Instruct to bridge structured knowledge with natural language reasoning, enhancing its ability to recognize missing information and dynamically adjust responses. Qualitative evaluations show that SituationalLLM outperforms state-of-the-art LLMs (GPT-4, LLaMA-3) in providing precise, task-specific, and contextually relevant instructions. The model reduces hallucinations by proactively identifying missing environmental details and requesting clarifications before generating guidance. Through comparative analyses on everyday tasks (e.g., cooking, office assistance), SituationalLLM demonstrates superior adaptability, delivering grounded, user-centered recommendations. By integrating structured scene representations and iterative dialogue-based refinements, SituationalLLM enables more reliable, context-aware AI assistants. This research highlights the significance of bridging structured knowledge with natural language for enhanced real-world task guidance. Future work should focus on expanding scenario diversity and improving real-time scene perception to further enhance situational adaptability.

SituationalLLM is a framework designed to give language models the ability to offer reliable, context-aware task guidance in physical environments. The system combines structured environmental knowledge, an interaction-driven training dataset, and a fine-tuned model that learns to reason about what it knows — and what it still needs to ask.

At the foundation is the Scene Graph Language, a compact text representation of objects, attributes, and spatial relations within an environment. This provides the model with a structured world description that can be interpreted entirely through language.
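For illustration, a small kitchen scene might be encoded along the following lines. The exact Scene Graph Language syntax is defined by the framework; this sketch only conveys the general shape of a text-encoded scene graph, and all identifiers are invented:

object: mug_01 | label: mug | color: white | material: ceramic
object: table_03 | label: table | material: wood | shape: rectangular
relation: mug_01 standing_on table_03
affordance: mug_01 -> grasp, fill, drink_from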

To train the model to use this structure effectively, we introduce SAD-Instruct, a dataset of scenario-specific scene graphs paired with multi-agent dialogues and grounded step-by-step instructions. The dialogues model how an assistant should behave when uncertain: identify missing context, request clarification, refine its plan, and only then provide guidance.
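The intended interaction pattern looks roughly like this (a hypothetical exchange, not an actual dataset sample):

User:      How do I make tea here?
Assistant: The scene contains a kettle and a stove, but I don't see any cups. Where do you keep your cups?
User:      In the cabinet above the sink.
Assistant: 1. Take a cup from the cabinet above the sink. 2. Fill the kettle at the sink. 3. Heat the water on the stove. ...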

SAD-Instruct contains:

– scenario-aligned scene graphs derived from realistic indoor scans
– object-level attributes, affordances, and spatial relations
– multi-turn dialogues showing clarifying questions and refinements
– grounded procedural instructions for each scenario

Each sample includes:

{
  "scan": "string",
  "scenario": "string",
  "objects": [
    {
      "global_id": "string",
      "id": "string",
      "label": "string",
      "ply_color": "string",
      "affordances": ["string"],
      "attributes": {
        "texture": ["string"],
        "lexical": ["string"],
        "color": ["string"],
        "material": ["string"],
        "shape": ["string"]
      }
    }
  ],
  "instructions": "string",
  "conversation": [
    {
      "role": "string",
      "content": "string"
    }
  ]
}
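
To make the schema concrete, a filled-in sample might look as follows (all values are invented for illustration and do not come from the actual dataset):

{
  "scan": "scene0001_00",
  "scenario": "making coffee in the kitchen",
  "objects": [
    {
      "global_id": "42",
      "id": "7",
      "label": "coffee machine",
      "ply_color": "#2f2f2f",
      "affordances": ["turn on", "brew"],
      "attributes": {
        "texture": ["smooth"],
        "lexical": ["appliance"],
        "color": ["black"],
        "material": ["plastic", "metal"],
        "shape": ["rectangular"]
      }
    }
  ],
  "instructions": "1. Fill the water tank. 2. Insert a filter and ground coffee. 3. Press the brew button.",
  "conversation": [
    {"role": "assistant", "content": "I see a coffee machine but no mug in the scene. Where do you keep your mugs?"},
    {"role": "user", "content": "In the cabinet to the left of the sink."}
  ]
}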


Some data samples are visualized below.

[Figure: visualized SAD-Instruct data samples]
A lightweight, LoRA-adapted LLaMA-3-8B model is fine-tuned on SAD-Instruct, resulting in a system that proactively checks assumptions, adapts its instructions to the environment, and avoids generic or hallucinated guidance. Qualitative comparisons show stronger situational reasoning than standard LLMs across everyday tasks.
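
As a rough sketch of how such an adaptation can be set up with the Hugging Face transformers and peft libraries (the hyperparameters and target modules below are assumptions, not the paper's exact configuration):

# Minimal LoRA fine-tuning setup; hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # base model named in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable

# Fine-tuning then proceeds as standard causal-LM training on SAD-Instruct
# dialogues serialized into the model's chat template.

Because only the adapter weights are updated, the 8B base model can be fine-tuned on modest hardware, and the adapters can be swapped in and out at inference time.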

Grant information

This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101135724 (Language Augmentation for Humanverse [LUMINOUS]), addressing Topic HORIZON-CL4-2023-HUMAN-01-21.

License

The dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. By using the dataset, you agree to the terms of this license.