From Manuals to XR: The AI Powering MOTIVATE XR 

The Hidden Challenge of Modern Industry

Imagine standing in front of a multi-million-euro aircraft engine or a complex industrial robotic assembly line. You are a technician tasked with a critical maintenance procedure. To guide you, you are handed a 500-page installation and maintenance manual. It is incredibly dense, filled with highly technical jargon, intricate black-and-white schematics, and endless tables of safety specifications.

While these manuals are indispensable for the commissioning and maintenance of industrial machinery, their sheer volume presents a massive cognitive hurdle. Technicians often spend a considerable amount of their valuable time just locating the correct procedure. Once they find it, they must constantly flick back and forth between text-heavy instructions and image-only reference sections, repeatedly verifying each step to ensure nothing goes wrong. 

In today’s fast-paced industrial landscape, this traditional method is no longer sustainable. Updates, translations, and annotations to these manuals are still largely done by hand, which severely limits how quickly a workforce can respond to rapidly evolving product designs. Furthermore, as industries shift toward more human-centric approaches – often referred to as Industry 5.0 – there is a growing consensus that we must provide our workers with better, more intuitive tools. 

Extended Reality (XR) – which encompasses Virtual, Augmented, and Mixed Reality – has proven to be a game-changer for skill transfer. However, a massive bottleneck remains: how do we get the information out of those 500-page PDFs and into an immersive 3D environment? Until now, XR developers have had to painstakingly recreate training procedures from scratch, a process that is slow, expensive, and difficult to scale.

This is exactly the problem that the MOTIVATE XR project set out to solve. At the heart of this solution is an innovative piece of artificial intelligence technology that we, at the Universidad Politécnica de Madrid (UPM), have proudly co-led in close collaboration with our partners Sopra Steria and D-Cube (D3).

Meet the Brain: The Semantic Processing Engine (SPE)

The SPE is a modular, AI-powered backend designed to act as the intelligent bridge between static technical documentation and active industrial XR assistance. You can think of it as the central nervous system of the MOTIVATE XR authoring platform.

When a user uploads a heterogeneous technical document – meaning a file that contains a messy mix of unstructured text, complex tables, and detailed figures – the SPE goes to work. It reads, analyses, and transforms this flat document into a structured, highly queryable format known as a Knowledge Graph. 

What is a Knowledge Graph? Imagine a traditional database as a simple Excel spreadsheet, where information is stored in rigid rows and columns. A Knowledge Graph, by contrast, is like a massive, interconnected web of concepts. It doesn’t just store the word “screwdriver”; it understands that a “screwdriver” is a tool, which is required for step 4, which applies to the landing gear assembly, which has a safety warning associated with it. By building this graph, the SPE allows the MOTIVATE XR authoring tools to ask complex questions and get precise, context-aware answers.
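To make the screwdriver example concrete, here is a minimal Python sketch of a graph stored as subject-relation-object triples, with one multi-hop question asked of it. The relation names, the identifiers and the `safety_warning_1` node are illustrative assumptions, not the schema the SPE actually uses.

```python
from collections import defaultdict

# Hypothetical triples mirroring the screwdriver example in the text.
TRIPLES = [
    ("screwdriver", "is_a", "tool"),
    ("step_4", "requires", "screwdriver"),
    ("step_4", "applies_to", "landing_gear_assembly"),
    ("landing_gear_assembly", "has_warning", "safety_warning_1"),
]


def build_index(triples):
    """Index triples by subject and by object so the graph can be walked both ways."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for subj, rel, obj in triples:
        outgoing[subj].append((rel, obj))
        incoming[obj].append((rel, subj))
    return outgoing, incoming


def warnings_for_tool(tool, outgoing, incoming):
    """Walk tool -> steps that require it -> assemblies -> safety warnings."""
    found = []
    for rel, step in incoming[tool]:
        if rel != "requires":
            continue
        for rel2, assembly in outgoing[step]:
            if rel2 != "applies_to":
                continue
            for rel3, warning in outgoing[assembly]:
                if rel3 == "has_warning":
                    found.append((step, assembly, warning))
    return found


outgoing, incoming = build_index(TRIPLES)
print(warnings_for_tool("screwdriver", outgoing, incoming))
```

A spreadsheet row could store each of these facts, but answering "which safety warnings are linked to work done with a screwdriver?" requires following three relations in sequence, which is the kind of query a graph makes natural.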

The Two Halves of the Engine: Text and Graphics

Technical manuals are multimodal – they communicate through both words and pictures. Therefore, our AI engine had to be multimodal as well. We divided the SPE into two synergistic components that work in perfect harmony: the Semantic Textual Engine (STE) and the Semantic Graphical Engine (SGE). 

1. The Semantic Textual Engine (STE): The Reader

The STE is the component responsible for tackling the dense prose of industrial documents. Utilising advanced Large Language Models (LLMs), the STE parses the unstructured text to identify key semantic entities, industrial procedures, tool requirements, and safety warnings. 

It reads a paragraph that might say, “Before removing the valve cover, ensure the primary power switch is set to OFF and use a 10mm wrench to loosen the four retaining bolts,” and breaks it down into a structured sequence of actions. It maps out the prerequisites, the required tools, the exact sequence of steps, and the safety conditions. It then feeds all of this highly structured intelligence directly into the Knowledge Graph. 
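As a rough illustration of what "a structured sequence of actions" could look like for the valve-cover sentence, the sketch below writes the result by hand as Python dataclasses and flattens it into graph triples. The class names, field names and relation labels are assumptions made for this example; the real STE output format is not described here, and no LLM is called.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    order: int
    action: str
    target: str
    tools: list = field(default_factory=list)


@dataclass
class Procedure:
    name: str
    prerequisites: list
    steps: list


def to_triples(proc):
    """Flatten a procedure into (subject, relation, object) triples."""
    triples = [(proc.name, "has_prerequisite", p) for p in proc.prerequisites]
    for step in proc.steps:
        step_id = f"{proc.name}/step_{step.order}"
        triples.append((proc.name, "has_step", step_id))
        triples.append((step_id, "action", step.action))
        triples.append((step_id, "target", step.target))
        triples.extend((step_id, "requires_tool", t) for t in step.tools)
    return triples


# Hand-written structure for the example sentence (not produced by a model).
valve_cover = Procedure(
    name="remove_valve_cover",
    prerequisites=["primary_power_switch=OFF"],
    steps=[
        Step(1, "loosen", "retaining_bolts (x4)", tools=["10mm_wrench"]),
        Step(2, "remove", "valve_cover"),
    ],
)

triples = to_triples(valve_cover)
for triple in triples:
    print(triple)
```

Keeping the prerequisite separate from the ordered steps matters for XR guidance: the power-off check can be shown as a gate that must be confirmed before step 1 becomes available.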

2. The Semantic Graphical Engine (SGE)

Text is only half the story. Industrial manuals rely heavily on intricate diagrams, exploded views, and schematic tables. This is where the Semantic Graphical Engine (SGE) steps in.

The SGE applies state-of-the-art Vision-Language Models (VLMs) to infer spatial and functional meaning from graphical content. Instead of just seeing an image as a block of pixels, the SGE “looks” at a diagram of a motor and identifies the distinct parts, matching them to the descriptions found in the text. 

One of the greatest challenges we faced was grounding these images in the correct context. A picture of a generic bolt means nothing without context. To solve this, the SGE performs “context-aware” analysis. It feeds a short snippet of the text surrounding the image into the AI model alongside the image itself. This anchors the visual interpretation in the correct domain, ensuring that the engine understands precisely what part of the machinery is being illustrated.
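The context-aware step can be sketched as follows: given a document already split into text and image blocks, collect the text on either side of a figure and package it with the image into a single request. The block layout, the file path, the prompt wording and the `max_chars` limit are all hypothetical, and the sketch stops short of calling any Vision-Language Model.

```python
def surrounding_text(blocks, image_index, max_chars=300):
    """Return the text just before and just after the image, capped per side."""
    before = [b["text"] for b in blocks[:image_index] if b["type"] == "text"]
    after = [b["text"] for b in blocks[image_index + 1:] if b["type"] == "text"]
    left = " ".join(before)[-max_chars:]   # keep the text closest to the figure
    right = " ".join(after)[:max_chars]
    return left, right


def build_vlm_request(blocks, image_index):
    """Pair an image with its nearby text so the model sees both together."""
    block = blocks[image_index]
    if block["type"] != "image":
        raise ValueError("image_index must point at an image block")
    left, right = surrounding_text(blocks, image_index)
    prompt = (
        "Describe the parts shown in this figure from a technical manual.\n"
        f"Text before the figure: {left}\n"
        f"Text after the figure: {right}"
    )
    return {"image_path": block["path"], "prompt": prompt}


# Hypothetical document fragment.
blocks = [
    {"type": "text", "text": "Loosen the four retaining bolts on the valve cover."},
    {"type": "image", "path": "figures/fig_12.png"},
    {"type": "text", "text": "Lift the cover clear of the gasket."},
]

request = build_vlm_request(blocks, 1)
print(request["prompt"])
```

With the neighbouring sentences in the prompt, a drawing of a generic bolt is presented to the model as one of the valve cover's retaining bolts rather than as an unlabelled piece of hardware.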

Keeping AI Honest and Verifiable

Anyone who has used modern AI chatbots knows that they can sometimes “hallucinate” – invent facts or provide plausible-sounding but entirely incorrect answers. In industrial environments, hallucination isn’t just an annoyance; it is a critical safety hazard. You cannot have an AI inventing a torque value for an aircraft engine component. 

A general-purpose AI model generates answers from its vast, generalised training data. Our approach, GraphRAG (graph-based Retrieval-Augmented Generation), instead restricts the AI to what the Knowledge Graph contains. When a user requests a procedure (e.g., “Generate a torque-calibration checklist for aircraft model P250”), the system first queries our highly structured, verified Knowledge Graph. It gathers all the factual, interconnected data specific to that exact manual. Then, and only then, does it use the reasoning power of the Large Language Model to format that data into a step-by-step XR training scenario.

Because the response is generated strictly from the structured evidence contained within the Knowledge Graph, the output is verifiable and directly grounded in the manufacturer’s original source documentation. This sharply reduces the risk of hallucinations and keeps the XR training aligned with the exact specifications of the equipment.
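The retrieve-first, generate-second flow can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the fact records, their ids, the two-hop retrieval, the prompt wording and the citation check are not taken from the MOTIVATE XR implementation, and the language model call itself is left out.

```python
import re

# Hypothetical graph facts; F4 belongs to a different aircraft model.
FACTS = [
    {"id": "F1", "subject": "P250", "relation": "has_procedure", "object": "torque_calibration"},
    {"id": "F2", "subject": "torque_calibration", "relation": "has_step", "object": "attach torque wrench to fitting A"},
    {"id": "F3", "subject": "torque_calibration", "relation": "has_warning", "object": "engine must be powered down"},
    {"id": "F4", "subject": "P300", "relation": "has_procedure", "object": "blade_inspection"},
]


def retrieve(facts, start, max_hops=2):
    """Collect facts reachable from `start` within `max_hops` relations."""
    frontier, seen, selected = {start}, {start}, []
    for _ in range(max_hops):
        next_frontier = set()
        for fact in facts:
            if fact["subject"] in frontier:
                selected.append(fact)
                if fact["object"] not in seen:
                    seen.add(fact["object"])
                    next_frontier.add(fact["object"])
        frontier = next_frontier
    return selected


def build_prompt(question, facts):
    """Give the model only the retrieved facts and ask it to cite their ids."""
    lines = [f"[{f['id']}] {f['subject']} {f['relation']} {f['object']}" for f in facts]
    return (
        "Answer using ONLY the facts below and cite the fact id for each step.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


def unsupported_citations(answer, facts):
    """Return cited ids that were never retrieved, flagging ungrounded steps."""
    known = {f["id"] for f in facts}
    return [c for c in re.findall(r"\[(F\d+)\]", answer) if c not in known]


evidence = retrieve(FACTS, "P250")
prompt = build_prompt("Generate a torque-calibration checklist for P250", evidence)
print(prompt)
```

Because each generated step is expected to carry a fact id, a reviewer (or an automated check such as `unsupported_citations`) can trace it back to the graph and, from there, to the page of the manual it came from.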

Putting It to the Test: Real-World Pilots

A tool is only as good as its performance in the real world. We are actively validating the Semantic Processing Engine across several highly demanding industrial pilots within the MOTIVATE XR project. 

In the Aerospace sector, where compliance and safety are paramount, the SPE is proving its ability to extract flawless, step-by-step maintenance protocols from incredibly dense aeronautic documentation. In the Home Appliances sector, where technicians face a vast and constantly updating array of consumer products, the engine’s ability to rapidly ingest new manuals is allowing for dynamic, on-the-fly updates to training scenarios. We are also seeing fantastic results in Aluminium Assembly, where the SGE’s ability to decipher complex spatial diagrams is helping to create intuitive, visual-heavy XR guidance for factory floor workers. 

Our empirical evaluations have shown outstanding results. While analysing images (via the SGE) is naturally the most time-intensive part of the process, the overall system scales brilliantly. More importantly, we have achieved excellent Knowledge Graph coverage, meaning the AI is successfully capturing and connecting almost every relevant entity and procedure within the manuals without dropping critical information. 

Conclusion: A New Era for Industrial Training

The days of technicians sifting through hundreds of pages of static, disconnected technical manuals are numbered. Through the power of the Semantic Processing Engine, we are unlocking the knowledge trapped within these documents and breathing life into it for the immersive era. 

By automating the heaviest lifting of content creation, the MOTIVATE XR platform is drastically lowering the barrier to entry for Extended Reality. Subject-matter experts can now focus on what they do best – training the next generation of workers – while the AI seamlessly handles the transformation of their technical data into rich, interactive 3D experiences. 

Note: Cover image generated by AI

Authors

Alberto del Rio

Universidad Politécnica de Madrid (UPM)

Alberto del Rio completed his PhD in Communication Technologies and Systems at the Universidad Politécnica de Madrid (UPM), building on a strong academic foundation that includes a Bachelor’s Degree in Mobile and Space Communications Engineering from the Universidad Carlos III of Madrid (UC3M) and a Master’s Degree in Signal Processing and Machine Learning for Big Data from UPM. As a researcher at UPM, he focuses on innovative projects in 5G networks and artificial intelligence, applying techniques like Reinforcement Learning to optimise network efficiency and enhance Quality of Experience (QoE) in multimedia communication systems. He has published extensively in leading journals advancing the integration of AI and telecommunications. 

Veronica Ruiz

Universidad Politécnica de Madrid (UPM)

Veronica Ruiz received the B.S. degree in Telecommunication Technologies and Services and the M.S. degree in Telecommunication Engineering from the Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 2018 and 2020, respectively. Since 2020, she has been a Researcher with the “Visual Telecommunications Applications Group” (GATV), UPM, where she is actively developing technical solutions in EU projects for scenarios related to health, security, and sensors.
