Slashdot: Waymo Explores Using Google’s Gemini To Train Its Robotaxis

Source URL: https://tech.slashdot.org/story/24/11/01/2150228/waymo-explores-using-googles-gemini-to-train-its-robotaxis?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Title: Waymo Explores Using Google’s Gemini To Train Its Robotaxis

Summary: Waymo has introduced EMMA, a new training model for autonomous driving built on a multimodal large language model (MLLM). The work applies MLLMs in an operational setting beyond their traditional uses and aims to address existing challenges in autonomous driving, which suggests a possible shift in how AI is used in dynamic real-world scenarios.

Detailed Description:
Waymo, a leader in the autonomous vehicle space, has developed a new training model named EMMA (End-to-End Multimodal Model for Autonomous Driving) built on Google’s MLLM, Gemini. The work suggests that large language models can be applied well beyond familiar uses such as chatbots.

– **Key Features of EMMA**:
– **Integration of MLLMs**: EMMA is one of the first attempts to use an MLLM like Gemini in autonomous vehicle functionality, suggesting that MLLMs can play a central role in vehicle operations.
– **Processing Sensor Data**: The model processes complex sensor data and predicts future trajectories, which the robotaxis use to navigate and make driving decisions.
– **Overcoming Modular Limitations**: Traditional autonomous systems rely on separate modules for tasks like perception and mapping. Errors accumulate across these modules, which makes such systems hard to scale. EMMA aims to mitigate this with a more integrated, end-to-end approach.
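
The point about accumulated errors can be illustrated with a little arithmetic. The sketch below is not taken from Waymo's paper: it assumes, purely for illustration, that each module in a pipeline is correct independently with some fixed probability, and the stage count and the 0.99 figure are hypothetical.

```python
def pipeline_success_rate(stage_accuracies):
    """Probability that every stage is correct, assuming independent errors.

    This independence assumption is a simplification made for illustration;
    real driving stacks have correlated errors and recovery mechanisms.
    """
    rate = 1.0
    for accuracy in stage_accuracies:
        rate *= accuracy
    return rate


# Hypothetical example: four chained modules, each 99% accurate.
modular_stages = [0.99, 0.99, 0.99, 0.99]
print(f"{pipeline_success_rate(modular_stages):.4f}")  # prints 0.9606
```

Under these assumptions, four stages that are each 99% accurate yield a pipeline that is right only about 96% of the time, and the figure drops further with each added stage. An end-to-end model still makes errors of its own; the sketch only shows why chaining many modules can make small errors add up.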

– **Advantages over Conventional Approaches**:
– **Generalist Knowledge**: MLLMs are trained on extensive data and carry broad contextual knowledge that goes beyond what conventional driving logs contain, which can improve decision-making in varied driving contexts.
– **Improved Reasoning**: Techniques like chain-of-thought reasoning let the model break complex tasks into systematic steps, which helps it handle challenging scenarios.
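
To make the chain-of-thought idea concrete, here is a minimal, generic sketch of a prompt that asks a model to work through intermediate steps before giving a final answer. It is not EMMA's actual prompt format; the function name `build_cot_prompt`, the task wording, and the reasoning steps are all hypothetical.

```python
def build_cot_prompt(task, reasoning_steps):
    """Build a prompt that requests step-by-step reasoning before the answer."""
    lines = [
        f"Task: {task}",
        "",
        "Work through these steps in order before answering:",
    ]
    # Number the intermediate steps so the model addresses each in turn.
    for index, step in enumerate(reasoning_steps, start=1):
        lines.append(f"{index}. {step}")
    lines.append("")
    lines.append("Final answer:")
    return "\n".join(lines)


# Hypothetical driving-style task and steps, for illustration only.
prompt = build_cot_prompt(
    "Decide what the vehicle should do next.",
    [
        "Describe the scene.",
        "List the objects that matter for safety.",
        "Explain how those objects are likely to move.",
    ],
)
print(prompt)
```

The prompt would then be sent to whatever model is in use; the design choice is simply that the final answer is requested only after the intermediate steps have been written out.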

– **Real-World Navigation Aid**: EMMA has already shown promise in helping robotaxis navigate complex environments, including adapting to obstacles such as animals or construction work.

– **Acknowledged Limitations**:
– **Computational Constraints**: The model currently does not use 3D sensor inputs, such as those from lidar or radar, because processing them would be too computationally expensive. It can also handle only a limited number of image frames.
– **Potential Risks**: The research paper hints at risks associated with using MLLMs, such as the tendency of AI systems to “hallucinate” or misinterpret simple tasks, which makes reliability a concern in critical driving scenarios.

In summary, Waymo’s EMMA applies multimodal large language models to autonomous driving and targets known weaknesses of modular systems, which could influence the future of autonomous vehicle technology. However, its current limitations and the risks inherent in MLLMs will need careful attention in further development and deployment.