Hacker News: WebRL: Training LLM Web Agents via Self-Evolving Online Reinforcement Learning

Source URL: https://arxiv.org/abs/2411.02337
Source: Hacker News
Title: WebRL: Training LLM Web Agents via Self-Evolving Online Reinforcement Learning

Feedly Summary: Comments

AI Summary and Description: Yes

Summary: The paper introduces WebRL, a framework that uses self-evolving online curriculum reinforcement learning to train large language models (LLMs) as web agents. The work is particularly relevant to open LLMs: it addresses three key training obstacles (task scarcity, sparse feedback, and policy distribution drift) and demonstrates performance surpassing existing proprietary models.

Detailed Description:

The paper titled “WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning” presents a significant advancement in the area of autonomous agents powered by large language models (LLMs). Here are the major points covered in the study:

– **Introduction of WebRL Framework**:
– WebRL leverages self-evolving online curriculum reinforcement learning to train high-performance web agents using open LLMs.

– **Challenges Addressed**:
– The paper identifies and seeks to overcome three primary challenges faced by LLM web agents:
– **Scarcity of Training Tasks**: The lack of diverse training scenarios leads to limited learning experiences.
– **Sparse Feedback Signals**: In an online learning context, inadequate feedback can hinder the model’s ability to learn effectively.
– **Policy Distribution Drift**: Changes over time in the training data distribution can lead to performance inconsistencies.
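Policy distribution drift is commonly mitigated by penalizing divergence from a reference policy during updates. The sketch below illustrates that general idea with a KL-penalized policy-gradient loss; the function name, the `beta` coefficient, and the per-sample KL estimate are illustrative assumptions, not the paper's exact objective.

```python
import math

def kl_penalized_loss(logp_new, logp_ref, advantage, beta=0.1):
    """Policy-gradient loss with a KL penalty toward a reference policy.

    The penalty discourages the updated policy from drifting far from the
    policy that generated the training data. Illustrative sketch only.
    """
    ratio = math.exp(logp_new - logp_ref)   # importance ratio new/ref
    pg_term = -ratio * advantage            # maximize expected advantage
    kl_term = logp_new - logp_ref           # per-sample KL estimate
    return pg_term + beta * kl_term
```

With `beta = 0`, this reduces to a plain importance-weighted policy-gradient loss; a larger `beta` trades learning speed for stability across training rounds.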

– **Innovative Techniques**:
– The framework incorporates several advanced techniques:
– **Self-Evolving Curriculum**: New training tasks are generated from instructions the agent previously failed, so task difficulty adapts in real time to the policy's current ability.
– **Outcome-Supervised Reward Model (ORM)**: A reward model that judges whether a completed trajectory achieved the task's goal, providing a reliable success signal where the environment's own feedback is sparse.
– **Adaptive Reinforcement Learning Strategies**: Update strategies that adapt over training to deliver consistent improvement and limit policy distribution drift.
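The interplay of the first two techniques can be sketched as a simple loop: failed tasks seed new curriculum variants, while an outcome-supervised reward decides which trajectories are kept for training. This is an illustrative sketch only; the function names, task format, and `n_variants` parameter are assumptions, not WebRL's actual implementation.

```python
def orm_score(trajectory):
    # Stub outcome-supervised reward model: judges only the final
    # outcome of a trajectory (1.0 = task completed, 0.0 = not).
    return 1.0 if trajectory["final_state"] == trajectory["goal"] else 0.0

def evolve_tasks(failed_tasks, n_variants=2):
    # Self-evolving curriculum: spawn new task variants from the
    # instructions the agent failed, so difficulty tracks the policy.
    return [f"{task} (variant {i})"
            for task in failed_tasks for i in range(n_variants)]

def training_round(tasks, attempt_fn, replay_buffer):
    # One curriculum round: attempt each task, keep ORM-approved
    # trajectories for training, and evolve new tasks from failures.
    failed = []
    for task in tasks:
        trajectory = attempt_fn(task)
        if orm_score(trajectory) > 0.5:
            replay_buffer.append(trajectory)
        else:
            failed.append(task)
    return evolve_tasks(failed)
```

In a real system, `attempt_fn` would roll out the LLM agent in a browser environment and `orm_score` would be a learned model; here both are stand-ins to show the control flow.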

– **Performance Results**:
– When applied to the open Llama-3.1 and GLM-4 models on the WebArena-Lite benchmark, the WebRL framework substantially improved their success rates:
– Llama-3.1-8B improved from 4.8% to 42.4%.
– GLM-4-9B improved from 6.1% to 43.0%.
– These results show that the open models not only surpass proprietary models such as GPT-4-Turbo (17.6%) and GPT-4o (13.9%) but also outperform previous agents trained on open LLMs (e.g., AutoWebGLM, 18.2%).

– **Implications for Future Research and Development**:
– The findings indicate that WebRL could significantly bridge the gap between open and proprietary LLM-based web agents, encouraging accessibility and enhancing the development of powerful autonomous web interaction systems.

This research is highly relevant for professionals engaged in AI and LLM development, particularly those working to make open models practical in real-world scenarios. Narrowing the gap with proprietary agents will be crucial for advancing the security and application of AI in dynamic environments.