Tag: alignment robustness
-
Schneier on Security: Evaluating the Effectiveness of Reward Modeling of Generative AI Systems
Source URL: https://www.schneier.com/blog/archives/2024/09/evaluating-the-effectiveness-of-reward-modeling-of-generative-ai-systems-2.html
Source: Schneier on Security
Title: Evaluating the Effectiveness of Reward Modeling of Generative AI Systems
Feedly Summary: New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL: Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning…