Mesa-optimization
AI alignment phenomenon and challenge
Mesa-optimization refers to a phenomenon in advanced machine learning where a model trained by an outer optimizer—such as stochastic gradient descent—develops into an optimizer itself, known as a mesa-optimizer. Rather than merely executing learned patterns of behavior, the system actively optimizes for its own internal goals, which may not align with those intended by human designers. This raises significant concerns in the field of AI alignment, particularly in cases where the system’s internal objectives diverge from its original training goals, a situation termed inner misalignment.[1][2]
This article may incorporate text from a large language model. (August 2025)
|
Source: Wikipedia. License: CC BY-SA 4.0. Changes may have been made. See authors on source page history.
Eksplorasi konten lain dari Tinta Emas
Berlangganan untuk dapatkan pos terbaru lewat email.