Did AlphaGo surpass humans because it had perfect target labels?

Was AlphaGo's superiority achieved through its reinforcement learning from self-play, rather than learning from perfect examples?

All Answers (1 Answer in All)

By Fathima M Answered 7 months ago

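No. The key was self-play, not perfect labels. The original AlphaGo first trained its policy network on moves from human expert games, which are strong but imperfect examples, and then improved it through reinforcement learning from self-play. By playing millions of games against itself, it generated its own training data and discovered strategies beyond established human knowledge. Its successor, AlphaGo Zero, dropped human game data entirely and learned from self-play alone. During play, AlphaGo did not look up answers in a labeled dataset; it used Monte Carlo Tree Search, guided by its policy and value networks, to evaluate positions and choose moves.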
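
To make the "generates its own training data" point concrete, here is a minimal sketch of how self-play can produce labels without any human annotation. It is not AlphaGo's code: the game is a toy Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and `random_policy` is a hypothetical stand-in for the network-plus-search player. The names `legal_moves`, `self_play_game`, and `random_policy` are all assumptions made for illustration. The only idea it shows is that each position gets its label from the final result of the game it came from.

```python
import random


def legal_moves(stones):
    """In this toy Nim variant a player removes 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def random_policy(stones, rng):
    """Hypothetical placeholder for a learned policy guided by search."""
    return rng.choice(legal_moves(stones))


def self_play_game(start_stones, policy, rng):
    """Play the policy against itself and label every position visited.

    Returns a list of (stones, move, z) tuples, where z is +1 if the player
    to move at that position went on to win the game, and -1 otherwise.
    """
    history = []  # (stones, move, player) in the order played
    stones, player = start_stones, 0
    while stones > 0:
        move = policy(stones, rng)
        history.append((stones, move, player))
        stones -= move
        player = 1 - player
    winner = history[-1][2]  # whoever took the last stone wins
    return [(s, m, 1 if p == winner else -1) for s, m, p in history]


rng = random.Random(0)
examples = self_play_game(10, random_policy, rng)
for stones, move, z in examples:
    print(f"stones={stones:2d} move={move} outcome_for_mover={z:+d}")
```

In a real system these outcome-labeled positions would be used to train a value estimate, the improved player would generate the next batch of games, and the loop would repeat. None of the labels are "perfect"; they simply get better as the player does.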