I'll need to use it more to see. But normally, a ranking in which users see side-by-side answers from two models, without knowing which models produced them, and pick the better answer should be pretty accurate.
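For what it's worth, here is a rough sketch of how blind pairwise votes like that could be turned into a ranking. It assumes an Elo-style update, which is my own choice for illustration and not necessarily what this particular leaderboard uses. The model names, the `k` factor, and the starting rating are all made up.

```python
from collections import defaultdict


def elo_ranking(votes, k=32.0, start=1000.0):
    """Rank models from blind pairwise votes.

    votes: iterable of (winner, loser) model names, one pair per user vote.
    k and start are illustrative defaults, not values from any real leaderboard.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        # Probability the winner was expected to win, given current ratings.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An upset (low expected score) moves the ratings more than a predictable win.
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (model the user preferred, the other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]
ranking = elo_ranking(votes)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Because voters never see the model names, each vote reflects only the answers themselves, which is why I'd expect a ranking built this way to hold up once there are enough votes.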