I have a small doubt regarding how human-level performance is calculated. As I understand it, human-level performance is the lowest error achievable by humans (maybe an individual, maybe a group of experts). My doubt is this: every model is analysed based on what the trainers have understood from the error analysis, and those trainers are themselves humans. So on what parameters, or on what scale, is the model trained so that it can improve on human-level performance? I may be sounding senseless, forgive me for that. But if anyone understands what I am asking, please help me.