Sorry Michael - just plain wrong on this one.
In terms of the sound kit makes, our ears are the only measuring equipment that is relevant. DBT (double-blind testing) is the principal method of eliminating external influences. If we can't distinguish between two bits of kit on the basis of hearing alone, then any jaw-drops etc. are clearly just psychology and not directly related to the sound.
The point of repeated tests (DBT or whatever) is to determine whether a consistent difference exists despite noise (aka intersample variability), by averaging out the effect of that noise. A scatter of results in a no-difference test is indeed saying that our measuring equipment (ears, etc.) is unreliable, hence the need for multiple trials to reduce the effect of noise on our measurements.
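To put numbers on why more trials help, here's a rough sketch assuming a simple ABX-style setup where each trial is an independent forced choice, so a listener who hears no difference is right 50% of the time by guessing. The function name abx_p_value is just my own illustration, not from any particular ABX tool. It gives the chance of scoring at least that well by pure luck:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing.

    Assumes independent trials, each with a 50% chance of a correct guess.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the binomial probabilities for every score from `correct` up to a perfect run.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# The same 80% hit rate over more trials:
for correct, trials in [(4, 5), (8, 10), (16, 20)]:
    print(f"{correct}/{trials}: {abx_p_value(correct, trials):.4f}")
# 4/5: 0.1875   (could easily be luck)
# 8/10: 0.0547
# 16/20: 0.0059 (very unlikely to be luck)
```

The hit rate is identical in each case; only the number of trials changes, and that alone is what separates "could be noise" from "something is really there".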
The fact that our perceptions change over time and in response to repeated stimuli complicates this simple analysis, but I'm sure statisticians have worked out how to accommodate it.
The fact remains that an equivalent DBT is clearly more valid than anything sighted. The real issue is probably how large a difference has to be before a sighted test becomes equally valid, weighed against the extra hassle of doing it double-blind. I really can't see what the anti-DBT camp is arguing about: just concede that DBT is better, but argue that sighted was good enough given the amount of dosh we were prepared to throw at things and the risk we were willing to take of being deluded. If making the right choice were a matter of life and death, I'd do an extensive DBT and buy the cheapest of the best bunch.
Absolutely. DBTs are an attempt to remove bias and extraneous influence from a subjective test.