What do you mean, all different? Most are exactly the same. The first 4 are a bit low and the last 3 a bit high, but the last 2 and the first are also extremely wide, so irrelevant anyway. Everything else agrees, most within >99 % confidence, with only slight differences in the absolute values.
I wish science was as simple as taking the mean and confidence intervals.
9 of the teams reaching a different conclusion is a pretty large group. Nearly a third of the teams, using what I assume are legitimate methods, disagree with the findings of the other 20 teams.
Sure, not all teams disagree, but a lot do. So the issue is whether or not the current research paradigm correctly answers “subjective” questions such as these.
If we only look at those with p < 0.05 (green) and with a 95 % confidence interval, then there are 17 teams left. And they all(!) agree with more than 95 % confidence.
And you missed the point in the very article about how the p value isn’t really as useful as it’s been touted.
That’s not the point, which is that the results are indeed mostly very similar, unlike what OP claims.
I never said that only looking at p values is a good idea or anything else like that.
So ignore all non-significant results? What’s to say those methods result in findings closer to the truth than the methods with no significant results?
The issue is that so many seemingly legitimate methods produce different findings with the same data.
And if you get only the statistically significant ones, it gets even more visible.