I'm currently revising a manuscript and have run into a problem. The aim of the paper is to track differences in health between two groups over time in repeated cross-sectional surveys. As most of the health outcomes in the paper are dichotomous, we mainly use logistic regression. As a first step in the manuscript, we plotted the development over time in terms of predicted probabilities. We then explored it further in terms of average marginal effects. As the predicted probabilities were mainly intended to be descriptive, we did not present any confidence intervals for them. All of this was done using the margins command.
However, a reviewer has now asked that we provide confidence intervals for the predicted probabilities as well. This is problematic because the predicted probabilities for the two groups often have overlapping confidence intervals (suggesting non-significant differences), while the average marginal effects from the same models show that the differences are in fact statistically significant. Thus, reporting the confidence intervals for the predicted probabilities would make the differences between the groups look less significant than they are. Why is this, and is it really reasonable to compare the confidence intervals for the predicted probabilities from different groups in order to establish statistically significant differences?
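To make the pattern concrete, here is a minimal sketch in Python with made-up numbers (these are not our estimates, and the function names are only for illustration). It assumes Wald-type 95% intervals and, by default, zero covariance between the two group predictions, which need not hold when both come from the same model. It compares the overlap of the two group-level intervals with the interval for the difference itself.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_ci(estimate, se, z=Z_95):
    """Symmetric Wald confidence interval: estimate +/- z * se."""
    return (estimate - z * se, estimate + z * se)


def compare_groups(p1, se1, p2, se2, cov=0.0):
    """Contrast 'do the two CIs overlap?' with 'is the difference significant?'.

    p1, p2   : hypothetical predicted probabilities for the two groups
    se1, se2 : their hypothetical standard errors
    cov      : assumed covariance between the two predictions (0 = independent)
    """
    ci1 = wald_ci(p1, se1)
    ci2 = wald_ci(p2, se2)

    # The standard error of the difference is based on the sum of the
    # variances (minus twice the covariance), not on the sum of the
    # standard errors.
    diff = p2 - p1
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    ci_diff = wald_ci(diff, se_diff)

    intervals_overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
    difference_significant = not (ci_diff[0] <= 0.0 <= ci_diff[1])

    return {
        "ci_group1": ci1,
        "ci_group2": ci2,
        "diff": diff,
        "se_diff": se_diff,
        "ci_diff": ci_diff,
        "intervals_overlap": intervals_overlap,
        "difference_significant": difference_significant,
    }


# Made-up example: 30% vs 36%, each with a standard error of 2 points.
result = compare_groups(0.30, 0.02, 0.36, 0.02)
for key, value in result.items():
    print(key, value)
```

With these illustrative inputs the two group intervals overlap, yet the interval for the difference excludes zero. The reason is that the standard error of the difference, sqrt(se1^2 + se2^2), is smaller than se1 + se2, which is the quantity that checking for overlap implicitly relies on.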