
  • Statistical test for non-parametric panel data

    I'm analyzing some data and have run into a problem I've never had before. My study was an experiment with both between- and within-subject variables, so I need to treat it as panel data. My DV was measured on a 1-7 Likert scale. The frequencies are as follows: 1: 150, 2: 80, 3: 66, 4: 82, 5: 126, 6: 129, 7: 195. As you can see, it's not a normal distribution (frequencies mostly increase as the scale goes up), and it may even have some zero saturation. Any suggestions for which statistical test I should use? Maybe a non-parametric test that will accommodate panel data?

  • #2
    A Likert scale is categorical (not continuous) and is not normally distributed. And many DVs are not normal.

    If you're not combining multiple Likert scales, then you probably need an ordered model, which is a pain but likely the best option.
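    A minimal sketch of what an ordered model could look like in Stata, assuming simulated data: the variable names (id, wave, treat, likert), the cutpoints and the effect sizes are all hypothetical. Here -ologit- fits a proportional-odds model, vce(cluster id) allows for correlation among each subject's repeated responses, and -margins- then gives the predicted probability of the top category in each group.

```stata
* Hypothetical panel: 200 subjects observed in 4 waves
clear
set seed 2024
set obs 200
generate id = _n
generate treat = runiform() < 0.5       // between-subject factor
generate u = rnormal(0, 1)              // subject-level effect, shared across waves
expand 4
bysort id: generate wave = _n           // within-subject factor
* Latent response with logistic noise, cut at fixed points into 7 categories
generate ystar = 0.5*treat + 0.2*wave + u + logit(runiform())
generate likert = 1 + (ystar > -1) + (ystar > -0.3) + (ystar > 0.3) ///
    + (ystar > 1) + (ystar > 1.7) + (ystar > 2.5)
* Ordered logit; clustering on id allows for correlated repeated responses
ologit likert i.treat i.wave, vce(cluster id)
* Predicted probability of the top category (7) for each treatment group
margins treat, predict(outcome(7))
```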



    • #3
      I don't follow how a scale from 1 to 7 has zero saturation. But in some circumstances a mean of ordinal scores can be useful, despite the whole legion of measurement theorists laying down the law otherwise. (Grade-point averages, anyone? Review score means on websites?) Whether those circumstances include yours is hard to say. I'd call your distribution bimodal, not just non-normal.

      The real question is what kind of questions you have about your outcome (you say DV). If they boil down to questions (exact or rough) about its mean over time and between panels, then the usual models may not be outrageous, so long as you base inferences on robust standard errors.
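      One way to follow this suggestion, sketched under assumptions: a linear random-effects model for the mean of a hypothetical 1-7 outcome, with cluster-robust standard errors. The simulated data and variable names are illustrative only; after -xtreg-, vce(robust) clusters on the panel variable declared with -xtset-.

```stata
* Hypothetical panel: 200 subjects observed in 4 waves
clear
set seed 2024
set obs 200
generate id = _n
generate treat = runiform() < 0.5
generate u = rnormal(0, 1)              // subject-level effect
expand 4
bysort id: generate wave = _n
* A 1-7 score: rounded noisy value, clipped to the ends of the scale
generate likert = round(4 + 0.5*treat + 0.2*wave + u + rnormal(0, 1.5))
replace likert = max(1, min(7, likert))
xtset id wave
* Random-effects model for the mean; after xtreg, vce(robust) gives
* standard errors clustered on the panel variable (id)
xtreg likert i.treat i.wave, re vce(robust)
```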

      My own experience with models for ordinal data boils down to willingness to believe that they are correct in principle, but unwillingness to work with models that I find awkward in practice, and sometimes just as implausible as the alternatives.



      • #4
        Whether Likert scales can be treated as continuous interval level variables or must be treated as ordinal is a controversy in its own right. I have ventured my opinion on this elsewhere before, but will say nothing about it here.

        What I want to do is just emphasize that there is no regression model that requires the outcome variable to have a normal distribution. No, OLS does not require that. It is a common misconception, probably common because it is commonly (mis)taught that way.

        The truth about OLS is this: there is a theorem that if the residuals of the regression are normally distributed, and the other usual OLS assumptions hold, then the F and t statistics calculated by OLS actually have F and t sampling distributions with the degrees of freedom shown in OLS output. Note that it is the residuals, not the outcome variable itself, that are hypothesized to be normally distributed. Moreover, note that the implication in the theorem goes in only one direction: if the residuals are normally distributed, then the F and t statistics work. The converse is not true: when the sample size is large enough, it can be shown using the central limit theorem that the F and t statistics still work almost regardless of what the residual distribution looks like.
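        The large-sample argument can be checked with a small Monte Carlo sketch. The program name skewsim and all settings are hypothetical: the true slope is zero, the errors are centred exponential and therefore strongly right-skewed, and each replication records whether the usual OLS t test rejects at the 5% level. If the central limit theorem argument holds at n = 1000, the empirical rejection rate should be close to 0.05.

```stata
* Hypothetical program: one replication with skewed errors and a true slope of 0
capture program drop skewsim
program define skewsim, rclass
    syntax [, nobs(integer 1000)]
    drop _all
    set obs `nobs'
    generate x = rnormal()
    generate err = rexponential(1) - 1   // mean 0, strongly right-skewed
    generate y = 1 + err                 // x has no effect on y
    regress y x
    * Reject if the two-sided p-value from the usual t statistic is below 0.05
    return scalar reject = (2*ttail(e(df_r), abs(_b[x]/_se[x])) < 0.05)
end

set seed 12345
simulate reject = r(reject), reps(2000) nodots: skewsim, nobs(1000)
* The mean of reject is the empirical rejection rate (nominal 0.05)
summarize reject
scalar empirical_size = r(mean)
display "Empirical rejection rate at the 5% level: " %5.3f empirical_size
```

        Rerunning with a much smaller nobs(), say nobs(20), is one way to explore where the approximation starts to strain.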

        Added: Crossed with #3, which deals with other issues.



        • #5
          Hello Thomas Boling. If you do decide it is best to treat your DV as a set of ordered categories, one option would be to use the -meologit- command, with subjects at level 2, and with the repeated measures (at level 1) clustered within subjects.
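          A minimal sketch of that two-level model, assuming simulated data with hypothetical variables: id identifies subjects (level 2), wave indexes the repeated measures (level 1), treat is a between-subject factor, and likert is the 1-7 response. The "|| id:" part of the command requests a random intercept for each subject.

```stata
* Hypothetical panel: 200 subjects (level 2), each rated in 4 waves (level 1)
clear
set seed 2024
set obs 200
generate id = _n
generate treat = runiform() < 0.5
generate u = rnormal(0, 1)              // random intercept for each subject
expand 4
bysort id: generate wave = _n
generate ystar = 0.5*treat + 0.2*wave + u + logit(runiform())
generate likert = 1 + (ystar > -1) + (ystar > -0.3) + (ystar > 0.3) ///
    + (ystar > 1) + (ystar > 1.7) + (ystar > 2.5)
* Multilevel ordered logit: repeated measures nested within subjects
meologit likert i.treat i.wave || id:
```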

          Also, at the risk of being pedantic, let me note that data are neither parametric nor non-parametric. Those terms are typically used to describe statistical tests. But even in that case, the distinction between parametric and non-parametric is quite slippery. See this VassarStats comment, for example. See also Ronan Conroy's 2012 Stata Journal article. As he notes, the Wilcoxon-Mann-Whitney test actually estimates a parameter, despite being almost universally described as a non-parametric test.
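          To make the parameter concrete: the Mann-Whitney U statistic divided by the product of the two group sizes estimates P(X > Y) + 0.5*P(X = Y), where X is a random draw from one group and Y from the other. A minimal sketch with hypothetical simulated ratings; the porder option of -ranksum- reports an estimate of the probability that a draw from the first group is larger than a draw from the second.

```stata
* Hypothetical ratings on a 1-7 scale for two groups of 200
clear
set seed 2024
set obs 400
generate group = 1 + (_n > 200)
generate rating = ceil(7*runiform())
* Shift some group-2 ratings up by one point, capped at 7
replace rating = min(7, rating + (runiform() < 0.4)) if group == 2
* porder: estimated probability that a group-1 rating exceeds a group-2 rating
ranksum rating, by(group) porder
scalar p_order = r(porder)
display "Estimated P(group 1 > group 2): " %5.3f p_order
```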

          Cheers,
          Bruce
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)



          • #6
            Not sure what this is really saying, but I went down the rabbit hole so I thought I'd share it.

            I create a latent Y, cut it into pieces to create a Likert item, and then run the regressions to see how biased the coefficients are.

            ologit is included, but it is not clear how to compare its coefficients with the OLS ones. Some fit measures are provided.

            Bias can be quite large, but the predictions are pretty close.


            Code:
            * Uses only official Stata commands
            clear all
            set seed 12345
            
            * True coefficients for the latent outcome y
            scalar beta1 = 1
            scalar beta2 = 0
            scalar beta3 = 2
            
            * One record per replication: coefficients and 5% rejection indicators for
            * OLS on latent y (true*), OLS on the Likert version (likert*) and ologit
            * (ologrej*), plus correlations of each linear prediction with y
            postfile sim scale trueb1 truerej1 likertb1 likertrej1 trueb2 truerej2   ///
                likertb2 likertrej2 trueb3 truerej3 likertb3 likertrej3 ologrej1     ///
                ologrej2 ologrej3 corrtrue corrlin corrologit ologfitrat rat         ///
                using likert, replace
            
            forvalues n = 3/10 {                  // number of Likert categories
                forvalues i = 1/100 {             // replications per scale
                    quietly {
                        drop _all
                        set obs 1000
                        local scale = `n'
                        generate x1 = rgamma(5, 2)
                        generate x2 = runiform()
                        generate x3 = runiform() > 0.7
                        generate e  = rnormal(0, 2)
                        generate y  = 1 + beta1*x1 + beta2*x2 + beta3*x3 + e
                        * Cut latent y into equal-frequency categories with official -xtile-
                        xtile yc = y, nquantiles(`n')
                        
                        * Ratio of means, used to rescale the Likert coefficients
                        summarize y
                        local y1 = r(mean)
                        summarize yc
                        local y2 = r(mean)
                        local rat = `y1' / `y2'
                        
                        * OLS on the latent outcome
                        regress y x1 x2 x3, vce(robust)
                        matrix R = r(table)      // row 1 = b, row 4 = p-value
                        predict linfit, xb
                        local trueb1   = R[1,1]
                        local truerej1 = R[4,1] <= 0.05
                        local trueb2   = R[1,2]
                        local truerej2 = R[4,2] <= 0.05
                        local trueb3   = R[1,3]
                        local truerej3 = R[4,3] <= 0.05
                        
                        * OLS on the Likert version, coefficients rescaled by rat
                        regress yc x1 x2 x3, vce(robust)
                        matrix R = r(table)
                        predict likertfit, xb
                        local likertb1   = R[1,1] * `rat'
                        local likertrej1 = R[4,1] <= 0.05
                        local likertb2   = R[1,2] * `rat'
                        local likertrej2 = R[4,2] <= 0.05
                        local likertb3   = R[1,3] * `rat'
                        local likertrej3 = R[4,3] <= 0.05
                        
                        * Ordered logit; r(table) must be refreshed here, otherwise the
                        * rejection indicators would be taken from the OLS fit above
                        ologit yc x1 x2 x3
                        matrix R = r(table)
                        predict ologfit, xb
                        local ologrej1 = R[4,1] <= 0.05
                        local ologrej2 = R[4,2] <= 0.05
                        local ologrej3 = R[4,3] <= 0.05
                        
                        * How closely each linear prediction tracks the latent y
                        correlate y linfit
                        local corrtrue = r(rho)
                        correlate y likertfit
                        local corrlin = r(rho)
                        correlate y ologfit
                        local corrologit = r(rho)
                        local ologfitrat = `corrologit' / `corrlin'
                    }
                    post sim (`scale') (`trueb1') (`truerej1') (`likertb1') (`likertrej1') ///
                        (`trueb2') (`truerej2') (`likertb2') (`likertrej2') (`trueb3')     ///
                        (`truerej3') (`likertb3') (`likertrej3') (`ologrej1') (`ologrej2') ///
                        (`ologrej3') (`corrtrue') (`corrlin') (`corrologit') (`ologfitrat') (`rat')
                }
                display "Scale `n' complete."
            }
            postclose sim
            
            preserve
            use likert, clear
            * Offsets so the three correlation series do not overplot
            generate scale2 = scale + 0.1
            generate scale3 = scale + 0.2
            * Absolute relative bias of the rescaled Likert coefficients; since
            * beta2 = 0, biasb2 divides by an estimate near zero and is unstable
            forvalues i = 1/3 {
                generate biasb`i' = abs((trueb`i' - likertb`i') / trueb`i')
                egen m_biasb`i' = mean(biasb`i'), by(scale)
            }
            tabstat trueb1 likertb1 biasb1 trueb2 likertb2 biasb2 trueb3 likertb3 biasb3, by(scale)
            tabstat truerej1 likertrej1 ologrej1 truerej2 likertrej2 ologrej2 truerej3 likertrej3 ologrej3, by(scale)
            tabstat corr* ologf*, by(scale)
            
            set graphics off
            global OPT xlabel(3(1)10) xtitle(Likert Item Scale)
            forvalues i = 1/3 {
                twoway scatter biasb`i' scale, jitter(1) || connected m_biasb`i' scale, sort $OPT name(biasb`i', replace)
            }
            set graphics on
            graph combine biasb1 biasb2 biasb3, name(bias, replace)
            twoway scatter corrtrue scale, jitter(1) ||    ///
                scatter corrlin scale2, jitter(1) ||       ///
                scatter corrologit scale3, jitter(1) $OPT ytitle(Corr of Fit) name(fit, replace)
            restore

