  • Testing difference between variances of two variables from the same population

    Greetings,
    For a methodological framework I am developing, I need to test the difference between variances of two variables from the same population. The closest procedure I can find in Stata is sdtest (https://www.stata.com/manuals/rsdtest.pdf), which is described as a variance comparison test but actually tests the difference between standard deviations rather than variances. Without boring you with the details about why I need to test variances rather than standard deviations, can anyone point me to a procedure in Stata that will do the job?
    Thanks,
    Jeff Edwards

  • #2
    Per the Methods and Formulas documentation associated with -sdtest-, the stock two-sample test there *is* a test on variances. The more robust Levene's test is (to my recollection) described in the literature as a variance test. However, my impression is that the substantial literature on variance tests raises questions about their performance under less than "ideal conditions," so I would want to consider using -permute- to do a randomization test, rather than rely on formula-based results. For example:
    Code:
    sysuse auto, clear
    permute foreign vardiff = (r(sd_1)^2 -r(sd_2)^2), reps(1000): ttest weight, by(foreign)
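    For comparison with the randomization result, the two formula-based tests mentioned above can be run on the same example. This is only a sketch using the shipped auto data; it addresses the by-group case (one variable, two groups), not two variables measured on the same units.

    ```stata
    sysuse auto, clear
    * classical F test of equal variances of weight across the two groups
    sdtest weight, by(foreign)
    * Levene's test (W0) plus the Brown-Forsythe variants (W50, W10)
    robvar weight, by(foreign)
    ```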



    • #3
      Jeff:
      Going parametric and trusting that the related assumptions are fulfilled:
      Code:
      use "C:\Program Files\Stata17\ado\base\a\auto.dta"
      . oneway price foreign, bonferroni
      
                              Analysis of variance
          Source              SS         df      MS            F     Prob > F
      ------------------------------------------------------------------------
      Between groups      1507382.66      1   1507382.66      0.17     0.6802
       Within groups       633558013     72   8799416.85
      ------------------------------------------------------------------------
          Total            635065396     73   8699525.97
      
      Bartlett's equal-variances test: chi2(1) =   0.7719    Prob>chi2 = 0.380
      
                            Comparison of Price by Car origin
                                      (Bonferroni)
      Row Mean-|
      Col Mean |   Domestic
      ---------+-----------
       Foreign |    312.259
               |      0.680
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Originally posted by Jeff Edwards
        . . . variances of two variables from the same population. [emphasis added]
        The responses so far seem to be more for comparing variance of the same variable between two populations (treatment conditions etc.).

        I'm not sure that it usually makes sense to compare variances of two qualitatively different outcome variables under most circumstances—apples-to-oranges, and all.

        Is there some reason to believe that their scales ought to be the same?



        • #5
          Joseph Coveney is absolutely right; I read the original question carelessly. I suspect there are contexts in which comparisons of variances are meaningful, though, one possible field being paleontology, where there is a longstanding interest in tests to compare relative variation. (I don't recall, though, whether people there have done tests on "repeated measures" as described here.) The Methods and Formulas section documenting -sdtest- in fact *does* show a test for a difference in variance between two variables. (It does not say anything about the distinction between a test for the same variable across two groups vs. a test for two variables within one group, so that leaves me wondering a little.)

          I can't offhand think of how a permutation test would be done here, since I can't conceptualize what one would permute. I'd be interested in what someone else thinks.
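
          One formula-based possibility for the paired case, offered as a sketch and not something proposed elsewhere in this thread: for two variables x and y measured on the same units, Cov(x + y, x - y) = V(x) - V(y), so the two variances are equal exactly when the sum and the difference are uncorrelated. A test of that correlation is usually attributed to Pitman and Morgan, and it leans on approximate bivariate normality. The simulated data, the seed, and the names x, y, psum, and pdiff below are assumptions made purely for illustration.

          ```stata
          clear
          set seed 20240101
          * two correlated variables with SDs of 1 and 1.8 (illustrative values)
          matrix C = (1, 0.5 \ 0.5, 1)
          matrix S = (1, 1.8)
          drawnorm x y, n(300) sds(S) corr(C)
          * Cov(psum, pdiff) = V(x) - V(y), so a nonzero correlation
          * between them points to unequal variances
          generate double psum  = x + y
          generate double pdiff = x - y
          pwcorr psum pdiff, sig
          ```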



          • #6
            Thanks for your responses, and yes, I am interested in comparing the variances of two variables within a single sample. This comparison is useful in the context I am considering, which can be characterized as using different metrics to assess the same attitude. For instance, assume an attitude is measured using a rating scale ranging from 1-7 and an agree-disagree scale ranging from -3 to +3. The question being asked is the same for both metrics, and therefore scores on one scale should represent a simple transformation of scores on the other scale. If R is the 1-7 rating scale, and A is the -3 to +3 agreement scale, then A = R - 4. Furthermore, E(A) = E(R - 4) = E(R) - 4, and V(A) = V(R - 4) = V(R). Comparing the means and variances as expressed here can shed light on whether respondents are consistent when confronted with rating vs. agreement scales.

            As might be expected, I have found that the psychology of responding is more erratic than the formula A = R - 4, even when items are presented in adjacent sections of the same questionnaire. I have also cast this problem in terms of latent variable modeling, but comparing means and variances is a simple first step that most readers in my field (applied psychology) should understand.

            And after having inspected the Stata documentation more closely, I see that the F-test reported by sdtest is, in fact, a ratio of variances (i.e., the squares of the standard deviations), even though the F is labeled ratio = sd(x) / sd(y) in the output, where x and y are the two variables whose variances are being compared. (I confirmed this by computing the ratio of the variances manually, which matched the F statistic reported by Stata.) I am not sure why this comparison is cast as a ratio rather than a difference, but I suppose the rationale can be found in the literature (e.g., Armitage et al., 2002). Again, thanks for your help!
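
            A check of this kind can be reproduced along the following lines. This is a sketch only: the auto data and the variables headroom and trunk are stand-ins, not the questionnaire items. Note also that, as far as I can tell, the two-variable form of sdtest treats the variables as independent samples, so it does not account for any correlation between paired measurements.

            ```stata
            sysuse auto, clear
            sdtest headroom == trunk
            * the reported F and the squared ratio of the two SDs should agree
            display "F from sdtest:    " r(F)
            display "(sd_1 / sd_2)^2:  " (r(sd_1)/r(sd_2))^2
            ```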



            • #7
              Because you are interested in comparing what appear to be ordinal response variables, I'd say you should take a look at the literature on variation with ordinal variables. I'm a contributor to that literature, though not the latest word. I'd immodestly say you could benefit by checking out my program at -ssc describe ordvar-, and check out the more recent literature citing what I mention there. -ssc describe ineqord-, from Stephen Jenkins, is much newer and also would have some useful ideas. Finally, you might look at -ssc describe oglm-, which is an ordinal regression model that allows heteroscedasticity, hence modeling of ordinal variation. These may not specifically touch on the within-subjects aspect of your problem, but you might get some useful ideas. On the less optimistic side, comparing variability while controlling for differences in location (mean/median) is a well known problem, and my best guess is that it's even more difficult with an ordinal response.



              • #8
                Originally posted by Jeff Edwards
                . . . I am interested in comparing the variances of two variables within a single sample. . . . using a rating scale ranging from 1-7 and an agree-disagree scale ranging from -3 to +3.
                You can try an approach like that illustrated below. The approach is implemented in a Stata program, called -testEm-, which takes the two rating-scale variables as a varlist. (It’s shown in the output below as the first program. The rest of the output shows results of some method-validation work.)

                The method displays good ability to discriminate scale (variance) differences from location (mean) differences in the exploration of its operating characteristics under selected pertinent use cases shown below, and so it shows promise for your problem.

                I’ve used -suest- in the implementation below, but if you’re going to stick with the major link functions, then you could easily use -gsem-, instead. -gsem- will allow the use of likelihood-ratio testing, which might give more consistent test size. (The test size with -suest- is in the neighborhood of 4½ to 5½%.)

                I’m not sure how sensitive the approach is to violations of the usual assumptions regarding presumed data-generating process (degree of model misspecification) etc. I’m also not familiar with the literature that Mike cites, and if this method has been explored before more thoroughly, then you might find more limitations to its applicability there.
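
                One way to read why comparing cutpoints can separate scale from location is sketched here. It assumes that each response arises from a latent normal variable with mean \mu and standard deviation \sigma, cut at thresholds c_1 < ... < c_6 that are the same for both variables. The symbols are notation introduced for this sketch only. An ordered probit fitted without covariates fixes the latent standard deviation at 1, so the cutpoints it reports, \kappa_k, absorb both \mu and \sigma:

                ```latex
                \begin{align*}
                P(y \le k) &= \Phi\!\left(\frac{c_k - \mu}{\sigma}\right)
                  \quad\Longrightarrow\quad
                  \kappa_k = \frac{c_k - \mu}{\sigma}, \qquad k = 1, \dots, 6, \\
                \kappa_k - \kappa_1 &= \frac{c_k - c_1}{\sigma}
                  \quad \text{(free of } \mu \text{, so it reflects scale only)}, \\
                \sum_{k=1}^{6} \kappa_k &= \frac{\sum_{k=1}^{6} c_k - 6\mu}{\sigma}
                  \quad \text{(shifts with } \mu \text{)}.
                \end{align*}
                ```

                Under these assumptions, equality of the differences \kappa_k - \kappa_1 across the two variables corresponds to equal \sigma, and the sum of the cutpoints moves with \mu. When the thresholds sum to zero and \mu = 0, that sum is zero whatever \sigma is.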

                .
                . version 17.0

                .
                . clear *

                .
                . seedem
                set seed 1737616889

                .
                . // This is the actual estimation command:
                . program define testEm, rclass
                  1.         version 17.0
                  2.         syntax varlist(numeric min=2 max=2)
                  3.
                .         forvalues i = 1/2 {
                  4.                 local rsp : word `i' of `varlist'
                  5.                 oprobit `rsp'
                  6.                 estimates store M`i'
                  7.         }
                  8.         suest M1 M2
                  9.
                .         // Location
                .         lincom ///
                >                 (_b[/M1:cut1] + _b[/M1:cut2] + _b[/M1:cut3] + _b[/M1:cut4] + _b[/M1:cut5] + _b[/M1:cut6]) - ///
                >                 (_b[/M2:cut1] + _b[/M2:cut2] + _b[/M2:cut3] + _b[/M2:cut4] + _b[/M2:cut5] + _b[/M2:cut6])
                 10.         tempname p_loc
                 11.         scalar define `p_loc' = r(p)
                 12.
                .         // Scale
                .         test ///
                >                 ([/M1]cut2 - [/M1]cut1 = [/M2]cut2 - [/M2]cut1) ///
                >                 ([/M1]cut3 - [/M1]cut1 = [/M2]cut3 - [/M2]cut1) ///
                >                 ([/M1]cut4 - [/M1]cut1 = [/M2]cut4 - [/M2]cut1) ///
                >                 ([/M1]cut5 - [/M1]cut1 = [/M2]cut5 - [/M2]cut1) ///
                >                 ([/M1]cut6 - [/M1]cut1 = [/M2]cut6 - [/M2]cut1)
                 13.         return scalar p_sca = r(p)
                 14.         return scalar p_loc = `p_loc'
                 15. end

                .
                . *
                . * Examining the operating characteristics of the test method above
                . *
                .
                . /* Utility program for creating probit cutpoints of seven ordered categories
                >    for use by -grologit- */
                . local line_size `c(linesize)'

                . set linesize 80

                . mata:
                ------------------------------------------------- mata (type end to exit) ------
                : mata set matastrict on

                :
                : class CutSet {
                >         private:
                >                 real rowvector Cuts
                >         public:
                >                 final void setCuts()
                >                 final void getCuts()
                > }

                : void function CutSet::setCuts(real rowvector CutValues) {
                >
                >         real scalar cut_count
                >         cut_count = length(CutValues)
                >
                >         if (cut_count== 1) Cuts = invnormal((1..CutValues-1) :/ CutValues)
                >         else Cuts = invnormal(CutValues :/ (cut_count + 1))
                > }

                : void function CutSet::getCuts()
                >         st_local("cuts", invtokens(strofreal(Cuts, "%18.0g")))

                :
                : end
                --------------------------------------------------------------------------------

                . set linesize `line_size'

                .
                . mata: CutBank = CutSet(2)

                .
                . // Evenly spaced cutpoints
                . mata: CutBank[1].setCuts(7)

                .
                . // Arbitrarily spaced cutpoints
                . mata: CutBank[2].setCuts((0.5, 0.75, 1.5, 2.5, 4.5, 6))

                .
                . /* Simulation program for evaluation of operating characteristics */
                . program define simEm
                  1.         version 17.0
                  2.         syntax , [Cuts(string) Delta(real 0) Sd(real 1) n(integer 300)]
                  3.
                .         estimates drop _all
                  4.         drop _all
                  5.         drawnorm l1 l2, double mean(0 `delta') sd(1 `sd') corr(1 0.5 \ 0.5 1) n(`n')
                  6.
                .         if "`cuts'" == "" mata: CutBank[1].getCuts()
                  7.         else mata: CutBank[2].getCuts()
                  8.
                .         forvalues i = 1/2 {
                  9.                 grologit l`i', generate(m`i') cuts(`cuts') probit
                 10.         }
                 11.
                .         testEm m?
                 12. end

                .
                . /* One-off utility program for test report */
                . program define reportEm
                  1.         version 17.0
                  2.         syntax
                  3.
                .         foreach var of varlist p_* {
                  4.                 assert !missing(`var')
                  5.                 local which = substr("`var'", -3, 3)
                  6.                 generate byte pos_`which' = `var' < 0.05
                  7.         }
                  8.         format p* %05.3f
                  9.         summarize pos*, format
                 10.
                . end

                .
                . /* Selected use cases to assess ability to discriminate differences in
                >    location from those of scale */
                .
                . // Same mean (LOCation) and variance (SCAle)
                . quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): simEm

                . reportEm

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                     pos_loc |      1,000       0.053       0.224      0.000      1.000
                     pos_sca |      1,000       0.057       0.232      0.000      1.000

                .
                . // Ditto, but arbitrary cutpoints (N increased in order to avoid sparsity)
                . quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
                >         simEm, c("Arbitrary") n(1000)

                . reportEm

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                     pos_loc |      1,000       0.055       0.228      0.000      1.000
                     pos_sca |      1,000       0.046       0.210      0.000      1.000

                .
                . // Different mean; same variance
                . quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
                >         simEm , d(0.5)

                . reportEm

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                     pos_loc |      1,000       0.997       0.055      0.000      1.000
                     pos_sca |      1,000       0.055       0.228      0.000      1.000

                .
                . // Same mean; different variance
                . quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
                >         simEm , s(1.8)

                . reportEm

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                     pos_loc |      1,000       0.048       0.214      0.000      1.000
                     pos_sca |      1,000       0.867       0.340      0.000      1.000

                .
                . // Different mean and variance
                . quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
                >         simEm , d(0.5) s(1.8)

                . reportEm

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                     pos_loc |      1,000       0.900       0.300      0.000      1.000
                     pos_sca |      1,000       0.865       0.342      0.000      1.000

                .
                . exit

                end of do-file
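
                To apply the program to real data rather than to simulated data, a call along these lines should work once testEm has been defined as in the log. The variable names rating and agree are hypothetical. Because the program refers to cut1 through cut6 by name, each variable needs all seven response categories to be observed in the sample.

                ```stata
                * rating and agree are hypothetical names for the two seven-category items
                testEm rating agree
                * r(p_loc) holds the location p-value, r(p_sca) the scale p-value
                return list
                ```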

