Testing difference between variances of two variables from the same population

Jeff Edwards

Join Date: Aug 2016

Posts: 12
#1

06 Jan 2022, 14:20

Greetings,
For a methodological framework I am developing, I need to test the difference between variances of two variables from the same population. The closest procedure I can find in Stata is sdtest (https://www.stata.com/manuals/rsdtest.pdf), which is described as a variance comparison test but actually tests the difference between standard deviations rather than variances. Without boring you with the details about why I need to test variances rather than standard deviations, can anyone point me to a procedure in Stata that will do the job?
Thanks,
Jeff Edwards
Mike Lacy

Join Date: Apr 2014

Posts: 2425
#2

06 Jan 2022, 16:25

Per the Methods and Formulas documentation associated with -sdtest-, the stock two-sample test there *is* a test on variances. The more robust Levene's test is (to my recollection) described in the literature as a variance test. However, my impression is that the substantial literature on variance tests raises questions about their performance under less than "ideal conditions," so I would want to consider using -permute- to do a randomization test, rather than rely on formula-based results. For example:

Code:

sysuse auto, clear
permute foreign vardiff = (r(sd_1)^2 - r(sd_2)^2), reps(1000): ttest weight, by(foreign)
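
For comparison with the randomization result, the formula-based tests mentioned above can be run on the same two-group setup. -sdtest- gives the classical F test, and -robvar- reports Levene's statistic (W0) together with the Brown-Forsythe median-centered (W50) and trimmed-mean (W10) variants. This is only a minimal sketch; the choice of weight and foreign simply mirrors the example above.

```stata
sysuse auto, clear

* Classical F test of equal variances of weight in the two groups
sdtest weight, by(foreign)

* Levene's W0 plus the Brown-Forsythe variants W50 (median) and W10 (trimmed mean)
robvar weight, by(foreign)
display "Levene W0 = " %6.3f r(w0) ", p = " %6.4f r(p_w0)
```

Like the permutation example, these compare one variable across two groups; they do not address two variables measured on the same units.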

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17748

07 Jan 2022, 01:50

Jeff:
going parametric and trusting the fulfilment of the related assumptions:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
. oneway price foreign, bonferroni

                        Analysis of variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      1507382.66      1   1507382.66      0.17     0.6802
 Within groups       633558013     72   8799416.85
------------------------------------------------------------------------
    Total            635065396     73   8699525.97

Bartlett's equal-variances test: chi2(1) =   0.7719    Prob>chi2 = 0.380

                      Comparison of Price by Car origin
                                (Bonferroni)
Row Mean-|
Col Mean |   Domestic
---------+-----------
 Foreign |    312.259
         |      0.680

Kind regards,
Carlo
(Stata 19.0)


Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#4

07 Jan 2022, 05:15

Originally posted by Jeff Edwards

. . . variances of two variables from the same population. [emphasis added]

The responses so far seem to be more for comparing variance of the same variable between two populations (treatment conditions etc.).

I'm not sure that it usually makes sense to compare variances of two qualitatively different outcome variables under most circumstances—apples-to-oranges, and all.

Is there some reason to believe that their scales ought to be the same?
Mike Lacy

Join Date: Apr 2014

Posts: 2425
#5

07 Jan 2022, 09:06

Joseph Coveney is absolutely right; I read the original question carelessly. I suspect there are contexts in which comparisons of variances are meaningful, though, one possible field being within paleontology, where there is a longstanding interest in tests to compare relative variation. (I don't recall, though, whether people there have done tests on "repeated measures" as described here.) The Methods and Formulas section documenting -sdtest- in fact *does* show a test for a difference in variance between two variables. (It does not say anything about the distinction between a test for the same variable across two groups vs. a test for two variables within one group, so that leaves me wondering a little.)

I can't offhand think of how a permutation test would be done here, since I can't conceptualize what one would permute. I'd be interested in what someone else thinks.
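
On the formula-based side, one classical option for two variables measured on the same units is the Pitman-Morgan test. Because Cov(X + Y, X - Y) = V(X) - V(Y), the two variances are equal exactly when the sum and the difference are uncorrelated, so a test of that correlation is a test of equal variances that allows X and Y to be correlated. The sketch below uses simulated data; the variable names, seed, and parameter values are assumptions for illustration, and the t reference distribution relies on approximate bivariate normality.

```stata
clear
set seed 20220107

* Simulated paired data: true SDs 1 and 1.5, correlation 0.5 (illustrative values)
matrix C = (1, 0.5 \ 0.5, 1)
matrix S = (1, 1.5)
drawnorm x y, n(300) sds(S) corr(C) double

* Sum and difference of the two measurements
generate double s = x + y
generate double d = x - y

* Pitman-Morgan: H0 V(x) = V(y) is equivalent to corr(s, d) = 0
quietly correlate s d
scalar r_sd = r(rho)
scalar t_pm = r_sd * sqrt((r(N) - 2) / (1 - r_sd^2))
scalar p_pm = 2 * ttail(r(N) - 2, abs(t_pm))
display "r(sum, diff) = " %6.3f r_sd "   t = " %6.2f t_pm "   p = " %6.4f p_pm
```

A negative correlation here indicates that the second variable has the larger variance.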
Jeff Edwards

Join Date: Aug 2016

Posts: 12
#6

07 Jan 2022, 10:31

Thanks for your responses, and yes, I am interested in comparing the variances of two variables within a single sample. This comparison is useful in the context I am considering, which can be characterized as using different metrics to assess the same attitude. For instance, assume an attitude is measured using a rating scale ranging from 1-7 and an agree-disagree scale ranging from -3 to +3. The question being asked is the same for both metrics, and therefore scores on one scale should represent a simple transformation of scores on the other scale. If R is the 1-7 rating scale, and A is the -3 to +3 agreement scale, then A = R - 4. Furthermore, E(A) = E(R - 4) = E(R) - 4, and V(A) = V(R - 4) = V(R).

Comparing the means and variances as expressed here can shed light on whether respondents are consistent when confronted with rating vs. agreement scales. As might be expected, I have found that the psychology of responding is more erratic than the formula A = R - 4, even when items are presented in adjacent sections of the same questionnaire. I have also cast this problem in terms of latent variable modeling, but comparing means and variances is a simple first step that most readers in my field (applied psychology) should understand.

And after having inspected the Stata documentation more closely, I see that the F-test reported by sdtest is, in fact, a ratio of variances (i.e., the squares of the standard deviations), even though the F is labeled ratio = sd(x) / sd(y) in the output, where x and y are the two variables whose variances are being compared (I confirmed this by computing the ratio of the variances manually, which matched the F statistic reported by Stata). I am not sure why this comparison is cast as a ratio rather than a difference, but I suppose the rationale can be found in the literature (e.g., Armitage et al., 2002). Again, thanks for your help!
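
Both points above can be checked directly: a pure shift leaves the variance unchanged, and the F that -sdtest- reports equals the squared ratio of the standard deviations. The sketch below uses simulated responses; the variable names, seed, and sample size are assumptions for illustration. Note that the F reference distribution assumes the two sets of scores are independent, which will generally not hold for two items answered by the same respondents, so the p-value is shown only to illustrate the arithmetic.

```stata
clear
set seed 20220107
set obs 300

* Hypothetical responses: R on a 1-7 rating scale, A an independent draw on -3..+3
generate byte R = runiformint(1, 7)
generate byte A = runiformint(-3, 3)

* A pure shift leaves the variance unchanged: V(R - 4) = V(R)
generate byte R_shift = R - 4
quietly summarize R
scalar v_R = r(Var)
quietly summarize R_shift
assert reldif(r(Var), v_R) < 1e-10

* The reported F is the ratio of variances, i.e. the squared ratio of the SDs
sdtest R == A
scalar F_rep = r(F)
scalar F_man = (r(sd_1) / r(sd_2))^2
display "F reported = " %8.4f F_rep "   variance ratio = " %8.4f F_man
```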
Mike Lacy

Join Date: Apr 2014

Posts: 2425
#7

07 Jan 2022, 10:55

Because you are interested in comparing what appear to be ordinal response variables, I'd say you should take a look at the literature on variation with ordinal variables. I'm a contributor to that literature, though not the latest word. I'd immodestly say you could benefit by checking out my program at -ssc describe ordvar-, and check out the more recent literature citing what I mention there. -ssc describe ineqord-, from Stephen Jenkins, is much newer and also would have some useful ideas. Finally, you might look at -ssc describe oglm-, which is an ordinal regression model that allows heteroscedasticity, hence modeling of ordinal variation. These may not specifically touch on the within-subjects aspect of your problem, but you might get some useful ideas. On the less optimistic side, comparing variability while controlling for differences in location (mean/median) is a well known problem, and my best guess is that it's even more difficult with an ordinal response.
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#8

08 Jan 2022, 21:25

Originally posted by Jeff Edwards

. . . I am interested in comparing the variances of two variables within a single sample. . . . using a rating scale ranging from 1-7 and an agree-disagree scale ranging from -3 to +3.

You can try an approach like that illustrated below. The approach is implemented in a Stata program, called -testEm-, which takes the two rating-scale variables as a varlist. (It’s shown in the output below as the first program. The rest of the output shows results of some method-validation work.)

The method displays good ability to discriminate scale (variance) differences from location (mean) differences in the exploration of its operating characteristics under selected pertinent use cases shown below, and so it shows promise for your problem.

I’ve used -suest- in the implementation below, but if you’re going to stick with the major link functions, then you could easily use -gsem-, instead. -gsem- will allow the use of likelihood-ratio testing, which might give more consistent test size. (The test size with -suest- is in the neighborhood of 4½ to 5½%.)

I’m not sure how sensitive the approach is to violations of the usual assumptions regarding presumed data-generating process (degree of model misspecification) etc. I’m also not familiar with the literature that Mike cites, and if this method has been explored before more thoroughly, then you might find more limitations to its applicability there.

.
. version 17.0

.
. clear *

.
. seedem
set seed 1737616889

.
. // This is the actual estimation command:
. program define testEm, rclass
  1.         version 17.0
  2.         syntax varlist(numeric min=2 max=2)
  3.
.         forvalues i = 1/2 {
  4.                 local rsp : word `i' of `varlist'
  5.                 oprobit `rsp'
  6.                 estimates store M`i'
  7.         }
  8.         suest M1 M2
  9.
.         // Location
.         lincom ///
>                 (_b[/M1:cut1] + _b[/M1:cut2] + _b[/M1:cut3] + _b[/M1:cut4] + _b[/M1:cut5] + _b[/M1:cut6]) - ///
>                 (_b[/M2:cut1] + _b[/M2:cut2] + _b[/M2:cut3] + _b[/M2:cut4] + _b[/M2:cut5] + _b[/M2:cut6])
 10.         tempname p_loc
 11.         scalar define `p_loc' = r(p)
 12.
.         // Scale
.         test ///
>                 ([/M1]cut2 - [/M1]cut1 = [/M2]cut2 - [/M2]cut1) ///
>                 ([/M1]cut3 - [/M1]cut1 = [/M2]cut3 - [/M2]cut1) ///
>                 ([/M1]cut4 - [/M1]cut1 = [/M2]cut4 - [/M2]cut1) ///
>                 ([/M1]cut5 - [/M1]cut1 = [/M2]cut5 - [/M2]cut1) ///
>                 ([/M1]cut6 - [/M1]cut1 = [/M2]cut6 - [/M2]cut1)
 13.         return scalar p_sca = r(p)
 14.         return scalar p_loc = `p_loc'
 15. end

.
. *
. * Examining the operating characteristics of the test method above
. *
.
. /* Utility program for creating probit cutpoints of seven ordered categories
>    for use by -grologit- */
. local line_size `c(linesize)'

. set linesize 80

. mata:
------------------------------------------------- mata (type end to exit) ------
: mata set matastrict on

:
: class CutSet {
>         private:
>                 real rowvector Cuts
>         public:
>                 final void setCuts()
>                 final void getCuts()
> }

: void function CutSet::setCuts(real rowvector CutValues) {
>
>         real scalar cut_count
>         cut_count = length(CutValues)
>
>         if (cut_count== 1) Cuts = invnormal((1..CutValues-1) :/ CutValues)
>         else Cuts = invnormal(CutValues :/ (cut_count + 1))
> }

: void function CutSet::getCuts()
>         st_local("cuts", invtokens(strofreal(Cuts, "%18.0g")))

:
: end
--------------------------------------------------------------------------------

. set linesize `line_size'

.
. mata: CutBank = CutSet(2)

.
. // Evenly spaced cutpoints
. mata: CutBank[1].setCuts(7)

.
. // Arbitrarily spaced cutpoints
. mata: CutBank[2].setCuts((0.5, 0.75, 1.5, 2.5, 4.5, 6))

.
. /* Simulation program for evaluation of operating characteristics */
. program define simEm
  1.         version 17.0
  2.         syntax , [Cuts(string) Delta(real 0) Sd(real 1) n(integer 300)]
  3.
.         estimates drop _all
  4.         drop _all
  5.         drawnorm l1 l2, double mean(0 `delta') sd(1 `sd') corr(1 0.5 \ 0.5 1) n(`n')
  6.
.         if "`cuts'" == "" mata: CutBank[1].getCuts()
  7.         else mata: CutBank[2].getCuts()
  8.
.         forvalues i = 1/2 {
  9.                 grologit l`i', generate(m`i') cuts(`cuts') probit
 10.         }
 11.
.         testEm m?
 12. end

.
. /* One-off utility program for test report */
. program define reportEm
  1.         version 17.0
  2.         syntax
  3.
.         foreach var of varlist p_* {
  4.                 assert !missing(`var')
  5.                 local which = substr("`var'", -3, 3)
  6.                 generate byte pos_`which' = `var' < 0.05
  7.         }
  8.         format p* %05.3f
  9.         summarize pos*, format
 10.
. end

.
. /* Selected use cases to assess ability to discriminate differences in
>    location from those of scale */
.
. // Same mean (LOCation) and variance (SCAle)
. quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): simEm

. reportEm

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     pos_loc |      1,000       0.053       0.224      0.000      1.000
     pos_sca |      1,000       0.057       0.232      0.000      1.000

.
. // Ditto, but arbitrary cutpoints (N increased in order to avoid sparsity)
. quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
>         simEm, c("Arbitrary") n(1000)

. reportEm

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     pos_loc |      1,000       0.055       0.228      0.000      1.000
     pos_sca |      1,000       0.046       0.210      0.000      1.000

.
. // Different mean; same variance
. quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
>         simEm , d(0.5)

. reportEm

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     pos_loc |      1,000       0.997       0.055      0.000      1.000
     pos_sca |      1,000       0.055       0.228      0.000      1.000

.
. // Same mean; different variance
. quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
>         simEm , s(1.8)

. reportEm

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     pos_loc |      1,000       0.048       0.214      0.000      1.000
     pos_sca |      1,000       0.867       0.340      0.000      1.000

.
. // Different mean and variance
. quietly simulate p_loc = r(p_loc) p_sca = r(p_sca), reps(1000): ///
>         simEm , d(0.5) s(1.8)

. reportEm

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     pos_loc |      1,000       0.900       0.300      0.000      1.000
     pos_sca |      1,000       0.865       0.342      0.000      1.000

.
. exit

end of do-file

.
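
Because the listing above is a log, with prompts and line numbers, it cannot be pasted back into Stata as is. The sketch below restates -testEm- as do-file code and runs it once on simulated data. To stay self-contained it replaces the attached -grologit- with a direct categorization through irecode(); that substitution, the seed, the latent SD of 1.8 for the second item, and the variable names are assumptions for illustration and are not part of the validation runs shown above.

```stata
clear *
set seed 20220108

program define testEm, rclass
    version 17.0
    syntax varlist(numeric min=2 max=2)

    // Intercept-only ordered probit for each item, then combine the two fits
    forvalues i = 1/2 {
        local rsp : word `i' of `varlist'
        oprobit `rsp'
        estimates store M`i'
    }
    suest M1 M2

    // Location: difference between the sums of the six cutpoints
    lincom ///
        (_b[/M1:cut1] + _b[/M1:cut2] + _b[/M1:cut3] + _b[/M1:cut4] + _b[/M1:cut5] + _b[/M1:cut6]) - ///
        (_b[/M2:cut1] + _b[/M2:cut2] + _b[/M2:cut3] + _b[/M2:cut4] + _b[/M2:cut5] + _b[/M2:cut6])
    tempname p_loc
    scalar define `p_loc' = r(p)

    // Scale: equal cutpoint spacing (relative to the first cutpoint) in both items
    test ///
        ([/M1]cut2 - [/M1]cut1 = [/M2]cut2 - [/M2]cut1) ///
        ([/M1]cut3 - [/M1]cut1 = [/M2]cut3 - [/M2]cut1) ///
        ([/M1]cut4 - [/M1]cut1 = [/M2]cut4 - [/M2]cut1) ///
        ([/M1]cut5 - [/M1]cut1 = [/M2]cut5 - [/M2]cut1) ///
        ([/M1]cut6 - [/M1]cut1 = [/M2]cut6 - [/M2]cut1)
    return scalar p_sca = r(p)
    return scalar p_loc = `p_loc'
end

// Two correlated latent scores; the second has the larger SD (a scale difference)
matrix C = (1, 0.5 \ 0.5, 1)
matrix S = (1, 1.8)
drawnorm l1 l2, n(300) sds(S) corr(C) double

// Seven ordered categories from evenly spaced probit cutpoints (stand-in for -grologit-)
forvalues i = 1/2 {
    generate byte m`i' = 1 + irecode(l`i', invnormal(1/7), invnormal(2/7), ///
        invnormal(3/7), invnormal(4/7), invnormal(5/7), invnormal(6/7))
}

testEm m1 m2
display "p (location) = " %5.3f r(p_loc) "   p (scale) = " %5.3f r(p_sca)
```

As written, the program assumes that both variables use all seven categories, since it refers to six cutpoints per item; scales with a different number of categories would need the lincom and test expressions adjusted.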
Attached Files

grologit.ado (1.9 KB, 8 views)