  • Index of qualitative variation

    Hello,

    I want to compute an index of qualitative variation for nominal variables in Stata. Is there a specific command?

    Thank you in advance

  • #2
    I think this is one of about 20 or so names in various literatures for a sum of squared proportions. (If you want the complement or reciprocal of that, that's one easy step further.)

    See e.g. http://www.statalist.org/forums/foru...index-in-stata

    If that doesn't solve the problem, please give us an exact definition (equation, not words) and some sample data.
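
    For reference, the three versions mentioned above can be written out as follows. The notation is an assumption for illustration only: p_i is the proportion of observations in category i, and k is the number of categories.

    \[
    H = \sum_{i=1}^{k} p_i^2, \qquad 1 - H = 1 - \sum_{i=1}^{k} p_i^2, \qquad \frac{1}{H} = \frac{1}{\sum_{i=1}^{k} p_i^2}
    \]

    With k equally common categories, H = 1/k, so the complement is 1 - 1/k and the reciprocal is k.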

    • #3
      Never heard of it. Could you supply some more information (e.g., a citation)? There are a lot of user-written commands for different measures of variability.

      • #4
        Hi, my main goal is to measure "variability" for ordinal variables. An example is below.

        I wanted to apply this formula, but it does not work:

        gen b1 = 1-((p1^2)+(p2^2)+(p3^2)+(p4^2)+(p5^2)+(p6^2))


        tab education

          *parents' highest education level* |      Freq.     Percent        Cum.
        -------------------------------------+-----------------------------------
                        university or higher |      4,192       31.99       31.99
                              post-secondary |      3,020       23.05       55.04
                             upper secondary |      3,106       23.70       78.74
                             lower secondary |      2,508       19.14       97.88
                some primary,lower secondary |        237        1.81       99.69
                              not applicable |         41        0.31      100.00
        -------------------------------------+-----------------------------------
                                       Total |     13,104      100.00

        • #5
          So, this is already answered in the thread linked within #2 with the information that your index is 1 minus what is there called the Herfindahl-Hirschman index.

          Here's how to do it in Stata with Mata:

          Code:
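          sysuse auto, clear

          * tabulate rep78 and store the category frequencies in matrix freq
          tab rep78, matcell(freq)

          * in Mata: copy the matrix, then show 1 minus the sum of squared proportions
          mata
          freq = st_matrix("freq")
          1 - sum((freq :/ sum(freq)):^2)
          end

          In this case Mata shows .7032136106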

          In your case, it's a matter of naming education as the variable.
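
          As a rough check that needs no dataset, the frequencies tabulated in #4 can be typed straight into Mata. This is only a sketch: it assumes that all six categories, including "not applicable", should count, and the vector name freq is just carried over from the code above.

          Code:
          mata
          freq = (4192 \ 3020 \ 3106 \ 2508 \ 237 \ 41)
          1 - sum((freq :/ sum(freq)):^2)
          end

          With those frequencies the result should be roughly 0.75. Note that tab drops missing values by default, so a category like "not applicable" is counted unless it is recoded to missing first.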

          • #6
            I'd like to get bootstrapped standard errors for this statistic. Is there a way of plugging the mata program above into a bootstrap command? Or if not, is there another way of doing it? (I have an experiment where the outcome variable is nominal and I want to compare treatment and control groups on this statistic).

            Thanks in advance.
            Nick

            • #7
              Originally posted by emanuele fedeli

              gen b1 = 1-((p1^2)+(p2^2)+(p3^2)+(p4^2)+(p5^2)+(p6^2))
              I have seen this referred to as a diversity index. I believe that the index of qualitative variation is a standardized version of the diversity index that ranges from 0 to 1 so you can compare variability across measures with different numbers of outcome categories. That can be calculated as:
              Code:
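              gen b1 = (k/(k-1))*(1-((p1^2)+(p2^2)+(p3^2)+(p4^2)+(p5^2)+(p6^2)))
              Where k is the number of outcome categories (6 in the example above; in Stata, replace k with that number or define it as a scalar first). The parentheses matter: k/(k-1) must multiply the whole of 1 minus the sum of squared proportions.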

              Best,
              Alan
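
              A minimal sketch of the standardized version, built on the Mata code from #5 rather than on variables p1 to p6. It assumes that k should be the number of categories actually observed, which is what the number of rows of the matcell() matrix gives; categories that are possible but never occur are not counted.

              Code:
              sysuse auto, clear
              tab rep78, matcell(freq)

              mata
              freq = st_matrix("freq")
              k = rows(freq)
              (k / (k - 1)) * (1 - sum((freq :/ sum(freq)):^2))
              end

              For rep78 there are 5 observed categories, so this is 5/4 times the .7032136106 reported in #5, or about .879.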

              • #8
                The IQV, as many sociologists term it, is available in -divcat- (available from SSC) and is described as the "normalized generalized variance." However, since you have an ordinal variable, I suggest you consider the measures of ordinal dispersion implemented in my module -ordvar-, also available at SSC.

                To amplify Nick's comment, I might venture that no other statistic has been (re)invented as many times as the IQV.
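
                Both packages can be installed from SSC in the usual way. The exact syntax of each command is best taken from its help file rather than guessed at here.

                Code:
                ssc install divcat
                ssc install ordvar
                help divcat
                help ordvar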

                • #9
                  To answer #6 directly, the answer is surely yes. Here's a minimal code example.

                  Code:
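                  *! 1.0.0 NJC 28 Sept 2016
                  program iqv, rclass
                      syntax varname [if] [in]
                      * mark the sample to use; strok allows a string variable
                      marksample touse, strok
                      tempname freq iqv
                      * category frequencies go into a temporary matrix
                      qui tab `varlist' if `touse', matcell(`freq')
                      mata: freq = st_matrix("`freq'")
                      * 1 minus the sum of squared proportions
                      mata: st_numscalar("`iqv'", 1  - sum((freq :/ sum(freq)):^2))
                      di _n  "IQV: " %4.3f `iqv'
                      * return the result so that bootstrap can pick up r(iqv)
                      return scalar iqv = `iqv'
                  end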
                  
                  . sysuse auto
                  (1978 Automobile Data)
                  
                  . bootstrap r(iqv) , reps(100) : iqv rep78
                  (running iqv on estimation sample)
                  
                  Warning:  Since iqv is not an estimation command or does not set e(sample),
                            bootstrap has no way to determine which observations are used in
                            calculating the statistics and so assumes that all observations are
                            used.  This means no observations will be excluded from the
                            resampling because of missing values or other reasons.
                  
                            If the assumption is not true, press Break, save the data, and drop
                            the observations that are to be excluded.  Be sure that the dataset
                            in memory contains only the relevant data.
                  
                  Bootstrap replications (100)
                  ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
                  ..................................................    50
                  ..................................................   100
                  
                  Bootstrap results                               Number of obs      =        74
                                                                  Replications       =       100
                  
                        command:  iqv rep78
                          _bs_1:  r(iqv)
                  
                  ------------------------------------------------------------------------------
                               |   Observed   Bootstrap                         Normal-based
                               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _bs_1 |   .7032136   .0327004    21.50   0.000      .639122    .7673053
                  ------------------------------------------------------------------------------
                  
                  . estat bootstrap
                  
                  Bootstrap results                               Number of obs      =        74
                                                                  Replications       =       100
                  
                        command:  iqv rep78
                          _bs_1:  r(iqv)
                  
                  ------------------------------------------------------------------------------
                               |    Observed               Bootstrap
                               |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _bs_1 |   .70321361  -.0074416   .03270042    .6309599   .7529844  (BC)
                  ------------------------------------------------------------------------------
                  (BC)   bias-corrected confidence interval

                  In your case, the approach might need to be extended, e.g. to get a CI for the difference of two measures.
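
                  One possible sketch of that extension, not a tested solution: wrap iqv in a second program that returns the difference between two groups, then bootstrap that difference. It assumes the iqv program above is already defined and that the grouping variable is numeric and coded 0/1. The names iqvdiff, group() and r(diff) are made up for this illustration, and foreign in the auto data merely stands in for a treatment indicator.

                  Code:
                  program iqvdiff, rclass
                      syntax varname [if] [in], GRoup(varname numeric)
                      marksample touse, strok
                      markout `touse' `group'
                      tempname d0 d1
                      * statistic in the group coded 0
                      quietly iqv `varlist' if `touse' & `group' == 0
                      scalar `d0' = r(iqv)
                      * statistic in the group coded 1
                      quietly iqv `varlist' if `touse' & `group' == 1
                      scalar `d1' = r(iqv)
                      return scalar diff = `d1' - `d0'
                  end

                  sysuse auto, clear
                  bootstrap diff = r(diff), reps(1000) strata(foreign) seed(2016): iqvdiff rep78, group(foreign)
                  estat bootstrap, percentile

                  The strata() option resamples within each group, so group sizes stay fixed across replications, which seems the natural choice for an experiment with fixed arms. Whether a percentile or bias-corrected interval is preferable is a judgment call.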
