  • Methods for bunch of zeros

    Dear forum members,

    I am using individual-level data from one of India's large-scale household surveys, with information on 9 digital skills for 877344 individuals aged 15 years and above. Each skill question is answered yes (1) or no (0).

    I intend to decompose inequality in digital skills by social group (caste in India: ST, SC, OBC, Other).

    In the full sample, I have 70% zeros.

    I have three questions about methodology:

    1. Should I use linear regression and decomposition on the total of all nine skills?
    Code:
    egen digitalskill=rowtotal(b5q8 b5q9 b5q10 b5q11 b5q12 b5q13 b5q14 b5q15 b5q16)
    la var digitalskill "total score of skills"
    or
    2. Should I create a binary dependent variable and apply a non-linear decomposition?
    Code:
    recode digitalskill (0=0 "no") (1/9=1 "yes"), gen(digtskill)
    or
    3. Is there a method that also takes the level of skill into account?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(b5q8 b5q9 b5q10 b5q11 b5q12 b5q13 b5q14 b5q15 b5q16 ST SC OBC Others)
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 1 1 1 1 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 1 0 0
    1 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 1 0 0 0 0 0 1 0
    1 1 1 0 0 1 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    1 1 1 0 1 1 0 1 0 0 1 0 0
    1 1 1 0 1 1 0 1 0 0 1 0 0
    1 1 1 0 1 1 0 0 0 0 1 0 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 1 1 0 1 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 1 0 0 0 0 0 1 0
    1 1 0 0 0 1 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 1 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 1 0 0 0 0 0 0 1 0
    1 1 0 0 0 0 0 0 0 0 0 1 0
    1 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 1 0 0 0 1 0
    1 1 0 0 0 1 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    1 1 0 0 0 1 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 1 1 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 0 1 0
    1 1 1 0 0 1 0 0 0 0 0 1 0
    1 1 1 0 0 0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 1 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 1 1 1 1 1 1 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 1 1 1 1 1 1 1 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    1 1 0 0 0 1 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 0 0
    end
    label values b5q8 b5q8
    label def b5q8 0 "no", modify
    label def b5q8 1 "yes", modify
    label values b5q9 b5q9
    label def b5q9 0 "no", modify
    label def b5q9 1 "yes", modify
    label values b5q10 b5q10
    label def b5q10 0 "no", modify
    label def b5q10 1 "yes", modify
    label values b5q11 b5q11
    label def b5q11 0 "no", modify
    label def b5q11 1 "yes", modify
    label values b5q12 b5q12
    label def b5q12 0 "no", modify
    label def b5q12 1 "yes", modify
    label values b5q13 b5q13
    label def b5q13 0 "no", modify
    label def b5q13 1 "yes", modify
    label values b5q14 b5q14
    label def b5q14 0 "no", modify
    label def b5q14 1 "yes", modify
    label values b5q15 b5q15
    label def b5q15 0 "no", modify
    label def b5q15 1 "yes", modify
    label values b5q16 b5q16
    label def b5q16 0 "no", modify
    label def b5q16 1 "yes", modify
    Last edited by Mukesh Punia; 13 Jan 2024, 13:56.
    Best regards,
    Mukesh

    (Stata 15.1 SE)

  • #2
    Mukesh:
    I would support your solution #1.
    However, I do not know the specification of the right-hand side of your -regress- equation.
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Not trying to answer the question -- it's the researcher's prerogative to identify the problem -- but noting that you have 2^9 = 512 composite possibilities and in such a large sample, they may well all occur! But you can still explore their frequencies.

      Code:
      . egen all = concat(b5q*)
      
      . 
      . tab all, sort 
      
              all |      Freq.     Percent        Cum.
      ------------+-----------------------------------
        000000000 |         43       43.00       43.00
        110000000 |         14       14.00       57.00
        110001000 |         13       13.00       70.00
        111000000 |          8        8.00       78.00
        100000000 |          7        7.00       85.00
        111001000 |          3        3.00       88.00
        111011010 |          3        3.00       91.00
        111010000 |          2        2.00       93.00
        111011000 |          2        2.00       95.00
        110001010 |          1        1.00       96.00
        111000010 |          1        1.00       97.00
        111011110 |          1        1.00       98.00
        111111110 |          1        1.00       99.00
        111111111 |          1        1.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
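
      Building on the pattern frequencies above, the share of zeros and the spread of the total score can be inspected in the same way. This is a minimal sketch, assuming the -dataex- example from #1 is in memory and the nine items are coded 0/1; the names nskills and anyskill are hypothetical and chosen only for illustration.

      ```stata
      * Assumes b5q8 ... b5q16 from the dataex example in #1 are in memory (coded 0/1)
      * nskills and anyskill are hypothetical names, not variables from the survey
      egen nskills = rowtotal(b5q8 b5q9 b5q10 b5q11 b5q12 b5q13 b5q14 b5q15 b5q16)
      generate byte anyskill = nskills > 0

      * Distribution of the count of skills (0 to 9)
      tabulate nskills

      * Share with no skill at all versus at least one
      tabulate anyskill
      ```

      In the 100-observation extract, the all-zero pattern accounts for 43 observations, so anyskill should be 0 for 43% of that extract. The full sample is reported to have 70% zeros.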


      • #4
        Why not estimate a separate regression for each skill? There are lots of observations.

        This might be a job for Zellner's "seemingly unrelated regression" (SUR), where the error terms are correlated across equations but the RHS variables are not endogenous.

        See: https://www.stata.com/manuals/rsureg.pdf

        Linear regression with a binary dependent variable isn't ideal - but a generalization of SUR to Probit is beyond the scope of this reply - or my knowledge.
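
        As an illustration of the per-skill idea, here is a minimal sketch of -sureg- on the -dataex- example from #1. It assumes that extract is in memory. Only three of the nine skills are used to keep it short, and SC is the only regressor because ST and Others are all zero in the extract, which leaves OBC as the reference group. With the real data, the full right-hand side would replace SC in each equation.

        ```stata
        * Assumes the dataex example from #1 is in memory
        * One linear equation per skill; SC = 1 for SC, 0 for OBC in this extract
        * corr reports the cross-equation residual correlations and a Breusch-Pagan test
        sureg (b5q8 SC) (b5q9 SC) (b5q13 SC), corr

        * Cross-equation test: is the SC gap the same for the first two skills?
        test [b5q8]SC = [b5q9]SC
        ```

        When every equation has the same right-hand side, the SUR point estimates coincide with equation-by-equation OLS, so the practical gain is the joint covariance matrix that makes cross-equation tests like the one above possible.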


        • #5
          Thank you, Dr. Lazzaro.

          Originally posted by Carlo Lazzaro View Post
          Mukesh:
          I would support your solution #1.
          However, I do not know the specification of the right-hand side of your -regress- equation.

          Won't the zeros be an issue in OLS?

          The RHS of the specification captures socioeconomic status. Except for age, age^2, and usual monthly per capita consumption expenditure (MPCE), all regressors are dummy or categorical variables. The command is given below.

          Code:
          regress digitalskill age i.sex i.edu age_hh i.sex_hh i.edu_hh i.residence i.size_hh mpce i.caste i.religon i.nssregn [iw=wt] if age <= 58
          Last edited by Mukesh Punia; 14 Jan 2024, 11:00.
          Best regards,
          Mukesh

          (Stata 15.1 SE)


          • #6
            Thank you, Dr. Cox

            My aim is to decompose inequality in digital skills between social groups (castes) in India.

            Following your suggestion, I am looking for a method that accounts for inequality first at the 0/1 margin (any skill versus none) and then in the degree of skill.
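
            One way to read this is as a two-part setup: a binary model for having any skill, then a separate model for the number of skills among those who have at least one. The sketch below is only an illustration under stated assumptions, not a recommendation from the thread. It assumes the -dataex- example from #1 is in memory and that digitalskill (the 0 to 9 total) and digtskill (the 0/1 indicator) were created as in #1. SC is the only regressor because it is the only caste contrast that varies in the extract.

            ```stata
            * Assumes digitalskill (0-9 total) and digtskill (0/1) were created as in #1
            * Part 1: any skill versus none
            logit digtskill i.SC

            * Part 2: number of skills among those with at least one
            * A linear model is an approximation here, since the count is bounded at 9
            regress digitalskill i.SC if digitalskill > 0
            ```

            Each part could then be decomposed on its own, the first with a non-linear decomposition and the second with a linear one. A truncated count model such as -tpoisson- would be one alternative for the second part.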


            Best regards,
            Mukesh

            (Stata 15.1 SE)


            • #7
              Mukesh:
              just one note about the RHS of your regression equation: it is recommended to rely on the wonderful capabilities of -fvvarlist- notation (and its relationship with -margins- and -marginsplot-) when it comes to categorical variables, interactions, and squared terms such as age^2:
              Code:
              c.age#c.age
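
              A minimal sketch of how this notation works together with -margins-, using simulated toy data so that it runs on its own. The variables score, age, and sex are made up for the illustration and are not the survey variables. Note that c.age#c.age on its own adds only the squared term, while c.age##c.age expands to both age and age squared.

              ```stata
              * Toy data, simulated only to make the sketch runnable; not the survey data
              clear
              set seed 12345
              set obs 1000
              generate age = 15 + floor(44*runiform())
              generate byte sex = runiform() < 0.5
              generate score = rpoisson(exp(0.5 + 0.03*age - 0.001*age^2 + 0.2*sex))

              * c.age##c.age enters age and age squared; i.sex enters sex as a factor
              regress score c.age##c.age i.sex

              * Predicted score at selected ages; both age terms move together
              margins, at(age=(15(10)55))
              marginsplot
              ```

              Because Stata knows that the squared term is built from age, -margins- changes age and age squared together, which would not happen with a hand-made age-squared variable.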
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Respected Jeff Wooldridge, kindly share your thoughts on this.

                Best regards,
                Mukesh

                (Stata 15.1 SE)
