
  • Note: ### strata omitted because they contain no subpopulation members.

    Hello,

    Happy holidays, folks.

    I am using the National Health Interview Survey (NHIS) to fit a structural equation model. I restricted my analysis to a subpopulation using the subpop() option of the svy prefix. Here is an extract of the command and the results I got:


    Code:
    . svy, subpop(afim): gsem (hivtest <- meandepression acculturate), logit
    (running gsem on estimation sample)
    
    Survey: Generalized structural equation model
    
    Number of strata   =       391              Number of obs     =      1,027,151
    Number of PSUs     =       782              Population size   =  3,240,666,012
                                                Subpop. no. obs   =          1,575
                                                Subpop. size      =      4,662,509
                                                Design df         =            391
    Response           : hivtest                Number of obs     =          1,575
    Family             : Bernoulli
    Link               : logit
    
    --------------------------------------------------------------------------------
                   |             Linearized
                   |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    hivtest <-     |
    meandepression |  -.0141553   .0971542    -0.15   0.884    -.2051653    .1768547
       acculturate |   .0879319   .3502425     0.25   0.802    -.6006622     .776526
             _cons |   .2799228   .3520108     0.80   0.427     -.412148    .9719936
    --------------------------------------------------------------------------------
    Note: 248 strata omitted because they contain no subpopulation members.

    As the results show, I keep getting a caution stating that "248 strata omitted because they contain no subpopulation members".

    How do I resolve this issue of strata with no subpopulation members? How do I track and correct this, if possible?

    thanks again, Yy
    Last edited by Yawo Kokuvi; 26 Dec 2016, 14:23.

  • #2
    Hello Yawo,

    Welcome to the Stata Forum,

    I gather you'd get better advice if you presented the commands as well as the output. I kindly suggest you do so; you may use dataex (available from SSC), or you may just paste both under CODE delimiters.

    That said, I believe you may "track" the problem just by typing - svydes - and adding an "if" qualifier that selects the subpopulation.
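
    As a minimal sketch of that check, assuming the design has already been declared with svyset and that afim is the 0/1 subpopulation indicator from the first post, something along these lines describes the design among subpopulation members only and then lists just the strata left with a single sampling unit:

    ```stata
    * Assumes svyset has already been run and afim is coded 0/1.
    * Describe strata and sampling units among subpopulation members only.
    svydescribe if afim == 1

    * The single option restricts the listing to strata with one sampling unit.
    svydescribe if afim == 1, single
    ```

    Strata that do not appear in the first listing at all are the ones the note refers to.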

    Hopefully that helps!


    Best,

    Marcos



    • #3
      thanks - I edited the original post, as suggested. In the meantime, I will look into the svydes.

      rgds, Yy



      • #4
        Here is what I got when I issued the svydes if afim command.

        Code:
        
                                              #Obs per Unit
                                      ----------------------------
        Stratum    #Units     #Obs      min       mean      max   
        --------  --------  --------  --------  --------  --------
            5001         1*        1         1       1.0         1
            5002         1*        4         4       4.0         4
            5006         1*        3         3       3.0         3
            5007         1*       17        17      17.0        17
            5008         2         3         1       1.5         2
            5009         1*        6         6       6.0         6
            5010         2         4         2       2.0         2
            5011         2        25        12      12.5        13
            5012         2        13         3       6.5        10
            5013         2        23         2      11.5        21
            5014         2         2         1       1.0         1
            5016         1*        7         7       7.0         7
            5017         2         2         1       1.0         1
            5020         2        14         5       7.0         9
            5021         2         3         1       1.5         2
            5022         2        11         4       5.5         7
            5024         2         7         3       3.5         4
            5025         1*        1         1       1.0         1
            5027         2        52        11      26.0        41
            5028         1*        3         3       3.0         3
            5030         1*        2         2       2.0         2
            5031         2         7         3       3.5         4
            5032         1*        1         1       1.0         1
            5033         2         2         1       1.0         1
            5034         1*        6         6       6.0         6
            5035         1*        1         1       1.0         1
            5036         1*        3         3       3.0         3
            5039         1*        6         6       6.0         6
            5040         1*        2         2       2.0         2
            5041         2        16         6       8.0        10
            5042         2        94        43      47.0        51
            5043         1*        8         8       8.0         8
            5044         2        11         5       5.5         6
            5045         2         3         1       1.5         2
            5047         2         5         1       2.5         4
            5048         1*        3         3       3.0         3
            5049         2        12         2       6.0        10
        ....
        ....
            6288         2         2         1       1.0         1
            6289         2         7         1       3.5         6
            6290         2         5         1       2.5         4
            6291         1*        1         1       1.0         1
            6292         1*        1         1       1.0         1
            6293         2         6         3       3.0         3
            6294         2        12         1       6.0        11
            6295         1*        1         1       1.0         1
            6296         2        10         3       5.0         7
            6297         1*        2         2       2.0         2
            6299         1*        1         1       1.0         1
        --------  --------  --------  --------  --------  --------
             530       889     8,121         1       9.1       196
        
                           1,506,406 = #Obs with missing values in the 
                            --------   survey characteristics
                           1,514,527
        I realize that a number of strata are left with a single PSU (the ones marked 1*), often with very few observations.
        Can I combine some of these singleton PSUs with adjoining ones? Will that help resolve some of these issues?

        thanks Yy



        • #5
          Hello, I am following up on my previous postings.

          The subpopulation I am looking at is rather small (less than 1% of the total population in the data). Is that partly responsible for the issues I am having with omitted strata?

          I want to provide a scratch dataset to help replicate the problem. However, since SEM requires large samples, I am not sure whether 10 or 20 observations (as advised in the FAQ) will be enough, particularly if I want to ensure there are enough subpopulation members in the data.

          I would appreciate any advice to help solve the problem I am facing.

          thanks - Yy



          • #6
            Hello Yawo,

            It seems you have many strata with a single unit and, what is more, many units with a single observation. Also, judging by the message in the Stata output, a large number of observations have missing values in the survey characteristics.

            Unfortunately, I gather this is not necessarily something to "correct", as you wished.

            Stata is just telling you the truth, nothing but the truth.

            No doubt, depending on the rationale as well as the "situation" (i.e., the amount of missing data, which seems to be huge in your case), units and strata can be aggregated, and multiple imputation may be applied. However, that does not seem to apply here, for you'd get rid of the subpopulation itself.
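
            To see which strata the note is counting, a rough sketch like the following may help. The variable name strata is hypothetical (substitute whatever stratum identifier was passed to svyset), and afim is assumed to be the 0/1 subpopulation indicator from the first post:

            ```stata
            * strata is a hypothetical name for the svyset stratum variable;
            * afim is assumed to be the 0/1 subpopulation indicator.

            * Flag strata that contain at least one subpopulation member.
            bysort strata: egen byte has_sub = max(afim == 1)

            * Tag one observation per stratum so each stratum is counted once.
            egen byte one_per_stratum = tag(strata)

            * Count and list the strata with no subpopulation members.
            count if one_per_stratum & has_sub == 0
            list strata if one_per_stratum & has_sub == 0
            ```

            If the design is set up as assumed, the count should correspond to the number reported in the note.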

            With regard to small subpopulations in survey data analysis, I recommend searching this Forum for the keywords "subpop" or "subpopulation". I believe you will find plenty of "food for thought".

            That said, I wish to share with you this particular thread, which approaches the controversial issue of dealing with small subpopulations: http://www.statalist.org/forums/foru...on-size-matter

            Best,

            Marcos



            • #7
              Marcos, thanks very much for your informative answer. I've requested the Applied Survey Data Analysis book, and will take a look at the sections mentioned in the post you referenced.

              But I took another look at the dataset, and realized I can account for some of the missing data - mainly cases that were ineligible for key variables in my study.

              Specifically, the dataset has a lot of respondents who were deemed ineligible for certain questions. For example, I am interested in HIV testing, a question that was asked only of sample adults aged 18+ years. Respondents under 18 years are tagged with an NIU code for the HIV and related variables.

              To clear some of these missing cases, I am thinking of deleting them from the dataset. Before I svyset the data, should I delete these NIU responses, since my key dependent variable is restricted to those aged 18+ years, or is it better to use the svy subpop option to restrict the sample to eligible respondents? Given the general applicability of this question, I posted it separately here: http://www.statalist.org/forums/foru...ses-and-svyset

              Thanks - Yy



              • #8
                Hello Yawo,

                I agree with you: it seems to be a new query and it's great you started a new thread for that. Thanks.

                Best,

                Marcos

                P.S. I posted a comment on your new thread as well.



                • #9
                  Marcos,
                  I'm trying to analyze IBGE's PNAD Contínua with Stata's survey commands, but I can't figure out how to solve the problem with the omitted strata. Stata shows this message:
                  Code:
                  Note: 20 strata omitted because they contain no subpopulation members
                  Because of this, the expanded (weighted) totals are overestimated.

                  It's not a problem of missing values. I already checked it.
                  Hope you can help me.

                  Here is the code:

                  Code:
                  * Table 6479 from SIDRA
                  * https://sidra.ibge.gov.br/tabela/6479#resultado

                  * open the dataset for the PNAD, 2nd quarter of 2017
                  clear all
                  set more off
                  set dp comma

                  cd "/mnt/hdexterno/bancos/Bases Não Identificadas/PNAD/PNADC_Trimestral/"
                  use "PNADC_1T12_2T18.dta"

                  * select 2017 Q2
                  keep if (Ano==2017)
                  keep if (Trimestre == 2)


                  * FILTER - here I create the subpopulation that matches the table:
                  * People aged 14 or older, employed in the reference week as employers or own-account workers in their main job, whose business was registered in the CNPJ
                  gen filtro=0
                  replace filtro= 1 if ((VD4002==1) & (V4019==1) & ((VD4007==2) | (VD4007==3)) & (V2009>=14))

                  * to match the table, the activity groups "Public administration, defense and social security" and "Education, human health and social services" must also be merged
                  recode VD4010 (8=9), gen(GO)

                  * define the sampling design
                  svyset  UPA [pweight =  V1027], strata(Estrato) singleunit(centered) poststrata(posest) postweight(V1029)

                  * Table 6479, with the sampling design
                  svy, subpop(filtro): tab UF GO, format(%15,1fc) count cv
                  The messages shown in the output below and under the table are:
                  Code:
                  Number of strata   =       554                Number of obs     =      557.121
                  Number of PSUs     =    14.829                Population size   =  206.882.729
                  N. of poststrata   =        77                Subpop. no. obs   =       17.145
                                                                Subpop. size      = 7.544.977,25
                                                                Design df         =       14.275
                  
                  
                    Key:  weighted count
                          coefficients of variation of weighted count
                  
                    Table contains a zero in the marginals.
                    Statistics cannot be computed.
                  
                  Note: 20 strata omitted because they contain no subpopulation members.
                  Thanks a lot!

                  Best regards,
                  Gustavo Monteiro
                  Last edited by gustavo monteiro; 25 Oct 2018, 11:39.



                  • #10
                    Hi Marcos,
                    do you think you could help me on this issue?
                    thanks a lot!



                    • #11
                      I gather the advice in #6 still applies to your new query.

                      That being said, I believe that this subpopulation should ideally be estimated under the original survey design. This is the furthest I can go.

                      Best regards,

                      Marcos



                      • #12
                        I would add that this issue may also happen if the weights are set to zero for an entire stratum.
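
                        A quick way to check for that, sketched here with the variable names from the svyset command in #9 (Estrato as the stratum identifier and V1027 as the sampling weight; adjust if your design differs):

                        ```stata
                        * Assumes Estrato is the stratum variable and V1027 the sampling
                        * weight, as in the svyset command quoted in #9.

                        * Largest weight observed within each stratum.
                        bysort Estrato: egen double max_w = max(V1027)

                        * Tag one observation per stratum so each is listed once.
                        egen byte one_per_stratum = tag(Estrato)

                        * Strata in which every weight is zero or missing.
                        list Estrato if one_per_stratum & (max_w == 0 | missing(max_w))
                        ```

                        An empty listing would suggest that zero weights are not the cause.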

