Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem with xtset

    Dear Statalist,

    I want to combine three datasets. In dataset 1 (my main dataset) I have a panel with firm (ORDINAL) and time (year, from 1990 to 2013) and some other variables as for example sectors and regions. From another dataset 2, I have employment for each sector for the different regions from 1993 to 2007. And from dataset 3 I have employment for each sector for each region from 2008 to 2013. That is, dataset 1 is firms level while dataset 2 and 3 are regional data by sectors.
    Code:
    +------------------------------------+
          | CCAA1   newnac~o   year   newwor~r |
          |------------------------------------|
       1. |     1          1   2008      52505 |
       2. |     1          1   2009      48966 |
       3. |     1          1   2010      49414 |
       4. |     1          1   2011      47264 |
       5. |     1          1   2012      44904 |
          |------------------------------------|
       6. |     1          1   2013      47170 |
       7. |     1          2   2008      12158 |
       8. |     1          2   2009       9815 |
       9. |     1          2   2010       9426 |
      10. |     1          2   2011       8830 |
          |------------------------------------|
      11. |     1          2   2012       7919 |
      12. |     1          2   2013       9086 |
      13. |     1          3   2008      19733 |
      14. |     1          3   2009      14942 |
      15. |     1          3   2010      13753 |
    Code:
    +----------------------------------+
          | CCAA1   newnac~o   year   worker |
          |----------------------------------|
       1. |     1          1   1993    58271 |
       2. |     1          1   1994    59180 |
       3. |     1          1   1995    57143 |
       4. |     1          1   1996    55592 |
       5. |     1          1   1997    56492 |
          |----------------------------------|
       6. |     1          1   1998    56227 |
       7. |     1          1   1999    58156 |
       8. |     1          1   2000    53856 |
       9. |     1          1   2001    53819 |
      10. |     1          1   2002    51748 |
          |----------------------------------|
      11. |     1          1   2003    52904 |
      12. |     1          1   2004    50106 |
      13. |     1          1   2005    51564 |
      14. |     1          1   2006    52516 |
      15. |     1          1   2007    52501 |
          |----------------------------------|
      16. |     1          2   1993    23416 |
      17. |     1          2   1994    21481 |
      18. |     1          2   1995    22371 |
      19. |     1          2   1996    17999 |
      20. |     1          2   1997    18396 |
          |----------------------------------|
      21. |     1          2   1998    22040 |
      22. |     1          2   1999    21689 |
      23. |     1          2   2000    20450 |
      24. |     1          2   2001    20818 |
    In order to merge those datasets I am using the following code, where CCAA1 is the id for regions, and newnaceclio is the id for sectors:
    Code:
     merge m:1 CCAA1 newnaceclio year using "D:\Usuarios\dtojeiro\Dropbox\PhD\Second Paper ESEE_DATA\employ_by_
    > region_sector_cenae93.dta"
    
        Result                           # of obs.
        -----------------------------------------
        not matched                       100,990
            from master                   100,683  (_merge==1)
            from using                        307  (_merge==2)
    
        matched                            26,613  (_merge==3)
        -----------------------------------------
    
    . drop _merge
    
    . merge m:1 CCAA1 newnaceclio year using "D:\Usuarios\dtojeiro\Dropbox\PhD\Second Paper ESEE_DATA\employ_by_
    > region_sector_cenae09.dta"
    
        Result                           # of obs.
        -----------------------------------------
        not matched                       116,321
            from master                   116,205  (_merge==1)
            from using                        116  (_merge==2)
    
        matched                            11,398  (_merge==3)
        -----------------------------------------
    My problem is that I need to tell Stata that this is a panel with id(ORDINAL) year(year) but it give me this error
    Code:
     xtset ORDINAL year
    repeated time values within panel
    I don´t know what am I doing wrong. Can anyone give me a hand with this?

    Thanks in advanced.

  • #2
    I can't see any data example here containing ORDINAL.

    http://www.stata.com/support/faqs/da...ues/index.html gives generic advice.

    Comment


    • #3
      While it's hard to say without a meaningful data example, my guess would be its due to the incomplete merge. As you have certain regions not represented by any firms, ORDINAL will be missing for those cases. Hence, you'll have repeated values (missings in this case) per ordinal/year.

      More generally, you can use duplicates tag ORDINAL year, gen(dubs) to see which observations are causing your problem.

      Comment


      • #4
        Many thanks for your answers Nick and Jesse. I have done the steps on the link
        Code:
         isid ORDINAL year
        variables ORDINAL year should never be missing
        Code:
         duplicates list ORDINAL year
        
        Duplicates in terms of ORDINAL year
        
          +----------------------------------+
          | group:     obs:   ORDINAL   year |
          |----------------------------------|
          |      1    16013         .   1993 |
          |      1    16654         .   1993 |
          |      1    16688         .   1993 |
          |      1    16706         .   1993 |
          |      1    17093         .   1993 |
          |----------------------------------|
          |      1    18118         .   1993 |
          |      1    18124         .   1993 |
          |      1    18195         .   1993 |
          |      1    18376         .   1993 |
          |      1    18490         .   1993 |
          |----------------------------------|
          |      1    18549         .   1993 |
          |      1    18933         .   1993 |
          |      1    19283         .   1993 |
          |      1    19338         .   1993 |
          |      1    19406         .   1993 |
        As you can see, some ORDINAL that existed before, now are missings. Then I try the following steps on the link

        Code:
        duplicates tag ORDINAL year, gen(isdup)
        
        Duplicates in terms of ORDINAL year
        
        . sum isdup
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               isdup |    127,719    .0653309    1.150891          0         26
        
        . edit if isdup
        
        . drop isdup
        
        . sum ORDINAL
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
             ORDINAL |    127,296      2652.5    1531.139          1       5304
        but as my id(ORDINAL) is missing and since I do not want to lose those observations (see that before I had 127,719 and now I have 127,296), I do not know what to do now.
        Can I do something in the merge process in order not to convert my ORDINAL into missings? Should I make a new identifier from the regional dataset 2 and 3 using regions and sectors in order to have only one identifier and replicate such identifier into my main dataset (which is a panel)? I mean, for region 1 in sector 1 in 1993 (make one and unique identifier; something as 111993) and so on. The point is that I dont know if this will solve the problem, even more, I don´t know how to do it.
        Any help?

        Comment


        • #5
          Hi again, just to show you that even doing what the link of Nick says, it seems that cannot solve the problem
          Code:
           clear
          
          . use "D:\Usuarios\dtojeiro\Dropbox\PhD\Second Paper ESEE_DATA\esee.dta"
          
          . recode NACECLIO (1 2 3 = 1) (4 5 = 2) (6/8 = 3) (9 = 4) (10 = 5) (11 = 6) (12 13 = 7) (14 = 8) (15 16 =9)
          > (17 18 = 10) (19 20 = 11), gen(newnaceclio)
          (42902 differences between NACECLIO and newnaceclio)
          
          . merge m:1 CCAA1 newnaceclio year using "D:\Usuarios\dtojeiro\Dropbox\PhD\Second Paper ESEE_DATA\employ_by_
          > region_sector_cenae93.dta"
          
              Result                           # of obs.
              -----------------------------------------
              not matched                       100,990
                  from master                   100,683  (_merge==1)
                  from using                        307  (_merge==2)
          
              matched                            26,613  (_merge==3)
              -----------------------------------------
          
          . drop _merge
          
          . merge m:1 CCAA1 newnaceclio year using "D:\Usuarios\dtojeiro\Dropbox\PhD\Second Paper ESEE_DATA\employ_by_
          > region_sector_cenae09.dta"
          
              Result                           # of obs.
              -----------------------------------------
              not matched                       116,321
                  from master                   116,205  (_merge==1)
                  from using                        116  (_merge==2)
          
              matched                            11,398  (_merge==3)
              -----------------------------------------
          
          . drop _merge
          
          .     duplicates list ORDINAL year
          
          Duplicates in terms of ORDINAL year
          
            +----------------------------------+
            | group:     obs:   ORDINAL   year |
            |----------------------------------|
            |      1     5603         .   1993 |
            |      1     6700         .   1993 |
            |      1     6771         .   1993 |
            |      1     6897         .   1993 |
            |      1     7369         .   1993 |
            |----------------------------------|
            |      1     7418         .   1993 |
            |      1     7568         .   1993 |
            |      1     7620         .   1993 |
            |      1     7848         .   1993 |
            |      1     7905         .   1993 |
            |----------------------------------|
            |      1     8140         .   1993 |
            |      1     8293         .   1993 |
            |      1    28986         .   1993 |
            |      1    29001         .   1993 |
            |      1    29092         .   1993 |
            |----------------------------------|
            |      1    29195         .   1993 |
            |      1    29211         .   1993 |
            |      1    39246         .   1993 |
            |      1    39927         .   1993 |
            |      1    44150         .   1993 |
            |----------------------------------|
            |      1    44203         .   1993 |
            |      1    44220         .   1993 |
          --Break--
          r(1);
          
          . duplicates tag ORDINAL year, gen(isdup)
          
          Duplicates in terms of ORDINAL year
          
          . sum isdup
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 isdup |    127,719    .0653309    1.150891          0         26
          
          . edit if isdup
          
          . drop isdup
          
          . sort ORDINAL year
          
          . xtset ORDINAL year
          repeated time values within panel
          r(451);
          For sure I am doing something wrong or even more, I don´t understand what I am doing. Any advice?

          Comment


          • #6
            The link provided explains that you need to find duplicates and then fix them. What did the edit do? We can't see from here.

            Comment


            • #7
              The edit open the data editor and show
              Code:
              ORDINAL    year    IDSIT    _new_to_IDSIT    _new_to_old    ACCID    ACT    ADBEM    AEMP    AGM1N    APLIF    APLIFN    ATFRP    BSFE    CAMBIO    CC1M1    CC2M1    CC3M1    CC4M1    CCAA1    CCAA2    CI1N    COINT    CPSP    CPSP1    CR41N    CTCL    CTCO    CTPR    CTSP    CTSP1    CUCT    DEDID    DEDIT    DISENO    DMER1N    ECM1N    EMPIDA    EMPIDM    EMPIDS    EMPIDT    EMYM    EPCT    ESNIND    ESTI    ETAE    EXPORT    EXPTEC    FAMILI    FICS    FPIDCA    FPIDES    FPIDOT    GEID    GIID    GTID    HETN    ICO    ICOCAN    ICODIS    ICOFIJ    ICOPRO    IGM    IILR    IMO    IMOGE    IMOPE    IMPTEC    INBE    INDIV    INM    INOCDE    INRM    INVEX    IP    IPR    IPRME    IPRPI    IPRTM    LOCAL    MEDGAS    MEDINV    MPAR    NACECLIO    NCM1N    NESNIN1    NIP    NMERIN    NPRNIN    NTES    NUMER    NUMEST    NUMPRO    NYCC    OACID    PATESP    PATEXT    PBSOI    PCAEXT    PEIT    PERETT    PERFTC    PERSOC    PERTOT    PEVEN    PIL    PIUE    PTIM    PTPO    REEID    REPID    RIMVA    SICYT    TIPSO    TMUN    TOTVEN    UAIT    UC    VA    VENTAS    VEXPOR    VPRO1    WEBASI    WEBB2B    WEBB2C    WEBCE    WEBCOM    WEBCOS    WEBEMP    WEBIMG    WEBINF    WEBOTR    WEBPRO    WEBVEN    XCEE    XIBERO    XOCDE    XRESTO    newnaceclio    worker    newworker    isdup        
                  1998                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            4    1577        23        
                  1999                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            4    1740        16        
                  1993                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    306        22        
                  1994                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    433        20        
                  2000                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    690        15        
                  2001                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    791        20        
                  2002                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    654        18        
                  2003                                                                        3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            5    769        26
              I don´t know why, but the columns are moved. For ORDINAL is all missings, then I have year (that have its values) and then it shows the values for newnaceclio, woker and isdup (at the end of the line). The rest of variables are all missing.
              What I have done is as simply as
              Code:
              . drop if ORDINAL==.
              (423 observations deleted)
              
              . xtset ORDINAL year
                     panel variable:  ORDINAL (strongly balanced)
                      time variable:  year, 1990 to 2013
                              delta:  1 unit
              Do you think that dropping those observations is good? Because I am not sure if those observations were generated because of the merge process or I had it since the beginning and the merge convert then into missings.

              Comment


              • #8
                Changes in the order of variables (not in Stata usually called columns) is not something to worry about.

                It's clear that the problem is entirely in the missings for ORDINAL.

                Otherwise I can't say what is good for you.

                I pointed out in #2 that we can't see your original data including any values of ORDINAL and that remains true. So, I can't add to Jesse's comment in #3 on why missings appear to emerge.

                I realise that your dataset is too large to post here, but without a reproducible example it is difficult to comment further.

                http://stackoverflow.com/help/mcve was written for a different forum but its message holds true here, as does the advice in http://www.statalist.org/forums/help#stata

                Comment


                • #9
                  Yes, you are right, the data is huge, sorry. In order to give you an idea of how the data is I do the following where I just do the summary of my firm identifier to show you the amount of observation I have in the data

                  Code:
                  xtset ORDINAL year
                         panel variable:  ORDINAL (strongly balanced)
                          time variable:  year, 1990 to 2013
                                  delta:  1 unit
                  
                  . xtsum ORDINAL
                  
                  Variable         |      Mean   Std. Dev.       Min        Max |    Observations
                  -----------------+--------------------------------------------+----------------
                  ORDINAL  overall |    2652.5   1531.139          1       5304 |     N =  127296
                           between |             1531.277          1       5304 |     n =    5304
                           within  |                    0     2652.5     2652.5 |     T =      24
                  
                  . recode NACECLIO (1 2 3 = 1) (4 5 = 2) (6/8 = 3) (9 = 4) (10 = 5) (11 = 6) (12 13 = 7) (14 = 8) (15 16 =9)
                  > (17 18 = 10) (19 20 = 11), gen(newnaceclio)
                  (42902 differences between NACECLIO and newnaceclio)
                  
                  . list ORDINAL year CCAA1 newnaceclio NIP
                  
                          +-----------------------------------------+
                          | ORDINAL   year   CCAA1   newnac~o   NIP |
                          |-----------------------------------------|
                       1. |       1   1990      16          7     0 |
                       2. |       1   1991       .          .     . |
                       3. |       1   1992       .          .     . |
                       4. |       1   1993       .          .     . |
                       5. |       1   1994      16          7     1 |
                          |-----------------------------------------|
                       6. |       1   1995      16          7     0 |
                       7. |       1   1996      16          7     0 |
                       8. |       1   1997      16          7     1 |
                       9. |       1   1998      16          7     0 |
                      10. |       1   1999      16          7     0 |
                          |-----------------------------------------|
                      11. |       1   2000      16          7     0 |
                      12. |       1   2001      16          7     0 |
                      13. |       1   2002      16          7     0 |
                      14. |       1   2003      16          7     0 |
                      15. |       1   2004      16          7     0 |
                          |-----------------------------------------|
                      16. |       1   2005      16          7     0 |
                      17. |       1   2006      16          7     0 |
                      18. |       1   2007      16          7     0 |
                      19. |       1   2008      16          7     0 |
                      20. |       1   2009      16          7     3 |
                          |-----------------------------------------|
                      21. |       1   2010      16          7    10 |
                      22. |       1   2011      16          7     8 |
                      23. |       1   2012      16          7     0 |
                      24. |       1   2013      16          7     6 |
                      25. |       2   1990      16          6     0 |
                          |-----------------------------------------|
                      26. |       2   1991      16          6     0 |
                      27. |       2   1992      16          6     0 |
                      28. |       2   1993      16          6     0 |
                      29. |       2   1994      16          6   100 |
                      30. |       2   1995      16          6     0 |
                          |-----------------------------------------|
                      31. |       2   1996      16          6     0 |
                      32. |       2   1997      16          6     0 |
                      33. |       2   1998      16          6     0 |
                      34. |       2   1999      16          6    58 |
                      35. |       2   2000      16          6    58 |
                          |-----------------------------------------|
                      36. |       2   2001      16          6     0 |
                      37. |       2   2002      16          6     0 |
                      38. |       2   2003       .          .     . |
                      39. |       2   2004       .          .     . |
                      40. |       2   2005       .          .     . |
                          |-----------------------------------------|
                      41. |       2   2006       .          .     . |
                      42. |       2   2007       .          .     . |
                      43. |       2   2008       .          .     . |
                      44. |       2   2009       .          .     . |
                      45. |       2   2010       .          .     . |
                          |-----------------------------------------|
                      46. |       2   2011       .          .     . |
                      47. |       2   2012       .          .     . |
                      48. |       2   2013       .          .     . |
                      49. |       3   1990      16          4     4 |
                      50. |       3   1991      16          4     6 |
                          |-----------------------------------------|
                      51. |       3   1992      16          4     4 |
                  --Break--
                  Now the dataex
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input int(ORDINAL year) byte(CCAA1 newnaceclio) int NIP
                  1 1990 16 7   0
                  1 1991  . .   .
                  1 1992  . .   .
                  1 1993  . .   .
                  1 1994 16 7   1
                  1 1995 16 7   0
                  1 1996 16 7   0
                  1 1997 16 7   1
                  1 1998 16 7   0
                  1 1999 16 7   0
                  1 2000 16 7   0
                  1 2001 16 7   0
                  1 2002 16 7   0
                  1 2003 16 7   0
                  1 2004 16 7   0
                  1 2005 16 7   0
                  1 2006 16 7   0
                  1 2007 16 7   0
                  1 2008 16 7   0
                  1 2009 16 7   3
                  1 2010 16 7  10
                  1 2011 16 7   8
                  1 2012 16 7   0
                  1 2013 16 7   6
                  2 1990 16 6   0
                  2 1991 16 6   0
                  2 1992 16 6   0
                  2 1993 16 6   0
                  2 1994 16 6 100
                  2 1995 16 6   0
                  2 1996 16 6   0
                  2 1997 16 6   0
                  2 1998 16 6   0
                  2 1999 16 6  58
                  2 2000 16 6  58
                  2 2001 16 6   0
                  2 2002 16 6   0
                  2 2003  . .   .
                  2 2004  . .   .
                  2 2005  . .   .
                  2 2006  . .   .
                  2 2007  . .   .
                  2 2008  . .   .
                  2 2009  . .   .
                  2 2010  . .   .
                  2 2011  . .   .
                  2 2012  . .   .
                  2 2013  . .   .
                  3 1990 16 4   4
                  3 1991 16 4   6
                  3 1992 16 4   4
                  3 1993  . .   .
                  3 1994 16 4   3
                  3 1995 16 4   0
                  3 1996 16 4   2
                  3 1997 16 4   0
                  3 1998 16 4   2
                  3 1999 16 4   0
                  3 2000 16 4   0
                  3 2001  . .   .
                  end
                  Code:
                   misstable sum ORDINAL
                  (variables nonmissing or string)
                  I don´t know if with this is enough. As you can see, the data do not have any missing in ORDINAL before the merge process.
                  If I do the merge from a dataset that only contain CCAA1 (region), year and lets say university degree and do the merge there is no problem. The problem comes from the dataset in which I have to merge using regions and sector classification, so I don´t know why it creates the missing.
                  On top of that, let me thank you for the time you are dedicating to this.

                  Comment


                  • #10
                    You are trying hard, but sorry, this does not help (me). I don't see any extra information on why missings emerge.

                    Comment


                    • #11
                      Do not worry, you have helped me a lot. I now need to evaluate if dropping or not the missing ORDINAL. In fact, I just realize that the amount of missing is exactly the same as the sum of the _merge==2 of the third table of my first post (the one that is for the merge process).

                      Anyway, thanks a lot for your help.

                      Comment


                      • #12
                        A good way to go about this is to look at your data before the merge and then compare it to the data after the merge. You can also tab the missing values _merge values.

                        E.g.

                        Code:
                        gen ismissing = missing(ORDINAL)
                        tab ismissing _merge

                        Comment


                        • #13
                          Jesse, following your advice I can see that those that are missing (ORDINAL) are the ones that were _merge==2
                          Code:
                          tab  _merge ismissing
                          
                                                |       ismissing
                                         _merge |         0          1 |     Total
                          ----------------------+----------------------+----------
                                master only (1) |   100,683          0 |   100,683
                                 using only (2) |         0        307 |       307
                                    matched (3) |    26,613          0 |    26,613
                          ----------------------+----------------------+----------
                                          Total |   127,296        307 |   127,603
                          Since in my main dataset before doing the merge process I have some missing for CCAA1(regions) and NACECLIO(sectors), I am trying to impute the mode by ORDINAL in order to see if those missing are part of the problem. Even though I am not sure if could be because of this.
                          The point here is that I cannot see why for some values it work while for some other it impute a missing (ORDINAL) knowing that I have a unique combination of those three variables that I used in the merge process CCAA (with values from 1 to 17), newnaceclio (with values from 1 to 11) and time

                          Code:
                          ORDINAL    year    CCAA1    newnaceclio    worker
                          3850    2006    1    6    25717
                          1052    2006    1    6    25717
                          4185    2006    1    6    25717
                          1771    2006    1    6    25717
                          2562    2006    1    6    25717
                          4287    2006    1    6    25717
                          4165    2006    1    6    25717
                          Code:
                          ORDINAL    year    CCAA1    newnaceclio    worker
                              2001    3    5    791
                              2002    3    5    654
                              2003    3    5    769
                              2004    3    5    995
                              2005    3    5    1105
                              2006    3    5    1156
                          Again, many thanks for your time and help.

                          Comment


                          • #14
                            I have to admit I don't understand what you're asking. However, normally the reason you get those missings if because you have no observation in your original dataset corresponding to the combination of "CCAA1 newnaceclio year". You can check this in your original data.

                            Comment

                            Working...
                            X