  • Repeated time values within panel when trying to create new variable

    Hi, I am trying to create a variable called "fempstat" which measures an individual's employment status in the next month. I have the following lines of code:
    xtset cpsidp date
    gen fempstat=f1.empstat
    label var fempstat "Next month employment status"

    However, I am getting the error "repeated time values within panel". I have tried to switch the variable "date" out with "month" but I am still getting the same error.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte month double cpsid byte(statefip empstat) float date
    2021 1 20191000055400 1 10 732
    2021 1 20201100108800 1 10 732
    2021 1 20191000000800 1 36 732
    2021 1 20201100070800 1 36 732
    2021 1 20201000048900 1 21 732
    2021 1 20201100075800 1 34 732
    2021 1 20191000006900 1  0 732
    2021 1 20201200015600 1 36 732
    2021 1 20201200115200 1 10 732
    2021 1 20210100040500 1 10 732
    2021 1 20191000107600 1 32 732
    2021 1 20201100138400 1 10 732
    2021 1 20191000064600 1 10 732
    2021 1 20201100070800 1 36 732
    2021 1 20191200098300 1 32 732
    2021 1 20201200057900 1 10 732
    2021 1 20191200132900 1 34 732
    2021 1 20201000063600 1 10 732
    2021 1 20201200039300 1 10 732
    2021 1 20201100033300 1 34 732
    2021 1 20191200076200 1 36 732
    2021 1 20191000062900 1 10 732
    2021 1 20201100060300 1 10 732
    2021 1 20201000122200 1 36 732
    2021 1 20201000023600 1 10 732
    2021 1 20210100072600 1 36 732
    2021 1 20191100037400 1  0 732
    2021 1 20191200085700 1 36 732
    2021 1 20200100122500 1 36 732
    2021 1 20201200122800 1 10 732
    2021 1 20191100108900 1  0 732
    2021 1 20201100005300 1 10 732
    2021 1 20201200068000 1 36 732
    2021 1 20191200030500 1 10 732
    2021 1 20191100144700 1 21 732
    2021 1 20191000127000 1 10 732
    2021 1 20201000057500 1 10 732
    2021 1 20200100102700 1  0 732
    2021 1 20201100025400 1 10 732
    2021 1 20201100056500 1 10 732
    2021 1 20200100070100 1 10 732
    2021 1 20191200117500 1 12 732
    2021 1 20191100126000 1 34 732
    2021 1 20201000010700 1 10 732
    2021 1 20191200094600 1 36 732
    2021 1 20201000000200 1 10 732
    2021 1 20201100000100 1 36 732
    2021 1 20201100064000 1 10 732
    2021 1 20191000126700 1 36 732
    2021 1 20201200008400 1 10 732
    2021 1 20210100014400 1 10 732
    2021 1 20201100069100 1 10 732
    2021 1 20201200123000 1 10 732
    2021 1 20191000133700 1 10 732
    2021 1 20201100108500 1 36 732
    2021 1 20201200135300 1 10 732
    2021 1 20191200075100 1 10 732
    2021 1 20210100009800 1 34 732
    2021 1 20210100115200 1 12 732
    2021 1 20191100082900 1 10 732
    2021 1 20201000137500 1 10 732
    2021 1 20191000083500 1 10 732
    2021 1 20191100028100 1 10 732
    2021 1 20210100044200 1 10 732
    2021 1 20201200124900 1 36 732
    2021 1 20201100033000 1 10 732
    2021 1 20191100004600 1 10 732
    2021 1 20201100079500 1  0 732
    2021 1 20201000133500 1 10 732
    2021 1 20201200039400 1 10 732
    2021 1 20210100023300 1 36 732
    2021 1 20210100011700 1 36 732
    2021 1 20201200057300 1 36 732
    2021 1 20201100109500 1  0 732
    2021 1 20200300000800 1  0 732
    2021 1 20201100139800 1 34 732
    2021 1 20191100060300 1  0 732
    2021 1 20200100147800 1 32 732
    2021 1 20191100123800 1  0 732
    2021 1 20201100082100 1 10 732
    2021 1 20201000033800 1  0 732
    2021 1 20191200075100 1 34 732
    2021 1 20201200006800 1 36 732
    2021 1 20201200016600 1 32 732
    2021 1 20201100112900 1 10 732
    2021 1 20210100119400 1 10 732
    2021 1 20201000085300 1 34 732
    2021 1 20210100111900 1  0 732
    2021 1 20201000077700 1  0 732
    2021 1 20200100145800 1 36 732
    2021 1 20200100057300 1 36 732
    2021 1 20201200087500 1 10 732
    2021 1 20201000099500 1 32 732
    2021 1 20200100108700 1 10 732
    2021 1 20201200140700 1 36 732
    2021 1 20191000064600 1 21 732
    2021 1 20191200044200 1 10 732
    2021 1 20201200057300 1 36 732
    2021 1 20210100042900 1 12 732
    2021 1 20191200106300 1 36 732
    end
    format %tm date
    label values month month_lbl
    label def month_lbl 1 "January", modify
    label values statefip statefip_lbl
    label def statefip_lbl 1 "Alabama", modify
    label values empstat empstat_lbl
    label def empstat_lbl 0 "NIU", modify
    label def empstat_lbl 10 "At work", modify
    label def empstat_lbl 12 "Has job, not at work last week", modify
    label def empstat_lbl 21 "Unemployed, experienced worker", modify
    label def empstat_lbl 32 "NILF, unable to work", modify
    label def empstat_lbl 34 "NILF, other", modify
    label def empstat_lbl 36 "NILF, retired", modify

  • #2
    Note: please use this -dataex- example instead of the one in #1; it contains cpsidp, the panel variable my code uses.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte month double cpsidp byte(statefip empstat) float date
    2021 1 20191000055401 1 10 732
    2021 1 20201100108803 1 10 732
    2021 1 20191000000801 1 36 732
    2021 1 20201100070801 1 36 732
    2021 1 20201000048901 1 21 732
    2021 1 20201100075803 1 34 732
    2021 1 20191000006902 1  0 732
    2021 1 20201200015601 1 36 732
    2021 1 20201200115203 1 10 732
    2021 1 20210100040502 1 10 732
    2021 1 20191000107601 1 32 732
    2021 1 20201100138402 1 10 732
    2021 1 20191000064601 1 10 732
    2021 1 20201100070802 1 36 732
    2021 1 20191200098302 1 32 732
    2021 1 20201200057902 1 10 732
    2021 1 20191200132902 1 34 732
    2021 1 20201000063601 1 10 732
    2021 1 20201200039301 1 10 732
    2021 1 20201100033305 1 34 732
    2021 1 20191200076202 1 36 732
    2021 1 20191000062901 1 10 732
    2021 1 20201100060304 1 10 732
    2021 1 20201000122202 1 36 732
    2021 1 20201000023601 1 10 732
    2021 1 20210100072602 1 36 732
    2021 1 20191100037402 1  0 732
    2021 1 20191200085701 1 36 732
    2021 1 20200100122501 1 36 732
    2021 1 20201200122801 1 10 732
    2021 1 20191100108905 1  0 732
    2021 1 20201100005302 1 10 732
    2021 1 20201200068002 1 36 732
    2021 1 20191200030502 1 10 732
    2021 1 20191100144702 1 21 732
    2021 1 20191000127001 1 10 732
    2021 1 20201000057502 1 10 732
    2021 1 20200100102703 1  0 732
    2021 1 20201100025401 1 10 732
    2021 1 20201100056502 1 10 732
    2021 1 20200100070102 1 10 732
    2021 1 20191200117502 1 12 732
    2021 1 20191100126002 1 34 732
    2021 1 20201000010701 1 10 732
    2021 1 20191200094601 1 36 732
    2021 1 20201000000201 1 10 732
    2021 1 20201100000101 1 36 732
    2021 1 20201100064005 1 10 732
    2021 1 20191000126702 1 36 732
    2021 1 20201200008401 1 10 732
    2021 1 20210100014402 1 10 732
    2021 1 20201100069103 1 10 732
    2021 1 20201200123002 1 10 732
    2021 1 20191000133701 1 10 732
    2021 1 20201100108502 1 36 732
    2021 1 20201200135301 1 10 732
    2021 1 20191200075101 1 10 732
    2021 1 20210100009804 1 34 732
    2021 1 20210100115201 1 12 732
    2021 1 20191100082902 1 10 732
    2021 1 20201000137502 1 10 732
    2021 1 20191000083501 1 10 732
    2021 1 20191100028101 1 10 732
    2021 1 20210100044201 1 10 732
    2021 1 20201200124902 1 36 732
    2021 1 20201100033001 1 10 732
    2021 1 20191100004601 1 10 732
    2021 1 20201100079503 1  0 732
    2021 1 20201000133502 1 10 732
    2021 1 20201200039402 1 10 732
    2021 1 20210100023302 1 36 732
    2021 1 20210100011701 1 36 732
    2021 1 20201200057302 1 36 732
    2021 1 20201100109503 1  0 732
    2021 1 20200300000803 1  0 732
    2021 1 20201100139801 1 34 732
    2021 1 20191100060303 1  0 732
    2021 1 20200100147801 1 32 732
    2021 1 20191100123802 1  0 732
    2021 1 20201100082101 1 10 732
    2021 1 20201000033804 1  0 732
    2021 1 20191200075102 1 34 732
    2021 1 20201200006802 1 36 732
    2021 1 20201200016602 1 32 732
    2021 1 20201100112902 1 10 732
    2021 1 20210100119402 1 10 732
    2021 1 20201000085302 1 34 732
    2021 1 20210100111903 1  0 732
    2021 1 20201000077703 1  0 732
    2021 1 20200100145801 1 36 732
    2021 1 20200100057301 1 36 732
    2021 1 20201200087502 1 10 732
    2021 1 20201000099501 1 32 732
    2021 1 20200100108701 1 10 732
    2021 1 20201200140701 1 36 732
    2021 1 20191000064602 1 21 732
    2021 1 20191200044202 1 10 732
    2021 1 20201200057301 1 36 732
    2021 1 20210100042901 1 12 732
    2021 1 20191200106301 1 36 732
    end
    format %tmCCYY!mNN date
    label values month month_lbl
    label def month_lbl 1 "January", modify
    label values statefip statefip_lbl
    label def statefip_lbl 1 "Alabama", modify
    label values empstat empstat_lbl
    label def empstat_lbl 0 "NIU", modify
    label def empstat_lbl 10 "At work", modify
    label def empstat_lbl 12 "Has job, not at work last week", modify
    label def empstat_lbl 21 "Unemployed, experienced worker", modify
    label def empstat_lbl 32 "NILF, unable to work", modify
    label def empstat_lbl 34 "NILF, other", modify
    label def empstat_lbl 36 "NILF, retired", modify

    • #3
      Well, the example data you show in #2 produces no error message when you run -xtset cpsidp date-. But the error message you report in #1 is a commonly encountered problem. It is never something wrong with Stata. It is rarely a matter of the coding. It is nearly always that the data are defective.

      In order to run -xtset- with a panel variable and a time variable, it must be the case that for any given value of cpsidp, there is only one observation with any given value of date. When Stata tells you that isn't the case, then your data are unsuitable for that command.
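
      A minimal sketch of how to test that condition directly, assuming the full data set contains the variables cpsidp and date as in the example from #2:
      Code:
      * Count how many (cpsidp, date) combinations occur more than once
      duplicates report cpsidp date

      * Stop with an error unless cpsidp and date jointly identify every observation
      isid cpsidp date
      If -isid- runs silently, the data satisfy the requirement and -xtset cpsidp date- should not complain about repeated time values.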

      Occasionally, the problem is that the wrong panel variable or wrong date variable has been selected. But there is nothing in your example data to suggest that the problem would be resolved by the use of any other variables in the -xtset- command. Indeed, using the -month- variable would only make matters worse.

      So you probably have inappropriate data. To find the offending observations you can run:
      Code:
      sort cpsidp date
      duplicates tag cpsidp date, gen(flag)
      browse if flag
      and Stata will show them to you. Then your task is to figure out how to fix the data. Sometimes the excess observations are exact duplicates in every variable. In that case, -duplicates drop- will solve the problem. But I would point out that the mere fact that you ended up with extra copies of some observations suggests that something went wrong in the data management up to this point. Where one mistake lies, others often lurk. So I would urge you to do a thorough review of how your data set came into existence and check it carefully for other problems.

      The harder case is when the observations that agree on cpsidp and date have different values on one or more other variables. Then you have to figure out how to resolve the contradiction. Perhaps you can identify which observation is correct. Perhaps neither is correct and you need to combine them in some way (means, medians, min, max, something else). Doing that will require a thoughtful understanding of how these data came to be and what makes sense in the specific context of your research question.
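
      The two cases can be told apart with something like the following sketch. The new variable names dup and conflict are arbitrary, and empstat stands in for whichever variables matter in your analysis:
      Code:
      * Drop surplus observations that are identical on every variable
      duplicates drop

      * Tag the (cpsidp, date) groups that still contain more than one observation
      duplicates tag cpsidp date, gen(dup)

      * Within each group, sort on empstat and compare its smallest and largest value;
      * conflict is 1 where the remaining copies disagree on empstat
      bysort cpsidp date (empstat): gen byte conflict = empstat[1] != empstat[_N]

      list cpsidp date empstat if dup & conflict, sepby(cpsidp date)
      Whatever this lists are the contradictions that have to be resolved by judgment rather than by a command.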

      I want to emphasize that you cannot escape solving this problem. There are analyses which will work perfectly well even though the data cannot be -xtset panelvar timevar- due to this problem. But since you want to use the f1 operator, you must -xtset- the data that way. It is not just a matter of Stata's wishes: if there are two observations for cpsidp X and date 2001m3, then it is impossible to define the forward value for an observation of cpsidp X and date 2001m2 because there are two or more possibilities and no way to choose among them.
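
      For illustration, here is a sketch on a made-up toy data set (the variables id and mdate and all values are invented) showing what the forward operator does once the data can be -xtset-:
      Code:
      clear
      input id mdate empstat
      1 732 10
      1 733 21
      1 734 10
      2 732 36
      2 733 36
      end
      format mdate %tm

      * Each id has at most one observation per month, so -xtset- succeeds
      xtset id mdate

      * F1. takes the value from the same id one month later; it is missing
      * for the last month observed in each panel
      gen fempstat = F1.empstat
      list, sepby(id)
      Adding a second observation for id 1 in month 733 would make -xtset- fail with "repeated time values within panel", which is exactly the ambiguity described above.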

      • #4
        Note that when we apply Clyde's code to your dataset from post #1, we find four pairs of duplicates, and two of those pairs have different values for empstat, illustrating the problem Clyde described in his final sentence.
        Code:
        . sort cpsid date
        
        . duplicates tag cpsid date, gen(flag)
        
        Duplicates in terms of cpsid date
        
        . list if flag, sepby(cpsid date)
        
             +------------------------------------------------------------------+
             | year   month            cpsid   statefip   empstat   date   flag |
             |------------------------------------------------------------------|
          5. | 2021       1   20191000064600          1        10    732      1 |
          6. | 2021       1   20191000064600          1        21    732      1 |
             |------------------------------------------------------------------|
         23. | 2021       1   20191200075100          1        34    732      1 |
         24. | 2021       1   20191200075100          1        10    732      1 |
             |------------------------------------------------------------------|
         62. | 2021       1   20201100070800          1        36    732      1 |
         63. | 2021       1   20201100070800          1        36    732      1 |
             |------------------------------------------------------------------|
         79. | 2021       1   20201200057300          1        36    732      1 |
         80. | 2021       1   20201200057300          1        36    732      1 |
             +------------------------------------------------------------------+


        • #5
          You know, I just wanna take a moment and discuss the importance of data management here. I know it's boring sometimes, but I still take it deadly seriously. I don't have much to say that's not been said above, so I'll just discuss my project structure.

          For every real project, I have a Master File that runs a Master Management file and a Master Analysis file. The Master Management and Master Analysis files are each kept in their own directory (inside the Management and Analysis directories, respectively). The Master files run sub-files, which are themselves kept in their own directory.
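
          A minimal sketch of what such a master do-file could look like, assuming Stata is started in the project root; every file and directory name here is a hypothetical placeholder:
          Code:
          * master.do (hypothetical name): run from the project root directory
          clear all

          * Build the analysis data set first ...
          do "Management/master_management.do"

          * ... then produce every table and graph from it
          do "Analysis/master_analysis.do"
          Each of the two files called here would in turn -do- its own sub-files in the same way.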

          I have one sub-file per variable/task. Collect the shape file? 1_GIS.do. Collect population data? 2_Population.do. Urbanicity, weather, outcome data? All have their very own do-files that handle that specific task. Each file checks for data integrity too: assert that every population value is an integer greater than 0 and that no population percentage is less than 0 or greater than 100. I check that my panel and date IDs jointly identify each observation uniquely. Same for my analyses: one do-file per graph, table and so on (unless there's a good case to have them in the same file).
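
          As a sketch, checks of that kind might look like this at the end of a sub-file. The variable names population, pct_urban, county and year are hypothetical:
          Code:
          * Population must be a nonmissing integer greater than 0
          * (missing counts as larger than any number, so it is ruled out explicitly)
          assert population > 0 & population == floor(population) & !missing(population)

          * Percentages must lie between 0 and 100; inrange() is false for missing values
          assert inrange(pct_urban, 0, 100)

          * Panel and time identifiers must jointly identify every observation
          isid county year
          If any check fails, the do-file stops right there, so a problem like the one in #1 surfaces where it is created rather than at analysis time.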

          My point here is that data cleaning and management is essential to your project being able to proceed at all. You don't need to be as obsessive about it as I am and have 5 directories for your replication files and one directory each for your raw and Master data respectively, but an organized, structured process prevents issues like this to a great degree, even if it's still imperfect. This is partly why I say you should always incorporate data collection into your code (if you work with administrative public data anyway) and why journals should demand the publication of the data cleaning and management files in addition to the analysis files. But what do I know, I'm just a doctoral student.
