Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extending Year Forward and Filling Values

    Dear Community members,


    I have a problem that requires me to pull back years in following dataframe:

    Click image for larger version

Name:	STATA yr.png
Views:	1
Size:	291.3 KB
ID:	1403602



    Basically, the data is CODGEO (like a zip code) by Zone of Employment (ZE, which is a larger unit).
    The current data maps CODEGEO to ZE from 2008-2014.
    I would like to pull the years back to 1998 for every CODGEO and copy the value of ZE.

    ie.

    CODGEO YR ZE
    01001 1998 8203
    01001 1999 8203
    01001 2000 8203
    01001 2001 8203
    01001 2002 8203
    01001 2003 8203
    (all the way to 2008 for each CODGEO)
    01001 2008 8203


    I was refering the following link https://stats.idre.ucla.edu/stata/fa...time-variable/
    but it needs there to be atleast one CODGEO to have 1998.


    Kindly let me know if it is possible to do so. I await your verdict.

    Best,
    Pranav Garg

  • #2
    Code:
    **create fake data example
    clear 
    set obs 100
    g CODGEO = 1000+mod(_n,20)
    bys CODGEO: g YR  =  2007+_n
    g ZE = CODGEO*1.5
    sort COD YR
    
    list in 1/10
    
    
    **1. add extra rows to help rectangularize data with fillin:
    
    forval j = 1998/2007 {
        set obs `=_N+1'
        replace YR = `j' in l
        }
        l in -15/l  //here are the new blank records
    
    
        
    *get fillin from SSC
    cap which fillin
    if _rc ssc install fillin, replace
    
    
    fillin CODGEO YR
    sort COD YR
    
    bys CODGEO (YR): g obs = _n
    ta obs
    drop if mi(CODGEO)
    
    **use egen max to fill ZE, or subscripting, or carry forward from SSC:
    cap which carryforward
    if _rc ssc install carryforward, replace
    
    g revyear = 2012-YR
     
    bys CODGEO (revyear):  carryforward ZE , replace
    drop revyear obs _fillin
    sort COD YR 
    isid COD YR ZE
    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

    Comment


    • #3
      Code:
      encode CODGEO, gen(codeg)
      
      tsset codeg YR, yearly
      
      tsappend, add(10)
      replace YR=YR-8 if YR==2015
      replace YR=YR-10 if YR==2016
      replace YR=YR-12 if YR==2017
      replace YR=YR-14 if YR==2018
      replace YR=YR-16 if YR==2019
      replace YR=YR-18 if YR==2020
      replace YR=YR-20 if YR==2021
      replace YR=YR-22 if YR==2022
      replace YR=YR-24 if YR==2023
      replace YR=YR-26 if YR==2024
      
      decode codeg, gen(CodG)
      drop CODGEO codeg
      rename CodG CODGEO
      order CODGEO YR
      
      gsort CODGEO -YR
      
      
      replace ZE = ZE[_n-1] if ZE==""

      This is what I used. It worked for me but it was a very messy code to see.
      I shall try again with yours!

      Thank you Eric.

      Pranav
      Last edited by Pranav Garg; 25 Jul 2017, 08:19.

      Comment


      • #4
        Dear Sir,

        Thank you for your time.

        The code you have constructed in working. However, there is one error at the end:

        <CODE>
        isid COD YR ZE
        variables CODGEO YR ZE should never be missing
        r(459);
        <CODE>

        For this, I did:

        <CODE>
        list if mi(CODGEO) mi(YR ) mi(ZE)

        +--------------------+
        CODGEO YR ZE
        --------------------
        31565. 05161 2010
        31566. 05161 2011
        31567. 05161 2012
        31568. 05161 2013
        31569. 05161 2014
        --------------------
        33761. 06106 2013
        33762. 06106 2014
        33763. 06107 1998
        33764. 06107 1999
        33765. 06107 2000
        --------------------
        33766. 06107 2001
        33767. 06107 2002
        33768. 06107 2003
        33769. 06107 2004
        33770. 06107 2005
        --------------------
        33771. 06107 2006
        33772. 06107 2007
        33773. 06107 2008
        33774. 06107 2009
        33775. 06107 2010
        --------------------
        33776. 06107 2011
        33777. 06107 2012
        33778. 06107 2013
        33779. 06107 2014
        34134. 06128 2012
        --------------------
        34135. 06128 2013
        34136. 06128 2014
        38979. 07259 2012
        38980. 07259 2013
        38981. 07259 2014
        --------------------
        81567. 14472 1998
        81568. 14472 1999
        81569. 14472 2000
        81570. 14472 2001
        81571. 14472 2002
        --------------------
        81572. 14472 2003
        81573. 14472 2004
        81574. 14472 2005
        81575. 14472 2006
        81576. 14472 2007
        --------------------
        81577. 14472 2008
        81578. 14472 2009
        81579. 14472 2010
        81580. 14472 2011
        81581. 14472 2012
        --------------------
        81582. 14472 2013
        81583. 14472 2014
        94231. 16294 2014
        102374. 17377 2014
        177459. 28387 2010
        --------------------
        177460. 28387 2011
        177461. 28387 2012
        177462. 28387 2013
        177463. 28387 2014
        178329. 29021 2013
        --------------------
        178330. 29021 2014
        181712. 29236 2013
        181713. 29236 2014
        202517. 31455 2010
        202518. 31455 2011
        --------------------
        202519. 31455 2012
        202520. 31455 2013
        202521. 31455 2014
        237724. 37005 2010
        237725. 37005 2011
        --------------------
        237726. 37005 2012
        237727. 37005 2013
        237728. 37005 2014
        270656. 42019 2013
        270657. 42019 2014
        --------------------
        287857. 45265 2010
        287858. 45265 2011
        287859. 45265 2012
        287860. 45265 2013
        287861. 45265 2014
        --------------------
        330052. 52376 2011
        330053. 52376 2012
        330054. 52376 2013
        330055. 52376 2014
        334300. 53135 2009
        --------------------
        334301. 53135 2010
        334302. 53135 2011
        334303. 53135 2012
        334304. 53135 2013
        334305. 53135 2014
        --------------------
        358661. 56229 2009
        358662. 56229 2010
        358663. 56229 2011
        358664. 56229 2012
        358665. 56229 2013
        --------------------
        358666. 56229 2014
        389435. 60099 2013
        389436. 60099 2014
        413997. 62477 2010
        413998. 62477 2011
        --------------------
        413999. 62477 2012
        414000. 62477 2013
        414001. 62477 2014
        424197. 63181 2010
        424198. 63181 2011
        --------------------
        424199. 63181 2012
        424200. 63181 2013
        424201. 63181 2014
        496294. 73116 2010
        496295. 73116 2011
        --------------------
        496296. 73116 2012
        496297. 73116 2013
        496298. 73116 2014
        497994. 73231 2010
        497995. 73231 2011
        --------------------
        497996. 73231 2012
        497997. 73231 2013
        497998. 73231 2014
        500140. 74045 2014
        502788. 74220 2010
        --------------------
        502789. 74220 2011
        502790. 74220 2012
        502791. 74220 2013
        502792. 74220 2014
        513860. 76601 1998
        --------------------
        513861. 76601 1999
        513862. 76601 2000
        513863. 76601 2001
        513864. 76601 2002
        513865. 76601 2003
        --------------------
        513866. 76601 2004
        513867. 76601 2005
        513868. 76601 2006
        513869. 76601 2007
        513870. 76601 2008
        --------------------
        513871. 76601 2009
        513872. 76601 2010
        513873. 76601 2011
        513874. 76601 2012
        513875. 76601 2013
        --------------------
        513876. 76601 2014
        527643. 78401 2011
        527644. 78401 2012
        527645. 78401 2013
        527646. 78401 2014
        --------------------
        529412. 78683 2012
        529413. 78683 2013
        529414. 78683 2014
        546908. 80791 1998
        546909. 80791 1999
        --------------------
        546910. 80791 2000
        546911. 80791 2001
        546912. 80791 2002
        546913. 80791 2003
        546914. 80791 2004
        --------------------
        546915. 80791 2005
        546916. 80791 2006
        546917. 80791 2007
        546918. 80791 2008
        546919. 80791 2009
        --------------------
        546920. 80791 2010
        546921. 80791 2011
        546922. 80791 2012
        546923. 80791 2013
        546924. 80791 2014
        --------------------
        551155. 81217 2012
        551156. 81217 2013
        551157. 81217 2014
        552075. 81271 2014
        555743. 82161 2010
        --------------------
        555744. 82161 2011
        555745. 82161 2012
        555746. 82161 2013
        555747. 82161 2014
        558245. 83114 2013
        --------------------
        558246. 83114 2014
        587689. 89342 2013
        587690. 89342 2014
        588012. 89367 2013
        588013. 89367 2014
        --------------------
        588962. 89431 2011
        588963. 89431 2012
        588964. 89431 2013
        588965. 89431 2014
        +--------------------+
        <CODE>



        Yet, I am greatly interested in knowing how you did things differently.

        Is there anything I could refer to understand your methodology?
        Or, if possible, would you give a brief explanation of why you did the things you did.

        Best,
        Pranav
        Last edited by Pranav Garg; 25 Jul 2017, 08:35.

        Comment


        • #5
          I didnt run anything differently - I dont get the error at the end. You've run something differently (at least based on your output - the id's are larger than 1000 which is the limit set in my code, so you could not have run my exact code). -isid- checks whether the specified variables uniquely identify the observations. -help isid- gives more information about the functionality but I included this to show that the fake data example I used creates a rectangularized dataset. You can explore this further with commands like -duplicates-, e.g.:


          Code:
          isid COD YR ZE
          
          xtset COD YR
          duplicates report COD YR ZE
          duplicates report COD YR 
          bys COD (YR): g numobs = _N
          ta numobs
          xtsum ZE
          I've annotated my code below - hopefully this helps answer some of your questions, but if not please ask specific questions.


          Code:
          **1.this part adds extra rows to the end of the dataset. `=_N+1' adds one obs to the dataset and then the -replace- line replaces that new observation with one of the years you want to fill (starting with 1998):
          
          forval j = 1998/2007 {
              set obs `=_N+1'
              replace YR = `j' in l
              }
              l in -15/l  //here are the new blank records
          
          
           **2. next you want to rectangularize the dataset (make all panels of CODGEO YR have all the years from 1998 to 2012) 
          ** this command will add observations so that each CODGEO has new observations for all possible values/combinations of YR. 
          
          *get fillin from SSC
          cap which fillin
          if _rc ssc install fillin, replace
          
          
          fillin CODGEO YR
          sort COD YR
          
          bys CODGEO (YR): g obs = _n
          ta obs
          drop if mi(CODGEO)
          
          **Next, I use carry forward  from SSC to fill ZE across each panel of CODGEO & YR from earlier, nonmissing observations. You could use other methods (search for help on explicit subscripting or you could use -egen- functions like max() ). In order to fill or carry forward from non-missing observations, I need to reverse the sort for each panel since the nonmissing observations of YR occur later in time -- so I create a reversely coded YR var 'revyear' and then sort off this variable during the carry forward.
          
          cap which carryforward
          if _rc ssc install carryforward, replace
          
          g revyear = 2012-YR
          
           
          bys CODGEO (revyear):  carryforward ZE , replace
          
          drop revyear obs _fillin
          
          sort COD YR 
          
          isid COD YR ZE
          ​​​​​​​
          Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

          Comment

          Working...
          X