Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Defining variables where only measure the change in the distance

    Hi everyone,

    I'm having a trouble creating a variable to measure changes in distance overtime using a panel data. For example: I create a var, name base12. This var contains information about distance of a household to the main road. In this case, we only pay attention to households whose distance is from 0~1km in 2012. Now I want to create another var, base14 for example. This var, base14 contains households whose characteristics are: having a distance of 0~1km in 2012 but somehow having a distance significantly larger than 1km (maybe 2 or 3km because the household moved away) in 2014.

    I have tried as follow, yet it's not no effective and cannot satisfy my needs for a variable.

    bysort ID: gen base12=dist_b if (inrange(dist_b,0,1)) & YR==2012
    drop if base12==.
    bysort base12: gen base14=dist_b if dist_b>1
    replace base14=0 if base14==.

    I have made an example of my data as follow:

    input float ID int YR float dist_b
    1 2012 0
    2 2014 2
    3 2012 1
    4 2014 9
    5 2012 9
    1 2014 2
    2 2012 8
    3 2014 10
    4 2012 1
    5 2014 4
    1 2012 1
    2 2014 4
    3 2012 7
    4 2014 8
    5 2012 6
    1 2014 6
    2 2012 7
    3 2014 10
    4 2012 1
    5 2014 3
    1 2012 1
    2 2014 9
    3 2012 4
    4 2014 1
    5 2012 8
    I appreciate any of your help.
    All the best.

  • #2
    Hi Hoang,

    Please copy and paste the data dataex gave you without pasting it in excel or other softwares first. dataex is explicit on what it wants you to copy to share your data: please follow its instructions and use code delimiters.

    All ID's had their distance to the road measured more than once per year. Which of those measurements do you want to use (or if any of these measurements is <1, would that be enough for base12 to be 1?)

    Comment


    • #3
      Here is a solution to your problem. I don't know what criteria you use to choose multiple observations of ID's within years, so I coded in a manner such that all observations are kept. I converted your data from long to wide.

      Code:
      clear
      input float ID int YR float dist_b
      1 2012 0
      2 2014 2
      3 2012 1
      4 2014 9
      5 2012 9
      1 2014 2
      2 2012 8
      3 2014 10
      4 2012 1
      5 2014 4
      1 2012 1
      2 2014 4
      3 2012 7
      4 2014 8
      5 2012 6
      1 2014 6
      2 2012 7
      3 2014 10
      4 2012 1
      5 2014 3
      1 2012 1
      2 2014 9
      3 2012 4
      4 2014 1
      5 2012 8
      end
      
      *your database is in long format, converted it to wide - now measurements of a single ID within
      *a year are not new lines, but new variables. Variables that measure the distance now contain the
      *year and the number of the measurement within that year
      tostring YR, replace
      bysort ID YR: gen n = _n
      tostring n, replace
      gen YR_n =  YR+"_"+ n
      drop YR n
      reshape wide dist_b, i(ID) j(YR_n) string
      
      *creating base12 as 1 if any measurement of 2012 is <=1. If you want all the measurements to be <=1, replace | by &
      gen base12 = 1 if (dist_b2012_1 <=1) | (dist_b2012_2 <=1) | (dist_b2012_3 <=1)
      
      *creating base14 as 1 if base12==1 and if the smallest measurement of 2014 is >= 2 while not being
      *missing (Stata interprets . as positive infinity, so coding >2 includes missing. If you have a house
      *without measurements in 2014, the var max2014 would be . and just specifiying min2014 >= 2 would
      *return as 1 in the observations with all missing values for 2014)
      egen min2014 = rowmin(dist_b2014_1 dist_b2014_2 dist_b2014_3)
      gen base14 = 1 if base12 == 1 & (min2014 >= 2 & min2014 != .)
      This might help you solve the problem. Good luck!

      Comment


      • #4
        Hi Igor,

        Thank you for your explanation. I changed the data a little bit in excel since my data is a panel data and I don't know how to make a sample to illustrate what I want to change.

        Please forgive my demanding. Can you show me how to generate base 12 = if we take into account all range of the ID? Because I have around 3000 observations and looking for a one by one change is kinda encumberance.

        Comment


        • #5
          Hoang, my suggestion above does that - please try it. It transform all rows of ID's into a single line and then assign each of them (doesn't matter if it's 3000 rows) a value of base12 based on the criteria you wanted.

          If you would rather keep your data on long format, the commands below creates base12 and base14 according to what you specified and return the data to long, keeping all observations (rows) for each ID. You did not specify how do you plan on dealing with situations in which there are more than one measurement per year - i.e., for base12, all obs must be <1 or is a single one enough?

          Code:
          clear
          input float ID int YR float dist_b
          1 2012 0
          2 2014 2
          3 2012 1
          4 2014 9
          5 2012 9
          1 2014 2
          2 2012 8
          3 2014 10
          4 2012 1
          5 2014 4
          1 2012 1
          2 2014 4
          3 2012 7
          4 2014 8
          5 2012 6
          1 2014 6
          2 2012 7
          3 2014 10
          4 2012 1
          5 2014 3
          1 2012 1
          2 2014 9
          3 2012 4
          4 2014 1
          5 2012 8
          end
          tostring YR, replace
          bysort ID YR: gen n = _n
          tostring n, replace
          gen YR_n = YR +"_"+ n
          drop YR n
          reshape wide dist_b, i(ID) j(YR_n) string
          gen base12 = 1 if (dist_b2012_1 <=1) | (dist_b2012_2 <=1) | (dist_b2012_3 <=1)
          egen min2014 = rowmin(dist_b2014_1 dist_b2014_2 dist_b2014_3)
          gen base14 = 1 if base12 == 1 & (min2014 >= 2 & min2014 != .)
          reshape long dist_b, i(ID) j(YR_n) string
          drop if dist_b==.
          gen YR = substr(YR_n, 1,4)
          drop YR_n min2014
          destring YR, replace
          order ID YR dist_b base12 base14

          Comment

          Working...
          X