  • Create new dummy variable based on identifiers of 2 other variables

    Hi!
    I'm currently working with data of this form:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(c1 c2 zone out_zone)
    1 1 100 0
    1 2 100 0
    1 3 100 1
    2 1 100 0
    2 2 100 0
    2 3 100 1
    3 1 200 1
    3 2 200 1
    3 3 200 1
    end
    I have c1 and c2, which are county IDs. zone identifies the commuting zone of the county in c1. In this case, counties 1 and 2 are in the same commuting zone (100), while county 3 is in commuting zone 200. I'd like to create the out_zone variable, a dummy equal to 0 if the two counties are in the same commuting zone and 1 if they are not. For example, here, out_zone is 0 when the rows contain combinations of counties 1 and 2, but out_zone is 1 when rows contain combinations of county 3 and the other two.

    To create the out_zone variable, my initial thought was to write a loop that builds one macro per value of zone (adding every c1 with the same zone number to the same macro). Then I could generate out_zone, setting it to 0 if c1 and c2 are in the same macro. However, I'm not quite sure how to do this, and it seems to me that there should be a simpler way.
    Any help would be greatly appreciated!
    Last edited by Dhruva Jaishankar; 06 Mar 2022, 12:02.

  • #2
    No loop is needed. If you are using Stata 16 or later, the following should start you in a useful direction.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(c1 c2 zone out_zone)
    1 1 100 0
    1 2 100 0
    1 3 100 1
    2 1 100 0
    2 2 100 0
    2 3 100 1
    3 1 200 1
    3 2 200 1
    3 3 200 1
    end
    
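    * drop any leftover frame named zones; capture ignores the error if none exists
    capture frame drop zones
    * copy c1 and zone for the first observation of each c1 into a new frame
    by c1 (c2), sort: frame put c1 zone  if _n==1, into(zones)
    * link each observation's c2 to the matching c1 in the zones frame
    frlink m:1 c2, frame(zones c1) generate(zonelink)
    * bring back the zone of county c2 as zone2
    frget zone2 = zone, from(zonelink)
    drop zonelink
    * 1 if the two counties are in different zones, 0 otherwise
    generate wanted = zone2!=zone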
    list, sepby(c1)
    Code:
    . list, sepby(c1)
    
         +--------------------------------------------+
         | c1   c2   zone   out_zone   zone2   wanted |
         |--------------------------------------------|
      1. |  1    1    100          0     100        0 |
      2. |  1    2    100          0     100        0 |
      3. |  1    3    100          1     200        1 |
         |--------------------------------------------|
      4. |  2    1    100          0     100        0 |
      5. |  2    2    100          0     100        0 |
      6. |  2    3    100          1     200        1 |
         |--------------------------------------------|
      7. |  3    1    200          1     100        1 |
      8. |  3    2    200          1     100        1 |
      9. |  3    3    200          1     200        0 |
         +--------------------------------------------+
    If you're using an older version, the following produces the same results without using frames.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(c1 c2 zone out_zone)
    1 1 100 0
    1 2 100 0
    1 3 100 1
    2 1 100 0
    2 2 100 0
    2 3 100 1
    3 1 200 1
    3 2 200 1
    3 3 200 1
    end
    
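    preserve
    
    * build a lookup with one observation per county: its ID and zone
    by c1 (c2), sort: keep if _n==1
    keep c1 zone
    * rename so the lookup is keyed on c2 and carries that county's zone as zone2
    rename (c1 zone) (c2 zone2)
    tempfile zone
    save `zone'
    
    restore
    
    * attach the zone of county c2 to every observation
    merge m:1 c2 using `zone', nogen
    sort c1 c2
    * 1 if the two counties are in different zones, 0 otherwise
    generate wanted = zone2!=zone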
    list, sepby(c1)
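    As a quick check, assuming the hand-coded out_zone from the example is still in memory, the computed dummy can be compared with it. This is only a sketch. On the example data the two disagree in the row where c1 and c2 are both 3, where the listing above shows zone and zone2 both equal to 200.

    ```stata
    * count the observations where the computed dummy differs from out_zone
    count if wanted != out_zone
    * list those observations to see which county pairs disagree
    list c1 c2 zone zone2 out_zone wanted if wanted != out_zone
    ```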
    Last edited by William Lisowski; 06 Mar 2022, 12:26.


    • #3
      Hi William,
      Thanks very much, the second code snippet worked! I am using Stata 16 but the first one doesn't quite seem to work. I receive an error for the second line saying that "if is not allowed" and then that "frame zones not found". For future reference, am I meant to create a frame called zones first or is there something else I need to add for the code to work?


      • #4
        Your copy of Stata 16 has not been updated to the latest version. The ability to use a variable list and an if together on the frame put command was added in the 18 Feb 2020 update to Stata 16.1.

        Stata updates are free (as opposed to Stata upgrades to a newer version, e.g. to Stata 17). You should review Chapter 19 of the Getting Started with Stata PDF included in your Stata installation and accessible from Stata's Help menu for information about keeping your copy of Stata as updated as possible, so that you have the best possible experience for the version you have.
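
        For reference, a minimal sketch of checking this from the Command window. These are standard Stata commands, but what they report depends on your installation and on Stata being able to reach the internet.

        ```stata
        * show the version and revision date of the running copy of Stata
        about
        * compare the installed files with the updates currently available
        update query
        * download and install any available updates
        update all
        ```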


        • #5
          Oh I see, thank you very much!
