Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating an indicator variable for ever exporting to the US

    Hello, I have data which looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year long madn float(c1_export_destination_vtcs c2_export_destination_vtcs c3_export_destination_vtcs)
    2009  14    .    .    .
    2010  14 1311 1304    .
    2011  14    .    .    .
    2012  14    .    .    .
    2013  14    .    .    .
    2014  14    .    .    .
    2015  14    .    .    .
    2016  14    .    .    .
    2017  14    .    .    .
    2009  41    .    .    .
    2010  41    .    .    .
    2011  41    .    .    .
    2012  41    .    .    .
    2013  41    .    .    .
    2014  41    .    .    .
    2015  41    .    .    .
    2016  41    .    .    .
    2017  41    .    .    .
    2009  76    .    .    .
    2010  76    .    .    .
    2011  76    .    .    .
    2012  76 1307 1107    .
    2013  76    .    .    .
    2014  76    .    .    .
    2015  76    .    .    .
    2016  76    .    .    .
    2017  76    .    .    .
    2009  81    .    .    .
    2010  81    .    .    .
    2011  81    .    .    .
    2012  81    .    .    .
    2013  81    .    .    .
    2014  81    .    .    .
    2015  81    .    .    .
    2016  81    .    .    .
    2017  81    .    .    .
    2009 100    .    .    .
    2010 100    .    .    .
    2011 100    .    .    .
    2012 100    .    .    .
    2013 100    .    .    .
    2014 100    .    .    .
    2015 100    .    .    .
    2016 100    .    .    .
    2017 100    .    .    .
    2009 297    .    .    .
    2010 297    .    .    .
    2011 297    .    .    .
    2012 297    .    .    .
    2013 297    .    .    .
    2014 297    .    .    .
    2015 297    .    .    .
    2016 297    .    .    .
    2017 297    .    .    .
    2009 307    .    .    .
    2010 307    .    .    .
    2011 307    .    .    .
    2012 307    .    .    .
    2013 307    .    .    .
    2014 307    .    .    .
    2015 307    .    .    .
    2016 307    .    .    .
    2017 307    .    .    .
    2009 319    .    .    .
    2010 319    .    .    .
    2011 319    .    .    .
    2012 319    .    .    .
    2013 319    .    .    .
    2014 319    .    .    .
    2015 319    .    .    .
    2016 319    .    .    .
    2017 319    .    .    .
    2009 330    .    .    .
    2010 330    .    .    .
    2011 330    .    .    .
    2012 330    .    .    .
    2013 330    .    .    .
    2014 330    .    .    .
    2015 330    .    .    .
    2016 330    .    .    .
    2017 330    .    .    .
    2009 339    .    .    .
    2010 339    .    .    .
    2011 339    .    .    .
    2012 339    .    .    .
    2013 339    .    .    .
    2014 339    .    .    .
    2015 339 1320 1109 2208
    2016 339 1320 1109 1307
    2017 339 1307 1320 1311
    2009 343    .    .    .
    2010 343    .    .    .
    2011 343    .    .    .
    2012 343    .    .    .
    2013 343    .    .    .
    2014 343    .    .    .
    2015 343    .    .    .
    2016 343    .    .    .
    2017 343    .    .    .
    2009 344    .    .    .
    end

    This is a small sample but it captures what is required. We have the 'year' variable, the 'madn' variable is used as a firm identifier, for example madn = 14, corresponds to one firm, here we observe the same firms in different years. 'c*_export_destination_vtcs' are variables containing the firms top 3 export destinations for that year (countries are assigned a 4-digit numeric code), for example c1_export_destination_vtcs = 1320, means that in that year the firms top exporting destination was the country assigned '1320'. There are a lot of missing values here either because the firm did not export to 3 or more countries or did not export at all.

    I want to create an indicator variable for wether a firm had ever exported to the US, the country code assigned to the US is 4101. So I want this variable to take the value 1 if the firm ever exported to the US, in other words if 'c1_export_destination_vcts' or 'c2...' or 'c3....' was ever equal to '4101' for a certain firm. How can I implement this?

    Thanks,
    Jad


  • #2
    There are (at least) two ways to do this.

    Code:
    //    METHOD 1: WORK WITH THE EXISTING LAYOUT
    foreach v of varlist c*_export_destination_vtcs {
        gen byte us_`v' = (`v' == 4101)
    }
    egen us_this_madn_this_year = rowmax(us_*)
    by madn (year), sort: egen us_this_madn_ever = max(us_this_madn_this_year)
    
    
    
    
    //    METHOD 2: GO TO FULLY LONG DATA LAYOUT
    isid madn year, sort
    reshape long c@_export_destination_vtcs, i(madn year) j(rank)
    by madn (year rank): egen us_this_madn_ever = max(c_export_destination_vtcs == 4101)
    reshape wide // OPTIONALLY RETURN TO WIDE LAYOUT
    Note: Don't do these one after the other. Do one, then restore the original data, and try the other, if you want to try them both.

    Picking between them depends on several factors. The first method's code is more convoluted and difficult to follow, and it wastes memory. If your full data set is already pushing the limits of accessible memory, the second would be clearly preferable. The second method, however, will take longer in a very large data set. On the other hand, depending on what you want to do after this, you might be better off with the fully long data layout and you can cut down the time required for this method by skipping the -reshape wide- at the end. If your original data set is not especially large, I would lean to method 2 because of the simpler code.

    By the way, your example data never uses code 4101, so this code is not completely tested. In the future, when posting example data, please select an example that fully illustrates the problem you seek to solve.
    Last edited by Clyde Schechter; 19 Mar 2024, 11:50.

    Comment


    • #3
      Under Clyde Schechter's first approach, note also the any* functions of egen.

      Comment

      Working...
      X