Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing variable entries based on ranking/ordering of entries in other variables

    Hello,

    (in the table below, each row corresponds to a different firm)

    I have 9 variables here, c1, c2 and c3, which correspond to the share of exports going to country 1, 2 and 3 (assuming the firm only exports to 3 countries or less so these percentages add to 100).
    c1_code is the country code for the country 1, c2_code and c3_code are the codes for countries 2 and 3. c1_year, c2_year and c3_year are the year at which the firm started exporting to each of the countries.
    (note: there are cases where the firm only exports to one or two countries, hence the missing values, but for simplicity lets say the max number of countries is 3 so these percentages always add to 100).

    The significance of ordering the countries 1, 2 and 3, is that the questionnaire was asking firms to rank the countries in order from highest to lowest share of exports going to that country. So essentially c1 should be larger than c2 which should be larger than c3.
    c1 c2 c3 c1_code c2_code c3_code c1_year c2_year c3_year
    100 . . 1308 . . 2008 . .
    35 20 45 1765 1308 2247 1979 2002 2000
    50 50 . 5099 1147 . 2003 2004 .
    My problem here as you can see in the table below, in a few rare cases (such as row 2) the firm did not order the countries correctly, so I want to reorganize the values so that the ordering is consistent. so row 2 should look something like this:
    c1 c2 c3 c1_code c2_code c3_code c1_year c2_year c3_year
    45 35 20 2247 1765 1308 2000 1979 2002
    Note how I'm also trying to reorder the other variable entries in the 'code' and 'year' variables which correspond to the initial incorrect country ordering. So those will also be reordered accordingly.

    How can I achieve this on a large scale? I have many instances of row 2, although they are quiet rare.

    Thanks,
    Jad




  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(c1 c2 c3) int(c1_code c2_code c3_code c1_year c2_year c3_year)
    100  .  . 1308    .    . 2008    .    .
     35 20 45 1765 1308 2247 1979 2002 2000
     50 50  . 5099 1147    . 2003 2004    .
    end
    
    gen `c(obs_t)' obs_no = _n
    reshape long c c@_code c@_year, i(obs_no)
    gsort obs_no -c _j
    by obs_no: replace _j = _n
    reshape wide
    That said, most data management and analysis in Stata is easier, or only possible, in long data layout. So you might want to skip the -reshape wide- at the end and keep your data in the long layout. It will probably make whatever you do next easier.

    In the future, when showing data examples, please use the -dataex- command to do so as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment

    Working...
    X