Hello,
(in the table below, each row corresponds to a different firm)
I have 9 variables here, c1, c2 and c3, which correspond to the share of exports going to country 1, 2 and 3 (assuming the firm only exports to 3 countries or less so these percentages add to 100).
c1_code is the country code for the country 1, c2_code and c3_code are the codes for countries 2 and 3. c1_year, c2_year and c3_year are the year at which the firm started exporting to each of the countries.
(note: there are cases where the firm only exports to one or two countries, hence the missing values, but for simplicity lets say the max number of countries is 3 so these percentages always add to 100).
The significance of ordering the countries 1, 2 and 3, is that the questionnaire was asking firms to rank the countries in order from highest to lowest share of exports going to that country. So essentially c1 should be larger than c2 which should be larger than c3.
My problem here as you can see in the table below, in a few rare cases (such as row 2) the firm did not order the countries correctly, so I want to reorganize the values so that the ordering is consistent. so row 2 should look something like this:
Note how I'm also trying to reorder the other variable entries in the 'code' and 'year' variables which correspond to the initial incorrect country ordering. So those will also be reordered accordingly.
How can I achieve this on a large scale? I have many instances of row 2, although they are quiet rare.
Thanks,
Jad
(in the table below, each row corresponds to a different firm)
I have 9 variables here, c1, c2 and c3, which correspond to the share of exports going to country 1, 2 and 3 (assuming the firm only exports to 3 countries or less so these percentages add to 100).
c1_code is the country code for the country 1, c2_code and c3_code are the codes for countries 2 and 3. c1_year, c2_year and c3_year are the year at which the firm started exporting to each of the countries.
(note: there are cases where the firm only exports to one or two countries, hence the missing values, but for simplicity lets say the max number of countries is 3 so these percentages always add to 100).
The significance of ordering the countries 1, 2 and 3, is that the questionnaire was asking firms to rank the countries in order from highest to lowest share of exports going to that country. So essentially c1 should be larger than c2 which should be larger than c3.
c1 | c2 | c3 | c1_code | c2_code | c3_code | c1_year | c2_year | c3_year |
100 | . | . | 1308 | . | . | 2008 | . | . |
35 | 20 | 45 | 1765 | 1308 | 2247 | 1979 | 2002 | 2000 |
50 | 50 | . | 5099 | 1147 | . | 2003 | 2004 | . |
c1 | c2 | c3 | c1_code | c2_code | c3_code | c1_year | c2_year | c3_year |
45 | 35 | 20 | 2247 | 1765 | 1308 | 2000 | 1979 | 2002 |
How can I achieve this on a large scale? I have many instances of row 2, although they are quiet rare.
Thanks,
Jad
Comment