Replacing variable entries based on ranking/ordering of entries in other variables

Jad Tamimi

Join Date: Sep 2023

Posts: 115
#1

Replacing variable entries based on ranking/ordering of entries in other variables

07 Mar 2024, 16:08

Hello,

(in the table below, each row corresponds to a different firm)

I have 9 variables here, c1, c2 and c3, which correspond to the share of exports going to country 1, 2 and 3 (assuming the firm only exports to 3 countries or less so these percentages add to 100).
c1_code is the country code for the country 1, c2_code and c3_code are the codes for countries 2 and 3. c1_year, c2_year and c3_year are the year at which the firm started exporting to each of the countries.
(note: there are cases where the firm only exports to one or two countries, hence the missing values, but for simplicity lets say the max number of countries is 3 so these percentages always add to 100).

The significance of ordering the countries 1, 2 and 3, is that the questionnaire was asking firms to rank the countries in order from highest to lowest share of exports going to that country. So essentially c1 should be larger than c2 which should be larger than c3.

c1 c2 c3 c1_code c2_code c3_code c1_year c2_year c3_year

100 . . 1308 . . 2008 . .

35 20 45 1765 1308 2247 1979 2002 2000

50 50 . 5099 1147 . 2003 2004 .

My problem here as you can see in the table below, in a few rare cases (such as row 2) the firm did not order the countries correctly, so I want to reorganize the values so that the ordering is consistent. so row 2 should look something like this:

c1 c2 c3 c1_code c2_code c3_code c1_year c2_year c3_year

45 35 20 2247 1765 1308 2000 1979 2002

Note how I'm also trying to reorder the other variable entries in the 'code' and 'year' variables which correspond to the initial incorrect country ordering. So those will also be reordered accordingly.

How can I achieve this on a large scale? I have many instances of row 2, although they are quiet rare.

Thanks,
Jad
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

07 Mar 2024, 17:37

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte(c1 c2 c3) int(c1_code c2_code c3_code c1_year c2_year c3_year) 100 . . 1308 . . 2008 . . 35 20 45 1765 1308 2247 1979 2002 2000 50 50 . 5099 1147 . 2003 2004 . end gen `c(obs_t)' obs_no = _n reshape long c c@_code c@_year, i(obs_no) gsort obs_no -c _j by obs_no: replace _j = _n reshape wide

That said, most data management and analysis in Stata is easier, or only possible, in long data layout. So you might want to skip the -reshape wide- at the end and keep your data in the long layout. It will probably make whatever you do next easier.

In the future, when showing data examples, please use the -dataex- command to do so as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment

c1	c2	c3	c1_code	c2_code	c3_code	c1_year	c2_year	c3_year
100	.	.	1308	.	.	2008	.	.
35	20	45	1765	1308	2247	1979	2002	2000
50	50	.	5099	1147	.	2003	2004	.

c1	c2	c3	c1_code	c2_code	c3_code	c1_year	c2_year	c3_year
45	35	20	2247	1765	1308	2000	1979	2002

Announcement

Replacing variable entries based on ranking/ordering of entries in other variables

Comment