merging data sets - Statalist

Clyde Schechter

#31

20 Mar 2024, 17:38

Why has the bico variable now disappeared from the master data set? This makes the data sets even less compatible for merging.

Here's the problem you are facing. Your master data set's observations are indexed by an exporter, importer, (bico?), year, and month. In the using data set, the observations are indexed by an exporter, importer, bico (for sure), year, and product category (name/number). To put these together, without aggregating up the data, you would need some rules that decide which month in the master data gets matched with which product in the using data. This seems highly implausible, and I suspect impossible to do even if it seemed to make sense.

Perhaps the solution is to aggregate the data in the using data set up to one observation per exporter-importer-bico-year combination by averaging the tariffs on the different product categories in some way. (I've already checked: the tariffs differ across product categories even when the exporter, importer, bico, and year are all the same.) So you would need to decide how to weight the different product categories (perhaps by volume traded, which you would have to get from yet another data set since it doesn't appear here?) to make this work.
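For illustration only, here is a minimal sketch of that aggregation step on a tiny made-up data set. It assumes the tariff variable is called tariff and the product variable prodno (the names used later in this thread), and it assumes a hypothetical weight variable trade_volume, which, as noted, would have to come from another data set. Whether trade-weighting is the right choice is an economics question, not a Stata one.

```stata
* Hypothetical toy tariff data: two product categories for one
* exporter-importer-bico-year cell. trade_volume is an ASSUMED weight
* variable; it does not exist in the data shown in this thread.
clear
input str3 exporter str3 importer int bico int year int prodno double tariff double trade_volume
"USA" "NGA" 1 2020 1  5 100
"USA" "NGA" 1 2020 2 10 300
end

* Collapse to one observation per exporter-importer-bico-year,
* averaging tariffs with analytic weights proportional to trade_volume
collapse (mean) tariff [aweight = trade_volume], by(exporter importer bico year)
list
```

Here the weighted mean is (5*100 + 10*300)/400 = 8.75. Dropping the [aweight = ...] part gives a simple unweighted average instead.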

Another possibility is to match every product in the using data set to every month in the master data set that has matching values for exporter, importer, (bico?), and year. That will make the combined data set very large, with its observations identified by unique combinations of exporter, importer, year, month, and product group. Is that what you want? If so, the code is:

Code:

use master_data, clear
joinby exporter importer bico year using using_data

Note: In this code I assume that bico really is still in your master data set. If it's not, just remove it from the -joinby- command.
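To see why the result grows so fast, here is a small sketch on made-up data (all values, and the variables month and tot_value, are hypothetical stand-ins for the real master variables). Two months in the master and three product groups in the using for the same key produce every pairing.

```stata
* Hypothetical toy "using" data: three product groups (prodno) for one
* exporter-importer-bico-year combination
clear
input str3 exporter str3 importer int bico int year int prodno double tariff
"USA" "NGA" 1 2020 1  5
"USA" "NGA" 1 2020 2 10
"USA" "NGA" 1 2020 3  7
end
tempfile using_toy
save `using_toy'

* Hypothetical toy "master" data: two months for the same combination
clear
input str3 exporter str3 importer int bico int year int month double tot_value
"USA" "NGA" 1 2020 1 50
"USA" "NGA" 1 2020 2 60
end

* Pair every month with every product group that shares the same key
joinby exporter importer bico year using `using_toy'

* 2 months x 3 products = 6 observations; the key plus month and prodno
* uniquely identifies them (isid exits with an error otherwise)
isid exporter importer bico year month prodno
list, sepby(month)
```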

Again, I don't know if either of these approaches will get you what you want. But I don't see any other possibilities.
Odiri Metieh

#32

20 Mar 2024, 22:09

Clyde Schechter You were right to assume that bico still exists in the master dataset. My apologies: there are over 100 variables in that dataset, and I sometimes forget to include the important ones when running dataex.

This joinby approach did a much better job than the previous one; thank you for taking the time to provide options! The only problem is that it dropped the master observations whose bico codes had no matching tariff value in the using dataset, which reduced my observations from 9,807 to 3,826 (partly also because my using dataset only has observations up to 2021). This brings me to my last request on this, please. Using the exporter-importer-bico-year combination, is it possible to extrapolate a 2022 tariff value from the previous years? Or could I just assign the 2021 value to 2022 for that combination, since tariffs do not change significantly over a short period of time?

I'm really sorry for all the trouble.

Thank you so much again! I'm very grateful.
Clyde Schechter

#33

20 Mar 2024, 23:27

The -joinby- command has an -unmatched()- option that lets you decide how observations found in one data set but not the other are handled. So if you want to keep the observations in the master, whether or not they have a match in the using, but not the other way around, add the -unmatched(master)- option to your -joinby- command.
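As a quick illustration on made-up data (the values and the month variable are hypothetical): with -unmatched(master)-, a master observation whose bico has no tariff in the using data is kept with tariff missing. When -unmatched()- is specified, -joinby- also creates a _merge variable (1 = master only, 3 = both), which is handy for counting how many observations went unmatched.

```stata
* Hypothetical toy tariff data with a tariff for bico 1 only
clear
input str3 exporter str3 importer int bico int year double tariff
"USA" "NGA" 1 2020 5
end
tempfile tariffs_toy
save `tariffs_toy'

* Hypothetical toy master data: bico 2 has no tariff in the using data
clear
input str3 exporter str3 importer int bico int year int month
"USA" "NGA" 1 2020 1
"USA" "NGA" 2 2020 1
end

* Keep unmatched master observations; _merge records where each came from
joinby exporter importer bico year using `tariffs_toy', unmatched(master)
tabulate _merge
list
```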

As for imputing a 2022 tariff value, there are many ways to do it. Carrying forward the 2021 observation is one way. Another would be to project the trend from several years before 2022. And there are more complicated ways as well. It's not really a statistical issue: it's a trade economics issue and if you aren't sufficiently expert in that field yourself to decide what to do, you need to consult somebody who is. I have no special knowledge in economics and can't advise you on this. Once you settle on a specific approach, though, if you want help implementing it in Stata, I'll try to help.
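For concreteness, one way the carry-forward option could be done in the tariff data set is sketched below on toy data. It assumes the tariff data has no 2022 observations at all and uses the variable names tariff and prodno; the values are made up.

```stata
* Hypothetical toy tariff data that ends in 2021
clear
input str3 exporter str3 importer int bico int prodno int year double tariff
"USA" "NGA" 1 1 2020 5
"USA" "NGA" 1 1 2021 6
end

* Duplicate each 2021 observation; the copy (is_copy == 1) is relabeled
* as 2022 and keeps the 2021 tariff, i.e., the 2021 value is carried forward
expand 2 if year == 2021, generate(is_copy)
replace year = 2022 if is_copy
drop is_copy
sort exporter importer bico prodno year
list
```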
Odiri Metieh

#34

21 Mar 2024, 08:38

Clyde Schechter Yes, I would like to keep the observations in the master whether or not they have a match in the using. I think I am making an error with the syntax; could you please correct it for me? Thank you.

Code:

use "MR_trade_tariff.dta", clear
joinby exporter importer bico year using tariff_dataset.dta, unmatched(master)
save "tariffmerged1.dta", replace

Also, for the 2022 tariff value, projecting the trend from prior years is the first option I was given. If that did not work, then assigning the 2021 value was the alternative. I would appreciate your help with implementing this in Stata.

Thank you!
Clyde Schechter

#35

21 Mar 2024, 11:12

I don't see anything wrong with that syntax. What is going wrong when you run it? Are you getting error messages? If so, show them. Or is it running without error messages, but not giving you the result you expect? In that case, show what you are getting and explain how it differs from what you want.

I don't understand your data set names. The names both refer to tariff, but only the using data set has any tariff information. So perhaps you have interchanged the two data sets, and you are getting a result that has all of the observations from the data set with the tariffs but only the matching ones from the one with all those tot_* variables.

If you want to extrapolate a linear trend from, say, years 2019 through 2021 to 2022, try:

Code:

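* linear inter/extrapolation of tariff on year within each combination, using 2019 onward
by exporter importer bico prodno (year), sort: ipolate tariff year if year >= 2019, gen(new_tariff) epolate
* fill only the missing 2022 values, then discard the helper variable
replace tariff = new_tariff if year == 2022 & missing(tariff)
drop new_tariff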

Note: Do this in the tariff data set before you join it to the master data set.
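One caveat worth checking: -ipolate- only fills in observations that already exist. If, as mentioned earlier in the thread, the tariff data stops at 2021, there may be no 2022 rows for it to fill. A minimal sketch on made-up data, assuming empty 2022 rows are created first with -expand- (the values are hypothetical and chosen to lie on a straight line):

```stata
* Hypothetical toy tariff data for 2019-2021
clear
input str3 exporter str3 importer int bico int prodno int year double tariff
"USA" "NGA" 1 1 2019 4
"USA" "NGA" 1 1 2020 5
"USA" "NGA" 1 1 2021 6
end

* Create an empty 2022 observation for each combination observed in 2021
expand 2 if year == 2021, generate(is_new)
replace year = 2022 if is_new
replace tariff = . if is_new
drop is_new

* Linear extrapolation within each exporter-importer-bico-prodno group
by exporter importer bico prodno (year), sort: ipolate tariff year if year >= 2019, gen(new_tariff) epolate
replace tariff = new_tariff if year == 2022 & missing(tariff)
drop new_tariff
list
```

With these toy values the extrapolated 2022 tariff is 7.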
Odiri Metieh

#36

21 Mar 2024, 16:17

I don't get any error messages, nothing at all. This is all that shows when I run the code:

. joinby exporter importer bico year using tariff_dataset.dta, unmatched(master)

.
end of do-file

I didn't change any information in the dataset; the name "MR_trade_tariff" was just to differentiate it from another dataset I have called "MR_trade". You are correct: only the using dataset (tariff_dataset) still has the tariff information.
I was also able to do the extrapolation for 2022 with the code you gave. Thank you! Clyde Schechter