  • Merge two cross-section data

    Hello,

    I'm trying to combine two cross-sectional data sets using merge. I'm using merge because I want to create a balanced panel data set with keep(match). The data sets I'm working with contain the scores from a standardized test applied to schools, so each school has a unique "id", and I'd like to combine the data from different years to create a panel. The problem is that when I use the merge command, it doesn't add another row for the prior year's scores (the data set I'm adding); it keeps the scores from the master data set.

    Code:
    clear all
    
    global path "/Users/fernandobastidasespinoza/Desktop/Universidad/2022 S2/Tesis/Datos/bbdd_simce"
    cd "$path"
    
    use "$path/simce8b2019_rbd.dta"
    
    merge 1:1 rbd using "$path/simce8b2017_rbd_publica_final.dta", keep(match) force

    I'd really appreciate your help!

  • #2
    If the two data sets represent two different years of data but contain the same variables (school id and scores, etc.) then you shouldn't be -merge-ing them to create a panel data set, you should be -append-ing them.
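
    A minimal sketch of that -append- approach, using the file names from the original post. The variable names source and year are made up for illustration. The sketch assumes that neither file already has a variable called year, that rbd is numeric and identifies each school uniquely within a year, and that the score variables have the same names and storage types in both files (otherwise -append- will stop with an error rather than combine them):

    ```stata
    * Sketch only: "source" and "year" are hypothetical names, not from the original post
    use "$path/simce8b2019_rbd.dta", clear

    * Stack the 2017 observations under the 2019 ones;
    * generate() marks the origin: 0 = master (2019), 1 = using (2017)
    append using "$path/simce8b2017_rbd_publica_final.dta", generate(source)
    generate year = cond(source == 0, 2019, 2017)
    drop source

    * Balanced panel: keep only schools observed in both years
    * (assumes rbd is unique within each year)
    bysort rbd: keep if _N == 2

    * Declare the panel structure (assumes rbd is numeric)
    xtset rbd year
    ```

    After this, each school has two rows, one per year, which is the long layout that -xtset- and the -xt- commands expect.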

    Other thoughts: don't use -force- options. They will cause you to lose data, and if you aren't 1,000% sure that the data you are going to lose is of no importance, you are then heading into your analyses with data that is near guaranteed to produce incorrect results. The -force- option in -merge- (and also in -append-, by the way) serves to allow Stata to throw out data if a variable is stored as a string in one data set and numeric in the other. That kind of data loss is almost always seriously consequential, unless the variable(s) in question is (are) never going to be used in your analyses. But if that's the case, a safer solution is to just -drop- the variable(s) from both data sets before combining them. Then there will be no clash between them when using un-force-d commands.
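
    As a rough illustration of working without -force-, you can inspect the storage types in each file first and then resolve any clash explicitly. Here somevar is a hypothetical variable name, and the assumption that the 2017 file holds the string version is only for the sake of the example:

    ```stata
    * Sketch only: "somevar" is a hypothetical variable name
    * Inspect storage types in each file without loading the data
    describe using "$path/simce8b2019_rbd.dta"
    describe using "$path/simce8b2017_rbd_publica_final.dta"

    * Option 1: the clashing variable is not needed, so -drop- it from both files
    * Option 2: it is needed, so convert the string version to numeric
    use "$path/simce8b2017_rbd_publica_final.dta", clear
    destring somevar, replace   // left unchanged if it contains non-numeric text
    ```

    If -destring- refuses to convert the variable, that is a sign the string values need cleaning by hand, which is exactly the information -force- would have silently discarded.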

    If you are not able to work this out by using -append- instead of -merge-, then when posting back for additional help show example data from both data sets, using the -dataex- command to do so. Without seeing the data, any offer of help is going to be based purely on guesswork and is a waste of everybody's time (including yours). So help those who want to help you by showing data, and in a usable way. That's what -dataex- is for. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
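
    A typical call might look like the following sketch (the choice of 20 observations is arbitrary; if the data set has many variables, list only the relevant ones after -dataex-):

    ```stata
    * Show the first 20 observations of the 2019 file in a copy-and-paste-ready form
    use "$path/simce8b2019_rbd.dta", clear
    dataex in 1/20
    ```

    Repeat this for the 2017 file and paste both outputs into your post.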
