

  • Pooling two cross-sections


    There are two location variables (state and district) in the datasets I'm trying to pool. The value labels are defined differently for the same states/districts in the two datasets. How can I proceed so that the value labels are the same in the pooled dataset? There are close to 700 districts, and in some cases the same district is spelled differently in the two datasets. Please suggest a solution.


  • #2
    I guess you don't seek the vacuous answer: so make them consistent!

    What's not clear to me includes

    * whether the values associated with the value labels are the same; if they are, some things are easier; if they are not, your problem is worse than you say

    * what pooling means here: in Stata terms, merge, append, or something else.

    In principle you need a dataset with the 700 or so districts you want correctly labelled. Then I guess you just start looking for mismatches and fix them one by one.
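    A minimal sketch of the mismatch hunt, written in Python only to make the logic explicit (in Stata one might instead merge the string names against the master file and inspect `_merge`). It assumes a hypothetical master list `MASTER` of correctly spelled district names; the few names shown are illustrative, not from the poster's data. The use of the standard library's `difflib.get_close_matches` to suggest likely spellings is an addition, not something proposed in the thread, and its suggestions are candidates to check by hand rather than automatic fixes.

```python
import difflib

# Hypothetical master list of correctly spelled district names; in practice
# this would hold the ~700 districts. The entries here are illustrative only.
MASTER = ["Ahmednagar", "Bengaluru Urban", "Purba Medinipur", "Thiruvananthapuram"]


def find_mismatches(names, master=MASTER, cutoff=0.8):
    """Map each name not found verbatim in `master` to up to 3 close matches.

    Matches come from difflib's similarity ratio (0 to 1); names with no
    candidate at or above `cutoff` map to an empty list and need manual lookup.
    """
    known = set(master)
    report = {}
    for name in sorted(set(names)):
        if name not in known:
            report[name] = difflib.get_close_matches(name, master, n=3, cutoff=cutoff)
    return report


if __name__ == "__main__":
    district_names = ["Ahmadnagar", "Ahmednagar", "Xyz"]
    for name, suggestions in find_mismatches(district_names).items():
        print(name, "->", suggestions)
```

    Names that already match the master list are skipped, so the report shrinks as the fixes accumulate; lowering `cutoff` casts a wider net at the cost of more false suggestions.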


    • #3
      Hello Nick,

      Sorry for being unclear. The values associated with the value labels are different. And by pooling, I meant append.

      Thanks for your suggestion.


      • #4
        I don't have any good news for you. Perhaps you need to append the files and work on a string version of each variable. Never overwrite the original data, but build up a script with a series of replace statements standardizing to the names you want. Could be hours or days of work depending on how bad the problem is.
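        The append-then-standardize approach can be sketched as below. In Stata the steps would roughly correspond to `decode`, `append`, a do-file of `replace` statements, and `encode`; the Python here only spells out the logic. The two small datasets, their label mappings, and the `STANDARDIZE` dictionary are hypothetical, made up to show codes and spellings that disagree between files. Each file's codes are translated through its own labels before pooling, which is why differing codes across files stop mattering once you work with strings.

```python
# Hypothetical inputs: each file has numeric district codes plus its OWN
# value-label mapping, so code 1 means different districts in A and B.
LABELS_A = {1: "Ahmednagar", 2: "Purba Medinipur"}
LABELS_B = {1: "East Medinipur", 2: "Ahmadnagar"}
CODES_A = [1, 2, 2]
CODES_B = [2, 1]

# The hand-built "series of replace statements": variant spelling -> standard name.
STANDARDIZE = {
    "Ahmadnagar": "Ahmednagar",
    "East Medinipur": "Purba Medinipur",
}


def decode(codes, labels):
    """Turn codes into label strings using that file's own labels (cf. Stata's decode)."""
    return [labels[c] for c in codes]


def pool(datasets, fixes):
    """Append decoded files, standardize names, and re-encode with one shared label set.

    Returns (rows, new_labels); each row is (source, original_name,
    standardized_name, new_code). The original strings are kept, not overwritten.
    """
    rows = []
    for source, (codes, labels) in datasets.items():
        for raw in decode(codes, labels):
            std = fixes.get(raw, raw)  # unchanged if no replacement is listed
            rows.append((source, raw, std))
    # One new code per distinct standardized name, numbered in sorted order
    # (similar in spirit to Stata's encode).
    names = sorted({std for _, _, std in rows})
    new_labels = {i: name for i, name in enumerate(names, start=1)}
    code_of = {name: i for i, name in new_labels.items()}
    return [(src, raw, std, code_of[std]) for src, raw, std in rows], new_labels


if __name__ == "__main__":
    pooled, new_labels = pool({"A": (CODES_A, LABELS_A), "B": (CODES_B, LABELS_B)}, STANDARDIZE)
    print(new_labels)
    for row in pooled:
        print(row)
```

        Any name that is neither already standard nor listed in `STANDARDIZE` passes through unchanged and gets its own code, so it is worth listing the distinct standardized names after each pass to see which replacements are still missing.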


        • #5
          Nick's dark forecast is unfortunately right.
          I would email whoever looked after the data entry and ask them to fix those issues (which are often evidence of laziness).
          Kind regards,
          (StataNow 18.5)


          • #6
            Thank you for your responses. I followed the suggestion by Nick (#2), and it worked.

