Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Alternatives to m:m merge?

    Statalist,

    I'm trying to do a merge between two datasets that have different characteristics for the same company. One dataset has the company id and the production by product, and the other dataset has the company id and the share of its total operations by city. I want to create a resulting dataset that prorates the production by product to each city according to the city shares. I thought that this would be a good chance to use a m:m merge, but it's giving me bad results. What other approach would you use to deal with this situation?

    Thanks,

    Jose


  • #2
    I have difficulties to follow the verbal description. Do you have a small example dataset, illustrating what you now have and what you are trying to achieve. You might want to reshape one or both of the datasets, before you apply merge, but this is a vague guess.

    Best
    Daniel

    Comment


    • #3
      Using an m:m merge is almost always a really bad idea. It is almost guaranteed to end badly; the best one can hope for is that it is obvious that it has ended badly and you don't actually end up taking the results seriously!

      You don't describe your data well enough to give you better advice. As I understand what you've written, you have two data sets, each keyed by company. So I don't get why you don't do a -merge 1:1 company_id...-. Maybe each data set has multiple observations per company_id. In that case, perhaps those observations are in turn identified by something distinctive that will indicate which observation in the other data set should match with it. Maybe a year variable or a city variable? -merge 1:1 company_id year...- or -merge 1:1 company_id city...-

      Or maybe what you really want is to pair up each observation for a given company in the first data set with every observation for the same company in the other data set. In that case you want a different command: -joinby company_id...-

      Comment


      • #4
        Thanks for your soon reply Daniel. In a dummy setting, I have these two datasets:

        1) Company to Product dataset:
        Company_ID Product_ID Value
        1 Apples 600
        1 Oranges 300
        2) Company to City dataset:
        Company_ID City_ID Share
        1 NY 0.3
        1 BOS 0.33
        1 LA 0.36

        3) From these two datasets, I want to prorate production of each product using the distribution by city of each company's operations.
        Company_ID Product_ID City_ID Value Share Prorated_Value
        1 Apples NY 600 0.3 180
        1 Apples BOS 600 0.33 198
        1 Apples LA 600 0.36 216
        1 Oranges NY 300 0.3 90
        1 Oranges BOS 300 0.33 99
        1 Oranges LA 300 0.36 108

        I thought a many to many merge could work but it hasn't. I'll try the reshape approach.

        Jose

        Comment


        • #5
          Clyde has pointed to joinby.

          Best
          Daniel

          Comment


          • #6
            Worked perfectly! Thanks guys.

            Jose

            Comment

            Working...
            X