Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Comparing across variables and observations

    Dear all,

    I am hoping I can get some help with the following problem.

    I have data for a number of companies (company_id) that compete in different markets. The variables market_A, market_B, market_C, and market_D take the value 1 if a company competes there and 0 otherwise. A sample of the data appears below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(company_id market_A market_B market_C market_D)
    1 0 1 1 1
    2 1 1 1 1
    3 1 1 1 0
    end
    I want to create a new variable for each market (competition_A, competition_B, competition_C, and competition_D) that takes values according to the following logic.

    For example, for the variable competition_A for each company:

    1) we consider whether a company operates in market A. If it doesn't, competition_A takes the value 0 for that company. This would be the case for company_id 1.

    2) If a company operates in market A (such as company_id 2), we count the number of the markets it competes with each company in market A beyond market A. We take the sum of these calculations and divide it with the total number of companies in Market A that the company competes with (we divide with the number of companies it competes with in at least 1 market beyond market A).

    Therefore:

    - for company_id 2, competition_A would take the value 2 because: company_2 operates in market_A, and competes with company_id 3 in two other markets beyond market_A. As company_id 2 has no other competitors in market_A we divide 2 by 1.
    - for company_id 3, competition_A would take the value 2 because: company_3 operates in market_A, and competes with company_id 2 in two other markets beyond market_A. As company_id 3 has no other competitors in market_A we divide 2 by 1.

    The following dataset shows the result of the above calculations for all companies and markets.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(company_id market_A market_B market_C market_D competition_A competition_B competition_C competition_D)
    1 0 1 1 1 0   2 1.5 2
    2 1 1 1 1 2   2   2 2
    3 1 1 1 0 2 1.5 1.5 0
    end
    Thank you for your time.

    Pavlos

  • #2
    I think what you want is this:
    Code:
    //    VERIFY NECESSARY ASSUMPTIONS
    isid company_id
    foreach v of varlist market_* {
        assert inlist(`v', 0, 1)
    }
    
    frame copy default working
    frame change working
    rename market_* active*
    reshape long active, i(company_id) j(market) string
    keep if active
    drop active
    preserve
    rename company_id company_id2
    tempfile copy
    save `copy'
    restore
    joinby market using `copy'
    drop if company_id == company_id2
    isid company_id company_id2 market, sort
    by company_id company_id2 (market): gen n_other_markets_this_pair = _N-1
    sort company_id market company_id2
    by company_id market (company_id2): gen n_competitors_this_market = _N
    collapse (sum) n_other_markets_this_pair (first) n_competitors_this_market, ///
        by(company_id market)
    
    gen competition_ = n_other_markets_this_pair/n_competitors_this_market
    drop n_other_markets_this_pair n_competitors_this_market
    reshape wide competition_, i(company_id) j(market) string
    mvencode competition_*, mv(0)
    
    frame change default
    frlink 1:1 company_id, frame(working)
    frget _all, from(working)
    drop working
    frame drop working
    Notes:
    1. The code assumes, and verifies, that each company_id appears only once in the original data, and that the variables market_* are always 0 or 1. The code will break with an error message at the top if these assumptions are not true in the real data set.
    2. If the original data set is large, the -joinby- command may both take a lot of time, and ultimately fail due to insufficient available memory. If it takes a long time but succeeds, I can only counsel patience: as each firm must be compared with every other firm in every market, I see no way to break the data into chunks that can be processed separately. If it fails due to insufficient available memory, I think your only recourse would be to find a computer with more RAM (or, if it is the operating system that refuses to provide the memory, a computer with an operating system that is less restrictive on memory per process.)
    Last edited by Clyde Schechter; 29 Feb 2024, 17:29.

    Comment


    • #3
      Dear Clyde,

      thank you so much for this. The code works as intended when using the full dataset.

      In a number of instances the code interrupts or does not create the 'competition_' variables.
      1. When a competitor has a presence only in one and no other markets, it gives me an error when the code reaches the command: mvencode competition_*, mv(0) - . This is probably because the value 0 in mv(0) is "reserved" for when a company: competes only in one market AND in that market, there are other companies. Changing mv(0) to mv(10000) addresses this issue (I deal with the 10000 values afterward).
      2. The code does not produce the competition_ variable for markets with only one or zero competitors. To address this, I identify those markets before executing your code and create their competitor_ variables filling them with zeros.
      The sample data below will allow you to reproduce these cases.

      Many thanks again,

      Pavlos

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(company_id market_AE market_AF market_AU market_BE market_CH)
       1 0 0 1 0 0
       2 1 0 0 1 1
       3 0 0 0 0 0
       4 0 0 1 0 0
       5 0 0 0 1 1
       6 0 0 0 0 1
       7 0 0 1 0 0
       8 0 0 0 0 0
       9 0 0 1 0 0
      10 0 0 0 0 0
      end

      Comment


      • #4
        Yes, I see the problem. You've already identified, in words, your solutions to these problems. I will point out that the problem with -mvencode- can be more simply dealt with by just adding the -override- option. Altogether, here's how I would expand the solution to deal with these cases:
        Code:
        //    VERIFY NECESSARY ASSUMPTIONS
        isid company_id
        foreach v of varlist market_* {
            assert inlist(`v', 0, 1)
        }
        
        frame copy default working
        frame change working
        rename market_* active*
        reshape long active, i(company_id) j(market) string
        levelsof market, local(markets)
        keep if active
        drop active
        preserve
        rename company_id company_id2
        tempfile copy
        save `copy'
        restore
        joinby market using `copy'
        drop if company_id == company_id2
        isid company_id company_id2 market, sort
        by company_id company_id2 (market): gen n_other_markets_this_pair = _N-1
        sort company_id market company_id2
        by company_id market (company_id2): gen n_competitors_this_market = _N
        collapse (sum) n_other_markets_this_pair (first) n_competitors_this_market, ///
            by(company_id market)
        
        gen competition_ = n_other_markets_this_pair/n_competitors_this_market
        drop n_other_markets_this_pair n_competitors_this_market
        reshape wide competition_, i(company_id) j(market) string
        
        frame change default
        frlink 1:1 company_id, frame(working)
        frget _all, from(working)
        drop working
        frame drop working
        
        foreach m of local markets {
            capture confirm var competition_`m', exact
            if c(rc) != 0 {
                gen competition_`m' = .
            }
        }
        mvencode competition_*, mv(0) override
        Additions to original code shown in bold face.

        Comment


        • #5
          Dear Clyde,

          thank you for the additions. I can confirm that the code performs seamlessly and produces the expected outcome.

          Once again, thank you for your time and effort.

          Best regards,

          Pavlos

          Comment

          Working...
          X