Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating intensive and extensive margins

    I am working on firm level heterogeneity in India and my database has information on firms, codes of the products that the firms are producing in each year and the sales variables for each of the product code by year. I want to generate two variables viz. Intensive and Extensive margin for each firm-year observation. The definitions of each of these two variables are as follows:
    1. Intensive Margin = sum of sales of those products that a firm produce in both periods t and t-1.
    2. Extensive Margin = sum of sales of those products that a firm produce in year t or in year t-1, but not in both the periods.
    The issue is that mine is a strongly unbalanced panel as regards the products that each firm produces every year. There are some single product firms, while others are multi-product firms. Also, majority of the high growth firms have a tendency of adding a lot of product each year. I tried a lot but I am not able to execute the correct stata command for generating these two variables. Could you please suggest how should I proceed to solve this query?

    Thank you so much.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long company double products_product_code int yr float dsales
    14 2.10611020109e+19 1991  274.9848
    14     2.1061215e+19 1991 4.6154695
    14         2.106e+19 1991  29.39325
    14     2.1061109e+19 1991 35.952076
    14     2.1061215e+19 1992 11.147906
    14 2.10611020109e+19 1992  373.0567
    14     2.1061109e+19 1992  41.00837
    14         2.106e+19 1992  77.03999
    228     1.8010503e+19 1990         0
    228 1.41602069999e+19 1990   322.537
    228 1.41602030404e+19 1990         0
    228         1.000e+20 1990  170.6014
    228 1.41602030404e+19 1991         0
    228     1.8010503e+19 1991         0
    228 1.41602069999e+19 1991         0
    228         1.000e+20 1991 291.96457
    228 1.41602069999e+19 1992  166.5394
    228         1.000e+20 1992 271.90756
    228     1.8010503e+19 1992         0
    228 1.41602030404e+19 1992         0
    289   2.101011648e+19 1990  4.985451
    289   2.103060101e+19 1990 1791.2013
    289   2.101011648e+19 1991  4.349279
    289   2.103060101e+19 1991  1394.089
    520      5.110501e+18 1991 12.790994
    520         6.070e+18 1991         0
    520    2.10101162e+19 1991 216.28407
    520        5.1106e+18 1991         0
    520         1.000e+20 1991    65.893
    520      5.110501e+18 1992   94.1451
    520        5.1105e+18 1992  3.417245
    520    2.10101162e+19 1992  29.55917
    520         1.000e+20 1992 2.2212093
    520         6.070e+18 1992 2.0503469
    569   1.414050609e+19 1989  3.539676
    569   1.414050609e+19 1990  3.612735
    569   1.414050609e+19 1991 .18513922
    569   1.414050609e+19 1992 1.9216082
    572     1.4140501e+19 1989         0
    572      1.414051e+19 1989 34.044724
    572     1.4140513e+19 1989         0
    572     1.4140514e+19 1989         0
    572     1.4140599e+19 1989  51.06709
    572   2.101011648e+19 1989  75.94592
    572   1.414051702e+19 1989 106.71712
    572     1.9100107e+19 1989  51.06709
    572     1.4140102e+19 1989  818.3828
    572     1.3010603e+19 1989 19.313833
    572   1.301030201e+19 1989   5.23765
    572     1.4140506e+19 1989         0
    572     1.4140506e+19 1990         0
    572     1.9100107e+19 1990 176.52036
    572     1.4140501e+19 1990         0
    572      1.414051e+19 1990         0
    572   1.301030201e+19 1990 32.011463
    572     1.3010603e+19 1990  51.52321
    572     1.4140513e+19 1990         0
    572     1.4140102e+19 1990 1888.6764
    572     1.4140599e+19 1990  57.62064
    572   2.101011648e+19 1990 146.64299
    572     1.4140514e+19 1990         0
    572   1.414051702e+19 1990 265.54272
    572     1.3010603e+19 1991 128.97295
    572     1.4140514e+19 1991         0
    572     1.4140513e+19 1991         0
    572   2.101011648e+19 1991  110.8317
    572     1.4140102e+19 1991 1455.2684
    572     1.4140501e+19 1991         0
    572     1.9100107e+19 1991 291.11038
    572   1.301030201e+19 1991  53.00647
    572   1.414051702e+19 1991 219.96266
    572      1.414051e+19 1991         0
    572     1.4140506e+19 1991         0
    572   1.301030201e+19 1992  39.39966
    572   2.101011648e+19 1992   87.9105
    572     1.4140506e+19 1992         0
    572      1.414051e+19 1992  161.5386
    572     1.3010603e+19 1992  91.85046
    572     1.4140514e+19 1992         0
    572     1.9100107e+19 1992 216.69815
    572     1.4140102e+19 1992  1455.325
    572   1.414051702e+19 1992 219.40688
    572     1.4140513e+19 1992         0
    572     1.4140501e+19 1992         0
    867   2.101011648e+19 1990         0
    867 2.10101161608e+19 1990         0
    867 2.10101161604e+19 1990         0
    867         1.000e+20 1990         0
    867   2.101011648e+19 1992         0
    867         2.100e+19 1992 34.973297
    867 2.10101161604e+19 1992         0
    906    1.00105021e+19 1990         0
    906       1.41608e+19 1990  92.12475
    906       1.41606e+19 1990 24.486317
    906 2.10101160404e+19 1990  9.433253
    906       1.41608e+19 1991 100.16032
    906    1.00105021e+19 1991         0
    906       1.41606e+19 1991  19.43962
    906 2.10101160404e+19 1991  5.554177
    906       1.41608e+19 1992  116.7377
    end

  • #2
    I have little confidence that I understand what you want here. But perhaps this is it:

    Code:
    by company products_product_code (yr), sort: gen intensive = (yr[_n-1] == yr-1)
    
    by company yr, sort: egen intensive_margin = total(cond(intensive, dsales, .))
    by company yr, sort: egen extensive_margin = total(cond(intensive, ., dsales))
    By the way, you have a serious problem in your data. The variable product_code is apparently supposed to be 19 or 20 digits long. But you can't squeeze that much information into a double. It is likely that the data is already corrupted and that some product codes that differ in their low order 3 or 4 digits have been stored in the data as the same number. You will need to re-create the data set taking care to create the product code variable as a string variable.

    Comment


    • #3
      Thank you so much for your help. The commands fit well my query.
      Thanks a lot. And, i will definitely try to take care about the data from the next time onwards. Could you please also help me in decomposing extensive and intensive margin according to the following:
      1. extensive margin into the gross margins due to product addition and product droppings. This means that I want to decompose extensive margin into sales of products that are produced in period t (but not in t-1) and sales of products that are produced in period t-1 (but not in pd t).
      2. intensive margin into the sales of growing products and sales of shrinking products, By growing and shrinking, I mean the products for which sales have increased over the two periods t-1 and t, and the opposite holds true for shrinking.
      Thanks a lot once again for all the help and suggestions.
      thank you.

      Comment


      • #4
        So I think you want
        Code:
        by company products_product_code (yr), sort: gen byte growing = ///
            (dsales > dsales[_n-1]) if intensive
        by company products_product_code (yr): gen byte added = (yr[_n-1] != yr-1)
        by company products_product_code (yr): gen byte dropped = (yr[_n+1] != yr + 1)
        
        by company yr, sort: egen growing_margin = total(cond(growing == 1, dsales, .))
        by company yr: egen shrinking_margin = total(cond(growing == 0, dsales, .))
        
        by company yr: egen added_margin = total(cond(!intensive & added, dsales, .))
        by company yr: egen dropped_margin = total(cond(!intensive & dropped, dsales, .))

        Comment


        • #5
          Thanks so much for all the help. I understood how have you created these variables. Thanks a ton. But, my definition of product dropped is a little different. For. e.g.: If a company produces good 1 till the year 2013 since its inception and stop reporting data after 2013, then I presume that it never dropped product 1. So, for the year 2013 also, the variable dropped = 0. Could you please help me in redefining this variable? Thank you so much once again

          Comment


          • #6
            If I understand you correctly, then, it should be:

            Code:
            by company, sort: egen last_reporting_year = max(yr)
            
            replace dropped = 0 if yr == last_reporting_year

            Comment


            • #7
              thank you so much for your reply. And I'm really sorry for I was not able to explain the problem appropriately.
              So, the definition of the variable 'dropped_margin' in year t for company i should be : sum of sales of all those products that a company produces in only year t-1 or year t.
              Could you please help me in redefining the variables dropped and dropped_margin? Thank you, once again.

              Comment


              • #8
                I think the code given in #4 and #6 do that. (The code in #6 should be placed immediately after the third command in #4.))

                If that's not right, then I don't understand what you are asking for. In that case, please post back a very short example and also show what the results you want should be.


                Comment


                • #9
                  The previous set of commands do not seem to work perfectly in my case because of the following:
                  Let's assume that there is a firm called firm A which starts producing and selling products 1 and 2 from the year 2000. So, in the year 2000, there are 2 products that get added but nothing is dropped. Let's assume that it follows the same practice till 2006 but in the year 2007, it starts producing a new product '3' and stops production of product ''2'. In this case, corresponding to year 2007, we have one product which has got added and one product which the firm has dropped. This means that if a firm stops producing in year 2007, then it will be considered as a drop in the year 2007 only.

                  Comment


                  • #10
                    Your explanations are not making it clearer to me. Sorry if I'm being dense, but I still don't understand. Show a small data set, perhaps some of the rows from the example you gave in #3 that illustrate the problem, and also show the results you want.

                    Comment


                    • #11
                      In this case, can I add a row for each product-firm-year observation?

                      Comment


                      • #12
                        Ok, i will just do that. Sorry for being not clear. and, thank you so much for all your suggestions and help.

                        Comment


                        • #13
                          Sorry, I don't understand the question.

                          Added: Crossed with #12. This response refers to #11.

                          Comment


                          • #14
                            Please don't say sorry. Its my fault. Let me explain my problem to you again. For each firm-year observation, i want to verify the following identity:
                            Change in sales at time t = change in sales of product added at time t + change in sales of product dropped at time t + change in sales of growing products in time t + change in sales of shrinking products at time t.
                            here, change means (sales in t - sales in t-1)
                            The definition of each sub-component of this identity is as follows:
                            1. Change in sales at time t = dsales (t) - dsales (t-1)
                            2. Change in sales of products added at time t = sales of those products that are sold in time t but not in time t-1
                            3. Change in sales of products dropped at time t = sales of those products that were sold in time t-1 but not in time t. since, the product is no longer produced in time t, therefore, this variable will be equal to [0 - sales of the product in time t-1]
                            4. change in sales of growing products = dslaes of growing product (t) - dsales of growing products (t-1)
                            5. change in sales of shrinking products = dsales of shrinking products (t) - dsales of shrinking products (t-1)

                            Comment


                            • #15
                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input byte co_code str23 company_name double product_code str44 product_name int yr long nic float(dsales intensive growing shrinking extensive added dropped cal_output outputchange)
                              12 "20TH CENTURY ENGG. LTD." 2.100e+19 "Trade in textiles & apparels"                 1999 71100  25.7501          0         0          0  26.75991 26.75991         .   26.75991   26.75991
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               1999 71100 1.009808          0         0          0  26.75991 26.75991         .   26.75991   26.75991
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2000 71100 .9702497 -25.668377         0 -25.668377         0        0         . -25.668377 -25.668377
                              12 "20TH CENTURY ENGG. LTD." 2.100e+19 "Trade & Commissioning Agents' services"       2000 71100 .1212812 -25.668377         0 -25.668377         0        0         . -25.668377 -25.668377
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Technical Consultancy & engineering services" 2001 71100 8.896934  -.1507953         0          0  7.926684 8.896934 -.9702497   7.926684   8.624858
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2001 71100 .8194544  -.1507953         0  -.1507953  7.926684 8.896934 -.9702497   7.775889   8.624858
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Technical Consultancy & engineering services" 2002 71100 8.762367   -.134567         0   -.134567 -.8194544        0 -.8194544  -.9540214  -.9540214
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2005 71100      8.7          0         0          0       8.7      8.7         0        8.7        8.7
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2006 71100 9.100483    .400483   .400483          0         0        0         0    .400483    .400483
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2007 71100 26.40141  17.300926 17.300926          0         0        0         0  17.300926  17.300926
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2008 71100 26.63156     .23015    .23015          0         0        0         0     .23015     .23015
                              12 "20TH CENTURY ENGG. LTD." 2.110e+19 "Securities investment services"               2009 71100 3.859529  -22.77203 -22.77203          0         0        0         0  -22.77203  -22.77203
                              end
                              Hi,
                              This is the example that I have generated using an excel document. The definitions of the variables are as follows: For each firm in year t:
                              1. Intensive: change in sales of those products that are sold in years t and t-1
                              2. growing: change in sales of those products that are sold in years t and t-1 & for which the sales have increased over the two years
                              3. shrinking: change in sales of those products that are sold in years t and t-1 & for which the sales have decreased over the two years
                              4. extensive: change in sales of those products that are sold in year t or year t-1
                              5. added: change in sales of those products that are sold in year t but not in year t-1
                              6. dropped: change in sales of those products that are sold in year t-1 but not in year t
                              7. cal_output: sum of growing, shrinking, added and dropped variables
                              8. output change: change in total sales of each firm in year t (dsales (t) - dsales(t-1)).

                              I have verified my result using the following identities:
                              1. dropped + added = extensive
                              2. growing + shrinking = intensive
                              3. cal_output = output change

                              Could you please help me in defining stata codes for these steps? I have tried a lot but I am not able to succeed in running the perfect codes which verify these results. Thank you so much.

                              Comment

                              Working...
                              X