
  • Exact matching of observations with two different variables

    Hi everyone,

    I am currently working on my master's thesis, in which I analyse the effect of tax changes on personnel expenses within corporations.
    For example: does a tax rise in the parent firm lead to a decrease in personnel expenses in that firm, and perhaps even to an increase in one or all of its subsidiaries?

    My dataset (an unbalanced panel) consists of 1.6 million observations on about 200,000 companies, each with a unique id. Each company is one of the following:

    1. stand-alone company
    2. parent company
    3. subsidiary company

    I tried to use dataex, but the dataset is too big and the output would not be very useful, so I created this table to make it clearer ("-" means the company has no owner):
    id   id_owner   tax_change   personnel_expenses
    1    -          0            100
    2    1          0            150
    3    1          1            10
    4    -          1            5
    5    4          0            200
    6    -          1            300
    My idea is to first identify the corporations within this panel. For example, corporation 1 consists of companies 1, 2 and 3, corporation 2 consists of companies 4 and 5, and company 6 is a stand-alone company. Is there any way to do an exact matching?
    I looked at psmatch2, iematch, vmatch and cem, but could not make much sense of them.
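    A minimal sketch of the grouping step on the example table, assuming `id_owner` is numeric (missing for stand-alone and parent companies) and that ownership is only one level deep, i.e. an owner is never itself a subsidiary. The variable names `corp_id` and `n_firms` are hypothetical. No matching command is needed here: the corporation id is simply the owner's id for subsidiaries and the company's own id otherwise.
    Code:
```stata
* Example data from the table above (one year; "-" entered as missing)
clear
input id id_owner tax_change personnel_expenses
1 . 0 100
2 1 0 150
3 1 1 10
4 . 1 5
5 4 0 200
6 . 1 300
end

* Corporation id: owner's id for subsidiaries, own id otherwise
* (assumes one level of ownership; deeper chains would need extra passes)
generate corp_id = cond(missing(id_owner), id, id_owner)

* Number of distinct companies per corporation; 1 means stand-alone
egen firm_tag = tag(corp_id id)
bysort corp_id: egen n_firms = total(firm_tag)
drop firm_tag

sort id
list id id_owner corp_id n_firms, noobs
```
    In the full panel, `egen grp = group(corp_id) if n_firms > 1` would then number only the multi-company corporations from 1 to n.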

    In a second step, I need to run regressions within these corporation groups, where I have treatment (1) and control (0) companies, before I can finally compare all corporations.

    Alternatively, the regression could work like this:
    Code:
    reg personnel_expenses i.treatment, vce(cluster corporation)
    I am using the newest Stata version. I am sorry that I can't present any code yet, but I am looking forward to trying out new things.
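    One way to make the comparison happen within corporations, sketched on simulated data rather than the thesis dataset; `corp_id`, `tax_change` and `personnel_expenses` are hypothetical names and the effect sizes are made up. `areg` absorbs a corporation fixed effect, so the coefficient on `1.tax_change` compares treated and untreated companies of the same corporation instead of pooling across corporations as a plain `reg` would.
    Code:
```stata
* Simulated data for illustration: 50 corporations with 4 companies each
clear
set seed 12345
set obs 200
generate corp_id = ceil(_n / 4)
generate tax_change = runiform() < 0.5

* Corporation-level shock (same for all companies of a corporation)
bysort corp_id: generate corp_shock = rnormal(0, 50) if _n == 1
bysort corp_id (corp_shock): replace corp_shock = corp_shock[1]
generate personnel_expenses = 100 + corp_shock - 10 * tax_change + rnormal(0, 5)

* Within-corporation comparison of treated (1) and control (0) companies
areg personnel_expenses i.tax_change, absorb(corp_id) vce(cluster corp_id)
```
    Stand-alone companies form single-company groups, so they contribute nothing to this within-corporation estimate.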

    Help is much appreciated, thank you very much in advance!

    Regards
    Nick

  • #2
    I have now figured out how to create the groups (see the code below).

    Code:
    **Generating a new variable in the main dataset, which includes the unique id*
    gen id_new = id
    save dataset, replace
    
    **Save a dataset with observations for which id_owner isn't missing**
    drop if id_owner == "."
    replace id_new = id_owner
    save subsidiarys, replace
    
    append using dataset
    
    **Create the group id based on id_new**
    egen grpid = group(id_new)
    
    duplicates drop id_new jahr, force
    
    sort grpid jahr
    quietly by grpid jahr:  gen dup = cond(_N==1,0,_n)
    
    **Replace group id 1 - n, if there are more than one company in the group**
    egen grp = group(grpid) if dup > 0
    drop grpid dup id_new
    Now I have about 13,000 groups with up to 200 companies each, and over 100,000 stand-alone companies.

    In addition, I have one dummy variable for a tax rise, one for a tax cut, and the tax rate for every year.

    I am now stuck on how to make a correct regression. I need to analyse the impact of tax changes on personnel expenses.

    Does anyone have a recommendation?



    • #3
      Since the tax rate is continuous, I'm not sure I'd use the dummies. That said, there may be an asymmetric effect (so you have taxincreasedummy*taxrate and taxdecreasedummy*taxrate and can test for a difference). Might add a little sizzle to that steak.
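      A minimal sketch of that asymmetric specification on simulated data; `tax_rise`, `tax_cut`, `taxrate`, `corp_id` and the effect sizes are assumptions for illustration. The two interaction terms give separate tax-rate slopes in rise years and cut years, and `test` checks whether they differ.
      Code:
```stata
* Simulated data for illustration only
clear
set seed 2024
set obs 1000
generate corp_id = ceil(_n / 10)
generate taxrate = runiform(0.15, 0.35)
generate tax_rise = runiform() < 0.3
generate tax_cut = !tax_rise & runiform() < 0.3
generate personnel_expenses = 200 - 100*taxrate - 40*tax_rise*taxrate + 10*tax_cut*taxrate + rnormal(0, 10)

* Interaction terms: tax-rate slope in rise years and in cut years
generate rise_x_rate = tax_rise * taxrate
generate cut_x_rate  = tax_cut * taxrate

regress personnel_expenses taxrate rise_x_rate cut_x_rate, vce(cluster corp_id)

* Does the response to a rise differ from the response to a cut?
test rise_x_rate = cut_x_rate
```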

      I'd be careful with treating this as a DID problem, as I suspect you've got a lot of irregularly timed treatment dates. You'd need csdid or something like it to estimate that, and even then you'd still have a continuous treatment to deal with.

      I would think some other financials might be useful (say, cash flow, revenues, or cash flow/revenues as a profit margin). Your DV depends on scale, so it needs to be conditioned on something (fixed effects might help). There is probably a lag structure to such expenditures too, but the unbalanced nature of the data may make that problematic, and you'd need to address the lagged DV in estimation.
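      A small sketch of why the unbalanced panel matters for lags, using made-up numbers and the `jahr` year variable from the code in #2. After `xtset`, the `L.` operator refers to the previous calendar year and returns missing across a gap, whereas `[_n-1]` just takes the previous row and silently bridges it.
      Code:
```stata
* Tiny unbalanced panel: firm 1 has no observation for 2012
clear
input id jahr personnel_expenses
1 2010 100
1 2011 110
1 2013 130
2 2010 200
2 2011 190
2 2012 180
end

* Declare the panel structure for time-series operators
xtset id jahr

* L. uses the previous calendar year: missing for firm 1 in 2013
generate pe_lag = L.personnel_expenses

* [_n-1] uses the previous row: 2011's value for firm 1 in 2013
bysort id (jahr): generate pe_prev_row = personnel_expenses[_n-1]

list, sepby(id) noobs
```
      This only covers the data handling; a lagged DV alongside firm fixed effects still needs the estimation fix mentioned above.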

      There might be a sector effect too (sales forces may require more personnel expenditures).

      Might get a sense of things starting here, but it's hard to say without knowing more:
      Code:
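      * reghdfe is community-contributed (ssc install reghdfe)
      reghdfe personnel_expenses taxrate, absorb(corporation) cluster(corporation)
      reghdfe personnel_expenses sizevariable taxrate, absorb(corporation) cluster(corporation)
      * Same model without corporation fixed effects, for comparison
      reg personnel_expenses sizevariable taxrate, cluster(corporation)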


      • #4
        cem, psmatch, and so forth are matching techniques across groups (treated vs. controls). They are not designed for data management, but you have solved that problem.
