Exact matching of observations with two different variables

Nick Paul


18 Jan 2022, 14:20

Hi everyone,

I am currently working on my master's thesis, for which I have to analyse the effect of tax changes on personnel expenses within corporations.
For example: does a tax rise in the parent firm lead to a decrease in personnel expenses in that firm, and perhaps even to an increase in one or all of its subsidiaries?

My dataset (an unbalanced panel) consists of 1.6 million observations on about 200,000 companies, each with a unique id. Each company is one of the following:

1. stand-alone company
2. parent company
3. subsidiary company

I tried to use dataex, but the dataset is too big and the output would not be very useful, so I created this table to make it clearer:

id   id_owner   tax change   personnel expenses
1    -          0            100
2    1          0            150
3    1          1            10
4    -          1            5
5    4          0            200
6    -          1            300

My idea is to first identify the corporations within this panel. For example, corporation 1 consists of companies 1, 2 and 3, and corporation 2 of companies 4 and 5. Is there a way to do this with exact matching?
I looked at psmatch2, iematch, vmatch and cem, but could not make much sense of them.

In a second step, I need to run regressions within these corporation groups, in which I have treatment (1) and control (0) companies, before I can finally compare all corporations.

Alternatively, the regression could work like this:

Code:

* personnel_expenses, treatment and corporation are placeholder variable names
regress personnel_expenses i.treatment, vce(cluster corporation)
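Both steps can be done on the toy table without any matching command, because `id_owner` already identifies the parent. The following is a minimal sketch, assuming ownership is only one level deep (every `id_owner` is a top-level parent), that `-` is stored as a missing value, and that the columns are named `tax_change` and `personnel_expenses`. `corp_id`, `firm_tag`, `n_companies`, `max_change`, `min_change` and `has_variation` are hypothetical names:

```stata
clear
input id id_owner tax_change personnel_expenses
1 . 0 100
2 1 0 150
3 1 1 10
4 . 1 5
5 4 0 200
6 . 1 300
end

* Step 1: a subsidiary belongs to its owner's corporation;
* parents and stand-alone companies use their own id
gen corp_id = cond(missing(id_owner), id, id_owner)

* Count distinct companies per corporation (tag() marks one row per company,
* so this also works in a panel with many years per company)
egen byte firm_tag = tag(corp_id id)
bysort corp_id: egen n_companies = total(firm_tag)

* Step 2: flag corporations with more than one company that contain
* both treated (1) and control (0) observations
bysort corp_id: egen max_change = max(tax_change)
bysort corp_id: egen min_change = min(tax_change)
gen byte has_variation = n_companies > 1 & max_change == 1 & min_change == 0

list id id_owner corp_id n_companies has_variation, sepby(corp_id)

* On the full panel, a within-corporation comparison could then be, e.g.:
* areg personnel_expenses i.tax_change if has_variation, absorb(corp_id) vce(cluster corp_id)
```

Here corporations 1 and 4 are flagged and company 6 is a stand-alone. On the real panel the flag is computed over all years; adding `year` to the `by` lists would compute it year by year instead.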

I am currently using the newest Stata version. I am sorry that I can't present any code yet, but I am looking forward to trying out new things.

Help is much appreciated, thank you very much in advance!

Regards
Nick

Nick Paul


19 Jan 2022, 07:36

I have now figured out how to create the groups (see the code below).

Code:

* Corporation id: subsidiaries take their owner's id, parents and
* stand-alone companies keep their own id
* (assumes numeric id and id_owner, with no owner stored as missing,
* and that id_owner always points to the top-level parent)
gen id_new = cond(missing(id_owner), id, id_owner)

* Count the distinct companies in each corporation across all years
egen byte firm_tag = tag(id_new id)
bysort id_new: egen n_firms = total(firm_tag)

* Number the groups 1..n if there is more than one company in the group
egen grp = group(id_new) if n_firms > 1
drop firm_tag n_firms id_new

Now I have about 13,000 groups with up to 200 companies each, and over 100,000 stand-alone companies.

In addition, I have one dummy variable for a tax rise, one for a tax cut, and the tax rate for every year.

I am now stuck on how to set up a correct regression. I need to analyse the impact of tax changes on personnel expenses.

Does anyone have a recommendation?


George Ford


19 Jan 2022, 13:24

Since the tax rate is continuous, I'm not sure I'd use the dummies. That said, there may be an asymmetric effect (so you would include taxincreasedummy*taxrate and taxdecreasedummy*taxrate and test for a difference). Might add a little sizzle to that steak.
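A minimal sketch of that asymmetric specification on simulated data. The firms, corporations, seed and data-generating process are all hypothetical. The rise and cut dummies are derived from the year-on-year change `D.taxrate` after `xtset`, each is interacted with the rate, and `test` checks whether the two interaction coefficients differ:

```stata
clear
set seed 20220119
* Simulated panel: 200 hypothetical firms in 50 corporations, 8 years each
set obs 200
gen firm = _n
gen corp_id = ceil(firm / 4)
expand 8
bysort firm: gen year = 2010 + _n

* Hypothetical tax rate: starts at 25 and moves by -1, 0 or +1 point per year
gen double taxrate = 25
bysort firm (year): replace taxrate = taxrate[_n-1] + runiformint(-1, 1) if _n > 1

* Rise/cut dummies from the year-on-year change (missing in the first year)
xtset firm year
gen byte tax_rise = D.taxrate > 0 if !missing(D.taxrate)
gen byte tax_cut  = D.taxrate < 0 if !missing(D.taxrate)

* Interactions: extra slope of the rate in rise years and in cut years,
* relative to years without a change
gen double rise_x_rate = tax_rise * taxrate
gen double cut_x_rate  = tax_cut * taxrate

* Simulated outcome with a built-in asymmetry (missing in the first year)
gen double personnel_expenses = 500 - 4*taxrate - 0.5*rise_x_rate + rnormal(0, 10)

regress personnel_expenses taxrate rise_x_rate cut_x_rate, vce(cluster corp_id)
* H0: the extra slope is the same in rise years as in cut years
test rise_x_rate = cut_x_rate
```

On the real panel, fixed effects (for example via `reghdfe ..., absorb(corporation)`) would usually be added to this regression.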

I'd be careful with treating this as a DiD problem, as I suspect you've got a lot of irregularly timed treatment dates. You'd need csdid or something like it to estimate it, and even then the treatment is continuous.
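Staggered-adoption estimators such as the community-contributed `csdid` need each unit's first treatment period. The following is a minimal sketch on a hypothetical three-firm panel that codes never-treated firms as 0. To my understanding, that is the convention of `csdid`'s `gvar()` option, but check `help csdid` before relying on it. The sketch treats the first tax rise as the treatment date and ignores later rises and cuts, which is one reason back-and-forth tax changes fit these designs poorly:

```stata
clear
input firm year tax_rise
1 2015 0
1 2016 1
1 2017 0
2 2015 0
2 2016 0
2 2017 1
3 2015 0
3 2016 0
3 2017 0
end

* First year with a tax rise for each firm (missing if the firm never has one)
bysort firm: egen first_rise = min(cond(tax_rise == 1, year, .))
* Never-treated firms coded 0
replace first_rise = 0 if missing(first_rise)

* Event time relative to the first rise (missing for never-treated firms)
gen event_time = year - first_rise if first_rise > 0

list, sepby(firm)

* With csdid installed (ssc install csdid), the call would look roughly like:
* csdid personnel_expenses, ivar(firm) time(year) gvar(first_rise)
```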

I would think some other financials might be useful (say, cash flow, revenues, or cashflow/revenues as a profit margin). You've got a DV with scale and it needs to be conditioned by something (fixed effects might help). Probably a lag structure to such expenditures too, but the unbalanced nature of the data may make that problematic and you'd need to address the lagged DV in estimation.
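On the unbalanced-panel point: after `xtset`, Stata's `L.` operator respects the calendar, so a missing year produces a missing lag. A plain `[_n-1]` subscript would instead silently pair non-adjacent years. A small hypothetical example in which firm 1 has no 2017 record:

```stata
clear
input firm year personnel_expenses
1 2015 100
1 2016 110
1 2018 130
2 2015 50
2 2016 55
2 2017 60
end

xtset firm year
* L. uses the time variable: firm 1's 2018 lag is missing because 2017 is absent
gen lag_pe = L.personnel_expenses
* The naive "previous row" lag wrongly pairs 2018 with 2016
bysort firm (year): gen naive_lag = personnel_expenses[_n-1]

list, sepby(firm)
```

A lagged dependent variable combined with firm fixed effects also raises the well-known dynamic-panel bias. Stata's `xtabond` and `xtdpdsys` are one family of estimators aimed at that, though whether they suit this data is a separate question.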

There might be a sector effect too (sales forces may require more personnel expenditures).

Might get a sense of things starting here, but it's hard to say without knowing more:

Code:

* reghdfe is community-contributed: ssc install reghdfe
reghdfe personnel_expense taxrate, absorb(corporation) vce(cluster corporation)
reghdfe personnel_expense sizevariable taxrate, absorb(corporation) vce(cluster corporation)
regress personnel_expense sizevariable taxrate, vce(cluster corporation)
George Ford


19 Jan 2022, 13:25

cem, psmatch, and so forth are matching techniques across groups (controls, treated). Not designed for data management, but you have solved that problem.