Hello,
I am creating flag variables to detect irregular changes in capital and investment variables in a panel dataset.
The main idea is to flag capital and investment variables based on (i) their own year-on-year (yoy) changes (a sudden increase or decrease) and (ii) the comovement between the capital and investment variables.
The underlying logic is that capital stocks (tangible and intangible) evolve according to past investment (tangible and intangible, respectively).
For example, if the capital stock rises suddenly (by more than a certain threshold) at t, but investment also rose at t-1, the observation is not flagged.
In other words, any above-threshold increase or decrease in the capital stock at t should be justified by an increase or decrease in investment at t-1.
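To make the rule concrete, here is a minimal toy sketch for a single firm. The variable names K, I, dlogK, dlogI_L1 and flag_K and the data are made up for illustration; the thresholds are the values from my code. It only covers the positive-to-positive case, not startup jumps or drops to zero.

```stata
* Toy panel: K triples in 2003 (after I tripled in 2002) and again in 2004
clear
input id year K I
1 2000 100 10
1 2001 100 10
1 2002 100 30
1 2003 300 30
1 2004 900 30
end
tsset id year, yearly

* Illustrative thresholds (same values as in my code)
local threshold_K     0.69
local com_threshold_K 0.3

* Log change in K at t, and log change in I at t-1
gen double dlogK    = log(K / L1.K)    if K > 0 & L1.K > 0
gen double dlogI_L1 = log(L1.I / L2.I) if L1.I > 0 & L2.I > 0

* Flag a large change in K unless lagged investment moved by a similar amount
gen byte flag_K = !missing(dlogK) & abs(dlogK) > `threshold_K' ///
    & (missing(dlogI_L1) | abs(dlogK - dlogI_L1) > `com_threshold_K')

list year K I dlogK dlogI_L1 flag_K, noobs
```

In this toy data the 2003 jump in K is not flagged, because investment rose by the same log amount in 2002, while the 2004 jump is flagged, because investment was flat in 2003.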
An example of the dataset looks like this:
id | year | capital_tan | capital_intan | investment_tan | investment_intan |
1 | 2020 | 0 | 0 | 0 | 90 |
1 | 2021 | 0 | 200 | 0 | 0 |
1 | 2022 | 0 | 299 | 0 | 30 |
1 | 2023 | 200 | 30000 | 0 | 600 |
1 | 2024 | 50000 | . | 300 | 0 |
1 | 2025 | 0 | 0 | 5000 | . |
2 | 2011 | 40 | 4555 | . | . |
2 | 2012 | . | 4555 | . | . |
2 | 2013 | . | 46666 | 3333 | 555555 |
2 | 2014 | 600 | 0 | 55555 | 555555 |
3 | 2009 | 34 | 1345 | 22 | 53 |
3 | 2010 | 10000 | 1355 | 3523 | 5235 |
3 | 2011 | . | 1555 | . | . |
I therefore used the code below:
* Setup
tsset $id $yr, yearly
bysort $id ($yr): gen byte first_year = (_n == 1)

* Generate log differences (excluding first year)
foreach var in $vars_L $vars_K $vars_I $vars_output $vars_ratio {
    gen log_diff_`var' = .
    replace log_diff_`var' = log(`var'/L1.`var') ///
        if L1.`var' > 0 & `var' > 0 ///
        & first_year == 0
}
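
* Section (i) - define global var
* Note: $vars_K and $vars_I must already be defined when the log-difference loop above runs
gl Ktan capital_tan
gl Kintan capital_intan
gl Itan invest_tan
gl Iintan invest_intan
gl vars_K capital_tan capital_intan
gl vars_I invest_tan invest_intan

* Section (ii) - Define threshold
* 0.69 is roughly log(2), i.e. a doubling or halving
loc threshold_K 0.69
loc threshold_I 0.69
loc com_threshold_K 0.3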

* Section (iii) - Flags for K and I variables
local nvars : word count $vars_K
forvalues i = 1/`nvars' {
    loc K_var : word `i' of $vars_K
    loc I_var : word `i' of $vars_I

    gen log_diff_L1_`I_var' = .
    replace log_diff_L1_`I_var' = log(L1.`I_var'/L2.`I_var') ///
        if L1.`I_var' > 0 & L2.`I_var' > 0

    gen flag_jump_`K_var' = 0
    gen flag_jump_`I_var' = 0

    // Startup jumps
    replace flag_jump_`K_var' = 1 ///
        if first_year == 0 ///
        & `K_var' > 0 ///
        & (L1.`K_var' == 0 | L1.`K_var' == .) ///
        & (L2.`K_var' == 0 | L2.`K_var' == .)
    replace flag_jump_`I_var' = 1 ///
        if first_year == 0 ///
        & `I_var' > 0 ///
        & (L1.`I_var' == 0 | L1.`I_var' == .) ///
        & (L2.`I_var' == 0 | L2.`I_var' == .)

    // Drops to zero
    replace flag_jump_`K_var' = 1 ///
        if first_year == 0 ///
        & `K_var' == 0 ///
        & L1.`K_var' > 0
    replace flag_jump_`I_var' = 1 ///
        if first_year == 0 ///
        & `I_var' == 0 ///
        & L1.`I_var' > 0

    // Positive-to-positive jumps with lagged comovement
    replace flag_jump_`K_var' = 1 ///
        if first_year == 0 ///
        & log_diff_`K_var' != . ///
        & abs(log_diff_`K_var') > `threshold_K' ///
        & (log_diff_L1_`I_var' == . ///
        | abs(log_diff_`K_var' - log_diff_L1_`I_var') > `com_threshold_K')
    replace flag_jump_`I_var' = 1 ///
        if first_year == 0 ///
        & log_diff_`I_var' != . ///
        & abs(log_diff_`I_var') > `threshold_I'
}

foreach var of varlist flag_jump_* {
    replace `var' = 0 if missing(`var')
}

Now I would like to use if statements to handle cases where either the capital or the investment variables are not in the dataset.
If both the capital and investment variables of a pair exist (intangible capital and intangible investment, or tangible capital and tangible investment), the code should generate the flags with the comovement check.
However, if either the K or the I variable is absent from the dataset (the variable itself is unavailable, not just missing observations), in which case the corresponding global in section (i) is left empty, the flags should be based only on each available variable's own yoy change exceeding the threshold.
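For concreteness, this is a rough sketch of the behaviour I am after, not working production code. It assumes the per-type globals from section (i) (Ktan, Itan, Kintan, Iintan) with an empty string for an unavailable variable. The toy data, the dlog_* names and the single `threshold` local are made up for illustration, and it only handles positive-to-positive jumps (no startup jumps or drops to zero).

```stata
* Toy panel in which intangible investment is not available at all
clear
input id year capital_tan capital_intan invest_tan
1 2000 100  50 10
1 2001 100  50 10
1 2002 100  50 30
1 2003 300  50 30
1 2004 900 200 30
end
tsset id year, yearly

* Globals as in section (i); empty when the variable is unavailable
global Ktan   capital_tan
global Itan   invest_tan
global Kintan capital_intan
global Iintan ""

local threshold     0.69
local com_threshold 0.3

foreach type in tan intan {
    local K_var ${K`type'}
    local I_var ${I`type'}

    * Step 1: own yoy flag for whichever of K and I is non-empty and exists
    local has_K 0
    local has_I 0
    foreach role in K I {
        local v ``role'_var'
        if "`v'" == "" continue
        capture confirm numeric variable `v'
        if _rc continue
        local has_`role' 1
        gen double dlog_`v' = log(`v' / L1.`v') if `v' > 0 & L1.`v' > 0
        gen byte flag_jump_`v' = !missing(dlog_`v') & abs(dlog_`v') > `threshold'
    }

    * Step 2: only when both exist, unflag K jumps matched by lagged I growth
    if `has_K' & `has_I' {
        replace flag_jump_`K_var' = 0 if flag_jump_`K_var' == 1 ///
            & !missing(L1.dlog_`I_var') ///
            & abs(dlog_`K_var' - L1.dlog_`I_var') <= `com_threshold'
    }
}

list year capital_tan invest_tan capital_intan flag_jump_*, noobs
```

The emptiness check comes before `capture confirm` on purpose, so that an empty global is skipped outright instead of being passed to `confirm` as an empty variable list. With the toy data, the tangible pair gets the comovement check, while intangible capital is flagged on its own change only and no flag is created for the absent intangible investment.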
Could someone please help me with this?
I tried to do this myself, but every attempt ended in errors :/
Thank you in advance!
AC