Hello everyone,
I have unbalanced panel data and would like to handle the missing values with multiple imputation. If I am not mistaken, the first step is to assess whether the data are missing at random. To do that I can use either a logit model or a t-test. I have tried both without success.
I saw in previous posts that missingness can be modelled with a logistic regression; however, I only found examples with cross-sectional data, which is not the case here. Thus, I would like to ask whether this is possible with panel data and, if so, how to do it. Thank you in advance to anyone who is willing to help.
These are my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 country str10 date double(total_deaths_per_million total_cases_per_million) float country_system byte(miss_total_cases_per_million miss_total_deaths_per_million) double(life_expectancy hospital_beds_per_thousand aged_70_older)
"Argentina" "2020-04-09" 1.593 39.716 1 0 0 76.67 5 7.441
"Argentina" "2020-04-10" 1.814 43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-11" 1.836 43.699 1 0 0 76.67 5 7.441
"Argentina" "2020-04-12" 1.991 47.394 1 0 0 76.67 5 7.441
"Argentina" "2020-04-13" 2.146 48.854 1 0 0 76.67 5 7.441
"Argentina" "2020-04-14" 2.257 50.381 1 0 0 76.67 5 7.441
"Argentina" "2020-04-15" 2.456 54.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-16" 2.544 56.886 1 0 0 76.67 5 7.441
"Argentina" "2020-04-17" 2.721 59.054 1 0 0 76.67 5 7.441
"Argentina" "2020-04-18" 2.854 61.023 1 0 0 76.67 5 7.441
"Argentina" "2020-04-19" 2.921 62.816 1 0 0 76.67 5 7.441
"Argentina" "2020-04-20" 3.009 65.072 1 0 0 76.67 5 7.441
"Argentina" "2020-04-21" 3.253 67.064 1 0 0 76.67 5 7.441
"Argentina" "2020-04-22" 3.363 69.564 1 0 0 76.67 5 7.441
"Argentina" "2020-04-23" 3.651 76.003 1 0 0 76.67 5 7.441
"Argentina" "2020-04-24" 3.894 79.808 1 0 0 76.67 5 7.441
"Argentina" "2020-04-25" 4.093 83.636 1 0 0 76.67 5 7.441
"Argentina" "2020-04-26" 4.248 86.114 1 0 0 76.67 5 7.441
"Argentina" "2020-04-27" 4.359 88.57 1 0 0 76.67 5 7.441
"Argentina" "2020-04-28" 4.58 91.314 1 0 0 76.67 5 7.441
"Argentina" "2020-04-29" 4.735 94.81 1 0 0 76.67 5 7.441
"Argentina" "2020-04-30" 4.823 97.974 1 0 0 76.67 5 7.441
"Argentina" "2020-05-01" 4.978 100.275 1 0 0 76.67 5 7.441
"Argentina" "2020-05-02" 5.244 103.572 1 0 0 76.67 5 7.441
"Argentina" "2020-05-03" 5.443 105.828 1 0 0 76.67 5 7.441
"Argentina" "2020-05-04" 5.753 108.13 1 0 0 76.67 5 7.441
"Argentina" "2020-05-05" 5.841 111.072 1 0 0 76.67 5 7.441
"Argentina" "2020-05-06" 6.04 115.232 1 0 0 76.67 5 7.441
"Argentina" "2020-05-07" 6.24 118.839 1 0 0 76.67 5 7.441
"Argentina" "2020-05-08" 6.483 124.149 1 0 0 76.67 5 7.441
"Argentina" "2020-05-09" 6.638 127.8 1 0 0 76.67 5 7.441
"Argentina" "2020-05-10" 6.748 133.508 1 0 0 76.67 5 7.441
"Argentina" "2020-05-11" 6.948 138.907 1 0 0 76.67 5 7.441
"Argentina" "2020-05-12" 7.058 145.213 1 0 0 76.67 5 7.441
"Argentina" "2020-05-13" 7.279 152.204 1 0 0 76.67 5 7.441
"Argentina" "2020-05-14" 7.81 157.847 1 0 0 76.67 5 7.441
"Argentina" "2020-05-15" 7.877 165.48 1 0 0 76.67 5 7.441
"Argentina" "2020-05-16" 8.032 172.693 1 0 0 76.67 5 7.441
"Argentina" "2020-05-17" 8.253 178.512 1 0 0 76.67 5 7.441
"Argentina" "2020-05-18" 8.452 185.216 1 0 0 76.67 5 7.441
"Argentina" "2020-05-19" 8.696 194.908 1 0 0 76.67 5 7.441
"Argentina" "2020-05-20" 8.917 205.395 1 0 0 76.67 5 7.441
"Argentina" "2020-05-21" 9.204 219.733 1 0 0 76.67 5 7.441
"Argentina" "2020-05-22" 9.581 235.619 1 0 0 76.67 5 7.441
"Argentina" "2020-05-23" 9.846 251.196 1 0 0 76.67 5 7.441
"Argentina" "2020-05-24" 10.001 267.193 1 0 0 76.67 5 7.441
"Argentina" "2020-05-25" 10.333 279.407 1 0 0 76.67 5 7.441
"Argentina" "2020-05-26" 10.709 292.682 1 0 0 76.67 5 7.441
"Argentina" "2020-05-27" 11.063 308.281 1 0 0 76.67 5 7.441
"Argentina" "2020-05-28" 11.24 325.296 1 0 0 76.67 5 7.441
"Argentina" "2020-05-29" 11.505 341.16 1 0 0 76.67 5 7.441
end
where total_cases_per_million and total_deaths_per_million are the dependent variables for which I would like to impute missing values, and miss_total_cases_per_million and miss_total_deaths_per_million are dummy variables that indicate observations for which the previous two variables are missing.
I tried the following code:
Code:
. xtlogit miss_total_cases_per_million country_system total_cases_per_million
> stringency gdp_per_capita extreme_poverty aged_70_older
outcome does not vary; remember:
                                  0 = negative outcome,
        all other nonmissing values = positive outcome
r(2000);

end of do-file
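For reference, here is a minimal sketch of the kind of check I have in mind, assuming the example data above are in memory. The names country_id, daily_date and miss_cases are hypothetical helpers that are not in my dataset, and the covariate list is only illustrative. The sketch leaves total_cases_per_million out of the regressors: Stata drops every row in which a regressor is missing, so with that variable included only rows where the indicator is 0 would remain, which may be what triggers the "outcome does not vary" message. On the excerpt above, which has no missing values, the guard simply skips both tests.

```stata
* Sketch only: country_id, daily_date and miss_cases are hypothetical names.
* Declare the panel structure (country by day).
encode country, generate(country_id)
generate daily_date = date(date, "YMD")
format daily_date %td
xtset country_id daily_date

* Missingness indicator built directly from the variable of interest.
generate byte miss_cases = missing(total_cases_per_million)

* Run the tests only if the indicator takes both values in the sample.
quietly count if miss_cases == 1
if r(N) > 0 & r(N) < _N {
    * Random-effects logit of missingness on covariates observed in every row.
    xtlogit miss_cases country_system aged_70_older life_expectancy, re

    * Compare a covariate between rows with and without missing cases.
    ttest aged_70_older, by(miss_cases)
}
else {
    display "miss_cases does not vary in this sample"
}
```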
Best regards
Alessio Lombini