Dear Statalist community,
I am carrying out a research project that aims to determine whether several clinical variables changed after patients switched from one treatment to another. I am unsure about the statistical method I chose to answer that question, and I would be most grateful if you could help me check whether the way I have proceeded is correct.
The study will eventually include around 50 patients, but I started with a pilot of only 10 of them before obtaining the full data. Patients were seen once a year, and I have yearly data both before and after the switch. The main research question concerns changes in renal variables: proteinuria, ACR and PCR, and the directly measured renal function (mGFR), all of which are numerical variables.
I therefore organised the data longitudinally, in long format, in Stata 17. A subset of the data looks like the following. It includes some yes/no clinical variables (presence of diabetes or hypertension), sex, age and age at switch. The switch is identified by the 0/1 variable beforeafter (0 = before, 1 = after).
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id year sex ageatswitch) int mgfr_ float proteinuria_ int pcr_ float(acr_ renalevents stroke whitematterlesion) byte(diabetes hypertension) float beforeafter
1 1 1 50 115 . . . 0 1 0 0 1 0
1 2 1 50 96 . . . 0 1 0 0 1 0
1 3 1 50 94 . . . 0 1 0 0 1 0
1 4 1 50 94 .09 10 2.05 0 1 0 0 1 0
1 5 1 50 95 . . . 0 1 0 0 1 0
1 6 1 50 89 . 23 2.85 0 1 0 0 1 0
1 7 1 50 105 .33 35 5.16 0 1 0 0 1 0
1 8 1 50 109 . . . 0 1 0 0 1 0
1 9 1 50 103 . . . 0 1 0 0 1 0
1 10 1 50 109 . . . 0 1 0 0 1 0
1 11 1 50 96 . . . 0 1 0 0 1 0
1 12 1 50 113 . . . 0 1 0 0 1 0
1 13 1 50 108 .09 9 1.2 0 1 0 0 1 0
1 14 1 50 105 .13 13 .84 0 1 0 0 1 0
1 15 1 50 82 .12 12 1.8 0 1 0 0 1 1
1 16 1 50 85 .11 . 1.44 0 1 0 0 1 1
1 17 1 50 . . . . 0 1 0 0 1 1
1 18 1 50 . . 9 . 0 1 0 0 1 1
1 19 1 50 . .05 . 2.31 0 1 0 0 1 1
1 20 1 50 . . . . 0 1 0 0 1 1
2 1 1 62 94 . . . 0 0 1 0 1 0
2 2 1 62 89 . . . 0 0 1 0 1 0
2 3 1 62 . . 10 . 0 0 1 0 1 0
2 4 1 62 92 . 7 .72 0 0 1 0 1 0
2 5 1 62 84 . . . 0 0 1 0 1 0
2 6 1 62 85 . . . 0 0 1 0 1 0
2 7 1 62 96 0 . . 0 0 1 0 1 0
2 8 1 62 88 . . . 0 0 1 0 1 0
2 9 1 62 93 . . . 0 0 1 0 1 0
2 10 1 62 101 . . . 0 0 1 0 1 0
2 11 1 62 . . . . 0 0 1 0 1 0
2 12 1 62 93 0 . . 0 0 1 0 1 0
2 13 1 62 . . . . 0 0 1 0 1 0
2 14 1 62 93 .11 16 1.27 0 0 1 0 1 0
2 15 1 62 88 .09 14 .76 0 0 1 0 1 1
2 16 1 62 80 .09 12 .57 0 0 1 0 1 1
2 17 1 62 . . . . 0 0 1 0 1 1
2 18 1 62 . . . . 0 0 1 0 1 1
2 19 1 62 . . . . 0 0 1 0 1 1
2 20 1 62 79 .11 15 2.24 0 0 1 0 1 1
3 1 0 42 . . . . 0 0 0 0 0 0
3 2 0 42 98 . . . 0 0 0 0 0 0
3 3 0 42 94 . . .49 0 0 0 0 0 0
3 4 0 42 . .32 7 . 0 0 0 0 0 0
3 5 0 42 87 0 . .44 0 0 0 0 0 0
3 6 0 42 73 0 . . 0 0 0 0 0 0
3 7 0 42 93 0 . . 0 0 0 0 0 0
3 8 0 42 86 . . . 0 0 0 0 0 0
3 9 0 42 92 0 . .38 0 0 0 0 0 0
3 10 0 42 94 . . . 0 0 0 0 0 0
3 11 0 42 90 . . .72 0 0 0 0 0 0
3 12 0 42 101 .12 10 . 0 0 0 0 0 0
3 13 0 42 101 .11 8 .49 0 0 0 0 0 0
3 14 0 42 91 0 8 .42 0 0 0 0 0 0
3 15 0 42 98 .14 10 .48 0 0 0 0 0 1
3 16 0 42 94 .28 19 .53 0 0 0 0 0 1
3 17 0 42 . . . . 0 0 0 0 0 1
3 18 0 42 85 .15 13 1.61 0 0 0 0 0 1
3 19 0 42 . .1 9 1.1 0 0 0 0 0 1
3 20 0 42 . . . . 0 0 0 0 0 1
4 1 1 57 . . . . 0 0 1 1 1 0
4 2 1 57 63 . . . 0 0 1 1 1 0
4 3 1 57 52 . 16 1.89 0 0 1 1 1 0
4 4 1 57 57 . . .65 0 0 1 1 1 0
4 5 1 57 54 . . . 0 0 1 1 1 0
4 6 1 57 56 . 18 5.07 0 0 1 1 1 0
4 7 1 57 72 . . .61 0 0 1 1 1 0
4 8 1 57 . . . . 0 0 1 1 1 0
4 9 1 57 75 . . . 0 0 1 1 1 0
4 10 1 57 57 . . . 0 0 1 1 1 0
4 11 1 57 67 . . . 0 0 1 1 1 0
4 12 1 57 61 . 55 18.26 0 0 1 1 1 0
4 13 1 57 . . . . 0 0 1 1 1 0
4 14 1 57 68 . 124 69.12 0 0 1 1 1 0
4 15 1 57 41 . 68 32.22 0 0 1 1 1 1
4 16 1 57 64 . . . 0 0 1 1 1 1
4 17 1 57 . . . . 0 0 1 1 1 1
4 18 1 57 . . 32 7.88 0 0 1 1 1 1
4 19 1 57 48 . 34 17.07 0 0 1 1 1 1
4 20 1 57 . . 13 .92 0 0 1 1 1 1
5 1 0 62 . . . . 0 0 0 0 1 0
5 2 0 62 82 . . . 0 0 0 0 1 0
5 3 0 62 80 . 7 2 0 0 0 0 1 0
5 4 0 62 76 . . . 0 0 0 0 1 0
5 5 0 62 . . . . 0 0 0 0 1 0
5 6 0 62 85 . . . 0 0 0 0 1 0
5 7 0 62 80 . . 5.77 0 0 0 0 1 0
5 8 0 62 76 . . . 0 0 0 0 1 0
5 9 0 62 78 . . . 0 0 0 0 1 0
5 10 0 62 84 . . 76 0 0 0 0 1 0
5 11 0 62 . . . . 0 0 0 0 1 0
5 12 0 62 78 . . 4.93 0 0 0 0 1 0
5 13 0 62 81 .1 15 7.1 0 0 0 0 1 0
5 14 0 62 . 0 . 4.94 0 0 0 0 1 0
5 15 0 62 77 .06 7 1.97 0 0 0 0 1 1
5 16 0 62 67 .11 13 4.61 0 0 0 0 1 1
5 17 0 62 . . . 5.3 0 0 0 0 1 1
5 18 0 62 . . . . 0 0 0 0 1 1
5 19 0 62 . . . . 0 0 0 0 1 1
5 20 0 62 . . . . 0 0 0 0 1 1
end
label values sex MaleFemale
label def MaleFemale 0 "Male", modify
label def MaleFemale 1 "Female", modify
label values renalevents YesNo
label values stroke YesNo
label values whitematterlesion YesNo
label values diabetes YesNo
label values hypertension YesNo
label def YesNo 0 "No", modify
label def YesNo 1 "Yes", modify
label values beforeafter beforeafter
label def beforeafter 0 "Before", modify
label def beforeafter 1 "After", modify
(Listed 100 out of 200 observations.)
Initially, I was drawn to Stata's treatment-effects commands, as they seemed to describe what I intended to do. However, after reading the help file, I realised they require a control group that had not switched treatment. All the data I have access to come from patients who switched. The idea is therefore to compare the data before and after the switch, with each patient acting as their own control.
To do this, I ran longitudinal (panel) regressions with the clinical variable as the dependent variable and beforeafter, which identifies the switch, as the independent variable. To remove any potential confounding by age, I chose age as the time variable of the panel.
As an example, for mGFR it looked like this:
Code:
. xtset id age_

Panel variable: id (weakly balanced)
 Time variable: age_, 22 to 74
         Delta: 1 unit
Code:
. xtreg mgfr_ beforeafter, re

Random-effects GLS regression                   Number of obs     =        135
Group variable: id                              Number of groups  =         10

R-squared:                                      Obs per group:
     Within  = 0.2163                                         min =          9
     Between = 0.0368                                         avg =       13.5
     Overall = 0.0339                                         max =         16

                                                Wald chi2(1)      =      34.49
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 beforeafter |  -9.922148   1.689414    -5.87   0.000    -13.23334   -6.610959
       _cons |   89.87322   8.022397    11.20   0.000     74.14961    105.5968
-------------+----------------------------------------------------------------
     sigma_u |  25.333941
     sigma_e |  7.6829071
         rho |  .91577617   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Then, I tried to adjust for possible confounders, such as having diabetes or using certain types of medication (ARBs, ACE inhibitors):
Code:
. xtreg mgfr_ beforeafter acei_ arb_ diabetes hypertension, re

Random-effects GLS regression                   Number of obs     =        135
Group variable: id                              Number of groups  =         10

R-squared:                                      Obs per group:
     Within  = 0.2168                                         min =          9
     Between = 0.1903                                         avg =       13.5
     Overall = 0.1745                                         max =         16

                                                Wald chi2(5)      =      35.50
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 beforeafter |    -10.048   1.779538    -5.65   0.000    -13.53583   -6.560166
       acei_ |  -.0579371   2.240219    -0.03   0.979    -4.448686    4.332812
        arb_ |    .933376   3.457939     0.27   0.787    -5.844061    7.710813
    diabetes |  -28.92689   32.37343    -0.89   0.372    -92.37766    34.52387
hypertension |  -6.379531   19.84124    -0.32   0.748    -45.26765    32.50859
       _cons |   96.56906    14.7622     6.54   0.000     67.63569    125.5024
-------------+----------------------------------------------------------------
     sigma_u |  29.672306
     sigma_e |  7.7429727
         rho |  .93624663   (fraction of variance due to u_i)
------------------------------------------------------------------------------
So far, I have interpreted these results as showing that the switch negatively affected renal function (mGFR), regardless of age and of all the other covariates in the model.
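The models above are random-effects (-re-) models. A common companion check is to fit the same specification by fixed effects (-xtreg, fe-). Fixed effects use only within-patient variation and so absorb every time-invariant patient characteristic. In the listed data, diabetes and hypertension are constant within each patient, so -xtreg, fe- would drop them as collinear with the patient effects. -hausman- then compares the fixed- and random-effects estimates of the coefficients the two models share.

The sketch below is a minimal illustration under stated assumptions. The data are simulated and hypothetical: 10 patients, 20 yearly visits, a switch at year 15 as in the listing, and an assumed drop of 10 units. This lets the block run on its own; with the real data, the simulation lines would be replaced by the study data.

```stata
* Hypothetical simulated panel, for illustration only (not the study data):
* 10 patients with 20 yearly visits each; switch at year 15, as in the listing
clear
set seed 2024
set obs 10
generate id = _n
generate u = rnormal(0, 25)                  // hypothetical patient-level effect
expand 20
bysort id: generate year = _n
generate beforeafter = year >= 15            // 0 = before, 1 = after
generate mgfr_ = 90 + u - 10*beforeafter + rnormal(0, 8)

xtset id year

* Fixed effects: uses only within-patient variation
xtreg mgfr_ beforeafter, fe
estimates store fe

* Random effects: same specification as in the post
xtreg mgfr_ beforeafter, re
estimates store re

* Hausman test: consistent model (fe) first, efficient model (re) second
hausman fe re
```

With only 10 patients, the test has little power, so it is probably best read alongside the two coefficient tables rather than on its own.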
My overarching question is: is longitudinal regression an appropriate way to answer my research question? If so, is declaring age as the time variable of the panel the correct way to adjust for age?
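On the age question: as far as I know, -xtreg, re- and -xtreg, fe- use only the panel identifier set by -xtset-. The time variable is used by time-series operators such as L. and D. and by some other xt commands, but not by these estimators. One way to check this is to fit the same model after -xtset id year- and after -xtset id age_- and compare the coefficient on beforeafter.

Because each patient's "after" visits are also their later, older visits, a model without an age or time term can mix a change at the switch with gradual decline over time. One option, sketched below, is to enter age as a covariate. With a linear age term, the beforeafter coefficient is then read as a shift in level after the switch relative to the age trend, a simple segmented-regression form. This is one possibility among several, not necessarily the best model for these data.

The data are simulated and hypothetical. age_ is assumed to equal ageatswitch plus the years elapsed since the year-15 visit.

```stata
* Hypothetical simulated panel, for illustration only (not the study data)
clear
set seed 2024
set obs 10
generate id = _n
generate ageatswitch = runiformint(40, 65)   // hypothetical age at switch
generate u = rnormal(0, 25)                  // hypothetical patient-level effect
expand 20
bysort id: generate year = _n
generate beforeafter = year >= 15            // switch at year 15, as in the listing
generate age_ = ageatswitch + year - 15      // assumed age at each yearly visit
* Assumed data-generating process: decline with age plus a 5-unit drop after the switch
generate mgfr_ = 120 + u - 0.8*age_ - 5*beforeafter + rnormal(0, 8)

* 1) Same model, two different time variables
xtset id age_
xtreg mgfr_ beforeafter, re
scalar b_age_time = _b[beforeafter]

xtset id year
xtreg mgfr_ beforeafter, re
scalar b_year_time = _b[beforeafter]

* Relative difference between the two beforeafter coefficients
display reldif(b_age_time, b_year_time)

* 2) Adjusting for age by entering it in the model as a covariate
xtreg mgfr_ beforeafter c.age_, re
```

If the displayed relative difference is negligible, the choice of time variable did not change the estimate. Any adjustment for age then has to come from age (or time since the switch) entering the model itself.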
And finally, am I interpreting my results correctly? For the qualitative variables, which are mostly 0/1, I would use logistic regression.
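For the 0/1 outcomes (for example renalevents), the panel counterpart of the random-effects linear model is a random-effects logistic model such as -xtlogit, re-. The -or- option reports odds ratios instead of log-odds; -melogit- fits the same kind of random-intercept model with more options.

The minimal sketch below uses a simulated, hypothetical outcome called event, since renalevents is 0 throughout the listed observations. The effect size on the logit scale is an assumption chosen only so that the model has something to estimate.

```stata
* Hypothetical simulated panel, for illustration only (not the study data)
clear
set seed 2024
set obs 10
generate id = _n
generate u = rnormal(0, 1)                   // hypothetical patient effect (logit scale)
expand 20
bysort id: generate year = _n
generate beforeafter = year >= 15            // switch at year 15, as in the listing
* Hypothetical binary outcome with assumed higher log-odds after the switch
generate event = runiform() < invlogit(-1 + u + 0.5*beforeafter)

xtset id year

* Random-effects logistic regression; -or- reports odds ratios
xtlogit event beforeafter, re or
```

With around 10 to 50 patients and rare events, such models may estimate the odds ratio imprecisely or fail to converge, so the number of events per outcome is worth checking first.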
Thank you very much for all your help and apologies for the long post!
Best regards,
David.