
  • Sample changes between runs when calculating age in months

    Dear all,
    I'm analyzing childhood vaccination coverage in a vulnerable population, and I'm having trouble calculating age in months from the date of birth to the interview date. I have attached a file related to this problem: the number of observations changes from one run to the next. First, I restrict the sample to the 2020–2021 birth cohort; then I check for duplicate reports and drop them. Finally, I calculate age in months and keep children aged 12-23 months. The sample I get when I run the code changes when I run the same code again, and I need to fix this so that the analyzed sample is accurate. Could you please check it and suggest a possible solution? My code is below:

    **************************************************************
    ****** Data preparation ********
    **************************************************************

    gen year_cdob= year(cdob)

    ta year_cdob

    ta year_cdob if year_cdob>2019 & year_cdob<2022


    **************************************************************
    ****** Inclusion criteria ********
    **************************************************************

    * Child date of birth: January 1, 2020, to December 31, 2021

    keep if year_cdob>2019 & year_cdob<2022

    * Children with multiple reported vaccination records

    bysort crid: gen tot_n=_n

    bysort crid: gen tot_N=_N

    keep if tot_n== tot_N

    count

    * Inclusion criterion: children aged 12-23 months

    gen age_days = intdt-cdob

    gen age_month=int((intdt-cdob)/30.44)

    ta age_month

    ta age_month if age_month > 11 & age_month <= 23

    keep if age_month > 11 & age_month <= 23


  • #2
    Note: Please read the Forum FAQ for excellent advice about the best way to show example data. Among the things you will learn there is that attachments are discouraged. Some Forum members, including me, will not download and open attachments from people we do not know, due to the risk of malware. Please follow the advice in the FAQ and use the -dataex- command to show example data.

    Accordingly, I have not looked at your data.

    The irreproducibility of your results, I believe, stems from:
    Code:
    bysort crid: gen tot_n=_n
    
    bysort crid: gen tot_N=_N
    
    keep if tot_n== tot_N
    You have more than one observation for at least some values of crid and you are trying to eliminate all but one of them. When you use -bysort crid-, because crid does not determine a unique sort order for the data, Stata will sort the observations within each crid-defined group randomly. Consequently, the particular observation among those with a given value of crid for which tot_n turns out to have the value of tot_N may differ from one run of the code to the next. So each time you run the code you are potentially using different data after -keep if tot_n == tot_N- is executed.
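One way to confirm this diagnosis before changing anything is to check whether crid uniquely identifies the observations. The sketch below uses a small hypothetical data set that mimics the question's variables (crid, cdob, intdt); the string variables s_cdob and s_intdt are illustrative and exist only to type in the toy dates.

```stata
* Hypothetical toy data: crid 2 has two records with different interview dates
clear
input long crid str9 s_cdob str9 s_intdt
1 "15mar2020" "10jun2021"
2 "01jan2021" "20may2022"
2 "01jan2021" "05jan2023"
end
generate cdob  = date(s_cdob, "DMY")
generate intdt = date(s_intdt, "DMY")
format cdob intdt %td

* -isid- exits with an error if crid does not uniquely identify the
* observations; -capture- keeps the do-file running and stores the code in _rc
capture isid crid
display "isid crid return code: " _rc

* Tabulate how many crids have 1, 2, ... records
duplicates report crid
```

A nonzero return code means that some crid appears more than once, which is exactly the situation in which -bysort crid- leaves the order within groups undetermined.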

    Now, there are ways to get around this problem. But in this situation I think that it would be a mistake to do that, because I think the error is deeper than that. You have multiple observations with the same crid (for at least some crids), and the fact that your age_month results change when you rerun the code tells me that these observations, although they agree on crid, may disagree on the value of intdt. So what your code is doing is not the elimination of duplicates: it is the arbitrary (random) selection of single observations from groups of observations that conflict with each other on the value of the variable intdt.
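To check whether records that share a crid really disagree on intdt, you could flag those children directly. Here is a minimal sketch on hypothetical toy data, in which crid 2 has two different interview dates and crid 3 has an exact duplicate record; the flag name intdt_conflict is illustrative.

```stata
* Hypothetical toy data: crid 2 has conflicting interview dates,
* crid 3 has two identical records (a true duplicate)
clear
input long crid str9 s_cdob str9 s_intdt
1 "15mar2020" "10jun2021"
2 "01jan2021" "20may2022"
2 "01jan2021" "05jan2023"
3 "30nov2020" "01dec2021"
3 "30nov2020" "01dec2021"
end
generate cdob  = date(s_cdob, "DMY")
generate intdt = date(s_intdt, "DMY")
format cdob intdt %td

* After sorting by intdt within crid, the first value is the smallest and
* the last is the largest, so they differ only if the records disagree
bysort crid (intdt): generate byte intdt_conflict = intdt[1] != intdt[_N]

* Show the conflicting records, one block per child
list crid cdob intdt if intdt_conflict, sepby(crid) noobs
```

Exact duplicates such as crid 3 are not flagged, so this separates harmless repeated records from the conflicting ones that make the results irreproducible.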

    This leaves us with a few possibilities. One possibility is that the conflicting values of intdt for observations with the same crid represent data errors. If so, these errors need to be fixed. Another possibility is that they are not data errors but represent dates of different events for the same child. In that case, you need to use some other variable in the data set to identify which of these events is the one for which you want to calculate the age, and you need to write code that will keep that observation, rather than the code I quoted above, which selects a random observation. Another possibility is that you need to calculate the age in months for all of the observations for the same crid. In that case you can just eliminate the code above altogether.
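For the second possibility, the sketch below shows the general shape under an explicitly hypothetical rule (keep each child's earliest interview): drop exact duplicates, confirm that the sort key is unique, then keep one record per crid. Which variable identifies the relevant event depends on your data; intdt is only a placeholder here, and the toy data are invented.

```stata
* Hypothetical toy data: crid 2 has two different interviews,
* crid 3 has two identical records
clear
input long crid str9 s_cdob str9 s_intdt
1 "15mar2020" "10jun2021"
2 "01jan2021" "20may2022"
2 "01jan2021" "05jan2023"
3 "30nov2020" "01dec2021"
3 "30nov2020" "01dec2021"
end
generate cdob  = date(s_cdob, "DMY")
generate intdt = date(s_intdt, "DMY")
format cdob intdt %td

* Records identical on every variable are true duplicates: keep one copy
duplicates drop

* HYPOTHETICAL rule: keep each child's earliest interview.
* Replace intdt with whichever variable identifies the event you need.
* -isid- stops the do-file if crid and intdt do not jointly identify the
* records, instead of letting Stata pick one at random
isid crid intdt
bysort crid (intdt): keep if _n == 1

* Confirm that exactly one record per child remains
isid crid
```

If -isid crid intdt- fails, the chosen variable still does not pin down a single record per child, and the kept observation would again be arbitrary.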

    To figure out what is going on, you will need to examine the offending observations with conflicting values of intdt and the same crid. To do that:
    Code:
    duplicates tag crid, gen(flag)
    browse if flag
    Finally, once you get all of the above straightened out, if you are using current Stata (version 18) or even version 17, you can calculate the age in months using the -datediff()- or -datediff_frac()- function. See -help datediff()- or -help datediff_frac()-.
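As an illustration, assuming a Stata version in which datediff() is available (check -help datediff()-), the sketch below compares it with the 30.44-day approximation from the question on hypothetical toy dates. A child interviewed exactly on her first birthday has completed 12 months, but 365/30.44 is just under 12, so the approximation truncates to 11 and would drop her from the 12-23 month sample.

```stata
* Hypothetical toy data; crid 2 is interviewed exactly on the first birthday
clear
input long crid str9 s_cdob str9 s_intdt
1 "15mar2020" "10jun2021"
2 "01jan2021" "01jan2022"
3 "30nov2020" "29nov2021"
end
generate cdob  = date(s_cdob, "DMY")
generate intdt = date(s_intdt, "DMY")
format cdob intdt %td

* Completed months from birth to interview (requires datediff())
generate age_month = datediff(cdob, intdt, "month")

* The day-count approximation used in the question, for comparison
generate age_month_approx = int((intdt - cdob)/30.44)
list crid cdob intdt age_month age_month_approx, noobs

* Keep children aged 12-23 completed months
keep if inrange(age_month, 12, 23)
```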
