  • Calculating difference in two time points for ~400 variables

    Hello all,

    I am trying to calculate the difference in values between month 0 and month 6 for a list of metabolites (~435). I have included only 3 metabolites here. I tried reshaping the data set to wide format and then generating unique values for each month using the command:

    reshape wide lglucuronicacid...., i(pid) j(month)

    Then I created a loop to calculate the difference between the two time points, 0 and 6, using:

    foreach x of varlist lglucuronicacid.... {
    gen diff_`x'= `x'6 - `x'0
    }

    I am getting the following error from Stata: lglucuronicacid ambiguous term

    This is what my dataset looks like in long format:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(month lglucuronicacid lhydroxybutanoicacid lisoleucine)
    0  7.658227 11.233132   11.9977
    2  7.544861 10.348557 11.735717
    6  7.914618 11.283764 11.619742
    6   6.93537  9.281358  11.51593
    2  8.258423  11.76697 11.320032
    0  7.121253  10.22216 12.143404
    2  7.666222 10.556594 11.860062
    0  7.199678 10.984428 11.812474
    6   6.72263 10.716593 11.716405
    0  7.269617 10.434292  11.91481
    6  7.472501  9.857129  11.59689
    2  7.208601 10.316888  11.69343
    6  7.533694  9.383033 11.575524
    2  7.104965  9.696095 11.216216
    0  7.360104  10.69917 11.345228
    0  7.206378 10.482038 12.228895
    6  7.652546 10.269102 12.262798
    2  8.225235 10.773944 12.511762
    6   7.77191 10.588376  12.46499
    0  7.899525 10.638088 12.090964
    2  8.476163 10.903825  12.47775
    0  7.788626 10.420285 12.412277
    6  7.991592 10.621108 12.515452
    2  8.045268 11.014588 12.693892
    0  7.992945 11.166724  11.80705
    6  7.364547  10.81382 11.902472
    2  7.474772 10.915652  11.97766
    2   8.32579 11.593593 12.019165
    6  7.800163  10.75735 11.799435
    0  7.984463 10.636985 12.075395
    0   7.84659  11.16409 12.268126
    2  7.916808  10.41217 11.703687
    6  7.711101 10.237636 12.378403
    0  7.258412  10.47895 11.799314
    2  7.601902 10.300786 11.883081
    6  7.123673 10.535637 11.626915
    2  7.591357 10.161496 12.219053
    6  7.437206  9.658993 12.261966
    0  8.212026 10.218627 11.602382
    0   8.11582  10.65674  12.24359
    6  7.269617 10.278494 12.349658
    2   8.32579  10.29306  12.42326
    6  7.762596 11.835632 11.899704
    0  6.953684 10.326793  11.84716
    2  6.892642  9.794788 12.061428
    2 8.3113985 10.499352 12.244057
    0  7.635787 10.557608 12.381793
    6  7.492203 10.176716 12.204412
    0  7.983781  10.40771  11.94703
    2  7.081708 10.723246  12.18302
    6  7.508239 10.552578 12.207437
    2  7.720905  10.35644 12.291305
    0  7.484369 11.625227 12.166755
    6   7.73718  10.52366   12.1269
    2  7.562161 10.056294 11.471228
    0  6.280396   9.79568  11.69657
    6  7.817626  9.887765  11.67491
    6  7.663408  9.973992 11.919763
    0  7.848934 10.484417  12.28903
    2  8.014997  9.877862 12.048565
    2  6.919684  10.64514 11.759575
    6  6.838405 11.268967  11.70089
    0  7.520235 10.265837 11.822936
    2  8.643121  9.737138 12.291838
    0  7.907651 12.139067  12.05322
    6  8.823795  11.06996 12.516318
    2  8.203852  10.51732 12.124076
    0  7.261225 11.201538 11.655284
    6  7.451241 11.311336 12.073518
    6  7.529943 10.601722  11.22695
    0   7.81682  11.02593  11.86045
    2  7.129298  10.05277 11.481362
    6  7.941296  10.26486  12.44317
    0  7.591862 11.298234 12.180867
    2  7.508787 10.623812 12.135468
    0  7.021084 10.097573  11.55632
    2  7.038784  10.32246  11.30874
    6  7.112328  9.723284  11.71411
    2  7.864804 11.283286 11.633451
    0  7.621685 11.109818 11.662104
    6  7.528869 11.312264 11.639884
    0  8.632663 11.415125 11.937408
    2  8.169336 11.870502  11.88657
    6  8.126814 10.659962  12.41829
    6  8.373553  9.858595 12.312427
    0  8.150468 10.620888  12.49891
    2  6.156979 10.098026  12.31546
    2  7.550135 10.258045 12.398415
    0  8.597482 11.379074 11.916395
    6  7.747597 12.647976  11.97315
    2  7.616776   9.50017  11.75988
    6  7.915713  9.945828 11.603406
    0  7.599902  10.26747  11.33669
    6  7.398786  11.70074 11.804535
    2  7.646354 10.928973  11.74272
    0  6.765039 11.713954  11.75708
    0   7.90581  11.78134  11.89152
    6  7.751045 10.639766  12.08946
    2  7.557473  11.66808  12.53736
    2  7.576097 10.678514 12.569257
    end
    I appreciate any help in understanding how to calculate the difference for a list of ~435 metabolites.

  • #2
    When you reshape the data wide, you now have a series of variables lglucuronicacid0, lglucuronicacid2, and lglucuronicacid6. When your -foreach x of varlist ...- command encounters lglucuronicacid, it doesn't know which of these variables you are referring to. Actually, you really want it to think of them as a group of variables, but that isn't how -foreach x of varlist ...- works.

    Your example data omits the crucial pid variable, so this code could not be tested, but I think it will do what you need:
    Code:
    ds pid month, not
    local metabolites `r(varlist)'
    reshape wide `metabolites', i(pid) j(month)
    
    foreach m of local metabolites {
        gen diff_`m' = `m'6 - `m'0
    }
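
    To illustrate the ambiguity described above, here is a minimal sketch on made-up wide data (the pid values and the numbers are invented for illustration and are not from the thread). It contrasts -foreach ... of varlist-, which tries to resolve each token to an existing variable or abbreviation, with -foreach ... in-, which treats each token as plain text that can serve as a stub.

    ```stata
    * Toy wide-layout data; all values are invented for illustration
    clear
    input str3 pid float(lglucuronicacid0 lglucuronicacid2 lglucuronicacid6)
    "A01" 7.1 7.3 7.6
    "A02" 6.9 7.0 6.5
    end

    * -of varlist- must resolve lglucuronicacid to a variable. With variable
    * abbreviation on (the default), the stub matches three variables, so a
    * loop like the following stops with an error:
    *     foreach x of varlist lglucuronicacid { ... }

    * -in- does no variable lookup, so the stub is passed through as text
    foreach x in lglucuronicacid {
        gen diff_`x' = `x'6 - `x'0
    }

    list pid diff_lglucuronicacid
    ```

    With ~435 metabolites, typing the stubs by hand is impractical, which is why the code above collects them with -ds- before reshaping and then loops with -foreach ... of local-.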
    That said, I think it is actually simpler to do this in long layout. Again, untestable with the example data, but something like this:
    Code:
    ds pid month, not
    local metabolites `r(varlist)'
    
    sort pid month
    foreach m of local metabolites {
        by pid (month): egen value0 = max(cond(month == 0, `m', .))
        by pid (month): egen value6 = max(cond(month == 6, `m', .))
        gen diff_`m' = value6 - value0
        drop value0 value6
    }
    Last edited by Clyde Schechter; 01 Mar 2024, 10:48.



    • #3
      Hello Clyde,

      Thank you for your response. I tested both pieces of code. The second one, for the long format, does not give me any new output: the code runs without any error, but I do not see any new variables starting with diff_. The one for the wide format gives me an error saying "Invalid syntax" after the reshape command. I have attached another example of the dataset, this time with the pid.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str14 pid float(month lglucuronicacid lhydroxybutanoicacid lisoleucine lornithine lglycerolalphaphosphate)
      "CHEA023027500" 0 7.658227 11.233132   11.9977 11.546824 6.575076
      "CHEA023027500" 2 7.544861 10.348557 11.735717 10.557738  6.55108
      "CHEA023027500" 6 7.914618 11.283764 11.619742  11.02947 6.861712
      "CHEA023028100" 0 7.121253  10.22216 12.143404 12.178192 7.532624
      "CHEA023028100" 2 8.258423  11.76697 11.320032  11.47358 6.369901
      "CHEA023028100" 6  6.93537  9.281358  11.51593 12.126705 6.656726
      end
      Thank you for your guidance regarding this.



      • #4
        I cannot reproduce the errors you are getting when I use your new example data with my code on my setup. Both approaches produce the desired results.

        I think the problem is the way you are running the code. In fact, both of the problems you are encountering are easily explained by the same error: you are trying to run this code line by line or in chunks. Because the code depends on the local macro metabolites, which is created by the second command in either approach, it must be run uninterrupted from beginning to end. If you attempt to run part of the code, stop, and then run the rest of it, the code will break. That's because the local macro metabolites will become undefined after the interruption. Subsequent attempts to use local macro metabolites will then treat it as an empty string, which is how Stata handles undefined local macros.

        In the wide layout method, the failure is that the -reshape- command has no list of metabolites after it, as `metabolites' is undefined after the interruption. So you get an invalid syntax error message. In the long layout method, local metabolites is again undefined. Now, -foreach m of local metabolites- is syntactically OK. But since metabolites has no content, there is nothing for the loop to iterate over: the entire loop is simply skipped, so no results are created.
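
        The two failure modes (an empty -reshape- list and a silently skipped loop) can be reproduced with a short sketch. The macro name nosuchmacro is hypothetical and deliberately never defined, and the toy data are invented for illustration.

        ```stata
        * Toy long-layout data; values are invented for illustration
        clear
        input str3 pid float(month metab_a)
        "A01" 0 7.1
        "A01" 6 7.6
        end

        * An undefined local macro expands to nothing, without any warning
        display "[`nosuchmacro']"

        * Looping over an undefined local is legal syntax, but the body never runs
        local iterations 0
        foreach m of local nosuchmacro {
            local ++iterations
        }
        display "loop body ran `iterations' time(s)"

        * With nothing before the comma, -reshape wide- has no stubs to work on;
        * in the thread this surfaced as an "invalid syntax" message
        capture noisily reshape wide `nosuchmacro', i(pid) j(month)
        display "reshape return code: " _rc
        ```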

        Run the whole thing from beginning to end in one fell swoop and it will work for you. And remember for future reference: any code that uses local macros must be run without interruption from the definition of the first local macro to the final use of all of them.
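
        One optional safeguard, sketched here under assumptions rather than taken from the code earlier in the thread, is to check that the macro is non-empty just before it is used. If the block is accidentally run in pieces, the check stops with a clear message instead of skipping the loop silently. The toy data and the variable names metab_a and metab_b are invented for illustration.

        ```stata
        * Toy long-layout data; values and metabolite names are invented
        clear
        input str3 pid float(month metab_a metab_b)
        "A01" 0 7.1 11.2
        "A01" 6 7.6 11.0
        "A02" 0 6.9 10.4
        "A02" 6 6.5 10.9
        end

        * Collect every variable except the identifiers
        ds pid month, not
        local metabolites `r(varlist)'

        * Guard: stop with an error if the macro is empty at the point of use
        if "`metabolites'" == "" {
            display as error "local macro metabolites is empty: run the block in one go"
            exit 198
        }

        sort pid month
        foreach m of local metabolites {
            by pid (month): egen value0 = max(cond(month == 0, `m', .))
            by pid (month): egen value6 = max(cond(month == 6, `m', .))
            gen diff_`m' = value6 - value0
            drop value0 value6
        }

        list pid month diff_*
        ```

        Saving the commands in a do-file and running the file as a whole avoids the problem in the first place; the guard only makes an accidental partial run easier to diagnose.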

