Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assigning average of values to new variable

    Hello,

    I'm seeking assistance with what appears to be a seemingly simple task, but I'm currently struggling to manage it effectively.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double FirmID long Exe_ID int Year float CNS
    4295899290 18101 2010 -.04134016
    4295899290 18101 2010 -.04134016
    4295899290 18101 2010 -.04134016
    4295899290 35166 2012 -1.1268909
    4295899290 35166 2012 -1.1192799
    4295899290 35166 2012 -1.1131546
    4295899290 35166 2012 -1.1079446
    4295899290 35166 2013 -1.8053812
    4295899290 35166 2013 -1.8028207
    4295899290 35166 2013 -1.8004007
    4295899290 35166 2013 -1.7981247
    4295899290 35166 2014   .9342402
    4295899290 35166 2014   .9355524
    4295899290 35166 2014   .9365929
    4295899290 35166 2014   .9376943
    4295899290 35166 2015   .5666934
    4295899290 35166 2015   .5674239
    4295899290 35166 2015   .5679057
    4295899290 35166 2015  .53309023
    4295899290 35166 2016  .12444775
    end

    I possess Firm and Executive IDs that are matched with Year and a variable of interest, CNS, which is associated with combinations of Exe_ID and Year. My objective is to create a new variable, CNS_Constant, which would assume the average values of CNS based on the first two years.

    In the provided data example, you can observe only Firm, 4295899290, but my dataset includes numerous other firms.

    Example:
    The first Executive with the ID 18101 has observations for only one year, which is 2010. In this case, the CNS_Constant variable for this Executive should be marked as missing as the average should be calculated based on the first two years.

    Executive 35166, on the other hand, has CNS data for the years 2012 to 2016. For this Executive, across all years, CNS_Constant should take on the value representing the average of the values assigned to the first two years, specifically 2012 and 2013. That is, the average should be calculated from the values: -1.126891, -1.11928, -1.113155, -1.107945, -1.805381, -1.802821, -1.800401, and -1.798125.

    Any guidance or assistance on how to accomplish this task would be greatly appreciated.

  • #2
    For techniques in this area, see https://journals.sagepub.com/doi/pdf...867X1101100210.

    Code:
    bys FirmID Exe_ID (Year): gen period= sum(Year!= Year[_n-1])
    by FirmID Exe_ID: egen tag= max(period==2)
    by FirmID Exe_ID: egen wanted= mean(cond(inrange(period, 1, 2) & tag, CNS, .))
    Res.:

    Code:
    . l, sepby(FirmID Exe_ID)
    
         +-----------------------------------------------------------------+
         |    FirmID   Exe_ID   Year         CNS   period   tag     wanted |
         |-----------------------------------------------------------------|
      1. | 4.296e+09    18101   2010   -.0413402        1     0          . |
      2. | 4.296e+09    18101   2010   -.0413402        1     0          . |
      3. | 4.296e+09    18101   2010   -.0413402        1     0          . |
         |-----------------------------------------------------------------|
      4. | 4.296e+09    35166   2012   -1.126891        1     1   -1.45925 |
      5. | 4.296e+09    35166   2012    -1.11928        1     1   -1.45925 |
      6. | 4.296e+09    35166   2012   -1.113155        1     1   -1.45925 |
      7. | 4.296e+09    35166   2012   -1.107945        1     1   -1.45925 |
      8. | 4.296e+09    35166   2013   -1.805381        2     1   -1.45925 |
      9. | 4.296e+09    35166   2013   -1.802821        2     1   -1.45925 |
     10. | 4.296e+09    35166   2013   -1.800401        2     1   -1.45925 |
     11. | 4.296e+09    35166   2013   -1.798125        2     1   -1.45925 |
     12. | 4.296e+09    35166   2014    .9342402        3     1   -1.45925 |
     13. | 4.296e+09    35166   2014    .9355524        3     1   -1.45925 |
     14. | 4.296e+09    35166   2014    .9365929        3     1   -1.45925 |
     15. | 4.296e+09    35166   2014    .9376943        3     1   -1.45925 |
     16. | 4.296e+09    35166   2015    .5666934        4     1   -1.45925 |
     17. | 4.296e+09    35166   2015    .5674239        4     1   -1.45925 |
     18. | 4.296e+09    35166   2015    .5679057        4     1   -1.45925 |
     19. | 4.296e+09    35166   2015    .5330902        4     1   -1.45925 |
     20. | 4.296e+09    35166   2016    .1244477        5     1   -1.45925 |
         +-----------------------------------------------------------------+
    Last edited by Andrew Musau; 30 Jan 2024, 03:43.

    Comment


    • #3
      The tricky bit here is checking for observations that may be absent. The first year is just the first year, but the second year may be absent.

      See also https://journals.sagepub.com/doi/pdf...867X1101100210 which says much more about several small techniques here.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input double FirmID long Exe_ID int Year float CNS
      4295899290 18101 2010 -.04134016
      4295899290 18101 2010 -.04134016
      4295899290 18101 2010 -.04134016
      4295899290 35166 2012 -1.1268909
      4295899290 35166 2012 -1.1192799
      4295899290 35166 2012 -1.1131546
      4295899290 35166 2012 -1.1079446
      4295899290 35166 2013 -1.8053812
      4295899290 35166 2013 -1.8028207
      4295899290 35166 2013 -1.8004007
      4295899290 35166 2013 -1.7981247
      4295899290 35166 2014   .9342402
      4295899290 35166 2014   .9355524
      4295899290 35166 2014   .9365929
      4295899290 35166 2014   .9376943
      4295899290 35166 2015   .5666934
      4295899290 35166 2015   .5674239
      4295899290 35166 2015   .5679057
      4295899290 35166 2015  .53309023
      4295899290 35166 2016  .12444775
      end
      
      bysort FirmID Exe_ID : gen FirstYear = Year[1]
      by FirmID Exe_ID : egen CountSecondYear = total(Year == FirstYear + 1 & CNS < .)
      by FirmID Exe_ID : egen MeanYear12 = mean(cond(Year <= FirstYear + 1 & CountSecondYear > 0, CNS, .))
      
      list, sepby(Exe_ID)
      Code:
      su CNS in 4/11
      
       list, sepby(Exe_ID)
      
           +------------------------------------------------------------------------+
           |    FirmID   Exe_ID   Year         CNS   FirstY~r   CountS~r   MeanY~12 |
           |------------------------------------------------------------------------|
        1. | 4.296e+09    18101   2010   -.0413402       2010          0          . |
        2. | 4.296e+09    18101   2010   -.0413402       2010          0          . |
        3. | 4.296e+09    18101   2010   -.0413402       2010          0          . |
           |------------------------------------------------------------------------|
        4. | 4.296e+09    35166   2012   -1.126891       2012          4   -1.45925 |
        5. | 4.296e+09    35166   2012    -1.11928       2012          4   -1.45925 |
        6. | 4.296e+09    35166   2012   -1.113155       2012          4   -1.45925 |
        7. | 4.296e+09    35166   2012   -1.107945       2012          4   -1.45925 |
        8. | 4.296e+09    35166   2013   -1.805381       2012          4   -1.45925 |
        9. | 4.296e+09    35166   2013   -1.802821       2012          4   -1.45925 |
       10. | 4.296e+09    35166   2013   -1.800401       2012          4   -1.45925 |
       11. | 4.296e+09    35166   2013   -1.798125       2012          4   -1.45925 |
       12. | 4.296e+09    35166   2014    .9342402       2012          4   -1.45925 |
       13. | 4.296e+09    35166   2014    .9355524       2012          4   -1.45925 |
       14. | 4.296e+09    35166   2014    .9365929       2012          4   -1.45925 |
       15. | 4.296e+09    35166   2014    .9376943       2012          4   -1.45925 |
       16. | 4.296e+09    35166   2015    .5666934       2012          4   -1.45925 |
       17. | 4.296e+09    35166   2015    .5674239       2012          4   -1.45925 |
       18. | 4.296e+09    35166   2015    .5679057       2012          4   -1.45925 |
       19. | 4.296e+09    35166   2015    .5330902       2012          4   -1.45925 |
       20. | 4.296e+09    35166   2016    .1244477       2012          4   -1.45925 |
           +------------------------------------------------------------------------+
      
      .
      . su CNS in 4/11
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
               CNS |          8    -1.45925      .36612  -1.805381  -1.107945
      EDIT: #2 from Andrew Musau uses the same main idea.

      Comment


      • #4
        Andrew Musau I think your code just checks that two or more different years are present. So, if observations were present for 2012, 2014 and so on for particular IDs, then 2014 would get classified as period 2.

        So, your code works on the first two years that are in the data, regardless of whether the second year followed the first year immediately. Mine is fussier.

        It's a question of precisely what the OP wants.

        Comment


        • #5
          You are correct Nick. It does not check that the years are consecutive.

          Comment

          Working...
          X