Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Referencing a specific cell by values of other variables

    Hello,

    I would like to reference a particular cell in my database to generate a local macro (I think “local macro” is the right terminology, but I’m not sure). All of this is done within a few loops, but I’ve ignored that part of the code because I suspect the loops make no difference for this question.

    I would like to do this:

    Look up the value of “shares_owned” and "control_owned" for observations which have specific values of “mngr_no” and “cusip”. Then, I would like to use this value in a calculation.

    Closer to pseudo-code:

    Local a_k = [mngr_no 1’s control_owned of cusip 11111] * [mngr_no 1’s shares_owned of cusip 222222]

    Data looks like this:
    mngr_no cusip shares_owned control_owned
    1 111111 100 100
    2 111111 50 25
    1 222222 200 100
    2 222222 500 10
    Thank you!

    Nathan

  • #2
    This will illustrate the approach. It is easy to wrap this inside loops.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte mngr_no long cusip int shares_owned byte control_owned
    1 111111 100 100
    2 111111  50  25
    1 222222 200 100
    2 222222 500  10
    end
    
    
    //    FIND MGR 2'S SHARES OWNED AND CONTROL OWNED
    //    FOR CUSIP 111111
    
    isid mngr_no cusip
    gen long obs_no = _n
    summ obs_no if mngr_no == 2 & cusip == 111111, meanonly
    local shares_owned_2_111111 = shares_owned[`r(mean)']
    local control_owned_2_111111 = control_owned[`r(mean)']
    
    display `shares_owned_2_111111', `control_owned_2_111111'
    Note: This approach can only work if the combination of mngr_no and cusip uniquely identifies observations in the data. Of course, if that assumption fails then your question is ill-posed because there would be no such thing as "the" cell corresponding to that combination of mngr_no and cusip.

    In the future, when showing data examples, please use the -dataex- command to do so. as I have in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Additional advice: Stata datasets are not spreadsheets. If you try to use Stata as if it were a spreadsheet you will, at best, be inefficient, and at times your instincts will lead you seriously astray. To keep Stata separate in your mind from spreadsheets, it is helpful not to use spreadsheet terminology when talking about Stata. Thus, in Stata, we do not refer to rows, columns or cells. We speak of observations, variables, and values of variables in particular observations.

    Comment


    • #3
      It dawns on me that the above code is a little bit brittle. If, for whatever reason, you insert additional code after the -gen long obs_no = _n- line and that code changes the sort order of the data, you will get incorrect results.

      The following is more robust:

      Code:
      //    FIND MGR 2'S SHARES OWNED AND CONTROL OWNED
      //    FOR CUSIP 111111
      
      isid mngr_no cusip
      summ shares_owned if mngr_no == 2 & cusip == 111111, meanonly
      local shares_owned_2_111111 = `r(mean)'
      summ control_owned if mngr_no == 2 & cusip == 111111, meanonly
      local control_owned_2_111111 = `r(mean)'
      and does not rely on preserving the sort order at all. The drawback to it is that in a large data set it will run noticeably slower, and if you are putting this code inside some kind of loop, the performance hit could be appreciable.

      Comment


      • #4
        Thank you very much! (And I will install -dataex- and use the correct terminology in the future, appreciate the instruction.)

        Comment

        Working...
        X