Referencing a specific cell by values of other variables

Nathan Barrymore

Join Date: Jul 2018

Posts: 5
#1

Referencing a specific cell by values of other variables

19 Jul 2018, 17:25

Hello,

I would like to reference a particular cell in my database to generate a local macro (I think “local macro” is the right terminology, but I’m not sure). All of this is done within a few loops, but I’ve ignored that part of the code because I suspect the loops make no difference for this question.

I would like to do this:

Look up the value of “shares_owned” and "control_owned" for observations which have specific values of “mngr_no” and “cusip”. Then, I would like to use this value in a calculation.

Closer to pseudo-code:

Local a_k = [mngr_no 1’s control_owned of cusip 11111] * [mngr_no 1’s shares_owned of cusip 222222]

Data looks like this:
mngr_no cusip shares_owned control_owned

1 111111 100 100

2 111111 50 25

1 222222 200 100

2 222222 500 10

Thank you!

Nathan
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

19 Jul 2018, 17:37

This will illustrate the approach. It is easy to wrap this inside loops.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte mngr_no long cusip int shares_owned byte control_owned 1 111111 100 100 2 111111 50 25 1 222222 200 100 2 222222 500 10 end // FIND MGR 2'S SHARES OWNED AND CONTROL OWNED // FOR CUSIP 111111 isid mngr_no cusip gen long obs_no = _n summ obs_no if mngr_no == 2 & cusip == 111111, meanonly local shares_owned_2_111111 = shares_owned[`r(mean)'] local control_owned_2_111111 = control_owned[`r(mean)'] display `shares_owned_2_111111', `control_owned_2_111111'

Note: This approach can only work if the combination of mngr_no and cusip uniquely identifies observations in the data. Of course, if that assumption fails then your question is ill-posed because there would be no such thing as "the" cell corresponding to that combination of mngr_no and cusip.

In the future, when showing data examples, please use the -dataex- command to do so. as I have in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Additional advice: Stata datasets are not spreadsheets. If you try to use Stata as if it were a spreadsheet you will, at best, be inefficient, and at times your instincts will lead you seriously astray. To keep Stata separate in your mind from spreadsheets, it is helpful not to use spreadsheet terminology when talking about Stata. Thus, in Stata, we do not refer to rows, columns or cells. We speak of observations, variables, and values of variables in particular observations.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

19 Jul 2018, 17:43

It dawns on me that the above code is a little bit brittle. If, for whatever reason, you insert additional code after the -gen long obs_no = _n- line and that code changes the sort order of the data, you will get incorrect results.

The following is more robust:

Code:

// FIND MGR 2'S SHARES OWNED AND CONTROL OWNED // FOR CUSIP 111111 isid mngr_no cusip summ shares_owned if mngr_no == 2 & cusip == 111111, meanonly local shares_owned_2_111111 = `r(mean)' summ control_owned if mngr_no == 2 & cusip == 111111, meanonly local control_owned_2_111111 = `r(mean)'

and does not rely on preserving the sort order at all. The drawback to it is that in a large data set it will run noticeably slower, and if you are putting this code inside some kind of loop, the performance hit could be appreciable.
Comment
Nathan Barrymore

Join Date: Jul 2018

Posts: 5
#4

19 Jul 2018, 17:43

Thank you very much! (And I will install -dataex- and use the correct terminology in the future, appreciate the instruction.)
Comment

mngr_no	cusip	shares_owned	control_owned
1	111111	100	100
2	111111	50	25
1	222222	200	100
2	222222	500	10

Announcement

Referencing a specific cell by values of other variables

Comment

Comment

Comment