Replace with first group values

Sartaj Hussain

Join Date: Jan 2020
Posts: 342

07 Apr 2022, 20:02

I want to know how to replace the missing values of the mk and ex variables in the other groups with the values from group 1, matching on mdate. These two variables are the same for all groups. Each group is identified by stock_id. An illustrative data example is appended.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date mdate) byte stock_id float rt byte mk int ex
22281 732 1  .21 20 101
22312 733 1  .23 30 255
22340 734 1  .01 40  62
22371 735 1  .27 50   7
22281 732 2 1.21  .   .
22312 733 2 2.52  .   .
22340 734 2 1.11  .   .
22371 735 2  .78  .   .
22281 732 3 3.55  .   .
22312 733 3  .99  .   .
22340 734 3 1.76  .   .
22371 735 3  .29  .   .
end
format %td date
format %tm mdate

Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#2

07 Apr 2022, 20:47

Code:

foreach v of varlist mk ex {
    by mdate (`v'), sort: replace `v' = `v'[1]
}
sort stock_id mdate

Note: This solution is slightly more general than the stated problem. It does not require that the stock_id that has the non-missing values have the value stock_id = 1, nor even that it have the numerically smallest value of all stock_ids. The only requirement here is that in any given month, there be only one distinct non-missing value for each of the variables being filled in. There can be more than one non-missing value, but they must not disagree.
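
As a sketch of how that requirement could be checked before filling in the values (an illustration only, assuming the variable names from the example data above): within each month, missing values sort last, so every non-missing value should equal the first observation.

```stata
* Check that, within each mdate, all non-missing values of mk and ex agree.
* Sorting by the variable within mdate puts non-missing values first.
foreach v of varlist mk ex {
    by mdate (`v'), sort: assert `v' == `v'[1] | missing(`v')
}
```

If any month contains two different non-missing values, -assert- stops with an error, which signals that filling from the first observation would be ambiguous for that month.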
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#3

07 Apr 2022, 21:09

Clyde Schechter what's the difference between "by [varlist], sort" and "bys [varlist]"?

I've always used bys/qbys, presuming it meant the same thing as the code you used.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#4

07 Apr 2022, 22:21

There is no difference. In the very early days of Stata, only -by [varlist], sort- existed. The -bysort [varlist]- and abbreviation -bys [varlist]- were introduced later, after I had already been using -by [varlist], sort- several times a day every day for some years. The habit stuck with me, especially since I'm a good, fast typist and not easily enticed to change habits by the prospect of sparing a couple of keystrokes. In fact, it's even more ingrained than a mere habit. My fingers just go and type it out of muscle memory without me even consciously thinking about it.
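
To illustrate the equivalence, the loop from #2 could be written with -bysort- instead. This is a sketch using the variable names from the example data, and it should behave identically to the -by ..., sort- form.

```stata
* Same fill as in #2, written with -bysort- instead of -by ..., sort-.
foreach v of varlist mk ex {
    bysort mdate (`v'): replace `v' = `v'[1]
}
sort stock_id mdate
```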

More generally I'm a creature of habit. I'm one of the few people who consistently codes -gen byte dichotomous_variable = ...-. And I always -compress- any newly created data set before I -save- it. Now, with the memory available in modern computers, it is rarely necessary to care about sparing memory, nor disk space, at least not in statistical applications. But I started programming back in the early 1960's when memory was very expensive and that of a really large computer was denominated in modest numbers of kilobytes, so you had to agonize over every bit you used when programming anything more than a toy problem. And the habit of using the smallest amount of memory possible has largely stuck with me.

Last edited by Clyde Schechter; 07 Apr 2022, 22:24.
