Looping observation by observation and within variable by variable against another variables value, in that observation

Chuck Stone

Join Date: Jun 2024

Posts: 3
#1


27 Jun 2024, 07:39

I cannot figure this out. My code is below.

I have created and initialized a set of variables. Now I want to check each observation's value of Plat_HP_months and, ONLY FOR THAT OBSERVATION, replace the initialized variables with 1, one by one, starting at d_disp_Mo_1 and stopping at the variable whose number matches that observation's value of Plat_HP_months. I appreciate your help!

log using "`folder_path'temp_debugging.log", replace //

sort Plat_HP_months
* Get the number of observations (rows) in your dataset
local N = _N
display "Total number of observations: `N'"

* Loop through each observation
forval i = 1/`N' {
    display "Processing observation `i'"

    * Get the value of Plat_HP_months for the current observation
    local Plat_HP_months = Plat_HP_months[`i']
    display "Plat_HP_months = `Plat_HP_months' for observation `i'"

    * Loop through this observation's monthly dispersion variables d_disp_Mo_1 to d_disp_Mo_N
    forval month = 1/`Plat_HP_months' {
        display "Processing month `month' of `Plat_HP_months'"

        * Skip the replacement if the condition is met
        if `month' > `Plat_HP_months' {
            display "Condition met for month `month'. Skipping replacement."
            continue
        }

        quietly replace d_disp_Mo_`month' = 1 if `month' <= `Plat_HP_months' in `i'

        * Break the loop if condition is met (assuming you want to stop checking further months)
    }
}
log close
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

27 Jun 2024, 08:39

I don't understand your explanation of what you want to do, nor can I glean the purpose from your code. In particular "...and then replace the values in those initialized variables ONLY FOR THAT OBSERVATION one by one (starting at 1 and stopping at the variable that matches the value in a variable Plat_HP_months) with a 1" seems to contradict itself.

Of course, it is possible that somebody else grasps what you want here and will respond. But if you do not receive better advice in a few hours, I think it would be helpful if you posted a small example dataset (using the -dataex- command, of course*) and then show what the desired results for that example would look like and explaining the sequence of calculations you do by hand to arrive at that result.

*If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
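As a rough sketch of what a -dataex- call can look like, the lines below use the auto dataset that ships with Stata, so they run on their own. The choice of variables and the `in 1/5` range are only for illustration.

```stata
* Load a dataset shipped with Stata so the sketch is self-contained
sysuse auto, clear

* Write out the first five observations of three variables
* in a form that can be pasted straight into a forum post
dataex make price mpg in 1/5
```

For the data in this thread, the analogous call would presumably be something like `dataex Dealidentifier Plat_HP_months d_disp_Mo_1-d_disp_Mo_5 in 1/5`.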

Chuck Stone

Join Date: Jun 2024
Posts: 3

27 Jun 2024, 08:53

Hi Clyde,
OK, here is an example of how the data look now:

Dealidentifier	Plat_HP_months	max_Plat_HP_months	d_disp_Mo_1	d_disp_Mo_2	d_disp_Mo_3	d_disp_Mo_4	d_disp_Mo_5	d_disp_Mo_6	d_disp_Mo_7	d_disp_Mo_8	d_disp_Mo_9	d_disp_Mo_10	d_disp_Mo_11	d_disp_Mo_12	d_disp_Mo_13	d_disp_Mo_14	d_disp_Mo_15	d_disp_Mo_16	d_disp_Mo_17	d_disp_Mo_18	d_disp_Mo_19	d_disp_Mo_20	d_disp_Mo_21
5	11	21	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
9	9	21	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
25	8	21	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
27	21	21	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
28	10	21	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.

Except that my N is much higher, and the d_disp_Mo_* variables were created by this loop:

* Create dummy variables for each observation for each month up to max_Plat_HP_months, across all observations
forval month = 1/`max_months' {
gen d_disp_Mo_`month' = .
}
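The local macro `max_months` is not defined in the snippet above. One way it might be set, assuming it is meant to equal the largest value of Plat_HP_months, is sketched below using the five example rows (the `input` block only stands in for the real data):

```stata
clear
input byte(Dealidentifier Plat_HP_months)
5 11
9 9
25 8
27 21
28 10
end

* -summarize- leaves the maximum behind in r(max);
* -meanonly- skips statistics that are not needed here
summarize Plat_HP_months, meanonly
local max_months = r(max)

* Create one month variable per month, all initialized to missing
forvalues month = 1/`max_months' {
    generate byte d_disp_Mo_`month' = .
}
describe d_disp_Mo_*
```

Note that a local macro only exists while the do-file (or selected block of lines) that defines it is running, so the `local` line and the loop need to be run together.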

I am now getting closer with some small tweaks, except that all the month variables are now replaced with 1:

log using "`folder_path'temp_debugging.log", replace //

sort Plat_HP_months
* Get the number of observations (rows) in your dataset
local N = _N
display "Total number of observations: `N'"

* Loop through each observation
forval i = 1/`N' {
    display "Processing observation `i'"

    * Get the value of Plat_HP_months for the current observation
    local current_Plat_HP_months = Plat_HP_months[`i']
    display "Plat_HP_months = `current_Plat_HP_months' for observation `i'"

    * Loop through each observation's monthly dispersion variable d_disp_Mo1 to d_disp_MoN
    forval month = 1/`current_Plat_HP_months' {
        display "Processing month `month' of `current_Plat_HP_months'"

        * Skip the replacement if the condition is met
        if `month' < `current_Plat_HP_months' {
            display "Condition met for month `month'. Skipping replacement."
            continue
        }

        quietly replace d_disp_Mo_`month' = 1 in `i'

        * Break the loop if condition is met (assuming you want to stop checking further months)
    }
}
log close

Basically, this should be the result:

Dealidentifier	Plat_HP_months	max_Plat_HP_months	d_disp_Mo_1	d_disp_Mo_2	d_disp_Mo_3	d_disp_Mo_4	d_disp_Mo_5	d_disp_Mo_6	d_disp_Mo_7	d_disp_Mo_8	d_disp_Mo_9	d_disp_Mo_10	d_disp_Mo_11	d_disp_Mo_12	d_disp_Mo_13	d_disp_Mo_14	d_disp_Mo_15	d_disp_Mo_16	d_disp_Mo_17	d_disp_Mo_18	d_disp_Mo_19	d_disp_Mo_20	d_disp_Mo_21
5	11	21	1	1	1	1	1	1	1	1	1	1	1	.	.	.	.	.	.	.	.	.	.
9	9	21	1	1	1	1	1	1	1	1	1	.	.	.	.	.	.	.	.	.	.	.	.
25	8	21	1	1	1	1	1	1	1	1	.	.	.	.	.	.	.	.	.	.	.	.	.
27	21	21	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
28	10	21	1	1	1	1	1	1	1	1	1	1	.	.	.	.	.	.	.	.	.	.	.


Chuck Stone

Join Date: Jun 2024

Posts: 3
#4

27 Jun 2024, 09:06

Never mind, I got it!

log using "`folder_path'temp_debugging.log", replace

gsort -Plat_HP_months
* Get the number of observations (rows) in your dataset
local N = _N
display "Total number of observations: `N'"

* Loop through each observation
forval i = 1/`N' {
    display "Processing observation `i'"

    * Get the value of Plat_HP_months for the current observation
    local current_Plat_HP_months = Plat_HP_months[`i']
    display "Plat_HP_months = `current_Plat_HP_months' for observation `i'"

    * Loop through each observation's monthly dispersion variable d_disp_Mo1 to d_disp_MoN
    forval month = 1/`current_Plat_HP_months' {
        display "Processing month `month' of `current_Plat_HP_months'"

        * Skip the replacement if the condition is met
        if `month' > `current_Plat_HP_months' {
            display "Condition met for month `month'. Skipping replacement."
            continue
        }

        * Replace only for the current observation
        quietly replace d_disp_Mo_`month' = 1 in `i'
    }
}
log close
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

27 Jun 2024, 09:28

I'm glad you found a working solution. Now that I see what you were trying to do, let me suggest a different way. First, creating variables that are coded 1/. is a bad idea in Stata and is likely to cause problems and confusion later in your code when you use these variables. It is much better to use 1/0 coding. Second, there is a simpler way to get this done that requires no explicit loops at all. The code is shorter, and much easier to follow.

To illustrate this approach, I have modified your data set to include fewer d_disp_mo_* variables and correspondingly reduced the values of the plat_hp_months variable just so everything fits neatly on my screen--but that change does not impact the code, and you can use this code on your data without modifying it. (Well, I also changed all the variable names to all lower case to make my typing easier--you would have to change that back.)

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(dealidentifier plat_hp_months max_plat_hp_months d_disp_mo_1 d_disp_mo_2 d_disp_mo_3 d_disp_mo_4 d_disp_mo_5 d_disp_mo_6 d_disp_mo_7 d_disp_mo_8 d_disp_mo_9 d_disp_mo_10 d_disp_mo_11 d_disp_mo_12 d_disp_mo_13 d_disp_mo_14 d_disp_mo_15)
 5 11 11 . . . . . . . . . . . . . . .
 9  9 11 . . . . . . . . . . . . . . .
25  8 11 . . . . . . . . . . . . . . .
27 10 11 . . . . . . . . . . . . . . .
28 10 11 . . . . . . . . . . . . . . .
end

isid dealidentifier, sort
reshape long d_disp_mo_, i(dealidentifier) j(month)
by dealidentifier (month): replace d_disp_mo_ = month <= plat_hp_months
reshape wide

I also suspect that this code will run faster than yours if your data set really is very large.
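On the earlier point about 1/. coding, here is a minimal sketch of one kind of confusion it can cause. The variable name `flag` is made up for illustration. Stata treats any nonzero value, including missing, as true in an `if` condition, so a bare `if flag` does not separate 1 from missing.

```stata
clear
input byte flag
1
.
end

* Missing is nonzero, so a bare "if flag" is true for both observations
count if flag

* An explicit comparison picks out only the observation coded 1
count if flag == 1

* With 1/0 coding the bare form and the explicit form agree
replace flag = 0 if missing(flag)
count if flag
```

With 1/0 coding the variable can also be summarized directly, since its mean is the proportion of observations coded 1.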

One more suggestion: I don't know what you're going to be doing with these d_disp_mo_* variables, but you might want to consider skipping the -reshape wide- at the end. Most of Stata's data management and analysis commands work better with, or require, data in long layout. The -reshape wide- was put there to restore the wide layout you began with. But depending on what you're going to be doing, it may well be that staying in long layout will simplify your life and make everything go faster and smoother.
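If the wide layout does need to be kept and a -reshape- is unwanted, a possible alternative is to loop over the month variables only and let Stata evaluate the comparison for every observation at once. This is only a sketch under a few assumptions: it uses the lower-case names and reduced example values from the code above, hard-codes 15 month variables, creates them fresh with 1/0 coding, and leaves them missing wherever plat_hp_months is missing.

```stata
clear
input byte(dealidentifier plat_hp_months)
5 11
9 9
25 8
27 10
28 10
end

* One pass per month variable; no loop over observations is needed,
* because the expression is evaluated separately in each observation
forvalues m = 1/15 {
    generate byte d_disp_mo_`m' = `m' <= plat_hp_months if !missing(plat_hp_months)
}

list dealidentifier plat_hp_months d_disp_mo_9-d_disp_mo_12, noobs
```

The `if !missing()` qualifier is there because a missing value compares as larger than any number, so without it an observation with missing plat_hp_months would be coded 1 in every month.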