Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • looping to create new dataset

    I am hoping someone can point me in the right direction. I have two datasets that contain the same variable/observation structure. There are 2,362 variables in each dataset, each representing a groundwater well. Each well has 3,621 observations, the unique identifier of which is the date (dmy) the well was sampled. The dates are the same in each database. One of data categorizes the wells at each point in time depending on whether or not the groundwater level was above or below certain thresholds (call this category.dta). Values range from 0 to 3. The second contains how much water was pumped from each well over the date period (which are in 5-day increments, call this pumping.dta). I want to create a new dataset with the same structure with each value being based on values for the same variable/observation in category.dta and pumping.dta. So for example, if the category.dta value for the well on a specific DMY is 1 and pumping is equal to zero for the same well/DMY in the pumping.dta dataset, I want to assign that well/date combination a "1" in a new dataset or output file. if the category.dta value for the well on a specific DMY is 2 and pumping is greater than zero for the same well/DMY in the pumping.dta dataset, I want to assign that well/date combination a "3" in a new dataset or output file. There are a total of four such combinations.
    I am struggling with how to structure this and where to get started - i have been reading about looping, etc. and have researched all day without much to show for it. I am hoping someone can point me in the right direction! Thank you.

  • #2
    If I understand your situation correctly, each observation in each data set refers to a specific date and each variable to a specific well. This arrangement of the data isn't very easy to work with in Stata. You will need to first -reshape- both data sets to long layout. Then you can -merge- them together and then finally calculate your new variable. So it would look something like this:

    Code:
    use category, clear
    ds date, not
    rename (`r(varlist)') thr=
    reshape long thr, i(date) j(well_id)
    rename thr groundwater_threshold
    tempfile thresholds
    save `thresholds'
    
    use pumping, clear
    ds date, not
    rename (`r(varlist)') pump=
    reshape long pump, i(date) j(well_id)
    rename pump pumped_amount
    merge 1:1 date well_id using `thresholds'
    // NOW YOU HAVE THE DATA IN A CONVENIENT FORM TO CALCULATE YOUR NEW VARIABLE
    Now, I don't know what these different well_ids look like, other than that they are variable names that Stata accepts. But it is possible that some of them may be too long to be prefixed with thr or pump. In that case, try just t and p. in the -rename- and -reshape- commands instead.

    I have not provided code for the calculation of the new variable because I don't understand your description of how it is to be done. Apparently it is based on the combined values of the groundwater threshold and the amount pumped, but the actual formula isn't clear to me. You state that there are four possibilities, but since the groundwater threshold alone has four possible values, it would seem to me that there would be at least 8 values for the new variable, and that's in the smallest case scenario where the pumping amount only takes on two values. So I'll leave it to you to work that out. But with this arrangement of the data, where each date's pumping and groundwater threshold for a given well are side by side, with no other well's data or other dates in that observation, it should be simple enough for somebody who knows what the formula is.

    Comment


    • #3
      Thank you! I will give this a try!

      Comment

      Working...
      X