  • Loop over values of a variable not working

    Hello! I have a dataset containing voters as observations. It includes parties they voted for (par) and elections in which they voted (elc). From total number of votes each party gained (votNUM) and total number of voters per election I'd like to create a variable that indicates how many voters voted for the same party in the same election on a scale from 0 to 1 (vot). I need to do this to compensate for different sample sizes of elections. Returning the number of observations per election works perfectly when I do it manually, but looping the same process only returns the total number of observations in spite of the "if elc == `elc'" statement.

    What's wrong with the loop?

    bys par: gen votNUM = _N
    gen vot = .
    foreach elc in elc {
    quietly sum votNUM if elc == `elc'
    quietly replace vot = (votNUM/r(N)) if elc == `elc'
    }

  • #2
    The problem is with
    Code:
    foreach elc in elc {
    What that does is create a loop that runs exactly once, with `elc' set to elc. Within that loop, your -if elc == `elc'- condition therefore translates to -if elc == elc-. Of course, elc always equals itself, so all observations are included in the calculations.

    What I think you mean is to first create a list of all the elections, and then loop over the values in that list. So something like this:

    Code:
    levelsof elc, local(elections)
    foreach elc of local elections {
            etc.
    }
    Note: I haven't scrutinized the code inside your loop for other problems. So this may or may not give you what you are looking for, but it will at least get the loop to loop over the different elections.
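
To make the difference concrete, here is a minimal sketch on made-up data. The three observations and the counter names passes_in and passes_of are assumptions for illustration only; they are not from the original dataset.

```stata
* Toy data: the values of elc are invented for this illustration
clear
input str8 elc
"AUS_2007"
"AUS_2007"
"AUT_2008"
end

* -foreach ... in- takes what follows as a literal list of words,
* so this loop makes a single pass with `x' set to the word elc
local passes_in = 0
foreach x in elc {
    local ++passes_in
}

* -levelsof- stores the distinct values of elc in a local macro,
* and -foreach ... of local- then makes one pass per value
quietly levelsof elc, local(elections)
local passes_of = 0
foreach x of local elections {
    local ++passes_of
}

display "in: `passes_in' pass(es); of local: `passes_of' pass(es)"
```

With two distinct elections in the toy data, the first loop should run once and the second twice.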

    • #3
      Thank you, Clyde! I guess I'm not so familiar with macros as I thought.

      However, I still didn't manage to make the loop work. Stata displays error "AUS_2007 not found", while the first value of elc is a string called "AUS_2007" (without the quotation marks). So I assume the loop is still the problem. Could the values of elc cause this error?

      This is the output of "levelsof elc" if it should matter.
      . levelsof elc
      `"AUS_2007"' `"AUT_2008"' `"BLR_2008"' `"BRA_2006"' `"BRA_2010"' `"CAN_2008"' `"CHE_2007"' `"CHL_2009"'
      Here's the code I used:
      levelsof elc, local(elections)
      foreach elc of local elections {
      quietly sum votNUM if elc == `elc'
      quietly replace vot = (votNUM/r(N)) if elc == `elc'
      }

      • #4
        Well, as I said, I didn't scrutinize the code in the interior of your loop. Since `elc' is a string, to which you want to compare the value of the string variable elc, you need quotes around `elc' in both places inside the loop:
        Code:
        levelsof elc, local(elections)
        foreach elc of local elections {
            quietly sum votNUM if elc == `"`elc'"'
            quietly replace vot = (votNUM/r(N)) if elc == `"`elc'"'
        }
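
As a rough sketch of why the quotes matter, the toy example below (invented data; the macro names e, rc and n are assumptions for illustration) runs the same comparison with and without compound double quotes. Without them, the expanded command reads -if elc == AUS_2007-, so Stata looks for a variable called AUS_2007 and fails.

```stata
* Toy data: the values of elc are invented for this illustration
clear
input str8 elc
"AUS_2007"
"AUS_2007"
"AUT_2008"
end

local e AUS_2007

* Unquoted: expands to -count if elc == AUS_2007-, which refers to a
* variable that does not exist; -capture- traps the error
capture count if elc == `e'
local rc = _rc
display "return code without quotes: `rc'"

* Compound double quotes: expands to a comparison with the string "AUS_2007"
count if elc == `"`e'"'
local n = r(N)
display "matching observations with quotes: `n'"
```

The first command should leave a nonzero return code, while the second counts the two AUS_2007 observations.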

        • #5
          The count of observations can be obtained without a loop, so if I understand correctly the code boils down to this.

          Code:
          bysort par : gen votNUM = _N 
          bysort elc : replace votNUM = votNUM/_N
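
A small check of this approach on made-up data follows. It assumes, as the two-line version does, that each party code belongs to exactly one election; if the same code could recur across elections, the first line would presumably need -bysort elc par- instead. The party codes and the variable name vot are illustrative.

```stata
* Toy data: four voters in one election, two in another (invented values)
clear
input str8 elc str3 par
"AUS_2007" "A1"
"AUS_2007" "A1"
"AUS_2007" "A1"
"AUS_2007" "A2"
"AUT_2008" "B1"
"AUT_2008" "B2"
end

* Number of voters per party, then divided by the number of voters per election
bysort par : gen vot = _N
bysort elc : replace vot = vot/_N

list, sepby(elc)
```

Here party A1 has three of the four AUS_2007 voters, so its share is .75, A2 gets .25, and each AUT_2008 party gets .5.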

          • #6
            Thank you both! I actually tried adding quotes, but had no idea I had to put the quotes themselves between the apostrophes. Now it works perfectly. And as silly as I feel, the simple code by Nick does the trick just as well. I actually began with "bysort elc", but didn't think of using "_N" there and at some point decided on the loop. Still, I'm glad I asked about the loop, since I had made the same mistake elsewhere in the code. Thanks again!

            • #7
              As for that other loop, I again ran into a loop-related problem I couldn't crack myself. I need to run a regression for each party as a dummy variable (par_*, generated from par) in each election (elc). I tried nesting loops (the code below), but the problem is that the regressions for each election are also carried out for parties that did not take part in that election, so the outcome does not vary in some regressions. Is there any way to get around this problem? What I think I need to do is tell Stata to run the regression in each election only for party dummies that can take the value 1, and not for those party dummies that are always 0 in that election.

              levelsof elc, local(elections)
              foreach elc of local elections {
                  foreach par of varlist par_* {
                      quietly logit `par' rel if elc == `"`elc'"'
                      quietly replace relB = _b[rel] if `par' == 1
                      quietly replace relOR = exp(_b[rel]) if `par' == 1
                      quietly replace relSIG = chi2tail(1, (_b[rel]/_se[rel])^2) if `par' == 1
                  }
              }

              • #8
                So, I take it that only a subset of parties actually participate in a given election. In that case you need a different logic:

                Code:
                levelsof elc, local(elections)
                foreach elc of local elections {
                    foreach par of varlist par_* {
                        // DETERMINE IF THIS PARTY PARTICIPATED IN THIS ELECTION
                        count if `par' == 1 &  elc == `"`elc'"'
                        if `r(N)' > 0 {
                             quietly logit `par' rel if elc == `"`elc'"'
                             quietly replace relB = ... etc.
                        }
                    }
                }
                NOTE: This code assumes that each of the par_* variables is coded only 0 for non-participation and 1 for participation. This code will also break if there is any election in which `par' is always 1 (which, I guess, would mean that no other party ran in the "election"). For now, I'll assume that that doesn't actually happen. (If it does happen, repost: there is a simple way to deal with it.)

                Now, all of that said, it looks to me as if there is some other problem here. At the end of this pair of nested loops, the values of relB, relOR, and relSIG will represent the corresponding regression outputs for only the last election (in order of occurrence in local macro elections) in which the party participated, because you keep writing the results over all observations where `par' == 1 regardless of the value of elc in the observation. I'm not sure exactly what you're trying to do here, but my guess is that what you really want requires adding -& elc == `"`elc'"'- to the end of your -quietly replace- commands. If you really want it only for the last value of `elc', then it's unclear why you bother running a loop.
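
Putting the two points together, a minimal sketch on simulated data might look like the following. Everything about the data is an assumption made for illustration (two elections, two parties each, a random binary rel, the helper variable pick); only the loop structure mirrors the code above, with the extra -& elc == `"`elc'"'- condition on -replace-.

```stata
* Simulated data, purely illustrative: two elections with two parties each
clear
set seed 12345
set obs 400
gen str8 elc = cond(_n <= 200, "AUS_2007", "AUT_2008")
gen byte rel  = runiform() < 0.5
gen byte pick = runiform() < 0.5
gen byte par_1 = elc == "AUS_2007" &  pick
gen byte par_2 = elc == "AUS_2007" & !pick
gen byte par_3 = elc == "AUT_2008" &  pick
gen byte par_4 = elc == "AUT_2008" & !pick
gen relB = .

quietly levelsof elc, local(elections)
foreach elc of local elections {
    foreach par of varlist par_* {
        * Skip parties with no voters in this election
        quietly count if `par' == 1 & elc == `"`elc'"'
        if r(N) > 0 {
            quietly logit `par' rel if elc == `"`elc'"'
            * Write the result only to this party's voters in this election
            quietly replace relB = _b[rel] if `par' == 1 & elc == `"`elc'"'
        }
    }
}
```

Because every simulated voter belongs to exactly one party that ran in their election, relB should end up filled in for all observations, each with the coefficient from its own election's regression.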

                • #9
                  This worked like a charm with the additional conditions for replacing the values. I also did some manual testing with several parties and the results were identical to what the loop did. That really saved my day!

                  • #10
                    Hello,

                    I do difference-in-difference estimation in Stata.

                    I have hourly household consumption “cons”, which I regress on my post-treatment (and post-interaction) variable “treat”. I have household-hour and date-hour fixed effects, standard errors are clustered at a household-hour level. I need to perform the regression for each hour-of-the-sample, “hour_of_sample”. In order to do this, I use a loop (I follow Clyde Schechter’s advice on how to code it).

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input long household float cons byte treat double hour_of_sample int year byte(month day hour) int day_of_sample
                    170001  .8419615 0 1.8251136e+12 2017 11 1  1 21124
                    170001  .7378439 0 1.8251172e+12 2017 11 1  2 21124
                    170001    .83577 0 1.8251208e+12 2017 11 1  3 21124
                    170001    .82187 0 1.8251244e+12 2017 11 1  4 21124
                    170001  .8314207 0  1.825128e+12 2017 11 1  5 21124
                    170001  1.308828 0 1.8251316e+12 2017 11 1  6 21124
                    170001 1.6755122 0 1.8251352e+12 2017 11 1  7 21124
                    170001  3.449555 0 1.8251388e+12 2017 11 1  8 21124
                    170001 1.4857545 0 1.8251424e+12 2017 11 1  9 21124
                    170001 1.0911116 0  1.825146e+12 2017 11 1 10 21124
                    170001  .9769514 0 1.8251496e+12 2017 11 1 11 21124
                    170001  .9466591 0 1.8251532e+12 2017 11 1 12 21124
                    170001 1.0181745 0 1.8251568e+12 2017 11 1 13 21124
                    170001 1.0498574 0 1.8251604e+12 2017 11 1 14 21124
                    170001 1.5275816 0  1.825164e+12 2017 11 1 15 21124
                    170001  1.340508 0 1.8251676e+12 2017 11 1 16 21124
                    170001  1.559614 0 1.8251712e+12 2017 11 1 17 21124
                    170001   1.64184 0 1.8251748e+12 2017 11 1 18 21124
                    170001 1.3092457 0 1.8251784e+12 2017 11 1 19 21124
                    170001  .9569498 0  1.825182e+12 2017 11 1 20 21124
                    end
                    format %tc hour_of_sample
                    format %td day_of_sample


                    global loop = "/Users/…/loop"
                    cd $loop

                    use hourly_data

                    tempfile estimates_datehour_1
                    tempfile estimates_datehour_all

                    drop _all
                    save estimates_datehour_all, emptyok
                    use hourly_data

                    sort household hour_of_sample
                    quietly levelsof hour_of_sample, local(datehours)

                    foreach h of local datehours {
                    display `h'
                    reghdfe cons treat if hour_of_sample ==`h', absorb(i.household#i.hour i.hour#i.day_of_sample) vce(i.household#i.hour)
                    parmest, saving(estimates_datehour_1, replace)
                    use estimates_datehour_all, clear
                    append using estimates_datehour_1
                    save estimates_datehour_all, replace
                    use hourly_data
                    }

                    I do not interact factor variables for treatment and each hour-of-the-sample because Stata shuts down every time I try to run this kind of regression (I have big data). I do not use “preserve” and “restore” commands because Stata returns the error that there is not enough space on a disk.

                    What I want is to display “hour_of_sample” in %tc format before each estimation result so that I can see the hour to which my estimates refer (so that later I can draw a figure with some hours of the sample on the x-axis, etc.). That is, in my log file, I would like to see this

                    01nov2017 00:00:00
                    …estimation results…
                    01nov2017 01:00:00
                    …estimation results…

                    With my code, I only see
                    1.8251136e+12
                    …estimation results…
                    1.8251172e+12
                    …estimation results…

                    Ideally, I also want to see the hour-of-sample for each estimation in the “estimates_datehour_all” file, but I do not know how to do this at all.

                    Could you please help me?

                    Many thanks!
