Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficulty generating a conditional variable

    I'm working with some data on NFL players. I want to create a set of variables that captures the influence one player has historically had on a teammate of another position and have these variables show up for that player's current teammates. I realize that is very confusing, so I will attempt to illustrate with an example.

    Jared Goff was the most important quarterback (the "QB1") on the Los Angeles Rams in 2018. The second-most important wide receiver on the Los Angeles Rams in 2018, statistically speaking, was Brandin Cooks (he was the "WR2"). In 2017, Brandin Cooks played for the New England Patriots. His top quarterback in 2017 on the Patriots, Tom Brady, threw for 32 touchdowns that season. What I want to do is create a variable called 'wr2_qb1_pass_td_lag' for Jared Goff in 2018 that gives the number of touchdowns Goff's WR2's previous QB1 threw for (i.e. wr2_qb1_pass_td_lag = 32 if (player == "Jared Goff" & year == 2018)).



    My best attempt to do this so far has been:
    Code:
    egen wr2_qb1_pass_td_lag = max(cond(pos_rank_on_team == 2 & position == "WR"), qb1_pass_td_lag, .), by(team year)
    , but this only seems to work when the WR2 hasn't changed teams, so I don't think it's really doing what I want it to do.


    Can anyone help me out here? I'd really appreciate it!

  • #2
    Particularly given the complexity of your situation, showing us a concrete example of your data using -dataex- would increase your chances of getting help. (See the StataList FAQ about this if you have not already.) That kind of example is always very helpful, but it's particularly so in your situation. You know the structure of your data, but we don't, and almost no one is good at describing a data structure in the abstract.

    As a very first stab at your question, I have the sense that you may need end up needing a file of all pairs (see -help cross-).

    Comment


    • #3
      Hey Mike, please see below for a simplified version of my data (to go with the example I gave above). I will have a look at -cross-.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str24 player int year str7 team_abbrv_pfr str2 position float pos_rank_on_team byte(pass_td_pfr rush_td rec_td)
      "Dwayne Allen"          2017 "NWE"     "TE" 2  0  0 1
      "Danny Amendola"        2017 "NWE"     "WR" 2  0  0 2
      "Tavon Austin"          2017 "LAR-STL" "WR" 4  0  1 0
      "Tom Brady"             2017 "NWE"     "QB" 1 32  0 0
      "Tom Brady"             2018 "NWE"     "QB" 1 29  2 0
      "Rex Burkhead"          2017 "NWE"     "RB" 3  0  5 3
      "Rex Burkhead"          2018 "NWE"     "RB" 3  0  0 1
      "Brandin Cooks"         2017 "NWE"     "WR" 1  0  0 7
      "Brandin Cooks"         2018 "LAR-STL" "WR" 2  0  1 5
      "Phillip Dorsett"       2017 "NWE"     "WR" 4  0  0 0
      "Phillip Dorsett"       2018 "NWE"     "WR" 3  0  0 3
      "Julian Edelman"        2017 "NWE"     "WR" 6  0  0 0
      "Julian Edelman"        2018 "NWE"     "WR" 1  0  0 6
      "Gerald Everett"        2017 "LAR-STL" "TE" 2  0  0 2
      "Mike Gillislee"        2017 "NWE"     "RB" 2  0  5 0
      "Jared Goff"            2017 "LAR-STL" "QB" 1 28  1 0
      "Jared Goff"            2018 "LAR-STL" "QB" 1 32  2 0
      "Rob Gronkowski"        2017 "NWE"     "TE" 1  0  0 8
      "Rob Gronkowski"        2018 "NWE"     "TE" 1  0  0 3
      "Todd Gurley"           2017 "LAR-STL" "RB" 1  0 13 6
      "Todd Gurley"           2018 "LAR-STL" "RB" 1  0 17 4
      "Tyler Higbee"          2017 "LAR-STL" "TE" 1  0  0 1
      "Jeremy Hill"           2018 "NWE"     "RB" 4  0  0 0
      "Chris Hogan"           2017 "NWE"     "WR" 3  0  0 5
      "Chris Hogan"           2018 "NWE"     "WR" 2  0  0 3
      "John Kelly"            2018 "LAR-STL" "RB" 2  0  0 0
      "Cooper Kupp"           2017 "LAR-STL" "WR" 1  0  0 5
      "Cooper Kupp"           2018 "LAR-STL" "WR" 3  0  0 6
      "Dion Lewis"            2017 "NWE"     "RB" 1  0  6 3
      "Sony Michel"           2018 "NWE"     "RB" 1  0  6 0
      "Malcolm Mitchell"      2017 "NWE"     "WR" 5  0  0 0
      "Cordarrelle Patterson" 2018 "NWE"     "WR" 4  0  1 3
      "Sammy Watkins"         2017 "LAR-STL" "WR" 3  0  0 8
      "James White"           2017 "NWE"     "RB" 4  0  0 3
      "James White"           2018 "NWE"     "RB" 2  0  5 7
      "Robert Woods"          2017 "LAR-STL" "WR" 2  0  0 5
      "Robert Woods"          2018 "LAR-STL" "WR" 1  0  1 6
      end

      Comment


      • #4
        I'm not sure I understand what you want to do, and as I know nothing about football, I can't fall back on any kind of intuition about what might make sense here. But as best I grasp what you say, you want to do this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str24 player int year str7 team_abbrv_pfr str2 position float pos_rank_on_team byte(pass_td_pfr rush_td rec_td)
        "Dwayne Allen"          2017 "NWE"     "TE" 2  0  0 1
        "Danny Amendola"        2017 "NWE"     "WR" 2  0  0 2
        "Tavon Austin"          2017 "LAR-STL" "WR" 4  0  1 0
        "Tom Brady"             2017 "NWE"     "QB" 1 32  0 0
        "Tom Brady"             2018 "NWE"     "QB" 1 29  2 0
        "Rex Burkhead"          2017 "NWE"     "RB" 3  0  5 3
        "Rex Burkhead"          2018 "NWE"     "RB" 3  0  0 1
        "Brandin Cooks"         2017 "NWE"     "WR" 1  0  0 7
        "Brandin Cooks"         2018 "LAR-STL" "WR" 2  0  1 5
        "Phillip Dorsett"       2017 "NWE"     "WR" 4  0  0 0
        "Phillip Dorsett"       2018 "NWE"     "WR" 3  0  0 3
        "Julian Edelman"        2017 "NWE"     "WR" 6  0  0 0
        "Julian Edelman"        2018 "NWE"     "WR" 1  0  0 6
        "Gerald Everett"        2017 "LAR-STL" "TE" 2  0  0 2
        "Mike Gillislee"        2017 "NWE"     "RB" 2  0  5 0
        "Jared Goff"            2017 "LAR-STL" "QB" 1 28  1 0
        "Jared Goff"            2018 "LAR-STL" "QB" 1 32  2 0
        "Rob Gronkowski"        2017 "NWE"     "TE" 1  0  0 8
        "Rob Gronkowski"        2018 "NWE"     "TE" 1  0  0 3
        "Todd Gurley"           2017 "LAR-STL" "RB" 1  0 13 6
        "Todd Gurley"           2018 "LAR-STL" "RB" 1  0 17 4
        "Tyler Higbee"          2017 "LAR-STL" "TE" 1  0  0 1
        "Jeremy Hill"           2018 "NWE"     "RB" 4  0  0 0
        "Chris Hogan"           2017 "NWE"     "WR" 3  0  0 5
        "Chris Hogan"           2018 "NWE"     "WR" 2  0  0 3
        "John Kelly"            2018 "LAR-STL" "RB" 2  0  0 0
        "Cooper Kupp"           2017 "LAR-STL" "WR" 1  0  0 5
        "Cooper Kupp"           2018 "LAR-STL" "WR" 3  0  0 6
        "Dion Lewis"            2017 "NWE"     "RB" 1  0  6 3
        "Sony Michel"           2018 "NWE"     "RB" 1  0  6 0
        "Malcolm Mitchell"      2017 "NWE"     "WR" 5  0  0 0
        "Cordarrelle Patterson" 2018 "NWE"     "WR" 4  0  1 3
        "Sammy Watkins"         2017 "LAR-STL" "WR" 3  0  0 8
        "James White"           2017 "NWE"     "RB" 4  0  0 3
        "James White"           2018 "NWE"     "RB" 2  0  5 7
        "Robert Woods"          2017 "LAR-STL" "WR" 2  0  0 5
        "Robert Woods"          2018 "LAR-STL" "WR" 1  0  1 6
        end
        
        //  BREAK DATA SET INTO PIECES: ONE FOR QB2S ONE FOR WR1S
        //  AND DISCARD THE REST
        preserve
        keep if position == "QB" & pos_rank_on_team == 1
        keep team year pass_td_pfr
        isid team year, sort
        tempfile quarterbacks
        save `quarterbacks'
        restore
        keep if position == "WR" & pos_rank_on_team == 2
        
        //  JOIN THE PIECES WITH 1 YEAR LAG
        rangejoin year -1 -1 using `quarterbacks', by(team)
        rename pass_td_pfr_U wr2_qb1_pass_td_lag
        drop year_U
        I have made certain assumptions here. I assume that each team has only a single first-ranked quarterback in any given year. I also assume that from your description of the problem, players in positions other than WR and QB are irrelevant here.

        To run this code, you need the -rangejoin- command, written by Robert Picard and available from SSC. To use -rangejoin- you also need the -rangestat- command, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

        Comment


        • #5
          Hey Clyde, thanks for your input! I think that's close to what I'm getting at, but I'd like my final dataset to look like this:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str24 player int year str7 team_abbrv_pfr str2 position float pos_rank_on_team byte(pass_td_pfr rush_td rec_td wr2_qb1_pass_td_lag)
          "Cooper Kupp"           2018 "LAR-STL" "WR" 3  0  0 6  .
          "Jared Goff"            2017 "LAR-STL" "QB" 1 28  1 0  .
          "Tyler Higbee"          2017 "LAR-STL" "TE" 1  0  0 1  .
          "Tavon Austin"          2017 "LAR-STL" "WR" 4  0  1 0  .
          "Todd Gurley"           2018 "LAR-STL" "RB" 1  0 17 4  .
          "Sammy Watkins"         2017 "LAR-STL" "WR" 3  0  0 8  .
          "John Kelly"            2018 "LAR-STL" "RB" 2  0  0 0  .
          "Gerald Everett"        2017 "LAR-STL" "TE" 2  0  0 2  .
          "Todd Gurley"           2017 "LAR-STL" "RB" 1  0 13 6  .
          "Robert Woods"          2018 "LAR-STL" "WR" 1  0  1 6  .
          "Robert Woods"          2017 "LAR-STL" "WR" 2  0  0 5  .
          "Cooper Kupp"           2017 "LAR-STL" "WR" 1  0  0 5  .
          "Jared Goff"            2018 "LAR-STL" "QB" 1 32  2 0 32
          "Brandin Cooks"         2018 "LAR-STL" "WR" 2  0  1 5  .
          "Malcolm Mitchell"      2017 "NWE"     "WR" 5  0  0 0  .
          "Phillip Dorsett"       2017 "NWE"     "WR" 4  0  0 0  .
          "James White"           2017 "NWE"     "RB" 4  0  0 3  .
          "Julian Edelman"        2018 "NWE"     "WR" 1  0  0 6  .
          "Phillip Dorsett"       2018 "NWE"     "WR" 3  0  0 3  .
          "Chris Hogan"           2017 "NWE"     "WR" 3  0  0 5  .
          "Rob Gronkowski"        2017 "NWE"     "TE" 1  0  0 8  .
          "Rex Burkhead"          2017 "NWE"     "RB" 3  0  5 3  .
          "Dwayne Allen"          2017 "NWE"     "TE" 2  0  0 1  .
          "James White"           2018 "NWE"     "RB" 2  0  5 7  .
          "Dion Lewis"            2017 "NWE"     "RB" 1  0  6 3  .
          "Rex Burkhead"          2018 "NWE"     "RB" 3  0  0 1  .
          "Tom Brady"             2018 "NWE"     "QB" 1 29  2 0 32
          "Tom Brady"             2017 "NWE"     "QB" 1 32  0 0  .
          "Jeremy Hill"           2018 "NWE"     "RB" 4  0  0 0  .
          "Rob Gronkowski"        2018 "NWE"     "TE" 1  0  0 3  .
          "Cordarrelle Patterson" 2018 "NWE"     "WR" 4  0  1 3  .
          "Sony Michel"           2018 "NWE"     "RB" 1  0  6 0  .
          "Brandin Cooks"         2017 "NWE"     "WR" 1  0  0 7  .
          "Julian Edelman"        2017 "NWE"     "WR" 6  0  0 0  .
          "Danny Amendola"        2017 "NWE"     "WR" 2  0  0 2  .
          "Chris Hogan"           2018 "NWE"     "WR" 2  0  0 3  .
          "Mike Gillislee"        2017 "NWE"     "RB" 2  0  5 0  .
          end

          ...where Goff and Brady's value for wr2_qb1_pass_td_lag in 2018 are both 32 since Brady threw 32 touchdowns in 2017 and both Goff's 2018 WR2, Brandin Cooks, and Brady's 2018 WR2, Chris Hogan, were on Brady's team in 2017.

          Comment


          • #6
            Now I'm completely confused about what you want. According to your data, Goff and Brady are quarterbacks. I originally understood that you wanted to pair up a WR in second rank with his team's QB of first rank from the previous year. Now you are somehow pairing QBs with each other, but in a way I can't understand. And what do Cooks and Hogan have to do with this? And why does Goff get matched with Brady's prior year performance, but Brady doesn't get matched with Goff's prior year performance? I don't get any of this. It seems like you want to do some kind of "6 degrees of separation" game here but I am clueless what the rules are.
            Last edited by Clyde Schechter; 16 Nov 2019, 16:33.

            Comment


            • #7
              Sorry, I realize this is a bit confusing. I will do my best to explain more clearly.

              Goff gets matched with Brady's prior year performance because the WR2 on Goff's team in 2018 (Cooks) was on Brady's team the year prior.

              Brady doesn't get matched with Goff's prior year performance because the WR2 on Brady's team in 2018 (Hogan) was not on Goff's team the year prior.

              Brady gets "matched" with his own prior year performance because the WR2 on Brady's team in 2018 (Hogan) was on Brady's team the year prior.



              Comment


              • #8
                OK, I think I have it now.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str24 player int year str7 team_abbrv_pfr str2 position float pos_rank_on_team byte(pass_td_pfr rush_td rec_td)
                "Cooper Kupp"           2018 "LAR-STL" "WR" 3  0  0 6
                "Jared Goff"            2017 "LAR-STL" "QB" 1 28  1 0
                "Tyler Higbee"          2017 "LAR-STL" "TE" 1  0  0 1
                "Tavon Austin"          2017 "LAR-STL" "WR" 4  0  1 0
                "Todd Gurley"           2018 "LAR-STL" "RB" 1  0 17 4
                "Sammy Watkins"         2017 "LAR-STL" "WR" 3  0  0 8
                "John Kelly"            2018 "LAR-STL" "RB" 2  0  0 0
                "Gerald Everett"        2017 "LAR-STL" "TE" 2  0  0 2
                "Todd Gurley"           2017 "LAR-STL" "RB" 1  0 13 6
                "Robert Woods"          2018 "LAR-STL" "WR" 1  0  1 6
                "Robert Woods"          2017 "LAR-STL" "WR" 2  0  0 5
                "Cooper Kupp"           2017 "LAR-STL" "WR" 1  0  0 5
                "Jared Goff"            2018 "LAR-STL" "QB" 1 32  2 0
                "Brandin Cooks"         2018 "LAR-STL" "WR" 2  0  1 5
                "Malcolm Mitchell"      2017 "NWE"     "WR" 5  0  0 0
                "Phillip Dorsett"       2017 "NWE"     "WR" 4  0  0 0
                "James White"           2017 "NWE"     "RB" 4  0  0 3
                "Julian Edelman"        2018 "NWE"     "WR" 1  0  0 6
                "Phillip Dorsett"       2018 "NWE"     "WR" 3  0  0 3
                "Chris Hogan"           2017 "NWE"     "WR" 3  0  0 5
                "Rob Gronkowski"        2017 "NWE"     "TE" 1  0  0 8
                "Rex Burkhead"          2017 "NWE"     "RB" 3  0  5 3
                "Dwayne Allen"          2017 "NWE"     "TE" 2  0  0 1
                "James White"           2018 "NWE"     "RB" 2  0  5 7
                "Dion Lewis"            2017 "NWE"     "RB" 1  0  6 3
                "Rex Burkhead"          2018 "NWE"     "RB" 3  0  0 1
                "Tom Brady"             2018 "NWE"     "QB" 1 29  2 0
                "Tom Brady"             2017 "NWE"     "QB" 1 32  0 0
                "Jeremy Hill"           2018 "NWE"     "RB" 4  0  0 0
                "Rob Gronkowski"        2018 "NWE"     "TE" 1  0  0 3
                "Cordarrelle Patterson" 2018 "NWE"     "WR" 4  0  1 3
                "Sony Michel"           2018 "NWE"     "RB" 1  0  6 0
                "Brandin Cooks"         2017 "NWE"     "WR" 1  0  0 7
                "Julian Edelman"        2017 "NWE"     "WR" 6  0  0 0
                "Danny Amendola"        2017 "NWE"     "WR" 2  0  0 2
                "Chris Hogan"           2018 "NWE"     "WR" 2  0  0 3
                "Mike Gillislee"        2017 "NWE"     "RB" 2  0  5 0
                end
                
                tempfile original
                save `original'
                
                //  REDUCE DATA SET TO FIRST RANKED QUARTERBACKS AND SECOND RANKED WIDE RECEIVERS
                keep if inlist(position, "QB", "WR")
                
                
                //  FIND EACH PLAYER'S TEAM FROM THE PREVIOUS YEAR
                preserve
                tempfile copy
                save `copy'
                restore
                rangejoin year -1 -1 using `copy', by(player) keepusing(team_abbrv_pfr)
                drop year_U
                rename *_U prior_*
                
                //  NOW SEPARATE THE DATA INTO QUARTERBACKS AND WIDE RECEIVERS
                preserve
                keep if position == "QB"
                tempfile quarterbacks
                save `quarterbacks'
                restore
                keep if position == "WR"
                tempfile wide_receivers
                save `wide_receivers'
                
                //  MATCH EACH 1ST RANKED QUARTERBACK WITH THE 2ND RANKED WIDE RECEIVER
                //  ON HIS TEAM IN THE CURRENT YEAR
                use `quarterbacks', clear
                keep if pos_rank_on_team == 1
                ds team year, not
                rename (`r(varlist)') =_1
                joinby team_abbrv_pfr year using `wide_receivers', unmatched(master)
                keep if pos_rank_on_team == 2
                drop pass_td_pfr rush_td rec_td _merge
                rename team_abbrv_pfr =_1
                
                //  NOW ASSOCIATE THE WR WITH HIS PRIOR TEAM'S 1ST RANKED QUARTERBACK
                rename year year_1
                gen year = year_1 - 1
                drop position pos_rank_on_team
                rename prior_team_abbrv_pfr team_abbrv_pfr
                joinby team_abbrv_pfr year using `quarterbacks', unmatched(master)
                keep if pos_rank_on_team == 1
                
                //  CLEAN UP
                keep *_1 pass_td_pfr
                drop prior_team_abbrv_pfr
                rename pass_td_pfr wr2_qb1_pass_td_lag
                rename *_1 *
                
                //  MERGE BACK WITH ORIGINAL DATA
                merge 1:1 player year using `original', assert(match using) nogenerate

                Comment


                • #9
                  This works! Thank you so much for your help!

                  Comment

                  Working...
                  X