  • Practicing Writing Programs in Stata by Creating a Wordle Helper - any tips or improvements?

    Hi all,

    Been wanting to practice and improve my Stata program writing skills. I thought Wordle presented a good opportunity to try writing something. I think what I came up with basically works (though I may have missed edge cases), but it still feels pretty hacky to me (e.g. looping through all combinations and then dropping arrangements with repeated letters rather than doing some sort of sampling without replacement). Anyway - would love any advice or improvements you might have:

    Code:
    capture program drop statle 
    program statle
    
        *Clear the data only; clear all would also drop programs from memory
        clear
        set more off 
    
        *Clear out any commas 
        local no_commas = subinstr("`1'", ",", "", .)
        local no_spaces = subinstr("`no_commas'", " ", "", .)
    
        *Get the length of the string free of commas and spaces
        local len = strlen("`no_spaces'") 
        
        *Remove commas and spaces from the position string as well
        local pos_no_commas = subinstr("`2'", ",", "", .) 
        local pos_no_spaces = subinstr("`pos_no_commas'", " ", "", .) 
        
    
        *Use the length information to insert blanks if not already provided
        if `len'==1{
            
            local add_spaces = "`no_spaces'" + " _ _ _ _" 
        
        }
    
        else if `len'==2{
            
            local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
                substr("`no_spaces'", 2, 1) + " _ _ _" 
            
        }
    
        else if `len'==3{
            
            local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
                substr("`no_spaces'", 2, 1) + " " + /// 
                substr("`no_spaces'", 3, 1) + " _ _" 
            
        }
    
        else if `len'==4{
            
            local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
                substr("`no_spaces'", 2, 1) + " " + /// 
                substr("`no_spaces'", 3, 1) + " " + ///
                substr("`no_spaces'", 4, 1) + " _" 
            
        }
    
        else if `len'==5{
            
            local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
                substr("`no_spaces'", 2, 1) + " " + /// 
                substr("`no_spaces'", 3, 1) + " " + ///
                substr("`no_spaces'", 4, 1) + " " + ///
                substr("`no_spaces'", 5, 1)
    
        }
    
        *Throw some errors if you gave me no letters or too many letters
        else if `len'==0{
            
            di as error "ERROR: You specified `len' letters. You have to specify at least one."
            exit 198
        }
    
        else{
            
            di as error "ERROR: You specified `len' letters. That is too many."
            exit 198
            
        }
    
        *Set a new dataset with one observation 
        set obs 1 
    
        *Initialize 5 blank variables that will store each letter by position
        forvalues i = 1/5{
            
            gen letter`i' = ""
            
        }
    
        *Start a counter to use in the loop below
        local count = 1
    
        *Loop through all possible combinations of the 5 letters, the dumb way 
        foreach letter1 of local add_spaces{
            
            foreach letter2 of local add_spaces{
                
                foreach letter3 of local add_spaces{
                    
                    foreach letter4 of local add_spaces{
                        
                        foreach letter5 of local add_spaces{
    
                            forvalues i = 1/5{
                                
                                replace letter`i' = "`letter`i''" in `count'
                                
                            }
                            
                            *Add one empty row for the next combination
                            local new_n = _N + 1
                            set obs `new_n'
                            
                            local ++count
                            
                        }
                    }
                    
                }
                
            }
         
        }
    
        *Store a version of the string with no blanks 
        local no_blanks = subinstr("`add_spaces'", "_", "", .)
    
        *Create a series of dummies to tell us if specific letter is in a specific position
        foreach letter of local no_blanks{
            
            forvalues i = 1/5{
                
                gen letter`i'_is_`letter' = letter`i'=="`letter'"
                
            }
            
            *Calculate the number of times each letter appears 
            egen row_`letter'_count = rowtotal(*_is_`letter')
            
            *Drop the combination if the letter appears more than once or not at all 
            drop if row_`letter'_count>1 & !missing(row_`letter'_count) | ///
                row_`letter'_count==0
            
        }
    
        if "`pos_no_spaces'"=="_____"{
            
            di "No Need to Check Positions!"
            
        }
        
        else{
            
            /*Loop through all the letters to check the positions of the letter combinations 
                against the known positions
            */
            foreach letter of local no_blanks{
                
                *Initialize a blank variable for storing indicators of out of position letters
                gen drop_because_of_`letter' = . 
                *Store the known position of the letter in the submitted string with known locations
                local let_pos = strpos("`pos_no_spaces'", "`letter'") 
                *If we don't know the position of a letter, we pass
                if `let_pos'==0{
                    
                    di "No position of `letter' determined"
                    
                }
                
                /* If we do know the position of a letter, we check whether the letter is in the 
                    right position by checking the letter of that number against the letter
                    it should be
                */
                else{
                    
                    replace drop_because_of_`letter' = letter`let_pos'!="`letter'"
                    
                }
                
            }
    
            *Aggregate across the drop indicators to make one single drop indicator
            egen to_drop = rowmax(drop_*)
            *Drop the combinations that don't meet the criteria
            *(test ==1 explicitly, since a missing to_drop would otherwise count as true)
            drop if to_drop==1
    
            
        }
        
        *Drop all the weird variables we made along the way 
        drop *_*
    
        *Combine the letter options into one string
        gen arrangement = letter1+letter2+letter3+letter4+letter5
        *Tab all the remaining arrangements 
        tab arrangement
        
    end
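
On the "sampling without replacement" idea: one way to avoid generating and then dropping repeated-letter arrangements is to loop over slot indices and skip any index already used. This is only a minimal sketch, not a drop-in replacement for statle. The macro names `tokens`, `pattern`, and `rows` are made up for illustration, and the inputs are hard-coded to the "I N R" example.

```stata
* Assumed example input: three known letters plus two blanks
local tokens I N R _ _
local pattern IN___

* Visit each ordering of the slot indices 1..5 exactly once
local rows
forvalues a = 1/5 {
    forvalues b = 1/5 {
        if `b' == `a' continue
        forvalues c = 1/5 {
            if inlist(`c', `a', `b') continue
            forvalues d = 1/5 {
                if inlist(`d', `a', `b', `c') continue
                * The last index is the one left over (1+2+3+4+5 = 15)
                local e = 15 - `a' - `b' - `c' - `d'
                local word
                foreach k in `a' `b' `c' `d' `e' {
                    local word `word'`:word `k' of `tokens''
                }
                local rows `rows' `word'
            }
        }
    }
}

* The two blanks are interchangeable, so drop duplicate arrangements
local rows : list uniq rows

* Store one arrangement per observation
clear
local n : word count `rows'
set obs `n'
gen arrangement = ""
forvalues i = 1/`n' {
    quietly replace arrangement = "`:word `i' of `rows''" in `i'
}

* Keep only arrangements that agree with the known positions
forvalues i = 1/5 {
    local p = substr("`pattern'", `i', 1)
    if "`p'" != "_" {
        quietly keep if substr(arrangement, `i', 1) == "`p'"
    }
}
list, noobs
```

This visits 120 index orderings instead of 3,125 combinations, and each arrangement ends up listed once rather than once per ordering of the blanks.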

  • #2
    I can give some general comments, but talk to me first about what the program is intended to do. I have some thoughts, but what's the context, what's this command meant to do?


    • #3
      Oh yes - I should have started with that. For anyone not yet enmeshed in the viral sensation game, Wordle gives a player 5 blank spots into which they can enter a guess at a 5-letter English word. If the guess is wrong, the player is told which parts of it were correct. That information comes in two forms. A letter highlighted in green is a correct letter in the correct position. A letter highlighted in yellow is in the word, but in a different position. Imagine the true 5-letter word is INCUR. I guess TARES and learn only that R is in the word (it is highlighted in yellow). Then I guess CHINO and learn that C, I, and N are in the word but not in the positions I tried, since all three are highlighted in yellow. If instead my second guess had been INKED, both I and N would be highlighted in green, as they are the right letters in the right places.
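
To make the green/yellow rules concrete, here is a rough sketch of a scorer. The program name `wordle_score` and its G/Y/. output codes are invented for this illustration, and it assumes no repeated letters (the real game treats repeated letters more carefully than this).

```stata
capture program drop wordle_score
program wordle_score, rclass
    * truth = the hidden word, guess = the player's guess (both 5 letters)
    args truth guess
    local score
    forvalues i = 1/5 {
        local g = substr("`guess'", `i', 1)
        local t = substr("`truth'", `i', 1)
        if "`g'" == "`t'" {
            * Right letter, right position: green
            local mark G
        }
        else if strpos("`truth'", "`g'") > 0 {
            * Letter is in the word, but somewhere else: yellow
            local mark Y
        }
        else {
            * Letter is not in the word
            local mark .
        }
        local score `score'`mark'
    }
    return local score "`score'"
end

wordle_score INCUR TARES
display "`r(score)'"
wordle_score INCUR INKED
display "`r(score)'"
```

With INCUR as the true word, the first call displays ..Y.. (only R is yellow) and the second displays GG... (I and N are green).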

      Thus, generally after a guess or two, a player has two types of information: which letters are in the word, and where some of those letters sit. Players then have to guess the real word given that information, and one thing that helps is writing out all of the possible orders of the letters (with unknown positions indicated by blanks) to see what words might fit. This program is intended to automate producing all of these letter-order permutations.

      So in our example I could input:

      Code:
       statle "I N R" "I N _ _ _"
      And would be returned:

      Code:
      arrangement | Freq. Percent Cum.
      ------------+-----------------------------------
      INR__       | 4     33.33   33.33
      IN_R_       | 4     33.33   66.67
      IN__R       | 4     33.33   100.00
      ------------+-----------------------------------
      Total       | 12    100.00


      In this instance, those permutations are a little obvious, but with different input information, the set of possibilities can be larger or less easy to see and this can be quite helpful.


      • #4
        Uhhhh.... okay I think I understand. I guess my question is, what's the application of this for Stata? Sounds cool. My advice for whatever program you write would be to use subroutines in your code: programs WITHIN programs that each do one main task.
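
As a rough idea of what that could look like here, the chain of if/else blocks that pads the letters with blanks could move into its own small program. This is just a sketch: `statle_pad` and the returned `r(padded)` are hypothetical names, not part of the code posted above.

```stata
* Hypothetical helper: pad the known letters out to five
* space-separated tokens, using "_" for the unknown slots
capture program drop statle_pad
program statle_pad, rclass
    args letters
    * Strip commas and spaces so only the letters remain
    local letters = subinstr("`letters'", ",", "", .)
    local letters = subinstr("`letters'", " ", "", .)
    local len = strlen("`letters'")
    if !inrange(`len', 1, 5) {
        display as error "specify between 1 and 5 letters"
        exit 198
    }
    * One token per slot: a letter where we have one, a blank otherwise
    local padded
    forvalues i = 1/5 {
        if `i' <= `len' {
            local next = substr("`letters'", `i', 1)
        }
        else {
            local next "_"
        }
        local padded `padded' `next'
    }
    return local padded "`padded'"
end

statle_pad "I, N R"
display "`r(padded)'"
```

The main program would then call the helper and read `r(padded)` instead of repeating the substr() lines for each possible length.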


        • #5
          It does sound very cool, I believe it will work out.


          • #6
            CJ Libassi Also check out: -ssc describe wordy- (by Austin Nichols)
