Practicing Writing Programs in Stata by Creating a Wordle Helper - any tips or improvements?

CJ Libassi

Join Date: May 2020
Posts: 46

16 Jan 2022, 10:32

Hi all,

Been wanting to practice and improve my Stata program writing skills. I thought Wordle presented a good opportunity to try to write something. I think what I came up with basically works (though I may have missed edge cases), but it still feels pretty hacky to me (e.g. looping through all iterations and then dropping arrangements with repeated letters rather than doing some sort of sampling without replacement). Anyway, would love any advice or improvements you might have:

Code:

capture program drop statle 
program statle

    *Clear only the data in memory (clear all would also wipe matrices, scalars, Mata, and stored results)
    clear 
    set more off 

    *Clear out any commas 
    local no_commas = subinstr("`1'", ",", "", .)
    local no_spaces = subinstr("`no_commas'", " ", "", .)

    *Get the length of the string free of commas and spaces
    local len = strlen("`no_spaces'") 
    
    *Remove commas and spaces from the position string as well
    local pos_no_commas = subinstr("`2'", ",", "", .) 
    local pos_no_spaces = subinstr("`pos_no_commas'", " ", "", .) 
    

    *Use the length information to insert blanks if not already provided
    if `len'==1{
        
        local add_spaces = "`no_spaces'" + " _ _ _ _" 
    
    }

    else if `len'==2{
        
        local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
            substr("`no_spaces'", 2, 1) + " _ _ _" 
        
    }

    else if `len'==3{
        
        local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
            substr("`no_spaces'", 2, 1) + " " + /// 
            substr("`no_spaces'", 3, 1) + " _ _" 
        
    }

    else if `len'==4{
        
        local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
            substr("`no_spaces'", 2, 1) + " " + /// 
            substr("`no_spaces'", 3, 1) + " " + ///
            substr("`no_spaces'", 4, 1) + " _" 
        
    }

    else if `len'==5{
        
        local add_spaces = substr("`no_spaces'", 1, 1) + " " + ///
            substr("`no_spaces'", 2, 1) + " " + /// 
            substr("`no_spaces'", 3, 1) + " " + ///
            substr("`no_spaces'", 4, 1) + " " + ///
            substr("`no_spaces'", 5, 1)

    }

    *Throw an error if you gave me no letters or too many letters
    else if `len'==0{
        
        di as error "ERROR: You specified `len' letters. You have to specify at least one."
        exit 198 
    }

    else{
        
        di as error "ERROR: You specified `len' letters. That is too many."
        exit 198 
        
    }

    *Set a new dataset with one observation 
    set obs 1 

    *Initialize 5 blank variables that will store each letter by position
    forvalues i = 1/5{
        
        gen letter`i' = ""
        
    }

    *Start a counter to use in the loop below
    local count = 1

    *Loop through all possible combinations of the 5 letters, the dumb way 
    foreach letter1 of local add_spaces{
        
        foreach letter2 of local add_spaces{
            
            foreach letter3 of local add_spaces{
                
                foreach letter4 of local add_spaces{
                    
                    foreach letter5 of local add_spaces{

                        forvalues i = 1/5{
                            
                            replace letter`i' = "`letter`i''" in `count'
                            
                        }
                        
                        *Add one new observation per combination, not one per letter
                        local new_n = _N + 1
                        set obs `new_n'
                        
                        local ++count
                        
                    }
                }
                
            }
            
        }
     
    }

    *Store a version of the string with no blanks 
    local no_blanks = subinstr("`add_spaces'", "_", "", .)

    *Create a series of dummies to tell us if specific letter is in a specific position
    foreach letter of local no_blanks{
        
        forvalues i = 1/5{
            
            gen letter`i'_is_`letter' = letter`i'=="`letter'"
            
        }
        
        *Calculate the number of times each letter appears 
        egen row_`letter'_count = rowtotal(*_is_`letter')
        
        *Drop the combination if the letter appears more than once or not at all 
        drop if row_`letter'_count>1 & !missing(row_`letter'_count) | ///
            row_`letter'_count==0
        
    }

    if "`pos_no_spaces'"=="_____"{
        
        di "No Need to Check Positions!"
        
    }
    
    else{
        
        /*Loop through all the letters to check the positions of the letter combinations 
            against the known positions
        */
        foreach letter of local no_blanks{
            
            *Initialize a blank variable for storing indicators of out of position letters
            gen drop_because_of_`letter' = . 
            *Store the known position of the letter in the submitted string with known locations
            local let_pos = strpos("`pos_no_spaces'", "`letter'") 
            *If we don't know the position of a letter, we pass
            if `let_pos'==0{
                
                di "No position of `letter' determined"
                
            }
            
            /* If we do know the position of a letter, we check whether the letter is in the 
                right position by checking the letter of that number against the letter
                it should be
            */
            else{
                
                replace drop_because_of_`letter' = letter`let_pos'!="`letter'"
                
            }
            
        }

        *Aggregate across the drop indicators to make one single drop indicator
        egen to_drop = rowmax(drop_*)
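        *Drop the combinations that don't meet the criteria; comparing to 1 keeps rows
        *where to_drop is missing because none of the letters had a known position
        drop if to_drop==1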

        
    }
    
    *Drop all the weird variables we made along the way 
    drop *_*

    *Combine the letter options into one string
    gen arrangement = letter1+letter2+letter3+letter4+letter5
    *Tab all the remaining arrangements 
    tab arrangement
    
end
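
This is not part of the program above, only a minimal sketch of the "sampling without replacement" idea mentioned in the post, assuming the letters are all distinct. Instead of five nested loops, each letter is placed into every slot that is still free, so arrangements with repeated letters are never generated. The example inputs (`letters`, `known`) and the helper variables `id` and `pos` are assumptions made for illustration.

```stata
clear
set obs 1
gen str5 arrangement = "_____"

* Letters known to be in the word (assumed example input)
local letters I N R

foreach letter of local letters {
    * Make five copies of every partial arrangement, one per candidate slot
    gen long id = _n
    expand 5
    bysort id: gen byte pos = _n
    * Keep only the copies whose candidate slot is still free
    keep if substr(arrangement, pos, 1) == "_"
    * Put the letter into that free slot
    replace arrangement = substr(arrangement, 1, pos - 1) + "`letter'" + substr(arrangement, pos + 1, .)
    drop id pos
}

* Known (green) positions, with _ for unknown slots (assumed example input)
local known IN___
forvalues i = 1/5 {
    local want = substr("`known'", `i', 1)
    if "`want'" != "_" {
        keep if substr(arrangement, `i', 1) == "`want'"
    }
}

list arrangement, noobs
```

With these inputs the loop builds 5 x 4 x 3 = 60 arrangements, and the position filter should leave INR__, IN_R_, and IN__R, each listed once. A letter that appears twice in the word would need extra handling that this sketch does not attempt.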


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

16 Jan 2022, 14:06

I can give some general comments, but talk to me first about what the program is intended to do. I have some thoughts, but what's the context, what's this command meant to do?
CJ Libassi

Join Date: May 2020

Posts: 46
#3

16 Jan 2022, 14:31

Oh yes - I should have started with that. For anyone not yet enmeshed in the viral sensation, Wordle gives a player 5 blank spots into which they enter a guess at a 5-letter English word. If the guess is wrong, the player is told which parts of it were correct. That information comes in two forms. A letter highlighted in green is a correct letter in the correct position. A letter highlighted in yellow is in the word, but in a different position. Imagine the true 5-letter word is INCUR. I guess TARES and learn only that R is in the word (it is highlighted in yellow). Then I guess CHINO and learn that C, I, and N are in the word but not in the positions where I put them, since all three are highlighted in yellow. If instead my second guess had been INKED, both I and N would be highlighted in green, as they are the right letters in the right places.

Thus, generally after a guess or two, a player has two types of information: which letters are in the word, and where some of those letters are. Players then have to guess the real word given that information, and it can help to write out all of the possible orders of the letters (with unknown positions indicated by blanks) to see what words might fit. This program is intended to automate producing all of these letter-order permutations.

So in our example I could input:

Code:

statle "I N R" "I N _ _ _"

And would be returned:

Code:

arrangement |      Freq.     Percent        Cum.
------------+-----------------------------------
      INR__ |          4       33.33       33.33
      IN_R_ |          4       33.33       66.67
      IN__R |          4       33.33      100.00
------------+-----------------------------------
      Total |         12      100.00

In this instance, those permutations are a little obvious, but with different input information, the set of possibilities can be larger or less easy to see and this can be quite helpful.
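
A side note on the Freq. column: each arrangement appears 4 times because the two blank tokens are interchangeable, so either one can fill each of the two blank slots (2 x 2 = 4). If one row per arrangement is preferred, something like the following might do. It is a sketch on toy data typed in to stand in for what `statle` leaves in memory, not output from the program itself.

```stata
* Toy data standing in for the arrangements left in memory (assumed for illustration)
clear
input str5 arrangement
"INR__"
"INR__"
"IN_R_"
"IN_R_"
"IN__R"
end

* Keep one row per distinct arrangement, then show what is left
duplicates drop arrangement, force
list arrangement, noobs
```

Inside the program, the same `duplicates drop` line could go just before the final `tab arrangement`.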
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

16 Jan 2022, 14:58

Uhhhh.... okay, I think I understand. I guess my question is, what's the application of this for Stata? Sounds cool. My advice to you with whatever program you write would be to use subroutines in your code: programs WITHIN programs that each do one main task.
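
As a rough illustration of the subroutine advice, the input cleaning and the blank padding could each live in a small helper that hands its result back in `r()`. This is only a sketch: the names `_statle_clean` and `_statle_pad` are hypothetical and do not appear in the original program, and the padding loop stands in for the chain of `if`/`else if` branches.

```stata
* Hypothetical helper: strip commas and spaces, return the result in r(clean)
capture program drop _statle_clean
program _statle_clean, rclass
    args raw
    local clean = subinstr("`raw'", ",", "", .)
    local clean = subinstr("`clean'", " ", "", .)
    return local clean "`clean'"
end

* Hypothetical helper: space-separate the letters and pad with _ up to five tokens
capture program drop _statle_pad
program _statle_pad, rclass
    args letters
    local len = strlen("`letters'")
    if `len' < 1 | `len' > 5 {
        display as error "specify between 1 and 5 letters"
        exit 198
    }
    local padded
    forvalues i = 1/5 {
        if `i' <= `len' {
            * Take the i-th letter as its own token
            local padded `padded' `=substr("`letters'", `i', 1)'
        }
        else {
            * Fill the remaining slots with blanks
            local padded `padded' _
        }
    }
    return local padded "`padded'"
end

* Example use: clean the raw input, then pad it
_statle_clean "I, N, R"
_statle_pad `r(clean)'
display "`r(padded)'"
```

The main program would then call each helper and read `r(clean)` or `r(padded)` right away, since the next r-class command overwrites those results.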
Barc

Join Date: Aug 2014

Posts: 42
#5

23 Feb 2022, 02:55

It does sound very cool, I believe it will work out.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1480
#6

23 Feb 2022, 03:28

CJ Libassi Also check out: -ssc describe wordy- (by Austin Nichols)