  • Help with ML Program

    Hello, I am trying to define and estimate the parameters from a log likelihood function using minimum wage data and unemployment and wage data from the National Agricultural Workers Survey. The individual likelihood contributions are defined as follows (from Flinn (2006): Minimum Wage Effects on Labor Market Outcomes under Search, Matching, and Endogenous Contact Rates):

    For unemployed individuals
    [Equation image: FLinn 1.PNG]

    For employed individuals earning the minimum wage:
    [Equation image: Flinn 2.PNG]

    For employed individuals earning above the minimum wage:
    [Equation image: Flinn 3.PNG]

    where m denotes the minimum wage, w denotes the wage of individuals who are paid at least $0.50 above the minimum wage, rho*V_n(m) is denoted by the parameter "theta" in the code below, G() denotes the CDF of the lognormal distribution, and g() denotes the PDF of the lognormal distribution. So far I have specified my program as follows: I want to estimate mu and sigma (which are embedded in the G() and g() functions), as well as the parameters alpha, theta, lambda, and eta. Here is the code I have been using so far:

    use "C:\Users\Zach\Dropbox\Confidential NAWS Data\Confidential NAWS Data 2018\Raw Files\NAWS Datasets (Stata)\nawscrtdvars1db18_STATA.dta", clear
    rename *, lower
    gen year = year(cs2)
    merge m:1 state year using "C:\Users\Zach\Dropbox\AEWR Project\Data Files\Generated Data Files\State Minimum Wages No FY 1990-2018.dta"
    drop _m
    merge m:1 year using "C:\Users\Zach\Dropbox\AEWR Project\Data Files\Generated Data Files\Federal Minimum Wages No FY 1990-2018.dta"
    drop _m
    replace fed_min_wage = 3.35 if year==1988 | year==1989 //impute fed min wage data that is missing
    replace min_wage = fed_min_wage if real_min_wage==. //impute fed min wage for states with no min wage
    replace min_wage = fed_min_wage if state=="GA" | state=="WY" //impute fed min wage for states with lower min wage
    replace fwweeks = 52 if fwweeks>52 //round down weeks for leap year
    replace fwweeks = fwweeks/52 //normalize work weeks to 52 weeks = 1
    gen unemployed = fwweeks < 52 //identify workers who were unemployed during year
    gen ti = 52 - fwweeks if fwweeks<52 //identify length of unemployment spell
    gen paid_min_wage = waget1<=(min_wage + .50) //identify individuals paid about the min wage
    replace waget1 = min_wage if waget1<=(min_wage+.50) //replace wages for individuals who make min wage + .50
    gen paid_above_min_wage = waget1>(min_wage+.50) //identify individuals paid above min wage
    gen ln_mw=ln(min_wage) //generate the log of min wage
    gen waget1_hi = waget1 if waget1>(min_wage +.50) //generate wages for individuals who make above min
    gen ln_waget1_hi = ln(waget1) if waget1>(min_wage +.50) //take log of wages for those above min wage
    replace paid_min_wage=0 if unemployed==1 //classify individuals as not paid min wage if classified as unemployed
    replace paid_above_min_wage=0 if unemployed==1 //classifiy individual as not paid hi wage if classified as unemployed
    program drop AEWR_1
    program define AEWR_1
    version 1.0
    args llf mu sigma alpha theta lambda eta
    tempvar G1 G2 G3 T1 T2 T3 T4 T5 T6
    generate double `G1' = normal(sqrt((ln_mw - `mu')/`sigma')^2)
    generate double `G2' = normal([(ln_mw - (1 - `alpha')*`theta')/`alpha' - `mu']/`sigma')
    generate double `G3' = normalden([(ln_waget1_hi - (1 - `alpha')*`theta')/`alpha' - `mu']/`sigma')/(`sigma'*waget1_hi)
    generate double `T1' = ln(sqrt(`lambda')^2) - ln(sqrt(`eta' + `lambda'*`G1')^2)
    generate double `T2' = ln(sqrt(`eta')^2) + ln(sqrt(`G1')^2) if unemployed==1
    generate double `T3' = -sqrt(`lambda'*`G1'*ti)^2 if unemployed==1
    generate double `T4' = ln(sqrt(`G1' - `G2')^2) if paid_min_wage==1
    generate double `T5' = -ln(sqrt(`alpha')^2) if paid_above_min_wage==1
    generate double `T6' = ln(sqrt(`G3')^2) if paid_above_min_wage==1
    quietly replace `llf' = `T1' + `T2' + `T3' if unemployed==1
    quietly replace `llf' = `T1' + `T4' if paid_min_wage==1
    quietly replace `llf' = `T1' + `T5' + `T6' if paid_above_min_wage==1
    end

    ml model lf AEWR_1 () () () () () ()
    ml check
    ml search, repeat()
    ml maximize, iterate(50) difficult
    ml graph
    exit

    I get this message indicating that feasible values cannot be found.
    [Screenshot: Stata Snip.PNG]

    First of all, I am not sure if I should be using the lf model or if I should switch to using the d0 estimator. Also, I am not sure if I am specifying the equations correctly in terms of the () () ... () after the ml model lf AEWR_1 command. At this point, I do not want my dependent variables (i.e., ln_mw, ln_waget1_hi, and ti) to depend on other variables, which is why I left the () without an equation in them. Is this how I should be doing it? Any help you could provide would be greatly appreciated, as I have spent the past two days trying to figure this out.

  • #2
    I noticed an error with my previous code, which normalized the weeks worked to 1 for full year. Even after correcting that error, I get the same result.

    Specifically, I deleted the line of code that reads: replace fwweeks = fwweeks/52 //normalize work weeks to 52 weeks = 1


    • #3
      The code is too complicated and too specific to your data for us to easily see what the problem is. In general, my strategy would be to simplify the model a lot, and then add complications one at a time until you find the problem.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Hi Maarten,
        Thanks for the response. I will do as you suggest. Based on the way the likelihood contributions are defined, can you speak to whether the "lf" estimator can handle this type of optimization problem, or whether I would need to use the d0 or d1 estimator instead? Similarly, it is unclear to me whether the way I have specified the ml model command with the "() () ... ()" equation specifications is appropriate. Specifically, some of the parameters enter into different parts of the log likelihood function, so it's unclear to me whether I need to associate a specific variable with each of them, whether I can associate two variables with them, and how I should go about doing that. Any help or input would be greatly appreciated. Thanks.


        • #5
          Hi Zachariah
          LF should be enough for your purposes. The other evaluators may be more efficient and faster, but for a first run lf is good enough (many of my own programs are based on it).
          Regarding the ml model command line: I have never tried using just empty parentheses. I always prefer to give the equations names, so I know what each parameter represents:

          ml model lf AEWR_1 (mu:) (sigma:) etc.

          Something else to consider: as with many other nonlinear estimators, solutions to this type of model depend strongly on the initial values, so bad initial values may cause an endless loop.
          That being said, here are two options you could try:
          1. Use better initial values, using "init()". That may speed things up, IF you know what good values for your parameters are.
          2. I notice you are modeling "sigma" directly. This can have very bad convergence properties. I would suggest estimating "lnsigma" instead, and then transforming it back.
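
          As a rough illustration of option 1, here is a minimal sketch on the auto dataset that ships with Stata, not on the NAWS data. The evaluator name mynormal, the choice of mpg, the use of sample moments as starting values, and "version 14" are all assumptions made for the example. It uses the ml init command rather than the init() option of ml model, and lnnormalden() rather than ln(normalden()) to avoid numerical underflow.

```stata
* Sketch only: constant-only normal model for mpg, with user-supplied starting values
sysuse auto, clear

capture program drop mynormal
program define mynormal
    version 14
    args lnf mu lnsigma
    * log density of the dependent variable, with sigma = exp(lnsigma) > 0
    quietly replace `lnf' = lnnormalden($ML_y1, `mu', exp(`lnsigma'))
end

ml model lf mynormal (mu: mpg = ) (lnsigma: )

* Starting values from the sample mean and log standard deviation (an assumption
* about what counts as "good" values for this toy model)
quietly summarize mpg
local m0 = r(mean)
local s0 = ln(r(sd))

* With the copy option, values are assigned in the order of the equations
ml init `m0' `s0', copy
ml maximize
```

          For a six-parameter evaluator such as AEWR_1, the analogous call would presumably list six values in the same order as the args line, but sensible values depend on the data and the model.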

          HTH


          • #6
            Hi Fernando,
            Thank you for the response. Your input is very helpful. I will try your suggestions to see if they help me resolve my issue.


            • #7
              Hi Fernando, can you please elaborate on how I should specify lnsigma in my code? Should I use ln(`sigma') in place of `sigma' or something else?


              • #8
                Hi Zachariah
                Consider the very simple OLS via ML

                This will work, but will be problematic:
                Code:
                program myols1
                    args lnf xb sigma
                    * density of the dependent variable ($ML_y1) with mean xb and sd sigma
                    qui: replace `lnf' = log(normalden($ML_y1, `xb', `sigma'))
                end
                This will work better, because it avoids, for example, sigma = 0 or sigma < 0:
                Code:
                program myols2
                    args lnf xb lnsigma
                    * exp() keeps the standard deviation strictly positive
                    qui: replace `lnf' = log(normalden($ML_y1, `xb', exp(`lnsigma')))
                end
                So you estimate the parameter lnsigma, but use exp(lnsigma) inside the evaluator.
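
                To make the "transform it back" step concrete, here is a hedged end-to-end sketch on the auto dataset. The variables price, mpg, and weight and the "version 14" line are assumptions chosen for illustration, and lnnormalden() is swapped in for log(normalden()) to avoid underflow far from the optimum.

```stata
* Sketch only: linear regression by ML with sigma parameterized as exp(lnsigma)
sysuse auto, clear

capture program drop myols2
program define myols2
    version 14
    args lnf xb lnsigma
    quietly replace `lnf' = lnnormalden($ML_y1, `xb', exp(`lnsigma'))
end

ml model lf myols2 (xb: price = mpg weight) (lnsigma: )
ml maximize

* Recover sigma, with a delta-method standard error, from the estimated log
nlcom (sigma: exp(_b[lnsigma:_cons]))
```

                The slope estimates should match those from regress price mpg weight, while sigma will differ slightly from the regress root MSE because ML divides by N rather than N - k.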


                • #9
                  Hi Fernando, excellent! Thanks for following up. That seems like a nice solution. I have one more question about the () () ... () in the ml model command, though. Do I need to associate each of the parameters with one of the variables in the LLF, or can I just leave them unassociated with variables? I'm unclear on what purpose the () () ... () serves in the ml model command in Stata. If you could provide some insight into that, I would be very grateful.


                  • #10
                    Yes, at least that is how I have always treated those parameters.
                    In the model I proposed before, for example, I would call them as follows:
                    ml model lf myols2 (xb: wage = x1 x2 x3) /lnsigma
                    or
                    ml model lf myols2 (xb: wage = x1 x2 x3) (lnsigma: )
                    I do this a) to remember what each parameter is, and b) because, as I understand it, ml uses those equation specifications to sort out which covariates go with which equation.
                    Hope this helps.
                    Fernando
                    Last edited by FernandoRios; 03 Feb 2022, 19:36.


                    • #11
                      Hi Fernando,
                      Yes, that helps, but some of the parameters in the LLF are associated with more than one variable, and I'm not sure which variables the other parameters are directly associated with, as they simply appear in parts of the LLF that are not within a standard framework like OLS or the normal distribution. Also, I am not controlling for other variables in my model (at least while I try to get an initial optimization to work), so I'm unsure how I should be specifying the () () ... () equations given my setting. Do you have any ideas about how I should specify the () () ... () in my setting?


                      • #12
                        Well, the way I see it, your program:
                        Code:
                        program define AEWR_1
                        version 1.0
                        args llf mu sigma alpha theta lambda eta
                        tempvar G1 G2 G3 T1 T2 T3 T4 T5 T6
                        generate double `G1' = normal(sqrt((ln_mw - `mu')/`sigma')^2)
                        generate double `G2' = normal([(ln_mw - (1 - `alpha')*`theta')/`alpha' - `mu']/`sigma')
                        generate double `G3' = normalden([(ln_waget1_hi - (1 - `alpha')*`theta')/`alpha' - `mu']/`sigma')/(`sigma'*waget1_hi)
                        generate double `T1' = ln(sqrt(`lambda')^2) - ln(sqrt(`eta' + `lambda'*`G1')^2)
                        generate double `T2' = ln(sqrt(`eta')^2) + ln(sqrt(`G1')^2) if unemployed==1
                        generate double `T3' = -sqrt(`lambda'*`G1'*ti)^2 if unemployed==1
                        generate double `T4' = ln(sqrt(`G1' - `G2')^2) if paid_min_wage==1
                        generate double `T5' = -ln(sqrt(`alpha')^2) if paid_above_min_wage==1
                        generate double `T6' = ln(sqrt(`G3')^2) if paid_above_min_wage==1
                        quietly replace `llf' = `T1' + `T2' + `T3' if unemployed==1
                        quietly replace `llf' = `T1' + `T4' if paid_min_wage==1
                        quietly replace `llf' = `T1' + `T5' + `T6' if paid_above_min_wage==1
                        end
                        should be called:

                        Code:
                        ml model lf AEWR_1 (mu: )  (sigma: )  ( alpha : )  (theta : )  (lambda : )  (eta: )
                        because each one of these is a unique parameter.

                        And I see that everything else is already inside the program, so there is no need to specify it.

                        In other words, if your first line for the arguments is:
                        args lnf x1 x2 x3 x4
                        The parameters in the model that will need to be "named" in the parenthesis are (x1: ) (x2: ) (x3: ) (x4: )
                        HTH

                        EDIT
                        I just tried to see what happens when you simply use empty parentheses "()". Stata simply assigns a generic name to that parameter's equation, specifically "eq#", where # is the number of the equation.
                        Last edited by FernandoRios; 04 Feb 2022, 06:24.
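
                        To see the difference between unnamed and named constant-only equations, the following sketch fits the same toy model twice on the auto dataset. The evaluator mynormal, the variable mpg, and "version 14" are assumptions for illustration; only the labels on the estimates should differ between the two runs.

```stata
* Sketch only: one constant-only equation per parameter listed after lnf in args
sysuse auto, clear

capture program drop mynormal
program define mynormal
    version 14
    args lnf mu lnsigma
    quietly replace `lnf' = lnnormalden($ML_y1, `mu', exp(`lnsigma'))
end

* Unnamed equations: estimates are reported under generic equation names
ml model lf mynormal (mpg = ) ()
ml maximize, nolog
matrix list e(b)

* Named equations: same estimates, labeled mu and lnsigma
ml model lf mynormal (mu: mpg = ) (lnsigma: )
ml maximize, nolog
matrix list e(b)
```

                        In both calls the equations are matched to the args line by position, so the first equation feeds `mu' and the second feeds `lnsigma' regardless of the names used.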


                        • #13
                          Hi Fernando,
                          Okay, That's what I thought. That is super helpful. And once again, thanks for your help.

                          Cheers,
                          Zach
