  • New package on SSC: qrprocess

    Thanks to Kit Baum, a new package, qrprocess, is now available on SSC for Stata 9.2+. You can install it with
    Code:
    ssc install qrprocess
    This package offers fast estimation and inference procedures for the linear quantile regression model. First, qrprocess implements new algorithms that are much faster than the built-in Stata commands, especially when a large number of quantile regressions or bootstrap replications must be estimated. Second, the commands provide analytical estimates of the variance-covariance matrix of the coefficients of several quantile regressions, allowing for weights, clustering, and stratification. Third, in addition to traditional pointwise confidence intervals, the command also provides functional confidence bands and tests of functional hypotheses. Fourth, predict, when called after qrprocess, can generate monotone estimates of the conditional quantile and distribution functions obtained by rearrangement. Fifth, the new command plotprocess conveniently plots the estimated coefficients with their confidence intervals and uniform bands.
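    Rearrangement, mentioned in the fourth point, addresses crossing quantile curves: if an estimated conditional quantile function is not increasing in the quantile index, its values are reordered so that it becomes monotone. On an equally spaced grid of quantile indexes this amounts to sorting the fitted values. The Python sketch below only illustrates that idea on made-up numbers; it is not the code that predict runs after qrprocess.

```python
# Monotone rearrangement on an equally spaced grid of quantile indexes.
# The fitted values are hypothetical, not output of qrprocess or predict.


def rearrange(fitted):
    """Return the monotone rearrangement of values estimated on an equally
    spaced, increasing grid of quantile indexes.

    On such a grid, rearranging a function amounts to sorting its values:
    the k-th smallest fitted value is assigned to the k-th quantile index.
    """
    return sorted(fitted)


taus = [0.1, 0.2, 0.3, 0.4, 0.5]
# Hypothetical conditional quantile predictions with a crossing between
# tau = 0.2 and tau = 0.3 (the curve decreases there).
fitted = [1.00, 1.30, 1.25, 1.40, 1.55]
monotone = rearrange(fitted)

for tau, before, after in zip(taus, fitted, monotone):
    print(f"tau={tau:.1f}  original={before:.2f}  rearranged={after:.2f}")
```

    The rearranged curve uses exactly the same set of values as the original one, only in increasing order, so it never crosses itself.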

    Let's consider an example. We load a data set with 5634 observations (the regression below uses the 3286 observations without missing values):
    Code:
    use http://www.stata.com/data/jwooldridge/eacsap/cps91
    The median regression of lwage on age, age squared (entered as c.age##c.age), years of education, and indicator variables for black and hispanic can be estimated with
    Code:
    . qrprocess lwage c.age##c.age educ i.black i.hispanic
    
    Quantile regression
    No. of obs.        3286    
    Algorithm:         qreg.
    Variance:          kernel estimate of the sandwich as proposed by Powell(1990).
    
    ------------------------------------------------------------------------------
          lwage  |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Quant. 0.5   |
            age  |     .04578   .0077878    5.88    0.000     .0305106    .0610495
    c.age#c.age  |  -.0005031    .000099   -5.08    0.000    -.0006972    -.000309
           educ  |   .1018382   .0041052   24.81    0.000     .0937893    .1098871
        1.black  |   -.021541   .0414319   -0.52    0.603     -.102776    .0596939
     1.hispanic  |   .0484709   .0432474    1.12    0.262    -.0363236    .1332655
          _cons  |  -.1473537   .1501476   -0.98    0.326    -.4417461    .1470388
    ------------------------------------------------------------------------------
    qrprocess is very similar to the official command qreg when a single quantile regression is estimated, but qrprocess offers additional algorithms that are faster when the number of observations is very large, and it provides standard errors that allow for clustering and stratification.
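    For readers new to the method: the tau-th quantile regression chooses the coefficients that minimize the sum of the check function rho_tau(u) = u(tau - 1{u < 0}) evaluated at the residuals, and tau = 0.5 gives the median regression above. The Python sketch below illustrates this objective in an intercept-only model, where the minimizer is a sample quantile, by brute-force search over toy data. It is not how qreg or qrprocess compute their estimates; they solve the problem with covariates using dedicated algorithms.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def intercept_only_qr(y, tau):
    """Minimize sum_i rho_tau(y_i - b) over b in an intercept-only model.

    The objective is convex and piecewise linear in b, so a minimizer can be
    found among the observed values; we simply search over them.
    """
    def objective(b):
        return sum(check_loss(yi - b, tau) for yi in y)

    # min() keeps the first minimizer it meets, i.e. the smallest one.
    return min(sorted(y), key=objective)


y = [3, 1, 4, 1, 5, 9, 2, 6]   # toy data, not from cps91
print(intercept_only_qr(y, 0.5))    # a sample median
print(intercept_only_qr(y, 0.25))   # a sample 0.25-quantile
```

    With an even number of observations the minimizer is not unique (here any value between 3 and 4 is a median), which is one reason quantile regression estimates can be non-unique in small samples.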

    The main advantages of qrprocess appear when many quantile regressions must be estimated to analyze the conditional distribution of the outcome. For instance, we may estimate 81 quantile regressions for the quantile indexes 0.1, 0.11, 0.12, ..., 0.9 with
    Code:
    qrprocess lwage c.age##c.age educ i.black i.hispanic, quantile(0.1(0.01)0.9) noprint
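    The numlist 0.1(0.01)0.9 runs from 0.1 to 0.9 in steps of 0.01, which gives (0.9 - 0.1)/0.01 + 1 = 81 quantile indexes. The Python sketch below mirrors that expansion for readers who want to reproduce the grid outside Stata; the function name is hypothetical, and it ignores the other numlist forms that Stata accepts.

```python
def expand_numlist(start, step, stop, ndigits=10):
    """Expand a Stata-style numlist start(step)stop into a list of numbers.

    The number of steps is computed once and rounded, and each value is
    built as start + k * step, so floating-point error does not accumulate
    and the endpoint is not lost.
    """
    n_steps = round((stop - start) / step)
    return [round(start + k * step, ndigits) for k in range(n_steps + 1)]


taus = expand_numlist(0.1, 0.01, 0.9)
print(len(taus), taus[:3], taus[-1])
print(len(expand_numlist(0.1, 0.1, 0.9)))   # grid 0.1(0.1)0.9
```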
    We have activated the option noprint because the table of coefficients would be huge. Instead, we can easily plot all the coefficients with the command
    Code:
    plotprocess
    and obtain

    [Figure (figure1.png): coefficient plots produced by plotprocess]


    Note that qrprocess is significantly faster than calling qreg 81 times. In addition, qrprocess also estimates the covariances between the coefficients estimated at different quantile indexes, which allows testing restrictions across quantile indexes.
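    Because these covariances are estimated, a cross-quantile restriction such as beta(0.25) = beta(0.75) for one regressor can be tested with the Wald statistic (b1 - b2)^2 / (V11 + V22 - 2*V12), compared with a chi-squared(1) distribution. The Python sketch below shows the arithmetic with hypothetical numbers; it does not show how to retrieve the estimates and covariances from the results stored by qrprocess.

```python
import math


def wald_equality(b1, b2, v11, v22, v12):
    """Wald test of H0: b1 == b2 for the coefficients of one regressor
    estimated at two quantile indexes.

    v11 and v22 are the two variances, v12 their covariance.
    Returns the chi-squared(1) statistic and its p-value.
    """
    diff = b1 - b2
    var_diff = v11 + v22 - 2.0 * v12
    stat = diff * diff / var_diff
    # With 1 degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical estimates at tau = 0.25 and tau = 0.75.
stat, p = wald_equality(0.12, 0.08, v11=1e-4, v22=1e-4, v12=5e-5)
print(f"Wald = {stat:.2f}, p-value = {p:.4f}")

# Ignoring the (positive) covariance would halve the statistic here.
stat_indep, p_indep = wald_equality(0.12, 0.08, v11=1e-4, v22=1e-4, v12=0.0)
print(f"Wald ignoring covariance = {stat_indep:.2f}, p-value = {p_indep:.4f}")
```

    The second call shows why the cross-quantile covariance matters: estimates at nearby quantile indexes are usually positively correlated, and treating them as independent overstates the variance of the difference.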

    If this algorithm is still too slow, qrprocess implements a new and even faster estimator, the one-step estimator. This estimator is not numerically identical to the traditional quantile regression estimator, but it is asymptotically equivalent to it. We can select this algorithm with the option method(onestep):
    Code:
    qrprocess lwage c.age##c.age educ i.black i.hispanic, quantile(0.1(0.01)0.9) noprint method(onestep)
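    The idea behind a one-step estimator can be illustrated in the simplest possible case, a model with only an intercept: starting from the quantile already estimated at a neighbouring index, take a single Newton-type step in which a kernel estimate of the density plays the role of the derivative. The Python sketch below is a simplified illustration under those assumptions (no covariates, Gaussian kernel, a bandwidth fixed by hand, toy data); it is not the implementation in qrprocess.

```python
import math


def empirical_cdf(y, q):
    """Share of observations at or below q."""
    return sum(1 for yi in y if yi <= q) / len(y)


def kernel_density(y, q, h):
    """Gaussian-kernel estimate of the density of y at q, bandwidth h."""
    const = 1.0 / (len(y) * h * math.sqrt(2.0 * math.pi))
    return const * sum(math.exp(-0.5 * ((yi - q) / h) ** 2) for yi in y)


def one_step_quantile(y, q_start, tau, h):
    """One Newton-type step from q_start (the quantile estimated at a
    neighbouring index) toward the tau-th quantile of y.

    The step solves the linearized first-order condition tau - F(q) = 0
    around q_start, with the estimated density as the derivative of F.
    """
    score = tau - empirical_cdf(y, q_start)   # mean of tau - 1{y_i <= q_start}
    return q_start + score / kernel_density(y, q_start, h)


y = [i / 1000 for i in range(1, 1001)]   # toy data: a fine grid on (0, 1]
q_050 = sorted(y)[499]                   # median, estimated exactly
q_060 = one_step_quantile(y, q_050, tau=0.60, h=0.05)
exact_060 = sorted(y)[599]               # exact sample 0.6-quantile
print(f"one-step: {q_060:.4f}  exact: {exact_060:.4f}")
```

    The one-step value is close to, but not exactly equal to, the exact sample quantile, which matches the statement above that the estimator is not numerically identical to the traditional one.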


    Many of the hypotheses of interest to researchers involve the whole quantile regression process, e.g. (1) Does a variable have any effect at all, i.e., is its coefficient 0 at all quantile indexes? (2) Does a variable have a positive effect over the whole distribution (stochastic dominance)? (3) Is the effect of a variable homogeneous (constant at all quantile indexes)?
    These are functional null hypotheses. A naive approach that estimates many quantile regressions and applies pointwise tests suffers from the multiple testing problem. qrprocess offers tests of functional hypotheses as well as uniform confidence bands that cover the whole function with a prespecified probability. The option functional must be activated, and only the bootstrap can be used for functional inference. Here we use the multiplier bootstrap, which is faster:
    Code:
    qrprocess lwage c.age##c.age i.black i.hispanic educ, quantile(0.1(0.01)0.9) functional vce(multiplier, reps(500))
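    A uniform band differs from the pointwise intervals only through its critical value: instead of 1.96, it uses the 95% quantile, across bootstrap draws, of the largest standardized deviation over all quantile indexes. The same critical value gives a test of the functional null hypothesis that a coefficient is zero at every index. The Python sketch below shows these computations; the coefficient function, standard errors, and draws are hypothetical stand-ins (independent normal noise, whereas actual bootstrap draws are correlated across indexes), and the helper names are not part of qrprocess.

```python
import math
import random


def sup_t_critical_value(estimates, std_errors, draws, level=0.95):
    """Empirical `level` quantile over draws of max_tau |b*(tau) - b(tau)| / s(tau)."""
    stats = sorted(
        max(abs(d - b) / s for d, b, s in zip(draw, estimates, std_errors))
        for draw in draws
    )
    k = math.ceil(level * len(stats)) - 1   # index of the empirical quantile
    return stats[k]


def uniform_band(estimates, std_errors, crit):
    """Band b(tau) +/- crit * s(tau), valid jointly over all quantile indexes."""
    lower = [b - crit * s for b, s in zip(estimates, std_errors)]
    upper = [b + crit * s for b, s in zip(estimates, std_errors)]
    return lower, upper


def reject_zero_everywhere(estimates, std_errors, crit):
    """Functional test of H0: the coefficient is 0 at every quantile index."""
    return max(abs(b) / s for b, s in zip(estimates, std_errors)) > crit


# Hypothetical coefficient function on 81 quantile indexes and stand-in draws.
random.seed(12345)
taus = [round(0.1 + 0.01 * k, 2) for k in range(81)]
estimates = [0.08 + 0.04 * t for t in taus]
std_errors = [0.005] * len(taus)
draws = [[b + random.gauss(0.0, s) for b, s in zip(estimates, std_errors)]
         for _ in range(500)]

crit = sup_t_critical_value(estimates, std_errors, draws)
lower, upper = uniform_band(estimates, std_errors, crit)
print(f"uniform critical value: {crit:.2f} (pointwise: 1.96)")
print("reject 'no effect at any quantile':",
      reject_zero_everywhere(estimates, std_errors, crit))
```

    The uniform critical value is larger than 1.96, so the band is wider than the pointwise intervals; this is the price paid for covering the whole function with the prespecified probability.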
    At the end of the output (omitted here), the p-values for many functional null hypotheses are provided. We can plot the coefficients, the pointwise confidence intervals, and the uniform bands with plotprocess. Without any argument, plotprocess shows all the coefficients. If we are especially interested in the effect of education, we can type
    Code:
    plotprocess educ, ytitle("QR coefficient") title("Years of education")
    and we obtain

    [Figure (figure2.png): plotprocess output for educ, titled "Years of education"]


    qrprocess and plotprocess offer many additional options, which are described in the help files. We have also written a paper that describes the algorithms, the inference procedures, and the code: "Quantile and distribution regression in Stata: algorithms, pointwise and functional inference". We are still working on it with the objective of submitting it to the Stata Journal. In another paper, "Fast algorithms for the quantile regression process", we propose the new algorithms that are implemented in the package.

    This code and these papers are the result of joint work by Victor Chernozhukov, Iván Fernández-Val and myself.

  • #2
    Hi,
    Instead of quantile regression, I am using distribution regression; the plots should appear for it too.
    My exact command is
    Code:
    drprocess log_income age marital low med high origin child if gen==0
    where age is the age, marital is a dummy for being married, low, med and high indicate the educational level of the individual, origin indicates whether the individual is from the Netherlands, and child is a dummy for having a child.
    Here, I am solely interested in the men of my sample.
    However, when running
    Code:
    plotprocess
    I end up with empty plots, see below.
    My data consist of almost 3000 observations, and running the distribution regression does not result in an error.
    Can you help me?
    [Screenshot (Screenshot 2021-11-03 at 13.41.15.png): the empty plots produced by plotprocess]



    • #3
      Dear Maud,

      I cannot replicate your issue on my computer because I don't have your dataset. When I try to run the codes in the help file
      Code:
      use http://www.stata.com/data/jwooldridge/eacsap/cps91
      drprocess lwage c.age##c.age i.black i.hispanic educ, functional method(logit, onestep) vce(multiplier, reps(500))
      plotprocess
      they work as expected. Do they run on your computer?

      To go further I would need to have a replicable example. Can you send me the data or an extract of the data such that I can replicate your issue? Or find a publicly available dataset for which the same issue appears?

      Best,
      Blaise



      • #4
        Dear Blaise,

        Here is an extract of my data.
        However, the example runs without problems. Also, when I run the exact code from your example with my data and variables, I get the desired plots.
        Could you explain what exactly the options specify?
        I am not sure why the plots are empty when I do not specify those options.

        Kind regards,
        Maud


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(log_income gender) byte age float(marital low med high origin child)
        7.313221 0 50 0 0 0 1 1 0
        7.600903 0 60 1 1 0 0 1 0
        6.993933 0 31 0 0 1 0 1 0
        7.313221 1 38 0 0 1 0 1 0
        7.408531 1 51 0 0 0 1 1 0
         8.34284 0 55 1 1 0 0 1 1
        7.901007 0 38 1 1 0 0 1 0
        7.783224 0 41 0 0 0 1 1 0
        7.740664 1 44 1 0 0 1 1 1
        7.549609 1 40 0 1 0 0 1 0
        8.411833 1 46 0 1 0 0 1 1
        7.718686 1 42 1 0 0 1 1 0
        7.718686 0 54 1 0 1 0 1 0
        7.130899 1 51 1 1 0 0 1 0
        7.600903 1 60 0 0 1 0 1 0
        8.070906 0 45 0 0 0 1 1 0
        7.824046 0 46 1 0 1 0 1 1
        7.058758 1 46 0 1 0 0 0 1
        7.438384 0 56 0 1 0 0 0 0
        7.696213 0 60 1 1 0 0 1 0
        7.495542 1 27 1 1 0 0 1 0
        8.268732 0 44 1 0 1 0 1 1
        6.745236 1 40 1 0 1 0 1 1
        8.006368 0 43 1 0 1 0 1 1
        7.244227 1 43 1 1 0 0 1 1
        8.032685 0 56 1 1 0 0 1 0
        7.299798 1 32 0 0 1 0 1 0
        7.617268 0 32 1 1 0 0 1 1
        8.022897 0 36 1 0 0 1 1 1
        7.972466 0 36 1 0 1 0 1 1
        7.495542 1 32 1 0 0 1 1 1
        8.101678 0 37 1 0 1 0 1 1
        7.824046 0 50 1 0 1 0 1 1
        7.740664 0 44 1 0 1 0 1 1
        7.783224 0 38 1 0 1 0 1 1
        7.649693 0 32 1 1 0 0 1 1
        6.813445 1 44 0 1 0 0 1 0
        7.003066 0 24 1 1 0 0 1 0
        7.776115 0 31 1 1 0 0 1 1
        7.600903 0 28 1 1 0 0 1 1
        7.600903 0 49 1 1 0 0 1 1
        7.244227 1 22 1 0 1 0 1 0
        7.882315 0 40 0 0 0 1 1 0
        8.006368 1 51 1 0 0 1 1 1
        8.070906 0 49 0 0 0 1 1 0
        7.575585 0 42 1 1 0 0 1 0
        7.244227 1 39 1 1 0 0 1 0
        7.600903 0 56 1 1 0 0 1 0
        6.866933 1 22 0 0 1 0 1 0
        7.740664 0 36 1 1 0 0 1 1
        end



        • #5
          Now I understand! By default, drprocess estimates a single distribution regression. But plotprocess is designed to plot the coefficients of a family of distribution regressions. It shows empty plots when a single one has been estimated.

          You should either (i) add the option functional, which sets the number of distribution regressions to 100, (ii) specify the number of regressions with the option ndreg, e.g. ndreg(200), or (iii) provide a list of thresholds with the option thresholds. Then it should work.

          Thus,
          Code:
          drprocess log_income age marital low med high origin child if gen==0, functional
          plotprocess
          should work.
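          Why the plots were empty can be seen from what distribution regression estimates: for each threshold t, a separate binary regression (e.g. a logit) of the indicator 1{y <= t} on the covariates, so each coefficient becomes a function of t, and plotprocess draws those functions. With one threshold there is a single point per coefficient. The Python sketch below makes this concrete in the simplest case, an intercept-only logit, whose maximum-likelihood intercept is log(p/(1 - p)) with p the share of observations at or below t. It uses the first ten log_income values from the extract in #4; the thresholds are an illustrative choice, not the ones drprocess selects.

```python
import math

# First ten log_income values from the -dataex- extract in #4.
log_income = [7.313221, 7.600903, 6.993933, 7.313221, 7.408531,
              8.34284, 7.901007, 7.783224, 7.740664, 7.549609]


def intercept_only_logit(y, threshold):
    """ML intercept of a logit for 1{y <= threshold} without covariates,
    which equals the log-odds of the share of observations <= threshold."""
    p = sum(1 for yi in y if yi <= threshold) / len(y)
    if p in (0.0, 1.0):
        raise ValueError("no variation in 1{y <= threshold}")
    return math.log(p / (1.0 - p))


def distribution_regression(y, thresholds):
    """One binary regression per threshold -> one coefficient per threshold."""
    return [(t, intercept_only_logit(y, t)) for t in thresholds]


# A single threshold gives a single coefficient: nothing to draw as a curve.
single = distribution_regression(log_income, [7.5])
print(single)

# Several thresholds (here: every distinct value except the largest, an
# illustrative choice) give a coefficient function of the threshold.
grid = sorted(set(log_income))[:-1]
curve = distribution_regression(log_income, grid)
for t, coef in curve:
    print(f"threshold={t:.3f}  intercept={coef:+.3f}")
```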



          • #6
            Thank you so much, you really helped me in doing my research!



            • #7
              Dear Blaise Melly,

              I hope you do not mind me using this forum to make a suggestion about qrprocess. One thing that I would find useful is a "from" option to provide initial values. This would allow users to benefit from the faster methods available for multiple quantiles even when they just want to estimate a single quantile, as long as they have starting values from a previous estimation. It could also allow skipping the first estimation when estimating additional quantiles. Do you think this is something that can easily be implemented?

              Best wishes and thanks,

              Joao
              PS: I sent you an email about this some time ago but I am assuming it went to your junk folder; if not, my apologies for insisting.



              • #8
                How can I run "test [q25]var = [q75]var" after qrprocess, i.e., test whether there is a significant difference between the coefficients at the 0.25 and 0.75 quantiles? This can be done after sqreg, but it does not seem to work after qrprocess.



                • #9
                  Hello Statalist community and Blaise Melly,

                  I am using qrprocess and plotprocess in Stata 18.0. I am able to run qrprocess and replicate your output, but I consistently get the same r(111) error, "variable __000001 not found" after running plotprocess. This occurs when I run the code on my dataset, as well as the sample code you shared above using the cps91 dataset:


                  Code:
                  net install qrprocess, replace from("https://raw.githubusercontent.com/bmelly/Stata/main/")
                  
                  use http://www.stata.com/data/jwooldridge/eacsap/cps91, clear
                  
                  qrprocess lwage c.age##c.age educ i.black i.hispanic
                  
                  plotprocess
                  This replicates your output above, but I get the error:
                  Code:
                  variable __000001 not found
                  r(111);
                  I tried the same code after using ssc install instead, which gives slightly different output:

                  Code:
                  ssc install qrprocess, replace
                  
                  use http://www.stata.com/data/jwooldridge/eacsap/cps91, clear
                  
                  qrprocess lwage c.age##c.age educ i.black i.hispanic
                  
                  plotprocess
                  This also replicates your output above, but I get a slightly different variable name with the same error:
                  Code:
                  variable __000002 not found
                  r(111);
                  The same thing happens when I run sample code in the help file, and when I run this (or drprocess) on my own data. I am consistently able to run qrprocess and drprocess, just not plotprocess. Thanks for any assistance.


                  Best,
                  Rachel



                  • #10
                    Dear Rachel,

                    Somewhat surprisingly, I do not get the same error message as you. In any case, plotprocess does not make sense after the estimation of a single quantile regression: the goal is to plot the coefficients as functions of the quantile index. Could you try the following code?

                    Code:
                    use http://www.stata.com/data/jwooldridge/eacsap/cps91
                    qrprocess lwage c.age##c.age educ i.black i.hispanic, quantiles(0.1(0.1)0.9)
                    plotprocess
                    This code estimates nine quantile regressions at the quantile indices 0.1, 0.2, 0.3, ..., 0.9. Then it is possible to plot the quantile regression coefficient functions.

                    Best regards,
                    Blaise



                    • #11
                      Blaise Melly and Rachel Gilbert, it appears that some piece of code in plotprocess expects variable abbreviation to be on and fails if variable abbreviation is set to off (see help set varabbrev):
                      Code:
                      . set varabbrev off
                      
                      . cap noi plotprocess
                      variable __000002 not found
                      
                      . set varabbrev on
                      
                      . cap noi plotprocess
                      
                      .
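                      What varabbrev changes can be modelled roughly as follows: with it on, Stata also accepts any unambiguous abbreviation of an existing variable name; with it off, only exact names are found. The extract in #4 shows a real case: the variable is called gender, but the commands in #2 and #5 use gen==0, which only works with abbreviation on. The Python sketch below mimics that lookup rule (ignoring wildcards and the ~ notation); which reference inside plotprocess depended on abbreviation is not identified in this thread.

```python
def resolve_varname(name, variables, varabbrev=True):
    """Roughly mimic how Stata looks up a variable name.

    An exact match always wins. With varabbrev on, a unique prefix match is
    also accepted; otherwise the lookup fails with a "not found" error.
    """
    if name in variables:
        return name
    if varabbrev:
        matches = [v for v in variables if v.startswith(name)]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise LookupError(f"{name} ambiguous abbreviation")
    raise LookupError(f"variable {name} not found")


# Variable names from the -dataex- extract in #4.
variables = ["log_income", "gender", "age", "marital", "low", "med",
             "high", "origin", "child"]
print(resolve_varname("gen", variables, varabbrev=True))
try:
    resolve_varname("gen", variables, varabbrev=False)
except LookupError as err:
    print(err)
```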



                      • #12
                        Thank you both for the quick replies. Blaise Melly, just adding the multiple quantile regressions did not resolve the issue, but Joerg Luedicke (StataCorp)'s suggestion to turn on varabbrev did. I am now able to run plotprocess without issue. Thank you very much!



                        • #13
                          Joerg Luedicke (StataCorp) Thanks! I never tested my command when variable abbreviation was set to off.

                          Rachel Gilbert, I have updated the ado file on GitHub. Now it should work with varabbrev turned on or off. But my remark above is still relevant: you will get empty figures if you run plotprocess after having estimated a single quantile regression.

                          Code:
                          net install qrprocess, replace from("https://raw.githubusercontent.com/bmelly/Stata/main/")
                          use http://www.stata.com/data/jwooldridge/eacsap/cps91, clear
                          qrprocess lwage c.age##c.age educ i.black i.hispanic, quantiles(0.1(0.1)0.9)
                          plotprocess
