  • Do you need to use the same number of data observations for different types of regression?

    Hello,

    I want to run a simple linear regression (one independent variable) and a multiple regression on a sample of 30,000 observations.
    Let's say that I can use the full sample for the simple regression, but before running the multiple regression I want to keep only the non-negative values of the additional independent variables (using 'keep if var1>=0 & var2>=0', etc.). This reduces the sample to 16,000 observations. Can I still discuss both regressions in an unbiased way, or do I also need to use the same smaller sample of 16,000 obs for the simple regression?

    P.S. I have a good reason for removing the negative values: they are irrelevant to my research.

    Thanks!

  • #2
    Hi Mat
    So, as you say, there is no harm in restricting your sample based on truly independent variables. In fact, if the model is correctly specified, such a restriction should have no impact on the coefficients of the model. The problem is that models are often not well specified, so restricting your sample may produce very different results.
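
    As a rough illustration of that point, here is a minimal simulated sketch. Everything in it is an assumption made for the example (the variable names x1, x2, y_ok, y_bad, the seed, and the data-generating equations); none of it comes from the housing data in this thread. Case A is a correctly specified simple model, where dropping observations based on an unrelated variable should leave the slope essentially unchanged. Case B omits an interaction, so the same restriction moves the slope.

    ```stata
    * Simulated data only; none of this comes from the thread's dataset
    clear
    set seed 12345
    set obs 30000
    generate x1 = rnormal()
    generate x2 = rnormal()            // drawn independently of x1

    * Case A: the simple model y = a + b*x1 + e is correctly specified
    generate y_ok = 1 + 2*x1 + rnormal()
    regress y_ok x1                    // full sample
    regress y_ok x1 if x2 >= 0         // restricted sample: slope stays near 2

    * Case B: the true slope on x1 depends on x2, but the model omits that
    generate y_bad = 1 + 2*x1 + x1*x2 + rnormal()
    regress y_bad x1                   // full sample: slope near 2 on average
    regress y_bad x1 if x2 >= 0        // restricted sample: slope shifts upward
    ```

    In Case B the restricted sample only contains observations where x2 is non-negative, so the average effect of x1 in that subsample is larger than in the full sample, even though the restriction was made on a variable that is not in the regression.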
    My personal suggestion is to report both the full-sample and restricted-sample estimates of your first model, so you can better discuss whether any difference from the alternative models is due to the different samples or to something else.
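
    One possible way to set up that comparison is sketched below. The choice of regressors (rooms for the simple model, lot and unitsf as the additional variables) and the names insample, full_simple, restr_simple and restr_multi are assumptions for illustration, not the poster's actual model. The twelve rows are copied from sample 1 later in this thread only so the code runs on its own; with so few observations the estimates themselves mean nothing. Flagging the restricted sample with an indicator, rather than using -keep-, leaves the full sample available for the first regression.

    ```stata
    * Illustrative subset of sample 1 (four of the posted variables)
    clear
    input long lprice long lot int unitsf byte rooms
     675000 4000 3000 8
     655000   -6 2600 8
    1444770 1500 2000 5
     718000 8000 2800 6
     504000 5000 2400 8
     450000   -6 1350 5
     360000   -6 1000 4
     325000   -6 1300 4
     720000 8000 2500 7
     299000   -6 1590 3
     375000   -6 1100 4
     847000 7000 5700 8
    end

    * Flag the restricted sample instead of dropping rows.
    * Stata treats missing as larger than any number, so exclude it explicitly.
    generate byte insample = lot >= 0 & unitsf >= 0 & !missing(lot, unitsf)

    * Simple regression, full sample
    regress lprice rooms
    estimates store full_simple

    * Same simple regression, restricted sample
    regress lprice rooms if insample
    estimates store restr_simple

    * Multiple regression, restricted sample
    regress lprice rooms lot unitsf if insample
    estimates store restr_multi

    * Side-by-side coefficients, standard errors and sample sizes
    estimates table full_simple restr_simple restr_multi, b se stats(N)
    ```

    Comparing the first two columns shows how much the simple model changes from the sample restriction alone; comparing the last two shows what changes when the extra regressors are added on a fixed sample.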
    Best
    Fernando


    • #3
      FernandoRios Thanks, I will try to do what you suggested.

      For your information, below are some of the data before (1) and after (2) cleaning.
      The two samples give different results when I graph them (I had to construct the price index from a regression)...

      sample 1:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long lprice byte howh float yearquarter long lot int unitsf byte(floors rooms bedrms baths garage porch poolacc fplwk airsys cellar)
       675000 10 206   4000  3000  2  8 4 2 1 1 1 1 1  4
       655000  8 196     -6  2600  2  8 4 2 1 1 1 2 1  4
      1444770  8 203   1500  2000  2  5 3 2 1 1 1 1 2  4
       718000  9 199   8000  2800  1  6 3 2 1 1 1 1 1  4
       504000  8 202   5000  2400  2  8 4 2 1 1 1 1 1  4
       450000 10 197     -6  1350  2  5 2 2 1 1 1 2 1  4
       360000  9 200     -6  1000  1  4 2 2 1 1 1 2 1  4
       325000  9 204     -6  1300  2  4 2 2 1 1 1 1 1  4
       720000  9 199   8000  2500  2  7 4 2 1 1 1 1 1  4
       299000 10 200     -6  1590  3  3 1 1 1 1 1 2 1  4
       375000  9 198     -6  1100  2  4 2 2 1 1 1 2 1  4
       847000 10 205   7000  5700  2  8 5 4 1 1 1 1 1  4
      1444770  8 200   1500  2700  3  9 4 3 1 1 1 1 1  4
       610000 10 206   5500  2600  2  7 4 2 1 2 1 1 1  4
       472000  9 199     -6  2460  3  8 3 2 1 1 1 1 1  5
      1444770 10 196   6800  3550  2  8 4 3 1 2 1 1 1  4
       600000 10 197     -6  1700  2  5 3 2 1 2 1 1 1  4
      1444770 10 198   6000    -7  2  6 4 3 1 1 1 1 1  4
       405000  9 198   4000  1400  2  4 2 2 1 1 1 2 1  4
       420000 10 199     -6  1400  2  4 2 2 1 1 1 2 1 -6
      end
      format %tq yearquarter

      sample 2:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long lprice byte howh float yearquarter long lot int unitsf byte(floors rooms bedrms baths garage porch poolacc fplwk airsys cellar)
      1444770 10 200 174113  2800 2  9 4 2 1 1 1 1 1 2
       130000 10 199  66000  1600 1  6 3 2 1 1 1 1 1 3
        60000 10 206  60720  1750 3  7 5 2 2 1 1 1 1 2
       149000  8 199  33000  3200 3  9 5 3 1 1 1 1 1 2
      1444770  8 201  11000  2700 3  6 3 2 1 1 1 1 1 2
       183000  9 206  44000  3000 2  8 5 3 1 1 1 1 1 2
        96000  8 205   2500  3500 2 10 6 4 1 1 1 1 1 2
        93700  9 202  16720  1684 3  7 3 3 1 1 1 1 1 2
       158000  9 198  11000  2800 2  7 4 3 1 1 1 1 1 1
      1444770  7 199  52800 16286 3 14 5 6 1 1 1 1 1 1
       439000  9 196   6160  2400 3  9 4 3 2 1 1 1 1 2
       292000  8 199  33000  1500 1  7 3 1 1 1 1 2 1 3
        70100 10 198   1000  1600 1  5 3 1 2 1 1 2 1 3
       118000 10 205  44000  1300 1  7 4 2 1 1 1 2 1 3
            5  6 201   5500   990 1  5 2 1 1 2 1 2 1 3
       180000  8 201  52800  1500 3  7 4 2 1 1 1 1 1 1
        89900  8 201  22000  1400 3  7 4 2 2 1 1 2 1 2
       113000 10 202  22000  2500 2  9 3 2 1 1 1 1 1 2
        55000  7 206  11000  1800 2  5 3 2 1 1 1 1 1 2
       262500  8 202  44000  1800 1  7 3 2 1 1 1 1 1 2
      end
      format %tq yearquarter
