  • Built-in inverse within regress?

    I am working with a large dataset (120 GB), and I would like to avoid generating new variables unless I need to, so that the dataset does not get even larger.

    Assume y is a continuous dependent variable, x and w are continuous independent variables, and z is categorical.
    I know that I can save some space by using "#" and "##" in a regression rather than "gen x2 = x^2".
    The following code:
    Code:
    gen x2 = x^2
    regress y x x2 w i.z
    will produce the same results as:
    Code:
    regress y c.x##c.x w i.z
    But the second version avoids generating a new "x2" variable and thereby saves space.
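    A quick way to confirm the equivalence is to compare the squared-term coefficients from the two specifications. This sketch uses Stata's bundled auto data as an assumed stand-in for the real dataset (price plays the role of y, mpg of x, weight of w, and rep78 of z); the scalar names are hypothetical.

```stata
* Stand-in data: price ~ y, mpg ~ x, weight ~ w, rep78 ~ z (illustrative mapping)
sysuse auto, clear

* Version 1: explicitly generated squared term (adds a stored variable)
generate mpg2 = mpg^2
regress price mpg mpg2 weight i.rep78
scalar b_sq_gen = _b[mpg2]

* Version 2: factor-variable notation (no new variable is stored)
regress price c.mpg##c.mpg weight i.rep78
scalar b_sq_fv = _b[c.mpg#c.mpg]

* Relative difference between the two squared-term coefficients
display reldif(b_sq_gen, b_sq_fv)
```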

    Is there a similar built in operator for generating the reciprocal of a variable?
    Let's say I want to regress a version of y that is scaled by the reciprocal of w.
    Is there a way to produce the same results as the following regression without generating a new variable?
    Code:
    generate y_adj = y/w
    regress y_adj c.x##c.x i.z
    I am hoping there is something similar to the following:
    Code:
    regress y?w  c.x##c.x i.z
    I have tried "reg c.y#((c.w)^(-1))" and "reg y/w" without success; they fail with "Invalid Syntax" and "/ not allowed in a bound varlist", respectively.
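    Absent a built-in operator, one possible workaround, assuming the main concern is the size of the saved dataset, is to create the ratio as a float, fit the model, and drop the variable right away. This does not avoid the temporary memory cost while the regression runs (a float takes 4 bytes per observation), but it keeps the extra variable out of the saved file. The sketch again uses the auto data as an assumed stand-in (price/weight for y/w, mpg for x, rep78 for z); the name y_adj is hypothetical.

```stata
* Stand-in data: price/weight plays the role of y/w, mpg of x, rep78 of z
sysuse auto, clear

* Create the scaled outcome as a float (4 bytes per observation),
* fit the model, then drop the variable so it never reaches the saved file
generate float y_adj = price/weight
regress y_adj c.mpg##c.mpg i.rep78
drop y_adj
```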

  • #2
    Welcome to Statalist.

    The syntax you describe is in Stata called "Factor Variable Notation" and is described fully in the output of the command help factor variables. Unfortunately, no such capability exists.

    The point of factor variables is not so much saving space as allowing Stata to recognize variables that are related to each other. In your example, Stata would not recognize that x2 is a function of x, and thus that changing the value of x implies that the value of x2 should change accordingly; it would treat as independent two variables that are not. Factor variable notation lets Stata understand how polynomial and interaction terms are constructed, and how a categorical variable can be treated as a collection of indicator variables without creating the actual indicator variables.
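    To illustrate the practical consequence, the sketch below (using the auto data as an assumed stand-in) asks margins for the marginal effect of mpg after each specification. With the manually generated mpg2, margins treats mpg2 as a separate regressor and reports only the coefficient on mpg; with factor-variable notation it also accounts for the squared term.

```stata
sysuse auto, clear

* Manually generated square: Stata sees mpg and mpg2 as unrelated regressors
generate mpg2 = mpg^2
regress price mpg mpg2 weight i.rep78
margins, dydx(mpg)    // equals _b[mpg]; the dependence of mpg2 on mpg is ignored

* Factor-variable notation: Stata knows the second term is mpg*mpg
regress price c.mpg##c.mpg weight i.rep78
margins, dydx(mpg)    // averages _b[mpg] + 2*_b[c.mpg#c.mpg]*mpg over the estimation sample
```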



    • #3
      Since Thomas is new, let me add a couple of notes to William's comment.
      One of the big benefits of factor variable notation is that it lets the margins command handle interactions neatly after estimation.
      Thomas might also note that the Stata command compress may save him a lot of space. Stata is faster when it can hold all the data in memory, so he might also consider whether he can work with only part of the data at a time.
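      A minimal sketch of both suggestions, with the auto data saved to a temporary file standing in for the large dataset (a hypothetical substitute for the real 120 GB file): use loads only the listed variables and the observations satisfying the if condition, and compress then demotes each variable to the smallest storage type that holds its values without loss.

```stata
* Stand-in for the large file: save the auto data to a temporary file
sysuse auto, clear
tempfile full
save `full'

* Load only the needed variables, and only observations with nonmissing rep78
use price mpg weight rep78 if !missing(rep78) using `full', clear

* Store each variable in the smallest type that holds its values losslessly
compress
```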
