
  • #31
    Originally posted by William Lisowski
    I see the problem.

    First, let me quote a piece from help reshape


    So for your reshape wide, you definitely want a list of variable names like the following:
    Code:
    reshape wide ils* ics* ... hear*, i(id) j(wave)
    But that leads to another problem. When ils1 is reshaped wide, the two new variables will be ils11 and ils12 - ils1 with the value of wave appended. But there already is a variable named ils11 in the dataset, and Stata does not realize at this point that it will become ils111 and ils112 after the reshape. So we need to rename your variables slightly to avoid this problem - and as a side effect, their names will be a little more readable after the reshape. Here's a sample; the technique is described in the output of help rename group.
    Code:
    . list
    
      +--------------------------+
      | id   wave   ils1   ils11 |
      |--------------------------|
   1. |  1      1    101    1101 |
   2. |  1      2    102    1102 |
      +--------------------------+
    
    . reshape wide ils*, i(id) j(wave)
    (note: j = 1 2)
    variable ils11 already defined
    r(110);
    
    . rename (ils*) (=_)
    
    . list
    
      +----------------------------+
      | id   wave   ils1_   ils11_ |
      |----------------------------|
   1. |  1      1     101     1101 |
   2. |  1      2     102     1102 |
      +----------------------------+
    
    . reshape wide ils*, i(id) j(wave)
    (note: j = 1 2)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                        2   ->   1
    Number of variables                   4   ->   5
    j variable (2 values)              wave   ->   (dropped)
    xij variables:
                                      ils1_   ->   ils1_1 ils1_2
                                     ils11_   ->   ils11_1 ils11_2
    -----------------------------------------------------------------------------
    
    . list
    
      +------------------------------------------+
      | id   ils1_1   ils11_1   ils1_2   ils11_2 |
      |------------------------------------------|
   1. |  1      101      1101      102      1102 |
      +------------------------------------------+
    
    .
    OK, having gone through all that, let me offer up some more advice. I don't know what you want to accomplish in your analysis or how you plan to do so, but still, the experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. So if you discover that it seems inconvenient to accomplish what you need with the wide layout - if you find yourself wanting to loop over the wave1/wave2 pairs of variables - consider going back to the data as you now have it and starting from there.
    Hi, I want to reshape my data to long and am facing similar problems. I have 478 firms and the data run from 2006 to 2018.
    I used the command:

    Code:
    reshape long Nlow_issue_1* Nlow_issue_2 Nlow_issue_3 Nlow_issue_4 Nlow_issue_5 Nmedium_issue_1 Nmedium_issue_2 Nmedium_issue_3 Nmedium_issue_4 Nmedium_issue_5 Nhigh_issue_1 Nhigh_issue_2 Nhigh_issue_3 Nhigh_issue_4 Nhigh_issue_5 Nlow_reach_issue_1 Nlow_reach_issue_2 Nlow_reach_issue_3 Nlow_reach_issue_4 Nlow_reach_issue_5 Nmedium_reach_issue_1 Nmedium_reach_issue_2 Nmedium_reach_issue_3 Nmedium_reach_issue_4 Nmedium_reach_issue_5 Nhigh_reach_issue_1 Nhigh_reach_issue_2 Nhigh_reach_issue_3 Nhigh_reach_issue_4 Nhigh_reach_issue_5 articles1 articles2 diff name Tickers LTD ROA TA ROCE TobQ MCap ROE , i(name) j(year)
    However, Stata says: variable name already defined

    Thereafter, I used the rename command (as discussed here, though I'm unsure, since that discussion is for long-to-wide conversion):

    Code:
    rename (Nlow_issue_1 Nlow_issue_2 Nlow_issue_3 Nlow_issue_4 Nlow_issue_5 Nmedium_issue_1 Nmedium_issue_2 Nmedium_issue_3 Nmedium_issue_4 Nmedium_issue_5 Nhigh_issue_1 Nhigh_issue_2 Nhigh_issue_3 Nhigh_issue_4 Nhigh_issue_5 Nlow_reach_issue_1 Nlow_reach_issue_2 Nlow_reach_issue_3 Nlow_reach_issue_4 Nlow_reach_issue_5 Nmedium_reach_issue_1 Nmedium_reach_issue_2 Nmedium_reach_issue_3 Nmedium_reach_issue_4 Nmedium_reach_issue_5 Nhigh_reach_issue_1 Nhigh_reach_issue_2 Nhigh_reach_issue_3 Nhigh_reach_issue_4 Nhigh_reach_issue_5 articles1 articles2 diff name Tickers LTD ROA TA ROCE TobQ MCap ROE) (=_)
    Stata says: Nlow_issue_1 ambiguous abbreviation

    Please help.



    • #32
      Someone please help. I'm trying to create log variables, but in order to do that I have to get rid of the negatives first, which means I have to create another variable. Just like the other posts here, I get error code r(110).

      Here is the code:

      Code:
      gen Liquidassets=Liquidassets+88
      Liquidassets already defined
      r(110);

      Any help would be extremely appreciated.



      • #33
        Syntactically you'll want

        Code:
        replace Liquidassets = Liquidassets + 88
        or

        Code:
        gen Liquidassets_new = Liquidassets + 88
        I don't know the context here, but creating log variables by adding a constant is generally bad practice. There are workarounds, however. Try searching Statalist for some alternatives. Here's a start:

        https://www.statalist.org/forums/for...egative-values
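
        As one concrete illustration of the kind of alternative discussed in such threads, the inverse hyperbolic sine is defined for zero and negative values and behaves like a logarithm for large positive values. This is only a sketch of the idea, with made-up names for the new variables, not a recommendation for this particular analysis:

        Code:
        * inverse hyperbolic sine: defined for negative, zero, and positive values,
        * and close to ln(2*x) for large positive x
        gen ihs_liquid = asinh(Liquidassets)
        * the same transform written out explicitly
        gen ihs_liquid2 = ln(Liquidassets + sqrt(Liquidassets^2 + 1))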



        • #34
          Thank you for the quick reply. The context is that this variable contains negative ratio values, and in order to avoid Stata dropping data I have to make the data all positive. That is what I'm attempting to do. I saw in a YouTube video that it was supposed to be done by adding a constant. I hope this gives a clearer picture.
          Last edited by Fin Lane; 09 Oct 2020, 10:32.



          • #35
            In Stata "drop" has a very specific meaning, given the drop command.

            Stata won't drop (meaning delete) observations just because you try to take logarithms of zero or negative values. What I guess you mean is a two-step:

            1. Stata creates a missing value in any observation with such values if asked to calculate the logarithm.

            2. Those observations will be ignored in any subsequent model-fitting command that uses the variable (see the sketch below).
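
            A minimal sketch of that two-step, with made-up variable names y and x:

            Code:
            * step 1: ln() of a zero or negative value evaluates to missing,
            * and Stata reports how many missing values were generated
            gen lx = ln(x)
            * step 2: estimation commands simply omit observations with missing lx
            regress y lx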

            A good answer to this question will depend on what you are trying to do. If the variable concerned is a response, it's still possible that a generalized linear model with log link will work fine, because the implication there is just that the mean is positive given predictors, not that all values are positive. If the variable concerned is a predictor, I would still be hesitant beyond measure about log(x + constant) as a fix: it is utterly arbitrary and may even generate outliers or skewness worse than in the original.
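
            For the response case, a minimal sketch of the log-link idea, again with made-up variable names:

            Code:
            * models ln of the conditional mean of y as linear in the predictors,
            * so individual values of y may be zero or negative
            glm y x1 x2, family(gaussian) link(log)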

            You should never overwrite original data anyway. So Justin Blasongame's second code example is better.

            I would be interested to know if you can find a recommendation of this procedure in a good text or paper -- and not a video of unknown provenance.



            • #36
              Dear Mr Cox,
              Thank you very much for your quick reply.
              Indeed, Stata creates missing values.
              The topic of my thesis is how firm characteristics affect a firm's cash holdings.
              In order to find that out, one of the tests I'm going to run is a fixed effects and random effects model.
              However, due to the skewness of the standard errors, the literature advised me to transform some of the variables into log variables.
              The independent variable Liquid assets is one of the variables I would like to transform.
              I hope I have informed you enough; if anything is unclear, please let me know.

              Nick Cox



              • #37
                Standard errors are just what they are; they don't have skewness, so I don't follow your meaning there. I guess you're referring to skewness of variables.

                But still wondering: Why do you want to transform liquid assets? Because there is nonlinearity in the relationship? Predictors don't have to be normally or even symmetrically distributed.

                Here is a demonstration of my point that this transformation isn't guaranteed to help even with overall skewness.

                I made the numbers up to point up the problem, but nothing stops the problem being worse either. You need to check.


                Code:
                * made-up data: one large negative value plus 99 lognormal positives
                clear
                set obs 100
                set seed 2803
                gen x = cond(_n == 1, -99, exp(rnormal(4, 1)))
                * compare normal quantile plots of x as is and of ln(x + 100)
                transplot qnorm x, trans(@  ln(@+100))
                [Attached figure: logxplusconstant.png - normal quantile plots of x and ln(x + 100)]

                The general point is this: with logarithms the upper part of the range is squeezed relatively, usually what you want. But the converse can bite: the lower part of the range is stretched relatively, and the way that happens may be malign.
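
                A quick arithmetic check of that point, using the constant 100 from the code above:

                Code:
                * at the bottom of the range, -99 and 0 (99 apart) end up about 4.6 apart on the log scale
                display ln(-99 + 100) "   " ln(0 + 100)
                * near the top, 1000 and 2000 (1000 apart) end up only about 0.65 apart
                display ln(1000 + 100) "   " ln(2000 + 100)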

                transplot is from SSC. See also https://www.statalist.org/forums/for...dable-from-ssc
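
                If transplot is not already installed, it can be obtained in the usual way:

                Code:
                ssc install transplot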



                • #38
                  Dear Mr Cox,
                  Apologies for the lack of clarity, and thank you for your help and time; I have been working for 8 hours straight on this.
                  Yes, I meant the skewness of the variables, and I use the rule of thumb that the skewness should not be more than twice its standard error.
                  The rough reason for transforming the variables is that most of the variables are not normally distributed, and I suffer from heteroskedasticity as well.
                  As a result, most of the independent variables do not show a significant relationship with the dependent variable, even though all the theses in the repository and the general literature show that there should be one.
                  The data has been correctly processed and the variables are properly defined.
                  A fellow student of mine recommended transforming them into log variables, which indeed improved the results.
                  However, the variable liquid assets cannot be transformed because of its negative values.
                  So now I'm at a loss.

                  Nick Cox
                  Last edited by Fin Lane; 09 Oct 2020, 12:51.



                  • #39
                    This isn't easy to follow. Sorry, but I don't want to repeat earlier advice and I don't think I can add much that would be helpful. Attaining approximately normal distributions is about the weakest possible reason for a transformation when the context is some kind of regression. For example, using dummy or indicator variables would be out of court if normality were a strong goal.

                    See also the thread cited by Justin Blasongame in #33.

