  • Panel data: reports of repeated time values

    I tried to declare my dataset as panel data as follows, in order to apply a synthetic control method:
    ```
    tsset numeric_city_trans month
    ```
    But I keep receiving the error "repeated time values within panel." I am not sure how to address this because my dataset is composed of real estate transactions per month at the neighborhood level, so it is normal for some neighborhoods to have more than one observation (i.e., more than one real estate transaction) in a given month.

    One solution that I read about is to keep one observation/row for a given neighborhood in each month, but I don't think it makes sense in my case to keep just one observation and drop all the others in a given month and neighborhood.
    An example is neighborhood number "7558", which has several transactions per month.

    Is there a way to declare my data as a panel without dropping observations?
    ```
    dataex neighborhood_numeric month numeric_city_trans medpricesqm

    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long neighborhood_numeric float month long numeric_city_trans double medpricesqm
    14457 51 1 787.4016
    3435 51 1 5752.2124
    11262 51 1 1666.6667
    7552 51 1 773.3733
    7558 51 1 406.66665
    7558 51 1 553.1915
    11061 51 1 753.7248
    4540 51 1 514.9331
    2800 51 1 448
    2956 52 1 891.2656
    7558 52 1 807.4534
    6670 52 1 174.9915
    14457 52 1 524.17
    7558 52 1 533.3333
    7557 52 1 555.2471
    11252 52 1 894.8546
    end
    ```
    Last edited by Paolo Maldini; 19 Feb 2022, 06:45.

  • #2
    Can you use a more granular identifier, such as address, for the panel? Alternatively, you could average the price variable by month and neighborhood and use that as one observation; that way you aren't simply picking a random transaction to represent the whole neighborhood for a given month.
    Last edited by Nate Tillern; 19 Feb 2022, 07:53. Reason: Typo

    • #3
      Originally posted by Nate Tillern:
      Can you use a more granular identifier, such as address, for the panel? Alternatively, you could average the price variable by month and neighborhood and use that as one observation; that way you aren't simply picking a random transaction to represent the whole neighborhood for a given month.

      Fair, that solution makes sense. How can I generate one observation with the median or average price per month at the neighborhood level?

      • #4
        The easiest way is to combine by (or bysort) with egen: specify the month and neighborhood grouping with by, then use egen to create a variable equal to the average or median price.

        • #5
          I forgot you had already provided the variable names. Sample code could be something like:

          Code:
          bysort neighborhood_numeric month: egen avgprice = mean(medpricesqm)

          • #6
            Originally posted by Nate Tillern:
            I forgot you had already provided the variable names. Sample code could be something like:

            Code:
            bysort neighborhood_numeric month: egen avgprice = mean(medpricesqm)
            Thanks. I ran the following but am still receiving the same error message:

            bysort neighborhood_numeric month: egen avgprice = mean(medpricesqm)

            I then ran tsset neighborhood_numeric month, but I again get the message "repeated time values within panel".
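
            The likely reason the error persists is that egen only adds a new variable; it does not reduce the number of rows, so each neighborhood-month pair still appears as many times as before. The following is a minimal sketch of that behavior and of one way to get down to a single row per pair. The price values are made-up toy numbers, not the posted data, and keeping only the identifier and the average is an assumption about what the analysis needs.

            ```stata
            * Toy data (hypothetical values): neighborhood 7558 has two rows in month 51
            clear
            input long neighborhood_numeric float month double medpricesqm
            7558 51 400
            7558 51 600
            7558 52 800
            2800 51 448
            end

            * egen attaches the group mean to every row; the row count is unchanged
            bysort neighborhood_numeric month: egen avgprice = mean(medpricesqm)
            count

            * Report how many neighborhood-month pairs still occur more than once
            duplicates report neighborhood_numeric month

            * avgprice is constant within each pair, so after dropping the raw price
            * the repeated rows are exact duplicates and can be removed safely
            keep neighborhood_numeric month avgprice
            duplicates drop

            * Confirm the pair now uniquely identifies rows, then declare the panel
            isid neighborhood_numeric month
            tsset neighborhood_numeric month
            ```

            Unlike keeping an arbitrary row with _n==1, this keeps the group average, so the surviving row does not depend on sort order.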

            • #7
              I ran the following and did not receive an error message, but I believe Stata is keeping only the first observation per month and neighborhood, which is somewhat arbitrary:

              ```
              bysort neighborhood_numeric month: keep if _n==1

              tsset neighborhood_numeric month
              ```

              • #8
                If you are not using time-series operators such as leads and lags, xtset the data using the panel variable only and proceed as usual:

                Code:
                xtset neighborhood_numeric

                If you are using time-series operators, you need to decide how to aggregate the transactions. You can, e.g., take the sum or the average of transactions for a particular month using collapse. Assuming the former:

                Code:
                collapse (sum) transactions, by(neighborhood_numeric month)
                xtset neighborhood_numeric month
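
                The variable name transactions above does not appear in the posted dataex example, so it is presumably a placeholder. As a hedged sketch adapted to the posted variable names, the collapse below produces one row per neighborhood and month with the mean price, the median price, and the number of transactions. The output names avgprice, medprice, and ntrans are assumptions introduced for illustration, and the data are made-up toy values.

                ```stata
                * Toy data (hypothetical values): neighborhood 7558 has two rows in month 51
                clear
                input long neighborhood_numeric float month double medpricesqm
                7558 51 400
                7558 51 600
                7558 52 800
                2800 51 448
                end

                * One row per neighborhood-month: mean price, median price, and row count
                collapse (mean) avgprice = medpricesqm (median) medprice = medpricesqm (count) ntrans = medpricesqm, by(neighborhood_numeric month)

                * With one row per pair, both the panel and time variables can be declared
                xtset neighborhood_numeric month
                list, sepby(neighborhood_numeric)
                ```

                Keeping the count alongside the price summary records how many transactions each monthly value is based on, which the raw average alone would hide.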

                • #9
                  Originally posted by Andrew Musau:
                  If you are not using time-series operators such as leads and lags, xtset the data using the panel variable only and proceed as usual:

                  Code:
                  xtset neighborhood_numeric

                  If you are using time-series operators, you need to decide how to aggregate the transactions. You can, e.g., take the sum or the average of transactions for a particular month using collapse. Assuming the former:

                  Code:
                  collapse (sum) transactions, by(neighborhood_numeric month)
                  xtset neighborhood_numeric month
                  This is exactly what I needed, thanks!
