  • No observations for different size groups

    Hi all,

    I have a problem with the following Stata code:

    ********************************************************************************
    * TABLE 3
    ********************************************************************************

    * 0 General settings

    set more off
    clear all

    cd "XXX" // path to the working directory

    * 1 Load and prepare the data

    use "combined.dta", clear

    ********************************************************************************
    * 2 Variable construction
    ********************************************************************************

    gen prccd_mil = prccd / 1000000 // linear rescaling of the prccd variable

    gen debt_maturity_1 = (dltt - dd1) / (dltt + dlc) // share of debt maturing in more than 1 year
    gen size = csho * prccd
    gen market_to_book = (at + csho * prccd - ceq) / at
    gen leverage = (dltt + dlc) / at
    gen rd = xrd / at
    gen capex = capx / at
    gen ppe = ppent / at
    gen roa = ebitda / at
    gen cash = che / at
    gen ipo_year = year(ipodate)
    gen age = fyear - ipo_year
    gen taxes = txt / pi


    ********************************************************************************
    * 3. Grouping the data
    ********************************************************************************

    * 20th and 50th percentiles of size, by year
    egen size_20th = pctile(size), p(20) by(year)
    egen size_50th = pctile(size), p(50) by(year)

    * Group firms by size
    * (Stata treats missing values as larger than any number, so exclude them from "large")
    gen size_group = ""
    replace size_group = "small" if size <= size_20th
    replace size_group = "medium" if size > size_20th & size <= size_50th
    replace size_group = "large" if size > size_50th & !missing(size)


    * Grouping for the other variables
    foreach var in market_to_book leverage ppe roa cash age {

        * Median (50th percentile) of the variable, by year
        egen `var'_median = pctile(`var'), p(50) by(year)

        * Grouping relative to the median (missing values stay ungrouped)
        gen `var'_group = ""
        replace `var'_group = "low" if `var' < `var'_median
        replace `var'_group = "high" if `var' >= `var'_median & !missing(`var')

    }

    * 75th percentile of R&D, by year
    egen rd_75th = pctile(rd), p(75) by(year)

    * Grouping relative to the 75th percentile (missing values stay ungrouped)
    gen rd_group = ""
    replace rd_group = "low" if rd < rd_75th
    replace rd_group = "high" if rd >= rd_75th & !missing(rd)

    ********************************************************************************
    * 4. Construction of the time periods
    ********************************************************************************

    * Start year of the 4-year period containing each observation (1990, 1994, and so on)
    gen year_group = floor((year - 1990) / 4) * 4 + 1990
    label var year_group "4-Year Period Start"

    ********************************************************************************
    * 5. Computing and displaying the results
    ********************************************************************************

    ********************************************************************************
    * Convert size_group to numeric values
    ********************************************************************************

    gen size_group_num = .
    replace size_group_num = 1 if size_group == "small"
    replace size_group_num = 2 if size_group == "medium"
    replace size_group_num = 3 if size_group == "large"

    label define size_groups 1 "small" 2 "medium" 3 "large"
    label values size_group_num size_groups

    ********************************************************************************
    * Loop over 4-year periods, processing each size category separately
    ********************************************************************************

    local periods_start 1990
    local periods_end 2024

    * Store the results
    matrix results = J(3, 2, .) // 3 groups x 2 values (trend x100 and p-value)

    forvalues start = `periods_start'(4)`periods_end' {
        local end = `start' + 3

        * Restrict the data to the current period
        preserve
        keep if year >= `start' & year <= `end'

        di "Period: `start' - `end'"

        * Process each size category
        forvalues size_group = 1/3 {
            di " Size Group: " `size_group'

            * Restrict the data to the current group
            keep if size_group_num == `size_group'

            if _N == 0 {
                di " No data for Size Group " `size_group'
                continue
            }

            * Compute the median
            quietly summarize debt_maturity_1, detail
            local median_dm = r(p50)
            di " Median Debt Maturity (Group " `size_group' "): " %6.3f `median_dm'

            * Estimate the trend
            reg debt_maturity_1 year

            * Extract the coefficient and compute the p-value
            local trend = _b[year] * 100 // extract the coefficient
            local pval = 2 * ttail(e(df_r), abs(_b[year] / _se[year])) // compute the p-value

            * Display the computed values
            di " Trend x100 (Group " `size_group' "): " %6.3f `trend'
            di " P-value (Group " `size_group' "): " %6.3f `pval'

            * Store the results in the matrix
            matrix results[`size_group', 1] = `trend'
            matrix results[`size_group', 2] = `pval'
        }

        restore
    }

    ********************************************************************************
    * Display the results
    ********************************************************************************

    di "Trends and p-values by size category:"
    matrix list results


    For my size groups medium and large there are no observations. Any ideas how I can solve this?

  • #2
    Code:
          
    forvalues size_group = 1/3 {
        di " Size Group: " `size_group'
    
        * Restrict the data to the current group
        keep if size_group_num == `size_group'
    
        if _N == 0 {
            di " No data for Size Group " `size_group'
            continue
        }
    
        * Compute the median
        quietly summarize debt_maturity_1, detail
        local median_dm = r(p50)
        di " Median Debt Maturity (Group " `size_group' "): " %6.3f `median_dm'
    
        * Estimate the trend
        reg debt_maturity_1 year
    
        * Extract the coefficient and compute the p-value
        local trend = _b[year] * 100 // extract the coefficient
        local pval = 2 * ttail(e(df_r), abs(_b[year] / _se[year])) // compute the p-value
    
        * Display the computed values
        di " Trend x100 (Group " `size_group' "): " %6.3f `trend'
        di " P-value (Group " `size_group' "): " %6.3f `pval'
    
        * Store the results in the matrix
        matrix results[`size_group', 1] = `trend'
        matrix results[`size_group', 2] = `pval'
    }
    will, on its first iteration, execute the command -keep if size_group_num == `size_group'-, which expands to -keep if size_group_num == 1-. At that point all of the data from size groups 2 and 3 have been discarded, lost and gone forever. The -restore- in your code runs only after all three groups have been processed, so it does not bring the data back between groups. So there is nothing left to generate results for the medium and large groups.
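
    The effect can be seen in isolation with a minimal sketch on toy data. The nine observations here are made up for illustration; only the variable name size_group_num comes from the code above.

    ```stata
    * Toy data: 9 observations, 3 in each size group (hypothetical values)
    clear
    set obs 9
    gen size_group_num = 1 + mod(_n, 3)

    forvalues g = 1/3 {
        * -keep- permanently drops everything outside group `g'
        keep if size_group_num == `g'
        di "Group `g': " _N " observations left"
    }
    * Group 1: 3 observations left
    * Group 2: 0 observations left
    * Group 3: 0 observations left
    ```

    After the first pass only group 1 remains in memory, so the later -keep- commands have nothing to select.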

    There are a few ways to fix this:
    1. Get rid of the -keep if size_group_num == `size_group'- command and apply -if size_group_num == `size_group'- to the appropriate commands inside the loop, or
    2. get rid of the loop, wrap its code in a program, and use the -runby- command to iterate the program over the subsets of the data defined by size_group_num (-runby- is written by Robert Picard and me, and is available from SSC), or
    3. save the data in a tempfile just before entering the -forvalues- loop, and then -use- the tempfile at the bottom of the loop to bring back the data for later iterations.

    Added: Of these fixes, I think 1 is the simplest and best solution here because there are only two commands, -regress- and -summarize-, that require modification (plus the removal of the -keep- command), and unless your data set is really huge, it will run efficiently enough. If that turns out to run unacceptably slowly, 2 will speed it up considerably. I would only use 3 for a situation where the data set is pretty small, so the disk operations are not very time-consuming, but the computations in the loop are very intensive and sensitive to the size of the data set, which does not appear to be the case here.
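
    Fix 1 might look roughly like the sketch below. The toy data are made up so that the block runs on its own; the variable names follow the code in the question. The -count- check is an assumption added here to stand in for the original -if _N == 0- test, because _N no longer changes once -keep- is gone.

    ```stata
    * Toy data standing in for the real file (hypothetical values)
    clear
    set seed 12345
    set obs 300
    gen year = 1990 + mod(_n, 4)
    gen size_group_num = 1 + mod(_n, 3)
    gen debt_maturity_1 = runiform()

    matrix results = J(3, 2, .) // 3 groups x 2 values (trend x100 and p-value)

    forvalues size_group = 1/3 {
        * Count usable observations instead of testing _N
        quietly count if size_group_num == `size_group' & !missing(debt_maturity_1)
        if r(N) == 0 {
            di "No data for size group `size_group'"
            continue
        }

        * Restrict each command with -if- so the data in memory stay intact
        quietly summarize debt_maturity_1 if size_group_num == `size_group', detail
        local median_dm = r(p50)

        quietly regress debt_maturity_1 year if size_group_num == `size_group'
        local trend = _b[year] * 100
        local pval = 2 * ttail(e(df_r), abs(_b[year] / _se[year]))

        di "Group `size_group': median = " %6.3f `median_dm' ", trend x100 = " %6.3f `trend' ", p = " %6.3f `pval'

        matrix results[`size_group', 1] = `trend'
        matrix results[`size_group', 2] = `pval'
    }

    matrix list results
    ```

    Because nothing is dropped inside the inner loop, all three rows of the matrix are filled, and the surrounding -preserve- and -restore- pair only has to handle the period filter.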
    Last edited by Clyde Schechter; 02 Dec 2024, 13:30.

    • #3
      Option 1 fixed my problem, thank you so much!
