Overview
Meeting
The 13th German Stata Users Group Meeting will be held on
Friday, 26th June 2015 in Nuremberg at the IAB (Institute for
Employment Research). We would like to invite everybody from
everywhere who is interested in using Stata to attend this
meeting.
The academic program of the meeting is being organized by
Johannes Giesecke, Humboldt University Berlin
([email protected]) and Stephanie Eckman, IAB
([email protected]). The conference language will be English due
to the international nature of the meeting and the participation of
non-German guest speakers. The logistics of the conference are being
organized by Dittrich und Partner, the Stata distributor for several
countries, including Germany, the Netherlands, Austria, the Czech Republic,
and Hungary (http://www.dpc-software.de).
Workshop
On the day before the conference, there will be a workshop on
“Introduction to Nonparametric and Semiparametric Analysis with Stata”
given by Dr. Johannes Ludsteck of the IAB. The workshop will be held
at the Institut für Arbeitsmarkt- und Berufsforschung (IAB). Details
about the workshop are given below and at
http://stata.com/meeting/germany15/workshop/ or www.dpc-software.de
Conference Dinner
There will be an optional informal meal (at additional cost) at a restaurant in Nuremberg on Friday evening. Details about this event will be provided soon.
IMPORTANT NOTE:
All participants must present a valid ID to enter the conference venue!
Program Schedule
09:00 - 09:15 Registration
09:15 - 09:30 Welcome
09:30 - 10:30 Plenary Talk: Statistical learning with Boosting
Matthias Schonlau (University of Waterloo, Canada)
10:30 - 11:30 Estimating survival-time treatment effects and endogenous treatment effects using Stata
David Drukker (StataCorp)
11:30 - 11:45 Coffee
11:45 - 12:15 Multiprocess modeling with Stata
Tamás Bartus (Corvinus University of Budapest)
12:15 - 12:45 A Stata ado for categorical data analysis with latent variables
Hans-Jürgen Andreß, Maximilian Hörl, Alexander Schmidt-Catran (University of Cologne)
12:45 - 13:45 Lunch
13:45 - 14:15 simarwilson: DEA-based Two-Step Efficiency Analysis
Harald Tauchmann (Friedrich-Alexander-University Erlangen-Nuremberg)
14:15 - 14:45 A simple procedure to correct for measurement errors in survey research
Anna de Castellarnau (Research and Expertise Centre for Survey Methodology, University Pompeu Fabra)
14:45 - 15:15 Time series analysis using ARFIMA
Frank Ebert (Ebert Beratung und Innovationen GmbH)
15:15 - 15:30 Coffee
15:30 - 16:00 PSIDTOOLS: An interface to the Panel Study of Income Dynamics
Ulrich Kohler (University of Potsdam)
16:00 - 16:30 Extensions to the label commands
Daniel Klein (University of Kassel)
16:30 - 17:00 A new Stata command for computing and graphing percentile shares
Ben Jann (University of Bern)
17:00 - 17:15 Coffee
17:15 - 18:00 Report to the users
Bill Rising (StataCorp)
18:00 - 18:30 Wishes and grumbles
18:30 End of the meeting
Conference venue
Institute for Employment Research (IAB)
Bundesagentur für Arbeit
Room 168
Regensburger Str. 104
90478 Nürnberg (Nuremberg)
(see http://www.iab.de)
How to get to the venue
From Nuremberg central station, take tram line No. 9 towards “Dokumentationszentrum” and exit at “Meistersingerhalle”. Cross the tracks, turn into Weddigenstraße, and walk for about five minutes. Turn left into Regensburger Straße and continue until you reach No. 104, the main entrance of the conference venue.
For more information on how to get to the conference venue, see http://www.iab.de/en/ueberblick/finden.aspx
Public transportation timetables can be found at http://www.vgn.de
The conference will take place in room 168 (on the first floor).
IMPORTANT NOTE: All participants must present a valid ID to enter the conference venue!
Abstracts
09:00 - 09:15 Registration
09:15 - 09:30 Welcome
09:30 - 10:30 Plenary Talk: Statistical learning with Boosting
Matthias Schonlau (University of Waterloo, Canada)
Email: [email protected]
Abstract: When conducting linear or logistic regression, the conscientious scientist should search for interactions, nonlinearities, and outliers, conduct residual analysis, assess goodness of fit, and iterate as needed. This is time consuming. Statistical (machine) learning offers an alternative: instead of imposing a linear model, a flexible nonlinear model can be fit. Boosting, or boosted regression, is a statistical learning technique that has shown considerable success in predictive accuracy. I introduce boosting and describe my Stata implementation, boost, which implements the MART boosting algorithm (Hastie et al. 2009). Currently, boost accommodates Gaussian, logistic, and Poisson boosted regression and is implemented as a Windows C++ plugin to Stata. I will illustrate the technique with several examples.
References:
Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
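For orientation, a minimal sketch of a boost call follows. The plugin is installed from the author's website rather than SSC, the option names below are recalled from the boost documentation and may differ slightly, and the variables are hypothetical:

```stata
* Hypothetical data: binary outcome y, predictors x1-x5.
* boost is a user-written Windows C++ plugin; install it from the
* author's website first. Option names are sketched and may differ.
boost y x1-x5, distribution(logistic) maxiter(1000) ///
    trainfraction(0.8) predict(phat) influence
```

See the talk (and help boost) for the authoritative syntax.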
10:30 - 11:30 Estimating survival-time treatment effects and endogenous treatment effects using Stata
David Drukker (StataCorp)
Email: [email protected]
Abstract: After reviewing the potential-outcome approach to estimating treatment effects from observational data, this talk discusses new estimators in Stata 14 for estimating average treatment effects from survival-time data, as well as estimators for average treatment effects in endogenous-treatment designs. The talk also covers new research on estimating quantile treatment effects.
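As a hedged sketch of the official estimators the talk covers, with hypothetical variables (outcome y, covariates x1 x2, binary treatment t, and extra treatment-model covariate z1); check help eteffects and help etregress for the exact syntax:

```stata
* Endogenous treatment effects via control functions (Stata 14)
eteffects (y x1 x2) (t x1 z1)
* Linear regression with an endogenous binary treatment
etregress y x1 x2, treat(t = x1 z1)
```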
11:30 - 11:45 Coffee
11:45 - 12:15 Multiprocess modeling with Stata
Tamás Bartus (Corvinus University of Budapest)
Email: [email protected]
Abstract: Multiprocess hazard models consist of multilevel hazard and discrete-choice equations with correlated random effects and are routinely used by demographers to correct estimates for endogeneity and sample selection. Although no official Stata command is devoted to estimating systems of hazard equations, the official gsem command and the user-written cmp command make it possible to estimate models of this sort (Roodman 2011; Bartus and Roodman 2014). The presentation addresses (1) the joint estimation of multilevel discrete-time survival and discrete-choice equations with the gsem and cmp commands; (2) the estimation of (either multilevel or single-level) systems of lognormal survival and discrete-choice equations with the cmp command; and (3) the preparation of multi-spell survival datasets for estimation. Multiprocess survival modeling is illustrated with standard examples from demographic research.
References:
Roodman, D. (2011). Estimating fully observed recursive mixed-process models with cmp. Stata Journal 11: 159-206.
Bartus, T., Roodman, D. (2014). Estimation of multiprocess survival models with cmp. Stata Journal, 14: 756-777.
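A hedged sketch of joint estimation with cmp, using hypothetical variables (a continuous log-duration equation and a probit choice equation with correlated errors); censored durations would need cmp's interval indicators instead:

```stata
* cmp is user-written; install it from SSC
ssc install cmp, replace
cmp setup                        // defines the $cmp_* indicator macros
* Joint estimation of a continuous log-duration equation and a
* probit choice equation with correlated errors
cmp (lndur = x1 x2) (choice = x1 z1), indicators($cmp_cont $cmp_probit)
```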
12:15 - 12:45 A Stata ado for categorical data analysis with latent variables
Hans-Jürgen Andreß, Maximilian Hörl, Alexander Schmidt-Catran (University of Cologne)
Email: [email protected]; [email protected]; [email protected]
Abstract: Path models are widely used in the social sciences to illustrate the statistical models used in applied research. They describe the assumed relationships and dependencies between the variables of interest and are easy to comprehend even for statistical laypersons. Up to now they have mostly been applied to quantitative data, but the main ideas transfer easily to the analysis of categorical data. In doing so, they present a unified approach to different statistical methods for categorical data analysis. The catsem ado attempts to access all these different methods, which are scattered across a whole range of Stata commands, with an easy-to-understand and intuitive command language that essentially describes path diagrams. Moreover, it adds functionality that is not yet included in Stata: the possibility to include categorical latent variables (Andreß et al. 1997) and the possibility to analyze fairly general functions of the responses, as described by Grizzle et al. (1969).
References:
Andreß, H.-J., Hagenaars, J. A., Kühnel, S. (1997). Analyse von Tabellen und kategorialen Daten: Log-lineare Modelle, latente Klassenanalyse, logistische Regression und GSK-Ansatz. Berlin: Springer.
Grizzle, J., Starmer, C., Koch, G. (1969). Analysis of categorical data by linear models. Biometrics, 25: 489–504.
12:45 - 13:45 Lunch
13:45 - 14:15 simarwilson: DEA-based Two-Step Efficiency Analysis
Harald Tauchmann (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Email: [email protected]
Abstract: Measuring the efficiency of production units (DMUs) has developed into an industry in applied econometrics. Unlike parametric approaches, nonparametric techniques – namely DEA – yield individual efficiency scores for DMUs but do not directly answer the question of what determines efficiency differentials between them. One obvious way to circumvent this limitation is a two-stage analysis in which DEA scores obtained in the first stage serve as the left-hand-side variable in a second-stage regression that links efficiency to exogenous factors. Such a two-step approach, however, encounters severe problems: (i) DEA efficiency scores are bounded – depending on how efficiency is defined – from above or from below at the value of one, and (ii) DEA generates a complex and generally unknown correlation pattern among the estimated efficiency scores, resulting in invalid inference in the subsequent regression analysis. To address these problems, Simar & Wilson (2007) suggest a simulation-based, multi-step iterative procedure that follows DEA and is based on (i) truncated regressions, (ii) simulating the unknown error correlation, and (iii) calculating bootstrapped standard errors. We introduce the new Stata command simarwilson, which implements this procedure in Stata. It complements the user-written command dea (Ji & Lee, 2010), which has to precede simarwilson in applied work.
References:
Simar, L., Wilson, P.W. (2007): Estimation and inference in two-stage semi-parametric models of production processes, Journal of Econometrics 136: 31–64.
Ji, Y.-B., Lee, C. (2010): Data envelopment analysis, Stata Journal 10: 267–280.
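A hedged sketch of the two-step workflow with hypothetical variables follows; the dea call follows Ji & Lee (2010), while the simarwilson line is purely illustrative, since the command's actual syntax is introduced in the talk:

```stata
* Stage 1: DEA efficiency scores with the user-written dea command
* (Ji & Lee 2010; available from the Stata Journal software archive)
dea input1 input2 = output1, rts(crs) ort(in)
* Stage 2 (illustrative only): truncated regression of the scores on
* exogenous factors with Simar-Wilson bootstrapped inference
simarwilson score z1 z2
```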
14:15 - 14:45 A simple procedure to correct for measurement errors in survey research
Anna de Castellarnau (Research and Expertise Centre for Survey Methodology, University Pompeu Fabra)
Email: [email protected]
Abstract: Although there is a wide literature on the existence of measurement errors, few researchers correct for them in their analyses. In this presentation we show that correction for measurement errors in survey research is not only necessary but also possible and actually rather simple. Using the quality estimates obtained from the free online software Survey Quality Predictor (SQP), correlation and covariance matrices can easily be corrected and used as input for your analyses. This procedure was described for Stata, LISREL, and R in the ESS EduNet module “A simple procedure to correct for measurement errors in survey research”. This presentation focuses on the correction of measurement errors in regression analysis and causal models using Stata.
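One way to use a corrected matrix in Stata is to enter it as summary statistics data for sem; the sketch below uses made-up numbers in place of the SQP-corrected correlations:

```stata
* Enter a (hypothetical) measurement-error-corrected correlation matrix
* as summary statistics data, then estimate a regression with sem
clear
ssd init x1 x2 y
ssd set observations 1000
ssd set correlations 1 \ .40 1 \ .35 .50 1   // corrected values from SQP
ssd set sd 1.2 0.9 1.5
sem (y <- x1 x2)
```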
14:45 - 15:15 Time series analysis using ARFIMA
Frank Ebert (Ebert Beratung und Innovationen GmbH)
Email: [email protected]
Abstract: Since version 12, Stata offers estimation of ARFIMA models. How can it be applied, and what should be considered when using it? Weather data are reported to show “long” memory. This can be checked by estimating the fractional-integration parameter d of an autoregressive fractionally (or fractal) integrated moving average (ARFIMA) process. Other relevant data include high-frequency stock-market quotations and energy prices. Weather data (in particular wind time series) seem to show behavior complementary to energy prices. A further aspect is the characterization of a time series by its fractional-integration parameter d: can it be used to compress large amounts of time-series data? More technical questions are: what should be considered when working with data influenced by fractal (non-white) noise, and what can be done to overcome performance problems?
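For orientation, the official command looks roughly like this (t and y are hypothetical variables; see help arfima for details):

```stata
* Declare the time series, then fit an ARFIMA(1,d,1) model; the estimate
* of the fractional-integration parameter d is reported in the output
tsset t
arfima y, ar(1) ma(1)
```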
15:15 - 15:30 Coffee
15:30 - 16:00 PSIDTOOLS: An interface to the Panel Study of Income Dynamics
Ulrich Kohler (University of Potsdam)
Email: [email protected]
Abstract: This presentation introduces a collection of user-written programs designed to make analyses of the Panel Study of Income Dynamics (PSID) easier. The PSID is the longest-running longitudinal household survey in the world. Beginning in 1968, the PSID has collected yearly information from over 18,000 individuals living in 5,000 households. The PSID offers data to study a broad range of topics including employment, income, wealth, expenditures, health, and numerous others. As with many other panel studies, however, the hurdles to using the data are relatively high. One reason is that the main corpus of the PSID data is delivered to the end user in sets of yearly ASCII text files, forcing the user to first retrieve a dataset streamlined to the research topic. The PSIDtools make these initial steps of PSID data analysis much easier. In particular, the programs automatically create Stata datasets from ASCII text files, load and merge items from several PSID waves, ease wide-long conversions (while keeping labeling information), and automatically add value label information from the PSID homepage to the dataset in memory.
16:00 - 16:30 Extensions to the label commands
Daniel Klein (University of Kassel)
Email: [email protected]
Abstract: Stata has commands to change variable names as well as their contents using expressions, a variety of functions, or simple transformation rules. Name abbreviations, wildcard characters, time-series operators, and factor-variable notation further facilitate working with variables. Managing value and variable labels, on the other hand, is not as convenient. Despite a large number of existing user-written commands for this purpose, there is still room for improvement. In this presentation, I introduce a new package, elab, that aims at transferring concepts for manipulating variables to value and variable labels. The package enhances the capabilities of official Stata’s label suite and introduces additional tools similar to existing Stata commands for managing variables. Features of elab include support for value label name abbreviations and wildcard characters, as well as restricting requests to subsets of integer-to-text mappings. The package offers commands to systematically change integer values and text in value labels using arithmetic expressions or string functions. It further provides programming utilities that make it easy to implement these features in do- and ado-files.
16:30 - 17:00 A new Stata command for computing and graphing percentile shares
Ben Jann (University of Bern)
Email: [email protected]
Abstract: Percentile shares provide an intuitive and easy-to-understand way of analyzing income or wealth distributions. A celebrated example is the top income shares reported in the work of Thomas Piketty and colleagues. Moreover, series of percentile shares, defined as differences between Lorenz ordinates, can be used to visualize whole distributions or changes in distributions. In this talk I present a new command called -pshare- that computes and graphs percentile shares (or changes in percentile shares) from individual-level data. The command also provides confidence intervals and supports survey estimation.
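A hedged sketch using data shipped with Stata (assuming pshare is available from SSC; the call below uses the command's defaults as a guess, so consult help pshare):

```stata
* Install the user-written command and compute percentile shares of wages
ssc install pshare, replace
sysuse nlsw88, clear
pshare wage          // percentile shares of the wage distribution
```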
17:00 - 17:15 Coffee
17:15 - 18:00 Report to the users
Bill Rising (StataCorp)
Email: [email protected]
Abstract: Bill Rising, Director of Educational Services at StataCorp, talks about developments at Stata.
18:00 - 18:30 Wishes and grumbles
18:30 End of the meeting
Registration and accommodations
Participants are asked to travel at their own expense. The conference fee covers costs for coffee, tea, and lunch. There will also be an optional informal meal at additional cost at a restaurant in Nuremberg on Friday evening.
You can enroll by emailing Christiane Senczek ([email protected]) or by writing, phoning, or faxing to
Christiane Senczek
Dittrich & Partner Consulting GmbH
Prinzenstraße 2
42697 Solingen
Germany
Tel: +49 (0)212 2 60 66-51
Fax: +49 (0)212 2 60 66-66
www.dpc-software.de
Workshop “Introduction to Nonparametric and Semiparametric Analysis with Stata”
Date and Place
Thursday, June 25, 2015, 9:00 – 17:00
Institute for Employment Research (IAB)
Regensburger Str. 104
90478 Nürnberg (Nuremberg)
Germany
Presenters
Dr. Johannes Ludsteck (IAB)
Fees
65 € (Workshop and Conference: 85 €)
Register
[email protected]
Information
http://stata.com/meeting/germany15/workshop/
www.dpc-software.de
Applied researchers typically think of nonparametric methods as strange, exotic animals, mainly because they are not part of the standard graduate curriculum. These methods are frequently avoided even though they are readily implemented in Stata, mainly for two reasons: either users hesitate to apply things they don't understand intuitively, or they are unsettled by the importance of tuning parameters that seem to introduce arbitrariness. We show that nonparametric methods can be quite transparent and extremely useful, especially for exploratory and visual data analysis. The lecture gives a short, intuition-based introduction to basic methods (kernel density estimation, regression smoothing) and demonstrates their application and visual analysis. The focus is on cases where simple Stata programming is required to obtain semiparametric workhorses by combining nonparametric and parametric estimators.
Conference Date   Friday, June 26, 2015, 9:00 – 18:45
Workshop Date     Thursday, June 25, 2015
Venue             Institute for Employment Research (IAB)
                  Regensburger Str. 104
                  90478 Nürnberg (Nuremberg)
                  Germany
Cost              Meeting only: 45 € (students 30 €)
                  Workshop only: 65 €
                  Workshop and Meeting: 85 €
URL               http://stata.com/meeting/germany15/