New commands xtimportu and xtmipolateu

Ilya Bolotov

Join Date: Nov 2018
Posts: 75

New commands xtimportu and xtmipolateu

02 Dec 2020, 15:14

Dear Stata users,

I've written two commands for time series and panel data which I'd like to share. The first one imports long or wide (pivoted) data from CSV, MS Excel and other supported file types. For example, macroeconomic data is often published in MS Excel format as some kind of pivoted tables which need to be manually processed to be used (see the example on the Czech demographic statistics or UNCTAD bilateral FDI tables). xtimportu can do the processing for you in less than a minute.

The second one allows batch application of mipolate on multiple variables, exports summarise statistics and produces a detailed report on before and after.

Code:

ssc desc xtimportu
ssc desc xtmipolateu

xtimportu imports long or wide (pivoted) monthly, quarterly half-yearly or yearly time series and panel data from a supported file type using standard Stata import syntax, filtering cross-sectional units with regular expressions (case-sensitive!) and encoding the matches (each match to a string, spaces are replaced with underscores "_") (if required), and returns long data as panelvar timer valuevar to memory and/or to a file using standard Stata export and/or save syntax. Imported data are assumed to be wide if timevar is omitted.

xtimportu uses (SSC) sxpose2 in combination with destring and reshape to process wide (pivoted) data and accepts all destring options. For wide (pivoted) data the first imported row (_n == 1) must include future timevar values (at least the years with possible breaks in place of higher frequency) which can be achieved with the help of import and xtimportu subcommand options such as cellrange([start][:end]) or preformat(string).

Unicode strings are supported in both regex(string) and encode(string).

Code:

        ****
        * Example 1. Population time series for the Czech Republic (a country in Central Europe, EU member since 2004)
        ****

        * RegEx for the indicator, case sensitive!
        * unoptimized, illustration only
        . local regex "Počet"

        * ČSÚ's (Czech Statistical Office) file URL for Population
        . local url "https://www.czso.cz/documents/10180/123502877/32018120_0101.xlsx/d60b89c8-980c-4f3a-bc0c-46f38b0b8681?version=1.0"

        * import the time series data to memory, unit: thousand
        . xtimportu excel "`url'", cellrange(A3) regex(`regex') encode("Czech Republic") tfreq(Y) tde clear

        * revert underscores to spaces in the unit
        . replace unit = ustrregexra(unit, "_", " ")

        * tsset data
        . tsset year


        ****
        * Example 2. FDI matrix from UNCTAD's Bilateral FDI statistics (historical data, 2000–2014)
        ****

        * RegEx for the EU-28, case sensitive! "{0,}$" (0 or more non-word characters) excludes Netherlands Antilles
        * unoptimized, illustration only
        . local regex "`regex'Austria|Belgium|Bulgaria|Croatia|Cyprus|Czech Republic|Denmark|Estonia|"
        . local regex "`regex'Finland|France|Germany|Greece|Hungary|Ireland|Italy|Latvia|Lithuania|"
        . local regex "`regex'Luxembourg|Malta|Netherlands\W{0,}$|Poland|Portugal|Romania|"
        . local regex "`regex'Slovakia|Slovenia|Spain|Sweden|United Kingdom"

        * UNCTAD's (United Nations Conference on Trade and Development) file URL for the U.S.
        . local url "https://unctad.org/system/files/non-official-document/webdiaeia2014d3_USA.xls"

        * import the panel data to memory, export a copy as a CSV file
        . xtimportu excel "`url'", sheet("inflows") cellrange(E5) regex(`regex') tfreq(Y) clear tde export(delimited "./usa_fdi_matrix.csv", replace)

        * rename variables to form the 28x1 aka the EU-28 x U.S. FDI matrix, unit: million USD
        . rename unit from
        . rename value to_USA

        * xtset data
        . encode from, gen(id)
        . xtset id year

xtmipolateu replaces missing values in a time series, two- or multidimensional varlist with interpolated (extrapolated) ones using (SSC) mipolate, at the same time allowing the user to export related descriptive statistics (mean, sd, min and max) collected with (SSC) summarizeby (an extension of statsby for summarize with the same syntax but no ": command") and to write corresponding tsline or xtline graphs and tables with these statistics into a new or existing report document using putdocx or putpdf, keeping it open or saving it and finishing work.

The user can change the style of the report document by typing a customized putdocx begin or putpdf begin prior to running xtmipolateu or with putdocx table or putpdf table after the command has finished running. The name of each table in the document is identical to each varname in the varlist.

Code:

        . sysuse xtline1.dta, clear
        . replace calories = . if ! mod(_n, 5) | ! mod(_n, 14)

        * pure mipolate:
        . xtmipolateu calories, i(person) t(day) spline 

        * export descriptive statistics to "./stats.csv"
        . xtmipolateu calories, i(person) t(day) nearest export(delimited using "./stats.csv", replace)

        * create a report file "./report.docx"
        . xtmipolateu calories, i(person) t(day) spline epolate put("docx") sa("./report.docx", replace)

        * create a report file "./report.pdf" with a custom style
        . set scheme economist
        . putpdf begin, landscape font("Verdana")
        . xtmipolateu calories, i(person) t(day) epolate put("pdf") pbr sa("./report.pdf", replace)

Tags: None

Ilya Bolotov

Join Date: Nov 2018
Posts: 75

02 Dec 2020, 15:50

More complex example with the preformat() option in xtimportu
Source: BEA, GDP by Industry
https://www.bea.gov/data/gdp/gdp-industry

Code:

program def _fix
    * we want values from 2017-Q1 onwards
    drop C-G
    * concatenate years and quarters (Q#)
    qui ds B, not
    foreach var of varlist `r(varlist)' {
        cap replace `var' = `var' + "-Q" +  ustrregexra(`var'[_n + 1], "\D", "", .) if _n == 1
        * drop empty columns
        if missing(`var'[1]) {
            drop `var'
        }
    }
end

xtimportu excel "https://www.bea.gov/sites/default/files/2020-09/gdp2q20_3rd.xlsx", sheet(Table 1) cellrange(B3) preformat(_fix) regex(Gross domestic product) encode(GDP %change) tfreq(Q) tdestring

Last edited by Ilya Bolotov; 02 Dec 2020, 15:53.

Comment

Ilya Bolotov

Join Date: Nov 2018

Posts: 75
#3

02 Dec 2020, 17:50

Almost forgot, many thanks to Kit Baum for uploading the commands to SSC!
Comment
Ilya Bolotov

Join Date: Nov 2018

Posts: 75
#4

01 Apr 2024, 14:11

Update

xtimportu now keeps string variables.
Comment

Announcement

New commands xtimportu and xtmipolateu

Comment

Comment

Comment