Automating EUROSTAT in Stata – Part 3: Reshaping the data

The downloaded and cleaned Eurostat files are usually not in the format that most statistical packages (Stata, SPSS) require for panel datasets. These packages expect the “long” format, with one row per unit and time period:

ID Year Var
1  2001   1
1  2002   0
1  2003   1
2  2001   1
2  2002   1
2  2003   0

so that each combination of ID and Year appears only once. Eurostat files, by contrast, come in the “wide” format, where each column holds one variable for one time period:

ID Var_2001 Var_2002 Var_2003
1         1        0        1
2         1        1        0
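To see how the two layouts relate, the toy table above can be typed into Stata and reshaped. This is a minimal sketch. The variable names ID, Var_2001 to Var_2003 and Year are taken from the example. The stub Var_ tells reshape which columns to stack, and isid checks the no-duplicates property described above.

```stata
clear
* Toy wide data copied from the example table
input ID Var_2001 Var_2002 Var_2003
1 1 0 1
2 1 1 0
end

* Wide to long: every Var_* column becomes a row; its suffix goes into Year
reshape long Var_, i(ID) j(Year)
rename Var_ Var

* isid stops with an error if ID and Year do not uniquely identify the rows
isid ID Year
list, sepby(ID)
```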


The following Stata syntax reshapes a data file (here nama_gdp_c) into the long format:


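use ./split/nama_gdp_c.dta, clear   // load the cleaned file, replacing any data in memory

reshape long y, i(geo unit indic_na) j(time) string   // wide to long: each y<suffix> column becomes rows, the suffix is stored as a string in time

destring _all, replace  // convert string variables that contain only numbers (e.g. time) to numeric; text variables such as geo are left unchanged

encode geo, gen(geo2) // numeric country variable geo2, with the geo codes attached as value labels

save ./final/nama_gdp_c.dta, replace   // write the reshaped file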
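The Eurostat file itself is not reproduced here, so the sketch below runs the same reshape, destring and encode steps on a hypothetical two-row miniature. It assumes, as the code above implies, that the cleaned file has string identifiers geo, unit and indic_na and value columns named y followed by the year. Storing the values as text and the codes u1 and i1 are illustrative assumptions, not Eurostat's actual contents.

```stata
clear
* Hypothetical miniature of a cleaned file (u1, i1 are placeholder codes)
input str2 geo str2 unit str2 indic_na str3 y2001 str3 y2002
"AT" "u1" "i1" "10" "12"
"BE" "u1" "i1" "20" "21"
end

* j(time) string keeps the suffixes "2001", "2002" as text
reshape long y, i(geo unit indic_na) j(time) string

* time and y contain only digits, so they become numeric;
* geo, unit and indic_na contain letters and are left as strings
destring _all, replace

* geo2 holds 1, 2, ... with the country codes attached as value labels
encode geo, gen(geo2)
list geo geo2 time y
```

On the real file, destring prints one line per variable saying whether it was replaced or skipped. That output is a quick way to spot a value column that still contains non-numeric characters.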

The file can now be set up as a panel. The panel identifier should combine geo, unit and indic_na, the same variables used in i() above, because geo alone does not identify a single series. The time indicator is time (the year).
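One way to build that identifier is egen's group() function followed by xtset. The sketch below uses hypothetical long-format data in the same layout as the saved file. The name panel_id and the codes u1, i1 and i2 are illustrative choices. With the real data, the input block would be replaced by use ./final/nama_gdp_c.dta, clear.

```stata
clear
* Hypothetical long-format data: two countries, two indicators, two years
input str2 geo str2 unit str2 indic_na time y
"AT" "u1" "i1" 2001 10
"AT" "u1" "i1" 2002 12
"AT" "u1" "i2" 2001 3
"AT" "u1" "i2" 2002 4
"BE" "u1" "i1" 2001 20
"BE" "u1" "i1" 2002 21
"BE" "u1" "i2" 2001 5
"BE" "u1" "i2" 2002 6
end

* geo repeats within a year, so it cannot serve as the panel id on its own;
* group() assigns one number per geo x unit x indic_na series
egen panel_id = group(geo unit indic_na)

* Confirm each series has one row per year, then declare the panel
isid panel_id time
xtset panel_id time
```

Once xtset has run, panel commands such as xtreg and time-series operators such as L.y work within each series. For example, L.y refers to the previous year of the same series.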

