The downloaded and cleaned Eurostat files are usually not in the format that most statistical software packages (Stata, SPSS) require to handle panel datasets. These packages usually expect the “long” format, with one row per ID and year:
ID  Year  Var
-------------
1   2001  1
1   2002  0
1   2003  1
2   2001  1
2   2002  1
2   2003  0
such that each ID and Year combination has no duplicates. Eurostat files come in the “wide” format, where each column holds the variable for one time period:
ID  Var_2001  Var_2002  Var_2003
--------------------------------
1   1         0         1
2   1         1         0
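As a minimal sketch of the idea, the toy wide table above can be typed into Stata and reshaped to long. The variable names (id, var_2001, value) are made up for this illustration and are not the names used in the Eurostat files.

```stata
* Toy example: enter the wide table by hand (illustrative names only)
clear
input id var_2001 var_2002 var_2003
1 1 0 1
2 1 1 0
end

* var_ is the common stub; the numeric suffix becomes the new variable year
reshape long var_, i(id) j(year)
rename var_ value

* Show the result: one row per id and year
list, sepby(id) noobs
```

Because the suffixes here are plain numbers, the string option of reshape is not needed in this toy case.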
The following Stata syntax reshapes the data files into the long format:
use ./split/nama_gdp_c.dta // read the file
reshape long y, i(geo unit indic_na) j(time) string // reshape from wide to long
destring _all, replace // convert string variables to numeric
encode geo, gen(geo2) // generate a numerical variable for countries
compress // reduce the storage size of the variables
save ./final/nama_gdp_c.dta, replace // save the reshaped file
The file can now be set up as a panel by defining a unique panel id from the panel indicators (geo and the indicator variables, here unit and indic_na) and a time indicator (the year, stored in time).
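One possible way to do this is sketched below. It assumes the reshaped file from above, that destring has turned time into a numeric year, and that the data are annual. The name panel_id is hypothetical; any unused variable name works.

```stata
use ./final/nama_gdp_c.dta, clear

* Check that the panel indicators and time identify each row uniquely
isid geo unit indic_na time

* Build one numeric id per country/unit/indicator combination
* (panel_id is an illustrative name, not part of the Eurostat file)
egen panel_id = group(geo unit indic_na)

* Declare the panel structure; assumes time is a numeric year
xtset panel_id time, yearly
```

isid stops with an error if the combination is not unique, so it is a cheap check to run before xtset.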
Other posts in this series:
Part 2: Formatting data files
Part 1: Downloading files