Automating EUROSTAT in Stata – Part 1: Downloading files

Eurostat is a great database covering various socio-economic, environmental, and regional indicators in Europe. Navigating it, however, can be daunting: similar variables appear in several datasets, meta datasets come in several subsets, and some datasets contain derived variables. As a result, it is not unusual to extract information from different datasets and combine it in one database for analysis. Unless you are a database specialist who can automate this using software like R, Python, Java, SQL, etc., you will spend a lot of time checking for new versions, downloading files, unzipping them, putting them together, and fixing them before you are anywhere near analysis.

This three-part guide gives a step-by-step process for automating the downloading, extracting, and cleaning of Eurostat datasets so that they end up in a user-friendly format that is ready for analysis.

Users can download individual files using the interface in the navigation window, or they can use the bulk download facility. Getting individual files is cumbersome, especially if you are working with several different files.

Two little hacks can be used in Stata to deal with the file management issue:

  1. Stata can read files using URLs. Eurostat has stable URLs for all datasets (the domain was recently updated). If you go to the bulk download facility and click All, you can see the list of all the files in the database. Each file can be downloaded individually in a zipped format.
  2. Stata can access the Windows command shell, which lets it call other programs. This allows you to run software like 7-Zip (free) or WinRAR to unzip the files from within the Stata syntax.

Let's say we want to download the file that contains the basic macro indicators, e.g. GDP, for Europe. This information is given in table nama_gdp_c. The following set of Stata commands initiates the download and the unzip procedure for the file:

clear // clear all data from memory
set more off, perm // don't let Stata pause. perm does it permanently.
cap log close // close any log files that are open

global rawdir "<your directory>/Eurostat/raw" // Replace <your directory> with your directory name

cd "$rawdir" // change to the raw data directory (quoted in case the path contains spaces)

copy "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=data%2Fnama_gdp_c.tsv.gz" "nama_gdp_c.tsv.gz", replace // This command copies the zipped file from the website to the folder defined above.

** NOTE: If you copy code from a web page, make sure the quotation marks are plain straight quotation marks. The typographic (curly) quotes used by web fonts are not recognized by Stata.

shell "C:\Program Files (x86)\7-Zip\7zG.exe" e -y nama_gdp_c.tsv.gz   // this line calls 7-Zip using the Stata shell command. In 7-Zip syntax, e stands for extract and -y answers yes to all prompts, so existing files are overwritten.

If your directories are set up properly, the last two commands should download and extract the nama_gdp_c file smoothly. The extracted file can be opened in Excel. (If you want to set it up in Stata, see Part 2: Formatting data files.)
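A quick way to check that both steps worked is to ask Stata whether the files exist. The lines below are a minimal sketch, assuming the working directory is still the raw data folder, that the shell command has finished before Stata moves on, and that 7-Zip names the extracted file nama_gdp_c.tsv (it may use a different name if the archive stores one):

```stata
* Minimal sketch: confirm that the download and the extraction produced files.
* Assumes the working directory is still $rawdir.
capture confirm file "nama_gdp_c.tsv.gz"
if _rc {
    display as error "Download failed: nama_gdp_c.tsv.gz not found"
}

* Assumed name of the extracted file; adjust if 7-Zip names it differently.
capture confirm file "nama_gdp_c.tsv"
if _rc {
    display as error "Extraction failed: nama_gdp_c.tsv not found"
}
else {
    display as text "nama_gdp_c.tsv is ready"
}
```

capture suppresses the error that confirm file would otherwise raise, and _rc holds the return code (0 if the file was found).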

If you want to extend the setup to several other files, say the price indices of macro indicators nama_gdp_p and the real values (volumes) of macro indicators nama_gdp_k, a simple loop should do the trick:

global filelist ///
        nama_gdp_c /// // GDP and main components – Current prices
        nama_gdp_p /// // GDP and main components – Price indices
        nama_gdp_k     // GDP and main components – Volumes

foreach x of global filelist {
        copy "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=data%2F`x'.tsv.gz" "`x'.tsv.gz", replace
        shell "C:\Program Files (x86)\7-Zip\7zG.exe" e -y `x'.tsv.gz
        }

You can keep expanding the list by picking file names from the regular browser window on Eurostat's data browser page. The file name is always given in brackets at the end of the title of the dataset you want to download.
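As the list grows, a single mistyped file name or a failed download will stop the whole loop with an error. One possible variant, sketched here under the assumption that the global filelist from above is already defined, wraps the copy in capture and skips any file that cannot be downloaded:

```stata
* Sketch of a more forgiving loop: skip files that fail to download.
* Assumes $filelist is defined and the working directory is $rawdir.
foreach x of global filelist {
    capture copy "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=data%2F`x'.tsv.gz" "`x'.tsv.gz", replace
    local rc = _rc   // store the return code of copy before it is overwritten
    if `rc' {
        display as error "Could not download `x' (return code `rc')"
        continue     // move on to the next file in the list
    }
    shell "C:\Program Files (x86)\7-Zip\7zG.exe" e -y `x'.tsv.gz
}
```

The files that failed are listed in the Results window, so you can correct their names and rerun the loop.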

Other posts in this series:
Part 3: Reshaping files
Part 2: Formatting data files

4 thoughts on “Automating EUROSTAT in Stata – Part 1: Downloading files”

  1. Dear Asjad Naqvi,

    I have tried to run the code you suggested in the Stata, but from the following code with the “$” does not work. I run exactly like you wrote. Is there any explanation for it?

    cd $rootdir // change to the root directory

    copy “http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=data%2Fnama_gdp_c.tsv.gz” “nama_gdp_c.tsv.gz”, replace

    1. Hi Sam,

      If you copy paste the command as it is from the post, then the wrong quotation mark symbol (“”) is used. I would suggest that you re-enter them in the syntax. Otherwise the command is working fine.

      HTH!
      Asjad
