Saturday, March 5, 2016

Building Minimum Spanning Trees for Import / Export flows with R

Introduction
Inspired by the paper Complex Networks and Minimal Spanning Trees in International Trade Network, I decided to build such MSTs in R for the 2010s and to see what has changed since 2000 (the paper provides an MST for that year).

Input Data

I downloaded a data set of export / import flows between countries from the Observatory of Economic Complexity site.

I used this data set, which contains product trade data between origin and destination countries for 1995-2013, classified according to the Harmonized System nomenclature. The data set is a tab-separated file compressed with bzip2.

I loaded this file into R and grouped the rows by origin, destination, and year, discarding the product code.

I also removed rows that represent trade with regions (their country code starts with the letter "x").

Here is the R code I used to load the input data:

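library(data.table)

# Read the tab-separated file (Windows-style relative path)
stats = fread(".\\year_origin_destination_hs92_4.tsv", sep = "\t", header = TRUE)

# Drop region rows (codes starting with "x"), then sum trade values over all products
agg_stats = stats[
 !origin %like% "^x" & !destination %like% "^x",
 list(export = sum(export_val, na.rm = TRUE), import = sum(import_val, na.rm = TRUE)),
 by = list(year, origin, destination)
]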
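To make the filtering and grouping step concrete, here is a minimal sketch on a toy table. The rows and values are made up for illustration, and the column names simply mirror the ones used in this post. Note that "mex" contains an "x" but does not start with one, so it is kept, while the region-style code "xxa" is dropped.

```r
library(data.table)

# Toy rows shaped like the trade file; all values are invented for illustration
toy = data.table(
  year        = c(2010L, 2010L, 2010L, 2010L),
  origin      = c("usa", "usa", "usa", "xxa"),
  destination = c("chn", "chn", "mex", "usa"),
  prod        = c(101L, 102L, 101L, 101L),
  export_val  = c(10, 5, NA, 7),
  import_val  = c(20, NA, 3, 1)
)

# Drop rows whose origin or destination code starts with "x",
# then sum over products; NA values are ignored by na.rm = TRUE
toy_agg = toy[
  !origin %like% "^x" & !destination %like% "^x",
  list(export = sum(export_val, na.rm = TRUE), import = sum(import_val, na.rm = TRUE)),
  by = list(year, origin, destination)
]

print(toy_agg)
```

The two product rows for usa -> chn collapse into a single row, and a pair whose values are all NA ends up with a sum of 0 rather than NA.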

Note that the file is relatively big (> 2.5 GB), so on a 32-bit (x86) system you may not be able to read it with a single call to fread(). In that case, you can try reading it in chunks as follows:

library(data.table)

stats = list()
con = file(".\\year_origin_destination_hs92_4.tsv", "r")
invisible(readLines(con, n = 1)) # skip header row

repeat {
 # readLines() returns an empty vector at end of file, which ends the loop cleanly
 lines = readLines(con, n = 1e6)
 if (length(lines) == 0) break
 chunk = data.table(read.table(text = lines, sep = "\t", header = FALSE,
  colClasses = c("integer", "character", "character", "integer", "numeric", "numeric"),
  col.names = c("year", "origin", "destination", "prod", "export_val", "import_val")))
 # aggregate each chunk right away to keep memory usage low
 stats[[length(stats) + 1]] <- chunk[, list(export = sum(export_val, na.rm = TRUE), import = sum(import_val, na.rm = TRUE)),
  by = list(year, origin, destination)]
}

close(con)

stats = rbindlist(stats)

# The chunks already hold partial sums in the export / import columns, so combine those
agg_stats = stats[
 !origin %like% "^x" & !destination %like% "^x",
 list(export = sum(export), import = sum(import)),
 by = list(year, origin, destination)
]
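The chunked approach relies on the fact that sums can be computed in two stages: aggregate each chunk, then aggregate the partial results. The following sketch checks this on a toy table. The helper name aggregate_raw and all values are hypothetical and only serve the illustration.

```r
library(data.table)

# Toy data; the usa -> chn pair is deliberately split across both chunks
toy = data.table(
  year        = c(2010L, 2010L, 2010L, 2010L),
  origin      = c("usa", "usa", "deu", "usa"),
  destination = c("chn", "chn", "fra", "chn"),
  export_val  = c(1, 2, 3, 4),
  import_val  = c(10, 20, 30, 40)
)

# Hypothetical helper: aggregate raw rows by year / origin / destination
aggregate_raw = function(dt) {
  dt[, list(export = sum(export_val, na.rm = TRUE), import = sum(import_val, na.rm = TRUE)),
     by = list(year, origin, destination)]
}

# One pass over all rows
direct = aggregate_raw(toy)

# Two chunks of two rows each, aggregated separately and then combined
partials = rbindlist(list(aggregate_raw(toy[1:2]), aggregate_raw(toy[3:4])))
combined = partials[, list(export = sum(export), import = sum(import)),
                    by = list(year, origin, destination)]

# Sort both results the same way before comparing
setorder(direct, year, origin, destination)
setorder(combined, year, origin, destination)
print(direct)
print(combined)
```

Both tables contain the same totals, which is why the second aggregation has to sum the export and import columns of the partial results rather than the raw export_val and import_val columns.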

Minimum Spanning Tree with R

I used the igraph package to build the minimum spanning trees. For weighted graphs, this package uses Prim's algorithm, which seems to be a good choice for a graph as dense as the international trade network.
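As a small illustration of what igraph does here, the sketch below builds a weighted graph on four hypothetical vertices and extracts its minimum spanning tree, requesting Prim's algorithm explicitly. The vertex names and weights are invented for the example.

```r
library(igraph)

# Complete graph on four hypothetical vertices with made-up weights
edges = data.frame(
  from   = c("A", "A", "A", "B", "B", "C"),
  to     = c("B", "C", "D", "C", "D", "D"),
  weight = c(1, 4, 3, 2, 5, 6)
)
g = graph_from_data_frame(edges, directed = FALSE)

# Ask for Prim's algorithm explicitly; the tree keeps the edge weights
tree = mst(g, weights = E(g)$weight, algorithm = "prim")

print(igraph::as_data_frame(tree, what = "edges"))
```

The tree keeps the three cheapest edges that connect all vertices without forming a cycle (A-B, B-C, and A-D), with a total weight of 6.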

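As the edge weight between countries i and j, I used the inverse of the mean of the export and import values:

$$ w_{ij} = \frac{1}{\tfrac{1}{2}\,(\mathrm{export}_{ij} + \mathrm{import}_{ij})} $$

Here export_ij and import_ij are the aggregated export and import values for origin i and destination j in a given year, so the strongest trade links get the smallest weights.
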
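A quick worked example of the weight that the MST code in this post computes, 1 / (0.5 * (export + import)), may help. The helper name edge_weight and the numbers are hypothetical.

```r
# Hypothetical helper mirroring the weight expression used in this post
edge_weight = function(export, import) 1 / (0.5 * (export + import))

big_flow   = edge_weight(export = 300, import = 100)  # mean flow of 200
small_flow = edge_weight(export = 3,   import = 1)    # mean flow of 2
no_flow    = edge_weight(export = 0,   import = 0)    # division by zero gives Inf in R

print(c(big = big_flow, small = small_flow, none = no_flow))
```

A large flow yields a small weight (0.005) and a small flow yields a large one (0.5), so a minimum spanning tree keeps the strongest trade links. A pair that is reported with zero total trade gets an infinite weight, which may be worth filtering out before building the graph.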
Here is the R code I used to build the MSTs per year; it saves each tree as a JPG file:

library(igraph)

for (y in unique(agg_stats$year)) {

 year_stats = agg_stats[year == y]

 # all country codes that appear in this year, used as matrix row / column names
 d = sort(unique(c(year_stats$origin, year_stats$destination)))

 m = matrix(0, nrow = length(d), ncol = length(d))
 colnames(m) = rownames(m) = d

 # edge weight = inverse of the mean of export and import, so strong links are "short"
 for (i in seq_len(nrow(year_stats))) {
  m[year_stats[i, origin], year_stats[i, destination]] =
   year_stats[i, 1 / (0.5 * (export + import))]
 }

 # only the lower triangle is used to build the undirected weighted graph
 g = graph_from_adjacency_matrix(m, weighted = TRUE, mode = "lower")
 tree = mst(g)

 jpeg(sprintf('.\\stats_%d.jpg', y), width = 2000, height = 2000)
 plot(tree, vertex.size = 3)
 dev.off()
}

Results

Here are the resulting MSTs per year:

[Figures: one MST plot for each year from 2000 to 2013]

And here are some observations from looking at these MSTs:

  • In 2000 there was one central hub around the USA (which includes sub-hubs around Great Britain, Japan, and China) and one smaller semi-hub around Germany, which also includes sub-hubs around France, Italy, and Russia. This matches what we can see in the paper.
  • The picture stays roughly the same until 2006, when China separates into a standalone hub with Japan as a sub-hub. The next year, 2007, shows a clear picture of three roughly equal hubs around China, the USA, and Germany.
  • Finally, in 2013 we see the same three hubs around China, the USA, and Germany, but now China's hub clearly dominates.
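The observations above come from looking at the plots. One possible way to back them up with numbers, which I did not use for this post, is to rank vertices by their degree in the tree, since a hub is a vertex with many direct neighbours. The sketch below uses a made-up toy tree, not the real trade data.

```r
library(igraph)

# Made-up toy tree (7 vertices, 6 edges); it does not reflect real trade data
edges = data.frame(
  from = c("usa", "usa", "usa", "deu", "deu", "chn"),
  to   = c("gbr", "jpn", "deu", "fra", "ita", "usa")
)
tree = graph_from_data_frame(edges, directed = FALSE)

# Degree = number of direct neighbours in the tree; the largest values are the hubs
hubs = sort(degree(tree), decreasing = TRUE)

print(head(hubs, 3))
```

Applied to each year's tree, the same ranking would show how the degree of the largest hubs changes over time.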


