Introduction
In this article I compare different approaches to data serialization available in R. The comparison covers two aspects: serialization / deserialization performance and the disk space required. I perform the analysis on data table objects, since these are the objects I need to serialize and deserialize most often in practice.
The following approaches are reviewed:
- Functions saveRDS / readRDS:
- This is the standard serialization mechanism available in R.
- It supports all R object types and provides as-is serialization / deserialization (with possible nuances for custom reference objects).
- It supports compressed & uncompressed data storage.
- Essentially, this is a dump of the object's in-memory representation in R, so unfortunately it is an R-only serialization format.
- Package feather:
- This is a fast, language-agnostic alternative to the RDS format.
- It uses a column-oriented file format (based on Apache Arrow and the FlatBuffers library).
- The format is open source and is supported in both R and Python.
- Package fst:
- This is another alternative to the RDS and Feather formats that can be used for fast data frame serialization.
- It supports compression using the LZ4 and ZSTD algorithms.
- The big advantage of this approach is that it provides full random access to the rows and columns of the stored data.
- Package RProtoBuf:
- This is an R interface package for Protocol Buffers, the serialization method developed by Google.
- Usually, this approach is used for serializing relatively small structured objects, but it will be interesting to see how it copes with data table serialization in R.
- Functions write.csv & read.csv:
- These are the standard R functions for storing and reading data frames in the text-based CSV format.
- This approach applies easily only to data frame objects, but I've included it in the comparison, since most of the objects I need to serialize in practice are data tables.
- Functions fwrite & fread from data.table package:
- This is another approach for storing & reading data table objects.
- These functions are much more optimized than the standard ones above, so it will be worthwhile to compare them.
- Package RSQLite:
- This package provides an R interface to the SQLite embedded database engine.
- Although it may be overkill to use such an approach for simple data table serialization, I've included this package in the comparison for the sake of completeness.
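To make the two comparison criteria concrete, here is a minimal sketch of how write time and file size could be measured for two of the approaches listed above. It uses only base R (saveRDS / readRDS and write.csv / read.csv). The example data frame, its column names, and the temporary file names are illustrative assumptions, not the data or the benchmark setup used in this article.

```r
# Hypothetical example data, for illustration only.
set.seed(42)
n <- 10000
df <- data.frame(
  id    = seq_len(n),
  value = rnorm(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Temporary files for each format (names are arbitrary).
rds_file     <- tempfile(fileext = ".rds")
rds_raw_file <- tempfile(fileext = ".rds")
csv_file     <- tempfile(fileext = ".csv")

# system.time() reports user, system and elapsed seconds for each write.
t_rds     <- system.time(saveRDS(df, rds_file))                         # compressed (default)
t_rds_raw <- system.time(saveRDS(df, rds_raw_file, compress = FALSE))   # uncompressed
t_csv     <- system.time(write.csv(df, csv_file, row.names = FALSE))

# Read the data back to check the round trip.
df_rds <- readRDS(rds_file)
df_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# File sizes in bytes, in the same order as the files above.
sizes <- file.size(c(rds_file, rds_raw_file, csv_file))
names(sizes) <- c("rds_compressed", "rds_uncompressed", "csv")

print(rbind(t_rds, t_rds_raw, t_csv)[, "elapsed"])
print(sizes)
```

A single timing of a small data frame is noisy, so a real comparison would repeat each measurement several times on larger data. Note also that the RDS round trip restores the object exactly, while the CSV round trip goes through text and is only expected to match up to numeric tolerance.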