Saturday, February 27, 2016

Best Practices for Batch Systems Creation

Introduction


Inspired by the article "Creating a Microservice? Answer these 10 Questions First", I decided to summarize my experience of working with batch processing systems and created a checklist of items that need to be considered and thought through when working on such projects.

The existing literature on this topic usually concentrates on batch architecture design and specific frameworks. In this article I distill my experience and try to reflect on general best practices for real-world applications.

Before starting, I would like to introduce some terms:

* Batch is the execution of a series of processing and calculation steps in non-interactive mode. Batches usually run daily (e.g. overnight batches), but there may also be intraday batches (several runs per day) or less frequent batches (e.g. monthly).

* Job (or batch job) is a single step in batch processing. A batch usually consists of multiple jobs, which can be executed in parallel or sequentially. Jobs may run on a pre-defined schedule and have cascading dependencies (e.g. job B is executed only after job A is done).

* Environment is the place where batches are executed. It usually includes pre-configured hosts (application boxes, database servers, cluster nodes) on which batches run, and messaging infrastructure (e.g. message queues, service buses). There may be multiple environment instances: production, disaster recovery, development, testing, etc.
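
To make the terms "batch" and "job" a little more concrete, here is a minimal Python sketch of a batch described as jobs with cascading dependencies. The job names (`load_trades`, `calc_pnl`, and so on) and the `execution_waves` helper are hypothetical and exist only for illustration; real schedulers model this in their own ways. The sketch relies on the standard-library `graphlib` module (Python 3.9+) to group jobs into "waves": jobs in the same wave do not depend on each other and could run in parallel, while the waves themselves run sequentially.

```python
from graphlib import TopologicalSorter

# Hypothetical batch: each job maps to the set of jobs it depends on.
BATCH = {
    "load_trades": set(),
    "load_prices": set(),
    "calc_pnl": {"load_trades", "load_prices"},  # runs after both loads
    "publish_report": {"calc_pnl"},              # runs after calc_pnl
}


def execution_waves(batch):
    """Group jobs into waves that respect the declared dependencies.

    Jobs within one wave are independent of each other (parallel candidates);
    each wave starts only after the previous wave has finished.
    """
    sorter = TopologicalSorter(batch)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(BATCH), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

For the batch above, the two load jobs form the first wave, `calc_pnl` the second, and `publish_report` the third. A circular dependency (job A waiting for job B and vice versa) is rejected by `prepare()` before anything runs.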