Introduction
In this article I describe my new R package, typer, which lets you declare the types of a function's input and output parameters and verifies the actual values at run time.
R is a dynamically typed programming language. On the one hand, this simplifies experimenting and writing code in REPL style. On the other hand, it can cause problems when code is reused. For example, in the absence of type definitions any value can be passed to a function, which may cause unexpected behavior.
Here is a simple illustration:
func <- function(a, b) {
  return(a + b)
}
func(1, "1")   # This will throw an error
func(1, TRUE)  # This may result in incorrect behaviour
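To make the difference between the two calls concrete, here is a small base R sketch that captures what each call produces. The helper name try_add is used only for this illustration.

```r
func <- function(a, b) {
  return(a + b)
}

# Hypothetical helper: run func() and return either its value or the error text
try_add <- function(a, b) {
  tryCatch(
    func(a, b),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

res_chr <- try_add(1, "1")   # arithmetic on a character value raises an error
res_lgl <- try_add(1, TRUE)  # TRUE is silently coerced to 1, so the sum is 2

print(res_chr)
print(res_lgl)
```

The second call is the more dangerous one: nothing fails, so a logical flag passed by mistake quietly becomes part of the arithmetic.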
There are several ways to address this problem. The most common one is to check the input parameters at the beginning of the function, as shown below.
func <- function(a, b) {
  # Reject anything that is not numeric (note the negation)
  if (!is.numeric(a)) stop("a must be numeric")
  if (!is.numeric(b)) stop("b must be numeric")
  return(a + b)
}
func(1, "1")   # This will throw a clear error
func(1, TRUE)  # This will throw a clear error
There are also packages that simplify such checks. A good example is the checkmate package, which allows writing fast, easy-to-understand assertions on function input values.
This is how our sample function might look with assertions from the checkmate package:
library(checkmate)
func <- function(a, b) {
  assertNumeric(a)  # stops with an informative message if a is not numeric
  assertNumeric(b)
  return(a + b)
}
Typer
The idea of the typer package is to let you describe function parameters declaratively. This information can then be used to verify the actual parameter values during function execution. It can also be used to generate documentation (as an alternative to roxygen comments).
Here is a simple example of typer usage:
library(typer)  # makes type.numeric() and the other type.* functions available
func <- function(a, b) {
  return(a + b)
}
func_safe <- typer::check_function(
  func,
  input = list(
    a = type.numeric(),
    b = type.numeric()
  ),
  output = type.numeric()
)
func_safe(1, 2)     # Call the annotated function as usual
func_safe(1, TRUE)  # Input params will be checked automatically
In the example above, func() is a regular R function that does some work but has no type checks.
The second function, func_safe(), is essentially a decorator for func(). It checks the input parameters first, then calls func(), and finally checks the return value.
The decorator is created with check_function() from the typer package. It accepts three parameters: the function to be decorated, the type descriptions for the input parameters, and the type description for the output (return) value.
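The wrapping idea itself can be sketched in a few lines of base R. This is only an illustration of the decorator pattern under simplifying assumptions, not typer's actual implementation: the helper name with_checks is hypothetical, and types are represented by plain predicate functions such as is.numeric rather than by typer's type descriptions.

```r
# Hypothetical helper, NOT typer's implementation: wrap `fn` so that its
# arguments and return value are validated by predicate functions.
with_checks <- function(fn, input, output) {
  force(fn); force(input); force(output)
  function(...) {
    args <- list(...)
    # Give positional arguments the names of the formals not supplied by name
    arg_names <- names(args)
    if (is.null(arg_names)) arg_names <- rep("", length(args))
    unused <- setdiff(names(formals(fn)), arg_names)
    unnamed <- which(arg_names == "")
    arg_names[unnamed] <- unused[seq_along(unnamed)]
    names(args) <- arg_names

    # 1. Check every declared input parameter
    for (name in names(input)) {
      if (!input[[name]](args[[name]])) {
        stop(sprintf("argument '%s' failed its type check", name), call. = FALSE)
      }
    }
    # 2. Call the original function
    result <- do.call(fn, args)
    # 3. Check the return value
    if (!output(result)) {
      stop("return value failed its type check", call. = FALSE)
    }
    result
  }
}

func <- function(a, b) a + b
func_checked <- with_checks(
  func,
  input = list(a = is.numeric, b = is.numeric),
  output = is.numeric
)

func_checked(1, 2)  # passes all checks and returns 3
res <- tryCatch(func_checked(1, TRUE), error = function(e) conditionMessage(e))
print(res)          # the logical argument is rejected before func() runs
```

The original function stays untouched, so the checks can be added or removed without editing its body.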
In the typer example above, types are specified with the type.numeric() function from the typer package. As one can guess, it indicates that the parameter should be a numeric vector. This function has additional parameters that can be used to set the minimum and maximum length of the vector, whether NA values are allowed, and so on.
The typer package provides similar functions for other basic types: type.integer(), type.logical(), type.character(), etc.
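To illustrate what such constraints amount to, here is a base R sketch of a constructor that returns a checking function. The name numeric_spec and its parameters min_len, max_len and allow_na are assumptions made for this illustration; they are not the actual arguments of type.numeric().

```r
# Hypothetical constructor (not typer's type.numeric): returns a predicate
# that checks the type, the length bounds and the NA policy of a vector.
numeric_spec <- function(min_len = 0, max_len = Inf, allow_na = TRUE) {
  force(min_len); force(max_len); force(allow_na)
  function(x) {
    is.numeric(x) &&
      length(x) >= min_len &&
      length(x) <= max_len &&
      (allow_na || !anyNA(x))
  }
}

# A single non-missing number
is_scalar_number <- numeric_spec(min_len = 1, max_len = 1, allow_na = FALSE)

is_scalar_number(3.14)      # TRUE
is_scalar_number(c(1, 2))   # FALSE: too long
is_scalar_number(NA_real_)  # FALSE: NA values are not allowed
is_scalar_number("3.14")    # FALSE: not numeric
```

Because the description is an ordinary value, it can be built once and passed around like any other R object.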
More importantly, the typer package supports checks of more complex types: data frames and lists.
The type.data.frame() function describes the expected structure of a data.frame: column names, column types, expected row count, etc.
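The kind of structural check this implies can be sketched in base R as follows. The helper check_columns and its min_rows parameter are hypothetical, and the column types are plain predicates; this is a sketch of the idea, not a description of how type.data.frame() works internally.

```r
# Hypothetical helper: returns TRUE when `df` has the expected columns and
# column types, otherwise a short message describing the first problem found.
check_columns <- function(df, cols, min_rows = 0) {
  if (!is.data.frame(df)) return("not a data.frame")
  if (nrow(df) < min_rows) return("too few rows")
  missing_cols <- setdiff(names(cols), names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  for (name in names(cols)) {
    if (!cols[[name]](df[[name]])) {
      return(paste("column", name, "has the wrong type"))
    }
  }
  TRUE
}

spec <- list(name = is.character, age = is.integer, weight = is.numeric)
people <- data.frame(
  name = c("John", "Bill"),
  age = c(30L, 40L),
  weight = c(90, 80),
  stringsAsFactors = FALSE
)

ok <- check_columns(people, spec)   # all columns match the spec
people$age <- c(30, 40)             # age becomes double instead of integer
bad <- check_columns(people, spec)  # reports the offending column
print(ok)
print(bad)
```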
The type.list() function describes the expected structure of a list parameter: element names and types. It also supports recursive structures, e.g. a list of lists of lists.
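Recursive checking of nested lists is easy to picture with a small base R sketch. Here a specification is assumed to be either a predicate function (for a leaf value) or a named list of further specifications (for a nested list). The helper matches_spec and the sample data are hypothetical and only illustrate the recursion.

```r
# Hypothetical helper: TRUE when `value` matches `spec`.
# A spec is a predicate (leaf) or a named list of specs (nested list).
matches_spec <- function(value, spec) {
  if (is.function(spec)) return(isTRUE(spec(value)))
  if (!is.list(value)) return(FALSE)
  if (!all(names(spec) %in% names(value))) return(FALSE)
  for (name in names(spec)) {
    # Recurse into each declared element
    if (!matches_spec(value[[name]], spec[[name]])) return(FALSE)
  }
  TRUE
}

person_spec <- list(
  name = is.character,
  age = is.integer,
  address = list(city = is.character, zip = is.character)
)

john <- list(
  name = "John",
  age = 30L,
  address = list(city = "Springfield", zip = "12345")
)

valid <- matches_spec(john, person_spec)    # TRUE
john$address$zip <- 12345                   # zip is now numeric
invalid <- matches_spec(john, person_spec)  # FALSE: nested element has wrong type
```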
For example, let's have a look at the following function. It accepts two parameters, a and b, which are assumed to be lists with certain elements (name, age, weight). It then concatenates this data and returns a data.frame.
func <- function(a, b) {
  rbind(
    data.frame(name = a$name, age = a$age, weight = a$weight),
    data.frame(name = b$name, age = b$age, weight = b$weight)
  )
}
Checks for this function may look as follows:
func_safe <- check_function(
  func,
  input = list(
    a = type.list(name = type.character(), age = type.integer(), weight = type.numeric()),
    b = type.list(name = type.character(), age = type.integer(), weight = type.numeric())
  ),
  # name is a factor only when data.frame() converts strings to factors
  # (the default before R 4.0.0)
  output = type.data.frame(cols = list(
    name = type.factor(),
    age = type.integer(),
    weight = type.numeric()
  ))
)
Finally, we can safely call the decorated function and be sure that the parameter types match exactly.
# All is correct
func_safe(
  list(name = "John", age = 30L, weight = 90),
  list(name = "Bill", age = 40L, weight = 80)
)
# Age is of the wrong type (double instead of integer)
func_safe(
  list(name = "John", age = 30, weight = 90),
  list(name = "Bill", age = 40, weight = 80)
)
Type checks in the typer package are implemented in a C extension, so they should not noticeably degrade performance.
Conclusions
Essentially, the typer package lets you describe the API of R functions. Type descriptions can be stored separately from function definitions. They can also be reused (for example, for complex structures such as lists or data frames).
The main advantage of the package is that it allows describing quite complicated structures in a simple way.
I believe this package may be especially useful at late development stages, when one wants to move R functions into production and be sure that they won't be broken by incorrect input (e.g. due to data quality issues).