vignettes/webs/custom_distributions.rmd
The previous tutorial introduced wrappers in distr6. This final tutorial puts together everything we’ve learnt to create your own custom distribution object (this is not the same as creating a new class!). All distributions implemented in distr6 inherit from the class `SDistribution`, which tells you that they are the ‘special distributions’ that we have implemented. `SDistribution` is an ‘abstract’ class, which means it cannot be constructed to make an `SDistribution` object; the `Distribution` class, however, can be.
The most basic distribution that can be constructed consists of a name and one of pdf or cdf, but most of the time we will also require a `ParameterSet`. We will demonstrate all of this using the running example of a custom uniform distribution.
The `self` keyword tells an object to call a method on itself. For example, we have used the method `getParameterValue()` on objects before, but often we need the object to use this method on itself, so we use `self$getParameterValue()`. This is especially important when defining d/p/q/r functions.
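To make the role of `self` concrete, here is a minimal sketch using a toy R6 class. The class `Toy` and its `width()` method are hypothetical and not part of distr6; only the `getParameterValue()` name is borrowed from the text, and its body here is a simplified stand-in.

```r
library(R6)

# Hypothetical toy class, for illustration only (not part of distr6)
Toy <- R6Class("Toy",
  public = list(
    lower = 1,
    upper = 10,
    # simplified stand-in: look up a public field by name
    getParameterValue = function(id) self[[id]],
    width = function() {
      # `self$` makes the object call its own method
      self$getParameterValue("upper") - self$getParameterValue("lower")
    }
  )
)

toy <- Toy$new()
toy$width()
```

Inside `width()` there is no external object to call the method on, so `self$` is the only way for the object to reach its own `getParameterValue()`.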
The pdf of the Uniform distribution is defined by $f(x) = 1/(b - a)$ for $a \le x \le b$, where $b$ and $a$ are the upper and lower limits respectively.
Hence our pdf function needs to get the values of these limits and account for the distribution support:
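# non-vectorised
pdf <- function(x) {
  # start from a density of zero everywhere
  pdf <- numeric(length(x))
  # look up the current limits from the distribution's own parameters
  lower <- self$getParameterValue("lower")
  upper <- self$getParameterValue("upper")
  # constant density inside [lower, upper]
  pdf[x >= lower & x <= upper] <- 1 / (upper - lower)
  return(pdf)
}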
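As a quick sanity check of this logic outside distr6, the sketch below mimics `self` with a plain list and compares the result with base R's `dunif()`. The names `mock_self` and `pdf_check` are hypothetical stand-ins introduced only for this check.

```r
# Hypothetical stand-in for `self`: a list exposing getParameterValue()
mock_self <- list(
  getParameterValue = function(id) c(lower = 1, upper = 10)[[id]]
)

# Same logic as the pdf in the text, with `self` passed in explicitly
pdf_check <- function(x, self = mock_self) {
  dens <- numeric(length(x))
  lower <- self$getParameterValue("lower")
  upper <- self$getParameterValue("upper")
  dens[x >= lower & x <= upper] <- 1 / (upper - lower)
  dens
}

# Points below, on, inside, and above the limits
x <- c(-2, 1, 5, 10, 11)
all.equal(pdf_check(x), dunif(x, min = 1, max = 10))
```

The comparison should hold at the boundaries too, since `dunif()` also treats both limits as part of the support.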
In distr6, all pdf and cdf functions take `x` as their first argument; for univariate distributions this is assumed to be a vector and for multivariate distributions a matrix.
We have a pdf that accesses parameters, but currently we have no parameters to access. To add these we use param6.
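A minimal sketch of such a parameter set is given here, assuming param6's `pset()` and `prm()` constructors and the `"reals"` support shorthand. The object name `ps` matches the one passed to `Distribution$new()` later on; the values 1 and 10 are the limits used throughout this tutorial.

```r
library(param6)

# Two real-valued parameters, matching the ids read by the pdf
ps <- pset(
  prm("lower", support = "reals", value = 1),
  prm("upper", support = "reals", value = 10)
)
```

The ids must match the strings passed to `self$getParameterValue()` in the pdf, otherwise the lookup will not find them.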
We now have the basics required to construct our custom uniform distribution; the last thing we require is the distribution support. Often the support can be omitted, in which case the default set of Reals will be used, but for the uniform distribution the support is very important.
support <- set6::Interval$new(1, 10)
type <- set6::Reals$new()
U <- Distribution$new(name = "Uniform", pdf = pdf, parameters = ps, support = support,
type = type)
Other traits are automatically filled:
U$traits
#> $type
#> ℝ
#>
#> $valueSupport
#> [1] "continuous"
#>
#> $variateForm
#> [1] "univariate"
And now we can use our distribution:
U$pdf(5)
#> [1] 0.1111111
# The log argument can be imputed with decorators
decorate(U, "CoreStatistics")
#> Uniform is now decorated with CoreStatistics
#> Uniform(lower = 1, upper = 10)
U$pdf(4, log = TRUE)
#> [1] -2.197225
# Automatically returns 0 when outside the support
U$pdf(-2)
#> [1] 0
U$pdf(11)
#> [1] 0
But the cdf returns NULL as we never supplied a function, so we could either supply one or impute it using the `FunctionImputation` decorator:
U$cdf(5)
#> NULL
decorate(U, "FunctionImputation")
#> Uniform is now decorated with FunctionImputation
#> Uniform(lower = 1, upper = 10)
U$cdf(5)
#> Results from numeric calculations are approximate only. Better results may be available.
#> Results from numeric calculations are approximate only. Better results may be available.
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 0.4444444
# The same as expected
punif(5, min = 1, max = 10)
#> [1] 0.4444444
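To see why an imputed cdf comes with an approximation warning, here is a minimal base-R sketch that obtains a cdf by numerically integrating a pdf with `integrate()`. It illustrates the general idea only and is not necessarily how the `FunctionImputation` decorator is implemented; `pdf_unif` and `cdf_numeric` are hypothetical names.

```r
# Uniform(a, b) density written without distr6
pdf_unif <- function(x, a = 1, b = 10) {
  ifelse(x >= a & x <= b, 1 / (b - a), 0)
}

# Approximate the cdf at q by integrating the pdf from the lower limit
cdf_numeric <- function(q, a = 1, b = 10) {
  integrate(function(x) pdf_unif(x, a, b), lower = a, upper = min(q, b))$value
}

cdf_numeric(5)
punif(5, min = 1, max = 10)
```

For a piecewise-constant density the two values should agree closely; for less regular densities the numerical result can drift, which is why supplying an analytic cdf is preferable when one is available.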
Finally, a whole host of other arguments can be supplied to the Distribution to make the results more precise; the full list can be seen in `?Distribution`. The example below supplies an analytic cdf together with a short name, a description and the symmetry of the distribution:
cdf <- function(x) {
  cdf <- numeric(length(x))
  lower <- self$getParameterValue("lower")
  upper <- self$getParameterValue("upper")
  # the cdf is 1 at and above the upper limit
  cdf[x >= upper] <- 1
  # index both sides so that vector inputs line up element by element
  inside <- x >= lower & x < upper
  cdf[inside] <- (x[inside] - lower) / (upper - lower)
  return(cdf)
}
U <- Distribution$new(
  name = "Uniform", short_name = "unif",
  type = set6::Reals$new(), support = set6::Interval$new(1, 10),
  symmetric = TRUE, pdf = pdf, cdf = cdf, parameters = ps,
  description = "Custom uniform distribution"
)
decorate(U, c("CoreStatistics", "ExoticStatistics", "FunctionImputation"))
#> Uniform is now decorated with CoreStatistics, ExoticStatistics, FunctionImputation
#> unif(lower = 1, upper = 10)
U$mean()
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 5.5
U$variance()
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 6.75
U$hazard(5)
#> [1] 0.25
U$rand(5)
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 6.040 3.880 1.657 2.305 1.432
U$kurtosis()
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] -1.2
U$survivalPNorm(3, 2, 6)
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 0.5981347
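The numerically computed moments can be compared against the standard closed-form results for a continuous uniform distribution on [lower, upper]. The sketch below uses base R only; `closed_form` is a hypothetical name.

```r
lower <- 1
upper <- 10

closed_form <- c(
  mean     = (lower + upper) / 2,     # midpoint of the interval
  variance = (upper - lower)^2 / 12,  # standard uniform variance
  kurtosis = -6 / 5                   # excess kurtosis, independent of the limits
)
closed_form
```

These agree with the values returned by `U$mean()`, `U$variance()` and `U$kurtosis()` in the output shown earlier in this example.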
These tutorials have covered everything from the basics of constructing an implemented `SDistribution`, through accessing and setting parameters, analysing distributions, and manipulating them with decorators and wrappers, to finally adding your own custom distribution and using decorators to analyse it. Everything we have covered also applies to the Kernels in distr6, although these have less functionality; to see which are implemented, run `listKernels()`.
The Extension Guidelines explain how to implement your own SDistribution, Kernel, Decorator or Wrapper, and the Appendices include discussions about OOP, R6, C vs. R implementation, the current API lifecycle and other design decisions. The project wiki includes design documentation and contributor guidelines; please read these before making a pull request.
We hope you find distr6 intuitive to use, but if you have any questions or want to report a bug, please don’t hesitate to raise an issue.
Good luck and happy coding!