Mathematical and statistical functions for the Categorical distribution, which is commonly used in supervised classification.
Returns an R6 object inheriting from class SDistribution.
The Categorical distribution parameterised with a given support set, \(x_1,...,x_k\), and respective probabilities, \(p_1,...,p_k\), is defined by the pmf, $$f(x_i) = p_i$$ for \(p_i \ge 0\), \(i = 1,\ldots,k\), and \(\sum p_i = 1\).
Sampling from this distribution is performed with the sample function, with the elements given as the support set and the probabilities taken from the probs parameter. The cdf and quantile assume that the elements are supplied in an indexed order (otherwise the results are meaningless).
The number of points in the distribution cannot be changed after construction.
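As a rough illustration of the behaviour described above, the following base-R sketch mimics the pmf, cdf, quantile and sampling for a small categorical distribution. It does not call distr6, and it is not a description of the package internals: the helper names cat_pmf, cat_cdf and cat_quantile are hypothetical. The elements and probabilities mirror the example values used on this page, with the numeric element 2 stored as a character string for simplicity.

```r
# Base-R sketch only; these helpers are illustrative and are not distr6 functions.
elements <- c("Bapple", "Banana", "2")   # support, in the order supplied
probs    <- c(0.1, 0.2, 0.7)             # one probability per element
stopifnot(length(elements) == length(probs), abs(sum(probs) - 1) < 1e-8)

# pmf: f(x_i) = p_i, and 0 for values outside the support
cat_pmf <- function(x) {
  idx <- match(x, elements)
  ifelse(is.na(idx), 0, probs[idx])
}

# cdf: cumulative probability up to and including x, in the supplied order
cat_cdf <- function(x) {
  cumsum(probs)[match(x, elements)]
}

# quantile: first element whose cumulative probability reaches q
# (a small tolerance guards against floating-point round-off in cumsum)
cat_quantile <- function(q) {
  cp <- cumsum(probs)
  idx <- vapply(q, function(qi) which(cp >= qi - 1e-12)[1], integer(1))
  elements[idx]
}

# sampling: base R's sample() with the elements as support and probs as weights
set.seed(42)  # only to make the draw reproducible
draws <- sample(elements, size = 10, replace = TRUE, prob = probs)
```

Under these assumptions cat_cdf("Banana") evaluates to 0.1 + 0.2 and cat_quantile(0.42) returns the third element, which agrees with the output shown in the examples on this page. Reordering the elements changes both results, which is why the supplied order matters for the cdf and quantile.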
The distribution is supported on \(x_1,...,x_k\).
Cat(elements = 1, probs = 1)
N/A
N/A
McLaughlin, M. P. (2001). A compendium of common probability distributions (pp. 2014-01). Michael P. McLaughlin.
Other discrete distributions: Arrdist, Bernoulli, Binomial, Degenerate, DiscreteUniform, EmpiricalMV, Empirical, Geometric, Hypergeometric, Logarithmic, Matdist, Multinomial, NegativeBinomial, WeightedDiscrete
Other univariate distributions: Arcsine, Arrdist, Bernoulli, BetaNoncentral, Beta, Binomial, Cauchy, ChiSquaredNoncentral, ChiSquared, Degenerate, DiscreteUniform, Empirical, Erlang, Exponential, FDistributionNoncentral, FDistribution, Frechet, Gamma, Geometric, Gompertz, Gumbel, Hypergeometric, InverseGamma, Laplace, Logarithmic, Logistic, Loglogistic, Lognormal, Matdist, NegativeBinomial, Normal, Pareto, Poisson, Rayleigh, ShiftedLoglogistic, StudentTNoncentral, StudentT, Triangular, Uniform, Wald, Weibull, WeightedDiscrete
distr6::Distribution -> distr6::SDistribution -> Categorical
name
Full name of distribution.
short_name
Short name of distribution for printing.
description
Brief description of the distribution.
alias
Alias of the distribution.
properties
Returns distribution properties, including skewness type and symmetry.
Inherited methods
distr6::Distribution$cdf()
distr6::Distribution$confidence()
distr6::Distribution$correlation()
distr6::Distribution$getParameterValue()
distr6::Distribution$iqr()
distr6::Distribution$liesInSupport()
distr6::Distribution$liesInType()
distr6::Distribution$median()
distr6::Distribution$parameters()
distr6::Distribution$pdf()
distr6::Distribution$prec()
distr6::Distribution$print()
distr6::Distribution$quantile()
distr6::Distribution$rand()
distr6::Distribution$setParameterValue()
distr6::Distribution$stdev()
distr6::Distribution$strprint()
distr6::Distribution$summary()
distr6::Distribution$workingSupport()
new()
Creates a new instance of this R6 class.
Categorical$new(elements = NULL, probs = NULL, decorators = NULL)
mean()
The arithmetic mean of a (discrete) probability distribution X is the expectation $$E_X(X) = \sum_x p_X(x) \cdot x$$ with an integration analogue for continuous distributions.
mode()
The mode of a probability distribution is the point at which the pdf is a local maximum. A distribution can be unimodal (one maximum) or multimodal (several maxima).
variance()
The variance of a distribution is defined by the formula $$var_X = E_X[X^2] - E_X[X]^2$$ where \(E_X\) is the expectation of distribution X. If the distribution is multivariate, the covariance matrix is returned.
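The mean and variance formulas above can be checked numerically for a small discrete distribution. The support x and probabilities p below are hypothetical values with a numeric support, chosen purely for illustration; the printed summary on this page reports NaN for these statistics in the Categorical example.

```r
# Hypothetical numeric support and probabilities, for illustration only.
x <- c(1, 2, 3)
p <- c(0.1, 0.2, 0.7)

mu  <- sum(p * x)      # E[X]: probability-weighted sum of the support
ex2 <- sum(p * x^2)    # E[X^2]
v   <- ex2 - mu^2      # variance as E[X^2] - E[X]^2

# Equivalent form: expected squared deviation from the mean
v_alt <- sum(p * (x - mu)^2)
```

Both v and v_alt should agree up to floating-point error, since the two expressions are algebraically identical.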
skewness()
The skewness of a distribution is defined by the third standardised moment, $$sk_X = E_X\left[\left(\frac{x - \mu}{\sigma}\right)^3\right]$$ where \(E_X\) is the expectation of distribution X, \(\mu\) is the mean of the distribution and \(\sigma\) is the standard deviation of the distribution.
kurtosis()
The kurtosis of a distribution is defined by the fourth standardised moment, $$k_X = E_X\left[\left(\frac{x - \mu}{\sigma}\right)^4\right]$$ where \(E_X\) is the expectation of distribution X, \(\mu\) is the mean of the distribution and \(\sigma\) is the standard deviation of the distribution. Excess kurtosis is kurtosis - 3.
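The standardised-moment definitions of skewness and kurtosis can be sketched in base R as follows, reading the power as applying to the whole standardised term (x - mu) / sigma. The support x and probabilities p are hypothetical illustration values, not taken from the package.

```r
# Hypothetical numeric support and probabilities, for illustration only.
x <- c(1, 2, 3)
p <- c(0.1, 0.2, 0.7)

mu    <- sum(p * x)                   # mean
sigma <- sqrt(sum(p * (x - mu)^2))    # standard deviation
z     <- (x - mu) / sigma             # standardised support points

skew        <- sum(p * z^3)   # third standardised moment
kurt        <- sum(p * z^4)   # fourth standardised moment
excess_kurt <- kurt - 3       # excess kurtosis, relative to the Normal value of 3
```

With most of the mass on the largest value, the left tail is longer, so skew comes out negative for these inputs.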
entropy()
The entropy of a (discrete) distribution is defined by $$- \sum_x f_X(x) \log(f_X(x))$$ where \(f_X\) is the pdf of distribution X, with an integration analogue for continuous distributions.
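A minimal base-R sketch of the discrete entropy formula is given below. The function name discrete_entropy and its base argument are assumptions made for this example; the formula above does not fix the base of the logarithm, so the sketch defaults to the natural logarithm and lets the caller choose another base.

```r
# Illustrative helper, not a distr6 function.
# p:    vector of probabilities summing to 1
# base: logarithm base (assumed default: natural log)
discrete_entropy <- function(p, base = exp(1)) {
  p <- p[p > 0]                    # terms with p = 0 contribute 0 by convention
  -sum(p * log(p, base = base))
}

h_bits <- discrete_entropy(c(0.1, 0.2, 0.7), base = 2)  # entropy in bits
```

A uniform distribution over k points maximises this quantity, and a distribution concentrated on a single point has entropy 0.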
mgf()
The moment generating function is defined by $$mgf_X(t) = E_X[exp(xt)]$$ where X is the distribution and \(E_X\) is the expectation of the distribution X.
cf()
The characteristic function is defined by $$cf_X(t) = E_X[exp(xti)]$$ where X is the distribution and \(E_X\) is the expectation of the distribution X.
pgf()
The probability generating function is defined by $$pgf_X(z) = E_X[z^x]$$ where X is the distribution and \(E_X\) is the expectation of the distribution X.
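The three generating functions can be evaluated directly for a small discrete distribution, as in the base-R sketch below. It uses the standard definitions E[exp(xt)], E[exp(xti)] and E[z^x] for the mgf, cf and pgf respectively. The support x (non-negative integers, as the pgf requires) and probabilities p are hypothetical illustration values, and the function names are not distr6 methods.

```r
# Hypothetical support on non-negative integers, for illustration only.
x <- c(0, 1, 2)
p <- c(0.1, 0.2, 0.7)

mgf_fun <- function(t) sum(p * exp(x * t))        # E[exp(xt)]
cf_fun  <- function(t) sum(p * exp(1i * x * t))   # E[exp(xti)], complex-valued
pgf_fun <- function(z) sum(p * z^x)               # E[z^x]

# Sanity checks that follow from the definitions:
# all three equal 1 at their "neutral" argument, and pgf(0) = P(X = 0).
at_zero <- c(mgf = mgf_fun(0), pgf_at_one = pgf_fun(1), pgf_at_zero = pgf_fun(0))

# The first derivative of the mgf at 0 is the mean; approximate it numerically.
h <- 1e-5
mean_from_mgf <- (mgf_fun(h) - mgf_fun(-h)) / (2 * h)
```

The numerical derivative in the last line should be close to sum(p * x), which is one way to confirm the mgf has been written down correctly.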
## ------------------------------------------------
## Method `Categorical$new`
## ------------------------------------------------
# Note probabilities are automatically normalised (if not vectorised)
x <- Categorical$new(elements = list("Bapple", "Banana", 2), probs = c(0.2, 0.4, 1))
# Length of elements and probabilities cannot be changed after construction
x$setParameterValue(probs = c(0.1, 0.2, 0.7))
# d/p/q/r
x$pdf(c("Bapple", "Carrot", 1, 2))
#> [1] 0.1 0.0 0.0 0.7
x$cdf("Banana") # Assumes ordered in construction
#> [1] 0.3
x$quantile(0.42) # Assumes ordered in construction
#> [1] 2
x$rand(10)
#> [[1]]
#> [1] "Banana"
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] "Bapple"
#>
#> [[5]]
#> [1] "Banana"
#>
#> [[6]]
#> [1] 2
#>
#> [[7]]
#> [1] 2
#>
#> [[8]]
#> [1] "Banana"
#>
#> [[9]]
#> [1] 2
#>
#> [[10]]
#> [1] 2
#>
# Statistics
x$mode()
#> [1] 2
summary(x)
#> Categorical Probability Distribution.
#> Parameterised with:
#>
#> Id Support Value Tags
#> <char> <char> <list> <list>
#> 1: elements 𝕍 <list[3]> required
#> 2: probs [0,1]^n 0.1,0.2,0.7 required
#>
#>
#> Quick Statistics
#> Mean: NaN
#> Variance: NaN
#> Skewness: NaN
#> Ex. Kurtosis: NaN
#>
#> Support: {2, Banana, Bapple} Scientific Type: 𝕍
#>
#> Traits: discrete; univariate
#> Properties: asymmetric; undefined; undefined