Mathematical and statistical functions for the Categorical distribution, which is commonly used in supervised classification.

## Value

Returns an R6 object inheriting from class SDistribution.

## Details

The Categorical distribution parameterised with a given support set, $$x_1,\ldots,x_k$$, and respective probabilities, $$p_1,\ldots,p_k$$, is defined by the pmf, $$f(x_i) = p_i$$ for $$i = 1,\ldots,k$$, where $$\sum_{i=1}^{k} p_i = 1$$.

Sampling from this distribution is performed with the sample function, using the elements as the support set and the probs parameter as the probabilities. The cdf and quantile assume that the elements are supplied in an indexed order (otherwise the results are meaningless).
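The behaviour described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names pmf, cdf and quant, and the variables elements, probs and labels, are hypothetical and chosen only for this example.

```r
# Illustrative inputs (taken from the examples on this page)
elements <- list("Bapple", "Banana", 2)
probs <- c(0.1, 0.2, 0.7)

# Compare categories by their character labels so mixed types can be matched
labels <- vapply(elements, as.character, character(1))

# pmf: probability of each supplied value, 0 for values outside the support
pmf <- function(x) {
  idx <- match(as.character(x), labels)
  ifelse(is.na(idx), 0, probs[idx])
}

# cdf: cumulative probability up to and including x, in construction order
# (returns NA for values outside the support in this sketch)
cdf <- function(x) {
  idx <- match(as.character(x), labels)
  cumsum(probs)[idx]
}

# quantile: first element whose cumulative probability reaches p
quant <- function(p) {
  elements[[which(cumsum(probs) >= p)[1]]]
}

# Sampling with base R's sample(), weighting each element by its probability
set.seed(1)
draws <- sample(elements, size = 10, replace = TRUE, prob = probs)
```

With these inputs, pmf(c("Bapple", "Carrot", 1, 2)) gives 0.1, 0, 0, 0.7, cdf("Banana") gives 0.3 and quant(0.42) gives 2, which agrees with the example output on this page. Reordering elements changes the cdf and quantile results, which is why the order of construction matters.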

The number of points in the distribution cannot be changed after construction.

## Distribution support

The distribution is supported on $$x_1,...,x_k$$.

## Default Parameterisation

Cat(elements = 1, probs = 1)

## References

McLaughlin, M. P. (2001). A compendium of common probability distributions (pp. 2014-01). Michael P. McLaughlin.

## See also

Other discrete distributions: Arrdist, Bernoulli, Binomial, Degenerate, DiscreteUniform, EmpiricalMV, Empirical, Geometric, Hypergeometric, Logarithmic, Matdist, Multinomial, NegativeBinomial, WeightedDiscrete

Other univariate distributions: Arcsine, Arrdist, Bernoulli, BetaNoncentral, Beta, Binomial, Cauchy, ChiSquaredNoncentral, ChiSquared, Degenerate, DiscreteUniform, Empirical, Erlang, Exponential, FDistributionNoncentral, FDistribution, Frechet, Gamma, Geometric, Gompertz, Gumbel, Hypergeometric, InverseGamma, Laplace, Logarithmic, Logistic, Loglogistic, Lognormal, Matdist, NegativeBinomial, Normal, Pareto, Poisson, Rayleigh, ShiftedLoglogistic, StudentTNoncentral, StudentT, Triangular, Uniform, Wald, Weibull, WeightedDiscrete

## Super classes

distr6::Distribution -> distr6::SDistribution -> Categorical

## Public fields

name

Full name of distribution.

short_name

Short name of distribution for printing.

description

Brief description of the distribution.

alias

Alias of the distribution.

## Active bindings

properties

Returns distribution properties, including skewness type and symmetry.

## Methods

### Method new()

Creates a new instance of this R6 class.

Categorical$new(elements = NULL, probs = NULL, decorators = NULL)

#### Arguments

elements

list()
Categories in the distribution, see examples.

probs

numeric()
Probabilities of respective categories occurring.

decorators

(character())
Decorators to add to the distribution during construction.

#### Examples

# Note probabilities are automatically normalised (if not vectorised)
x <- Categorical$new(elements = list("Bapple", "Banana", 2), probs = c(0.2, 0.4, 1))

# Length of elements and probabilities cannot be changed after construction
x$setParameterValue(probs = c(0.1, 0.2, 0.7))

# d/p/q/r
x$pdf(c("Bapple", "Carrot", 1, 2))
x$cdf("Banana") # Assumes ordered in construction
x$quantile(0.42) # Assumes ordered in construction
x$rand(10)

# Statistics
x$mode()

summary(x)

### Method mean()

The arithmetic mean of a (discrete) probability distribution X is the expectation $$E_X(X) = \sum p_X(x) \cdot x$$ with an integration analogue for continuous distributions.

### Method mode()

#### Arguments

which

(character(1) | numeric(1))
Ignored if distribution is unimodal. Otherwise "all" returns all modes, otherwise specifies which mode to return.

### Method variance()

The variance of a distribution is defined by the formula $$var_X = E_X[X^2] - E_X[X]^2$$ where $$E_X$$ is the expectation of distribution X. If the distribution is multivariate the covariance matrix is returned.

#### Arguments

...

Unused.
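As a worked illustration of the mean and variance formulas, the sketch below applies them to a small numeric support in base R. The values in x and p are hypothetical and chosen only for illustration. This is plain arithmetic on the formulas, not a claim about what the Categorical methods return; the summary output in the Examples section reports NaN for these statistics.

```r
# Hypothetical numeric categories and their probabilities
x <- c(1, 2, 5)
p <- c(0.2, 0.3, 0.5)

# Mean: sum of each value weighted by its probability
mean_x <- sum(p * x)

# Variance: E[X^2] - E[X]^2
var_x <- sum(p * x^2) - mean_x^2
```

Here the mean is 0.2 + 0.6 + 2.5 = 3.3 and the variance is 13.9 - 3.3^2 = 3.01.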

### Method kurtosis()

The kurtosis of a distribution is defined by the fourth standardised moment, $$k_X = E_X\left[\left(\frac{x - \mu}{\sigma}\right)^4\right]$$ where $$E_X$$ is the expectation of distribution X, $$\mu$$ is the mean of the distribution and $$\sigma$$ is the standard deviation of the distribution. Excess Kurtosis is Kurtosis - 3.

### Method entropy()

#### Arguments

base

(integer(1))
Base of the entropy logarithm, default = 2 (Shannon entropy)

...

Unused.
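The role of the base argument can be illustrated with a short base-R sketch of the entropy of a probability vector. The function name entropy_sketch is hypothetical, and this is an illustration of the standard definition, $$-\sum p_i \log_b(p_i)$$, rather than the package's own code.

```r
# Entropy of a probability vector p in the given logarithm base.
# Zero-probability categories contribute nothing, so they are dropped first.
entropy_sketch <- function(p, base = 2) {
  p <- p[p > 0]
  -sum(p * log(p, base = base))
}

# Two equally likely categories carry exactly one bit of entropy
h_two <- entropy_sketch(c(0.5, 0.5))

# Four equally likely categories carry two bits
h_four <- entropy_sketch(rep(0.25, 4))
```

Changing base only rescales the result; for example, base = exp(1) gives the entropy in nats instead of bits.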

### Method mgf()

The moment generating function is defined by $$mgf_X(t) = E_X[exp(xt)]$$ where X is the distribution and $$E_X$$ is the expectation of the distribution X.

#### Arguments

t

(integer(1))
t integer to evaluate function at.

...

Unused.
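For a discrete distribution on a numeric support, the expectation in the mgf definition reduces to a finite sum. The base-R sketch below evaluates it directly; x, p and mgf_sketch are hypothetical names used only for illustration, and the code is not a statement about what the Categorical method itself returns.

```r
# Hypothetical numeric categories and their probabilities
x <- c(1, 2, 5)
p <- c(0.2, 0.3, 0.5)

# mgf at t: expectation of exp(x * t) under the probabilities p
mgf_sketch <- function(t) sum(p * exp(x * t))

# The mgf is always 1 at t = 0 because the probabilities sum to 1
at_zero <- mgf_sketch(0)

# A central difference at 0 approximates the first moment (the mean)
h <- 1e-5
first_moment <- (mgf_sketch(h) - mgf_sketch(-h)) / (2 * h)
```

The numerical derivative at zero recovers the mean of this support, 3.3, up to floating-point error.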

### Method pgf()

The probability generating function is defined by $$pgf_X(z) = E_X[z^x]$$ where X is the distribution and $$E_X$$ is the expectation of the distribution X.

### Method clone()

The objects of this class are cloneable with this method.

#### Arguments

deep

Whether to make a deep clone.

## Examples


## ------------------------------------------------
## Method Categorical$new
## ------------------------------------------------

# Note probabilities are automatically normalised (if not vectorised)
x <- Categorical$new(elements = list("Bapple", "Banana", 2), probs = c(0.2, 0.4, 1))

# Length of elements and probabilities cannot be changed after construction
x$setParameterValue(probs = c(0.1, 0.2, 0.7))

# d/p/q/r
x$pdf(c("Bapple", "Carrot", 1, 2))
#> [1] 0.1 0.0 0.0 0.7
x$cdf("Banana") # Assumes ordered in construction
#> [1] 0.3
x$quantile(0.42) # Assumes ordered in construction
#> [1] 2
x$rand(10)
#> [[1]]
#> [1] "Banana"
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] "Bapple"
#>
#> [[5]]
#> [1] "Banana"
#>
#> [[6]]
#> [1] 2
#>
#> [[7]]
#> [1] 2
#>
#> [[8]]
#> [1] "Banana"
#>
#> [[9]]
#> [1] 2
#>
#> [[10]]
#> [1] 2
#>

# Statistics
x$mode()
#> [1] 2

summary(x)
#> Categorical Probability Distribution.
#> Parameterised with:
#>
#>          Id Support       Value     Tags
#>      <char>  <char>      <list>   <list>
#> 1: elements       𝕍   <list[3]> required
#> 2:    probs [0,1]^n 0.1,0.2,0.7 required
#>
#>
#> Quick Statistics
#> 	Mean:		NaN
#> 	Variance:	NaN
#> 	Skewness:	NaN
#> 	Ex. Kurtosis:	NaN
#>
#> Support: {2, Banana, Bapple} 	Scientific Type: 𝕍
#>
#> Traits:		discrete; univariate
#> Properties:	asymmetric; undefined; undefined