EvidentialFlux.jl

Evidential Deep Learning is a way to generate predictions and the uncertainty associated with them in a single forward pass. This is in stark contrast to traditional Bayesian neural networks, which are typically based on Variational Inference, Markov Chain Monte Carlo, Monte Carlo Dropout, or Ensembles.

New to EvidentialFlux?

See the Choosing the Right Layer guide for practical advice on which layer and loss to use for your problem, with real-world examples.

How it works

The framework places a conjugate prior over the parameters of a likelihood function. The neural network predicts the prior's hyperparameters, and the marginal likelihood (prior integrated out) serves as the training loss. This yields calibrated uncertainty in a single forward pass.

Layer  Likelihood             Prior                 Marginal             Use case
NIG    Normal                 Normal-Inverse-Gamma  Student-T            Real-valued regression
PG     Poisson                Gamma                 Negative Binomial    Count regression
EG     Exponential            Gamma                 Lomax                Positive continuous regression
BB     Binomial               Beta                  Beta-Binomial        Proportion estimation
BNB    Negative Binomial      Beta                  Beta-NB              Overdispersed counts
ZIP    Zero-Inflated Poisson  Beta × Gamma          Zero-Inflated NB     Zero-inflated counts
VM     Von Mises              Von Mises             Von Mises marginal   Directional/circular data
DIR    Categorical            Dirichlet             Dir-Multinomial      Classification
FDIR   Categorical            Flexible Dirichlet    Mixture of Dir-Mult  Calibrated classification
MVE    Normal                 (point estimate)      Normal               Simple variance estimation

Quick example

using Flux, EvidentialFlux
using Statistics: mean

# Build a model with an evidential output layer
model = Chain(Dense(10 => 64, relu), Dense(64 => 64, relu), NIG(64 => 1))
opt_state = Flux.setup(AdamW(1e-3), model)

# Training: use predict + loss
for epoch in 1:1000
    loss, grads = Flux.withgradient(model) do m
        γ, ν, α, β = splitnig(m(x))
        mean(nigloss_scaled(y, γ, ν, α, β, 0.01))
    end
    Flux.update!(opt_state, model, grads[1])
end

# Inference: use predictive for the full picture
r = predictive(model, x_test)
r.ŷ          # predicted value
r.epistemic  # model uncertainty (high when extrapolating)
r.aleatoric  # data noise (high when data is inherently noisy)

Deep Evidential Regression

Deep Evidential Regression [amini2020] applies the principles of Evidential Deep Learning to regression problems.

It works by placing a prior distribution over the parameters $\mathbf{\theta} = \{\mu, \sigma^2\}$ of the likelihood. We observe a dataset $\mathcal{D}=\{x_i, y_i\}_{i=1}^N$ where each $y_i$ is assumed to be drawn i.i.d. from a Gaussian distribution.

\[y_i \sim \mathcal{N}(\mu_i, \sigma^2_i)\]

We can express the posterior over the parameters $\mathbf{\theta}=\{\mu, \sigma^2\}$ as $p(\mathbf{\theta}|\mathcal{D})$. We seek an approximation $q(\mu, \sigma^2) = q(\mu)\,q(\sigma^2)$, i.e., we assume that the posterior factorizes. This lets us write $\mu\sim\mathcal{N}(\gamma,\sigma^2\nu^{-1})$ and $\sigma^2\sim\Gamma^{-1}(\alpha,\beta)$. Thus, we can now form

\[p(\mathbf{\theta}|\mathbf{m})=\mathcal{N}(\gamma,\sigma^2\nu^{-1})\,\Gamma^{-1}(\alpha,\beta)=\mathcal{N}\text{-}\Gamma^{-1}(\gamma,\nu,\alpha,\beta)\]

which can be plugged into the posterior below.

\[p(\mathbf{\theta}|\mathbf{m}, y_i) = \frac{p(y_i|\mathbf{\theta}, \mathbf{m})p(\mathbf{\theta}|\mathbf{m})}{p(y_i|\mathbf{m})}\]

Now, since the likelihood is Gaussian, we would like to put a conjugate prior on the parameters of that likelihood, and the Normal-Inverse-Gamma $\mathcal{N}\text{-}\Gamma^{-1}(\gamma, \nu, \alpha, \beta)$ fits the bill. This allows us to express the prediction and the associated uncertainties as below.

\[\underset{Prediction}{\underbrace{\mathbb{E}[\mu]=\gamma}}~~~~ \underset{Aleatoric}{\underbrace{\mathbb{E}[\sigma^2]=\frac{\beta}{\alpha-1}}}~~~~ \underset{Epistemic}{\underbrace{\text{Var}[\mu]=\frac{\beta}{\nu(\alpha-1)}}}\]

The NIG layer outputs four tensors for each target variable, namely $\gamma,\nu,\alpha,\beta$. This means that in one forward pass we can estimate the prediction, the heteroskedastic aleatoric uncertainty, and the epistemic uncertainty.
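The moment formulas above translate directly into code. A dependency-free sketch, with illustrative parameter values (not from a trained model) and hypothetical helper names:

```julia
# Moments of the Normal-Inverse-Gamma distribution (requires α > 1)
prediction(γ, ν, α, β) = γ                  # E[μ]
aleatoric(γ, ν, α, β)  = β / (α - 1)        # E[σ²], data noise
epistemic(γ, ν, α, β)  = β / (ν * (α - 1))  # Var[μ], model uncertainty

γ, ν, α, β = 2.0, 4.0, 3.0, 1.0  # illustrative values for one output
prediction(γ, ν, α, β)  # 2.0
aleatoric(γ, ν, α, β)   # 0.5
epistemic(γ, ν, α, β)   # 0.125
```

Note that the epistemic term shrinks as $\nu$ grows, while the aleatoric term does not depend on $\nu$ at all.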

NIG loss variants

Three loss functions are available for NIG regression, each improving on the previous:

  • nigloss: the standard evidential regression loss, model fit plus an evidence regularizer weighted by λ.
  • nigloss_scaled: the corrected loss of Meinert et al. (2022), which normalizes the prediction error by the aleatoric uncertainty before scaling by evidence.
  • nigloss_ureg: the uncertainty-regularized loss of Ye et al. (AAAI 2024), which keeps gradients non-zero in high-uncertainty regions.

Deep Evidential Classification

We follow [sensoy2018] in our implementation of Deep Evidential Classification. The neural network layer outputs the $\alpha_k$ representing the parameters of a Dirichlet distribution. These parameters have the additional interpretation $\alpha_k = e_k + 1$ where $e_k$ is the evidence for class $k$. Further, it holds that $e_k > 0$, which is why we model them with a softplus activation function.

Since we are now constructing a network layer that outputs evidence for each class, we can apply Dempster-Shafer Theory (DST) to those outputs. DST is a generalization of the Bayesian framework and works by assigning belief mass to states of interest. We can concretize this notion further with Subjective Logic (SL), which places a Dirichlet distribution over these belief masses. Belief masses are defined as $b_k=e_k/S$ where $e_k$ is the evidence for state $k$ and $S=\sum_i^K(e_i+1)$. Further, SL requires that all $K+1$ masses sum to 1. In practice this means that $u+\sum_k^K b_k=1$, where $u$ represents the uncertainty over the $K$ possible states, or the "I don't know" class.

Now, since $S=\sum_i^K(e_i+1)=\sum_i^K\alpha_i$, SL refers to $S$ as the Dirichlet strength, which is the sum of all the evidence collected in favor of the $K$ outcomes. Consequently, the uncertainty $u=K/S$ becomes 1 when no evidence is available. Therefore, $u$ is a normalized quantity ranging between 0 and 1.
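These quantities are simple functions of the Dirichlet parameters. A dependency-free sketch with hypothetical helper names and illustrative α values:

```julia
# Subjective Logic quantities from Dirichlet parameters, where αₖ = eₖ + 1
beliefs(α)     = (α .- 1) ./ sum(α)   # bₖ = eₖ / S
uncertainty(α) = length(α) / sum(α)   # u = K / S

uncertainty([1.0, 1.0, 1.0])        # 1.0 — no evidence, total uncertainty
u = uncertainty([31.0, 1.0, 1.0])   # 3/33 ≈ 0.09 — strong evidence for class 1
sum(beliefs([31.0, 1.0, 1.0])) + u  # ≈ 1.0 — belief masses and u sum to one
```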

DIR loss variants

Three loss functions are available for Dirichlet classification, plus one for the Flexible Dirichlet:

  • dirloss: the regularized type II maximum likelihood for one-hot targets.
  • dirloss_cor: extends dirloss with a correction term that prevents "learning freeze" when the ground-truth class has low evidence.
  • dirmultloss: the Dirichlet-Multinomial negative log-likelihood for count-vector targets, needing no extra regularization.
  • fdirloss: the expected Brier score loss for the FDIR layer.

API Reference

Layers

EvidentialFlux.NIGType
NIG(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
NIG(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements the NormalInverseGamma Evidential distribution whose forward pass is simply given by:

y = W * x .+ bias

The input x should be a vector of length in, a batch of vectors represented as an in × N matrix, or any array with size(x,1) == in. The output y will be a vector of length out*4, or a batch with size(y) == (out*4, size(x)[2:end]...). The function σ is applied to each row/element of y except the first out ones. Keyword bias=false will switch off the trainable bias for the layer. The weight matrix is initialised as W = init(out*4, in), calling the function given to keyword init, with default [glorot_uniform]. The weight matrix and/or the bias vector may also be provided explicitly. Remember that in this case the number of rows in the weight matrix W MUST be a multiple of 4. The same holds true for the length of the bias vector.

Arguments:

  • (in, out): number of input and output neurons
  • σ: The function to use to secure positive only outputs which defaults to the softplus function.
  • init: The function to use to initialise the weight matrix.
  • bias: Whether to include a trainable bias vector.
source
EvidentialFlux.PGType
PG(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
PG(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements a Poisson-Gamma evidential model for count regression. Places a Gamma(α, β) prior over the Poisson rate parameter λ, yielding a Negative Binomial marginal likelihood.

The output has shape (out*2, batch...) containing [α, β] stacked vertically, where both α and β are passed through σ to ensure positivity.

Use with pgloss for training and splitpg / split_params(PG, y) to decompose the output. The expected count is E[λ] = α/β.

Arguments:

  • (in, out): number of input features and output count targets
  • σ: activation ensuring positivity (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
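As a sketch of how the pieces fit together, a PG head can be wired up like the NIG quick example; the architecture, shapes, and regularization weight below are illustrative:

```julia
using Flux, EvidentialFlux
using Statistics: mean

# Count regression: a small MLP with a PG evidential head
model = Chain(Dense(8 => 32, relu), PG(32 => 1))
x = randn(Float32, 8, 16)        # 16 samples with 8 features
y = Float32.(rand(0:10, 1, 16))  # observed counts

α, β = splitpg(model(x))          # each of shape (1, 16)
Ey   = α ./ β                     # expected count E[λ] = α/β
loss = mean(pgloss(y, α, β, 0.5)) # NB NLL + confidence regularizer
```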
EvidentialFlux.EGType
EG(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
EG(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements an Exponential-Gamma evidential model for positive continuous regression (durations, prices, distances, etc.). Places a Gamma(α, β) prior over the Exponential rate parameter λ, yielding a Lomax (Pareto Type II) marginal likelihood.

The output has shape (out*2, batch...) containing [α, β] stacked vertically, where both α and β are passed through σ to ensure positivity.

Use with egloss for training and spliteg / split_params(EG, y) to decompose the output. The expected value is E[y] = β/(α-1) for α > 1.

Arguments:

  • (in, out): number of input features and output positive targets
  • σ: activation ensuring positivity (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
EvidentialFlux.BBType
BB(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
BB(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements a Binomial-Beta evidential model for proportion/success-rate estimation. Places a Beta(α, β) prior over the Binomial success probability p, yielding a Beta-Binomial marginal likelihood.

The output has shape (out*2, batch...) containing [α, β] stacked vertically, where both α and β are passed through σ to ensure positivity.

Use with bbloss for training and splitbb / split_params(BB, y) to decompose the output. The predicted probability is E[p] = α/(α+β).

Arguments:

  • (in, out): number of input features and output proportion targets
  • σ: activation ensuring positivity (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
EvidentialFlux.BNBType
BNB(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
BNB(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements a Beta-Negative Binomial evidential model for overdispersed count regression. Places a Beta(α, β) prior over the Negative Binomial success probability p, with a learned dispersion parameter r.

The output has shape (out*3, batch...) containing [r, α, β] stacked vertically, where all three are passed through σ to ensure positivity.

Use with bnbloss for training and splitbnb / split_params(BNB, y) to decompose the output. The predicted count at the Beta mean is r·α/β.

Arguments:

  • (in, out): number of input features and output count targets
  • σ: activation ensuring positivity (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
EvidentialFlux.ZIPType
ZIP(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
ZIP(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements a Zero-Inflated Poisson evidential model for count data with excess zeros. Places independent priors on the zero-inflation probability π ~ Beta(απ, βπ) and the Poisson rate λ ~ Gamma(αλ, βλ), yielding a closed-form marginal likelihood that is a zero-inflated Negative Binomial.

The output has shape (out*4, batch...) containing [α_π, β_π, α_λ, β_λ] stacked vertically, where all four are passed through σ to ensure positivity.

Use with ziploss for training and splitzip / split_params(ZIP, y) to decompose the output. The predicted count is E[Y] = β_π/(α_π+β_π) · α_λ/β_λ.

Arguments:

  • (in, out): number of input features and output count targets
  • σ: activation ensuring positivity (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
EvidentialFlux.VMType
VM(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
VM(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements a Von Mises evidential model for directional/circular regression on angles in [-π, π). Places a Von Mises(μ₀, κ₀) prior over the mean direction μ, with a learned observation concentration κ, yielding a closed-form marginal likelihood on the circle.

The output has shape (out*3, batch...) containing [μ₀, κ₀, κ] stacked vertically, where κ₀ and κ are passed through σ to ensure positivity. The mean direction μ₀ is left unconstrained (periodicity is handled by the cosine in the loss function).

Use with vmloss for training and splitvm / split_params(VM, y) to decompose the output. The predicted direction is μ₀.

Arguments:

  • (in, out): number of input features and output angular targets
  • σ: activation ensuring positivity for concentration parameters (default: softplus)
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include a trainable bias vector
source
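The circular distance used by the regularizer is what makes the unconstrained μ₀ safe: angles that wrap across ±π are still recognized as close. A dependency-free sketch (circdist is a hypothetical helper mirroring the loss's distance term):

```julia
# Circular distance 1 - cos(θ - μ): 0 for identical angles, 2 for opposite ones
circdist(θ, μ) = 1 - cos(θ - μ)

circdist(0.1, -0.1)          # ≈ 0.02 — nearby angles
circdist(π - 0.1, -π + 0.1)  # ≈ 0.02 — also nearby: wraps across ±π
circdist(0.0, π)             # 2.0 — maximally distant
```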
EvidentialFlux.DIRType
DIR(in => out; bias=true, init=Flux.glorot_uniform)
DIR(W::AbstractMatrix, [bias])

A linear layer with a softplus activation at the end, implementing the Dirichlet evidential distribution. The number of output nodes should correspond to the number of classes you wish to model. Use this layer to model a Multinomial likelihood with a Dirichlet prior; the posterior is then also a Dirichlet distribution, and the type II maximum likelihood, i.e., the marginal likelihood, is a Dirichlet-Multinomial distribution. The forward pass is simply given by:

y = softplus.(W * x .+ bias)

The input x should be a vector of length in, a batch of vectors represented as an in × N matrix, or any array with size(x,1) == in. The output y will be a vector of length out, or a batch with size(y) == (out, size(x)[2:end]...). The function softplus is applied to each row/element of y. Keyword bias=false will switch off the trainable bias for the layer. The weight matrix is initialised as W = init(out, in), calling the function given to keyword init, with default [glorot_uniform]. The weight matrix and/or the bias vector (of length out) may also be provided explicitly.

Arguments:

  • (in, out): number of input and output neurons
  • init: The function to use to initialise the weight matrix.
  • bias: Whether to include a trainable bias vector.
source
EvidentialFlux.FDIRType
FDIR(in => out; bias=true, init=Flux.glorot_uniform)

Create a Flexible Dirichlet evidential layer from Yoon & Kim, "Uncertainty Estimation by Flexible Evidential Deep Learning" (2025). Predicts the parameters of a Flexible Dirichlet (FD) distribution, a mixture of Dirichlets that generalizes the standard Dirichlet used by DIR.

The layer has three output heads from a shared input:

  • α (out): Gamma concentration parameters via exp (α > 0)
  • p (out): allocation probabilities via softmax (Σp = 1)
  • τ (1): shared dispersion via softplus (τ > 0)

The output shape is (out*2 + 1, batch...). Use split_params(FDIR, y) or splitfdir(y) to decompose the output into (α, p, τ).

Standard Dirichlet EDL is a special case when τ = 1 and pₖ = αₖ/Σᵢαᵢ.

Arguments:

  • (in, out): number of input features and output classes
  • init: weight initialisation function (default: glorot_uniform)
  • bias: whether to include trainable bias vectors
source
EvidentialFlux.MVEType
MVE(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)

Create a fully connected layer which implements a Mean-Variance Estimation network. This models a Normal distribution and only captures aleatoric uncertainty (no epistemic). For full uncertainty decomposition, use NIG.

The layer uses two parallel Dense branches internally:

  • Mean head (μ): applies σ as activation (default: softplus)
  • Variance head (σ): always uses softplus to ensure positivity

The output has shape (out*2, batch...) containing [μ, σ] stacked vertically. Use with mveloss for training.

The parallel branch architecture supports selective parameter freezing via Flux.freeze!/Flux.thaw! on the named branches (μw, σw).

Arguments:

  • (in, out): number of input and output neurons
  • σ: activation for the mean head (default: softplus). The variance head always uses softplus.
  • init: The function to use to initialise the weight matrix.
  • bias: Whether to include a trainable bias vector.
source
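A minimal sketch of the MVE workflow; the identity mean head, architecture, and data below are illustrative:

```julia
using Flux, EvidentialFlux
using Statistics: mean

# Identity mean head so predictions can be negative (the default is softplus)
model = Chain(Dense(8 => 32, relu), MVE(32 => 1, identity))
x = randn(Float32, 8, 16)
y = randn(Float32, 1, 16)

μ, σ = splitmve(model(x))     # mean and variance heads, each of shape (1, 16)
loss = mean(mveloss(y, μ, σ)) # Gaussian negative log likelihood
```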

Loss functions — Regression

EvidentialFlux.niglossFunction
nigloss(y, γ, ν, α, β, λ = 1, ϵ = 0.0001)

This is the standard loss function for Evidential Inference given a Normal-Inverse-Gamma posterior over the parameters of the Gaussian likelihood, μ and σ².

Arguments:

  • y: the targets whose shape should be (O, B)
  • γ: the γ parameter of the NIG distribution which corresponds to its mean and whose shape should be (O, B)
  • ν: the ν parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution which relates to its uncertainty and whose shape should be (O, B)
  • λ: the weight to put on the regularizer (default: 1)
  • ϵ: the threshold for the regularizer (default: 0.0001)
source
EvidentialFlux.nigloss_scaledFunction
nigloss_scaled(y, γ, ν, α, β, λ = 1, p = 1)

Corrected DER loss from Meinert, Gawlikowski & Lavin, "The Unreasonable Effectiveness of Deep Evidential Regression" (2022). Normalizes the prediction error by the aleatoric uncertainty before scaling by evidence, preventing the network from inflating variance to reduce the regularizer.

Arguments:

  • y: the targets whose shape should be (O, B)
  • γ: the γ parameter of the NIG distribution which corresponds to its mean and whose shape should be (O, B)
  • ν: the ν parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution which relates to its uncertainty and whose shape should be (O, B)
  • λ: the weight to put on the regularizer (default: 1)
  • p: the power which to raise the scaled absolute prediction error (default: 1)
source
EvidentialFlux.nigloss_uregFunction
nigloss_ureg(y, γ, ν, α, β, λ = 1, λ₁ = 1)

Uncertainty-regularized evidential regression loss from Ye, Chen, Wei & Zhan, "Uncertainty Regularized Evidential Regression" (AAAI 2024). Adds a term that ensures non-zero gradients in high-uncertainty regions where the standard regularizer's gradient vanishes.

Arguments:

  • y: the targets whose shape should be (O, B)
  • γ: the γ parameter of the NIG distribution which corresponds to its mean and whose shape should be (O, B)
  • ν: the ν parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution which relates to its uncertainty and whose shape should be (O, B)
  • λ: the weight to put on the evidence regularizer (default: 1)
  • λ₁: the weight to put on the uncertainty loss (default: 1)
source
EvidentialFlux.nllstudentFunction
nllstudent(y, γ, ν, α, β)

Returns the negative log likelihood of the Student-T distribution, which in this case is the model evidence for a Gaussian likelihood with a Normal-Inverse-Gamma prior.

Arguments:

  • y: the targets whose shape should be (O, B)
  • γ: the γ parameter of the NIG distribution which corresponds to its mean and whose shape should be (O, B)
  • ν: the ν parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution which relates to its uncertainty and whose shape should be (O, B)
source
EvidentialFlux.mvelossFunction
mveloss(y, μ, σ)

Calculates the Mean-Variance loss for a Normal distribution. This is merely the negative log likelihood. This loss should be used with the MVE network type.

Arguments:

  • y: targets
  • μ: the predicted mean
  • σ: the predicted variance
source
mveloss(y, μ, σ, β)

Calculates the Mean-Variance loss for a Normal distribution. This is merely the negative log likelihood. This loss should be used with the MVE network type.

Arguments:

  • y: targets
  • μ: the predicted mean
  • σ: the predicted variance
  • β: used to increase or decrease the effect of the predicted variance on the loss
source

Loss functions — Count data

EvidentialFlux.pglossFunction
pgloss(y, α, β, λ = 1)

Loss for Poisson-Gamma evidential count regression. Combines the Negative Binomial NLL (from nllpg) with a regularizer that penalizes high confidence (large α) when the predicted rate α/β is far from the observed count.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • α: Gamma shape parameter (> 0) from a PG layer, shape (O, B)
  • β: Gamma rate parameter (> 0) from a PG layer, shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllpgFunction
nllpg(y, α, β)

Negative log-likelihood of the Negative Binomial marginal obtained by integrating out the Poisson rate λ ~ Gamma(α, β):

p(y|α,β) = Γ(y+α) / [Γ(y+1)·Γ(α)] · βᵅ / (β+1)^(y+α)

Use this with the PG layer for evidential count regression.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • α: Gamma shape parameter (> 0), shape (O, B)
  • β: Gamma rate parameter (> 0), shape (O, B)
source
EvidentialFlux.eglossFunction
egloss(y, α, β, λ = 1)

Loss for Exponential-Gamma evidential positive regression. Combines the Lomax NLL (from nlleg) with a regularizer that penalizes high confidence (large α) when the predicted duration β/(α-1) is far from the observed value.

Arguments:

  • y: positive continuous targets, shape (O, B)
  • α: Gamma shape parameter (> 0) from an EG layer, shape (O, B)
  • β: Gamma rate parameter (> 0) from an EG layer, shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllegFunction
nlleg(y, α, β)

Negative log-likelihood of the Lomax (Pareto Type II) marginal obtained by integrating out λ ~ Gamma(α, β) from Exp(y | λ):

p(y|α,β) = α·βᵅ / (β+y)^(α+1)

Use this with the EG layer for evidential positive continuous regression.

Arguments:

  • y: positive continuous targets, shape (O, B)
  • α: Gamma shape parameter (> 0), shape (O, B)
  • β: Gamma rate parameter (> 0), shape (O, B)
source
EvidentialFlux.bblossFunction
bbloss(k, n, α, β, λ = 1)

Loss for Binomial-Beta evidential proportion estimation. Combines the Beta-Binomial NLL (from nllbb) with a regularizer that penalizes high confidence (large α+β) when the predicted probability α/(α+β) is far from the observed proportion k/n.

Arguments:

  • k: observed successes (non-negative), shape (O, B)
  • n: number of trials (positive), shape (O, B)
  • α: Beta shape parameter (> 0) from a BB layer, shape (O, B)
  • β: Beta shape parameter (> 0) from a BB layer, shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllbbFunction
nllbb(k, n, α, β)

Negative log-likelihood of the Beta-Binomial marginal obtained by integrating out p ~ Beta(α, β) from Binomial(k | n, p):

p(k|n,α,β) = C(n,k) · B(k+α, n-k+β) / B(α,β)

Use this with the BB layer for evidential proportion estimation.

Arguments:

  • k: observed successes (non-negative), shape (O, B)
  • n: number of trials (positive), shape (O, B)
  • α: Beta shape parameter (> 0), shape (O, B)
  • β: Beta shape parameter (> 0), shape (O, B)
source
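For intuition, the marginal above can be evaluated for integer α and β with the identity Γ(n) = (n-1)!. A dependency-free sketch, not the package's implementation (Bf and betabin are hypothetical helpers):

```julia
# Beta function for positive integer arguments, via Γ(n) = (n-1)!
Bf(a, b) = factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)
# Beta-Binomial PMF: p(k|n,α,β) = C(n,k)·B(k+α, n-k+β)/B(α,β)
betabin(k, n, α, β) = binomial(n, k) * Bf(k + α, n - k + β) / Bf(α, β)

# A flat Beta(1, 1) prior makes every count 0..n equally likely
betabin(0, 4, 1, 1)                    # 0.2
betabin(2, 4, 1, 1)                    # 0.2
sum(betabin(k, 4, 2, 3) for k in 0:4)  # ≈ 1.0 — a proper PMF
```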
EvidentialFlux.bnblossFunction
bnbloss(y, r, α, β, λ = 1)

Loss for Beta-Negative Binomial evidential count regression. Combines the Beta-NB NLL (from nllbnb) with a regularizer that penalizes high confidence (large α+β) when the predicted count r·α/β is far from the observed count.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • r: NB dispersion parameter (> 0) from a BNB layer, shape (O, B)
  • α: Beta shape parameter (> 0) from a BNB layer, shape (O, B)
  • β: Beta shape parameter (> 0) from a BNB layer, shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllbnbFunction
nllbnb(y, r, α, β)

Negative log-likelihood of the Beta-Negative Binomial marginal obtained by integrating out p ~ Beta(α, β) from NB(y | r, p):

p(y|r,α,β) = [Γ(y+r)/(Γ(y+1)Γ(r))] · B(y+α, r+β) / B(α, β)

Use this with the BNB layer for evidential overdispersed count regression.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • r: NB dispersion parameter (> 0), shape (O, B)
  • α: Beta shape parameter (> 0), shape (O, B)
  • β: Beta shape parameter (> 0), shape (O, B)
source
EvidentialFlux.ziplossFunction
ziploss(y, α_π, β_π, α_λ, β_λ, λ = 1)

Loss for Zero-Inflated Poisson evidential count regression. Combines the ZINB NLL (from nllzip) with a regularizer that penalizes high confidence when the predicted count is far from the observed count.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • α_π: Beta shape parameter for zero-inflation (> 0), shape (O, B)
  • β_π: Beta shape parameter for zero-inflation (> 0), shape (O, B)
  • α_λ: Gamma shape parameter for Poisson rate (> 0), shape (O, B)
  • β_λ: Gamma rate parameter for Poisson rate (> 0), shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllzipFunction
nllzip(y, α_π, β_π, α_λ, β_λ)

Negative log-likelihood of the Zero-Inflated Negative Binomial marginal obtained by integrating out π ~ Beta(απ, βπ) and λ ~ Gamma(αλ, βλ) from the ZIP(π, λ) likelihood:

p(0|α_π,β_π,α_λ,β_λ) = E[π] + E[1-π]·p_NB(0|α_λ,β_λ)
p(y|α_π,β_π,α_λ,β_λ) = E[1-π]·p_NB(y|α_λ,β_λ)   for y > 0

where p_NB is the Negative Binomial PMF from the Poisson-Gamma conjugacy and E[π] = α_π/(α_π+β_π).

Use this with the ZIP layer for evidential zero-inflated count regression.

Arguments:

  • y: non-negative count targets, shape (O, B)
  • α_π: Beta shape parameter for zero-inflation (> 0), shape (O, B)
  • β_π: Beta shape parameter for zero-inflation (> 0), shape (O, B)
  • α_λ: Gamma shape parameter for Poisson rate (> 0), shape (O, B)
  • β_λ: Gamma rate parameter for Poisson rate (> 0), shape (O, B)
source
EvidentialFlux.vmlossFunction
vmloss(θ, μ₀, κ₀, κ, λ = 1)

Loss for Von Mises evidential directional regression. Combines the marginal NLL (from nllvm) with a regularizer that penalizes high prior concentration (κ₀) when the predicted direction is far from the observed angle, using the circular distance 1 - cos(θ - μ₀).

Arguments:

  • θ: angular targets in radians, shape (O, B)
  • μ₀: prior mean direction (unconstrained), shape (O, B)
  • κ₀: prior concentration parameter (> 0), shape (O, B)
  • κ: observation concentration parameter (> 0), shape (O, B)
  • λ: regularization weight (default: 1)
source
EvidentialFlux.nllvmFunction
nllvm(θ, μ₀, κ₀, κ)

Negative log-likelihood of the Von Mises marginal obtained by integrating out the mean direction μ ~ VonMises(μ₀, κ₀) from VonMises(θ | μ, κ):

p(θ|μ₀,κ₀,κ) = I₀(R) / (2π · I₀(κ) · I₀(κ₀))

where R = √(κ² + κ₀² + 2κκ₀cos(θ-μ₀)) and I₀ is the modified Bessel function of the first kind of order 0.

Use this with the VM layer for evidential directional regression.

Arguments:

  • θ: angular targets in radians, shape (O, B)
  • μ₀: prior mean direction (unconstrained), shape (O, B)
  • κ₀: prior concentration parameter (> 0), shape (O, B)
  • κ: observation concentration parameter (> 0), shape (O, B)
source

Loss functions — Classification

EvidentialFlux.dirlossFunction
dirloss(y, α, t)

Regularized version of a type II maximum likelihood for the Multinomial(p) distribution, where the parameter p, which follows a Dirichlet distribution, has been integrated out.

Arguments:

  • y: the targets whose shape should be (O, B)
  • α: the parameters of a Dirichlet distribution representing the belief in each class, whose shape should be (O, B)
  • t: counter for the current epoch being evaluated
source
EvidentialFlux.dirloss_corFunction
dirloss_cor(y, α, t)

Dirichlet classification loss with correct evidence regularization from Pandey, Choi & Yu, "Generalized Regularized Evidential Deep Learning Models" (2025).

Extends dirloss with an additional term ℒ_cor that prevents gradient vanishing when the ground-truth class has low evidence (the "learning freeze" problem). The correction is weighted by the vacuity ν = K/S and only active when the pre-activation logit for the ground-truth class is negative (i.e., evidence below the softplus inflection point).

The total loss is ℒ_evid + λₜ·ℒ_inc + ℒ_cor where ℒ_cor = -𝟙(o_gt < 0)·ν·o_gt.

Arguments:

  • y: one-hot encoded targets, shape (K, B)
  • α: Dirichlet concentration parameters from a DIR layer, shape (K, B)
  • t: current epoch (used for KL annealing on ℒ_inc)
source
EvidentialFlux.dirmultlossFunction
dirmultloss(y, α)

Negative log-likelihood of the Dirichlet-Multinomial distribution, obtained by integrating out p ~ Dir(α) from Multinomial(y | n, p). Use this with the DIR layer when targets are count vectors (e.g., word counts, event tallies) rather than one-hot categories.

p(y|α) = [n!/Πₖyₖ!] · B(y+α)/B(α)

where n = Σyₖ, S = Σαₖ, and B is the multivariate Beta function.

Unlike dirloss (which uses a Bayes Risk MSE + KL regularizer for one-hot targets), this is a proper type II maximum likelihood loss that needs no additional regularization.

Arguments:

  • y: non-negative count targets, shape (K, B) where K is the number of categories
  • α: Dirichlet concentration parameters from a DIR layer, shape (K, B)
source
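A sketch of DIR with count-vector targets; the architecture and counts below are illustrative:

```julia
using Flux, EvidentialFlux
using Statistics: mean

# Three categories with count observations (e.g., word counts per document)
model = Chain(Dense(8 => 32, relu), DIR(32 => 3))
x = randn(Float32, 8, 16)
y = Float32.(rand(0:5, 3, 16))  # per-sample category counts, not one-hot

α = model(x)                    # Dirichlet concentrations, shape (3, 16)
loss = mean(dirmultloss(y, α))  # type II maximum likelihood, no regularizer needed
```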
EvidentialFlux.fdirlossFunction
fdirloss(y, α, p, τ)

Loss for the Flexible Dirichlet EDL model from Yoon & Kim, "Uncertainty Estimation by Flexible Evidential Deep Learning" (2025).

Computes the expected Brier score under the Flexible Dirichlet distribution plus a Brier score regularizer on the allocation probabilities p. The FD distribution is a mixture of Dirichlets Σⱼ pⱼ Dir(α + τeⱼ), and the loss decomposes analytically as:

ℒ = Σₖ [E_FD[πₖ²] - 2yₖ E[πₖ] + yₖ] + ‖y - p‖²

No manual hyperparameter tuning is needed for the regularization (unlike the KL-based regularizer in dirloss).

Arguments:

  • y: one-hot encoded targets, shape (K, B)
  • α: Gamma concentration parameters (> 0) from an FDIR layer, shape (K, B)
  • p: allocation probabilities (Σp = 1) from an FDIR layer, shape (K, B)
  • τ: shared dispersion parameter (> 0) from an FDIR layer, shape (1, B)
source

Prediction

EvidentialFlux.predictiveFunction
predictive(m, x)

Inference-time prediction returning a NamedTuple with:

  • ŷ: point prediction in data space (posterior predictive mean)
  • epistemic: epistemic uncertainty (nothing if not available for this layer)
  • aleatoric: aleatoric uncertainty (nothing if not available for this layer)
  • params: raw distributional parameters from predict

Use predict during training (returns raw parameters for loss computation). Use predictive at inference time for a complete uncertainty-aware output.

Examples

r = predictive(model, x)
r.ŷ          # point prediction
r.epistemic  # model uncertainty
r.aleatoric  # data noise
r.params     # raw (γ, ν, α, β) etc. for advanced use
source
EvidentialFlux.predictive_meanFunction
predictive_mean(::Type{<:NIG}, params)
predictive_mean(::Type{<:PG}, params)
predictive_mean(::Type{<:BNB}, params)
predictive_mean(::Type{<:DIR}, params)
predictive_mean(::Type{<:FDIR}, params)
predictive_mean(::Type{<:MVE}, params)

Returns the point prediction in data space given the raw distributional parameters. This is the mean of the posterior predictive distribution.

source
EvidentialFlux.predictFunction
predict(m, x)

Returns the predictions along with the available epistemic and aleatoric uncertainty. Dispatches on the last layer type of the model:

  • NIG: returns (γ, ν, α, β) NamedTuple
  • MVE: returns (μ, σ) NamedTuple
  • DIR: returns α directly (raw array, for backward compatibility)

Arguments:

  • m: the model whose last layer is an AbstractEvidentialLayer
  • x: the input data which has to be given as an array or vector
source
EvidentialFlux.split_paramsFunction
split_params(::Type{<:NIG}, y)

Split NIG layer output into a NamedTuple (γ, ν, α, β).

split_params(::Type{<:MVE}, y)

Split MVE layer output into a NamedTuple (μ, σ).

split_params(::Type{<:DIR}, y)

Wrap DIR layer output into a NamedTuple (α,).

split_params(::Type{<:PG}, y)

Split PG layer output into a NamedTuple (α, β).

split_params(::Type{<:EG}, y)

Split EG layer output into a NamedTuple (α, β).

split_params(::Type{<:BB}, y)

Split BB layer output into a NamedTuple (α, β).

split_params(::Type{<:BNB}, y)

Split BNB layer output into a NamedTuple (r, α, β).

split_params(::Type{<:FDIR}, y)

Split FDIR layer output into a NamedTuple (α, p, τ). The first K rows are α, the next K rows are p, and the last row is τ, where K = (size(y,1) - 1) ÷ 2.

source
EvidentialFlux.splitnigFunction
splitnig(y)

Splits the concatenated output of a NIG layer into its four components: γ, ν, α, β. The input y should have shape (nout*4, batch...) where nout is the number of output neurons.

Arguments:

  • y: the concatenated NIG output with shape (nout*4, batch...)

Returns:

  • (γ, ν, α, β): tuple of arrays each with shape (nout, batch...)
source
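The splitting convention can be illustrated with plain indexing. This is a minimal sketch in Base Julia, assuming the four components are stacked as contiguous blocks of `nout` rows in the order γ, ν, α, β; in real code use `splitnig` itself.

```julia
# Illustration of the (nout*4, batch) NIG output layout, assuming contiguous
# blocks of `nout` rows stacked in the order γ, ν, α, β (mimics `splitnig`).
nout, batch = 2, 3
y = rand(nout * 4, batch)          # stand-in for a NIG layer's output
γ = y[1:nout, :]
ν = y[nout+1:2nout, :]
α = y[2nout+1:3nout, :]
β = y[3nout+1:4nout, :]
size(γ)                            # (2, 3), same for ν, α, β
```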
EvidentialFlux.splitmveFunction
splitmve(y)

Splits the concatenated output of an MVE layer into its two components: μ, σ. The input y should have shape (nout*2, batch...) where nout is the number of output neurons.

Arguments:

  • y: the concatenated MVE output with shape (nout*2, batch...)

Returns:

  • (μ, σ): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitpgFunction
splitpg(y)

Splits the concatenated output of a PG layer into its two components: α, β. The input y should have shape (nout*2, batch...).

Arguments:

  • y: the concatenated PG output with shape (nout*2, batch...)

Returns:

  • (α, β): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitegFunction
spliteg(y)

Splits the concatenated output of an EG layer into its two components: α, β. The input y should have shape (nout*2, batch...).

Arguments:

  • y: the concatenated EG output with shape (nout*2, batch...)

Returns:

  • (α, β): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitbbFunction
splitbb(y)

Splits the concatenated output of a BB layer into its two components: α, β. The input y should have shape (nout*2, batch...).

Arguments:

  • y: the concatenated BB output with shape (nout*2, batch...)

Returns:

  • (α, β): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitbnbFunction
splitbnb(y)

Splits the concatenated output of a BNB layer into its three components: r, α, β. The input y should have shape (nout*3, batch...).

Arguments:

  • y: the concatenated BNB output with shape (nout*3, batch...)

Returns:

  • (r, α, β): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitzipFunction
splitzip(y)

Splits the concatenated output of a ZIP layer into its four components: α_π, β_π, α_λ, β_λ. The input y should have shape (nout*4, batch...).

Arguments:

  • y: the concatenated ZIP output with shape (nout*4, batch...)

Returns:

  • (α_π, β_π, α_λ, β_λ): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitvmFunction
splitvm(y)

Splits the concatenated output of a VM layer into its three components: μ₀, κ₀, κ. The input y should have shape (nout*3, batch...).

Arguments:

  • y: the concatenated VM output with shape (nout*3, batch...)

Returns:

  • (μ₀, κ₀, κ): tuple of arrays each with shape (nout, batch...)
source
EvidentialFlux.splitfdirFunction
splitfdir(y)

Splits the concatenated output of an FDIR layer into its three components: α, p, τ. The input y should have shape (K*2 + 1, batch...) where K is the number of classes.

Arguments:

  • y: the concatenated FDIR output with shape (K*2 + 1, batch...)

Returns:

  • (α, p, τ): tuple where α and p have shape (K, batch...) and τ has shape (1, batch...)
source
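The documented row layout can be recovered with plain indexing. A minimal sketch in Base Julia of the convention stated above (α in the first K rows, p in the next K, τ in the last row); in real code use `splitfdir`.

```julia
# Plain-indexing sketch of the FDIR layout: for K classes the output has
# K*2 + 1 rows: α in rows 1:K, p in rows K+1:2K, τ in the last row.
K, batch = 3, 4
y = rand(K * 2 + 1, batch)         # stand-in for an FDIR layer's output
α = y[1:K, :]
p = y[K+1:2K, :]
τ = y[end:end, :]                  # keep τ as a (1, batch) row
K == (size(y, 1) - 1) ÷ 2          # K can be recovered from the row count
```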

Uncertainty and evidence

EvidentialFlux.epistemicFunction
epistemic(ν)

This is the epistemic uncertainty as recommended by Meinert, Nis, Jakob Gawlikowski, and Alexander Lavin. 'The Unreasonable Effectiveness of Deep Evidential Regression.' arXiv, May 20, 2022. http://arxiv.org/abs/2205.10060.

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
source
epistemic(::Type{<:NIG}, ν, α, β)

Epistemic uncertainty for the NIG model: 1/√ν (Meinert et al. 2022).

source
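A numeric sketch of the 1/√ν measure, with `epi` a hypothetical helper (not part of the package API): since ν counts virtual observations about the mean, more evidence drives the epistemic uncertainty toward zero.

```julia
# Minimal sketch of the Meinert et al. epistemic measure 1/√ν: more virtual
# observations ν (more evidence about the mean) means lower model uncertainty.
epi(ν) = 1 ./ sqrt.(ν)             # illustrative helper, not the package API
ν = [0.25 1.0 4.0 100.0]
epi(ν)                             # 2.0, 1.0, 0.5, 0.1
```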
epistemic(::Type{<:DIR}, α)

Epistemic uncertainty for the Dirichlet model: K/Σα.

source
epistemic(::Type{<:EG}, α, β)

Epistemic uncertainty for the Exponential-Gamma model: the variance of the expected duration E[Y|λ] = 1/λ under the Gamma prior.

Var[1/λ] = β² / ((α-1)²(α-2))

Requires α > 2 for the moments to exist; α is clamped internally.

source
epistemic(::Type{<:BB}, α, β)

Epistemic uncertainty for the Binomial-Beta model: the variance of the success probability under the Beta prior.

Var[p] = αβ / ((α+β)²(α+β+1))
source
epistemic(::Type{<:PG}, α, β)

Epistemic uncertainty for the Poisson-Gamma model: the variance of the Poisson rate under the Gamma prior, Var[λ] = α/β².

source
epistemic(::Type{<:BNB}, r, α, β)

Epistemic uncertainty for the Beta-Negative Binomial model: the variance of the conditional mean E[Y|p] = rp/(1-p) under the Beta prior.

Var[E[Y|p]] = r²·α(α+β-1) / ((β-1)²(β-2))

Requires β > 2 for the moments to exist; β is clamped internally.

source
epistemic(::Type{<:ZIP}, α_π, β_π, α_λ, β_λ)

Epistemic uncertainty for the Zero-Inflated Poisson model: the variance of the conditional mean E[Y|π,λ] = (1-π)λ under the independent Beta and Gamma priors.

Var[(1-π)λ] = E[(1-π)²]E[λ²] - (E[1-π])²(E[λ])²

where E[(1-π)²] = β_π(β_π+1) / (S_π(S_π+1)) and E[λ²] = α_λ(α_λ+1)/β_λ².

source
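The formula above can be checked numerically. A sketch with a hypothetical helper `zip_epistemic` (not the package API), assembling the stated Beta and Gamma moments under the rate parameterisation:

```julia
# Numeric sketch of the ZIP epistemic formula, using the moments
# E[1-π] = β_π/S_π, E[(1-π)²] = β_π(β_π+1)/(S_π(S_π+1)),
# E[λ] = α_λ/β_λ, E[λ²] = α_λ(α_λ+1)/β_λ²  (Gamma rate parameterisation).
function zip_epistemic(απ, βπ, αλ, βλ)   # illustrative, not the package API
    S  = απ + βπ
    m1 = βπ / S                          # E[1-π]
    m2 = βπ * (βπ + 1) / (S * (S + 1))   # E[(1-π)²]
    l1 = αλ / βλ                         # E[λ]
    l2 = αλ * (αλ + 1) / βλ^2            # E[λ²]
    m2 * l2 - (m1 * l1)^2                # Var[(1-π)λ], nonnegative
end
zip_epistemic(2.0, 3.0, 4.0, 2.0)        # 0.4·5 − 1.2² = 0.56
```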
epistemic(::Type{<:VM}, κ₀)

Epistemic uncertainty for the Von Mises model: the circular variance of the prior on the mean direction μ.

CV[μ] = 1 - A(κ₀)

where A(κ) = I₁(κ)/I₀(κ) is the mean resultant length. Ranges from 0 (certain, κ₀ → ∞) to 1 (uniform on the circle, κ₀ → 0).

source
epistemic(::Type{<:FDIR}, α, p, τ)

Epistemic uncertainty for the Flexible Dirichlet model (Yoon & Kim 2025). Returns a (1, B) array, one scalar per sample:

EU = Σₖ [μₖ(1-μₖ)/(S+1) + τ²pₖ(1-pₖ)/(S(S+1))]

where μₖ = (αₖ+τpₖ)/S and S = Σαₖ + τ.

source
EvidentialFlux.aleatoricFunction
aleatoric(ν, α, β)

This is the aleatoric uncertainty as recommended by Meinert, Nis, Jakob Gawlikowski, and Alexander Lavin. 'The Unreasonable Effectiveness of Deep Evidential Regression.' arXiv, May 20, 2022. http://arxiv.org/abs/2205.10060. This is precisely the $σ_{St}$ from the Student T distribution.

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
source
aleatoric(::Type{<:NIG}, ν, α, β)

Aleatoric uncertainty for the NIG model: the Student-T standard deviation σ_St = √(β(1+ν)/(να)) (Meinert et al. 2022).

source
aleatoric(::Type{<:MVE}, σ)

Aleatoric uncertainty for the MVE model: the predicted variance σ itself. MVE has no epistemic uncertainty; it only models aleatoric noise.

source
aleatoric(::Type{<:EG}, α, β)

Aleatoric uncertainty for the Exponential-Gamma model: the expected Exponential variance under the Gamma prior.

E[Var[Y|λ]] = E[1/λ²] = β² / ((α-1)(α-2))

Requires α > 2 for the moments to exist; α is clamped internally.

source
aleatoric(::Type{<:BB}, α, β)

Aleatoric uncertainty for the Binomial-Beta model: the expected Bernoulli variance under the Beta prior.

E[p(1-p)] = αβ / ((α+β)(α+β+1))
source
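The Binomial-Beta epistemic and aleatoric pieces obey a tidy law-of-total-variance identity, sketched below with hypothetical helpers (not the package API): for a single Bernoulli draw with p ~ Beta(α, β), the two pieces sum to the total variance μ(1-μ) with μ = α/(α+β).

```julia
# Sketch of the Binomial-Beta decomposition: aleatoric E[p(1-p)] plus
# epistemic Var[p] equals the total Bernoulli variance μ(1-μ).
bb_aleatoric(α, β) = α * β / ((α + β) * (α + β + 1))    # illustrative helpers,
bb_epistemic(α, β) = α * β / ((α + β)^2 * (α + β + 1))  # not the package API
α, β = 3.0, 5.0
μ = α / (α + β)
isapprox(bb_aleatoric(α, β) + bb_epistemic(α, β), μ * (1 - μ))   # true
```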
aleatoric(::Type{<:PG}, α, β)

Aleatoric uncertainty for the Poisson-Gamma model: the expected Poisson variance, E[Var[Y|λ]] = E[λ] = α/β.

source
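For the Poisson-Gamma model the two uncertainties split the marginal Negative Binomial variance exactly, by the law of total variance. A sketch with hypothetical helpers (not the package API):

```julia
# Poisson-Gamma split of predictive variance: by the law of total variance,
# Var[Y] = E[Var[Y|λ]] + Var[E[Y|λ]] = α/β + α/β².
pg_aleatoric(α, β) = α / β       # expected Poisson variance E[λ]
pg_epistemic(α, β) = α / β^2     # variance of the rate Var[λ]
α, β = 6.0, 2.0
pg_aleatoric(α, β) + pg_epistemic(α, β)   # 3.0 + 1.5 = 4.5
```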
aleatoric(::Type{<:BNB}, r, α, β)

Aleatoric uncertainty for the Beta-Negative Binomial model: the expected NB variance under the Beta prior.

E[Var[Y|p]] = r·α(α+β-1) / ((β-1)(β-2))

Requires β > 2 for the moments to exist; β is clamped internally.

source
aleatoric(::Type{<:ZIP}, α_π, β_π, α_λ, β_λ)

Aleatoric uncertainty for the Zero-Inflated Poisson model: the expected variance of the ZIP observation given the parameters.

E[Var[Y|π,λ]] = E[1-π]·E[λ] + E[π(1-π)]·E[λ²]

where E[π(1-π)] = α_π β_π / (S_π(S_π+1)).

source
aleatoric(::Type{<:VM}, κ)

Aleatoric uncertainty for the Von Mises model: the circular variance of the observation noise.

CV[θ|μ] = 1 - A(κ)

where A(κ) = I₁(κ)/I₀(κ) is the mean resultant length.

source
aleatoric(::Type{<:FDIR}, α, p, τ)

Aleatoric uncertainty for the Flexible Dirichlet model: AU = TU - EU where TU = 1 - Σₖ μₖ² is the total uncertainty. Returns (1, B).

source
EvidentialFlux.uncertaintyFunction
uncertainty(ν, α, β)

Calculates the epistemic uncertainty of the predictions from the Normal-Inverse-Gamma (NIG) model. Given a $\text{N-}\Gamma^{-1}(γ, ν, α, β)$ distribution we can calculate the epistemic uncertainty as

$Var[μ] = \frac{β}{ν(α-1)}$

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
source
uncertainty(α, β)

Calculates the aleatoric uncertainty of the predictions from the Normal-Inverse-Gamma (NIG) model. Given a $\text{N-}\Gamma^{-1}(γ, ν, α, β)$ distribution we can calculate the aleatoric uncertainty as

$\mathbb{E}[σ^2] = \frac{β}{(α-1)}$

Arguments:

  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
source
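The two NIG uncertainties above are tightly linked: the epistemic variance is the aleatoric variance scaled by 1/ν, so evidence about the mean (large ν) shrinks only the epistemic part. A sketch with hypothetical helpers (not the package API):

```julia
# Relation between the two NIG uncertainties:
# Var[μ] = β/(ν(α-1)) is exactly E[σ²] = β/(α-1) scaled by 1/ν.
nig_epistemic(ν, α, β) = β / (ν * (α - 1))   # illustrative helpers,
nig_aleatoric(α, β) = β / (α - 1)            # not the package API
ν, α, β = 4.0, 3.0, 2.0
nig_epistemic(ν, α, β) == nig_aleatoric(α, β) / ν   # true
```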
uncertainty(α)

Calculates the epistemic uncertainty associated with a Multinomial-Dirichlet (DIR) layer.

  • α: the α parameter of the Dirichlet distribution, which holds its concentration parameters and whose shape should be (O, B)
source
uncertainty(::Type{<:NIG}, ν, α, β)

Epistemic uncertainty for the NIG model: Var[μ] = β/(ν(α-1)).

source
EvidentialFlux.evidenceFunction
evidence(α)

Calculates the total evidence for assigning each observation to its class for a DIR layer.

  • α: the α parameter of the Dirichlet distribution, which holds its concentration parameters and whose shape should be (O, B)
source
evidence(ν, α)

Returns the evidence for the data pushed through the NIG layer. In this setting, one way of looking at the NIG distribution is as ν virtual observations governing the mean μ of the likelihood and α virtual observations governing the variance $\sigma^2$. The evidence is then the sum of the virtual observations. Amini et al. (2020) go through this interpretation.

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
source

References

  • Amini, Alexander, Wilko Schwarting, Ava Soleimany, and Daniela Rus. "Deep Evidential Regression." arXiv:1910.02600 [cs, stat], November 24, 2020. http://arxiv.org/abs/1910.02600.
  • Sensoy, Murat, Lance Kaplan, and Melih Kandemir. "Evidential Deep Learning to Quantify Classification Uncertainty." Advances in Neural Information Processing Systems 31 (June 2018): 3179-89.
  • Meinert, Nis, Jakob Gawlikowski, and Alexander Lavin. "The Unreasonable Effectiveness of Deep Evidential Regression." arXiv, May 20, 2022. http://arxiv.org/abs/2205.10060.
  • Ye, K., Chen, T., Wei, H., and Zhan, L. "Uncertainty Regularized Evidential Regression." AAAI 38 (2024): 16460-68.
  • Pandey, D. S., Choi, H., and Yu, Q. "Generalized Regularized Evidential Deep Learning Models." arXiv (2025).
  • Yoon, T., and Kim, H. "Uncertainty Estimation by Flexible Evidential Deep Learning." arXiv (2025).