FitDistr: Maximum-likelihood Fitting of Univariate Distributions
Source:R/1.3_Distr_DistrCollection_Functions.R
FitDistr.RdMaximum-likelihood fitting of univariate distributions, allowing parameters to be held fixed if desired.
Arguments
- x
A numeric vector of length at least one containing only finite values. Either a character string or a function returning a density evaluated at its first argument.
- densfun
character string specifying the density function to be used for fitting the distribution. Distributions `"beta"`, `"cauchy"`, `"chi-squared"`, `"exponential"`, `"gamma"`, `"geometric"`, `"log-normal"`, `"lognormal"`, `"logistic"`, `"negative binomial"`, `"normal"`, `"Poisson"`, `"t"` and "weibull" are recognised, case being ignored.
- start
A named list giving the parameters to be optimized with initial values. This can be omitted for some of the named distributions and must be for others (see Details).
- ...
Additional parameters, either for `densfun` or for `optim`. In particular, it can be used to specify bounds via `lower` or `upper` or both. If arguments of `densfun` (or the density function corresponding to a character-string specification) are included they will be held fixed.
Value
The function `FitDistr` returns an object of class `fitdistr`, which is a list containing:
- estimate
a named vector of parameter estimates.
- sd
a named vector of the estimated standard errors for the parameters.
- vcov
the estimated variance-covariance matrix of the parameter estimates.
- loglik
the log-likelihood of the fitted model.
- n
length vector.
Details
For the Normal, log-Normal, geometric, exponential and Poisson distributions the closed-form MLEs (and exact standard errors) are used, and `start` should not be supplied.
For all other distributions, direct optimization of the log-likelihood is performed using `optim`. The estimated standard errors are taken from the observed information matrix, calculated by a numerical approximation. For one-dimensional problems the Nelder-Mead method is used and for multi-dimensional problems the BFGS method, unless arguments named `lower` or `upper` are supplied (when `L-BFGS-B` is used) or `method` is supplied explicitly.
For the `"t"` named distribution the density is taken to be the location-scale family with location `m` and scale `s`.
For the following named distributions, reasonable starting values will be computed if `start` is omitted or only partially specified: `"cauchy"`, `"gamma"`, `"logistic"`, `"negative binomial"` (parametrized by mu and size), `"t"` and `"weibull"`. Note that these starting values may not be good enough if the fit is poor: in particular they are not resistant to outliers unless the fitted distribution is long-tailed.
There are `print`, `coef`, `vcov` and `logLik` methods for class `"FitDistr"`.
Examples
set.seed(123)
x = rgamma(100, shape = 5, rate = 0.1)
FitDistr(x, "gamma")
#> $estimate
#> shape rate
#> 6.4869913 0.1365106
#>
#> $sd
#> shape rate
#> 0.894601 0.019573
#>
#> $vcov
#> shape rate
#> shape 0.80031094 0.0168407776
#> rate 0.01684078 0.0003831024
#>
#> $loglik
#> [1] -429.1792
#>
#> $n
#> [1] 100
#>
#> attr(,"class")
#> [1] "FitDistr"
# Now do this directly with more control.
FitDistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001)
#> $estimate
#> shape rate
#> 6.4868655 0.1365101
#>
#> $sd
#> shape rate
#> 0.89438949 0.01956818
#>
#> $vcov
#> shape rate
#> shape 0.79993257 0.0168323391
#> rate 0.01683234 0.0003829136
#>
#> $loglik
#> [1] -429.1792
#>
#> $n
#> [1] 100
#>
#> attr(,"class")
#> [1] "FitDistr"
set.seed(123)
x2 = rt(250, df = 9)
FitDistr(x2, "t", df = 9)
#> $estimate
#> m s
#> -0.01069496 1.04410551
#>
#> $sd
#> m s
#> 0.07222623 0.05434369
#>
#> $vcov
#> m s
#> m 5.216629e-03 -2.155702e-05
#> s -2.155702e-05 2.953237e-03
#>
#> $loglik
#> [1] -395.4873
#>
#> $n
#> [1] 250
#>
#> attr(,"class")
#> [1] "FitDistr"
# Allow df to vary: not a very good idea!
FitDistr(x2, "t")
#> $estimate
#> m s df
#> -0.009653468 1.006167671 6.627292033
#>
#> $sd
#> m s df
#> 0.07147271 0.07707297 2.71033164
#>
#> $vcov
#> m s df
#> m 5.108348e-03 -5.932325e-05 -0.002431418
#> s -5.932325e-05 5.940243e-03 0.148215915
#> df -2.431418e-03 1.482159e-01 7.345897580
#>
#> $loglik
#> [1] -395.26
#>
#> $n
#> [1] 250
#>
#> attr(,"class")
#> [1] "FitDistr"
# Now do fixed-df fit directly with more control.
mydt = function(x, m, s, df) dt((x-m)/s, df)/s
FitDistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))
#> $estimate
#> m s
#> -0.01069635 1.04409435
#>
#> $sd
#> m s
#> 0.07222562 0.05434249
#>
#> $vcov
#> m s
#> m 5.216541e-03 -2.156228e-05
#> s -2.156228e-05 2.953107e-03
#>
#> $loglik
#> [1] -395.4873
#>
#> $n
#> [1] 250
#>
#> attr(,"class")
#> [1] "FitDistr"
set.seed(123)
x3 = rweibull(100, shape = 4, scale = 100)
FitDistr(x3, "weibull")
#> $estimate
#> shape scale
#> 4.080342 99.983688
#>
#> $sd
#> shape scale
#> 0.3126797 2.5819679
#>
#> $vcov
#> shape scale
#> shape 0.09776862 0.2542929
#> scale 0.25429293 6.6665584
#>
#> $loglik
#> [1] -462.6639
#>
#> $n
#> [1] 100
#>
#> attr(,"class")
#> [1] "FitDistr"