FitDistr: Maximum-likelihood Fitting of Univariate Distributions
Source:R/1.3_Distr_DistrCollection_Functions.R
      FitDistr.RdMaximum-likelihood fitting of univariate distributions, allowing parameters to be held fixed if desired.
Arguments
- x
- A numeric vector of length at least one containing only finite values. Either a character string or a function returning a density evaluated at its first argument. 
- densfun
- character string specifying the density function to be used for fitting the distribution. Distributions `"beta"`, `"cauchy"`, `"chi-squared"`, `"exponential"`, `"gamma"`, `"geometric"`, `"log-normal"`, `"lognormal"`, `"logistic"`, `"negative binomial"`, `"normal"`, `"Poisson"`, `"t"` and "weibull" are recognised, case being ignored. 
- start
- A named list giving the parameters to be optimized with initial values. This can be omitted for some of the named distributions and must be for others (see Details). 
- ...
- Additional parameters, either for `densfun` or for `optim`. In particular, it can be used to specify bounds via `lower` or `upper` or both. If arguments of `densfun` (or the density function corresponding to a character-string specification) are included they will be held fixed. 
Value
The function `FitDistr` returns an object of class `fitdistr`, which is a list containing:
- estimate
- a named vector of parameter estimates. 
- sd
- a named vector of the estimated standard errors for the parameters. 
- vcov
- the estimated variance-covariance matrix of the parameter estimates. 
- loglik
- the log-likelihood of the fitted model. 
- n
- length vector. 
Details
For the Normal, log-Normal, geometric, exponential and Poisson distributions the closed-form MLEs (and exact standard errors) are used, and `start` should not be supplied.
For all other distributions, direct optimization of the log-likelihood is performed using `optim`. The estimated standard errors are taken from the observed information matrix, calculated by a numerical approximation. For one-dimensional problems the Nelder-Mead method is used and for multi-dimensional problems the BFGS method, unless arguments named `lower` or `upper` are supplied (when `L-BFGS-B` is used) or `method` is supplied explicitly.
For the `"t"` named distribution the density is taken to be the location-scale family with location `m` and scale `s`.
For the following named distributions, reasonable starting values will be computed if `start` is omitted or only partially specified: `"cauchy"`, `"gamma"`, `"logistic"`, `"negative binomial"` (parametrized by mu and size), `"t"` and `"weibull"`. Note that these starting values may not be good enough if the fit is poor: in particular they are not resistant to outliers unless the fitted distribution is long-tailed.
There are `print`, `coef`, `vcov` and `logLik` methods for class `"FitDistr"`.
Examples
set.seed(123)
x = rgamma(100, shape = 5, rate = 0.1)
FitDistr(x, "gamma")
#> $estimate
#>     shape      rate 
#> 6.4869913 0.1365106 
#> 
#> $sd
#>    shape     rate 
#> 0.894601 0.019573 
#> 
#> $vcov
#>            shape         rate
#> shape 0.80031094 0.0168407776
#> rate  0.01684078 0.0003831024
#> 
#> $loglik
#> [1] -429.1792
#> 
#> $n
#> [1] 100
#> 
#> attr(,"class")
#> [1] "FitDistr"
# Now do this directly with more control.
FitDistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001)
#> $estimate
#>     shape      rate 
#> 6.4868655 0.1365101 
#> 
#> $sd
#>      shape       rate 
#> 0.89438949 0.01956818 
#> 
#> $vcov
#>            shape         rate
#> shape 0.79993257 0.0168323391
#> rate  0.01683234 0.0003829136
#> 
#> $loglik
#> [1] -429.1792
#> 
#> $n
#> [1] 100
#> 
#> attr(,"class")
#> [1] "FitDistr"
set.seed(123)
x2 = rt(250, df = 9)
FitDistr(x2, "t", df = 9)
#> $estimate
#>           m           s 
#> -0.01069496  1.04410551 
#> 
#> $sd
#>          m          s 
#> 0.07222623 0.05434369 
#> 
#> $vcov
#>               m             s
#> m  5.216629e-03 -2.155702e-05
#> s -2.155702e-05  2.953237e-03
#> 
#> $loglik
#> [1] -395.4873
#> 
#> $n
#> [1] 250
#> 
#> attr(,"class")
#> [1] "FitDistr"
# Allow df to vary: not a very good idea!
FitDistr(x2, "t")
#> $estimate
#>            m            s           df 
#> -0.009653468  1.006167671  6.627292033 
#> 
#> $sd
#>          m          s         df 
#> 0.07147271 0.07707297 2.71033164 
#> 
#> $vcov
#>                m             s           df
#> m   5.108348e-03 -5.932325e-05 -0.002431418
#> s  -5.932325e-05  5.940243e-03  0.148215915
#> df -2.431418e-03  1.482159e-01  7.345897580
#> 
#> $loglik
#> [1] -395.26
#> 
#> $n
#> [1] 250
#> 
#> attr(,"class")
#> [1] "FitDistr"
# Now do fixed-df fit directly with more control.
mydt = function(x, m, s, df) dt((x-m)/s, df)/s
FitDistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))
#> $estimate
#>           m           s 
#> -0.01069635  1.04409435 
#> 
#> $sd
#>          m          s 
#> 0.07222562 0.05434249 
#> 
#> $vcov
#>               m             s
#> m  5.216541e-03 -2.156228e-05
#> s -2.156228e-05  2.953107e-03
#> 
#> $loglik
#> [1] -395.4873
#> 
#> $n
#> [1] 250
#> 
#> attr(,"class")
#> [1] "FitDistr"
set.seed(123)
x3 = rweibull(100, shape = 4, scale = 100)
FitDistr(x3, "weibull")
#> $estimate
#>     shape     scale 
#>  4.080342 99.983688 
#> 
#> $sd
#>     shape     scale 
#> 0.3126797 2.5819679 
#> 
#> $vcov
#>            shape     scale
#> shape 0.09776862 0.2542929
#> scale 0.25429293 6.6665584
#> 
#> $loglik
#> [1] -462.6639
#> 
#> $n
#> [1] 100
#> 
#> attr(,"class")
#> [1] "FitDistr"