Skip to content Skip to sidebar Skip to footer

Using R To Fit Data From A Csv With A Gamma Function?

Using the following data: time_stamp,secs,count 2013-04-30 23:58:55,1367366335,32 2013-04-30 23:58:56,1367366336,281 2013-04-30 23:58:57,1367366337,664 2013-04-30 23:58:58,1367

Solution 1:

-1 to the OP for not showing work, but +1 for posting an interesting problem. Here's a stab at it. The idea is to fit a curve to the (x, y) data, where the curve is a (scaled) gamma density. Since the data is a table of counts over a grid, a poisson error distribution seems to be a reasonable assumption. I've implicitly rescaled the problem so that the grid is 1...365 rather than 1367366395...1367366699 to make life easier; translating the results back to the original scale is an exercise for the reader.

negll <- function(par, x, y)
{
    shape <- par[1]
    rate <- par[2]
    mu <- dgamma(x, shape, rate) * sum(y)
    -2 * sum(dpois(y, mu, log=TRUE))
}


optim(c(1, 1), negll, x=seq_along(g$count), y=g$count, method="L-BFGS-B", lower=c(.001, .001))
$par
[1] 0.73034879 0.00698288

$value
[1] 62983.18

$countsfunction gradient 
      32       32 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

This is actually a pretty poor fit (shape less than 1 indicates the fitted curve goes to infinity at the origin, in contrast to the observed data). Looking at the observed trend suggests that a gamma density is a poor choice for this data; the peak is too sharp and decays too slowly. A more heavy-tailed distribution like the (generalised) Pareto might be a better bet.

Couple more things:

  1. This is fitted to the entire table rather than just the first 30 points. Using only 30 points would make it even more difficult, since you get minimal information on the shape and rate from that part of the curve alone.

  2. There's a noticeable bump in the curve 2/3rds of the way down. Whether this is materially significant is for the OP to decide, but I hope they actually noticed this rather than skipping directly to the curve-fitting stage.

Solution 2:

I would visualize my distribution before fitting :

 library(latticeExtra)  ## useful for zoo objects## plotting 30 points don't give a gamma like plot...## but 100 points is better
 marginal.plot(dat$count[1:100])

enter image description here

Then Using fitdistr from MASS

## Normalize to avoid bad minima
xx <- as.numeric(dat[1:100,'count'])/max(dat$count)
fitdistr(xx,'gamma',  start=list(shape = 2, rate = 0.1),lower=0.00001)
     shape       rate   
  1.6190036   6.1598943 
 (0.2096129) (0.9329586)

Post a Comment for "Using R To Fit Data From A Csv With A Gamma Function?"