Fitting distributions using DFITTOOL

Published: Wed, 30 Apr 2008 21:04:00 GMT

Hi,

We are working on our graduation project. One of the tasks is to find
the distribution of the sizes of files on the Internet today. We
have collected ~2.3 million samples. The next step is to feed this
huge amount of data into our lovely friend MATLAB for analysis. We
are using DFITTOOL in the Statistics Toolbox to obtain the histogram,
PDF, CDF, etc. Now we want to fit a "well-known" distribution to the
empirical distribution that we found. From theory, we know that the
distribution of file sizes on the Internet is either Pareto or
log-normal. The shape of our curve is quite consistent with that.
We have two questions:

1) Using dfittool, we obtained fits for the lognormal, Weibull, etc.
One of the values returned by the tool is "Log likelihood". We
looked in the MATLAB help for an interpretation of this number but
found only a theoretical explanation. Can anybody please explain what
this number means, and when we can say that a fit is good?

2) It seems that the Pareto distribution is not supported by MATLAB.
Are there any online scripts or toolboxes that include this
distribution? (We have no time to write the distribution from scratch.)

Thanks a lot in advance.

All Comments

  • 1 Comment
    • "Mohamed Bamakhrama" <mohameda.matlab.itags.org.computer.org> wrote in message

      news:ef054a9.-1.matlab.itags.org.webx.raydaftYaTP...

      > 1) Using dfittool, we obtained fits for the lognormal, Weibull, etc.
      > One of the values returned by the tool is "Log likelihood". We
      > looked in the MATLAB help for an interpretation of this number but
      > found only a theoretical explanation. Can anybody please explain what
      > this number means, and when we can say that a fit is good?

      Mohamed, I can't think of a way to use the log likelihood to decide whether
      the lognormal distribution fits well. The log likelihood is the objective
      function that dfittool maximizes to find the maximum likelihood estimates.
      The likelihood is sometimes used to compare two models when one is a special
      case of the other, or to compute quantities such as the AIC (Akaike
      information criterion).
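To make the point about the log likelihood and the AIC concrete, here is a minimal sketch in plain Python (standard library only, not MATLAB and not what dfittool runs internally). It assumes synthetic lognormal data in place of the real file sizes, and it uses an exponential model purely as an illustrative competitor; the function names are made up for this example. The log likelihood is the sum of the log density over all samples at the fitted parameters, and AIC = 2k - 2 log L, where k is the number of fitted parameters.

```python
import math
import random

def lognormal_mle(data):
    """Closed-form maximum likelihood estimates (mu, sigma) for a lognormal."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def lognormal_loglik(data, mu, sigma):
    """Sum of the log of the lognormal density over all samples."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )

def exponential_mle(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)

def exponential_loglik(data, rate):
    """Sum of the log of the exponential density over all samples."""
    return sum(math.log(rate) - rate * x for x in data)

def aic(loglik, num_params):
    """Akaike information criterion; lower values are preferred."""
    return 2.0 * num_params - 2.0 * loglik

# Synthetic stand-in for the file-size samples (assumed parameters).
random.seed(0)
sizes = [random.lognormvariate(8.0, 2.0) for _ in range(5000)]

mu_hat, sigma_hat = lognormal_mle(sizes)
rate_hat = exponential_mle(sizes)

aic_lognormal = aic(lognormal_loglik(sizes, mu_hat, sigma_hat), 2)
aic_exponential = aic(exponential_loglik(sizes, rate_hat), 1)

print("lognormal   AIC:", aic_lognormal)
print("exponential AIC:", aic_exponential)
```

The raw log likelihood depends on the sample size and on the units of the data, so a single value says little by itself. The difference in AIC between candidate models fitted to the same data is the quantity that carries information, and even that only ranks the candidates; it does not show that the better one fits well in an absolute sense.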

      > 2) It seems that the Pareto distribution is not supported by MATLAB.
      > Are there any online scripts or toolboxes that include this
      > distribution? (We have no time to write the distribution from scratch.)

      You may want to look at this demo:

      http://www.mathworks.com/products/d...paretodemo.html
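For reference, the classical (Type I) Pareto distribution is small enough to write by hand. The sketch below is plain Python using only the standard library; it is not taken from the linked demo, and the helper names and the synthetic sample are assumptions made for illustration. With scale x_m and shape alpha, the CDF is F(x) = 1 - (x_m / x)^alpha for x >= x_m, and the maximum likelihood estimates are x_m = min(data) and alpha = n / sum(log(x_i / x_m)).

```python
import math
import random

def pareto_fit(data):
    """Maximum likelihood estimates (x_m, alpha) for a Type I Pareto."""
    xm = min(data)
    alpha = len(data) / sum(math.log(x / xm) for x in data)
    return xm, alpha

def pareto_cdf(x, xm, alpha):
    """P(X <= x) for a Type I Pareto with scale xm and shape alpha."""
    if x < xm:
        return 0.0
    return 1.0 - (xm / x) ** alpha

def pareto_loglik(data, xm, alpha):
    """Sum of the log density log(alpha) + alpha*log(xm) - (alpha+1)*log(x)."""
    n = len(data)
    sum_logs = sum(math.log(x) for x in data)
    return n * math.log(alpha) + n * alpha * math.log(xm) - (alpha + 1.0) * sum_logs

# Synthetic sample with assumed parameters; random.paretovariate has scale 1,
# so multiplying by true_xm rescales it.
random.seed(1)
true_xm, true_alpha = 100.0, 1.5
sample = [true_xm * random.paretovariate(true_alpha) for _ in range(20000)]

xm_hat, alpha_hat = pareto_fit(sample)
print("estimated x_m:  ", xm_hat)
print("estimated alpha:", alpha_hat)
print("log likelihood: ", pareto_loglik(sample, xm_hat, alpha_hat))
```

The resulting log likelihood can be compared with that of a lognormal fit to the same data. One caveat: real file-size data often follows a Pareto law only in the upper tail, in which case fitting the whole sample this way may be misleading.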

      -- Tom

      #1; Wed, 30 Apr 2008 21:05:00 GMT