Fitting distributions using DFITTOOL
Published: Wed, 30 Apr 2008 21:04:00 GMT
Hi,
We are working on our graduation project. One of the tasks is to find
the distribution of the sizes of files on the Internet today. We
have collected about 2.3 million samples. The next step is to feed this
huge amount of data into our lovely friend MATLAB for analysis. We
are using DFITTOOL in the Statistics Toolbox to obtain the histogram,
PDF, CDF, etc. Now we want to fit a "well-known" distribution to the
one we found. From theory, we know that the
distribution of file sizes on the Internet is either Pareto or log-normal.
The shape of our curve is quite consistent with that. We have
two questions:
1) Using dfittool, we found the fits for lognormal, Weibull, etc.
One of the values returned by the tool is "Log likelihood". We
looked in the MATLAB help for an interpretation of this number, but we
found only a theoretical explanation. Can anybody please explain what
that number means, and when we can say that the fit is good?
2) It seems that the Pareto distribution is not supported by MATLAB. Are
there any online scripts or toolboxes that include this distribution?
(We have no time to write the distribution from scratch.)
Thanks a lot in advance.
http://matlab.itags.org/q_matlab_18383.html
All Comments

- "Mohamed Bamakhrama" wrote in message:
> 1) Using dfittool, we found the fits for lognormal, Weibull, etc.
> One of the values returned by the tool is "Log likelihood". We
> looked in the MATLAB help for an interpretation of this number, but we
> found only a theoretical explanation. Can anybody please explain what
> that number means, and when we can say that the fit is good?
Mohamed, I can't think of a way to use the log likelihood by itself to decide
if the lognormal distribution fits well. The log likelihood is the objective
function that dfittool maximizes to find the maximum likelihood estimates, so
its absolute value depends on the sample size and the units of the data. The
likelihood is sometimes used to compare two models when one is a special case
of the other (a likelihood ratio test), or to compute quantities such as the
AIC (Akaike information criterion).
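As a rough illustration of the AIC comparison mentioned above, here is a minimal sketch. It assumes the Statistics Toolbox functions lognlike, wblfit and wbllike are available, and it uses synthetic lognormal data (the variable `sizes` and its parameters are made up) as a stand-in for the real file-size samples. It is not what dfittool does internally, only one way to reproduce the numbers by hand.

```matlab
% Hypothetical stand-in for the collected file sizes; replace with real data.
rng(0);                              % fixed seed so the run is repeatable
sizes = lognrnd(9, 2, 1e5, 1);       % assumed parameters, for illustration only

% Lognormal maximum likelihood estimates computed directly from log(data).
logx   = log(sizes);
muHat  = mean(logx);
sigHat = std(logx, 1);               % normalize by n (the MLE), not n-1

% Weibull maximum likelihood estimates: [scale shape].
pWbl = wblfit(sizes);

% lognlike and wbllike return the NEGATIVE log likelihood, so flip the sign.
logL_logn = -lognlike([muHat sigHat], sizes);
logL_wbl  = -wbllike(pWbl, sizes);

% AIC = 2*k - 2*logL, with k = 2 fitted parameters for both models.
aicLogn = 2*2 - 2*logL_logn;
aicWbl  = 2*2 - 2*logL_wbl;

fprintf('lognormal: logL = %.1f, AIC = %.1f\n', logL_logn, aicLogn);
fprintf('Weibull:   logL = %.1f, AIC = %.1f\n', logL_wbl,  aicWbl);
```

The model with the lower AIC is preferred among the candidates. This only ranks the candidates against each other; it does not say whether any of them fits well in absolute terms, so a probability plot or a comparison of empirical and fitted CDFs is still worth doing.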
> 2) It seems that the Pareto distribution is not supported by MATLAB. Are
> there any online scripts or toolboxes that include this distribution?
> (We have no time to write the distribution from scratch.)
You may want to look at this demo:
http://www.mathworks.com/products/d...paretodemo.html
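For readers who cannot reach the demo, here is a minimal sketch of fitting a generalized Pareto distribution to the upper tail of the data. It assumes the Statistics Toolbox functions gpfit, gpcdf, quantile and ecdf are available; the synthetic `sizes` vector and the 95% threshold are illustrative assumptions, not values from this thread, and the sketch is not taken from the demo itself.

```matlab
% Hypothetical stand-in for the collected file sizes; replace with real data.
rng(0);
sizes = lognrnd(9, 2, 1e5, 1);

% Keep only the exceedances over a high threshold (95th percentile, assumed).
u      = quantile(sizes, 0.95);
excess = sizes(sizes > u) - u;

% Maximum likelihood fit of the generalized Pareto: [shape k, scale sigma].
parmhat = gpfit(excess);
k       = parmhat(1);
sigma   = parmhat(2);

% Compare the empirical CDF of the exceedances with the fitted CDF.
[Femp, x] = ecdf(excess);
Ffit      = gpcdf(x, k, sigma);      % threshold parameter theta defaults to 0
maxGap    = max(abs(Femp - Ffit));   % largest vertical distance between them

fprintf('k = %.3f, sigma = %.1f, max CDF gap = %.3f\n', k, sigma, maxGap);
```

When k > 0 and the threshold parameter equals sigma/k, the generalized Pareto reduces to the classical Pareto distribution with minimum value sigma/k and tail index 1/k, so the same functions can stand in for the classical Pareto.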
-- Tom
#1; Wed, 30 Apr 2008 21:05:00 GMT