API Documentation

plfit Module

numpy/matplotlib version of plfit.py

A power-law distribution fitter based on code by Aaron Clauset. It can use Fortran-, Cython-, or numpy-based power-law fitting ‘backends’; the Fortran backend is the fastest.

Requires pylab (matplotlib), which requires numpy

A very simple example of use:

from plfit import plfit

MyPL = plfit(mydata)
MyPL.plotpdf(log=True)
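
A slightly fuller sketch (illustrative only; it assumes keyword arguments given to the constructor are forwarded to the plfit() method documented further down):

from plfit import plfit

MyPL = plfit(mydata, quiet=True)   # fit on construction, suppressing messages (assumed kwarg forwarding)
MyPL.plfit(xmin=1.0)               # re-fit with a fixed xmin, skipping the xmin search
MyPL.plotcdf()                     # CDF plus the fitted power law (see plotcdf below)
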
plfit.plfit.alpha_gen(x)[source]

Create a mappable function alpha to apply to each xmin in a list of xmins. This is essentially the slow version of fplfit/cplfit, though I bet it could be sped up with a clever use of parallel_map. Not intended to be used by users.

Docstring for the generated alpha function:

Given a sorted data set and a minimum, returns power law MLE fit
data is passed as a keyword parameter so that it can be vectorized

If there is only one element, return alpha=0
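
For reference, the generated function evaluates the continuous maximum-likelihood estimator of Clauset et al. (2009), alpha = 1 + n / sum(ln(x_i / xmin)). A minimal numpy sketch of that estimator (an illustration, not the module's literal code):

import numpy as np

def continuous_alpha_mle_sketch(data, xmin):
    x = np.asarray(data, dtype=float)
    x = x[x >= xmin]                  # only the tail above xmin is fit
    if x.size <= 1:
        return 0.0                    # mirrors the "alpha=0 for a single element" convention above
    return 1.0 + x.size / np.sum(np.log(x / xmin))
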
plfit.plfit.discrete_alpha_mle(data, xmin)[source]

Equation B.17 of Clauset et al 2009

The Maximum Likelihood Estimator of the “scaling parameter” alpha in the discrete case is similar to that in the continuous case
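
For orientation, the widely used closed-form approximation to the discrete MLE (eq. 3.7 in Clauset et al. 2009, accurate roughly for xmin >= 6) replaces xmin with xmin - 1/2 in the continuous formula; the functions below instead maximize the discrete likelihood numerically. A hedged sketch, not necessarily the literal implementation here:

import numpy as np

def discrete_alpha_mle_sketch(data, xmin):
    x = np.asarray(data, dtype=float)
    x = x[x >= xmin]
    # the half-integer shift approximates the maximum of the discrete likelihood
    return 1.0 + x.size / np.sum(np.log(x / (xmin - 0.5)))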

plfit.plfit.discrete_best_alpha(data, alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True)[source]

Use the maximum L to determine the most likely value of alpha

alpharangemults [ 2-tuple ]
Pair of multiplicative factors below and above the approximate MLE alpha, defining the range searched when determining the “exact” alpha (by directly maximizing the likelihood function)
plfit.plfit.discrete_ksD(data, xmin, alpha)[source]

Given a sorted data set, a minimum, and an alpha, returns the power-law KS-test D value against the data

The returned value is the “D” parameter in the ks test

(this is implemented differently from the continuous version because there are potentially multiple identical points that need comparison to the power law)
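
A hedged sketch of that comparison, using the discrete power-law CCDF P(X >= x) = zeta(alpha, x) / zeta(alpha, xmin) (Hurwitz zeta) and an empirical CCDF evaluated at the unique data values; the module's exact tie handling may differ:

import numpy as np
from scipy.special import zeta       # zeta(s, q) is the Hurwitz zeta function

def discrete_ks_sketch(data, xmin, alpha):
    x = np.sort(np.asarray(data))
    x = x[x >= xmin]                                         # assumes integer-valued data
    xu, counts = np.unique(x, return_counts=True)            # collapse tied values once
    ccdf_emp = 1.0 - (np.cumsum(counts) - counts) / x.size   # fraction of the tail >= each unique x
    ccdf_model = zeta(alpha, xu) / zeta(alpha, xmin)
    return np.abs(ccdf_emp - ccdf_model).max()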

plfit.plfit.discrete_likelihood(data, xmin, alpha)[source]

Equation B.8 in Clauset

Given a data set, an xmin value, and an alpha “scaling parameter”, computes the log-likelihood (the value to be maximized)
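
That log-likelihood is commonly written ln L(alpha) = -n ln zeta(alpha, xmin) - alpha * sum(ln x_i), with zeta the Hurwitz zeta function. A hedged numpy/scipy sketch (illustrative, not necessarily the module's literal code):

import numpy as np
from scipy.special import zeta       # zeta(s, q) is the Hurwitz zeta function

def discrete_loglike_sketch(data, xmin, alpha):
    x = np.asarray(data, dtype=float)
    x = x[x >= xmin]
    return -x.size * np.log(zeta(alpha, xmin)) - alpha * np.sum(np.log(x))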

plfit.plfit.discrete_likelihood_vector(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)[source]

Compute the likelihood for all “scaling parameters” in the range (alpharange) for a given xmin. This is only part of the discrete value likelihood maximization problem as described in Clauset et al (Equation B.8)

alpharange [ 2-tuple ]
Two floats specifying the upper and lower limits of the power law alpha to test
plfit.plfit.discrete_max_likelihood(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)[source]

Returns the maximum of the likelihood of the data given an input xmin

plfit.plfit.discrete_max_likelihood_arg(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)[source]

Returns the argument of the max of the likelihood of the data given an input xmin

plfit.plfit.kstest_gen(x, unique=False, finite=False)[source]

Create a mappable function kstest to apply to each xmin in a list of xmins.

Parameters:

unique : bool

If set, will filter the input array ‘x’ to its unique elements. Normally, this would be done at an earlier step, so unique can be disabled for performance improvement

finite : bool

Apply the finite-sample correction from Clauset et al 2007... Not clear yet which equation this comes from.

Docstring for the generated kstest function:

Given a sorted data set and a minimum, returns power law MLE ks-test against the data

data is passed as a keyword parameter so that it can be vectorized

The returned value is the “D” parameter in the ks test.
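
For the continuous case, the KS D statistic compares the empirical CDF of the tail with the power-law CDF 1 - (x/xmin)**(1-alpha). A minimal sketch of that comparison (illustrative; the finite-sample correction is not shown):

import numpy as np

def continuous_ks_sketch(data, xmin, alpha):
    x = np.sort(np.asarray(data, dtype=float))
    x = x[x >= xmin]
    cdf_emp = np.arange(1, x.size + 1) / x.size       # empirical CDF at each tail point
    cdf_model = 1.0 - (x / xmin) ** (1.0 - alpha)     # continuous power-law CDF above xmin
    return np.abs(cdf_emp - cdf_model).max()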

plfit.plfit.most_likely_alpha(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)[source]

Return the most likely alpha for the data given an xmin

plfit.plfit.pl_inv(P, xm, a)[source]

Inverse CDF for a pure power-law
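
The standard closed form is x = xm * (1 - P)**(-1/(a - 1)), which inverts CDF(x) = 1 - (x/xm)**(1 - a) for x >= xm. A sketch showing how such an inverse CDF enables inverse-transform sampling:

import numpy as np

def pl_inv_sketch(P, xm, a):
    # invert CDF(x) = 1 - (x/xm)**(1 - a)
    return xm * (1.0 - np.asarray(P)) ** (-1.0 / (a - 1.0))

rng = np.random.default_rng(0)
samples = pl_inv_sketch(rng.uniform(size=10000), xm=1.0, a=2.5)   # power-law distributed draws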

plfit.plfit.plexp_cdf(x, xmin=1, alpha=2.5, pl_only=False, exp_only=False)[source]

CDF(x) for the piecewise distribution: exponential for x<xmin, power law for x>=xmin. This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al. The constant “C” normalizes the PDF.

plfit.plfit.plexp_inv(P, xmin, alpha, guess=1.0)[source]

Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

(the previous version was incorrect and led to weird discontinuities in the distribution function)
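
A short usage sketch: drawing synthetic data from the piecewise distribution by inverse-transform sampling, as test_fitter below does (the import path is an assumption based on the names used on this page):

import numpy as np
from plfit.plfit import plexp_inv   # assumed import path

rng = np.random.default_rng(42)
P = rng.uniform(size=5000)
synthetic = plexp_inv(P, xmin=1.0, alpha=2.5)   # if array input is not supported, apply elementwise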

plfit.plfit.plexp_pdf(x, xmin=1, alpha=2.5)[source]
class plfit.plfit.plfit(x, **kwargs)[source]

Bases: object

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/.

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062)

The output “alpha” is defined such that p(x) \sim (x/xmin)^{-alpha}

alphavsks(autozoom=True, **kwargs)[source]

Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

discrete_best_alpha(alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True, finite=True)[source]

Use the maximum likelihood to determine the most likely value of alpha

alpharangemults [ 2-tuple ]
Pair of multiplicative factors below and above the approximate MLE alpha, defining the range searched when determining the “exact” alpha (by directly maximizing the likelihood function)
n_alpha [ int ]
Number of alpha values to use when measuring. Larger number is more accurate.
approximate [ bool ]
If False, try to “zoom-in” around the MLE alpha and get the exact best alpha value within some range around the approximate best

verbose [ bool ]
finite [ bool ]
Apply the correction for finite sample size?
lognormal(doprint=True)[source]

Use the maximum likelihood estimator for a lognormal distribution to produce the best-fit lognormal parameters
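
For reference, the lognormal maximum-likelihood estimates are simply the mean and (1/n-form) standard deviation of ln(x). A minimal sketch, not the class method itself:

import numpy as np

def lognormal_mle_sketch(x):
    logx = np.log(np.asarray(x, dtype=float))
    mu = logx.mean()                              # MLE of the log-mean
    sigma = np.sqrt(((logx - mu) ** 2).mean())    # MLE of the log-standard-deviation (1/n form)
    return mu, sigma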

plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None, verbose=False, discrete=None, discrete_approx=True, discrete_n_alpha=1000)[source]

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.

There is also a discrete version implemented in python - it is different from the continuous version!

discrete [ bool | None ]
If discrete is None, the code will try to determine whether the data set is discrete or continuous based on the uniqueness of the data; if your data set is continuous but you have any non-unique data points (e.g., flagged “bad” data), the “automatic” determination will fail. If discrete is True or False, the discrete or continuous fitter will be used, respectively.
xmin [ float / int ]
If you specify xmin, the fitter will only determine alpha assuming the given xmin; the rest of the code (and most of the complexity) is determining an estimate for xmin and alpha.
nosmall [ bool (True) ]
When on, the code rejects low s/n points. WARNING: This option, which is on by default, may result in different answers than the original Matlab code and the “powerlaw” python package
finite [ bool (False) ]
There is a ‘finite-size bias’ to the estimator. The “alpha” the code measures is “alpha-hat” s.t. ᾶ = (nα-1)/(n-1), or α = (1 + ᾶ (n-1)) / n
quiet [ bool (False) ]
If False, delivers messages about what fitter is used and the fit results
verbose [ bool (False) ]
Deliver descriptive messages about the fit parameters (only if *quiet*==False)
silent [ bool (False) ]
If True, will print NO messages
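
A short usage sketch for the keywords documented above (it assumes MyPL was constructed as in the module-level example):

MyPL.plfit(usefortran=False, usecy=False)          # force the numpy backend
MyPL.plfit(discrete=True, discrete_n_alpha=1000)   # force the discrete fitter
MyPL.plfit(xmin=2.0, quiet=True)                   # fix xmin and only estimate alpha
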
plot_lognormal_cdf(**kwargs)[source]

Plot the fitted lognormal distribution

plot_lognormal_pdf(**kwargs)[source]

Plot the fitted lognormal distribution

plotcdf(x=None, xmin=None, alpha=None, pointcolor='k', pointmarker='+', **kwargs)[source]

Plots CDF and powerlaw

plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, drawstyle='steps-post', histcolor='k', plcolor='r', **kwargs)[source]

Plots PDF and powerlaw.

kwargs is passed to pylab.hist and pylab.plot

plotppf(x=None, xmin=None, alpha=None, dolog=True, **kwargs)[source]

Plots the power-law-predicted value on the Y-axis against the real values along the X-axis. Can be used as a diagnostic of the fit quality.

test_pl(niter=1000.0, print_timing=False, **kwargs)[source]

Monte-Carlo test to determine whether distribution is consistent with a power law

Runs through niter iterations of a sample size identical to the input sample size.

Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.

Once the sample is created, it is fit using the methods above, and the best fit is used to compute a Kolmogorov-Smirnov statistic. The distribution of these KS statistics is compared to the KS value for the fit to the actual data, and p, the fraction of random KS values greater than the data KS value, is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set with n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and the corresponding section.

WARNING This can take a very long time to run! Execution time scales as niter * setsize
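
A hedged sketch of the procedure described above; the attribute names (_xmin, _alpha, _ks), constructor keyword forwarding, and per-point selection of the sub-xmin values are assumptions, not the class's documented interface:

import numpy as np
from plfit import plfit               # import path taken from the example at the top of this page

def test_pl_sketch(data, niter=1000, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    fit = plfit(data, quiet=True, silent=True)
    xmin, alpha, ks_data = fit._xmin, fit._alpha, fit._ks    # assumed attribute names
    body = data[data < xmin]
    n, p_body = data.size, body.size / data.size
    ks_synth = []
    for _ in range(niter):
        n_body = rng.binomial(n, p_body)                     # each point falls below xmin with p(<xmin)
        below = rng.choice(body, size=n_body, replace=True) if n_body else np.empty(0)
        P = rng.uniform(size=n - n_body)
        above = xmin * (1.0 - P) ** (-1.0 / (alpha - 1.0))   # tail drawn from the fitted power law
        synth = plfit(np.concatenate([below, above]), quiet=True, silent=True)
        ks_synth.append(synth._ks)
    return np.mean(np.array(ks_synth) > ks_data)             # p: fraction of synthetic KS > data KS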

xminvsks(**kwargs)[source]

Plot xmin versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

plfit.plfit.plfit_lsq(x, y)[source]

Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
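
The standard approach behind that formula is an unweighted least-squares fit in log-log space: ln y = ln A + B ln x. A minimal sketch (illustrative, not necessarily the module's literal code):

import numpy as np

def plfit_lsq_sketch(x, y):
    lx, ly = np.log(np.asarray(x, dtype=float)), np.log(np.asarray(y, dtype=float))
    B, lnA = np.polyfit(lx, ly, 1)      # slope is B, intercept is ln(A)
    return np.exp(lnA), B               # A, B in y = A * x**B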

plfit.plfit.sigma(alpha, n)[source]

Clauset et al 2007, equation 3.2:

sigma = (alpha - 1) / sqrt(n)

plfit.plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=plexp_inv)[source]

Tests the power-law fitter

Examples

Example (fig 3.4b in Clauset et al.). These examples assume an interactive pylab session (so that loglog, hist, mean, etc. are in the namespace) and that the plfit module has been imported:

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000)
loglog(xminin,xmarr.squeeze(),'x')

Example 2:

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000)
loglog(xminin,xmarr.mean(axis=0),'x')

Example 3:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000)
hist(xmarr.squeeze());
# Test results:
# mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20
# mean(af) = 2.51 median(af) = 2.49  std(af)=0.14
# biased distribution; far from correct value of xmin but close to correct alpha

Example 4:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv)
print("mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f" % (mean(xmarr),median(xmarr),std(xmarr)))
print("mean(af): %0.2f median(af): %0.2f std(af): %0.2f" % (mean(af),median(af),std(af)))
# mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35
# mean(af): 2.51 median(af): 2.50 std(af): 0.07

plfit_py Module

A duplicate of the plfit module above, but without numpy (or matplotlib, so no plotting).

Pure-Python version of plfit.py

A pure python power-law distribution fitter based on code by Aaron Clauset. This is the slowest implementation, but has no dependencies.

A very simple example of use:

from plfit_py import plfit

MyPL = plfit(mydata)
# plfit_py has no matplotlib dependency, so there are no plotting methods here
plfit.plfit_py.pl_inv(P, xm, a)[source]

Inverse CDF for a pure power-law

plfit.plfit_py.plexp(x, xm=1, a=2.5)[source]

CDF(x) for the piecewise distribution: exponential for x<xmin, power law for x>=xmin. This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.

plfit.plfit_py.plexp_inv(P, xm, a)[source]

Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

class plfit.plfit_py.plfit(x, **kwargs)[source]

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

The output “alpha” is defined such that p(x) \sim (x/xmin)^{-alpha}

alpha_(x)[source]

Create a mappable function alpha to apply to each xmin in a list of xmins. This is essentially the slow version of fplfit/cplfit, though I bet it could be sped up with a clever use of parallel_map. Not intended to be used by users.

kstest_(x)[source]
plfit(nosmall=True, finite=False, quiet=False, silent=False, xmin=None, verbose=False)[source]

A pure-Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

nosmall is on by default; it rejects low s/n points. You can specify xmin to skip the xmin estimation.

This is only for continuous distributions; I have not implemented a pure-python discrete distribution fitter
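
Because this module avoids numpy, the core continuous MLE can be written with only the standard library. A minimal sketch of that estimator (illustrative, not the class's literal code):

import math

def alpha_mle_pure_sketch(data, xmin):
    tail = [x for x in data if x >= xmin]
    if len(tail) <= 1:
        return 0.0
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)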

plfit.plfit_py.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=plexp_inv, quiet=True, silent=True)[source]

Tests the power-law fitter

Examples

Example (fig 3.4b in Clauset et al.). These examples are duplicated from the plfit module above and assume the same interactive pylab session:

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000)
loglog(xminin,xmarr.squeeze(),'x')

Example 2:

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000)
loglog(xminin,xmarr.mean(axis=0),'x')

Example 3:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000)
hist(xmarr.squeeze());
# Test results:
# mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20
# mean(af) = 2.51 median(af) = 2.49  std(af)=0.14
# biased distribution; far from correct value of xmin but close to correct alpha

Example 4:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv)
print("mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f" % (mean(xmarr),median(xmarr),std(xmarr)))
print("mean(af): %0.2f median(af): %0.2f std(af): %0.2f" % (mean(af),median(af),std(af)))
# mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35
# mean(af): 2.51 median(af): 2.50 std(af): 0.07