Monday, October 6, 2014

Properties of heavy tailed distributions

You may have to install poweRlaw package if you have not done so earlier. This is needed to generate a power law distribution
install.packages("poweRlaw")
We will use the zoo library to find the rolling mean see comments in code ##### Lets plot characteristics of normal random variables
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# generate 10000 random numbers from normal distribution with mean 500
norm <- rnorm(10000, mean=c(500))
# plot the histogram with 100 breaks
histrv<-hist(norm, breaks=100)

# we can get the breaks used by hist function as below
histrv$breaks
##  [1] 496.2 496.3 496.4 496.5 496.6 496.7 496.8 496.9 497.0 497.1 497.2
## [12] 497.3 497.4 497.5 497.6 497.7 497.8 497.9 498.0 498.1 498.2 498.3
## [23] 498.4 498.5 498.6 498.7 498.8 498.9 499.0 499.1 499.2 499.3 499.4
## [34] 499.5 499.6 499.7 499.8 499.9 500.0 500.1 500.2 500.3 500.4 500.5
## [45] 500.6 500.7 500.8 500.9 501.0 501.1 501.2 501.3 501.4 501.5 501.6
## [56] 501.7 501.8 501.9 502.0 502.1 502.2 502.3 502.4 502.5 502.6 502.7
## [67] 502.8 502.9 503.0 503.1 503.2 503.3 503.4 503.5 503.6 503.7 503.8
## [78] 503.9 504.0
# similarly we can get the density of breaks as below
histrv$density
##  [1] 0.001 0.001 0.001 0.001 0.000 0.001 0.005 0.003 0.004 0.010 0.008
## [12] 0.012 0.015 0.020 0.029 0.027 0.047 0.040 0.051 0.068 0.083 0.102
## [23] 0.115 0.153 0.161 0.167 0.207 0.236 0.259 0.288 0.301 0.343 0.353
## [34] 0.392 0.389 0.366 0.401 0.415 0.395 0.395 0.391 0.369 0.376 0.356
## [45] 0.293 0.281 0.271 0.263 0.192 0.183 0.187 0.141 0.143 0.125 0.101
## [56] 0.088 0.073 0.057 0.054 0.045 0.036 0.025 0.022 0.010 0.013 0.012
## [67] 0.004 0.005 0.004 0.004 0.001 0.001 0.003 0.002 0.001 0.001 0.001
## [78] 0.001
# however the breaks give us the boundaries  of the x variable to get the mid points I will use rollmean from zoo with window size of 2
rollmean(histrv$breaks, 2)
##  [1] 496.3 496.4 496.5 496.6 496.7 496.8 496.9 497.0 497.1 497.2 497.2
## [12] 497.4 497.5 497.6 497.7 497.8 497.9 498.0 498.1 498.2 498.2 498.4
## [23] 498.5 498.6 498.7 498.8 498.9 499.0 499.1 499.2 499.2 499.4 499.5
## [34] 499.6 499.7 499.8 499.9 500.0 500.1 500.2 500.2 500.4 500.5 500.6
## [45] 500.7 500.8 500.9 501.0 501.1 501.2 501.2 501.4 501.5 501.6 501.6
## [56] 501.8 501.9 502.0 502.1 502.1 502.2 502.4 502.5 502.6 502.6 502.8
## [67] 502.9 503.0 503.1 503.1 503.2 503.4 503.4 503.6 503.6 503.8 503.9
## [78] 503.9
# Now plot the densities against the mid point of x variable boundaries
plot(rollmean(histrv$breaks, 2),histrv$density)

# Now plot the logs of the same
plot(log(rollmean(histrv$breaks, 2)),log(histrv$density))

Repeat the steps for exponential random numbers
exp <- rexp(10000, rate = 5000)
histrv<-hist(exp, breaks=100)

histrv$breaks
##  [1] 0.00000 0.00002 0.00004 0.00006 0.00008 0.00010 0.00012 0.00014
##  [9] 0.00016 0.00018 0.00020 0.00022 0.00024 0.00026 0.00028 0.00030
## [17] 0.00032 0.00034 0.00036 0.00038 0.00040 0.00042 0.00044 0.00046
## [25] 0.00048 0.00050 0.00052 0.00054 0.00056 0.00058 0.00060 0.00062
## [33] 0.00064 0.00066 0.00068 0.00070 0.00072 0.00074 0.00076 0.00078
## [41] 0.00080 0.00082 0.00084 0.00086 0.00088 0.00090 0.00092 0.00094
## [49] 0.00096 0.00098 0.00100 0.00102 0.00104 0.00106 0.00108 0.00110
## [57] 0.00112 0.00114 0.00116 0.00118 0.00120 0.00122 0.00124 0.00126
## [65] 0.00128 0.00130 0.00132 0.00134 0.00136 0.00138 0.00140 0.00142
## [73] 0.00144 0.00146 0.00148 0.00150 0.00152 0.00154 0.00156 0.00158
## [81] 0.00160 0.00162 0.00164 0.00166 0.00168 0.00170 0.00172 0.00174
## [89] 0.00176 0.00178
rollmean(histrv$breaks, 2)
##  [1] 0.00001 0.00003 0.00005 0.00007 0.00009 0.00011 0.00013 0.00015
##  [9] 0.00017 0.00019 0.00021 0.00023 0.00025 0.00027 0.00029 0.00031
## [17] 0.00033 0.00035 0.00037 0.00039 0.00041 0.00043 0.00045 0.00047
## [25] 0.00049 0.00051 0.00053 0.00055 0.00057 0.00059 0.00061 0.00063
## [33] 0.00065 0.00067 0.00069 0.00071 0.00073 0.00075 0.00077 0.00079
## [41] 0.00081 0.00083 0.00085 0.00087 0.00089 0.00091 0.00093 0.00095
## [49] 0.00097 0.00099 0.00101 0.00103 0.00105 0.00107 0.00109 0.00111
## [57] 0.00113 0.00115 0.00117 0.00119 0.00121 0.00123 0.00125 0.00127
## [65] 0.00129 0.00131 0.00133 0.00135 0.00137 0.00139 0.00141 0.00143
## [73] 0.00145 0.00147 0.00149 0.00151 0.00153 0.00155 0.00157 0.00159
## [81] 0.00161 0.00163 0.00165 0.00167 0.00169 0.00171 0.00173 0.00175
## [89] 0.00177
histrv$density
##  [1] 5015 4435 3905 3525 3140 2855 2595 2375 2090 1840 1705 1605 1270 1330
## [15] 1200 1080 1005  820  720  815  660  660  460  550  465  375  390  255
## [29]  280  205  245  155  180  175  145  135  100  140   95   95   85   70
## [43]   85   60   70   85   45   35   15   20   25   30   10   30   25   15
## [57]   25   15   10   15   15   15    0    5   10   15   10    0    5    0
## [71]    5    0   10    5   10    5    5    0    0   10    5    0    0    5
## [85]    0    0    0    0    5
plot(rollmean(histrv$breaks, 2),histrv$density)

plot(log(rollmean(histrv$breaks, 2)),log(histrv$density))

Repeat the steps for power law random numbers
library(poweRlaw)
# create a Discrete power-law object
m = displ$new()
# The lower threshold, xmin.
m$setXmin(5000); 
# A parameter vector 
m$setPars(4);
# Generate the numbers
powerlaw <- dist_rand(m, 100000);
histrv<-hist(powerlaw, breaks=100)

histrv$breaks
##   [1]   4000   6000   8000  10000  12000  14000  16000  18000  20000  22000
##  [11]  24000  26000  28000  30000  32000  34000  36000  38000  40000  42000
##  [21]  44000  46000  48000  50000  52000  54000  56000  58000  60000  62000
##  [31]  64000  66000  68000  70000  72000  74000  76000  78000  80000  82000
##  [41]  84000  86000  88000  90000  92000  94000  96000  98000 100000 102000
##  [51] 104000 106000 108000 110000 112000 114000 116000 118000 120000 122000
##  [61] 124000 126000 128000 130000 132000 134000 136000 138000 140000 142000
##  [71] 144000 146000 148000 150000 152000 154000 156000 158000 160000 162000
##  [81] 164000 166000 168000 170000 172000 174000 176000 178000 180000 182000
##  [91] 184000 186000 188000 190000 192000 194000 196000 198000 200000 202000
## [101] 204000 206000 208000 210000 212000 214000 216000 218000 220000 222000
## [111] 224000
rollmean(histrv$breaks, 2)
##   [1]   5000   7000   9000  11000  13000  15000  17000  19000  21000  23000
##  [11]  25000  27000  29000  31000  33000  35000  37000  39000  41000  43000
##  [21]  45000  47000  49000  51000  53000  55000  57000  59000  61000  63000
##  [31]  65000  67000  69000  71000  73000  75000  77000  79000  81000  83000
##  [41]  85000  87000  89000  91000  93000  95000  97000  99000 101000 103000
##  [51] 105000 107000 109000 111000 113000 115000 117000 119000 121000 123000
##  [61] 125000 127000 129000 131000 133000 135000 137000 139000 141000 143000
##  [71] 145000 147000 149000 151000 153000 155000 157000 159000 161000 163000
##  [81] 165000 167000 169000 171000 173000 175000 177000 179000 181000 183000
##  [91] 185000 187000 189000 191000 193000 195000 197000 199000 201000 203000
## [101] 205000 207000 209000 211000 213000 215000 217000 219000 221000 223000
histrv$density
##   [1] 2.096e-04 1.679e-04 5.927e-05 2.663e-05 1.370e-05 7.660e-06 4.640e-06
##   [8] 2.970e-06 1.730e-06 1.300e-06 9.150e-07 6.800e-07 5.400e-07 4.150e-07
##  [15] 3.700e-07 2.800e-07 1.900e-07 1.400e-07 1.400e-07 1.000e-07 1.050e-07
##  [22] 7.000e-08 7.500e-08 7.000e-08 2.000e-08 6.000e-08 3.000e-08 2.500e-08
##  [29] 1.000e-08 1.500e-08 1.500e-08 2.000e-08 1.500e-08 1.000e-08 1.500e-08
##  [36] 1.000e-08 1.000e-08 1.000e-08 5.000e-09 5.000e-09 2.500e-08 1.500e-08
##  [43] 1.000e-08 2.500e-08 5.000e-09 1.500e-08 1.000e-08 1.000e-08 5.000e-09
##  [50] 5.000e-09 0.000e+00 0.000e+00 0.000e+00 1.000e-08 0.000e+00 0.000e+00
##  [57] 5.000e-09 0.000e+00 5.000e-09 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  [64] 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  [71] 0.000e+00 5.000e-09 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  [78] 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  [85] 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 5.000e-09 0.000e+00
##  [92] 0.000e+00 0.000e+00 5.000e-09 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  [99] 0.000e+00 0.000e+00 0.000e+00 5.000e-09 0.000e+00 0.000e+00 0.000e+00
## [106] 0.000e+00 0.000e+00 0.000e+00 0.000e+00 5.000e-09
plot(rollmean(histrv$breaks, 2),histrv$density)
plot(log(rollmean(histrv$breaks, 2)),log(histrv$density))
You can clearly observe that the graph on log log scale is linear. Especially as compared to the normal or exponential distributions.

No comments:

Post a Comment