Here's a plot of my email since October 23, 2003 (1,280,899 emails received in 315 weeks), categorized as spam or ham (not spam):


We started filtering viruses on May 2, 2004, which cut the number of spams slightly. About 25 viruses/day were arriving at the time, but SpamAssassin had not been catching all of them, so the size of the drop is only approximate.

The plot shows quadratic, exponential, and oscillatory fits to predict how many spams I'll be receiving each day a month from now.

The details, which follow, assume the following fit functions:

t=0 is today.
const is the number of spams I can expect to receive today if the quadratic fit is correct.
C is the number of spams I can expect to receive today if the exponential fit is correct.
The oscillatory fit uses only the most recent 100 weeks of data.
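
As a rough sketch of how the fitted parameters turn into a prediction, the Python below evaluates the quadratic and exponential models with the parameter values reported in the fit output on this page. The functional forms (const + linear*t + quadratic*t^2, and C*exp(r*t)) are assumptions chosen to be consistent with the notes above, and the function names are hypothetical. The time unit of t is not stated here, so a "month from now" would have to be expressed in whatever unit the fit actually used.

```python
import math

# Fitted parameters copied from the fit output on this page.
CONST, LINEAR, QUADRATIC = 191.051, -0.466755, -0.000230233
C, R = 345.252, 0.000104781


def quadratic_fit(t):
    """Assumed form: const + linear*t + quadratic*t**2, with t=0 meaning today."""
    return CONST + LINEAR * t + QUADRATIC * t * t


def exponential_fit(t):
    """Assumed form: C * exp(r*t), with t=0 meaning today."""
    return C * math.exp(R * t)


if __name__ == "__main__":
    # At t=0 each model reduces to its constant term (const or C).
    print(quadratic_fit(0), exponential_fit(0))
```

With these forms, both models return their constant term at t=0, which matches the description of const and C as the number of spams expected today.
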
degrees of freedom (ndf) : 312
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 6.76495
variance of residuals (reduced chisquare) = WSSR/ndf : 45.7646

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

const           = 191.051          +/- 17.91        (9.375%)
linear          = -0.466755        +/- 0.04057      (8.691%)
quadratic       = -0.000230233     +/- 1.797e-05    (7.805%)
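
The summary lines in each fit block are related by simple arithmetic: stdfit is the square root of the reduced chi-square, the percentage beside each parameter is its standard error relative to its value, and the degrees of freedom are the number of data points minus the number of fitted parameters. A small Python check using the quadratic-fit numbers above, assuming one data point per week (315 points); the helper names are hypothetical:

```python
import math


def rms_from_reduced_chisq(reduced_chisq):
    """stdfit = sqrt(WSSR/ndf), where WSSR/ndf is the reduced chi-square."""
    return math.sqrt(reduced_chisq)


def relative_error_percent(value, std_err):
    """Standard error expressed as a percentage of the parameter value."""
    return 100.0 * abs(std_err / value)


def degrees_of_freedom(n_points, n_params):
    """ndf = number of data points minus number of fitted parameters."""
    return n_points - n_params


if __name__ == "__main__":
    print(rms_from_reduced_chisq(45.7646))         # compare with stdfit
    print(relative_error_percent(191.051, 17.91))  # compare with const's percentage
    print(degrees_of_freedom(315, 3))              # compare with ndf
```

The results agree with the reported values to within the rounding of the printed numbers.
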

--
degrees of freedom (ndf) : 313
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 8.36948
variance of residuals (reduced chisquare) = WSSR/ndf : 70.0482

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

C               = 345.252          +/- 16.56        (4.796%)
r               = 0.000104781      +/- 3.761e-05    (35.9%)


--
degrees of freedom (ndf) : 49
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 1.17872
variance of residuals (reduced chisquare) = WSSR/ndf : 1.38939

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

Z               = 251.641          +/- 2.784        (1.106%)
A               = 26.2518          +/- 4.159        (15.84%)
F               = 34.2367          +/- 1.424        (4.159%)
G               = 1.45872          +/- 0.3137       (21.51%)
--
degrees of freedom (ndf) : 289
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 8.3324
variance of residuals (reduced chisquare) = WSSR/ndf : 69.4289

BREAK:          Singular matrix in Invert_RtR



Page last updated Fri Nov 6 23:58:26 CST 2009. Comments should be directed to menscher@uiuc.edu.