Here's a plot of my email since October 23, 2003 (1,286,219 emails received in 317 weeks), categorized as spam or ham (not spam):


We started filtering viruses on May 2, 2004, which cut the number of spams slightly (about 25 viruses/day at the time, though not all of them were caught by SpamAssassin, so this number could vary a bit).

The plot shows quadratic, exponential, and oscillatory fits to predict how many spams I'll be receiving each day a month from now.

The details below assume the following fit functions:

t=0 is today.
const is how many spams I can expect to receive today if the quadratic fit is correct.
C is how many spams I can expect to receive today if the exponential fit is correct.
The oscillatory fit uses only the most recent 100 weeks of data.
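
The notes above say that const and C are the fitted values at t=0. A minimal Python sketch of that, assuming the quadratic fit has the form const + linear*t + quadratic*t^2 and the exponential fit has the form C*exp(r*t); both forms are guesses from the parameter names, not something this page states, and the function names are made up for illustration. The parameter values are copied from the fit output on this page. The unit of t is not given here, so the sketch only evaluates "today"; forecasting a month out would need that unit.

```python
import math

# Parameter values copied from the fit output on this page.
CONST, LINEAR, QUADRATIC = 187.891, -0.464238, -0.000226359
C, R = 342.639, 9.89977e-05


def quadratic_fit(t):
    """Assumed form: const + linear*t + quadratic*t**2, with t = 0 today."""
    return CONST + LINEAR * t + QUADRATIC * t * t


def exponential_fit(t):
    """Assumed form: C * exp(r*t), with t = 0 today."""
    return C * math.exp(R * t)


# At t = 0 every t-dependent term drops out, leaving const and C.
print("quadratic fit, spams/day today:  ", quadratic_fit(0.0))
print("exponential fit, spams/day today:", exponential_fit(0.0))
```

Under these assumed forms, the two fits disagree about today's rate (about 188 versus about 343 spams/day), which is one way to see how loosely either curve pins down the present.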
degrees of freedom (ndf) : 314
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 6.80179
variance of residuals (reduced chisquare) = WSSR/ndf : 46.2643

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

const           = 187.891          +/- 18.01        (9.586%)
linear          = -0.464238        +/- 0.04059      (8.743%)
quadratic       = -0.000226359     +/- 1.785e-05    (7.886%)

--
degrees of freedom (ndf) : 315
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 8.3734
variance of residuals (reduced chisquare) = WSSR/ndf : 70.1138

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

C               = 342.639          +/- 16.44        (4.799%)
r               = 9.89977e-05      +/- 3.737e-05    (37.75%)


--
degrees of freedom (ndf) : 49
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 1.23754
variance of residuals (reduced chisquare) = WSSR/ndf : 1.53151

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

Z               = 251.657          +/- 2.927        (1.163%)
A               = 26.5018          +/- 4.385        (16.55%)
F               = 34.1934          +/- 1.47         (4.3%)
G               = 1.53765          +/- 0.3274       (21.29%)
--
degrees of freedom (ndf) : 291
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 8.34022
variance of residuals (reduced chisquare) = WSSR/ndf : 69.5592

BREAK:          Singular matrix in Invert_RtR
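
The output above labels stdfit as sqrt(WSSR/ndf) and the reduced chi-square as WSSR/ndf, so each stdfit should be the square root of the number on the line below it. Here is a small Python check of that, plus a check that the percentages in parentheses equal 100 * error / |value| (that formula is inferred from the printed numbers, not quoted from any documentation). All values are copied from this page; the variable and function names are only illustrative.

```python
import math

# (ndf, stdfit, reduced chi-square) as printed for each of the four fits.
residual_stats = [
    (314, 6.80179, 46.2643),
    (315, 8.3734, 70.1138),
    (49, 1.23754, 1.53151),
    (291, 8.34022, 69.5592),
]

# (name, value, asymptotic standard error, printed percent error).
parameters = [
    ("const", 187.891, 18.01, 9.586),
    ("linear", -0.464238, 0.04059, 8.743),
    ("quadratic", -0.000226359, 1.785e-05, 7.886),
    ("C", 342.639, 16.44, 4.799),
    ("r", 9.89977e-05, 3.737e-05, 37.75),
    ("Z", 251.657, 2.927, 1.163),
    ("A", 26.5018, 4.385, 16.55),
    ("F", 34.1934, 1.47, 4.3),
    ("G", 1.53765, 0.3274, 21.29),
]


def relative_error_percent(value, error):
    """Standard error as a percentage of the parameter's magnitude."""
    return 100.0 * error / abs(value)


def wssr(ndf, reduced_chisq):
    """Recover the weighted sum of squared residuals from WSSR/ndf."""
    return ndf * reduced_chisq


# Tolerances allow for the rounding in the printed values.
stats_consistent = all(
    math.isclose(stdfit, math.sqrt(chisq), rel_tol=1e-4)
    for _, stdfit, chisq in residual_stats
)
percents_consistent = all(
    math.isclose(relative_error_percent(value, error), percent, rel_tol=2e-3)
    for _, value, error, percent in parameters
)

print("stdfit == sqrt(reduced chi-square):", stats_consistent)
print("percent == 100 * error / |value|: ", percents_consistent)
```

The percentages are a quick way to read the table: r is only known to about 38%, so the exponential growth rate is the least certain number here.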



Page last updated Sun Nov 22 23:58:25 CST 2009. Comments should be directed to menscher@uiuc.edu.