How spam scoring works

Inkwell scores each submission with a composable pipeline of seven signals. Each signal contributes points to a 0–100 total; the form's threshold determines the cutoff between clean, quarantined, and spam. Honeypot and known-bad-IP signals hard-block (rejected, no payload stored).
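As a rough illustration of how such a pipeline could compose, the sketch below sums the points from each signal, skips signals that return null, caps the total at 100, and stops at the first hard block. The function name scoreSubmission, the closure-based signals, and the 'points' / 'hardBlock' result keys are all hypothetical, chosen for this example; they are not Inkwell's actual interface.

```php
<?php
declare(strict_types=1);

/**
 * Runs every signal against a submission and builds a per-signal breakdown.
 *
 * @param array<string, mixed>    $submission raw form payload
 * @param array<string, callable> $signals    name => fn(array): ['points' => ?int, 'hardBlock' => bool]
 * @return array{total: int, hardBlocked: bool, breakdown: array<string, ?int>}
 */
function scoreSubmission(array $submission, array $signals): array
{
    $breakdown = [];
    $total = 0;

    foreach ($signals as $name => $signal) {
        $result = $signal($submission);
        $breakdown[$name] = $result['points'];

        // A hard-blocking signal ends the run immediately with the maximum score.
        if ($result['hardBlock']) {
            return ['total' => 100, 'hardBlocked' => true, 'breakdown' => $breakdown];
        }

        // null means "signal not applicable" and adds nothing.
        $total += $result['points'] ?? 0;
    }

    // Keep the total on the 0-100 scale.
    return ['total' => min(100, $total), 'hardBlocked' => false, 'breakdown' => $breakdown];
}
```

Keeping the per-signal breakdown next to the total is what makes the score explainable later on.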

The seven signals

Honeypot

A hidden form field that real visitors leave empty. If a bot fills it, the submission scores 100 points and is rejected outright (hard block). The field name is configurable per form (default _subject_honeypot).
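A minimal sketch of the check, assuming the payload arrives as an associative array. The function name honeypotTripped is hypothetical, and treating whitespace-only input as empty is an assumption rather than documented behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the honeypot field was filled in, meaning the
 * submission should be hard-blocked. $fieldName mirrors the per-form setting.
 */
function honeypotTripped(array $payload, string $fieldName = '_subject_honeypot'): bool
{
    $value = $payload[$fieldName] ?? '';

    // Assumption: whitespace-only input counts as empty;
    // any non-string value (for example an array) counts as filled.
    return !is_string($value) || trim($value) !== '';
}
```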

IP reputation

StopForumSpam publishes a daily CSV of known-abusive IPs. Inkwell's cron job loads it into a Redis SET; per-submission lookup is O(1). A hit hard-blocks. Misses don't add points.
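The sketch below models the lookup with a PHP array used as a set, standing in for the Redis SET so the example runs without a Redis connection (against Redis, the membership test would be the SISMEMBER command). The function names are hypothetical, and the assumption that the IP sits in the first CSV column is not taken from the StopForumSpam format documentation.

```php
<?php
declare(strict_types=1);

/**
 * Builds an in-memory stand-in for the Redis SET from CSV text.
 * Assumption: the IP address is the first column of each row.
 *
 * @return array<string, true>
 */
function loadBlocklist(string $csv): array
{
    $set = [];
    foreach (preg_split('/\R/', $csv, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $ip = trim(str_getcsv($line, ',', '"', '')[0] ?? '');
        // Skip header rows and anything that is not a valid IPv4/IPv6 address.
        if (filter_var($ip, FILTER_VALIDATE_IP) !== false) {
            $set[$ip] = true; // array keys give constant-time membership checks
        }
    }
    return $set;
}

/** A hit hard-blocks; a miss adds no points. */
function isBlocklisted(array $set, string $ip): bool
{
    return isset($set[$ip]);
}
```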

Timing

Real visitors take more than two seconds to fill a form. The optional 3 KB widget writes the page-render timestamp into a hidden field; submissions under two seconds get +25 points. Without the widget, the signal returns null (no penalty for no-JS visitors).
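The three outcomes (no data, too fast, normal) can be sketched as follows. The function name timingSignal and the use of millisecond timestamps are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * @param int|null $renderedAtMs  page-render time from the hidden field, in ms (null without the widget)
 * @param int      $submittedAtMs time the server received the submission, in ms
 * @return int|null points, or null when the signal cannot be evaluated
 */
function timingSignal(?int $renderedAtMs, int $submittedAtMs): ?int
{
    if ($renderedAtMs === null) {
        return null; // no widget or no JS: no penalty
    }

    $elapsedMs = $submittedAtMs - $renderedAtMs;

    // Assumption: a negative elapsed time (tampered field, clock skew)
    // is treated the same as a too-fast submission.
    return $elapsedMs < 2000 ? 25 : 0;
}
```

Returning null rather than 0 keeps "signal did not run" distinguishable from "signal ran and found nothing" in the breakdown.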

Submission rate

More than ten submissions from the same IP to the same form within 60 seconds adds +25 points. This is on top of the per-IP rate-limit middleware: the middleware catches floods, while the signal catches sustained low-rate scraping.
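One way to count submissions per IP and form is a sliding 60-second window, sketched below with in-memory state. The class name RateSignal is hypothetical, and a real deployment would presumably keep the counters in shared storage rather than in a PHP object.

```php
<?php
declare(strict_types=1);

final class RateSignal
{
    /** @var array<string, list<int>> submission timestamps (seconds) per "ip|form" key */
    private array $hits = [];

    /** Records this submission and returns 25 once the window holds more than ten. */
    public function score(string $ip, string $formId, int $now): int
    {
        $key = $ip . '|' . $formId;
        $cutoff = $now - 60;

        // Keep only hits inside the 60-second window, then record this one.
        $recent = array_values(array_filter(
            $this->hits[$key] ?? [],
            static fn (int $t): bool => $t > $cutoff
        ));
        $recent[] = $now;
        $this->hits[$key] = $recent;

        return count($recent) > 10 ? 25 : 0;
    }
}
```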

Content

Heuristic checks on the longest free-text field — URL density (≥ 3 = +10), all-caps ratio (> 0.5 = +5), phone-number density (≥ 3 = +5), very-short body (< 6 chars = +5). Capped at +25.
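A rough sketch of the four heuristics and the cap. It reads "density" as a plain count, and both the URL pattern and the phone-number pattern are illustrative guesses, not Inkwell's actual rules. The function name contentSignal is hypothetical, and the code assumes the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/** Scores the longest free-text field; the result is capped at 25. */
function contentSignal(string $text): int
{
    $points = 0;

    // Three or more URLs (assumption: only http/https links are counted).
    if (preg_match_all('~https?://\S+~i', $text) >= 3) {
        $points += 10;
    }

    // More than half of all letters are upper-case.
    $letters = preg_match_all('/\p{L}/u', $text);
    $upper = preg_match_all('/\p{Lu}/u', $text);
    if ($letters > 0 && $upper / $letters > 0.5) {
        $points += 5;
    }

    // Three or more phone-like runs: 7+ digits with optional single separators.
    if (preg_match_all('/\+?\d(?:[\s().-]?\d){6,}/', $text) >= 3) {
        $points += 5;
    }

    // Very short body.
    if (mb_strlen(trim($text)) < 6) {
        $points += 5;
    }

    return min($points, 25);
}
```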

Email validity

Two checks: an RFC 5322 syntax check (+15 on failure) and a disposable-domain blacklist (+10 on a match). The blacklist covers the major throwaway services (10minutemail, mailinator, guerrillamail, …); extending it is a v2 enhancement.
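The two checks might be combined as in the sketch below. PHP's filter_var() with FILTER_VALIDATE_EMAIL is used as a practical stand-in and is not a full RFC 5322 parser. The function name emailSignal, the three-entry domain list, and the choice to skip the domain check for syntactically invalid addresses are all assumptions.

```php
<?php
declare(strict_types=1);

/** Small illustrative list; a real blacklist would be much longer. */
const DISPOSABLE_DOMAINS = ['10minutemail.com', 'mailinator.com', 'guerrillamail.com'];

function emailSignal(string $email): int
{
    // Syntax check first; an invalid address has no reliable domain to look up.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return 15;
    }

    // Domains are case-insensitive, so normalise before the lookup.
    $domain = strtolower(substr($email, strrpos($email, '@') + 1));

    return in_array($domain, DISPOSABLE_DOMAINS, true) ? 10 : 0;
}
```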

Captcha (optional)

Pass-through to Cloudflare Turnstile (or compatible). If a visitor submits a token that verifies, the signal contributes 0; if it fails, +10 on top of whatever else fired.

Graduated decision

The score is mapped to a state by the form's threshold (default 50). With the default threshold, the bands are:

0–29 → clean — accepted, destinations dispatched immediately.
30–49 → quarantined — accepted but flagged. Visitor sees a captcha challenge; on pass → clean; on fail or timeout → spam.
50+ → spam — stored, no destinations dispatched, visible in admin's Spam tab.
Hard-block (honeypot or blocklisted IP) → rejected — no payload stored, only metadata.
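The mapping above can be sketched as a single function. The name decide and the $quarantineFrom parameter are hypothetical; the text only states that the spam threshold is configurable, so the lower boundary of 30 is treated here as a fixed default.

```php
<?php
declare(strict_types=1);

/** Maps a score (and the hard-block flag) to one of the four states. */
function decide(int $score, bool $hardBlocked, int $spamThreshold = 50, int $quarantineFrom = 30): string
{
    if ($hardBlocked) {
        return 'rejected';      // no payload stored, only metadata
    }
    if ($score >= $spamThreshold) {
        return 'spam';          // stored, no destinations dispatched
    }
    if ($score >= $quarantineFrom) {
        return 'quarantined';   // accepted but flagged, captcha challenge follows
    }
    return 'clean';             // accepted, destinations dispatched immediately
}
```

For example, a submission that trips only the timing signal (+25) stays clean, while timing plus submission rate (+50) lands in spam at the default threshold.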

Why the breakdown is public

Black-box scoring breeds frustrated buyers: when a real customer's email is wrongly flagged, the buyer has no recourse without the signal-level detail. Inkwell exposes the full breakdown verbatim in the admin and (for the live demo) on the public result page. Buyers can see why a submission scored what it did, tune their threshold or per-form weights, and trust the system.

Adding a signal

Implement App\Services\Spam\SpamSignal, register the class in config/inkwell.php, add a corpus row in tests/corpus/spam-corpus.json, ship. The corpus is the regression contract — change semantics? Update the corpus first.