73 Commits

Author SHA1 Message Date
950aea9eeb
feat: add cmd/form2mail using formmailer/ipcohort/geoip/gitshallow
Drop-in replacement for the legacy standalone form2mail binary.
Preserves CLI flags, env vars, .env loading, password prompt, response
file hot-reload, field mapping (input_1/3/4/5/7), rate limit (5/min,
burst 3), North-America country gate, .ru silent-drop, routes, startup
banner, and [REDACTED] placeholder for bot rejections.

Changes vs. legacy:
- Replaces embedded iploc with net/geoip (requires GeoIP.conf)
- Reads bitwire-it repo's native layout (tables/inbound/*) instead of
  a hand-managed flat inbound.txt
- gitshallow.Repo with GCInterval=24 keeps .git from accumulating
  orphaned blobs after hourly upstream updates

formmailer additions to support form2mail's legacy behavior:
- SuccessBodyFunc / ErrorBodyFunc — per-request body providers for
  hot-reloadable templates
- HiddenSupportValue — string to render in place of {.SupportEmail}
  for blacklist/bot rejections (form2mail uses "[REDACTED]")
2026-04-20 22:09:11 -06:00
0d4bce8a38
refactor(formmailer): use geoip instead of iploc for country check
Swap github.com/phuslu/iploc for the shared net/geoip package,
matching the pattern established by check-ip. AllowedCountries now
reads CountryISO off fm.Geo.Value().Lookup(ipStr) instead of
iploc.IPCountry, so the same GeoLite2 databases serve both callers
and refresh on the same cadence.

New field: Geo *dataset.View[geoip.Databases]. Required when
AllowedCountries is set; if Value() is nil (pre-load), the check is
skipped (unknown = allow), matching the prior iploc behavior on
unknown IPs.
2026-04-20 20:35:18 -06:00
0d9df94a24
test(gitshallow): regression test for upstream force-push recovery
Sets up a local bare upstream, clones via gitshallow, then rewrites
upstream history as an unrelated commit and force-pushes. A fresh
Repo instance's Fetch must succeed and install the new HEAD — the
old pull-based flow would fail with "refusing to merge unrelated
histories".

Runs under the default test build (no integration tag) because it
uses only a local bare repo; no network access required.
2026-04-20 20:05:52 -06:00
06e6cfa211
fix(formmailer): cap request body with MaxBytesReader
ParseMultipartForm(maxFormSize) caps post-header bytes but doesn't
bound the raw body transfer, so a slow/chunked POST can burn server
time before rejection. Wrap r.Body in http.MaxBytesReader so the
transport cuts off over-size bodies immediately.
2026-04-20 20:02:41 -06:00
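
The body cap described in the commit above can be sketched as follows. This is a minimal illustration, not the formmailer code: `handleForm`, `postStatus`, and the 1 KiB `maxFormSize` value are hypothetical, and it uses `ParseForm` rather than `ParseMultipartForm` to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxFormSize is an illustrative limit; the commit does not state the real value.
const maxFormSize = 1 << 10 // 1 KiB

// handleForm caps the raw request body before any form parsing reads it.
func handleForm(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxFormSize)
	if err := r.ParseForm(); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad form", http.StatusBadRequest)
		return
	}
	fmt.Fprintln(w, "ok")
}

// postStatus sends a urlencoded body with an n-byte value and returns the status code.
func postStatus(n int) int {
	body := "msg=" + strings.Repeat("a", n)
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	handleForm(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("small body:", postStatus(10))
	fmt.Println("large body:", postStatus(2000))
}
```

Reading past the limit surfaces as a `*http.MaxBytesError`, so the handler can answer with 413 instead of a generic 400.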
b77872623a
feat(formmailer)!: replace FormFields struct with ordered []Field
Form inputs are now declared as an ordered slice with Kind-driven
validation (KindText, KindEmail, KindPhone, KindMessage). Arbitrary
input names are fine — callers pick the Label shown in the email
body and the FormName of the HTML input. Per-field MaxLen and
Required overrides supported; defaults come from Kind.

Exactly one KindEmail entry is required (used for Reply-To, Subject
{.Email} substitution, and the MX check); misconfiguration is
detected at first request and returns 500.

Email body, log line, and validation now iterate Fields in order, so
the email preserves the form's declared layout.

BREAKING: FormMailer.Fields is now []Field, not FormFields struct.
Callers must migrate to the slice form.
2026-04-20 19:45:53 -06:00
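
A rough sketch of the ordered-slice shape described in the commit above. The names `Field`, `Kind`, `KindText`, `KindEmail`, `KindPhone`, `KindMessage`, `Label`, `FormName`, and `MaxLen` come from the commit message; the struct layout, the `validateFields` helper, and the label-to-input mapping in `main` are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind selects the validation applied to a field.
type Kind int

const (
	KindText Kind = iota
	KindEmail
	KindPhone
	KindMessage
)

// Field is a hypothetical layout; the real struct may differ.
type Field struct {
	Label    string // shown in the email body
	FormName string // name of the HTML input
	Kind     Kind
	MaxLen   int // 0 means "use the Kind default" in this sketch
}

var errEmailCount = errors.New("fields: exactly one KindEmail entry is required")

// validateFields enforces the "exactly one KindEmail" rule from the commit.
func validateFields(fields []Field) error {
	n := 0
	for _, f := range fields {
		if f.Kind == KindEmail {
			n++
		}
	}
	if n != 1 {
		return fmt.Errorf("%w (found %d)", errEmailCount, n)
	}
	return nil
}

func main() {
	// Illustrative mapping only; the real label assignments are not in the log.
	fields := []Field{
		{Label: "Name", FormName: "input_1", Kind: KindText},
		{Label: "Email", FormName: "input_3", Kind: KindEmail},
		{Label: "Message", FormName: "input_7", Kind: KindMessage},
	}
	fmt.Println("full set:", validateFields(fields))
	fmt.Println("no email:", validateFields(fields[:1]))
}
```

Because the slice is ordered, iterating it for the email body, the log line, and validation yields the same layout each time.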
f972d6f117
refactor(formmailer): use *dataset.View directly, tighter timeouts
- Drop CohortSource interface — it had exactly one implementation.
  Blacklist is now *dataset.View[ipcohort.Cohort] directly, matching
  check-ip's usage. One concrete type, no premature abstraction.
- SMTP 15s → 5s, MX 3s → 2s. A relay or resolver that isn't
  responding inside those bounds isn't going to deliver the mail;
  faster failure is better than holding the request goroutine.
2026-04-20 19:36:51 -06:00
b23610fdf1
refactor(formmailer): production-readiness + dataset.View compatibility
- Blacklist is now a CohortSource interface (Value() *ipcohort.Cohort).
  *dataset.View[ipcohort.Cohort] satisfies it directly; callers with
  an atomic.Pointer can wrap. Drops the atomic/sync import from the
  public API.
- SMTP send now uses net.Dialer.DialContext with a bounded SMTPTimeout
  (default 15s) and conn deadline, so a slow/hung relay no longer holds
  the request goroutine for WriteTimeout. Opportunistic STARTTLS added.
- MX lookup uses net.DefaultResolver.LookupMX with a bounded MXTimeout
  (default 3s), cancellable via r.Context().
- clientIP uses net.SplitHostPort (was LastIndex(":"), broken for IPv6).
- Per-IP limiter map now has a 10-minute TTL with opportunistic sweep
  every 1024 requests — previously grew unbounded.
- Sentinel errors switched to errors.New; fmt.Errorf was unused.
2026-04-20 19:32:40 -06:00
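
The clientIP fix in the commit above can be illustrated with a small comparison. Both helper bodies are assumptions about what the old and new code roughly looked like, and they take the remote address as a plain string to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clientIP extracts the host part with net.SplitHostPort, which understands
// the bracketed IPv6 form "[addr]:port".
func clientIP(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; return the input unchanged
	}
	return host
}

// naiveClientIP mimics the LastIndex(":") approach for comparison.
func naiveClientIP(remoteAddr string) string {
	if i := strings.LastIndex(remoteAddr, ":"); i >= 0 {
		return remoteAddr[:i]
	}
	return remoteAddr
}

func main() {
	for _, addr := range []string{"192.0.2.7:5000", "[2001:db8::1]:443"} {
		fmt.Printf("%-20s split=%q naive=%q\n", addr, clientIP(addr), naiveClientIP(addr))
	}
}
```

For the IPv6 input the naive version keeps the surrounding brackets, so the result no longer parses as an IP address.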
46b31b75c2
style: format entry counts with comma thousands separators
3,406,727 scans cleanly; 3406727 does not. Go's fmt has no
thousands-separator verb and golang.org/x/text/message pulls in a
multi-MB Unicode tree for what is 15 lines inline, so each cmd gets
its own commafy helper.
2026-04-20 19:15:47 -06:00
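
For reference, an inline helper of the kind the commit above describes might look like this. It is a sketch, not the project's actual `commafy`; the signature taking an `int` is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// commafy formats n with comma thousands separators.
func commafy(n int) string {
	s := strconv.Itoa(n)
	neg := strings.HasPrefix(s, "-")
	if neg {
		s = s[1:]
	}
	var b strings.Builder
	for i, c := range s {
		// Insert a comma whenever the remaining digit count is a multiple of 3.
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte(',')
		}
		b.WriteRune(c)
	}
	if neg {
		return "-" + b.String()
	}
	return b.String()
}

func main() {
	fmt.Println(commafy(3406727))
}
```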
8ebc571928
refactor: apply check-ip CLI conventions to sibling cmds
Propagate the patterns used in cmd/check-ip to the other command-line
tools touched by this PR:

- flag.FlagSet + Config struct instead of package-level flag.String
  pointers (geoip-update, ipcohort-contains, git-shallow-sync).
- -V/--version/version and help/-help/--help handled before Parse,
  matching the project's CLI conventions.
- Stderr "Loading X... Nms (counts)" progress lines on the stages that
  actually take time: blocklist cohort parse (ipcohort-contains),
  per-edition fetch (geoip-update), and repo sync (git-shallow-sync).
  Stdout stays machine-parseable.
2026-04-20 19:13:47 -06:00
c99cd3a2b8
refactor: default cache to ~/.cache on all platforms
os.UserCacheDir returns ~/Library/Caches on macOS, which is intended
for bundled desktop apps and hides files from anyone looking under
~/.cache. These are CLI tools — use the XDG convention everywhere so
the cache lives somewhere predictable and cross-platform-consistent.
2026-04-20 17:33:31 -06:00
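
One way to get the behavior described in the commit above is to build the path from the home directory rather than from `os.UserCacheDir`. The helper names below are hypothetical; only the `~/.cache` convention and the `maxmind` subdirectory appear in the log.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheDirUnder joins the XDG-style cache path under a given home directory.
func cacheDirUnder(home, app string) string {
	return filepath.Join(home, ".cache", app)
}

// defaultCacheDir resolves <home>/.cache/<app> on every platform,
// instead of the per-OS location that os.UserCacheDir would return.
func defaultCacheDir(app string) (string, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return cacheDirUnder(home, app), nil
}

func main() {
	dir, err := defaultCacheDir("maxmind")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve home directory:", err)
		return
	}
	fmt.Println(dir)
}
```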
5e6688c2a9
feat(gitshallow): add MaxAge gate via FETCH_HEAD mtime
Short-lived CLI invocations were doing a full git fetch+reset on every
run because the only debounce was an in-memory lastSynced field. MaxAge
skips the fetch when .git/FETCH_HEAD is younger than the configured
duration — git rewrites FETCH_HEAD on every successful fetch, so its
mtime is effectively "last time we talked to the remote", and it
survives process restart. Wire check-ip's blocklist repo to the same
47m refresh interval it uses for the background Tick.
2026-04-20 17:28:53 -06:00
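
The mtime gate in the commit above might be sketched like this. `fetchIsFresh` and `demo` are hypothetical names, and the real gitshallow code may structure the check differently; the 47-minute value is the refresh interval quoted in the commit.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fetchIsFresh reports whether .git/FETCH_HEAD under repoDir was modified
// less than maxAge before now. A missing file counts as stale.
func fetchIsFresh(repoDir string, maxAge time.Duration, now time.Time) bool {
	info, err := os.Stat(filepath.Join(repoDir, ".git", "FETCH_HEAD"))
	if err != nil {
		return false
	}
	return now.Sub(info.ModTime()) < maxAge
}

// demo builds a throwaway directory with a FETCH_HEAD file and probes the gate.
func demo() (fresh, stale, missing bool, err error) {
	dir, err := os.MkdirTemp("", "fetchhead-demo")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)
	if err = os.MkdirAll(filepath.Join(dir, ".git"), 0o755); err != nil {
		return false, false, false, err
	}
	if err = os.WriteFile(filepath.Join(dir, ".git", "FETCH_HEAD"), nil, 0o644); err != nil {
		return false, false, false, err
	}
	now := time.Now()
	maxAge := 47 * time.Minute
	fresh = fetchIsFresh(dir, maxAge, now)                  // just written
	stale = fetchIsFresh(dir, maxAge, now.Add(time.Hour))   // an hour later
	missing = fetchIsFresh(filepath.Join(dir, "nope"), maxAge, now)
	return fresh, stale, missing, nil
}

func main() {
	fmt.Println(demo())
}
```

Because the timestamp lives on disk, the gate keeps working across process restarts, which an in-memory field cannot do.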
631f32cf95
fix(gitshallow): mirror upstream with fetch+reset instead of pull --ff-only
The shallow clone is a read-only mirror, so a force-push on the
upstream branch caused pull --ff-only to bail with "refusing to merge
unrelated histories". Switch to git fetch + git reset --hard
origin/<branch> so the local copy always tracks upstream, force-push
or not. Auto-detects the branch from origin/HEAD when Branch is empty.
2026-04-20 17:23:53 -06:00
8f40bbf110
feat(geoip): Open falls back to lex-latest <edition>_*.tar.gz
Prefer <edition>_LATEST.tar.gz (what httpcache writes), but fall back
to the lexicographically greatest <edition>_*.tar.gz — MaxMind's dated
Content-Disposition names sort chronologically, so this picks the most
recent archive when the cache was populated by hand or by another tool.
Exposes FindTarGz for callers that need the resolved path.
2026-04-20 17:14:11 -06:00
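
The selection rule in the commit above can be sketched over a plain list of file names. `pickArchive` is a hypothetical helper (the real entry point is `FindTarGz`, whose signature is not shown in the log), and the dated file names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pickArchive prefers <edition>_LATEST.tar.gz and otherwise returns the
// lexicographically greatest <edition>_*.tar.gz name.
func pickArchive(names []string, edition string) (string, bool) {
	latest := edition + "_LATEST.tar.gz"
	var dated []string
	for _, n := range names {
		if n == latest {
			return latest, true
		}
		if strings.HasPrefix(n, edition+"_") && strings.HasSuffix(n, ".tar.gz") {
			dated = append(dated, n)
		}
	}
	if len(dated) == 0 {
		return "", false
	}
	// Dated names sort chronologically, so the last entry is the newest.
	sort.Strings(dated)
	return dated[len(dated)-1], true
}

func main() {
	names := []string{
		"GeoLite2-City_20260410.tar.gz",
		"GeoLite2-City_20260417.tar.gz",
		"GeoLite2-ASN_20260420.tar.gz",
	}
	fmt.Println(pickArchive(names, "GeoLite2-City"))
}
```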
e594f2503c
refactor(geoip): cache tarballs as <edition>_LATEST.tar.gz
Adds geoip.TarGzName(edition) as the single source of truth for the
cache filename. The _LATEST suffix signals that the file is whatever
MaxMind served most recently (versus the dated Content-Disposition
name) and keeps httpcache's ETag sidecar tied to a stable path across
releases.
2026-04-20 17:13:41 -06:00
0c509fb563
docs: note GeoLite2 free signup in check-ip and geoip.Conf
Missing GeoIP.conf now points users at the free MaxMind signup with an
example config. Also documented on the geoip.Conf godoc.
2026-04-20 17:07:22 -06:00
159cf2d4d3
refactor(httpcache): sentinel errors for Fetch failure modes
ErrUnexpectedStatus, ErrEmptyResponse, ErrSaveMeta are exposed so
callers can branch with errors.Is. Messages remain descriptive (status
code, URL, Path) via %w wrapping.
2026-04-20 17:01:04 -06:00
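
The sentinel-plus-wrapping pattern from the commit above looks roughly like this. `ErrUnexpectedStatus` is named in the commit; `checkStatus`, `isUnexpectedStatus`, and the message format are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnexpectedStatus is the sentinel callers match with errors.Is.
var ErrUnexpectedStatus = errors.New("unexpected status")

// checkStatus wraps the sentinel with %w so the message stays descriptive
// while the error remains matchable.
func checkStatus(url string, code int) error {
	if code != 200 && code != 304 {
		return fmt.Errorf("%w: %d from %s", ErrUnexpectedStatus, code, url)
	}
	return nil
}

// isUnexpectedStatus shows how a caller would branch on the failure mode.
func isUnexpectedStatus(err error) bool {
	return errors.Is(err, ErrUnexpectedStatus)
}

func main() {
	err := checkStatus("https://example.invalid/db.tar.gz", 503)
	fmt.Println(err)
	fmt.Println(isUnexpectedStatus(err))
}
```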
ba64018838
fix(httpcache): propagate sidecar write errors from Fetch
saveMeta now returns an error instead of silently swallowing WriteFile/
Rename failures. Fetch wraps and returns it (with updated=true, since
the body rename already succeeded). Callers get a loud signal when the
sidecar can't be written — the body is still good, but the next
conditional GET may redownload.
2026-04-20 17:00:22 -06:00
f75d5c489a
refactor(httpcache): use http.Header instead of AuthHeader/AuthValue
Cacher.Header is a stdlib http.Header that's merged into every request.
Authorization is stripped on redirect unconditionally (presigned S3/R2
targets, etc). Callers build the header with the usual http.Header
literal; BasicAuth/Bearer still produce the Authorization value.
2026-04-20 16:55:15 -06:00
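
One plausible way to strip Authorization on every redirect, as the commit above describes, is a `CheckRedirect` hook on the client. This is a sketch under that assumption; the log does not show how httpcache actually implements it, and `stripAuthOnRedirect`, `demo`, and the token value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// stripAuthOnRedirect removes Authorization from the upcoming request,
// regardless of where the redirect points.
func stripAuthOnRedirect(req *http.Request, via []*http.Request) error {
	if len(via) >= 10 {
		return errors.New("stopped after 10 redirects")
	}
	req.Header.Del("Authorization")
	return nil
}

// demo returns the Authorization value seen by the first hop and by the
// redirect target, using two local test servers.
func demo() (firstHop, secondHop string, err error) {
	seen := make(chan string, 2)
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
	}))
	defer target.Close()
	origin := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
		http.Redirect(w, r, target.URL, http.StatusFound)
	}))
	defer origin.Close()

	req, err := http.NewRequest(http.MethodGet, origin.URL, nil)
	if err != nil {
		return "", "", err
	}
	// Callers build the header with an ordinary http.Header literal.
	req.Header = http.Header{"Authorization": {"Bearer example-token"}}

	client := &http.Client{CheckRedirect: stripAuthOnRedirect}
	resp, err := client.Do(req)
	if err != nil {
		return "", "", err
	}
	resp.Body.Close()
	return <-seen, <-seen, nil
}

func main() {
	fmt.Println(demo())
}
```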
4753888402
refactor(geoip): ParseConf takes a string, not a file path
The old ParseConf opened the file itself, which the name did not
convey. Now it parses the config text directly, matching
encoding/json.Unmarshal-style conventions: callers read the file (or
source the string however they like) and pass it in. Also introduce
errors.ErrMissingCredentials for the credential-missing case so callers
can branch on it.
2026-04-20 16:53:17 -06:00
56a150826e
refactor: geoip opens tar.gz in place, no Transform, no intermediate mmdb
- httpcache.Cacher loses Transform (always atomic copy to Path); adds
  BasicAuth and Bearer helpers for Authorization header values.
- geoip.Open now reads <dir>/GeoLite2-City.tar.gz and GeoLite2-ASN.tar.gz
  directly: extracts the .mmdb entry in memory and opens via
  geoip2.FromBytes. No .mmdb files written to disk.
- geoip.Downloader/New/NewCacher/Fetch/ExtractMMDB removed — geoip is
  purely read/lookup; fetching is each caller's concern.
- cmd/check-ip/main.go is a single main() again: blocklists via
  gitshallow+dataset, geoip via two httpcache.Cachers (if GeoIP.conf
  present) + geoip.Open. No geo refresh loop, no dataset.Group for geo.
- cmd/geoip-update and the integration test construct httpcache.Cachers
  directly against geoip.DownloadBase + edition IDs, writing .tar.gz.
2026-04-20 16:27:32 -06:00
cb39f30d91
refactor(geoip,check-ip): inline literal mmdb filenames
Use 'GeoLite2-City.mmdb' / 'GeoLite2-ASN.mmdb' directly instead of
composing from the edition constants. Reads plainly — the actual
filename is right there.
2026-04-20 16:13:30 -06:00
359b740cec
refactor(geoip): Open takes dir, derives canonical edition paths
Filenames are deterministic (<dir>/GeoLite2-City.mmdb,
<dir>/GeoLite2-ASN.mmdb) — callers no longer pass both paths. cmd/check-ip
drops its cityPath/asnPath locals and just hands the maxmind dir to
geoip.Open and the fetcher builder.
2026-04-20 16:12:46 -06:00
9b92136f91
refactor(geoip,check-ip): lift download/refresh out of geoip into cmd
geoip.Open now just opens files; download/refresh/polling logic lives at
the cmd layer using dataset.Group with a combined httpcache.Cacher
fetcher (or PollFiles when no GeoIP.conf is available). Removes
geoip.OpenDatabases — the library is no longer concerned with refresh.
2026-04-20 16:10:51 -06:00
a84116f806
refactor: strip all optional/nil-guard plumbing from check-ip + geoip
- drop Checker struct, loadCohort helper, and contains() nil-wrapper
- inline check logic into server as a closure
- geoip.Databases: no nil-receiver guards, no nil-field branches, no
  "disabled" mode. City + ASN are both required; caller hands explicit
  paths and OpenDatabases returns a fully-initialized value or an err
- main.go is now straight-line wiring with no helper functions
2026-04-20 15:55:55 -06:00
cdce7da04c
refactor(check-ip): simplify to 4 flags, push MkdirAll into libs
check-ip now takes only --serve, --geoip-conf, --blocklist-repo,
--cache-dir. Blocklist always comes from git; GeoIP mmdbs always go
through httpcache (when GeoIP.conf is available). Format negotiation
lives entirely server-side.

main.go is now straight-line wiring: parse flags, build the two
databases, run the server. All filesystem setup (MkdirAll for clone
target, for cache Path parents) is pushed into gitshallow and
httpcache so the cmd doesn't do filesystem bookkeeping.
2026-04-20 15:51:46 -06:00
912e1179d4
feat(check-ip): --format pretty|json, move rendering out of geoip
geoip.Databases now exposes a structured Lookup(ip) Info. Rendering
moved up to the cmd — the library no longer writes to io.Writer.

check-ip adds a Result struct and --format flag (pretty/json). Serve
mode dispatches on ?format=json or Accept: application/json. Pretty
is the default for both one-shot and HTTP.
2026-04-20 14:18:39 -06:00
01a9185c03
refactor: delete net/dataset package
check-ip and geoip no longer use it; formmailer now takes
*atomic.Pointer[ipcohort.Cohort] for Blacklist so callers own the
refresh + swap lifecycle directly. gitshallow doc comments that
referenced dataset.Syncer are trimmed.

The concepts the package tried to share (atomic-swap, group sync,
ticker-driven refresh) may come back under sync/dataset once we have
more than one in-tree caller that wants them.
2026-04-20 13:22:08 -06:00
5985ea5e2d
refactor(geoip): drop dataset dep, become barebones load/open/get
Databases is now just two *geoip2.Reader fields with Open/Close/PrintInfo.
OpenDatabases still auto-discovers conf and downloads stale .mmdb files
via httpcache before opening, but it no longer runs background goroutines
or holds atomic pointers. Long-running callers that want refresh can wire
httpcache.Cacher to atomic.Pointer themselves.

check-ip drops geo.Init/geo.Run — OpenDatabases does the fetch+open work
itself, and a one-shot CLI doesn't need background refresh.
2026-04-20 13:20:34 -06:00
f5f992ae94
refactor: move geoip setup into geoip.OpenDatabases, remove cmd/check-ip/geo.go
OpenDatabases(confPath, cityPath, asnPath) handles conf discovery, cache
dir setup, and Databases construction. DefaultConfPaths lists the standard
GeoIP.conf locations. cmd/check-ip/geo.go deleted; main calls one function.
2026-04-20 12:51:50 -06:00
994d91b2bf
refactor: dataset.Add returns *Dataset, no View; main uses Group for all cases
Remove View[T] — Add now returns *Dataset[T] directly. Callers use Load()
on the returned Dataset; Init/Run belong to the owning Group.

main.go simplified: declare syncer + file paths per case, then one
g.Init() and one g.Run(). No manual loops over individual datasets.
Add gitshallow.Repo.FilePath helper.
2026-04-20 12:48:38 -06:00
cc945b0c09
refactor: dataset.Sync() = fetch+conditional-swap, no public Swap()
Callers only need Init() + Run() + Load(). Sync() handles the full
fetch→swap cycle internally when the source reports a change.
2026-04-20 12:46:02 -06:00
7b71dec445
feat: gitshallow.File for per-file path/open/sync; use in check-ip git case
2026-04-20 12:39:24 -06:00
6b420badbc
refactor: merge blacklist.go into main.go via dataset.MultiSyncer
2026-04-20 12:23:13 -06:00
ddd0986e20
refactor: push complexity into packages; main.go is orchestration only
- geoip.Databases: wraps city+ASN datasets with nil-safe Init/Run/PrintInfo
- geoip.(*Downloader).NewDatabases: builds Databases from downloader
- cmd/check-ip/geo.go: setupGeo() handles conf parsing, dir creation, DB path resolution
- cmd/check-ip/blacklist.go: isBlocked() + cohortSize() moved here
- cmd/check-ip/main.go: flags, source selection, init, check, print — nothing else
2026-04-20 12:15:14 -06:00
34a54c2d66
refactor: multi-module workspace + dataset owns Syncer interface
- Each package gets its own go.mod: net/{dataset,httpcache,gitshallow,ipcohort,geoip,formmailer}
- go.work with replace directives for cross-module workspace resolution
- dataset.Syncer/NopSyncer moved here from httpcache; callers duck-type it
- dataset.View[T] returned by Add to prevent Init/Sync/Run misuse on group members
- cmd/check-ip moved from net/ipcohort/cmd/check-ip to top-level cmd/check-ip
- Add net/ipcohort/cmd/ipcohort-contains for standalone cohort membership testing
2026-04-20 11:22:01 -06:00
225faec549
fix: FormFields defaults to GravityForms-compatible input_N names
2026-04-20 11:02:46 -06:00
d57c810c2e
feat: add net/formmailer with updated paradigms
Rewrite from feat-formmailer WIP:
- Blacklist is *dataset.View[ipcohort.Cohort] — caller wires dataset group
- http.Handler via ServeHTTP — drop-in for any mux
- SuccessBody/ErrorBody []byte — caller loads files; no file I/O per request
- Rate limiter per-instance (sync.Once init), not global
- Fields configurable (default standard names, not GravityForms input_N)
- AllowedCountries []string for geo-blocking via iploc (nil = allow all)
- ContainsAddr used directly (pre-parsed netip.Addr, no re-parse)
- No Init()/Run() — caller drives dataset lifecycle
- Fix getErrorBotty typo; expose support email only to legitimate errors
2026-04-20 11:01:15 -06:00
b2eb5aef9a
fix: skip redundant pull when another caller just synced under the lock
Records lastSynced time after each pull. A concurrent caller that was
waiting behind the mutex sees lastSynced < 1s ago and returns early,
avoiding a wasted network round-trip.
2026-04-20 10:15:53 -06:00
bd62122ac8
feat: default cache dirs; test both inbound files
- geoip.DefaultCacheDir() → ~/.cache/maxmind (os.UserCacheDir based)
- check-ip defaults data dir to ~/.cache/bitwire-it; -data-dir flag overrides;
  positional data-dir arg removed (IP is now the only required arg)
- geoip conf: DatabaseDirectory defaults to geoip.DefaultCacheDir() when blank
- httpcache integration tests now cover both inbound files (single_ips + networks)
2026-04-20 10:11:49 -06:00
d24a34e0e5
test: strengthen gitshallow integration tests to assert updated=false on re-pull
2026-04-20 10:07:06 -06:00
297fba10f5
feat: persist ETag/Last-Modified to sidecar file; add integration tests
httpcache: write <path>.meta JSON sidecar after each successful download;
load it on first Fetch so conditional GETs work after process restarts.

Tests verify: download, sidecar written, same-cacher 304, fresh-cacher 304
(the last being the key case — no in-memory state, sidecar drives ETag).
MaxMind integration test reads GeoIP.conf, downloads City+ASN, verifies
fresh-cacher conditional GET skips re-download via sidecar ETag.
2026-04-20 10:04:56 -06:00
344246362f
test: add integration tests for httpcache and gitshallow
2026-04-20 10:01:57 -06:00
4e8321af97
fix: restore auth stripping on redirect, keyed off AuthHeader
2026-04-20 09:59:27 -06:00
3feb248ce1
refactor: replace Username/Password with AuthHeader/AuthValue in httpcache
Generic header pair works for any auth scheme — Bearer, X-API-Key, Basic, etc.
Auth is forwarded on redirects; the MaxMind-specific stripping is removed.
geoip.go encodes Basic auth credentials directly into AuthValue.
2026-04-20 09:58:08 -06:00
d0a5e0a9d2
fix: split connection and download timeouts in httpcache
ConnTimeout (default 5s) caps TCP connect + TLS handshake via net.Dialer
and Transport.TLSHandshakeTimeout. Timeout (default 5m) caps the overall
request including body read. Previously a single 30s timeout covered both,
which was too short for large downloads and too long for connection failures.
2026-04-20 09:56:24 -06:00
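
The timeout split in the commit above maps onto standard library knobs roughly as follows. `newClient` and `defaultClient` are hypothetical names; the 5s and 5m defaults are the ones quoted in the commit.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newClient caps connection setup (TCP connect and TLS handshake) with
// connTimeout and the whole request, including the body read, with total.
func newClient(connTimeout, total time.Duration) *http.Client {
	return &http.Client{
		Timeout: total,
		Transport: &http.Transport{
			DialContext:         (&net.Dialer{Timeout: connTimeout}).DialContext,
			TLSHandshakeTimeout: connTimeout,
		},
	}
}

// defaultClient applies the defaults named in the commit message.
func defaultClient() *http.Client {
	return newClient(5*time.Second, 5*time.Minute)
}

func main() {
	c := defaultClient()
	fmt.Println("overall timeout:", c.Timeout)
}
```

A dead host now fails within the short connect bound, while a slow but healthy large download still has the full overall budget.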
86ffa2fb23
chore: remove IPv6 special-casing (YAGNI)
Drop the explicit IPv6 early-exit in ReadAll — ParseIPv4 already rejects
non-IPv4 via Is4(). Remove IPv6-specific tests and error message wording.
2026-04-20 09:54:04 -06:00
ad5d696ce6
refactor: dataset.Add returns View[T] instead of Dataset[T]
Group-managed datasets must never have Init/Sync/Run called on them.
Rather than patching with NopSyncer, introduce View[T] — a thin wrapper
that exposes only Load(). The compiler now prevents misuse: callers can
read values but cannot drive fetch/reload cycles directly.

Dataset[T] no longer needs a syncer when owned by a Group; View.reload()
delegates to the inner Dataset.reload() for Group.reloadAll().
2026-04-20 09:50:48 -06:00
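
The read-only wrapper idea in the commit above can be sketched with generics and `atomic.Pointer`. The type names `Dataset` and `View` and the `Load` method come from the commit; the fields, `swap`, and `viewOf` are assumptions. Everything sits in one package here, so the compile-time restriction only takes effect once `View` is consumed from another package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Dataset holds the current value and owns the swap side.
type Dataset[T any] struct {
	cur atomic.Pointer[T]
}

// Load returns the current value, or nil before the first swap.
func (d *Dataset[T]) Load() *T { return d.cur.Load() }

// swap installs a new value; unexported, so outside packages cannot call it.
func (d *Dataset[T]) swap(v *T) { d.cur.Store(v) }

// View exposes only Load, so holders can read but not drive reloads.
type View[T any] struct {
	ds *Dataset[T]
}

// Load delegates to the underlying Dataset.
func (v View[T]) Load() *T { return v.ds.Load() }

// viewOf wraps a Dataset in a read-only View.
func viewOf[T any](d *Dataset[T]) View[T] { return View[T]{ds: d} }

func main() {
	ds := &Dataset[string]{}
	v := viewOf(ds)
	fmt.Println("before swap is nil:", v.Load() == nil)

	s := "cohort v1"
	ds.swap(&s)
	fmt.Println("after swap:", *v.Load())
}
```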
896031b6a8
fix: idiomatic Go cleanup across net packages
- gitshallow: replace in-place Depth mutation with effectiveDepth() method;
  remove depth normalisation from New() since it was masking the bug
- ipcohort: extract sortNets() helper using cmp.Compare, eliminating 3 identical
  sort closures; add ContainsAddr(netip.Addr) for pre-parsed callers; guard
  Contains() against IPv6 panic (As4 panics on non-v4); add IPv6 test
- dataset: Add() now sets NopSyncer{} so callers cannot panic by accidentally
  calling Init/Sync/Run on a Group-managed Dataset
2026-04-20 09:47:50 -06:00
410b52f72c
test: ipcohort + dataset; fix ParseIPv4 panic on IPv6
- ParseIPv4 now returns an error instead of panicking on IPv6 addrs
- Add ipcohort tests: ParseIPv4, Contains (host/CIDR/mixed/fail-closed/empty), Size, LoadFile, LoadFiles, IPv6 skip
- Add dataset tests: Init, Sync (updated/no-update), error paths, Close hook, Run tick, Group (single fetch drives all loaders)
2026-04-20 09:36:13 -06:00
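
The panic fix in the commit above comes down to checking `Is4()` before calling `As4()`, which panics on IPv6 addresses. The sketch below assumes a `uint32` return value; the real `ParseIPv4` signature is not shown in the log.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// parseIPv4 returns the address as a big-endian uint32, or an error for
// anything that is not a plain IPv4 address.
func parseIPv4(s string) (uint32, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		// Guard first: calling As4 on an IPv6 address would panic.
		return 0, fmt.Errorf("not an IPv4 address: %s", s)
	}
	b := addr.As4()
	return binary.BigEndian.Uint32(b[:]), nil
}

func main() {
	for _, s := range []string{"192.0.2.1", "2001:db8::1"} {
		v, err := parseIPv4(s)
		fmt.Println(s, v, err)
	}
}
```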
aeb94fc26b
fix: remove double-fetch, add httpcache.NopSyncer, drop Sources.Init
Sources.Init() was redundant: gitshallow.Repo.Fetch() already clones
if missing via syncGit()->clone(). Removing it means blGroup.Init()
is the single entry point, no duplicate network calls.

httpcache.NopSyncer{} replaces the private nopSyncer in the cmd —
exported so any caller can build a file-only Dataset without a syncer.
2026-04-20 09:31:58 -06:00