20 Commits

Author SHA1 Message Date
0d9df94a24
test(gitshallow): regression test for upstream force-push recovery
Sets up a local bare upstream, clones via gitshallow, then rewrites
upstream history as an unrelated commit and force-pushes. A fresh
Repo instance's Fetch must succeed and install the new HEAD — the
old pull-based flow would fail with "refusing to merge unrelated
histories".

Runs under the default test build (no integration tag) because it
uses only a local bare repo; no network access required.
2026-04-20 20:05:52 -06:00
8ebc571928
refactor: apply check-ip CLI conventions to sibling cmds
Propagate the patterns used in cmd/check-ip to the other command-line
tools touched by this PR:

- flag.FlagSet + Config struct instead of package-level flag.String
  pointers (geoip-update, ipcohort-contains, git-shallow-sync).
- -V/--version/version and help/-help/--help handled before Parse,
  matching the project's CLI conventions.
- Stderr "Loading X... Nms (counts)" progress lines on the stages that
  actually take time: blocklist cohort parse (ipcohort-contains),
  per-edition fetch (geoip-update), and repo sync (git-shallow-sync).
  Stdout stays machine-parseable.
2026-04-20 19:13:47 -06:00
5e6688c2a9
feat(gitshallow): add MaxAge gate via FETCH_HEAD mtime
Short-lived CLI invocations were doing a full git fetch+reset on every
run because the only debounce was an in-memory lastSynced field. MaxAge
skips the fetch when .git/FETCH_HEAD is younger than the configured
duration — git rewrites FETCH_HEAD on every successful fetch, so its
mtime is effectively "last time we talked to the remote", and it
survives process restart. Wire check-ip's blocklist repo to the same
47m refresh interval it uses for the background Tick.
2026-04-20 17:28:53 -06:00
631f32cf95
fix(gitshallow): mirror upstream with fetch+reset instead of pull --ff-only
The shallow clone is a read-only mirror, so a force-push on the
upstream branch caused pull --ff-only to bail with "refusing to merge
unrelated histories". Switch to git fetch + git reset --hard
origin/<branch> so the local copy always tracks upstream, force-push
or not. Auto-detects the branch from origin/HEAD when Branch is empty.
2026-04-20 17:23:53 -06:00
cdce7da04c
refactor(check-ip): simplify to 4 flags, push MkdirAll into libs
check-ip now takes only --serve, --geoip-conf, --blocklist-repo,
--cache-dir. Blocklist always comes from git; GeoIP mmdbs always go
through httpcache (when GeoIP.conf is available). Format negotiation
lives entirely server-side.

main.go is now straight-line wiring: parse flags, build the two
databases, run the server. All filesystem setup (MkdirAll for clone
target, for cache Path parents) is pushed into gitshallow and
httpcache so the cmd doesn't do filesystem bookkeeping.
2026-04-20 15:51:46 -06:00
01a9185c03
refactor: delete net/dataset package
check-ip and geoip no longer use it; formmailer now takes
*atomic.Pointer[ipcohort.Cohort] for Blacklist so callers own the
refresh + swap lifecycle directly. gitshallow doc comments that
referenced dataset.Syncer are trimmed.

The concepts the package tried to share (atomic-swap, group sync,
ticker-driven refresh) may come back under sync/dataset once we have
more than one in-tree caller that wants them.
2026-04-20 13:22:08 -06:00
994d91b2bf
refactor: dataset.Add returns *Dataset, no View; main uses Group for all cases
Remove View[T] — Add now returns *Dataset[T] directly. Callers use Load()
on the returned Dataset; Init/Run belong to the owning Group.

main.go simplified: declare syncer + file paths per case, then one
g.Init() and one g.Run(). No manual loops over individual datasets.
Add gitshallow.Repo.FilePath helper.
2026-04-20 12:48:38 -06:00
7b71dec445
feat: gitshallow.File for per-file path/open/sync; use in check-ip git case 2026-04-20 12:39:24 -06:00
34a54c2d66
refactor: multi-module workspace + dataset owns Syncer interface
- Each package gets its own go.mod: net/{dataset,httpcache,gitshallow,ipcohort,geoip,formmailer}
- go.work with replace directives for cross-module workspace resolution
- dataset.Syncer/NopSyncer moved here from httpcache; callers duck-type it
- dataset.View[T] returned by Add to prevent Init/Sync/Run misuse on group members
- cmd/check-ip moved from net/ipcohort/cmd/check-ip to top-level cmd/check-ip
- Add net/ipcohort/cmd/ipcohort-contains for standalone cohort membership testing
2026-04-20 11:22:01 -06:00
b2eb5aef9a
fix: skip redundant pull when another caller just synced under the lock
Records lastSynced time after each pull. A concurrent caller that was
waiting behind the mutex sees lastSynced < 1s ago and returns early,
avoiding a wasted network round-trip.
2026-04-20 10:15:53 -06:00
d24a34e0e5
test: strengthen gitshallow integration tests to assert updated=false on re-pull 2026-04-20 10:07:06 -06:00
344246362f
test: add integration tests for httpcache and gitshallow 2026-04-20 10:01:57 -06:00
896031b6a8
fix: idiomatic Go cleanup across net packages
- gitshallow: replace in-place Depth mutation with effectiveDepth() method;
  remove depth normalisation from New() since it was masking the bug
- ipcohort: extract sortNets() helper using cmp.Compare, eliminating 3 identical
  sort closures; add ContainsAddr(netip.Addr) for pre-parsed callers; guard
  Contains() against IPv6 panic (As4 panics on non-v4); add IPv6 test
- dataset: Add() now sets NopSyncer{} so callers cannot panic by accidentally
  calling Init/Sync/Run on a Group-managed Dataset
2026-04-20 09:47:50 -06:00
7c0cd26da1
refactor: GCInterval replaces LightGC; Sync/Init drop lightGC param
gitshallow.Repo.GCInterval int:
  0 (default) = git auto gc (no explicit call)
  N = aggressive gc + prune every Nth successful pull

GC() simplified to always aggressive+prune (the only mode we use).
Sync(), Init(), Fetch() all parameter-free; GCInterval baked into Repo.
2026-04-20 09:24:51 -06:00
105e99532d
refactor: Syncer interface, zero-length guard, Sources uses []Syncer
httpcache.Syncer interface: Fetch() (bool, error) — satisfied by both
*httpcache.Cacher and *gitshallow.Repo (new Fetch method + LightGC field).

httpcache.Cacher.Fetch now errors on zero-length 200 response instead of
clobbering the existing file with empty content.

Sources.Fetch/Init drop the lightGC param (baked into Repo.LightGC).
Sources.syncs []httpcache.Syncer replaces the separate git/httpInbound/
httpOutbound fields — Fetch iterates syncs uniformly, no more switch.
Sources itself satisfies httpcache.Syncer.
2026-04-20 09:22:16 -06:00
e2236aa09b
refactor: remove callbacks from gitshallow and httpcache
Top-layer callers (IPFilter) now drive all reloads directly after
Sync/Fetch return. gitshallow.Init now returns (bool, error).
httpcache drops Init and Sync — callers just call Fetch.
2026-04-19 23:30:30 -06:00
73b033c3e1
refactor: remove Run from gitshallow.Repo
Polling loop (ticker + Sync check) is generic to any update source —
git HEAD, HTTP ETag, file mtime. Caller drives the loop.
2026-04-19 22:54:21 -06:00
d6837d31ed
refactor: fold dataset into gitshallow, caller owns atomic.Pointer
fs/dataset deleted — generic File[T] wrapper didn't earn its abstraction layer
gitshallow.ShallowRepo → Repo (redundant with package name)
gitshallow.Repo.Register(func() error) — callbacks fire after each sync
gitshallow.Repo.Init/Run — full lifecycle in one package
caller (check-ip-blacklist) holds atomic.Pointer[Cohort] directly
2026-04-19 22:51:52 -06:00
8731eaf10b
refactor: decouple gitdataset/ipcohort for multi-file repos
gitshallow: fix double-fetch (pull already fetches), drop redundant -C flags
gitdataset: split into GitDataset[T] (file+atomic) and GitRepo (git+multi-dataset)
  - NewDataset for file-only use, AddDataset to register with a GitRepo
  - one clone/fetch per repo regardless of how many datasets it has
ipcohort: split Cohort into hosts (sorted /32, binary search) + nets (CIDRs, linear)
  - fixes false negatives when broad CIDRs (e.g. /8) precede specific entries
  - fixes Parse() sort-before-copy order bug
  - ReadAll always sorts; unsorted param removed (was dead code)
2026-04-19 22:34:25 -06:00
eb5e1d1336
feat: add net/gitshallow (for incremental updates to data repos) 2026-04-19 19:34:21 -06:00