Sets up a local bare upstream, clones via gitshallow, then rewrites
upstream history as an unrelated commit and force-pushes. A fresh
Repo instance's Fetch must succeed and install the new HEAD — the
old pull-based flow would fail with "refusing to merge unrelated
histories".
Runs under the default test build (no integration tag) because it
uses only a local bare repo; no network access required.
Propagate the patterns used in cmd/check-ip to the other command-line
tools touched by this PR:
- flag.FlagSet + Config struct instead of package-level flag.String
pointers (geoip-update, ipcohort-contains, git-shallow-sync).
- -V/--version/version and help/-help/--help handled before Parse,
matching the project's CLI conventions.
- Stderr "Loading X... Nms (counts)" progress lines on the stages that
actually take time: blocklist cohort parse (ipcohort-contains),
per-edition fetch (geoip-update), and repo sync (git-shallow-sync).
Stdout stays machine-parseable.
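
A minimal sketch of the FlagSet + Config pattern described above, not the
actual code from cmd/check-ip. The tool name, the -repo-dir flag, the version
string, and the choice to inspect only the first argument are all
assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// Config holds the parsed options. The single field is illustrative.
type Config struct {
	RepoDir string
}

const version = "0.0.0-example" // placeholder, not a real release number

// parseArgs handles the version/help spellings before flag parsing and
// otherwise fills a Config from a private FlagSet. The returned action is
// "version", "help", or "run".
func parseArgs(args []string, stderr io.Writer) (cfg Config, action string, err error) {
	if len(args) > 0 {
		switch args[0] {
		case "-V", "--version", "version":
			return cfg, "version", nil
		case "help", "-help", "--help":
			return cfg, "help", nil
		}
	}
	fs := flag.NewFlagSet("example-tool", flag.ContinueOnError)
	fs.SetOutput(stderr) // usage and errors go to stderr, never stdout
	fs.StringVar(&cfg.RepoDir, "repo-dir", "", "path to the local clone (hypothetical flag)")
	if err := fs.Parse(args); err != nil {
		return cfg, "", err
	}
	return cfg, "run", nil
}

func main() {
	cfg, action, err := parseArgs(os.Args[1:], os.Stderr)
	if err != nil {
		os.Exit(2)
	}
	switch action {
	case "version":
		fmt.Println(version)
	case "help":
		fmt.Fprintln(os.Stderr, "usage: example-tool [-repo-dir DIR]")
	default:
		fmt.Fprintf(os.Stderr, "repo-dir=%q\n", cfg.RepoDir)
	}
}
```

Keeping the flags on a local FlagSet means parseArgs can be called from a
test with any argument slice, which package-level flag.String pointers do
not allow.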
Short-lived CLI invocations were doing a full git fetch+reset on every
run because the only debounce was an in-memory lastSynced field. MaxAge
skips the fetch when .git/FETCH_HEAD is younger than the configured
duration — git rewrites FETCH_HEAD on every successful fetch, so its
mtime is effectively "last time we talked to the remote", and it
survives process restart. Wire check-ip's blocklist repo to the same
47m refresh interval it uses for the background Tick.
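
The FETCH_HEAD check can be sketched roughly as follows. The function name
fetchHeadFresh and the demo helper makeDemoRepo are hypothetical, and
treating a missing file or a non-positive MaxAge as "stale" is an
assumption about the intended behaviour.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fetchHeadFresh reports whether repoDir/.git/FETCH_HEAD was modified less
// than maxAge ago. A missing file (never fetched) or maxAge <= 0 is stale.
func fetchHeadFresh(repoDir string, maxAge time.Duration) bool {
	if maxAge <= 0 {
		return false
	}
	info, err := os.Stat(filepath.Join(repoDir, ".git", "FETCH_HEAD"))
	if err != nil {
		return false
	}
	return time.Since(info.ModTime()) < maxAge
}

// makeDemoRepo creates a throwaway directory with an empty .git/FETCH_HEAD
// so the check can be exercised without running git.
func makeDemoRepo() (string, error) {
	dir, err := os.MkdirTemp("", "fetchhead-demo-")
	if err != nil {
		return "", err
	}
	gitDir := filepath.Join(dir, ".git")
	if err := os.Mkdir(gitDir, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(gitDir, "FETCH_HEAD"), nil, 0o644); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	dir, err := makeDemoRepo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	const maxAge = 47 * time.Minute
	fmt.Println("just fetched, skip fetch:", fetchHeadFresh(dir, maxAge))

	// Backdate the file to simulate a fetch from two hours ago.
	old := time.Now().Add(-2 * time.Hour)
	if err := os.Chtimes(filepath.Join(dir, ".git", "FETCH_HEAD"), old, old); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("two hours old, skip fetch:", fetchHeadFresh(dir, maxAge))
}
```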
The shallow clone is a read-only mirror, but a force-push on the
upstream branch made pull --ff-only bail with "refusing to merge
unrelated histories". Switch to git fetch + git reset --hard
origin/<branch> so the local copy always tracks upstream, force-push
or not. When Branch is empty, the branch is auto-detected from
origin/HEAD.
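
A rough sketch of the fetch + reset sequence, assuming the branch is read
from `git symbolic-ref refs/remotes/origin/HEAD`. The helper names, the
explicit refspec, and the depth parameter are illustrative and may differ
from what gitshallow actually runs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// branchFromOriginHead extracts "main" from the output of
// `git symbolic-ref refs/remotes/origin/HEAD` ("refs/remotes/origin/main").
func branchFromOriginHead(ref string) (string, bool) {
	const prefix = "refs/remotes/origin/"
	ref = strings.TrimSpace(ref)
	if !strings.HasPrefix(ref, prefix) || len(ref) == len(prefix) {
		return "", false
	}
	return ref[len(prefix):], true
}

// syncCommands returns the git argument lists that replace pull --ff-only.
// The leading "+" in the refspec lets the remote-tracking ref move even when
// upstream history was rewritten.
func syncCommands(branch string, depth int) [][]string {
	refspec := "+refs/heads/" + branch + ":refs/remotes/origin/" + branch
	return [][]string{
		{"fetch", "--depth", strconv.Itoa(depth), "origin", refspec},
		{"reset", "--hard", "origin/" + branch},
	}
}

// runGit runs one git command inside dir.
func runGit(dir string, args ...string) error {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	branch, ok := branchFromOriginHead("refs/remotes/origin/main\n")
	if !ok {
		fmt.Fprintln(os.Stderr, "cannot detect branch")
		return
	}
	for _, args := range syncCommands(branch, 1) {
		fmt.Println("git", strings.Join(args, " "))
		// Only touch a repository when a clone directory is given explicitly.
		if len(os.Args) > 1 {
			if err := runGit(os.Args[1], args...); err != nil {
				fmt.Fprintln(os.Stderr, err)
				return
			}
		}
	}
}
```

reset --hard discards local changes, which is acceptable here only because
the clone is never edited locally.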
check-ip now takes only --serve, --geoip-conf, --blocklist-repo,
--cache-dir. Blocklist always comes from git; GeoIP mmdbs always go
through httpcache (when GeoIP.conf is available). Format negotiation
lives entirely server-side.
main.go is now straight-line wiring: parse flags, build the two
databases, run the server. All filesystem setup (MkdirAll for the
clone target and for the parent directories of cache Paths) is pushed
into gitshallow and httpcache so the cmd doesn't do filesystem
bookkeeping.
check-ip and geoip no longer use the dataset package; formmailer now takes
*atomic.Pointer[ipcohort.Cohort] for Blacklist so callers own the
refresh + swap lifecycle directly. gitshallow doc comments that
referenced dataset.Syncer are trimmed.
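
A sketch of what "callers own the refresh + swap lifecycle" can look like.
Cohort and Checker here are stand-ins, not the real ipcohort.Cohort or
formmailer types, and newBlacklist is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Cohort is a stand-in for ipcohort.Cohort: an immutable set of addresses.
type Cohort struct{ entries map[string]bool }

func newCohort(ips ...string) *Cohort {
	c := &Cohort{entries: make(map[string]bool, len(ips))}
	for _, ip := range ips {
		c.entries[ip] = true
	}
	return c
}

// Contains is nil-safe so a not-yet-loaded pointer blocks nothing.
func (c *Cohort) Contains(ip string) bool { return c != nil && c.entries[ip] }

// Checker is a stand-in for a consumer that only reads the blacklist.
type Checker struct {
	Blacklist *atomic.Pointer[Cohort]
}

func (k *Checker) Blocked(ip string) bool { return k.Blacklist.Load().Contains(ip) }

// newBlacklist returns a pointer pre-loaded with an initial cohort.
func newBlacklist(ips ...string) *atomic.Pointer[Cohort] {
	p := new(atomic.Pointer[Cohort])
	p.Store(newCohort(ips...))
	return p
}

func main() {
	bl := newBlacklist("192.0.2.1")
	k := &Checker{Blacklist: bl}
	fmt.Println("before refresh:", k.Blocked("192.0.2.1"))

	// The caller, not the consumer, decides when to rebuild and swap.
	bl.Store(newCohort("198.51.100.7"))
	fmt.Println("after refresh:", k.Blocked("192.0.2.1"), k.Blocked("198.51.100.7"))
}
```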
The concepts the package tried to share (atomic-swap, group sync,
ticker-driven refresh) may come back under sync/dataset once we have
more than one in-tree caller that wants them.
Remove View[T] — Add now returns *Dataset[T] directly. Callers use Load()
on the returned Dataset; Init/Run belong to the owning Group.
main.go simplified: declare syncer + file paths per case, then one
g.Init() and one g.Run(). No manual loops over individual datasets.
Add gitshallow.Repo.FilePath helper.
- Each package gets its own go.mod: net/{dataset,httpcache,gitshallow,ipcohort,geoip,formmailer}
- go.work with replace directives for cross-module workspace resolution
- dataset.Syncer/NopSyncer moved here from httpcache; callers duck-type it
- dataset.View[T] returned by Add to prevent Init/Sync/Run misuse on group members
- cmd/check-ip moved from net/ipcohort/cmd/check-ip to top-level cmd/check-ip
- Add net/ipcohort/cmd/ipcohort-contains for standalone cohort membership testing
Records lastSynced time after each pull. A concurrent caller that was
waiting behind the mutex sees lastSynced < 1s ago and returns early,
avoiding a wasted network round-trip.
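
The debounce can be sketched like this. The Repo below is a stand-in with
an injected pull function so the example runs without git; only the
lastSynced name and the one-second window come from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Repo is a stand-in: pull is injected instead of shelling out to git.
type Repo struct {
	mu         sync.Mutex
	lastSynced time.Time
	pull       func() error
}

// Sync pulls at most once per second. Callers that queued behind the mutex
// while another pull was running return early with changed == false.
func (r *Repo) Sync() (bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.lastSynced.IsZero() && time.Since(r.lastSynced) < time.Second {
		return false, nil
	}
	if err := r.pull(); err != nil {
		return false, err
	}
	r.lastSynced = time.Now()
	return true, nil
}

func main() {
	pulls := 0
	r := &Repo{pull: func() error {
		pulls++ // only ever called with r.mu held
		time.Sleep(50 * time.Millisecond)
		return nil
	}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Sync()
		}()
	}
	wg.Wait()
	fmt.Println("pulls performed:", pulls)
}
```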
- gitshallow: replace in-place Depth mutation with effectiveDepth() method;
remove depth normalisation from New() since it was masking the bug
- ipcohort: extract sortNets() helper using cmp.Compare, eliminating 3 identical
sort closures; add ContainsAddr(netip.Addr) for pre-parsed callers; guard
Contains() against IPv6 panic (As4 panics on non-v4); add IPv6 test
- dataset: Add() now sets NopSyncer{} so callers cannot panic by accidentally
calling Init/Sync/Run on a Group-managed Dataset
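
A sketch of the two ipcohort changes, under assumptions: sorting by address
and then prefix length is a guess at what sortNets orders by, and hostKey,
keyOf, and mustPrefixes are hypothetical helpers. The guard relies on the
documented behaviour that netip.Addr.As4 panics for addresses that are not
IPv4 or IPv4-mapped IPv6.

```go
package main

import (
	"cmp"
	"encoding/binary"
	"fmt"
	"net/netip"
	"slices"
)

// sortNets orders prefixes by address, then by prefix length, replacing
// several copies of the same sort closure with one helper.
func sortNets(nets []netip.Prefix) {
	slices.SortFunc(nets, func(a, b netip.Prefix) int {
		if c := a.Addr().Compare(b.Addr()); c != 0 {
			return c
		}
		return cmp.Compare(a.Bits(), b.Bits())
	})
}

// hostKey converts an IPv4 address to a uint32 lookup key. It reports false
// instead of letting As4 panic on IPv6 or the zero Addr.
func hostKey(a netip.Addr) (uint32, bool) {
	a = a.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	if !a.Is4() {
		return 0, false
	}
	b := a.As4()
	return binary.BigEndian.Uint32(b[:]), true
}

// keyOf is a string convenience wrapper around hostKey.
func keyOf(s string) (uint32, bool) {
	a, err := netip.ParseAddr(s)
	if err != nil {
		return 0, false
	}
	return hostKey(a)
}

// mustPrefixes parses CIDR strings and panics on bad input (demo only).
func mustPrefixes(ss ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParsePrefix(s))
	}
	return out
}

func main() {
	nets := mustPrefixes("198.51.100.0/24", "10.0.0.0/16", "10.0.0.0/8")
	sortNets(nets)
	fmt.Println(nets)

	_, ok := keyOf("2001:db8::1")
	fmt.Println("IPv6 accepted:", ok)
}
```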
gitshallow.Repo.GCInterval int:
  0 (default) = git auto gc (no explicit call)
  N           = aggressive gc + prune every Nth successful pull
GC() simplified to always aggressive+prune (the only mode we use).
Sync(), Init(), Fetch() all parameter-free; GCInterval baked into Repo.
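
One way the GCInterval counter could work, as a sketch. The pulls field,
gcDue, and countGCs are hypothetical names, and `git gc --aggressive
--prune=now` is an assumption about what "aggressive gc + prune" maps to.

```go
package main

import "fmt"

// Repo is a stand-in carrying only the fields this sketch needs.
type Repo struct {
	GCInterval int // 0 = leave it to git's auto gc
	pulls      int // successful pulls so far
}

// gcDue records one successful pull and reports whether an explicit gc
// should run now.
func (r *Repo) gcDue() bool {
	r.pulls++
	return r.GCInterval > 0 && r.pulls%r.GCInterval == 0
}

// gcArgs is the single gc mode assumed here: aggressive plus prune.
func gcArgs() []string {
	return []string{"gc", "--aggressive", "--prune=now"}
}

// countGCs simulates n successful pulls and counts explicit gc runs.
func countGCs(interval, n int) int {
	r := &Repo{GCInterval: interval}
	runs := 0
	for i := 0; i < n; i++ {
		if r.gcDue() {
			runs++
		}
	}
	return runs
}

func main() {
	fmt.Println("git", gcArgs())
	fmt.Println("GCInterval=0, 10 pulls:", countGCs(0, 10), "gc runs")
	fmt.Println("GCInterval=3, 10 pulls:", countGCs(3, 10), "gc runs")
}
```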
httpcache.Syncer interface: Fetch() (bool, error) — satisfied by both
*httpcache.Cacher and *gitshallow.Repo (new Fetch method + LightGC field).
httpcache.Cacher.Fetch now errors on zero-length 200 response instead of
clobbering the existing file with empty content.
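
A sketch of the zero-length guard, not the actual Cacher.Fetch: the body is
streamed to a temporary file and renamed over the target only when at least
one byte arrived. fetchToFile, errEmptyBody, and demo are hypothetical
names, and the real method also returns a bool that is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

var errEmptyBody = errors.New("empty 200 response")

// fetchToFile downloads url and replaces path only if the body is non-empty.
func fetchToFile(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".fetch-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless after a successful rename
	n, err := io.Copy(tmp, resp.Body)
	if cerr := tmp.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}
	if n == 0 {
		return errEmptyBody // keep the existing file untouched
	}
	return os.Rename(tmp.Name(), path)
}

// demo serves an empty 200 from a local test server and returns the file
// content left on disk together with the fetch error.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "fetch-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte("old"), 0o644); err != nil {
		return "", err
	}
	fetchErr := fetchToFile(srv.URL, path)
	got, _ := os.ReadFile(path)
	return string(got), fetchErr
}

func main() {
	content, err := demo()
	fmt.Printf("content=%q err=%v\n", content, err)
}
```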
Sources.Fetch/Init drop the lightGC param (baked into Repo.LightGC).
Sources.syncs []httpcache.Syncer replaces the separate git/httpInbound/
httpOutbound fields — Fetch iterates syncs uniformly, no more switch.
Sources itself satisfies httpcache.Syncer.
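
The uniform iteration can be sketched as below. The Syncer signature comes
from the text above; stopping at the first error and OR-ing the changed
flags are assumptions, and fakeSyncer exists only for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// Syncer mirrors the interface described above.
type Syncer interface {
	Fetch() (bool, error)
}

// Sources holds any mix of git-backed and HTTP-backed syncers.
type Sources struct {
	syncs []Syncer
}

// Fetch runs every syncer in order and reports whether any of them changed.
func (s *Sources) Fetch() (bool, error) {
	changed := false
	for _, sy := range s.syncs {
		c, err := sy.Fetch()
		if err != nil {
			return changed, err
		}
		changed = changed || c
	}
	return changed, nil
}

// Compile-time check that Sources is itself a Syncer.
var _ Syncer = (*Sources)(nil)

// fakeSyncer returns canned results.
type fakeSyncer struct {
	changed bool
	err     error
}

func (f fakeSyncer) Fetch() (bool, error) { return f.changed, f.err }

func main() {
	s := &Sources{syncs: []Syncer{fakeSyncer{}, fakeSyncer{changed: true}}}
	fmt.Println(s.Fetch())

	failing := &Sources{syncs: []Syncer{fakeSyncer{err: errors.New("remote unreachable")}}}
	fmt.Println(failing.Fetch())
}
```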
Top-layer callers (IPFilter) now drive all reloads directly after
Sync/Fetch return. gitshallow.Init now returns (bool, error).
httpcache drops Init and Sync — callers just call Fetch.
fs/dataset deleted — generic File[T] wrapper didn't earn its abstraction layer
gitshallow.ShallowRepo → Repo (redundant with package name)
gitshallow.Repo.Register(func() error) — callbacks fire after each sync
gitshallow.Repo.Init/Run — full lifecycle in one package
caller (check-ip-blacklist) holds atomic.Pointer[Cohort] directly
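
A sketch of the Register hook, with a stand-in Repo whose fetch step is
injected. Running callbacks after every successful sync (whether or not
anything changed) and stopping at the first callback error are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Repo is a stand-in: fetch is injected instead of calling git.
type Repo struct {
	mu        sync.Mutex
	callbacks []func() error
	fetch     func() (bool, error)
}

// Register adds a callback that fires after each successful sync.
func (r *Repo) Register(fn func() error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.callbacks = append(r.callbacks, fn)
}

// Sync fetches, then runs the callbacks in registration order.
func (r *Repo) Sync() (bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	changed, err := r.fetch()
	if err != nil {
		return false, err
	}
	for _, fn := range r.callbacks {
		if err := fn(); err != nil {
			return changed, err
		}
	}
	return changed, nil
}

func main() {
	reloads := 0
	r := &Repo{fetch: func() (bool, error) { return true, nil }}
	r.Register(func() error {
		reloads++ // a real caller would re-parse a file and swap a pointer
		return nil
	})
	r.Sync()
	r.Sync()
	fmt.Println("reloads:", reloads)
}
```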
gitshallow: fix double-fetch (pull already fetches), drop redundant -C flags
gitdataset: split into GitDataset[T] (file+atomic) and GitRepo (git+multi-dataset)
- NewDataset for file-only use, AddDataset to register with a GitRepo
- one clone/fetch per repo regardless of how many datasets it has
ipcohort: split Cohort into hosts (sorted /32, binary search) + nets (CIDRs, linear)
- fixes false negatives when broad CIDRs (e.g. /8) precede specific entries
- fixes Parse() sort-before-copy order bug
- ReadAll always sorts; unsorted param removed (was dead code)
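
A sketch of the hosts/nets split. It stores hosts as netip.Addr rather than
whatever representation ipcohort uses internally, accepts IPv4 only, and the
Parse/Contains signatures here are simplified stand-ins for the real API.

```go
package main

import (
	"fmt"
	"net/netip"
	"slices"
	"strings"
)

// Cohort keeps single hosts sorted for binary search and scans the
// (usually much shorter) list of wider CIDRs linearly.
type Cohort struct {
	hosts []netip.Addr
	nets  []netip.Prefix
}

// Parse accepts bare IPv4 addresses and IPv4 CIDRs in any order.
func Parse(entries []string) (*Cohort, error) {
	c := &Cohort{}
	for _, e := range entries {
		if !strings.Contains(e, "/") {
			e += "/32"
		}
		p, err := netip.ParsePrefix(e)
		if err != nil {
			return nil, err
		}
		if !p.Addr().Is4() {
			return nil, fmt.Errorf("not an IPv4 entry: %s", e)
		}
		if p.Bits() == 32 {
			c.hosts = append(c.hosts, p.Addr())
		} else {
			c.nets = append(c.nets, p.Masked())
		}
	}
	// Sort after everything is collected, so input order never matters.
	slices.SortFunc(c.hosts, netip.Addr.Compare)
	return c, nil
}

// ContainsAddr checks hosts by binary search, then falls back to the CIDRs.
func (c *Cohort) ContainsAddr(a netip.Addr) bool {
	a = a.Unmap()
	if !a.Is4() {
		return false
	}
	if _, ok := slices.BinarySearchFunc(c.hosts, a, netip.Addr.Compare); ok {
		return true
	}
	for _, n := range c.nets {
		if n.Contains(a) {
			return true
		}
	}
	return false
}

// Contains is the string convenience form; unparsable input is a miss.
func (c *Cohort) Contains(s string) bool {
	a, err := netip.ParseAddr(s)
	return err == nil && c.ContainsAddr(a)
}

func main() {
	c, err := Parse([]string{"10.0.0.0/8", "192.0.2.7", "198.51.100.0/24", "192.0.2.3"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, ip := range []string{"192.0.2.7", "10.200.1.1", "203.0.113.1"} {
		fmt.Println(ip, c.Contains(ip))
	}
}
```

Because a broad /8 lives in nets and never sits in the sorted host slice, it
cannot shadow a specific host entry during the binary search.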