Tenant Migration Stuck on Advisory Lock

Alert / Symptom

User-visible symptom: requests for a specific tenant hang for ~10s and then fail with a tenant-migration timeout (finsta.tenant-migration.request-filter-timeout, default 10s). The on-demand request-filter migration path can’t acquire the Flyway lock for that tenant’s schema.

Secondary signal: the scheduled migration loop's log line "Migrate [N] tenants to latest version […]" stops advancing: the pending count N stays flat in startup logs across pod restarts.

This runbook applies when the lock holder is a finsta JVM that is alive but hung (long GC pause, native deadlock, blocked on a slow query). A crashed pod releases the lock automatically when its TCP connection drops.

Background

SchemaRepository runs tenant migrations with Flyway’s postgresql.transactional.lock=false (see issue #1222 and plans/2026-05-07_1222__flyway-transactional-lock-multitenant-investigation.md). This swaps Flyway’s coordination lock from pg_advisory_xact_lock (transaction-scoped) to pg_advisory_lock (session-scoped).

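Tradeoff: a session-scoped lock is held until it is explicitly released or the session ends, which for a hung client means until the holding TCP connection actually closes. PostgreSQL's idle_in_transaction_session_timeout does not apply because the lock connection is no longer in a transaction. A hung-but-alive JVM can therefore hold the lock until the pod is killed or the backend is terminated. TCP keepalives do not help in that case, because the pod's network stack keeps answering probes; they only reap connections whose peer has stopped responding (for example a lost node), within minutes when tuned as described under Prevent.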

Impact

  • Requests for the affected tenant hang and time out on the request-filter migration path.

  • The scheduled migration loop on every instance blocks when it picks the same tenant.

  • Other tenants are unaffected — the advisory lock key is per-schema (LOCK_MAGIC_NUM + qualified-table-name.hashCode()).
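To check which tenant a given advisory lock row belongs to, the key can be recomputed by hand. The following is a minimal sketch, not Flyway's own code. It assumes that LOCK_MAGIC_NUM is the ASCII string "Flyway" packed into a long (0x466C79776179), that the hashed string is the quoted, schema-qualified history table name, and that the name is plain ASCII. The helper names java_hash and advisory_key are made up for illustration. PostgreSQL displays a bigint advisory key in pg_locks as classid (high 32 bits) and objid (low 32 bits).

```shell
# ASSUMPTION: Flyway's LOCK_MAGIC_NUM, i.e. "Flyway" in ASCII = 0x466C79776179.
LOCK_MAGIC_NUM=77431708279161

# Java String.hashCode() for an ASCII string, printed as a signed 32-bit integer.
java_hash() {
  local s="$1" h=0 rest c
  while [ -n "$s" ]; do
    rest=${s#?}                              # everything after the first character
    c=$(printf '%d' "'${s%"$rest"}")         # code of the first character
    h=$(( (h * 31 + c) & 4294967295 ))       # wrap to 32 bits like Java int math
    s=$rest
  done
  if [ "$h" -ge 2147483648 ]; then h=$(( h - 4294967296 )); fi
  echo "$h"
}

# Prints "key classid objid" for a qualified table name (hypothetical helper).
advisory_key() {
  local h key
  h=$(java_hash "$1")
  key=$(( LOCK_MAGIC_NUM + h ))
  echo "$key $(( (key >> 32) & 4294967295 )) $(( key & 4294967295 ))"
}
```

Calling advisory_key '"tenant_a"."flyway_schema_history"' (tenant_a is a placeholder schema) prints three numbers; the last two can be compared with the classid and objid columns of the pg_locks row. If they do not match, the assumed name format or magic number is the first thing to re-check against the Flyway version in use.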

Diagnose

Find the stuck advisory lock and the connection holding it:

-- classid/objid are the high/low 32 bits of a bigint advisory key (objsubid = 1)
select l.pid,
       l.classid,
       l.objid,
       l.objsubid,
       l.granted,
       a.application_name,
       a.client_addr,
       a.state,
       a.query_start,
       now() - a.state_change as held_for,
       a.query
from   pg_locks l
join   pg_stat_activity a on a.pid = l.pid
where  l.locktype = 'advisory'
order  by held_for desc nulls last;

A row with granted = true and state = 'idle' (not 'idle in transaction'), whose application_name matches the finsta migrator (e.g. Flyway or the configured ApplicationName) and which has been in that state for many minutes, is the stuck lock.

Identify the finsta pod owning that connection by client_addr and the matching k8s pod IP:

kubectl --context <context> -n <namespace> get pods -o wide | grep <client_addr>
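A plain grep on the address also matches longer addresses that share the same prefix (10.0.0.1 matches 10.0.0.12). A possible exact-match variant is sketched below. The helper name pods_with_ip is made up for illustration; it only filters "NAME IP" lines from stdin, so it assumes the client_addr seen by PostgreSQL is the pod IP and not a pooler or NAT address.

```shell
# Reads "NAME IP" lines on stdin and prints the names whose IP equals $1 exactly.
pods_with_ip() {
  awk -v ip="$1" '$2 == ip { print $1 }'
}

# Usage (needs cluster access; placeholders as in the command above):
# kubectl --context <context> -n <namespace> get pods --no-headers \
#   -o custom-columns=NAME:.metadata.name,IP:.status.podIP | pods_with_ip <client_addr>
```

The custom-columns output is used instead of -o wide so that every line has exactly two whitespace-separated fields.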

Confirm the pod is hung and not making progress: check liveness probe status, recent log activity, and JVM thread state (kubectl exec … -- jstack 1, if available).

Mitigate

Preferred: restart the holding pod. The TCP connection drops, PostgreSQL releases the session lock, and the next migration attempt for that tenant proceeds.

kubectl --context <context> -n <namespace> delete pod <finsta-pod>

If restarting the pod is not possible (rare), terminate the specific backend in PostgreSQL:

-- only after confirming the pid is the stuck advisory lock holder
select pg_terminate_backend(<pid>);

pg_cancel_backend is not sufficient — it cancels the current statement but does not close the session, so the advisory lock stays held.
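To reduce the risk of terminating the wrong backend, the confirmation can be folded into the statement itself. The sketch below is one way to do that, under the assumption that the stuck holder shows up as state = 'idle' with a granted advisory lock, as described under Diagnose. The helper name guarded_terminate_sql is hypothetical. It only prints SQL; piping the result into psql is left to the operator.

```shell
# Prints SQL that terminates backend $1 only if it is idle and holds a granted advisory lock.
guarded_terminate_sql() {
  local pid="$1"
  # Reject anything that is not a plain non-negative integer before embedding it in SQL.
  case "$pid" in
    ''|*[!0-9]*) echo "pid must be numeric: $pid" >&2; return 1 ;;
  esac
  cat <<SQL
select pg_terminate_backend(a.pid)
from   pg_stat_activity a
where  a.pid = $pid
and    a.state = 'idle'
and    exists (select 1 from pg_locks l
               where l.pid = a.pid and l.locktype = 'advisory' and l.granted);
SQL
}

# Usage: guarded_terminate_sql <pid> | psql <connection options>
```

If the pid no longer matches both conditions, the statement returns zero rows and nothing is terminated.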

Prevent

  • Keep k8s liveness probes on finsta tight enough that a hung pod is killed within ~2 minutes.

  • Ensure JDBC tcpKeepAlive=true and PostgreSQL server-side tcp_keepalives_idle / tcp_keepalives_interval are short enough to reap dead TCP sessions in single-digit minutes.

  • If stuck-lock incidents become recurring rather than one-off, revisit the transactionalLock decision in SchemaRepository.tenantFluentConfiguration — flipping it back trades stuck-lock recovery time for the autovacuum-stall regression that issue #1222 was fixing.
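For the keepalive bullet above, a rough way to sanity-check the settings is to add up the worst-case detection time for an unresponsive peer: the idle time before the first probe plus one interval per unanswered probe. The sketch assumes the probe count comes from tcp_keepalives_count, a third PostgreSQL setting not named above; the function name is made up for illustration.

```shell
# Approximate seconds until an unresponsive peer is declared dead:
# idle time before the first probe + interval * number of lost probes.
keepalive_reap_seconds() {
  local idle="$1" interval="$2" count="$3"
  echo $(( idle + interval * count ))
}

# Usage: keepalive_reap_seconds <tcp_keepalives_idle> <tcp_keepalives_interval> <tcp_keepalives_count>
```

With the usual Linux defaults (7200, 75, 9) this comes to 7875 seconds, a little over two hours, which is why the server-side values need to be set explicitly to get into single-digit minutes.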

References

  • Issue #1222 (Tenant migration performance: reduce per-tenant Flyway overhead at scale)

  • plans/2026-05-07_1222__flyway-transactional-lock-multitenant-investigation.md: root-cause investigation

  • tritt.finsta.domain.tenant.SchemaRepository: where the lock mode is configured