Authentication · OIDC · SSO · Go

The Broker in the Middle: Building an OIDC Federation Server in Go

Building an OIDC-compliant SSO server that plays both sides of the protocol, and why sitting between two trust boundaries is harder than picking one.

April 10, 2026

The first enterprise customer didn't ask for SSO. They asked why their employees needed yet another password. Their security team had a policy: no standalone credentials for SaaS tools. Every login goes through Okta. Every session is governed by their IdP. If we couldn't do that, the deal would stall.

At the time, our admin platform had one authentication path: email and password. Argon2-hashed credential in the database, cookie-based session. It worked. It was simple. And it was now a blocker for the kind of customer that signs annual contracts.

Buy, Integrate, or Build

The obvious answer was to drop in an identity provider. Auth0, Keycloak, pick one. I spent a week evaluating that path. Three things made it uncomfortable.

First, the platform is a Go monorepo. All backends are ConnectRPC services sharing a database, a query layer, and a deployment pipeline. Introducing a Java-based Keycloak instance would add a new operational surface, a new failure domain, and a new deployment artifact that doesn't share any existing infrastructure.

Second, what we actually needed was narrow. One flow: Authorization Code with PKCE. No implicit grants, no hybrid flows, no device authorization. The feature surface of a full IdP would be 90% dead weight, but 100% of the maintenance burden.

Third, multi-tenant IdP configuration was a first-class requirement. Each organization connects to its own upstream identity provider. Org A uses Okta. Org B uses Azure AD. Org C uses Google Workspace. The mapping between tenant and provider needs to live in the same database as the rest of the tenant model, governed by the same RBAC. Bolting that onto an external IdP's configuration model would mean synchronizing state between two systems — the kind of dual-write problem that never ends well.

So we built it. A lightweight OIDC-compliant authorization server, embedded in the existing Go binary, running on its own port, sharing the database and the deployment pipeline.

💡

When the feature you need is a subset of a standard, and the integration surface is tighter than the adaptation surface, building the subset is simpler than wrapping the whole.

The Dual Role

Here's the architectural detail that makes this interesting. The SSO server isn't just an OpenID Connect Provider. It's also an OpenID Connect Relying Party. It plays both sides of the protocol simultaneously.

To the frontend (the React SPA), it looks like any standard OIDC provider. The frontend redirects to /authorize, gets back an authorization code, exchanges it at /token, and receives an ID token. Standard OAuth2 dance.

To the upstream corporate IdP (Okta, Azure AD, Google), it looks like any standard OIDC client. It redirects the user to the IdP's authorization endpoint, receives a callback with a code, exchanges it for tokens, and verifies the ID token.

The server sits in the middle, translating between two trust boundaries. It receives an identity assertion from the upstream IdP ("this person is jane@acme.com according to Okta"), maps it to an internal user ("jane@acme.com is user UUID abc-123"), and issues its own identity assertion to the frontend ("here's a JWT signed by us saying you're user abc-123 with these roles").

This is the federation broker pattern. One server, two protocol roles, two trust domains.

The Double Redirect

The flow has a shape that's unusual if you're used to single-hop OAuth. There are two full authorization code exchanges, nested inside each other.

Walk through it step by step:

  1. The user clicks "Sign in with SSO." The frontend redirects to the SSO server's /authorize endpoint with the usual parameters (client_id, redirect_uri, code_challenge, state, nonce) plus two multi-tenant extensions: org_id and idp_id.
  2. The SSO server validates every parameter against a strict chain — response type, scope, client ID, redirect URI allowlist, organization existence, IdP connection config, user email. Eleven checks before anything else happens.
  3. If validation passes, the server starts a second OAuth flow. It looks up the upstream IdP's discovery document, generates its own PKCE verifier and state, persists the entire downstream context into a state record with a 10-minute TTL, and redirects the browser to the upstream IdP.
  4. The user authenticates at Okta (or Azure AD, or Google). The upstream IdP redirects back with an authorization code.
  5. The frontend calls the SSO server's /token endpoint with the upstream code and the downstream PKCE verifier.
  6. The SSO server unwinds everything: claims the state record atomically, verifies PKCE, exchanges the upstream code for tokens, verifies the upstream ID token, links the identity, and issues its own ID token.

Two complete OAuth handshakes. One user experience.

PKCE All the Way Down

One detail that's easy to overlook: PKCE runs on both legs of the flow.

The frontend generates a code verifier and sends the S256 hash as a challenge to the SSO server. This protects the downstream authorization code from interception. But the SSO server also generates its own, separate PKCE verifier when it redirects to the upstream IdP. The same interception risk applies on the second leg.

Two independent PKCE exchanges. Two independent verifier/challenge pairs. Two independent protection boundaries. Neither trusts the transport to be secure on its own.
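A minimal sketch of one such verifier/challenge pair in Go, using only the standard library (the function names are mine, not the codebase's):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
)

// newPKCEPair generates an RFC 7636 verifier and its S256 challenge.
// The broker needs a fresh pair like this for its upstream leg, separate
// from the pair the frontend generated for the downstream leg.
func newPKCEPair() (verifier, challenge string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	verifier = base64.RawURLEncoding.EncodeToString(buf) // 43 chars, unpadded
	sum := sha256.Sum256([]byte(verifier))
	challenge = base64.RawURLEncoding.EncodeToString(sum[:])
	return verifier, challenge, nil
}

// verifyPKCE checks a presented verifier against a stored S256 challenge
// at /token time. (A production server would prefer a constant-time
// comparison via crypto/subtle.)
func verifyPKCE(verifier, challenge string) bool {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:]) == challenge
}
```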

The State Record

The hardest part of the double redirect is bridging the gap. When the upstream IdP redirects back, the server needs to reconstruct the entire downstream context: which user initiated the flow, which organization, what the original PKCE challenge was, the redirect URI, the nonce.

All of this gets captured in a state record with a 10-minute TTL:

CREATE TABLE sso_auth_states (
    state      TEXT PRIMARY KEY,       -- 32-byte random token
    payload    JSONB NOT NULL,         -- full downstream context
    expires_at TIMESTAMPTZ NOT NULL,   -- 10 minutes from creation
    claimed_at TIMESTAMPTZ            -- set once during token exchange
);

When the /token endpoint receives the state, the server claims the record atomically:

UPDATE sso_auth_states
SET claimed_at = $2
WHERE state = $1
  AND claimed_at IS NULL
  AND expires_at > $2
RETURNING state, payload, expires_at, claimed_at;

Three things happen in one query. claimed_at IS NULL prevents replay. expires_at > $2 rejects stale states. And the atomic UPDATE ensures that if two requests race with the same state, exactly one wins. It's the standard fix for check-then-act races, applied everywhere they appear: fold the check and the mutation into a single atomic operation.

Identity Linking

Once the upstream ID token is verified, the server needs to create a durable link between the upstream identity and the local user:

INSERT INTO sso_identities (user_id, connection_id, idp_sub)
VALUES ($1, $2, $3)
ON CONFLICT (connection_id, idp_sub)
DO UPDATE SET updated_at = now()
WHERE sso_identities.user_id = excluded.user_id
RETURNING user_id;

The WHERE sso_identities.user_id = excluded.user_id clause is the guard. If the upstream identity is already linked to a different local user, the UPDATE doesn't match, the query returns no rows, and the server rejects the login. This prevents a subtle attack where an IdP reassigns a subject identifier — the identity link acts as a second verification layer.
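The guard's behavior can be sketched with an in-memory stand-in for sso_identities — hypothetical names, but the same three outcomes as the upsert:

```go
package main

import "errors"

var errIdentityConflict = errors.New("upstream identity already linked to a different user")

// linkKey identifies an upstream identity: (connection, subject).
type linkKey struct{ ConnectionID, IdPSub string }

// identityLinks is an in-memory stand-in for the sso_identities table,
// mapping each upstream identity to a local user ID.
type identityLinks map[linkKey]string

// Link mirrors the guarded upsert: create the link if it's new, accept it
// if it already points at the same user, and refuse the login if the
// subject has been reassigned to someone else (the case where the SQL
// returns no rows).
func (l identityLinks) Link(connID, sub, userID string) (string, error) {
	k := linkKey{connID, sub}
	if existing, ok := l[k]; ok {
		if existing != userID {
			return "", errIdentityConflict
		}
		return existing, nil
	}
	l[k] = userID
	return userID, nil
}
```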

The Provider Cache Nobody Notices

Every authorization request requires the upstream IdP's endpoints: authorization URL, token URL, JWK Set URL. This metadata comes from the IdP's /.well-known/openid-configuration document.

Without caching, every login triggers an HTTP round-trip before the user even sees a login screen. The provider cache solves this with three design choices:

Singleflight. If a hundred concurrent requests all need the same provider metadata, only one HTTP call is made. The rest wait and share the result. This comes from golang.org/x/sync/singleflight.

TTL-based expiration. Provider metadata changes rarely, so the cache keeps entries warm for five minutes. Simpler than LRU and better suited to the access pattern.

Error caching. If a discovery endpoint is down, the failed lookup is cached with the same TTL. Without this, a down IdP would cause retry storms.

type providers struct {
    mu sync.RWMutex
    m  map[string]providerEntry  // issuerURL -> cached provider
    g  singleflight.Group        // prevents thundering herd

    http *http.Client            // dedicated client, 10s timeout
    ttl  time.Duration           // default: 5 minutes
}

The Deployment That Didn't Change

One of the quieter benefits of building this inside the existing binary: the deployment didn't change. The SSO server runs on port 8084, alongside the admin API on 8082 and the backoffice API on 8081. Same Go binary. Same Docker image. Same CI pipeline. Same Kubernetes pod.

The only infrastructure addition was a new HTTPRoute in the Kubernetes Gateway API. No new deployments, no new sidecar containers, no new database instances. The SSO tables live in the same PostgreSQL database, managed by the same migration tool, queried through the same connection pool.

ℹ️

The cheapest infrastructure to operate is the infrastructure you didn't add. Every new deployment artifact is a new thing that can fail independently, a new thing to monitor, a new thing to page someone about at 2 AM.

The Lesson

Authentication is one of those domains where the temptation to reach for a managed service is strong, and usually justified. The standards are complex, the security surface is unforgiving, and the cost of getting it wrong is measured in breached accounts.

But there's a specific situation where building makes sense: when the integration boundary with an external service is wider than the implementation boundary of the subset you need. We needed one flow, one grant type, one signing algorithm, embedded in an existing system with existing multi-tenant infrastructure. The subset of OIDC we implemented fits in about 800 lines of Go.

The federation broker pattern made it work. The frontend doesn't know about Okta. Okta doesn't know about the frontend. The broker translates between them, and the database holds the state that makes the translation possible.

⚠️

Build the subset you need when the integration cost exceeds the implementation cost. But only when you're willing to own the security surface that comes with it.