Hardening a Cognito SPA against supply chain XSS: auth proxy, httpOnly cookies, and Lambda@Edge CSP

The September 2025 compromise of chalk and 17 other widely used npm packages put malicious code into packages that together account for roughly 2.6 billion downloads per week. The injected payload hooked fetch(), XMLHttpRequest, and window.ethereum to monitor network traffic and intercept wallet transactions, all while running from inside the same trusted origin as the application embedding the package. That class of attack, where a compromised dependency runs in the user’s browser as part of the application itself, has gone from theoretical to a near-monthly headline in the npm ecosystem over the last two years.

The pattern these attacks share is that the malicious code runs as application code, from the same origin, with the same privileges. A WAF in front of the application sees exfiltration requests indistinguishable from legitimate ones; WAFs with behavioural analytics can flag anomalous destinations but are easily evaded by exfiltrating to a high-reputation domain. WAF still has its place for other attack classes, but this is not where supply chain XSS gets stopped: nothing in a WAF’s core model handles the case where the attacker is already in the browser, as part of the application.

This matters specifically for SPAs that authenticate with Amazon Cognito following the hosted UI pattern. Every Cognito tutorial documents the same flow: hosted UI for login, authorization code flow with PKCE, tokens returned to the browser and stored client-side (usually in localStorage), bearer token in the Authorization header on every API call. WAF in front, IAM behind, Cognito groups for application-level authorization. Nothing about the pattern is wrong by the standards of any tutorial I could find. But none of it prevents a compromised dependency from reading the tokens straight out of browser storage and sending them somewhere, because the WAF is not where supply chain attacks happen and the IAM layer is not in the loop for token theft.

The hardening I ended up with combines two layers. The first is a backend Auth Proxy that keeps the refresh token off the browser entirely, paired with short-lived tokens held in JavaScript closure memory rather than browser storage. The second is a strict Content Security Policy enforced via Lambda@Edge with hash-based script-src. The two layers work as a pair: one closes the passive exfiltration path, the other closes the active one. Neither is sufficient alone, and the implementation details that make them work together took a few design iterations to get right.

Supply chain XSS threat model for browser-based applications

Two things go wrong if your tokens live in browser storage when a compromised dependency starts executing. First, the attacker reads them passively: a single line of code that reads localStorage.getItem('cognito_tokens'), base64-encodes the result, and pushes it to a server the attacker controls. Second, even if the tokens were not in storage, the attacker can intercept them at the moment of use, because every request the SPA makes attaches the bearer token to the Authorization header, and the attacker can monkey-patch fetch or XMLHttpRequest to capture the header on its way out. Putting the tokens in JavaScript memory instead of storage closes the first path but not the second, because a script running in the same execution context can still observe them at the point of use. The design has two layers precisely to close both: the refresh token kept off the browser entirely, and a strict CSP whose connect-src denies the attacker anywhere to send what they did manage to read.

The classes of attack this hardening does not address are worth naming up front. Application-level XSS in your own code belongs to your sanitisation library and your code review. Server-side vulnerabilities in the API belong to the data layer and have their own controls. A device fully compromised by malware wins regardless of what the application does. The hardening here targets the supply chain class specifically, which is the one this design was built to close.

Auth Proxy architecture: Backend for Frontend for OAuth2 on Cognito

The pattern of putting a small backend between the SPA and the identity provider is sometimes called Backend for Frontend, sometimes BFF, and the IETF draft on OAuth 2.0 for Browser-Based Apps has been recommending it as the preferred architecture over the pure-PKCE-in-browser approach for some time. The concept is general. This section walks through the concrete implementation I ended up with on Cognito specifically, with the decisions I had to make and the trade-offs they imply.

The Auth Proxy in my setup is a single Lambda behind API Gateway, exposing three routes:

POST /api/auth/token for the authorization code exchange after the Cognito hosted UI redirects back to the application
POST /api/auth/refresh for silent refresh
POST /api/auth/logout for sign-out with token revocation

All three are public from API Gateway’s point of view, with no Cognito authorizer, because the authentication mechanism for refresh and logout is the httpOnly cookie itself, and for the code exchange it is the authorization code Cognito has just minted.

The Lambda runs in the same region as the rest of the application. It sits on the critical path of every page load, so cold start latency is something to keep in mind when sizing concurrency and choosing memory: a one-to-two-second cold start on the silent refresh that fires when the user opens the dashboard would make the security improvement feel like a regression.

The endpoint that does most of the work is the code exchange. The SPA, after the Cognito hosted UI redirects back with ?code=... in the query string, sends the code to /api/auth/token along with the redirect URI and a rememberMe flag. The Lambda calls Cognito’s /oauth2/token endpoint with grant_type=authorization_code, the code, the configured redirect URI, the client ID, and the client secret. Cognito returns the access token, the ID token, the refresh token, and the expiry. The Lambda returns the access token, ID token, and expiry in the JSON body of its response, and sets the refresh token in a Set-Cookie header with the right attributes:

Set-Cookie: __Host-tg_refresh=...; HttpOnly; Secure; SameSite=Strict; Path=/; Max-Age=2592000

The refresh token never appears in the response body. The browser persists the cookie according to the Max-Age attribute, and from that point on attaches it automatically to every request it makes to the application’s origin under /, including the POST /api/auth/refresh calls. Three attributes and the name prefix carry the security properties:

HttpOnly: JavaScript running on the page cannot read the cookie
SameSite=Strict: the browser does not attach the cookie to cross-site requests
Secure: the cookie is only sent over HTTPS, never over plaintext HTTP
__Host- prefix: the browser refuses to store any cookie with that prefix that lacks Secure, includes a Domain attribute, or has any Path other than /

The choice of __Host- with Path=/ rather than path-scoping the cookie to /api/auth/* is a deliberate trade-off. Path-scoping looks tighter at first glance, because the cookie travels with fewer requests, but the __Host- prefix provides stronger guarantees against subdomain attacks and domain attribute misconfiguration. The cookie gets sent on every same-origin request, but only the /api/auth/* endpoints actually read it, and the rest of the application ignores it. For a B2B dashboard with a single origin, the prefix guarantee is more valuable than the path narrowing.
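A minimal Python sketch of how the proxy might assemble that header and keep the refresh token out of the JSON body. The cookie name and attributes are the ones shown above; the helper names build_refresh_cookie and split_token_response are hypothetical, and the input is assumed to be the parsed JSON that Cognito's token endpoint returns.

```python
REFRESH_COOKIE = "__Host-tg_refresh"  # cookie name from the header above
THIRTY_DAYS = 2592000                 # Max-Age from the header above, in seconds


def build_refresh_cookie(refresh_token: str, max_age: int = THIRTY_DAYS) -> str:
    """Return a Set-Cookie value that satisfies the __Host- prefix rules."""
    attributes = [
        f"{REFRESH_COOKIE}={refresh_token}",
        "HttpOnly",          # not readable from JavaScript
        "Secure",            # required by the __Host- prefix
        "SameSite=Strict",   # not attached to cross-site requests
        "Path=/",            # required by the __Host- prefix
        f"Max-Age={max_age}",
        # Deliberately no Domain attribute: the __Host- prefix forbids it.
    ]
    return "; ".join(attributes)


def split_token_response(cognito_payload: dict) -> tuple[dict, str]:
    """Separate what goes in the JSON body from what goes in the cookie."""
    body = {
        "access_token": cognito_payload["access_token"],
        "id_token": cognito_payload["id_token"],
        "expires_in": cognito_payload["expires_in"],
    }
    return body, build_refresh_cookie(cognito_payload["refresh_token"])
```

Building the body from an explicit list of fields, rather than deleting refresh_token from the upstream payload, means a new field added upstream cannot leak into the browser by accident.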

Confidential Cognito client with server-side secret rotation

One of the architectural shifts the Auth Proxy makes possible is moving the Cognito app client from public to confidential, with the client secret in AWS Secrets Manager instead of in the bundle. An attacker who intercepts an authorization code, for example through a corporate proxy with overly verbose logging, cannot exchange it for tokens without also having the client secret. The secret rotates on a 90-day Secrets Manager schedule; the Lambda caches it in a module-level variable at cold start, and handles in-flight rotation by retrying once with the previous secret version only on invalid_client errors from Cognito. Other error types (network failures, expired codes, generic upstream issues) do not trigger the retry, because they are not symptoms of a stale secret.
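The retry rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the production code: CognitoTokenError and exchange_with_rotation are hypothetical names, and exchange stands in for whatever function performs the POST to /oauth2/token with a given secret.

```python
class CognitoTokenError(Exception):
    """Hypothetical error type carrying the OAuth2 `error` field from Cognito."""

    def __init__(self, error_code: str):
        super().__init__(error_code)
        self.error_code = error_code


def exchange_with_rotation(exchange, cached_secret: str, previous_secret: str):
    """Call exchange(secret); retry once with the previous secret on invalid_client."""
    try:
        return exchange(cached_secret)
    except CognitoTokenError as err:
        if err.error_code != "invalid_client":
            # Expired code, upstream failure, etc.: not a symptom of a stale secret.
            raise
        # Exactly one retry; a second invalid_client propagates to the caller.
        return exchange(previous_secret)
```

Keeping the retry keyed on a single error code stops an outage at Cognito from doubling the request volume against it.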

Silent refresh, page reload, and the Memory Token Store

The access token and ID token live in JavaScript closure variables inside a module that exposes a handful of synchronous getters and setters. The variables are not properties of window, not stored anywhere the DOM APIs can reach, not serialised to any browser storage mechanism. A compromised dependency that calls localStorage.getItem or sessionStorage.getItem or document.cookie finds nothing, because nothing is there.

The trade-off is that closing the browser tab loses the in-memory tokens, which is exactly what you want for short-lived credentials, but means the application has to reacquire them on every page load. This is the silent refresh flow: when the SPA initialises and finds the memory store empty, it sends a POST /api/auth/refresh to the Auth Proxy. The browser attaches the httpOnly cookie containing the refresh token automatically. The Lambda exchanges the refresh token with Cognito, gets new short-lived tokens, returns them in the response body, and updates the cookie if Cognito rotated the refresh token.
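On the proxy side, the refresh route reduces to two small steps: pull the token out of the Cookie header, and shape the response so the cookie is rewritten only when Cognito handed back a different refresh token. A hedged Python sketch, with hypothetical helper names and a hypothetical result dictionary that a real handler would map onto its API Gateway response:

```python
REFRESH_COOKIE = "__Host-tg_refresh"  # cookie name used by the Auth Proxy


def extract_refresh_token(cookie_header: str):
    """Return the refresh token from a raw Cookie header, or None if absent."""
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name == REFRESH_COOKIE and value:
            return value
    return None


def build_refresh_result(cognito_payload: dict, presented_token: str) -> dict:
    """Tokens go in the body; the cookie is rewritten only on rotation."""
    result = {
        "status": 200,
        "body": {
            "access_token": cognito_payload["access_token"],
            "id_token": cognito_payload["id_token"],
            "expires_in": cognito_payload["expires_in"],
        },
        "set_cookie": None,
    }
    rotated = cognito_payload.get("refresh_token")
    if rotated and rotated != presented_token:
        result["set_cookie"] = (
            f"{REFRESH_COOKIE}={rotated}; HttpOnly; Secure; "
            "SameSite=Strict; Path=/; Max-Age=2592000"
        )
    return result
```

A missing cookie (extract_refresh_token returning None) is the case where the proxy would answer 401 and the SPA would fall back to the hosted UI.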

The user experience target is that the silent refresh completes fast enough not to feel like a load delay. While the refresh is in flight, the SPA renders a lightweight loading state with an ARIA live region announcement for screen reader users, and holds back API calls that would require a token. If the refresh fails with 401, meaning the refresh token is expired or revoked, the SPA redirects to the Cognito hosted UI for a fresh login.

Cross-tab coordination with the Web Locks API

The scenario that breaks naive implementations of this pattern is multiple tabs of the same application opening simultaneously after a browser restart. Each tab on cold start has an empty memory store, each tab issues a silent refresh request, and Cognito’s refresh token rotation kicks in: only the first request gets a valid response, the others receive 401 because they are presenting a refresh token that has already been consumed. The user is bounced back to login despite having a perfectly valid session.

The fix is to serialise the cold-start refresh across tabs using the Web Locks API. One tab elects itself as leader by acquiring a named lock, performs the refresh, and broadcasts the new tokens to the other tabs via the BroadcastChannel API. Followers wait for either the broadcast or the lock release, then read the refreshed tokens from the channel without ever hitting the proxy. If the leader tab dies before broadcasting, followers fall back to an independent refresh after a conservative timeout, accepting a small race window where a second refresh may be issued.

Logout flow and refresh token revocation at Cognito

The logout flow has one detail that often gets skipped: the refresh token has to be revoked at Cognito, not just deleted from the browser. Cognito supports revocation through the /oauth2/revoke endpoint; a refresh token only cleared from the cookie remains valid at Cognito until its natural expiry, up to 30 days. A stolen refresh token still works for the rest of its lifetime even after the user has clicked “Sign Out” on every device they own.

The Auth Proxy’s logout endpoint clears the cookie by setting Max-Age=0, then calls /oauth2/revoke against Cognito. Revocation is always attempted but never blocks the response: if Cognito returns an error, the Lambda logs the failure and returns success anyway, because the user-visible logout flow has to complete regardless. The cookie is cleared, the memory store is cleared, and all open tabs receive a logout broadcast and redirect to the hosted UI logout endpoint.
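A sketch of that logout behaviour in Python, under assumptions: handle_logout is a hypothetical helper, revoke is a hypothetical callable wrapping the POST to /oauth2/revoke, and the returned dictionary (including the illustrative revoked flag) is a stand-in for the real API Gateway response.

```python
import logging

logger = logging.getLogger("auth_proxy")

# Same name and attributes as the original cookie, with Max-Age=0 to delete it.
CLEAR_COOKIE = (
    "__Host-tg_refresh=; HttpOnly; Secure; SameSite=Strict; Path=/; Max-Age=0"
)


def handle_logout(refresh_token, revoke) -> dict:
    """Clear the cookie unconditionally; attempt revocation without blocking."""
    revoked = False
    if refresh_token:
        try:
            revoke(refresh_token)
            revoked = True
        except Exception:
            # Log for the alarm, but never fail the user-visible logout.
            logger.exception("refresh token revocation failed")
    return {"status": 200, "set_cookie": CLEAR_COOKIE, "revoked": revoked}
```

The clearing cookie repeats Secure and Path=/ because a cookie with the __Host- prefix that lacks them is rejected by the browser, and a rejected Set-Cookie would leave the old cookie in place.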

Strict Content Security Policy delivery with Lambda@Edge and S3

The second layer of the hardening is the Content Security Policy. The policy that defends against supply chain XSS uses a small set of directives that work together:

script-src allowlists only the inline scripts the application actually serves (by SHA-256 hash), uses strict-dynamic to propagate trust to dynamically loaded chunks, and blocks everything else
connect-src lists the API Gateway origin and the Cognito User Pool domain explicitly and nothing else
require-trusted-types-for 'script' closes the DOM-based XSS path that strict-dynamic does not cover
object-src, base-uri, frame-ancestors, form-action get locked down with the standard hardening directives

Reporting-Endpoints: csp-endpoint="/api/csp-report"

Content-Security-Policy: default-src 'self';
  script-src 'self' 'sha256-...' 'sha256-...' 'strict-dynamic';
  style-src 'self' 'unsafe-inline';
  connect-src 'self' https://api.example.com https://auth.example.com;
  img-src 'self' data: https:;
  font-src 'self';
  object-src 'none';
  base-uri 'self';
  frame-ancestors 'none';
  form-action 'self';
  require-trusted-types-for 'script';
  report-to csp-endpoint

The style-src 'unsafe-inline' is a documented trade-off. React and Tailwind inject inline styles through the style attribute, and eliminating those would mean refactoring the entire UI library layer. Style injection has a substantially lower attack surface than script injection, but it is not zero: data exfiltration via CSS selectors and background-image: url(...) is a known technique. The residual risk was judged acceptable given that the primary threat (script execution from a compromised dependency) is already constrained by everything else in the policy.

The interesting decision is how to deliver the CSP header. The nonce-based pattern generates a random nonce per request and rewrites every inline <script> tag with it. The hash-based pattern precomputes the SHA-256 hashes of all inline scripts at build time and serves the same hashes on every response. For a statically built React SPA where the inline scripts are known and stable, hash-based is simpler and more cacheable.
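To make the hash-based pattern concrete, here is a Python stand-in for the build-time computation. It is only an illustration of the technique, not the build tooling used in this project: csp_hash, InlineScriptCollector, and build_manifest are hypothetical names, and the {"hashes": [...]} shape is an assumption. The one fixed rule is that the hash is the base64-encoded SHA-256 of the exact inline script text, whitespace included.

```python
import base64
import hashlib
from html.parser import HTMLParser


def csp_hash(script_text: str) -> str:
    """Return the CSP hash value (without surrounding quotes) for an inline script."""
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


class InlineScriptCollector(HTMLParser):
    """Collect the exact text of every <script> element that has no src attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.scripts = []
        self._buffer = None  # list of text chunks while inside an inline script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None


def build_manifest(index_html: str) -> dict:
    """Hash every inline script found in the built index.html."""
    collector = InlineScriptCollector()
    collector.feed(index_html)
    collector.close()
    return {"hashes": [csp_hash(text) for text in collector.scripts]}
```

Because the hash covers every byte between the tags, any post-build step that reformats or minifies index.html has to run before the manifest is generated, not after.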

The deployment shape is where the design hit a wall. The obvious choice was a CloudFront Function on viewer-response, small and cheap, but CloudFront Functions cannot read external files at runtime: the hash list would have to be baked into the function code at deploy time, coupling the frontend deploy to a backend redeploy. The fix is Lambda@Edge, which can do an S3 GetObject from inside the viewer-response handler. The hash manifest can live in S3 alongside the frontend assets, get refreshed on every frontend deploy, and be read by the CSP layer at runtime. Lambda@Edge is more expensive per invocation and adds some viewer-response latency, but the manifest read happens once per warm container and the runtime cost is dominated by the rest of the page load.

The handler itself is small. The hash manifest is read from S3 with a short in-memory cache, so a warm Lambda@Edge container does not pay the GetObject on every request. New frontend deploys propagate to the CSP header within the cache TTL, without any CDK redeploy. The handler fails open: if the manifest is unreachable, the response goes out without a CSP header rather than blocking the page entirely. A CloudWatch alarm on the manifest fetch error metric catches issues without taking the application down. The heart of the handler is the CSP construction itself:

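# _content_type() and _get_manifest() are helpers defined elsewhere in the module;
# CONNECT_SRC and REPORT_URI are module-level constants.
def handler(event, context):
    response = event['Records'][0]['cf']['response']
    # Only HTML documents need a CSP; pass other assets through untouched.
    if 'text/html' not in _content_type(response):
        return response
    try:
        manifest = _get_manifest()  # cached S3 GetObject
    except Exception:
        # Fail open: serve the page without a CSP rather than block it.
        return response
    # Manifest entries are bare sha256-... values; CSP needs them single-quoted.
    script_hashes = ' '.join(f"'{h}'" for h in manifest['hashes'])
    csp = (
        "default-src 'self'; "
        f"script-src 'self' {script_hashes} 'strict-dynamic'; "
        "style-src 'self' 'unsafe-inline'; "
        f"connect-src 'self' {CONNECT_SRC}; "
        "img-src 'self' data: https:; font-src 'self'; "
        "object-src 'none'; base-uri 'self'; "
        "frame-ancestors 'none'; form-action 'self'; "
        "require-trusted-types-for 'script'; "
        "report-to csp-endpoint"
    )
    # CloudFront expects lowercase header keys mapping to a list of key/value pairs.
    response['headers']['content-security-policy'] = [
        {'key': 'Content-Security-Policy', 'value': csp}
    ]
    response['headers']['reporting-endpoints'] = [
        {'key': 'Reporting-Endpoints', 'value': f'csp-endpoint="{REPORT_URI}"'}
    ]
    return response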

A practical note: Lambda@Edge deploys from us-east-1 regardless of where the primary stack lives. The CDK experimental.EdgeFunction construct handles the cross-region replication, but the first deploy from another region fails with a cryptic CloudFormation error if you do not know this.

Inline loader pattern for the entry point under strict-dynamic

The combination of strict-dynamic and a bundler-generated entry point has a sharp edge that took me a deploy iteration to find. Vite produces an index.html with a <script type="module" src="/assets/index-XXX.js"> tag, which is parser-inserted in CSP terminology. With strict-dynamic in the policy, the browser ignores 'self' and any other host-based source in script-src, so a parser-inserted external script is blocked unless it carries a matching nonce or hash. The first deploy with strict CSP enforcement produced a blank page on every load, with a console violation pointing at the entry point bundle.

The fix is to make the entry point load through a trusted inline script rather than a parser-inserted tag. The Vite plugin that generates the hash manifest also rewrites the entry point tag during the build, replacing the standard <script type="module" src="/assets/index-XXX.js"> with an inline loader that creates the same tag via DOM APIs:

<script>
  var s = document.createElement('script');
  s.type = 'module';
  s.src = '/assets/index-XXX.js';
  document.head.appendChild(s);
</script>

The inline loader is hashed by the plugin, which adds its SHA-256 to the manifest. At runtime, the browser trusts the inline loader because its hash matches, and strict-dynamic propagates that trust to the script the loader creates via document.createElement and appendChild. The entry point and all its lazy chunks load under the propagated trust, with no host-based allowlisting needed.

A subtle point: dynamic import() does not propagate trust under strict-dynamic the way createElement does, a known gap in the CSP Level 3 specification tracked at the W3C. Use createElement, not import().

Deployment gotchas with strict CSP and BFF auth

Three things that went wrong on the first attempt at production deploy:

The blank page from the entry point. Symptom described above. Diagnosis: strict-dynamic blocking the parser-inserted module tag. Fix: the inline loader described above.

The Auth Proxy returning 200 with an empty body. Symptom: the frontend got a 200 from /api/auth/token but response.json() failed and the user bounced back to Cognito with a generic “code exchange failed” message. Root cause: a missing https:// prefix in the COGNITO_DOMAIN environment variable made urllib.request.Request reject the URL as malformed, and an over-eager outer try/except swallowed the exception and returned an empty 200 instead of a 502. Two fixes: validate the env var format at handler init, and make the outer exception handler return a meaningful status code.

The silent refresh redirecting to login during the OAuth callback. Symptom: on the redirect back from Cognito with ?code=... in the URL, the SPA bounced straight back to Cognito for login. Root cause: the SPA initialised with an empty memory store and fired a silent refresh that failed (no refresh cookie yet, this was the first login), redirecting before the callback handler could process the code. Fix: a guard at the top of the silent refresh that checks for ?code= or ?error= in the URL and skips the refresh, leaving the callback handler to do its work.

The pattern across the three is that the failure modes of strict CSP and BFF auth are not obvious from the symptom. Each took a debugging session to isolate. Worth budgeting time for.
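The two fixes for the second failure, the empty 200 from the Auth Proxy, can be sketched in Python as follows. Both helpers are hypothetical: validate_cognito_domain would run once at handler init, and safe_handler wraps the route handler so an unexpected exception surfaces as a 502 with a body rather than an empty success.

```python
import json
from urllib.parse import urlsplit


def validate_cognito_domain(value: str) -> str:
    """Return the domain without a trailing slash, or raise if it is not https://<host>."""
    parts = urlsplit(value or "")
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"COGNITO_DOMAIN must look like https://<host>, got {value!r}")
    return value.rstrip("/")


def safe_handler(inner):
    """Wrap a handler so unexpected exceptions become a 502, never an empty 200."""

    def wrapper(event, context):
        try:
            return inner(event, context)
        except Exception as exc:
            # Expose only the exception type; details belong in the logs.
            body = {"error": "upstream_failure", "detail": type(exc).__name__}
            return {"statusCode": 502, "body": json.dumps(body)}

    return wrapper
```

Validating at init turns a misconfiguration into a failure on the first invocation after deploy, where it is easy to attribute, instead of a login failure a user reports later.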

Threat model limitations: what this hardening does not cover

The hardening I described closes the supply chain XSS vector specifically. The classes of compromise it does not address:

A compromised dependency that runs before the auth machinery initialises can still do damage. The Auth Proxy moves the refresh token out of reach, but the access token exists in memory once the user logs in, and a sufficiently early hook into fetch will see it on every API call. The CSP connect-src is what stops the attacker from sending it anywhere useful, and the strictness of that allowlist carries the weight of the active-path defence.
Sender-constrained access tokens via DPoP (RFC 9449) are not in the design, because Cognito does not support DPoP at the OAuth2 layer. DPoP would bind the access token to a client-held cryptographic key, so that a token exfiltrated by a compromised dependency is unusable without the corresponding private key. The Pre Token Generation Lambda trigger that Cognito exposes for token customisation does not have access to the request headers and cannot validate the DPoP proof, so a workaround at that layer is ineffective. If your identity provider supports DPoP natively, it is the natural next layer of defence and closes the window where the access token is briefly readable in JavaScript memory.
The Cognito hosted UI itself is outside the trust boundary this design controls. The user types their password into a page served from the Cognito domain, under a CSP and a JavaScript bundle that AWS maintains. The architecture trusts that surface implicitly. This is a trust dependency on AWS, not a property of the hardening.
A server-side bug that broadens the Cognito app client’s scope, leaks the client secret, or misconfigures the Auth Proxy’s IAM role takes the whole pattern down, because everything assumes the backend is the one you wrote and reviewed.
A user device fully compromised by malware that can install browser extensions or read process memory wins regardless of what the application does. The browser is the attacker’s machine at that point and no application-layer control survives that.

Implementation notes for the Auth Proxy and strict CSP

The notes below are split between things that change the security properties of the pattern and things that you only learn by operating it.

Security-critical configuration for the Auth Proxy and CSP

Keep connect-src tight enough to matter. A connect-src that allows https: accepts any HTTPS destination and provides no protection against exfiltration. The whole point of the directive is to be the exact list of origins your application legitimately talks to: your API Gateway, your Cognito User Pool domain, your telemetry endpoint, and nothing else. Auditing this list periodically is necessary, because each new dependency tends to want to add to it.
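The periodic audit is easy to automate. A small Python sketch, assuming a hypothetical audit_connect_src helper that a CI step could run against the policy string before deploy; it flags any connect-src source that is a bare scheme or contains a wildcard.

```python
def audit_connect_src(csp: str) -> list[str]:
    """Return the connect-src sources that match more than one explicit origin."""
    findings = []
    for directive in csp.split(";"):
        tokens = directive.split()
        if not tokens or tokens[0].lower() != "connect-src":
            continue
        for source in tokens[1:]:
            scheme_only = source.endswith(":")  # e.g. https: or data:
            wildcard = "*" in source            # e.g. * or https://*.example.com
            if scheme_only or wildcard:
                findings.append(source)
    return findings
```

An empty result does not prove the list is minimal; it only catches the entries that make the directive meaningless.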

Roll out require-trusted-types-for 'script' gradually, and know its coverage limits. The directive blocks DOM sink operations (innerHTML, document.write, eval) that receive raw strings instead of Trusted Type objects, closing the DOM-based XSS path that strict-dynamic does not cover. React rarely uses those sinks in normal rendering, but third-party dependencies do. The rollout therefore started in Content-Security-Policy-Report-Only to surface violators; I then either patched them or wrapped their usage in narrow policies, and only after that moved to enforcement. Browser support is uneven at the time of writing: Chromium enforces it, Firefox supports it behind a flag, Safari does not support it at all. For a user base with a meaningful Safari share, Trusted Types protects only some of the users, and the rest of the hardening carries the weight on the others.

Reject missing Origin headers at the Auth Proxy. A common mistake is to write if origin and origin not in allowlist: reject, which silently accepts requests where the Origin header is absent entirely. Browser fetch() always sends the Origin header on cross-origin requests; a missing Origin means a non-browser client (curl, an attacker tool, a misconfigured proxy). The correct check is if origin not in allowlist: reject, treating absence as a rejection criterion.
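The difference between the two checks is small enough to miss in review, so a Python sketch may help. The origin_allowed helper and the example origin are hypothetical; the point is only that a missing header takes the rejection branch.

```python
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})  # hypothetical origin


def origin_allowed(headers: dict, allowlist=ALLOWED_ORIGINS) -> bool:
    """Accept only requests whose Origin header is present and allowlisted."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    origin = lowered.get("origin")
    # A missing header yields None, which is never in the allowlist,
    # so absence is rejected along with unknown origins.
    return origin in allowlist
```

The buggy variant short-circuits on a falsy origin before the membership test ever runs, which is exactly the case a non-browser client produces.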

Always send credentials: 'include' from the SPA’s fetch calls. The httpOnly cookie does not travel on cross-origin requests without it on the client, and the server response must include Access-Control-Allow-Credentials: true with a specific origin (not *). Both sides have to opt in, or the refresh flow silently fails.
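On the server side, the opt-in amounts to two response headers. A minimal Python sketch with a hypothetical cors_headers helper; the Vary header is an addition of mine, included so shared caches do not serve one origin's response to another.

```python
def cors_headers(origin, allowlist) -> dict:
    """Return credentialed-CORS headers for an allowlisted origin, else an empty dict."""
    if origin not in allowlist:
        return {}
    return {
        # Browsers reject "*" on credentialed requests; echo the exact origin.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches the response differs per requesting origin.
        "Vary": "Origin",
    }
```

Returning no CORS headers at all for an unknown origin lets the browser block the response by default, with no special error path needed.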

Operational pitfalls when running BFF auth on Cognito

Match the redirect URI byte for byte. Cognito compares the redirect_uri parameter against the allowed callback URLs with exact string comparison. A trailing slash, a different case in the hostname, an extra query parameter, all cause invalid_grant errors that look identical to expired codes from the outside. Three places have to agree exactly:

the Cognito app client’s allowed callback URLs
the Lambda environment variable
the value the frontend sends in the /api/auth/token call
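A deploy-time check can catch the mismatch before a user does. The sketch below is Python with a hypothetical redirect_uri_mismatches helper; how the three values are collected (Cognito API, Lambda configuration, frontend build config) is left out and would depend on the pipeline.

```python
def redirect_uri_mismatches(allowed_callbacks, configured_uri, frontend_uri):
    """Return human-readable mismatches; an empty list means all three agree."""
    problems = []
    # Plain string equality on purpose: no normalisation, no case folding,
    # because that is how the comparison behaves at the identity provider.
    if configured_uri not in allowed_callbacks:
        problems.append(
            f"Lambda value {configured_uri!r} is not an allowed callback URL"
        )
    if frontend_uri != configured_uri:
        problems.append(
            f"frontend sends {frontend_uri!r}, Lambda expects {configured_uri!r}"
        )
    return problems
```

For example, a frontend value of https://app.example.com/callback/ against a Lambda value of https://app.example.com/callback produces one finding, for the trailing slash.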

Watch for React StrictMode in development. StrictMode double-invokes effects, including the callback handler, which causes the second code exchange to fail with invalid_grant since Cognito has already consumed the code. The fix is an idempotence guard around the code exchange, not removing StrictMode.

Filter browser extension reports in CSP violation logging. Once strict CSP is live, the report endpoint drowns in violations from browser extensions injecting scripts. Filter on the source file field (source-file in legacy report-uri payloads, sourceFile in Reporting API payloads), excluding URIs starting with chrome-extension://, moz-extension://, and safari-extension:// (safari-web-extension:// in current Safari).
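A sketch of that filter in Python, assuming a hypothetical is_extension_report helper in the report endpoint. It accepts both payload shapes, since browsers using the older report-uri mechanism and browsers using the Reporting API name the field differently; the safari-web-extension:// prefix is an addition of mine for newer Safari versions.

```python
EXTENSION_SCHEMES = (
    "chrome-extension://",
    "moz-extension://",
    "safari-extension://",
    "safari-web-extension://",  # added here for newer Safari
)


def is_extension_report(report: dict) -> bool:
    """True if the violation's source file points at a browser extension."""
    # Legacy payloads nest fields under "csp-report" with hyphenated names;
    # Reporting API payloads nest them under "body" with camelCase names.
    legacy = report.get("csp-report") or {}
    modern = report.get("body") or {}
    source = legacy.get("source-file") or modern.get("sourceFile") or ""
    return source.startswith(EXTENSION_SCHEMES)
```

Dropping these reports before they reach the log store keeps the alarm thresholds meaningful for violations that originate in the application itself.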

Defense in depth against supply chain XSS in the browser

The supply chain XSS vector is one of those problems that sits underneath the security model of most SPAs and never quite gets addressed, because the harder pattern requires a backend and feels heavier than the framework defaults make it look. The cost, once you accept it, is a Lambda, a cookie, a CSP, and a few weeks of implementation. The benefit is that defense in depth in the browser stops being a slogan: storage becomes memory, the long-lived credential becomes a cookie the application code cannot read, the exfiltration path becomes a CSP allowlist the attacker cannot extend. None of those layers is sufficient on its own; together they make the class of attack genuinely unprofitable.