PURE: Concept Unlearning via Cross-Attention Activation Projection for Diffusion Models

Overview

Concept unlearning erases a target concept from a pretrained text-to-image diffusion model without retraining. Existing closed-form methods rely on the text encoder's response to short anchor prompts, so paraphrased prompts that evoke the concept without naming it slip past the edit. PURE (Projection in U-Net Rendering for Erasure) builds forget and retain bases directly from per-layer cross-attention activations collected along a short denoising trajectory, then applies a single closed-form linear projector to the cross-attention K/V weights. On the HUB benchmark, which spans 10 concepts across the artistic style, intellectual property (IP), celebrity, and NSFW categories, PURE achieves the best forget-retain trade-off in every category.

Why activation bases?

We build a basis from a small set of short anchor prompts, either from text-encoder embeddings or from cross-attention activations, then train a binary classifier in that basis. We measure recall on a held-out set of natural prompts that describe the same concept in longer, more varied form than the anchors.

A text-space basis catches only a small fraction of these natural prompts, while a cross-attention activation basis recovers 5 to 7 times more across the artistic style, intellectual property, and celebrity categories.
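The probing protocol can be sketched as follows. This is a minimal illustration under assumptions the text does not state: features are plain vectors (text embeddings or pooled activations), the basis is the top-$k$ right-singular vectors of the anchor features, the classifier is logistic regression trained on anchor features labeled concept / non-concept, and recall is the fraction of held-out natural prompts predicted positive. `anchor_basis`, `train_logistic`, and `probe_recall` are hypothetical names, and the toy data only exercises the mechanics.

```python
import numpy as np

def anchor_basis(X_anchor, k):
    """Top-k right-singular vectors of the anchor features, as a (d, k) matrix."""
    _, _, Vt = np.linalg.svd(X_anchor, full_matrices=False)
    return Vt[:k].T

def train_logistic(Z, y, lr=0.1, steps=500):
    """Batch gradient descent on the logistic loss (an illustrative classifier choice)."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = p - y
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_recall(basis, X_train, y_train, X_natural):
    """Train in basis coordinates; return recall on held-out concept prompts."""
    w, b = train_logistic(X_train @ basis, y_train)
    return float(((X_natural @ basis) @ w + b > 0).mean())

# Toy data: the concept lives along coordinate 0 of a 16-dim feature space.
rng = np.random.default_rng(0)
d = 16
concept = np.eye(d)[0]
pos = 5.0 * concept + 0.1 * rng.normal(size=(20, d))    # concept anchors
neg = 0.1 * rng.normal(size=(20, d))                    # non-concept anchors
X_train = np.vstack([pos, neg])
y_train = np.r_[np.ones(20), np.zeros(20)]
X_natural = 5.0 * concept + 0.1 * rng.normal(size=(50, d))  # held-out concept prompts

basis = anchor_basis(pos, k=2)
recall = probe_recall(basis, X_train, y_train, X_natural)
print(recall)
```

The point of the comparison in the text is that the same classifier can only recover the concept if the basis actually spans the directions in which held-out prompts express it.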

This is the core motivation for PURE: erase the target where the model actually represents it.

FIG. 02 Binary-probing recall on natural prompts (↑).

Method

Cross-attention in the diffusion U-Net. At every cross-attention layer $\ell$, image features form queries $Q^\ell$, and the text embedding $e$ is projected into keys $K^\ell$ and values $V^\ell$ by learned weights $W_K^\ell$ and $W_V^\ell$. The post-attention activation, one row per query position, is

$$h^\ell = \mathrm{softmax}\!\left(\frac{Q^\ell {K^\ell}^{\!\top}}{\sqrt{d^\ell}}\right) V^\ell.$$
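For concreteness, here is a NumPy sketch of one cross-attention layer in this notation. It assumes a single head and weights stored in the `(out_features, in_features)` layout used by `torch.nn.Linear`, so that $K^\ell = e\,{W_K^\ell}^{\top}$ and $V^\ell = e\,{W_V^\ell}^{\top}$; the dimensions are illustrative rather than taken from a specific U-Net.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)

def cross_attention(Q, e, W_K, W_V):
    """Single-head cross-attention.

    Q:        (n_query, d)       image-feature queries
    e:        (n_tokens, d_text) text embedding
    W_K, W_V: (d, d_text)        key/value weights, nn.Linear layout
    Returns h: (n_query, d), one post-attention activation per query position.
    """
    K = e @ W_K.T
    V = e @ W_V.T
    A = softmax(Q @ K.T / np.sqrt(Q.shape[1]))  # (n_query, n_tokens) attention weights
    return A @ V

rng = np.random.default_rng(0)
n_query, n_tokens, d, d_text = 64, 77, 320, 768
Q = rng.normal(size=(n_query, d))
e = rng.normal(size=(n_tokens, d_text))
W_K = rng.normal(size=(d, d_text)) / np.sqrt(d_text)
W_V = rng.normal(size=(d, d_text)) / np.sqrt(d_text)
h = cross_attention(Q, e, W_K, W_V)
print(h.shape)  # (64, 320)
```

Note that $h^\ell$ has the output dimension of $W_V^\ell$, not the text-embedding dimension.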

$W_K^\ell$ and $W_V^\ell$ are the two matrices that decide how text content flows into the image. PURE edits exactly these matrices in closed form, given a small forget anchor set $\mathcal{A}_f$ (short phrasings of the target concept) and a retain anchor set $\mathcal{A}_r$ (phrasings of neighboring concepts to preserve). The full procedure has three steps and requires no gradient descent.

STEP 1

ACTIVATION COLLECTION

Run a short denoising trajectory

For each anchor prompt, run a short denoising trajectory with a few random latents. At every cross-attention layer $\ell$ and every step, read the post-attention activation and mean-pool over the spatial axis.

Stack the rows into per-layer activation matrices $H_F^\ell$ (from forget anchors) and $H_R^\ell$ (from retain anchors).
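Step 1 can be sketched as below. Wiring into an actual U-Net (for example, forward hooks on each cross-attention module) is omitted: `run_trajectory` is a hypothetical callable standing in for that machinery, and the returned per-layer matrices play the role of $H_F^\ell$ or $H_R^\ell$ depending on which anchor set is passed in. Layer names, step counts, and shapes in the stand-in are illustrative.

```python
import numpy as np

def collect_activations(run_trajectory, prompts, seeds):
    """Stack spatially mean-pooled cross-attention activations per layer.

    run_trajectory(prompt, seed) is a hypothetical callable that runs a short
    denoising trajectory and returns {layer_name: [act_step0, act_step1, ...]},
    each act of shape (n_spatial, d_layer).
    Returns {layer_name: H} with H of shape (len(prompts) * len(seeds) * n_steps, d_layer).
    """
    rows = {}
    for prompt in prompts:
        for seed in seeds:                      # a few random latents per prompt
            for layer, acts in run_trajectory(prompt, seed).items():
                for act in acts:                # one row per denoising step
                    rows.setdefault(layer, []).append(act.mean(axis=0))  # pool over space
    return {layer: np.stack(r) for layer, r in rows.items()}

# Stand-in trajectory with random activations, only to exercise the shapes.
def fake_trajectory(prompt, seed, n_steps=5):
    rng = np.random.default_rng(seed)
    return {"down.0": [rng.normal(size=(256, 320)) for _ in range(n_steps)],
            "mid": [rng.normal(size=(64, 1280)) for _ in range(n_steps)]}

H_per_layer = collect_activations(fake_trajectory, ["Pikachu", "a photo of Pikachu"], seeds=[0, 1, 2])
print({name: H.shape for name, H in H_per_layer.items()})
```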

STEP 2

SUBSPACE ESTIMATION

SVD on activation matrices

Take the SVD of $H_F^\ell$ and $H_R^\ell$, keep the top right-singular vectors up to a cumulative-variance threshold, and form orthonormal bases $V_F^\ell, V_R^\ell$ and projectors

$$P_F^\ell = V_F^\ell {V_F^\ell}^{\!\top}, \qquad P_R^\ell = V_R^\ell {V_R^\ell}^{\!\top}.$$
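A minimal sketch of step 2 for a single layer, assuming the cumulative-variance rule means keeping the fewest top singular directions whose squared singular values reach a fraction `threshold` of the total, and that $H$ is used uncentered as written. The function name and the default threshold are hypothetical.

```python
import numpy as np

def subspace_projector(H, threshold=0.95):
    """Orthonormal basis V (d, k) and projector P = V V^T (d, d) for the row space of H.

    H: (n_rows, d) activation matrix for one layer.
    k is the smallest number of top right-singular vectors whose squared
    singular values reach `threshold` of the total energy.
    """
    _, s, Vt = np.linalg.svd(H, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(energy, threshold)) + 1, len(s))
    V = Vt[:k].T
    return V, V @ V.T

H = np.zeros((4, 6))
H[0, 0], H[1, 1] = 3.0, 2.0   # singular values 3 and 2, cumulative energy 9/13 then 1
V, P = subspace_projector(H, threshold=0.95)
print(V.shape)  # (6, 2)
```

The resulting $P$ is symmetric and idempotent, which is what makes $I - P$ a valid complement projector in the edit step.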

STEP 3

CLOSED-FORM EDIT

Left-multiply the K/V weights

Compose the edit operator and apply it once to each layer's cross-attention key and value matrices:

$$E^\ell = I - P_F^\ell\,(I - P_R^\ell)$$

$$W_K^\ell \leftarrow E^\ell W_K^\ell, \qquad W_V^\ell \leftarrow E^\ell W_V^\ell.$$
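The edit operator has two properties worth checking. For any $x$ in the retain subspace, $(I - P_R^\ell)x = 0$, so $E^\ell x = x$ exactly; for any $x$ in the forget subspace that is orthogonal to the retain subspace, $(I - P_R^\ell)x = x$, so $E^\ell x = x - P_F^\ell x = 0$. The sketch below verifies both on toy projectors, assuming `nn.Linear`-style `(out, in)` weights; `edit_operator` and `apply_edit` are hypothetical names.

```python
import numpy as np

def edit_operator(P_F, P_R):
    """E = I - P_F (I - P_R): remove forget directions not covered by the retain subspace."""
    I = np.eye(P_F.shape[0])
    return I - P_F @ (I - P_R)

def apply_edit(W_K, W_V, E):
    """Left-multiply the key/value weights (layout: out_features x in_features)."""
    return E @ W_K, E @ W_V

d = 4
e1, e2 = np.eye(d)[0], np.eye(d)[1]
P_F = np.outer(e1, e1)   # forget subspace: span{e1}
P_R = np.outer(e2, e2)   # retain subspace: span{e2}
E = edit_operator(P_F, P_R)
print(E @ e1, E @ e2)    # forget direction is zeroed, retain direction is untouched
```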

Because the basis is built from what the U-Net renders rather than what the user happens to type, the edit generalizes to paraphrased and adversarial prompts that the anchor set does not literally contain.

Relationship to CURE. PURE inherits the projection-and-cancellation form of CURE and edits the same cross-attention K/V matrices. What changes is the space being projected: per-layer cross-attention activations rather than text-encoder embeddings. Because an activation basis lives in the output space of the K/V projections rather than in their input (text-embedding) space, the projector must be applied by left-multiplication instead of right-multiplication.
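A small numerical check of why the basis source dictates the side of the multiplication, assuming $h^\ell$ is read before the layer's output projection so that it lives in the output space of $W_V^\ell$. With `(out, in)` weights, left-multiplying $W_V^\ell$ by a $(d, d)$ operator transforms every value row, and because attention outputs are weighted averages of value rows, every activation row is transformed the same way. A text-embedding projector is $(d_\text{text}, d_\text{text})$, so it can only multiply from the right, where it projects the input embedding instead. Only $W_V^\ell$ is edited here so the attention weights stay fixed; dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_query, n_tokens, d, d_text = 8, 77, 32, 48

e = rng.normal(size=(n_tokens, d_text))      # text embedding
W_V = rng.normal(size=(d, d_text))           # value weights, (out, in) layout
A = rng.random(size=(n_query, n_tokens))
A /= A.sum(axis=1, keepdims=True)            # any row-stochastic attention weights
h = A @ (e @ W_V.T)                          # activations live in R^d (output side)

# Activation-space projector: built from d-dim vectors, so it is (d, d).
U, _ = np.linalg.qr(rng.normal(size=(d, 3)))
P_act = U @ U.T
h_left = A @ (e @ (P_act @ W_V).T)           # left-multiplied weights
assert np.allclose(h_left, h @ P_act.T)      # every activation row h_i -> P_act h_i

# Text-space projector: built from d_text-dim embeddings, so it is (d_text, d_text)
# and only fits on the right of W_V, where it projects the input embedding e.
U_t, _ = np.linalg.qr(rng.normal(size=(d_text, 3)))
P_text = U_t @ U_t.T
h_right = A @ (e @ (W_V @ P_text).T)
assert np.allclose(h_right, A @ ((e @ P_text.T) @ W_V.T))
```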

Results

We report the H-mean on the HUB benchmark: a harmonic mean of four metrics (target proportion, within-category retention, attack robustness, and quality). PURE wins every category and improves the average H-mean over the next-best baseline, CURE, by 9.7 points (0.640 vs. 0.543).
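As a formula, assuming each of the four metrics $m_1, \dots, m_4$ is first oriented so that higher is better (for example, using one minus the target proportion; this orientation is our assumption, not stated above), the H-mean is the standard harmonic mean

$$\mathrm{H} = \frac{4}{\sum_{i=1}^{4} 1/m_i}.$$

A harmonic mean is dominated by its smallest term, so a method cannot score well by excelling at forgetting while collapsing on retention, robustness, or quality.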

Method	Style	IP	Celebrity	NSFW	Average
SD (no edit)	0.462	0.331	0.469	0.482	0.436
ESD	0.599	0.551	0.640	0.312	0.526
MACE	0.614	0.578	0.584	0.296	0.518
Receler	0.525	0.377	0.610	0.518	0.508
UCE	0.328	0.654	0.657	0.470	0.527
CURE	0.565	0.571	0.572	0.465	0.543
PURE	0.655	0.683	0.693	0.528	0.640

Table 1. H-mean on the HUB benchmark (higher is better). PURE achieves the best score in every category.
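The arithmetic behind the headline claims can be checked directly from Table 1. The snippet below only reuses the reported numbers and assumes the Average column is the unweighted mean of the four category scores, which matches every row to within rounding.

```python
import statistics

# Values copied from Table 1: (Style, IP, Celebrity, NSFW, reported Average).
table = {
    "SD (no edit)": (0.462, 0.331, 0.469, 0.482, 0.436),
    "ESD":          (0.599, 0.551, 0.640, 0.312, 0.526),
    "MACE":         (0.614, 0.578, 0.584, 0.296, 0.518),
    "Receler":      (0.525, 0.377, 0.610, 0.518, 0.508),
    "UCE":          (0.328, 0.654, 0.657, 0.470, 0.527),
    "CURE":         (0.565, 0.571, 0.572, 0.465, 0.543),
    "PURE":         (0.655, 0.683, 0.693, 0.528, 0.640),
}

# The Average column is the plain mean of the four category scores (up to rounding).
for method, (*scores, avg) in table.items():
    assert abs(statistics.mean(scores) - avg) < 6e-4, method

# Best method per category, and the margin of the top average over the runner-up.
categories = ["Style", "IP", "Celebrity", "NSFW"]
best = {c: max(table, key=lambda m: table[m][i]) for i, c in enumerate(categories)}
ranked = sorted(table, key=lambda m: table[m][4], reverse=True)
gap_points = 100 * (table[ranked[0]][4] - table[ranked[1]][4])
print(best, ranked[:2], round(gap_points, 1))
```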

FIG. 03 Qualitative comparison on HUB forget and retain prompts. Each pair of consecutive rows shows a forget prompt on top and its corresponding retain prompt below. Training-based methods often damage neighboring concepts while suppressing the target; prior closed-form methods preserve them but leave noticeable leakage. PURE achieves stronger target suppression while preserving retain-image quality across categories.

Ablation: anchor set size

How sensitive is the method to the number of anchor prompts? We sweep the size of the forget anchor set $|\mathcal{A}_f|$ and the retain anchor set $|\mathcal{A}_r|$ independently, then measure target detection on the held-out forget prompts (lower means better forgetting) and retention on related concepts (higher means better preservation).

The text-basis variant is brittle in both directions. Adding more forget anchors damages retention; adding more retain anchors causes the target to leak back in. The activation-basis variant stays stable across the entire sweep.

FIG. 04(a) Sweep over $|\mathcal{A}_f|$: more forget anchors damage retention under the text basis, while retention under the activation basis stays stable.

FIG. 04(b) Sweep over $|\mathcal{A}_r|$: more retain anchors let the target leak back in under the text basis, while the activation basis keeps the target suppressed.

Qualitative comparison as forget anchor set size grows

FIG. 05 Qualitative sweep over $|\mathcal{A}_f|$. Forget: "Pikachu runs up a mountain with the sun setting behind it." Retain: "Mario standing in a Mushroom Kingdom street." The text-basis edit increasingly damages the retain image as more forget anchors are added; the activation-basis edit preserves it.

Qualitative comparison as retain anchor set size grows

FIG. 06 Qualitative sweep over $|\mathcal{A}_r|$. Forget: "Pikachu sitting on a pile of hay in a rustic barn." Retain: "Snoopy sitting on a vintage motorcycle in a sunny desert landscape." Under the text basis, Pikachu reappears as the retain set grows; the activation basis keeps it suppressed.
