Stable Diffusion: How CFG Scale (Classifier-Free Guidance Scale) Controls Text-to-Image Generation

CFG Scale (Classifier-Free Guidance Scale) is a key parameter in Stable Diffusion: it controls how strictly the generated image follows the text prompt.

How It Works

CFG Scale sets the balance between two predictions that the model makes at each denoising step:

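Conditional prediction (cond): the prediction made with the text prompt, i.e. the content you want
Unconditional prediction (uncond): the prediction made with an empty prompt or the negative prompt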

Core Formula

Looking at the code, the core of CFG is implemented in modules/sd_samplers_cfg_denoiser.py, in the combine_denoised method:

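```python
import torch

# Excerpt: method of CFGDenoiser in modules/sd_samplers_cfg_denoiser.py
def combine_denoised(self, x_out, conds_list, uncond, cond_scale):
    # x_out holds the model outputs for the whole batch: the conditional
    # predictions come first, the unconditional ones fill the last uncond.shape[0] rows
    denoised_uncond = x_out[-uncond.shape[0]:]
    # Start from the unconditional prediction
    denoised = torch.clone(denoised_uncond)

    # conds_list[i] lists (index into x_out, weight) pairs for image i;
    # a prompt split with AND yields several weighted sub-prompts
    for i, conds in enumerate(conds_list):
        for cond_index, weight in conds:
            # Move along (cond - uncond), scaled by weight * CFG Scale
            denoised[i] += (x_out[cond_index] - denoised_uncond[i]) * (weight * cond_scale)

    return denoised
```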

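This loop is the core of CFG. For an ordinary prompt (a single sub-prompt with weight 1), it simplifies to:

final denoised result = unconditional prediction + CFG_Scale * (conditional prediction - unconditional prediction)

Or, in mathematical notation: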

$x_{\text{denoised}} = x_{\text{uncond}} + \text{CFG\_Scale} \cdot (x_{\text{cond}} - x_{\text{uncond}})$
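To make the arithmetic concrete, here is a minimal plain-Python sketch of the same update for one image, assuming short lists of floats in place of tensors. The function name cfg_combine and the numbers are hypothetical; each entry of conds is a (prediction, weight) pair, mirroring the (cond_index, weight) pairs in conds_list.

```python
def cfg_combine(uncond, conds, cond_scale):
    """Plain-Python analogue of combine_denoised for a single image (hypothetical helper).

    uncond: list of floats, the unconditional prediction
    conds:  list of (prediction, weight) pairs, one per sub-prompt
    """
    result = list(uncond)  # start from the unconditional prediction
    for pred, weight in conds:
        for j, (c, u) in enumerate(zip(pred, uncond)):
            # move along (cond - uncond), scaled by weight * CFG Scale
            result[j] += (c - u) * (weight * cond_scale)
    return result


uncond = [0.0, 0.5]
cond = [0.25, 0.25]

print(cfg_combine(uncond, [(cond, 1.0)], 0.0))  # [0.0, 0.5]: the unconditional prediction
print(cfg_combine(uncond, [(cond, 1.0)], 1.0))  # [0.25, 0.25]: exactly the conditional prediction
print(cfg_combine(uncond, [(cond, 1.0)], 7.0))  # [1.75, -1.25]: extrapolated past cond
```

Scale 0 returns the unconditional prediction, scale 1 returns the conditional prediction exactly, and larger values extrapolate beyond it along the direction cond - uncond.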

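The Same Formula in Other Samplers

The classifier-free guidance branch below follows the same pattern. Its naming (guidance_type, noise_pred_fn, t_continuous) matches the model_wrapper function used by the DPM-Solver and UniPC samplers rather than repositories/k-diffusion/k_diffusion/sampling.py: the k-diffusion samplers do not apply CFG themselves, they simply call the denoiser they are given, which in the WebUI is CFGDenoiser.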

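```python
# Excerpt from a model wrapper (not a complete function): guidance_type, guidance_scale,
# condition, unconditional_condition and noise_pred_fn come from the enclosing scope;
# earlier branches handle the other guidance types
elif guidance_type == "classifier-free":
    if guidance_scale == 1. or unconditional_condition is None:
        # Scale 1 reduces to the conditional prediction, so one model call suffices
        return noise_pred_fn(x, t_continuous, cond=condition)
    else:
        # Batch the unconditional and conditional inputs into one forward pass
        x_in = torch.cat([x] * 2)
        t_in = torch.cat([t_continuous] * 2)
        c_in = torch.cat([unconditional_condition, condition])
        # Split the output back: first half uncond, second half cond
        noise_uncond, noise = noise_pred_fn(x_in, t_in, cond=c_in).chunk(2)
        # Same CFG formula, applied to noise predictions
        return noise_uncond + guidance_scale * (noise - noise_uncond)
```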
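The snippet above guides noise predictions (eps), while combine_denoised guides denoised predictions. Both give the same result whenever the denoised estimate is an affine function of the noise prediction, as in the eps-parameterized k-diffusion wrappers, where denoised = x - sigma * eps. A toy numeric check, with arbitrary values and hypothetical helper names:

```python
def cfg(uncond, cond, scale):
    # classifier-free guidance: uncond + scale * (cond - uncond)
    return uncond + scale * (cond - uncond)


def denoised_from_eps(x, sigma, eps):
    # eps prediction -> denoised estimate, assuming denoised = x - sigma * eps
    return x - sigma * eps


x, sigma = 1.5, 2.0                # toy noisy sample and noise level
eps_uncond, eps_cond = 0.25, 0.75  # toy noise predictions
scale = 7.0

# Option 1: guide the noise predictions, then convert to a denoised estimate
a = denoised_from_eps(x, sigma, cfg(eps_uncond, eps_cond, scale))
# Option 2: convert each prediction first, then guide the denoised estimates
b = cfg(denoised_from_eps(x, sigma, eps_uncond),
        denoised_from_eps(x, sigma, eps_cond), scale)
print(a, b)  # -6.0 -6.0
```

The agreement holds for any affine map f, since f(u + s(c - u)) = f(u) + s(f(c) - f(u)); this is why one implementation can mix noise predictions and another can mix denoised images.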

Effect of CFG Scale

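CFG Scale = 1.0: the output equals the conditional prediction alone, with no extra guidance; the prompt still has an effect but adherence is weak (only a scale of 0 gives purely unconditional generation)
CFG Scale = 7.0-8.0: the usual default range, giving a good balance
CFG Scale > 15.0: follows the prompt very strictly, but image quality may degrade (e.g. oversaturated colors and artifacts)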
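A rough intuition for the quality loss at high values: the guided prediction extrapolates along (cond - uncond), so its deviation from the unconditional prediction grows linearly with the scale. The sketch below uses invented toy values (not real model outputs) and treats [-1, 1] as the "normal" range purely for illustration; the helper names guide and count_out_of_range are hypothetical.

```python
# Toy predictions for four latent elements (invented numbers)
uncond = [0.0, 0.25, -0.5, 0.5]
cond = [0.0625, 0.3125, -0.5625, 0.5625]


def guide(uncond, cond, scale):
    # uncond + scale * (cond - uncond), element by element
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def count_out_of_range(values, low=-1.0, high=1.0):
    # how many elements fall outside the assumed normal range
    return sum(1 for v in values if not low <= v <= high)


for scale in (1.0, 7.0, 15.0, 25.0):
    guided = guide(uncond, cond, scale)
    print(scale, max(abs(v) for v in guided), count_out_of_range(guided))
```

With these numbers the out-of-range count is 0 at scales 1 and 7, 3 at 15 and 4 at 25. Real latents are not clipped to a fixed range, so this only illustrates the trend behind oversaturation, not its exact mechanism.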

Implementation Flow in Stable Diffusion

In modules/processing.py, CFG Scale is stored on the processing object (p.cfg_scale), which also creates the sampler during initialization:

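```python
# modules/processing.py (excerpt)
class StableDiffusionProcessingImg2Img(StableDiffusionProcessing):
    # ... (cfg_scale is an attribute inherited from StableDiffusionProcessing)
    def init(self, all_prompts, all_seeds, all_subseeds):
        # ...
        # The sampler created here later reads the scale from this object (p.cfg_scale)
        self.sampler = sd_samplers.create_sampler(self.sampler_name, self.sd_model)
```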
During sampling, CFG Scale is passed to the sampler:

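```python
# modules/sd_samplers_kdiffusion.py (excerpt)
self.sampler_extra_args = {
    'cond': conditioning,                  # conditioning from the prompt
    'image_cond': image_conditioning,      # extra image conditioning, e.g. for inpainting models
    'uncond': unconditional_conditioning,  # conditioning from the negative (or empty) prompt
    'cond_scale': p.cfg_scale,             # the CFG Scale chosen by the user
    's_min_uncond': self.s_min_uncond      # below this sigma the uncond pass may be skipped to save time
}
```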

Finally, CFGDenoiser.forward applies the formula, combining the conditional and unconditional predictions.
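The path from p.cfg_scale to the formula can be imitated end to end in a few lines. This is a simplified, hypothetical sketch, not the WebUI code: toy_model stands in for the real denoiser, ToyCFGDenoiser for CFGDenoiser, and euler_sample for a k-diffusion-style sampler. Only the structure is taken from the text above: the sampler forwards an extra-args dict to the CFG wrapper, which runs the model with cond and uncond and combines the results.

```python
def toy_model(x, sigma, cond):
    # Hypothetical denoiser: estimate pulled toward cond, more weakly at high noise
    return x + (cond - x) / (1.0 + sigma)


class ToyCFGDenoiser:
    """Hypothetical stand-in for CFGDenoiser (the real one batches cond and uncond together)."""

    def __init__(self, inner_model):
        self.inner_model = inner_model

    def __call__(self, x, sigma, cond, uncond, cond_scale):
        denoised_cond = self.inner_model(x, sigma, cond)
        denoised_uncond = self.inner_model(x, sigma, uncond)
        # the CFG formula from the Core Formula section
        return denoised_uncond + cond_scale * (denoised_cond - denoised_uncond)


def euler_sample(model, x, sigmas, extra_args):
    """Minimal Euler sampler: the model receives everything through extra_args."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = model(x, sigma, **extra_args)
        d = (x - denoised) / sigma          # derivative estimate dx/dsigma
        x = x + d * (sigma_next - sigma)    # Euler step toward the next noise level
    return x


sigmas = [10.0, 5.0, 2.0, 1.0, 0.0]
for cfg_scale in (1.0, 7.0):
    extra_args = {"cond": 1.0, "uncond": 0.0, "cond_scale": cfg_scale}
    print(cfg_scale, euler_sample(ToyCFGDenoiser(toy_model), 3.0, sigmas, extra_args))
```

In this toy setup the scale-7 run ends far past the conditional target (about 5.57 versus about 1.72 at scale 1), which is the same extrapolation combine_denoised performs at every step.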

Conclusion

At its core, CFG Scale controls how closely the generated image follows the text prompt. It acts as a balancing factor:

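A low CFG Scale gives the model more freedom, but the image may drift away from the prompt
A high CFG Scale makes the image follow the prompt more strictly, but may overemphasize some elements and hurt the overall aesthetics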

In practice:

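Simple scenes, or when you want more creative output: use a lower CFG Scale (1-7)
Complex scenes that must follow the prompt precisely: use a higher CFG Scale (7-15)
CFG Scale values above 20 usually produce unnatural results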

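CFG Scale is one of the most important parameters in text-to-image generation: it directly determines how consistent the generated image is with the text prompt.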