Reproducing The Curse of Recursion: the recursive self-training problem in Stable Diffusion

Related paper: 2305.17493.pdf (arxiv.org)

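The recursive self-training problem: when a GPT or diffusion model is trained or fine-tuned on data that it generated itself, the resulting model is catastrophically damaged and its generation quality degrades.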

Principle

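According to the paper, training an AI on AI-generated data leads to "model collapse", for two main reasons:

  • Statistical approximation error: the number of samples is finite, so every fit carries some error; this error vanishes as the number of samples tends to infinity.
  • Functional approximation error: the function approximator is not expressive enough, so some model error is unavoidable. In the absence of statistical error, functional approximation error occurs only in the first generation: once the new distribution lies in the image of the function approximator, it stays exactly the same in all subsequent generations.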
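
The statistical approximation error can be illustrated with a toy experiment that has nothing to do with Stable Diffusion itself: fit a one-dimensional Gaussian, sample from the fit, refit on those samples, and repeat. This is only a minimal sketch under assumed settings (the function name, sample sizes, generation count, and seed are all made up for illustration), not the paper's experiment.

```python
import numpy as np


def recursive_gaussian_fit(n_samples, n_generations, seed=0):
    """Refit a Gaussian to samples drawn from the previous generation's fit.

    Returns a list of (mu, sigma) pairs, one per generation, starting with
    the "real" distribution N(0, 1) at generation 0.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the true data distribution
    history = [(mu, sigma)]
    for _ in range(n_generations):
        # Each generation only sees data produced by the previous model.
        samples = rng.normal(mu, sigma, size=n_samples)
        mu = float(samples.mean())
        sigma = float(samples.std(ddof=1))
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    # Hypothetical settings: few samples vs. many samples per generation.
    for n in (10, 10_000):
        mu, sigma = recursive_gaussian_fit(n, n_generations=200)[-1]
        print(f"n={n}: mu={mu:+.4f}, sigma={sigma:.4g}")
```

With few samples per generation the estimated parameters drift away from the original distribution and the spread tends to shrink; with many samples the fit stays close to N(0, 1) for much longer. A Gaussian family contains the true distribution here, so there is no functional approximation error and any drift comes from finite sampling alone.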

Reproduction procedure

The following AI-generated face images are used as the dataset:

image

Some of the original images:

image

image

The relevant parameters are:

# Train data path
$pretrained_model = "xxx.safetensors" # base model path

# Network settings
$network_module = "networks.lora"
$network_weights = "" # pretrained weights for LoRA network
$network_dim = 128 # network dim
$network_alpha = 128 # network alpha

# Train related params
$resolution = "512,512" # image resolution
$batch_size = 1 # batch size
$max_train_epoches = 10 # max train epochs
$save_every_n_epochs = 1 # save every n epochs

$train_unet_only = 0 # train U-Net only
$train_text_encoder_only = 0 # train Text Encoder only

$noise_offset = 0 # noise offset
$keep_tokens = 0 # keep heading N tokens when shuffling caption tokens

# Learning rate
$lr = "1e-4"
$unet_lr = "1e-4"
$text_encoder_lr = "1e-5"
$lr_scheduler = "cosine_with_restarts" # "linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"
$lr_warmup_steps = 0 # warmup steps
$lr_restart_cycles = 1 # cosine_with_restarts restart cycles
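
To make the learning-rate settings above more concrete, here is a minimal pure-Python sketch of a cosine schedule with hard restarts in its commonly used form (linear warmup followed by cosine decay that restarts `num_cycles` times). The function name and `total_steps = 1000` are assumptions for illustration, since the dataset size is not given, and the scheduler inside the actual training script may differ in detail.

```python
import math


def cosine_with_restarts_lr(step, total_steps, base_lr=1e-4,
                            warmup_steps=0, num_cycles=1):
    """Learning rate at `step` for a cosine schedule with hard restarts."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay from base_lr towards 0, restarted num_cycles times.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; depends on dataset size and epochs
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_with_restarts_lr(step, total_steps))
```

With `$lr_restart_cycles = 1` and `$lr_warmup_steps = 0`, this reduces to a single cosine decay from the base learning rate down to zero over the whole run.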

After training, the model is used to generate images, and the results are noticeably poor:

image

image

Image generation info:

cinematic lighting, brown hair, (1girl:1.8), beautiful face, portrait, short hair, gorgeous clothes, smile, <lora:SkySewa-000002:1>
Negative prompt: horny, sexy, wrong hand, bad framing, out of frame, deformed, cripple, old, fat, ugly, poor, missing arm, additional arms, additional legs, additional head, additional face, multiple people, group of people, (worst quality, low quality:1.4), (greyscale, monochrome:1.1), 3D face, cropped, lowres, text, jpeg artifacts, signature, watermark, username, blurry, artist name, trademark, watermark, title, multiple view, Reference sheet, ((5 fingers, 6 fingers)), ((hands, hand))
Steps: 50, Sampler: DPM++ 2M, CFG scale: 7, Seed: 3770394888, Size: 512x512, Model hash: f30196796e, Model: dalcefo_portrait