Took way more time than I'd like. I did so many inpainting and manual painting passes to fix weird things. I wonder if people just brute-force their generations or go through the same grind.
Workflow (txt2img)
Positive: (masterpiece, best quality:1.3), (flat color, watercolor:1.1), a girl sitting on a sidewalk, leaning on wall, 1girl, pink shirt, black skirt, (frills:0.8), black thighhighs, black choker, frilled shirt collar, lolita fashion, medium breasts, detailed background, nice hands, perfect hands, face
Negative: embedding:easynegative, embedding:ng_deepnegative_v1_75t, (worst quality, low quality:1.3), jpeg artifacts, signature, watermark, bad anatomy, bad proportions, disfigured, amputee, disembodied limb, severed limb, missing limb, missing arms, extra arms, extra legs, extra hands, missing finger, fewer digits, extra digits, (cropped head:1.5), realistic, monochrome, 3d, loli, maid, apron, huge breasts, big breasts, flat chest, hat
Checkpoint: Anything V5 PrtRE, Seed: 1020843393444063, Steps: 20, Sampler: DPM++ 2M Karras, Size: 768x512, Upscalers: 4x-UltraSharp and RealESRGAN_x4plus_anime_6B
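For anyone who wants to approximate the base gen outside ComfyUI, here's a minimal diffusers sketch with the same settings. It's not what I actually ran: the file paths are placeholders, diffusers doesn't parse the (word:1.3) weight syntax without an add-on like compel, and the seed won't reproduce the same image across UIs anyway.

```python
# Minimal sketch, assuming local copies of the checkpoint and the two
# negative embeddings; every file path here is a placeholder.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_single_file(
    "AnythingV5_PrtRE.safetensors",  # assumed local path
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M Karras
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Textual-inversion embeddings referenced in the negative prompt
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")
pipe.load_textual_inversion("ng_deepnegative_v1_75t.safetensors",
                            token="ng_deepnegative_v1_75t")

# Paste the full positive/negative prompts from above; the (word:1.3)
# weights are ignored by the default prompt encoder.
positive = "(masterpiece, best quality:1.3), (flat color, watercolor:1.1), a girl sitting on a sidewalk, ..."
negative = "easynegative, ng_deepnegative_v1_75t, (worst quality, low quality:1.3), jpeg artifacts, ..."

image = pipe(
    prompt=positive,
    negative_prompt=negative,
    num_inference_steps=20,
    width=768,
    height=512,
    generator=torch.Generator("cuda").manual_seed(1020843393444063),
).images[0]
image.save("base.png")
```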
This workflow only produces the base image; the final one went through countless img2img passes, inpainting, and manual fixes. The process is roughly:
Base gen > inpainting and manual painting > 2x upscale (4x-UltraSharp + RealESRGAN_x4plus_anime_6B) > inpainting and manual painting > 2x upscale (4x-UltraSharp) > finishing touches
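For reference, here's a rough sketch of a single 2x pass using the official Real-ESRGAN package (https://github.com/xinntao/Real-ESRGAN). It only covers the x4plus_anime_6B half of the combo, not my actual ComfyUI graph: `outscale=2` resizes the 4x model output down to 2x, and the paths are placeholders.

```python
# One 2x upscale pass via Real-ESRGAN; model weights assumed downloaded.
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Network config for the anime_6B variant (6 RRDB blocks, 4x scale)
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=6, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path="RealESRGAN_x4plus_anime_6B.pth",  # assumed local weights
    model=model,
)

img = cv2.imread("inpainted.png", cv2.IMREAD_COLOR)  # placeholder input
output, _ = upsampler.enhance(img, outscale=2)  # 4x model, output at 2x
cv2.imwrite("upscaled.png", output)
```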
Note also that I use ComfyUI, so the result might differ on other UIs.
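If you want to reproduce it in ComfyUI without clicking through the web UI, the usual pattern is to export the graph in API format (Save (API Format), with dev mode enabled in the settings) and POST it to the server's /prompt endpoint. A sketch below; the workflow filename and the node id for the seed are placeholders that depend on your own graph.

```python
# Minimal sketch of queueing an exported workflow against a local ComfyUI
# server (default port 8188). "workflow_api.json" and node id "3" are
# placeholders; export your own graph via Save (API Format).
import json
from urllib import request

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Override the sampler seed before queueing (node id depends on your graph)
workflow["3"]["inputs"]["seed"] = 1020843393444063

req = request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
request.urlopen(req)
```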