Creating Breathtaking AI Art on Your Laptop: A Step-by-Step Guide
Chapter 1: Introduction to AI Art Generation
As a child, I was always fascinated by those who could effortlessly bring their imagination to life through drawing. I could spend hours watching them transform simple lines into intricate artwork. Unfortunately, I lacked that artistic talent myself.
However, with the advancements in AI, I can now bring my ideas to fruition, albeit in a different manner. While it might not provide the same fulfillment as traditional art, it allows me to express my thoughts visually.
I was thrilled when I received access to the DALL·E 2 private beta from OpenAI, yet I quickly realized its limitations regarding usage frequency, costs, and lack of control. Then came Stable Diffusion. This open-source AI system, akin to DALL·E 2, generates realistic images from natural language descriptions, merging concepts, attributes, and styles to produce unique visuals. The best part? You can operate it directly from your laptop without requiring a GPU.
In this guide, I will demonstrate how you can create stunning artwork on your machine in under 10 minutes, regardless of GPU availability. All you need is a system capable of running Python with Git installed. By the end of this process, you'll be able to articulate your artistic vision in natural language and see it come to life.
Chapter 2: Understanding Stable Diffusion
Stable Diffusion is a Latent Diffusion Model (LDM) designed to create realistic images from natural language prompts. It was developed by Stability AI in collaboration with the CompVis group at LMU Munich, supported by communities such as EleutherAI and LAION, and it builds on ideas from projects like DALL·E 2 and Google Brain's Imagen. The model was trained on the LAION-Aesthetics dataset, a subset of the larger LAION-5B collection.
Because Stable Diffusion is open source and available on GitHub, anyone can run it locally and generate images without the usage caps and fees of hosted services.
Section 2.1: Setting Up Your Environment
To get started, you need to set up your Python environment. Stable Diffusion provides instructions for creating a Miniconda environment, but I prefer using Python's built-in venv to create simple virtual environments and install packages via pip.
My system runs on Ubuntu 22.04 with Python 3.8.10, but you'll need at least Python 3.8.5. Begin by cloning the repository:
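As a quick sketch, assuming the official CompVis repository (if you're following a fork or branch that adds CPU support, clone that one instead):
git clone https://github.com/CompVis/stable-diffusion.git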
Navigate to the project directory and create a virtual environment:
cd stable-diffusion && python3 -m venv venv
Activate your environment and upgrade pip:
source venv/bin/activate && pip3 install --upgrade pip==20.3
If the repository or branch you cloned already includes a requirements.txt file, you can use it as-is. If not, create one with the following content:
numpy==1.19.2
torch==1.11.0+cpu
torchvision==0.12.0+cpu
albumentations==0.4.3
diffusers
opencv-python==4.1.2.30
pudb==2019.2
invisible-watermark
imageio==2.9.0
imageio-ffmpeg==0.4.2
pytorch-lightning==1.4.2
omegaconf==2.1.1
test-tube>=0.7.5
streamlit>=0.73.1
einops==0.3.0
torch-fidelity==0.3.0
transformers==4.19.2
torchmetrics==0.6.0
kornia==0.6
Now, install the dependencies:
pip3 install -r requirements.txt
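One caveat: the +cpu builds of torch and torchvision are not published on PyPI, so the command above may fail to resolve them. Assuming you want the standard CPU wheels, pointing pip at PyTorch's CPU wheel index should fix this:
pip3 install torch==1.11.0+cpu torchvision==0.12.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu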
Lastly, install the Stable Diffusion library itself:
pip3 install -e .
Section 2.2: Downloading the Model
First, download the sd-v1-4.ckpt checkpoint; the official weights are distributed through Hugging Face, where you need to accept the model's license before downloading. Depending on your internet speed, the download should take around 5–10 minutes, as the file is roughly 4GB. Once you have it, create a directory for the model:
mkdir models/ldm/stable-diffusion-v1
Then, move the model file into this folder and rename it:
mv /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
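As a quick sanity check, you can confirm the checkpoint landed where the scripts expect it and that its size matches the roughly 4GB mentioned above:
ls -lh models/ldm/stable-diffusion-v1/model.ckpt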
Section 2.3: Running Your First Model
You're now ready to generate your first piece of art! Execute the following command:
python3 scripts/txt2img.py --prompt "An astronaut riding a horse, painted by Pablo Picasso." --plms --n_iter 5 --n_samples 1
The first run downloads weights for pre-trained components such as the CLIP text encoder, so be patient. In the command above, --plms selects the PLMS sampler, --n_iter sets how many batches to generate, and --n_samples how many images per batch. If you're running without a GPU, this step may fail, but don't worry: there is a way to create images without one.
To run it on CPU, simply add the --config flag and point to an alternative configuration file:
python3 scripts/txt2img.py --prompt "An astronaut riding a horse, painted by Pablo Picasso." --plms --n_iter 5 --n_samples 1 --config configs/stable-diffusion/v1-inference-cpu.yaml
The process might take over 30 minutes, depending on your hardware. But consider how much time you’d wait for a traditional portrait to be drawn!
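If you prefer to skip the repository scripts altogether, the diffusers library from the requirements list offers another route. The snippet below is a minimal sketch, assuming a recent diffusers release and that you've accepted the model license on Hugging Face; it runs on the CPU by default:

from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion v1.4 weights from Hugging Face
# (runs on the CPU unless you move the pipeline to a GPU).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

prompt = "An astronaut riding a horse, painted by Pablo Picasso."
image = pipe(prompt).images[0]  # expect several minutes per image on CPU
image.save("astronaut_picasso.png")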
Chapter 3: Conclusion
In summary, Stable Diffusion is an open-source AI model that, much like DALL·E 2, can generate realistic images from textual descriptions. This guide has shown how you can quickly create impressive art on your own system, even without a dedicated GPU. The most exciting aspect is the control you have over the software, allowing you to generate images whenever you choose; just be mindful of the content you create!
About the Author
My name is Dimitris Poulopoulos, and I work as a machine learning engineer at Arrikto. I have developed AI solutions for prominent organizations, including the European Commission, IMF, and IKEA. For more insights on Machine Learning, Deep Learning, and Data Science, follow me on Medium, LinkedIn, or Twitter @james2pl.
The views expressed here are solely my own and do not reflect those of my employer.