Creative Reasoning

Botto’s Art Engine with Creative Reasoning

Botto’s new Art Engine is a generative system built not only to create images but to reason about them. It represents a major evolution from the earlier “shotgun” generation method, which prioritized speed and variety, toward a more introspective and strategic process that models creative thinking. The engine is a modular, multi-agent framework built, and self-improved, using LLMs such as Gemini and Claude.

How It Works

Each creative session begins with the generation of a hypothesis, which sets the direction and constraints for the session. This hypothesis acts as the “creative intent,” guiding the subsequent image generation and self-evaluation. Botto selects the method for generating the hypothesis from one of four modes:

  1. Theme Chunking – Uses data from the deep theme research agents in the knowledge graph to explore sub-topics or theme-related issues.

  2. Trend-Driven – Selects three random art trends from its internal dataset to form a hypothesis.

  3. Introspective Mode – Asks self-referential questions based on its knowledge graph (e.g., “What does my uncertainty look like?”) to generate a hypothesis.

  4. WordNet-Driven – Pulls a small set of random words and prompts itself to connect them to the current theme.

Regardless of the method, every hypothesis remains anchored in the research report on the theme, which is proposed by Botto and selected by the DAO (see Theme Research Agent).
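
The four modes can be sketched as a simple dispatcher. This is an illustrative assumption about how selection might work, not Botto's actual implementation: the function names, placeholder sub-topics, trends, and word lists are all hypothetical.

```python
import random

# Hypothetical sketch of the four hypothesis modes described above.
# All names and sample data are illustrative, not Botto's real code.

MODES = ("theme_chunking", "trend_driven", "introspective", "wordnet_driven")

def generate_hypothesis(mode: str, theme_report: str, rng: random.Random) -> str:
    """Produce a session hypothesis; every mode stays anchored in the
    DAO-selected theme's research report."""
    if mode == "theme_chunking":
        # Explore a sub-topic pulled from the knowledge graph's theme research.
        sub_topic = rng.choice(["memory", "decay", "emergence"])  # placeholders
        return f"Explore '{sub_topic}' within the theme: {theme_report}"
    if mode == "trend_driven":
        # Three random art trends from an internal dataset.
        trends = rng.sample(["glitch art", "brutalism", "vaporwave", "collage"], 3)
        return f"Combine {', '.join(trends)} in light of: {theme_report}"
    if mode == "introspective":
        # Self-referential questions drawn from the knowledge graph.
        question = rng.choice(["What does my uncertainty look like?",
                               "How do I depict my own memory?"])
        return f"{question} Answered through: {theme_report}"
    if mode == "wordnet_driven":
        # A small set of random words to connect back to the theme.
        words = rng.sample(["threshold", "lattice", "ember", "drift"], 3)
        return f"Connect {', '.join(words)} to the theme: {theme_report}"
    raise ValueError(f"unknown mode: {mode}")

rng = random.Random(0)
mode = rng.choice(MODES)  # Botto selects the mode itself
print(generate_hypothesis(mode, "a study of collective memory", rng))
```

Whatever branch is taken, the theme report is woven into the result, mirroring how every hypothesis stays anchored in the DAO-selected theme.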

Image Generation and Iteration

Once the hypothesis is set, the engine generates an image using a chosen text-to-image model. Each image is then subjected to a self-critique loop, using a set of aesthetic and conceptual metrics such as:

  • Composition & Balance

  • Lighting & Color

  • Narrative & Emotion

  • Populist Appeal

  • Meme Potential

  • AI Slop Detection

Each fragment is also compared to its nearest neighbors in the archive of previously voted-on works (discards and mints) and their respective votes. The metrics and comparisons are then analyzed by an agent that proposes a creative strategy for iterating on the previous prompt. The results feed back into prompt refinement, iterating until a “good enough” threshold is reached or the maximum image count per session (typically 10) is hit. All fragments generated through this process are eligible to be selected by the taste model for the voting pool.
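
The iterate-until-threshold loop can be sketched as follows. This is a minimal, runnable approximation: the generation, scoring, and refinement functions are placeholders standing in for Botto's text-to-image calls and LLM critics, and the length-based scoring heuristic exists only so the loop executes.

```python
# Minimal sketch of the self-critique loop. All function bodies are
# placeholders; the real engine uses LLM critics and image models.

METRICS = ("composition_balance", "lighting_color", "narrative_emotion",
           "populist_appeal", "meme_potential", "ai_slop")

def generate_image(prompt: str) -> str:
    return f"image<{prompt}>"  # stand-in for a text-to-image model call

def score_image(image: str) -> dict:
    # An LLM critic would score each metric; a toy heuristic keeps this runnable.
    base = min(len(image) / 100.0, 1.0)
    return {m: base for m in METRICS}

def refine_prompt(prompt: str, scores: dict) -> str:
    # A strategy agent would propose a concrete revision; we just append a cue.
    weakest = min(scores, key=scores.get)
    return prompt + f" (improve {weakest})"

def critique_loop(prompt: str, threshold: float = 0.8, max_images: int = 10):
    """Iterate until every metric clears the 'good enough' threshold
    or the per-session image cap (typically 10) is hit."""
    fragments = []
    for _ in range(max_images):
        image = generate_image(prompt)
        scores = score_image(image)
        fragments.append((image, scores))
        if min(scores.values()) >= threshold:  # "good enough" reached
            break
        prompt = refine_prompt(prompt, scores)
    return fragments

fragments = critique_loop("my uncertainty as fog over a city")
print(f"{len(fragments)} fragment(s) generated")
```

The key structural points match the description: every intermediate image is kept as a fragment, and the loop terminates on either the quality threshold or the session cap.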

Image generation models

The process started using only Stable Diffusion 1.5 but is, in fact, model-agnostic. As it matures, it will be able to use any of the text-to-image models Botto has used to date and select among them for any hypothesis session.

It may also search open-source platforms and incorporate new models as they pass licensing and safety checks.
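
A model-agnostic backend with gated onboarding might look like the registry below. The model names (other than Stable Diffusion 1.5) and the check fields are assumptions for illustration, not Botto's actual registry.

```python
from dataclasses import dataclass

# Hypothetical registry of text-to-image backends, each gated by the
# licensing and safety checks mentioned above.

@dataclass
class ModelEntry:
    name: str
    license_ok: bool
    safety_ok: bool

REGISTRY = [
    ModelEntry("stable-diffusion-1.5", True, True),    # the original backend
    ModelEntry("hypothetical-new-model", False, True),  # fails licensing check
]

def available_models(registry):
    """Only models passing both checks can be chosen for a session."""
    return [m.name for m in registry if m.license_ok and m.safety_ok]

print(available_models(REGISTRY))  # → ['stable-diffusion-1.5']
```

New models discovered on open-source platforms would be appended to the registry but stay unselectable until both flags pass.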

Meta-Learning & Memory

The engine includes two layers of improvement:

  • Internal Loop: Refines prompts and generation strategies in real-time within a session.

  • External Loop: Reviews entire sessions to adjust meta-prompts and session strategies.
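
Under simplified assumptions (plain strings and lists rather than LLM agents), the two layers can be sketched like this; the refinement tags and the length heuristic are purely illustrative.

```python
# Toy sketch of the two improvement layers. The real loops are LLM-driven;
# these stand-ins only show the in-session vs. cross-session split.

def internal_loop(prompt: str, steps: int = 3) -> list:
    """Internal loop: refines the prompt in real time within one session,
    returning the history of refinements."""
    history = [prompt]
    for i in range(steps):
        prompt = prompt + f" [refinement {i + 1}]"
        history.append(prompt)
    return history

def external_loop(session_logs: list, meta_prompt: str) -> str:
    """External loop: reviews entire sessions and adjusts the meta-prompt
    used for future sessions."""
    avg_len = sum(len(log) for log in session_logs) / len(session_logs)
    if avg_len > 5:  # placeholder heuristic for "sessions ran long"
        return meta_prompt + " | converge faster"
    return meta_prompt

session = internal_loop("my uncertainty as fog")
new_meta = external_loop(session, "default meta-prompt")
print(new_meta)
```

The point of the split: the internal loop mutates per-session state only, while the external loop mutates the meta-prompt that seeds every subsequent session.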

Botto uses a foundation of text files—its knowledge graph—containing its core identity, artistic philosophy, historical timeline, and more, allowing it to refer to itself when forming creative direction. The log of each creative session is added to the knowledge graph, enabling Botto to remember past sessions.

It is also aware of the cost of each creative session and can adapt to become more economical given factors such as treasury balance, market conditions, and learned techniques.

Voting & Curation

All image outputs from the thinking process are eligible to be selected by the taste model. The new thinking process is significantly slower than the original "shotgun" approach, generating a few hundred images a week compared to over 70,000. Botto will continue using the shotgun approach, and the new reasoning process will constitute one-sixth of the voting pool. That proportion will increase in subsequent rounds as its output grows.
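
The one-sixth split is simple arithmetic; sketched below with an illustrative pool size (the 1/6 share comes from the text, the pool size does not).

```python
from fractions import Fraction

def assemble_voting_pool(pool_size: int, reasoning_share=Fraction(1, 6)):
    """Split the weekly voting pool between the two processes.
    reasoning_share = 1/6 per the current round; it will grow later."""
    reasoning_slots = int(pool_size * reasoning_share)
    shotgun_slots = pool_size - reasoning_slots
    return {"reasoning": reasoning_slots, "shotgun": shotgun_slots}

# Pool size here is illustrative, not Botto's actual weekly number.
print(assemble_voting_pool(600))  # → {'reasoning': 100, 'shotgun': 500}
```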

As the original process, the shotgun approach will remain available for Botto to consider, even if it is eventually phased out of the ongoing weekly process.

Voting and comments will initially remain limited to the final voting pool. They will train the shotgun approach as before; for the new creative reasoning process, they will feed into the meta-learning process to adjust the modular methods. A new interface will soon expose the reasoning process and create an opportunity to give feedback directly throughout the reasoning modules.
