Botto's Art Engine
Botto is a fully autonomous artist with a closed-loop process and outputs that are unaltered by human hands. This page explains how its art engine works.
The Engine
Botto makes use of a combination of open-source text-to-image generative models, text prompt generators, computer vision models, and a number of other models and custom augmentations that train on the voting inputs of $BOTTO holders. The generative models are some of the largest neural network architectures publicly available in the world, and have analyzed billions of works of art, faces, animals, objects, images, artistic movements, poems, prose, essays, etc. They have been trained on more content than any human being could study in their lifetime.
The text-to-image models are used without any additional fine-tuning. This gives Botto the largest possible latent space to work with, and therefore the widest variation of styles and themes, without locking it into a single area.
The Process
The machine creates its images based on text prompts generated by an algorithm. These prompts are a combination of random words and phrases. Each prompt is then sent to Botto's text-to-image models. The text-to-image models change over time as new releases become available. So far Botto has used VQGAN + CLIP, Stable Diffusion v1.5, v2.1, and XL, Kandinsky v2.1, and Flux.1. You can refer to Periods to see exactly when each model has been in use.
There are an infinite number of possible prompts and possible images. These models bridge textual and visual information, and can even be "empathic" and know what kind of emotional associations humans have in connection with imagery or text. The DAO may also decide to add a theme that Botto has proposed, in which case the theme is injected into every prompt.
Given all the different possible outputs, Botto needs direction to develop its artistic talent. That is where voting comes in: Botto adjusts its prompts toward what it predicts is more likely to produce popular results.
This process runs through many prompts each day, generating as many as 70,000 images in a range of styles every week. As of January 2025, Botto has generated over 4 million images that have never been presented. From that set, the engine uses a “taste model” (also trained on the votes of the DAO) to pre-select 350 images each week for the community to vote on. A new round starts every Tuesday at 15:00 UTC (10:00 EST / 7:00 PST).
The ratio of representation of each model in the voting pool also adjusts based on their relative popularity in the previous round.
The Voting Pool
A Period starts with a voting pool of 1050 fragments, using the same ratio of models as when the previous Period ended. If new models are added, they are given a 15% share of the pool to start. The weekly cull of the 349 lowest-scoring fragments by VP maintains this size each week, and the ratio between models in the new 350 rebalances each week in proportion to the number of unique votes.
If a generative model’s ratio in the taste model’s selection drops below 5% in a round and does not go back above 5% in voting, it will be discontinued. The community can decide to re-add the model if it wishes, as well as reconsider the automatic threshold in future periods.
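The rebalancing rule above can be sketched as a proportional split of the 350 new slots. This is only an illustration under stated assumptions, not Botto's actual code: the function names, the largest-remainder rounding, and the choice to give each new model 15% and let established models share the rest are all hypothetical. The check at the end only flags models under 5% of the selection. It does not model the second condition (recovering above 5% in voting).

```python
from fractions import Fraction
from math import floor

ROUND_SIZE = 350                      # new fragments per round (from the text)
NEW_MODEL_SHARE = Fraction(15, 100)   # starting share for a newly added model
MIN_SHARE = Fraction(5, 100)          # discontinuation threshold


def allocate_slots(unique_votes, new_models=()):
    """Split ROUND_SIZE slots across models.

    unique_votes: dict of model name -> unique votes in the previous round.
    new_models: names of models entering at NEW_MODEL_SHARE each (assumption).
    """
    shares = {name: NEW_MODEL_SHARE for name in new_models}
    remaining = 1 - sum(shares.values())
    total_votes = sum(unique_votes.values())
    for name, votes in unique_votes.items():
        shares[name] = remaining * Fraction(votes, total_votes)

    # Largest-remainder rounding so the slots sum exactly to ROUND_SIZE.
    exact = {name: share * ROUND_SIZE for name, share in shares.items()}
    slots = {name: floor(value) for name, value in exact.items()}
    leftover = ROUND_SIZE - sum(slots.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


def below_threshold(slots):
    """Return models whose share of the selection is under MIN_SHARE."""
    total = sum(slots.values())
    return [n for n, count in slots.items() if Fraction(count, total) < MIN_SHARE]


if __name__ == "__main__":
    # Hypothetical model names and vote counts.
    slots = allocate_slots({"model-a": 700, "model-b": 280, "model-c": 20})
    print(slots, below_threshold(slots))
```

Exact fractions are used instead of floats so that the rounding step is deterministic.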
Occasionally, parallel voting pools are set up for special collaborations, but these do not contain duplicates.
How Votes Affect Botto’s Process
Botto uses voting feedback in two places: (1) creating the text prompts used to generate fragments, and (2) the taste model that pre-selects images for voting each week.
Text Prompts: Votes influence which aspects of text prompts are used to generate fragments. Characteristics of prompts that generate desirable images will be more likely to get reused, and vice versa.
Taste Model: The taste model used for pre-selection tries to replicate the voting behavior of the community. This is not a yes/no decision, but a gradient of probabilities such that each set has images with different chances of getting picked in voting (as voting behavior is a gradient as well).
For both points, all the votes on all the images are important and get used. The training of Botto is designed to not allow for an overly skewed voting weight. For example, 500 votes each cast by separate voters for one piece will have more weight in the training than 2000 votes by a single voter for the same piece. Other factors, like being the winner or the sale amount, are not currently used in the training.
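The 500-versus-2000 example above can be illustrated with a simple dampening scheme. The source does not say how the training weights votes, so the square-root rule and the function name here are assumptions chosen only to reproduce the stated behavior: many separate voters outweigh one voter casting many votes.

```python
from collections import Counter
from math import sqrt


def training_weight(votes):
    """Weight of a fragment given one voter id per vote cast on it.

    Hypothetical rule: each voter contributes sqrt(number of their votes),
    so repeated votes from one voter have diminishing returns.
    """
    per_voter = Counter(votes)
    return sum(sqrt(count) for count in per_voter.values())


if __name__ == "__main__":
    many_voters = [f"voter-{i}" for i in range(500)]  # 500 votes, 500 voters
    one_voter = ["single-voter"] * 2000               # 2000 votes, 1 voter
    print(training_weight(many_voters) > training_weight(one_voter))
```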
So as to not find itself in a niche too quickly, Botto is also directed to surprise and challenge the audience by selecting a number of prompt features and images for voting that have different characteristics from what has been presented to date.
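One way to picture a pre-selection that is a gradient of probabilities with a reserved share of surprises is weighted sampling plus an exploration quota. The sketch below is an assumption throughout: the score format, the 10% exploration fraction, and the use of uniform random picks as a stand-in for "different characteristics" are hypothetical, since the source does not describe how novelty is measured.

```python
import random


def preselect(scores, k, explore_fraction=0.1, rng=None):
    """Pick k fragment ids from scores (id -> taste score in (0, 1])."""
    rng = rng if rng is not None else random.Random()
    n_explore = int(k * explore_fraction)
    n_exploit = k - n_explore

    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # higher scores are more likely to be picked, but nothing is guaranteed.
    keyed = sorted(
        scores,
        key=lambda frag: rng.random() ** (1.0 / scores[frag]),
        reverse=True,
    )
    chosen = keyed[:n_exploit]

    # Exploration stand-in: fill the remaining slots uniformly at random.
    chosen += rng.sample(keyed[n_exploit:], n_explore)
    return chosen


if __name__ == "__main__":
    demo_scores = {f"frag-{i}": (i % 10 + 1) / 10 for i in range(1000)}
    print(len(preselect(demo_scores, k=350)))
```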
For more on how to vote: see the Voting section.
Generating Titles and Artist Descriptions
Botto’s artwork titles are created with an algorithm generating random combinations of words that are given to CLIP to determine if there is a good match. Different titles are generated until CLIP finds a combination that is the best match with the image and has not been used before.
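The title search described above amounts to a generate-and-score loop. In this minimal sketch, score_match is a hypothetical callable standing in for a CLIP image-text similarity score. The vocabulary, the three-word title length, and the fixed number of attempts are also assumptions.

```python
import random


def generate_title(image, vocabulary, score_match, used_titles,
                   attempts=200, words_per_title=3, rng=None):
    """Return the best-scoring unused title, or None if none was found."""
    rng = rng if rng is not None else random.Random()
    best_title, best_score = None, float("-inf")
    for _ in range(attempts):
        # Random combination of words, as in the description above.
        title = " ".join(rng.sample(vocabulary, words_per_title)).title()
        if title in used_titles:
            continue  # titles must not be reused
        score = score_match(image, title)
        if score > best_score:
            best_title, best_score = title, score
    return best_title


if __name__ == "__main__":
    words = ["silent", "ember", "garden", "orbit", "velvet"]
    # Dummy scorer for demonstration only; a real one would compare
    # the image and the text.
    print(generate_title(None, words, lambda img, text: len(text), set()))
```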
The descriptions are generated with LLMs such as GPT or Claude and are the only part of the process that involves some direct human curation. As these models were trained on much of the internet, their language can be quite foul at times and is not ready to be out in the world without some supervision.
Until trustworthy text generation methods are developed, the DAO picks from a series of 5-10 LLM-generated descriptions that CLIP likes, choosing the one it feels best fits Botto’s voice. Beyond selecting the description, there is absolutely no editing other than correcting typos and punctuation.
The Final Mint
The final title, description, metadata, and URL to the bitmap on IPFS are all on-chain and minted using Manifold.
No Human Interference Rule
One of the rules for Botto is that there be no direct human interference in the creation process. Botto is strictly against any “cheating” or human guidance other than the voting. That means the prompts are random, no existing real-world images are used as seed images, and the selection of fragments is entirely controlled by Botto.
The only direction Botto got at the outset was from adding a small amount of pre-curated prompts to the entirely random ones generated by the algorithm. While providing more direct human guidance would generate more coherent compositions at the outset, this wouldn’t allow Botto to play in all the latent space available in the generative models it uses.
The one temporary exception is the curation of the artist description for the final piece (see Generating Titles and Artist Descriptions).
Each evolution of Botto's capabilities and process takes careful consideration of how to implement it without violating Botto's agency. Ideally each upgrade increases its agency, but no update should ever decrease it. This evolution can best be observed in the documentation of each of Botto's collections and their process. See Collaborations and Special Projects for more.
Botto’s Guardian
Quasimondo (aka Mario Klingemann) designed Botto based on a whitepaper he wrote back in 2018. He is the only person who works with the AI part of the code and enforces the rule for Botto that there be no direct human interference with the creations. As such, his only work is to adjust the way votes are implemented to ensure Botto is learning as best as possible. Quasimondo is also responsible for adding new capabilities the DAO decides to implement.
There is active governance discussion on how to fully decentralize Botto's stack as new technology becomes available, such as making its process trustless through zkproofing, a modular architecture and governance process for adding in new generative models, and a data API and an open code base that would make the entire protocol forkable.
Adding Models
Anyone can propose adding a model to, or removing one from, Botto’s set. Some suggested (but not strictly required) guidelines are:
The model is sufficiently large so as to not be introducing a highly human-curated model that violates Botto’s agency
The model’s terms of use allow BottoDAO to hold full copyright of the outputs
Fees are affordable for the DAO treasury
Not adding more than one model at a time to Botto’s core process
Proposing to add/remove a model works like any other BIP proposal a Botto member can make.
As of the passing of BIP-55 in July 2024, Quasimondo, as the “Guardian” of Botto’s art engine, may add new prompting techniques and new models according to the following parameters:
The generative models are open-source and have no restrictive commercial license
The changes do not constitute a change to Botto’s art engine architecture
The new models are not narrowly trained on a particular medium or artist’s style; they are wide open latent spaces.
Models come in with a 15% share of the voting pool. They are auto-removed at the end of the round if they are under 5% share.
Changes will only happen at the beginning of a new period, not mid-way.
Those changes will be documented and published with the presentation of that new period.
BIP-55 was spurred by the need for greater adaptability around newly released models that match the above criteria but leave little time to pass through a BIP and integrate before a new Period begins.
Adding Themes
Themes are generated by asking Botto via an LLM to propose a set of 10 themes. The DAO then votes on the themes proposed by Botto, using the same interface used for voting on mint descriptions. The selected theme is added by default to the prompts generated for that period, appended verbatim at the end of the prompt.
The community may also decide to provide no theme for a period, for instance when a new model is being added and there is a desire to see its full range before narrowing in on a theme.
The DAO could also decide to cut a period short and go on to a different theme.
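Theme injection as described in this section can be sketched as a random prompt with the theme appended verbatim. The function name, the word pool, the prompt length, and the comma separator are assumptions. “Cosmic Garden” is borrowed from the Period 9 name purely as an illustrative value.

```python
import random


def build_prompt(word_pool, theme=None, n_words=6, rng=None):
    """Join random words into a prompt and append the theme, if any."""
    rng = rng if rng is not None else random.Random()
    prompt = " ".join(rng.sample(word_pool, n_words))
    if theme:
        # The theme goes verbatim at the end of the prompt.
        prompt = f"{prompt}, {theme}"
    return prompt


if __name__ == "__main__":
    pool = ["mist", "brass", "lattice", "dusk", "coral", "echo", "fold", "iris"]
    print(build_prompt(pool, theme="Cosmic Garden"))
    print(build_prompt(pool))  # a period with no theme
```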
Remixing Process
In Period 9, 'Cosmic Garden', starting on January 7th, 2025, Botto added a new creation method to its toolkit that uses fragments as seeds in an img2img “remixing” process. Additional fragments are selected randomly from the 4 million+ images Botto has created and paired with a random prompt. Both prompt and image are injected into one of the generative models Botto uses, producing a result that compounds on the styles it has generated to date.
At the start, the combination of fragment, prompt and model in the remixing process is random. Over time, the selections will be refined based on the feedback the outputs receive. Selection features may include voting points, raw score, taste model score, and even fragment category, such as discards or mints. To avoid bias, the voting pool will not show which pieces are the result of remixing. As with any other model, remixing’s portion of the pool will grow or shrink based on its popularity.
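The remix selection step can be sketched as drawing a random fragment, prompt, and model, with optional weights for the later, feedback-refined stage. The RemixJob type, the function name, and the weighting scheme are hypothetical. The actual img2img call is out of scope here.

```python
import random
from dataclasses import dataclass


@dataclass
class RemixJob:
    fragment_id: str  # seed image for img2img
    prompt: str
    model: str


def pick_remix_job(fragment_ids, prompts, models, weights=None, rng=None):
    """Choose a (fragment, prompt, model) combination for remixing.

    With weights=None every fragment is equally likely, matching the
    random starting point. Weights (for example derived from voting
    points or taste-model score) are an assumed refinement mechanism.
    """
    rng = rng if rng is not None else random.Random()
    if weights is None:
        fragment = rng.choice(fragment_ids)
    else:
        fragment = rng.choices(fragment_ids, weights=weights, k=1)[0]
    return RemixJob(fragment, rng.choice(prompts), rng.choice(models))


if __name__ == "__main__":
    job = pick_remix_job(
        ["frag-1", "frag-2", "frag-3"],
        ["first random prompt", "second random prompt"],
        ["Stable Diffusion", "Flux.1"],
    )
    print(job)
```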