Botto's Art Engine
Botto is a fully autonomous artist with a closed loop process and outputs that are unaltered by human hands. This page explains how its art engine works.
Botto combines several software models (Stable Diffusion, Kandinsky, VQGAN + CLIP, and GPT-3) with community voting and a number of other models and custom augmentations. The generative models are among the largest publicly available neural network architectures in the world and have analyzed billions of works of art, faces, animals, objects, images, artistic movements, poems, prose, essays, etc. They have been trained on more content than any human being could study in a lifetime.
These models give Botto the largest latent space to work with, and therefore the widest possible variation of styles and themes, without being locked into a single area.
The machine creates its images based on text prompts generated by an algorithm. These prompts are a combination of random words and phrases. The prompt is then sent to Botto's text-to-image models. There are currently 4 models: VQGAN + CLIP, Stable Diffusion v1.5 and v2.1, and Kandinsky v2.1.
There are an infinite number of possible prompts and possible images. These models bridge textual and visual information, and can even be "empathic" and know what kind of emotional associations humans have in connection with imagery or text. The DAO may also decide to add a theme that Botto has proposed, in which case the theme is injected into every prompt.
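The prompt step above can be sketched roughly as follows. This is a minimal illustration, not Botto's actual code: the word pools, function names, and the comma-joined theme placement are assumptions; the source only states that prompts are random combinations of words and phrases and that a DAO-chosen theme is injected verbatim into every prompt.

```python
import random

# Hypothetical word pools; the real vocabulary lives inside Botto's engine.
SUBJECTS = ["a lighthouse", "a crowd", "a garden"]
STYLES = ["in the style of cubism", "as a charcoal sketch", "in vivid watercolor"]

def build_prompt(theme=None, rng=random):
    """Combine random words and phrases into a text prompt.

    If the DAO has set a theme for the period, it is appended
    verbatim to every prompt.
    """
    prompt = f"{rng.choice(SUBJECTS)} {rng.choice(STYLES)}"
    if theme:
        prompt = f"{prompt}, {theme}"
    return prompt

print(build_prompt(theme="metamorphosis"))
```

The resulting string would then be dispatched to one of the text-to-image models (VQGAN + CLIP, Stable Diffusion, or Kandinsky).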
Given all the different possible outputs, Botto needs direction to develop its artistic talent. That is where voting comes in: Botto will adjust its prompts based on what it thinks will be more likely to get popular results.
This process runs through 300 prompts a day, generating as many as 8-10k images in a range of styles every week. From that set, the engine uses a "taste model" (also trained on the DAO's votes) to pre-select 350 images each week, which are presented to the community for voting in each new round. Rounds start every Tuesday at 2200 CET / 1600 EST / 1300 PST.
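The weekly pre-selection can be sketched as weighted sampling over taste-model scores. This is an illustrative sketch only: the source says the taste model outputs a gradient of probabilities rather than a yes/no cut, so the example samples proportionally to score instead of taking a hard top-k; the function names and the sampling scheme are assumptions.

```python
import random

def preselect(scores, k=350, rng=random):
    """Pick k images for the weekly vote, weighted by taste-model scores.

    `scores` maps image id -> predicted probability that voters will
    like the image. Sampling by weight (rather than a hard top-k cut)
    mirrors the gradient nature of the voting behavior the taste model
    tries to replicate.
    """
    ids = list(scores)
    weights = [scores[i] for i in ids]
    picked = set()
    while len(picked) < min(k, len(ids)):
        picked.add(rng.choices(ids, weights=weights)[0])
    return picked

# ~8-10k candidate images per week, 350 presented to the community.
candidates = {f"img{i}": random.random() for i in range(9000)}
chosen = preselect(candidates)
print(len(chosen))  # 350
```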
So as not to settle into a niche too quickly, Botto is directed to surprise and challenge its audience by selecting some prompt features and images for voting whose characteristics differ from what has been presented to date.
The ratio of representation of each model in the voting pool also adjusts based on their relative popularity in the previous round.
A Period will start with a voting pool of 1050 fragments using the same ratio of models as when the previous Fragmentation Period ended. If new models are added, they are given an equal share of the pool to start. The weekly cull of the 349 lowest-scoring fragments by VP maintains this pool size, and the ratio between models among the new 350 rebalances each week in proportion to the number of unique votes.
If a generative model’s ratio in the taste model’s selection drops below 10% in a round and does not go back above 10% in voting, it will be discontinued. The community can decide to re-add the model if it wishes, as well as reconsider the automatic threshold in future periods.
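The rebalancing and discontinuation rule above can be sketched as follows. This is a simplified illustration under stated assumptions: the source specifies only that shares rebalance in proportion to unique votes and that a model dropping below 10% is discontinued; the function name, rounding, and return shape are hypothetical.

```python
def rebalance(unique_votes, pool_size=350, threshold=0.10):
    """Rebalance each generative model's share of next week's fragments.

    `unique_votes` maps model name -> unique votes its fragments received
    this round. Models whose share falls below the 10% threshold are
    flagged for discontinuation (the DAO can still vote to re-add them).
    """
    total = sum(unique_votes.values())
    shares = {m: v / total for m, v in unique_votes.items()}
    allocation = {m: round(share * pool_size) for m, share in shares.items()}
    discontinued = [m for m, share in shares.items() if share < threshold]
    return allocation, discontinued

alloc, dropped = rebalance({"stable-diffusion-1.5": 400,
                            "stable-diffusion-2.1": 300,
                            "kandinsky-2.1": 250,
                            "vqgan-clip": 50})
print(alloc, dropped)
```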
The 10% threshold will need to be reconsidered as more models are added.
Occasionally, parallel voting pools are set up for special collaborations, but these will not contain duplicates. Botto has generated over 1 million images to date that it has never presented, so parallel pools draw from those or from discarded fragments.
Botto uses voting feedback in two places: (1) creating the text prompts used to generate fragments, and (2) the taste model that pre-selects images for voting each week.
1. Text Prompts: Votes influence which aspects of text prompts are used to generate fragments. Characteristics of prompts that generate desirable images are more likely to be reused, and vice versa.
2. Taste Model: The taste model used for pre-selection tries to replicate the voting behavior of the community. This is not a yes/no decision but a gradient of probabilities, such that each set contains images with different chances of being picked in voting (as voting behavior is a gradient as well).
For both points, all votes on all images are used in training. The training is designed to prevent an overly skewed voting weight: for example, 500 votes each cast by separate voters for one piece carry more weight in training than 2,000 votes by a single voter for the same piece. Other factors, such as winning a round or the sale amount, are not currently used in training.
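One simple way to realize the anti-skew property above is to weight each voter's contribution sublinearly. This is a sketch, not Botto's actual weighting: the square-root curve is an illustrative choice that satisfies the stated example (500 separate voters outweighing one voter casting 2,000 votes).

```python
import math

def training_weight(votes_by_voter):
    """Aggregate votes on a piece with diminishing per-voter weight.

    `votes_by_voter` maps voter -> number of votes that voter cast on
    the piece. The sqrt curve (an illustrative choice) means 500 voters
    with one vote each outweigh a single voter casting 2,000 votes.
    """
    return sum(math.sqrt(n) for n in votes_by_voter.values())

broad = training_weight({f"voter{i}": 1 for i in range(500)})  # 500 voters, 1 vote each
whale = training_weight({"voter0": 2000})                      # one voter, 2000 votes
print(broad > whale)  # True: 500.0 vs ~44.7
```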
Botto will freely choose between different aspect ratios for artworks provided each model allows for it, and will adjust its selection of format based on voting.
Titles are created by an algorithm that generates random combinations of 2-4 words, which are given to CLIP to score how well they match the image. New candidates are generated until CLIP finds the best match with the image; candidates are checked against a list of existing titles so that Botto never reuses one.
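The title loop can be sketched as follows. The word list, candidate count, and the `score` callable (standing in for a CLIP image/text similarity call) are all assumptions for illustration; only the 2-4 word combinations, CLIP scoring, and the used-title check come from the source.

```python
import random

USED_TITLES = {"Silent Harbor"}  # hypothetical registry of past titles

WORDS = ["Silent", "Harbor", "Ember", "Drift", "Gilded", "Echo", "Veil", "Noon"]

def generate_title(score, n_candidates=50, rng=random):
    """Return the unused 2-4 word candidate that `score` rates highest.

    `score` stands in for a CLIP image/text match against the artwork;
    candidates already in USED_TITLES are skipped.
    """
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        title = " ".join(rng.sample(WORDS, rng.randint(2, 4)))
        if title in USED_TITLES:
            continue
        s = score(title)
        if s > best_score:
            best, best_score = title, s
    return best

# Toy scorer standing in for CLIP similarity.
print(generate_title(score=lambda t: len(t)))
```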
The descriptions are generated with GPT-3 and are the only part of the process that involves some direct human curation. As GPT-3 was trained on much of the internet, its language can be quite foul at times and is not ready to be out in the world without some supervision.
Until trustworthy text generation methods are developed, the DAO picks from a set of 5-10 descriptions generated by GPT-3 that CLIP rates highly and that the community feels best fit Botto's voice. Beyond selecting the description, there is no editing other than correcting typos and punctuation.
The final title, description, metadata, and URL to the bitmap on IPFS are all on-chain and minted using Manifold.
One of the rules for Botto is that there be no direct human interference in the creation process. Botto is strictly against any "cheating" or human guidance other than the voting. That means the prompts are random, no seed images of existing real-world images are used, and the selection of fragments is entirely controlled by Botto.
The only direction Botto received at the outset was a small number of pre-curated prompts added to the entirely random ones generated by the algorithm. While more direct human guidance would have produced more coherent compositions at the outset, it would not have allowed Botto to play in all the latent space available in the generative models it uses.
Each evolution of Botto's capabilities and process takes careful consideration of how to implement it without violating Botto's agency. Ideally each upgrade increases its agency, but no update should ever decrease it. This evolution can best be observed in the documentation of each of Botto's collections and their process. See Collaborations and Special Projects for more.
Quasimondo (aka Mario Klingemann) designed Botto based on a whitepaper he wrote back in 2018. He is the only person who works with the AI part of the code and enforces the rule that there be no direct human interference with the creations. As such, his only work is to adjust the way votes are implemented to ensure Botto is learning as well as possible. Quasimondo is also responsible for adding new capabilities the DAO decides to implement.
There is active governance discussion on how to fully decentralize Botto's stack as new technology becomes available, such as making its process trustless through zkproofing, a modular architecture and governance process for adding in new generative models, and a data API and an open code base that would make the entire protocol forkable.
Anyone can propose adding a model to, or removing one from, Botto's set. Some suggested (but not strictly required) guidelines are:
- The model is sufficiently large so as not to introduce a highly human-curated model that would violate Botto's agency
- Fees are affordable for the DAO treasury
- Not adding more than one model at a time to Botto’s core process
Proposing to add/remove a model works like any other BIP proposal a Botto member can make.
The DAO could also decide to add a new model before the scheduled end of a period, cutting it short and starting a new period by default.
Themes are generated by asking Botto via GPT-3 to propose a set of 10 themes. The DAO then votes on the themes proposed by Botto using the same interface for voting on mint descriptions. The selected theme is added by default to the prompts generated for that period, adding the theme verbatim at the end of the prompt.
The community may also decide to provide no theme for a period, for instance when a new model is being added and there is a desire to see its full range before narrowing in on a theme.
Themes will be presented and voted on in the last week of a period unless the DAO decides otherwise. The DAO could also decide to cut a period short and go on to a different theme.