Title: Latent Walks

Abstract: The computational work of art in the age of mechanical generative reproduction reproduction.

computational - the work is performed by computing units able to run massively parallel mathematical operations

generative - having the power of origination, showing thought and spontaneity

reproduction (biological) - the production of offspring, as opposed to the production of copies

The assembly line introduced interchangeable parts and the conveyance of the assembly, not the assembler. The automobile assembly line in America ushered in the age of the automobile and, with it, vast economic gains through order-of-magnitude increases in productivity. Artificial intelligence, specifically generative AI and other neural network architectures, is today championed as the next paradigm shift in productivity, promising untold new gains. Architecture diagrams of individual AI systems suggest the schematics of the interchangeable machines of the 19th and early 20th century assembly lines. Like those machines, these generative systems can be assembled into complex assembly line sequences, or can even break out of linearity into networks, to produce a far greater range of phenomenological sensory assets and experiences than was previously possible. Questions naturally arise: how can these systems be constructed to generate new possibilities for experience and just-in-time, personality-specific phenomena? How does the social nature of shared story and experience change when the story and experience are authored for each individual? What happens when changes are made to components of these generative assembly lines, and how do these changes propagate through the system and its output? If the medium was the message in the days of McLuhan, the medium is now the soil, the raw material, the genetic material.

The form of this project is space: physical space, virtual space, latent space, and the magic circle. This project wants to know embodiment at the intersection of virtual and physical, specifically how magic circles occupying privileged orientations inside the intersectional space of physical and virtual enable embodied navigation and play in latent space and how this play is powered by the assembly line networks of generative systems.

The vehicle we will drive through space is the human body in concert with mathematical computation. To illustrate with a specific example, we can apply linear algebra to compute and map the direction and distance between two text prompts embedded in the latent space of their model. In this example, we choose to drive the Cartesian coordinates of this space with the human body, using its GPS position on Earth. In this way, one can take a walk through the latent space of a generative model. By selecting an audio diffusion model as our generative system and playing sampled outputs, one hears the latent space as one takes a physical walk. Building up this system as an assembly line of generative production makes a new form of experience real. The investigation of these latent walks, the assembly lines that produce them, and the ramifications of this rupture in work and production for the individual and society is the focus of this research.
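One way to picture the GPS-driven walk is as a projection: two anchor locations on the ground stand for the two prompts, and the walker's position is reduced to a fraction t between 0 (at prompt A) and 1 (at prompt B). The sketch below is only an illustration under stated assumptions, not the project's implementation. The function names, the anchor coordinates, and the flat-Earth (equirectangular) approximation are all hypothetical choices, reasonable only for walks of park scale.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_local_xy(lat, lon, origin_lat, origin_lon):
    """Approximate east/north offsets in metres from an origin.

    Uses an equirectangular approximation, which is adequate over
    short distances such as a walk through a park.
    """
    x = math.radians(lon - origin_lon) * math.cos(math.radians(origin_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return x, y


def walk_fraction(walker, anchor_a, anchor_b):
    """Project the walker's (lat, lon) onto the A-to-B segment.

    Returns t in [0, 1]: 0 at anchor A, 1 at anchor B.
    """
    bx, by = to_local_xy(*anchor_b, *anchor_a)
    wx, wy = to_local_xy(*walker, *anchor_a)
    length_sq = bx * bx + by * by
    if length_sq == 0.0:
        return 0.0  # the anchors coincide, so there is no path to walk
    t = (wx * bx + wy * by) / length_sq
    return max(0.0, min(1.0, t))  # clamp so the walker never leaves the path


# Hypothetical anchors: prompt A at one end of a path, prompt B about 111 m north.
anchor_a = (40.0000, -74.0000)
anchor_b = (40.0010, -74.0000)
t_mid = walk_fraction((40.0005, -74.0000), anchor_a, anchor_b)
```

The resulting fraction can then serve as the interpolation parameter between the two prompt embeddings, so that each step on the ground corresponds to a step in latent space.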

Walk #1: A stroll through sonic spaces.

Google Colab Notebook for Walk #1

Our first stroll is a linear path. At the start of the path is Text Prompt A:

“The distant sound of children playing, birds chirping in the canopy, the soft rustle of leaves underfoot.”

And at the other end of the path is Text Prompt B:

“The gentle sound of water from the pond, a duck's quack, the quiet hum of insects in the air.”

The world where we will take our stroll is AudioLDM2, an AI model that generates audio from text. To start, we encode text prompts A and B into tensors, transforming them into numerical representations within the latent space of the model's text encoders. Imagine these prompts as points in a multi-dimensional space, akin to dots on a sheet of paper but in far more dimensions. And don’t worry, you are not alone: our brains struggle to visualize spaces of more than three dimensions, let alone hundreds or thousands.

Now that we have our two points, we can use math (linear algebra) to subtract our first point from our second point, which gives us a vector that marks the path from prompt A to B. As we mathematically stroll along this path, we generate audio at selected points. Each point is an interpolated text prompt embedding, and the sounds gradually shift from resembling A to resembling B. This exploration lets us hear the transition between two textual concepts. Let’s take a pleasant stroll along this path, creating an engaging sonic journey through the model's latent space.
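The stroll from A to B can be written as a straight line: each point on the path is A + t(B - A), with t running from 0 to 1. The following is a minimal sketch of that arithmetic using NumPy. The random arrays are stand-ins for the real prompt embeddings, and their shape is an arbitrary assumption; the function name is also hypothetical. In the actual walk, each interpolated embedding would be handed to the audio model in place of an encoded text prompt.

```python
import numpy as np


def interpolate_embeddings(emb_a, emb_b, num_points):
    """Return num_points embeddings evenly spaced on the line from emb_a to emb_b."""
    direction = emb_b - emb_a  # vector pointing from A to B
    steps = np.linspace(0.0, 1.0, num_points)  # t = 0 is A, t = 1 is B
    return [emb_a + t * direction for t in steps]


# Stand-in embeddings: random arrays, NOT real AudioLDM2 outputs.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((8, 16))
emb_b = rng.standard_normal((8, 16))

# Five stops on the path: A, three points in between, and B.
path = interpolate_embeddings(emb_a, emb_b, num_points=5)
```

With five stops, the middle one sits exactly halfway between the two prompts, which is where the sound should be the most even blend of both scenes.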

Interactive Example #1.1: A 1D Walk

Click in the window to focus, then use the arrow keys to move left and right along the path. As you navigate, listen to how the audio changes, reflecting the shift from one text prompt to another. The three white dots in the middle represent audio clips generated from interpolated prompt embeddings, showcasing our mathematical journey through the latent space.
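How a position on the path might be turned into sound can be sketched as follows. This is an assumption about one plausible approach, not a description of how the interactive example is built: it supposes five evenly spaced clips (A, the three white dots, and B) and blends the two clips nearest the listener with an equal-power crossfade. The function name and the crossfade choice are hypothetical.

```python
import math


def crossfade_gains(position, num_clips):
    """Map a position in [0, 1] to one gain per clip.

    Assumes num_clips >= 2 clips evenly spaced along the path. Only the
    two clips nearest the position are audible, mixed with an equal-power
    crossfade so that perceived loudness stays roughly constant.
    """
    position = max(0.0, min(1.0, position))
    scaled = position * (num_clips - 1)
    left = min(int(scaled), num_clips - 2)  # index of the clip to the left
    frac = scaled - left  # 0 at the left clip, 1 at the right clip
    gains = [0.0] * num_clips
    gains[left] = math.cos(frac * math.pi / 2)
    gains[left + 1] = math.sin(frac * math.pi / 2)
    return gains


# Halfway between clip A and the first white dot on a five-clip path.
gains = crossfade_gains(0.125, num_clips=5)
```

Standing exactly on a dot plays that clip alone; standing between two dots mixes them, so the walk sounds continuous rather than stepping from clip to clip.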

Next Page: A 2D park to stroll through ->