Machine Learning, Design and the Death of the Data Structure

October 11, 2016

Machine Learning, Design, Genetic Algorithms

Introduction

Over the last seven years or so, I've been working on ideas related to the development of machine-learning-assisted design software. In that time, the field of machine learning has advanced at a rate beyond anyone's wildest expectations. This in turn has opened up a vast range of possibilities for intelligent design software: new ways of thinking about how humans can express ideas in collaboration with machines that would have been unimaginable only a few years ago.

In this article, I would like to review some of the stages in my thinking along the way and offer a point of view about what will come next in this rapidly developing field.

Genetic Algorithms

While I was a student at NYU's Interactive Telecommunications Program (ITP), I was introduced to genetic algorithms and the work of Karl Sims.

The basic idea of a genetic algorithm is to use a simplified Darwinian model as a means of automating the solution to a design problem.

Perhaps the best way to understand genetic algorithms is with a simple example:

Let's say we want to design a box car (a car without a motor).
We have some idea of the necessary components: wheels, a chassis, etc.
We also have some idea of what makes a good car.
In our case, we want the fastest car possible.

We know that the specific properties of these components (and the relationships between them) will determine whether the car is fast. But we're not exactly sure what combination of those properties will make for the fastest car. What happens if we make the front wheel larger in proportion to the rear wheel? What happens if we make the chassis longer?

With just three properties — rear wheel radius, front wheel radius and chassis length — there is an infinite number of possible cars. How can we find the ideal values for these properties in order to produce a really fast car?

Figure: A simple car design with three adjustable properties (rear wheel radius, front wheel radius and chassis length).

An automotive engineer can make these sorts of decisions by drawing upon prior experience, a knowledge of the underlying physics, etc.

A layperson, however, might be left to tinker endlessly.

Enter the Genetic Algorithm!

We start by creating a data structure or "genome" to represent the relevant properties. In our case, that might look something like this:

Figure: A genome data structure with three genes, one for each car property.

where each unit in the genome is a positive real-valued number.
For example:

Figure: An example genome with specific numerical values for the three car properties.

Next, we generate a "population" of cars, each of which has a randomly assigned value for each of the three properties.

Figure: A population of randomly generated car designs with varying properties.

Of course, some of these may result in completely dysfunctional cars. But, we'll sort that out through evolution.
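As a rough illustration, the genome and the random population could be sketched in Python as follows. The names `CarGenome`, `GENE_RANGES`, `random_genome` and `random_population`, along with the numeric bounds, are assumptions made for this sketch rather than part of any particular implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class CarGenome:
    """One gene per car property; every gene is a positive real number."""
    rear_wheel_radius: float
    front_wheel_radius: float
    chassis_length: float


# Hypothetical bounds for each gene, in inches (assumed for illustration only).
GENE_RANGES = {
    "rear_wheel_radius": (0.25, 72.0),
    "front_wheel_radius": (0.25, 72.0),
    "chassis_length": (1.0, 240.0),
}


def random_genome(rng: random.Random) -> CarGenome:
    """Assign each gene a random value within its assumed range."""
    genes = {name: rng.uniform(lo, hi) for name, (lo, hi) in GENE_RANGES.items()}
    return CarGenome(**genes)


def random_population(size: int, rng: random.Random) -> list:
    """Create a population of randomly generated cars."""
    return [random_genome(rng) for _ in range(size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = random_population(20, rng)
print(population[0])
```

Nothing here checks whether a car is any good. A car with a tiny rear wheel and an enormous front wheel is a perfectly valid genome at this stage.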

Next, we need some way of testing the quality of each car in the population to determine whether it meets our need for speed.

Since this is a simplified example, we won't worry too much about the details. We'll simply assume that we have physics simulation software that lets us simulate each car and test it on a computer-generated hill. We start each car at the top of the hill and see which one reaches the bottom first. In an evolutionary sense, we'll say that the first one to reach the bottom is the most "fit."
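To make the idea of fitness concrete, here is a minimal sketch in which the physics simulation is replaced by a placeholder. The function `placeholder_simulation` and its formula are invented purely so the example runs. It is not a model of real car physics, and a real system would call an actual simulator instead.

```python
from typing import Callable, List, Tuple

# A genome here is simply (rear wheel radius, front wheel radius, chassis length).
Genome = Tuple[float, float, float]


def placeholder_simulation(genome: Genome) -> float:
    """Stand-in for the physics simulator: returns a made-up descent time.

    The formula is an assumption for illustration only. It favors equal
    wheel sizes and a chassis four times as long as the rear wheel radius.
    """
    rear, front, length = genome
    return 10.0 + abs(rear - front) + abs(length - 4.0 * rear)


def rank_by_fitness(population: List[Genome],
                    simulate: Callable[[Genome], float]) -> List[Genome]:
    """Order cars from fastest (most fit) to slowest."""
    return sorted(population, key=simulate)


cars = [(2.0, 1.0, 4.0), (1.0, 3.0, 10.0), (1.0, 1.0, 4.0)]
ranked = rank_by_fitness(cars, placeholder_simulation)
print(ranked[0])  # the car with the shortest simulated descent time
```

Passing the simulator in as a parameter keeps the ranking logic independent of how fitness is actually measured.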

In this first round, we chose the property values randomly. Some of the cars may not work at all. It is likely that none of them will be all that great.

After we've evaluated the fitness of each car, we can determine which cars get to "mate." In this simplified scheme, the fastest cars are picked for mating and the remaining ones are not. We pair each of the chosen cars with another chosen car and produce an offspring by averaging their property values (or by randomly choosing which of the two parents to copy each gene from).

Figure: The evolution process, showing selection, mating with crossover, and mutation.
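The selection and mating steps might be sketched like this. The function names and the `keep_fraction` parameter are hypothetical, and genomes are represented as plain lists of numbers for brevity.

```python
import random


def select_parents(ranked, keep_fraction=0.5):
    """Keep the fastest cars from a population already sorted best-first.

    keep_fraction is an assumed parameter; at least two parents are kept.
    """
    count = max(2, int(len(ranked) * keep_fraction))
    return ranked[:count]


def crossover_average(parent_a, parent_b):
    """Offspring whose genes are the average of the two parents' genes."""
    return [(a + b) / 2 for a, b in zip(parent_a, parent_b)]


def crossover_uniform(parent_a, parent_b, rng):
    """Offspring that copies each gene from one parent chosen at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


rng = random.Random(0)
mother = [1.0, 2.0, 3.0]
father = [3.0, 4.0, 5.0]
print(crossover_average(mother, father))
print(crossover_uniform(mother, father, rng))
```

Averaging always produces a child that lies between its parents, while copying genes whole can recombine traits in ways that neither parent shows.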

We repeat this process over many "generations" of successive populations and arrive at a set of properties that hopefully makes for a really fast car!
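Putting the pieces together, a complete generational loop could look something like the sketch below. Several things in it are assumptions rather than anything prescribed above: the invented `toy_descent_time` formula standing in for the physics simulation, the gene bounds, the small Gaussian mutation, and the choice to carry the best car forward unchanged (often called elitism).

```python
import random


def toy_descent_time(genome):
    """Made-up stand-in for the physics simulation (lower is faster)."""
    rear, front, length = genome
    return 10.0 + abs(rear - front) + abs(length - 4.0 * rear)


def evolve(pop_size=30, generations=40, seed=0):
    """Run a simple genetic algorithm and return (best genome, history)."""
    rng = random.Random(seed)
    # Initial population: three random genes per car (assumed bounds).
    population = [[rng.uniform(0.25, 72.0) for _ in range(3)]
                  for _ in range(pop_size)]
    history = []  # best descent time seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=toy_descent_time)
        history.append(toy_descent_time(ranked[0]))
        parents = ranked[: pop_size // 2]  # the fastest half may mate
        children = [ranked[0]]  # assumption: keep the best car unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # averaging crossover
            # Assumption: small random mutation, clamped to stay positive.
            child = [max(0.01, g + rng.gauss(0.0, 0.5)) for g in child]
            children.append(child)
        population = children
    best = min(population, key=toy_descent_time)
    return best, history


best, history = evolve()
print("first generation best time:", history[0])
print("last generation best time:", history[-1])
print("best genome:", best)
```

Because the best car is always carried forward, the best time recorded in `history` can never get worse from one generation to the next. Without that choice, a good design could be lost to an unlucky round of mating.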

This approach works quite well even with more complex design goals. The key caveat to genetic algorithms, however, is that they need objective metrics for assessing the fitness of individuals. Therefore, this approach doesn't easily lend itself to more subjective problems like making "good" art.

Sure, you could define a good painting as one that has a lot of blue in it and use a genetic algorithm to produce lots of blue paintings. But if your definition of a good painting is more nuanced and subjective than that, it will likely be difficult to represent with a fitness function.
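The "lots of blue" definition is easy to state as code, which is exactly what makes it usable as a fitness function. In this hypothetical sketch a painting is just a list of (red, green, blue) pixel values, and `blueness` is an assumed name.

```python
def blueness(pixels):
    """Fraction of pixels whose blue channel is stronger than red and green."""
    if not pixels:
        return 0.0
    blue_pixels = sum(1 for r, g, b in pixels if b > r and b > g)
    return blue_pixels / len(pixels)


painting = [(0, 0, 255), (255, 0, 0), (10, 20, 200), (90, 90, 90)]
print(blueness(painting))
```

A nuanced, subjective notion of a "good" painting offers no comparably simple function to write down.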

Nevertheless, genetic algorithms offer a fascinating distillation of a key force in nature and a powerful tool for at least some kinds of design. In the 1990s, Karl Sims demonstrated the awesome power of genetic algorithms with his Evolved Virtual Creatures project:

Karl Sims' Evolved Virtual Creatures project demonstrating genetic algorithms

Watching this as a student, I was amazed to see that some of the evolved creatures arrived at strategies quite like ones found in nature. In other cases, the virtual creatures found successful strategies that were quite unlike anything I'd seen before.

This got me thinking about the possibility that these algorithms could produce a sort of post-history in design.

A-historic Design

When human designers sit down to design a new car or chair or evening gown, they cannot help but draw upon the history of that field. Even if they wish to reject past approaches and put forward a new paradigm, they inevitably do so in relation to what has come before. They can't help it. They went to school for it. They lived in the world and were surrounded by earlier cars, chairs and evening gowns. For this reason, it is very difficult for any field of design to truly break free from past assumptions. In general, this is a good thing. We learn from the successes and failures of the past. But it also sometimes means that we cannot see beyond conventions that arose from some earlier constraint that need not impede our work today.

Genetic algorithms, on the other hand, are blissfully ignorant of the history of the design fields with which they engage. They try to solve design problems by starting with many random variations and allowing the best ones to emerge and be refined through evolution. Their design lineage is internal: it starts from something random and moves to something better through a succession of small, objective improvements.

This is a powerful thing! It is a way of automatically finding a needle in a haystack. But it cannot function entirely on its own. It requires a fitness metric. It needs someone to tell it how to identify the needle.

In some engineering contexts, this can be achieved with little or no direct human input. For example, NASA used a genetic algorithm to develop an ideal geometry for a radio antenna. Like our car example, the physical laws governing the quality of a radio antenna can be stated in a relatively objective manner.

Figure: NASA's ST5 spacecraft antenna (2006), designed using genetic algorithms.

For a problem of this nature, we don't really care whether the antenna fits into any particular design lineage. But for design problems that involve human users, there are other considerations (subjective taste, the feeling of familiarity, etc.) that are hard to state explicitly or model with a computer simulation.

In my study of genetic algorithms, I ultimately came to the conclusion that this a-historic quality could be a powerful force in design, but only if it were coupled with the human and historical considerations made by a human designer. Rather than providing a means for automating design in a cultural vacuum, I saw genetic algorithms as a means of assisting designers in expanding their ways of seeing a given design problem — a way of sparking new ideas in the human mind.

Visualizing the Diversity of Species

In order for genetic algorithms or any other form of machine intelligence to play an assistive role in a human designer's process, the designer must be able to fold the machine's speculative creations back into his or her own thinking about the design task at hand.

A key component to this is understanding how the machine's creations relate to one another and to the designer's own creations.

In the simple car example above, we used only three genes. It might be possible for a designer to wrap his or her mind around the range of possible individuals that could arise within this "species." But any real-world design problem would be likely to involve many more genes.

It is very difficult to comprehend the genetic diversity of a large population whose individuals are composed of hundreds or thousands of genes.

We need some way of making the range of genetic expressions more comprehensible to the designer.

Enter Dimensionality Reduction!

Thus far, we have discussed the component properties of a design through the analogy of genetics. Another way to think about an individual gene in this context is as an abstract dimension.

In everyday speech, we associate the term dimension with the three spatial dimensions of our physical reality. In math, however, we can apply this term more abstractly to any numeric variable.

Just as a line segment occupies a range of values in one spatial dimension, we could think of "loudness" as a kind of abstract dimension ranging from 0 decibels ("inaudible") to 130 decibels ("deafening"). Similarly, we could think of the Rear Wheel Radius property of our car as a dimension ranging from, say, 1/4 inch ("matchbox car") to 6 feet ("monster truck").

Figure: Abstract dimensions, comparing spatial dimensions to abstract properties like loudness.

In these terms, the genome of a species with one hundred component genes can be thought of as a hundred-dimensional system.

Just as two objects can be near or far from one another in a two- or three-dimensional world, genetic expressions can be near or far from one another in a hundred-dimensional world. The problem is that while we can visualize physical proximity within the ordinary physical dimensions, we don't perceive a sufficient number of physical dimensions to visualize the similarities and differences between hundred-dimensional entities. To visualize high-dimensional variations, we need some way of mapping them onto a lower number of dimensions.
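Although we cannot picture a hundred-dimensional space, measuring nearness within it is straightforward. The sketch below uses ordinary Euclidean distance, which is one common choice among several. The function name and the example gene values are assumptions for illustration.

```python
import math


def genome_distance(a, b):
    """Euclidean distance between two genomes of equal length."""
    if len(a) != len(b):
        raise ValueError("genomes must have the same number of genes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three hypothetical individuals, each with one hundred genes.
base = [0.5] * 100
sibling = [0.51] * 100   # differs slightly in every gene
stranger = [0.9] * 100   # differs substantially in every gene

print(genome_distance(base, sibling))
print(genome_distance(base, stranger))
```

The numbers tell us that the sibling is much closer to the base individual than the stranger is, but a list of pairwise distances is still not something a designer can take in at a glance.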

Even in our low-dimensional physical reality, we are already familiar with a form of dimensionality reduction: a photograph is a two-dimensional visualization of a three-dimensional system.

Of course, there is no way to produce a two-dimensional photograph that portrays all aspects of a three-dimensional object. Some information must be lost or skewed (see Cubism). Ambiguities or false impressions can arise from dimensionality reduction, as we see in the forced perspective image below:

Figure: Forced perspective, an example of how dimensionality reduction can create optical illusions.
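The ambiguity in a forced perspective photograph can be reproduced with a few lines of arithmetic. This sketch assumes an idealized pinhole camera looking along the z axis. The `project` function and the specific measurements are hypothetical.

```python
import math


def project(point, focal_length=1.0):
    """Map a 3D point (x, y, z) onto a 2D image plane; depth z is discarded."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


# Top of a person standing close to the camera (height 1.8, distance 2).
person_top = (0.0, 1.8, 2.0)
# Top of a tower far from the camera (height 180, distance 200).
tower_top = (0.0, 180.0, 200.0)

print(project(person_top))
print(project(tower_top))
print(math.isclose(project(person_top)[1], project(tower_top)[1]))
```

Two very different points in three dimensions land on the same spot in the image, so the photograph alone cannot tell us which one we are looking at.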

Nonetheless, if we want to visualize the similarities and differences between hundred-dimensional entities, dimensionality reduction is the only approach available to us, despite its inherent limitations.

In my early work on this project, I was not yet aware of some of the more advanced forms of dimensionality reduction that have arisen from Deep Learning techniques. We'll get to those later in the article.

At the time, the most robust form of dimensionality reduction available to me was a type of artificial neural network called a Kohonen network, or Self-Organizing Map.