Languages and Objects
I wonder what computers will look like in 100 years. Not the chips, but the human interfaces. How we’ll use them. Computer science has little to say on the matter; a computer is anything that can run this very abstract thing called software.
Human-computer interaction has certainly changed a lot in the past. But once desktop PCs matured, the industry seems to have decided that it’s figured out what a serious computer should look like, and shifted its focus to smaller gadgets. The specs have continued to improve, but the basic idea has stayed the same: a glass rectangle full of files and apps and buttons and menus, attached to a keyboard and some sort of pointing device. Much of that paradigm even carried over to new device categories, like smartphones.
Have they really figured it all out? I feel like that can’t be true. Maybe things will change with some new piece of hardware. Maybe all we need is ideas, but the dominant platforms are crowding out the startups that would develop them. Or maybe, after growing up surrounded by derivatives of ’80s desktops, a generation of designers has simply lost the ability to imagine an alternative.
It’s hard to count the good ideas we haven’t had; that would involve having the ideas oneself. But there are some big-picture considerations which at least suggest a map of the design space.
Information theory
Here’s a reductionistic thought experiment. Human-computer interaction is just two streams of bits, one in each direction. To make the bits meaningful, we need some sort of encoding. An obvious starting point is machine language. We could turn this into a REPL of sorts: give the user 10 buttons, use one of them as a clock signal, and use the others to transmit 9 bits of information on each pulse. (The bitrate could be increased by involving feet, facial muscles, etc.) For output we could just show the computer’s binary memory on an array of LEDs.
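To make the pulse-latching idea concrete, here’s a minimal Python sketch of that input scheme. Everything here is illustrative, not a spec for real hardware: the function name, the frame format, and the choice of which button acts as the clock are all assumptions.

```python
# A minimal sketch of the 10-button "machine language REPL" described above.
# Button 9 is the clock; buttons 0-8 carry one data bit each. On every rising
# clock edge, the 9 data bits are latched into memory as a single word.
# All names and conventions here are hypothetical illustrations.

def latch_words(presses):
    """presses: iterable of 10-bit tuples, one per sampling instant.
    Returns the list of 9-bit words captured on each rising clock edge."""
    memory = []
    prev_clock = 0
    for state in presses:
        *data_bits, clock = state          # buttons 0-8 are data, button 9 is the clock
        if clock and not prev_clock:       # rising edge: latch the data bits
            word = 0
            for bit in data_bits:
                word = (word << 1) | bit
            memory.append(word)
        prev_clock = clock
    return memory

# Two pulses, each carrying 9 bits (0b101010101 = 341, then 0b000000111 = 7).
frames = [
    (1, 0, 1, 0, 1, 0, 1, 0, 1, 0), (1, 0, 1, 0, 1, 0, 1, 0, 1, 1),  # set bits, then pulse the clock
    (0, 0, 0, 0, 0, 0, 1, 1, 1, 0), (0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
]
print(latch_words(frames))  # [341, 7]
```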
What’s wrong with a machine language REPL? From an information theory perspective, the problem is redundancy. Like with natural text, there will be predictable patterns in both input and output, so each bit sent over the wire will carry less than one shannon of entropy. To make it more efficient, you could employ a standard compression scheme like Deflate. Or, if you want to go all-out, use a fancy language model to assign a probability to every sequence of bits and then make an arithmetic code out of that.
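As a rough sanity check on the redundancy argument, here’s a small Python sketch that measures the entropy of a heavily biased bit stream and runs it through zlib’s Deflate. The 90%-zeros source is an arbitrary stand-in for predictable input, not a model of real usage.

```python
# Illustration of the redundancy argument, assuming a biased bit stream
# (90% zeros) as a stand-in for predictable user input. Each raw bit then
# carries only about 0.47 shannons, and an off-the-shelf Deflate
# implementation (zlib) recovers much of that slack.

import math
import random
import zlib

random.seed(0)
p = 0.9                                        # probability of a 0 bit (arbitrary assumption)
bits = [0 if random.random() < p else 1 for _ in range(80_000)]

# Shannon entropy per bit: H = -p*log2(p) - (1-p)*log2(1-p)
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

raw = bytes(
    sum(b << i for i, b in enumerate(bits[j:j + 8]))
    for j in range(0, len(bits), 8)
)                                              # pack 8 bits per byte
compressed = zlib.compress(raw, level=9)       # Deflate

print(f"entropy per bit: {H:.3f} shannons")
print(f"raw size:        {len(raw)} bytes")
print(f"deflated size:   {len(compressed)} bytes "
      f"(ideal ≈ {len(bits) * H / 8:.0f} bytes)")
```

A language-model-driven arithmetic code would squeeze out even more, since it can exploit longer-range patterns than Deflate’s fixed windows; the point is just that predictable streams leave a lot of bits on the table.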
The only other problem is that this is completely unusable. You can’t just design an efficient code in a vacuum and then expect people to use it. The human brain is specialized such that complex things like walking come easily and objectively simpler things like calculus require total concentration. People will prefer an interface that on some level mimics the activities their brain specializes in.
Human factors
Surviving on Earth involves lots of activities, and the human brain has gotten reasonably good at all of them. But running and eating aren’t expressive enough to be the primary means of programming a computer. More complex media like painting and sculpture aren’t up to the job either. In principle they could carry arbitrary information, but we don’t have the instincts that would make that easy.
Fortunately, we do have one instinct for conveying arbitrary information: language. If you want a universal mode of interaction for computers, the ideal probably looks like natural language with some shorthands and conventions sprinkled on top to better describe the sorts of things we want to do with computers.
The caveat there lies in the word “universal”. Language can do everything, but it can’t do everything easily or well. Children spend a long time learning how to read and write, and domain-specific jargon is confusing to most adults. Even after the initial learning curve, language takes a fair bit of cognitive effort. Often, it’s entirely the wrong tool for the job: try dictating a painting.
I don’t think this is a coincidence. If Homo sapiens were any worse at communicating complex ideas, it wouldn’t have been able to invent computers at all. If it were any better at communicating complex ideas, an ancestor species with a half-developed version of that faculty would have invented computers much earlier, and found itself in the same situation we do today.
So yeah, it’s a pretty deep problem. To make computers feel natural to use, the only option is to involve more “primitive” modalities somehow. The more primitive the better, in fact. A truly intuitive interface is one we’ve been using for millions of years, like… the physical world.
This brings us to another kind of interface, composed of objects and spaces instead of symbols and statements. We’ve already adapted our object-related instincts to a lot of modern technologies: we organize papers in a filing cabinet, throw trash into the trash can, and operate buttons and dials on appliances. We also have spatial instincts which we can use to, say, navigate an urban transit system (and which I’ll be lumping in with the object-related ones for brevity). With just a bit more adaptation we can use objects that are connected to, or simulated by, a computer.
As I noted before, this approach comes with constraints. Physical objects aren’t very good for working with abstractions. But the constraints make them self-explanatory in a way that languages aren’t. Whereas a language takes years to learn, the affordances of a new object are often something we can figure out in minutes and then use without thinking.
Technological constraints
So far I’ve been imagining interfaces limited only by the human mind’s ability to comprehend them. In reality, of course, technological constraints have a huge influence on how these things look.
Take natural language. For a long time, it’s been obvious that language would be a great way to interact with computers. The hard part was getting computers to actually understand it. So instead, programmers came up with new languages that were narrower and easier to implement—but even harder to use. Computers might never have gone mainstream in the way they did if every user had to put up with error messages about missing semicolons.
It was the other kind of interface—a rudimentary “desktop metaphor”—that brought computing to the masses. But the usual limitations applied: all it really brought to the masses was the kinds of computing supported by available applications. To be useful, applications had to be complex; they had to have enough “features” to handle any action a user might want to take. And they still had to be implemented in one of those cumbersome programming languages. So for the most part application development only made sense as a commercial enterprise, and computers weren’t the tool of individual empowerment that some had hoped for.
This is changing now because computers are getting better at understanding the things we say. Natural language will be adopted where rigid computer languages were before, and also where they were rejected because the learning curve was too steep. Eventually, it’ll become the most common way to write programs, and Python will seem as arcane as assembly does today.
That wouldn’t be optimal from an information theory perspective; there are ways that language could be specialized and improved upon for use with computers. Maybe we’ll develop shorthands to ask a question in fewer keystrokes. Maybe someone will design a whole new grammar, like Lojban’s, to reduce ambiguities in natural-language programs, and it’ll attract a niche community of people who think Vim and Dvorak keyboards are too easy to use. But for the most part I expect we’ll use the same languages with computers that we use with humans, and accept any resulting inefficiencies because of the network effects and switching costs.
The long term
Again, there are lots of things that language just isn’t very good for. So the objects aren’t going away. But objects face technological constraints of their own. They’re bulky, heavy, expensive, and fragile. For “normal” objects, that’s fine. But for objects to work as a computer interface, they have to be as general as computers themselves. We want them to appear, disappear, teleport, replicate, float in midair, and shape-shift. Given the limits of today’s technology, this means they have to be simulated.
But simulating objects is also difficult. Currently we do it using a glass rectangle that can display any image and play any sound, but provides little in the way of tactile feedback, and is pretty much unaware of the physical world around it. This is an okay compromise, but it’s definitely a compromise. If computers are going to be such a big part of our lives, they really ought to be richer and more expressive.
That doesn’t mean we’ll want to follow the physical world as faithfully as we’ll follow natural language. Because objects are easier to learn, we can more readily modify them to make our work more efficient. The computer mouse is a good example of this. It’s a simple idea, yet it provides meaningful benefits over more “natural” direct-manipulation interfaces. Spending all day reaching up to a 27-inch touchscreen at eye level just wouldn’t be that ergonomic.
So there are two ways to make object-based interfaces richer and more expressive. One is to bring parts of them back into the physical world as best we can given physical constraints. In a sense, this is what every specialized hardware device is doing: e-readers, pen tablets, MIDI keyboards, NFC tags. A more ambitious version might involve ubiquitous projection and computer vision like Dynamicland. The maximally ambitious version would be a drawer full of nanobots that can self-assemble into anything you want.
The second way is to go all-in on simulation: VR headsets, omnidirectional treadmills, haptic gloves, eventually brain-computer interfaces. The industry seems to be making a lot of progress in this direction. Unfortunately, the vibes are vaguely dystopian and the ambitious version looks a lot like The Matrix.
On either path, we have a much longer way to go with objects than we do with language, and it’s harder to predict what the future will look like. Probably both approaches will continue to develop, and the most popular products will be somewhere between them. Economic factors might tilt the playing field towards simulation though; it seems easier to make money selling personal simulation devices than sprawling, interdependent physical infrastructure. (Would you rather own the biggest car company, or the biggest road paving company?)
Returning to the original question: no, I don’t think they’ve figured it all out. Human-computer interaction can and will be different in the future. Monolithic commercial applications are probably an artifact of early computers’ poor language comprehension. The constraints that led us to make them out of flat surfaces are also merely technological, if formidable. So there’s space for new ideas. But there isn’t infinitely much of it. The basic dichotomy between languages and objects, and the roles for each, might be with us for as long as our minds are recognizably human.