Machine learning basics for everyone

Simple answers to common questions about AI and machine learning

April 2017

This article predates the rise of large language models.

Vintage illustration: a bespectacled man rests his head on his hand among stacks of books while a large mechanical tabulating machine dominates the foreground, with a technician operating it

The basics of machine learning

Though I've been following machine learning for a long time, only recently have I tried to become a practitioner. Last year I threw myself into learning the fundamentals of natural language processing, and wrote a five-part series on NLP aimed at other programmers. This year I'm tackling machine learning more broadly, with a focus on text comprehension and production.

Friends and colleagues are naturally curious about artificial intelligence, and they tend to ask me the same reasonable questions. I'll answer them here to the best of my ability. While the answers won't be too technical, I will try to make them precise, because there's a lot of room between highly mathematical explanations and hand-wavy "the robots are coming to take our jobs" stuff. I'm also increasingly of the opinion that a cocktail-party-level understanding of AI is both important and achievable.

The question I'm asked most often requires some unpacking, so let's start with this one topic:

"What are the differences between artificial intelligence, machine learning, neural networks, and deep learning?"

These are all real terms with distinct meanings but are often used interchangeably. I actually think that's fine in most contexts, but the distinctions can help you understand where the industry has been and where it's going.

Artificial intelligence is the umbrella term for the entire field of programming computers to solve problems. I would distinguish this from software engineering, where we program computers to perform tasks. "AI" can be used so broadly as to be almost meaningless, in part because the scope of the phrase is constantly evolving.

There are modern conveniences we've become so accustomed to that we hardly think of them as AI: driving directions, auto-complete, full-text search. These affordances, which we rely on and are infuriated by in equal measure, were state-of-the-art AI just decades ago. To us they're natural parts of our phones and cars.

(There's an analogous trend in animal research. Using tools was once considered a defining characteristic of human intelligence, but the bar gets raised each time we find octopuses opening jars or ravens solving puzzles. Similarly, artificial intelligence tends to mean whatever it is that computers can't quite master yet.)

Machine learning is a subset of artificial intelligence. The important word there is "learning," as in not being explicitly taught. Instead, machine learning systems are trained by being presented with lots of examples (thousands at minimum, ideally millions or billions) but without a lot of guidance about how to solve the problem, or even what exactly they're looking for.

Teaching a child can superficially resemble training an ML system: we provide lots of examples over time and give them feedback about whether they're right or wrong. But we also tell children how to learn — that words are made up of syllables, that they need to "carry the one," or how to think critically about a story they read. It isn't until kids are older that they learn primarily through inductive reasoning, by recognizing patterns and developing an intuitive understanding of the world — an understanding that's unique to them.

A better model for ML training is teaching a dog, especially teaching a dog to do something we can't do ourselves, like sniff out a bomb. We give the dog lots of training, but we can't tell it exactly how to do the job, because it isn't something we quite know how to do ourselves. We can only repeatedly expose the dog to the target and reward it when it gets it right. It's up to the dog to learn which features are salient, the ones that add up to "this is a bomb."

Prior to machine learning, AIs (often called expert systems) could be "smart," but had to be explicitly taught everything they knew. Expert systems work like enormous checklists, and checklists are effective ways to make decisions, even for humans, but they are fiendishly time-consuming to construct and narrowly focused on a single domain: diagnosing one class of illness, or safety-checking one kind of airplane.

This ability to extract relevant features without guidance is at the heart of the machine learning revolution. By building up a set of features, known only to the machine, an ML system becomes capable of generalizing: operating on examples that aren't exactly like ones it's seen before. Generalization means that a network trained to distinguish pictures of cats from pictures of dogs will be capable of doing the same task with pictures it's never seen before, because it has learned a set of features that distinguish dogness from catness.

At this point, you can safely think of machine learning and AI as synonymous. Even though expert systems and similar still exist and serve an operational purpose, we tend not to think of them as "AI" anymore. We've moved the bar.

Vintage illustration of a room-sized early computer system with multiple cabinets containing tape reels and circuitry, and a technician standing at one end

Neural networks are a popular and effective way to implement machine learning. It's fun to talk about them because "neural networks" sound extremely cyberpunk. You can do machine learning without neural nets, and those less-sexy architectures solve some kinds of problems better, faster, or cheaper.

It's hard to explain what neural networks actually are without getting a little technical, something I will happily do in a future article. At the very highest level, neural networks are complex systems made up of very simple components which learn to divide up work and specialize. It's sometimes counterintuitive, if not downright amazing, how much complexity they can encapsulate using very primitive representations.
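To make "very simple components" concrete, here's a minimal sketch of a single artificial neuron in Python: it weighs its inputs, sums them, and squashes the result. The inputs and weights are invented for illustration; in a real network, the weights are what training adjusts.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the evidence, squashed into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-(np.dot(inputs, weights) + bias)))

# Three hypothetical input signals and hand-picked weights; a trained
# network would have learned these weights from examples instead.
print(neuron(np.array([0.5, 0.1, 0.9]), np.array([2.0, -1.0, 0.5]), -0.5))
```

A whole network is just thousands of these, wired so that each one's output becomes another's input.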

While they are at some level modeled after how real neurons work, I don't think that in the long run AI systems will resemble real brains at all, any more than commercial airplanes flap their wings.

Deep learning refers to a class of neural networks. It has a specific technical meaning but has also become a buzzword. The "deep" part just means that the network architecture has multiple layers—three-layers-deep versus one-layer-shallow. The outcome of this is that deep networks can learn and reason much more effectively than one-layer networks.

Stacking virtual neurons into layers allows networks to develop richer features by building up hierarchical representations. If we train a deep model to distinguish elephants from zebras, it's likely the model will develop some concept of "ears," "trunks," and "stripes," because those are salient features that aid in distinguishing the two animals. This is exactly how all complex systems evolve, whether they're biological organisms or corporate org charts: many workers doing focused, discrete tasks whose output gets synthesized further up the chain into ever-larger representations and actions. A human hand is much more useful than the sum of its muscles; a company makes high-level strategic decisions based on local conditions reported by its staff.
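Here's what "stacking layers" looks like in code, sketched with the Keras library (an assumed dependency; the layer sizes and the elephant-versus-zebra framing are illustrative, not a recipe):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    # Layer 1 reads raw pixels (a flattened 64x64 image) and learns
    # low-level features like edges and textures.
    Dense(128, activation='relu', input_shape=(4096,)),
    # Layer 2 combines those into higher-level notions: "ear", "stripe".
    Dense(64, activation='relu'),
    # The top layer votes: elephant or zebra.
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```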

Deep learning does enable machines to reason more deeply about certain kinds of problems through their ability to extract features from data, but these AIs aren't "deep" in any philosophical sense. Nevertheless, it is fascinating, and perhaps a bit worrying, that we humans don't get to play a role in deciding what those features are. This automatic feature extraction is often at the root of the problem of AIs amplifying human bias. And in fairness to all the "robots are coming to take our jobs" articles, there are already tasks that deep networks can perform better than skilled humans, because they are able to detect patterns that are otherwise invisible to us.

(Confusingly, the chess-playing Deep Blue, the first real-life AI that many people know by name, was not "deep" in either of these senses. It was IBM's rebranding of the chess machine Deep Thought, which was named after Douglas Adams' fictional AI, which was itself a reference to a popular porn film, thus demonstrating that one should never allow engineers to name things.)

Mid-century scientific illustration of a nerve cell with branching dendrites radiating outward, drawn in brown on an olive green background

Talking confidently about AI

Here are some summary sentences to take with you to that cocktail party:

  • Machine learning is a type of artificial intelligence in which the system learns patterns from lots of examples.
  • Artificial intelligence systems can either be explicitly taught answers, or learn to infer them (or both!).
  • Many machine learning systems are implemented using neural networks, in which hundreds or thousands of small workers learn to collaborate to solve problems.
  • Recent advances in neural network architecture have resulted in so-called deep learning systems which can infer complex features, and sometimes outperform expert humans.

How AI classifies and predicts our world

Given the fundamental principles of modern AI — that ML systems learn from lots of examples, and that deep learning enables richer representations of those examples — what can we reliably do with these systems today? What's practical, what's still emerging from research, and what remains unsolved?

AI capabilities can be sorted into two broad groupings:

Better data science

Tasks that we've long been able to do with traditional statistics or software engineering, but that machine learning can do better, faster, or at scale:

  • Prediction: Given past conditions, like the weather or stock prices, predict the next value in the series.
  • Classification: Is this a picture of a zebra or a horse? Is this tweet positive or negative?
  • Regression: How old is the person in this picture? How fuel efficient do we expect this car to be?

Towards true AI

Machine learning can also perform tasks that were difficult or impossible for earlier computer systems; these transcend smarter/faster/better and become qualitatively closer to true non-human intelligence:

  • Generation: Create a new thing after seeing lots of examples of that thing, or modify a thing that already exists.
  • Task learning: Via trial and error, learn to perform a complex task like playing a game.

The first grouping feels more like science, while the second approaches fiction.

Predicting the future

Extrapolating from historical data into the future, also called forecasting, is an everyday activity for any sufficiently large organization: given how well we did last year, how well will we do next year? Simple predictive models might involve just a few factors; after all, I can reasonably guess what the temperature will be in Boston just based on the day of the year. More sophisticated models may involve many more parameters, more than a human can reasonably calculate by hand.
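As a toy version of that day-of-the-year model, here's a sketch that fits a seasonal temperature curve with ordinary least squares. The temperatures are synthetic, generated only so the example runs end to end:

```python
import numpy as np

np.random.seed(0)
days = np.arange(365)
# Made-up "Boston" temperatures: a yearly cycle plus noise.
temps = 10 - 12 * np.cos(2 * np.pi * (days - 20) / 365) \
        + np.random.normal(0, 3, 365)

# A constant plus two seasonal features marking where we are in the year.
X = np.column_stack([np.ones(365),
                     np.sin(2 * np.pi * days / 365),
                     np.cos(2 * np.pi * days / 365)])
coeffs, *_ = np.linalg.lstsq(X, temps, rcond=None)

print(X[200] @ coeffs)   # predicted temperature for day 200, mid-July
```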

Forecasting is where the distinction between statistics, machine learning, and neural networks comes into play:

  • If you're solving a common business problem, like trying to predict the best price for a product based on historical data, stick with a traditional financial or statistical model.
  • If you have a lot of factors but don't know exactly how they inform the prediction, you might turn to a machine learning solution to find a best fit for your model. This kind of problem tends to fall under the umbrella of data science and might be solved by a mix of complex spreadsheets and programming languages like R.
  • If you're trying to make predictions on unusual or poorly understood datasets that aren't easily reducible to simple values, a neural network is worth exploring.

Neural networks are quite capable of performing the same math as traditional models — I wrote a toy example that can (roughly!) predict the weather — but high-impact models may require scientific peer review or formal financial audit. Neural nets can be a black box and are almost impossible to reverse-engineer except by another machine learning scientist. All things being equal, an Excel model prepared by a human expert may well be better than a neural network model, if only because of its transparency.

Summary: Most organizations don't need "artificial intelligence" to manage the day-to-day financials, and there are disadvantages with predictions that can't easily be verified or audited. But there's opportunity to use ML for more exploratory analysis.

Classification and recognition

For 40 years, US mail has been sorted by artificial intelligence. Automated handwriting recognition is a machine classification task: given a letterform, choose the letter that it represents. The earliest optical character recognition systems were explicitly taught how to map lines to characters; this is an example of something that is AI but not "machine learning." By contrast, modern neural networks can teach themselves this task with relative ease and no explicit guidance from humans.

People are excited about deep learning because you can throw almost any kind of classification problem at it (provided you have enough examples to learn from) in whatever format the data naturally occurs. This means we can train networks on things in the world as they really are (photos, video stills, audio samples) and let the network figure out which parts are important for making a good classification. Prior to machine learning, AI systems were limited to tasks where we could enumerate all the features ourselves, like recognizing postal addresses made up of a limited set of letters, numbers, and punctuation. As the problems get more complex, human instructors become the bottleneck. ML gets us out of that logjam.
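Here's a minimal sketch of that kind of self-taught classification using scikit-learn (an assumed dependency): a small neural network learning handwritten digits with no human-written rules about letterforms.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()   # 8x8 grayscale images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# A small neural network; it invents its own letterform features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # typically well above 90% accuracy
```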

Recognizing images

Once a deep learning system is trained, it can make its classifications pretty quickly — fast enough for autonomous vehicles to use cameras and other sensors to recognize important real-world objects like trees, buses, and traffic signals.

They're not perfect at it, though, and that's one reason we don't have self-driving cars speeding down our streets yet. ML systems can still struggle with broad visual recognition tasks like identifying objects in any lighting conditions or location, something we can do effortlessly. But perversely, if the classification task requires humans to be highly trained — say, in identifying tumors or counting sea lions — deep learning can (or may eventually) do better than people. Neural networks learn by example, and in a single day these systems can review more examples than any human could in a lifetime.

Recognizing language

Beyond just identifying letters, machine learning systems can also perform language classification tasks at the semantic level. By looking at lots of labeled sentences, they can be trained to answer questions like, do these two sentences mean the same thing? Is this restaurant review positive or negative? Is this story naughty or nice?
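A sketch of what such a sentiment classifier might look like, again with scikit-learn: bag-of-words features feeding a logistic regression. The restaurant reviews and labels are invented toy data; a real system would need thousands of them.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["loved the pasta", "wonderful service", "cold and bland",
           "never coming back", "best meal in years", "rude waiter"]
labels = [1, 1, 0, 0, 1, 0]   # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["the pasta was wonderful"]))   # likely [1]
```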

Natural language systems can be useful when trying to make generalizations about human dialogue at scale. Social media produces a tsunami of written human opinions, and marketing and PR departments are acutely interested in whether people seem happy or sad about their brands. The best language systems can be easily outperformed by people, but they're good enough to be useful when there simply aren't enough humans to read all the words on Twitter.

Summary: Deep learning methods are particularly good at classifying complex, raw, real-world data. If properly trained, neural networks can excel at finding subtle patterns in images, which has profound implications in medical diagnosis. Machine learning can be applied towards understanding the subtleties of human language, but for now is better as a supplement to human textual analysis rather than a replacement.

Regression analysis

Sometimes you want to predict values that don't fall into neat buckets like "positive or negative" or "cat versus donkey versus butterfly." Models that calculate continuous numeric values are called regression models. (The distinctions here are a little fuzzy: predicting the weather involves extrapolating from the past to predict real values, like the temperature, but also categorical values, like "sunny" or "cloudy.")
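A toy regression sketch along the lines of the fuel-efficiency question earlier: every number below is invented for illustration, but the shape of the problem (features in, one real value out) is the point.

```python
from sklearn.linear_model import LinearRegression

cars = [[1200, 80], [1500, 110], [2000, 160], [2500, 220]]  # weight (kg), horsepower
mpg = [38.0, 31.0, 24.0, 18.0]

model = LinearRegression().fit(cars, mpg)
print(model.predict([[1800, 140]]))   # a real-valued answer, not a bucket
```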

As with time-series prediction, there are well-understood algorithms and techniques for performing regression analyses, and in most cases there's no reason for machine learning to upend that discipline. Neural networks could be useful for exploring a well-trod problem space looking for new solutions, but there's real social and economic risk in applying models that are poorly understood. We should proceed with care.

I think the interesting opportunities in neural network-based prediction are where there isn't neat numeric data already. Just as with classification, deep neural networks can make real-value predictions based on all kinds of human-shaped data, like predicting stock volatility based on financial disclosure statements.

To try this out, I trained a deep network on 40,000 famous paintings and their year of origin, and then asked it to predict the creation year for works it's never seen before. I barely know what I'm doing when it comes to machine learning, and yet it kinda works? On average it guesses within 64 years of the real date of the painting — certainly no replacement for a human expert, but better than I could personally do, especially considering the works range anywhere from AD 1200 to 1930.

Summary: Based on the features they've learned during training, neural networks can perform regression analysis as well as, if not better than, traditional statistical models. They're most useful when the data to be analyzed isn't straightforward, or when the most relevant correlations aren't yet known.

From science to fiction

Machine-based prediction and classification are exciting because they're immediately useful tools. Thanks to the internet, we are generating more text and video than we could ever hope to decompose into analyzable data; machine learning gives us the ability to grapple with this abundance when all other techniques are impractical.

But deep neural networks can do more than simply reflect our own understanding of the world back to us. They can invent new things, and teach themselves to interact with us and with physical space.

How AI can create art and keep learning

Generation

When we talk about "training a network," often what we're really saying is that we're teaching the machine about probabilities:

  • If an image is mostly blue with round blobs of white, it's probable that it's a picture of the sky; but if it's mostly blue with triangular blobs of white, it suddenly becomes more probable that it's a picture of sailboats.
  • If a sentence begins with The, it's more probable that the next word is a noun rather than a verb (but very unlikely that it's a proper noun).

We can make use of these internalized probabilities to ask the network to invent plausible but wholly new examples of what it was trained on.

Take the second example: a neural network that's been trained on sentences. Once it's internalized how likely one word is to follow another, you can ask it to invent new sentences, and these can be fun to read. While they don't make a ton of sense, they can bear a surface resemblance to real prose. A network asked to continue "Mary had a little ___" might not pick lamb (unless it was trained on nursery rhymes), but it will pick a phrase that's grammatically correct, like more ice cream.
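You can simulate the trick without a neural network at all. Here's a sketch using raw bigram counts, a Markov-chain stand-in for the neural version; the mechanics differ, but the idea of sampling a probable next word is the same. The corpus is obviously a toy:

```python
import random
from collections import defaultdict

text = "mary had a little lamb mary had a little dream".split()
follows = defaultdict(list)
for w1, w2 in zip(text, text[1:]):
    follows[w1].append(w2)   # record which words follow which

word, sentence = "mary", ["mary"]
for _ in range(8):
    if word not in follows:               # dead end: no observed continuation
        break
    word = random.choice(follows[word])   # sample a plausible next word
    sentence.append(word)
print(" ".join(sentence))
```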

In a limited way, it's possible to coerce the output away from the natural probabilities. As part of a computer-generated poetry jam, I trained a network letter-by-letter on a selection of naturalist essays from the 19th century, and then asked it to generate alliterative new sentences:

Occasionally an old olive on one of our own spring, observed only once, only over others, pressed plants perched pictures. Pale peaks pouring: the palm-trees that proved place placed performance, provided part pebbles and passes perfect; but perhaps principally preserve perpetual poets. — "No solitude in the caves of spring"

Machine learning systems can produce slightly more sensible text if they're given constraints, and are already used to generate decent if uninspiring summaries of sports scores and financial news. Longer generated sentences that are semantically cohesive are still a challenge, but I suspect wholly computer-generated prose will start sneaking into our online lives before we realize it.

Generating images

Deep neural networks can generate images as well, and in doing so demonstrate interesting aspects of their structure. The widely publicized 2015 Google Deep Dream images were among our first peeks into how these networks "see" the world. These pictures are dreamlike in part because the network was only being trained to classify the primary subjects of the photos — it didn't matter much what the overall composition of the picture was, just that it was mostly about sharks or pirates or motor scooters.

A clever technique that produces more realistic computer-generated media has the dreary name generative adversarial networks. GANs are actually two networks joined together: one which generates the example (say, an image), and the other which tells the first whether it believes the example is "real" or not. The two networks are trained in a kind of arms race — in each training cycle, the generator makes better fakes, and the discriminator gets more sensitive about finding them. The outcome is a much more convincing whole image. (Unless you deliberately mess with it: there's a terrific online demo by Christopher Hesse in which you can coerce a GAN into producing extremely unsettling cats.)
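Here's a skeletal version of that arms race, sketched in PyTorch (an assumption; any framework would do). To keep it tiny, the generator learns to mimic a one-dimensional bell curve rather than cat photos, but the two-network training loop is the real GAN pattern:

```python
import torch
import torch.nn as nn

# Generator: turns random noise into fake "data" points.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: outputs the probability that a point is real.
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(5000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # samples from the "true" distribution
    fake = G(torch.randn(64, 8))            # the generator's current fakes

    # Train the discriminator to tell real from fake.
    opt_D.zero_grad()
    d_loss = (loss(D(real), torch.ones(64, 1)) +
              loss(D(fake.detach()), torch.zeros(64, 1)))
    d_loss.backward()
    opt_D.step()

    # Train the generator to fool the just-updated discriminator.
    opt_G.zero_grad()
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_G.step()
```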

Summary: Deep learning systems can synthesize new examples based on their training sets, but they can't quite fool people yet. We're still years away from human-like long-form generated text, but generative images, video, and audio are advancing fast. At this point I wouldn't trust any picture with a cat in it. And soon you might not want to trust any media at all.

Task learning

One of the hottest areas of AI involves training a computer to perform a particular task through trial and error. It may surprise you that this is considered a new area of research. Computers are, after all, really good at repetition: they're endlessly patient, never bored, and unafraid of looking stupid. And it's what we're used to thinking artificial intelligence is all about.

This trial-and-error process, called reinforcement learning, is actually quite an old idea in AI, first implemented back in the 1960s just the way they show in War Games: to teach computers to play games. (If the game is simple enough, like tic-tac-toe, reinforcement learning can be done with no computer at all, just boxes of beans.)
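Here's a pure-Python sketch of that trial-and-error loop: tabular Q-learning on a "game" even simpler than tic-tac-toe, where the player walks along five squares and wins by reaching the right end. The learning rate, discount, and exploration constants are arbitrary illustrative choices.

```python
import random

n_states, actions = 5, [-1, +1]   # squares 0..4; moves: left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

for episode in range(1000):
    s = 0
    while s != n_states - 1:
        if random.random() < 0.1:   # explore 10% of the time
            a = random.choice(actions)
        else:                       # otherwise exploit, breaking ties randomly
            a = max(actions, key=lambda act: (Q[(s, act)], random.random()))
        s2 = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s2 == n_states - 1 else 0.0
        # Nudge the estimate toward reward plus the value of what follows.
        Q[(s, a)] += 0.5 * (reward + 0.9 * max(Q[(s2, b)] for b in actions)
                            - Q[(s, a)])
        s = s2

# After training, "move right" (+1) is the learned policy in every square.
print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)})
```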

Reinforcement learning research stalled after those initial successes; there simply wasn't enough computing power available to write general-purpose problem-solving systems, and 60s-era training algorithms couldn't scale to complex tasks. With modern hardware and some innovations in training and evaluation methods, reinforcement learning is back.

The deep neural network AlphaGo used reinforcement learning to beat human champions at the board game Go, years before conventional wisdom thought it possible. The same team of researchers pioneered using reinforcement learning to let neural nets teach themselves to play classic Atari games at superhuman levels.

Why Atari games specifically? Because reinforcement learning is easiest when it's obvious whether a move was good or bad. Simple video games tend to punish wrong moves both quickly and decisively. The more the network has to backtrack to figure out where it went wrong, the harder it is to train. (The game Go is particularly difficult in this respect, which is one reason why AlphaGo did not rely on reinforcement learning alone.)

Smarter every day?

A funny thing about perception versus reality in AI: most of the machine learning systems we use in our phones, cars, and computers are software products released in the same way as traditional applications. Engineers work on them, improve them, and release new versions in stages. Their models might get tweaked or trained with updated data (maybe even data that was sampled from your activity), but that all happens behind the scenes, offline. It's difficult to construct AI models that can learn new things in real time because they often rely on carefully labeled examples — "This sentence is positive," or, "This painting was created in 1475."

Reinforcement learning allows for systems that truly could get better as they navigate the world, because they are able to measure their incremental improvement based on the task itself and not some arbitrary label. They learn more like we do.

Because the world does not all look like Mountain View, California, machine learning systems need to be able to augment their own training data with the wild diversity of real life, the weather conditions, regional accents, and skin colors that can't all be anticipated in the lab. AIs need to interact with other intelligences, both human and machine, which may behave differently than anything they've seen before. If they can't, we're still fundamentally in a world where AI doesn't get smarter unless we teach it to be.

In the last year, reinforcement learning has been augmented by other techniques, including some methods that help bootstrap the network's decision-making. Even though the technique is decades old, the press is calling reinforcement learning a transformational technology. If it doesn't hit a second wall, reinforcement learning opens up the possibility of AIs that truly do learn how to learn.