Thinking as a Hobby

Get Email Updates
Email Me

Admin Password

Remember Me

3478384 Curiosities served
Share on Facebook

Learning, Search, and Optimization
Previous Entry :: Next Entry

Read/Post Comments (0)

This is an attempt to explain different types of learning, which can be considered as different types of search, using an extended analogy.

In >Blondie24, Fogel uses the analogy of a radio. Let's say there's only one station broadcasting music, and you have a single knob to turn to try to find the signal. You turn the dial until you get something that kind of sounds like music, then you adjust the knob (which is a parameter) until you optimize the signal and it sounds nice and clear. What if you had two knobs? And instead of moving along a line, you had to adjust the two knobs like an Etch-a-Sketch to find the single point in 2D space that gave you the best signal? This idea extends to hundreds or thousands of parameters, which is what real-world problems involve. At a gross-scale level you can think of your brain as a radio with thousands of knobs, which are in constant need of tuning.

Personally, I like the landscape analogy better, and I'll use it here to explain the differences between some different types of learning.

Think of it this way: You're the head of an exploratory team, and your job is to explore an area of terrain and find the highest peak. You don't have any maps or prior information, and you also have to explore completely in the dark. It's like the maps in video games where they start out all black, and fill in as you explore.

Let's say the terrain looks like this:

What different strategies could you use?

Error-driven Learning

The analogy here is that you have a lone explorer parachute into the landscape at night. He lands in a random spot in the terrain. Maybe he gets lucky and he's close to a peak. Maybe he's in a low spot. He sets off in a particular direction (in this case he's adjusting the x and y parameters by moving, like moving the two knobs on the radio). Remember he's trying to find the highest peak, so the strategy he uses is: If his altitude rises when he moves in a particular direction, he keeps going in that direction. If his altitude decreases, he stops moving in that direction and instead chooses another path. This is known as a basic hill-climbing algorithm.

It's analogous to the widely-used technique in training artificial neural networks via backpropagation. Basically you get a bunch of inputs, the network is activated, and error is generated. The error is used to propagate changing parameters in the network (like turning the knobs or moving in a different direction) to reduce error. This method is limited in that your explorer may find a local high peak, and then get stuck, because at that point moving in any direction will increase the error. So it's difficult using this method to find the global peak. There are some ways around this, but in general this is how it works.

Evolutionary Algorithms

Evolutionary algorithms use a population, so the analogy here is that instead of a single explorer, you may have 100 explorers parachute in at random spots. Each generation, each explorer has his altitude measured. The ones in the highest spots live, the others die, and the survivors reproduce and send out more explorers in random directions (analogous to mutation).

This type of search is potentially more powerful, since you have more explorers searching in parallel. But the downside is that it is much more expensive computationally. Also, you can have the same problem of convergence on local peaks. The mutation rate can be thought of as an "exploration rate", and another problem is that this may be set too low or too high. If too low, then progress will be slow and convergence on local peaks is more likely. If too high, then explorers are more likely to abandon local peaks (which could be good), but then they are also more likely to completely overshoot good solutions.

There is also reinforcement learning, which I don't understand quite as well, but which might be more analogous to error-driven learning. The main difference, I think, is that instead of getting a clear error signal, the agent is provided a reward (like getting a cookie) for maximizing his altitude, and that payoff may be adjusted for either short-term gains or long-term gains. In search algorithms, a "greedy search" is one in which short-term payoffs are more important than long-term ones. There's a trade-off between exploitation and exploration. This is like the grasshopper and the ants. Carpe diem or plan for tomorrow as two extremes. Usually the best strategy is somewhere in between.

Since we are products of both culture and evolution, our minds are the result of a combination of different types of learning. Sometimes its difficult to wrap your head around abstract concepts, and I find analogies particularly useful. And both the radio and explorer analogies lend themselves well to explaining search and optimization.

Read/Post Comments (0)

Previous Entry :: Next Entry

Back to Top

Powered by JournalScape © 2001-2010 All rights reserved.
All content rights reserved by the author.