Thinking as a Hobby


How Much Can We Learn Without Feedback From the Environment?

I'm planning on defending my dissertation proposal in the Fall, and the focus of my research has shifted mostly to learning. My thinking is going in a particular direction, and I thought I'd talk it out here.

In machine learning, types of learning are usually divided into three categories: supervised learning, reinforcement learning, and unsupervised learning.

Supervised learning describes approaches where there is an external signal from the environment regarding what the correct output of the system should be. There are degrees of supervision. For example:

Teacher: Say "cat".
Child: Cad.
Teacher: No, "cat".
Child: Cat.

An example of a higher degree of supervision might be a case where a teacher is teaching a child how to draw a circle. In one case they might verbally guide the child, "No, go up, then over." To increase the level of feedback and make it more explicit, the teacher might actually hold the child's hand and move the pencil so that it draws the circle.

Reinforcement learning refers to learning where the agent receives feedback from the environment in a less direct way, in the form of either a reward or punishment. Using the examples above, instead of correcting the child, the teacher might give the child a cookie if the behavior is correct, or hit them if it is incorrect.

Finally, unsupervised learning includes those models where there is no explicit feedback from the environment, either in the form of supervision or a reward/punishment signal.

But how in the heck do you learn anything without any kind of explicit signal from the environment?

My thinking is beginning to shift to the idea that the bulk of what we learn, we learn in this way.

Here are some examples...

How did you learn that when one pool ball hits another on a pool table, it causes the second ball to move in a particular way? Did you learn this in school? Were you given a cookie every time you saw a similar interaction in the world? I don't think so. What happened was, you saw very similar interactions in the world, over and over, with no reinforcement signal or supervision, and your neural circuitry was able to encode the statistical regularity of the events you witnessed.

Basically, if you see the same thing over and over again, you learn that things work that way, so you expect them to in the future.

Another, more contentious area is language. In both children and second-language learners, comprehension outstrips production. Unless we want to believe that there is little to no language learning occurring before the child starts speaking, or even extensively interacting with the environment, the logical conclusion is that language learning is occurring in the child without explicit reinforcement or supervision. Basically, for learning the foundations of language, the child is learning the statistical regularities of speech: certain sounds tend to follow one another more often than others (e.g. "th" is followed a lot more often by "uh" than by "ch").
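To make "statistical regularities of speech" a bit more concrete, here is a minimal sketch that estimates how often one sound unit follows another just by counting adjacent pairs. The function name `transition_probabilities` and the toy `stream` of sound units are my own inventions for illustration, not real speech data, and nothing here is meant as a model of what a child's brain actually does.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next | current) from adjacent pairs in a sequence.

    No labels and no rewards are used: the only input is the order
    in which the units happen to occur.
    """
    pair_counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        pair_counts[current][following] += 1

    probabilities = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            following: count / total for following, count in followers.items()
        }
    return probabilities


# Hypothetical toy stream of sound units (an assumption for illustration).
stream = ["th", "uh", "k", "a", "t", "th", "uh", "d", "o", "g", "th", "i", "s"]
probs = transition_probabilities(stream)

# How often each unit follows "th" in the toy stream.
print(probs["th"])
```

In this toy stream "th" is followed by "uh" two times out of three and never by "ch", so the counts alone already favor one continuation over another.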

How do these things get learned? By close temporal proximity of the inputs. In other words, time is the teacher. And in the case of visual learning, things that tend to be grouped together in space tend to be closely related. So really space and time are the teachers.

Who you are and what you know is a result of your synapses, that is, of how much the firing of one neuron influences the firing of another. Long-term potentiation is an empirically verified process: when one neuron fires just before another neuron it is connected to, helping that neuron to fire, the synapse between them strengthens, so that the next time the first neuron fires, it has an even bigger influence on causing the second one to fire.

So when you see or hear or feel input A, and it's closely followed by input B, your A -> B link will strengthen. The more you see A followed by B, the more it will strengthen.
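Here is a rough sketch of that A -> B strengthening, assuming a very simplified Hebbian-style rule: every time one input is closely followed by another, the link between them moves a little closer to a maximum strength of 1.0. The names `hebbian_train`, `window`, and `learning_rate`, along with the saturating update itself, are assumptions I'm making for illustration; this is not a biophysical model of long-term potentiation.

```python
def hebbian_train(events, window=1, learning_rate=0.1):
    """Strengthen the link (pre -> post) whenever `post` closely follows `pre`.

    `window` is how many subsequent events count as "closely following".
    Each co-occurrence moves the weight a fraction of the way toward 1.0,
    so repeated pairings keep strengthening the link but it never exceeds 1.0.
    """
    weights = {}
    for i, pre in enumerate(events):
        for post in events[i + 1 : i + 1 + window]:
            link = (pre, post)
            current = weights.get(link, 0.0)
            weights[link] = current + learning_rate * (1.0 - current)
    return weights


# Hypothetical stream of inputs: A is followed by B three times.
events = ["A", "B", "C", "A", "B", "A", "B"]
weights = hebbian_train(events)

for link, weight in sorted(weights.items()):
    print(link, round(weight, 3))
```

Note that there is no reward or correction anywhere in the loop. The A -> B link ends up stronger than B -> C simply because A was followed by B more often.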

Now neurotransmitters like dopamine can facilitate and modulate this kind of learning, acting as a reward signal. But such a signal is not necessary for the learning to occur.

So I think the rest of our cognition rests on this type of learning: the statistical clustering of related elements in space and time, without any explicit feedback from the environment.


