Thinking as a Hobby
2007-08-08 1:06 PM
Minimum Description Length

I'm definitely not a mathematician, but I'm interested in structure and regularity, and have been since long before I wrote the section of the SVS based on it as a value.
But I need some help here. I was reading Eric Baum's What is Thought?, a really interesting book. He talks about cognition as basically data compression and extrapolation. We discern regularities in the world and represent them with the most compact representation that best fits the data (Occam's Razor). If the data closely fit a straight line, that's great. Most real-world data fit non-linear functions, like an S-curve, which are more complicated. If the data points are scattered randomly, they don't fit any function, so it's impossible to discern a pattern and encode it.

Once you have a representation encoded, you can make predictions about data points you haven't actually observed yet. They'll either fit the curve and reinforce your representation, or they'll fall somewhere outside it, in which case you'll need to either revise your representation or ignore the new data.

This all sounds good, and jibes with my current thinking about thinking. It nicely explains a lot of things, such as us-vs.-them dichotomies: simple, linearly separable views of the world, like prejudices, are more easily represented in cognitive systems.

But I'm having a bit of a problem with randomness and something called minimum description length. It's the mathematical concept of the amount of regularity in a pattern, where the maximum compression of the pattern is given by its most minimal description. Wikipedia says:
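The "regularity compresses, randomness doesn't" idea is easy to see empirically with an ordinary compressor standing in for the (uncomputable) minimal description. A minimal sketch, using Python's standard zlib module; the specific strings are my own made-up examples:

```python
import os
import zlib

# A highly regular string: 10,000 bytes generated by a tiny rule.
regular = b"ab" * 5000

# 10,000 bytes of (pseudo)random data with no discernible pattern.
random_ = os.urandom(10000)

# The regular string collapses to a tiny compressed form; the random
# bytes barely shrink at all, because there is no regularity to exploit.
print(len(zlib.compress(regular)))   # far smaller than 10,000
print(len(zlib.compress(random_)))   # close to 10,000 (or slightly more)
```

A real compressor only gives an upper bound on the true minimum description length, but the contrast between the two cases is the same one the theory describes.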
And here's the entry from Kolmogorov complexity:
But I'm confused. I understand graphically how a random scattering of points doesn't fit any linear or non-linear function. But I'm having a hard time with the bit-string example. If the second string could be produced by an algorithm that gives each bit an equal probability of being 0 or 1, which is the definition of random, then why can't we describe the second bit string as simply "random", which is a smaller description than the one for the top bit string? And algorithmically, would the length of the program that outputs the top string really be much shorter than the one that outputs the second one? Are algorithms that generate random strings really that complex? Or is it a condition that the program has to output that particular string every time?
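For what it's worth, the last question seems to be the crux: in the Kolmogorov-complexity definition, a description of a string must be a program that deterministically outputs that exact string, every time. A coin-flipping program isn't a description of any particular string, because its output changes from run to run. A minimal sketch (the 32-bit TARGET here is a made-up example string, not the one from the article):

```python
import random

# A specific, patternless-looking 32-bit string (invented for illustration).
TARGET = "11001000011000011101111011101100"

def short_program() -> str:
    """A valid, very short description of the regular string '0101...01':
    it deterministically regenerates that exact string."""
    return "01" * 16

def random_program() -> str:
    """Flips 32 fair coins. This is NOT a description of TARGET: it only
    happens to equal TARGET with probability 2**-32, and its output
    differs from run to run."""
    return "".join(random.choice("01") for _ in range(32))

def literal_program() -> str:
    """The fallback description of a patternless string: store the string
    itself. Deterministic, but no shorter than the string it describes."""
    return TARGET
```

So calling the second string "random" describes how it might have been produced, not the string itself; to pin down that particular string you end up paying for every bit, which is why its description length stays close to its own length.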
Is any of this making any sense?

© 2001-2010 JournalScape.com. All rights reserved. All content rights reserved by the author. custsupport@journalscape.com