Random thoughts about strong A.I.

The Singularity is Near

Category: Strong A.I.

WHY PROGRAMMING IS A GOOD MEDIUM FOR EXPRESSING POORLY UNDERSTOOD AND SLOPPILY-FORMULATED IDEAS


By Marvin Minsky @ MIT 1967

This is a slightly revised version of a chapter published in Design and Planning II — Computers in Design and Communication, (Martin Krampen and Peter Seitz, eds.), Visual Committee Books, Hastings House Publishers, New York, 1967.

There is a popular, widespread belief that computers can do only what they are programmed to do. This false belief is based on a confusion between form and content. A rigid grammar need not make for precision in describing processes. The programmer must be very precise in following the computer grammar, but the content he wants to be expressed remains free. The grammar is rigid because of the programmer who uses it, not because of the computer. The programmer does not even have to be exact in his own ideas; he may have a range of acceptable computer answers in mind and may be content if the computer’s answers do not step out of this range. The programmer does not have to fixate the computer with particular processes. In a range of uncertainty he may ask the computer to generate new procedures, or he may recommend rules of selection and give the computer advice about which choices to make. Thus, computers do not have to be programmed with extremely clear and precise formulations of what is to be executed, or how to do it.

The argument presented here is not specifically about “design,” but about the general question of what we can get computers to help us do. For a number of reasons, it is customary to underestimate the possibilities. To begin, I want to warn against the pitfall of accepting the apparently “moderate” positions taken by many people who believe they understand the situation. Science‑fiction writers, scientists of all descriptions, economic forecasters, psychologists, and even logicians tell us often, and make it a convincing tale, that computers will never really think. “We must not fall into anthropomorphic ways of thinking about machines; they do only what their programs say; they can’t be original or creative.” We have all heard these views, and most of us accept them.

It is easy to understand why a humanist will want to rhapsodize about the obscurity of thought processes, for there is an easy non sequitur between that obscurity and the desired anthropomorphic uniqueness. But this isn’t the non sequitur important here. The fallacy under discussion is the widespread superstition that we can’t write a computer program to do something unless we have an extremely clear, precise formulation of what is to be done, and exactly how to do it. This superstition is propagated at least as much by scientists—and even by “computer scientists”—as by humanists.

What we are told about the limitations of computers usually takes this general form: “A computer cannot create. It can do only exactly what it is told. Unless a process is formulated with perfect precision, you cannot make a computer do it.” Now this is perfectly true in one sense, and it is absolutely false in another. Before explaining why, it is interesting to note that, long before computers, the same was said of the Devil: he could only appear to be creative.

In the September 1966 issue of Scientific American, I discussed three programs: one is the checkers program of Samuel, which plays at the master level. Another is the ANALOGY program of Evans, which does rather well on certain intelligence‑test problems of recognizing analogous relations between geometric figures. The third is the program “STUDENT” of Bobrow, which takes high school algebra “story” problems given in English:

Mary is twice as old as Ann was when Mary was as old as Ann is now. If Mary is 24 years old, how old is Ann?

and solves some, but not all of them. In that article I was concerned with problems of going further, to extend such work in the direction of more versatile general intelligence. But for my purpose here, they can serve as adequate examples even in their present state, for while limited in what they can handle, they already do enough to confound the old comfortable superstitions.

The old view is that a program is “nothing but” a set of rigid rules for exactly what to do in each situation. This is indeed a useful point of view for reassuring beginners at programming, or for analyzing the programs written by beginners. However, for more advanced processes, while “perfectly” true in one sense, it would be as correct to say that “houses are nothing but arrangements of construction materials” or “books are merely long strings of words.” Indeed, a review of my Scientific American article (in Computer Reviews 8, 1, Jan. 1967) asserts that these programs are made of “dictionary lookup routines, sequences of search and comparison functions, and sort-merge type operations.”

LOGIC AND CONSISTENCY
Let me begin by discussing one of the skeptical attitudes that is derived from some statements of good logicians and bad philosophers. We are told that certain theorems about proving the self‑consistency of logical systems have the consequence: “It is impossible to mechanize completely the process of discovery and therefore humans will always have the advantage over machines.” I am sure almost every reader has heard of Gödel’s theorem, and most have heard of this alleged consequence. But this is not a logical consequence, for it is based on a careless technical oversight. If applicable at all to the problem‑solving question, Gödel’s theorem applies strictly only to perfectly self‑consistent logical systems. People are not this consistent, and there is no reason whatever why we should feel constrained to build our machines along such lines. Instead we can, and already do, build machines that can tolerate contradictory factual assertions. To do this, we have to add selection rules for resolving contradictions, priority hierarchies for choosing between incompatible statements, and the like. A simple example of such a scheme is described in Bertram Raphael’s 1964 MIT PhD dissertation. Here is an example of a dialog with that program:

EVERY BOY IS A PERSON
(I UNDERSTAND)
       A FINGER IS PART OF A HAND
(I UNDERSTAND)
       EACH PERSON HAS TWO HANDS
(THE ABOVE SENTENCE IS AMBIGUOUS ** PLEASE REPHRASE IT)
       THERE ARE TWO HANDS ON EACH PERSON
(I UNDERSTAND)
       HOW MANY FINGERS DOES JOHN HAVE?
(I DON’T KNOW WHETHER FINGER IS PART OF JOHN)
       JOHN IS A BOY
(I UNDERSTAND)
       HOW MANY FINGERS DOES JOHN HAVE?
(HOW MANY FINGERS PER HAND?)
       EVERY HAND HAS 5 FINGERS
(I UNDERSTAND)
       HOW MANY FINGERS DOES JOHN HAVE?
(THE ANSWER IS 10)
       HOW MANY AUTOMOBILES DOES JOHN HAVE?
(THE ABOVE SENTENCE IS AMBIGUOUS ** PLEASE RE-PHRASE IT)
       WHO IS PRESIDENT OF THE UNITED STATES?
(STATEMENT FORM NOT RECOGNIZED)
       THE BOY IS JUST TO THE LEFT OF THE TABLE
(G02840 IS A TABLE)
(I UNDERSTAND)
       THE LAMP IS JUST TO THE LEFT OF THE TABLE
(G02841 IS A LAMP)
(THE ABOVE STATEMENT IS IMPOSSIBLE)
       THE TABLE IS TO THE RIGHT OF THE CHAIR
(G02842 IS A CHAIR)
(I UNDERSTAND)
       WHAT IS THE RELATIVE POSITION OF A PERSON?
(THE LEFT-TO-RIGHT ORDER IS AS FOLLOWS)
(CHAIR BOY TABLE)

Raphael’s knowledge‑collecting, question‑answering system shows how it gives higher priority to particular facts when they conflict with general principles. This “Exception Principle” is, of course, a rather simple‑minded way to resolve contradictions, and it can only be used when there is a clear difference in level of generality between the statements. When they are on the same level, the program simply rejects the later statement, as seen here:

       THERE ARE 5 FINGERS ON EVERY HAND
(I UNDERSTAND)
       THERE ARE TWO HANDS ON A PERSON
(I UNDERSTAND)
       A BOY IS A PERSON
(I UNDERSTAND)
       TOM IS A BOY
(I UNDERSTAND)
       DICK IS A BOY
(I UNDERSTAND)
       HARRY IS A BOY
(I UNDERSTAND)
       TOM HAS NINE FINGERS
(THE ABOVE SENTENCE IS AMBIGUOUS ** BUT I ASSUME (HAS) MEANS (HAS AS PARTS))(I UNDERSTAND)
       DICK HAS ONE HAND
(THE ABOVE SENTENCE IS AMBIGUOUS ** BUT I ASSUME (HAS) MEANS (HAS AS PARTS))(I UNDERSTAND)
       HOW MANY FINGERS DOES TOM HAVE?
(THE ANSWER IS 9)
       HOW MANY FINGERS DOES DICK HAVE?
(THE ANSWER IS 5)
       HOW MANY FINGERS DOES HARRY HAVE?
(THE ANSWER IS 10)
       HOW MANY FINGERS DOES JOE HAVE?
(I DON’T KNOW WHETHER FINGER IS PART OF JOE)

But of course Raphael could have written some other priority rule. Incidentally, the program’s statement, “The above sentence is ambiguous…” concerns the possibility that the word “has” might mean either “has as a part” or “owns.” Raphael’s program usually guesses correctly by a study of whether the entities in question are already known to own things, or to be parts of things, etc. I will describe this later in more detail. Raphael’s demonstration that such “contextual” decisions can be programmed illustrates a more general point, or rather, shows a different and healthier attitude toward programs than the “nothing but” approach. We will therefore try to explain some of these better ways to think about programs.

(1) A PROGRAM AS A SEQUENCE OF INSTRUCTIONS TO BE OBEYED.

The most common and simple‑minded view is that a computer program is a sequence of clear-cut operations to be performed on some data. Let’s take a simple example of a program: suppose that X is a number given as input:

START: SET R = 1
LOOP:  SET S = X/R
       SET T = R + S
       SET R = T/2
       SET U = R x R
       SET V = X - U
       IF |V| > .001, GO TO LOOP
       ELSE PRINT R
This program (attributed to Isaac Newton) will compute an approximation to the square root of X. At each line in the program it is perfectly clear what the current status of things is, and what is to be done next. Of course, it requires a mathematical analysis to show that the sequence of situations so developed will eventually lead to an acceptable value of R, and terminate the computation. So there is no great harm in thinking of these statements as a sequence of steps. The “GO TO” instruction might be considered to be a super‑abbreviation for several more copies of the “main loop.” Of course, until the analysis is made, one should think of the program as an expression of a useful but poorly understood idea.
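In a modern language (C# here; the method name is mine), the same loop might read:

using System;

class Newton
{
    // Newton's method for the square root of x, following the program above:
    // iterate r <- (r + x/r)/2 until r*r is within .001 of x.
    static double SquareRoot(double x)
    {
        double r = 1.0;
        while (Math.Abs(x - r * r) > 0.001)
            r = (r + x / r) / 2.0;
        return r;
    }

    static void Main()
    {
        Console.WriteLine(SquareRoot(2.0)); // about 1.41421
    }
}

For X > 0 the iteration is known to converge to the square root, which is exactly the analysis the text asks for.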

(2) A PROGRAM AS A COURT OF LAW

In Raphael’s program, it has to be decided whether “has” means “owns” or “has as parts” when a statement of the form “x has y” is encountered:

       JOHN IS A PERSON
(I UNDERSTAND)
       DICK IS A PERSON
(I UNDERSTAND)
       A CHAIN IS PART OF A BICYCLE
(I UNDERSTAND)
       THE POWER-SAW HAS A CHAIN
(THE ABOVE SENTENCE IS AMBIGUOUS ** BUT I ASSUME (HAS) MEANS (HAS AS PARTS))(I UNDERSTAND)
       JOHN OWNS A CHAIN
(I UNDERSTAND)
       DICK HAS A CHAIN
(THE ABOVE SENTENCE IS AMBIGUOUS ** BUT I ASSUME (HAS) MEANS (OWNS))
       THE CUCKOO-CLOCK HAS A CHAIN
(THE ABOVE SENTENCE IS AMBIGUOUS ** PLEASE REPHRASE IT)

The problem, when recognized, is transmitted to a part of the program that is able to review all that has happened before. This sub‑program makes its decision on the following basis:

(1) Is y already known to be part of some other thing? Or is y a member of some set whose members are known to be parts of something?

(2) Is y known to be owned by something, or is it a member of some set whose members are known to be owned by something?

(3) If exactly one of (1) or (2) is true, make the choice in the corresponding direction. If neither holds, give up and ask for more information. If both are true, then consider the further possibilities at (4) below. (Thus the program uses evidence about how previously acquired information has been incorporated into its “model” of the world.)

(4) If we get to this point, then y is known already to be involved both in being part of something and in being owned, and we need a finer test.

Let U1 and U2 be the “something” or the “some set” that we know exists, respectively, in the answers to questions (1) and (2). These depend on y. We now ask: is x a member of, or a subset of, U1 or U2? If neither, we give up. If one, we choose the corresponding result, “part of” or “owns.” If both, we again give up and ask for more information. As Raphael says:

“These criteria are simple, yet they are sufficient to enable the program to make quite reasonable decisions about the intended purpose in various sentences of the ambiguous word “has.” Of course, the program can be fooled into making mistakes, e.g., in case the sentence, “Dick has a chain,” had been presented before the sentence, “John owns a chain,” in the above dialogue. However, a human being exposed to a new word in a similar situation would make a similar error. The point here is that it is feasible to automatically resolve ambiguities in sentence meaning by referring to the descriptions of the words in the sentence‑descriptions which can automatically be created through proper prior exposure to unambiguous sentences.”

Thus, the program is instructed to attempt to search through its collection of prior knowledge, to find whether x and y are related, if at all, more closely in one or the other way. This “part” of the program is best conceived of as a little trial court, or as an evidence‑collecting and evidence‑weighing procedure. It is not good to think of it as a procedure directly within a pre‑specified sequence of problem solving, but rather as an appeal court to consult when the program encounters an inconsistency or ambiguity. Now when we write a large program, with many such courts, each capable if necessary of calling upon others for help, it becomes meaningless to think of the program as a “sequence.” Even though the programmer himself has stated the “legal” principles which permit such “appeals,” he may have only a very incomplete understanding of when and where in the course of the program’s operation these procedures will call on each other. And for a particular “court,” he has only a sketchy idea of only some of the circumstances that will cause it to be called upon. In short, once past the beginner level, programmers do not simply write “sequences of instructions.” Instead, they write the individual members of little societies of processes. For try as we may, we rarely can fully envision, in advance, all the details of their interactions. For that, after all, is why we need computers.
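A toy version of such a court, written in C# with invented names and invented facts, might weigh the evidence like this (the real program consulted its whole model rather than two fixed sets, but the court-like shape is the same):

using System;
using System.Collections.Generic;

class HasCourt
{
    // Evidence from previously "understood" sentences (invented facts).
    static readonly HashSet<string> KnownParts = new HashSet<string> { "finger", "hand", "chain" };
    static readonly HashSet<string> KnownOwned = new HashSet<string> { "chain" };

    // Decide whether "x has y" means OWNS or HAS AS PARTS, following
    // rules (1)-(3) above; rule (4)'s finer test is omitted here.
    static string Decide(string x, string y)
    {
        bool partEvidence = KnownParts.Contains(y); // rule (1)
        bool ownEvidence  = KnownOwned.Contains(y); // rule (2)
        if (partEvidence && !ownEvidence) return "HAS AS PARTS";
        if (ownEvidence && !partEvidence) return "OWNS";
        return "AMBIGUOUS ** PLEASE REPHRASE";      // rule (3): give up, or go on to rule (4)
    }

    static void Main()
    {
        Console.WriteLine(Decide("JOHN", "finger"));     // HAS AS PARTS
        Console.WriteLine(Decide("POWER-SAW", "chain")); // ambiguous here; rule (4) would decide
    }
}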

(3) A PROGRAM AS A COLLECTION OF STATEMENTS OF ADVICE

The great illusion shared not only by all terrified humanists but also by most computer “experts,” that programming is an inherently precise and rigid medium of expression, is based on an elementary confusion between form and content. If poets were required to write in units of fourteen lines, it wouldn’t make them more precise; if composers had to use all twelve tones, it wouldn’t constrain the overall forms; if designers had to use only fourth‑order surfaces, no one would notice it much! It is humorous, then, to find such unanimity about how the rather stiff grammar of (the older) programming languages makes for precision in describing processes. It’s perfectly true that you have to be very precise in your computer grammar (syntax) to get your program to run at all. No spelling or punctuation errors are allowed! But it’s perfectly false that this makes you have a precise idea of what your program will do. In FORTRAN, if you want your program to call upon some already written procedure, you have to use one of the fixed forms like “GO TO.” You can’t say “USE,” or “PROCEED ON TO,” etc., so the syntax is stiff. But, you can “GO TO” almost anything, so the content is free.

A worse fallacy is to assume that such stiffness is because of the computer! It’s because of the programmers who specified the language! In Bobrow’s STUDENT program, you could type once and for all, if you wish, “USE ALWAYS MEANS GO TO” and in simple situations it would then allow you to use “USE” instead of “GO TO.” This is, of course, a trivial example of flexibility, but it is a point that most people don’t appreciate: FORTRAN’s stiffness is, if anything, derived from the stiffness superstition, not an instance of some stiffness fact!

For an example of a modern system with more flexibility, a programming language called PILOT, developed by Warren Teitelman (Ph.D. dissertation, MIT, 1966), allows the programmer to make modifications both in his programs and in the language itself, by external statements in the (current version of the) language. We can often think of these as “advice” rather than as “program,” because they are written at odd times, and are usually conditionally applied in default situations, or as a consequence of previous advice. An example is the following, typed in while developing a program to solve problems like the well‑known “missionaries and cannibals” dilemma, the one with the boat that holds only two people, etc.:

Tell progress, if m is a member of side‑1 and m is a member of side‑2 and (countq side‑1 m) is not equal to (countq side‑1 c), then quit. (An earlier collection of advice statements to the input system has been used to produce the reasonably humanoid input syntax.)

The program is a heuristic search that tries various arrangements and moves, and prefers those that make “progress” toward getting the people across the river. Teitelman writes the basic program first. But the missionaries get eaten, and the above “advice” says to “modify the progress‑measuring part of the program to reject moves that leave unequal numbers of missionaries and cannibals on the sides of the river.” As Teitelman says:

This gives the eating conditions to PROGRESS. It is not sufficient to simply count and compare, because when all of the cannibals are on one side with no missionaries, they do outnumber the missionaries 3 to 0. However, nobody gets eaten.

The point, however, is not in the relaxation of syntax restrictions, but in the advice‑like character of the modification just made in the program. The “tell progress” statement can be made without knowing very much about how “progress” works already or where it lies in the “program.” It may already be affected by other advice, and one might not have a clear idea of when the new advice will be used and when it will be ignored. Some other function may have been modified so that, in certain situations, “progress” won’t get to evaluate the situation at all, and someone might get eaten anyway. If that happened, the outsider would try to guess why.

He would have the options (1) of thoroughly understanding the existing program and “really fixing” the trouble, or (2) of entering a new advice statement describing what he imagines to be the defective situation and telling the program not to move the missionary into the position of being eaten. When a program grows in power by an evolution of partially‑understood patches and fixes, the programmer begins to lose track of internal details and can no longer predict what will happen—and begins to hope instead of know, watching the program as though it were an individual of unpredictable behavior.
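In today’s terms, such “advice” behaves like wrapping an existing function after the fact, without knowing what is inside it. A minimal C# sketch, with invented names:

using System;

class State
{
    public int MissionariesOnSide1, CannibalsOnSide1, PeopleAcross;
}

class AdviceDemo
{
    // The original, partially understood progress measure.
    static Func<State, bool> Progress = s => s.PeopleAcross > 0;

    // "Tell progress ...": wrap the existing function with a new condition,
    // leaving the original untouched and unexamined.
    static void TellProgress(Func<State, bool> condition)
    {
        var previous = Progress;
        Progress = s => condition(s) && previous(s);
    }

    static void Main()
    {
        // The eating condition: reject states where cannibals outnumber
        // missionaries on a side that still has missionaries.
        TellProgress(s => !(s.MissionariesOnSide1 > 0 &&
                            s.CannibalsOnSide1 > s.MissionariesOnSide1));

        var state = new State { PeopleAcross = 2, MissionariesOnSide1 = 1, CannibalsOnSide1 = 2 };
        Console.WriteLine(Progress(state)); // False: somebody would get eaten
    }
}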

This is already true in some big programs, but as we enter the era of multiple‑console computers, it will soon be much more acute. With time‑sharing, large heuristic programs will be developed and modified by several programmers, each testing them on different examples from different consoles and inserting advice independently. The program will grow in effectiveness, but no one of the programmers will understand it all. (Of course, this won’t always be successful; the interactions might make it get worse, and no one might be able to fix it again!) Now we see the real trouble with statements like “it only does what its programmer told it to do.” There isn’t any one programmer.

LATITUDE OF EXPRESSION AND SPECIFICITY OF IDEAS
Finally we come to the question of what to do when we want to write a program but our idea of what is to be done, or how to do it, is incompletely specified. The non sequitur that put everyone off about this problem is very simple:

 Major Premise: If I write a program it will do something particular, for every program does something definite.

Minor Premise: My idea is vague. I don’t have any particular result in mind.

Conclusion: Ergo, the program won’t do what I want.

So, everyone thinks, programs aren’t expressive of vague ideas.

There are really two fallacies. First, it isn’t enough to say that one doesn’t have a particular result in mind. Instead, one has an (ill-defined) range of acceptable performances, and would be delighted if the machine’s performance lies in the range. The wider the range, then, the wider is one’s latitude in specifying the program. This isn’t necessarily nullified, even when one writes down particular words or instructions, for one is still free to regard that program as an instance. In this sense, one could consider a particular written-down story as an instance of the concept that still may remain indefinite in the author’s mind.

This may sound like an evasion, and in part it is. The second fallacy turns around the assertion that I have to write down a particular process. In each domain of uncertainty I am at liberty to specify (instead of particular procedures) procedure‑generators, selection rules, courts of advice concerning choices, etc. So the behavior can have wide ranges; it need never twice follow the same lines, and it can be made to cover roughly the same latitude of tolerance that lies in the author’s mind.

At this point there might be a final objection: does it lie exactly over this range? Remember, I’m not saying that programming is an easy way to express poorly defined ideas! To take advantage of the unsurpassed flexibility of this medium requires tremendous skill: technical, intellectual, and esthetic. To constrain the behavior of a program precisely to a range may be very hard, just as a writer will need some skill to express just a certain degree of ambiguity. A computer is like a violin. You can imagine a novice trying first a phonograph and then a violin. The latter, he says, sounds terrible. That is the argument we have heard from our humanists and most of our computer scientists. Computer programs are good, they say, for particular purposes, but they aren’t flexible. Neither is a violin, or a typewriter, until you learn how to use it.

The concept of One


The band Three Dog Night has the lyric “One is the loneliest number that you’ll ever do” in one of their songs. They must have been working on Strong AI when they came up with that line.

Lots of work related to ML and such has been applied to NLP/NLU and has largely failed. But if we skip the problem of parsing English for a moment: what would you do with the information if you had a great parser? What would the backend look like? How would you represent the concept of the integer number one, or addition?

I happen to be enamored with Category Theory at the moment. In particular, I have been thinking about how to represent the concept of addition in the backend. In traditional modern object-oriented programming, there are several ways we might add this functionality to a class. Let’s review some of them.

  • Interfaces: For each object that we want to be able to add, we supply an addition interface. For instance, if you want to add two persons, the Person object would implement an addition interface. Or better yet, a container such as a List or Array object and a little generics might do the same (such as C# List<Person> People). See the sketch after this list.
  • Implicit and/or Explicit Operators: The person object or better yet, the List or Array provides a way to cast between a Person or array of People and an integer allowing normal addition arithmetic to be performed.
  • Conversion Functions: .NET has a well-formed system for doing conversions between objects. Create a new conversion function that converts between people and integers. One of the problems is loss of information: as you convert objects, you need to make sure you don’t lose information in the process.
  • Massive switch statement in the addition operator: In this version, the addition operator has a switch statement that looks at both the left and right side of the addition symbol and tries to coalesce them so they can be added.
  • Generics/Templates: Create a generic class that tries to do most of the work.  Helps implementation, but doesn’t really reduce the amount of code required.

And more exotic ways such as proxies, mixins, traits, compound objects at runtime, automatic code generation, etc.
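As an illustration of the interface approach above, here is a minimal C# sketch; the IAddable interface and the types are my own invention:

using System;
using System.Collections.Generic;
using System.Linq;

// A hypothetical addition interface: anything countable says what it
// contributes to a sum.
interface IAddable
{
    int Count { get; }
}

class Person : IAddable
{
    public string Name;
    public int Count => 1; // one person counts as one
}

class Demo
{
    static int Add(IEnumerable<IAddable> items) => items.Sum(i => i.Count);

    static void Main()
    {
        var people = new List<Person> { new Person { Name = "Ann" },
                                        new Person { Name = "Mary" } };
        Console.WriteLine(Add(people)); // 2
    }
}

Note the cost: every class that wants to participate must implement the interface, which is exactly the code explosion discussed below.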

So, there seem to be lots of ways to do this; what’s the problem? Computers can add integers natively through the instruction set of the processor. So they don’t need to “understand” how to add, they can just “do it”. But simple math concepts are just the tip of the iceberg of concepts for which there is no native implementation.

Part of the problem is that it all leads to an explosion of object-oriented code. For instance, I would have a Person class that needs to implement an addition interface and numerous other interfaces related to individual concepts. The Person class would need thousands of interfaces. And every other type of object (Car, Dog, Bike, etc.) would need them too. In addition, the smarts required to generate all of this at runtime seem very daunting.

Worse yet is the design of Amazon’s Alexa. The knowledge of how to do something lives in individual skills coded by different developers. Hence, you can’t even apply some of the techniques above, because each skill is an island isolated from every other skill. Ultimately, if left unfixed, this will be Alexa’s downfall.

I would like to implement addition once in my entire code base and have it apply to any object. So, this leads to the question of how to structure a code base that would allow this. In Alexa parlance, I want to write an addition skill and have it work with every other kind of skill. How would you represent the concept of one so it could be used by addition?

This is what I am working on now: to truly understand the concepts of counting and addition, to have them apply to any other kind of object, and to have the code in one place and not duplicated everywhere.
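One candidate shape for “addition in one place,” sketched in C# with names of my own invention: a monoid, the category-theory packaging of an identity element and an associative combining operation:

using System;
using System.Collections.Generic;
using System.Linq;

// A monoid over T: an identity element plus an associative combine.
// The "add up" logic is written once, for every T that supplies these.
interface IMonoid<T>
{
    T Identity { get; }
    T Combine(T a, T b);
}

class IntAddition : IMonoid<int>
{
    public int Identity => 0;
    public int Combine(int a, int b) => a + b;
}

static class Adder
{
    // The single, shared implementation of addition.
    public static T Sum<T>(IMonoid<T> m, IEnumerable<T> xs) =>
        xs.Aggregate(m.Identity, m.Combine);
}

class Demo
{
    static void Main()
    {
        var counts = new[] { 1, 1, 1 }; // three "ones"
        Console.WriteLine(Adder.Sum(new IntAddition(), counts)); // 3
    }
}

A new kind of object joins by declaring its own IMonoid instance, rather than by reimplementing addition; that is the one-place property I am after.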

I think Category Theory might have part of the answer.  More to come.

 


Representing Time (Temporal Logic)


I have decided that time should be part of the base instruction set, or innate knowledge, of the A.I. engine. I am going with a straightforward theory of time: time flows inexorably forward, and events are associated with either points or intervals in time, as on a timeline, with the familiar temporal logic of past, present and future. Theoretical physicists will be disappointed that I will skip more exotic models for now, such as the idea that time is an emergent phenomenon that is a side effect of quantum entanglement.

Back to the problem at hand. Language is filled with verb tenses to describe time, as in the following:

I arrived in Boston.
I am arriving in Boston.
I will arrive in Boston.

Each describes the same action, but at a different period in time, and all relative to “now”. Much of time is described relative to other events, with the most common of these being “now”, and it is rarely described exactly, as with a watch. This is another area for fuzzy logic (to be described in a future post). I will be working on this over the weekend.
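A first cut at such a base type in C# (the names are mine): an event carries a point or an interval on the timeline, and tense falls out of comparing it with “now”:

using System;

// An event sits at a point or spans an interval on the timeline.
class TemporalEvent
{
    public string Description;
    public DateTime Start;
    public DateTime? End; // null means a point event

    // Past / present / future relative to a reference moment.
    public string TenseRelativeTo(DateTime now)
    {
        DateTime end = End ?? Start;
        if (end < now) return "past";      // "I arrived in Boston."
        if (Start > now) return "future";  // "I will arrive in Boston."
        return "present";                  // "I am arriving in Boston."
    }
}

class Demo
{
    static void Main()
    {
        var arrival = new TemporalEvent { Description = "arrive in Boston",
                                          Start = DateTime.Now.AddHours(-1) };
        Console.WriteLine(arrival.TenseRelativeTo(DateTime.Now)); // past
    }
}

Events described relative to other events, rather than to the clock, would replace the DateTime fields with references to other TemporalEvents; that is where the fuzziness comes in.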

An update on Scopes or Context

In a previous post, I had started to think about object references (proper nouns) and how that should work. After more thought today, it seems that not only are the scopes or contexts a set of trees that get traversed to find an object reference, but the objects themselves must have their own scopes that get added into the search order. I haven’t figured out how all these pieces fit together yet. Something to look forward to for tomorrow.

One could imagine a story that introduces a known character and, by doing so, introduces that character’s metadata into the scope for future words.
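A minimal sketch of that search order in C# (all names invented): a scope tries its own names, then the scopes of objects introduced into it, then its parent:

using System;
using System.Collections.Generic;

// A scope resolves a name by checking itself, then the scopes of objects
// introduced into it, then its parent scope.
class Scope
{
    public Scope Parent;
    public Dictionary<string, object> Names = new Dictionary<string, object>();
    public List<Scope> ObjectScopes = new List<Scope>(); // e.g., a character's metadata

    public object Resolve(string name)
    {
        if (Names.TryGetValue(name, out var value)) return value;
        foreach (var s in ObjectScopes)
        {
            var v = s.Resolve(name);
            if (v != null) return v;
        }
        return Parent?.Resolve(name);
    }
}

class Demo
{
    static void Main()
    {
        var story = new Scope();
        var bob = new Scope();
        bob.Names["hair color"] = "brown";
        story.ObjectScopes.Add(bob); // introducing Bob splices in his metadata
        Console.WriteLine(story.Resolve("hair color")); // brown
    }
}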

A Light at the End of the Tunnel?


For most of my professional life, I have ignored the various goings-on within the A.I. community. First, as I have previously explained, I didn’t think that computers had the requisite computational power, and second, I figured that since existing research hadn’t solved the problem, the current line of thinking might cloud the way to an innovative alternative solution.

Fast forward to today: after what I hope was an epiphany on the Strong A.I. problem set, I have started to read the various papers of the last forty years or so. First, I am struck by how much early work has been done on such a difficult problem set. Second, sadly, many of my own thoughts have already been researched in depth by lots of different groups. That’s good news, because individual researchers have sometimes spent a lifetime on very narrow but important aspects of the problem set, but it does raise the question: why hasn’t anyone put all the various research together to build Strong A.I.?

Here are my thoughts on the subject. Over the last couple of days in particular, I have been reading Speech and Language Processing, by Jurafsky and Martin. I have almost finished reading the textbook and watching their videos online from Stanford. As I mentioned, I have been fortified by the approaches discussed in the book, because I have come to similar approaches independently.

I have a pretty good idea of how to build the Semantic Analysis step in the parser and generate First Order Predicate Logic and the resulting inference engine (a future post). But how to get to the last step? In particular, what to do after the Semantic Analysis step in the parser? My initial thought was to use “Actions as Meaning,” similar to Terry Winograd’s approach in his SHRDLU system. In his own words:

“One of the basic viewpoints underlying the model is that all language use can be thought of as a way of activating procedures within the hearer. We can think of an utterance as a program – one that indirectly causes a set of operations to be carried out within the hearer’s cognitive system.”

What does this mean? It means the knowledge representation system needs to be able to run functions (actions) and maintain state. Take, for instance, this exchange with SHRDLU:

Person: Pick up a big red block.
Computer: OK.
Person: Grasp the pyramid.
Computer: I don’t understand which pyramid you mean.
Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box.
Computer: By “it”, I assume you mean the block which is taller than the one I am holding.
Computer: OK.
Person: What does the box contain?
Computer: The blue pyramid and the blue block.

Such systems contain a model of the current state of their domain. But what does that mean for Strong A.I., where the domain is limitless? Does that mean the A.I. engine must model the entire universe from past to present to many possible futures?

Similar to the SHRDLU commands above, consider the following:

“Bob went to the store.”
and
“Bob goes to the store each week.”

In the first sentence, using “Action as Meaning”, we would model the movement of Bob and change his location to the store. The second would set up a recurring event of Bob’s trips to the store. To understand the meaning of English, does this level of detail need to be remembered and “acted” upon in the knowledge system?
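In code, the distinction between the two sentences might look like this minimal sketch (the types are hypothetical):

using System;
using System.Collections.Generic;

class WorldModel
{
    // Current state: where everyone is right now.
    public Dictionary<string, string> Location = new Dictionary<string, string>();

    // Standing facts the engine must keep replaying into the model.
    public List<string> RecurringEvents = new List<string>();
}

class Demo
{
    static void Main()
    {
        var world = new WorldModel();

        // "Bob went to the store."  -> a direct state change.
        world.Location["Bob"] = "the store";

        // "Bob goes to the store each week."  -> a recurring event.
        world.RecurringEvents.Add("Bob -> the store, weekly");

        Console.WriteLine(world.Location["Bob"]); // the store
    }
}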

My initial thought is yes, this level of detail does need to be remembered and acted upon. After all, that’s what we do as humans. Amazingly, we all keep these kinds of mental models around in our heads. Maybe the true genius is in how to create the model in such a way that the model itself doesn’t take up the same amount of physical matter as what we are modeling. At what level do you model? If we could, would we model all of Bob’s atoms and their interactions as he moves to the store? Clearly, our brains don’t have this information, so it must not be needed to build an artificial brain. However, many fields of study, such as microbiology, would appreciate that level of atomic modeling, since our brains are used to think about exactly such problems.

And maybe this is why no one has built an artificial brain yet. To do so means building a model flexible enough to model the universe: models of the infinitely small and the infinitely large, with enough common sense to abstract from one to the other.

An Ontology or CLR/JRE?


Many modern computer languages generate code against a backend or framework. For instance, Microsoft languages generate code that uses the CLR or Common Language Runtime. And similarly, Java uses the JRE or Java Runtime Environment. For these languages, their compilers generate code during the semantic analysis phase against these frameworks. The question is what should the framework be for our English language compiler?

Looking at it from a language point of view (as compared to a computer programming point of view), the backend could be viewed as an ontology, or structural framework for organizing information. The two are not quite the same, because the CLR and JRE are tools used to create an ontology rather than one itself. However, digging deeper, one could say that these computer language frameworks are in fact an ontology, not for the programs they create, but for the languages they support. Their ontology is a world where stacks, virtual machines, virtual processors and instructions exist.

These frameworks work within a world of base types such as integers, floating point, single characters, Booleans, strings, etc., or compound types such as arrays, pointers, records, structs, unions, classes, and so on. These types are constructed as aggregations of the base types and simple compound types. But all of these types have no ambiguity or fuzziness to them. When we say that a variable x has the value of 10, we are sure it does.

But for our English compiler, nothing is so simple. Take, for instance, the population of Gilford, NH, a summer resort town on Lake Winnipesaukee: it is 6,803 (according to Google, as of 2000). But that was likely the population during the summer months, at its peak. In the winter, the number would be much lower. I am not sure what the number is today or, for that matter, at any other period in history. And lastly, we have conflicting data: Wikipedia says the number is 7,126 as of 2010, and the town website says 7,320 (with no date). So, if asked “what is the population of Gilford”, how should we respond?

Another example of the problem set: Albert Einstein’s hair was white. However, that was during the later period of his life; during his younger years, he had black hair. So, again, when asked what color hair Albert had, the correct answer would be an array of values representing the different time periods in his life and the corresponding hair colors.

It turns out Wikipedia has a similar problem: lots of people with conflicting views providing information. On Wikipedia (or more correctly, WikiMedia), values are not a single answer, but a collection of many weighted possible answers. The weights are based on a number of factors, including the sources of the answer. This will require a new base type that can handle the fuzziness or ambiguity. For instance, the answer to the population might come back as 7,320 +/- 300 in 2012. Maybe, given several data points, it could extrapolate a value for 2014, but increase the ambiguity. And maybe the question should be re-formed to “what is the last known census figure for Gilford, NH?”
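A first sketch of such a base type in C# (all the names are mine): a value holds several weighted, sourced, dated candidates instead of one number:

using System;
using System.Collections.Generic;
using System.Linq;

// A fuzzy value: many candidate answers, each with a source,
// a weight, and the year it was reported.
class FuzzyValue
{
    public List<(double Value, double Weight, string Source, int Year)> Candidates
        = new List<(double, double, string, int)>();

    // Best single answer: the weighted consensus, plus a spread.
    public (double Estimate, double Spread) Answer()
    {
        double total = Candidates.Sum(c => c.Weight);
        double mean = Candidates.Sum(c => c.Value * c.Weight) / total;
        double spread = Candidates.Max(c => Math.Abs(c.Value - mean));
        return (mean, spread);
    }
}

class Demo
{
    static void Main()
    {
        var population = new FuzzyValue();
        population.Candidates.Add((6803, 0.5, "Google", 2000));
        population.Candidates.Add((7126, 1.0, "Wikipedia", 2010));
        population.Candidates.Add((7320, 0.8, "town website", 2012));
        var (estimate, spread) = population.Answer();
        Console.WriteLine($"{estimate:F0} +/- {spread:F0}"); // roughly 7123 +/- 320
    }
}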
Clearly this backend is going to be very different as compared to the CLR or JRE frameworks.

An English Compiler


Over the years, I have written several compilers/interpreters: Forth, Pascal, Basic, HyperTalk (from HyperCard), SQL, and dBase, to name a few. As the old adage goes, “if you have a hammer, everything looks like a nail”, and this problem set looked like just another compiler to me, albeit one with a few extra wrinkles.

So, this post discusses the current state of the English compiler:

1. Sentence determination: First you have to determine sentences from an input stream. Sadly, this is not as easy as you might think, because periods are sprinkled throughout English in things like “Dr.” and “$4.99”. I advocate that we remove all periods except at the ends of sentences. In the meantime, I wrote code to determine a true end of sentence, as sketched below.
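A minimal sketch of such an end-of-sentence test in C# (the abbreviation list is mine, and far from complete):

using System;
using System.Collections.Generic;

class SentenceSplitter
{
    // Words whose trailing period does NOT end a sentence, e.g. "Dr."
    static readonly HashSet<string> Abbreviations =
        new HashSet<string> { "dr", "mr", "mrs", "st", "etc" };

    static bool IsSentenceEnd(string text, int i)
    {
        if (text[i] != '.') return false;
        // "$4.99": a digit on both sides means a decimal point.
        if (i > 0 && i + 1 < text.Length &&
            char.IsDigit(text[i - 1]) && char.IsDigit(text[i + 1])) return false;
        // "Dr.": the word before the period is a known abbreviation.
        int start = i;
        while (start > 0 && char.IsLetter(text[start - 1])) start--;
        string word = text.Substring(start, i - start).ToLowerInvariant();
        return !Abbreviations.Contains(word);
    }

    static void Main()
    {
        string s = "Dr. Smith paid $4.99. He left.";
        for (int i = 0; i < s.Length; i++)
            if (IsSentenceEnd(s, i)) Console.WriteLine("sentence ends at " + i);
        // prints only the positions of the periods after "$4.99" and "left"
    }
}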

2. Lexical analysis breaks the English text into small pieces called tokens. Each token is a single atomic unit of the language. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner. I’ve decided to do word level parsing instead of using a character based one.

3. Preprocessing: Unlike a traditional compiler, where macro expansion, etc. occurs in this step, I decided now would be a good time to detect known character patterns such as times, dates, units, zip codes, telephone numbers, etc. At this point, the original string is still available, so it’s easier than later, where multiple tokens might get created for these known patterns.

4. Syntax analysis involves parsing the token sequence to identify the syntactic structure. This phase builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language’s syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler. It’s during this phase that we add word level tagging and determine word relationships.

5. Stemming, lemmatisation, word identification, word joining and metadata adornment are done next. In a previous post, we talked about stemming and lemmatisation, so I won’t go into them further here. During this phase, we combine proper nouns together to form full nouns. For instance, the two tokens for my name, “Peter” and “Chapman”, would be combined into a single token, “Peter Chapman”, as sketched below. Also, during this step, we examine each token and corresponding word and adorn the token with additional information required in future steps. For instance, we might add metadata that a particular token is a floating point number, etc.
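A sketch of the proper-noun joining pass in C# (the token type is hypothetical):

using System;
using System.Collections.Generic;

class Token
{
    public string Text;
    public bool IsProperNoun;
    public List<string> Metadata = new List<string>(); // adorned in this phase
}

class Joiner
{
    // Combine runs of adjacent proper nouns: "Peter" + "Chapman" => "Peter Chapman".
    static List<Token> JoinProperNouns(List<Token> tokens)
    {
        var result = new List<Token>();
        foreach (var t in tokens)
        {
            var last = result.Count > 0 ? result[result.Count - 1] : null;
            if (t.IsProperNoun && last != null && last.IsProperNoun)
                last.Text += " " + t.Text;
            else
                result.Add(t);
        }
        return result;
    }

    static void Main()
    {
        var tokens = new List<Token> {
            new Token { Text = "Peter", IsProperNoun = true },
            new Token { Text = "Chapman", IsProperNoun = true },
            new Token { Text = "writes" } };
        Console.WriteLine(JoinProperNouns(tokens)[0].Text); // Peter Chapman
    }
}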

This is where I am today. The next step is what I am going to work on tomorrow.

6. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and performs semantic checking. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.

Progress: Stemming and Lemmatisation

A writer is someone who writes, and a stinger is something that
stings. But fingers don’t fing, grocers don’t groce, haberdashers
don’t haberdash, hammers don’t ham, and humdingers don’t
humding.
–Richard Lederer, Crazy English

Just finished coding and testing the stemming routines.

A stemmer is a function that returns the root of a word. The de facto gold standard is the Porter Stemmer algorithm.

In many languages, words appear in several inflected forms. For example, in English, the verb ‘to walk’ may appear as ‘walk‘, ‘walked‘, ‘walks‘, ‘walking‘. The base form, ‘walk‘, that one might look up in a dictionary, is called the lemma for the word. The combination of the base form with the part of speech is often called the lexeme of the word.
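Just to show the flavor, here is a naive suffix-stripper in C#; this is not the real Porter algorithm, which applies ordered rule sets with measure conditions:

using System;

class NaiveStemmer
{
    // Strip common inflectional suffixes from a word.
    static string Stem(string word)
    {
        foreach (var suffix in new[] { "ing", "ed", "s" })
            if (word.EndsWith(suffix) && word.Length > suffix.Length + 2)
                return word.Substring(0, word.Length - suffix.Length);
        return word;
    }

    static void Main()
    {
        foreach (var w in new[] { "walk", "walked", "walks", "walking" })
            Console.WriteLine(Stem(w)); // walk, walk, walk, walk
    }
}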

Lemmatisation (or lemmatization), in linguistics, is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.

I also finished adding lemmatisation today. See Wikipedia for more information.

One of the things you learn about the world of natural language processing is that it has a lot of made-up words.

Innate Knowledge?


It seems that some in the field of A.I. wish to start with the nerve cell as the basic building block. However, building floating point arithmetic out of nerve cells seems to me like quite a challenge (I have written these functions in assembly language). I decided to jump ahead several million years in evolution. A nerve cell is too low level for me. That’s not to say that we will forgo some of the underlying principles, but the nerve cell doesn’t need to be our basic building block.

The first microprocessors did not support floating point instructions. Instead, floating point functions were created out of more primitive integer instructions. Somewhere in the early history of microprocessors, floating point instructions were added to the base instruction set.

This leads to a most important question: what is the base instruction set for Strong A.I.? Or, another way of thinking about it: what is Strong A.I.’s innate knowledge? Should integer instructions be included? How about floating point? What other concepts? How about time? How about things it can never directly experience, such as color? Wouldn’t it be better to have a sighted person define a concept of color as part of the base instruction set, rather than trying to teach a blind program the concept? Is the base instruction set immutable? Over time, can the A.I. engine come to its own meaning, replacing or adding to the meaning given to it by the original programmers?

And it is clear that not all learning can come from the base instruction set. Somewhere you have to bite the bullet and get to the A.I. piece. And maybe in this sense, starting with nerve cells allows you to focus on this question without all the clutter.

I don’t have all the answers yet, but very interesting questions, indeed!