Wednesday, 30 July 2008

Knowledge Storage

I'm back at work on Egor now.

While the old version handled sentences such as 'what is a cat?', I now want to extend this fully to WH questions (what, why, where etc.), such as 'where do you live?'.

The old system was a bit of a bodge. Now, when a question such as this is formulated, it adds an entry for the 'WH-word' unknown into the knowledge tree... it can either identify the answer now or perhaps come up with the answer at a later time when it has more knowledge.
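To make the idea concrete, here is a rough sketch of a question leaving a 'WH-word' placeholder in the tree until a later statement fills it in. This is not Egor's actual code: the KnowledgeTree class, the '?' prefix for unknowns and the answer 'in a house' are all made up for illustration.

```python
UNKNOWN = "?"  # hypothetical prefix marking an unanswered WH-word entry


class KnowledgeTree:
    """Nested dicts: each key is a node label, each value its child nodes."""

    def __init__(self):
        self.root = {}
        self.resolved = []  # (path, wh_word, answer) for questions answered later

    def _walk(self, path):
        # Follow the path from the root, creating nodes that do not exist yet.
        node = self.root
        for label in path:
            node = node.setdefault(label, {})
        return node

    def ask(self, path, wh_word):
        node = self._walk(path)
        answers = [k for k in node if not k.startswith(UNKNOWN)]
        if answers:
            return answers  # the answer is already known
        node[UNKNOWN + wh_word] = {}  # remember the unknown for later
        return None

    def tell(self, path, value):
        node = self._walk(path)
        # Any unknown waiting at this point in the tree can now be resolved.
        for key in [k for k in node if k.startswith(UNKNOWN)]:
            del node[key]
            self.resolved.append((tuple(path), key[len(UNKNOWN):], value))
        node[value] = {}


tree = KnowledgeTree()
first = tree.ask(["you", "live"], "where")   # nothing known yet
tree.tell(["you", "live"], "in a house")     # made-up fact arrives later
second = tree.ask(["you", "live"], "where")  # now answerable
```

Here the first ask returns nothing and leaves the unknown in the tree, while the later statement both stores the fact and clears the pending unknown.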

An interesting thing happens when you look at slightly more complex variants of these questions.

For instance:
------------------------------------
The cat eats sardines in the kitchen.
The cat eats mice in the garden.

Where does the cat eat sardines?
------------------------------------

Initially I was storing the facts that the cat eats sardines and that the cat eats mice on separate branches (subtrees) from the subject. However, it occurred to me that reusing branches may be the way to go, both for efficient compression of the information and for speedy and efficient access to it.
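As a rough illustration of the difference, the sketch below stores the two example sentences both ways and counts the nodes. The nested-dict layout and the helper names (insert, count_nodes) are assumptions for the example, not how Egor actually stores things.

```python
def insert(tree, parts):
    """Add a chain of nodes, reusing any prefix that already exists."""
    node = tree
    for part in parts:
        node = node.setdefault(part, {})


def count_nodes(tree):
    return sum(1 + count_nodes(child) for child in tree.values())


facts = [
    ("cat", "eats", "sardines", "in the kitchen"),
    ("cat", "eats", "mice", "in the garden"),
]

# Separate branches: every sentence gets its own subtree under the subject.
separate = {}
for subject, *rest in facts:
    branch = {}
    insert(branch, rest)
    separate.setdefault(subject, []).append(branch)
separate_count = sum(
    1 + sum(count_nodes(b) for b in branches) for branches in separate.values()
)

# Shared branches: 'eats' is stored once and both sentences hang off it.
shared = {}
for fact in facts:
    insert(shared, fact)
shared_count = count_nodes(shared)

# 'Where does the cat eat sardines?' becomes a walk down one path.
where = list(shared["cat"]["eats"]["sardines"])
```

With only two sentences the saving is a single node (the duplicated 'eats'), but the shared version also means there is only one place to look when answering the question.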

However, once you start compressing the information, another 'issue' appears:

If you store 'the cat eats sardines in the kitchen' in one tree, the order of the object and the supplementary information essentially doesn't matter...

i.e. the cat eats in the kitchen sardines = the cat eats sardines in the kitchen.

Once you start compressing several sentences of information into the same subtree, you then have to start considering the order of the information.

Thus: The cat eats sardines in the kitchen, The cat eats tuna in the kitchen...

You may start to think of this as a hierarchy: cat -> eats -> in the kitchen -> sardines / tuna
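A small sketch of that hierarchy shows why the order starts to matter. It assumes a nested-dict tree with the place above the object; the function names are invented for the example.

```python
facts = [
    ("sardines", "in the kitchen"),
    ("tuna", "in the kitchen"),
    ("mice", "in the garden"),
]

# Build cat -> eats -> place -> object.
place_first = {}
for obj, place in facts:
    place_first.setdefault(place, {})[obj] = {}
tree = {"cat": {"eats": place_first}}


def what_eaten(tree, place):
    """'What does the cat eat in <place>?' is a direct lookup."""
    return sorted(tree["cat"]["eats"].get(place, {}))


def where_eaten(tree, obj):
    """'Where does the cat eat <obj>?' has to scan every place node."""
    return sorted(
        place for place, objs in tree["cat"]["eats"].items() if obj in objs
    )


kitchen_food = what_eaten(tree, "in the kitchen")
sardine_places = where_eaten(tree, "sardines")
```

With the place stored above the object, 'what' questions about a place are cheap, but the original 'where' question now needs a search across all the places.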

However, this has many implications. Firstly, you can no longer directly store information as generics (i.e. in tree terms the 'in the kitchen' needs to be a distinct node with child nodes of its own). This is an added level of complexity, so we would have to be sure we were getting a payback for it.

In addition, once you start to consider several pieces of supplementary information for a sentence, the optimum storage arrangement may not be obvious (i.e. how you are going to regularly access this information determines the best tree structure).
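One crude way to explore this is to build the same facts under every possible ordering and compare the results. The sketch below only compares node counts, which is just one possible measure (how often each kind of question gets asked would matter at least as much); the slot names 'object' and 'place' are assumptions for the example, and more slots could be added to the list.

```python
from itertools import permutations

facts = [
    {"object": "sardines", "place": "in the kitchen"},
    {"object": "tuna", "place": "in the kitchen"},
    {"object": "mice", "place": "in the garden"},
]


def build(facts, slot_order):
    """Store each fact as a chain of nodes, one level per slot."""
    tree = {}
    for fact in facts:
        node = tree
        for slot in slot_order:
            node = node.setdefault(fact[slot], {})
    return tree


def count_nodes(tree):
    return sum(1 + count_nodes(child) for child in tree.values())


# Node count for every ordering of the slots below 'cat -> eats'.
sizes = {
    order: count_nodes(build(facts, order))
    for order in permutations(["object", "place"])
}
smallest = min(sizes, key=sizes.get)
```

For these three sentences, putting the place first gives the smaller tree because two of them share 'in the kitchen', but a different set of sentences could easily tip it the other way.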

As I am modelling things according to how biological systems tend to work, there is also the point that biological systems often take the simplest path (making complexity from simple rules) rather than working with a complex 'operating system'. I.e. there is a danger of anthropomorphising the problem: producing a computer science solution instead of a simpler (possible) biological solution.

I am not sure which one to go with at the moment, because it seems a major design issue. I may well start by experimenting with the simple approach. It may turn out to be incorrect (and later need a considerable rewrite), but the fact is that the whole project is a huge undertaking and I would rather have a simple system working than a more complex system that I didn't have nearly enough time to get to a working state.

In essence, I can't hope to get everything perfectly right and optimal on my first attempts. I think this is something that will be refined over many decades to come, towards one or several optimal solutions.
