#include "common.h"
#ifdef INFORMATION
This file covers routines that create and access a "dictionary entry" (WORDP) and the "meaning" of words (MEANING).
The dictionary acts as the central hash mechanism for accessing various kinds of data.
The dictionary consists of data imported from WORDNET 3.0 (copyright notice at end of file) + augmentations + system and script pseudo-words.
A word also gets the WordNet meaning ontology (->meanings & ->meaningCount). The definition of meaning in WordNet
is words that are synonyms in some particular context. Such a collection in WordNet is called a synset.
Since words can have multiple meanings (and be multiple parts of speech), the flags of a word are a summary
of all of the properties it might have, and it has a list of entries called "meanings". Each entry is a MEANING
and points to the circular list, one of which marks the word you land at as the synset head.
This is referred to as the "master" meaning and has the gloss (definition) of the meaning. The meaning list of a master node points back to
all the real words which comprise it.
Since WordNet has an ontology, its synsets are hooked to other synsets in various relations, in particular that
of parent and child. ChatScript represents these as facts. The hierarchy relation uses the verb "is" and
has the child as subject and the parent as object. Such a fact runs from the master entries, not any of the actual
word entries. So to see if "dog" is an "animal", you could walk every meaning list of the word animal and
mark the master nodes they point at. Then you would search every meaning of dog, jumping to the master nodes,
then look at facts with the master node as subject and the verb "is" and walk up to the object. If the object is
marked, you are there. Otherwise you take that object node as subject and continue walking up. Eventually you arrive at
a marked node or run out at the top of the tree.
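
The walk described above can be sketched as follows. This is only an illustration under stated assumptions: the Links maps, the node names and the IsA function are hypothetical stand-ins, not the ChatScript WORDP, MEANING or fact structures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins, not the ChatScript types:
//   masters: word -> master nodes reached from its meaning list
//   parents: master node -> objects of the "is" facts that have it as subject
using Links = std::unordered_map<std::string, std::vector<std::string>>;

bool IsA(const Links& masters, const Links& parents,
         const std::string& child, const std::string& ancestor)
{
    // Step 1: mark every master node that the ancestor word points at.
    auto target = masters.find(ancestor);
    if (target == masters.end()) return false;
    std::unordered_set<std::string> marked(target->second.begin(), target->second.end());

    // Step 2: start from every master node of the child word.
    auto start = masters.find(child);
    if (start == masters.end()) return false;
    std::vector<std::string> pending = start->second;

    // Step 3: follow "is" facts upward until a marked node or the top is reached.
    std::unordered_set<std::string> seen;
    while (!pending.empty()) {
        std::string node = pending.back();
        pending.pop_back();
        if (marked.count(node)) return true;      // arrived at a marked node
        if (!seen.insert(node).second) continue;  // already walked this node
        auto up = parents.find(node);
        if (up == parents.end()) continue;        // ran out at the top of the tree
        for (const auto& parent : up->second) pending.push_back(parent);
    }
    return false;
}
```

In this sketch a starting master node that is itself marked also counts as a match, so a word is trivially an instance of itself.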
Some words DO NOT have a master node. Their meaning is defined to be themselves (things like pronouns or determiners), so
their meaning value for a meaning merely points to themselves.
The meaning system is established by building the dictionary and NEVER changes thereafter by chatbot execution.
New words can transiently enter the dictionary for various purposes, but they will not have "meanings".
A MEANING is a reference to a specific meaning of a specific word. It is an index into the dictionary
(identifying the word) and an index into the meaning list of that word (identifying the specific meaning).
A meaning list index of 0 refers to all meanings of the word. A meaning index of 0 can also be type restricted
so that it only refers to noun, verb, adjective, or adverb meanings of the word.
Since there are only two words in WordNet with more than 63 meanings (break and cut) we limit all words to having no
more than 63 meanings by discarding the excess meanings. Since meanings are stored most important first,
this is no big loss. This leaves room for the 5 essential type flags used for restricting a generic meaning.
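
A minimal sketch of how a word index, a meaning index of at most 63, and 5 type flags can share one 32-bit value. The bit positions and every name below are assumptions chosen for illustration; they are not taken from the ChatScript headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, chosen only to illustrate the idea:
//   bits 0-20  word index, bits 21-26 meaning index (0-63), bits 27-31 type flags.
using PackedMeaning = std::uint32_t;

constexpr unsigned kWordBits = 21;
constexpr unsigned kIndexBits = 6;
constexpr PackedMeaning kWordMask = (1u << kWordBits) - 1;
constexpr PackedMeaning kIndexMask = (1u << kIndexBits) - 1;       // 63
constexpr PackedMeaning kNounOnly = 1u << (kWordBits + kIndexBits); // one of 5 flag bits

constexpr PackedMeaning Pack(std::uint32_t word, std::uint32_t index,
                             PackedMeaning typeFlags = 0)
{
    return (word & kWordMask) | ((index & kIndexMask) << kWordBits) | typeFlags;
}

constexpr std::uint32_t WordOf(PackedMeaning m) { return m & kWordMask; }
constexpr std::uint32_t IndexOf(PackedMeaning m) { return (m >> kWordBits) & kIndexMask; }

// A meaning index of 0 stands for all meanings of the word (optionally type restricted).
constexpr bool IsGeneric(PackedMeaning m) { return IndexOf(m) == 0; }
```

Capping the meaning index at 63 is what keeps it within 6 bits, which is how the cap leaves room for the type flags in the same value.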
Space for dictionary words comes from a common pool. Dictionary words are
allocated linearly forward in the pool. Strings have their own pool (the heap).
All dictionary entries are indexable as a giant array.
The dictionary separately stores uppercase and lowercase forms of the same words (since they have different meanings).
There is only one uppercase form stored, so United and UnItED would be saved as one entry. The system will have
to decide which case a user intended, since they may not have bothered to capitalize a proper noun, or they
may have shouted a lowercase noun, and a noun at the start of the sentence could be either proper or not.
Dictionary words are hashed as lower case, but if the word has an upper case letter it will be stored
in the adjacent higher bucket. Words of the basic system are stored in their appropriate hash bucket.
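
A small sketch of the bucket choice described above, assuming an FNV-1a hash over the lowercased bytes. The hash function, the Bucket struct and BucketFor are hypothetical; the actual ChatScript hash is not shown in this text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: the chosen bucket and whether the word had an upper case letter.
struct Bucket { std::size_t index; bool upper; };

Bucket BucketFor(const std::string& word, std::size_t bucketCount)
{
    unsigned long long hash = 14695981039346656037ULL; // FNV-1a offset basis (assumed hash)
    bool upper = false;
    for (unsigned char c : word) {
        if (std::isupper(c)) upper = true;
        hash ^= static_cast<unsigned char>(std::tolower(c)); // hash as lower case
        hash *= 1099511628211ULL;                            // FNV-1a prime
    }
    std::size_t index = static_cast<std::size_t>(hash % bucketCount);
    if (upper) ++index; // adjacent higher bucket; the table needs one spare slot at the end
    return { index, upper };
}
```

Because the hash ignores case, United and UnItED land in the same bucket, one above the bucket used by united.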
After the basic system is read in, the dictionary is frozen. This means it remembers the spots the allocation
pointers are at for the dictionary and heap space and is using mark-release memory management.
The hash buckets are themselves dictionary entries, sharing space. After the basic layers are loaded,
new dictionary entries are always allocated. Until then, if the word hashes to an empty bucket, that bucket
becomes the dictionary entry being added.
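
Mark-release management over a linearly allocated pool can be sketched as below. MarkReleasePool and its members are hypothetical names for illustration and track offsets only; the real allocator manages dictionary entries and heap strings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pool that hands out slots linearly and can roll back to a remembered mark.
class MarkReleasePool {
public:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

    explicit MarkReleasePool(std::size_t capacity) : capacity_(capacity) {}

    // Allocate count slots forward in the pool; returns the first slot or kNone if full.
    std::size_t Allocate(std::size_t count)
    {
        if (count > capacity_ - next_) return kNone;
        std::size_t at = next_;
        next_ += count;
        return at;
    }

    void Freeze() { mark_ = next_; }   // remember where the allocation pointer is
    void Release() { next_ = mark_; }  // discard everything allocated after the mark
    std::size_t Used() const { return next_; }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
    std::size_t mark_ = 0;
};
```

Releasing costs one assignment regardless of how many transient entries were added, which is the appeal of this scheme for words that only live for a short time.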
We mark synset entries with the word & meaning number & POS of word in the dictionary entry. The POS is not used explicitly by much of the system
but is needed when seeing the dictionary definitions (:word) and if one wants to use pos-restricted meanings in a match or in keywords.
#endif
#define DICTSEARCHLIMIT 1000
bool exportdictionary = false;
bool warnedaboutbuild0 = false;
char current_language[150] = ""; // indicate current language used
char language_list[400]; // indicate legal languages, comma separated
unsigned int wordhash;
bool hasUpperCharacters, hasUTF8Characters, hasSeparatorCharacters;
unsigned int fullhash;
unsigned int language_bits = LANGUAGE_UNIVERSAL; // current language used to mark/filter WORDP items
unsigned int languageIndex; // current language used to mark facts and index data by language
unsigned int max_language = LANGUAGE_1;
unsigned int max_fact_language = FACTLANGUAGE_1;
static WORDP startOfLanguage = NULL;
static WORDP endOfLanguage = NULL;
int worstDictAvail = 1000000;
bool dictionaryBitsChanged = false;
bool multidict = false;
unsigned int languageCount = 1;
HEAPREF propertyRedefines = NULL; // property changes on locked dictionary entries
HEAPREF flagsRedefines = NULL; // systemflags changes on locked dictionary entries
HEAPREF ongoingDictChanges = NULL; // ability to revert dynamic changes
HEAPREF ongoingUniversalDictChanges = NULL; // ability to revert dynamic changes
bool monitorDictChanges = false;
bool xbuildDictionary = false; // indicate when building a dictionary
char dictionaryTimeStamp[50]; // indicate when dictionary was built
const char* mini = ""; // what language
unsigned int* hashbuckets = 0;
#define ALL_CONTROLBITS 0xFFFF000000000000ULL // 6 bits controls + other data
static unsigned char* writePtr; // used for binary dictionary writes
// memory data
unsigned long maxHashBuckets = MAX_HASH_BUCKETS;
bool setMaxHashBuckets = false;
uint64 maxDictEntries = MAX_DICTIONARY;
MEANING posMeanings[64]; // concept associated with propertyFlags of WORDs
MEANING sysMeanings[64]; // concept associated with systemFlags of WORDs
#include