From: Michele Andreoli ([email protected])
Date: Thu Apr 05 2001 - 19:13:12 CEST
I managed to add a rudimentary form of fuzzy logic to the "tell" command.
This is a fascinating problem, and I'm enjoying working on it.
Fuzzy (i.e. approximate) pattern matching is being introduced in muLinux
in the standard muLinux way, i.e. the rustic way :-)
You know the background: "tell" consults a DB to find the translation of
a sentence. The DB is simply a file containing a list of pairs (x,y), where
x="rustic English sentence" and y="translated sentence". Sentences
may span several lines.
Example: a segment from it.db
=======================================================================
^B
You may want to give the PASSWORD and/or an USER_NAME
required by the server. Otherwise, leave blank.
^A
Potresti voler specificare la PASSWORD e/o l'utente
(USER_NAME) richieste dal server. Altrimenti, lascia bianco.
=======================================================================
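For illustration, a minimal Python sketch that reads entries like the one above into (x, y) pairs. It assumes, from this excerpt alone, that a line holding only ^B (or the raw control character \x02) starts an English sentence "x" and a line holding only ^A (or \x01) starts its translation "y"; the real it.db layout may differ, and parse_db is a hypothetical name.

```python
# Hypothetical reader for a tell-style DB; the marker convention is an assumption.
X_MARKERS = {"^B", "\x02"}  # assumed: starts the English sentence "x"
Y_MARKERS = {"^A", "\x01"}  # assumed: starts the translated sentence "y"


def parse_db(text: str) -> list[tuple[str, str]]:
    """Return (x, y) pairs; multi-line sentences are joined with single spaces."""
    pairs: list[tuple[str, str]] = []
    x_lines: list[str] = []
    y_lines: list[str] = []
    mode = None  # None, "x" or "y": which sentence the current line belongs to

    def flush() -> None:
        # Store the pending pair (only if both halves were seen) and reset.
        if x_lines and y_lines:
            pairs.append((" ".join(x_lines), " ".join(y_lines)))
        x_lines.clear()
        y_lines.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line in X_MARKERS:
            flush()  # a new "x" closes the previous pair
            mode = "x"
        elif line in Y_MARKERS:
            mode = "y"
        elif line and mode == "x":
            x_lines.append(line)
        elif line and mode == "y":
            y_lines.append(line)
    flush()
    return pairs
```

Multi-line sentences are flattened to one line here so that they can later be compared word by word.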
The problem is this: if someone modifies "y", nothing breaks, because "tell"
uses the "x" field to look up a translation.
But what if I modify "x" in the Setup scripts? Then "tell" cannot find
the right "y", because sentences aren't numbered.
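To see why edits to "x" hurt, here is a plain exact-match lookup in Python, with a one-entry table taken from the example above. The name exact_tell and the choice to echo the input on a miss are illustrative assumptions, not muLinux code.

```python
# Illustrative exact-match lookup: the key must be identical, character for character.
TABLE = {
    "leave blank": "lascia bianco",
}


def exact_tell(sentence: str) -> str:
    """Return the translation, or the sentence itself when no exact key exists."""
    return TABLE.get(sentence, sentence)


print(exact_tell("leave blank"))     # lascia bianco
print(exact_tell("leave it blank"))  # leave it blank: a tiny edit to "x" breaks the lookup
```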
So I introduced approximate pattern matching in AWK: if the changes to "x"
are minor, the correct "y" is still chosen.
I also introduced a threshold: if score < threshold, the "tell"
program simply outputs "x" instead of "y".
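The threshold rule can be written independently of how scores are computed. This Python sketch assumes each candidate already carries a score between 0 and 1; choose is a hypothetical helper, and the default threshold of 0.5 is a placeholder, not the value muLinux uses.

```python
from typing import Iterable, Optional


def choose(query: str, candidates: Iterable[tuple[str, str, float]],
           threshold: float = 0.5) -> str:
    """Return the y of the best-scoring (x, y, score) candidate,
    or the untranslated query if no score reaches the threshold."""
    best_y: Optional[str] = None
    best_score = -1.0
    for _x, y, score in candidates:
        if score > best_score:
            best_y, best_score = y, score
    if best_y is None or best_score < threshold:
        return query  # below threshold: output "x", not "y"
    return best_y
```

A score exactly equal to the threshold is accepted, matching the "score < threshold" wording.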
"Tell" is "changes-tolerant".
How does it work? (still experimental)
--------------------------------------
When you write:
tell "my name is Bond"
the program scans the DB and builds a dictionary for each "x"
sentence: every word is simply replaced by a single character A, B, C, ...
Both sentences are then rewritten with that dictionary as single
strings such as ACDE, ABFH, etc.
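A Python sketch of the encoding step as described above: letters are assigned to the words of the DB sentence in order of first appearance (so in "is Bond from England?", "is" becomes A and "Bond" becomes B), and words of the typed sentence that are not in that dictionary are dropped. Whitespace tokenization, case-sensitive comparison and the name encode are assumptions; with one letter per word it handles at most 26 distinct words.

```python
from string import ascii_uppercase


def encode(db_sentence: str, query: str) -> tuple[str, str]:
    """Map each distinct DB word to A, B, C, ... and rewrite both sentences with it."""
    dictionary: dict[str, str] = {}
    for word in db_sentence.split():
        if word not in dictionary:
            # Assumes at most 26 distinct words (IndexError otherwise).
            dictionary[word] = ascii_uppercase[len(dictionary)]
    db_code = "".join(dictionary[w] for w in db_sentence.split())
    # Query words missing from the dictionary (e.g. "my", "name") are skipped.
    query_code = "".join(dictionary[w] for w in query.split() if w in dictionary)
    return db_code, query_code


print(encode("is Bond from England?", "my name is Bond"))  # ('ABCD', 'AB')
```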
Example: if the program finds "is Bond from England?", that sentence
is converted to "ABCD", and "my name is Bond" is converted to "AB",
because "my" and "name" are not in the dictionary.
At this point I can use AWK's fast built-in functions that handle
regular expressions and wildcards, such as match():
match("ABCD","[AB]+")
mawk also returns the length of the match (in RLENGTH), in this case 2,
so the score is set to 2/4 = 50%.
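In Python, re.search can stand in for AWK's match(): the span of the returned match object corresponds to what AWK stores in RSTART and RLENGTH. This sketch builds a character class from the encoded query and divides the match length by the length of the encoded DB sentence, which reproduces the 50% figure; that normalization is inferred from the example rather than taken from the tell source, and score is a hypothetical name.

```python
import re


def score(db_code: str, query_code: str) -> float:
    """Length of the leftmost run of query letters in db_code, as a fraction of db_code."""
    if not db_code or not query_code:
        return 0.0
    # Character class such as "[AB]+"; re.escape keeps it safe for non-letter codes.
    pattern = "[" + re.escape(query_code) + "]+"
    m = re.search(pattern, db_code)
    if m is None:
        return 0.0
    rlength = m.end() - m.start()  # what AWK puts in RLENGTH
    return rlength / len(db_code)


print(score("ABCD", "AB"))  # 0.5
```

Like match(), re.search returns the leftmost match, and the greedy + makes it as long as possible, so for a single character class both give the same length.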
Who knows!
Michele
-- 
In summing up, I wish I had some kind of affirmative message to leave you with, I don't. Would you take two negative messages? - Woody Allen
This archive was generated by hypermail 2.1.6 : Sat Feb 08 2003 - 15:27:18 CET