Loebner Prize 2009
US$3000 and a Bronze Annual Medal
In conjunction with InterSpeech 2009
In 1950 Alan Turing wrote:
“…I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning…”
Turing’s
prediction is ambiguous. Did he mean a 5 minute test, or did he mean 5
minutes of questioning the program? If the latter, and presuming that the human
would also be subjected to a 5 minute questioning period, the test itself would
take 10 minutes.
The 2008 Loebner Prize put this prediction to the test in the first manner by holding 5 minute Turing Tests. That is, each judge was allowed a total of 5 minutes to interact with both entities. As a consequence, the expected interaction time with the computer program was 2.5 minutes.
The 2009 Loebner Prize will test Turing’s assertion in the second manner. The 2009 rules will require each judge to interrogate each entity for 5 minutes.
[Note that there
is also the ambiguity of the “70 per cent chance,” since if the computer were
able to respond as a human, we would still only expect it to be chosen as the
human 50 per cent of the time.]
Since the
questioning period will only be 10 minutes, the US$25,000 and the Silver Medal
will not be at risk.
Rules for Loebner
Prize 2009.
1:
IMPORTANT DATES:
The 2009
Competition is scheduled for
•
•
•
The date and
venue are subject to change but NO changes to the date and venue will be made
after
Entrants have
three options for submitting their programs.
A. They may submit their entries on CD, DVD or USB Flash via a message service
• requiring a receipt signature and
• having a time/date stamp (e.g. Certified, Registered, FedEx, UPS, etc.)
B. They may
install their programs at the testing site on a management supplied computer.
C. They may bring
a computer with the program installed to the testing site.
Entrants choosing
option A. should transmit their entry to:
Loebner Prize Contest
c/o Crown Industries, Inc.
Entrants choosing
options B. or C. must schedule the time/date of their appearance with me PRIOR
to 6 Apr. The date to be scheduled must be after 6 Apr and prior to 4
May.
Final Four
entrants who chose submission option A. do NOT have to be present at the
competition. Those who choose options B. or C. MUST be present to install and
operate their entries.
No entry will be
tested by contest management which requires contest management to key in path
names.
No entry will be
tested by contest management which requires contest management to modify system
variables (although these may be modified by a supplied installer).
No entry will be
tested by contest management which does not provide, on the transmittal media,
all necessary programs, interpreters, etc (e.g. Perl, MySQL, etc).
No person may be
affiliated with more than one entry.
Every entry must
be accompanied by a statement asserting that the submitter(s) have intellectual
rights to all components of the entry.
Entrants under 18 years of age must have written permission from at least one guardian.
Only the first 16 compliant entries will be evaluated in depth. This means that all entries will be tested, in order of receipt, for compliance with the rules. The 16 compliant entries having the earliest time stamps will be screened according to the criteria in point 4, below.
If there is no
compliant Entry for the 2009 Competition, the $3000 prize will be added to the
2010 Competition prize making the 2010 prize $6000, and the 2010 Competition
will be held under these rules.
2:
COMMUNICATIONS PROTOCOL.
The Loebner Prize Protocol (LPP) will be used in the 2009 Competition. Each Entry Program must communicate with a "Judge Communications" program in the following manner:
The LPP is a character
by character asynchronous communications protocol.
Each program,
upon startup, must provide a “browse” function to select a directory.
Communications shall be by means of the creation, detection, and deletion of
sub-directories within the specified communications directory.
To simulate a key press the entry program
must create a sub-directory within the communications directory with the
following format:
“time.keypress-name.extension”
where “time” is a monotonically increasing 18 digit number, zero filled to the left so that lexical and numerical order agree, to be retrieved from the system clock and expressed as milliseconds past some initial time as defined by the system clock.
“keypress-name” is either a single letter (case sensitive) or the name of the special character, as listed in the Appendix to these rules.
The extension is “.other”.
For example:
“000001234567890123.bracketleft.other”
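The naming scheme above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the official implementation: the function names (`next_stamp`, `keypress_dirname`, `send_key`) are hypothetical, only a subset of the Appendix names is included, and characters that are neither letters nor listed in the Appendix (digits, for instance) are passed through unchanged, which the rules do not specify.

```python
import os
import time

# Subset of the special-character names listed in the Appendix.
SPECIAL_NAMES = {
    " ": "space", "[": "bracketleft", "]": "bracketright",
    ",": "comma", ".": "period", "?": "question",
    "\n": "Return", "\t": "Tab",
}

_last_stamp = 0  # last timestamp issued; keeps stamps strictly increasing


def next_stamp():
    """Return an 18 digit, zero-filled millisecond timestamp that never repeats."""
    global _last_stamp
    now_ms = time.time_ns() // 1_000_000  # milliseconds from the system clock
    _last_stamp = max(now_ms, _last_stamp + 1)
    return f"{_last_stamp:018d}"


def keypress_dirname(stamp, char, extension="other"):
    """Build "time.keypress-name.extension" for one character."""
    # Assumption: anything not in the table is used as its own name.
    name = SPECIAL_NAMES.get(char, char)
    return f"{stamp}.{name}.{extension}"


def send_key(comm_dir, char):
    """Simulate one key press by creating the sub-directory; return its path."""
    path = os.path.join(comm_dir, keypress_dirname(next_stamp(), char))
    os.mkdir(path)
    return path
```

Calling `send_key(comm_dir, "[")` would create a sub-directory whose name ends in `.bracketleft.other` inside the chosen communications directory.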
To detect a key press by the judge, the
program must detect, within the communications directory a sub-directory with
the same format, but extension “.judge” and then must remove or delete the
judge’s sub-directory from the communications directory.
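Reading the judge's key presses could look like the sketch below. It assumes polling with `os.listdir`, relies on the zero-filled timestamps sorting in chronological order, and uses a hypothetical function name `read_judge_keys`; only a few Appendix names are mapped back to characters, and unmapped names such as `BackSpace` are returned as-is for the caller to handle.

```python
import os

# Subset of the Appendix, mapping names back to characters.
NAME_TO_CHAR = {
    "space": " ", "period": ".", "comma": ",",
    "question": "?", "Return": "\n", "Tab": "\t",
}


def read_judge_keys(comm_dir):
    """Collect pending judge key presses in timestamp order and delete them."""
    keys = []
    # Zero-filled timestamps make lexical order equal to chronological order.
    for entry in sorted(os.listdir(comm_dir)):
        parts = entry.split(".")
        if len(parts) != 3 or parts[2] != "judge":
            continue  # not a judge key press (e.g. our own ".other" entries)
        path = os.path.join(comm_dir, entry)
        if not os.path.isdir(path):
            continue
        os.rmdir(path)  # the rules require removing the judge's sub-directory
        keys.append(NAME_TO_CHAR.get(parts[1], parts[1]))
    return keys
```

An entry program would call this repeatedly (for example a few times per second) and append the returned characters to the judge's current line.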
A previous
version of the judge program is available at:
http://loebner.net/Prizef/JComm.txt
In order to run
this as a Perl program, change the extension from .txt to .pl (or whatever
extension is assigned to Perl programs).
Note that there
will be an update to this program but the basic communications strategy will
not change.
3:
INTERACTION SEQUENCE.
Each judge will begin the round by making an initial comment to the LEFT entity. The judge will continue interacting with the left entity for 5 minutes. At the conclusion of the five minutes, the judge will begin the interaction with the RIGHT entity and continue for 5 minutes.
The decision as
to whether the LEFT entity is the human or the computer will be made on a
random basis.
Both entry
programs and human confederates must wait until a judge starts the interaction.
Entries will be
expected to respond to the judges' initial comment or question. There
will be no restrictions on what names etc the entries, humans, or judges can
use, nor any other restrictions on the content of the conversations.
Participants are
advised that transcripts of their conversations will be published.
At the conclusion
of the 10 minutes of questioning, judges will be allowed 10 minutes to review
the conversations. They will then score one of the two entities as
the human. Following this, there will be a 5 minute period for judges and
confederates to take their places for the next round.
Contest
management reserves the right to enter one or more publicly available open
source programs,
3:
SCORING THE "FINAL FOUR".
The Final Four
Competition will be scored using the Method of Paired Comparisons.
Each judge will select one entity from each pair as being the human. After the judging has been completed, each judge will have judged 4 entities as “non-human.” Each judge shall then rank these 4 perceived “non-human” entities in terms of “degree of humanness,” with 4 being “Most Human” and 1 being “Least Human.”
The computer
program which has been evaluated as “Human” the most times will be declared the
winner.
We wish (a) each
Entry to be compared with every Confederate; (b) each Judge to evaluate every
Entry, (c) each Judge to evaluate every Confederate.
Label the four Entries E1..E4, the four human Confederates C1..C4, and the four Judges J1..J4.
The following
matrix has Judges as rows and Entry Programs as columns. The intersection of
each row and column shows which human Confederate is assigned to the
combination of Entry and Judge.
        E1 .... E2 .... E3 .... E4
----------------------------------
J1 .... C1 .... C2 .... C3 .... C4
J2 .... C4 .... C1 .... C2 .... C3
J3 .... C3 .... C4 .... C1 .... C2
J4 .... C2 .... C3 .... C4 .... C1
For example, reading across row 2 we see that J2 compares E1 with C4, E2 with C1, E3 with C2, and E4 with C3. J2 will have scored every Entry and every Confederate, but in different combinations than J1, J3 and J4.
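The assignment matrix follows a cyclic pattern: each row is the previous row shifted one column to the right. A minimal Python sketch, assuming that observation (the rules give only the table, not a formula, and the function name `confederate_for` is hypothetical), reproduces the matrix and checks wishes (a) to (c):

```python
def confederate_for(judge, entry, n=4):
    """Confederate number paired with this judge and entry (all 1-based)."""
    return (entry - judge) % n + 1


N = 4
matrix = {(j, e): confederate_for(j, e, N)
          for j in range(1, N + 1) for e in range(1, N + 1)}

# (b) and (c): every judge sees each entry once and each confederate once.
for j in range(1, N + 1):
    assert sorted(matrix[j, e] for e in range(1, N + 1)) == [1, 2, 3, 4]

# (a): every entry is compared with each confederate exactly once.
for e in range(1, N + 1):
    assert sorted(matrix[j, e] for j in range(1, N + 1)) == [1, 2, 3, 4]

# Print the rows in the same order as the table (J1 first).
for j in range(1, N + 1):
    print(f"J{j}", " ".join(f"C{matrix[j, e]}" for e in range(1, N + 1)))
```

The second printed row reads `J2 C4 C1 C2 C3`, matching the worked example for J2.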
Reading down the
third column, we see in the first row that E3 is judged by J1 against
confederate C3. Let us enter a 1 in that cell if E3 was chosen as the human and
0 otherwise. We may continue down the column, entering a 1 in the second row if
E3 was evaluated as the human against confederate C2, zero otherwise. The sum
of the column will be the number of times E3 was judged as "more
human" than a Confederate. We may do this for each Entry.
The Entry with
the highest column total will be declared the winner.
If two or more Entries tie for the highest column total, the tied programs shall be evaluated by the mean of their rankings by those judges who judged them not to be the human.
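The scoring rule can be sketched as below. The data layout and the name `pick_winner` are hypothetical. Two points are assumptions rather than statements from the rules: that the tied entry with the higher mean ranking wins (since 4 is “Most Human”), and that an entry with no rankings is treated as having a mean of 0.

```python
from statistics import mean


def pick_winner(votes, rankings):
    """votes[entry]: list of 0/1 per judge (1 = chosen as the human).
    rankings[entry]: ranks 1..4 from judges who judged it non-human."""
    totals = {entry: sum(v) for entry, v in votes.items()}  # column totals
    best = max(totals.values())
    tied = [entry for entry, total in totals.items() if total == best]
    if len(tied) == 1:
        return tied[0]

    # Tie-break: assumed to favour the higher mean "humanness" ranking.
    def mean_rank(entry):
        ranks = rankings.get(entry, [])
        return mean(ranks) if ranks else 0.0

    return max(tied, key=mean_rank)


votes = {"E1": [1, 0, 0, 0], "E2": [0, 0, 1, 0],
         "E3": [0, 0, 0, 0], "E4": [0, 0, 0, 0]}
rankings = {"E1": [2, 1, 2], "E2": [4, 3, 3],
            "E3": [1, 1, 1, 2], "E4": [3, 2, 4, 1]}
print(pick_winner(votes, rankings))
```

In this made-up example E1 and E2 tie with one "human" vote each, and E2 has the higher mean ranking among the judges who scored it as non-human.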
Judging will
consist of seven rounds of 20 minutes
duration with 5 minute intermissions. Not all Judges and Confederates will
participate in every round. In each round, Judges will have 10 minutes to
interact with a pair and 10 minutes to review and score the programs. After the
10 minute evaluation period there will be a 5 minute break for reassignment.
The following
table shows each round. In the first round J1 compares E1 with C1 and J2
compares E3 with C2. Judges J3 and J4 and Confederates C3 and C4 are excused
from the round. Excused Judges will be kept separate from excused Confederates
and both will be kept separate from the competition.
Round .. Participating ............... Excused
1 ...... J1E1C1 J2E3C2 ............... J3 J4 C3 C4
2 ...... J4E1C2 J3E3C1 J2E4C3 ........ J1 C4
3 ...... J3E1C3 J4E3C4 J1E2C2 ........ J2 C1
4 ...... J2E1C4 J3E4C2 ............... J1 J4 C1 C3
5 ...... J2E2C1 J1E3C3 ............... J3 J4 C2 C4
6 ...... J1E4C4 J4E2C3 ............... J2 J3 C1 C2
7 ...... J4E4C1 J3E2C4 ............... J1 J2 C2 C3
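As a sanity check, the round table can be encoded and verified mechanically. The sketch below (the names `SCHEDULE`, `parse`, `check_schedule` and `excused` are hypothetical) confirms that nobody is double-booked within a round, that all 16 Judge/Entry pairings occur, and derives the excused Judges and Confederates for a round.

```python
# Sessions copied from the round table: JxEyCz = Judge x, Entry y, Confederate z.
SCHEDULE = {
    1: ["J1E1C1", "J2E3C2"],
    2: ["J4E1C2", "J3E3C1", "J2E4C3"],
    3: ["J3E1C3", "J4E3C4", "J1E2C2"],
    4: ["J2E1C4", "J3E4C2"],
    5: ["J2E2C1", "J1E3C3"],
    6: ["J1E4C4", "J4E2C3"],
    7: ["J4E4C1", "J3E2C4"],
}


def parse(session):
    """Split "J1E1C1" into (judge, entry, confederate) numbers."""
    return int(session[1]), int(session[3]), int(session[5])


def check_schedule(schedule):
    """Raise if anyone is double-booked; return the pairings that occur."""
    judge_entry, entry_conf = set(), set()
    for rnd, sessions in schedule.items():
        triples = [parse(s) for s in sessions]
        for position, role in enumerate(("judge", "entry", "confederate")):
            ids = [t[position] for t in triples]
            if len(ids) != len(set(ids)):
                raise ValueError(f"{role} double-booked in round {rnd}")
        for j, e, c in triples:
            judge_entry.add((j, e))
            entry_conf.add((e, c))
    return judge_entry, entry_conf


def excused(sessions, n=4):
    """Return (excused judges, excused confederates) for one round."""
    triples = [parse(s) for s in sessions]
    everyone = set(range(1, n + 1))
    return (everyone - {j for j, _, _ in triples},
            everyone - {c for _, _, c in triples})


pairs, comparisons = check_schedule(SCHEDULE)
print(len(pairs), len(comparisons))
print(excused(SCHEDULE[2]))
```

Both counts come out to 16, so every Judge evaluates every Entry and every Entry meets every Confederate across the seven rounds.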
4:
SELECTING THE FINALISTS.
The finalists
will be chosen based upon ability to respond "intelligently" to the
following types of question.
The 4 entries
with the highest scores will be selected as finalists.
It is not necessary that a program be able to respond
to the selection questions. If no entries can respond
"intelligently" to these questions I will evaluate the entries on a
general quality of responses.
I will not ask
about rare or unusual things. All nouns, adjectives and verbs will come
from a dictionary suitable for children or adolescents under the age of 12.
Set 1 - Questions
relating to time:
Background facts: For testing purposes, I will consider these to be correct whether or not the time and venue of the contest have been changed, and I will set the system clock accordingly.
a. The
system clock will be accurate to within a minute or two.
b. The
competition is scheduled to start at
c. There
will be 7 rounds of 20 minutes each.
Sample Questions
• What time is
it?
• What round is
this?
• Is it morning,
• etc.
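One way an entry might answer “What round is this?” from the background facts is sketched below. It assumes each 20 minute round is followed by a 5 minute intermission, so round k begins 25·(k−1) minutes after the start. The start time used in the example is a placeholder, not the real schedule, and `current_round` is a hypothetical name.

```python
from datetime import datetime, timedelta

ROUND_MINUTES = 20   # from the rules: 7 rounds of 20 minutes each
BREAK_MINUTES = 5    # from the rules: 5 minute intermissions
NUM_ROUNDS = 7


def current_round(now, start):
    """Return the round number (1..7), or None before, between or after rounds."""
    elapsed = (now - start).total_seconds() / 60
    if elapsed < 0:
        return None  # the competition has not started yet
    slot, offset = divmod(elapsed, ROUND_MINUTES + BREAK_MINUTES)
    if slot >= NUM_ROUNDS or offset >= ROUND_MINUTES:
        return None  # competition over, or currently in an intermission
    return int(slot) + 1


# Placeholder start time, for illustration only.
start = datetime(2009, 1, 1, 10, 0)
print(current_round(start + timedelta(minutes=30), start))
```

Thirty minutes after the placeholder start falls five minutes into the second 25 minute slot, so the sketch reports round 2.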
Set 2 - General
questions relating to things.
Sample Questions
• What would I
use a hammer for?
• Of what use is
a taxi?
• etc.
Set 3 - Questions relating to relationships
Sample Questions
• Which is
larger, a grape or a grapefruit?
• Which is
faster, a train or a plane?
• John is older
than Mary, and Mary is older than Sarah. Which of them is the oldest?
• Etc.
Set 4 - Questions
demonstrating "memory"
Sample Questions
I have a friend
named Harry who likes to play tennis.
<This assertion is followed by one or more intervening questions or statements, followed in turn by questions about the assertion, e.g.>
• What is the
name of the friend I just told you about?
• Do you know
what game Harry likes to play?
• etc.
Appendix
Following are the
names for special characters in LPP. Case is sensitive.
Name          Key
braceleft     '{',
braceright    '}',
bracketleft   '[',
bracketright  ']',
parenleft     '(',
parenright    ')',
space         ' ',
comma         ',',
period        '.',
greater       '>',
less          '<',
slash         '/',
backslash     '\',
bar           '|',
quotedbl      '"',
quoteright    "'",
Tab           "\t",
equal         '=',
underscore    '_',
plus          '+',
minus         '-',
exclam        '!',
at            '@',
numbersign    '#',
dollar        '$',
percent       '%',
asterisk      '*',
asciicircum   '^',
asciitilde    '~',
quoteleft     '`',
ampersand     '&',
Return        "\n",
colon         ":",
semicolon     ";",
question      "?",
BackSpace     "BackSpace",