HP-71 POLL HANDLERS - Michael Markov, P.O. Box 17, Lockwood, N.Y. 14859

Poll handlers provide an incredibly powerful means to enhance the performance
of your HP-71. However, if the poll handlers in your HP-71 are poorly designed,
or if there are too many LEX files, each with its own poll handler, then you
can expect sluggish response and degraded performance. In this article, I will
share with you some of the information I picked up in my search for the "ideal"
poll handler. I hope this will stimulate discussion on poll handler
optimization. I welcome comments and suggestions.

An excellent way to optimize the response time of your HP-71 is to combine all
the smaller LEX files that reside more or less permanently in your HP-71 into a
single large LEX file.  See the WRITHEAD programs and the associated
documentation on SWAP07. Once you do this, you can quickly find that you are
handling a dozen or more polls, and writing the poll handler requires
considerably more care than before.

DESIGN CONSIDERATIONS:  First, we must keep in mind that all the poll handlers
inside an HP-71 are processed in series, one after another.  As far as HP-71
performance is concerned, you are dealing with all of the poll handlers at
once.  When you combine several small LEX files into a single large LEX file,
you eliminate quite a lot of pointer and register save operations.  See the
documentation for =POLL and =FPOLL. However, unless you optimize your combined
poll handler, the time required to execute the poll handler shells will remain
essentially unchanged. Proper design of the poll handler shells can minimize
the average poll handler execution time, thus improving HP-71 performance.

Obviously, we also want to minimize code length, provided that this does not
increase execution time. These objectives are usually but not necessarily
mutually exclusive.  The best way to clarify this statement is to analyze some
actual code examples. We shall start with the "standard" poll handler shell,
which is used almost universally:

POLHND  SETHEX              See example on pg. 53, FORTH/ASSEMBLER ROM manual
        P=       0
        LC(2)    #FB
        ?B=C     B
        GOYES    POLLFB     Configuration poll, save buffers, etc.
        LC(2)    #1E
        ?B=C     B
        GOYES    POLL1E     Unrecognized IMAGE char in parse
        LC(2)    #1F
        ?B=C     B
        GOYES    POLL1F     Unrecognized IMAGE symbol in execution
        ?B=0     B
        GOYES    POLL00
NtHNDL  C=-C-1   A          Clear carry, NoT HaNDLed.
RTNSXM  RTNSXM
-------------------------------
POLLFB  GOTO     pollfb     Often required to fix a "jump too long"
POLL1E  GOLONG   poll1e     error during assembly
        .
-------------------------------
POLL00  .                   Process code follows.
        .
        .
poll1e  .
        .
        .
        etc.

The above source code does not process a poll in any way. It is the "shell"
that must be executed to determine whether or not you wish to do some
processing when ANY poll is issued. This shell is the part of a poll handler
that deserves extra programmer attention because any poll that is NOT handled
must execute the entire shell. Any delays you introduce will adversely affect
the performance of your HP-71. That is not to say that the code that processes
a specific poll deserves less attention, just that it affects performance only
when the specific poll is issued.

So, what's wrong with the above? The code seems to be very compact, and you
do not waste time deciding whether or not you handle the particular poll.
Still, take a look at the following considerations:

- SETHEX and P=   0  are not needed, unless some LEX file has altered this
  status. If so, this is a bug in the misbehaving poll handler. Including these
  instructions will help prevent crashes caused by other programmers' mistakes.
  The penalty associated with including these instructions is only 4 nibbles
  and 5 machine cycles. This is a very low cost for safety, especially if you
  have combined all of your LEX files into one. How confident are you of all the
  LEX files in your machine?
  (If you crash often, then I would advise including SETHEX and P=  0.)

- LC(2)  #1F  should be replaced with  C=C+1   B    This saves a nibble without
  increasing execution time.

- GOYES has a maximum range of  -128 <= offset <= 127. This means that you may
  need to introduce a table with secondary jumps, as shown above.

- The C=-C-1   A instruction clears carry. However, if the poll was not 00,
  that is, the VER$ poll, then carry is already cleared by the ?B=0   B
  instruction. C=-C-1   A  should therefore be omitted unless the sequence
  C=-C-1 A    RTNSXM can be used to reduce the length of a poll processor.

- The test for the configuration poll, POLLFB should be the next-to-last test
  in the sieve, not the first. This would ensure slightly faster response to
  the IMAGE polls. The logic is that the polls that are issued more frequently
  should come first.

  So - should the parse time poll be first, or the execution time poll?
  Personally I would say the parse time poll should be first, as I spend a lot
  of time programming. People concerned with running programs would most likely
  decide otherwise.

- The GOLONG instruction should not have to be used. It is both longer and
  slower than GOTO. Try relocating the process code closer to the poll handler
  shell.

- Polls that are not handled by this LEX must go through the entire sieve, each
  and every time. This is not a real problem with short sieves as given above,
  but if your LEX responds to a large number of polls, the time wasted is no
  longer insignificant.

IN SPITE OF ALL OF THESE NEGATIVE COMMENTS, THIS IS ONE OF THE BETTER WAYS TO
DO THE JOB UNLESS YOU HANDLE SO MANY POLLS THAT COMPUTED JUMPS USING AN OFFSET
TABLE BECOME JUSTIFIED. IF YOU PROCESS THE HIGH FREQUENCY POLLS (pKYDF, pWTKY,
pSREQ,... ) FIRST, YOU CAN ENSURE THAT 80% TO 90% OF ALL POLLS ISSUED WILL BE
PASSED ON TO THE PROCESS CODE IN EITHER 20, 33, 46, ... MACHINE CYCLES. THIS IS
VERY FAST, EVEN IF YOU HAVE TO ADD 11 OR 14 CYCLES FOR A SECONDARY JUMP (GOTO
OR GOLONG) FOR JUMPS TO THE REMAINING 10% TO 20%.
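
The cycle counts quoted above follow a simple pattern. The short Python sketch
below (a desk calculation, not HP-71 code) assumes that every test in the sieve
costs 13 cycles and that the GOYES which finally jumps adds 7 more. The function
name and the constants table are mine; SETHEX and P= 0 are left out because
that is what reproduces the 20, 33, 46 sequence.

```python
# Rough cost model for the standard shell, in machine cycles.
# Assumption: 13 cycles per LC(2) / ?B=C / GOYES test, plus 7 when the
# GOYES is taken. SETHEX and P= 0 are not counted, which matches the
# 20, 33, 46 sequence quoted in the article.
CYCLES_PER_TEST = 13
GOYES_TAKEN_EXTRA = 7
SECONDARY_JUMP = {"none": 0, "GOTO": 11, "GOLONG": 14}


def cycles_to_reach(position, secondary="none"):
    """Cycles until the process code of the poll tested at `position` (1-based)."""
    if position < 1:
        raise ValueError("position is 1-based")
    return (CYCLES_PER_TEST * position
            + GOYES_TAKEN_EXTRA
            + SECONDARY_JUMP[secondary])


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n,
              cycles_to_reach(n),
              cycles_to_reach(n, "GOTO"),
              cycles_to_reach(n, "GOLONG"))
```

Under these assumptions each position further down the sieve costs another 13
cycles, which is why the order of the tests matters more than the cost of an
occasional secondary jump.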

Consider the following alternative: The mainframe =FINDA routine can be used to
reduce the length of your poll handler shell by at least half. Nice, isn't it?
But before you jump up and go rewrite your poll handlers, you should also know
that the execution time required to scan through the shell will also increase,
by a factor of more than 3!

POLHND   ABEX    A          Saves A[A] in B[A], data may be needed by polls
         GOSBVL  =FINDA     Table scan routine. See also WRDSCN routine.
         CON(2)  #1E
         REL(3)  POLL1E
         CON(2)  #1F
         REL(3)  POLL1F
         CON(2)  #FB
         REL(3)  POLLFB
         CON(2)  0          END-OF-TABLE MARKER
         ?A#0    B          Poll process is now in A[B]
         GOYES   NtHNDL     Not HaNDLed, part of VER$ poll handler
POLL00   C=R3               The VER$P poll handler follows.
         .
         .
NtHNDL   ABEX    A          Not really required, but very safe thing to do.
         C=-C-1  A          Clear carry for Not Handled
         RTNSXM

By using the =FINDA entry point, you will increase the allowable jump length by
a factor of 16. Secondary GOTO or GOLONG instructions should not be required.

The third major type of poll handler shell is the one implemented by the
HPILROM. See the HPIL ROM IDS, vol. II, NZ&DIR for details.  The HP-IL poll
handler uses two tables of offsets to the process code to minimize the length
of the poll handler shell, without significant increases in the time required
to execute the shell.  The fact that it takes only 63 cycles to determine
whether a poll is included in one of the two tables is a major advantage, as it
ensures that polls that are not handled or processed will not suffer delays.
Also, the process code could easily be anywhere in the HP-71, as the offsets
used are REL(5) POLLxx instructions.  The major disadvantage is that the tables
must include offsets for every poll in the range #00 to #1D and #FF to #F7,
including some 8 offsets to a RTNSXM, which are not handled but must be in the
tables to permit a fast computed jump.

I believe the HPILROM poll handler shell could have been improved upon by using
a single table going from #F7 to #1D. Sometime, I will have to try to write
the code to see if it can be done without undue execution time penalty. Also,
it should be possible to modify the HPILROM poll handler shell to use REL(3) or
REL(4) jumps without too much difficulty, thus cutting the length of the
address look-up tables by up to 2/5. However, REL(4) would probably involve
time consuming P pointer manipulations that would introduce undesirable delays.
These options could be considered in the event you are desperate for a few
bytes to make everything fit on an EPROM.
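
The "up to 2/5" figure can be checked with a few lines of Python. This is only
a sketch: the count of 39 table entries is my own inference from the ranges
quoted above (#00 to #1D is 30 entries, #F7 to #FF is 9), and the helper name
is illustrative.

```python
# Size of the offset tables for different REL(n) widths, in nibbles.
# Assumption: 39 entries in total, inferred from the ranges #00..#1D
# (30 entries) and #F7..#FF (9 entries) mentioned in the article.
ENTRIES = (0x1D - 0x00 + 1) + (0xFF - 0xF7 + 1)


def table_nibbles(rel_width, entries=ENTRIES):
    """Each REL(n) entry occupies n nibbles."""
    return rel_width * entries


if __name__ == "__main__":
    base = table_nibbles(5)
    for width in (5, 4, 3):
        size = table_nibbles(width)
        saved = base - size
        print(f"REL({width}): {size} nibbles, saves {saved} ({saved / base:.0%})")
```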

The HPILROM poll handler is 269 nibbles long, and it processes 31 polls. An
equivalent standard shell would use 424 machine cycles to determine that it
does not process a given poll, and be 287 nibbles long, not including the
secondary jumps that would be almost certainly required. It is also important
to note that the average time required to get to the poll processing code would
exceed the 168/197 cycles required by the HPILROM handler. Thus, the HPILROM
handler is both shorter and faster than other alternatives when you need to
process 30 or more polls... assuming that all polls occur at more or less the
same rate, which is the best assumption to make unless you obtain an actual
frequency distribution.

The following table summarizes the trade-offs between the three major types of
poll handler shells. Here, execution time refers to the time it takes to
determine that a given poll is NOT handled. Obviously, if the poll is
processed, the time required to get to the process code would be the time to
get to the Nth entry in the table; see the table below. With standard poll
handler shells, you will need an extra 7 cycles, the difference between a GOYES
that jumps (carry set) and one that falls through (carry clear), and possibly
an extra 11 or 14 cycles for the secondary GOTO or GOLONG instructions.

         Length             Execution
         in                 time in
         nibbles            machine cycles      Remarks

Standard 8+9N[+4G,[+6L]]    21+13N              N = number of table entries
                                                G = secondary GOTO, if needed
                                                L = secondary GOLONG, if needed

=FINDA   11+5N              119+45N

HPILROM  74+5N              63   not in tables
                            168  in table I
                            197  in table II
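
To see how the formulas in the table behave as N grows, they can be typed into
a short Python sketch. The function names are mine, the optional secondary
jumps of the standard shell are left out, and the values of N printed are
arbitrary apart from 31 and 39, which correspond to the HPILROM comparison
made earlier.

```python
# Length (nibbles) and not-handled time (cycles) from the summary table.
# N is the number of table entries. The optional secondary jumps of the
# standard shell (+4 per GOTO, +6 per GOLONG) are not included.
def standard(n):
    return 8 + 9 * n, 21 + 13 * n


def finda(n):
    return 11 + 5 * n, 119 + 45 * n


def hpilrom(n):
    # The not-handled time does not depend on the table size.
    return 74 + 5 * n, 63


if __name__ == "__main__":
    print(f"{'N':>3} {'standard':>12} {'=FINDA':>12} {'HPILROM':>12}")
    for n in (3, 4, 8, 16, 31, 39):
        cells = ["{}/{}".format(*shell(n)) for shell in (standard, finda, hpilrom)]
        print(f"{n:>3} {cells[0]:>12} {cells[1]:>12} {cells[2]:>12}")
```

Each cell reads nibbles/cycles. With these formulas the standard shell needs
more than the HPILROM's fixed 63 cycles to reject a poll as soon as it tests
four or more polls, and the =FINDA time stays above three times the standard
time for every N.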

In other words, if you try to cut the code length in half by using =FINDA
instead of the standard approach, you would have to increase execution time
by a factor of at least 3... This, to me, would be totally unacceptable. In
fact, I would seriously consider adding an instruction or two to split the
standard table in two, such as:

POLHND   LC(2)   #1E
         ?B<C    B             The extra 8/15 cycles required  to choose
         GOYES   POLTBL        between the less than 1E polls and the more than
         ?B=C    B             1E polls is well worth it if it allows you to
         GOYES   POLL1E        split a table of 15 or more polls into two
         .                     tables that effectively cut execution time in
         .                     half.
         .
         RTNSXM
POLTBL   C=C-1   B
         ?B=C    B
         GOYES   POLL1D
         .
         .
         .
         RTNSXM
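
A rough Python estimate of what the split buys, under stated assumptions: a
plain sieve of N tests takes 21+13N cycles to reject a poll (the figure used
for the standard shell), and the extra ?B<C / GOYES pair costs 8 cycles when it
falls through and 15 when it jumps. The function names are mine, and the model
ignores the small saving from reusing the LC(2) #1E constant.

```python
# Not-handled cost of one sieve versus a sieve split at a pivot poll.
# Assumptions: 21 + 13N cycles for a plain sieve of N tests, and 8 or 15
# extra cycles for the ?B<C / GOYES pair, depending on whether the
# branch falls through or jumps.
def single_sieve(n):
    return 21 + 13 * n


def split_sieve(n_upper, n_lower):
    """Return (cost via the fall-through half, cost via the jumped-to half)."""
    return single_sieve(n_upper) + 8, single_sieve(n_lower) + 15


if __name__ == "__main__":
    for total in (8, 16, 24):
        half = total // 2
        print(total, single_sieve(total), split_sieve(half, total - half))
```

For 16 polls split evenly, this model gives 133 or 140 cycles against 229 for
the single sieve, which is in line with the claim that the split roughly halves
the execution time.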

Actually, you should be able to improve on the above by taking advantage of the
status of the carry flag on entry to POLHND. If set, the poll is a fast poll
issued with =FPOLL, and if not it is a slow poll issued by =POLL. This
allows us to use GOC FSTPOL or GONC SLOPOL to split the poll handler shell with
the smallest possible overhead, 3 or 10 machine cycles, depending on whether
you fall through or jump according to the offset. These instructions are only
three nibbles long. Incidentally, the MATHROM poll handler uses this technique
with a rather strange variation on the standard shell:

POLHND  SETHEX          MATHROM DISASSEMBLY
        P=     0        standard HP safety measures
        GOC    FSTPOL   if carry, spend 10 cycles jumping to the fast poll shell
        LCHEX  F2       or only 3 cycles to fall-through to the slow polls
        ?B=C   B
        GOYES  POLLF2
        .
        .
        RTNSXM          end of slow poll shell
FSTPOL  LCHEX  38       complex math fast poll
        ?B#C   B        the strange variation
        GOYES  FSTP01   keep going through the shell
*----------------------------------------------------------------*
        GOTO   POLL38   The process code could be inserted here, if it is less
*                       than 127 nibbles long...
*----------------------------------------------------------------*
FSTP01  LCHEX  FB
        ?B#C   B
        GOYES  FSTP02
        GOTO   POLLFB
FSTP02  .               and so on to the end of the fast poll shell
        .
        .
        RTNSXM          end of fast poll shell.

Personally, I would use GONC SLOPOL, as this would minimize the overhead
execution time required to get to the fast polls, which include the most
frequently issued polls. Also, using ?B#C B instead of the more common ?B=C B
reduces the time required to access the poll process code for the first poll in
the FSTPOL shell by 7 cycles due to the difference in GOYES execution with
carry clear, but the time required to fall through the shell when a poll
is not handled increases from 13 cycles per test to 20. Since most polls are
not handled as a result of any given test, I find this variation inefficient.
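
The trade-off can be put in numbers with a small Python sketch. It assumes, as
the figures above suggest, that a GOYES costs 7 cycles more when taken, so a
failing ?B=C test costs 13 cycles and a failing ?B#C test costs 20. It also
assumes the process code is placed inline, so no GOTO is counted for the
variation. The function names are mine.

```python
# Standard ?B=C test versus the MATHROM-style ?B#C test, in cycles.
# Assumptions: a failing ?B=C test falls through (13 cycles) and the
# matching one jumps (13 + 7); a failing ?B#C test jumps (20 cycles)
# and the matching one falls through (13). Process code is inline,
# so the GOTO of the MATHROM listing is not counted.
def reach_standard(position):
    """Cycles to reach the process code of the poll tested at `position`."""
    return 13 * position + 7


def reach_variation(position):
    return 20 * (position - 1) + 13


def reject_standard(tests):
    """Cycles spent in the tests alone when no poll matches."""
    return 13 * tests


def reject_variation(tests):
    return 20 * tests


if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        print(n, reach_standard(n), reach_variation(n),
              reject_standard(n), reject_variation(n))
```

Under these assumptions the variation wins only for the first poll in the
shell, ties for the second, and loses from the third onward.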

There is a potential pitfall associated with splitting the poll handler shell
into two smaller shells specializing in either the fast polls or the slow polls:
some polls may be either fast or slow. This includes specifically the pENTER,
pTRANS and pTEST polls, and quite possibly other polls as well. These would
require special handling, see below:

POLHND  SETHEX          Optional
        P=     0        Optional
        GONC   SLOPOL   Carry is set for fast polls, clear for slow polls. As
FSTPOL  LC(2)  #1B      shown, fast polls are delayed only 3 machine cycles.
        ?B=C   B        This alternative is very attractive if you process
        GOYES  POLL1B   both fast and slow polls.
        LC(2)  #1C
        .
        .
        GONC   pENTER   B.E.T. now, go handle the wishy-washy polls
SLOPOL  LC(2)  #F2      standard poll handler shell, for the slow polls
        ?B=C   B
        GOYES  POLLF2
        .
        .
        .
pENTER  LC(2)  #12
        ?B=C   B
        GOYES  POLL12
        RTNSXM

There are many other ways. We could adapt the TBLJMP routine, or write a
similar routine that supports 16-way branching with faster code. However,
splitting the shells into shorter sequences is not a cure-all, and an HPILROM
type handler must eventually be used as the number of polls processed keeps
increasing.

POLL FREQUENCY DISTRIBUTION: In the material above I say that pKYDF and pWTKY
are issued far more frequently than other polls. While there is no doubt that
they are more frequent than pCLDST (Cold Start, following Memory Lost) or
pCONFG (Configure on wake-up) or pPWROF (Power off), we really had no basis
for saying how often polls are issued. Since we need such information in order
to optimize our poll handler shells in the future, I decided to find out what
goes on in my HP-71. See the LEX file POLLDATA and its source code POLDATAS.

POLLDATA intercepts every poll issued and increments the applicable storage
registers, to include a total count register and a slow poll count register.
POLLDATA also provides several keywords that allow you to access all this
information and produce neat summary reports under program control. Finally,
it provides the FPTIME and SPTIME keywords that report the time overhead
required to execute poll handler shells with both fast polls and slow polls.

Based on a short testing period, my HP-71 issued some 25233 polls. Of these,
36% were pKYDF (#1B), and pWTKY (#1C) accounted for another 36% of all polls.
pSREQ amounted to 18%. These three polls were issued 9060, 9060 and 4555 times,
respectively. Other polls were issued far less frequently, with pPRTIS (#0F)
issued only 495 times and pMNLP (#FA) 257 times being the next highest counts.
This suggests that we should be primarily concerned with the response time for
the above listed polls, as they will be determining overall performance.
However, this should not be done in a way that increases execution time for
other polls, since the software you are using will have a significant impact on
the polls issued.
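
The percentages above, and what they imply for a standard shell, can be
recomputed with the Python sketch below. The counts are the ones reported
above; the cost function (13 cycles per test plus 7 for the GOYES that jumps)
is the same assumption used for the 20, 33, 46 cycle figures, and the function
names are mine.

```python
# Poll counts reported for the test run described in the article.
TOTAL = 25233
COUNTS = {"pKYDF": 9060, "pWTKY": 9060, "pSREQ": 4555,
          "pPRTIS": 495, "pMNLP": 257}


def share(name):
    """Fraction of all polls issued that were of the given type."""
    return COUNTS[name] / TOTAL


def mean_cycles(names, cost=lambda position: 13 * position + 7):
    """Average cycles to reach the process code for the polls in `names`,
    assuming they are tested first and in the order given."""
    hits = sum(COUNTS[n] for n in names)
    weighted = sum(COUNTS[n] * cost(i) for i, n in enumerate(names, start=1))
    return weighted / hits


if __name__ == "__main__":
    top = ["pKYDF", "pWTKY", "pSREQ"]
    for name in COUNTS:
        print(f"{name:7} {share(name):6.1%}")
    print(f"top three together: {sum(share(n) for n in top):.1%}")
    print(f"mean cycles for the top three: {mean_cycles(top):.1f}")
```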

Please note that these results did confirm the importance of handling keystroke
related polls on a priority basis. It would be interesting to compare poll
frequency distribution information for different types of HP-71 use.

Well, that is all for now. Happy programming!
Mike Markov.
