TSORTDOC documents the current vertion of TSORTLEX, which is TSORTLX9. 12/18/88

TSORTLEX Copyright is by Michael Markov 1988.
TSORTLX9 (current version) is 1472 bytes long.

TSORTLX9 corrects a bug in TSORTLX7 having to do with where the file pointer is
set to after you use APPEND, BCOPY and RCOPY. TSORTLX7 left the pointer at the
OLD location of <destination>, which is also the start of the record(s) that 
has been appended or copied. This is common to INSERT#, REPLACE# and DELETE#.
Presumably, this allows you to use a serial read to get back the last record
stored into a text file. However, I find this implementation a waste, since you
already have the record in a text string. In my opinion, it is more useful to
have the pointer set to the new location of the record that follows the one(s)
that you have appended or copied...

Of course, the change affects BDEL#, which uses common code with the above 
keywords. However, when you delete a record or a block of records, the new
address of the records that followed the record(s) that have just been deleted
is the start of the information just deleted, which means that there will be
no visible impact.

TSORTLX9 also provides a new keyword, PTINSERT#, or "insert at current file 
pointer". PTINSERT is similar to INSERT#, except that instead of inserting at a
line number, it inserts at the current file position pointer. Since you the 
current file pointer is available in the File Information buffer, PTINSERT#
is much faster than INSERT# for serial insertion of information into a text
The speed advantage increases with the length of the file and the number of 
records between the start of the file and the pointer position. To avoid
corrupting the text file, you should set the pointer to a valid address with
RESTORE#N,R or with some other keywords that sets the pointer to a known
position before using PTINSERT repeatedly. See below for further details.    

The keywords in this LEX file are primarily designed to help you to manipulate
and sort HP71 TEXT files under program control. The keywords provided work with
channel numbers, since this optimizes executions speed.  Errors will be
generated if the channel # is more than 255 or less than 1.

TSORTLEX is intended to provide keywords that really should have been provided
by EDLEX (FORTH/ASSEMBLER ROM, or TEXT EDITOR ROM). The new keywords implement
in assembly language capabilities that the above modules implemented in BASIC.
This should greatly improve the performance of text editor programs, after they
are updated to take advantage of the new keywords. (Not yet done).

The following keywords are provided:

APPEND#N;A$                         Append a record to the file.
BCOPY#N,<from>,<to>,<destination>   Copy a block of record, insert in front of
                                    <destination>.
BDEL#N,<from>,<to>                  Delete of block of records from the file.
BMOVE#N,<from>,<to>,<destination>   Move a block of records within a file. New
                                    location is in front of <destination>.
BREV#N,<from>,<to>                  Reverse the order of a block of records.
LASTR(N)                            Return the last record number.
PTINSERT#N;A$                       Insert a record in the file in front of the
                                    record at current position pointer.
RCOPY#N,<from>,<destination>        Copy a record, insert in front of <dest.>.
RMAXSZ(N)                           Return the length of the longuest record.
RMOVE#N,<from>,<destination>        Move a record within a file. New location
                                    is in front of <destination>.
RSWAP#N,<from>,<with>               Exchange the contents of two records.
TXTMAXC(field,start column,<from>,<to>,N)  Locate the record whose contents at
TXTMAX(field,start column,<from>,<to>,N)   beginning at <start col.> have the
TXTMINC(field,start column,<from>,<to>,N)  have the minimum or maximum value.
TXTMIN(field,start column,<from>,<to>,N)   <from> and <to> specify the first
                                           and last records included in the
                                           search.
where  N is a channel number (1 to 255),
      <field> is an expression. The maximum field is 16 bytes, any higher value
      gives the same result as 16. If <field> is equal to zero, ranking is
      based on the least significant byte of the record header, or the length
      of the records. This assumes all records are 255 bytes or less, which is 
      true in most, but not all, instances.

NOTE: The LIF standard specifies that text file records headers are two byte 
long, thus supporting record lengths of FFFE (hex) bytes. The FFFF record
header is reserved for use as an End Of File (EOF) marker. This marker is NOT
mandatory, at least within the HP-71 implementation. EOF is also determined by
the start of next file pointer in the file header. However, this EOF marker is
required on mass storage for compatibility with other controllers.

The difference between TXTMAXC and TXTMAX is that TXTMAXC ranks the information
as if all lowercase characters have been converted to uppercase characters,
while TXTMAX uses the numerical values of the characters.

The difference between TXTMINC and TXTMIN is that TXTMINC ranks the information
as if all lowercase characters have been converted to uppercase characters,
while TXTMIN uses the numerical values of the characters.

For BMOVE, <destination> may not fall inside the <from> <to> block. If
<destination> is equal to either <from> or <to>, no error is generated, but
nothing changes. This is totally compatible with the operation of the
FORTH/ASSEMBLER text editor.

For BCOPY, <destination> may be inside the <from> <to> block.

For BCOPY, BMOVE, RCOPY and  RMOVE, the record or block of records to be moved
or copied is inserted in front of <destination>.

The <from> and <to> records are included in the block of records that are being
copied, moved or ranked. In all instances, <from> must exist. <to> usually may
be set to -1 to specify that the block of records includes everything from
to EOF (End Of File). It is generally expected that <from> is less than or
equal to <to>. An Invalid Argument error results whenever violation of the
above guidelines is likely to cause problems.

The current file position pointer in the File Information Buffer (FIB) is
updated to insure the validity of subsequent serial read or write operations.
See below for details. This should be of considerable interest since serial
operations are much faster than the computed READ#N,I;A$, etc.

Additional information on how these keywords are used follows.

-------------------------------------------------------------------------------

SORTING TEXT FILES:

The following program sorts text files in ascending order:

ASSIGN#N TO <FILENAME>
F=LASTR(N) @ FOR I=0 TO F @ RSWAP#N,I,TXTMINC(16,1,I,F,N) @ NEXT I

You could use RMOVE instead of RSWAP and get the same results. RSWAP is
considerably faster if record lengths are all the same.

You could use TXTMIN instead of TXTMINC if you want "a" to go after "Z" (and
characters 91 to 96) rather than be considered another "A". TXTMIN is faster
than TXTMINC.

Actually, you can easily improve on the above by sorting the file from end to
start, while giving the user improved control over what is done. See below.
Sorting the file backwards is faster because the length of the chain of records
to be traced by the TXTMIN/TXTMAX/RSWAP/RMOVE keywords is decreasing by 1 record
each time you go around the FOR I ... NEXT I loop. In other words, you reduce
the number of times certain routines are executed from E*(E-S) times to  
(E+S)*(E-S)/2. For small values of S, you get an improvement of almost 50%. 
However, since tracing the record chain is only part of what has to be done,
actual execution time improvements will be on the order of 30%.

0010 INPUT 'File? ';F$ @ ASSIGN #1 TO F$ @ F=LASTR(1)
0011 INPUT 'Column? ';N
0012 INPUT 'START REC? ','3';S
0013 INPUT 'END REC? ',STR$(F);E
0019 T=TIME
0040 FOR I=E TO S STEP -1 @ RSWAP #1,TXTMAX(10,N,S,I,1),I @ NEXT I
0240 T=TIME-T
0250 ASSIGN #1 TO * @ DISP TCNV$(T)

I call the above program TRIX (Tri is French for sort, X is because I lost
count of the number of versions I developed). I am interested in any improved
versions users may develop. The TCNV$ keyword comes from the DATA ACQUISITION
ROM, and is convenient (but not required) when displaying elapsed time.

If reverse order is desired, then the contents of the sorted file can be
reversed with the BREV keyword: BREV#N,S,E

Compare performance  with the REV program(s) below, and you will understand
why I wrote BREV# .

0010 INPUT 'File? ';F$ @ T=TIME @ ASSIGN #1 TO F$ @ F=LASTR(1) ! @ K=RMAXSZ(1)
0015 ! DIM A$[K],B$[K]
0020 FOR I=0 TO F DIV 2 @ RSWAP #1,I,F-I @ NEXT I
0030 T=TIME-T @ DISP TCNV$(T)
0040 ! FOR I=0 TO F DIV 2 @ READ#1,I;A$ @ READ#1,F-I;B$ @ REPLACE#1,F-I;A$
0041 ! REPLACE#1,I;B$ @ NEXT I
0050 ! T=TIME-T @ DISP TCNV$(T)

And, obviously, by using a TXTMIN instead of a TXTMAX keyword in the TRIX
program, you can get files sorted in revese order, without using REV or BREV#.

It is interesting to note that serial read operations such as
READ#N;A$ [,B$ [,C$ ...]] are much faster than computed reads such as
READ#N,I;A$. In fact, given enough memory, you can read the entire text file
into a string array, which allows much faster access (and sorting) than is
possible if you trace the chain of the variable length records in a text file.
You can ASSIGN#N TO <text file> @ L=LASTR(N) @ DIM Q$(N)[RMAXSZ(N)]
@ RESTORE #N @ READ#N;Q$ to do the job in a single, very fast operation. This
approach is very attractive since it can be performed with mass storage files
just as easily as with (I)RAM files. Similarily, you can store the contents of
sorted string array back into a text file with RESTORE#N @ PRINT#N;Q$ in a
single operation. The main problem I have with that approach is that I rarely
have enough memory to spare for the job, since the string array takes more room
than the source text file. See the HP71 Reference Manual, pg. 332. Besides, I
expect that this LEX will take care of the speed problem, when finished.

-------------------------------------------------------------------------------
KEYWORDS NOT YET IMPLEMENTED:

TXTSORT#N,<from>,<to>,<start col.>,<field> [NOCASE] [DESCENDING]

where NOCASE and DESCENDING are options that give the flexibility provided by
the four TXTMIN/TXTMAX keywords.

Actually, this is what I started to write in the first place. The current set
of keywords were developed as building blocks that make the more complicated
TXTSORT keyword possible. Writing this keyword should yield significant
improvements over the TRIX program above, speedwise. (I hope.) Many "overhead"
operations such a evaluating parameters, finding the FIB entry, computing the
file start address, etc.,) will have to be performed only once. 


IMPORT#N,<file specifier or channel #>,<from>,<to>,<dest>  
EXPORT#N,<from>,<to>,<destination channel> [APPND]

IMPORT will allow you to copy the contents of the file specified by 
<file specifier>, from record number <from> to record number <to> inclusive,
and insert it in front of record number <destination> in the file associated 
with channel N. EXPORT will perform the inverse operation, allowing you to 
either replace the content of the destination file or to append the block of
records specified to the destination file. These keywords will support 
non-destructive delete/undelete operations, as well as allow you to build a
new text files using blocks of text available in other files.

SEARCHC( same as EDLEX )           Search file for match as if lowercase and
                                   uppercase characters are the same

TXTFIND( search string, start column, occurance #, N)   A fast search for files
                                   where data is organized in known tabular
                                   format. Perhaps with optional toggle to
                                   allow uppercase/lowercase matching.

FIRST(<string>,<start col>,<from>,<to>,N)

LAST(<string>,<start col>,<from>,<to>,N)

The operation of the last two keyword is not defined. The intent is to allow
secondary sorts within a file. An example that comes to mind is the SDL utility
for IBM machines that allows you to sort directories by file extensions and
filelengths at the same time.. It is possible that TXTFIND may well be a more
useful equivalent. To be decided later. 

It would be nice to allow sorting by word number, as well as by start column.
This would actually be easy to do, using the code for TXTMIN. All that would be
required is a new SETPTR routine. The main reason NOT to do this is that this
would be much slower than the computation used by SETPTR. Also, it would double
the number of TXTMIN/TXTMAX keywords required to cover all the possible
combinations of options. I may do something about it after I finish the above
mentioned keywords, possibly as an extra TXTSORT option.

-------------------------------------------------------------------------------
SYNTAX AND DETAILS OF KEYWORD OPERATIONS:

NOTE: you may use negative numbers with most of the keywords in this LEX file,
since the numbers are poped from the stack and converted to hex with RNDAHX
rather than with =POP1R and =FLTDH. As mentioned above, this allows you to
move a record to the end of the file with the convenient RMOVE#N,<from>,-1
rather than RMOVE#N,<from>,1048575. Record -1 is equivalent to record FFFFF in
hex. This can be handy when using these keywords from the keyboard.
 

Syntax: APPEND#N;A$

N must be a channel number assigned to an unsecured text file.  The contents of
A$ are appended at the end of the target text file.  The primary advantage over
RESTORE#N,1048575 @ PRINT#N;A$ is that you do not have to know what the highest
allowable record number is 1048575, or FFFFF in hex. This can prevent losing
information in a file by accident. (Keep in mind that you are allowed PRINT#N;A$
as well as READ#N;A$ and READ#N,R;A$ but not PRINT#N,R;A$ with text files.)

After APPEND#N;A$ the file current position pointer is set to End Of File, just
as if you had used RESTORE#N,1048575 @ PRINT#N;A$

NOTE: If we ignore practical considerations such a available memory, and a
minimum requirement of 2 bytes per zero length records, there is no reason why
a text file could not hold more than 1048575 records... However, the
hex<=>decimal conversions routines used by all the keywords required to handle
text files limit the maximum number of records in a text file as mentioned
above. Other computers such as the HP-75 can limit you to a maximum 9999 lines.
 
This keyword was taken out of IDS#1 almost as is. See the INSERT# keyword
(EDLEX). The major differences are that I removed code (ever try to unravel
some wool after a playful kitten got into it?) that pertained to other stuff,
as well as CPU status flag that were nicely confusing the issue. I suppose I
will have to put some of the junk back in to reduce code length, perhaps as
much as 20%. However, before I do that, I want to make sure everything works
exactly as I want it to.  Debugging "spaghetti" code is no fun - the least
change and several things go wrong at once.

RCOPY#N,<from>,<destination>      Copy the <from> record and insert in in front
                                  of <destination>

BCOPY#N,<from>,<to>,<destination>  Same, but for an entire block

BCOPY and RCOPY are very similar to both INSERT#(EDLEX) and APPEND#: The data
to be copied is copied from the file and placed into the HP-71 output buffer,
after making sure there is sufficient available memory. Then, the contents
of the output buffer is inserted into the file using RPLLIN (RePLace LINe), a
mainframe utility that allows you to edit/replace/delete/insert lines into file
space, taking care of updating all the file pointers, buffer pointers, etc.
Following BCOPY and RCOPY, the file current position pointer points at the
<destination> record. This was both easy to do and consistent with the EDTEXT
handling of the Copy options.


Syntax: RMOVE #N,<from>,<destination>
Syntax: BMOVE #N,<from>,<to>,<destination>

BMOVE and RMOVE are identical, except that BMOVE handle a block of records
instead of only one.

N must be a channel number assigned to an unsecured text file.  <from> and
<destination> are numerical expressions for record numbers.  <from> must be an
existing record #. It is moved in front of <destination>.  The file 'current
position' pointer in the FIB (File Information Buffer) points to <destination>
on exit. If destination does not exist, <from> is moved to the end of the file.


Syntax: RSWAP#N,<record #>,<record#>

N must be a channel number assigned to an unsecured text file.  <from> and
<destination> are numerical expressions for record numbers. Both of the
specified records MUST exist. The contents of the two records are exchanged.
The file 'current position' pointer in the FIB (File Information Buffer) should
point to the same record it was set to on entry to RSWAP, at its new location
if it has moved.  Since I have not yet done the job correctly, I presently set
this pointer to the end of the record with the higher address.  This was easier
to do, and insures a valid pointer position for subsequent serial READ#N;A$
type of operations. Exception: if you use RSWAP on pre-formated files and the
record lengths are the same, then the pointer remains at its original position.


BREV#N,<from>,<to>

N must be a valid channel number assigned to an unsecured text file.
BREV# reverses the order of the records in the block specified. Very fast- 
each record is reversed, then the entire block. This keyword and the associated
code support the [DESCENDING] option to be implemented in TXTSORT#.
BREV# sets the current file position pointer to the start of the block you have
reversed, for easy viewing.


N=LASTR(<channel #>)

Returns the RECORD # of the last record.  NOTE: This is the last line number
less 1. This is equivalent to FILESZR(<filename>) in EDLEX, but faster.


N=RMAXSZ(<channel #>)

Returns the size of the longest record, not including the 4 nibble length
header, but including the pad byte if there. This allows you to dimension your
strings in a program to be large enough to hold the longest record in the file,
i.e. M=RMAXSZ(N) @ DIM A$[M],B$[M],...


N=TXTMAXC(#,#,#,#,#)
 N=TXTMAX(#,#,#,#,#)
N=TXTMINC(#,#,#,#,#)
 N=TXTMIN(#,#,#,#,#)       See the SEARCH keyword in IDS VOL.I
          | | | | |______ Channel number (1 to 255)
          | | | |________ End record. The last record scanned in the search.
          | | |__________ Start record. The first record searched
          | |____________ Start Column. The first character in each record
          |                where the comparison starts.
          |______________ Field. (0 to 16). Values above 16 are set to 16.

Purpose: Locate the smallest or largest record between (and including) <begin
line> and <end line>, based on the contents of the records starting at <start
column> for a field of <#bytes> long.

Four keywords allow you to rapidly locate the largest or smallest record within
the specified range of records. TXTMIN returns the record # of smallest record
in the range, while TXTMAX returns the record # of the largest. So, why four
keywords instead of just two?

TXTMIN and TXTMAX determine the highest or lowest record based on the relative
numerical value of each character, such that "A"<"a".  TMAXC and TMINC are near
equivalents of TXTMIN and TXTMAX, except that lower case characters "a" through
"z" are converted to upper-case characters for the purpose of making
comparisons. Here, "a"="A"!

Thus, let us assume that you remember that the item you are looking for is
"Aadvark", but you cannot recall whether it was entered as AADVARK or perhaps
something else with one or more lower-case character. Then, you can sort your
text file using TXTMINC or TXTMAXC to make sure all the A characters are
together.

If you set the <start column> to zero and the record lenghts are less than 256,
you can sort the file by record length. I am not sure how useful this feature
is, but at least, it is extra capability!

The limit of 16 characters for the sort field may seem arbitary. It is! I am
sure someone will come-up with an example where the sort field needs to be
greater. My answer will be "Sorry".  First, in my own experience, most
comparisons actually require about three characters. Second, you can always
perform two or more sorts, first at the desired start column+16, then at the
desired start column. Finally, the real reason: I keep track of the count using
the A[S] field, which greatly simplifies code.  The [B] field would have
allowed fields of 256 bytes, but they are not available without considerable
effort. If someone has an idea that would allow any-size fields without
incurring penalties such as slower execution, please let me know.

-------------------------------------------------------------------------------

BREV, BMOVE, RMOVE and RSWAP do their job of moving records around WITHOUT
USING ANY MEMORY, aside from the space required by TSORTLEX and the text file
you are manipulating. This is was done to address the all too frequent problem
of not having enough memory to sort large text files, or sorting text files
with long records. (Maximum record length is 65534 bytes). 

The trick is accomplished by a carefully ochestrated set of byte exchanges. See
below. The tick marks ' denote the start of a record, and do not take any
memory.
 

                 '<from> '??????'<with>'
Original order : 'ABCDEFG'HIJKLM'NOPQR'
Step1 RSWAP    : 'GFEDCBA'MLKJIH'RQPON' reverse <from>, <to> and the records
                                        in between, each as a block.
Final order    : 'NOPQR'HIJKLM'ABCDEFG' reverse entire space as single block

There are several special cases. First, if the <from> and <to> records are of
equal length, you can save a lot of time exchanging the contents of these two
records byte for byte. See SWAPEQ.

Second, <from> and <to> may be equal, in which case nothing needs be moved.

Third, the <to> record may follow the <from> record, there is nothing between
<from> and <to>.

The above described method requires that every byte be moved twice before the
swap is completed. A somewhat faster method could be developed where <from> and
<to> are moved to the math stack, the records between the two shifted in memory
up or down (only once), and <from> and <to> moved from the stack back into the
text file. This would be especially advantageous if there is a lot of bytes
between <from> and <to>. However, this could lead to problems if available
memory is limited. The method described above is very fast, even with
relatively large files.

BMOVE
RMOVE          '<from-to>'??????'<dest>'
Original order : 'ABCDEFG'HIJKLM'NOPQR'
Step1          : 'GFEDCBA'MLKJIH'NOPQR' reverse <from-to>, and the records
                                       following <from> up to but not including
                                       <destination> as a single block.
Final order    : 'HIJKLM'ABCDEFG'NOPQR' reverse entire space as single block

BMOVE
RMOVE            '<dest> '??????'<frm-to>'
Original order : 'ABCDEFG'HIJKLM'NOPQR'
Step1 RMOVE    : 'MLKJIH'GFEDCBA'RQPON' reverse <from-to>, and the records
                                       <dest.> to start of <from> as a block
Final order    : 'NOPQR'ABCDEFG'HIJKLM' reverse entire space as single block

Special cases include moving <from> to the end of the file, and when <from> and
<to> follow each other, where a block of zero lengths is reversed.

BREV uses the same approach, in a slightly different way: first, each record in
the block is reversed, then the entire block. The net result: the block is
reversed, as desired.

-------------------------------------------------------------------------------

LIF Standard: The LIF standard, which is fully implemented in the HP-71,
requires that text file records be an even number of bytes long. The minimum
lenght is two bytes, where the record consists of only the lenght of record
header, which is set to 0000 hex. This corresponds to an empty line such as a
spacer between two paragraphs. A record holding a single character will be four
bytes long, 0001CCxx, where 0001 is the length of record, one byte, CC is the
character and xx is a pad byte added to make the record length even. Therefore,
the length of a record and the actual length of the record are related as
follows:

Length of record = 2 (header) + header value + 1 if header value is odd.

This computation is time consuming at best, and additional time must be used to
make sure that the header is or is not the EOF marker, or beyond the actual end
of file. The NXTRC routine, which is based on its counterpart in HP's EDLEX,
was designed to minimize the usage of CPU registers and optimize execution
speed. A faster version exists in DISASMLX (See TKS source file on SWAP10).
That version is restricted to text files with record lengths of 254 bytes or
less.

The fact that all records are an even number of bytes long, or a numerical
multiple of 4 nibs allows us to reverse records or sets of records in exchanges
involving two sets 4 nibbles each, a significant improvement over exchanges 
involving only a byte at a time. If this seems confusing, wait until you try
to make sense of the source code. The key fact is that all the records MUST be
reversed twice, to restore the record structure of the file. The order and 
from-to inputs to the reversal routine determine whether you move, reverse or
swap records or block of records.

-------------------------------------------------------------------------------
