TSORTDOC documents the current vertion of TSORTLEX, which is TSORTLX9. 12/18/88 TSORTLEX Copyright is by Michael Markov 1988. TSORTLX9 (current version) is 1472 bytes long. TSORTLX9 corrects a bug in TSORTLX7 having to do with where the file pointer is set to after you use APPEND, BCOPY and RCOPY. TSORTLX7 left the pointer at the OLD location of , which is also the start of the record(s) that has been appended or copied. This is common to INSERT#, REPLACE# and DELETE#. Presumably, this allows you to use a serial read to get back the last record stored into a text file. However, I find this implementation a waste, since you already have the record in a text string. In my opinion, it is more useful to have the pointer set to the new location of the record that follows the one(s) that you have appended or copied... Of course, the change affects BDEL#, which uses common code with the above keywords. However, when you delete a record or a block of records, the new address of the records that followed the record(s) that have just been deleted is the start of the information just deleted, which means that there will be no visible impact. TSORTLX9 also provides a new keyword, PTINSERT#, or "insert at current file pointer". PTINSERT is similar to INSERT#, except that instead of inserting at a line number, it inserts at the current file position pointer. Since you the current file pointer is available in the File Information buffer, PTINSERT# is much faster than INSERT# for serial insertion of information into a text The speed advantage increases with the length of the file and the number of records between the start of the file and the pointer position. To avoid corrupting the text file, you should set the pointer to a valid address with RESTORE#N,R or with some other keywords that sets the pointer to a known position before using PTINSERT repeatedly. See below for further details. The keywords in this LEX file are primarily designed to help you to manipulate and sort HP71 TEXT files under program control. The keywords provided work with channel numbers, since this optimizes executions speed. Errors will be generated if the channel # is more than 255 or less than 1. TSORTLEX is intended to provide keywords that really should have been provided by EDLEX (FORTH/ASSEMBLER ROM, or TEXT EDITOR ROM). The new keywords implement in assembly language capabilities that the above modules implemented in BASIC. This should greatly improve the performance of text editor programs, after they are updated to take advantage of the new keywords. (Not yet done). The following keywords are provided: APPEND#N;A$ Append a record to the file. BCOPY#N,,, Copy a block of record, insert in front of . BDEL#N,, Delete of block of records from the file. BMOVE#N,,, Move a block of records within a file. New location is in front of . BREV#N,, Reverse the order of a block of records. LASTR(N) Return the last record number. PTINSERT#N;A$ Insert a record in the file in front of the record at current position pointer. RCOPY#N,, Copy a record, insert in front of . RMAXSZ(N) Return the length of the longuest record. RMOVE#N,, Move a record within a file. New location is in front of . RSWAP#N,, Exchange the contents of two records. TXTMAXC(field,start column,,,N) Locate the record whose contents at TXTMAX(field,start column,,,N) beginning at have the TXTMINC(field,start column,,,N) have the minimum or maximum value. TXTMIN(field,start column,,,N) and specify the first and last records included in the search. where N is a channel number (1 to 255), is an expression. The maximum field is 16 bytes, any higher value gives the same result as 16. If is equal to zero, ranking is based on the least significant byte of the record header, or the length of the records. This assumes all records are 255 bytes or less, which is true in most, but not all, instances. NOTE: The LIF standard specifies that text file records headers are two byte long, thus supporting record lengths of FFFE (hex) bytes. The FFFF record header is reserved for use as an End Of File (EOF) marker. This marker is NOT mandatory, at least within the HP-71 implementation. EOF is also determined by the start of next file pointer in the file header. However, this EOF marker is required on mass storage for compatibility with other controllers. The difference between TXTMAXC and TXTMAX is that TXTMAXC ranks the information as if all lowercase characters have been converted to uppercase characters, while TXTMAX uses the numerical values of the characters. The difference between TXTMINC and TXTMIN is that TXTMINC ranks the information as if all lowercase characters have been converted to uppercase characters, while TXTMIN uses the numerical values of the characters. For BMOVE, may not fall inside the block. If is equal to either or , no error is generated, but nothing changes. This is totally compatible with the operation of the FORTH/ASSEMBLER text editor. For BCOPY, may be inside the block. For BCOPY, BMOVE, RCOPY and RMOVE, the record or block of records to be moved or copied is inserted in front of . The and records are included in the block of records that are being copied, moved or ranked. In all instances, must exist. usually may be set to -1 to specify that the block of records includes everything from to EOF (End Of File). It is generally expected that is less than or equal to . An Invalid Argument error results whenever violation of the above guidelines is likely to cause problems. The current file position pointer in the File Information Buffer (FIB) is updated to insure the validity of subsequent serial read or write operations. See below for details. This should be of considerable interest since serial operations are much faster than the computed READ#N,I;A$, etc. Additional information on how these keywords are used follows. ------------------------------------------------------------------------------- SORTING TEXT FILES: The following program sorts text files in ascending order: ASSIGN#N TO F=LASTR(N) @ FOR I=0 TO F @ RSWAP#N,I,TXTMINC(16,1,I,F,N) @ NEXT I You could use RMOVE instead of RSWAP and get the same results. RSWAP is considerably faster if record lengths are all the same. You could use TXTMIN instead of TXTMINC if you want "a" to go after "Z" (and characters 91 to 96) rather than be considered another "A". TXTMIN is faster than TXTMINC. Actually, you can easily improve on the above by sorting the file from end to start, while giving the user improved control over what is done. See below. Sorting the file backwards is faster because the length of the chain of records to be traced by the TXTMIN/TXTMAX/RSWAP/RMOVE keywords is decreasing by 1 record each time you go around the FOR I ... NEXT I loop. In other words, you reduce the number of times certain routines are executed from E*(E-S) times to (E+S)*(E-S)/2. For small values of S, you get an improvement of almost 50%. However, since tracing the record chain is only part of what has to be done, actual execution time improvements will be on the order of 30%. 0010 INPUT 'File? ';F$ @ ASSIGN #1 TO F$ @ F=LASTR(1) 0011 INPUT 'Column? ';N 0012 INPUT 'START REC? ','3';S 0013 INPUT 'END REC? ',STR$(F);E 0019 T=TIME 0040 FOR I=E TO S STEP -1 @ RSWAP #1,TXTMAX(10,N,S,I,1),I @ NEXT I 0240 T=TIME-T 0250 ASSIGN #1 TO * @ DISP TCNV$(T) I call the above program TRIX (Tri is French for sort, X is because I lost count of the number of versions I developed). I am interested in any improved versions users may develop. The TCNV$ keyword comes from the DATA ACQUISITION ROM, and is convenient (but not required) when displaying elapsed time. If reverse order is desired, then the contents of the sorted file can be reversed with the BREV keyword: BREV#N,S,E Compare performance with the REV program(s) below, and you will understand why I wrote BREV# . 0010 INPUT 'File? ';F$ @ T=TIME @ ASSIGN #1 TO F$ @ F=LASTR(1) ! @ K=RMAXSZ(1) 0015 ! DIM A$[K],B$[K] 0020 FOR I=0 TO F DIV 2 @ RSWAP #1,I,F-I @ NEXT I 0030 T=TIME-T @ DISP TCNV$(T) 0040 ! FOR I=0 TO F DIV 2 @ READ#1,I;A$ @ READ#1,F-I;B$ @ REPLACE#1,F-I;A$ 0041 ! REPLACE#1,I;B$ @ NEXT I 0050 ! T=TIME-T @ DISP TCNV$(T) And, obviously, by using a TXTMIN instead of a TXTMAX keyword in the TRIX program, you can get files sorted in revese order, without using REV or BREV#. It is interesting to note that serial read operations such as READ#N;A$ [,B$ [,C$ ...]] are much faster than computed reads such as READ#N,I;A$. In fact, given enough memory, you can read the entire text file into a string array, which allows much faster access (and sorting) than is possible if you trace the chain of the variable length records in a text file. You can ASSIGN#N TO @ L=LASTR(N) @ DIM Q$(N)[RMAXSZ(N)] @ RESTORE #N @ READ#N;Q$ to do the job in a single, very fast operation. This approach is very attractive since it can be performed with mass storage files just as easily as with (I)RAM files. Similarily, you can store the contents of sorted string array back into a text file with RESTORE#N @ PRINT#N;Q$ in a single operation. The main problem I have with that approach is that I rarely have enough memory to spare for the job, since the string array takes more room than the source text file. See the HP71 Reference Manual, pg. 332. Besides, I expect that this LEX will take care of the speed problem, when finished. ------------------------------------------------------------------------------- KEYWORDS NOT YET IMPLEMENTED: TXTSORT#N,,,, [NOCASE] [DESCENDING] where NOCASE and DESCENDING are options that give the flexibility provided by the four TXTMIN/TXTMAX keywords. Actually, this is what I started to write in the first place. The current set of keywords were developed as building blocks that make the more complicated TXTSORT keyword possible. Writing this keyword should yield significant improvements over the TRIX program above, speedwise. (I hope.) Many "overhead" operations such a evaluating parameters, finding the FIB entry, computing the file start address, etc.,) will have to be performed only once. IMPORT#N,,,, EXPORT#N,,, [APPND] IMPORT will allow you to copy the contents of the file specified by , from record number to record number inclusive, and insert it in front of record number in the file associated with channel N. EXPORT will perform the inverse operation, allowing you to either replace the content of the destination file or to append the block of records specified to the destination file. These keywords will support non-destructive delete/undelete operations, as well as allow you to build a new text files using blocks of text available in other files. SEARCHC( same as EDLEX ) Search file for match as if lowercase and uppercase characters are the same TXTFIND( search string, start column, occurance #, N) A fast search for files where data is organized in known tabular format. Perhaps with optional toggle to allow uppercase/lowercase matching. FIRST(,,,,N) LAST(,,,,N) The operation of the last two keyword is not defined. The intent is to allow secondary sorts within a file. An example that comes to mind is the SDL utility for IBM machines that allows you to sort directories by file extensions and filelengths at the same time.. It is possible that TXTFIND may well be a more useful equivalent. To be decided later. It would be nice to allow sorting by word number, as well as by start column. This would actually be easy to do, using the code for TXTMIN. All that would be required is a new SETPTR routine. The main reason NOT to do this is that this would be much slower than the computation used by SETPTR. Also, it would double the number of TXTMIN/TXTMAX keywords required to cover all the possible combinations of options. I may do something about it after I finish the above mentioned keywords, possibly as an extra TXTSORT option. ------------------------------------------------------------------------------- SYNTAX AND DETAILS OF KEYWORD OPERATIONS: NOTE: you may use negative numbers with most of the keywords in this LEX file, since the numbers are poped from the stack and converted to hex with RNDAHX rather than with =POP1R and =FLTDH. As mentioned above, this allows you to move a record to the end of the file with the convenient RMOVE#N,,-1 rather than RMOVE#N,,1048575. Record -1 is equivalent to record FFFFF in hex. This can be handy when using these keywords from the keyboard. Syntax: APPEND#N;A$ N must be a channel number assigned to an unsecured text file. The contents of A$ are appended at the end of the target text file. The primary advantage over RESTORE#N,1048575 @ PRINT#N;A$ is that you do not have to know what the highest allowable record number is 1048575, or FFFFF in hex. This can prevent losing information in a file by accident. (Keep in mind that you are allowed PRINT#N;A$ as well as READ#N;A$ and READ#N,R;A$ but not PRINT#N,R;A$ with text files.) After APPEND#N;A$ the file current position pointer is set to End Of File, just as if you had used RESTORE#N,1048575 @ PRINT#N;A$ NOTE: If we ignore practical considerations such a available memory, and a minimum requirement of 2 bytes per zero length records, there is no reason why a text file could not hold more than 1048575 records... However, the hex<=>decimal conversions routines used by all the keywords required to handle text files limit the maximum number of records in a text file as mentioned above. Other computers such as the HP-75 can limit you to a maximum 9999 lines. This keyword was taken out of IDS#1 almost as is. See the INSERT# keyword (EDLEX). The major differences are that I removed code (ever try to unravel some wool after a playful kitten got into it?) that pertained to other stuff, as well as CPU status flag that were nicely confusing the issue. I suppose I will have to put some of the junk back in to reduce code length, perhaps as much as 20%. However, before I do that, I want to make sure everything works exactly as I want it to. Debugging "spaghetti" code is no fun - the least change and several things go wrong at once. RCOPY#N,, Copy the record and insert in in front of BCOPY#N,,, Same, but for an entire block BCOPY and RCOPY are very similar to both INSERT#(EDLEX) and APPEND#: The data to be copied is copied from the file and placed into the HP-71 output buffer, after making sure there is sufficient available memory. Then, the contents of the output buffer is inserted into the file using RPLLIN (RePLace LINe), a mainframe utility that allows you to edit/replace/delete/insert lines into file space, taking care of updating all the file pointers, buffer pointers, etc. Following BCOPY and RCOPY, the file current position pointer points at the record. This was both easy to do and consistent with the EDTEXT handling of the Copy options. Syntax: RMOVE #N,, Syntax: BMOVE #N,,, BMOVE and RMOVE are identical, except that BMOVE handle a block of records instead of only one. N must be a channel number assigned to an unsecured text file. and are numerical expressions for record numbers. must be an existing record #. It is moved in front of . The file 'current position' pointer in the FIB (File Information Buffer) points to on exit. If destination does not exist, is moved to the end of the file. Syntax: RSWAP#N,, N must be a channel number assigned to an unsecured text file. and are numerical expressions for record numbers. Both of the specified records MUST exist. The contents of the two records are exchanged. The file 'current position' pointer in the FIB (File Information Buffer) should point to the same record it was set to on entry to RSWAP, at its new location if it has moved. Since I have not yet done the job correctly, I presently set this pointer to the end of the record with the higher address. This was easier to do, and insures a valid pointer position for subsequent serial READ#N;A$ type of operations. Exception: if you use RSWAP on pre-formated files and the record lengths are the same, then the pointer remains at its original position. BREV#N,, N must be a valid channel number assigned to an unsecured text file. BREV# reverses the order of the records in the block specified. Very fast- each record is reversed, then the entire block. This keyword and the associated code support the [DESCENDING] option to be implemented in TXTSORT#. BREV# sets the current file position pointer to the start of the block you have reversed, for easy viewing. N=LASTR() Returns the RECORD # of the last record. NOTE: This is the last line number less 1. This is equivalent to FILESZR() in EDLEX, but faster. N=RMAXSZ() Returns the size of the longest record, not including the 4 nibble length header, but including the pad byte if there. This allows you to dimension your strings in a program to be large enough to hold the longest record in the file, i.e. M=RMAXSZ(N) @ DIM A$[M],B$[M],... N=TXTMAXC(#,#,#,#,#) N=TXTMAX(#,#,#,#,#) N=TXTMINC(#,#,#,#,#) N=TXTMIN(#,#,#,#,#) See the SEARCH keyword in IDS VOL.I | | | | |______ Channel number (1 to 255) | | | |________ End record. The last record scanned in the search. | | |__________ Start record. The first record searched | |____________ Start Column. The first character in each record | where the comparison starts. |______________ Field. (0 to 16). Values above 16 are set to 16. Purpose: Locate the smallest or largest record between (and including) and , based on the contents of the records starting at for a field of <#bytes> long. Four keywords allow you to rapidly locate the largest or smallest record within the specified range of records. TXTMIN returns the record # of smallest record in the range, while TXTMAX returns the record # of the largest. So, why four keywords instead of just two? TXTMIN and TXTMAX determine the highest or lowest record based on the relative numerical value of each character, such that "A"<"a". TMAXC and TMINC are near equivalents of TXTMIN and TXTMAX, except that lower case characters "a" through "z" are converted to upper-case characters for the purpose of making comparisons. Here, "a"="A"! Thus, let us assume that you remember that the item you are looking for is "Aadvark", but you cannot recall whether it was entered as AADVARK or perhaps something else with one or more lower-case character. Then, you can sort your text file using TXTMINC or TXTMAXC to make sure all the A characters are together. If you set the to zero and the record lenghts are less than 256, you can sort the file by record length. I am not sure how useful this feature is, but at least, it is extra capability! The limit of 16 characters for the sort field may seem arbitary. It is! I am sure someone will come-up with an example where the sort field needs to be greater. My answer will be "Sorry". First, in my own experience, most comparisons actually require about three characters. Second, you can always perform two or more sorts, first at the desired start column+16, then at the desired start column. Finally, the real reason: I keep track of the count using the A[S] field, which greatly simplifies code. The [B] field would have allowed fields of 256 bytes, but they are not available without considerable effort. If someone has an idea that would allow any-size fields without incurring penalties such as slower execution, please let me know. ------------------------------------------------------------------------------- BREV, BMOVE, RMOVE and RSWAP do their job of moving records around WITHOUT USING ANY MEMORY, aside from the space required by TSORTLEX and the text file you are manipulating. This is was done to address the all too frequent problem of not having enough memory to sort large text files, or sorting text files with long records. (Maximum record length is 65534 bytes). The trick is accomplished by a carefully ochestrated set of byte exchanges. See below. The tick marks ' denote the start of a record, and do not take any memory. ' '??????'' Original order : 'ABCDEFG'HIJKLM'NOPQR' Step1 RSWAP : 'GFEDCBA'MLKJIH'RQPON' reverse , and the records in between, each as a block. Final order : 'NOPQR'HIJKLM'ABCDEFG' reverse entire space as single block There are several special cases. First, if the and records are of equal length, you can save a lot of time exchanging the contents of these two records byte for byte. See SWAPEQ. Second, and may be equal, in which case nothing needs be moved. Third, the record may follow the record, there is nothing between and . The above described method requires that every byte be moved twice before the swap is completed. A somewhat faster method could be developed where and are moved to the math stack, the records between the two shifted in memory up or down (only once), and and moved from the stack back into the text file. This would be especially advantageous if there is a lot of bytes between and . However, this could lead to problems if available memory is limited. The method described above is very fast, even with relatively large files. BMOVE RMOVE ''??????'' Original order : 'ABCDEFG'HIJKLM'NOPQR' Step1 : 'GFEDCBA'MLKJIH'NOPQR' reverse , and the records following up to but not including as a single block. Final order : 'HIJKLM'ABCDEFG'NOPQR' reverse entire space as single block BMOVE RMOVE ' '??????'' Original order : 'ABCDEFG'HIJKLM'NOPQR' Step1 RMOVE : 'MLKJIH'GFEDCBA'RQPON' reverse , and the records to start of as a block Final order : 'NOPQR'ABCDEFG'HIJKLM' reverse entire space as single block Special cases include moving to the end of the file, and when and follow each other, where a block of zero lengths is reversed. BREV uses the same approach, in a slightly different way: first, each record in the block is reversed, then the entire block. The net result: the block is reversed, as desired. ------------------------------------------------------------------------------- LIF Standard: The LIF standard, which is fully implemented in the HP-71, requires that text file records be an even number of bytes long. The minimum lenght is two bytes, where the record consists of only the lenght of record header, which is set to 0000 hex. This corresponds to an empty line such as a spacer between two paragraphs. A record holding a single character will be four bytes long, 0001CCxx, where 0001 is the length of record, one byte, CC is the character and xx is a pad byte added to make the record length even. Therefore, the length of a record and the actual length of the record are related as follows: Length of record = 2 (header) + header value + 1 if header value is odd. This computation is time consuming at best, and additional time must be used to make sure that the header is or is not the EOF marker, or beyond the actual end of file. The NXTRC routine, which is based on its counterpart in HP's EDLEX, was designed to minimize the usage of CPU registers and optimize execution speed. A faster version exists in DISASMLX (See TKS source file on SWAP10). That version is restricted to text files with record lengths of 254 bytes or less. The fact that all records are an even number of bytes long, or a numerical multiple of 4 nibs allows us to reverse records or sets of records in exchanges involving two sets 4 nibbles each, a significant improvement over exchanges involving only a byte at a time. If this seems confusing, wait until you try to make sense of the source code. The key fact is that all the records MUST be reversed twice, to restore the record structure of the file. The order and from-to inputs to the reversal routine determine whether you move, reverse or swap records or block of records. -------------------------------------------------------------------------------