INSERT  SORTING  AND  INDEXING

                             by Dale Thorn [1760]

                     Part A:  Insert sorting in RAM using
                              a single string of data.


The following program illustrates insert sorting where a single sorting order
is sufficient, and RAM is large enough to contain all of the sorted data.  This
program requires the user to specify the number of records (Line 40), number of
fields per record (Line 50), and the length of each field (Lines 90 - 130).

NOTE:  If you change the number of fields on line 50, you will have to add or
subtract lines where the field lengths are specified (Lines 90 - 130).

The example program specifies 127 records of data where each record has 5 units
of information (fields), and the fields are 10, 20, 30, 35, and 15 characters
long, for a total of 110 characters per record.  If you multiple 110 characters
(bytes) per record times 127 records you get 13,970 bytes total, which is the
dimensioned length of A$ (see Lines 240 and 270).

In this simplified example program, we require the number of records to be a
power of 2, minus 1, to make the search routine simple (believe it or not); in
actual applications, intermediate values would be acceptable.

NOTE:  If you set up your own example where the file size (the dimensioned
length of A$) is much greater than 30,000 bytes, this program will begin to run
very slowly because of the time it takes the 71B to mount large strings on the
internal stack.

The first things the program does are to sum the number of bytes in each field
to get the total number of bytes per record, find the beginning and ending byte
positions in the record string (B$) and assign them to the arrays H() and I(),
and set the variable Z equal to the length or size of the longest field.

The program will place all input fields of data into the record string (B$) on
Lines 490 - 520, and then use the H() and I() arrays on Line 550 to find the
field(s) in the record string (B$) and the file string (A$) which are used for
sorting comparisons.  If you wish to sort on a different field number, you will
need to change only the 1's on Line 550 to the new field number.  If you wish
to sort on more than one field, you could rewrite Line 550 to read:
550  IF B$[H(3),I(3)]&B$[H(4),I(4)]>A$[P1+H(3),P1+I(3)]&A$[P1+H(4),P1+I(4)]
THEN P1=P1+P2 ELSE P1=P1-P2 !  etc., etc.
Needless to say, once you get started with a file, you cannot change the number
of fields or the sequence of the fields in the sorting order on Line 550.

NOTE:  If a change in the sorting order is necessary, instead of using one of
the published sort routines to re-sort your file, you could simply create a new
file with a new sorting order and use this program (with slight modification)
to bring in the records from the previous file, allowing this program to
insert-sort them much as if they were being typed in from the keyboard.
.PA
The next thing the program does in Lines 250 - 260 is to establish a control
pointer and increment/decrement variable for the Binary Searches.  As long as
the number of records is 2^n-1, these two algorithms do not change.  The
following is an illustration of the use of pointers in a Binary Search.

Imagine a sorted file with 7 records (2^n-1 records, where n=3).  For control
pointer purposes, we will assume N7=15 (bytes-per-record) and N0=105
(total bytes in file).

1) Allen, Jim
2) Best, Frank
3) Dierkop, Chuck
4) Feldman, Bill
5) Firestone, Ray
6) Nelson, Richard
7) ~~~~~~~~~~~~~~~  (end-of-file code)

We have just input a new name at the keyboard (Johnson, Ray), and we want to
file it in order with the preceding names.  Our imaginary control pointer is
set to 45, and our increment/decrement variable is set to 30.  We read record
number 4 from the file (Feldman, Bill, byte positions 46 through 60), compare
it to our input record (Johnson, Ray), and when we see that Johnson is greater
in the ASCII order than Feldman, we move the control pointer up to 75 by adding
the increment/decrement variable to it, and divide the increment/decrement
variable by 2, giving it the value of 15.  We now read record number 6 from the
file (Nelson, Richard, byte positions 76 through 90), compare it to our input
record, and when we see that Johnson is less than Nelson in the ASCII order, we
lower the control pointer to 60 by subtracting the increment/decrement
variable, and divide the increment/decrement variable by 2, giving it the value
of 7.5.  We now read record number 5 from the file (Firestone, Ray, byte
positions 61 through 75), compare it to our input record, and when we see that
Johnson is greater than Firestone in the ASCII order, we move the control
pointer up to 67.5 by adding the increment/decrement variable to it, and divide
the increment/decrement variable by 2, giving it the value of 3.75.

NOTE:  In the last paragraph, when the increment/decrement variable dropped to
the value 3.75 (N7/4), it signalled the program that the search was finished
and the control pointer was midway between a lesser and a greater-or-equal
file record.  At this point, the control pointer is raised to 75, just in front
of record number 6 in the file (see Line 660 in the program).

The next step is to "shift" the file string by 1 record to create a "hole" for
the new record to drop into (see Lines 690 and 710).  These lines will then
have the actual values as follows:

690  A$[75+15+1,105]=A$[75+1,105-15] !  the file string is shifted by the width
710  A$[75+1,75+15]=B$               !  of one record and the new record is
                                     !  dropped into place as per the following
                                     !  illustrations:
.PA
The file string before shifting (count only the characters and spaces inside of
the double brackets:

||Allen, Jim     Best, Frank    Dierkop, Chuck Feldman, Bill  Firestone, Ray ||
||Nelson, Richard~~~~~~~~~~~~~~~||

The file string after shifting:

||Allen, Jim     Best, Frank    Dierkop, Chuck Feldman, Bill  Firestone, Ray ||
||Nelson, RichardNelson, Richard||

The file string after the new record is dropped into place:

||Allen, Jim     Best, Frank    Dierkop, Chuck Feldman, Bill  Firestone, Ray ||
||Johnson, Ray   Nelson, Richard||


The reason for creating a string of blanks (see Lines 300 - 330) is to be able
to "pad" each input field before the fields are joined together in the record
string in Lines 490 - 520.  The computer can easily find any record, and any
field within a record, when every record is the same length and every field is
the same length.  There are rare instances in filing and sorting when it is
better not to pad these inputs, but the great majority of situations require
equal-length records and equal-length fields to facilitate direct random-access
retrieval.

In Lines 340 - 390 we fill B$ and then A$ with CHR$(126)'s.  These ASCII values
are useful as end-of-file markers, and help prevent loss of data, file
overflow, and sorting out of order.

The last things for the program to do are:

Check for end-of-file in Line 420 (make sure there is room for another record).
Input the new data in Lines 430 - 480, and pad the inputs if applicable.
Join the fields together in the record string in Lines 490 - 520.
Set the starting values in Lines 530 and 540 for the control pointer and the
increment/decrement variable.
Set up a Binary Search loop in Lines 550 - 600.
Adjust the control pointer in Line 660, after the search is completed.
Shift the file string in Line 690, in preparation for the new insert.
Insert the new record in Line 710.
Return to INPUT NEW DATA starting in Line 410.

Last note:  If you use a language that does not permit long strings, you can
simulate a long string by concatenating shorter strings end-to-end (in series,
so to speak).  You will need 2 extra pointers; a pointer that denotes which
string in the series you are examining, and a pointer that denotes which byte
in the string you are looking at.  When you locate the insertion point, the
strings will have to be shifted one-by-one, and although it is not as fast in
interpreted BASIC as the long-string approach, the compiled versions will
handle tens of thousands of records in under 1 second.
