DATA PACKING ON THE HP71 AND HP75 Introduction This article proposes a method for packing strings of data. We have decided to pack ordinary strings because we believe they represent one of the most universal and versatile data structure available in HPBASIC, (a part from files). Strings can be used does not achieve the maximum packing which is possible, but it has the advantage of being simple and fast. With these features, it is easy to implement in machine-code. We will present the algorithm by means of an informal description, followed by a prototype in BASIC (HP75 + I/O ROM), and then by two LEX files, one for the HP75 and one for the HP71. Initially, Stefano Piccardi had the idea of applying the adaptive Huffman codes [1], but the method proved itself too complicated. Thus Stefano Piccardi devised the algorithm presented here. After the accomplishment of a first prototype in BASIC, he involved Stefano Tendon, who developed a LEX file version for the HP71. Stefano Tendon also modified the unpacking algorithm (only for the HP71), whereas the LEX file version for the HP75, by Stefano Piccardi, reflects faithfully the prototype in BASIC. Great care has been taken in order to make uniform the I/O formats of both the HP71 and HP75 versions, and in order to ensure that unpacking is the inverse function of packing. Description of the algorithm The packing scheme intends to get rid of all unused most significant bits from every byte of a string. For example, the string "ABEGH" is composed of the following bytes (in standard ASCII encoding): "A" "B" "E" "G" "H" (65) (66) (69) (71) (72) [0100 0001] [0100 0010] [0100 0101] [0100 0111] [0100 1000] Note: we represent binary values enclosed by square brackets, and decimal values enclosed by round brackets. We observe that if we subtract the minimum ASCII value (65) which appears in the string from every character contained in the same string, then we will have no loss of information. The five characters can still be distinguished one from another, thus the string "ABEFG" can be represented by the following bit- configuration alike: "A" "B" "E" "G" "H" (0) (1) (4) (6) (7) [000] [001] [100] [110] [111] Now, if we take the bits we have found and put them one after the other, we will get a sequence of bytes, which represent exactly the original string in a packed form: (3) (55) [?000 0011] [0011 0111] \_/ \_/\_____/\__/\_/ A B E G H In this particular example, we note that the first bit of the first byte - represented with a question mark - has no other function than that of a "filler" so that the total number of bits is an integer multiple of 8. In this way, the bit configuration can easily be represented by a certain number of bytes. The algorithm will use zero-bit fillers, and these can never be more than 7. Thus the original string which contained the five bytes (65), (66), (69), (71) and (72), has been reduced to the only two bytes (3) and (55). Actually, in the real programs, the packed bytes obtained in the way described above, need to be prefixed by another four bytes. These additional four bytes make up a "header" which is unique to any string containing packed data. In particular, this header contains all information necessary for the unpacking algorithm to work. The unpacking algorithm itself works in the opposite way, re-building the original bytes from the packed string by means of the information encoded in the four-byte header. Therefor, in the example described above, the packed string will really be six bytes long instead of two, and the length of the string will increase instead of decreasing. (This example is only a particular case in which it is not worthwile to pack the string; it has been presented only for the sake of illustration). "All compression methods depend on having some knowledge of the structure of their input - otherwise there would be no redundant information to squeeze out. And so with every method you can always find a special case that gets worse when compressed." [2]. So we understand that the efficiency of the algorithm depends also on the characteristics of the input data. In particular, the compression will be greater the longer is the string and the narrower is the interval between the minimum and maximum ASCII value which appear in the string. This invites to manipulate the strings you wish to pack before you proceed with the packing itself. For example, if you need to pack real numeric data, in order to minimize the range of the interval between the minimum and maximum ASCII value, it is advisable to choose the exponent symbol, the decimal point symbol, the sign symbol and the data separator symbol so that they are adjacent to the set {'0'...'9'} in the ASCII encoding. Thus: Exponent symbol '/' Decimal point symbol '.' Sign symbol '-' Data separator symbol ',' In this way MAX - MIN = 14, and you will need only 4 bits to represent any symbol. In this case the compression is near to 50%. The BASIC prototype We whish to perform the transformation: I$ -> PACKING -> O$. Let I = LEN (I$) and O = LEN (O$). The relation between I and O is given by the following expression: O = CEIL (A/8*I) + 4, where A = CEIL (LOG (MAX - MIN) / LOG (2)), and MAX and MIN are, respectively, the maximum and minimum ASCII value of the characters of I$. The preceding expression is defined only for 1 <= A <= 7. The BASIC packing algorithm is correct only within that interval. If A = 0, then unpacking O$ will not give I$ but RPT$( I$[I],I ). Furthermore: I > 1, in order not to have errors at line 1130. These last two limitations do not exist for the LEX file versions. For these, it is understood that O = I+4 if A = 0, and that for every string S$ we have: SUNPACK(SPACK(S$)) = $. Note that not necessarily I <= O. And lastly I must be encoded in two bytes: I < 65536. Algorithm Data packing occurs according to the following phases: [1] Calculation of MAX {U} and MIN {L} (1010-1050); [2] Subtraction of MIN from every character (1060-1080); [3] Calculation of the number of bits to get rid of from every byte {A} (1090- 1110). Line 1090 is operative only if the string is composed of the same character; [4] Initialization of the right shift factor {S} for every byte. The shift factor indicates how many bits are available in the destination byte used in the current iteration of the packing loop. The shift factor may vary between 1 and 8, in a 'circular' manner, i.e. with respect to modulus 8 (but with the value 0 mapped in the value 8). The values of the circular decrement factor {D}, (i.e. the number of bits that contain information in every byte of the original string), the source byte pointer {P2} and the destination byte pointer {P1} are also initialized; [5] Starting at the end of the string and proceeding towards the beginning, the following operations are performed on each byte. Right shift of all bits (1130-1190). X$, at line 1130 is a temporary two byte buffer, which is used to accomplish the right shift. The 8 left-most bits of X$ contain a byte from the original string, and the remaining bits are initialized to zero. All bits of X$ are shifted right by S positions, introducing zero-bits from the left end, i.e. it is a "logical" right shift (1130). After the right shift, the right half of X$ will contain, in the correct position, all or some of the informative bits of the original byte. These bits are added logically ("ORed") to the current destination byte (1140). The shift factor for the next iteration is calculated {S1}. Due to the structure of X$, this value must be corrected to 8 whenever the circular decrement factor will make it zero (1150). If the new shift is not less than the preceding shift factor, a whole packed byte has been built. In this case the destination pointer is update, and any informative bits left over from the last original byte are recovered in the new destination byte (1160). The source pointer is updated (1180), and the iteration continues until the beginning of the string is reached; [6] If, at the end of the packing loop, the shift factor is eight, then the last destination byte allocated has not been used. In this case the destination pointer must be corrected by making it point one character to the right (1200). [7] The packed string so obtained is prefixed with the following four byte header: the first two bytes encode the length of the original string in base 256. The third byte is the value of the circular decrement factor. The last byte is the value of the minimum ASCII value (MIN) in the original string (1210-1220). The unpacking scheme occurs according to the following phases: [1] Determination of the length of the original string {N}, the number of bits to insert in front of each packed byte {A}, and MIN (2010-2040). Allocation of a string of N bytes (2020); [2] Creation of a 'mask' byte that, multiplied logically with another byte, will put to zero its A left-most bits {M$} (2050-2060); [3] Initialization of the source pointer {P2} and of the shift factor {S} (2070); [4] Starting at the end of the string to unpack, and proceeding towards the beginning, two successive bytes are fetched through the pointer P1 and they are put in X$ (2090). X$ is shifted right by S bits, introducing zero bits from the left end (2100). In the secon half of X$, the 8-A right-most bits will be the informative bits of original string which has been packed. These bits will be in the correct position, but they will also be precede by some other bits coming from another byte of the packed string. These unrelated bits are put to zero by means of M$, and the result is transferred to the allocated string (2110). The next shift factor is calculated {S1}, and if it is less than the preceding shift factor, then the current source byte has been unpacked completely, and thus the souce pointer is updated to next source byte (2130) and the iteration continues (2150). [5] Every character of the new string obtained in this way is incremented by MIN (2160) and so the original string is reconstructed completely (2170). BASIC program listing (HP75 + I/O ROM) 0997 ! 0998 ! PACKING 0999 ! 1000 DEF FNC$(S$) 1009 ! 1010 U=0 @ L=255 1020 FOR I=1 TO LEN(S$) 1030 IF NUM(S$[I])>U THEN U=NUM(S$[I]) 1040 IF NUM(S$[I])=S THEN P1=P1-1 @ S$[P1,P1]=X$[1,1] 1170 S=S1 1180 P2=P2-1 1190 IF P2 THEN 1130 1199 ! 1200 IF S=8 THEN P1=P1+1 1210 X$=CHR$(LEN(S$))&CHR$(LEN(S$) DIV 256) 1220 FNC$=X$&CHR$(A)&CHR$(L)&S$[P1] @ END DEF 1997 ! 1998 ! UNPACKING 1999 ! 2000 DEF FNU$(S$) 2009 ! 2010 N=256*NUM(S$[2])+NUM(S$[1]) 2020 U$[N,N]="" 2030 A=NUM(S$[3]) 2040 L=NUM(S$[4]) 2049 ! 2050 X$=CHR$(0) @ X$=ASHF$(X$,A,1) 2060 M$=AXOR$(CHR$(255),X$) 2069 ! 2070 P2=LEN(S$) @ S=0 2079 ! 2080 FOR P1=N TO 1 STEP -1 2090 X$=S$[P2-1,P2] 2100 X$=ASHF$(X$,S,0) 2110 U$[P1,P1]=CHR$(L+NUM(AAND$(M$,X$[2,2]))) 2120 S1=MOD(S-A,8) 2130 IF S1 MAX? 0055 FB 01 IFNC yes 0057 A0 LDB R21,R03 char -> MAX ENDIF 0058 50 C0 CMB R20,R03 char < MIN? 005A FA 01 IFCY yes 005C A0 LDB R20,R03 char -> MIN ENDIF 005D F0 EC WHMP go on SCALE reduce all char's of input string within [0,MAX-MIN] 005F 50 20 A2 STB R20,R40 save MIN 0062 56 1E A3 STM R26,R36 initialize P1 0065 1C A3 STM R26,R34 initialize P2 0067 26 A3 STM R26,R46 save output string end address LOOP 0069 58 8B DCM R30 dec char count 006B F4 09 JNG COMPA done 006D 43 14 E2 POBD R03,-R24 get a char 0070 10 C4 SBB R03,R20 char -= MIN 0072 16 E6 PUBD R03,-R26 char to output 0074 F0 F3 WHMP go on COMPA compute A 0076 51 10 C4 SBB R21,R20 R21 <- MAX-MIN 0079 F6 02 IFZR MAX - MIN = 0?, yes 007B A8 01 LDB R21,=1 make it 1 ENDIF 007D 50 92 CLB R20 A <- 0 CY SHFT1 yes, done with A 0083 50 88 ICB R20 A += 1 0085 F0 F8 WHMP go on /* may want to issue warning 'no compression' on R20 = 0. SHFT1 shift bytes 0087 51 A8 08 LDB R21,=8 008A 14 A2 STB R21,R24 initialize S=8 008C 15 A2 STB R21,R25 save copy for shifting loop 008E 10 C4 SBB R21,R20 initialize D = 8 - A 0090 58 92 CLB R30 initialize temp area for S$[P1,P1] 0092 59 A8 F8 LDB R31,=11111000B bit mask for MOD LOOP compact bytes 0095 52 8B DCM R22 decr char count 0097 F4 26 JNG PUTTAG done 0099 56 92 CLB R26 set X$ to 009B 57 1C E2 POBD R27,-R34 S$[P2,P2]&CHR$(0) LOOP shift R26/27 right R24 bits 009E 57 87 LRM R27 00A0 54 8A DCB R24 00A2 F6 FA WHNZ 00A4 58 16 94 ORB R30,R26 temp area for S$[P1,P1] or= X$[2,2] 00A7 55 11 C4 SBB R25,R21 S -= D 00AA FA 02 JNC NEWCHR S < 0? or 00AC F6 05 JNZ SKIP S > 0? NEWCHR a whole compressed byte is assembled 00AE 58 1E E6 PUBD R30,-R36 S$[P1,P1] to output 00B1 17 A0 LDB R30,R27 temp area for S$[P1,P1] <- X$[1,1] SKIP 00B3 55 19 94 ORB R25,R31 do MOD 00B6 96 XRB R25,R31 00B7 F6 02 IFZR map S=0 to S=8 00B9 A8 08 LDB R25,=8 ENDIF 00BB 14 A2 STB R25,R24 save copy for shifting loop 00BD F0 D6 WHMP go on PUTTAG prefix string with decoding information 00BF 55 C8 08 CMB R25,=08 00C2 F7 03 IFNE 00C4 58 1E E6 PUBD R30,-R36 last S$[P1,P1] to output ENDIF 00C7 60 1E E6 PUBD R40,-R36 MIN to output 00CA 50 E6 PUBD R20,-R36 A to output 00CC 5A E7 PUMD R32,-R36 length to output 00CE 66 C5 SBM R46,R36 compute output string length 00D0 0A E5 PUMD R46,+R12 length to stack 00D2 5E E5 PUMD R36,+R12 addr to stack 00D4 9E RTN 00D5 18 2E BYT 30,56 one num. param., string function UPAK$. uncompact string 00D7 54 DRP 24 get string addr 00D8 CE 12 1F JSB =GETAD# off stack addr 00DB 66 0A E3 POMD R46,-R12 get string length 00DE 58 14 A1 LDM R30,R24 initialize P2 00E1 26 C3 ADM R30,R46 (R30/31) 00E3 6E 14 E1 POMD R56,+R24 get output string length 00E6 CE 1F 24 JSB =RSMEM- get some room 00E9 F8 E9 REN none, give up 00EB 6E 0A E5 PUMD R56,+R12 length to stack 00EE 1E A3 STM R56,R36 save copy 00F0 56 0A E5 PUMD R26,+R12 addr to stack 00F3 2E C3 ADM R26,R56 initialize P1 (R26/27) 00F5 50 14 E0 POBD R20,+R24 get A (R20) 00F8 11 A2 STB R20,R21 save copy 00FA 60 14 E0 POBD R40,+R24 get MIN (R40) 00FD 5A A9 00 FF LDM R32,=FF00 compute mask M$ (R32) LOOP 0101 5B 87 LRM R33 0103 51 8A DCB R21 0105 F6 FA WHNZ 0107 5B 92 CLB R33 initialize S=0 0109 61 92 CLB R41 copy S for shifts 010B 54 18 E2 POBD R24,-R30 initialize temp 010E 55 E2 POBD R25,-R30 area for X$ 0110 62 A8 F8 LDB R42,=11111000B bit mask for MOD LOOP uncompact string 0113 5E 8B DCM R36 decr char count 0115 F4 BD RNG 0117 54 12 A3 STM R24,R22 copy X$ for shifts LOOP shift X$ right S bits 011A 61 8A DCB R41 decr shift count 011C F4 04 JNG ABC done 011E 53 87 LRM R23 0120 F0 F8 WHMP ABC 0122 52 1A 94 ORB R22,R32 X$[2,2] and= M$ 0125 96 XRB R22,R32 0126 20 C2 ADB R22,R40 add MIN 0128 16 E6 PUBD R22,-R26 char to output 012A 5B 10 C4 SBB R33,R20 S -= A 012D FA 05 IFCY cross byte boundary 012F 55 14 A2 STB R25,R24 P2 -= 1 0132 18 E2 POBD R25,-R30 ENDIF 0134 5B 22 94 ORB R33,R42 do MOD 0137 96 XRB R33,R42 0138 21 A2 STB R33,R41 save copy for shifting loop 013A F0 D7 WHMP go on HP75 object code hex listing SPACKLEX L 334 Bytes LEX ID:12556 Line ---------------DATA---------------- Check 1 71 A4 3C 01 8D 4C A7 4E 24 A1 53 50 8C 2 41 43 4B 4C 45 58 0C 31 0A 00 16 00 17 3 0E 00 25 00 29 00 2C 00 D7 00 29 00 89 4 29 00 FF FF 53 50 41 43 4B A4 53 55 E9 5 4E 50 41 43 4B A4 FF 00 FF 61 17 9E 2A 6 18 2E 54 CE 12 1F 6E 0A E3 12 A3 18 C4 7 A3 CB 04 00 CE 1F 24 F8 EA 56 2E C3 B1 8 6E 12 A1 1A A3 50 A9 FF 00 6E 8B F4 C8 9 10 43 14 E0 51 03 C0 FB 01 A0 50 C0 0C 10 FA 01 A0 F0 EC 50 20 A2 56 1E A3 1C C1 11 A3 26 A3 58 8B F4 09 43 14 E2 10 C4 5E 12 16 E6 F0 F3 51 10 C4 F6 02 A8 01 50 FA 13 92 51 84 FB 04 50 88 F0 F8 51 A8 08 2D 14 14 A2 15 A2 10 C4 58 92 59 A8 F8 52 7B 15 8B F4 26 56 92 57 1C E2 57 87 54 8A A3 16 F6 FA 58 16 94 55 11 C4 FA 02 F6 05 19 17 58 1E E6 17 A0 55 19 94 96 F6 02 A8 50 18 08 14 A2 F0 D6 55 C8 08 F7 03 58 1E 1E 19 E6 60 1E E6 50 E6 5A E7 66 C5 0A E5 E1 20 5E E5 9E 18 2E 54 CE 12 1F 66 0A E3 D1 21 58 14 A1 26 C3 6E 14 E1 CE 1F 24 F8 67 22 E9 6E 0A E5 1E A3 56 0A E5 2E C3 50 92 23 14 E0 11 A2 60 14 E0 5A A9 00 FF 5B 5D 24 87 51 8A F6 FA 5B 92 61 92 54 18 E2 86 25 55 E2 62 A8 F8 5E 8B F4 BD 54 12 A3 E2 26 61 8A F4 04 53 87 F0 F8 52 1A 94 96 41 27 20 C2 16 E6 5B 10 C4 FA 05 55 14 A2 1C 28 18 E2 5B 22 94 96 21 A2 F0 D7 30 29 92EA HP71 LEX file listing LEX 'SPACKLEX' TITLE String Data Packing Functions. *********************************************************** * By: Stefano Tendon [CHHU 635] * Cantone Delle Asse 5 * 29100 Piacenza * Italy * * Tel. (0523) 21180 * * Date: 1985.08.13. *********************************************************** ID #5C MSG 0 POLL 0 ENTRY ascmax CHAR #F ENTRY ascmin CHAR #F ENTRY okbits CHAR #F ENTRY dpack CHAR #F ENTRY dunpck CHAR #F KEY 'ASCMAX' TOKEN 01 KEY 'ASCMIN' TOKEN 02 KEY 'GOODBITS' TOKEN 03 KEY 'SPACK$' TOKEN 04 KEY 'SUNPACK$' TOKEN 05 *********************************************************** * E Q U A T E T A B L E * A-MULT EQU #1B349 ADHEAD EQU #181B7 AVMEMS EQU #2F594 EXPR EQU #0F23C FUNCD0 EQU #2F8BB HDFLT EQU #1B31B POP1S EQU #0BD38 REVPOP EQU #0BD31 REV$ EQU #1B38E STKCHR EQU #18504 STRHDR EQU #0F09A * *********************************************************** NIBHEX 411 ascmax A=0 W GOSBVL POP1S Check if parameter is a string. ASRB A Transform nibble length into bytes. GOSUB findrg Find ASCII range of bytes in string. A=0 W A=B B Take the maximum ASCII value. D1=D1- 16 Point to new TOS. GOSBVL HDFLT Convert maximum to decimal. DAT1=A W Write maximum to math stack. GOTO expr *********************************************************** NIBHEX 411 ascmin A=0 W GOSBVL POP1S Check if parameter is a string. ASRB A Transform nibble length into bytes. GOSUB findrg Find ASCII range of bytes in string. A=0 W C=D B A=C B Take the minimum ASCII value. D1=D1- 16 Point to new TOS. GOSBVL HDFLT Convert minimum to decimal. DAT1=A W Write minimum to math stack. GOTO expr *********************************************************** NIBHEX 411 okbits A=0 W GOSBVL POP1S Check if parameter is a string. ASRB A Transform nibble length into bytes. GOSUB findrg Find ASCII range of bytes in string. C=0 W C=B B C=C-D B Compute range: maximum - minimum. A=0 W GOSUB okloop Get # significant bits. D1=D1- 16 Point to new TOS. GOSBVL HDFLT Convert # sign. bits to decimal. DAT1=A W Write result to math stack. GOTO expr *********************************************************** okloop CSRB B Find how many bits are significant A=A+1 B in the byte C[B]. ?C#0 B GOYES okloop RTN *********************************************************** * Find ASCII range of bytes in a string. findrg B=0 B Initialize maximum. LCHEX FF D=C B Initialize minimum ?A#0 A IF null string THEN GOYES findlp D=0 B maximum = minimum = 0 RTN * ELSE findlp A=A-1 A WHILE there are characters DO RTNC C=DAT1 B Read the current character ?C<=B B IF char > max THEN GOYES maxok B=C B max = char ENDIF maxok ?C>=D B IF char < min THEN GOYES minok min = char ENDIF D=C B minok D1=D1+ 2 Update pointer to next char. GONC findlp ENDWHILE * ENDIF *********************************************************** NIBHEX 411 dpack GOSBVL REVPOP If parameter is a string REVerse it. CD0EX Save PC. D0=(5) FUNCD0 DAT0=C A B=0 W B=A A BSRB Transform length into byte count. A=B W R3=A Save byte count. GOSUB findrg Find ASCII range of bytes in string. C=D B R2=C Save minimum ascii value. CD1EX R1=C Save start of stack item for ADHEAD. D1=C D1=^(End Of String). D0=C D0=^(EOS) C=0 W C=B B C=C-D B Max - min. A=0 W Initialize significant bits counter. GOSUB okloop R0=A Save # goodbits (G). A=R3 B=A A Initialize main loop counter. D=0 B Initialize # available bits (A) in * the destination byte. packlp B=B-1 A WHILE there are characters DO: GOC pckout D0=D0- 2 Update source pointer. A=0 W A=DAT0 B Read source character. C=R2 A=A-C B Subtract minimum. ASL A Set up null byte to the right of ASL A the current byte. C=D B Set up shift counter (S): S=A * Shift S bits from current byte: sftlop C=C-1 B GOC sftout ASRB GONC sftlop * sftout C=DAT1 B Read destination character. C=C!A B St"OR"e source into destination. DAT1=C B Write destination character. C=R0 ?C A THEN GOYES nxtone D1=D1- 2 Update destination pointer. ASR A ASR A Initialize destination with DAT1=A B remaining part of source byte. LCHEX 08 D=D+C B A = A + 8 C=R0 ENDIF nxtone D=D-C B A = A - G GONC packlp ENDWHILE. * pckout LCHEX 08 ?D#C B IF A = 8 THEN GOYES adhead D1=D1+ 2 Correct pointer to new TOS. * adhead ST=1 0 Do return from ADHEAD. D0=(5) AVMEMS C=DAT0 A D=C A C=R2 Add minimum ASCII value to string. GOSBVL STKCHR LCHEX 08 A=R0 C=C-A B Compute # badbits. GOSBVL STKCHR Add # badbits to string. C=R3 CSR A CSR A GOSBVL STKCHR Add second length byte. C=R3 GOSBVL STKCHR Add first length byte. GOSBVL ADHEAD Add adequate string header. GOSBVL REV$ Re-Reverse the packed string. D0=(5) FUNCD0 Restore PC. C=DAT0 A D0=C GOTO expr *********************************************************** * Shift A[A] left of C[B] bits. aslbAC C=C-1 B RTNC A=A+A A GOTO aslbAC *********************************************************** * Lp = Length (bytes) of packed string. * Lo = Length (bytes) of original string. * Of = Overhead factor. * U = # of bits left to unpack in current byte. * G = # of significant bits in unpacked bytes. * S = # of bits to shift in shift loops. * NIBHEX 411 dunpck GOSBVL REVPOP If parameter is a string REVerse it. CD0EX Save PC. D0=(5) FUNCD0 DAT0=C A CD1EX D0=C D0=^(Start of Packed string Header). C=C+A A C[A]=^(End Of String) D1=C D1=^(EOS) D=C A D=^(EOS) A=0 W A=DAT0 8 Read packed string header. C=0 W C=A A C=P 4 R3=C Save original string length (bytes). ASR W ASR W ASR W ASR W C=0 A LCHEX 8 C=C-A B Compute # significant bits (G). R0=C Save G. ASR W ASR W R2=A Save minimum ASCII value. D0=D0+ 8 D0=^(Start of Packed String). CD0EX D0=C D=D-C A D[A]=Nibble length of packed string. D=D+D A D=D+D A Compute: Lp * 8. C=R3 Get Length of original string. C=C+C A Transform byte length into nibbles. GOSBVL STRHDR D1=^(Start of Unpacked String) * R1[A]=^(new TOS) A=R0 C=R3 GOSBVL A-MULT Compute: Lo * G C=A A D=D-C A Of = Lp * 8 - Lo * G. LCHEX 08 CDEX B D=D-C B U = 8 - Of A=0 W A=DAT0 B Read first character. GOSUB aslbAC Left align the significant bits. C=R3 B=C A Initialize main loop counter. * unpklp B=B-1 A WHILE there are characters DO: GOC unpkot C=A B A=0 W Set up a null byte in front of A=C B the current byte. C=R0 ?C= U THEN GOYES outif C=D B S = U. GOSUB aslbAC Shift out S bits from byte. C=R0 C=C-D B S = G - U. A=C B LCHEX 08 D=D+C B U = U + 8. C=A B D0=D0+ 2 Update source pointer. A=DAT0 B Read the next character. * ENDIF outif GOSUB aslbAC Shift out S bits from byte. C=A A CSR A CSR A AR2EX C=C+A B Add minimum ASCII value. AR2EX DAT1=C B Write unpacked charater. C=R0 D=D-C B U = U - G. D1=D1+ 2 Update destination pointer. GONC unpklp * ENDWHILE unpkot C=R1 D1=C D1=^(TOS) GOSBVL REV$ Reverse the unpacked string. D0=(5) FUNCD0 Restore PC. C=DAT0 A D0=C * expr GOVLNG EXPR * END HP71 object code hex listing SPACKLEX ID#5C 464 bytes 0123456789ABCDEF ck 000: 35051434B4C45485 FF 001: 802E004100101000 1E 002: 5A300C5105000000 7F 003: FB30000000000000 C3 004: 008000FF003A000F 2F 005: E109C000F1305310 85 006: 0F040E4200FB1435 F2 007: 34D4148510B14353 08 008: 4D494E420F74F4F4 D3 009: 442494453530B350 F8 00A: 51434B44240F3555 A5 00B: E4051434B4425041 9E 00C: 1AF08F83DB081C7A 39 00D: 80AF0AE41CF8FB13 9E 00E: B1151767D2411AF0 81 00F: 8F83DB081C7E50AF 98 010: 0AEBAEA1CF8FB13B A7 011: 1151768A2411AF08 7F 012: F83DB081C7F20AF2 EC 013: AE9B6BAF072101CF C2 014: 8FB13B1151762728 5E 015: 1EB6496E7F01AE13 7E 016: 1FFAE78AC70AE301 8E 017: CC40014F9E950AE5 D2 018: 9EF50AE717154E41 76 019: 18F13DB01361BBB8 C5 01A: F2144AF1D881DAF4 AD 01B: 10375AFAEB10A137 4B 01C: 109135134AF2AE9B 6E 01D: 6BAF0767F100113D 19 01E: 8AE3CD405181AF01 A4 01F: 4A11AB6AF0F0AEBA F9 020: 6E48081C56F14F0E CA 021: 6A14D1189E3611C1 40 022: F4F41493180A6311 20 023: 8B635FA318096750 B6 024: 1718501B495F2146 E3 025: D711A8F405813180 2B 026: 110B628F4058111B E2 027: F6F68F4058111B8F 39 028: 405818F7B1818FE8 21 029: 3B11BBB8F2146134 10 02A: 6D11A6E400C467FF 3C 02B: 4118F13DB01361BB 2D 02C: B8F2144137134C21 29 02D: 35D7AF015A7AF2D6 7C 02E: 80C410BBF4BF4BF4 5A 02F: BF4D2308B62108BF 54 030: 4BF4102167136134 89 031: E3C7C711BC68FA90 1D 032: F011011B8F943B1D 63 033: 6E33180AEFB63AF0 88 034: 14A7D5F11BD5CD45 6C 035: 5AE6AF0AEA1189E3 2E 036: 22AEB7B3F118B6BA BE 037: EA3180A63AE61611 07 038: 4A7E1FD6F6F6122A E7 039: 6212214D118B6317 9B 03A: 15AA1191358FE83B CF 03B: 11BBB8F21461348D CB 03C: C32F0 47 User's instructions The following instructions are applicable both for the HP71 and the HP75. The I/O formats are functionally compatible in both versions, (i.e. a string packed on the HP75 can be unpacked on the HP71, and viceversa). The syntax to be used is given in the following syntax diagrams (È la HP71 Reference Manual): String packing: ----> ( SPACK$ ) ----> [ (string) ] ----> String unpacking: ----> ( SUNPACK$ ) ----> [ (string) ] ----> Examples: A$=SPACK$(UPRC$("Stefano & Stefano")) PRINT SUNPACK$(A$) Note for the HP71 LEX file The HP71 LEX file contains even the following string functions: ASCMAX, ASCMIN and GOODBITS. These functions return, respectivly, the maximum and the minimum ASCII value in a string, and the number of informative bits which are needed to encode a string whithout loss of information. These functions are not necessary in order to pack and unpack strings of data, (although the subroutines they call are necessary), and thus they can be eliminated if you don't need them. These functions have been used extensivly during program develepment and debugging: they allow to perform a 'manual' packing, so that the results obtained can be used to verify that the packing routine gives the desired output. Conclusions The data packing routines presented here do not give the maximum efficiency, although, if used appropriately, they will give significant results. The envisioned applications are endless, and a separate article would be necessary only to describe them. Just to mention a few: database programs, data transmission through the telephone line (less data = less time = less money!), enciphering programs, adventure games with large text files, data acquisition of very large amount of data, and so on ... On the more general side, we think we have reached a positive result. Since the HP71 has come out, 71'ers and 75'ers have been eating one another trying to show how much their machine was superior to the "other box". Now, at least we know that LEX files can bring these two distant "universes" closer. Stefano Piccardi [487] Stefano Tendon [635] Via Antonio Panizzi 13 Cantone Delle Asse 5 20146 MILANO 29100 PIACENZA Italy Italy ____________________ [1] "Variations on a theme by Huffman", by Robert G. Gallager, IEEE Transactions on Information Theory, Vol. IT-24, No. 6, P.668. [2] "Software Tools in Pascal", by Kernighan and Plauger, Addison-Weseley, 1981, p.37.