^ma5 75^pl-62 0 0 ^ceWHAT GOOD IS THIS THING CALLED AN ASSEMBLER ? ^sk2^paAt this point we might do well to ask: what is a LEX file? Other than acknowledging their existence, the HP71 Owners Manual does not discuss them (see pages 99 and 160). Before jumping into the subject, I will assume the reader is somewhat familiar with the following: binary, decimal, and hexadecimal numbers; ASCII characters; and the HP71 Owners Manual. I will also make some references to the HP41 for those familiar with that machine. ^sk1^paTo gain the necessary background, a good starting point might be: "What happens when I type on the 71 keyboard and then press ENDLINE?" [For memory layout see IDS Volume 1, Chapter 3, page 3-12, or the FORTH ROM Owners Manual, p. 25.] While awake and waiting for a key press, the operating system is in a 'character editor' routine. This character editor has been called from the 'main loop', which is where the machine does a little housekeeping and waits to be told what to do. As keys are pressed, the ASCII representation of each key is placed sequentially in a section of memory called the 'Display Buffer' (actually inside the LCD IC chip; 96 bytes, or 192 nibs, starting @ 2F480). The Display Buffer, although inside the LCD IC chip, is not the visible Liquid Crystal Display. The visible LCD is addressed in three non-contiguous segments (2E100, 2E200, 2E300); the operating system makes the LCD reflect the Display Buffer. When the ENDLINE key is finally pressed, a copy of what was in the Display Buffer is edited, in ASCII form, into the Command Stack. In the Command Stack, a Carriage Return (#0D, or 13) is added after the last character. ^sk1^paNow in the Command Stack, we begin 'Parsing'. Parsing is the computer's way of seeing if what we typed is legal. The lexical analyzer (a lexicon is a dictionary) scans our input, comparing it with the mainframe 'dictionary' (actually two LEX files). The HP71 operating system now does several things. 
^paFirst, the ASCII representation (TEXT files, etc.) tends to use a LOT of memory! The 71 operating system has therefore assigned 'Tokens' (much like 41 XROM numbers) to all built-in BASIC Keywords and symbols. All the built-in Keywords and symbols are contained in two LEX files. LEX files are the BASIC language dictionary, among other things. ^paAs the parser scans the input text, it builds, in another part of memory, a stream of tokens, which are the commands used later at execution time. ^paSecond, we look for a leading line number; if one is present, we edit the token stream into the current file, then return to the main loop to start over. If there is no line number, the line is for immediate execution. In this case the token stream is copied to the 'Statement Buffer', where execution commences. ^sk1^paAt execution time, whether from the keyboard or from a program, the BASIC interpreter scans a stream of Tokens. Some of the mainframe Tokens vary, but in general a Token consists of a one-byte LEX ID number and a one-byte Token number within that LEX ID. I have now used one word with two meanings, so let's get sorted out. The Parser tokenizes an ASCII stream into a token stream. The reference to an external LEX file is called an XWORD. Within a LEX file we have Token numbers. ^ad ^sk2A BASIC Keyword is thus represented by a byte for the LEX ID and a byte for the Token number. Token number zero is not usable, leaving 255 tokens (Keywords) in one LEX file. The mainframe uses LEX ID numbers 00 and 01, leaving 254 possible external LEX IDs. ^sk1^paThe BASIC interpreter uses the LEX ID and jumps to the LEX file's 'MAIN Table'. The MAIN Table is arranged in ascending numerical order by Token number. We then go to the appropriate Token entry in the MAIN Table. There we find the address (relative to where we currently are) of the 'execution code'. The interpreter then jumps to the appropriate code to execute. 
When finished executing the code for a particular keyword, the interpreter checks to see if there are any tokens left in the stream to execute; if not, control is returned to the main loop. ^sk1^paIf taking the ASCII and turning it into Tokens (memory efficient) is called Parsing, then what do we do to view, list or print our original ASCII? We 'Decompile'. Decompiling is sort of the opposite of Parsing; once Parsing is understood, Decompile is easy. ^sk1^paThe HP71 file chain is a contiguous block of memory starting at the location contained in the system pointer MAINST (5 nibs @ 2F558). Each file (BASIC, TEXT, DATA, etc.) starts with a 37 nibble header. Within the header are the file's name, type, flags (private/secure), copy code (needed by the system), creation time, creation date, and a chain field. The chain field points to the start of the next file; the last file has chain length zero. After the file header, a BASIC file has a 12 nib subheader containing the Subprogram and User Defined Function chains (5 nibs each) and a 2 nib constant F0. Following the subheader is the program's Token stream. ^sk1^paIn a LEX file, following the file header are 25 nibs (counting 1 nib for the speed table field), which are: LEX ID (2 nibs), lowest Token number (2 nibs), highest Token number (2 nibs), LEX file link (5 nibs), either 1 or 80 nibs depending on the existence of the speed table, offset to the TEXT Table (4 nibs), offset to the Message Table (4 nibs), and last, a 5 nib offset to the Poll handler. ^paIt is possible to link together multiple LEX files yet have them appear as one 'regular' file in the file chain. If a LEX file contains a lot of Keywords (remember, up to 255), parsing can be made much faster by including a Speed Table, which is an alphabetically arranged table pointing into the TEXT Table. ^sk1^paThe MAIN and TEXT tables are what the interpreter uses during Parse, Decompile, and Execution to access the file's Keywords. The MAIN table is ordered numerically ascending, by Token number. 
Each entry in the MAIN table consists of three things: (1) a 3 nib constant indicating which Keyword in the TEXT table is associated; (2) a 5 nib offset to the start of the associated execution code; (3) a 1 nib characterization: Function, Statement, programmable, etc. ^paA TEXT table entry also has three parts: (1) a 1 nib length (in nibs, minus 1) of the ASCII text; (2) a 2-16 nib constant representing the ASCII of the Keyword (the table is in alphabetical order); (3) a 2 nib constant giving the Token number (which entry in the MAIN table is associated). ^ad ^sk2^paAt Parse time, the lexical analyzer scans the TEXT table for a match. If a match is found, the Token number is read, which serves as an offset into the MAIN table. In the MAIN table, the execution address is read and jumped to. For a Statement, the 5 nibs immediately prior to the execution address contain a relative offset to a routine which knows how to parse this particular Keyword. Ten nibs prior to the start of the execution code is a 5 nib offset to the routine that decompiles this Statement. For a Function, the 2 nibs prior to the execution address indicate the minimum and maximum number of allowable parameters going into the Function. Prior to those 2 nibs is a 1 nib descriptor for each allowable parameter (8=num, 4=string, etc.). ^paDuring Execution, the Token number gets us to the execution code via the MAIN table. During Decompile, the Token number gets us to the Decompile routine and to the text for the Keyword (in the TEXT table). [For further info see IDS Vol 1, Chaps 6 & 7.] ^paLEX files can also contain an optional Message table. This approach allows 'messages' to be built up out of 'building blocks', thus increasing flexibility and lowering memory usage. It also allows easy foreign language translation. For full details please refer to IDS Volume 1, Chapter 10. ^paLEX files can also contain an optional poll handler routine. Polls are the system's way of providing 'hooks' to the outside world. 
At umpty different places in the 71, polls get issued. Each LEX file in the machine, RAM or ROM, is given the opportunity to 'answer' the poll. This allows things to be easily customised. As an example, take the HP41c with a Timer ROM. When the 41c shuts down, a poll is issued; the Timer ROM checks if the shift key has been pressed and, if so, displays the clock. The HP41c only recognises 7 poll types; by contrast the 71 can understand 256. As another example, Richard Nelson asked me to write a file to move the 71 cursor. I did that by answering a poll issued each time a key is pressed. I check if USER mode is set (I forgot to trap out CALC mode; it SHOULD NOT be used in CALC mode! System pointers get funny there: not funny haha, but corrupt, not normal). A test is then made for the left or right blue-shifted arrow keys being pressed. If so, the escape sequence for cursor left or right is sent 21 times. Please note that hardware stack levels had to be saved and restored and the poll exited properly. ^sk1^paWe now come to the question, "What is Assembly language?" Most of what we, the user community, get from HP is in Assembly language. Assembly (microcode for the 41c) is what the operating system is written in and what FORTH uses. The Assembly language instruction set tells the CPU chip what to do next. BASIC is very slow (relatively speaking) because of the overhead needed to process Tokens. By writing directly in Assembly language, much of the overhead is eliminated, the code is compact, and the speed is awesome. In my own experience, I could never get the machine to behave the way I REALLY wanted without Assembly language. Absolutely, positively, EVERYTHING in the machine is completely accessible. For those who read Scientific American, core-war type diseases could easily be passed via LEX files (though INIT:3 and a thorough file check should remedy that). Come to think of it, BINARY files are not so immune either. ^ad ^sk2^paIf, in order to gain access to the CPU, we had to use binary, it would take forever. 
Enter the Assembler. The Assembler is a program, written in FORTH, which takes a TEXT file containing lines of easily remembered mnemonics and writes the proper code to memory to create the desired LEX or BINARY files, or FORTH primitives. To write in Assembly language, then, we use a text editor to create a TEXT file. We then tell the Assembler to assemble the TEXT file into the object file. Once assembled, the source TEXT file is no longer needed in memory. In fact, very long source files may be assembled from tape or disc without EVER being in memory. ^sk1^paMost of the information on how the Assembler interprets a TEXT source file is in the FORTH/Assembler ROM Owners Manual, pages 45-47. For anyone writing in Assembly language, IDS Volume 1 is a must. It contains ALL of the needed information on operating system theory. IDS Volume 2 is also handy; it has all the information needed to use the supported entry points. If you don't mind the price difference, Volume 3 contains all of Volume 2 plus the source code. Just in passing, both CHHU and PPC need a 5.50 inch column width for publication. Happily, the Assembler will cooperate. On page 46 of the Assembler ROM Owners Manual, a suggested line format is shown. In the interests of memory and of good-looking AND submittable source files, I suggest starting comments at character position 22 and limiting line length to 59, not 80. Comments in columns 60-80 are truncated by the Assembler in the listing file. This makes the listing files somewhat unreadable (ARE YOU LISTENING, HP !!!!!). Additional comments are easily handled by a comment line (a line starting with *). If you are really tight on memory, the (readable) column formatting can be replaced by a single space. The only thing is (WATCH OUT): unless you want a word to be a label, start it in column 3 or later. Any word in column 1 or 2 is treated as a label. ^sk1^paJust in passing: ^sk0I now start all serious Assembly sessions with an INIT:3. 
This is because I once had a situation where assembly of a bad file left memory corrupt. The corrupt memory trashed all further assembly until an INIT:3 was finally done. Save the headache; start clean. ^sk1^paWhen writing in Assembly language, certain pointers must be maintained: the CPU Data Pointers D0 and D1, and the RAM pointers. When using mainframe subroutines, you need the entry and exit conditions of the CPU registers and memory (contained in IDS Volume 2 or, more expensively, Volume 3). The mainframe has over 1700 supported entry points. ALL the math functions are present, in '15 form' (15 digit mantissa, 5 digit exponent, wow!). It's RPN (sort of); certainly it's as much fun as programming the 41. If you enjoyed synthetic programming on the 41C, then you'll love Assembly language on the 71B. Mind you, I'm not advocating writing everything in Assembly language; rather, I suggest using BASIC and FORTH as 'shells' and using Assembly language to customise the machine to perform EXACTLY as we desire.
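^sk1^paFor readers who find tables easier to follow in code, the TEXT and MAIN table lookups described earlier (Parse, Execute, Decompile) can be modelled roughly as below. This is only a sketch in Python, not 71 code: the keywords BEEPX and WINKX, the LEX ID 5D, and the handler functions are all made up for illustration, and the real tables hold nibble offsets rather than Python objects.

```python
# Toy model of one LEX file's TEXT and MAIN tables.
# Keywords, LEX ID, and handlers are hypothetical, not real HP-71 data.

LEX_ID = 0x5D  # hypothetical external LEX ID (00 and 01 belong to the mainframe)


def exec_beepx():
    """Stand-in for the execution code of the hypothetical keyword BEEPX."""
    return "BEEPX executed"


def exec_winkx():
    """Stand-in for the execution code of the hypothetical keyword WINKX."""
    return "WINKX executed"


# TEXT table: alphabetical list of (keyword text, token number).
TEXT_TABLE = [("BEEPX", 1), ("WINKX", 2)]

# MAIN table: keyed by token number (token 0 is not usable).
# Each entry: (index of keyword in TEXT table, execution code, characterization).
MAIN_TABLE = {
    1: (0, exec_beepx, "statement"),
    2: (1, exec_winkx, "statement"),
}


def parse(word):
    """Scan the TEXT table; return (LEX ID, token) or None if no match."""
    for text, token in TEXT_TABLE:
        if text == word.upper():
            return (LEX_ID, token)
    return None


def execute(lex_id, token):
    """Use the token to index the MAIN table and run the execution code."""
    if lex_id != LEX_ID:
        raise LookupError("token belongs to another LEX file")
    _, code, _ = MAIN_TABLE[token]
    return code()


def decompile(lex_id, token):
    """Go from token back to keyword text via the MAIN and TEXT tables."""
    if lex_id != LEX_ID:
        raise LookupError("token belongs to another LEX file")
    text_index, _, _ = MAIN_TABLE[token]
    return TEXT_TABLE[text_index][0]
```

Calling parse("winkx") yields the pair (LEX ID, token), which execute and decompile then use to find the execution code and the keyword text respectively; the speed table, parse routines, and parameter descriptors are deliberately left out.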