
CHAPTER 6: INCLUDE

The INCLUDE command is a utility with many applications under CP/M. The benefits it brings to assembly programming should be clear by this point in the book; all of the programs here use it to include subroutines from a collection of standard code, with great savings in space and effort. But INCLUDE can be applied to any ASCII file.

INCLUDE was the first really ambitious program undertaken for this book. It presented a serious challenge, and could easily have turned into a monster. In fact, the program was completed in a reasonable amount of time and required very little debugging.

THE NEED FOR INCLUDE

I spent some time thinking about the possible uses for INCLUDE before I began designing it. I wanted it to be more than just a replacement for the include assembler directive that MAC ought to have and doesn't, or a copy of the #include directive that is a usual feature of compilers for the C language. The general notion of including one file into another can have much wider application.

The Use of Boilerplate

In both programming and writing, there arise countless occasions when we want to say "...and so on and so on, just as before." In word processing circles it is common to speak of "boilerplate," sections of standard verbiage to be inserted at standard points in the structure of a document. The same need arises in programming, where chunks of code are often identical from one program to the next.

The implication of the term "boilerplate" is that it is a mechanical task to reproduce the text at the point where it is needed. The boilerplate section is essential to the final product, but since skill isn't needed to produce it, there is no challenge in doing so. The insertion of boilerplate into a file is just the kind of mindless task that is best turned over to an idiotically diligent assistant—a computer.

Boilerplate in Program Text

The programmer has, or should have, a special affection for boilerplate. Whenever a programmer can say "here proceed exactly as before," he or she is doing a good job. Programmers' time is one of the most precious resources in modern industry. Wherever the programmer can arrange the circumstances of a specific problem so that a general, prewritten solution applies, a valuable economy has been effected.

General, prewritten solutions are usually implemented as subroutines or, where the text of the code has to be tailored before it is compiled, as macros. Once the solution has been prepared, its details cease to burden the programmer's mind; they can be neglected in favor of more important things.

The problem of incorporating the text of a prewritten solution in the program remains. MAC can, at least, handle macros, but it will include subroutine code only in a very awkward way. The linkage processors distributed with several compilers will link precompiled subroutines with a main program. But there are inconsistencies between one compiler's linkage conventions and another's, and many CP/M language translators have no automatic aids for including prewritten code at all.

However, if we base a solution on inserting prewritten text into a source file, just as boilerplate is inserted into a document, it will serve for both documents and programs, and will be independent of the rules of any programming language translator or word processor.

The Uses of Hierarchy

It is useful to be able to build up large files as a hierarchy of smaller ones. When we say "here proceed as before," what was done before might contain a place at which we said "here, proceed as in that other case." And that other case might have included a point at which we said "here use the standard solution," and so on. Such recursive calls for boilerplate arise often in programming, but they can be useful in writing as well. An entire book might be represented by a file that says, in effect,

    Here insert Chapter 1.
    Here insert Chapter 2.
    Here insert Chapter 3...

A Chapter file might call for the insertion of Parts 1, 2, and 3, and a Part might call for Figures and Tables. However, the writer should not have to know at the time the whole-book file is prepared what sections Part 4 of Chapter 6 will want to include.

In the same way, the programmer should be able to say "here insert standard code for X." The programmer should not have to be concerned if X in turn requires the standard code for Y, and Y requires Z, and so on. But what if the programmer then says "here insert the code for Y"? The utility that does the insertion should not insert two copies of the Y code, once as requested by X and once where the programmer called for it.

The Need for Collections

Sections of boilerplate are often very small—a paragraph of text, a tiny subroutine. Often there will be a group of small sections that are related in some way. There are two reasons why it would be useful to be able to collect related boilerplate sections into a single file.

When the sections are related to each other, it would be nice to be able to treat the group as a unit. If the sections were in a single file, they could be loaded together for editing and copied from one disk to another with a single command. However, each section is an individual, and should retain an individual identification of some kind.

CP/M wastes disk space when it stores a large number of small files. Several kilobytes of disk space may be saved when several boilerplate sections are stored in a single file. Furthermore, there is a limit to the number of different files that can be saved on one CP/M diskette, regardless of the size of the files. A single-density diskette can contain no more than sixty-four files. It is easy to build up a collection of a hundred bits of boilerplate. Unless they are grouped into files, they'd quickly fill a disk directory.

Search Order

The files and file segments to be inserted into a primary file may be on different disk drives. It is very important that the user not have to code their drive-letters in the primary file. That would make the contents of the file dependent on a particular disk layout, which would be intolerably inconvenient. The program that does the insertion must be capable of finding an included file on any of several different drives. But in a system that has several drives, the program could waste a lot of time searching for files. It must be possible to tell it which few drives to search, and in which order, so as to make the running time short.

There is another reason for telling the program the order in which to search drives for included files. In program development, we may have different versions of the same included text. There might be a standard version of a subroutine and also a new, updated version. These two versions might be stored on different disks under the same filenames. It is important that the include process find the right version of the included file. Which version is the "right" one depends on what we are trying to achieve at the moment. If we are about to test the updated version, we don't want the standard version included. But if we want to compile the standard version of the program, it is essential that the standard version of the included file be found. If we can tell the include utility the order in which it is to search drives, we can ensure that it will find the right version of a file before it finds the wrong one.

Requirements on INCLUDE

We want a program, then, that can operate on any ASCII file. It will look within the file for instructions to include text from other files, and insert that included text in its output file. There will be a way of including only a named part of an included file. Included text may call for other inclusions to some reasonable depth. If some file (or file section) is called for more than once, only the first call will be honored; later calls will be ignored. Include instructions will not specify the drive where the included text is to be found; instead, the program must accept a list of drives and search them in a specified order.

SPECIFYING INCLUDE

The INCLUDE command reads an ASCII input file and produces an output file that contains the input text plus any included files and file sections. The syntax of the command is:

INCLUDE input-ref output-ref /drives

INCLUDE uses the utility conventions for its input and output files. Omitted parts of output-ref will be supplied from input-ref. The output-ref operand may be omitted completely; in that case the output file will replace the input file.

The /drives operand, one or more drive letters preceded by "/", specifies which drives are to be searched for included files. If it is omitted, INCLUDE will search only the default drive. If it is given, INCLUDE will search only on the named drives, in the order given.

The INCLUDE command processes files one line at a time, rather than one character at a time. It can handle lines up to 2048 bytes in length, but no longer.

Here are some examples of the command:

A>include chap005.doc chap005.txt /b

The input file A:CHAP005.DOC will be read and copied to A:CHAP005.TXT. Where inclusions are called for, the command will search for the included files on drive B.

A>include b:pack.asm a: /bca

The input file B:PACK.ASM will be read and copied to A:PACK.ASM; the omitted filename and filetype of the output-ref are supplied from the input-ref. Where inclusions are called for, the command will search for the included files on drive B, then on C, then on A.
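The operand rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the program's own code: the names split_ref, join_ref, and parse_command are hypothetical, and the sketch assumes that an omitted input drive means the default drive and that each omitted part of output-ref (drive, filename, or filetype) is copied from input-ref.

```python
def split_ref(ref):
    """Split 'd:name.typ' into (drive, name, type); missing parts are ''."""
    drive, _, rest = ref.rpartition(":")
    name, _, ftype = rest.partition(".")
    return drive.upper(), name.upper(), ftype.upper()


def join_ref(drive, name, ftype):
    """Format the three parts as 'D:NAME.TYP'."""
    return f"{drive}:{name}.{ftype}"


def parse_command(tail, default_drive="A"):
    """Return (input file, output file, drives to search) for a command tail."""
    words = tail.split()
    drives = [default_drive]          # no /drives: search the default drive only
    if words and words[-1].startswith("/"):
        drives = list(words.pop()[1:].upper())
    drive, name, ftype = split_ref(words[0])
    source = (drive or default_drive, name, ftype)
    # Assumption: every part missing from output-ref comes from input-ref.
    given = split_ref(words[1]) if len(words) > 1 else ("", "", "")
    target = tuple(g or s for g, s in zip(given, source))
    return join_ref(*source), join_ref(*target), drives


print(parse_command("chap005.doc chap005.txt /b"))
print(parse_command("b:pack.asm a: /bca"))
```

Under these assumptions the second call reports B:PACK.ASM as the input, A:PACK.ASM as the output, and the search order B, C, A.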

Requesting Inclusions

The INCLUDE command inserts new text into the output file in response to a command in a line in the current input file. The command to include an entire file is

#INCLUDE fileref

The fileref operand may contain a drivecode, but it will be ignored. When the program finds an #include command, it searches for fileref on the drives it was told to search. If it finds the file, it displays the #include line at the console, then inserts the contents of the file in the output, preceded by the line containing the #include command.

The command to insert a named unit of a file is

#INCLUDE fileref,unit-name

There must be no spaces between the fileref, the comma, and the unit-name. The program searches for fileref. After finding the file, it reads the file line by line, looking for the named unit. When the unit is found, its contents are inserted in the output following the line that contained the #include command.

Once it has begun inserting text, the program treats the included text exactly as it did the main input file. If the included text contains an #include command, it will be processed to start another level of inclusion. The program may have as many as eight files active at once. That is, it can handle the main input file and as many as seven nested include files.

The #include command and its operands may be given in upper- or lowercase; it will be treated as uppercase. The command may begin anywhere in the line; it does not have to start in column 1. It may be preceded and followed by characters that make another program treat the line as a comment. For example,

    ASM or MAC:  ; #include utilio.inc,SetUpOutput
    Pascal:      (* #include pasparts,typedefs *)
    Word Star:   ..#include address.dat,wilson&co
    Magic Wand:  \:BS\* #include chap02.fig,figure2
    BASIC:       10000 REM #INCLUDE FILESUB.BAS,RANDOMIO
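A minimal Python sketch of how such a line might be recognized follows. It is an illustration only: the regular expression and the name find_include are hypothetical, and the sketch assumes the operand ends at the first blank, tab, or end of line. It folds the command and its operands to uppercase and discards any drivecode, as the rules above require.

```python
import re

# '#include', blanks, a fileref, then optionally ',unit-name' with no spaces.
INCLUDE_RE = re.compile(r"#include[ \t]+([^\s,]+)(?:,(\S+))?", re.IGNORECASE)


def find_include(line):
    """Return (FILEREF, UNIT) for an #include line, or None if there is none."""
    match = INCLUDE_RE.search(line)       # the command may begin anywhere
    if match is None:
        return None
    fileref = match.group(1).rpartition(":")[2]   # a drivecode is ignored
    unit = match.group(2) or ""                   # '' means the whole file
    return fileref.upper(), unit.upper()


for sample in ("; #include utilio.inc,SetUpOutput",
               "(* #include pasparts,typedefs *)",
               "10000 REM #INCLUDE FILESUB.BAS,RANDOMIO",
               "#include b:whole.txt"):
    print(find_include(sample))
```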

Defining Named Units

While reading any text, the program watches for commands that define the beginning and end of named units. These have the form

    #START unit-name
    #END unit-name

Like #include, these commands may be given in lowercase, but are processed as if they were in uppercase. The unit-name may be as much as nineteen characters long. It may contain any character that is not a CP/M command delimiter (not a control character, a blank, or one of "[.];=:/"). The #start and #end commands may begin at any point in the line.

A named unit begins with a #start command, and runs to the end of the file or to an #end command with a matching unit-name. The lines containing #start and #end are part of the unit and are included with it.

Nested Units

It is all right to include a complete file, even when the file contains named units. One named unit may contain other named units within it. The inner units must be completely nested in the outer one; units should not overlap each other.

When INCLUDE is skipping through a file looking for the start of a named unit, it ignores all commands. When it is copying a file or unit, it processes all #start commands. Whenever it sees a #start, it notes that the named unit has now been included in the output. A later #include for that unit will be ignored.

USING INCLUDE

There are some ways of using INCLUDE that aren't immediately obvious from its specification. Let's consider them.

Simple Includes

The simple, straightforward way to use the command is to construct a separate file for each boilerplate section. Then write the main file, inserting #include lines where desired.

This obvious process can be reversed, however. You can write the main file first, inserting an #include line wherever you want to defer work on a section until later. When the main file is complete, make a list of all the included files and begin work on them in the same way. At any point, a run of INCLUDE will produce a merged file containing all the work to date, and will note on the console the files that haven't been created yet.

Self-Includes

In that top-down process, the included sections don't have to be put into separate files. They can be tacked on to the end of the main file. Suppose you are working on a Pascal program called SOMEPAS. You might begin with a file like the one in Figure 6-1. Consider running that rather odd-looking Pascal program through INCLUDE. What will happen?

Any file can include a named unit from its own body. The named unit will only appear once in the output file. If the unit appears before the #include for it, the #include will be ignored. If the unit is defined after the #include line, it will appear where it is included and not where it is defined.

In the example, each time it finds an #include line, INCLUDE will open SOMEPAS.PAS again, search through it to the named unit, and insert that unit into the output file. Later, INCLUDE will encounter the named units that it has already included. It will recognize the #start unit-names as ones that have already been processed, and will skip over them.

In other words, you can build a file in the order in which you think it out, then use INCLUDE to put it into a different order. This is particularly nice with Pascal. That language requires the main program to follow its subroutines, the opposite order to the way most people design their programs.

Figure 6-1. A hypothetical Pascal program that uses self-includes.

    program somepas;
    (* #include somepas.pas,constants *)
    (* #include somepas.pas,types *)
    (* #include somepas.pas,sub1 *)
    (* #include somepas.pas,sub2 *)
    begin
        sub1;
        sub2
    end.
    (* #start constants *)
    (* #end constants *)
    (* #start types *)
    (* #end types *)
    procedure sub1; begin (* #start sub1 *)
    end; (* #end sub1 *)
    procedure sub2; begin (* #start sub2 *)
    end; (* #end sub2 *)

Nested Units

It is useful to be able to nest named units when building a library of subroutines. I used that technique with the subroutines in this book. In the Textlib file, the subroutine MoveHtoD (a general string-mover) stands alone. The subroutine FillA depends on MoveHtoD. FillA could have contained a command to include MoveHtoD. But since they were to be grouped in the same file, I chose to embed the MoveHtoD unit inside the FillA unit. That saves one nesting level when FillA is included.

In the same file, subroutine FillZero requires FillA. Again, I chose to wrap the FillZero unit around the FillA unit. If a program includes FillZero, it will get FillA and MoveHtoD without any further file searches and without increasing the depth of #include nesting. If MoveHtoD is included specifically, it alone will be inserted. If FillA is included later, the lines unique to FillA will be inserted, but the embedded MoveHtoD unit will be skipped.

Using Search Order

The ability to specify the order of drives to be searched has several uses. The first is in execution speed. It takes time to scan a disk directory for a file. The disk arm has to be moved out to the periphery of the disk where the directory is kept, and then inward toward the data when the file is found. With typical diskette drives, these two "seeks" can take as long as the total processing time for a small include file. If INCLUDE's first search for a file fails, it will try again on another drive.

The shortest run time results when most file searches succeed on the first try. If most of the included files are on one drive, that drive would usually be specified first in the /drives operand.

Search order can be used in a special way when you are developing a program. Suppose that you want to test a modification to one of your included subroutines. You might mount the standard set of include files on the B drive, then create a modified version of one of them on the A drive. By running INCLUDE with the parameter "/ab" you ensure that it will find the modified version of the test file and the standard versions of the others.

The idea can be extended to several drives. If you are supporting several versions of a program, you might have several generations of include files. Here's a hypothetical arrangement:

    drive A: work in progress
    drive B: files changed for next release
    drive C: the current release of the code

Then a /drives parameter of "/c" would pick up only the released version of the program. A parameter of "/bc" would pick up the next-release program as it exists to date, and one of "/abc" would build a test version with your latest modifications.
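The effect of the three parameters can be illustrated with a small Python sketch. Everything in it is hypothetical: the function find_on_drives, the dictionary that stands in for the three disk directories, and the filenames SUB1.INC through SUB3.INC are invented for the example.

```python
def find_on_drives(fileref, drives, disks):
    """Return the first drive, in search order, whose directory holds fileref."""
    for drive in drives:
        if fileref in disks.get(drive, ()):
            return drive
    return None       # not found on any of the named drives


# Hypothetical directories for the three-drive arrangement.
disks = {
    "A": {"SUB1.INC"},                          # work in progress
    "B": {"SUB1.INC", "SUB2.INC"},              # changed for next release
    "C": {"SUB1.INC", "SUB2.INC", "SUB3.INC"},  # the current release
}

for order in ("C", "BC", "ABC"):
    found = {name: find_on_drives(name, order, disks)
             for name in ("SUB1.INC", "SUB2.INC", "SUB3.INC")}
    print("/" + order.lower(), found)
```

With "/abc" the sketch takes SUB1.INC from A, SUB2.INC from B, and SUB3.INC from C; with "/c" all three come from the released set.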

DESIGNING INCLUDE

I knew that INCLUDE was going to be a large program, so I took extra pains in the design stages. In the end, I spent roughly sixty hours on the program, of which at least forty hours went into design work: thinking, pseudo-coding, sketching trial assembly code. Another twelve hours went into coding and typing the assembly language; the rest was spent testing, debugging, and adding last-minute improvements.

Instead of starting with a sketch of the main program, as I had done so far in the book, I began with the processing logic for including one file. I reasoned that inclusion as specified is so obviously a recursive process that there should not be any difference between processing the main file and processing an included file. The central problem was to read and process one included file, recurring (recursing?) into the same code when an #include command appeared. With that logic built, the main program would only need to set up the primary input file as if it were an included file, and then include it.

The Name and the File

Figure 6-2 shows the initial result of this effort: my plan for processing any one file.

OneFile would receive as arguments two data structures. As it turned out, the whole structure of the program revolved around the management of these two structures.

The first, a File structure, would hold the complete "state" of an input file. Under CP/M, the current state of an input file comprises its FCB, a buffer containing the record last read from it, and an index to the next byte to come from that buffer. This entire state would have to be maintained separately for each of up to eight open files. Buffers couldn't be shared among files, because reading had to be suspended while an #include was processed, and resumed at the same point afterward.

The second structure would be the Name of the data unit to be included. If that was an entire file, then the Name would be just the filename and filetype. If the included data were a named unit, then the Name would include the filename, the filetype, and the unit name. The Name record would hold these identifying strings in some format that would allow easy comparison of one Name with another, and of a Name with the operand of a command.

Figure 6-2. The logic that INCLUDE must apply to the input file and recursively to an included file.

    procedure OneFile(File,Name)
        File is a record of the state of one open input file.
        Name is a formatted string of a fileref and a unit name.
        Searching, Skipping, and Ended are boolean flags.

        ShowLine { display #include line at console }
        Skipping := false; Ended := false
        if (Name.unit is null) then {whole file} Searching := false
        else {part file } Searching := true

        while (not Ended) and (not end of file after reading a line)
            if (not Searching) and (not Skipping) then
                if ("#INCLUDE") then
                    PutLine
                    format a Name for the included data
                    if (it isn't in the list of processed units) then
                        put it in the list
                        prepare and open a File for it
                        OneFile(new File, new Name)
                elif ("#START") then
                    format a Name using our fileref and the starting unit
                    if (it's in the list of processed units) then
                        Skipping := true
                    else {new name}
                        PutLine
                        put new Name in the list
                    endif.
                elif ("#END") and (the unit matches Name.unit) then
                    PutLine
                    Ended := true
                else
                    PutLine
                endif.
            else { either Searching or Skipping }
                if (Searching) then
                    if ("#START") and (the unit matches Name.unit) then
                        PutLine
                        Searching := false
                    endif.
                else {Skipping}
                    if ("#END") and (the unit matches the skipped unit) then
                        Skipping := false
                    endif.
                endif.
            endif.
        end while.
        if (Searching) then report the unit wasn't in the file
    end OneFile.
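To make the three states concrete, here is a rough Python transcription of the plan in Figure 6-2. It is a sketch under several assumptions, not the assembly program: files are held in a dictionary keyed by uppercase fileref rather than on disk, the names one_file, files, seen, and out are hypothetical, console messages and the eight-file limit are left out, and a missing file is silently passed over.

```python
import re

COMMAND = re.compile(r"#(include|start|end)[ \t]+(\S+)", re.IGNORECASE)


def one_file(files, fileref, unit, seen, out):
    """Copy `unit` of `fileref` (the whole file if unit is '') onto `out`."""
    searching = unit != ""     # looking for the #start of our unit
    skipped = None             # name of an already-included unit being skipped
    for line in files[fileref]:
        m = COMMAND.search(line)
        verb, operand = (m.group(1).upper(), m.group(2).upper()) if m else ("", "")
        if searching:
            if verb == "START" and operand == unit:
                out.append(line)
                searching = False
        elif skipped is not None:
            if verb == "END" and operand == skipped:
                skipped = None
        elif verb == "INCLUDE":
            out.append(line)
            ref, _, sub = operand.partition(",")
            ref = ref.rpartition(":")[2]          # drivecode is ignored
            if (ref, sub) not in seen:
                seen.add((ref, sub))
                if ref in files:                  # missing files are passed over
                    one_file(files, ref, sub, seen, out)
        elif verb == "START":
            if (fileref, operand) in seen:
                skipped = operand                 # skip the unit in its entirety
            else:
                seen.add((fileref, operand))
                out.append(line)
        elif verb == "END" and operand == unit:
            out.append(line)
            break                                 # our unit has ended
        else:
            out.append(line)
    if searching:
        raise LookupError(f"unit {unit} not found in {fileref}")


files = {"SOMEPAS.PAS": [
    "program somepas;",
    "(* #include somepas.pas,sub1 *)",
    "begin sub1 end.",
    "procedure sub1; begin (* #start sub1 *)",
    "end; (* #end sub1 *)",
]}
out = []
one_file(files, "SOMEPAS.PAS", "", {("SOMEPAS.PAS", "")}, out)
print("\n".join(out))
```

Run on this cut-down version of the self-including Pascal file, the sketch emits the sub1 procedure immediately after the #include line and ahead of the main program body, and then skips the unit when it meets it again at the end of the file.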

The Shape of OneFile

It emerged that OneFile could be in one of three states (Figure 6-2). It could be searching for the start of a named unit. It could be processing, that is, examining lines and copying them to the output. Or it could be skipping a unit that had already been included. Boolean flags Searching and Skipping would control the logic. Another flag, Ended, would serve to stop the loop when an #end command terminated a named unit before end of file.

Should OneFile copy command lines to its output? That is, should the lines containing #include, #start, and #end appear in the output file? I decided that they all should be copied. When they are copied, you can look at the output file and tell where its different parts came from. Furthermore, there are times when it is useful to have data on the same line with a command.

Originally, the program copied the #start and #end lines of a unit even if the unit had already been included and was to be skipped. That provided good documentation of what the program was doing, but it caused problems when there was other data on the same line with #start and #end (as in Figure 6-1). It also was not consistent with the principle that #start and #end were a part of the unit they delimited. I changed the logic so that a skipped unit would be skipped in its entirety, although that complicated the program slightly.

Data Organization

Figure 6-3 shows the layout of the main data structures, and Figure 6-4 shows the plan of storage allocation. I worked back and forth between Figures 6-2, 6-3, and 6-4, revising them and trying different sequences of assembler code, for some time. (This was neither "top-down" design, nor was it "bottom-up." Rather it was the "both ends against the middle" method of program design, in my experience the one that is actually practiced by most people regardless of their stated philosophy.)

The NameTable would be a simple linear array of Name records. It would act as a symbol table for the program, recording all the units that had been processed (whether specifically included or nested in other units). Names would be entered in the table in the order they occurred. I never gave serious thought to making the NameTable a tree, a hash table, or anything more elaborate than a simple list. The whole 128-entry table could be searched a hundred times in the time it takes to open one CP/M file, so search time would be insignificant compared to I/O time.

The FileTable would be a linear array of File structures. Because of the recursive nature of the program, the FileTable would actually work like a program stack. The current File would be at the top of the stack. To process an #include, a new File would be created and pushed on the stack.

The size of the LineBuffer was an arbitrary choice. The operating system doesn't place any upper limit on the length of a line, but a program that looks at a file one line at a time must do so. The size of 2048 bytes represents my opinion of the absolute minimum that a CP/M utility should be prepared to accept. I don't have any files with lines as long as 2048 bytes, but I'll bet that someone, somewhere, has.

Figure 6-3. Plan for the data structures used in INCLUDE.

    NameTable is an array[1..128] of Names:
        a Name is a 32-byte structure:
            NameFile is a 12-byte fileref in FCB format,
            NameUnit is a string of 0 to 19 bytes, ending in 00h.
    NextName is a pointer to the next free Name.
    NameCount is a count of the Names in use.

    FileTable is an array[1..8] of Files:
        a File is a 165-byte structure:
            FileFlag is a byte for stowing OneFile's flags,
            FileIndex is an index 0..128 for GetChar's use,
            FileFCB is a 35-byte FCB for GetChar's use,
            FileBuffer is a 128-byte physical-record buffer.
    NextFile is a pointer to the next free File.
    FileCount is a count of the Files in use.

    LineBuffer is a space of 2048 bytes.
    OutBuffer is output space, determined at execution time.
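The sizes in Figure 6-3 can be checked with a short Python sketch that packs a Name into its 32 bytes. The function pack_name is hypothetical, and two details are assumptions: the drive byte at the head of the FCB-format fileref is set to zero (the drivecode is ignored anyway), and the bytes after the 00h terminator are zero-filled so that two Names can be compared as plain byte strings.

```python
NAME_SIZE = 32                      # 12-byte fileref + up to 19 bytes + 00h
FILE_SIZE = 1 + 1 + 35 + 128        # FileFlag + FileIndex + FileFCB + FileBuffer


def pack_name(fileref, unit=""):
    """Build a 32-byte Name from a fileref and an optional unit name."""
    stem, _, ftype = fileref.upper().partition(".")
    if len(stem) > 8 or len(ftype) > 3 or len(unit) > 19:
        raise ValueError("name too long")
    # FCB format: drive byte, 8-byte filename, 3-byte filetype, blank-padded.
    name_file = b"\x00" + stem.ljust(8).encode("ascii") + ftype.ljust(3).encode("ascii")
    name_unit = unit.upper().encode("ascii") + b"\x00"
    return (name_file + name_unit).ljust(NAME_SIZE, b"\x00")   # assumed zero fill


name = pack_name("utilio.inc", "SetUpOutput")
print(len(name), name[:12], FILE_SIZE)
print("NameTable bytes:", 128 * NAME_SIZE, "FileTable bytes:", 8 * FILE_SIZE)
```

On these figures the NameTable occupies 4096 bytes and the FileTable 1320 bytes, so both tables together are about the size of two or three LineBuffers.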

The Main Program

When OneFile had been planned and the storage layout designed, the main program could be approached. Figure 6-5 shows the plan I arrived at. I didn't carry this plan to any greater level of detail, and as a result I spent more terminal time coding and debugging it than should have been necessary. However, I was anxious to get back to OneFile, the real heart of the program.

Figure 6-5. The plan of the outer program of INCLUDE. The primary input file is set up as if it were being included, then OneFile (Figure 6-6) is applied to it.

    Program Include(input-ref, output-ref, /drives)
        if ("/drives" given) then
            prepare a list of drivecodes from it
        else
            prepare a list with the default drive only
        endif.
        initialize the NameTable, NameCount, NextName
        initialize the FileTable, FileCount, NextFile
        prepare a File entry for "input-ref" in FileTable[1]
        open the file; abort if it doesn't exist
        prepare a Name entry for "input-ref" in NameTable[1]
        set up the output mechanism in the usual way
        OneFile( FileTable[1], NameTable[1] )
        finish the output file
    end Include.

OneFile in Detail

Figure 6-6 shows the logic of OneFile worked out in detail. Table 6-1 is a list of all the subroutines named in OneFile, with a brief specification of each one. It took a fair amount of work to get from the outline of Figure 6-2 to the detailed plan shown in Figure 6-6 and Table 6-1. At first, the design wouldn't settle down and come into sharp focus. Eventually, it dawned on me that I was violating an important principle of software engineering. I had unthinkingly let my decisions about data structures permeate the logic of the program, where they had no business to be. I was not practicing "information hiding."

Figure 6-6. The logic of OneFile, worked out in enough detail to define subroutines and assign registers to variables.

    procedure OneFile(File,Name)
        File is a record of the state of one open input file.
        Name is a formatted string of a fileref and a unit name.
        Searching, Skipping, and Ended are boolean flags.
        T is a pointer used to scan the line text.
        N is a pointer to another Name structure.
        F is a pointer to another File structure.

        ShowLine
        Skipping := false; Ended := false
        if (Name.unit is null) then Searching := false
        else Searching := true

        while (not Ended) and (not GetLine(File,T))
            if (not Searching) and (not Skipping) then
                if (Keyword("START",T)) then
                    if (RecordUnit(N,Name,T)) then
                        if (Lookup(N)) then Skipping := true
                        else {new name} AddName(N)
                    endif
                    if (not Skipping) then PutLine
                else { not #start }
                    PutLine
                    if (Keyword("INCLUDE",T)) then
                        if (RecordName(N,T)) then
                            if (not Lookup(N)) then
                                if (AddName(N)) then
                                    if (GetFile(F,N)) then
                                        if (OpenFile(F)) then
                                            OneFile(F,N)
                                        RelFile
                                    else -- too many open files
                                else -- no space for new name
                            else -- unit already included
                        else -- invalid name syntax
                    elif (Keyword("END",T)) then
                        if (EqualUnit(Name,T)) then Ended := true
                        else -- not end of our unit
                    endif.
                endif.
            else { either Searching or Skipping }
                if (Searching) then
                    if (Keyword("START",T)) then
                        if (EqualUnit(Name,T)) then
                            PutLine
                            Searching := false
                        else -- not start of our unit
                    endif.
                else {Skipping}
                    if (Keyword("END",T)) then
                        if (EqualUnit(N,T)) then Skipping := false
                    endif.
                endif.
            endif.
        end while.
        if (Searching) then report text unit not found
    end OneFile.

The principle of "information hiding" was first discussed by Parnas in a landmark paper [1]. The principle is simply that the only part of a program that should be aware of how a unit of data is stored is the code that stores and retrieves it. The best design is the one that confines such knowledge to the smallest number of modules.

In my initial attempts to elaborate OneFile, I violated the principle in several ways. For example, in subroutine RecordName I planned to use the next free NameTable entry as a work area in which to build the new Name. Then the Lookup subroutine expected to find it there, and could "easily" add it to the table by incrementing NextName. But what if the table was full? RecordName—whose only proper job is to format a Name—had to check for that. But if RecordName returned a failure indication, it might be because the command contained an invalid name, or because the NameTable was full. There were similar problems in handling the FileTable.

Once I realized what I was doing wrong, the design popped into focus in a very satisfying way. RecordName would attend only to its proper business, building a Name in a scratch area that its caller must provide. Lookup would attend to its proper work, searching for a Name in the table. A new subroutine, AddName, and only it, would know how to add a Name to the table. The structure of NameTable would be known only to Lookup and AddName. In the unlikely event that it took too long to find names, a more sophisticated table organization could be implemented by changing only those two modules. In the same way, all knowledge of FileTable was encapsulated in GetFile and RelFile.
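In a modern language the same separation might look like the following Python sketch. It is only an illustration of the principle: the class NameTable and its methods are hypothetical stand-ins for the assembly routines Lookup and AddName, and the list inside could be swapped for a set or a tree without touching any caller.

```python
class NameTable:
    """Only lookup() and add_name() know how the Names are stored."""

    def __init__(self, capacity=128):
        self._names = []              # a simple linear list, as in the plan
        self._capacity = capacity

    def lookup(self, name):
        """Return True if name has already been recorded."""
        return name in self._names    # linear search

    def add_name(self, name):
        """Record name; report and return False if the table is full."""
        if len(self._names) >= self._capacity:
            print("no space for new name")
            return False
        self._names.append(name)
        return True


table = NameTable(capacity=2)
new_name = ("TEXTLIB.LIB", "FILLA")   # built by the caller in its own scratch area
if not table.lookup(new_name):
    table.add_name(new_name)
print(table.lookup(new_name))
```

The caller builds the Name itself, so a failure from add_name can only mean a full table, never a badly formed name.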

OneFile's Subroutines

The final version of Figure 6-6 calls for a number of subroutines. They are listed in alphabetical order in Table 6-1. I recorded the data of Table 6-1 on scratch paper as the pseudo-code grew and evolved. Later, I sketched pseudo-code plans for most of the subroutines (not printed here).

Table 6-1. Specification of the subroutines named in Figure 6-6. Some of these were worked out in pseudo-code, generating more subroutine requirements.

AddName(Name) : returns boolean
Add the Name record to the list of all processed units. If the list is full, report the error to the console and return false. Otherwise return the new Name and true.
EqualUnit(Name,Pointer) : returns boolean
Compare the unitname in the Name record to the text addressed by Pointer. If the strings are equal, and if the text name is delimited by a blank, tab, or control character, return true. Otherwise return false.
GetFile(F,Name) : returns boolean
Get a free File record, set it up with the fileref from the Name record, and return its address in F. If there is no File record available, report the error to the console and return false. Otherwise return true.
GetLine(File,Pointer) : returns boolean
Read a complete line (through a linefeed) from the File into the global LineBuffer. Set the Pointer to the address of the first trigger (#) character seen, or to 0 if none appears. Return true if end-of-file occurs, else false.
Keyword(String,Pointer) : returns boolean
Compare the given String to the text addressed by the Pointer. If the characters match, and if the next byte is a blank or a tab, update Pointer to point beyond the keyword and return true. If no match, leave Pointer as-is and return false.
Lookup(Name) : returns boolean
Search the list of processed file-and-unit names for a match to Name. Return true if a match is found, else return false.
OpenFile(File) : returns boolean
Try to open the FCB in the File record, using the list of drivecodes given as a parameter to the program. If the file is not found on any drive, report the error to the console and return false. Otherwise return true.
PutLine
Write the line in LineBuffer to the output file.
RecordName(Name,Pointer) : returns boolean
Scan text from Pointer for "fileref,unitname." Format the file and unit names into the Name record. If the names have acceptable syntax, return true. Otherwise, report the error to the console and return false.
RecordUnit(Name1,Name2,Pointer) : returns boolean
Copy the fileref name from Name2 into Name1. Then scan text from Pointer for a unitname. Format the unitname into the Name1 record. Return true if the unitname has proper syntax. Otherwise report the error to the console and return false.
RelFile(File)
Release the File record for use (under MP/M, the file it represents should be closed as well).
ReportError(Message)
Display the current input line, followed by the given message, at the console.
ShowLine
Display the current input line at the console. Type no more than 80 bytes, followed by CR, LF.

In general, each subroutine represented a piece of work that was being deferred for separate attention. But I had to look both forward toward the final code and backward at the emerging design. I kept changing things around until I was sure of two things. First, I know from experience that subroutines like these ought to come out to be of roughly equal sizes. If the subroutines vary widely in their complexity, if the list of subroutines contains both very trivial and very complicated specifications, then something is going wrong with the design. I revised several times to reduce the "lumpy" feel of the subroutine list. Second, since the subroutines would be implemented in assembly language, I tried to keep their interfaces simple. I aimed at simple parameters that could be passed in the machine registers.

In previous programs in this book, a subroutine either succeeded at its work or it aborted the entire program. Most of the OneFile subroutines had to detect and report errors, then return some kind of pass/fail signal to their caller. That's the significance of "returns boolean" in the specifications of Table 6-1; it makes it possible to use a subroutine as the operand of a pseudo-code "if" statement. I planned to use the 8080 CPU's Zero flag for this signal.
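To show what "returns boolean" buys the caller, here is a minimal C sketch of the Keyword specification from Table 6-1. It is not the assembly routine in the listing. The pointer-to-pointer parameter stands in for the HL register, and whether the stored keyword carries the leading # is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of Keyword(String, Pointer): if the text at *pointer begins with
   keyword and the next byte is a blank or a tab, advance *pointer beyond
   the keyword and return true. Otherwise leave *pointer as-is and return
   false. */
bool Keyword(const char *keyword, const char **pointer)
{
    size_t len = strlen(keyword);
    const char *text = *pointer;

    if (strncmp(text, keyword, len) != 0)
        return false;                 /* characters do not match */
    if (text[len] != ' ' && text[len] != '\t')
        return false;                 /* keyword not properly ended */
    *pointer = text + len;            /* step beyond the keyword */
    return true;
}
```

A caller can then test the result directly, as in `if (Keyword("#include", &p))`, and rely on p being unchanged when the test fails.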

In part, specifying so many subroutines (many called only once) was just a design tactic, a way of deferring decisions of local scope until the global pattern had been tidied up. The pseudo-code subroutines did not have to be implemented as subroutines in the final code. "GetFile," for example, is only used in one place; it could have been coded in-line with the code that calls it. I did not do so for several reasons.

First, the design tactic is equally useful as a coding tactic. When coding OneFile, I could write

    call GetFile
    jnz ...

without thinking at all about the details of what GetFile had to do. It put quite enough strain on my powers of concentration to make sure that the register arguments to GetFile were correct. Later I could code a GetFile subroutine without overloading my mind with the details of what would happen next in OneFile. I find it very hard to underestimate the capacity of my own brain!

Furthermore, implementing the pseudo-code exactly in assembly code makes it easier to compare the two. The fewer the differences, the easier it is to verify the assembly code by eye.

THE INCLUDE PROGRAM

The assembly code of INCLUDE appears in Listing 6-1. It is a large program, some 950 lines exclusive of included code. It calls on a large number of included subroutines, several used here for the first time in the book.

The program contains no especially clever assembly language tricks or techniques. Someone new to the 8080 might look at the code of Keyword (lines 554-572), which has to return HL either updated or not; the use of XTHL to accomplish this is not obvious at first. Other than that, the thousand lines of INCLUDE are bland as milk. All the mental gymnastics went into its design.

Command Syntax

INCLUDE interprets three commands; some decisions had to be made about the syntax rules it would impose. CP/M is afflicted with a plethora of delimiter characters, and they are applied in inconsistent ways in different parts of the system. Officially, a command operand is supposed to be terminated by a blank, a tab, a comma, or the end of the line; the slash and the left-bracket are official delimiters in MP/M 2. The Console Command Processor of CP/M 2 actually recognizes only the blank and the end of line (try it). The colon and the period are delimiters within filerefs but not outside them. The semicolon terminates a command in CP/M 2, but heads a file password in MP/M 2.

I had to impose some order on this confusion; you may or may not agree with what I did. The included routine Delimiter tests a byte for membership in the set of all possible delimiters. Delimiter calls any control character a delimiter, so a carriage return, a linefeed, a tab, or the null byte with which the CCP ends the command tail are delimiters.

The tab and the space are equally "white space." Thus the white space characters are a subset of the delimiters. The included subroutine WhiteSpace tests specifically for white space, and SkipWhite advances HL over any amount of whitespace. After using SkipWhite, the calling routine has to decide whether the nonwhite character that follows is a delimiter or not. In a similar way the routine that calls Delimiter learns whether or not it is looking at a delimiter, but it has to decide for itself if that delimiter is the particular one—a period, a comma, a slash—that it expects.
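The relationship among the three routines might be sketched in C as follows. The included routines themselves are 8080 assembly and are not reproduced here. The punctuation set in Delimiter is an assumption assembled from the characters named in the text, and returning the advanced pointer stands in for updating HL.

```c
#include <stdbool.h>

/* True for the two white space characters, blank and tab. */
bool WhiteSpace(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* True for any control character (which covers tab, carriage return,
   linefeed, and the null byte) and for the punctuation delimiters.
   The punctuation set below is an assumption, not the book's list. */
bool Delimiter(unsigned char c)
{
    if (c < 0x20)
        return true;
    switch (c) {
    case ' ': case ',': case '/': case '[':
    case ':': case '.': case ';':
        return true;
    default:
        return false;
    }
}

/* Advance over any amount of white space and return the address of the
   first nonwhite byte. */
const char *SkipWhite(const char *p)
{
    while (WhiteSpace((unsigned char)*p))
        p++;
    return p;
}
```

Every byte that WhiteSpace accepts is also accepted by Delimiter, which is the subset relation described above. Deciding which delimiter was found remains the caller's job.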

I chose to allow any amount of white space between a verb and the first operand. Perhaps it is inconsistent that no white space is allowed between #include's first and second operands, but that was easier and did not seem to produce unpleasantly restrictive commands.

String Values

INCLUDE has to deal with two kinds of strings. It has to scan over the raw text of a line of input, recognizing keywords and extracting filerefs and unitnames. And it has to handle formatted strings of characters: comparison keywords and Names of included files.

Working with strings of bytes is simpler when a string can be terminated by just one kind of delimiter. Where INCLUDE stores a string for later comparison, it marks the end of the string with a null byte, 00h. This is a very convenient convention. It makes it easy to compare two strings. When the two strings are equal in their leading bytes, but one is shorter than the other, then the first 00h byte will end the comparison, automatically revealing the shorter string as "less." I designed the Name record so that it contained a string in this format. The first eleven bytes of a Name contain a fileref. If there is no unit name, the next byte is 00h. If there is a unit name, it follows the fileref and ends in 00h. Then two Name records may be compared as strings using the included routine CmpString. The routine CmdStrText compares a string delimited in this way to a string embedded in raw text, but it leaves it to the caller to decide if the text string is properly terminated.
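The Name layout and the comparison can be illustrated with a short C sketch. It rests on assumptions: MakeName is a hypothetical helper, the fileref is taken to be eleven blank-padded bytes as in a CP/M file control block, the limit on unit name length is invented, and CmpString here is a stand-in for the included assembly routine of that name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on unit name length; the text does not state one. */
#define UNIT_MAX 16
#define NAME_MAX (11 + UNIT_MAX + 1)

/* Build a Name: eleven bytes of fileref, then the unit name if there is
   one, then a 00h terminator. Pass "" as unit when there is no unit name.
   The name buffer must hold NAME_MAX bytes. */
void MakeName(char *name, const char *fileref, const char *unit)
{
    size_t n = strlen(unit);
    if (n > UNIT_MAX)
        n = UNIT_MAX;
    memcpy(name, fileref, 11);
    memcpy(name + 11, unit, n);
    name[11 + n] = '\0';
}

/* Compare two 00h-terminated strings byte by byte. The result is
   negative, zero, or positive as a is less than, equal to, or greater
   than b. A string that runs out first compares as "less". */
int CmpString(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

With this layout a Name that has no unit name compares as less than a Name with the same fileref and any unit name, because its terminator is met first.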

Boolean Results

Some problems arose when coding the subroutines that return a boolean flag to their caller. I wanted to use the Z flag consistently for this purpose. Sometimes this works out very naturally, but sometimes it takes extra, artificial code to make sure that the subroutine will return the correct flag under all circumstances.

With the 8080, it is hard to return a flag setting independently of the contents of the A register. As a result, I had to relax the strict rule that a subroutine would preserve all registers except those which contain its result. A subroutine that returns a flag-setting is allowed to modify the A register even when A will not contain a result.

IMPROVING INCLUDE

I believe that INCLUDE as it stands is an excellent tool with many applications. However, any program can be improved. One change would make it easier to use self-includes. The #include command could accept a null fileref operand as meaning "use the active fileref." Thus if this line were found in the file PROG1.PAS,

#include ,sub1

it would be the equivalent of

#include PROG1.PAS,sub1
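One way the substitution might work is sketched below in C. INCLUDE does not contain this feature, so everything here is hypothetical: the function name ResolveFileref, its parameters, and the choice of characters that end the fileref part of the operand.

```c
#include <stddef.h>
#include <string.h>

/* Copy the fileref part of an #include operand into out (outsize must be
   at least 1). If that part is empty, copy the active fileref instead.
   Return the address of the byte that ended the fileref part, so the
   caller can go on to look for ",unitname". */
const char *ResolveFileref(const char *operand, const char *active,
                           char *out, size_t outsize)
{
    /* The fileref part runs up to a comma, white space, or end of line. */
    size_t len = strcspn(operand, ", \t\r\n");
    const char *src = operand;
    size_t n = len;

    if (len == 0) {               /* null fileref: use the active one */
        src = active;
        n = strlen(active);
    }
    if (n >= outsize)
        n = outsize - 1;          /* truncate rather than overflow */
    memcpy(out, src, n);
    out[n] = '\0';
    return operand + len;
}
```

Given the operand ",sub1" and the active fileref PROG1.PAS, the sketch yields PROG1.PAS and leaves the caller positioned at the comma.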

In some applications, it might be necessary to include the same file or unit more than once. There should be some way of saying "include this regardless." One reason I did not add this facility is that I couldn't see any simple, obvious way to add such an option to the #include command.

Finally, with quite a bit of extra design work, INCLUDE could be given some form of macro, or text-substitution, ability.

[1] D. L. Parnas, "On the Criteria to Be Used in Decomposing Systems into Modules." The paper originally appeared in Communications of the ACM, Vol. 15, No. 12 (December 1972); it has been reprinted as Chapter 12 of Classics in Software Engineering, Edward Nash Yourdon ed., Yourdon Press, 1979.