The stated aim of this project is to "investigate techniques for efficient software-based graphics APIs". To this end, I intend to design and produce an efficient 3-D graphics library for the ARM microprocessor, with RISC OS being the target operating system. This platform is typified by a lack of hardware graphics acceleration, making it an ideal candidate for a software graphics renderer.
Although I shall only implement a 'toy' library that is capable of quickly throwing polygons around the screen (e.g. for 3-D games), ideally the graphics API adopted will be general enough to also be used for applications where advanced graphics rendering is required.
I shall begin by describing the hardware and software background for the project.
In 1979 Dr Hermann Hauser and Dr Chris Curry founded Acorn Computers Ltd. During the early 1980s Acorn produced a series of home computers - the kit-built Atom, the BBC Microcomputer and its cheaper cousin the Electron. All were based upon the 6502, a popular off-the-shelf 8-bit processor originally designed by MOS Technology. Over 150,000 BBC Micros were sold in the first two years, virtually monopolising the British schools market.
For their next-generation computer Acorn wanted a windowing system, such as that pioneered by the Apple Lisa. This meant a move to a new architecture, but the 16-bit processors that they looked at all had very complex instruction sets and poor interrupt response. However, the experimental microprocessor built by graduate students at Berkeley's RISC project looked much more promising.
At this point I should probably explain the basic tenets of the RISC philosophy of processor design:
A conventional CPU such as the Intel 80x86 has a large number of instructions, many of which are quite powerful (multimedia extensions etc.). These CPUs are therefore known as CISC (Complex Instruction Set Computer) processors, and because of the complexity of their design they execute code relatively slowly.
RISC (Reduced Instruction Set Computer) processors such as ARM or Sun's SPARC are designed on the principle that CISC processors spend most of their time executing only a small subset of simple instructions. By providing only these commonly used instructions, a RISC processor can run code more quickly and efficiently.
On the occasions where a CISC processor does use complex instructions, a RISC processor may need a number of instructions to perform the equivalent task. However, performance may still be comparable since a single slow instruction is being replaced by several fast ones. 
RISC processors are also sometimes called "load-store". This refers to the characteristic that memory access is simplified, being restricted to load and store operations. The reasoning behind this is that memory access is relatively slow compared to register access, so the majority of instructions should operate on registers rather than memory.
In general, RISC processors have the following characteristics:
The RISC philosophy is a very software-oriented rather than hardware-oriented approach. Libraries of efficient routines are necessary to allow programmers access to such niceties as complex mathematical functions. Good compilers for high-level languages such as C/C++ are also an asset, to allow the programmer to use concepts not necessarily directly supported by the CPU.
The Acorn RISC Machine was designed by Steve Furber with Sophie Wilson (who had developed BBC BASIC). They started work in late 1983, with Furber designing the architecture, and Wilson developing the instruction set.
Unsurprisingly, the instruction set has a passing resemblance to that of the 6502. It is straightforward to hand-code ARM assembly language, unlike some RISC processors which rely on sophisticated compilers to manage complicated instruction interdependencies. 
The first ARM ran on April 26th 1985, making it arguably the first commercial RISC processor ("MIPS for the masses"). The ARM was only part of a family of custom chips developed for Acorn's first RISC computer, the Archimedes. These included MEMC (memory controller), VIDC (video & sound controller) and IOC (timing, interrupts, peripherals). 
Key features of the ARM processor:
The fact that every instruction is conditionally executed allows many branches to be eliminated entirely, speeding execution. The ARM's barrel shifter is another unique idea, which allows the equivalent of two or more instructions to be combined into one.
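As a rough illustration of both features, the following C sketch pairs two small functions with ARM sequences that a hand-coder might plausibly write for them. The function names and the instruction sequences in the comments are illustrative assumptions, not taken from any particular library, and a C compiler is of course free to generate different code.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value.
   A possible ARM equivalent needs no branch, because the second
   instruction is simply skipped when the condition is not met:
       CMP   R0,#0
       RSBMI R0,R0,#0      ; R0 = 0 - R0, only if R0 was negative */
static int32_t abs_value(int32_t r0)
{
    assert(r0 != INT32_MIN);   /* -INT32_MIN is not representable */
    return (r0 < 0) ? -r0 : r0;
}

/* Address of element 'index' in an array of 4-byte words.
   A possible ARM equivalent is a single instruction, since the
   barrel shifter scales the index as part of the addition:
       ADD R0,R1,R2,LSL #2 ; R0 = base + (index << 2) */
static uint32_t word_address(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}
```

For instance, abs_value(-7) yields 7 and word_address(0x8000, 3) yields 0x800C. On a processor without these features the first function would typically need a compare, a branch and a negate, and the second a separate shift followed by an add.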
Because of these features, ARM code is both efficient and dense, compared to other RISC processors. Despite relatively low clock speeds and a short pipeline, in operation the ARM is equivalent to much more complex and power-hungry processors. Low power consumption and a high MIPS-to-watts ratio make it ideal as an embedded processor, e.g. for hand-held devices.
Whilst early ARM processors such as the ARM1 and ARM2 had no cache, the ARM3 was the first to break this trend, featuring a 4 KB on-chip cache. More recent CPUs such as the ARM610/710 have a cache as standard.
In later versions of the ARM architecture, the program counter was extended to a full 32 bits, with the program status register moving to a dedicated register - the CPSR (current PSR), with an SPSR (saved PSR) for each privileged mode.
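For comparison, the earlier arrangement, in which the program counter and processor status share R15, can be sketched in C as below. The bit positions are not given in this document and are stated here as an assumption based on the 26-bit ARM programmer's model (flags in bits 31-28, interrupt disable bits in 27-26, address in bits 25-2, mode in bits 1-0). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of R15 on 26-bit ARM processors. */
#define R15_FLAG_N    0x80000000u  /* negative             (bit 31) */
#define R15_FLAG_Z    0x40000000u  /* zero                 (bit 30) */
#define R15_FLAG_C    0x20000000u  /* carry                (bit 29) */
#define R15_FLAG_V    0x10000000u  /* overflow             (bit 28) */
#define R15_IRQ_OFF   0x08000000u  /* IRQ disable          (bit 27) */
#define R15_FIQ_OFF   0x04000000u  /* FIQ disable          (bit 26) */
#define R15_PC_MASK   0x03FFFFFCu  /* word-aligned address (bits 25..2) */
#define R15_MODE_MASK 0x00000003u  /* processor mode       (bits 1..0) */

/* Combine an address, status bits and mode into one register value. */
static uint32_t r15_pack(uint32_t pc, uint32_t status, uint32_t mode)
{
    assert((pc & ~R15_PC_MASK) == 0);   /* must fit in 26 bits, word aligned */
    assert((status & (R15_PC_MASK | R15_MODE_MASK)) == 0);
    assert(mode <= 3u);
    return pc | status | mode;
}

static uint32_t r15_pc(uint32_t r15)   { return r15 & R15_PC_MASK; }
static uint32_t r15_mode(uint32_t r15) { return r15 & R15_MODE_MASK; }
static int r15_flag(uint32_t r15, uint32_t flag) { return (r15 & flag) != 0; }
```

Under these assumptions the highest address that can be held is 0x03FFFFFC, which limits code to a 64 MB address space. Moving the status bits into a separate register removes that limit.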
Apart from incremental increases in clock speeds, other improvements have included the addition of DSP-like fast 64-bit multiply instructions and the availability of hardware floating point systems. The ARM processors around which current RISC OS machines are built range in clock speed from the 56 MHz ARM7500FE at the low end to the 300 MHz StrongARM.
In 1990 Acorn's microprocessor group was spun out as a separate venture 'Advanced RISC Machines', backed by Apple and manufacturers VLSI Technology. Today, this company dominates the world market for embedded processors: "ARM processors are teetering on the verge of ubiquity in widgets of all shapes, sizes, and functions. Nintendo's Game Boy Advance, mobile phones too numerous to mention..." 
The first operating system to be known as RISC OS was RISC OS 2 of 1988, so called because it was in fact the second operating system for the Archimedes computer. The first had been "Arthur", which was essentially a hasty port of the BBC Microcomputer's OS, bundled with a primitive desktop written in BASIC(!).
By contrast, RISC OS 2 provided a proper desktop environment in which multiple applications could run simultaneously, exchanging data between themselves and with the Filer by a drag-and-drop user interface. Subsequent releases have gradually improved the OS both aesthetically and internally, and updated it for new hardware, but it has never undergone a large-scale overhaul. 
RISC OS has the following characteristics:
The fact that RISC OS is supplied on ROM rather than disc has a number of advantages: it cannot be damaged or lost by viruses or accident, and since it does not need to be loaded into memory it is much faster to start up. When running it does not take a significant proportion of the computer's memory. Admittedly it is harder to upgrade ROMs, but this can generally be done by soft-loading replacement modules from disc.
Because it is ROM-based, RISC OS is ideal for embedded systems and network computers. Since it was designed to be usable on low-end Archimedes computers that had no hard disc and low-resolution monitors, its suitability for TV set-top box products is obvious. In particular the quality of RISC OS's font rendering at low resolutions has attracted praise.
Acorn were involved in Oracle's network computer project, and brought at least one NC to market. Pace Micro Technology (the new copyright holders) are currently using RISC OS in their set-top boxes and in consumer products such as the Bush Internet TV.
The following typographical conventions are followed in the rest of this document:
Where direct quotations or information from other authors are included, the source is attributed using a reference number in square brackets. This can be looked up in the numbered list of references given at the end of this document.
When computer software is referred to, the name is italicised rather than bracketed by quotation marks, e.g. "TechWriter is based on EasiWriter, with the addition of a powerful equation editor."
Equations, variables and other mathematical expressions also appear in an italicised font, e.g. "The tangent at the point where , cuts the -axis at ."
The general convention for any quoted code, function or SWI name is that it appears in a monospaced font, e.g. "Changes made to array data between the execution of glBegin and the corresponding execution of glEnd may affect calls to
Where larger code examples are given to illustrate a technical point, the section is additionally highlighted by a grey background:
Most code examples are given either in BBC BASIC or ARM assembly language (as above). The former is generally used for client program code whilst the latter is generally used for module code. Whilst a full explanation of either language is beyond the scope of this report, the following sections give a (very) basic grounding in the syntax of some of the commands used.
Since many of the code examples involve calling SWIs (see section 3), it may be helpful to know the syntax of BASIC's SYS statement.
A comma-separated list of expressions may follow the SWI name, each an argument to be passed in one of the ARM registers R0-R7. Numbers are converted to integers and placed directly into a register whilst strings are passed by pointer. Any registers omitted from the list (indicated by ,,) are zeroed. After the optional TO, a similar list specifies output variables in which the returned values of registers are to be stored. Again, registers may be omitted from the list. Finally, a trailing semicolon and variable can be used to retrieve the state of the processor flags on exit from the SWI.
For example, SYS "OS_Find",&40,"foo" TO handle% would call the OS_Find SWI with &40 in register 0 (meaning open existing file with read access) and register 1 pointing to the string "foo". The return value of register 0 (a file handle) would be stored in the BASIC integer variable handle%.
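The argument-passing rules described above can be modelled in C as follows. This is only a host-side sketch of how the input registers might be filled in: it does not call a SWI, and the swi_regs type and helper functions are hypothetical names invented for illustration rather than part of BBC BASIC or of any RISC OS library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SWI_ARGS 8   /* input registers R0-R7 */

/* Hypothetical model of the register block passed to a SWI. */
typedef struct {
    uintptr_t r[SWI_ARGS];
} swi_regs;

/* Every register starts as zero, so omitted arguments stay zeroed. */
static swi_regs swi_regs_init(void)
{
    swi_regs regs;
    memset(&regs, 0, sizeof regs);
    return regs;
}

/* Numeric argument: the integer value is placed directly in the register. */
static void swi_set_int(swi_regs *regs, int n, unsigned long value)
{
    assert(n >= 0 && n < SWI_ARGS);
    regs->r[n] = (uintptr_t)value;
}

/* String argument: the register receives a pointer to the string. */
static void swi_set_str(swi_regs *regs, int n, const char *s)
{
    assert(n >= 0 && n < SWI_ARGS);
    regs->r[n] = (uintptr_t)s;
}
```

Under this model the OS_Find example corresponds to calling swi_regs_init, then swi_set_int with register 0 and the value 0x40, then swi_set_str with register 1 and "foo". Registers 2 to 7 remain zero, just as they would for arguments omitted from the list.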
Another BASIC keyword commonly used in this document is REM, which simply means that the rest of the line is a comment, to be ignored by the interpreter.
In general the instruction mnemonics and options used follow the de facto standard - that in Peter Cockerell's book. The general syntax of an assembler source line is as follows:
An address label at the beginning of the line is terminated by a colon, followed by the instruction. Comment text is prefixed by a semicolon, and is ignored by the assembler. The label, instruction and comment are all optional parts of the source line.
ARM instructions are referred to by a mnemonic such as LDR (load) or MLA (multiply with accumulate). All instruction mnemonics may be postfixed by a condition code that must be satisfied for the instruction to be executed (see section 2.3). Examples might be RSBMI (reverse subtract, if negative flag set) or SWIVC (software interrupt, if overflow flag clear).
Operands are specified as a list of comma-separated registers, which are referred to by number as R0-R15. Alternative names such as SP (stack pointer) and PC (program counter) reflect the usual roles of particular registers. The commonest (though by no means the only) format for instruction operands is as follows:
For example, ADD R0,R1,R2 would perform R0 = R1 + R2. To complicate matters, for many instructions <operand2> may be either a constant value, a register, a register shifted by a register, or a register shifted by a constant value:
For example, ADD R0,R0,R0,ASL#1 would multiply R0 by 3 (R0 plus R0 shifted left by one bit, i.e. R0 + 2×R0).
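The effect of a shifted second operand can be checked with a small C model. This is a sketch only: the function names are hypothetical, and unsigned 32-bit arithmetic is used so that wrap-around behaves as it would in a register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADD Rd,Rn,Rm,ASL #shift on 32-bit registers:
   the second operand is shifted left before the addition. */
static uint32_t add_asl(uint32_t rn, uint32_t rm, unsigned shift)
{
    assert(shift < 32);   /* this sketch only models shifts of 0..31 bits */
    return rn + (rm << shift);
}

/* Model of ADD R0,R0,R0,ASL#1 : R0 = R0 + (R0 << 1) = 3 * R0.
   The result is converted back to a signed value on the assumption
   of a 2's complement machine. */
static int32_t times_three(int32_t r0)
{
    uint32_t u = (uint32_t)r0;
    return (int32_t)add_asl(u, u, 1);
}
```

Here times_three(7) gives 21 and times_three(-5) gives -15, showing that the left shift scales negative values correctly as well as positive ones. Other small constant multiplications follow the same pattern, e.g. a shift of 2 multiplies by 5.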
Finally, a convention that I use to aid readability of ARM code is that conditionally executed instructions are indented. This is analogous to the way loops and other constructions are indented in well-formatted C source code. Groups of instructions dependent on different condition codes are indented by slightly different amounts:
Furthermore, where a sequence of complementary comparisons is made, the indentation is cumulative. The following code branches on the condition
It is probably obvious that this indentation scheme is less than foolproof, but generally conditional constructions follow a few well-known patterns (the cumulative AND, the cumulative OR etc.) and the indentation works nicely.
1 The program counter (address from which instructions are fetched) and processor status (flags and current mode/interrupt state) are packed into a single register, R15. Therefore fewer than the full 32 bits are available for the PC address.
2 Arithmetic and logic unit.
3 Note that the assembler treats ASL and LSL as synonymous, since shifting a 2's complement signed value n bits to the left multiplies it by 2^n whether or not it is negative. The same is not true of the right-shifts ASR and LSR.