*************************************
|         Star Fighter 3000         |
|       Documentation Project       |
*************************************
|  Reference 8: Compression format  |
|       Changed:  CJB 03.01.11      |
*************************************

N.B. This text is best viewed at a display width of 77 columns.
     To create an index do a 'List of found' search for ">>".

-----------------------------------------------------------------------------
>> Contents:
   =========
8.1 Introduction
8.2 File types
8.3 Compression format
8.4 Known implementations
8.5 Implementation quirks

-----------------------------------------------------------------------------
>> 8.1 Introduction
   ================

  This document describes a compression format used to reduce the size of
(and perhaps obfuscate) files belonging to several RISC OS games: 'Chocks
Away' (published by The Fourth Dimension), 'Stunt Racer 2000' and 'Star
Fighter 3000' (both published by Fednet Software).

-----------------------------------------------------------------------------
>> 8.2 File types
   ==============

  There follows a list of filetypes that compressed files may have. Beware
that files of type &400 are not necessarily compressed because the 'Stunt
Racer 2000' track editor saves non-compressed tracks with this type!

Type   Contents
----   --------
&154   'Star Fighter 3000' or 'Stunt Racer 2000' compressed code or data.
&300   'Star Fighter 3000' compressed polygonal objects set.
&400   'Star Fighter 3000' compressed ground map, or
       'Stunt Racer 2000' track (not necessarily compressed).
&402   'Star Fighter 3000' compressed objects map.
&401   'Star Fighter 3000' compressed ground map overlay, or
       'Stunt Racer 2000' rec file.
&403   'Star Fighter 3000' compressed objects map overlay.
&404   'Star Fighter 3000' compressed sky colours.
&405   'Star Fighter 3000' compressed mission data.
&406   'Star Fighter 3000' compressed sky pictures set.
&407   'Star Fighter 3000' compressed map tile graphics set.
&408   'Star Fighter 3000' compressed ground map animations.
&FFD   'Chocks Away' or 'Stunt Racer 2000' compressed code or data
       (not easily identifiable).

-----------------------------------------------------------------------------
>> 8.3  Compression format
   =======================

  The first 4 bytes of a compressed file give the expected size of the data
when decompressed, as a 32 bit signed little-endian integer. Gordon Key's file
decompression module 'FDComp', which is presumably normative, rejects input
files where the top bit of the fourth byte is set (i.e. negative values).

  Thereafter, the compressed data consists of tightly packed groups of 1, 8
or 9 bits without any padding between them or alignment with byte boundaries.
A decompressor must deal with two main types of directive: The first (store a
byte) consists of 1+8=9 bits and the second (copy previously-decompressed
data) consists of 1+9+8=18 or 1+9+9=19 bits.

The type of each directive is determined by whether its first bit is set:

0.   The next 8 bits of the input file give a literal byte value (0-255) to
   be written at the current output position.

     When attempting to compress input that contains few repeating patterns,
   the output may actually be larger than the input because each byte value
   is encoded using 9 rather than 8 bits.

1.   The next 9 bits of the input file give an offset (0-511) within the data
   already decompressed, relative to a point 512 bytes behind the current
   output position.

     If the offset is greater than or equal to 256 (i.e. within the last 256
   bytes written) then the next 8 bits give the number of bytes (0-255) to be
   copied to the current output position. Otherwise, the number of bytes
   (0-511) to be copied is encoded using 9 bits.

    If the read pointer is before the start of the output buffer then zeros
  should be written at the output position until it becomes valid again. This
  is a legitimate method of initialising areas of memory with zeros.

    It isn't possible to replicate the whole of the preceding 512 bytes in
  one operation.

-----------------------------------------------------------------------------
>> 8.4 Known implementations
   =========================

  In 'Chocks Away' and 'Stunt Racer 2000' the contents of compressed files
were extracted by a module named 'DeComp' (©1993 The fourth dimension)
which provides a single command *CLoad - analogous to the standard command
*Load. In 'Star Fighter 3000' a similar module is supplied encrypted but it
is named 'FDComp' (© Gordon Key 1994) and its command is *LComp.

  Apart from an unbalanced stack bug having been fixed (when SWI OS_Find
returns an error) the two modules appear to be virtually identical. Actually
'FDComp' doesn't behave much better in the same situation because *LComp
returns with V set but R0 still pointing to the command tail rather than an
error block. It seems safe to assume that Gordon Key wrote both modules.

  The 'Stunt Racer 2000' track designer released through Visions Of The
Impossible saves compressed tracks using a module named 'Comp' (©1993 The
fourth dimension) which provides a single command *CSave - analogous to the
standard commmand *Save. David O'Shea's desktop version of the track designer
incorporates a re-implementation of the 'DeComp' module in C. He also wrote
a compression module (not publicly released).

   Until release 39, Chris Bazley's C programmers' library 'CBLibrary'
incorporated code heavily based on David's compressor and decompressor. In
later releases, this will be superseded by new state machine implementations.

-----------------------------------------------------------------------------
>> 8.5 Implementation quirks
   =========================

  The decompressors written by the Fourth Dimension and David O'Shea always
copy at least 1 byte from the source offset, even if the compressed bitstream
specified 0 as the number of bytes to be copied. A well-written compressor
should not insert directives to copy 0 bytes and no instances are known in
the wild. CBLibrary's new decompressor will treat directives to copy 0 bytes
as invalid input.

  The Fourth Dimension's compressor terminates a sequence when the sum of its
offset and size is 511, even if the most recent byte matched the data to be
compressed. This explains why the decompressor only expects 8 bits for the
size of sequences starting at byte 256.

  All compatible compressors must impose a size limit of 255 or 511 on
sequences starting at offsets 256 or 0 respectively. The most recent byte
can't be included in a sequence starting at either of those offsets (e.g.
[255 .. 511] and [257 .. 511] are valid but [256 .. 511] isn't).

  David O'Shea's compressor only searches the most recent 511 bytes for
matches, so it produces different output from The Fourth Dimension's
compressor for the same input. (A sequence starting at the oldest byte may be
as long as, or longer than, a later sequence.)

  An optimal compressor should allow sequences to include the most recent
byte (at offset 511) unless the start offset is 0 or 256.