[spotlight] Data storage as audio recordings

This was a mini-spotlight conversation: I was relating some findings about Acorn’s technology for reading data from audio cassettes, and Mesar has an interest in perhaps using similar techniques for cold storage. Recordings made 40 years ago are, for the most part, still recoverable today.

Acorn’s system records at 300 or 1200 baud, using 1200Hz and 2400Hz tones to represent zeros and ones respectively. (This is, or was, a standard.) The signal is recorded onto both the left and right stereo tracks. It is derived from a four-level approximation to sine waves, with a whole number of waves used for each bit: twice as many waves for the high frequency.

Acorn cassette format - BeebWiki
Serial ULA - BeebWiki
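
As a rough illustration of the tone scheme described above, here is a minimal Python sketch that turns bits into audio samples. It uses pure sine waves rather than Acorn’s four-level approximation, and the 48 kHz sample rate and the name `encode_bits` are assumptions made for the example, not anything taken from Acorn’s hardware.

```python
import math

SAMPLE_RATE = 48000  # assumption: a multiple of 2400 keeps every wave a whole number of samples


def encode_bits(bits, sample_rate=SAMPLE_RATE, baud=1200):
    """Return float samples for a list of bits.

    At 1200 baud a 0 bit is one wave of 1200 Hz and a 1 bit is two waves
    of 2400 Hz; at 300 baud each bit lasts four times as long.
    """
    samples_per_bit = sample_rate // baud
    waves_for_zero = 1200 // baud  # 1 at 1200 baud, 4 at 300 baud
    out = []
    for bit in bits:
        waves = waves_for_zero * (2 if bit else 1)
        for n in range(samples_per_bit):
            out.append(math.sin(2 * math.pi * waves * n / samples_per_bit))
    return out
```

Because every bit holds a whole number of waves, each bit starts at the same phase, which is what makes edge counting a workable way to decode.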

The interesting bit is recovering the data from a noisy, weak signal which may run at a slightly different speed and also have speed fluctuations (and therefore also frequency fluctuations). It’s also possible that the consumer audio circuits are not too careful about preserving phase. The signal being represented as one or other pure tone should help here. (Even so, note that MP3 recordings are sufficiently lossy that they probably won’t work at 1200 baud.)

I’d expected Acorn’s input stage to detect one or other tone using relatively high-Q filters, but that’s not how it was done. (The forum thread also describes two other, very different, decoding approaches: Nascom, and Compukit. Acorn’s is felt to be particularly good.)

What Acorn’s circuit does is to apply a bandpass filter and then square off the signal, in the analogue domain, and then detect the remaining zero-crossings in the digital domain. Each incoming edge is used to produce four rapid-fire clock edges. If there’s a timeout waiting for the next edge (a timeout corresponding to a wave of the high-frequency tone), then that’s noted as representing a low-frequency tone, and another burst of clocks is produced.
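
The edge-and-timeout idea can be imitated in software. The following is a minimal sketch, not a model of Acorn’s circuit: the threshold value, the function names, and the simplification that the recording starts and ends exactly on a bit boundary are all assumptions made for the example. Each gap between edges yields either one burst of four clocks (short gap, high tone, data 1) or two bursts (long gap, low tone, data 0), so every bit produces 16 clocks either way.

```python
def edge_positions(samples):
    """Square off the signal and return the sample indices where its sign changes."""
    edges = []
    prev = samples[0] >= 0
    for i in range(1, len(samples)):
        cur = samples[i] >= 0
        if cur != prev:
            edges.append(i)
            prev = cur
    return edges


def decode_bits(samples, sample_rate=48000):
    """Recover bits from 1200/2400 Hz tones at 1200 baud by timing the gaps between edges."""
    # Assumed threshold: halfway between the half-wave of 2400 Hz (1/4800 s)
    # and the half-wave of 1200 Hz (1/2400 s).
    timeout = sample_rate / 3200
    # Simplification: treat the start and end of the recording as edges.
    edges = [0] + edge_positions(samples) + [len(samples)]
    clocks = []  # the data value that accompanies each generated clock
    for a, b in zip(edges, edges[1:]):
        if b - a > timeout:
            clocks.extend([0] * 8)  # edge burst plus timeout burst: low tone
        else:
            clocks.extend([1] * 4)  # edge burst only: high tone
    # 16 clocks per bit; sample the data value in the middle of each bit.
    return [clocks[i + 8] for i in range(0, len(clocks) - 15, 16)]


def square_test_signal(bits, samples_per_bit=40):
    """Hypothetical test helper: an idealised, already squared-off signal."""
    out = []
    for bit in bits:
        half = samples_per_bit // (4 if bit else 2)
        for n in range(samples_per_bit):
            out.append(1.0 if (n // half) % 2 == 0 else -1.0)
    return out
```

Passing `samples_per_bit=44` or `36` to the helper simulates a tape running about 10% slow or fast; the gaps still fall on the right side of the threshold, which hints at why timing edges tolerates speed variation. Inverting the signal leaves the edge positions unchanged, so phase inversion does not matter either.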

The net result is that the same number of clocks is produced both for zero and for one bits, and the timeout flag signals whether the data was zero or one. The clock signal and data signal are then passed to a commodity chip, an ACIA or UART, which detects the byte-level framing. Each byte is preceded by a zero start bit and followed by a one stop bit.
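
The byte-level framing is simple enough to sketch in a few lines. This example assumes the data bits are sent least significant bit first, as is usual for a UART; the function names are hypothetical, and a real ACIA synchronises on the clock as well, which is not modelled here.

```python
def frame_byte(value):
    """Return the 10 bits for one byte: start bit 0, eight data bits (LSB first, assumed), stop bit 1."""
    return [0] + [(value >> i) & 1 for i in range(8)] + [1]


def deframe(bits):
    """Scan a bit stream for start bits and return the bytes that have a valid stop bit."""
    out = bytearray()
    i = 0
    while i + 10 <= len(bits):
        if bits[i] == 0 and bits[i + 9] == 1:
            out.append(sum(b << n for n, b in enumerate(bits[i + 1:i + 9])))
            i += 10
        else:
            i += 1  # idle tone (ones) or a framing error: slide forward one bit
    return bytes(out)
```

A run of ones before the first start bit, as an idle high tone would give, is skipped automatically because the scan only locks on when it sees a zero.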

Acorn’s system also has a block-level structure: a block of up to 256 bytes is recorded as a header, a CRC for the header, the data, and a CRC for the data. The header includes the filename, block number, data address, and other metadata. There’s a gap between blocks too, which allows the tape transport to be stopped and started if need be.
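
For a feel of what the CRC check involves, here is a minimal bitwise CRC-16 in Python. It assumes the common polynomial 0x1021 with a zero starting value (the variant often called XMODEM) and a CRC stored high byte first; I believe that matches the cassette format, but treat it as an assumption and check it against the BeebWiki description before relying on it.

```python
def crc16(data, crc=0):
    """Bitwise CRC-16, polynomial 0x1021, most significant bit first (assumed variant)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def block_is_intact(data, stored_crc):
    """Compare a freshly computed CRC with the one read from the recording."""
    return crc16(data) == stored_crc
```

With this variant, running the CRC over the data followed by its own two CRC bytes (high byte first) gives zero, which is a handy self-check.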

Because of the CRC protection, it has proved possible to recover some surprisingly bad recordings, by sketching in probable waveforms and seeing if the CRC matches up. Using the better of the left and right tracks has also been helpful. Important data, of course, could be recorded in duplicate or triplicate form, to improve the chance of recovery.
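
The recoveries mentioned above worked at the waveform level, but the same guess-and-check idea can be shown at the bit level. This sketch tries flipping each bit of a damaged block in turn until the CRC matches. The function name is hypothetical, and it uses Python’s `binascii.crc_hqx`, which computes a CRC-16 with polynomial 0x1021; whether that is exactly Acorn’s CRC is an assumption here.

```python
import binascii


def repair_single_bit(block, expected_crc):
    """Return a copy of block that matches expected_crc after flipping at most one bit, else None."""
    if binascii.crc_hqx(block, 0) == expected_crc:
        return bytes(block)
    candidate = bytearray(block)
    for bit in range(len(candidate) * 8):
        candidate[bit // 8] ^= 1 << (bit % 8)  # try flipping this bit
        if binascii.crc_hqx(bytes(candidate), 0) == expected_crc:
            return bytes(candidate)
        candidate[bit // 8] ^= 1 << (bit % 8)  # no match: undo the flip
    return None
```

In practice the candidates would come from the doubtful stretches of the waveform rather than from every bit in the block, which keeps the search small even when several bits are in question.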

It would also be possible to add some proportion of blocks carrying Forward Error Correction data, but I don’t think that was done back in the day.
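
As a sketch of the simplest possible form of this, one extra parity block per group, formed by XORing the data blocks together, lets any single lost block in the group be rebuilt. This is purely illustrative: the function names are hypothetical, the blocks are assumed to be of equal length, and a real design would more likely use something stronger such as Reed-Solomon coding.

```python
def parity_block(blocks):
    """XOR equal-length blocks together to make one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)


def recover_missing(blocks, parity):
    """Rebuild the single missing block (marked None) from the others and the parity block."""
    present = [b for b in blocks if b is not None]
    return parity_block(present + [parity])
```

The CRCs already say which block is bad, so the decoder knows which one to treat as missing; that is what makes such a simple scheme sufficient for one loss per group.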