ASCII, NFO Art & Text Encodings

After last month’s successful rollout of the JavaScript DOS emulation throughout the site, there is one other gripe I have been wanting to overcome on Defacto2: the accurate display of ASCII art and NFO text in a browser.

Surprisingly, the ability to display files created using a text standard from 1981 has been mostly out of reach, for numerous reasons that I will get into later. For now, the last missing piece of the puzzle, the need for proper bitmapped DOS fonts converted into a web friendly format, has been solved thanks to the marvellous work of Viler and his Ultimate Oldschool PC Font Pack.

Home of the world’s biggest collection of classic text mode fonts, system fonts and BIOS fonts from DOS-era IBM PCs and compatibles

You may recognise Viler as one member of the 2015 team that developed the technically amazing and competition winning 8088 MPH demo, which impossibly pulled off the feat of displaying an image using 1024 simultaneous colours on a 1981 era home/office PC.

Before the availability of Viler’s font pack, you couldn’t properly display ASCII/NFO art in a modern operating system without a specialised application to either view or convert the text into an image.

ANSILove is a set of tools to convert ANSi and artscene-related file formats into PNG images

But image conversion has its downsides. For one, text embedded into an image isn’t searchable, nor is it selectable, transferable or accessible, and it can only be read in a fixed colour and small font size. Plus web browsers themselves add further limitations by placing memory restrictions on the size of the images they are able to load.

Thanks to Viler’s fonts and some code page character conversions I created (really a hack), most of the text files on Defacto2 are now accurately displayed in the browser as HTML text, which eliminates all of the issues with images mentioned above.

Here are a few examples.

Mostly for novelty value, you can switch between four sets of DOS era fonts while viewing the text files: 9 pixel VGA, high-resolution thin CGA, IBM PC BIOS and Tandy 1000 series BIOS. There are a number of colour combinations too: DOS grey on black, white on black, monochrome green on black, black on white and a gimmick black on white with CSS shadow effects. Not surprisingly, some files look better with different colour and font combinations.

Before: The original DEADLINE.NFO poorly rendered by Chrome
Now: In browser render of DEADLINE.NFO
CSS Shadow effects
All text is searchable and selectable
Text pasted into Notepad

For any web developers out there the basic implementation involved taking a DOS encoded text file and reading it using the Windows 1252 character set.

I looked up all the common ASCII art characters that are malformed by using this incorrect code page, pattern matched and replaced them with their UTF-8 coded equivalent.

For example, when using CP-437 the lower half block glyph ▄ is represented as decimal 220. With the Windows 1252 code page, which has no lower half block character, decimal 220 returns the Ü glyph.

After loading an ASCII art file using Windows 1252, I replace all the incorrect Ü glyphs with U+2584 or ▄ and then display the converted document in the Unicode compatible, web browser friendly UTF-8 encoding.
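The substitution described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code running on Defacto2: instead of a hand-made list of patterns it derives the replacement table from Python's built-in cp437 and cp1252 codecs, and the names build_fix_table and fix_mojibake are made up for the example.

```python
def build_fix_table():
    """Map each character produced by a wrong Windows-1252 read to the
    character that the same byte stands for in CP437."""
    table = {}
    for byte in range(0x80, 0x100):
        raw = bytes([byte])
        try:
            wrong = raw.decode("cp1252")
        except UnicodeDecodeError:
            # A handful of byte values are undefined in Python's cp1252 codec.
            continue
        right = raw.decode("cp437")
        if wrong != right:
            table[ord(wrong)] = right
    return table


FIX_TABLE = build_fix_table()


def fix_mojibake(text):
    """Swap the incorrect Windows-1252 glyphs for their CP437 equivalents."""
    return text.translate(FIX_TABLE)


# Bytes 220 and 219 read as Windows-1252 give the U-umlaut and U-circumflex
# glyphs; after the fix they become the lower half block and the full block.
garbled = bytes([220, 220, 219]).decode("cp1252")
repaired = fix_mojibake(garbled)
```

When the raw bytes are still available, decoding them directly with the cp437 codec avoids the detour through Windows 1252 altogether.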

The translated body of text is wrapped between a set of <pre></pre> tags on an HTML5 page rendered as UTF-8. A CSS font-face rule is inserted to remotely load Viler’s TrueType font and apply it to the content of the <pre> tags.
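As a rough sketch of that page structure, the following Python builds such a document from already-converted text. The font URL, the font family name and the function name render_page are placeholders chosen for the example, not the values used on Defacto2.

```python
import html

# A minimal HTML5 page: UTF-8 charset, one @font-face rule, one <pre> block.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
@font-face {{
  font-family: "{family}";
  src: url("{font_url}");
}}
pre {{
  font-family: "{family}", monospace;
}}
</style>
</head>
<body>
<pre>{body}</pre>
</body>
</html>
"""


def render_page(text, font_url="/fonts/example-dos-vga.ttf", family="Example DOS VGA"):
    """Wrap converted text in <pre> tags; font_url and family are hypothetical."""
    # Escape &, < and > so the art cannot be mistaken for markup.
    return PAGE_TEMPLATE.format(family=family, font_url=font_url, body=html.escape(text))
```

Escaping matters here because NFO files often contain angle brackets and ampersands that a browser would otherwise try to parse.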

So why did this take so long?

There were a couple of problems that hampered this process.

First and foremost was the font issue. Even if Viler had done this font conversion in the 2000s, the fonts would have been useless for web developers such as myself. The ability to force browsers to download and use specific fonts was only introduced in CSS3 and wasn’t widely implemented in most browsers until a few years ago.

But the more frustrating complication was character encoding. The character sets used in DOS are non-standard and are not supported in modern operating systems. These encodings assign each character, letter, digit or glyph a unique numeric reference. Without the right character set, DOS text files will never display correctly in web browsers.

The common character set in use today is UTF-8, which itself is an encoding of Unicode. Unicode didn’t implement all the extended DOS characters until revision 3.2 released in 2002. Yet it took an extremely long time for that standard to be supported by modern web browsers. It was only after the widespread adoption of HTML5 that we saw progress with in-browser support for DOS era block and box characters.

Older browsers may not support all the HTML5 entities in the table below. Chrome has good support. But (currently) only IE 11+ and Firefox 35+ support all the entities.

ASCII explained in a long winded historical context

Today people generally refer to text-based art as ‘ASCII’ but that is a misuse of the acronym. The first ASCII (American Standard Code for Information Interchange) standard that we associate with text encoding came about from a 1963 binary based standard known as the ASA (American Standards Association) standard X3.4-1963. It was severely limited in a number of ways, including the complete lack of lowercase lettering. But unlike earlier telegraph and teletype communications encoding schemes, it was designed from the ground up for computers and programs rather than human operators.

Just months after the release of the ASA standard, the ISO (International Organization for Standardization) announced its intention to improve the obvious deficiencies in the encoding scheme. The result was ECMA-6 (from the European Computer Manufacturers Association), which was adopted as ISO/IEC 646. ASCII X3.4-1967 was the United States adoption of this 1965 standard, where “ASCII” became the common use name, and it is still today the basis of many modern character code sets including Unicode.

Unfortunately, there are numerous names for the identical standards depending on who published or adopted them. ASCII X3.4-1967 was later renamed to ANSI X3.4-1967 (American National Standards Institute) and again to US-ASCII but can also be known under its ISO 646 classification. To this day people often shorten the names to either ANSI or ASCII and confusingly mean the same thing, or worse, use ANSI to mean Windows-1252 due to a historical Microsoft mislabelling. For simplicity I will refer to ASCII X3.4-1967 as ASCII for the remainder of this text.

ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft—which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.

ASCII gave American English glyphs a unique binary code that could also be sequentially counted. In fact, the standard US keyboard layout today is still only able to output the same ASCII character code set standardised during the late 1960s. And I would imagine that this basic keyboard layout still greatly influences the syntax of many modern programming languages.

In ASCII X3.4-1967, the upper-case A is encoded as 100 0001 in 7-bit binary. For humans and web developers it is represented by decimal 65.

  • B is 100 0010, decimal 66.
  • C is 100 0011, decimal 67.
  • D is 100 0100, decimal 68.
  • And so on.
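The same table can be generated for any ASCII character. A small Python sketch, where the function name ascii_row is invented for the example:

```python
def ascii_row(char):
    """Return the character, its decimal code and its 7-bit binary form."""
    code = ord(char)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")
    # Group as 3 + 4 bits to match the notation used above.
    return char, code, bits[:3] + " " + bits[3:]


rows = [ascii_row(letter) for letter in "ABCD"]
```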

Despite being a 7-bit encoding scheme with 128 character possibilities, only 94 are used as display characters, comprising upper/lowercase letters, numerical digits, common punctuation marks, and mathematical and Fortran programming symbols.

The remainder are known as control characters (or control codes) and were designed to allow computers to share the text with other machines, or to control the formatting on output devices such as displays and printers. It was up to the devices themselves as to which control characters to support, as different types of machines had different requirements.

Many of these control characters are now redundant and do nothing in a modern computing sense but some are still in use.

SP spacebar, ESC escape key, HT tab key, SI SO shift keys, DEL to remove the character at the active position, BS backspace, CR and LF enter and return keys.

Interestingly, adhering to the ASCII standard to start a new line requires sending two control characters, the CR and LF. This returns the cursor to the start of the line and then drops it onto the new line. Windows, DOS and a number of legacy computers still use this method, while Unix (and Linux, OSX, Amiga) dropped the two character requirement as unnecessary and wasteful. So on those systems a single LF creates the new line.
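In practice this means DOS text usually has its line endings normalised before display on other systems. A minimal Python sketch, with the function name normalise_newlines chosen for the example; it also treats a lone CR as a line break, on the assumption that some files come from systems that used CR alone:

```python
def normalise_newlines(text):
    """Convert CR LF pairs, then any remaining lone CR, into a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The order of the two replacements matters: handling the pairs first stops a CR LF from turning into two line breaks.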

Back in 1981 when IBM introduced their IBM PC running PC-DOS, it was not fully ASCII compatible. Depending on the machine’s purpose and market, IBM gave its computers custom character encodings. It would designate these encodings as Code Page [number] with the original PC receiving Code Page 437 or CP437. The character glyphs associated with these code pages were stored in the computer’s ROM as easy access for programs.

The IBM PC stored each character in an 8-bit byte and so it could support a character set of up to 256 characters. CP437 mostly contains the ASCII standard but dropped support for the control codes. These were instead replaced with programmer friendly glyphs intended for use with text user interfaces. These same glyphs became the basis of the modern ASCII art scene despite not having anything to do with ASCII text per se.

There were a number of problems with IBM’s approach, though. The lack of control codes meant that text documents created on a PC had to use glyphs to simulate pseudo control functions and it was up to the text editor or viewer to decide how to format and display.

For example, a right pointing arrow glyph in DOS is used to mark the end of file. And there is no proper tabbing support because an ASCII HT (horizontal tab) control code in DOS can also be used to display a glyph. This makes DOS text less portable, and many of its non-standard glyphs will usually not display on other machines.

An ASCII document with control codes that in DOS CP437 should reference arrow glyphs
The same ASCII file in DOSBox, but the left arrow fails to show
FreeDOS Edit mostly fails with the file
Notepad in Windows 10 formats the file fine but the DOS specific arrow glyphs fail to show
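The same ambiguity shows up in modern tooling. Python's cp437 codec, for instance, decodes the low byte values as control characters rather than glyphs, so showing the arrows takes an explicit table. A sketch covering only the four arrows; the table and the function name decode_with_glyphs are assumptions made for the example:

```python
# CP437 glyphs that share byte values with ASCII control codes (arrows only).
CONTROL_GLYPHS = {
    0x18: "\u2191",  # up arrow
    0x19: "\u2193",  # down arrow
    0x1A: "\u2192",  # right arrow, also the DOS end of file marker
    0x1B: "\u2190",  # left arrow, also ESC
}


def decode_with_glyphs(raw):
    """Decode CP437 bytes and show the listed control codes as arrow glyphs."""
    return raw.decode("cp437").translate(CONTROL_GLYPHS)
```

Whether a given byte should be a glyph or a control code still depends on the file, which is exactly the decision DOS left to each editor and viewer.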

The original CP437 set had limited use for international languages. So numerous incompatible DOS code pages were created to target different groups of languages, each with an arbitrary numeric reference: CP-850 for Western Europe, CP-852 for Central Europe, CP-860 for Portuguese, CP-865 for Nordic, etc.

In each of these new code pages, many of the more frivolous glyphs were dropped or moved around. But there was no means of recording in the file which code page was used to author the text. It was left to the reader to work it out themselves.
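To see how much the choice of code page matters, the same byte can be decoded under each of them. A short Python sketch using the codecs that ship with the standard library; the function name decode_everywhere is made up for the example:

```python
CODE_PAGES = ("cp437", "cp850", "cp852", "cp860", "cp865")


def decode_everywhere(raw):
    """Decode the same bytes under each DOS code page listed above."""
    return {name: raw.decode(name) for name in CODE_PAGES}


# Byte 0xB5 is a box drawing character in CP437 but a letter in CP-850.
results = decode_everywhere(bytes([0xB5]))
```

Without any marker in the file, a viewer can only guess which of these results the author intended.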

ECMA-94 is an 8-bit character code standard from early 1985, designed to add better internationalisation to the original 7-bit ASCII X3.4-1967 reference. Its key goal was communication interoperability and so all the unnecessary display characters were ignored or dropped. This is important to note as many of the block, shade and line characters associated with ASCII art were included in that rejection. The standard became known as ISO/IEC 8859-1 in 1987, with other groups of languages gaining support in subsequent releases.

From its first release, Windows adopted the ISO 8859-1 standard but replaced some of the control codes with additional characters and renamed it Code Page 1252, more commonly known today as Windows-1252. ISO 8859-1 is compatible with Windows-1252 but not the other way around.
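The overlap between the two can be checked byte by byte. A Python sketch, assuming the standard library's latin-1 and cp1252 codecs as stand-ins for ISO 8859-1 and Windows-1252; the function name differing_bytes is invented for the example:

```python
def differing_bytes(first="latin-1", second="cp1252"):
    """List the byte values that the two encodings decode differently."""
    diffs = []
    for byte in range(256):
        raw = bytes([byte])
        expected = raw.decode(first)
        try:
            actual = raw.decode(second)
        except UnicodeDecodeError:
            actual = None  # byte left undefined by the second encoding
        if expected != actual:
            diffs.append(byte)
    return diffs


# Only the 0x80 to 0x9F block differs, where ISO 8859-1 keeps control codes.
changed = differing_bytes()
```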

Because of this, text files created in DOS will not accurately display in Windows, and vice versa, without some kind of prior conversion.

Thankfully, today much of the web has moved onto Unicode-based encodings that remove the issues with incompatible character code sets and legacy encodings, as Unicode gives each glyph a unique identifier and all languages now use the one common code page.

  • A will always be represented as U+0041
  • À as U+00C0
  • █ as U+2588
  • ‖ as U+2016
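A code point and the bytes UTF-8 uses to store it can be inspected directly. A small Python sketch, with the function name describe chosen for the example (the separator argument to hex needs Python 3.8 or later):

```python
def describe(char):
    """Return the U+ code point label and the UTF-8 bytes as hex."""
    label = f"U+{ord(char):04X}"
    utf8_hex = char.encode("utf-8").hex(" ").upper()
    return label, utf8_hex


full_block = describe("\u2588")
lower_half_block = describe("\u2584")
```

The identifier stays the same everywhere, while UTF-8 decides how many bytes are needed to store it: one for plain ASCII letters and three for the block characters.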

This is the main reason to use the Unicode compatible UTF-8 encoding to display ASCII art in the browser, and why until now it has been rather difficult to accurately display many text files created with MS-DOS in modern web browsers as text.

Additional sources.

  1. An annotated history of some character codes or ASCII: American Standard Code for Information Infiltration.
  2. Standard ECMA-6: 7-bit Input/Output Coded Character Set, 4th Edition, 1973
  3. KreativeKorp CP437
  5. Code Page 1252 Windows Latin 1 (ANSI) with its misuse of the ‘ANSI’ acronym

DOS Emulation

Taking the lead from the Internet Archive, in the past couple of weeks I have rolled out a significant update to the site, one in which we can now all run Defacto2’s entire collection of 1,350+ DOS scene productions online and in the browser!

The idea for this has been simmering for the past couple of years ever since the Internet Archive’s first DOS emulation announcement. But I have held back, waiting for the quality of the JavaScript emulation to improve, specifically its audio.

Thankfully that time came and I’ve successfully rolled out a very customised implementation of The Emularity, a front-end for Em-DOSBox, which is an Emscripten port of the famous DOS emulator. I believe this is, for the most part, the same set of tools that the Internet Archive uses for its massive DOS emulation project.

Now we are free to run and view over 75 demos, 500 cracktros and 400 BBS loaders in the browser without having to fuss around with downloads, package extractions, emulation configurations and possible troubleshooting.

More importantly nearly 200 PC e-mags (electronic magazines) are readable in-browser. These contain thousands of articles and interviews with scene participants of the era that were previously inaccessible to most people.

You can even play around with numerous intro makers, ANSI editors, software patches and cracking tools. Here are just a few highlights.

Rave out to Faith’s 1993 Caesar’s Palace cracktro
Be wowed by Future Crew’s Unreal 1.1 demo with GUS audio
Pretend you’re a 1990s hipster viewing Paradigm’s Rogue Squadron cracktro running on a horribly underpowered machine
Learn about ANSI Bombs and ANSI Music by reading ACiD’s The Product

Overall I am very happy with the results. There is full-screen support with correct aspect ratio on widescreen monitors and even the ability to capture screenshots and save them to the browser’s download folder.


Finally, it uses web friendly, linkable DOS hardware configurations that users can change by submitting a web form or a URL query string. A wide range of hardware configurations is supported, from CGA through to Super-VGA and from the Covox Speech Thing to the Gravis Ultrasound (with sound patches). You can even change the screen render and effects to emulate a CRT monitor or keep a clean, crisp LCD look. The implementation uses modular, automatic configurations with the goal of being as low maintenance as possible.
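To give an idea of how a linkable configuration can work, here is a Python sketch that turns a query string into a validated set of hardware choices. The parameter names (machine, sound), the accepted values and the defaults are all hypothetical and are not the ones used by the site.

```python
from urllib.parse import parse_qs

# Hypothetical parameter names and values, used only for this sketch.
ALLOWED = {
    "machine": ("svga_s3", "vgaonly", "cga", "tandy", "hercules"),
    "sound": ("sb16", "gus", "covox", "none"),
}
DEFAULTS = {"machine": "svga_s3", "sound": "sb16"}


def hardware_from_query(query):
    """Read hardware choices from a query string, ignoring unknown values."""
    params = parse_qs(query)
    config = dict(DEFAULTS)
    for key, allowed in ALLOWED.items():
        values = params.get(key)
        if values and values[0] in allowed:
            config[key] = values[0]
    return config
```

Falling back to a default whenever a value is missing or unrecognised keeps shared links working even if someone edits the URL by hand.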


Rolling out and troubleshooting this project taught me a few interesting things about scene applications.

In the early days scene members, or scene’rs, in both the cracking and demo communities expected a high level of technical knowledge from their user base. Take this question from Cascada’s 1990 X-Mas demo: “Please select your outport in hex for the sound”. As you will see, there is very little handholding.


Scene’rs of the era were frequently teenage bedroom coders with very little QA experience. An application may have claimed support for a set of hardware, but often the implementation was untested and sometimes broken. In the days of DOS there were no multimedia APIs, so it was left to the software, not the operating system, to implement support for hardware. But amateur programmers often did not have access to all the hardware they attempted to support.

A couple of show-stopping issues I encountered were related to sound emulation. By default DOSBox has multiple types of audio hardware emulated at once. This would confuse and ultimately crash some intros that tried to detect which soundcard to support. I was forced to implement a setup that only permitted the emulation of one sound card at a time.
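One way to express such a setup is to generate the sound sections of a DOSBox configuration with a single device switched on. The Python below is a sketch under assumptions: it is not the site's implementation, the option names are the ones I understand the stock DOSBox configuration file to use (sbtype, oplmode, gus, tandy, disney), with the Covox mapped onto the disney option, and they should be checked against the DOSBox build in use.

```python
SOUND_CARDS = ("sb16", "gus", "covox", "none")


def flag(enabled):
    """DOSBox style boolean."""
    return "true" if enabled else "false"


def sound_conf(card):
    """Build config text that leaves only the chosen sound device enabled."""
    if card not in SOUND_CARDS:
        raise ValueError(f"unknown sound card: {card}")
    lines = [
        "[sblaster]",
        "sbtype=" + ("sb16" if card == "sb16" else "none"),
        "oplmode=" + ("auto" if card == "sb16" else "none"),
        "[gus]",
        "gus=" + flag(card == "gus"),
        "[speaker]",
        "tandy=off",
        "disney=" + flag(card == "covox"),
    ]
    return "\n".join(lines) + "\n"
```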

DOSBox uses Super VGA as default, as it is the most compatible graphics configuration for DOS games. However, I found with some intros the SVGA machine type would create glitches and even outright crash whereas the VGA machine type would work seamlessly. 

Razor 1911 Ninja Gaiden 2 loader in SVGA and VGA machine modes.

A real annoyance that took a long time to troubleshoot was a ZIP archive incompatibility. The Emularity uses the BrowserFS library to mount zip files into Em-DOSBox as the emulated DOS c: hard drive. This works fine most of the time, except not all zip archives are created equal. There are some incompatibilities with legacy archives created with ancient copies of PKZIP’s DOS archive tools. BrowserFS still mounts these zip archives but some of the files inside will fail and be unreadable. Needless to say, it was a real pain to troubleshoot the first time it cropped up. This is not a problem that is unique to BrowserFS and I have also seen it with modern desktop tools such as 7-Zip.

7-Zip failing with old archives
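Problem archives can be flagged ahead of time. The Python sketch below assumes the cause is an old compression method that modern libraries no longer implement, which is a guess on my part and not something confirmed above; the function name legacy_entries is invented for the example.

```python
import zipfile

# Methods that practically every modern zip library can read.
COMMON_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def legacy_entries(archive):
    """Return (filename, method number) for entries using other methods."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in COMMON_METHODS
        ]
```

An empty list means every entry is stored or deflated; anything else is a candidate for repacking before it is handed to the emulator.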

Some intros, especially from the demo scene, take a ridiculously long time to load. I am guessing this could be due to unoptimised decompression, decryption or self-modifying code? There is not much that can be done about this, unfortunately.

Finally, the only other downside to this implementation is its initial user download and setup. For first time users, there is around 40MB of transfers which is a silly large background download for a web page. Thankfully those on slower connections or older hardware can turn off the emulation functionality completely and this happens by default for all iOS and Android mobile devices.

Other than that, I hope you enjoy your time warp to the late 20th century and have fun hacking those emulated DOS prompts.

P.S. Last month Defacto2 celebrated 20 years of existence online. Here’s hoping for another 20 years!