Last month I wrote a long post titled ASCII, NFO Art & Text Encodings. That covered the complications and technical difficulties involved in accurately rendering ASCII art and NFO text in modern browsers. It also dealt with how I overcame those difficulties and developed a backend rendering engine to accurately display text files hosted on Defacto2.net in the browser as text.
RetroTxt is a handy little extension that takes old-fashioned text files and stylises them in a more pleasing visual format. Despite the web being built on text, web browsers are often incapable of accurately displaying texts written during the pre- and early-web eras. This is where RetroTxt comes in! It imports the text into a modern format, injects a font that mimics a chosen retro computer, then applies styling to improve the display and readability. It can also double up as a text viewer for your locally stored, offline files and even serve as an NFO text viewer.
The best way to explain RetroTxt is to see it in action in this short, 54-second muted video, in which I browse a few NFO files hosted on textfiles.com, showing both before and after RetroTxt has been applied.
I’ve tried to keep the extension as light and hands-off as possible within the confines of the restrictive Chrome extension API, while still giving users the ability to easily customise fonts, colours and other theme settings. It offers a choice of 25 retro fonts, 11 colour themes and text alignment options, as well as presets for MS-DOS, Commodore Amiga, Commodore 64, Apple II and even a DOS-font/web hybrid.
Any web developers looking to browse the code will probably be most interested in functions.js and text.css, as they contain the core functionality: the text character conversions, font colours and CSS styling. The rest of the code mostly handles the extension functionality and UI.
Surprisingly, the ability to display files created on a text standard from 1981 has been mostly out of reach for numerous reasons that I will get into later. But for now, the last missing piece of the puzzle, the need for proper bitmapped DOS fonts converted into a web-friendly format, has been solved thanks to the marvellous work of Viler and his Ultimate Oldschool PC Font Pack.
Before the availability of Viler’s font pack, you couldn’t properly display ASCII/NFO art in a modern operating system without a specialised application to either view or convert the text into an image.
But image conversion has its downsides. For one, text embedded in an image isn’t searchable, nor is it selectable, transferable or accessible, and it can only be read at a fixed colour and small font size. Web browsers themselves add further limitations by placing memory restrictions on the size of the images they are able to load.
For mostly novelty value you can switch between four sets of DOS-era fonts while viewing the text files: 9-pixel VGA, high-resolution thin CGA, IBM PC BIOS and Tandy 1000-series BIOS. There are a number of colour combinations too: DOS grey on black, white on black, monochrome green on black, black on white, and a gimmick black on white with CSS shadow effects. Not surprisingly, some files look better with certain colour and font combinations.
For any web developers out there, the basic implementation involved taking a DOS-encoded text file and reading it using the Windows-1252 character set. I looked up all the common ASCII art characters that are malformed by using this incorrect code page, then pattern matched and replaced them with their UTF-8 encoded equivalents.
For example, when using CP-437 the lower half block glyph ▄ is represented by decimal 220. With the Windows-1252 code page, which has no lower half block character, decimal 220 returns the Ü glyph instead.
After loading an ASCII art file using Windows-1252, I replace all the incorrect Ü glyphs with U+2584, ▄, and then display the converted document in the Unicode-compatible, web-friendly UTF-8 encoding.
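A minimal sketch of this repair step, using a small hand-picked subset of the mapping (the full table, such as the one carried in the extension’s functions.js, covers every block, shade, box and symbol character):

```javascript
// Partial map from the mojibake produced by decoding CP-437 bytes as
// Windows-1252, back to the glyphs the DOS author intended.
// Illustrative subset only.
const cp437Fixes = new Map([
  ["\u00DB", "\u2588"], // Û → █ full block       (byte 219)
  ["\u00DC", "\u2584"], // Ü → ▄ lower half block (byte 220)
  ["\u00DD", "\u258C"], // Ý → ▌ left half block  (byte 221)
  ["\u00DE", "\u2590"], // Þ → ▐ right half block (byte 222)
  ["\u00DF", "\u2580"], // ß → ▀ upper half block (byte 223)
]);

// Walk the text and swap each malformed character for its CP-437 intent;
// anything not in the map passes through unchanged.
function fixCp437Text(text) {
  let out = "";
  for (const ch of text) out += cp437Fixes.get(ch) ?? ch;
  return out;
}
```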
The translated body of text is wrapped in a set of <pre></pre> tags on an HTML5 page rendered as UTF-8. A CSS font-face rule is inserted to remotely load Viler’s TrueType font and apply it to the content of the <pre> tags.
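As a rough illustration of that styling step (the font name and URL below are placeholders, not the extension’s actual paths), the injected rule amounts to something like:

```javascript
// Build the CSS text for a @font-face rule plus a pre selector. In an
// extension this string would be dropped into a <style> element appended
// to the document head; here it is simply returned for inspection.
function buildFontCss(fontName, fontUrl) {
  return [
    "@font-face {",
    `  font-family: "${fontName}";`,
    `  src: url("${fontUrl}") format("truetype");`,
    "}",
    "pre {",
    `  font-family: "${fontName}", monospace;`,
    "  white-space: pre;",
    "}",
  ].join("\n");
}

// Hypothetical usage in a content script:
//   const style = document.createElement("style");
//   style.textContent = buildFontCss("Px437 IBM VGA8", "fonts/vga8.ttf");
//   document.head.appendChild(style);
```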
So why did this take so long?
There were a couple of problems that hampered this process.
First and foremost was the font issue. Even if Viler had done this font conversion in the 2000s, the results would have been useless for web developers such as myself. The ability to force browsers to download and use specific fonts was only introduced in CSS3 and wasn’t widely implemented in most browsers until a few years ago.
But the more frustrating complication was character encoding. The character sets used in DOS are non-standard and are not supported by modern operating systems. These encodings assign each character (letter, digit or glyph) a unique numeric reference. Without the right character set, DOS text files will never display correctly in web browsers.
The common character set in use today is UTF-8, which is an encoding of Unicode. Unicode didn’t include all the extended DOS characters until revision 3.2, released in 2002. Yet it took an extremely long time for that standard to be supported by modern web browsers; it was only after the widespread adoption of HTML5 that we saw progress with in-browser support for DOS-era block and box characters.
Older browsers may not support all the HTML5 entities in the table below. Chrome has good support, but (currently) only IE 11+ and Firefox 35+ support all the entities.
ASCII explained in a long-winded historical context
Today people generally refer to text-based art as ‘ASCII’, but that is a misuse of the acronym. The first ASCII (American Standard Code for Information Interchange) standard that we associate with text encoding came about from a 1963 binary-based standard known as ASA (American Standards Association) standard X3.4-1963. It was severely limited in a number of ways, including the complete lack of lowercase lettering. But unlike earlier telegraph and teletype communications encoding schemes, it was designed from the ground up for computers and programs rather than human operators.
Just months after the release of the ASA standard, the ISO (International Organization for Standardization) announced its intention to improve the obvious deficiencies in the encoding scheme. What came of that was ECMA-6 (from the European Computer Manufacturers Association), adopted as ISO/IEC 646. ASCII X3.4-1967 was the United States adoption of this 1965 standard; “ASCII” became the common-use name and is still today the basis of many modern character sets, including Unicode.
Unfortunately, there are numerous names for these identical standards depending on who published or adopted them. ASCII X3.4-1967 was later renamed ANSI X3.4-1967 (American National Standards Institute) and again US-ASCII, but it can also be known under its ISO 646 classification. To this day people often shorten the names to either ANSI or ASCII and confusingly mean the same thing, or worse, interchange ANSI with Windows-1252 due to a historical Microsoft mislabelling. For simplicity, I will refer to ASCII X3.4-1967 as ASCII for the remainder of this text.
ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft—which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.
ASCII gave American English glyphs a unique binary code that could also be sequentially counted. In fact, the standard US keyboard layout today is still only able to output the same ASCII character code set standardised during the late 1960s. And I would imagine that this basic keyboard layout still greatly influences the syntax of many modern programming languages.
In ASCII X3.4-1967, the upper-case A is encoded as 100 0001 in 7-bit binary. For humans and web developers it is represented by decimal 65.
B is 100 0010, decimal 66.
C is 100 0011, decimal 67.
D is 100 0100, decimal 68.
And so on.
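Because the codes run sequentially, they can be computed rather than looked up; a quick sketch (the helper name is mine, purely for illustration):

```javascript
// Return the decimal and 7-bit binary ASCII code for a single character.
// Upper-case letters run from 65 ("A") through 90 ("Z").
function asciiInfo(ch) {
  const dec = ch.codePointAt(0);
  return { decimal: dec, binary: dec.toString(2).padStart(7, "0") };
}

// asciiInfo("A") → { decimal: 65, binary: "1000001" }
// asciiInfo("D") → { decimal: 68, binary: "1000100" }
```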
Despite being a 7-bit encoding scheme with 128 possible characters, only 94 are used as display characters, comprising upper- and lowercase letters, numerical digits, common punctuation marks, and mathematical and Fortran programming symbols.
The remainder are known as control characters (or control codes) and were designed to allow computers to share the text with other machines, or to control the formatting on output devices such as displays and printers. It was up to the devices themselves which control characters to support, as different types of machines had different requirements.
Many of these control characters are now redundant and do nothing in a modern computing sense but some are still in use.
SP the spacebar; ESC the escape key; HT the tab key ↹; SI and SO the shift keys ⇧; DEL to remove the character at the active position; BS backspace ←; CR and LF the enter and return keys ↵.
Interestingly, adhering to the ASCII standard, starting a new line requires sending two control characters, CR and LF: one to return the cursor back to the start of the line, the other to drop it onto the new line. Windows, DOS and a number of legacy computers still use this method, while Unix (and Linux, OS X, the Amiga) dropped the two-character requirement as unnecessary and wasteful, so on those systems a single LF creates the new line.
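A text importer has to cope with all of these conventions; a sketch of a simple two-pass normalisation:

```javascript
// Normalise CR+LF pairs (DOS/Windows) and lone CRs (some legacy systems)
// to a single LF, so the text renders the same regardless of its origin.
// CR+LF pairs are collapsed first so they are not turned into two newlines.
function normaliseNewlines(text) {
  return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

// normaliseNewlines("one\r\ntwo\rthree\n") === "one\ntwo\nthree\n"
```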
Back in 1981 when IBM introduced the IBM PC running PC-DOS, it was not fully ASCII compatible. Depending on a machine’s purpose and market, IBM gave its computers custom character encodings, designating these as Code Page [number], with the original PC receiving Code Page 437, or CP437. The character glyphs associated with these code pages were stored in the computer’s ROM for easy access by programs.
The IBM PC used 8-bit characters and so could support a character set of up to 256 characters. CP437 mostly contains the ASCII standard but dropped support for the control codes, which were instead replaced with programmer-friendly glyphs intended for use with text user interfaces. These same glyphs became the basis of the modern ASCII art scene, despite having nothing to do with ASCII text per se.
There were a number of problems with IBM’s approach, though. The lack of control codes meant that text documents created on a PC had to use glyphs to simulate pseudo control functions, and it was up to the text editor or viewer to decide how to format and display them.
For example, a right-pointing arrow → glyph in DOS is used to mark the end of a file. And there is no proper tabbing support, because an ASCII HT (horizontal tab) control code in DOS can also be used to display a ○ glyph. This makes DOS text less portable, and many of its non-standard glyphs will usually not display on other machines.
In each of these new code pages, many of the more frivolous glyphs were dropped or moved around. But there was no means of embedding in the file which code page was used to author the text; it was left to the reader to work it out.
ECMA-94 is an 8-bit character code standard from early 1985, designed to add better internationalisation to the original 7-bit ASCII X3.4-1967 reference. Its key goal was communication interoperability, so all the unnecessary display characters were ignored or dropped. This is important to note, as many of the block, shade and line characters associated with ASCII art were included in that rejection. The standard became known as ISO/IEC 8859-1 in 1987, with other groups of languages gaining support in subsequent releases.
From its first release, Windows adopted the ISO 8859-1 standard but replaced some of the control codes with additional characters and renamed it Code Page 1252, more commonly known today as Windows-1252. ISO 8859-1 text is compatible with Windows-1252 but not the other way around.
Because of this, text files created in DOS will not accurately display in Windows, and vice versa, without some kind of prior conversion.
Thankfully, much of the web has moved onto Unicode-based encodings, which remove the issues of incompatible character sets and legacy encodings by giving each glyph a unique identifier, so all languages now share one common code page.
A will always be represented as U+0041
à as U+00E0
█ as U+2588
‖ as U+2016
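These identifiers are easy to inspect from script; the familiar U+XXXX notation is just the hexadecimal value of the code point (the helper below is my own illustration):

```javascript
// Format a character's Unicode code point in U+XXXX notation.
function toCodePoint(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// toCodePoint("█") === "U+2588"
// toCodePoint("‖") === "U+2016"
```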
This is the main reason to use the Unicode-compatible UTF-8 encoding to display ASCII art in the browser, and why until now it has been rather difficult to accurately display many text files created with MS-DOS in modern web browsers as text.
Now we are free to run and view over 75 demos, 500 cracktros and 400 BBS loaders in the browser without having to fuss around with downloads, package extraction, emulation configuration and possible troubleshooting.
More importantly nearly 200 PC e-mags (electronic magazines) are readable in-browser. These contain thousands of articles and interviews with scene participants of the era that were previously inaccessible to most people.
You can even play around with numerous intro makers, ANSI editors, software patches and cracking tools. Here are just a few highlights.
Overall I am very happy with the results. There is full-screen support with correct aspect ratio on widescreen monitors and even the ability to capture screenshots and save them to the browser’s download folder.
Finally, it uses web-friendly, linkable DOS hardware configurations that users can change by submitting a web form or a URL query string. A wide range of hardware configurations is supported, from CGA through to Super VGA, and the Covox Speech Thing to the Gravis Ultrasound (with sound patches). There is even the ability to change the screen render and effects, so you can emulate a CRT monitor or keep a clean, crisp LCD look. The implementation uses modular, automatic configurations with the goal of being as low maintenance as possible.
Rolling out and troubleshooting this project led to a few interesting observations about scene applications.
In the early days, sceners in both the cracking and demo communities expected high technical knowledge from their user base. Take this question from Cascada’s 1990 X-Mas demo: “Please select your outport in hex for the sound”. As you can see, there is very little handholding.
Sceners of the era were frequently teenage bedroom coders with very little QA experience. An application may have claimed support for a set of hardware, but often the implementation was untested and sometimes broken. In the days of DOS there were no multimedia APIs, so it was left to the software, not the operating system, to implement hardware support. But amateur programmers often did not have access to all the hardware they attempted to support.
A couple of show-stopping issues I encountered were related to sound emulation. By default, DOSBox emulates multiple types of audio hardware at once. This would confuse, and ultimately crash, some intros that tried to detect which soundcard to use. I was forced to implement a setup that only permits the emulation of one sound card at a time.
DOSBox uses Super VGA as default, as it is the most compatible graphics configuration for DOS games. However, I found with some intros the SVGA machine type would create glitches and even outright crash whereas the VGA machine type would work seamlessly.
Razor 1911 Ninja Gaiden 2 loader in SVGA and VGA machine modes.
A real annoyance that took a long time to troubleshoot was a ZIP archive incompatibility. The Emularity uses the BrowserFS library to mount zip files into EM-DOSBox as the emulated DOS c: hard drive. This works fine most of the time, except not all zip archives are created equal. There are some incompatibilities with legacy archives created with ancient copies of PKZIP’s DOS archive tools. BrowserFS still mounts these zip archives but some of the hosted files will fail and be unreadable. Needless to say, it was a real pain to troubleshoot the first time it cropped up. This is not a problem that is unique to BrowserFS and I have also seen it with modern desktop tools such as 7-Zip.
Some intros, especially from the demo scene, take a ridiculously long time to load. I am guessing this could be due to unoptimised decompression, decryption or self-modifying code? There is not much that can be done about this, unfortunately.
Finally, the only other downside to this implementation is the initial user download and setup. For first-time users there is around 40 MB of transfers, which is an absurdly large background download for a web page. Thankfully, those on slower connections or older hardware can turn off the emulation functionality completely, and this happens by default for all iOS and Android mobile devices.
Other than that, I hope you enjoy your time warp to the late 20th century and have fun hacking those emulated DOS prompts.
P.S. Last month Defacto2 celebrated 20 years of existence online. Here’s hoping for another 20 years!
First off, I just want to say thank you to everyone who has submitted files over the past year or so. The site now preserves an additional 4,000 scene-produced productions, most of which were submitted by a few individuals. Well done!
On the subject of scene productions, I have received a portable hard drive from Scize containing his 1.7 TB collection of original scene releases, mostly from the 1990s. It’s sourced from a number of personal collections that he has exchanged with over the years, so there are bound to be duplicates and it will take a long time to go through. But needless to say, the site will continue to receive updates for a long time to come! Thanks again to Scize for going out of his way to make this happen.
If you have a collection of original warez releases that you’d like to exchange or donate to Scize feel free to get in contact with him at firstname.lastname@example.org. He has a personal website listing much of that collection at http://scenelist.hopto.org/.
On the more technical side, I have applied some software changes and updates to the server, so https://defacto2.net should be more reliable and faster. These changes also introduced SPDY 3.1 over HTTPS support, which gives a very noticeable improvement to pages with lots of thumbnails; I have personally seen some load times cut in half. Eventually the site will transition over to HTTP/2, which should offer similar results for modern browsers over a secure connection. Next on the list of updates, though, is probably a switch of the SQL database software.
Defacto2 has changed servers, received a back-end upgrade and changed the overall look + feel of the site!
What is new?
The front-end has been redesigned to use Bootstrap, reduce complex navigation clutter and introduce a uniform user interface. Icons have replaced a significant number of the text descriptions to hopefully make the site seem less cluttered and overloaded.
New and improved listings for files, links, search results, people and groups.
Files thumbnails can now be viewed in different sizes or in a more usable list mode and are now shown in their correct aspect ratio.
Fixed a significant number of back-end bugs.
Implemented both responsive and fluid design to improve support for mobile and desktop screens.
A new server and provider hopefully means improved bandwidth, so downloads for many users should be faster.
File details page layouts have been improved with options to view or play the file in browser, download it or export the database data as a JSON document.
User accessible links for file assets such as screenshots, thumbnails, image previews which are all covered under a liberal Creative Commons Attribution 4.0 licence.
Scene website portal has narrowed in focus to now only show sites that relate to the files hosted on the site. So links to alternative computer formats, emulation, etc have been culled.
Why the change?
I wanted to future-proof the front-end code-base and migrate it to HTML5. The previous version of the site was stuck on XHTML 1.0, which, while compatible with all modern and most legacy browsers, is an evolutionary dead end. Keeping to XHTML made implementing new functionality difficult, so the change was needed. Besides the overhyped browser features of HTML5, it lays out the pages in a more logical manner, which makes them easier to work with.
While the new HTML5 code-base will break many old browsers, there is always the legacy HTML3 edition of the site, which will work in every browser and has access to every file served and hosted on Defacto2.
The site works great in modern Firefox, Chrome and Safari, both on mobile and desktop. It mostly works in Internet Explorer 9 but fails to render correctly in IE10 and IE11. There is no intention to fix this; instead, these IE users are forced to view the page in IE9 quirks mode. I hope Internet Explorer will eventually catch up and support the HTML5/CSS3 functionality that is breaking the layout.
We are happy to announce that after this week’s upload of 300+ files to our Arts & Files archive, Defacto2 is finished! Well, by that we really mean the long queue of files for processing that we have sat on for years is finally complete. This means that unless we receive a large and unexpected file donation in the future, there will probably be no more large, bulk file updates such as this.
These are some of the highlights of this current dump, enjoy.
It may have seemed that we have been a little quiet at Defacto2 of late, but there has been a bit of work going on behind the scenes. This week a significant backend upgrade has been rolled out to improve usability and file navigation.
Redesigned the ‘Arts & files’, ‘Organisations’, ‘People’ and ‘Our favourite sites’ navigation. To me it looks more cluttered, but it should be more logical and easier to use. The categories and platforms have been given more meaningful names and have small descriptors that pop up when the mouse hovers over a link, while the navigation controls have been clustered together. Thumbnails now have Sort By headers rather than being thumbnail icons lined up side by side.
Introduced an HTML 3 edition of the site. Only the Arts & files section has been converted into this mostly text-based, legacy format. I thought some people on slower connections, or who are using legacy PCs to obtain the hosted files, would prefer this format. Plus, seeing as the site is mostly focused on the online activities of the 1990s, it seemed apt to introduce a mock 1990s web-FTP edition of Defacto2. You can find this retro mode at www.defacto2.net/html3
Removed all the social network buttons within the site (except the welcome page) as they potentially tracked users and slowed the site down. If you want to remove all tracking (such as Google Adsense, Google Analytics etc) visit HTTPS://www.defacto2.net instead of HTTP://www.defacto2.net
Twitter has removed all support for RSS/Atom feeds, which the Defacto2 welcome page relied on for the Twitter Wall function. This feature has been temporarily removed but will return once it has been reproduced using Twitter’s v1.1 API.
Added improved tablet and mobile phone CSS optimisations.
Individual groups, sites and organisations now have their own XML feeds that you can use with a feed reader to track their new file submissions.