TMS9918, The First Sprite Chip – Presentation at VCF SouthWest

Stepping back 48 years into my past life

Matthew Hagerty’s 9918 FPGA Emulator

Last week, I gave a presentation at The Vintage Computer Festival Southwest (VCF SouthWest) about the development of the TMS9918 (and related devices), which I helped define, design, simulate, and debug from September 1977 through mid-1978. In addition to the presentation, I participated in a one-hour question-and-answer session hosted by Matthew Hagerty. Matthew has developed a series of FPGA-based 9918 emulators, the latest of which is pin-compatible except for providing R, G, and B output rather than composite video. Dave Ackley, the person at Texas Instruments who coined the term “sprite” and defined the 9918 early on before being assigned to another program, joined the Q&A session.

While I am best known today for writing about Augmented Reality (AR) displays and optics, I spent my first 20 years as an IC architect and designer at Texas Instruments. My first device was the TMS9918, where I defined how the DRAM interface would function. Along with Pete Macourek, I also defined how the Sprite pre-processing and processing would work. Most famously, the TMS9918 was used in the TI-99/4 and TI-99/4A home computers, the ColecoVision, MSX computers, and the Sega SG-1000/SC-3000. Yamaha made register-compatible superset chips, the 9938 and 9958. Nintendo, which had developed software for the ColecoVision, developed its own video chip for its Family Computer and NES game systems, copying many aspects of the 9918, including its sprite preprocessing and processing.

Not only did the TMS9918 development define the way sprites worked for a generation of game players, but it was also the first consumer device to interface directly with dynamic random-access memory (DRAM). This, in turn, led to my involvement in defining the Multi-Port Video RAM (VDRAM), a DRAM with both a random and a serial port. The VDRAM, in turn, was the starting point that led to today’s synchronous DRAMs (SDRAM and GDRAM).

Links to the Videos and my Presentation

My VCF presentation lasted 75 minutes, including 10 minutes of questions and answers, and is linked below:

A PDF copy of my TMS9918 presentation is available here.

The TMS9918 VDP Chat, a question-and-answer session with Matthew Hagerty, Dave Ackley, and me, was held earlier in the day before the presentation and can be found below:

Related Discussions about the TI 99/4 and TMS9900 with Granville Ott (TI 980 and 99/4 computers)

Granville Ott developed high-performance seismic computers at Texas Instruments in the 1960s. While working as a university professor, he was asked to return to TI to head the development of the 99/4. I interacted with Granville during my development of the TMS9918. Granville discussed the 99/4 and the 9900 computer in his presentation below (I chimed in occasionally, based on my experience with the TMS9918, TMS9995, and TMS99000).

Granville also participated in a panel on 1970s (and earlier) computing:

Some of my other work at Texas Instruments

After working on the TMS9918, in January 1979, just 1.5 years out of college, I was appointed the lead CPU architect of the TMS9995 16-bit microprocessor. The 9995 was to be the CPU for the TI 99/2 and 99/8 home computers, had TI not pulled out of the home computer market. I was then the lead architect of the TMS99000 (99110 and 99105), which, among other applications, served as the CPU in TI’s minicomputers.

Following the TMS99000, I led the development of the TMS340 family of graphics processors (TMS34010 and TMS34020) and their associated peripherals. The TMS34010 combined my experience with the TMS9918 and the work on microprocessors to define the first fully programmable graphics processor. While developing the TMS34010, I recognized the acute need for a specialty memory designed for computer graphics and helped define the VDRAM (DRAM with an added serial port). The TMS34020 was a somewhat enhanced version of the 34010 with a 32-bit data port. The TMS340 and its TIGA software family were used in approximately 100 graphics products between 1987 and 1992.

After leading the development of the TMS340 family, I led the development of the TMS320C80 and TMS320C82 image processors, which combined a RISC CPU with either four (C80) or two (C82) DSPs, video control, and a crossbar-accessible memory, all on a single die.

Thanks to the conference and my connections with Erik Olson, I’m in the process of transferring thousands of pages of scanned documents from my work at TI to the Texas Instruments Records Collection at SMU.

After leaving TI, I entered the startup world as the CTO of Silicon Display in 1998, where we were developing an LCOS-based head-worn display. At Silicon Display, I designed the silicon backplane and the FPGA that controlled it. After leaving Silicon Display, I began thinking about how to improve LCOS backplanes, which led to my role as founder and CTO of LCOS maker Syndiant. After leaving Syndiant, I started this blog as a way to stay involved with technology while I figured out what to do next. After starting the blog, I cofounded Navdy and served as its CTO. Following my departure from Navdy, I became the Chief Scientist of RAVN.

Conclusions and Comments

I had a lot of fun discussing the good old days of the 1970s and the TMS9918. It was as if I were reliving a different life. I originally started this blog to discuss my work at TI from 1977 through 1998, but as fate would have it, the topics drifted into augmented reality displays and optics.

Next time, I will return to discussing AR, including the things I learned at SID Display Week and AWE 2025.

Karl Guttag

11 Comments

  1. Hello Karl. Thank you very much for the detailed presentation regarding the history and the engineering of TMS9918! I have two questions:

    1) When on the TI product timeline was the TMS9118 (the TMS9918 equivalent designed to work with the bidirectional data bus) conceived: at the same time or much later?

    2) Did Yamaha contact TI to get a precise functional model of the TMS9918 to achieve 100% backward compatibility when building their V9938/58 and YM2602 (the Sega Master System VDP, also designed and produced by Yamaha)?

    I spent about a year with a logic analyzer performing a complete functional reverse engineering of the V9938/58, and Yamaha really did copy the VRAM bus timing exactly for the 9918 screen modes. I can only assume that back in 1983/1984 Yamaha had to make a significant engineering effort to completely reverse-engineer the TMS9918’s timing and functionality if no information was sent from TI. Access to advanced logic analysis equipment was also very limited in those days (the HP16x6A LAs were all new back then). And Yamaha did a pretty good job of finding all the undocumented features of the 9918. In contrast, another clone of the TMS9918, the Toshiba T6950, wasn’t as exact: it had no implementation of the VRAM mirroring or of many other undocumented features.

    Thank you!

    • Thanks,

      Regarding your questions:

      1) No, the TMS9918 was originally designed to work with 4Kx1 DRAMs (specifically the Mostek 4027, which was also done by TI). The “A” version modification was made to support 16Kx1 DRAMs. The by-4 DRAMs were not even on the drawing board at the time the 9918 was designed in 1977. Note that the 16Kx4 (4416) was a variation of (and came after) the 64Kx1. I have documents that show that what became the TMS9118/28/29 effort didn’t start until 1982.

      2) As far as I am aware, there was no cooperation between Yamaha and TI. TI was working on its own AVDP/9228, which I was only peripherally involved with (I was working on the TMS34010 and VRAM/TMS4161). ASCII Microsoft’s (a joint venture at the time) Kazuhiko Nishi was pushing for a new VDP for MSX2 in Japan. As I understand it, Nishi was simultaneously pushing TI and several Japanese companies to develop a register-compatible successor to the 9918.

      It is interesting that Yamaha copied the 9918 so exactly, whereas Toshiba did not. I think Yamaha was directly following the guidance of ASCII Microsoft. Note that there were many MSX computers, and both Sega and Nintendo wrote software for the Colecovision using the TMS9928, so there was a lot of in-depth knowledge of exactly how the 9918 worked. As I mentioned in one of the VCF Southwest videos, we didn’t intentionally include any “undocumented features” in the 9918; those that crept in were oversights rather than intentional additions.

      Karl

  2. Hi Karl. I code games for the Colecovision for homebrew publishers, so I’m a big fan of your creation. I enjoy pushing the limits of what can be accomplished with the hardware to reproduce arcade games that at first glance wouldn’t be technically feasible. I’ve pulled off smooth side scrolling for some games; for another game, I used a mix of tiles and sprites to get 30 animated multicolor characters active on the screen simultaneously with minimal flicker; and for the game I’m currently working on, I’m drawing lines on the fly in bitmap mode. I enjoyed your discussion of the chip, as I had never given much thought to the technical challenges that you had to overcome, but clearly you succeeded where the VDPs of other consoles like the Channel F, the Astrocade, and the Odyssey II fell short, providing lower resolutions and fewer colors and running very slowly.

    The Colecovision gets credit for being the technical inspiration for the NES and SG1000, but really it was your chip being used well in the Colecovision that gave us that next generation.

    • Thanks for the kind words.

      It’s impressive that you worked around the horizontal scrolling limitation. If I could have one “do-over,” it would have been to have horizontal scrolling on the 9918. We kludged it in on Sprites so they could scroll into the screen from the left.

      The memory cycles were available to do horizontal scrolling on the background, and it would not have been that complicated, at least if we had kludged it in a way similar to sprites. We could have had an extra Name table (since the main name table had 64 entries per line, i.e., a power of 2) and a 15-bit (to support the worst case) rather than 8-bit background pattern shift register, for example. All in all, it would not have been that many transistors. But there were only Pete Macourek and me to do all the display architecture of the 9918, and both of us were young (I was only a few months out of college, and the 9918 was my first full design).
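      As a rough software illustration of the 15-bit shift register idea (a sketch of one plausible reading, not the 9918's actual logic or a design that was ever built): with a fine scroll of 0 to 7 pixels, up to 7 bits of the previous pattern byte can still be waiting to shift out when the next 8-bit pattern is loaded, so the worst case is 7 + 8 = 15 bits. The function name `scroll_line` and its parameters are hypothetical, and the pixels scrolled in at the left edge are simply filled with 0 here rather than fetched from an extra name table entry.

```python
def scroll_line(patterns, fine_scroll):
    """Model one scan line of 8-pixel background patterns delayed by fine_scroll pixels.

    patterns    -- list of 8-bit pattern bytes, leftmost character cell first
    fine_scroll -- pixel delay, 0..7 (hypothetical parameter, not a real 9918 register)
    Returns a list of pixel bits (0 or 1), 8 per pattern byte.
    """
    if not 0 <= fine_scroll <= 7:
        raise ValueError("fine_scroll must be in the range 0..7")

    sr = 0  # 15-bit shift register; bit 14 is the next pixel to be displayed
    pixels = []
    for pattern in patterns:
        # Load the new byte just behind the fine_scroll bits left over
        # from the previous byte, which occupy the top of the register.
        sr |= (pattern & 0xFF) << (7 - fine_scroll)
        for _ in range(8):
            pixels.append((sr >> 14) & 1)   # shift one pixel out of the top
            sr = (sr << 1) & 0x7FFF         # keep the register at 15 bits
    return pixels
```

      With `fine_scroll` set to 0, only the top 8 bits of the register are ever used, which corresponds to the plain 8-bit shift register; the extra 7 bits matter only when a load overlaps leftover pixels.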

      It’s funny/sad that very few today know that Nintendo and Sega derived their Sprites from the 9918.

  3. You also designed the TMS34010.
    It was a great success in the arcade, in a few high-end machines, and on high-end expansion cards, but it was never put into a console or consumer PC, as it seems TI had in mind.

    In retrospect, did you overdo it or should there have been another mid tier VPU?

    It seems there would have been a huge market for a 4–8 bit colour chip with blitting and fast polygon fill, put out around ’84 to ’87 at around the 10,000 to 30,000 transistor mark.
    A bit like an optimized and more specific Agnus and Denise.

    Did anyone do that, or at least think about it apart from the Amiga guys?

    • First, the 34010 was more my strategy than TI’s overall. After the TMS9918, I served as the lead architect for the TMS9995 and TMS99000. Both the 9995 and 99000 were design successes in that they worked as designed, but they were clearly shooting behind the market with an antique workspace pointer architecture. The 9995 was primarily designed for the TI Home Computer in the 99/2 and 99/8. It was much faster than the 9900 it was to replace, but the TI Home Computer was both a financial and design (terrible architecture) basket case.

      I didn’t want to do the 99000 as it was clearly too little too late, but I was still only 4 years out of college and didn’t have much say in the matter. I realized that TI didn’t have a strategy by the time I had enough pull to develop my own graphics strategy; unfortunately, I didn’t have enough pull to get it funded well.

      Additionally, TI was about a generation behind other companies in CMOS CPU I.C. process technology. TI was slow to switch from bipolar to NMOS and then again late to CMOS. Our feature sizes, metal grades, and other key process parameters were always behind those of other major companies.

      I started on what became the 34010 in 1982. The only accelerators were the likes of the NEC 7220. But I only had myself and two other people to architect it. I looked at the work of Xerox PARC; even Apple’s Lisa had not come out yet. I went for a fully programmable CPU architecture. I wanted to go “RISC,” but we couldn’t figure out how to implement the then-complex “BitBlts/PixBlts” (bit-aligned array transfers) in a RISC architecture at the time.
      There was a lot of political infighting in TI, making it hard to secure resources for the 34010. Ironically, the 9918, which I was one of the key designers of in 1977, took off in 1982 and 1983 with the ColecoVision, the TI Home Computer, and the MSX computers of Japan. All of a sudden, managers who didn’t care a thing about the 9918 wanted a register-compatible successor. So resources were split between an AVDP that was too late and the 34010.
      While we were designing the 34010, EGA and VGA were introduced, as were the Lisa, Mac, and Windows 1.0. Windows 1.0 was released just two months before we got working 34010 silicon.

      Once the 34010 came out and gained some traction in 1986/87, we tried to work with Microsoft to get into Windows. The problem was that their code was built around the VGA architecture. They would only send very low-level commands to us, such as moving a pixel or drawing a single character. The 34010 was designed to handle the entire display list, not just one little thing at a time. It took the PC longer to send us commands than it took the 34010 to execute them, so the 34010 could not show a significant advantage. We were told that much of the MS-Windows code was written by people who had taken their stock options and left Microsoft, and that they could not change it.

      At the same time, fabless companies, most notably ATI (founded in 1985), emerged, producing VGA chips with higher resolutions and simple BitBlt capabilities (leading to the VESA organization). That capability aligned better with what Windows would allow the accelerator to do. Additionally, ATI was getting better silicon (better technology and higher yield) from TSMC at lower cost than I could get from TI’s internal fabs, which we had to use. From about 1986 to the early 1990s, the 34010 and later the 34020 dominated the higher-end PC accelerator board space for applications like Adobe Photoshop and CAD.

      I spent two years working with the Apple Mac team to get the TMS34020 accepted as the graphics accelerator for the color Mac. The deal was all done, but then a couple of VPs at TI, who knew nothing about Apple or graphics, stepped in and tried to renegotiate it, and Apple, in the form of Jean-Louis Gassée (then CTO of Apple), kicked us out (and broke my heart).

      The above is the short version of the story.

  4. I had the opportunity to listen to (not watch!) the interviews with Granville Ott. I’m not sure if you were properly introduced on the 5-hour one, but I was later surprised to learn it was you! During that long discussion, patents came up, and that gave me some research ideas.

    These videos filled in a whole bunch of questions I had about the indirect history of the 990 platform. However, there’s also a bunch of stuff in there that contradicts other narratives, or raises weird questions that the interviewer chose not to follow up on. Alas. (Wait, did he actually say there was a real Z80-based 99/4 prototype!? That rumor is real!?)

    But this post isn’t about any of that.

    I’ve been doing an extensive research project on the TMS99105A and friends, and thus the 990 platform. I got a real kick out of reading your patents on the 99000 series chips this past week. Not only do the patents indirectly answer a whole bunch of my questions, they also *directly* answer at least one of them: Yes, this thing *is* older than me. The ‘105A chip I have is marked 1981, but the patents were filed on November 24, 1980. Most of your work would have had to have been done by that date. I was born two days later.

    I have questions! I don’t blame you if your memories of 1980ish are a bit hazy. I just turned 45 and I sure as hell can’t remember anything from 1980! But some of my earliest memories involve That One Blue of my older brother’s 99/4 as he tinkered in BASIC. It was one of the many things that led me to becoming a software engineer.

    I’m not a 9918 fanboy, don’t worry. I’m weirder: all of my questions are about Macrostore. The Macrostore is the single feature that drew me to the ‘105A over the ’95 for fun tinkering hobbyist purposes. These questions are not just for personal curiosity, but to help me build a narrative about this wonderful little thing. The questions are also obscure, and you may not even have had an answer at the time because they involve things way outside your area of responsibility.

    1) I would love to know more about the 99105/110 Macrostore. What led to it? I’m asking for a timeline, I think. The name and concept at least existed early enough to be on the 9995, but was this a coherent vision and evolution of a product line (a rare thing for TI, it seems)? To be honest, it feels like a “what if we just formalized undefined opcode handling instead of punting to interrupt 2 like everything else” decision — a natural evolution instead of a planned one.

    2) For the 99105/110, it kind of feels like Someone Important asked “Hey, what if we could do (read: sell our customers) the 990/12 Writable Control Store on a microchip?” and someone else answered with this idea. Wording in the patent even suggests this. But, from what I can tell from the way it’s used in the 990/10A, the 99000 itself seems to have no internal Macrostore at all. This confuses me. Do you have any insight you can share about the development of the Macrostore feature and interactions with the appropriate 990 group(s)? To what level was there actually any coordination or interaction there to begin with? Yup, look at me, asking for your business requirements more than 45 years later.

    3) It also seems unclear if there was any software or hardware that took advantage of the Macrostore or even the in-Macrostore-XOP emulation feature of the 99000 on the 990/10A, at least from TI. Do you know or believe otherwise? Does the mythical Pascal-opcodes-in-Macrostore 99120 fit in here, or would that be again assuming wrongly that TI had *coordinated* cross-organizational competence?

    3a) Exactly how mythical is the 99120? I understand that it got to the point of having working engineering samples? Can you share what was being done with it by users / testers / designers / the intended audience? From what I’ve read of TI Pascal, this could have been a really cool chip.

    4) What can you tell us about the choice to remap the three single bit CRU instructions to the in-Macrostore interrupt detection jumps? In specific, was there ever any consideration of making them “real” instructions with their own opcode instead of hijacking the CRU opcodes?

    And finally for now,
    5) TI’s naming system suggests that an “A” or “B” etc following a processor model number indicates a correction, revision, or even full version change. So, we have TMS99105A and TMS99110A public products, and the TMS99000 in the 990/10A… but where are / were the TMS99105 and TMS99110? What can you tell us about this, and other 99000 variations?

    5a) The TMS9995 tech manual available on the internet suggests that early versions shipped with a bug that prevented the math exception interrupt feature from working correctly. Was a corrected version ever shipped, and if so, did it not earn an “A”?

    —-

    Below here lies fanboy gushing.

    I think the 9900 series and the 990 platform in general ended in true tragedy. I don’t fathom what went wrong with the 9940/85. How hard could it possibly have been to glue a 9901 to a 9900 and multiplex the external interfaces? (I say this tongue in cheek, knowing nothing at all about how hard it really is.)

    That’s where I think this platform could have been a winner. Harvard architecture microcontrollers are where it is nowadays and the 990 platform had all the ingredients to have made it happen. A well-fed supply chain of well-marketed and working EPROM 9940s could have changed history. Imagine if TI had had the organizational competence!

    I’m kind of glad it didn’t work out, because I like the end result of the 99000s.

    I’m doing a lot of things at once with this project, including writing a giant essay around the 99000. I decided I actually wanted to understand what I was working with, and things got curiouser and curiouser as I peeled back each layer of the onion.

    You see, I got caught up in these videos on YouTube by a guy named Ben Eater. You’ll probably get a kick out of the two “World’s Worst Video Card” ones if you look him up, and they’ll also tell you a lot about his target audience. But the series I’m interested most in is his homebrew 6502 computer. As I was following along, I realized at some point that *I could actually do all of that* … but I don’t care about the 6502.

    I remembered that one IEEE article, dove into a research hole, found kindred spirits, and before I knew it I was ordering a TMS99105A off of eBay. Yeah, the ’95 was more convenient from a hardware point of view, but the ‘105A is *cooler*. Here’s mine, in all of its glory: https://i.imgur.com/1RKi9Ry.jpeg

    I picked up an Arduino clone hardware kit with hardware similar to what Ben Eater is working with. But the peripherals are all SPI or I2C instead of memory-mapped or *normal* serial devices. So, I have works in progress for SPI-to-CRU and I2C-to-CRU interfaces. (Why oh why does the CRU transmit the lowest bit first!? I don’t think there’s anyone that can answer that question with certainty. It’s just going to be “well, the 960 CRU did it that way and …”)
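    To make the bit-order mismatch concrete, here is a minimal sketch of the conversion an SPI-to-CRU bridge would need, assuming the SPI device shifts data most-significant bit first (common, but device-dependent) while the CRU transfers the lowest bit first, as described above. The helper names `reverse_bits` and `cru_bit_sequence` are hypothetical and not taken from any existing project.

```python
def reverse_bits(value, width=8):
    """Return the low `width` bits of value in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the current lowest bit to the other end
        value >>= 1
    return result


def cru_bit_sequence(value, width=8):
    """List the bits of value in the order a lowest-bit-first serial transfer sends them."""
    return [(value >> i) & 1 for i in range(width)]
```

    Reversing a byte once before handing it to an MSB-first shifter yields the same bit sequence on the wire as a lowest-bit-first transfer of the original value.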

    But there’s also that Arduino. With some clever circuitry involving level shifting bus transceivers and the 99000’s *delightful* ability to wait-state even internal cycles indefinitely, I can actually just emulate the entire hardware bus cycle in software on my PC through a serial interface …

    And I needed to learn the assembly language, but all the emulators are for systems I don’t care about instead of just the chip itself, so of course I’ve started building one … It’s simple TypeScript and runs in a browser. I’m clearly going to have to hook it up to the serial interface somehow, at some point …

    And then of course, there’s turning all these notes into an essay.

    And all the ingredients are here for a multi-user multi-process memory-safe operating system that clearly I’m just going to have to write.

    I’m pretty deep in here.

    So, thank you. Your work on the 99000 may have been a dead end on the market, but the result is still pretty darn nifty. I appreciate the time you’ve taken to read this, and look forward to any answers you can provide.

    • Thanks,

      I sent a response to many of your questions to your email address (assuming it is good).

      Karl
