Downtown Doug Brown (https://www.downtowndougbrown.com)
Thoughts from a combined Apple/Linux/Windows geek.

Finding a broken trace on my old Mac with the help of its ROM diagnostics
Tue, 30 Dec 2025
https://www.downtowndougbrown.com/2025/12/finding-a-broken-trace-on-my-old-mac-with-the-help-of-its-rom-diagnostics/

Yesterday, for the first time in about a year, I tried powering on the Macintosh Performa 450 (LC III) from my past writeup about Apple's backwards capacitor.

It didn't work. The screen was black, it played the startup sound, and then immediately followed up with the "Chimes of Death". Nothing else happened from that point on. Here's what it sounded like:

This was a little frustrating because last year I had already replaced all of the capacitors and cleaned where they had leaked, so I didn't expect to encounter any problems with it so soon. The machine had worked fine the last time I'd tried it! But despite all that, something was failing during the power-on tests in Apple's ROM, prompting it to play the chimes of death. I remembered that people have been working towards documenting the Mac ROM startup tests and using them to diagnose problems, so I decided to give it a shot and see if Apple's Serial Test Manager could identify my Performa's issue. Where was the fault on this complicated board? Sure, I could test a zillion traces by hand, but why bother when the computer already knows what is wrong?

I hooked up the Mac's RS-422 modem port to my computer's RS-232 serial port using a couple of adapter cables to convert from Mini-DIN-8 to DB-25 and then DB-25 to DE-9. Next I opened up PuTTY, configured the serial port on my PC for 9600 baud, 8 data bits, no parity, and 2 stop bits (8N2), and tried typing the command to put the Serial Test Manager into ASCII mode:

*A

It echoed the command back to me, so it was working! Next, I typed the command to return the status:

*R

It printed this back to me:

2F1E122B0003*R

According to the documentation I linked earlier, this result shows that the status register contained the value 0x2F1E122B and the major error code was 0x0003. Error code 3 means RAM Bank A failure. The 0x2F1E122B seemed like gibberish, but I thought it was supposed to be a bitmask of bad bits. I later figured out that the value in the status register is always junk after the chimes of death play, because the code that plays the sound overwrites it.
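
As an aside, if you'd rather not split those hex digits up in your head, the reply format described above (8 hex digits of status register, then 4 hex digits of major error code, then the echoed *R) is easy to parse. Here's a quick Python sketch of my own, not something from Apple's tools or the linked documentation:

def parse_status_reply(reply):
    # Reply format: 8 hex digits of status register + 4 hex digits of
    # major error code, followed by the echoed "*R".
    hex_part = reply.strip().removesuffix("*R")
    status = int(hex_part[:8], 16)       # e.g. 0x2F1E122B (junk after the death chimes)
    major_error = int(hex_part[8:], 16)  # e.g. 0x0003 = RAM Bank A failure
    return status, major_error

status, err = parse_status_reply("2F1E122B0003*R")
print(hex(status), hex(err))  # 0x2f1e122b 0x3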

The RAM test definitely knew which part of the RAM was failing though. I just needed it to give me all of the details. So I manually ran a test over a small range of RAM addresses:

*4
*000001000
*100002000
*T000200010001

What these commands do according to the documentation:

  • *4 clears the result of any previous test
  • *0 sets the value of register A0, containing the start address of the test. I set it to 0x00001000.
  • *1 sets the value of register A1 for the end address of the test. I set it to 0x00002000.
  • *T runs a "critical test". 0x0002 is the test (mod3 RAM test), the first 0x0001 is the number of times the test will run, and the second 0x0001 contains option flags.
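
If you'd rather not count hex digits by hand, here's a tiny Python helper (my own convenience code, not anything from the documentation) that assembles the exact command strings I typed above:

def ram_test_commands(start, end, test=0x0002, runs=1, flags=1):
    return [
        "*4",                                    # clear the previous result
        f"*0{start:08X}",                        # A0 = start address
        f"*1{end:08X}",                          # A1 = end address
        f"*T{test:04X}{runs:04X}{flags:04X}",    # run the critical test
    ]

print(ram_test_commands(0x00001000, 0x00002000))
# ['*4', '*000001000', '*100002000', '*T000200010001']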

Here is the printout I got back from the Mac when I ran these commands:

*4
*0
*1
*ERROR**T

This was actually really good news! It accepted the first three commands, and then the RAM test failed. This was consistent with what I expected to see. I tried to display the results again, hopeful that this time the status register would contain useful info about the failed RAM.

*R

It happily printed this back:

000008000000*R

Yay! This meant the status register was 0x00000800. The status register value showed which bit(s) in the RAM were acting up. In other words, the test was telling me that bit 11 was the problem.
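
(Converting that mask into a bit number is trivial, but for completeness, here's the kind of throwaway Python line I'd use to do it:)

mask = 0x00000800
print([bit for bit in range(32) if (mask >> bit) & 1])  # prints [11]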

I didn't have a RAM SIMM installed, so the problem was clearly with the 4 MB of onboard memory. It was very doubtful that a RAM chip had just randomly gone bad since the last time I'd powered up this machine. More likely, the leaked capacitor goo had eaten away another trace over time because I hadn't cleaned the board well enough. I grabbed my multimeter and checked the continuity of D11 between the RAM chip and various other components on the board. Luckily, Bomarc reverse-engineered the LC III logic board a while ago and their schematics are floating around on the internet these days.

The schematics indicate that onboard RAM data bit 11 is supplied by U28, pin 25. It's hooked directly to the CPU's data bus, which goes to the RAM SIMM slot, the CPU itself, an optional FPU, the PDS slot, one of the ROM chips (U19), and other random chips on the board.

Thanks to max1zzz's LC III Reloaded replica of the LC III logic board, I was easily able to follow the traces and verify where things were hooked up. Sometimes Bomarc's schematics can be a little iffy, so it's always good to double check them.

I confirmed that U28 pin 25 had a connection to the RAM SIMM socket right next to it (pin 55), but it wasn't connected to anything else. The ROM chip U19 was the easiest to test against. I also checked that other nearby data lines did indeed have good continuity between the RAM and ROM, so it was just this one data line that was bad. This all made sense and was consistent with the RAM test results. There was definitely a broken trace somewhere. Following along with max1zzz's replica board Gerber files, I had a pretty good idea of where the damage was: a cluster of tiny vias near where an electrolytic capacitor had badly leaked. Several of these vias look pretty icky. Also, please ignore my terrible alignment on the replacement tantalum cap.

I was in a hurry to get this Performa running again. Instead of trying to repair the bad trace/via, I opted for a quick bodge wire on the bottom of the board between pin 55 of the RAM SIMM socket and pin 21 of the relevant ROM socket (U19). That was easier than trying to repair a tiny via. I might experiment more with via repair in the future, though!

With the bodge wire in place, my Performa 450 is alive once again! For now, anyway. My board probably still has some issues. That's the tricky thing with capacitor leakage. You might think you've cleaned it well, but electrolyte could still be lurking there somewhere, slowly eating away more and more copper. I know some people have had good luck using ultrasonic cleaners, although I hear that they can damage oscillators.

If you're feeling nostalgic and/or have way too much time on your hands, and you're comfortable with building MAME from source, you can replicate my successful diagnosis in an emulator using MAME on Linux. Here's a quick patch I applied to screw up bit 11 of the RAM on the emulated LC III:


diff --git a/src/mame/apple/sonora.cpp b/src/mame/apple/sonora.cpp
index 141e3e9950d..7d07addc29e 100644
--- a/src/mame/apple/sonora.cpp
+++ b/src/mame/apple/sonora.cpp
@@ -191,6 +191,9 @@ u32 sonora_device::rom_switch_r(offs_t offset)
offs_t memory_mirror = memory_end & ~memory_end;

space.install_ram(0x00000000, memory_end & ~memory_mirror, memory_mirror, memory_data);
+ space.install_write_tap(0x0000, 0xffff, "faulty_ram", [&](offs_t offset, u32 &data, u32 mem_mask) {
+ data &= ~0x0800;
+ });
m_overlay = false;
}

Then, you can run MAME with this command:

./mame maclc3 -window -nomaximize -printer pty

This allocates a pseudo-terminal that acts as the serial port. You may notice that I included -printer instead of -modem in the command, even though the physical port I used is definitely the modem port. That's because the current version of MAME as of this writing seems to have them swapped! Sometime in the future when that is fixed, you'll likely need to use -modem instead.

With my patch applied, running MAME like this should give you the startup sound followed immediately by the error sound. Figure out which pseudo-terminal is linked to the port (it was /dev/pts/1 on my machine) and open it with your favorite serial program, such as minicom. You can now type all the commands I used to diagnose the problem.
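
If you'd rather script the session than type into minicom, something like this pyserial sketch works too. It's just my own illustration; the /dev/pts path and the read lengths will depend on your setup:

import serial  # pyserial

port = serial.Serial("/dev/pts/1", 9600, timeout=2)  # adjust the pty path

port.write(b"*A")     # put the Serial Test Manager into ASCII mode
print(port.read(2))   # should echo b'*A' back
port.write(b"*R")     # request the status register and major error code
print(port.read(14))  # e.g. b'2F1E122B0003*R'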

Anyway, this was a successful use of Apple's ROM diagnostics to quickly solve my issue. It was much easier than manually checking continuity of a zillion PCB traces! Back in the day, Apple had a service tool called the TechStep that was capable of performing some of these diagnostics. There's even a modern clone of it, which happens to also be created by max1zzz. However, I'm not sure exactly how useful this device would have been for service techs other than as a pass/fail indicator. Wasn't Apple's policy just to replace full boards, similar to how it is today? Maybe they repaired faulty returned boards and reused them as service part stock. I'm not sure!

By the way, this wasn't my first successful use of the Serial Test Manager. Earlier this year, I also fixed a Performa 410 (LC II) that was experiencing the Chimes of Death. The failure code was 0x30, indicating an Egret error. Egret is the name of the logic board's microcontroller that handles the Apple Desktop Bus, battery-backed PRAM, and some power on stuff. After the ROM diagnostics pointed me in that direction, I did a much better job of cleaning the cap leakage around it, and the problem completely went away. So that's now two times that this cool functionality has helped me.

I'll talk more about my somewhat special Performa 410 in a future post!

Debugging BeagleBoard USB boot with a sniffer: fixing omap_loader on modern PCs
Sat, 08 Nov 2025
https://www.downtowndougbrown.com/2025/11/debugging-beagleboard-usb-boot-with-a-sniffer-fixing-omap_loader-on-modern-pcs/

This post is about the original OMAP3530 BeagleBoard from 2008. Yes, the one so old that it doesn't even show up in the board list on BeagleBoard.org anymore. The BeagleBoard, not the BeagleBone. During my Chumby 8 kernel escapades, at one point I ran into a UART bug that affected multiple drivers, including the omap-serial driver. This led me to buy a BeagleBoard so I could verify the omap-serial bug on hardware.

After I figured out the bug with the UART driver, I realized that the OMAP3530 has support for booting from USB, so I decided to go off on a random tangent to get USB boot working. There was no problem I was trying to solve or anything like that. I just thought it would be a fun experiment (am I a masochist?). Little did I know, I would be getting myself into some tricky USB packet analysis.

I struggled to find info about this process because of how old the OMAP is today. The main utility I found was a program called omap_loader by Grant Hernandez, which is a newer rewrite of Martin Mueller's original omap3_usbload circa 2008. Thanks to some lucky searching combined with the Internet Archive, I connected the dots between 2008 and the present. At some point before 2013, Rick Bronson provided an update to omap3_usbload (along with a patch to TI's X-Loader bootloader) that enabled uploading additional files like a full U-Boot and Linux kernel into RAM after X-Loader, all through USB. This unlocked the ability to boot all the way to Linux from a completely blank BeagleBoard. Grant's newer omap_loader utility also incorporates these same improvements.

All of this research was difficult. Many of the links I found pointed to sites like gitorious.org and arago-project.org, both of which no longer exist (although Arago's Git repos are now hosted by TI). eLinux.org's BeagleBoard wiki was totally rearranged at some point and lost its info about USB recovery, and Rick's site no longer exists, but as usual, the Internet Archive saved the day.

At some point later on, X-Loader was replaced by U-Boot SPL, so I think that is partially why so much of this info eventually disappeared from the web. But it's a darn shame. This USB booting functionality is really cool, and it seems like most of the documentation for it has slowly gone by the wayside! The main breadcrumbs remaining on modern Google are the newer omap_loader utility, and also some references to Nest thermostats. For example, Nest's X-Loader had the USB patch applied (with some tweaks added).

With all that research out of the way, I was ready to try it all out. I compiled omap_loader, grabbed the pre-built binary of x-load.bin that was included with Rick's patchset, and also used a u-boot.bin that I had compiled myself using Buildroot while performing my UART tests with a modern kernel on the BeagleBoard. Then, I tried to load it:

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...

The idea behind this command is it sends X-Loader (x-load.bin) as the main payload that the OMAP's on-chip bootloader is listening for over USB. Then, X-Loader starts up. Next, omap_loader sends any additional files using X-Loader's USB protocol. In this case, I've supplied one extra file: u-boot.bin, which I told it to load into RAM at 0x80800000. Finally, the -j 0x80800000 argument tells X-Loader to jump into U-Boot rather than hanging around doing nothing afterward.

The output of the command looked normal so far. I plugged in my BeagleBoard, which didn't have an SD card inserted and also had its NAND flash erased, so it had no bootloader installed and thus it would attempt a USB boot.

[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_OUT) for 26956 bytes (0 made it): LIBUSB_ERROR_PIPE
[-] failed to send file 'x-load.bin' (size 26956)
[-] failed to transfer the first stage file 'x-load.bin'

Darn. The utility recognized the BeagleBoard being plugged in, but libusb errored out with a pipe error. Long story short, I messed around with a few other computers, and I found that a few of my older computers, old enough that they didn't have USB 3.0 ports on their motherboards, actually worked perfectly fine with omap_loader. I couldn't get it to work properly with most of my modern machines though, AMD or Intel.

I thought this would be a great application for a USB sniffer, so I decided to record some traces of success versus failure.

Here's a link to my in-depth investigation comparing success versus failure on the GitHub issue about this problem. Yep, it turns out I wasn't the only one running into this exact same issue. Grant himself was seeing similar problems, and had come to a similar conclusion that it seemed to be machine-dependent. Other people had mentioned that adding delays at certain points in the code seemed to help. I was intrigued, so I tried to get to the bottom of it.

Here's what the USB boot process is supposed to look like, according to TI's OMAP35x Technical Reference Manual:

  • The OMAP device enumerates as a USB device.
  • Within 300 ms, the host needs to read an "ASIC ID" structure from the OMAP or else it will disconnect from USB.
  • Then, the host sends a 4-byte command: 0xF0030002 means to continue booting through USB.
  • Next, the host sends the 4-byte length of bootloader data it wants to transfer.
  • Finally, the host sends the bootloader (X-Loader in this case), which will be loaded into internal SRAM starting at 0x40200000.
  • After the OMAP device receives all of the data, it runs the received bootloader by jumping to 0x40200000.
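
To make that sequence a little more concrete, here's a bare-bones pyusb sketch of the same handshake. To be clear, this is not how omap_loader itself works internally (it's a C program built on libusb), and the endpoint numbers, byte order, and ASIC ID read size below are my assumptions rather than values I've verified against TI's documentation:

import struct
import usb.core

dev = usb.core.find(idVendor=0x0451, idProduct=0xD009)
dev.set_configuration()

asic_id = dev.read(0x81, 512, timeout=300)        # ASIC ID; must happen within ~300 ms
xloader = open("x-load.bin", "rb").read()
dev.write(0x01, struct.pack("<I", 0xF0030002))    # "continue booting over USB"
dev.write(0x01, struct.pack("<I", len(xloader)))  # 4-byte length of the payload
dev.write(0x01, xloader)                          # the data ends up in SRAM at 0x40200000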

Again, this process worked perfectly fine on my older computers that don't support USB 3.0, but on my newer computers with USB 3.0, it was hanging up. I did notice that the newer computers were trying to fit a lot more data into a single USB frame. For example, the start of my older computer's communication with the OMAP looked like this:

  • Frame 1
    • Host sends boot command
    • OMAP confirms it
  • Frame 2
    • Host sends length
    • OMAP confirms it
  • Frame 3
    • Host sends first packet of bootloader data
    • OMAP confirms it
    • Host sends second packet of bootloader data
    • OMAP says it's not ready
    • Host pings
    • OMAP says it's ready now
    • Host sends second packet of bootloader data again
    • OMAP confirms it

And then from that point on, it was just a process of sending the rest of the data like that. About 5 data packets would fit into each frame. My newer computer's traffic looked like this instead:

  • Frame 1
    • Host sends boot command
    • OMAP confirms it
    • Host sends length
    • OMAP says it's not ready
    • Host pings a few times until the OMAP is ready
    • Host sends length again
    • OMAP confirms it
    • Host sends first packet of bootloader data
    • OMAP confirms it
    • Host sends second packet of bootloader data
    • OMAP says it's not ready
    • Host pings several times, OMAP never says it's ready during the rest of this frame
  • Frame 2
    • Host pings
    • OMAP responds with a STALL packet

The newer xHCI host controller was trying its best to efficiently squeeze a lot of packets into the first frame. Even though this is a pattern that should be perfectly valid to follow when communicating with a USB device, the OMAP bootloader was clearly not happy about something, and eventually sent a STALL packet before omap_loader made much progress. Various USB packet traces on different modern computers revealed similar issues. It would either STALL after the second packet, or just NAK forever and never accept additional incoming data.

Inspired by other comments about adding delays, I tried to work around this by inserting an artificial 1 ms delay before every libusb_bulk_transfer() call. This would force modern machines to slow down a little bit. As soon as I added those delays, all of my new computers had no trouble uploading X-Loader to the OMAP. So yeah, I think the OMAP just doesn't like receiving USB data too quickly.
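
In terms of the little pyusb sketch from earlier (the real fix went into omap_loader's C code, right before its libusb_bulk_transfer() calls), the workaround amounts to nothing more than this:

import time

def patient_write(dev, endpoint, payload):
    time.sleep(0.001)             # give the OMAP's boot ROM a moment to catch up
    dev.write(endpoint, payload)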

That wasn't the end of this little project, though. The 1 ms delay fixed the issue with getting X-Loader to run, but the newer computers also ran into problems while trying to upload U-Boot through X-Loader!

[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory

Rats. I went back to the USB sniffer for more research.

This time, it was a different problem. I found the point where the host would try to read the initial request from X-Loader. On my older computer, this worked fine; it received a 13-byte string from X-Loader: "USBffile req" followed by a null terminator. It was happy with this, and omap_loader kept going on with the rest of the file load process and everything succeeded.

On the newer computer, some shenanigans were going on. Let's look at the USB trace in depth:

The 335-byte packet contains 332 actual bytes of data (the packet ID and CRC account for the other 3 bytes), and is the final chunk of X-Loader. It was successfully received and confirmed by the OMAP with an ACK. At that point, we can assume that the OMAP has begun jumping into X-Loader to start it up.

A millisecond later (due to the delay I added), we start trying to read from X-Loader. It's clearly too soon, though; I don't think X-Loader has finished starting up yet. There's nothing ready to read. So these IN/NAK packets continue on for about 5 more milliseconds, which is totally normal. But then, something finally happens: the OMAP stops responding to our IN packets. My computer's USB host controller tries three times (see the three IN packets in a row below?) and then it gives up. I'm guessing this is around the same time that X-Loader is doing its own hardware initialization, so maybe the OMAP's USB controller is temporarily disabled.

This all makes sense so far. We tried to read too quickly before X-Loader finished starting up, so when it did finally load, there was a brief moment where it would not respond to IN packets. The host controller didn't like this and stopped trying, so all we saw from that point on was SOF packets because we weren't attempting any more USB reads. Some of my other computers gave up after 15 unanswered IN packets instead of 3. I'm not sure if that's a difference in the host controller or what, but it's the same root problem.

You may be wondering: why didn't the older computers run into this same issue? They were also trying to talk with X-Loader too early, so why wouldn't they run into this same roadblock? The answer is that their older host controllers are more tolerant of the missing NAKs. I recorded a similar trace with one of my older computers that works fine without any patches to omap_loader. It also immediately began sending IN packets trying to read from X-Loader way too soon. Just like the problematic computers, it experienced a brief period where the OMAP stopped responding to INs with NAKs. The difference is that it didn't abandon hope so quickly. There were 33 unanswered IN packets. After that, the OMAP continued responding with NAKs again and everything was fine from that point on. 17 ms after we had originally finished sending out X-Loader, the OMAP finally responded with an actual data packet. So that's the total time it took for X-Loader to launch.

Back to the newer computers that weren't playing nicely. I was still confused. Even though this is a problem with newer host controllers, omap_loader has a retry mechanism! If it fails to read X-Loader's initial data, it will try again 2 seconds later. You'd think it would succeed at that point. Let's see what happens:

Ah, interesting. The retry is actually 3 seconds later. I'm guessing the first failed read attempt had a 1-second timeout, so then with a 2-second retry timer, that adds up to 3 seconds total.

Anyway, something funky happens here. The USB host finally reads the 13-byte string from X-Loader as a DATA1 packet (again, it shows up as 16 bytes because of the packet ID and CRC). The host then acknowledges reception with an ACK, but for some bizarre reason, it immediately continues attempting to read more data! I won't show the whole trace, but the host keeps polling with IN packets for a whole second. And of course, they're all NAKed. X-Loader knows it successfully sent data to us, so it has shifted over to waiting for the host to send an OUT packet instead. It's like the host controller gets confused and expects X-Loader to send more data. The kernel never reports those 13 bytes back to libusb, even after the 1-second transfer timeout expires.

I don't consider myself to be a USB expert, so maybe I'm misunderstanding something, but this behavior just seems wrong. When my computer finally reads 13 bytes (proven by the sniffer trace shown above), why isn't that data reported back to libusb? I would have expected the reception of a short DATA0/1 packet to cause the host controller to stop reading and return the data immediately. Is this some kind of strange bug in the Linux kernel or the host controller hardware? I can't explain it for sure. My off-the-cuff guess is that the initial failure to respond to the three IN packets left something out of sync in the host controller. In my opinion, the retry should have worked, but clearly something got confused. I don't think it's libusb's fault, though.

I hate adding arbitrary delays in order to fix things, but a 20-millisecond delay between uploading X-Loader and attempting to read from it fixes this final issue. It ensures that the OMAP has been given ample time to launch X-Loader before we try reading from it, preventing the host controller from encountering the weird situation with unanswered IN packets.

After all of this tinkering and patching that I did to get things to play nicely on newer machines, here is a successful run of omap_loader:

[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[+] uploading 'u-boot.bin' (size 777760) to 0x80800000
[+] jumping to address 0x80800000
[+] successfully transfered 2 files

Meanwhile, the following output pops up on the BeagleBoard's UART:

Texas Instruments X-Loader 1.5.1 (Nov 15 2011 - 09:36:31)
Beagle Rev C4
Trying load from USB
USBLOAD_CMD_FILE total = 12 addr = 0x73425355 val = 0xbde20 val = 0x80800000
got file addr = 0x808bde20
USBLOAD_CMD_JUMP total = 8 addr = 0x6a425355 val = 0x80800000


U-Boot 2023.10 (May 25 2024 - 22:05:27 -0700)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 MHz
Model: TI OMAP3 BeagleBoard
OMAP3 Beagle board + LPDDR/NAND
I2C: ready
DRAM: 256 MiB
Core: 44 devices, 18 uclasses, devicetree: separate
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Beagle Rev C4
Timed out in wait_for_event: status=0000
Check if pads/pull-ups of bus are properly configured
No EEPROM on expansion board
OMAP die ID: 79b8000400000000040398da1401c009
Net: No ethernet found.
Hit any key to stop autoboot: 2

I believe the "Timed out in wait_for_event" error is harmless. Anyway, success! It loads U-Boot! You can imagine that I could have easily transmitted a Linux kernel and initramfs as well, and fully booted this thing over USB. Once U-Boot is running, I can do whatever I want.

With these simple delay tweaks, omap_loader works great on all modern computers I've thrown at it, including Raspberry Pis. The only "gotcha" I've encountered is that some slower computers (my i3-7100U laptop and a Raspberry Pi Zero) don't forward the USB hotplug event through udev quickly enough before the BeagleBoard decides it's not being asked to boot over USB. omap_loader never gets past scanning for a device, even though the dmesg log clearly shows that it was detected:

[4076310.258842] usb 11-5: new high-speed USB device number 65 using xhci_hcd
[4076310.407944] usb 11-5: unable to get BOS descriptor or descriptor too short
[4076310.410041] usb 11-5: New USB device found, idVendor=0451, idProduct=d009, bcdDevice= 0.00
[4076310.410046] usb 11-5: New USB device strings: Mfr=33, Product=37, SerialNumber=0
[4076310.410051] usb 11-5: Product: OMAP3430
[4076310.410054] usb 11-5: Manufacturer: Texas Instruments
[4076310.710703] usb 11-5: USB disconnect, device number 65

As you can see, it's a very short timeframe; just like TI's manual says, it only stays connected for about 300 ms if it doesn't hear from the host. I guess that's not enough time for udev on some computers. The only solution I found for this issue on my slower machines was to compile a custom version of libusb with udev disabled, which forces it to directly use netlink for hotplug detection instead.

My patch also limits libusb transfers to 512 bytes at a time. I don't think this change is critical, though. It fixed an issue I ran into where my bus was really loaded and libusb reported a memory error. I don't think it actually helps anything in most cases as long as people aren't performing crazy big USB transfers at the same time.

In summary:

  • Trying to write USB data to the OMAP's on-chip bootloader too quickly seems to hit some edge cases that it doesn't handle correctly. A 1 ms delay fixes this.
  • Trying to read from X-Loader before it's ready to go irritates newer USB host controllers when they send out several IN packets without receiving any response (not even a NAK). A 20 ms delay fixes this.
    • Even retries afterward fail; the host controller gets out of sync due to the unanswered IN packets or something like that.
  • On some slower computers, udev doesn't give you enough time to respond to the OMAP's 300 ms timeout, so libusb never detects the hotplug. This can be solved with a custom libusb that uses netlink instead of udev.

I opened up a PR to submit these fixes (except for the udev thing) upstream to omap_loader in 2024. Why am I writing about this now? Well, remember when I mentioned Nest earlier? Google ended support for older Nest thermostats last month, which renewed some interest in merging my reliability improvements so that people can flash custom firmware to their Nest thermostats. Those old Nest devices also use OMAP processors.

What it boils down to is: all this tinkering I did last year with pointlessly booting old BeagleBoards over USB accidentally ended up being useful. It helped out some Nest thermostat revival projects that have been popping up in the last month. So I thought now might be a fun time to talk about my tiny involvement with that. Yay! It's always fun when a random side project unexpectedly helps other people.

An update about the hidden Performa 550 recovery partition
Thu, 28 Aug 2025
https://www.downtowndougbrown.com/2025/08/an-update-about-the-hidden-performa-550-recovery-partition/

Earlier this year, I wrote about how I rescued a special recovery partition from an old Macintosh Performa 550's dead hard drive. This partition had been lost to time and it was a race to try to save it before the remaining Performa 550 machines out there with their original hard drives were reformatted or destroyed. It has now been preserved on the Macintosh Garden. I have a few updates to that post that I'd like to share.

The first update is that some extra discussion took place in the comments of my original post. Reader "Greg" pointed out that there was an Apple employee named John Yen who worked on the Mac OS during the System 7 era, and suggested he might be the "jy" in the associated "msjy" creator code. That would leave "ms" potentially being Microseeds, which is the company that developed Apple Backup.

This led me to search further, and I stumbled upon Apple's patent for the automatic OS recovery functionality filed in 1994. It was granted in 2002 and expired in 2019. John Yen is listed as the inventor. The patent contains some screenshots of the exact UI that I experienced while testing the functionality. I never thought to look through patents, but I should have. They are definitely a useful tool for historical research on this type of stuff. I thought that was a really cool discovery. Thanks, Greg!

Now, onto the second thing. After my research had seemingly concluded, I never turned off my eBay alerts. Last week, I received a notification about a damaged tray-loading Performa 550 (manufacture date February 1994) being parted out. Sure enough, one of the seller's auctions was a working 160 MB hard drive from the same machine. Of course, I couldn't resist snatching it up.

As soon as it arrived, I dumped all the contents. I was in for a very pleasant surprise: this hard drive also had the invisible recovery partition intact!

Better yet, unlike the last one, it still had all of the original Performa software, including Apple Backup, sitting in the Applications folder.

At one point during my initial search, I was really concerned that I might never find the lost recovery partition. Now everything has changed, and I'm pleased to be able to say I found it twice!

This second hard drive is a huge discovery because I now have two data points, which has led me to gain a little more confidence about how I think the special recovery partition was created in the first place. You may recall from my previous post that Apple's own tech notes said that Apple Backup was responsible for creating it, but I was never able to find any evidence supporting that claim. Unfortunately, the original owner had deleted Apple Backup from the first hard drive before I got my hands on it, so I couldn't draw any conclusions.

This new-to-me hard drive was exciting because Apple Backup had not been deleted! I was half expecting to find a weird unpreserved version of Apple Backup hanging around on it, but nope. It's just the same version 1.2 from System 7.1P6 that I had already looked at in depth, right down to the creation/modification dates and the exact number of bytes used.

The hidden partition's contents are exactly the same as what I found on the first hard drive. The file sizes are all precisely the same, and the icons are positioned identically too. That solves one mystery: the weird icon positions inside the invisible partition were not something the original owner caused. They were just weird on all machines.

The only difference I found is that the creation and modification dates of the files are slightly different between the two drives. The easiest place to show this is in the Get Info window of the partition. On the left is the first hard drive, and on the right is the second hard drive.

You can see that the exact same number of bytes have been used, but the partition on the second hard drive was created about 21 hours later. Also, it appears that on both machines, it took about 4-5 minutes to finish populating it.

One of the things I called out in my first post on this subject was that the System file strangely had a modification date several months later. It turns out that something similar happened with the second hard drive, but it ended up being a date way back in the past -- the August 27, 1956 date that some Macs default to if the PRAM battery goes bad. Again, hard drive #1 on the left and hard drive #2 on the right:

I still don't have a great explanation for why the System file's modification date changed on both of these partitions. Maybe a third-party utility or installer happened to tweak the modification date at some point. In my earlier post, I had suggested that the At Ease 'INIT' resource being missing from the System file could have potentially explained the modification date change, but that resource was also missing from the second hard drive's recovery partition System file. Plus, At Ease had not been uninstalled from the second machine. So that blows that theory out of the water. Clearly, the At Ease INIT was never part of the recovery partition's System file.

One other interesting thing I found was that the main Hard Disk volume had the exact same creation date on both drives:

Obviously, I would love to be given the opportunity to analyze a third hard drive in the future to gain even more confidence. With all that in mind, I think I'm much closer to being able to reach a conclusion today:

I still can't be 100% sure, but I think this latest hard drive analysis hammers the final nail in the coffin to the theory that the partition was created based on an action the user performed. I now believe the recovery partition was added by Apple in the factory, and the technote was just wrong about Apple Backup being involved. I can't find any code in Apple Backup that does it, and this second hard drive gives me a good reason to doubt that a customized Performa-550-specific Apple Backup version is hiding out there in the wild somewhere.

I'm still weirded out by the other initials in the creator code being "MS" given that Microseeds developed Apple Backup, but all signs are pointing to this being a factory-programmed thing. I think that makes the most sense anyway. If you're Apple and you've developed this functionality, why would you only enable it after the user has bought a zillion floppy disks and manually performed a backup? Why not just give it to everyone and allow them all to benefit from it?

The fact that the creation date of the recovery partition was not exactly the same between the two hard drives, but it was still within less than a day, is fascinating to me. This means the recovery partition wasn't simply imaged onto every machine at a block level, or the dates would have been exactly the same on every machine. Maybe one of the operations performed at the factory during testing was to run a script that created the partition? This would explain why the creation date of the recovery volume was slightly different between the machines.

Another data point in favor of this theory comes from a recently-preserved Apple Restoration CD: Market Software Series Volume 1 from March 1994. It has a bunch of factory Performa software bundles. I found a quick comment about "action atoms" for a backup partition not being included in the configuration that goes with the ultra-rare Performa 560 Money Magazine Edition:

I suspect it's referring to the recovery partition, and it's implying that Apple did have some kind of restore/imaging script that created it, and they specifically chose not to put it into this configuration.

Unfortunately, the Performa 550 recovery image on this same CD doesn't mention anything about the recovery partition, and doesn't create it. It's also directed to be used for several other Performa models, so I don't think Apple intended for this recovery CD to comprehensively restore a machine to the exact state it was in when it left the factory. It was just meant to restore it to something operable that was close enough.

With all that out of the way, there's one last update I want to talk about. In the first post, I mentioned that Apple published another tech note describing a bug where an educational Dinosaur Safari CD game would accidentally cause the machine to jump into recovery mode. I went so far as to buy the game, reproduce the problem, and post a video demonstrating it.

Looks like Creative Multimedia Corporation beat Apple to the punch at having a Mac app named Safari by almost a decade!

Several people were interested in learning more about why Dinosaur Safari did this. What would even cause an app launched from a CD to trigger the system to enter recovery mode? It's definitely an odd bug, and worthy of a little more investigation.

I spent a little bit of time in MAME tracing what the game did leading up to the machine deciding to reboot. After looking through some CPU execution logs, I found that it was happening in the middle of InitResources(). When I mentioned this in #mac68k on Libera, Josh Juran quickly explained to me that apps aren't supposed to call InitResources() in the first place. Inside Macintosh Volume I confirms this:

So that's the problem. The app is calling a system function that it's not supposed to, which results in the re-initialization of a bunch of system stuff, and somehow this causes the Mac to reboot into recovery mode. Strangely, the problem only happens if the app runs directly from the CD. It doesn't happen if you copy it to your hard drive and launch it from there. Weird. Here's the relevant part of the code as viewed in ResEdit:

It's alongside a bunch of other standard toolbox initialization routines that are often called early during a classic Mac app's lifetime. The developers of Dinosaur Safari inadvertently added a call to InitResources(). It's kind of funny how it's sitting there near the top of Apple's public Resources.h header file, even though it's not supposed to be called by programs. It's almost like they were just daring someone to do it.

Anyway, to test this theory, I patched the CD to replace the InitResources trap instruction with a nop instead:

Using this modified CD, Dinosaur Safari runs perfectly fine and doesn't activate the recovery partition.

I decided not to dive deeper and figure out the underlying sequence of events that leads to the reboot. It would be way too much reverse engineering work inside the bowels of the classic Mac OS for very little payoff. This experience might be a good clue about why Apple didn't go forward with this functionality. I'd be shocked if Dinosaur Safari was the only program with this bug. Maybe it was too easy to inadvertently jump to recovery mode and confuse users.

This should be the end of my investigation into the Performa 550 recovery partition functionality, unless I happen to stumble upon a third hard drive in the future that radically changes my understanding of everything.

My blog isn't turning solely into an Apple archaeology project, so if you're not interested in old Mac stuff, don't despair. I'll write about lots of other fun stuff too. But as a forewarning, I do have another upcoming post about more obscure Apple software from the '90s that was lost and is now found. This time, it's something I doubt many people even remember existed. It'll be a nice little blast from the past.

Finding a 27-year-old easter egg in the Power Mac G3 ROM
Tue, 24 Jun 2025
https://www.downtowndougbrown.com/2025/06/finding-a-27-year-old-easter-egg-in-the-power-mac-g3-rom/

I was recently poking around inside the original Power Macintosh G3's ROM and accidentally discovered an easter egg that nobody has documented until now.

This story starts with me on a lazy Sunday using Hex Fiend in conjunction with Eric Harmon's Mac ROM template (ROM Fiend) to look through the resources stored in the Power Mac G3's ROM. This ROM was used in the beige desktop, minitower, and all-in-one G3 models from 1997 through 1999.

As I write this post in mid-2025, I'm having a really difficult time accepting the fact that the Power Mac G3 is now over 27 years old. Wow!

While I was browsing through the ROM, two things caught my eye:

First, there was a resource of type HPOE which contained a JPEG image of a bunch of people, presumably people who worked on these Mac models.

This wasn't anything new; Pierre Dandumont wrote about it back in 2014. However, in his post, he mentioned that he hadn't figured out how to display this particular hidden image on the actual machine. Several older Macs have secret keypress combinations to show similar pictures, but the mechanism for displaying this one was a complete mystery.

The second thing I found was a big clue: I kept looking for other interesting information in the ROM, and eventually I stumbled upon nitt resource ID 43, named "Native 4.3". Thanks to Keith Kaisershot's earlier Pippin research, I was quickly able to conclude that this was the PowerPC-native SCSI Manager 4.3 code. The SCSI Manager wasn't what piqued my interest about this resource though. At the very end of the data, I found some interesting Pascal strings:

These strings were definitely intriguing:

  • .Edisk
  • secret ROM image
  • The Team

The "secret ROM image" text in particular seemed like it could be related to the picture shown above. I decided to dive deeper to see if I could figure out why the SCSI Manager contained these strings, in the hopes that I could solve the mystery. Would this be the clue I needed in order to figure out how to instruct the Power Mac G3 to display this picture?

Some quick Internet searching for the phrase "secret ROM image" revealed that it had been used for easter eggs with earlier PowerPC Macs. On those machines, you just had to type the text, select it, and drag it to the desktop. Then, the picture would appear. That approach didn't work on the G3.

I suspected there was some similar way to access this hidden image, but nobody had documented it, at least not as far as I could find. So I had no choice but to disassemble the code and see where this text was used. What is it with me and all these crazy rabbit holes?

I extracted the entire nitt resource ID 43 to a file and inspected it:

$ file nitt43
nitt43: header for PowerPC PEF executable

That wasn't too surprising, considering that the first twelve bytes were "Joy!peffpwpc". I fed this entire file into Ghidra, which immediately recognized it as a PEF file and had no trouble loading it. Although I'm pretty familiar with reading x86 and ARM assembly, I know essentially nothing about PowerPC assembly code. Thankfully, Ghidra's decompiler worked very well with this file.

There was one problem, though: it didn't detect any references to the "secret ROM image" string, other than inside of a huge list of pointers to variables. After scratching my head a little bit, I realized that Ghidra wasn't doing a great job of finding references to several variables. Luckily, running Auto Analyze a second time after the initial analysis seemed to help it find several more references to things, including all of the strings I was interested in! I didn't change any options with the analyzer; it just found more stuff on the second run.

The function that used all of these strings was definitely doing something with the .EDisk driver, which I already knew was the RAM disk driver because of past hackery. It seemed to be using strncmp() to see if a string was equal to "secret ROM image", and if so, it would create/open/write a file named "The Team".

I cleaned up this decompilation quite a bit by giving names to variables and figuring out data types. Fortunately, a lot of the functions like PBGetVInfoSync() had lots of public documentation, so I just had to tell Ghidra about the various Mac Toolbox structs being used.

Okay, that's a lot easier to understand!

I couldn't figure out how to format the 32-bit function arguments such as 0x48504f45 into four-letter codes like HPOE, so that's what the comments are. Ghidra simply wouldn't let me display them as ASCII in the decompilation no matter what I did, even though hovering over the constant showed a tooltip with the equivalent text. This is easy to do in IDA, but I couldn't figure out how to convince Ghidra to do it. I tried Set Equate, but it didn't change anything. If someone knows how to make it work, I'd love to hear how!
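
In the meantime, converting the constants outside of Ghidra is a one-liner. Here's a quick Python snippet (assuming the four-letter codes are plain Mac OS Roman text):

def fourcc(value):
    return value.to_bytes(4, "big").decode("mac_roman")

print(fourcc(0x48504F45))  # HPOE
print(fourcc(0x4A504547))  # JPEG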

Anyway, the decompiled code shown above makes sense, and here's a summary of what it does:

  • It looks for a driver called .Edisk. (The driver is really named .EDisk, but I guess Mac OS doesn't care about case sensitivity for this.)
  • It finds a disk associated with that driver (the RAM disk).
  • It looks for a volume associated with that disk.
  • If the volume is named "secret ROM image":
    • It loads HPOE resource ID 1, which contains the JPEG image data.
    • It creates a file of creator ttxt and type JPEG called "The Team".
    • It opens the file, writes the JPEG data to it, and closes it.
    • Then it does something with the driver control entry that I didn't bother trying to understand further.

Okay, interesting! So this code was clearly looking for the RAM disk to be named "secret ROM image", but I wasn't sure exactly how to trigger it. This function was only ever called in one other place: another function, which was checking to see if its first argument was equal to the value 0x3DA (decimal 986).

I didn't have my beige G3 handy for tinkering, so instead, I mentioned what I had discovered in #mac68k on Libera. ^alex came to the rescue after playing around in Infinite Mac with the hints I had given. They quickly figured out that the trick was to format the RAM disk, and type the special text into the format dialog:

I got out my desktop G3, tested it out on real hardware, and sure enough, it worked! If you want to try it for yourself just like ^alex did, you can run Infinite Mac in your browser using this link, which sets up an emulated beige G3 running Mac OS 8.1 using DingusPPC. There's a quirk that causes it to fail to resolve an alias at startup. I intentionally disabled it; just click Stop when the error pops up. Here are instructions:

  • Enable the RAM Disk in the Memory control panel.
  • Choose Restart from the Special menu.
  • After the desktop comes back up, select the RAM Disk icon.
  • Choose Erase Disk from the Special menu.
  • Type the secret ROM image text exactly as depicted above.
  • Click Erase.

When you open the newly-formatted RAM disk, you should see a file named "The Team":

If you double-click the file, SimpleText will open it:

Based on various people's tests, including my own, it sounds like this trick works all the way up through Mac OS 9.0.4, but 9.1 may have been the first version where it finally stopped working.

As far as I have been able to determine, this particular secret was undiscovered until now. People definitely knew the image was there in the ROM, but nobody had figured out how to actually activate it. This is probably one of the last easter eggs that existed in the Mac prior to Steve Jobs reportedly banning them in 1997 when he returned to Apple. I wonder if he ever knew about this one?

Special thanks to ^alex for figuring out that the RAM Disk needed to be erased in order to activate the easter egg! I'm not sure I would have thought to try that, and it would have taken a lot more work to trace through the rest of the code to figure it out.

If you are reading this post and you were on "The Team", I'd love to hear about it! I'm curious if anyone who worked at Apple in the era remembers this little secret.

Modifying an HDMI dummy plug's EDID using a Raspberry Pi
Sun, 15 Jun 2025
https://www.downtowndougbrown.com/2025/06/modifying-an-hdmi-dummy-plugs-edid-using-a-raspberry-pi/

I recently found myself needing to change the monitor that a cheap HDMI "dummy plug" pretended to be. It was a random one I had bought on Amazon several years ago that acted as a 4K monitor, and I needed it to be something simpler that didn't support a 4K resolution. The story behind why is a long one that I'm still figuring out and might eventually become a separate blog post in the future.

If you're not familiar with dummy plugs, here's a quick primer: they are tiny dongles you can plug into an HDMI, DVI, etc. port that don't actually do anything with the video signal. They simply have the minimum circuitry needed for a video source device, like a computer, to think that a monitor is hooked up. In general this entails a pull-up resistor on pin 19 (HPD) to +5V, as well as a little I2C EEPROM chip containing the Extended Display Identification Data (EDID). This is useful for headless machines to force the OS to think a monitor is attached.

The EDID contains all the info about the monitor: the manufacturer, manufacture date, supported resolutions, audio channels, color space, and stuff like that. My goal was to replace the dummy plug's EDID with an identical copy of an EDID from one of my many 1080p HDMI capture devices. Then, the computer I plugged it into would think the capture device was plugged in instead of a 4K monitor, and everything would be hunky dory.

I wasn't sure if the dummy plug's EDID EEPROM would be programmable, but I decided to give it a shot. There was a chance that it would have its write-protect pin configured to disable programming, but I figured it wouldn't hurt to try.

Conveniently, I found that my Raspberry Pi Zero has an I2C controller wired to the correct pins on its HDMI port. This makes sense -- the Pi would need to be able to read the EDID of an attached monitor. This post on the Raspberry Pi Forums and this GitHub comment were helpful for explaining which I2C controller(s) to look at in software on various Pi devices:

  • Pi 0-3:
    • /dev/i2c-2
  • Pi 4:
    • /dev/i2c-20
    • /dev/i2c-21
  • Pi 5:
    • /dev/i2c-11
    • /dev/i2c-12

Before I go further, I want to make it clear that it may be possible to screw up a monitor if you follow these instructions while a real monitor is plugged in and it doesn't have its EDID protected. Be careful to only run these commands if you have something attached to the HDMI port that you're not afraid of bricking, such as a dummy plug! Also, make sure you are confident you're on the correct I2C bus! Always read the EDID and parse it first to make sure it actually contains an EDID before you attempt a write. If you attempt these commands on a PC, it's possible that you could accidentally flash hardware that isn't an EDID, like a RAM module's SPD EEPROM.

Starting from a fresh Raspberry Pi OS Lite install, I performed the following modifications:

  • sudo raspi-config
  • sudo apt install i2c-tools
    • Unfortunately, this requires network access, which creates a bit of a problem if you are on a Pi Zero. You might need a USB-Ethernet adapter to make this happen. Another slightly crazy option is to temporarily take the SD card out of your Pi, put it into your desktop PC running Debian/Ubuntu, run sudo apt install binfmt-support qemu-user-static on your PC, chroot into the SD card's rootfs (options 1.3 and 2.1 worked for me), and run the apt install command inside of the chroot.

And with those prerequisites out of the way, I was ready to start tinkering with the dummy plug's EEPROM. Note that I also needed an HDMI-to-Mini-HDMI adapter.

Since I was using a Raspberry Pi Zero, I chose bus 2. You could change the number below to something else on a different model, as listed above (e.g. 20 or 21 on a Pi 4).


edid_i2c=2

I ran i2cdetect to see if the EDID EEPROM was recognized:


i2cdetect -y $edid_i2c

This came back with the following result, showing that an I2C device was detected at address 0x50, which is exactly the address used for EDID:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: 50 51 52 53 54 55 56 57 -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

Interestingly, this particular dummy plug also responds with addresses 0x51 through 0x57 present. These other addresses seem to contain copies of the same EDID. Not all dummy plugs show up like this -- another one I have only detects 0x50. Anyway, next, I dumped the original EDID from it:


get-edid -b $edid_i2c > edid-orig.bin
2
This is read-edid version 3.0.2. Prepare for some fun.
Attempting to use i2c interface
Only trying 2 as per your request.
256-byte EDID successfully retrieved from i2c bus 2
Looks like i2c was successful. Have a good day.

Nice! To make sure I got a good dump, I tried it twice and compared the results to make sure they were identical. Then I printed it in a format suitable for copying/pasting to something like edidreader.com:


od -v -An -txC edid-orig.bin

This spit out a nice little hex dump of the EDID that was stored on it:

00 ff ff ff ff ff ff 00 1a ae 31 9d 00 00 00 00
01 19 01 04 85 58 31 78 3e a4 fd ab 4e 42 a6 26
0d 47 4a 2f cf 00 e1 c0 d1 c0 b3 00 a9 40 95 00
81 80 81 40 81 c0 02 3a 80 18 71 38 2d 40 58 2c
45 00 e0 0e 11 00 00 1e 4d d0 00 a0 f0 70 3e 80
30 20 35 00 c0 1c 32 00 00 1a 00 00 00 fc 00 48
44 4d 49 20 4d 6f 6e 69 74 6f 72 0a 00 00 00 10
00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 4a
02 03 46 70 52 e1 60 5f 5d e6 65 64 62 10 04 03
1f 20 21 22 13 12 01 26 09 7f 07 11 7f 50 83 01
00 00 6e b9 14 00 40 00 18 78 20 00 60 01 02 03
04 67 d8 5d c4 01 78 00 07 6c 03 0c 00 20 00 f0
78 20 00 40 01 04 08 e8 00 30 f2 70 5a 80 b0 58
8a 10 c0 1c 32 00 00 1e b7 e6 ff 18 f1 70 5a 80
58 2c 8a 00 ff 1c 32 00 00 1e 56 5e 00 a0 a0 a0
29 50 30 20 35 00 80 68 21 00 00 1a 00 00 00 e9

Pasting it into the site linked above, I could see it was a valid EDID:
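
If you'd rather sanity-check a dump offline, the EDID format itself gives you two quick tests: the base block starts with the fixed header 00 FF FF FF FF FF FF 00, and every 128-byte block's bytes must sum to 0 modulo 256. Here's a small Python script of my own (not part of read-edid) that checks both:

import sys

data = open(sys.argv[1] if len(sys.argv) > 1 else "edid-orig.bin", "rb").read()
print("header:", "OK" if data[:8] == bytes.fromhex("00ffffffffffff00") else "BAD")
for block in range(len(data) // 128):
    chunk = data[block * 128:(block + 1) * 128]
    print(f"block {block} checksum:", "OK" if sum(chunk) % 256 == 0 else "BAD")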

Now that I was confident I had the dummy plug's original EDID backed up, I unplugged it from the Pi's HDMI port and plugged my capture device in instead, and repeated the exact same procedure to dump its EDID:


get-edid -b $edid_i2c > edid-capture-card.bin

I confirmed it was also a valid EDID. Finally, I unplugged the capture device and connected the dummy plug again, and wrote the capture device's EDID to it with this fun little code snippet. There are tools out there that can probably do this more efficiently, but hey, this works and doesn't require any special packages other than the standard userspace Linux I2C tools and bash or dash!


edidbytes=($(od -v -An -txC edid-capture-card.bin))
for i in "${!edidbytes[@]}"; do
  byte=0x${edidbytes[$i]}
  echo "Writing byte $i: $byte..."
  i2cset -f -y $edid_i2c 0x50 $i $byte
done

As a quick explanation, this reads the entire EDID (probably 256 bytes in size) from the dump file created earlier, and formats it into an array of two-digit hex strings using od. Each entry in the array represents one byte in the EDID. Then it loops over each byte, prepending a "0x" prefix and writing it out to the EEPROM using i2cset.

After running this code, I re-read the EDID from the dummy plug and checked to see if it matched the file I started from:


get-edid -b $edid_i2c > edid-test.bin
diff edid-test.bin edid-capture-card.bin

The diff command produced no output at all, which indicated that the new dump was identical. The dummy plug's EEPROM had successfully been reprogrammed with the EDID from my capture device!

Of course, at this point I anxiously plugged it into my test computer, powered the computer up, and confirmed that everything was working: it acted as though my HDMI capture device was plugged in instead of a 4K monitor. Success!

I thought I'd share this procedure in case it's useful for someone else in the future. You could probably also use this solution to go in the opposite direction -- upgrading an old 1080p dummy plug to add 4K support. Again, be careful with these commands! I wouldn't recommend tinkering with I2C writes on an actual PC. Use a Raspberry Pi so you don't accidentally brick your desktop PC.

Please don't ship heavy, fragile vintage computers. They will be destroyed. https://www.downtowndougbrown.com/2025/05/please-dont-ship-heavy-fragile-vintage-computers-they-will-be-destroyed/ https://www.downtowndougbrown.com/2025/05/please-dont-ship-heavy-fragile-vintage-computers-they-will-be-destroyed/#comments Sun, 25 May 2025 17:41:43 +0000 https://www.downtowndougbrown.com/?p=11600 As part of my research into the Macintosh Performa 550's factory recovery partition, I paid a lot of attention to eBay listings for these computers. I came to an interesting discovery that I had already suspected: big CRT-based Macs in this form factor are regularly damaged in shipping after being sold on eBay.

As most vintage computer enthusiasts are well aware, the plastic in old computers tends to become very brittle as it ages. With Macs in particular, machines made from 1993-ish onward seem to really be affected by this issue. I've seen this with my own eyes, too. The CD-ROM bezel on my Quadra 840av fell off because several very thin plastic clips snapped off. Good luck finding one that hasn't already broken off. One of the clips that holds the top case in place on my Power Mac 6100 also broke off. And I, too, once received a Performa 550 that was destroyed in shipping.

For what it's worth, I have experienced success with smaller "pizza box" Macs shipped to me. They are definitely from the same era of terrible plastic, but they have always been generously covered in all directions with bubble wrap and none of the ones I have received have ever been damaged in transit. I think the fact that they aren't super heavy or large helps a lot.

I was curious about bigger, heavier machines though, especially since I had experienced destruction on one myself. I decided to do a quick little research project on eBay. I went through all past auctions I could find for the Macintosh Performa/LC 5xx form factor. Sadly, eBay's history doesn't go back very far, so it's hard to get a decent sample size. Regardless of the lack of data, what I found doesn't look promising for prospective buyers. Out of 12 total computers sold, I found:

  • One arrived with "barely any damage".
  • Four were broken in shipping.
  • One buyer complained about the shipping and packing and asked for a refund, so I'm going ahead and calling it damage even though the feedback didn't specifically say it.
  • The other six didn't receive any feedback, so it's impossible to know whether they safely arrived or not.

Yes, that's right. There is no data showing that any of these twelve Macs arrived completely unscathed. The best confirmation I saw was the one that arrived with minimal damage, with the buyer reasonably acknowledging that these computers are brittle. I wouldn't be surprised if some of the other six were also damaged and feedback just hasn't been left yet. USPS takes a while on insurance claims.

No matter what, 50% of all shipments being damaged in eBay's currently-available history is a pretty bad statistic, and perfectly matches my own success rate as well. In 2013, I bought a Macintosh TV on eBay which was shipped to me, and the seller did a great job of wrapping it very thickly with bubble wrap and suspending it inside of a huge, sturdy box with packing peanuts filling every void in all directions. It arrived in pristine condition. On the other hand, I also received a Performa 550 more recently that was terribly packed in a flimsy box, with just a little bit of bubble wrap here and there, and it was very, very broken.

In 2013, the plastics in these machines were probably at least a little less brittle than they are today, so I don't even know if the success story of my Macintosh TV could be replicated nowadays.

Here is all the eBay feedback that I was able to find from the past couple of months. I guess you can treat this post as a collection of obituaries for several poor all-in-one Macs from the last few months that are unfortunately no longer with us. The purpose here is not to call out any specific sellers -- it's to raise awareness that these things really shouldn't be shipped, at least not without some kind of crazy reliable heavy-duty protection and a little bit of luck.

Okay, that first one wasn't really an obituary. The rest, though...

The Performa 575 pictured above is one I actually communicated with the seller about. I can vouch for the positive feedback of being very communicative and professional. They went above and beyond by kindly checking it to see if it had a special version of Apple Backup (it didn't). I mentioned the dangers of shipping these things, and it turned out I hadn't been the first person to mention it. The seller was pretty confident about having a good method for packing it. What it comes down to is I think shipping services are way rougher on packages than people realize.

eBay drama in that last one aside, this is a frequent enough issue that I felt the need to write about it. I think at some point we all need to use some common sense when dealing with computers like these. They're heavy, the plastic is fragile, and UPS/FedEx/USPS aren't exactly babying your packages on the way to the destination. If you are selling one of these machines, please think twice before deciding to ship it. And then, if you've decided that you want to ship it, reconsider your decision again just for good measure. I think you'll have a much more positive experience if you can find a local buyer.

Looking at it from the other perspective, if you're a buyer, consider that there's a very good chance that the computer is going to arrive in pieces, even if the seller thinks they're skilled at packing. These Macs haven't aged well, and shipping services are rough on packages. I know I learned my lesson.

How I fixed the infamous Basilisk II Windows "Black Screen" bug in 2013 https://www.downtowndougbrown.com/2025/05/how-i-fixed-the-infamous-basilisk-ii-windows-black-screen-bug-in-2013/ https://www.downtowndougbrown.com/2025/05/how-i-fixed-the-infamous-basilisk-ii-windows-black-screen-bug-in-2013/#comments Thu, 15 May 2025 06:08:40 +0000 https://www.downtowndougbrown.com/?p=11224 I've been noticing a lot of fun stories lately about bugs in old software that suddenly showed up in newer Windows versions. For example, here's an excellent writeup by Silent about a bug in Grand Theft Auto: San Andreas that laid dormant until Windows 11 24H2 came out. MattKC also recently posted a cool video about the massive project of decompiling LEGO Island, which also solved the mystery of the "exit glitch" that happened in newer versions of Windows. Nathan Baggs has also been at it again, fixing a modern compatibility issue with Sid Meier's Alpha Centauri this time.

I won't spoil these stories for you, but they all reminded me of a bug that I fixed twelve years ago in Basilisk II but never wrote about until now. Basilisk II is one of the more popular 68k Mac emulators, allowing you to run an old Mac system on your modern machine. Nowadays, you can even run it in your browser using Infinite Mac! Here's a screenshot of Basilisk II running on my Windows 10 machine.

The bug was: when you launched it, the emulated Mac would just sit there with a black screen rather than booting up. It didn't happen every time, which really confused everybody. The problem seemed to be way more common on newer Windows versions, which were Vista and 7 at the time, but people also occasionally saw it on XP too. It definitely failed most of the time for me with Windows 7. Nobody was seeing this issue on Mac OS X or Linux.

To re-familiarize myself with this bug for the purposes of writing this post, I downloaded the broken version from the Internet Archive and tried it out in some virtual machines. Windows 2000 and XP ran it without any trouble on the first try, but Vista and 7 didn't:

Basilisk II has a UI quirk that's really annoying in this particular situation: the close button doesn't work. You have to cleanly shut down the emulated machine in order to exit, which is impossible to do when you're stuck with a black screen. This functionality is useful to protect you from losing data in the emulator when it's working correctly, but it meant that whenever it started with a black screen, I had to go into the Task Manager, right-click on the process in the list, and choose "End Task". How irritating! It took me about 10 tries before I was able to convince it to run properly without the black screen. No wonder I had tortured myself with this bug fix back in 2013.

Back in the day, there were all kinds of interesting theories and solutions posted by users about this problem. One person blamed a Bluetooth-related "BTTray.exe" service. Someone else found that opening the hard disk image with HFVExplorer before running Basilisk would allow it to work. Another person observed that running it as Administrator fixed the issue. Compatibility Mode settings were also a common workaround. Somebody was even using Safe Mode to get around it. People had been complaining about this as far back as 2005. Given that there were so many differing explanations with varying success, it seemed likely that none of them could truly be the answer.

The only solution that worked for everyone was to revert to an outdated version of Basilisk II from 2001, known as "build 142". This was sometimes referred to as the pre-JIT version, because it came out before a just-in-time compiler was added to the 68k emulation to drastically improve performance. The old version worked fine, but it lacked all of the modern (at the time) improvements such as JIT.

Anyway, in 2013 I was also affected by this problem on my Windows 7 computer, and decided to take a stab at fixing it. This bug extermination tale isn't quite as epic as the three linked above, because Basilisk II is open-source. I had access to all the code to see what was going on. But still, even having the source code, I had no idea where to start looking. Why would the behavior randomly change between runs? Maybe an uninitialized variable? Why were modern Windows versions more likely to cause it to fail?

Since I was able to reproduce both a failure case and a success case, I added a bunch of debug trace output to Basilisk II. I wanted to see what was changing between a successful run and an unsuccessful run. I focused on the video code. Was video actually working internally and failing to be displayed in the window, or was something deeper screwing up the video and/or causing the emulated machine to fail to boot?

This iterative testing and debug tracing process revealed that redraw_func() in BasiliskII/src/SDL/video_sdl.cpp was periodically called during both failures and successes, but the display was only ever found to be "dirty" and needing an update in video_refresh_window_vosf() when video worked correctly. When the black screen bug was happening, the display was never dirty.

Of course, this led me to keep tracing backwards to try to figure out why the display wasn't being marked as dirty. I added more checks to see all the code that was running. My big discovery ended up being that SDL_monitor_desc::video_open() was only being called once if there was a black screen, but three times if the video was working. Going backwards from there, I found that this traced all the way back to VideoDriverControl() in BasiliskII/src/video.cpp. It was never being called during "black screen" runs.

The VideoDriverControl() function is special. It's only ever called from the 68k CPU emulator's opcode parsing! It's hooked up to CPU opcode 0x7119. There's a big table of opcodes in this same range relating to disk, CD, floppy, display, sound, external filesystem access, and much more, starting at 0x7100. The Basilisk II source code identifies these as "Extended opcodes (illegal moveq form)".

If you look at the Motorola M68000 Family Programmer's Reference Manual, sure enough, 0x71xx is a family of invalid MOVEQ instructions. Bits 15-8 being 0x71 make it look like a MOVEQ involving register D0, but bit 8 is supposed to be 0, so it's invalid -- the instruction format specifically says it should be 0.

You can see this for yourself in your favorite 68000-series disassembler. 0x7119 fails to disassemble, but 0x7019 decodes as MOVEQ #25, D0.

The whole point of this analysis is to show how Basilisk II cleverly uses this invalid range of instructions as its mechanism for communicating between the emulated CPU and the host machine. The CPU emulation looks for these invalid opcodes and calls various functions inside the codebase for handling disks, audio, displays, and stuff like that. In some ways, it's kind of similar to the A-line instruction mechanism that classic Mac programs use for communicating with the operating system, except it only works in this particular emulator.
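
Just to make the mechanism concrete, here's a rough sketch of how an interpreter can reserve an illegal opcode range for host callouts. This is not Basilisk II's actual dispatch code -- the handler names, the stub bodies, and the switch-based dispatch are simplified assumptions on my part -- but the idea is the same: the CPU core notices an undecodable 0x71xx opcode and hands it to a host-side routine instead of raising an illegal instruction exception.

#include <cstdint>
#include <cstdio>

// Host-side handlers that the extended opcodes map to. In the real emulator
// these are the routines for video, disk, sound, and so on; here they are
// just stubs for illustration.
static void VideoDriverControl() { std::puts("video driver control call"); }
static void RaiseIllegalInstruction(uint16_t op) { std::printf("illegal opcode 0x%04X\n", op); }

// Called by the CPU core when it fetches an opcode it cannot decode.
static void HandleUndecodableOpcode(uint16_t opcode)
{
    // 0x71xx is an invalid MOVEQ encoding (bit 8 must be 0), so the whole
    // range is free to be reused as emulator-specific "extended opcodes".
    if ((opcode & 0xFF00) == 0x7100) {
        switch (opcode) {
        case 0x7119:
            VideoDriverControl();          // display driver control
            break;
        // ... other extended opcodes: disk, CD-ROM, floppy, sound, Ethernet ...
        default:
            RaiseIllegalInstruction(opcode);
            break;
        }
    } else {
        RaiseIllegalInstruction(opcode);   // a genuinely illegal instruction
    }
}

int main()
{
    HandleUndecodableOpcode(0x7119);   // routed to the host video handler
    HandleUndecodableOpcode(0x4AFC);   // the official ILLEGAL instruction
    return 0;
}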

So in a successful boot with video, the 0x7119 instruction was being executed at some point by the emulated CPU. In a boot with a black screen it wasn't. In other words, the emulated machine itself was the source of the problem. Yikes!

Wait a minute. This doesn't make sense. How would the emulated machine even know to call such an instruction in the first place? I wasn't even supplying a boot disk to Basilisk II, so how could it possibly be loading a driver that executes Basilisk II's custom 0x7119 instruction?

This is where Basilisk II differs from other emulators like MAME. It's not trying to perfectly reproduce the way that a stock machine runs. Instead, it patches the ROM you supply (I used a Quadra/LC/Performa 630 ROM) so that it bypasses things that would cause crashes in the emulated environment, and also injects its own code to tell the emulated machine what to do for video, audio, keyboard and mouse input, and so on. So it does make sense after all.

This is all detailed in BasiliskII/src/rom_patches.cpp and BasiliskII/src/slot_rom.cpp. In particular, the part of the code I was dealing with involved the function InstallSlotROM(), which creates a declaration ROM containing two drivers: Display_Video_Apple_Basilisk and Network_Ethernet_Apple_BasiliskII. It sticks this DeclROM at the end of the Mac's ROM so that it'll be automatically detected at startup.

I verified that InstallSlotROM() was indeed being called, both during successes and failures. So the driver was definitely being added to the ROM. The problem was caused by something executing differently inside the emulated machine. When the video worked, the Display_Video_Apple_Basilisk driver was loading and running. When there was a black screen, it wasn't. Furthermore, it became apparent by looking at a CPU trace that during a black screen failure, the emulated machine was running fine otherwise! It just didn't have any video.

So why was the emulated machine often failing to load the driver in newer versions of Windows? Why would the version of Windows even matter for this? This is an emulator, for crying out loud. Shouldn't the internal state be the same every time I run it?

The big breakthrough for this problem came as I examined the InstallSlotROM() function in more detail, adding more debug output to try to discern differences. I noticed that whenever the black screen problem occurred, the value of the variable ROMBaseHost, used by InstallSlotROM(), looked much different than it did during successes.

Success values of ROMBaseHost    Failure (black screen) values of ROMBaseHost
0x04C90000                       0x02970000
0x04C40000                       0x02730000
0x04C80000                       0x02720000
0x04CA0000                       0x02710000
0x04C50000                       0x025C0000

That's odd. ROMBaseHost is the address from the host machine's perspective where the emulated machine's ROM lives. Why would the host address of the ROM even matter inside of the emulated machine? Was this just a coincidence? (Narrator: no, it wasn't.)

I looked at the code that allocated ROMBaseHost in the platform-specific Windows directory. First, it allocated space for the emulated RAM, and then allocated one megabyte for ROM:


// Create areas for Mac RAM and ROM
RAMBaseHost = (uint8 *)vm_acquire_mac(RAMSize);
ROMBaseHost = (uint8 *)vm_acquire_mac(0x100000);
if (RAMBaseHost == VM_MAP_FAILED || ROMBaseHost == VM_MAP_FAILED) {
	ErrorAlert(STR_NO_MEM_ERR);
	QuitEmulator();
}

vm_acquire_mac() is a function that goes through a few layers, but eventually it ends up calling VirtualAlloc() to do its job on Windows. The same section of code in the Unix port looked like this instead:


uint8 *ram_rom_area = (uint8 *)vm_acquire_mac(RAMSize + 0x100000);
if (ram_rom_area == VM_MAP_FAILED) {
	ErrorAlert(STR_NO_MEM_ERR);
	QuitEmulator();
}
RAMBaseHost = ram_rom_area;
ROMBaseHost = RAMBaseHost + RAMSize;

The difference is that this code allocates both RAM and ROM at the same time, rather than through two separate allocation calls. For a little more perspective here, in all of the test cases I listed above, regardless of success or failure, RAMBaseHost was somewhere in the range 0x3xxxxxx. Here are two examples:

              Success       Failure (black screen)
RAMBaseHost   0x03C90000    0x03B80000
ROMBaseHost   0x04C90000    0x02970000

Was it as simple as that? ROMBaseHost being below RAMBaseHost in the host machine's memory space prevented the emulated computer from loading the video driver? The equivalent Unix code prevented that situation from ever happening.

As it turns out, yes. That was the problem. My fix ended up being to port the Unix version of the code over to Windows.

I was slightly nervous that the separate vm_acquire_mac() allocations were an intentional thing on the Windows port, but as soon as I combined them into one, the black screen went away and everything worked perfectly every time.

To explain the fix in more detail, the individual calls to vm_acquire_mac() for allocating RAM and ROM meant that sometimes the address of ROM from the host's perspective was below RAM, and sometimes it was above RAM. It would fail whenever the ROM was below RAM. This was what caused the problem to be so random. It was also probably a decent explanation for why newer Windows versions seemed to experience the problem more often. My theory is that sometime around Vista, the behavior of Windows' memory allocator changed, and it became much more likely for the second allocation's address to be lower than the first. Experimentally, it seems like XP usually just kept going upward with addresses when running this code.
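
If you want to see this behavior for yourself, a quick way (a throwaway sketch I'm including for illustration, not anything from the Basilisk II source) is to mimic the two allocations and print where they land. On the Windows versions where the bug showed up, the second block apparently often came back at a lower address than the first:

#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T ram_size = 16 * 1024 * 1024;   // stand-in for the emulated Mac's RAM
    const SIZE_T rom_size = 0x100000;           // stand-in for the 1 MB ROM area

    void *ram = VirtualAlloc(NULL, ram_size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    void *rom = VirtualAlloc(NULL, rom_size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    std::printf("RAM at %p, ROM at %p\n", ram, rom);
    if (rom < ram)
        std::printf("ROM ended up below RAM -- this is the black screen scenario\n");

    return 0;
}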

What the heck though? Why would the host machine's address for the ROM even matter to begin with? Wouldn't the emulated computer have its own completely independent address space anyway?

Looking through some of the documentation included with the source code, you can see that Basilisk II has a few different addressing modes. I checked, and the Windows version uses DIRECT_ADDRESSING:

Emulated CPU, "direct" addressing (EMULATED_68K = 1, DIRECT_ADDRESSING = 1):
As in the virtual addressing mode, the 68k processor is emulated with the UAE CPU engine and two memory areas are set up for RAM and ROM. Mac RAM starts at address 0 for the emulated 68k, but it may start at a different address for the host CPU. Besides, the virtual memory areas seen by the emulated 68k are separated by exactly the same amount of bytes as the corresponding memory areas allocated on the host CPU. This means that address translation simply implies the addition of a constant offset (MEMBaseDiff). Therefore, the memory banks are no longer used and the memory access functions are replaced by inline memory accesses.

What this means is host addresses are easily translated to virtual addresses in the emulated machine by subtracting an offset (MEMBaseDiff), which is simply the same value as RAMBaseHost. And likewise, to convert from an emulator address to a host machine address, you add MEMBaseDiff instead. This effectively makes the RAM always mapped to virtual address 0, and the ROM ends up mapped in virtual address space at ROMBaseHost - RAMBaseHost.
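
Here's a small sketch of what that direct-addressing translation boils down to. The function and variable names are my own simplifications (Basilisk II has helpers along these lines, but don't rely on these exact signatures); the point is just that the "translation" is a single constant offset, which is why the relative placement of the two host allocations leaks straight into the emulated machine.

#include <cstdint>

// Host pointer to the start of the emulated Mac's RAM. In direct addressing
// mode, Mac address 0 corresponds to this host address, so MEMBaseDiff is
// simply the numeric value of RAMBaseHost.
static uint8_t *RAMBaseHost;
static uintptr_t MEMBaseDiff;   // set to (uintptr_t)RAMBaseHost at startup

// Emulated (Mac) address -> host pointer: add the constant offset.
static inline uint8_t *MacToHost(uint32_t mac_addr)
{
    return reinterpret_cast<uint8_t *>(MEMBaseDiff + mac_addr);
}

// Host pointer -> emulated (Mac) address: subtract the offset. If the host
// block sits *below* RAMBaseHost, this subtraction wraps around zero and the
// Mac sees a huge address -- exactly how a ROM allocated at 0x02970000 with
// RAM at 0x03B80000 ends up at 0xFEDF0000 inside the emulated machine.
static inline uint32_t HostToMac(const uint8_t *host_addr)
{
    return static_cast<uint32_t>(reinterpret_cast<uintptr_t>(host_addr) - MEMBaseDiff);
}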

I find this whole setup quite confusing, but I will admit that I'm not highly experienced in writing emulator code. I'm assuming the reasoning behind this setup, as opposed to just using "if" statements to check if a virtual address is inside of RAM or ROM, has something to do with performance. I didn't spend much time looking further into it. I did notice that the old pre-JIT version without the black screen bug didn't have this direct addressing mode, so that's why it didn't have this problem.

Let's think about what this offset subtraction means in the example success and failure scenarios I listed above. In the success case, RAMBaseHost was 0x03C90000 and ROMBaseHost was 0x04C90000. This means the virtual ROM address was 0x04C90000 - 0x03C90000 = 0x01000000. That result actually makes a whole lot of sense, because I had Basilisk II set up to use 16 MB of RAM, so Windows' allocator did exactly what you might expect and allocated the ROM directly after the RAM. This is also what my patch guaranteed the behavior would always be in the Windows version of Basilisk II going forward. Makes sense.

On the other hand, in the failure scenario, RAMBaseHost was 0x03B80000 and ROMBaseHost was 0x02970000. The subtraction to determine the virtual ROM address ended up wrapping around below 0: 0x02970000 - 0x3B80000 = 0xFEDF0000. So the ROM was being mapped to a really high virtual address inside of the emulated machine. Here is a visual aid to show what was happening:

That's definitely a big difference from the emulated machine's perspective. It still doesn't really explain why it failed in this case, though. It's just an address, right? The Mac's ROM is relocatable. Who cares if it's at 0xFEDF0000 instead of 0x01000000? I traced through the emulated CPU instructions and found where they began to differ. The problem was inside of the ROM's Slot Manager code. During failures, the ROM's virtual address was at 0xFExxxxxx, which is the standard slot space for slot E. On the other hand, when it succeeded, it was in slot 0 because of the high nibble being 0. What it comes down to is the ROM didn't expect itself to be mapped at 0xFExxxxxx, so the Slot Manager failed when attempting to load the DeclROM that Basilisk II placed at the end of the ROM.

Basically, it's risky to allow the Mac's ROM to be placed anywhere in the address space of the emulated machine, and newer versions of Windows just so happened to be allocating memory in a way that caused the emulated Mac's Slot Manager to dislike the virtual address of its ROM. This caused it to bail out when it should have been loading the video driver.

To really confirm that I had figured out the problem, I modified the Linux version of Basilisk II to force it to put the ROM below RAM, just like I was seeing when the Windows version didn't work. This made the video fail in Linux every single time, reproducing the bug exactly. And with that, I was confident that I had tracked down and eliminated the bug for good. Two days after I submitted the fix, my pull request was merged, and Basilisk II has worked great on Windows ever since then.

The funny thing is that this actually used to be a bug in the Unix version too, and it had already been fixed in 2005 -- I wasn't even the first one to track this problem down. The person who fixed it for Unix didn't apply the same fix to the Windows version. As I look back on this today, I'm realizing that when I fixed it in the Windows port in 2013, I forgot to update the corresponding deallocation code to only be a single call to vm_release(). Whoops! That little mistake is probably harmless, but I should submit another PR to fix it for consistency.

TL;DR: As usual, this compatibility issue with newer Windows versions was not the fault of Windows. If you call malloc (or something equivalent) twice in a row, you can't assume the second pointer you get back is going to be at a higher address than the first. The broken code in Basilisk II mostly got away with it on Windows XP, but Vista must have changed something pretty significantly under the hood.

I want to close this off by saying that looking at my pull request from 2013 -- only the second one I ever opened on GitHub, by the way -- embarrasses me a little bit. It's a nice reminder of just how green I was with Git at the time. I was apologizing in the PR for accidentally including a minor unrelated file naming case sensitivity fix, which is something I easily should have been able to separate into individual branches and PRs myself if needed, but I had no idea what I was doing. It's fun to have a window into the past to see how far I've come in the past twelve years! I wonder what the next decade will bring?

Apple's long-lost hidden recovery partition from 1994 has been found https://www.downtowndougbrown.com/2025/03/apples-long-lost-hidden-recovery-partition-from-1994-has-been-found/ https://www.downtowndougbrown.com/2025/03/apples-long-lost-hidden-recovery-partition-from-1994-has-been-found/#comments Sat, 15 Mar 2025 22:25:54 +0000 https://www.downtowndougbrown.com/?p=10477 In my last post about hard drives that go bad over time, I hinted at having rescued a lost piece of obscure Apple software history from an old 160 MB Conner hard drive that had its head stuck in the parked position. This post is going to be all about it. It's the tale of a tad bit of an obsession, what felt like a hopeless search, and how persistence eventually paid off. There's still an unsolved mystery too, so I'm hoping others will see this and help to fill in the blanks!

This whole saga starts with a very interesting blog post written by Pierre Dandumont in 2022. Pierre's (excellent) blog is in French -- Google does a good job of translating it for me. He found a quote in a book referring to special functionality bundled with Apple's Macintosh Performa 550 computer:

The LC 550's Secret Partition

If Apple's programmers, in creating the Performa series, were aiming to make idiot-proof computers, they were serious about it. The Performa 550 is an amazing case in point. When you run the included Apple Backup program (see Chapter 15), you get a little surprise that you didn't count on: a hidden partition on your hard drive!

This invisible chunk of hard drive space contains a miniature, invisible System Folder. Apple's internal memo explains it this way:

"When a system problem (one that prevents the Performa from booting) is detected, a [dialog box] informs the user of a system problem. The user can choose to fix the problem manually or to reinstall software from the backup partition's Mini System Folder."

If you choose to reinstall your System software, you get the wristwatch cursor for a moment while the miniature System Folder is silently copied to your main hard-drive partition. The Performa restarts from the restored hard drive, and the invisible system partition disappears once again.

We got a Performa team member to admit that this kind of sneaky save-the-users-from-themselves approach may well be adopted in other Performa models.

Who knows what goodness lurks in the hearts of men?

Cool! Although I have owned my own copy of this book for decades, I had no recollection of ever reading this little blurb. The book, if you're curious, is Macworld Mac Secrets by David Pogue and Joseph Schorr. I found this whole functionality very intriguing, particularly because I had what felt like a very personal connection to it: the very first Mac that my family had when I was growing up was a Performa 550. I don't think I have any pictures from back then, but in the meantime I've acquired one that looks exactly identical, so here's a (slightly blurry) view of the type of machine I'm talking about in this post:

I know that many people think the LC/Performa 5xx case style is ugly, but I really like it! I'm definitely biased though.

This is an early model manufactured in September of 1993, which came with a caddy-loading CD-ROM drive (AppleCD 300i). Like other Macs from the same era, newer versions from 1994 came with a tray-loading drive instead (AppleCD 300i Plus). For comparison, here's a photo of a late-model Performa 550 with a manufacture date of March 1994 that re4mat kindly gave me permission to share here:

Pierre asked me if I had a copy of Apple's software restoration CD for the Performa 550, and if I knew how to get it working in an emulator in order to try out this special functionality. I pointed him to a download link for the Performa CD for the 500 Series, version 7.1P6:

If you weren't using multimedia computers in the early 1990s, you might not recognize the weird rectangular container that this CD is enclosed inside of. It's a CD caddy, and it's what was used for inserting CDs into computers like the first one pictured above. You would open the caddy by squeezing the top right and bottom right ends toward each other, stick the disc into it, close it, and then push it into the slot in the computer, similarly to how you would insert a floppy disk. I really don't miss these things one bit!

Back to the story, though. I also gave Pierre some tips for using the restore CD in an emulator. Nowadays, my advice is outdated because it's much easier to use Apple restore CDs in at least one emulator -- MAME has come a long way in the last few years. He figured out a bunch more stuff on his own after that, including trying it in his own Performa 450 (not 550), but the bottom line was that the recovery partition was nowhere to be found.

Well, sort of. He found that the process of restoring from the CD actually did create a recovery partition. Here's a screenshot of the partitioning from inside of Apple HD SC Setup while booted from the Performa CD, after formatting the hard drive by clicking the Initialize button in the main window:

As you can see, there's a 2,560 KB partition of type Apple_Recovery almost at the end of the drive, just after the main partition named "Hard Disk". This was promising at first glance, but the partition was empty! Further testing revealed that the custom Performa-specific version of Apple HD SC Setup (7.2.2P6) bundled on the CD was responsible for creating it, but didn't actually populate it with any data. Apple Backup also didn't put anything onto the partition, despite what the book said. I even looked through my past disassemblies of the Apple Backup and Apple Restore code and confirmed that there was nothing related to creating a recovery partition.

The conclusion at the time was that someone needed to get ahold of a Performa 550 that still had its original hard drive and had never been reformatted. That's where this story sat for 3 years.

A few months ago, I remembered this whole situation and decided that I really wanted to try to find this partition. After all, the clock had always been ticking. The longer we waited, the fewer and fewer original Performa 550s would be out there in the wild. Not to mention that hard drives go bad and people throw them out without knowing that it's usually possible to recover data from drives of this era. I confirmed all of Pierre's findings in MAME. I even tried using Apple Backup in case I missed something, but no, it didn't do anything with the hidden recovery partition. An easy way to look at it is to manually edit the partition table in a hex editor and change the type from Apple_Recovery to Apple_HFS.

After doing this and booting up, I found another hard drive icon on my desktop called Recovery Volume, but it was empty, just like Pierre said:

Taking it a bit further, I tried recreating the recovery functionality myself. I copied a minimal system folder to the Recovery Volume, and then changed its type back to Apple_Recovery. This made it invisible again. Then I screwed up my main system folder and rebooted. Sure enough, it automatically came up with the Recovery Volume as the main boot volume.

This proved that the mechanism for booting from the recovery partition worked; we were just missing the data that was supposed to be on it. I came to the same conclusion that Pierre had already reached: we needed to find a Performa 550 that had never been reformatted. In the meantime, I spent some time digging into archives of Apple's old tech notes and found several more references to this functionality.

System 7: Performa Versions Compared (9/95) -- the first bullet point under System Software Version 7.1P6 refers to this feature:

Backup Partition Software-automatically detects corrupted system folders. When a bad System Folder is detected, the user is given the option to re-load another System Folder into their system.

Performa 550: Description of Backup Partition (3/94) -- this note is clearly the "internal memo" that Macworld Mac Secrets was quoting. Some interesting excerpts from this article:

The Apple Backup application creates a backup recovery partition that allows the Performa to boot even when the System Software on the main hard drive has been corrupted. The partition is invisible to the user.

There is no built-in limit to the number of times the backup partition can be used. However, the partition will be lost if the hard drive is re-formatted. At this time the backup partition is used only on the Performa 550.

Performa 550: System Folder Created w/ Dinosaur Safari CD (8/94) -- not that I needed any more proof of the recovery partition's existence at this point, but I got a kick out of this one. It talks about how launching an educational game about dinosaurs accidentally caused the system to go into recovery mode. It provided a little more info about what would happen when the recovery dialog popped up:

When I launch the Dinosaur Safari CD from Creative Multimedia, a dialog box appears telling me that my Performa computer is having trouble starting up. I only have two options Shutdown or Continue? Why?

After reading these articles, I was very convinced that the recovery partition was a real thing that existed, but I was also pretty confident that Apple Backup wasn't responsible for creating it, despite Apple claiming otherwise. I had already seen that the special build of Apple HD SC Setup was what actually created it, and plus, like I said earlier, I had looked closely into a disassembly of the version of Apple Backup supplied with the Performa 500 series restore CD. There was nothing that copied any files to another partition on the hard drive, at least not that I could see.

Really, the most important thing I gained from this exercise was that the second tech note confirmed the need to find a Performa 550 that had never been reformatted. Also, if the first tech note was to be believed, it needed to have come with System 7.1P6. This could narrow the search even further -- I know for a fact that earlier Performa 550 models came with 7.1P5, including my childhood one. The same tech note also pointed out that 7.1P6 was the first version to support the "AppleCD 300+", which is referring to the tray-loading CD-ROM drive. Based on this information, it's reasonable to deduce that all Performa 550s with a tray-loading CD-ROM drive would probably have originally come with at least System 7.1P6.

There was only one thing left to try at this point: asking the Internet for help. I asked people everywhere I could think of: Tinker Different, 68kMLA (where Pierre had already asked), and various social media sites. I searched Reddit and found people who had posted in the past about having a 550, asking if they still had the hard drive. I think I scared some of them -- at least one person deleted their post after I asked! To be honest, I can't blame them. I can imagine how freaky it would be to hear from someone begging to look at my hard drive's contents. I'm sure some people might think of it as crossing a line, but it's not as crazy of an ask if it's a machine they've received second-hand from someone else. Plus, I was very clear about exactly what I was looking for (and why).

I asked a seller of a Performa 550 that had been sitting on eBay for a long time if they would be willing to sell me the hard drive separately. They weren't interested. I even bought some random hard drives on eBay that definitely went with a 5xx-style case. These were easy to identify because this case style uses a unique adapter for plugging the drive into the chassis wiring harness when you slide it into place.

What do I have to show for all of these eBay purchases? Well, after dumping them all with my ZuluSCSI in initiator mode, I can say that the one pictured above came from a Macintosh TV. I also found another one from an LC 575. Lastly, I bought yet another drive that the seller said came from a Performa 577. The Performa 577 one was funny -- it had all the Mac mounting hardware on it, but when I dumped it, it turned out to be from an Atari TT or Falcon (not sure which). I'd love to hear the story of how it ended up with an LC 5xx drive sled and adapter on it! Needless to say, none of them had the elusive recovery partition. One particularly friendly eBay seller was even nice enough to show me a preview of a drive's contents in HFSExplorer, which helped me determine that it wasn't from a Performa.

I almost began questioning my sanity at one point during this search. Multiple people initially told me that they thought I was confused about this whole thing. I pointed them toward Apple's tech notes describing it. Were Pierre and I imagining this whole thing? Were Apple's tech notes all a lie?

The thing is, this whole functionality was super obscure. It's understandable that people weren't familiar with it. Apple publicly stated it was only included with this one specific Performa model. Their own documentation also said that it would be lost if you reformatted the hard drive. It was hiding in the background, so nobody really knew it was there, let alone thought about saving it. Also, I can say that the first thing a lot of people do when they obtain a classic computer is erase it in order to restore it to the factory state. Little did anyone know, if they reformatted the hard drive on a Performa 550, they could have been wiping out rare data that hadn't been preserved!

Someone who saw my post on Reddit mentioned that they had a Performa 550 and would check it out. It was a newer tray-loading model with a January 1994 manufacture date. Unfortunately, the Conner hard drive inside of it wouldn't cooperate, and plus this person didn't have anything capable of dumping the contents. Luckily for me though, they were totally comfortable with letting me borrow the drive and try to recover the data from it.

To tie everything together, we have now reached the point in this story that I covered in my last post about hard drives with stuck heads. As I mentioned in that blog, I could not get this drive to do anything. It would just spin up, sit there for a while, spin down, and then make an annoying buzzing sound for a while, repeating that whole process over and over again.

I tried all kinds of things. I nudged the head while the platters were spinning, inspected it with my thermal camera to see if any components were getting hot, and tried it at different temperatures -- cold shortly after it arrived, and at room temperature later. The only thing I noticed was that when it was making the buzzing sound, one of the IRFD123 MOSFETs would get much hotter than normal: up near 100 degrees Celsius.

I wasn't really sure what to do with this information though. It just seemed wrong that the head wasn't moving at all. That's when I finally decided to inspect everything further inside the drive and noticed the head stack seemed like it was sticking to a rubber/plastic looking piece. The Kapton tape trick I figured out and showed off in the last post finally allowed me to dump the drive contents. If you didn't catch it last time, here's a video showing how it was stuck, along with a successful dump with the help of the tape:

As soon as the drive imaging process completed, I powered everything off and anxiously opened the hard drive image file with my favorite hex editor (HxD):

Boom! This drive had a recovery partition on it! Now, that didn't necessarily mean anything. After all, I had already seen an empty partition created by Apple HD SC Setup on the Performa CD. Still, though, it was definitely promising. Here's an interpretation of the data at the beginning of the entry in the partition table:

50 4D = PM = Signature
00 00 = Padding
00 00 00 05 = 5 total partitions on the drive
00 04 E2 60 = starting physical block of the partition (0x4E260 blocks = 0x9C4C000 bytes)
00 00 14 00 = size of partition in blocks (0x1400 blocks = 0x280000 bytes = 2560 kilobytes)
name = MacOS
type = Apple_Recovery

Also, just like in the partition table created by the Performa CD that I had inspected earlier, there were four bytes "msjy" at an offset of 0x9C bytes into the partition table entry. No other partitions had any data at 0x9C. I wonder if these are a couple of developers' initials hiding in there or something? Is it an acronym? "Make Steve Jobs Yodel"? I even asked ChatGPT to come up with a playful interpretation in the context of Macs in the mid-1990s. It suggested "My System Jammed Yesterday", explaining it as a playful nod to the "chaotic charm" of the era's extension conflicts and Sad Mac screens. I didn't even mention anything about it involving OS recovery. Tell me how you really feel about old Macs, ChatGPT!
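
For reference, the fields above line up with the classic Apple Partition Map entry layout, where each partition gets one 512-byte entry and every multi-byte field is big-endian on disk. Here's a sketch of that layout in C++ -- my reconstruction using the traditional pmXxx field names, so treat the per-field comments (and the note about 0x9C) as annotations tied to this dump rather than gospel. Offset 0x9C, where "msjy" lives, falls inside the reserved pad area at the end of the entry, which is why ordinary partitions have nothing there.

#include <cstdint>

// Classic Apple Partition Map entry (one 512-byte block per partition).
// Offsets from the start of the entry are shown in the comments.
struct ApplePartitionMapEntry {
    uint16_t pmSig;           // 0x00: always 0x504D ('PM')
    uint16_t pmSigPad;        // 0x02: reserved
    uint32_t pmMapBlkCnt;     // 0x04: number of entries in the map (5 on this drive)
    uint32_t pmPyPartStart;   // 0x08: first physical block (0x0004E260 here)
    uint32_t pmPartBlkCnt;    // 0x0C: size in blocks (0x00001400 = 2,560 KB here)
    char     pmPartName[32];  // 0x10: partition name ("MacOS")
    char     pmParType[32];   // 0x30: partition type ("Apple_Recovery")
    uint32_t pmLgDataStart;   // 0x50: first logical block of the data area
    uint32_t pmDataCnt;       // 0x54: number of blocks in the data area
    uint32_t pmPartStatus;    // 0x58: status flags
    uint32_t pmLgBootStart;   // 0x5C: first logical block of boot code
    uint32_t pmBootSize;      // 0x60: size of boot code in bytes
    uint32_t pmBootAddr;      // 0x64: boot code load address
    uint32_t pmBootAddr2;     // 0x68: reserved
    uint32_t pmBootEntry;     // 0x6C: boot code entry point
    uint32_t pmBootEntry2;    // 0x70: reserved
    uint32_t pmBootCksum;     // 0x74: boot code checksum
    char     pmProcessor[16]; // 0x78: processor type string
    uint16_t pmPad[188];      // 0x88: reserved -- offset 0x9C ("msjy") lands in here
};

static_assert(sizeof(ApplePartitionMapEntry) == 512, "entry should fill one block");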

Knowing that the partition was there, the next step was to look near the end of the dumped drive image in HxD. If the partition had any actual data stored, it would be very obvious because starting at 0x9C4C000 in the file, there would be actual data and not just a bunch of zeros.

This is where I started to actually get excited. The partition contained boot blocks! This was obvious because of the starting signature of LK and all of the various system file names plainly visible. On the other hand, the recovery partition created by the Performa CD during testing had zeros at this location -- no boot blocks.

These boot blocks are identical to the main partition's boot blocks, except for one very important difference: at 0x1A, the Pascal string containing the Finder name is "recovery" instead of "Finder" like you'd normally see. This means that if you boot from this partition, it will load a program named recovery instead of the usual Finder app you'd expect on most Mac OS installs.
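
For context, here's roughly what the start of those boot blocks looks like -- a sketch reconstructed from documentation on the classic boot block header, not something pulled out of the dump itself. Each name field is a Pascal-style string: one length byte followed by up to 15 characters. The field at offset 0x1A is the "shell" name that gets launched after the System file loads, and that's the one this recovery partition points at "recovery" instead of "Finder".

#include <cstdint>

// Start of the classic Mac HFS boot block header (the first of the volume's
// two 512-byte boot blocks). Offsets are from the start of the block.
#pragma pack(push, 1)
struct BootBlockHeader {
    uint16_t bbID;            // 0x00: signature, 0x4C4B ('LK')
    uint32_t bbEntry;         // 0x02: entry point (a BRA into the boot code)
    uint16_t bbVersion;       // 0x06: boot block version/flags
    uint16_t bbPageFlags;     // 0x08: used internally
    uint8_t  bbSysName[16];   // 0x0A: Pascal string, normally "System"
    uint8_t  bbShellName[16]; // 0x1A: Pascal string, normally "Finder" --
                              //       on the recovery partition it's "recovery"
    uint8_t  bbDbg1Name[16];  // 0x2A: Pascal string, first debugger (e.g. "Macsbug")
    // ... more name fields and startup parameters follow ...
};
#pragma pack(pop)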

This was definitely something special that the restore CD was not capable of recreating. As I scrolled further down through the partition, it quickly became obvious that it actually had some files!

Okay, now I was totally stoked! I booted up a copy of the imaged drive in MAME and immediately noticed that there was evidence that the recovery partition had definitely activated itself on this machine in the past: there was a folder named Mini System Folder on the desktop with a creation date in 2004, and the trash contained an app called Read Me Mini System Folder with the exact same date.

I wanted to experience the automatic OS recovery process for myself without any customizations from the original owner of the machine this hard drive came from, so I used HxD to copy the entire 2,560 KB recovery partition onto the fresh hard drive image I had created by restoring from the Performa CD. This was easy because the Performa version of Apple HD SC Setup had created an empty recovery partition with the exact same size. Then I booted it up in MAME and dragged the System file out of my System Folder in order to intentionally mess it up. I had to turn off System Folder Protection in the Performa control panel first:

This is the classic kind of mistake that would have normally left you with an unbootable system showing a floppy disk icon with a flashing question mark. Would Apple's automatic Performa OS recovery save me from myself? I rebooted to see what would happen. Instead of seeing a flashing question mark, I saw a Happy Mac very briefly before the system rebooted itself again. Then another Happy Mac showed up, and this time, it looked like a normal boot, except no extension icons showed up at the bottom of the screen. It was definitely booting from the recovery partition. Eventually, I was greeted with this screen:

Hooray! This was exactly the dialog box that Macworld Mac Secrets and Apple's tech note had referred to. The recovery partition had been successfully rescued!

Let's walk through the rest of this feature. If you click Shut Down, obviously the machine turns itself off. But when it boots back up, the recovery partition doesn't automatically kick in anymore. So you're on your own to fix the problem by booting from the Performa CD or the Utilities floppy disk.

On the other hand, clicking OK does exactly what the tech note describes. You get the wristwatch cursor for a few seconds, the system reboots, and then you are greeted with this amazing screen, complete with an ugly yellow desktop pattern. Shall we call it the yellow screen of shame? Notice that the Mini System Folder on the desktop is the active System Folder, because it has the special icon.

Here are the rest of the pages in this Read Me Mini System Folder app:

Aha! So it's not entirely automatic, since you still have to manually drag the System, Finder, and System Enablers from the Mini System Folder back to your original System Folder. Still though, it's a very handy solution that gives you a bootable machine when something goes wrong with your OS.

If you just ignore these instructions and keep using the computer, you will be nagged with this Read Me on every boot because it lives inside the Startup Items folder of the Mini System Folder. The Read Me also appears on your desktop, but for some reason it doesn't show up until you open the Hard Disk icon.

Let's take a deeper look at how it all works by temporarily changing the partition type to Apple_HFS instead of Apple_Recovery and booting up again, so we can inspect the files. After a quick automatic rebuild of the desktop file, the Recovery Volume appears, with actual contents this time!

Inside of the System Folder, there are definitely some interesting things. As expected based on the earlier analysis of the boot blocks, there is an app named "recovery" that contains all of the interesting stuff. The icons are kind of arranged willy-nilly in here.

The creator code of the recovery app is msjy -- the exact same magic value we saw in the partition table entry.

Scrolling further down, there is a System file and various enablers. Everything is marked as being part of System Software v7.1P6.

It's interesting to me that although this recovery partition was only available on the 550, it still has a bunch of enablers for other Performa models: the 45x/46x, 47x/57x, and 600. I guess that's not too crazy considering all of these exact same enablers are included with a fresh copy of System 7.1P6 installed using the Performa CD.

As a quick detour, System Enabler 316 is an interesting one that is hard to find info about on the Internet. I inspected its 'gbly' resource and determined that it's for the Centris 610, Centris 650, and Quadra 800. It's an older version of the enabler created before the speed-bumped Quadra 610 and Quadra 650 were a thing. I wonder if there was a plan at some point to have a Performa model based on one of those machines? If I had to guess, maybe it would have been a 68040-based successor to the Performa 600, which uses the same case style as the Centris 650. The Performa 650?

Let's not get too far off track. Back to the Recovery Volume's System Folder -- as expected, the Startup Items folder contains the Read Me application:

Everything started to become clear. The recovery app was marked as the startup application instead of the Finder. It displayed the dialog giving the user the option to recover. If they clicked OK, it would copy the entire System Folder from the Recovery Volume, omitting itself, to the Desktop Folder of the main hard drive partition. Then, it would "bless" the newly-copied mini System Folder and reboot.

How did all this stuff get into the partition? Did Apple Backup do it, or was it factory-programmed data? I tried to see if I could deduce anything from the dates of the files. In order to preserve the integrity of all of the displayed dates, I performed this analysis with a read-only copy of the original drive image in order to prevent any modification dates from being updated.

All of the files in the partition have a creation date of March 4, 1994 -- over 31 years ago! Most of the files have a matching modification date, except for the System suitcase, which was last modified on September 26, 1994. I don't know exactly what this all means, considering it came from a machine with a January 1994 manufacture date.

The Recovery Volume itself also has a creation date of March 4th, just five minutes before the creation date of all the files. Interestingly, the modification date of the volume is still shown as March 4th in the Get Info window, even though the System suitcase was modified later in September of that year.

The Master Directory Block of the Recovery Volume says the modification date (drLsMod) is September 26th, matching when the System file was changed. I'm not sure what causes this discrepancy. I guess the date displayed in the Get Info window isn't simply the date stored in the Master Directory Block.

Similarly, although the main hard drive partition has a creation date of December 5, 1993 according to the Master Directory Block, the Get Info window says it was created on February 3, 1994. I'm not sure which one is more accurate. Either way, it's pretty clear this drive had not been reformatted. I did find it curious that the recovery partition was created over a month later, though. When you reformat a hard drive using the special version of Apple HD SC Setup on the Performa CD, the recovery partition ends up with a creation date about a minute after the main partition.

The Finder and System Enablers in the recovery partition are identical to the same stock files from a 7.1P6 restore. The only difference I could find in the System file was that the recovery partition's version was missing a single At Ease 'INIT' resource, but the At Ease Startup extension automatically adds it to the System file after you reboot. This leaves you with a System file totally identical to what is restored from the Performa CD. I find it odd that At Ease was stripped out, but the American Heritage Dictionary 'FKEY' resource was not.

The best theory that I can come up with is that Apple Backup really was responsible for creating this partition. After all, Apple went out of their way to specifically mention it in their tech note. Maybe March 4th, 1994 was the date when the original owner of the computer backed it up for the first time. September 26th could have been the last time that Apple Backup was run. Perhaps the owner completely uninstalled At Ease from the computer between March and September, so the System file had been changed and the recovery copy needed to be updated accordingly? Unfortunately, most of the Performa-specific software had been deleted from this computer. It was still running System 7.1P6, but Apple Backup was nowhere to be found. So I wasn't able to confirm whether or not a mysterious, unpreserved newer version of Apple Backup was really responsible for populating the partition.

The other theory floating around in my head is that maybe it came from the factory like this. The March 1994 timeline is consistent with the date of the tech note describing the functionality, so maybe that's when Apple created it and started bundling it. I don't know how long the machines sat at Apple's factory before they were actually sold -- does a manufacture date of January 1994 also mean it was shipped to a store in January 1994? Either way, I definitely don't know how to explain the September 26th, 1994 modification date. Maybe a third-party utility did something to the System file on the secondary partition? The first Apple Backup theory seems like the more likely explanation, especially given that Apple said that's how it was created.

This whole question is the last piece of the puzzle that hasn't been solved yet. If anyone else has a Performa 550 and would be willing to dump their hard drive or at least look at Apple Backup, I'd be very interested in finding out A) if it has the recovery partition and B) if there was a special newer version of Apple Backup that didn't make its way onto the Performa CD. I searched for various strings that show up in the "recovery" and "Read Me Mini System Folder" apps, and they aren't anywhere on the Performa CD. I guess they could be stored compressed somewhere, but I'm pretty confident based on the actual Apple Backup code that nothing is hiding in there. Here are the various versions (with their exact sizes and dates) of Apple Backup that I have seen on Performa 550 installations. None of these have the recovery partition creation built in:

I also found version 1.3 (June 15, 1994, 163,388 bytes used) by restoring from a Performa 636 restore CD. It, too, does not contain any recovery partition code.

For a demo, I thought it would be fun to replicate the problem that the Apple tech note mentioned about the Dinosaur Safari CD inadvertently activating the recovery partition, so I bought a copy to test it out. To make it even more interesting, I decided to run this test on real hardware. I'm leaning toward believing that a lot of the older caddy-loading models (possibly all of them) didn't have this recovery partition, so just pretend it's a newer model that came with System 7.1P6. I copied the recovery partition onto a real Apple-branded IBM 160 MB SCSI hard drive using ZuluSCSI's USB MSC initiator mode, which allows it to act as a USB-to-SCSI bridge. Sorry about the flickery screen; I couldn't get my phone camera's shutter speed to sync up perfectly with the display's refresh rate.

Sure enough, when I opened the game from the CD, the computer did exactly what Apple's tech note said it would do. The workaround of copying the application to my hard drive worked just fine. If it's not obvious, I sped up the process of copying it to the hard drive -- it took a while! It might be interesting someday to look into why this game accidentally activated the OS recovery, but this blog is already getting way too long!

I want to talk a little more about the yellow screen of shame. When I first saw it, I wasn't entirely sure if it was really part of the recovery functionality or if the original owner just had terrible taste.

Digging deeper, I found three clues that all made it clear it was an intentional choice by Apple to really make it obvious that something was wrong. First, the yellow pattern is stored as a 'ppat' resource in the recovery app.

Second, the System file in the recovery partition has the default blue-gray Performa background shown in the screenshots above. This makes sense, because it's the pattern that showed up with the dialog about the Performa having trouble starting up.

And lastly, page 3 of the Read Me app implies that something may have changed your desktop pattern.

So clearly, the recovery process, by design, sets up the custom yellow background.

Why did I care so much about finding this lost partition? Well, there are a number of reasons. For one, this is exactly the kind of research project that's perfect for me because I don't know how to let things go. It's also something that, quite frankly, needed to be preserved before it became extinct. The most important reason, though, is that this functionality is historically significant and deserves some attention. How many personal computers in 1994 still had the ability to boot after the OS was trashed? Isn't this an extremely early example of this type of functionality? Did Windows have anything like this prior to Vista? Did the Mac have anything else like this prior to sometime in the OS X era? I'd love to hear your thoughts on this in the comments. I admittedly don't know a ton about older machines that weren't Macs.

I'm not saying this feature is perfect. Since we've already seen that the Dinosaur Safari CD was able to accidentally activate it, I wouldn't be surprised if there were other ways to inadvertently cause it to pop up too. It also required manual intervention after the recovery process, which meant that you needed a fair amount of computer knowledge to finish fixing your OS. The average Joe Schmoe would probably have trouble following these directions to fix the System Folder. But still, it leaves you with a bootable system instead of an unusable computer with a flashing question mark. It's very cool, especially for 1994.

I wonder why Apple didn't continue down this path with subsequent models, or even retroactively add the functionality to earlier ones after a fresh install of a newer OS. I'm not aware of any other Macs that have this partition. It doesn't depend on any special ROM support or anything like that, at least as far as I can see. I tried out the recovery functionality on several other machines: a IIci, LC, LC 475, and an emulated Performa 600, and it works great on all of them. Heck, it even works on the Classic II/Performa 200!

It kind of looks like the window size of the Read Me app was a calculated decision to ensure it would fit on the 512x342 screen used in black-and-white compact Macs.

Thinking about later models, the Performa 630 series used an internal IDE hard drive instead of SCSI, so the custom version of Apple HD SC Setup was no longer used. I wonder if the Performa 57x series had this partition? You'd think they would have had the exact same software bundle as the tray-loading 550 models. If any readers have a Performa 57x machine, I'd greatly appreciate it if you could check!

How did this functionality actually work under the hood? I haven't gone too deep into the code (maybe it can be a future post), but I have pieced together a few clues. The "msjy" magic number I talked about earlier definitely plays a part in everything. The special Performa version of Apple HD SC Setup also includes a custom version of Apple's hard disk driver. This driver contains several references to msjy, so I'm pretty sure that's what it uses to identify the recovery partition.

I also discovered that the 7.1P4 and 7.1P5 Utilities floppy disks, which were bundled with various Performas, have slightly older custom versions of Apple HD SC Setup: 7.2.1P and 7.2.2P respectively. They also create the recovery partition. The interesting thing about these versions is that Apple appears to have forgotten to strip out the debug function names, in both the utility itself and the bundled hard disk driver. They didn't make this mistake in the original non-Performa 7.2.2 version, and they also didn't make the mistake in the newer 7.2.2P6 version. Anyway, this is kind of cool, because it tells me the names of functions that look for "msjy" at an offset of 0x9C. Function names in that same area of the driver code include: recvrybootable, confirmminsystem, flushrecoveryflag, recoveryvolexists, and setrecoveryflags. So Apple did release at least some of the recovery functionality to the public prior to 7.1P6, despite what their own version history says. And the disk driver is definitely involved in it.
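If you want to hunt for the magic number in a drive image yourself, a rough Python sketch like the one below will do it. Keep in mind that exactly which structure the real driver checks at offset 0x9C is still an assumption on my part, so this just brute-forces every 512-byte block:

# Rough sketch: scan a raw disk image for 512-byte blocks that contain the
# "msjy" magic at offset 0x9C. Where Apple's driver actually looks (a partition
# map entry? a boot block?) is an assumption -- I haven't traced it that far.
import sys

MAGIC = b"msjy"
OFFSET = 0x9C
BLOCK_SIZE = 512

def find_magic(path):
    hits = []
    with open(path, "rb") as f:
        block_num = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < OFFSET + len(MAGIC):
                break
            if block[OFFSET:OFFSET + len(MAGIC)] == MAGIC:
                hits.append(block_num)
            block_num += 1
    return hits

if __name__ == "__main__":
    for blk in find_magic(sys.argv[1]):
        print(f"magic found in block {blk} (byte offset {blk * BLOCK_SIZE + OFFSET:#x})")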

Newer versions of Apple's disk driver no longer contain the magic number, so at some point they must have abandoned this functionality. In my opinion, it's a real shame that they ditched it -- this could have been very useful going forward on all Macs. They could have even expanded on it and automated more of the recovery process. Sure, it used some of your hard drive space, but it could have been a good trade-off for better reliability.

That's more than enough technical stuff for one post. I am sharing a download link where you can try this functionality out for yourself if you want. After all, the whole reason I did this was for software preservation purposes, so it makes sense to share it with the world. This is a small piece of Apple software history that, to my knowledge, has not been preserved until now. I uploaded a drive image to the Macintosh Garden. Don't worry, I didn't include any of the original owner's personal data. I started fresh with a blank hard drive image, restored it using the 7.1P6 Performa CD, and then copied over only the recovery partition from the dumped hard drive. So this is a factory-fresh Performa 550 7.1P6 install with the recovery partition also present and populated.

The MAME command that I use to boot from this disk image is:

mame maclc550 -scsi:0 harddisk -harddisk1 Performa550.hda -window -nomaximize -ramsize 32M

Of course, you can also test it out on a real machine by copying the hard drive image to a ZuluSCSI or BlueSCSI and naming it something like HD00.hda.

Winding down this super long post now, the main lessons I learned from this research project are:

  1. If you get your hands on a vintage computer, strongly consider backing the hard drive up before erasing it. I know it might contain someone's personal files, so be mindful of that, and of course respect their privacy. But there might still be something hiding in the background that has been lost to time. You never know -- it happened here!
  2. The fact that many hard drives go bad as they age might actually be a good thing for software preservation. If a vintage computer's hard drive has a stuck head that can easily be bypassed, someone might sell it as non-working with data intact, rather than erasing it and selling it as "fully tested and wiped".
  3. There are some really awesome people out there in the world!

Special thanks to Pierre for discovering that this functionality even existed in the first place, and getting the word out so we could eventually preserve it. I also have to thank David Pogue and Joseph Schorr for writing about it in their book many decades ago. And of course, huge thanks to the amazing person from Reddit (who asked not to be credited) for giving me the opportunity to borrow and repair the drive that ended up containing the lost partition. You're seriously the best!

I'm going to repeat this again in case anybody has scrolled all the way to the end. There are still missing pieces of knowledge about how exactly this recovery partition would have been originally created. If you happen to have a Performa 550 with its original hard drive and wouldn't mind checking for the partition and/or a special version of Apple Backup, please let me know! I would be happy to walk anybody through the process of dumping the drive contents. I'll send you something if you don't have the equipment needed to dump a hard drive. Even if the machine has been upgraded to System 7.5 or Mac OS 7.6, it's still fine -- everything could very well still be there, lurking in the background.

The gooey rubber that's slowly ruining old hard drives https://www.downtowndougbrown.com/2025/03/the-gooey-rubber-thats-slowly-ruining-old-hard-drives/ https://www.downtowndougbrown.com/2025/03/the-gooey-rubber-thats-slowly-ruining-old-hard-drives/#comments Sun, 02 Mar 2025 19:02:51 +0000 https://www.downtowndougbrown.com/?p=10245 As part of my work toward an upcoming post about a lost piece of very obscure Mac history that has finally been found, I've been playing around with old Apple-branded SCSI hard drives made by Quantum and Conner in the 1990s. What I'm about to describe is already common knowledge in the vintage computing world, but I thought it would be fun to share my take on it anyway.

What I'm talking about is how a lot of these hard drives just refuse to work anymore. This is very common with old Quantum ProDrive models, like the LPS or the ELS. The drive spins up, you don't hear the expected pattern of click sounds at startup, and then after a few seconds, it spins back down.

This Conner CP30175E drive has a similar problem, but it tries over and over again, playing a tone through the voice coil (I think) in between attempts.

These particular hard drives shown in the videos are both about 160 MB in capacity and were commonly used in computers during the early-to-mid 1990s. You can see they have Apple stickers, so they are definitely stock drives originally from a Mac. At least in the case of Quantum, there are a bunch of different capacities all affected by this same problem, ranging from 40 MB to 500 MB. Maybe even more than that -- those are just the ones I'm aware of. I'm less familiar with Conner, but I wouldn't be surprised if they similarly had a whole family of affected drives.

What's causing this issue? Let's open them up and find out.

There's a common misconception in the computer world that as soon as you open a hard drive and expose it to a single particle of dust, you've completely destroyed it and it will never work again. Now to be fair, with many modern, higher-density drives it's probably true -- some of them are even sealed with helium inside -- but older hard drives like the ones I've shown above are remarkably tolerant of being opened. That's not to say I would leave it operating without the cover for an extended period of time, but for quick data recovery purposes in a decently clean environment, it's fine.

Anyway, with the cover off, let's take a look at what the drive is doing:

The platters spin up, but the head doesn't move at all. Here are a few attempts at gently moving the head by hand with the drive powered off, and then powering it on. Would this fix it?

Nope, it didn't. The head would always snap right back to the center. Note that I had to unlatch the head before I could move it. That's what I was pointing out with the screwdriver. This is due to Quantum's patented technology (long expired) known as AIRLOCK, which automatically keeps the head stack latched in place near the spindle until the drive spins up. The purpose is to keep the head away from actual data on the platters when the drive is powered off, in case it is jolted. I circumvented the AIRLOCK functionality to see if starting out with the head away from the center of the platters would fix the problem, but obviously as you saw in the video, it didn't.

Many years ago, techknight shared his solution for bypassing this problem. The temporary fix ended up being to manually move the head just like I did before, but it had to be after the platters were already spinning and the latch was free. Some people have needed to manually release the latch because the airflow isn't good enough when the cover is off. I did not experience that issue -- you can see it open up on its own just before I start moving the head:

Ah yes, after the head was freed, you got to hear the familiar click pattern that Quantum hard drives make as they're running their seek test or calibration or whatever it is. That sound is still ingrained in my head after all these years as a little "startup click" that old computers make.

After the drive was working properly, my ZuluSCSI immediately began dumping the contents using initiator mode -- a super cool feature it provides, which is also available in the BlueSCSI v2 firmware fork. Normally these awesome SCSI emulator products are used as hard drive replacements, but initiator mode allows you to use them to connect to a physical drive (even a CD-ROM drive) and save the contents to an SD card -- great for software preservation purposes!

Back to the "spinning up and right back down" problem, though: what causes it? Why did I have to manually move the head in order to make the drive start working? At the time of techknight's video, the underlying cause wasn't as well-known as it is now, but the culprit is a rubber bumper that slowly disintegrates into goo over the years. There are actually two bumpers that both get sticky -- one at each extent of the head's movement. The really troublesome bumper is the one that the head rests against while parked, toward the center of the platter. The other one prevents the head from falling off the outer edge of the platter.

The voice coil motor controlling the head isn't strong enough to overcome this sticky rubber, so the head just sits there until the drive's firmware gives up on trying and spins the platter down.

On some older Quantum drives, both of the rubber bumpers can easily be reached by removing the top magnet that is over the head, so theoretically you could replace them with new rubber tubing of the correct inner and outer diameter. On "newer" drives like this one, only the outer bumper is accessible by removing the magnet, which isn't much help because that's not the bumper that causes the problem.

Instead, the problematic bumper that the head sticks to while parked is actually under the platter, making replacement much less feasible. Here's a view from my endoscope:

At the top of the image is the platter. The bottom is the hard drive enclosure. On the right side, you can see the head resting against the bumper. I stuck some Kapton tape in there while I was playing with it and had the platter off, so that's what the orange color is. Note that I switched to a different single-platter ProDrive LPS model for those last two pictures, but it has the exact same arrangement internally as the drive shown in the videos.

There is one solution readily available on eBay: a plastic insert that prevents the head from hitting the bumper. I don't have any experience with this fix, but it has a bunch of positive reviews. You can find it by searching eBay for "Quantum ProDrive ELS Repair Insert". If you're looking for a long-term solution, that might be the way to go. It's theoretically possible to remove the platter(s) to get to the bumper and replace it, but it's a lot of work and very risky -- especially on multi-platter drives where the platters have to stay in alignment with each other.

If your ultimate goal is just to get the data off of the drive, I have had a bunch of success with the simple method I showed of manually unsticking the head after the drive is totally spun up. Thanks to techknight for originally posting his very helpful video on this process!

Next, let's move on to the Conner drive. There is not quite as much information on the internet about these. How did it behave with the cover removed?

I jumped the gun a bit in the video by not showing how it behaves without any intervention from my hand, but it's the same problem. The head doesn't move at all. I tried manually moving the head in the same way I was able to get the Quantum drive working, but the Conner drive just didn't want to cooperate with me. Here's a closer view where you can see the problem. First, here's how it looks with the head parked:

And here's how it looks after I move the head toward the outside of the platter, away from the parked position:

I don't think there's supposed to be an indentation in the dark piece where the head assembly touches it. It seems to be grabbing fairly tightly and preventing the head from moving away. The grip is strong enough that I get the feeling this drive may also be using another mechanism, like something magnetic, to intentionally keep the head parked when it wants to. I'm not 100% sure. Either way, that hole in the dark piece is not supposed to be there.

On this Conner drive, I was running out of ideas. It's not mine -- someone very kindly loaned it to me with a "no worries, it doesn't work anyway" type of reassurance that I could tinker with it and try to recover data from it as part of my search. I was really desperate to inspect the data on it because it came from a Mac model that I knew could potentially contain the lost software I was looking for, but I just couldn't get it to work.

I was about to give up and send it back, but then I randomly decided to hold a folded-over piece of Kapton tape in the way so that the head couldn't move all the way back to the position where it would get stuck. This was the magic solution I needed in order to get this drive operational. As soon as I did that and powered it on, the drive performed its startup click pattern and then the ZuluSCSI immediately began dumping the drive:

I think this Conner drive is simply more picky about the timing of unsticking the head. I'm still performing essentially the same temporary fix that works with the Quantum drive, but I'm physically holding it back from ever getting stuck. Due to the way the platters stick up on this model, I was too afraid to put the cover back on right away, so I allowed the entire drive to dump with the cover removed. All the data read back perfectly.

I believe I have figured out somewhat of a permanent fix for this drive by adhering a piece of Kapton tape as shown below:

To be honest, after all of the research I've done for this post, I'm concerned that this solution may be leaving the head in a vulnerable position when everything is powered off. When the platter spins down, the head doesn't push itself up against the tape, so it may be resting over a spot on the platter where data is actually stored. I ended up buying another identical drive with the exact same problem, so I might perform some further experimentation with alternative solutions. Maybe if I only wrap the Kapton tape around the dark thing with the indentation (is it rubber?) rather than the entire metal enclosure, the head will still park properly but not get stuck. I'm not sure.

Regardless of whether my fix was correct, the exciting thing about this drive is that it ended up containing the long-lost information I was looking for, so this whole project was a huge success. An upcoming post will go into detail about the Apple factory-programmed data I was able to locate and preserve. This Conner drive has since been returned to its owner, and is confirmed to still be working in the original machine it came from. So not only did I recover the lost data, but I was even able to return the drive to working order so it could be reunited with the computer it was bundled in over 30 years ago.

I think this is a good reminder that we shouldn't trust hard drives for long-term storage. Even if a drive is kept in pristine condition, there are components inside that might be aging in ways you would never begin to imagine. Keep multiple backups of any important data, and make sure you copy those backups onto a newer storage medium as the decades go by and technology evolves. Don't rely on a 30-year-old hard drive still working 30 years later.

Anyway, I hope this post was interesting and educational. I thought it was fun to take the cover off and see how these hard drives operate. I have used both of these data recovery strategies on multiple Quantum and Conner hard drives and have experienced a whole bunch of success. As long as the head is just stuck in the parked position and there hasn't been anything catastrophic like a head crash, these tricks definitely work for getting your data back. Would it be smart to open up a drive that's fully functional? Probably not. For example, I own two Apple-marked IBM drives from the same era, and they both work great, so I'm not going to open them up to check for any rubber. But if your drive isn't working and you have nothing to lose, why not give it a shot?

I'll leave you with one fun tidbit I picked up while working on these hard drives. If you've ever wondered how the data is physically arranged on the platters, I think I have an answer, at least for these Quantum and Conner drives. As I dumped the entire content of a 2-platter drive, the head stack started at the outside and slowly made its way toward the center. Then, a quarter of the way through the dump, it immediately jumped all the way back to the outside of the platter and continued reading inward again. It happened again at the halfway mark and once more at 3/4. This makes total sense because there are 4 total sides that can contain data. So each platter-side contains one consecutive chunk of data on the drive, and the data is stored outside to inside. How's that for useless trivia?
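If you want to play with that idea, here's a purely illustrative Python sketch that maps a logical block number to a platter side under that layout. It assumes an idealized drive where the capacity divides evenly into one consecutive chunk per side, which real drives (with spare sectors and so on) don't quite do:

# Purely illustrative: map a logical block to a platter side, assuming the drive
# splits its capacity into equal consecutive chunks, one per side, each written
# from the outer edge inward. Real drives reserve spares and aren't this tidy.
def block_to_side(lba, total_blocks, sides=4):
    blocks_per_side = total_blocks // sides
    side = min(lba // blocks_per_side, sides - 1)
    position_within_side = lba % blocks_per_side  # 0 = outer edge
    return side, position_within_side

# A quarter of the way into a dump, the head jumps back to the next side's outer edge:
print(block_to_side(80_000, 320_000))  # (1, 0)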

Stay tuned for my post about what I actually recovered from that Conner drive!

The invalid 68030 instruction that accidentally allowed the Mac Classic II to successfully boot up https://www.downtowndougbrown.com/2025/01/the-invalid-68030-instruction-that-accidentally-allowed-the-mac-classic-ii-to-successfully-boot-up/ https://www.downtowndougbrown.com/2025/01/the-invalid-68030-instruction-that-accidentally-allowed-the-mac-classic-ii-to-successfully-boot-up/#comments Sat, 25 Jan 2025 19:25:22 +0000 https://www.downtowndougbrown.com/?p=9510 This is the story of how Apple made a mistake in the ROM of the Macintosh Classic II that probably should have prevented it from booting, but instead, miraculously, its Motorola MC68030 CPU accidentally prevented a crash and saved the day by executing an undefined instruction.

I've been playing around with MAME a lot lately. If you haven't heard of MAME, it's an emulator that is known best for its support of many arcade games. It's so much more than that, though! It is also arguably the most complete emulator of 68000-based Mac models, thanks in large part to Arbee's incredible efforts. I will admit that I've used MAME to play a game or two of Teenage Mutant Ninja Turtles: Turtles in Time, but my main use for it is Mac emulation.

Here's how this adventure begins. I had been fixing some issues in MAME with the command + power key combination that invokes the debugger, and decided to see if the keystroke also worked on the Classic II. Even though this Mac model has a physical interrupt button on the side, it also has an "Egret" 68HC05 microcontroller for handling the keyboard and mouse (among other things) that should be able to detect the keypress and signal a non-maskable interrupt to the main CPU. I believe the Egret disables this keystroke by default, but MacsBug contains code that sends the command to enable it.

I didn't get very far while testing the command+power shortcut in MAME's emulated Classic II, because I observed something very odd. It booted up totally fine in 24-bit addressing mode, but I could not get it to boot at all if I enabled 32-bit addressing, which I needed in order for MacsBug to load. It would just pop up a Sad Mac, complete with the Chimes of Death. On this machine, the death chime is a few notes from the Twilight Zone theme song.

If you're not familiar with Apple's whole 24-bit versus 32-bit addressing saga, I'll briefly summarize it for you here. The original Motorola 68000 processor only had 24 address lines even though it used 32 bits internally for addresses. Apple took those eight extra otherwise unused bits and repurposed them for storing flags as a way to save on RAM, which was scarce at the time. When newer machines/processors came out that supported a full 32-bit address space, the upper byte couldn't be used for flags anymore. Because of that discrepancy, old software would have been incompatible, so newer machines had two modes: 24-bit mode for compatibility with older software, and 32-bit mode for being able to use all of your RAM.
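Here's a quick illustration of why the two modes clash. The flag value below is made up for the example (Apple's Memory Manager really did keep handle flags in the high byte, but the exact layout doesn't matter here); the point is that the same pointer means two different things depending on how many bits the hardware pays attention to:

# Made-up illustration of the 24-bit vs. 32-bit addressing clash. In 24-bit mode
# only the low 24 bits reach the address bus, so flags hidden in the top byte
# are harmless; in 32-bit mode they turn the pointer into a different address.
ADDR_MASK_24 = 0x00FFFFFF

pointer = 0x00052A00                 # a legitimate memory address
tagged = pointer | (0x80 << 24)      # software stashes a flag in the high byte

print(hex(tagged & ADDR_MASK_24))    # 24-bit mode: 0x52a00, flag ignored
print(hex(tagged))                   # 32-bit mode: 0x80052a00, not the same place at all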

So why was the Classic II failing to boot in 32-bit mode in MAME? What was broken? Arbee also reproduced the issue, so at least I knew I wasn't losing my mind. I assumed it was a random bug in MAME, so I started looking deeper into it to try to understand what needed to be fixed.

According to an old Apple Tech Info Library article, 0000000F means an exception occurred and 00000001 means the exception was a bus error. A bus error on 68k Macs typically means that something tried to access an invalid address, like if you try to read from or write to an expansion card when there isn't one installed.

What was the invalid address being accessed? I decided to step through the code using MAME's amazing debugger to understand what was leading to the crash. Comprehending what's going on in the ROM with no context at all can be tricky, but luckily, Apple included symbol maps for a bunch of Mac ROMs with Macintosh Programmer's Workshop (MPW). MPW was Apple's development environment.

Tracing backwards from the actual Sad Mac screen would be difficult, because there is a ton of code involved in setting up and displaying the screen. To make it easier on myself, I decided I would set a breakpoint on the bus error handler and then look backwards from there. The 68030's vector table starts at the very beginning of the address space, and the bus error vector is at 0x00000008. With the Sad Mac error still on the screen, here's what memory looked like at that location:

This meant the bus error handler was at 0x40A026F0, which is also known as GenExcps in the ROM map. I performed a hard reset of the emulated machine, set a breakpoint on that address, and then waited until it hit the breakpoint. It looks like GenExcps is a big list of BSR instructions that all jump to 0x40A026A0, which is common error handling code identified in the ROM map as ToDeepShit. Nice name, Apple!

Anyway, since MAME hit my breakpoint, this meant Apple's technote was correct about it being a bus error. I was able to use the MAME debugger's history command to show a backtrace of instructions that led to this point. The end of the history output is displayed in the bottom pane of the screenshot below:

If we walk upwards, we can see that the instruction that caused the bus error was at 0x40A43B9C:


move.b #$90, ($1c00,A1)

I opened up this section of code in IDA, which I still find myself using with 68k Mac stuff because I'm used to it. It was pretty clearly part of the routine that starts at 0x40A43B40, which is helpfully labeled in the ROM map as InstallSoundIntHandler. Let's look at the whole function in more depth.

The first thing it does is immediately jump to V8SndIntPatch1. This appears to be something that was patched into this ROM for handling sound initialization for the V8. For some added context, the Classic II isn't powered by an eight-cylinder gasoline engine; V8 is the name of the custom chip that Apple first used in the Macintosh LC. From the LC hardware developer note:

Why are we talking about the LC here? Well, the reason is because the Classic II is architecturally based more on the LC than the original Macintosh Classic. Here's an explanation from the corresponding developer note for the Classic II:

The text description of the EAGLE gate array is very similar to that of the V8, so it should come as no surprise that the chips themselves are very similar too. MAME handles them both in the same source file. The point I'm trying to make here is that it makes sense that the Classic II's ROM has code referring to the V8. With that info out of the way, let's look at V8SndIntPatch1:

This chunk of code is calling the Gestalt trap, which is how you determine various info about the Mac. In particular, it's using the gestaltHardwareAttr selector, which is defined as 'hdwr' in Apple's public header files.

If bit 3 (gestaltHasASC) isn't set in the response, it bails and returns. Otherwise, it jumps to V8SndIntPatch1Rtn at 0x40A43B4A, which you can see in the history trace in the MAME debugger screenshot from earlier. I went pretty deep into the hardware tables for the Classic II and can confirm that gestaltHasASC is definitely set on the Classic II. After all, the EAGLE contains a stripped-down equivalent of the Apple Sound Chip (ASC).

Now, let's take a look at V8SndIntPatch1Rtn:

Phew! This is a decent amount of code. It's not that complicated though. I'll explain the important stuff. You can see the instruction that leads to the Sad Mac at 0x40A43B9C. If you start at the top, what's happening is it's loading a byte from RAM at 0xCB3 into register D0:


moveq #$0,d0
move.b (byte_CB3).w,d0

If you know where to look out there, you can discover that this global variable is called BoxFlag and contains a value identifying which machine you have. If I step through this code in MAME, I can see that D0 ends up loaded with the value 0x11 = 17, which is correct for the Classic II.

Continuing further through the code, some other stuff happens, and then at 0x40A43B6C the value in D0 ends up being doubled (so it turns into 0x22). Immediately after this, it is used as an offset in a jump instruction. Here's IDA's syntax for the jump, because it's more intuitive than what MAME displays:


add.w d0,d0
jmp loc_40A43B72(pc,d0.w)

Since D0 ends up as 0x22 after being doubled, we jump to 0x40A43B72 + 0x22 = 0x40A43B94, and here's what that code looks like in MAME's debugger when we reach it:

Stepping further through the code, you can see we eventually reach the instruction that causes a Sad Mac. Let's see what all the registers look like before it's executed:

Hmm, that's odd. This crashing instruction writes the value 0x90 to an offset 0x1C00 bytes past the address stored in A1. A1 is set to 0xFFFF8FBA, so the address where the write occurs is 0xFFFF8FBA + 0x1C00 = 0xFFFFABBA. This is a totally invalid address on the Classic II! No wonder we get a Sad Mac. As expected, as soon as we step into this instruction, instead of reaching the RTS instruction just below it, we end up in the code path for displaying a Sad Mac error at 0x40A026F0. This is definitely where everything craps out.

Okay, so now I had a pretty good idea of what was happening in MAME. A1 had a junk value, so the ROM code was writing to an invalid address. FFFFABBA dabba doo! I decided to investigate further to understand how A1 came to be loaded with a bad address. And that's when I discovered something really bizarre.

Let's take a closer look at one of the earlier screenshots, after we used the value of D0 (BoxFlag) to jump to the correct chunk of code for the Classic II:

I thought about this some more, and eventually realized that something absolutely crazy happened here. We were supposed to be jumping into a table of BRA.S instructions, one for each possible BoxFlag value. That's why we added D0 to itself before using it as a jump offset -- each BRA.S instruction is two bytes long, so the index into the table needed to be doubled to turn it into a byte offset. Why didn't we end up pointing at a BRA.S instruction? And where did this CAS.W instruction come from?

If you look closely at the table of branches below the JMP instruction at 0x40A43B6E, there are only 16 entries in the table, corresponding to BoxFlags 0 through 15. The Classic II is BoxFlag 17!
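To put the arithmetic all in one place, here's a tiny sketch of the table indexing, using the addresses from the disassembly above:

# The jump-table math from the disassembly: BoxFlag is doubled (each BRA.S entry
# is 2 bytes) and added to the base of the table at 0x40A43B72.
TABLE_BASE = 0x40A43B72   # first BRA.S entry
TABLE_ENTRIES = 16        # BoxFlags 0 through 15

def jump_target(box_flag):
    offset = box_flag * 2         # add.w d0,d0
    return TABLE_BASE + offset    # jmp loc_40A43B72(pc,d0.w)

print(hex(jump_target(15)))   # 0x40a43b90 -- last valid entry
print(hex(jump_target(17)))   # 0x40a43b94 -- Classic II: past the end of the table,
                              # in the middle of the MOVEA.L at 0x40A43B92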

As I said earlier, the calculated offset we jump to is 0x40A43B94, which is not even supposed to be the start of an instruction. It's smack dab in the middle of the MOVEA.L instruction at 0x40A43B92, which is the instruction that loads A1 with a real address that this code can use for enabling the sound interrupt.

When we jump to 0x40A43B94, we aren't running intended code anymore. The CPU gets out of sync with the path that the code was designed to follow. 0x0CEC was supposed to be the second half of the MOVEA.L instruction -- the address in RAM to load from -- but instead, it is being treated as the start of a new instruction.

The CPU doesn't get back in sync right away. We execute this mystery CAS (compare and swap) instruction, and then an unintended "MOVE.B D0, D4" instruction, before finally reaching a real MOVE.B instruction at 0x40A43B9C -- the instruction that crashes. That is the point where the CPU has returned to running code that Apple actually wanted it to run. But unfortunately, A1 contains an invalid address because the code that was supposed to fill it out wasn't reached, so of course everything crashes when we try to write to A1 + 0x1C00. It all makes sense.

Going back further, A1 gets loaded with the "junk" value of 0xFFFF8FBA as part of the initial jump to InstallSoundIntHandler. So, of course, it's not really junk. It's being used as an offset for a jump instruction:

IDA's disassembly is a little more readable. That value of 0xFFFF8FBA loaded into A1 represents how much you have to add to the program counter in order to reach InstallSoundIntHandler from where you currently are. Interpreting it as signed, it's a negative number because that function is further back in the ROM code.
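The two's-complement math is easy to double-check, by the way (the exact program counter value it gets added to isn't shown here):

# Interpreting A1's "junk" value as a signed 32-bit offset.
def to_signed32(value):
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

print(to_signed32(0xFFFF8FBA))   # -28742: a backwards hop of roughly 28 KB toward InstallSoundIntHandler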

Overall, I felt that I totally understood what was happening. I'm probably repeating myself, but I just want it to sink in one more time: The problematic value in A1 gets loaded as part of a big relative jump to this section of ROM, and an out-of-bounds table access is jumping past code that is supposed to load A1 with an actual address of a peripheral to configure for sound interrupts. So A1 still contains that negative offset for the jump instead of a real address. Finally, it ends up being used as an address in a write operation, and boom, Sad Mac.

If you've followed along with me thus far, I'm sure there are some burning questions on your mind. This explains how MAME fails, but not why. Why was this happening? Also, why didn't this same failure occur on actual hardware? Obviously, the Classic II wasn't recalled because of an inability to use 32-bit addressing. There's no way that happened. It would have been all over tech news. Not to mention the fact that the people actually working on the ROM code would have quickly noticed it while they were testing. It's kind of a glaring issue.

So what gives? Was MAME doing something wrong here that didn't match hardware? This code couldn't have really been reached on hardware, right? I have the answer to these questions, but as a forewarning, the situation is way more complicated than I expected it to be.

I started out by trying to understand what the CAS instruction reached after the out-of-bounds jump was doing. Here are the bytes:

0C EC 08 A9 00 04

I quickly noticed that if I changed my disassembly in IDA so that it thought the code was supposed to start there, it refused to disassemble the instruction at all:

When I tried to convert it to code starting at 0x40A43B94, it said:

Command "MakeCode" failed

GNU objdump also failed to disassemble it, and then got right back in sync with the intended code:

40a43b94: 0cec .short 0x0cec
40a43b96: 08a9 0004 1800 bclr #4,%a1@(6144)
40a43b9c: 137c 0090 1c00 moveb #-112,%a1@(7168)
40a43ba2: 4e75 rts

The fact that two well-known disassemblers balked at this instruction piqued my curiosity. I decided to use MacsBug on my Macintosh IIci, which also has a 68030 processor, to put all that code into RAM at a random location and see what MacsBug thought about it. Since I was going through all this effort, I also arranged all the other registers to be identical to what I was seeing on MAME. It wasn't a perfect match with what I was seeing in MAME, though; I had to leave the program counter pointing into RAM instead of ROM.

Interesting -- MacsBug also said it was a CAS.W instruction, but it interpreted it slightly differently. It said it was CAS.W D1,D2,$0004(A4).

Of course, I couldn't resist stepping through the code in MacsBug to see what it would do on a real 68030 processor:

Wait...what? If you compare the register display on the left side of the screen in the first picture with the same display in the second picture, something incredibly strange has happened. Even though MacsBug and MAME both don't mention A1 in their interpretations of this CAS instruction at all, the value of A1 has changed! It started out as 0xFFFF8FBA, and ended up as 0xFC6B8. It seems to have turned into a value similar to what's in A5 through A7 -- a valid RAM address.

Further tinkering with MacsBug and different register values revealed that the new value of A1 depended on the original value of A1, A7, and the program counter. I couldn't figure out exactly what it was doing, but it was definitely majorly changing A1's value.

At this point, I felt like I was onto something. The MAME-emulated Classic II was crashing because A1 didn't change, so it still contained an invalid address. On hardware, this weird instruction, which several disassemblers refused to touch, and wasn't even intended to be jumped to because it starts in the middle of an actual valid instruction, was changing A1 to a new value that was a good address. Was this crazy instruction accidentally fixing A1 and thus hiding a bug from Apple's ROM developers in the early 1990s?

This was about the time that Arbee suggested I start sharing my research on the 68kmla forums and the bannister.org forums to see if some of the incredible folks who know way more than me about the 68k instruction set might be able to chime in. I also asked around on IRC in #mac68k on Libera.

The consensus was that this is not a valid CAS instruction, and that MacsBug's interpretation of the registers being D1 and D2 is correct. Let's look at what the Motorola M68000 Family Programmer's Reference Manual says about the encoding of the CAS instruction:

Comparing this with the 3 words of the instruction (0x0CEC 0x08A9 0x0004) and filling in the fields, we can see the following:

The first word appears to be a valid CAS instruction. The second word, though, has a few bits that are 1 even though the instruction format specifically says they are supposed to be 0. I've marked them in red. Also, the Du and Dc fields match what MacsBug says, as opposed to how MAME interpreted it.

The third word, 0x0004, is the d16 value mentioned in the MODE field. It's the $0004 offset from A4. So according to Motorola's reference manual, this instruction is:


CAS D1,D2,$0004(A4)

...except it has three bits that are 1 in places where they are supposed to be 0. So it's not a valid instruction at all. At least, it's not documented.
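Here's a small sketch that pulls the fields out of those three words the same way, using the field positions from the reference manual. The must-be-zero check at the end is the interesting part:

# Decode the three words of the mystery instruction against the CAS format in
# Motorola's Programmer's Reference Manual. This only handles the (d16,An)
# addressing mode, which is what we have here.
w0, w1, d16 = 0x0CEC, 0x08A9, 0x0004

size = (w0 >> 9) & 0b11            # 0b10 = word
mode = (w0 >> 3) & 0b111           # 0b101 = (d16,An)
areg = w0 & 0b111                  # 4 -> A4
du   = (w1 >> 6) & 0b111           # 2 -> D2 (update register)
dc   = w1 & 0b111                  # 1 -> D1 (compare register)
mbz  = w1 & 0b1111111_000_111_000  # bits the manual says must be zero

print(f"CAS.{'BWL'[size - 1]} D{dc},D{du},${d16:04X}(A{areg})")   # CAS.W D1,D2,$0004(A4)
print(f"must-be-zero bits that are set: {mbz:#06x}")              # 0x0828 -> bits 11, 5, 3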

Side note: I think MAME's debugger is also decoding normal CAS instructions incorrectly; if I change it to 0x0CEC 0x0081 0x0004 instead, which is the correct way to write this instruction without the three bad "1" bits, it still thinks Du is D0 instead of D2. But that's beside the point -- the instruction we're dealing with in this story is completely messed up either way.

The CAS (compare-and-swap) instruction is an interesting one. It's used for accomplishing various atomic operations without requiring a lock. It is one of the few instructions in the 68000 family CPUs that perform a read-modify-write bus cycle. What this particular instruction is supposed to do is compare the word value in memory at A4 + 4 to the value of D1. If they are the same, then the value in D2 is written to memory at A4 + 4. Otherwise, the value in memory at A4 + 4 is loaded into D1.

It clearly still does some of this stuff, like the read-modify-write cycle involving A4 + 4. If I change A4 to point to an invalid address, MacsBug complains to me. For example, on my Mac IIci with the same test setup I showed earlier, if I set A4 to 0xFFFF0000 and rerun the bad instruction, MacsBug tells me this:

Bus Error at 0004CB20
while reading word (read-modify-write) from FFFF0004 in Supervisor data space

This definitely means that this instruction still performs the RMW cycle at A4 + 4. It doesn't seem to do exactly what the CAS instruction is supposed to do though. Obviously, the normal CAS instruction wouldn't mess with the value of A1. I ran more tests after changing A4 to point to RAM. If I store the value 0xFFFF at A4 + 4, and D1 is set to 0x1111 and D2 is set to 0x2222, then after executing the instruction, memory at A4 + 4 changes to 0x2222. But that doesn't really make any sense, because it only should have written 0x2222 to memory if D1 was equal to 0xFFFF.
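To make the mismatch concrete, here's what the documented behavior would have done with those exact values, sketched in Python (condition codes ignored):

# What a documented CAS.W Dc,Du,(ea) is supposed to do, ignoring condition codes:
# one atomic read-modify-write cycle.
def cas_word(memory, ea, regs, dc, du):
    operand = memory[ea]          # read (start of the locked RMW cycle)
    if operand == regs[dc]:
        memory[ea] = regs[du]     # match: write the update value
    else:
        regs[dc] = operand        # no match: load the memory value into Dc

# The experiment from above: memory holds 0xFFFF, D1 = 0x1111, D2 = 0x2222.
mem = {0x0004: 0xFFFF}
regs = {1: 0x1111, 2: 0x2222}
cas_word(mem, 0x0004, regs, dc=1, du=2)
print(hex(mem[0x0004]), hex(regs[1]))   # documented result: 0xffff 0xffff
                                        # the undocumented encoding wrote 0x2222 to memory instead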

Let's summarize what we've learned so far.

  • The invalid code that the ROM accidentally jumps to (0x0CEC 0x08A9 0x0004) is sort of like "CAS D1,D2,$0004(A4)", but not really, because some of the bits that are supposed to be 0 are actually set to 1.
  • On another 68030-based Mac, I've observed that this instruction ends up modifying the value stored in register A1.
  • MAME's 68030 CPU emulator does not change A1 like this, because the instruction is undocumented and normal code would never use it.
  • The Sad Mac in MAME occurs a couple of instructions later because A1 is set to an invalid address, and code in ROM tries to write a byte to A1 + 0x1C00.

I was starting to believe something that sounded almost too crazy to be true: Apple had an out-of-bounds jump bug in the Classic II's ROM that should have caused a Sad Mac during boot, but they had no idea the bug was there because the 68030 was accidentally fixing the value of A1 by executing an undocumented instruction. How could I prove that my theory was correct?

By buying a Classic II and hacking the ROM in order to see exactly what is happening on hardware, of course!

This Classic II was manufactured in 1991, so it's about 34 years old at this point. Computers this old usually need to be repaired if nobody has already fixed them. As I mentioned in a previous post about the LC III capacitor polarization mistake, the surface mount capacitors in old Macs leak out corrosive goo over time. This machine was no exception. I didn't even try powering it up. I immediately opened it up and removed what I believe was the original Sonnenschein 1/2 AA lithium battery with a March 1991 date code. The battery had not leaked at all, luckily! That's another unfortunate thing that happens in a lot of old Macs that have been sitting around for decades -- severe battery damage.

I don't want to go into excruciating detail about the repair and make this post even longer to read, but if you're interested in hearing more about that process, I posted a few updates about the repair in my 68kmla forum thread. I did accidentally short /RESET to ground while soldering in new capacitors, but I eventually got the logic board working. This was very similar to what happened to Adrian's Digital Basement on his SE/30 board a few months ago, but lucky for me, my accidental short didn't involve 12 volts and fry a bunch of chips! Adrian's channel is an excellent resource for vintage computer enthusiasts out there, if you're not already subscribed.

The Classic II has a cathode-ray tube for its screen, which scares the crap out of me, so I opted to use a different solution in order to run the logic board by itself without any dangerous voltages. Plus, the analog boards in these computers are known to be a huge pain to repair. This one had really bad capacitor leakage, so that'll probably be a future repair project. The bottom line is I knew I'd be installing and removing ROM chips a bunch, so I wanted to run the logic board out in the open, for reasons of both safety and convenience. Special thanks to 68kmla forum member davewongillies for posting his similar setup with a Classic II logic board, complete with Amazon links for a bunch of the parts. That Mini-Fit Jr. pin extractor tool on Amazon is absolute garbage though -- I could never get it to work. I ended up using staples instead.

I got impatient while waiting for my RGBtoHDMI to arrive for converting the Classic II's video signal into HDMI, and eventually discovered another solution created by GitHub user guruthree involving a Raspberry Pi Pico that converts the signal to VGA. I made a few cables, tweaked the code until the timings and colors were correct, and ended up with this concoction to adapt between a CGA DE-9 video connector and VGA:

Although I would never do anything like this in a real production environment where reliability matters, I didn't even use level shifters for the 5V signals coming from the Mac. Technically, the RP2040 is "sort of" 5V tolerant, even though it's not documented in the datasheet. That, combined with the fact that I had a few Picos laying around that I didn't care about destroying, gave me enough confidence to try it out.

Here's the whole mess, all wired up together, along with an ATX power supply. I also swapped the original ROM chips out with some programmable SST29EE010 EEPROMs I had on hand. Interestingly, the factory chips from Apple were actually UV-erasable EPROMs, so I could have even used the original chips if I had a UV chip eraser.

I successfully booted from a SCSI drive using this setup, and could capture the video signal with one of my many video capture devices:

By the way, I think there's something wrong with at least one column of pixels on the left side. I think it has something to do with the tweak I made to the code running on the RP2040 in order to make it sync up correctly with the Classic II's video signal. I'm sure I could fix it if I played around more. The RGBtoHDMI, which arrived a few days later, does not have this problem.

Anyway, I knew I was in business. I threw together some 68030 assembly code to display the value of A1 on the screen, found an unused place in the ROM to put it, and then came up with three custom ROMs to try on the Classic II:

  • Custom ROM 1: Replace the instruction at 0x40A43B9C that results in a Sad Mac (the MOVE.B instruction) with a jump to my special code that draws A1 to the screen, so we can see what A1 is on hardware at that point. Also, this would verify whether this code even runs at all on hardware.
  • Custom ROM 2: Replace the instruction at 0x40A43B94 (the CAS instruction) with the same jump to my special code. This would verify whether the out-of-bounds jump was really happening, and what the value of A1 was leading up to it.
  • Custom ROM 3: Replace the instruction at 0x40A43B94 (the CAS instruction) with NOPs. This would ideally replicate exactly what I was seeing in MAME, proving that the bad CAS instruction was vital to the Classic II's ability to boot. (A rough sketch of this byte patch follows the list.)
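In case anyone wants to reproduce the Custom ROM 3 experiment, here's roughly how the byte patch could be done on a dumped ROM image in Python. The NOP opcode (0x4E71) and the 6-byte instruction length are real; the assumption is that the image starts at 0x40A00000 so a file offset is just address minus base -- verify that against your own dump, and keep in mind the startup code may also checksum the ROM:

# Sketch of the Custom ROM 3 patch: overwrite the 6-byte undocumented CAS
# instruction at 0x40A43B94 with three NOPs (0x4E71 each). Assumes the ROM dump
# starts at address 0x40A00000; the file names are placeholders.
ROM_BASE = 0x40A00000
CAS_ADDR = 0x40A43B94
NOP = bytes.fromhex("4E71")

with open("classic2.rom", "rb") as f:
    rom = bytearray(f.read())

offset = CAS_ADDR - ROM_BASE
assert rom[offset:offset + 6] == bytes.fromhex("0CEC08A90004"), "unexpected bytes -- wrong ROM or wrong base?"
rom[offset:offset + 6] = NOP * 3

with open("classic2_patched.rom", "wb") as f:
    f.write(rom)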

Let's look at the results one by one. Here's the result that was displayed when I ran Custom ROM 1:

This verified that the section of code I was looking at definitely ran on hardware. It also showed that A1 was set to a very interesting value when it reached the instruction that crashed on MAME. 0x40A4BBB2 is not really an address you would write to because it's a ROM address, but it doesn't cause a bus error if you attempt it.

Here is what I got when I ran Custom ROM 2:

The same A1 value that we saw in MAME! This proved two things. First, the out-of-bounds table jump was definitely happening -- if it wasn't, my custom A1 drawing code wouldn't have run at all, since the JMP to it was stored out of sync with the normal intended code flow just like the accidental CAS instruction. Second, it also proved, along with the first test, that the CAS instruction was indeed fixing A1 on hardware, just like I theorized.

Lastly, my test run of Custom ROM 3, which eliminated the CAS instruction from the situation altogether, gave me the final proof I needed:

A Sad Mac, just like I saw with MAME in 32-bit mode. I also discovered during this test that on hardware, the same Sad Mac happens in 24-bit mode too. So MAME is actually more tolerant than hardware of that invalid write in 24-bit mode.

These results motivated me to make a couple more hacked ROM images to run on hardware in order to glean the values of all of the CPU's data and address registers immediately before and after the CAS instruction. The data register values are shown in the left column, address registers in the right column. Before:

And after:

Yep! Everything is the same except for A1, which has magically been transformed from FFFF8FBA to 40A4BBB2. The mystery instruction is definitely what was responsible for that.

One fun part about this test was being able to successfully verify that everything on hardware was exactly identical to what MAME did up to the bad instruction. The entire register state shown in the "before" picture is a perfect match to what MAME shows when booting in 32-bit mode prior to the bad CAS instruction. See for yourself:

If your brain is fried after reading all this, first of all I don't blame you at all, and second, let me bring everything together to explain what this all means:

I've discovered an undocumented MC68030 instruction that performs a read-modify-write bus cycle and also changes the value of the A1 register.

This newly-discovered instruction turns out to be the glue that's accidentally holding the Classic II together. Without this instruction modifying A1, the Classic II can't boot. I'm confident that it was a mistake and not something intentional. A totally understandable mistake, at that. If the pesky 68030 hadn't been hiding the bug from Apple's ROM developers, there is no doubt they would have caught it before the Classic II shipped.

I searched deeper and found the same chunk of code in the newer Macintosh IIvx ROM, and in that ROM they finally increased the size of the jump table. I confirmed that the case for the Classic II in that code does nothing at all. It just jumps directly to an RTS instruction. I wonder if the Apple ROM developer working on that chunk of code in the IIvx ROM scratched their head in confusion when they added new entries for a bunch of new models, including the Classic II, after the Classic II ROM had already been finalized and shipped. Who knows? I'm not sure how Apple handled all the different ROM variants back then.

Because of this new discovery, I think it's very likely that there is not a 100% perfect Motorola MC68030 emulator or replica in existence right now. This might be the only case where it matters, though. What this means is I could write a small chunk of code that determines whether it's running on a physical 68030 or an emulator, by simply using that instruction and looking at the resulting value of A1.

What can MAME do in order to work around this problem and allow the Classic II to boot? We don't really know the exact details of what this instruction does. With some limited testing, I believe I've observed that the resulting value of A1 depends on the original A1 value, the value of A7, and the program counter. But I'm not sure. Maybe someone can make a program that tries out a bunch of different register values and memory contents, and attempts to deduce what exactly the instruction does so that it can be emulated accurately. Until someone decides that it's worth trying to figure out, MAME is patching this bug out of the ROM in order to allow the Classic II to boot. As Arbee pointed out, we're a little late to get Motorola/Freescale/NXP to issue an erratum. Unless someone who worked on the 68030 happens to see this post and might have a clue about what's going on here...

Here's a screenshot of MAME with Arbee's patch applied, now able to successfully emulate a Classic II with 32-bit addressing enabled. Yay!

After all that, what's the lesson we can learn from this story? I guess it's that emulators can teach us new things about hardware that we never would have thought to look into! I bet this bug in the ROM would have gone undiscovered for all eternity if not for MAME providing emulation of the Classic II, which isn't a particularly notable machine compared to more popular compact Macs like the SE/30 and Color Classic.

It also goes to show you how bugs can be lurking in the background in places where you might think everything is totally polished. I think it's also a good example of how some bugs just aren't that big of a deal. This bug fits that category pretty well. The machine worked fine and nobody noticed.

Oh, and as for the original reason I somehow managed to pull myself into this investigation in the first place: the command+power key combination does not work in MAME. Now that I have a real Classic II, I have been able to confirm that the keystroke does indeed work on hardware. It only works with MacsBug installed, which is likely due to what I said earlier about the Egret disabling it by default. Either way, it really should work in MAME when MacsBug is installed. I suppose that's another MAME fix for me to work on!
