This project is dedicated to the memory of William Morris (aka Frags), who was the main contributor to the bounty but was unable to see the final result.

Tuesday, May 15, 2012

Peeking behind the curtain


What's new

I have just uploaded another small update to the Subversion repository. In this update you can find the following changes:
  • Implementation of all forms of the following addressing modes, both as source and as destination:
    • immediate long, word, byte;
    • indirect addressing: (Ax);
    • indirect addressing with pre-decrement: -(Ax);
    • indirect addressing with post-increment: (Ax)+;
  • Implementation of instructions:
    • MOVE.x #imm,Dy;
    • MOVEA.x #imm,Ay;
    • MOVEQ #imm,Rx;
    • MOVE.x Rx,Ry
  • Code was refactored to avoid repetition of the same code chunks in the addressing modes and instructions.
  • Dumping of the compiled code to the console is implemented.
  • A new configuration item (comp_log_compiled) was added to turn it on/off.

Featuring...

It is always exciting to find out some magic details about the internal behavior of a complex system. I remember when I realized for the first time how the texts were stored in the Commodore Plus/4 games, back in the day. I was amazed that I could change the text that showed up in the bottom scroll of Tycoon Tex.

Let me offer some small excitement to you all: I just implemented a funny little feature in the E-UAE JIT engine. Now we can turn on dumping the compiled code to the console, together with the original Motorola 68k instruction that was compiled and the macroblocks that describe the intermediate translation form.

The purpose of this feature wasn’t (purely) entertainment: I was really fed up with not being able to debug the generated code properly. Previously, I inserted a trap instruction (tw) into the translated code at some point, so I could have a look at the output in the Grim Reaper window (which was awesome, but let’s not mix it up with actual debugging).
Too bad that GDB is so limited: it cannot debug into any code segment that wasn’t loaded by DOS (like generated code). Not to mention how cumbersome the console interface is... (Or am I missing something? Enlighten me, please.)

I would like to thank Frank Wille for the sources of the PowerPC disassembler, which makes it possible to list the translated code.

How to turn on this feature: there are two settings that control the logging. These are:
  • comp_log – if it is set to “true” or “yes”, JIT logging is turned on and dumped to the standard output.
  • comp_log_compiled – if it is set to “true” or “yes”, the compiled code is listed in the JIT logs.

Let’s see a small demonstration of this feature, shall we? (Not for the faint-hearted!)

The following list is the output from the very simple test code: iamalive.asm, slightly edited and formatted for educational purposes...
  1. M68k: ADD.L #$00000001,D1
    1. Mblk: load_memory_long
      Dism: lwz r15,64(r14)
      Mblk: load_memory_long
      Dism: lwz r3,68(r14)
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r3,16,26,26
    2. Mblk: load_memory_long
      Dism: lwz r3,4(r14)
    3. Mblk: load_register_long
      Dism: li r4,1
    4. Mblk: add_with_flags
      Dism: addco. r3,r3,r4
    5. Mblk: copy_nzcv_flags_to_register
      Dism: mcrxr cr2
      Dism: mfcr r15
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r15,16,26,26
  2. M68k: MOVE.W D1,(A0,$0180) == $00dff180
    1. Mblk: load_memory_long
      Dism: lwz r4,32(r14)
    2. Mblk: add_register_imm
      Dism: addi r5,r4,384
    3. Mblk: check_word_register
      Dism: extsh. r0,r3
      Mblk: copy_nz_flags_to_register
      Dism: mfcr r6
      Mblk: rotate_and_copy_bits
      Dism: rlwimi r15,r6,0,0,2
      Mblk: rotate_and_mask_bits
      Dism: rlwinm r15,r15,0,11,8
    4. Mblk: save_memory_long
      Dism: stw r3,4(r14)
    5. Mblk: save_memory_spec
      Dism: mr r4,r3
      Dism: mr r3,r5
      Dism: rlwinm r0,r3,18,14,29
      Dism: lis r5,27315
      Dism: ori r5,r5,23016
      Dism: lwzx r5,r5,r0
      Dism: lwz r5,16(r5)
      Dism: mtlr r5
      Dism: blrl
  3. M68k: BT.B #$fffffff8 == 0000001a (TRUE)
    1. Mblk: save_memory_long
      Dism: stw r15,64(r14)
      Mblk: save_memory_word
      Dism: sth r15,68(r14)
    2. Mblk: load_register_long
      Dism: lis r3,27606
      Dism: ori r3,r3,45096
      Mblk: save_memory_long
      Dism: stw r3,76(r14)
    3. Mblk: opcode_unsupported
      Dism: li r3,24824
      Dism: lis r4,27315
      Dism: ori r4,r4,21752
      Dism: bl 0x7f91acc0

  4. Done compiling
Colorful, isn't it? :)

Okay, let's try to understand what is going on.

I marked the three Motorola 68k instructions that were compiled here in orange. The code roughly does this:

1. Increase register D1 by one;
2. Put the content of register D1 to the address that is calculated from register A0 plus an offset of 0x180 (A0 was initialized previously with the value 0xDFF000, which is the base of the custom chipset memory area). In layman's terms: load it into the background color.
3. Go back to step 1.

Now, let's see the second level of the list:

First of all, the prefix "Mblk:" marks the macroblocks (white), and "Dism:" is the actual PowerPC code (yellow).
As I already mentioned earlier, some macroblocks can be optimized away (although this is not implemented yet). A macroblock means at least one PowerPC instruction, but it can also be a series of instructions.
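
In the listing above, load_register_long shows up once as a single li (for the constant 1) and once as a lis/ori pair (for a full 32-bit constant). A minimal sketch of how such a macroblock emitter could choose between the two forms is below. The function name and signature are hypothetical and not taken from the E-UAE sources; only the PowerPC instruction encodings are standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter for a "load_register_long"-style macroblock.
 * Writes PowerPC instruction words into out[] and returns how many
 * words were emitted (1 or 2). Not the actual E-UAE implementation. */
static size_t emit_load_register_long(uint32_t *out, unsigned reg, uint32_t value)
{
    int32_t sval = (int32_t)value;

    if (sval >= -32768 && sval <= 32767) {
        /* li reg,value  ==  addi reg,0,value (primary opcode 14);
         * the 16-bit immediate is sign-extended by the CPU */
        out[0] = (14u << 26) | (reg << 21) | (value & 0xFFFFu);
        return 1;
    }

    /* lis reg,hi  ==  addis reg,0,hi (primary opcode 15) */
    out[0] = (15u << 26) | (reg << 21) | (value >> 16);
    /* ori reg,reg,lo (primary opcode 24); this immediate is zero-extended,
     * so the low half can be OR-ed in without touching the high half */
    out[1] = (24u << 26) | (reg << 21) | (reg << 16) | (value & 0xFFFFu);
    return 2;
}
```

Called with register 4 and the value 1, this sketch produces the single word for li r4,1; called with register 3 and a constant that does not fit in 16 bits, it produces two words, like the lis r3/ori r3 pair in step 3.2 of the listing.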

The steps can be interpreted roughly as:

1.1. Load the arithmetic flags from the memory where the interpretive emulator stores them.
1.2. Load the previous content of the emulated D1 register into a PPC register.
1.3. Load the constant for the add instruction (one) into a PPC register.
1.4. Add the second register to the first one (increase D1 by one).
1.5. Save the arithmetic flags after the operation.
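
For readers who prefer C to PowerPC mnemonics, steps 1.4 and 1.5 amount to a 32-bit addition that also records the 68k condition codes. The sketch below is only an illustration of that semantics; the struct and function names are made up for this example, and the real JIT keeps the flags packed in a PPC register rather than in separate fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container: the 68k condition codes after ADD.L */
typedef struct {
    bool n;  /* negative: top bit of the result is set */
    bool z;  /* zero: result is 0 */
    bool c;  /* carry: unsigned overflow */
    bool v;  /* overflow: signed overflow */
    bool x;  /* extend: on the 68k, ADD sets X to the same value as C */
} m68k_flags;

/* Adds src to dst as 32-bit values and fills in the flags. */
static uint32_t add_long_with_flags(uint32_t dst, uint32_t src, m68k_flags *f)
{
    uint32_t res = dst + src;  /* wraps modulo 2^32 */

    f->n = (res >> 31) != 0;
    f->z = (res == 0);
    f->c = (res < dst);        /* the sum wrapped around */
    /* signed overflow: operands share a sign, result has the other one */
    f->v = ((~(dst ^ src) & (dst ^ res)) >> 31) != 0;
    f->x = f->c;
    return res;
}
```

The generated code gets the same information from a single addco. instruction and then copies the relevant condition bits into the register that holds the emulated flags.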

2.1. Load the previous content of the emulated A0 register into a PPC register.
2.2. Add the offset (0x180) to the content of A0 and put the result into a new PPC register.
2.3. Check the content of the emulated D1 register to set up the arithmetic flags according to it.
2.4. Save the modified D1 register back to the memory for the interpretive emulator.
2.5. Calculate the offset, load the function address of the memory write handler (namely the custom chipset write handler) and call it. This is a function from the interpretive emulation and it was written in C. The C code won't preserve the volatile registers, so we must store them back to memory first. (This is why we stored the D1 register in step 2.4.)
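
Step 2.5 is easier to follow as C. Reading the listing, the rlwinm r0,r3,18,14,29 instruction works out to (address >> 16) * 4, that is, a byte offset into a table with one pointer per 64 KB of address space; the next two loads fetch the table entry and then a function pointer out of it. A rough sketch of that dispatch follows. All type and variable names here are placeholders for illustration, not the identifiers used in the emulator sources, and the handler is a dummy that only records its arguments.

```c
#include <stdint.h>

/* Hypothetical write handler type: one function per kind of memory area */
typedef void (*word_put_func)(uint32_t addr, uint32_t value);

/* Hypothetical per-area descriptor; only the word writer is modelled here */
typedef struct {
    word_put_func wput;
} bank_t;

/* One entry per 64 KB block of the 32-bit address space */
#define BANK_COUNT 65536u
static bank_t *bank_table[BANK_COUNT];

/* The upper 16 bits of the address select the bank */
static unsigned bank_index(uint32_t addr)
{
    return addr >> 16;
}

/* What the compiled code does in step 2.5: look up the handler and call it */
static void write_word(uint32_t addr, uint32_t value)
{
    bank_table[bank_index(addr)]->wput(addr, value);
}

/* Dummy stand-in for the custom chipset handler: remembers the last write */
static uint32_t last_addr, last_value;

static void custom_wput(uint32_t addr, uint32_t value)
{
    last_addr = addr;
    last_value = value & 0xFFFFu;
}

static bank_t custom_bank = { custom_wput };
```

With custom_bank installed at index 0xDF, a call like write_word(0x00DFF180, color) ends up in custom_wput, which is the path the MOVE.W in the listing takes to reach the background color register.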

3.1. Save the arithmetic flags back to memory where the interpretive emulator stores them. (These were kept in a non-volatile register, so these were preserved while we called the helper function in step 2.5.)
3.2. Update the emulated PC register to the current state for the following instructions.
3.3. Call the interpretive emulation for the branch instruction (it is not implemented in the JIT yet, so we reuse the interpretive implementation).

4. Done. Phew.

Funny, eh? :)
If you are not familiar with assembly then don't stretch yourself too much by trying to understand this techno-blahblah.

For the rest: who can spot what can be optimized in the compiled code?

Tuesday, May 1, 2012

Deep-diving into memory

Since I just bought a new SSD and I am about to reinstall my laptop, it is time to do another update on the project: another batch is uploaded to the SourceForge repository. (I hope I won’t lose the sources… ;)

So, what made it into this update? A quick list of what happened in the last two weeks:

  • Direct memory writing support
  • Detection of special memory accesses per instruction, and calling the chipset emulation on memory writes from the compiled code
  • Implementation of some instructions:
    • ADDQ.L;
    • MOVE.x register to memory;
    • MOVE.x register to register;
  • Implementation of some addressing modes:
    • immediate quick, like: ADDQ.L #x,reg;
    • indirect addressing with post increment, like: MOVE.x reg,(Ay)+;
  • Misc fixed stuff:
    • proper logging for JIT (with a new configuration item: comp_log);
    • slightly reworked temporary register handling;
    • better handling of reloading/saving of flags and the base register;
    • reloading of the emulated PC before leaving the compiled code is added (probably this was responsible for the reboot-loop at booting the OS, see below).

The most important change was the handling of the memory writes. When I started to plan the project this was one of the concerns: how to figure out which memory accesses require special handling and which ones can hit the emulated memory directly. With this issue overcome, all three problems are resolved:

  • memory accesses are emulated as it is needed;
  • self-modifying code is handled by using the processor cache;
  • translated code lookup is managed by simulating the cache lines.
Luckily, the x86 JIT implementation already solved this issue (too): every instruction is executed by the interpretive emulator a couple of times first, to avoid unnecessary translation of code pieces that won’t be executed often (like initialization/cleanup code). While doing that, it collects some information about the executed code that can be reused for the translation. One piece of this information is the memory access type. Very clever solution, I must admit.
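
The idea can be sketched in a few lines of C. This is only an illustration of the "interpret first, collect hints, compile later" approach described above: the structure, the function and the threshold value are all assumptions made for the example, not the data structures of the x86 JIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: how many interpreted runs we wait before compiling */
#define COMPILE_THRESHOLD 4u

/* Hypothetical per-block bookkeeping */
typedef struct {
    uint32_t run_count;          /* times the interpreter ran this code */
    bool     saw_special_access; /* it touched non-plain memory at least once */
    bool     compiled;           /* translation was already requested */
} block_info;

/* Called after each interpreted run of the block.
 * Returns true exactly once: when the block became hot enough to compile. */
static bool note_interpreted_run(block_info *b, bool touched_special_memory)
{
    if (b->compiled)
        return false;

    b->run_count++;
    if (touched_special_memory)
        b->saw_special_access = true;  /* hint reused by the translator */

    if (b->run_count >= COMPILE_THRESHOLD) {
        b->compiled = true;
        return true;
    }
    return false;
}
```

When the function reports true, a translator could consult saw_special_access to decide between a direct memory write and a call into the chipset emulation for that code.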

I am still chasing the ghost-bug that prevents the OS from booting. The recent fixes for the PC register (program counter) reloading changed the behaviour. Instead of running around in a reboot-loop, the OS is waiting for something to happen, most likely some hardware to trigger something. So, it is still no good, but looks a bit better.

Where to go from here: more addressing modes and instructions will be implemented before I start working on the optimizing routines. First, I want to make sure that most of the Mandelbrot test is executed using compiled code. That will make it easier to test the optimization, and finally we can see how much better the JIT performs than the interpreter. Exciting!

You have probably noticed that the state of the project on SourceForge is still pre-alpha. I don’t want to change it while the OS cannot be run on the emulation, so getting the OS to boot is the other important goal.

So much for today. While you are enjoying the sunny, warm summer, don’t forget about us who are suffering from the cold, rainy days: Winter is coming… :/