Compare commits

...

18 Commits

Author SHA1 Message Date
DeaTh-G
9ff80d8321 make store instructions check for mmio 2025-07-07 20:35:10 +02:00
DeaTh-G
830be1f69a fix vaddsws implementation 2025-07-07 20:35:10 +02:00
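The vaddsws fix clamps each 32-bit lane through a 64-bit intermediate (see the VADDSWS hunk in the Recompiler diff further down). A scalar sketch of that saturating add for reference; the standalone function and its name are illustrative, not part of the recompiler:

#include <climits>
#include <cstdint>

// Illustrative sketch: per-lane saturating signed add as emitted for VADDSWS —
// widen to 64 bits, then clamp the sum to the int32 range.
int32_t AddSaturated(int32_t a, int32_t b)
{
    const int64_t sum = int64_t(a) + int64_t(b);
    return sum > INT_MAX ? INT_MAX : sum < INT_MIN ? INT_MIN : int32_t(sum);
}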
DeaTh-G
a5d6382975 add remaining altivec instructions 2025-07-07 20:35:08 +02:00
DeaTh-G
1d452c60a8 add vpkuhus implementation 2025-07-07 20:33:33 +02:00
DeaTh-G
cea0b2fc38 Fix instruction implementations based on unit tests 2025-07-07 20:33:30 +02:00
DeaTh-G
f6193ebe43 add more basic instructions 2025-07-07 20:31:58 +02:00
DeaTh-G
f23d22bc7f Fix indexing on certain instructions 2025-07-07 20:31:57 +02:00
DeaTh-G
847b750786 Add more instructions regarding Bakugan Battle Brawlers 2025-07-07 20:31:57 +02:00
Skyth (Asilkan)
865319a39c Update README.md (#139)
* Update README.md

* Update README.md

Co-authored-by: Hyper <34012267+hyperbx@users.noreply.github.com>
2025-04-17 11:29:46 +03:00
Jillian To
6df2397610 Added extra vpkd3d128 cases (5,2,2 and other 0,1) (#118)
* added extra vpkd3d128 cases from dev branch

* Fix whitespace

* fix whitespace again

* another whitespace fix

* cleaned up float16_4 case

* Fix whitespace

* Allow variable shift

* shift of 3 is not handled
2025-04-12 13:09:49 +03:00
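The float16_4 case this PR adds (shown in the vpkd3d128 pack hunk further down) converts each float32 lane to float16 by hand. Below is a scalar sketch of that conversion for reference; the function name and standalone form are illustrative, not the recompiler's emitted code:

#include <cstdint>
#include <cstring>

// Illustrative scalar equivalent of the per-lane float32 -> float16 conversion:
// strip the sign, clamp NaN/overflow to 0x7FFF, keep the top 10 mantissa bits,
// and either rebias the exponent or produce a denormal half.
uint16_t FloatToHalf(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    const uint32_t sign = (bits & 0x80000000u) >> 16;     // sign moves to bit 15
    const uint32_t absBits = bits & 0x7FFFFFFFu;
    float absVal;
    std::memcpy(&absVal, &absBits, sizeof(absVal));
    if (f != f || absVal > 65504.0f)                      // NaN or beyond half range
        return uint16_t(sign | 0x7FFF);
    const uint32_t exp8  = (absBits & 0x7F800000u) >> 23; // 8-bit biased exponent
    const uint32_t man10 = (absBits & 0x007FE000u) >> 13; // top 10 mantissa bits
    uint16_t half;
    if (exp8 > 0x70)                                      // normal half: rebias 127 -> 15
        half = uint16_t(((exp8 - 0x70) << 10) + man10);
    else if (0x71 - exp8 > 31)                            // too small even for a denormal
        half = 0;
    else                                                  // denormal half
        half = uint16_t((0x400 + man10) >> (0x71 - exp8));
    return uint16_t(sign | half);
}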
The Spicy Chef
49c5e3b4f5 Added handling of normal compression for patching xex files (#126)
* Added handling of normal compression for patching xex files

* Added normal compression handling to XenonAnalyse

* Swap calloc for unique_ptr, tidied up code layout
2025-04-12 13:05:53 +03:00
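The normal-compression path added here verifies each compressed block's SHA-1 digest, concatenates the block's chunks, and hands the result to lzxDecompress (see the Xex2LoadImage and XexPatcher hunks below). A minimal sketch of the chunk walk for a single block, with illustrative names:

#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative sketch: copies the chunk payloads of one compressed block into
// `dst` and returns the number of bytes written. `p` points just past the
// block's 4-byte next-block header and 20-byte SHA-1 digest; chunks carry
// 16-bit big-endian sizes, and a zero size terminates the block.
size_t CopyBlockChunks(const uint8_t* p, uint8_t* dst)
{
    size_t written = 0;
    for (;;)
    {
        const size_t chunkSize = (size_t(p[0]) << 8) | p[1];
        p += 2;
        if (chunkSize == 0)
            break;
        std::memcpy(dst + written, p, chunkSize);
        p += chunkSize;
        written += chunkSize;
    }
    return written;
}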
nithax
0bfeaed44a XEX2 Loading Fixes (#51)
* Fixes loading .xex import table names when a name is not aligned to 4
  bytes.
* Fixes loading .xex optional headers, adds missing case when the
  header_key & 0xFF == 1
* Fixes loading .xex base address and entry point to be the XEX2
  base/entry to successfully resolve all import thunks.
2025-04-04 17:01:18 +03:00
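The import-name fix above pads each string-table entry to a 4-byte boundary when advancing to the next name (see the Xex2LoadImage hunk below). A minimal sketch of that offset arithmetic, with illustrative names:

#include <cstddef>
#include <string_view>

// Illustrative sketch: each import name occupies its length plus the NUL
// terminator, rounded up to the next multiple of 4; the padded size is what
// advances the offset into the string table.
constexpr size_t AlignUp4(size_t n) { return (n + 3) & ~size_t(3); }

size_t NextImportNameOffset(size_t currentOffset, std::string_view name)
{
    return currentOffset + AlignUp4(name.length() + 1);
}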
Skyth (Asilkan)
c017eb630a PPC context header adjustments. (#123) 2025-03-21 17:40:55 +03:00
Skyth (Asilkan)
82b4cd3bb7 Fix return value from longjmp getting forgotten after setjmp. (#122)
Restoring env to ctx was causing this because the return value was getting assigned to r3 before the if check.
2025-03-21 17:38:08 +03:00
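The fix stages setjmp's return value in a temporary so that restoring ctx from env no longer clobbers it; the Recompiler hunk below shows the emitted form. A self-contained sketch of that ordering, with the jmp_buf location and register plumbing simplified and all names illustrative:

#include <csetjmp>

// Illustrative sketch of the ordering issue: writing the setjmp result straight
// into the register slot and then restoring ctx from env overwrites it.
// Staging it in a temporary and copying it back after the restore preserves it.
struct Context { long r3; };

static std::jmp_buf gEnv;

void Run(Context& ctx)
{
    Context env = ctx;          // snapshot taken at the setjmp site
    long temp = setjmp(gEnv);   // stage setjmp's result in a temporary
    if (temp != 0)
        ctx = env;              // restore the snapshot on longjmp return
    ctx.r3 = temp;              // r3 ends up with the return value either way
}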
Mystixor
c3934c624f fix bitmask of VD3D0 operand (#113) 2025-03-17 22:51:28 +03:00
Isaac Marovitz
1c571c8576 Better gitignore (#76)
Signed-off-by: Isaac Marovitz <isaacryu@icloud.com>
2025-03-07 01:43:35 +03:00
Skyth (Asilkan)
7b8e37aa37 Fix the unsafe base address assumption. (#69) 2025-03-06 17:56:18 +03:00
William Adam-Grenier
0bf1fd5477 Add Byte Patterns In Readme (#36)
* Add Byte Patterns In Readme

Adds the byte patterns for instructions to make it easier for newcomers to find the addresses of the right functions.

* Added Instructions

Added extra instructions for rest, save, restvmx and savevmx

* Fix Typo
2025-03-05 01:51:55 +03:00
9 changed files with 552 additions and 62 deletions

.gitignore vendored
View File

@@ -397,3 +397,13 @@ FodyWeavers.xsd
# JetBrains Rider
*.sln.iml
# IntelliJ IDEs
.idea/
# macOS metadata
*.DS_Store
# CMake Files
**/cmake-build-debug
**/CMakeCache.txt

View File

@@ -4,6 +4,8 @@ XenonRecomp is a tool that converts Xbox 360 executables into C++ code, which ca
This project was heavily inspired by [N64: Recompiled](https://github.com/N64Recomp/N64Recomp), a similar tool for N64 executables.
**DISCLAIMER:** This project does not provide a runtime implementation. It only converts the game code to C++, which is not going to function correctly without a runtime backing it. **Making the game work is your responsibility.**
## Implementation Details
### Instructions
@@ -155,16 +157,16 @@ savevmx_64_address = 0x831B34E4
Xbox 360 binaries feature specialized register restore & save functions that act similarly to switch case fallthroughs. Every function that utilizes non-volatile registers either has an inlined version of these functions or explicitly calls them. The recompiler requires the starting address of each restore/save function in the TOML file to recompile them correctly. These functions could likely be auto-detected, but there is currently no mechanism for it.
Property|Description
-|-
restgprlr_14_address|Start address of the `__restgprlr_14` function. It starts with `ld r14, -0x98(r1)`, repeating the same operation for the rest of the non-volatile registers and restoring the link register at the end.
savegprlr_14_address|Start address of the `__savegprlr_14` function. It starts with `std r14, -0x98(r1)`, repeating the same operation for the rest of the non-volatile registers and saving the link register at the end.
restfpr_14_address|Start address of the `__restfpr_14` function. It starts with `lfd f14, -0x90(r12)`, repeating the same operation for the rest of the non-volatile FPU registers.
savefpr_14_address|Start address of the `__savefpr_14` function. It starts with `stfd f14, -0x90(r12)`, repeating the same operation for the rest of the non-volatile FPU registers.
restvmx_14_address|Start address of the `__restvmx_14` function. It starts with `li r11, -0x120` and `lvx v14, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers until `v31`.
savevmx_14_address|Start address of the `__savevmx_14` function. It starts with `li r11, -0x120` and `stvx v14, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers until `v31`.
restvmx_64_address|Start address of the `__restvmx_64` function. It starts with `li r11, -0x400` and `lvx128 v64, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers.
savevmx_64_address|Start address of the `__savevmx_64` function. It starts with `li r11, -0x400` and `stvx128 v64, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers.
Property|Description|Byte Pattern
-|-|-
restgprlr_14_address|Start address of the `__restgprlr_14` function. It starts with `ld r14, -0x98(r1)`, repeating the same operation for the rest of the non-volatile registers and restoring the link register at the end.|`e9 c1 ff 68`
savegprlr_14_address|Start address of the `__savegprlr_14` function. It starts with `std r14, -0x98(r1)`, repeating the same operation for the rest of the non-volatile registers and saving the link register at the end.|`f9 c1 ff 68`
restfpr_14_address|Start address of the `__restfpr_14` function. It starts with `lfd f14, -0x90(r12)`, repeating the same operation for the rest of the non-volatile FPU registers.|`c9 cc ff 70`
savefpr_14_address|Start address of the `__savefpr_14` function. It starts with `stfd f14, -0x90(r12)`, repeating the same operation for the rest of the non-volatile FPU registers.|`d9 cc ff 70`
restvmx_14_address|Start address of the `__restvmx_14` function. It starts with `li r11, -0x120` and `lvx v14, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers until `v31`.|`39 60 fe e0 7d cb 60 ce`
savevmx_14_address|Start address of the `__savevmx_14` function. It starts with `li r11, -0x120` and `stvx v14, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers until `v31`.|`39 60 fe e0 7d cb 61 ce`
restvmx_64_address|Start address of the `__restvmx_64` function. It starts with `li r11, -0x400` and `lvx128 v64, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers.|`39 60 fc 00 10 0b 60 cb`
savevmx_64_address|Start address of the `__savevmx_64` function. It starts with `li r11, -0x400` and `stvx128 v64, r11, r12`, repeating the same operation for the rest of the non-volatile VMX registers.|`39 60 fc 00 10 0b 61 cb`
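The byte patterns above can be located with a plain byte search over the decompressed image. A minimal sketch (not part of the README or the diff; names and the contiguous-image assumption are illustrative) that turns a match back into a guest address for the TOML config:

#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <optional>
#include <vector>

// Illustrative sketch: scans the image for a pattern such as
// { 0xE9, 0xC1, 0xFF, 0x68 } (the start of __restgprlr_14) and converts the
// file offset of the first match into a guest address, assuming the image is
// loaded contiguously at baseAddress.
std::optional<uint32_t> FindPatternAddress(const std::vector<uint8_t>& image,
                                           uint32_t baseAddress,
                                           std::initializer_list<uint8_t> pattern)
{
    auto it = std::search(image.begin(), image.end(), pattern.begin(), pattern.end());
    if (it == image.end())
        return std::nullopt;
    return baseAddress + uint32_t(std::distance(image.begin(), it));
}

For example, a hit for `e9 c1 ff 68` would be a candidate value for `restgprlr_14_address`.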
#### longjmp & setjmp
@@ -255,4 +257,4 @@ On Windows, you can use the clang-cl toolset and open the project in Visual Stud
## Special Thanks
This project would not have been possible without the [Xenia](https://github.com/xenia-project/xenia) emulator, as many parts of the CPU code conversion process have been implemented by heavily referencing its PPC code translator. The project also uses code from [Xenia Canary](https://github.com/xenia-canary/xenia-canary) to patch XEX binaries.

View File

@@ -378,8 +378,9 @@ bool Recompiler::Recompile(
else if (address == config.setJmpAddress)
{
println("\t{} = ctx;", env());
println("\t{}.s64 = setjmp(*reinterpret_cast<jmp_buf*>(base + {}.u32));", r(3), r(3));
println("\tif ({}.s64 != 0) ctx = {};", r(3), env());
println("\t{}.s64 = setjmp(*reinterpret_cast<jmp_buf*>(base + {}.u32));", temp(), r(3));
println("\tif ({}.s64 != 0) ctx = {};", temp(), env());
println("\t{} = {};", r(3), temp());
}
else
{
@@ -530,6 +531,13 @@ bool Recompiler::Recompile(
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_ADDC:
println("\t{}.ca = {}.u32 > ~{}.u32;", xer(), r(insn.operands[2]), r(insn.operands[1]));
println("\t{}.u64 = {}.u64 + {}.u64;", r(insn.operands[0]), r(insn.operands[1]), r(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_ADDE:
println("\t{}.u8 = ({}.u32 + {}.u32 < {}.u32) | ({}.u32 + {}.u32 + {}.ca < {}.ca);", temp(), r(insn.operands[1]), r(insn.operands[2]), r(insn.operands[1]), r(insn.operands[1]), r(insn.operands[2]), xer(), xer());
println("\t{}.u64 = {}.u64 + {}.u64 + {}.ca;", r(insn.operands[0]), r(insn.operands[1]), r(insn.operands[2]), xer());
@@ -538,6 +546,14 @@ bool Recompiler::Recompile(
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_ADDME:
println("\t{}.u8 = ({}.u32 - 1 < {}.u32) | ({}.u32 - 1 + {}.ca < {}.ca);", temp(), r(insn.operands[1]), r(insn.operands[1]), r(insn.operands[1]), xer(), xer());
println("\t{}.u64 = {}.u64 - 1 + {}.ca;", r(insn.operands[0]), r(insn.operands[1]), xer());
println("\t{}.ca = {}.u8;", xer(), temp());
if (strchr(insn.opcode->name, '.'))
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_ADDI:
print("\t{}.s64 = ", r(insn.operands[0]));
if (insn.operands[1] != 0)
@@ -651,6 +667,14 @@ bool Recompiler::Recompile(
println("\tif ({}.u32 == 0) goto loc_{:X};", ctr(), insn.operands[0]);
break;
case PPC_INST_BDZF:
{
constexpr std::string_view fields[] = { "lt", "gt", "eq", "so" };
println("\t--{}.u64;", ctr());
println("\tif ({}.u32 == 0 && !{}.{}) goto loc_{:X};", ctr(), cr(insn.operands[0] / 4), fields[insn.operands[0] % 4], insn.operands[1]);
break;
}
case PPC_INST_BDZLR:
println("\t--{}.u64;", ctr());
println("\tif ({}.u32 == 0) return;", ctr(), insn.operands[0]);
@@ -662,10 +686,20 @@ bool Recompiler::Recompile(
break;
case PPC_INST_BDNZF:
// NOTE: assuming eq here as a shortcut because all the instructions in the game do that
{
constexpr std::string_view fields[] = { "lt", "gt", "eq", "so" };
println("\t--{}.u64;", ctr());
println("\tif ({}.u32 != 0 && !{}.eq) goto loc_{:X};", ctr(), cr(insn.operands[0] / 4), insn.operands[1]);
println("\tif ({}.u32 != 0 && !{}.{}) goto loc_{:X};", ctr(), cr(insn.operands[0] / 4), fields[insn.operands[0] % 4], insn.operands[1]);
break;
}
case PPC_INST_BDNZT:
{
constexpr std::string_view fields[] = { "lt", "gt", "eq", "so" };
println("\t--{}.u64;", ctr());
println("\tif ({}.u32 != 0 && {}.{}) goto loc_{:X};", ctr(), cr(insn.operands[0] / 4), fields[insn.operands[0] % 4], insn.operands[1]);
break;
}
case PPC_INST_BEQ:
printConditionalBranch(false, "eq");
@@ -795,6 +829,20 @@ bool Recompiler::Recompile(
println("\t{0}.u64 = {1}.u32 == 0 ? 32 : __builtin_clz({1}.u32);", r(insn.operands[0]), r(insn.operands[1]));
break;
case PPC_INST_CROR:
{
constexpr std::string_view fields[] = { "lt", "gt", "eq", "so" };
println("\t{}.{} = {}.{} | {}.{};", cr(insn.operands[0] / 4), fields[insn.operands[0] % 4], cr(insn.operands[1] / 4), fields[insn.operands[1] % 4], cr(insn.operands[2] / 4), fields[insn.operands[2] % 4]);
break;
}
case PPC_INST_CRORC:
{
constexpr std::string_view fields[] = { "lt", "gt", "eq", "so" };
println("\t{}.{} = {}.{} | (~{}.{} & 1);", cr(insn.operands[0] / 4), fields[insn.operands[0] % 4], cr(insn.operands[1] / 4), fields[insn.operands[1] % 4], cr(insn.operands[2] / 4), fields[insn.operands[2] % 4]);
break;
}
case PPC_INST_DB16CYC:
// no op
break;
@@ -807,6 +855,10 @@ bool Recompiler::Recompile(
// no op
break;
case PPC_INST_DCBST:
// no op
break;
case PPC_INST_DCBTST:
// no op
break;
@@ -851,6 +903,12 @@ bool Recompiler::Recompile(
// no op
break;
case PPC_INST_EQV:
println("\t{}.u64 = ~({}.u64 ^ {}.u64);", r(insn.operands[0]), r(insn.operands[1]), r(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_EXTSB:
println("\t{}.s64 = {}.s8;", r(insn.operands[0]), r(insn.operands[1]));
if (strchr(insn.opcode->name, '.'))
@@ -1034,6 +1092,12 @@ bool Recompiler::Recompile(
println("{}.u32);", r(insn.operands[2]));
break;
case PPC_INST_LBZUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U8({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_LD:
print("\t{}.u64 = PPC_LOAD_U64(", r(insn.operands[0]));
if (insn.operands[2] != 0)
@@ -1062,6 +1126,12 @@ bool Recompiler::Recompile(
println("{}.u32);", r(insn.operands[2]));
break;
case PPC_INST_LDUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U64({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_LFD:
printSetFlushMode(false);
print("\t{}.u64 = PPC_LOAD_U64(", f(insn.operands[0]));
@@ -1070,6 +1140,13 @@ bool Recompiler::Recompile(
println("{});", int32_t(insn.operands[1]));
break;
case PPC_INST_LFDU:
printSetFlushMode(false);
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U64({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_LFDX:
printSetFlushMode(false);
print("\t{}.u64 = PPC_LOAD_U64(", f(insn.operands[0]));
@@ -1078,6 +1155,13 @@ bool Recompiler::Recompile(
println("{}.u32);", r(insn.operands[2]));
break;
case PPC_INST_LFDUX:
printSetFlushMode(false);
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U64({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_LFS:
printSetFlushMode(false);
print("\t{}.u32 = PPC_LOAD_U32(", temp());
@@ -1087,6 +1171,14 @@ bool Recompiler::Recompile(
println("\t{}.f64 = double({}.f32);", f(insn.operands[0]), temp());
break;
case PPC_INST_LFSU:
printSetFlushMode(false);
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u32 = PPC_LOAD_U32({});", temp(), ea());
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
println("\t{}.f64 = double({}.f32);", f(insn.operands[0]), temp());
break;
case PPC_INST_LFSX:
printSetFlushMode(false);
print("\t{}.u32 = PPC_LOAD_U32(", temp());
@@ -1096,6 +1188,14 @@ bool Recompiler::Recompile(
println("\t{}.f64 = double({}.f32);", f(insn.operands[0]), temp());
break;
case PPC_INST_LFSUX:
printSetFlushMode(false);
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u32 = PPC_LOAD_U32({});", temp(), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
println("\t{}.f64 = double({}.f32);", f(insn.operands[0]), temp());
break;
case PPC_INST_LHA:
print("\t{}.s64 = int16_t(PPC_LOAD_U16(", r(insn.operands[0]));
if (insn.operands[2] != 0)
@@ -1103,6 +1203,12 @@ bool Recompiler::Recompile(
println("{}));", int32_t(insn.operands[1]));
break;
case PPC_INST_LHAU:
print("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
print("\t{}.s64 = int16_t(PPC_LOAD_U16({}));", r(insn.operands[0]), ea());
print("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_LHAX:
print("\t{}.s64 = int16_t(PPC_LOAD_U16(", r(insn.operands[0]));
if (insn.operands[1] != 0)
@@ -1117,6 +1223,12 @@ bool Recompiler::Recompile(
println("{});", int32_t(insn.operands[1]));
break;
case PPC_INST_LHZU:
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U16({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_LHZX:
print("\t{}.u64 = PPC_LOAD_U16(", r(insn.operands[0]));
if (insn.operands[1] != 0)
@@ -1124,6 +1236,12 @@ bool Recompiler::Recompile(
println("{}.u32);", r(insn.operands[2]));
break;
case PPC_INST_LHZUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U16({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_LI:
println("\t{}.s64 = {};", r(insn.operands[0]), int32_t(insn.operands[1]));
break;
@@ -1136,6 +1254,7 @@ bool Recompiler::Recompile(
case PPC_INST_LVEWX128:
case PPC_INST_LVX:
case PPC_INST_LVX128:
case PPC_INST_LVEHX:
// NOTE: for endian swapping, we reverse the whole vector instead of individual elements.
// this is accounted for in every instruction (eg. dp3 sums yzw instead of xyz)
print("\t_mm_store_si128((__m128i*){}.u8, _mm_shuffle_epi8(_mm_load_si128((__m128i*)(base + ((", v(insn.operands[0]));
@@ -1231,6 +1350,12 @@ bool Recompiler::Recompile(
println("{}.u32);", r(insn.operands[2]));
break;
case PPC_INST_LWZUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}.u64 = PPC_LOAD_U32({});", r(insn.operands[0]), ea());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_MFCR:
for (size_t i = 0; i < 32; i++)
{
@@ -1481,7 +1606,7 @@ bool Recompiler::Recompile(
case PPC_INST_STBU:
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\tPPC_STORE_U8({}, {}.u8);", ea(), r(insn.operands[0]));
println("\t{}{}, {}.u8);", mmioStore() ? "PPC_MM_STORE_U8(" : "PPC_STORE_U8(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
@@ -1492,6 +1617,12 @@ bool Recompiler::Recompile(
println("{}.u32, {}.u8);", r(insn.operands[2]), r(insn.operands[0]));
break;
case PPC_INST_STBUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u8);", mmioStore() ? "PPC_MM_STORE_U8(" : "PPC_STORE_U8(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_STD:
print("{}", mmioStore() ? "\tPPC_MM_STORE_U64(" : "\tPPC_STORE_U64(");
if (insn.operands[2] != 0)
@@ -1511,7 +1642,7 @@ bool Recompiler::Recompile(
case PPC_INST_STDU:
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\tPPC_STORE_U64({}, {}.u64);", ea(), r(insn.operands[0]));
println("\t{}{}, {}.u64);", mmioStore() ? "PPC_MM_STORE_U64(" : "PPC_STORE_U64(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
@@ -1522,6 +1653,12 @@ bool Recompiler::Recompile(
println("{}.u32, {}.u64);", r(insn.operands[2]), r(insn.operands[0]));
break;
case PPC_INST_STDUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u64);", mmioStore() ? "PPC_MM_STORE_U64(" : "PPC_STORE_U64(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_STFD:
printSetFlushMode(false);
print("{}", mmioStore() ? "\tPPC_MM_STORE_U64(" : "\tPPC_STORE_U64(");
@@ -1530,6 +1667,13 @@ bool Recompiler::Recompile(
println("{}, {}.u64);", int32_t(insn.operands[1]), f(insn.operands[0]));
break;
case PPC_INST_STFDU:
printSetFlushMode(false);
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u64);", mmioStore() ? "PPC_MM_STORE_U64(" : "PPC_STORE_U64(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_STFDX:
printSetFlushMode(false);
print("{}", mmioStore() ? "\tPPC_MM_STORE_U64(" : "\tPPC_STORE_U64(");
@@ -1555,6 +1699,14 @@ bool Recompiler::Recompile(
println("{}, {}.u32);", int32_t(insn.operands[1]), temp());
break;
case PPC_INST_STFSU:
printSetFlushMode(false);
println("\t{}.f32 = float({}.f64);", temp(), f(insn.operands[0]));
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u32);", mmioStore() ? "PPC_MM_STORE_U32(" : "PPC_STORE_U32(", ea(), temp());
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_STFSX:
printSetFlushMode(false);
println("\t{}.f32 = float({}.f64);", temp(), f(insn.operands[0]));
@@ -1564,6 +1716,14 @@ bool Recompiler::Recompile(
println("{}.u32, {}.u32);", r(insn.operands[2]), temp());
break;
case PPC_INST_STFSUX:
printSetFlushMode(false);
println("\t{}.f32 = float({}.f64);", temp(), f(insn.operands[0]));
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u32);", mmioStore() ? "PPC_MM_STORE_U32(" : "PPC_STORE_U32(", ea(), temp());
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_STH:
print("{}", mmioStore() ? "\tPPC_MM_STORE_U16(" : "\tPPC_STORE_U16(");
if (insn.operands[2] != 0)
@@ -1571,6 +1731,18 @@ bool Recompiler::Recompile(
println("{}, {}.u16);", int32_t(insn.operands[1]), r(insn.operands[0]));
break;
case PPC_INST_STHU:
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u16);", mmioStore() ? "PPC_MM_STORE_U16(" : "PPC_STORE_U16(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_STHUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\t{}{}, {}.u16);", mmioStore() ? "PPC_MM_STORE_U16(" : "PPC_STORE_U16(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
case PPC_INST_STHBRX:
print("{}", mmioStore() ? "\tPPC_MM_STORE_U16(" : "\tPPC_STORE_U16(");
if (insn.operands[1] != 0)
@@ -1666,13 +1838,13 @@ bool Recompiler::Recompile(
case PPC_INST_STWU:
println("\t{} = {} + {}.u32;", ea(), int32_t(insn.operands[1]), r(insn.operands[2]));
println("\tPPC_STORE_U32({}, {}.u32);", ea(), r(insn.operands[0]));
println("\t{}{}, {}.u32);", mmioStore() ? "PPC_MM_STORE_U32(" : "PPC_STORE_U32(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[2]), ea());
break;
case PPC_INST_STWUX:
println("\t{} = {}.u32 + {}.u32;", ea(), r(insn.operands[1]), r(insn.operands[2]));
println("\tPPC_STORE_U32({}, {}.u32);", ea(), r(insn.operands[0]));
println("\t{}{}, {}.u32);", mmioStore() ? "PPC_MM_STORE_U32(" : "PPC_STORE_U32(", ea(), r(insn.operands[0]));
println("\t{}.u32 = {};", r(insn.operands[1]), ea());
break;
@@ -1704,6 +1876,14 @@ bool Recompiler::Recompile(
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_SUBFZE:
println("\t{}.u8 = (~{}.u32 < ~{}.u32) | (~{}.u32 + {}.ca < {}.ca);", temp(), r(insn.operands[1]), r(insn.operands[1]), r(insn.operands[1]), xer(), xer());
println("\t{}.u64 = ~{}.u64 + {}.ca;", r(insn.operands[0]), r(insn.operands[1]), xer());
println("\t{}.ca = {}.u8;", xer(), temp());
if (strchr(insn.opcode->name, '.'))
println("\t{}.compare<int32_t>({}.s32, 0, {});", cr(0), r(insn.operands[0]), xer());
break;
case PPC_INST_SUBFIC:
println("\t{}.ca = {}.u32 <= {};", xer(), r(insn.operands[1]), insn.operands[2]);
println("\t{}.s64 = {} - {}.s64;", r(insn.operands[0]), int32_t(insn.operands[2]), r(insn.operands[1]));
@@ -1739,10 +1919,23 @@ bool Recompiler::Recompile(
println("\t_mm_store_ps({}.f32, _mm_add_ps(_mm_load_ps({}.f32), _mm_load_ps({}.f32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VADDSBS:
println("\t_mm_store_si128((__m128i*){}.s8, _mm_adds_epi8(_mm_load_si128((__m128i*){}.s8), _mm_load_si128((__m128i*){}.s8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VADDSHS:
println("\t_mm_store_si128((__m128i*){}.s16, _mm_adds_epi16(_mm_load_si128((__m128i*){}.s16), _mm_load_si128((__m128i*){}.s16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VADDSWS:
// TODO: vectorize
for (size_t i = 0; i < 4; i++)
{
println("\t{}.s64 = int64_t({}.s32[{}]) + int64_t({}.s32[{}]);", temp(), v(insn.operands[1]), i, v(insn.operands[2]), i);
println("\t{}.s32[{}] = {}.s64 > INT_MAX ? INT_MAX : {}.s64 < INT_MIN ? INT_MIN : {}.s64;", v(insn.operands[0]), i, temp(), temp(), temp());
}
break;
case PPC_INST_VADDUBM:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_add_epi8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
@@ -1784,6 +1977,10 @@ bool Recompiler::Recompile(
println("\t_mm_store_si128((__m128i*){}.u8, _mm_avg_epu8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VAVGUH:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_avg_epu16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VCTSXS:
case PPC_INST_VCFPSXWS128:
printSetFlushMode(true);
@@ -1794,6 +1991,16 @@ bool Recompiler::Recompile(
println("_mm_load_ps({}.f32)));", v(insn.operands[1]));
break;
case PPC_INST_VCTUXS:
case PPC_INST_VCFPUXWS128:
printSetFlushMode(true);
print("\t_mm_store_si128((__m128i*){}.u32, _mm_vctuxs(", v(insn.operands[0]));
if (insn.operands[2] != 0)
println("_mm_mul_ps(_mm_load_ps({}.f32), _mm_set1_ps({}))));", v(insn.operands[1]), 1u << insn.operands[2]);
else
println("_mm_load_ps({}.f32)));", v(insn.operands[1]));
break;
case PPC_INST_VCFSX:
case PPC_INST_VCSXWFP128:
{
@@ -1847,6 +2054,12 @@ bool Recompiler::Recompile(
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.u8), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VCMPEQUH:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_cmpeq_epi16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.u16), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VCMPEQUW:
case PPC_INST_VCMPEQUW128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_cmpeq_epi32(_mm_load_si128((__m128i*){}.u32), _mm_load_si128((__m128i*){}.u32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
@@ -1872,10 +2085,26 @@ bool Recompiler::Recompile(
case PPC_INST_VCMPGTUB:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_cmpgt_epu8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.u8), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VCMPGTUH:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_cmpgt_epu16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.u16), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VCMPGTSH:
println("\t_mm_store_si128((__m128i*){}.s8, _mm_cmpgt_epi16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.s16), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VCMPGTSW:
println("\t_mm_store_si128((__m128i*){}.s8, _mm_cmpgt_epi32(_mm_load_si128((__m128i*){}.u32), _mm_load_si128((__m128i*){}.u32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
if (strchr(insn.opcode->name, '.'))
println("\t{}.setFromMask(_mm_load_si128((__m128i*){}.s32), 0xFFFF);", cr(6), v(insn.operands[0]));
break;
case PPC_INST_VEXPTEFP:
@@ -1907,10 +2136,18 @@ bool Recompiler::Recompile(
println("\t_mm_store_ps({}.f32, _mm_max_ps(_mm_load_ps({}.f32), _mm_load_ps({}.f32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VMAXSH:
println("\t_mm_store_si128((__m128i*){}.u16, _mm_max_epi16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VMAXSW:
println("\t_mm_store_si128((__m128i*){}.u32, _mm_max_epi32(_mm_load_si128((__m128i*){}.u32), _mm_load_si128((__m128i*){}.u32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VMINSH:
println("\t_mm_store_si128((__m128i*){}.u16, _mm_min_epi16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VMINFP:
case PPC_INST_VMINFP128:
printSetFlushMode(true);
@@ -2000,7 +2237,7 @@ bool Recompiler::Recompile(
switch (insn.operands[2])
{
case 0: // D3D color
if (insn.operands[3] != 1 || insn.operands[4] != 3)
if (insn.operands[3] != 1)
fmt::println("Unexpected D3D color pack instruction at {:X}", base);
for (size_t i = 0; i < 4; i++)
@@ -2010,7 +2247,29 @@ bool Recompiler::Recompile(
println("\t{}.f32[{}] = {}.f32[{}] < 3.0f ? 3.0f : ({}.f32[{}] > {}.f32[{}] ? {}.f32[{}] : {}.f32[{}]);", vTemp(), i, v(insn.operands[1]), i, v(insn.operands[1]), i, vTemp(), i, vTemp(), i, v(insn.operands[1]), i);
println("\t{}.u32 {}= uint32_t({}.u8[{}]) << {};", temp(), i == 0 ? "" : "|", vTemp(), i * 4, indices[i] * 8);
}
println("\t{}.u32[3] = {}.u32;", v(insn.operands[0]), temp());
println("\t{}.u32[{}] = {}.u32;", v(insn.operands[0]), insn.operands[4], temp());
break;
case 5: // float16_4
if (insn.operands[3] != 2 || insn.operands[4] > 2)
fmt::println("Unexpected float16_4 pack instruction at {:X}", base);
for (size_t i = 0; i < 4; i++)
{
// Strip sign from source
println("\t{}.u32 = ({}.u32[{}]&0x7FFFFFFF);", temp(), v(insn.operands[1]), i);
// If |source| is > 65504, clamp output to 0x7FFF, else save 8 exponent bits
println("\t{0}.u8[0] = ({1}.f32 != {1}.f32) || ({1}.f32 > 65504.0f) ? 0xFF : (({2}.u32[{3}]&0x7f800000)>>23);", vTemp(), temp(), v(insn.operands[1]), i);
// If 8 exponent bits were saved, it can only be 0x8E at most
// If saved, save first 10 bits of mantissa
println("\t{}.u16 = {}.u8[0] != 0xFF ? (({}.u32[{}]&0x7FE000)>>13) : 0x0;", temp(), vTemp(), v(insn.operands[1]), i);
// If saved and > 127-15, exponent is converted from 8 to 5-bit by subtracting 0x70
// If saved but not > 127-15, clamp exponent at 0, add 0x400 to mantissa and shift right by (0x71-exponent)
// If right shift is greater than 31 bits, manually clamp mantissa to 0 or else the output of the shift will be wrong
println("\t{0}.u16[{1}] = {2}.u8[0] != 0xFF ? ({2}.u8[0] > 0x70 ? ((({2}.u8[0]-0x70)<<10)+{3}.u16) : (0x71-{2}.u8[0] > 31 ? 0x0 : ((0x400+{3}.u16)>>(0x71-{2}.u8[0])))) : 0x7FFF;", v(insn.operands[0]), i+(2*insn.operands[4]), vTemp(), temp());
// Add back original sign
println("\t{}.u16[{}] |= (({}.u32[{}]&0x80000000)>>16);", v(insn.operands[0]), i+(2*insn.operands[4]), v(insn.operands[1]), i);
}
break;
default:
@@ -2019,11 +2278,36 @@ bool Recompiler::Recompile(
}
break;
case PPC_INST_VPKSHSS:
case PPC_INST_VPKSHSS128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_packs_epi16(_mm_load_si128((__m128i*){}.s16), _mm_load_si128((__m128i*){}.s16)));", v(insn.operands[0]), v(insn.operands[2]), v(insn.operands[1]));
break;
case PPC_INST_VPKSWSS:
case PPC_INST_VPKSWSS128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_packs_epi32(_mm_load_si128((__m128i*){}.s32), _mm_load_si128((__m128i*){}.s32)));", v(insn.operands[0]), v(insn.operands[2]), v(insn.operands[1]));
break;
case PPC_INST_VPKSHUS:
case PPC_INST_VPKSHUS128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_packus_epi16(_mm_load_si128((__m128i*){}.s16), _mm_load_si128((__m128i*){}.s16)));", v(insn.operands[0]), v(insn.operands[2]), v(insn.operands[1]));
break;
case PPC_INST_VPKSWUS:
case PPC_INST_VPKSWUS128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_packus_epi32(_mm_load_si128((__m128i*){}.s32), _mm_load_si128((__m128i*){}.s32)));", v(insn.operands[0]), v(insn.operands[2]), v(insn.operands[1]));
break;
case PPC_INST_VPKUHUS:
case PPC_INST_VPKUHUS128:
for (size_t i = 0; i < 8; i++)
{
println("\t{0}.u8[{1}] = {2}.u16[{1}] > UCHAR_MAX ? UCHAR_MAX : {2}.u16[{1}];", vTemp(), i, v(insn.operands[2]));
println("\t{0}.u8[{1}] = {2}.u16[{3}] > UCHAR_MAX ? UCHAR_MAX : {2}.u16[{3}];", vTemp(), i + 8, v(insn.operands[1]), i);
}
println("{} = {};", v(insn.operands[0]), vTemp());
break;
case PPC_INST_VREFP:
case PPC_INST_VREFP128:
// TODO: see if we can use rcp safely
@@ -2056,6 +2340,14 @@ bool Recompiler::Recompile(
break;
}
case PPC_INST_VRLH:
for (size_t i = 0; i < 8; i++)
{
println("\t{0}.u16[{1}] = ({2}.u16[{1}] << ({3}.u16[{1}] & 0xF)) | ({2}.u16[{1}] >> (16 - ({3}.u16[{1}] & 0xF)));", vTemp(), i, v(insn.operands[1]), v(insn.operands[2]));
}
println("{} = {};", v(insn.operands[0]), vTemp());
break;
case PPC_INST_VRSQRTEFP:
case PPC_INST_VRSQRTEFP128:
// TODO: see if we can use rsqrt safely
@@ -2065,6 +2357,7 @@ bool Recompiler::Recompile(
break;
case PPC_INST_VSEL:
case PPC_INST_VSEL128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_or_si128(_mm_andnot_si128(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)), _mm_and_si128(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8))));", v(insn.operands[0]), v(insn.operands[3]), v(insn.operands[1]), v(insn.operands[3]), v(insn.operands[2]));
break;
@@ -2074,6 +2367,12 @@ bool Recompiler::Recompile(
println("\t{}.u8[{}] = {}.u8[{}] << ({}.u8[{}] & 0x7);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i);
break;
case PPC_INST_VSLH:
// TODO: vectorize
for (size_t i = 0; i < 8; i++)
println("\t{}.u16[{}] = {}.u16[{}] << ({}.u8[{}] & 0xF);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i * 2);
break;
case PPC_INST_VSLDOI:
case PPC_INST_VSLDOI128:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_alignr_epi8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8), {}));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]), 16 - insn.operands[3]);
@@ -2107,6 +2406,10 @@ bool Recompiler::Recompile(
println("\t_mm_store_si128((__m128i*){}.u8, _mm_set1_epi8(char(0x{:X})));", v(insn.operands[0]), insn.operands[1]);
break;
case PPC_INST_VSPLTISH:
println("\t_mm_store_si128((__m128i*){}.u16, _mm_set1_epi16(int(0x{:X})));", v(insn.operands[0]), insn.operands[1]);
break;
case PPC_INST_VSPLTISW:
case PPC_INST_VSPLTISW128:
println("\t_mm_store_si128((__m128i*){}.u32, _mm_set1_epi32(int(0x{:X})));", v(insn.operands[0]), insn.operands[1]);
@@ -2126,6 +2429,18 @@ bool Recompiler::Recompile(
println("\t_mm_store_si128((__m128i*){}.u8, _mm_vsr(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VSRAB:
// TODO: vectorize, ensure endianness is correct
for (size_t i = 0; i < 16; i++)
println("\t{}.s8[{}] = {}.s8[{}] >> ({}.u8[{}] & 0x7);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i);
break;
case PPC_INST_VSRAH:
// TODO: vectorize, ensure endianness is correct
for (size_t i = 0; i < 8; i++)
println("\t{}.s16[{}] = {}.s16[{}] >> ({}.u8[{}] & 0xF);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i * 2);
break;
case PPC_INST_VSRAW:
case PPC_INST_VSRAW128:
// TODO: vectorize, ensure endianness is correct
@@ -2133,6 +2448,12 @@ bool Recompiler::Recompile(
println("\t{}.s32[{}] = {}.s32[{}] >> ({}.u8[{}] & 0x1F);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i * 4);
break;
case PPC_INST_VSRH:
// TODO: vectorize, ensure endianness is correct
for (size_t i = 0; i < 8; i++)
println("\t{}.u16[{}] = {}.u16[{}] >> ({}.u8[{}] & 0xF);", v(insn.operands[0]), i, v(insn.operands[1]), i, v(insn.operands[2]), i * 2);
break;
case PPC_INST_VSRW:
case PPC_INST_VSRW128:
// TODO: vectorize, ensure endianness is correct
@@ -2146,6 +2467,15 @@ bool Recompiler::Recompile(
println("\t_mm_store_ps({}.f32, _mm_sub_ps(_mm_load_ps({}.f32), _mm_load_ps({}.f32)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VSUBSHS:
// TODO: vectorize
for (size_t i = 0; i < 8; i++)
{
println("\t{}.s64 = int64_t({}.s16[{}]) - int64_t({}.s16[{}]);", temp(), v(insn.operands[1]), i, v(insn.operands[2]), i);
println("\t{}.s16[{}] = {}.s64 > SHRT_MAX ? SHRT_MAX : {}.s64 < SHRT_MIN ? SHRT_MIN : {}.s64;", v(insn.operands[0]), i, temp(), temp(), temp());
}
break;
case PPC_INST_VSUBSWS:
// TODO: vectorize
for (size_t i = 0; i < 4; i++)
@@ -2159,8 +2489,12 @@ bool Recompiler::Recompile(
println("\t_mm_store_si128((__m128i*){}.u8, _mm_subs_epu8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VSUBUBM:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_sub_epi8(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VSUBUHM:
println("\t_mm_store_si128((__m128i*){}.u8, _mm_sub_epi16(_mm_load_si128((__m128i*){}.u8), _mm_load_si128((__m128i*){}.u8)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
println("\t_mm_store_si128((__m128i*){}.u8, _mm_sub_epi16(_mm_load_si128((__m128i*){}.u16), _mm_load_si128((__m128i*){}.u16)));", v(insn.operands[0]), v(insn.operands[1]), v(insn.operands[2]));
break;
case PPC_INST_VUPKD3D128:

View File

@@ -29,7 +29,7 @@
#define PPC_EXTERN_FUNC(x) extern PPC_FUNC(x)
#define PPC_WEAK_FUNC(x) __attribute__((weak,noinline)) PPC_FUNC(x)
#define PPC_FUNC_PROLOGUE() __builtin_assume(((size_t)base & 0xFFFFFFFF) == 0)
#define PPC_FUNC_PROLOGUE() __builtin_assume(((size_t)base & 0x1F) == 0)
#ifndef PPC_LOAD_U8
#define PPC_LOAD_U8(x) *(volatile uint8_t*)(base + (x))
@@ -123,21 +123,18 @@ struct PPCFuncMapping
extern PPCFuncMapping PPCFuncMappings[];
struct PPCRegister
union PPCRegister
{
union
{
int8_t s8;
uint8_t u8;
int16_t s16;
uint16_t u16;
int32_t s32;
uint32_t u32;
int64_t s64;
uint64_t u64;
float f32;
double f64;
};
int8_t s8;
uint8_t u8;
int16_t s16;
uint16_t u16;
int32_t s32;
uint32_t u32;
int64_t s64;
uint64_t u64;
float f32;
double f64;
};
struct PPCXERRegister
@@ -194,21 +191,18 @@ struct PPCCRRegister
}
};
struct alignas(0x10) PPCVRegister
union alignas(0x10) PPCVRegister
{
union
{
int8_t s8[16];
uint8_t u8[16];
int16_t s16[8];
uint16_t u16[8];
int32_t s32[4];
uint32_t u32[4];
int64_t s64[2];
uint64_t u64[2];
float f32[4];
double f64[2];
};
int8_t s8[16];
uint8_t u8[16];
int16_t s16[8];
uint16_t u16[8];
int32_t s32[4];
uint32_t u32[4];
int64_t s64[2];
uint64_t u64[2];
float f32[4];
double f64[2];
};
#define PPC_ROUND_NEAREST 0x00
@@ -270,7 +264,7 @@ struct PPCFPSCRRegister
}
};
struct PPCContext
struct alignas(0x40) PPCContext
{
PPCRegister r3;
#ifndef PPC_CONFIG_NON_ARGUMENT_AS_LOCAL
@@ -651,6 +645,19 @@ inline __m128i _mm_vctsxs(__m128 src1)
return _mm_andnot_si128(_mm_castps_si128(xmm2), _mm_castps_si128(dest));
}
inline __m128i _mm_vctuxs(__m128 src1)
{
__m128 xmm0 = _mm_max_ps(src1, _mm_set1_epi32(0));
__m128 xmm1 = _mm_cmpge_ps(xmm0, _mm_set1_ps((float)0x80000000));
__m128 xmm2 = _mm_sub_ps(xmm0, _mm_set1_ps((float)0x80000000));
xmm0 = _mm_blendv_ps(xmm0, xmm2, xmm1);
__m128i dest = _mm_cvttps_epi32(xmm0);
xmm0 = _mm_cmpeq_epi32(dest, _mm_set1_epi32(INT_MIN));
xmm1 = _mm_and_si128(xmm1, _mm_set1_epi32(INT_MIN));
dest = _mm_add_epi32(dest, xmm1);
return _mm_or_si128(dest, xmm0);
}
inline __m128i _mm_vsr(__m128i a, __m128i b)
{
b = _mm_srli_epi64(_mm_slli_epi64(b, 61), 61);

View File

@@ -5,6 +5,8 @@
#include <vector>
#include <unordered_map>
#include <aes.hpp>
#include <TinySHA1.hpp>
#include <xex_patcher.h>
#define STRINGIFY(X) #X
#define XE_EXPORT(MODULE, ORDINAL, NAME, TYPE) { (ORDINAL), "__imp__" STRINGIFY(NAME) }
@@ -135,7 +137,7 @@ Image Xex2LoadImage(const uint8_t* data, size_t dataSize)
// Decompress image
if (fileFormatInfo != nullptr)
{
assert(fileFormatInfo->compressionType <= XEX_COMPRESSION_BASIC);
assert(fileFormatInfo->compressionType <= XEX_COMPRESSION_NORMAL);
std::unique_ptr<uint8_t[]> decryptedData;
const uint8_t* srcData = nullptr;
@@ -192,6 +194,67 @@ Image Xex2LoadImage(const uint8_t* data, size_t dataSize)
destData += blocks[i].zeroSize;
}
}
else if (fileFormatInfo->compressionType == XEX_COMPRESSION_NORMAL)
{
result = std::make_unique<uint8_t[]>(imageSize);
auto* destData = result.get();
const Xex2CompressedBlockInfo* blocks = &((const Xex2FileNormalCompressionInfo*)(fileFormatInfo + 1))->firstBlock;
const uint32_t headerSize = header->headerSize.get();
const uint32_t exeLength = dataSize - headerSize;
const uint8_t* exeBuffer = srcData;
auto compressBuffer = std::make_unique<uint8_t[]>(exeLength);
const uint8_t* p = NULL;
uint8_t* d = NULL;
sha1::SHA1 s;
p = exeBuffer;
d = compressBuffer.get();
uint8_t blockCalcedDigest[0x14];
while (blocks->blockSize)
{
const uint8_t* pNext = p + blocks->blockSize;
const auto* nextBlock = (const Xex2CompressedBlockInfo*)p;
s.reset();
s.processBytes(p, blocks->blockSize);
s.finalize(blockCalcedDigest);
if (memcmp(blockCalcedDigest, blocks->blockHash, 0x14) != 0)
return {};
p += 4;
p += 20;
while (true)
{
const size_t chunkSize = (p[0] << 8) | p[1];
p += 2;
if (!chunkSize)
break;
memcpy(d, p, chunkSize);
p += chunkSize;
d += chunkSize;
}
p = pNext;
blocks = nextBlock;
}
int resultCode = 0;
uint32_t uncompressedSize = security->imageSize;
uint8_t* buffer = destData;
resultCode = lzxDecompress(compressBuffer.get(), d - compressBuffer.get(), buffer, uncompressedSize, ((const Xex2FileNormalCompressionInfo*)(fileFormatInfo + 1))->windowSize, nullptr, 0);
if (resultCode)
return {};
}
}
image.data = std::move(result);
@@ -201,8 +264,17 @@ Image Xex2LoadImage(const uint8_t* data, size_t dataSize)
const auto* dosHeader = reinterpret_cast<IMAGE_DOS_HEADER*>(image.data.get());
const auto* ntHeaders = reinterpret_cast<IMAGE_NT_HEADERS32*>(image.data.get() + dosHeader->e_lfanew);
image.base = ntHeaders->OptionalHeader.ImageBase;
image.entry_point = image.base + ntHeaders->OptionalHeader.AddressOfEntryPoint;
image.base = security->loadAddress;
const void* xex2BaseAddressPtr = getOptHeaderPtr(data, XEX_HEADER_IMAGE_BASE_ADDRESS);
if (xex2BaseAddressPtr != nullptr)
{
image.base = *reinterpret_cast<const be<uint32_t>*>(xex2BaseAddressPtr);
}
const void* xex2EntryPointPtr = getOptHeaderPtr(data, XEX_HEADER_ENTRY_POINT);
if (xex2EntryPointPtr != nullptr)
{
image.entry_point = *reinterpret_cast<const be<uint32_t>*>(xex2EntryPointPtr);
}
const auto numSections = ntHeaders->FileHeader.NumberOfSections;
const auto* sections = reinterpret_cast<const IMAGE_SECTION_HEADER*>(ntHeaders + 1);
@@ -227,10 +299,13 @@ Image Xex2LoadImage(const uint8_t* data, size_t dataSize)
std::vector<std::string_view> stringTable;
auto* pStrTable = reinterpret_cast<const char*>(imports + 1);
size_t paddedStringOffset = 0;
for (size_t i = 0; i < imports->numImports; i++)
{
stringTable.emplace_back(pStrTable);
pStrTable += strlen(pStrTable) + 1;
stringTable.emplace_back(pStrTable + paddedStringOffset);
// pad the offset to the next multiple of 4
paddedStringOffset += ((stringTable.back().length() + 1) + 3) & ~3;
}
auto* library = (Xex2ImportLibrary*)(((char*)imports) + sizeof(Xex2ImportHeader) + imports->sizeOfStringTable);

View File

@@ -245,13 +245,17 @@ inline const void* getOptHeaderPtr(const uint8_t* moduleBytes, uint32_t headerKe
const Xex2OptHeader& optHeader = ((const Xex2OptHeader*)(xex2Header + 1))[i];
if (optHeader.key == headerKey)
{
if ((headerKey & 0xFF) == 0)
if((headerKey & 0xFF) == 0)
{
return &optHeader.value;
return reinterpret_cast<const uint32_t *>(&optHeader.value);
}
else if ((headerKey & 0xFF) == 1)
{
return reinterpret_cast<const void *>(&optHeader.value);
}
else
{
return &moduleBytes[optHeader.offset];
return reinterpret_cast<const void *>(reinterpret_cast<uintptr_t>(moduleBytes) + optHeader.offset);
}
}
}

View File

@@ -403,7 +403,63 @@ XexPatcher::Result XexPatcher::apply(const uint8_t* xexBytes, size_t xexBytesSiz
memmove(outDataCursor, srcDataCursor, blocks[i].dataSize);
}
}
else if (fileFormatInfo->compressionType == XEX_COMPRESSION_NORMAL || fileFormatInfo->compressionType == XEX_COMPRESSION_DELTA)
else if (fileFormatInfo->compressionType == XEX_COMPRESSION_NORMAL)
{
const Xex2CompressedBlockInfo* blocks = &((const Xex2FileNormalCompressionInfo*)(fileFormatInfo + 1))->firstBlock;
const uint32_t exeLength = xexBytesSize - xexHeader->headerSize.get();
const uint8_t* exeBuffer = &outBytes[headerTargetSize];
auto compressBuffer = std::make_unique<uint8_t[]>(exeLength);
const uint8_t* p = NULL;
uint8_t* d = NULL;
sha1::SHA1 s;
p = exeBuffer;
d = compressBuffer.get();
uint8_t blockCalcedDigest[0x14];
while (blocks->blockSize)
{
const uint8_t* pNext = p + blocks->blockSize;
const auto* nextBlock = (const Xex2CompressedBlockInfo*)p;
s.reset();
s.processBytes(p, blocks->blockSize);
s.finalize(blockCalcedDigest);
if (memcmp(blockCalcedDigest, blocks->blockHash, 0x14) != 0)
return Result::PatchFailed;
p += 4;
p += 20;
while (true)
{
const size_t chunkSize = (p[0] << 8) | p[1];
p += 2;
if (!chunkSize)
break;
memcpy(d, p, chunkSize);
p += chunkSize;
d += chunkSize;
}
p = pNext;
blocks = nextBlock;
}
int resultCode = 0;
uint32_t uncompressedSize = originalSecurityInfo->imageSize;
uint8_t* buffer = outBytes.data() + newXexHeaderSize;
resultCode = lzxDecompress(compressBuffer.get(), d - compressBuffer.get(), buffer, uncompressedSize, ((const Xex2FileNormalCompressionInfo*)(fileFormatInfo + 1))->windowSize, nullptr, 0);
if (resultCode)
return Result::PatchFailed;
}
else if (fileFormatInfo->compressionType == XEX_COMPRESSION_DELTA)
{
return Result::XexFileUnsupported;
}

View File

@@ -16,6 +16,8 @@
#include <span>
#include <vector>
extern int lzxDecompress(const void* lzxData, size_t lzxLength, void* dst, size_t dstLength, uint32_t windowSize, void* windowData, size_t windowDataLength);
struct XexPatcher
{
enum class Result {

View File

@@ -840,7 +840,7 @@ const struct powerpc_operand powerpc_operands[] =
{ 8, 0, insert_vperm, extract_vperm, 0 },
#define VD3D0 VPERM128 + 1
{ 3, 18, NULL, NULL, 0 },
{ 7, 18, NULL, NULL, 0 },
#define VD3D1 VD3D0 + 1
{ 3, 16, NULL, NULL, 0 },