Initial Commit

Sajid
2024-09-07 18:00:09 +06:00
commit 0f9a53f75a
3352 changed files with 1563708 additions and 0 deletions

@@ -0,0 +1,12 @@
build/
vendor/llvm_root
*/.idea
src/auto-sync/config.json
src/autosync/cpptranslator/Tests/Differ/test_saved_patches.json
src/autosync.egg-info
src/autosync/Tests/MCUpdaterTests/ARCH/Output
src/autosync/Tests/MCUpdaterTests/Disassembler/ARCH/Output
src/autosync/lit_config/test_dir_*
src/autosync/lit_config/.lit_test_times.txt
src/autosync/Tests/MCUpdaterTests/test_output
src/autosync/Tests/MCUpdaterTests/ARCH/Output

@@ -0,0 +1,124 @@
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->
# Architecture of the Auto-Sync framework
This document is split into four parts.
1. An overview of the update process and which subcomponents of `auto-sync` do what.
2. Instructions on how to update an architecture which already supports `auto-sync`.
3. Instructions on how to refactor an architecture to use `auto-sync`.
4. Notes about how to add a new architecture to Capstone with `auto-sync`.
Please read the section about Capstone module design in
[ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) before proceeding.
This architectural understanding is required for everything that follows.
## Update procedure
As already described in the `ARCHITECTURE` document, Capstone uses translated
and generated source code from LLVM.
Because LLVM is written in C++ and Capstone in C, the update process is
internally complicated but almost completely automated.
`auto-sync` categorizes source files of a module into three groups. Each group is updated differently.
| File type | Update method | Edits by hand |
|-----------------------------------|----------------------|------------------------|
| Generated files | Generated by patched LLVM backends | Never/Not allowed |
| Translated LLVM C++ files | `CppTranslator` and `Differ` | Only changes which are too complicated for automation. |
| Capstone files | By hand | all |
Let's look at the update procedure for each group in detail.
**Note**: Git patches are the only permitted way to modify generated files. This is the last resort
if something is broken in LLVM, and we cannot generate correct files.
**Generated files**
Generated files always have the file extension `.inc`.
There are generated files for the LLVM code and for Capstone. They can be distinguished by their names:
- For Capstone: `<ARCH>GenCS<NAME>.inc`.
- For LLVM code: `<ARCH>Gen<NAME>.inc`.
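The naming convention above can be checked mechanically. The following is a minimal sketch, not part of `auto-sync`: the function name `classify_inc` and the regular expression are assumptions made for illustration, and an LLVM file whose `<NAME>` happens to start with `CS` would be misclassified.

```python
import re

# Assumed pattern: <ARCH>Gen<NAME>.inc for LLVM and <ARCH>GenCS<NAME>.inc
# for Capstone. ARCH is matched lazily so the first "Gen" splits the name.
_INC_NAME = re.compile(r"^(?P<arch>[A-Za-z0-9]+?)Gen(?P<cs>CS)?(?P<name>\w+)\.inc$")


def classify_inc(filename: str) -> str:
    """Return 'capstone', 'llvm' or 'unknown' for a generated file name."""
    match = _INC_NAME.match(filename)
    if match is None:
        return "unknown"
    return "capstone" if match.group("cs") else "llvm"


if __name__ == "__main__":
    for example in ("ARMGenAsmWriter.inc", "ARMGenCSInsnEnum.inc", "ARMMapping.c"):
        print(example, "->", classify_inc(example))
```

`ARMGenAsmWriter.inc` is the file touched by the patches in this commit; `ARMGenCSInsnEnum.inc` is only an illustrative name following the documented pattern.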
The files are generated by refactored [LLVM TableGen emitter backends](https://github.com/capstone-engine/llvm-capstone/tree/dev/llvm/utils/TableGen).
The procedure looks roughly like this:
```
┌──────────┐
1 2 3 4 │CS .inc │
┌───────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐ ┌─►│files │
│ .td │ │ │ │ │ │ Code- │ │ └──────────┘
│ files ├────►│ TableGen ├────►│ CodeGen ├────►│ Emitter ├──┤
└───────┘ └──────┬────┘ └───────────┘ └──────────┘ │ ┌──────────┐
│ ▲ └─►│LLVM .inc │
└─────────────────────────────────┘ │files │
└──────────┘
```
1. LLVM architectures are defined in `.td` files. They describe instructions, operands,
features and other properties of an architecture.
2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files
and converts them to an internal representation.
3. In the third step a TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
abstracts these properties even further.
The result is a representation which is _not_ specific to any architecture
(e.g. the `CodeGenInstruction` class can represent a machine instruction of any architecture).
4. The `Code-Emitter` uses the abstract representation of the architecture (provided by `CodeGen`) to
generate state machines for instruction decoding.
Architecture specific information (think of register names, operand properties etc.)
is taken from `TableGen's` internal representation.
The result is emitted to `.inc` files. Those are included in the translated C++ files or Capstone code where necessary.
**Translation of LLVM C++ files**
We use two tools to translate C++ to C files.
First the `CppTranslator` and afterward the `Differ`.
The `CppTranslator` parses the C++ files and patches C++ syntax
with its equivalent C syntax.
_Note_: For details about this, check out `suite/auto-sync/CppTranslator/README.md`.
Because the result of the `CppTranslator` is not perfect,
we still have many syntax problems left.
Those need to be fixed partially by hand.
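To give a feeling for what "patching C++ syntax with its C equivalent" means, here is a deliberately simplified sketch. It is not how the `CppTranslator` is implemented (the install instructions indicate it relies on Tree-sitter for parsing); the function `patch_static_cast` and the regex approach are hypothetical and only handle casts without nested parentheses.

```python
import re

# Hypothetical patch: rewrite `static_cast<T>(expr)` as the C cast `(T)(expr)`.
# A regex is used for brevity; it does not handle nested parentheses in expr
# or template arguments inside T.
_STATIC_CAST = re.compile(r"static_cast<\s*([^<>]+?)\s*>\(([^()]*)\)")


def patch_static_cast(cpp_source: str) -> str:
    """Replace simple static_cast expressions with C-style casts."""
    return _STATIC_CAST.sub(r"(\1)(\2)", cpp_source)


if __name__ == "__main__":
    print(patch_static_cast("unsigned Reg = static_cast<unsigned>(Val);"))
```

Anything such a patch cannot handle is left in the output as C++ syntax, which is exactly the kind of leftover that must be fixed by hand.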
**Differ**
In order to ease this process we run the `Differ` after the `CppTranslator`.
The `Differ` compares the two versions of each C file we now have.
One version is the C file currently used by the architecture module.
The other is the newly translated C file, which is still faulty and needs to be fixed.
Most fixes are syntactical problems. Those were almost always resolved before, during the last update.
The `Differ` helps you to compare the files and lets you select which version to accept.
Sometimes (not very often though), the newly translated C files contain important changes.
Most often though, the old files are already correct.
The `Differ` parses both files into an abstract syntax tree and compares certain nodes with the same name
(mostly functions).
The user can choose if she accepts the version from the translated file or the old file.
This decision is saved for every node.
If there exists a saved decision for two nodes, and the nodes did not change since the last time,
it applies the previous decision automatically again.
The `Differ` is far from perfect. It only helps to automatically apply "known to be good" fixes
and gives the user a better interface to solve the other problems.
But there will still be syntax errors left afterward. These must be fixed by hand.
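The "saved decision" mechanism described above can be sketched as follows. This is an illustration under assumptions, not the `Differ`'s actual code: the names `node_key` and `resolve`, the in-memory `store` dictionary and the hashing scheme are all hypothetical.

```python
import hashlib


def node_key(name: str, old_text: str, new_text: str) -> str:
    """Build a key that changes whenever either version of the node changes."""
    digest = hashlib.sha256()
    for part in (name, old_text, new_text):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def resolve(name: str, old_text: str, new_text: str, store: dict, ask) -> str:
    """Return the accepted node text, asking the user only for unknown pairs.

    `ask` is a callback returning "old" or "new"; its answer is remembered
    in `store` and re-applied while both node versions stay unchanged.
    """
    key = node_key(name, old_text, new_text)
    if key not in store:
        store[key] = ask(name, old_text, new_text)
    return old_text if store[key] == "old" else new_text
```

Because the key covers both versions of a node, a change on either side invalidates the saved decision and the user is asked again.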

@@ -0,0 +1,221 @@
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->
# Architecture updater - Auto-Sync
`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.
Please refer to [intro.md](intro.md) for an introduction about this tool.
## Install
#### Setup Python environment and Tree-sitter
```
cd <root-dir-Capstone>
# Python version must be at least 3.11
sudo apt install python3-venv
# Setup virtual environment in Capstone root dir
python3 -m venv ./.venv
source ./.venv/bin/activate
```
#### Install Auto-Sync framework
```
cd suite/auto-sync/
pip install -e .
```
#### Clone Capstone's LLVM fork and build `llvm-tblgen`
```bash
git clone https://github.com/capstone-engine/llvm-capstone vendor/llvm_root/
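# The repository was cloned into vendor/llvm_root/, so change into that directory
cd vendor/llvm_root
git checkout auto-sync
mkdir build
cd build
# You can also build the "Release" version
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug
# Return to the directory the clone was started from
cd ../../../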
```
#### Install `llvm-mc` and `FileCheck`
Additionally, we need `llvm-mc` and `FileCheck` to generate our regression tests.
You can build them yourself, but this will take a lot of space on your hard drive.
You can also get the binaries [here](https://releases.llvm.org/download.html) or
install them with your package manager (usually something like `llvm-18-dev`).
Just ensure they are in your `PATH` as `llvm-mc` and `FileCheck` (not as `llvm-mc-18` or similar though!).
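A quick way to verify the `PATH` requirement before running the updater is a small check like the one below. It is a sketch that only uses the Python standard library; the helper name `missing_tools` is made up for this example and is not part of `auto-sync`.

```python
import shutil

# Tool names exactly as they must be found in PATH (no version suffix).
REQUIRED_TOOLS = ("llvm-mc", "FileCheck")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the subset of `tools` that cannot be found in PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

If only a versioned binary such as `llvm-mc-18` is installed, the check reports `llvm-mc` as missing; a symlink with the unversioned name in a directory on your `PATH` is one way to resolve that.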
## Architecture
Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.
This step is essential! Please don't skip it.
## Update an architecture
Updating an architecture module to the newest LLVM release is only possible if it uses Auto-Sync.
Not all arch-modules support Auto-Sync yet.
Check if your architecture is supported.
```
./src/autosync/ASUpdater.py -h
```
Run the updater
```
./src/autosync/ASUpdater.py -a <ARCH>
```
## Update procedure
1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
and update them if necessary.
3. Try to build Capstone and fix the build errors.
## Post-processing steps
This update translates some LLVM C++ files to C.
Because the translation is not perfect (maybe it will some day)
you will get build errors if you try to compile Capstone.
The last step to finish the update is to fix those build errors by hand.
## Additional details
### Overview of updated files
This is a rough overview of which files of an architecture are updated and where they come from.
**Files originating from LLVM** (Automatically updated)
These files are LLVM source files which were translated from C++ to C.
Not every architecture uses all of the files listed below,
but these are the most common ones.
- `<ARCH>Disassembler.*`: Bytes to `MCInst` decoder.
- `<ARCH>InstPrinter.*` or `<ARCH>AsmPrinter.*`: `MCInst` to asm string decoder.
- `<ARCH>BaseInfo.*`: Commonly used functions and definitions.
`*.inc` files are exclusively generated by LLVM TableGen backends:
`*.inc` files for the LLVM component are named like this:
- `<ARCH>Gen*.inc` (note: no `CS` in the name)
Additionally, we generate more details for Capstone with `llvm-tblgen`,
such as enums, operand details and other things.
They are also saved to `*.inc` files, but have `CS` in the name to distinguish them from the LLVM generated files.
- `<ARCH>GenCS*.inc`
**Capstone module files** (Not automatically updated)
Those files are written by us:
- `<ARCH>DisassemblerExtension.*`: All kinds of functions which are needed by the LLVM component, but could not be generated or translated.
- `<ARCH>Mapping.*`: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
- `<ARCH>Module.*`: Interface to the Capstone core.
### Relevant documentation and troubleshooting
**LLVM file translation**
For details about the C++ to C translation of the LLVM files refer to `CppTranslator/README.md`.
**Generated .inc files**
Documentation about the `.inc` file generation is in the [llvm-capstone](https://github.com/capstone-engine/llvm-capstone) repository.
**Troubleshooting**
- If some features aren't generated and are missing in the `.inc` files, make sure they are defined as `AssemblerPredicate` in the `.td` files.
Correct:
```
def In32BitMode : Predicate<"!Subtarget->isPPC64()">,
AssemblerPredicate<(all_of (not Feature64Bit)), "64bit">;
```
Incorrect:
```
def In32BitMode : Predicate<"!Subtarget->isPPC64()">;
```
**Formatting**
- If you make changes to the `CppTranslator`, please format the files with `black` and `usort`:
```
pip3 install black usort
python3 -m usort format src/autosync
python3 -m black src/autosync
```
## Refactor an architecture for Auto-Sync framework
Not all architecture modules support Auto-Sync yet.
Here is an overview of the steps to add support for it.
<hr>
To refactor one of them to use `auto-sync`, you need to add it to the configuration.
1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)
Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:
```
./src/autosync/ASUpdater.py -a <ARCH> -s IncGen Translate
```
The task after this is to:
- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below)
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.
**Notes:**
- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):
```
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
```
The update script will insert the content of the `.inc` file at this place.
- If you find yourself fixing the same syntax error multiple times,
please consider adding a `Patch` to the `CppTranslator` for this case.
- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.
- Running the `Differ` after everything is done preserves your version of syntax corrections, and the next user can auto-apply them.
- Sometimes the LLVM code uses a single function from a larger source file.
It is not worth it to translate the whole file just for this function.
Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
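The marker comments mentioned in the notes above (`// generate content <FILENAME.inc> begin` and `... end`) lend themselves to a simple replace-between-markers routine. The sketch below shows one way this could work; it is an assumption about the behavior, not the update script's real implementation, and `insert_generated` is a hypothetical name.

```python
def insert_generated(header: str, inc_name: str, inc_content: str) -> str:
    """Replace everything between the begin/end markers for `inc_name`.

    The marker lines themselves are kept, so the operation can be repeated
    on every update.
    """
    begin = f"// generate content <{inc_name}> begin"
    end = f"// generate content <{inc_name}> end"
    out, skipping, found = [], False, False
    for line in header.splitlines():
        stripped = line.strip()
        if stripped == begin:
            out.append(line)
            out.extend(inc_content.splitlines())
            skipping, found = True, True
        elif stripped == end:
            out.append(line)
            skipping = False
        elif not skipping:
            out.append(line)
    if not found or skipping:
        raise ValueError(f"begin/end markers for {inc_name} missing or unbalanced")
    return "\n".join(out) + "\n"
```

Keeping the markers in place is what makes the step repeatable: a second run with the same `.inc` content leaves the header unchanged.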
## Adding a new architecture
Adding a new architecture follows the same steps as above, with the exception that you need
to implement all the Capstone files from scratch.
Check out an architecture that already supports `auto-sync` for guidance and open an issue if you need help.

@@ -0,0 +1,16 @@
cmake_minimum_required(VERSION 3.15)
set(AUTO_SYNC_C_TEST_SRC_DIR ${AUTO_SYNC_C_TEST_DIR}/src)
set(AUTO_SYNC_C_TEST_INC_DIR ${AUTO_SYNC_C_TEST_DIR}/include)
include_directories(${AUTO_SYNC_C_TEST_INC_DIR} ${PROJECT_SOURCE_DIR}/include)
file(GLOB AUTO_SYNC_C_SRC ${AUTO_SYNC_C_TEST_SRC_DIR}/*.c)
add_executable(compat_header_build_test ${AUTO_SYNC_C_SRC})
add_dependencies(compat_header_build_test capstone)
target_link_libraries(compat_header_build_test PUBLIC capstone)
add_test(NAME ASCompatibilityHeaderTest
COMMAND compat_header_build_test
WORKING_DIRECTORY ${AUTO_SYNC_C_TEST_DIR}
)

@@ -0,0 +1,6 @@
<!--
Copyright © 2024 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->
Compilation tests for the generated source code.

@@ -0,0 +1,67 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3.0-Clause
#include <stdio.h>
#include <inttypes.h>
#include <string.h>
#define CAPSTONE_AARCH64_COMPAT_HEADER
#include <capstone/capstone.h>
int main(void)
{
csh handle;
if (cs_open(CS_ARCH_ARM64, CS_MODE_BIG_ENDIAN, &handle) != CS_ERR_OK) {
fprintf(stderr, "cs_open failed\n");
return -1;
}
cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
cs_insn *insn;
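// Note: the ASCII bytes of this string literal are what gets disassembled.
// The first four ('0', 'x', '1', 'a' = 0x30 0x78 0x31 0x61) decode in
// big-endian mode to the `adr x1, 0xf162d` checked below.
uint8_t bytes[] = "0x1a,0x48,0xa0,0xf8";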
size_t count =
cs_disasm(handle, bytes, sizeof(bytes), 0x1000, 1, &insn);
if (count != 1) {
fprintf(stderr, "Failed to disassemble code.\n");
goto err;
}
printf("0x%" PRIx64 ":\t%s\t\t%s\n", insn[0].address, insn[0].mnemonic,
insn[0].op_str);
printf("A register = %s\n",
cs_reg_name(handle, insn[0].detail->arm64.operands[0].reg));
printf("An imm = 0x%" PRIx64 "\n",
insn[0].detail->arm64.operands[1].imm);
if (insn[0].address != 0x1000) {
fprintf(stderr, "Address wrong.\n");
goto err;
}
if (strcmp(insn[0].mnemonic, "adr") != 0) {
fprintf(stderr, "Mnemonic wrong.\n");
goto err;
}
if (strcmp(insn[0].op_str, "x1, 0xf162d") != 0) {
fprintf(stderr, "op_str wrong.\n");
goto err;
}
if (strcmp(cs_reg_name(handle, insn[0].detail->arm64.operands[0].reg),
"x1") != 0) {
fprintf(stderr, "register wrong.\n");
goto err;
}
if (insn[0].detail->arm64.operands[1].imm != 0xf162d) {
fprintf(stderr, "Immediate wrong.\n");
goto err;
}
cs_free(insn, count);
cs_close(&handle);
return 0;
err:
printf("ERROR: Failed to disassemble given code correctly!\n");
cs_free(insn, count);
cs_close(&handle);
return -1;
}

@@ -0,0 +1,3 @@
#!/usr/bin/bash
python3 -m black src/autosync

@@ -0,0 +1,350 @@
# Set the vector data type for vector instruction.
# Unfortunately we cannot get this information from the td files.
# See https://github.com/capstone-engine/capstone/issues/2152
# for a possible solution.
diff --git a/arch/ARM/ARMGenAsmWriter.inc b/arch/ARM/ARMGenAsmWriter.inc
index 3a4e61abf..635bfefb0 100644
--- a/arch/ARM/ARMGenAsmWriter.inc
+++ b/arch/ARM/ARMGenAsmWriter.inc
@@ -9927,15 +9927,18 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 13:
// FCONSTD, VABSD, VADDD, VCMPD, VCMPED, VCMPEZD, VCMPZD, VDIVD, VFMAD, V...
SStream_concat0(O, ".f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64);
printOperand(MI, 0, O);
break;
case 14:
// FCONSTH, MVE_VABDf16, MVE_VABSf16, MVE_VADD_qr_f16, MVE_VADDf16, MVE_V...
SStream_concat0(O, ".f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16);
break;
case 15:
// FCONSTS, MVE_VABDf32, MVE_VABSf32, MVE_VADD_qr_f32, MVE_VADDf32, MVE_V...
SStream_concat0(O, ".f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32);
break;
case 16:
// FMSTAT
@@ -9976,38 +9979,47 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 21:
// MVE_VABAVs16, MVE_VABDs16, MVE_VABSs16, MVE_VADDVs16acc, MVE_VADDVs16n...
SStream_concat0(O, ".s16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S16);
break;
case 22:
// MVE_VABAVs32, MVE_VABDs32, MVE_VABSs32, MVE_VADDLVs32acc, MVE_VADDLVs3...
SStream_concat0(O, ".s32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S32);
break;
case 23:
// MVE_VABAVs8, MVE_VABDs8, MVE_VABSs8, MVE_VADDVs8acc, MVE_VADDVs8no_acc...
SStream_concat0(O, ".s8\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S8);
break;
case 24:
// MVE_VABAVu16, MVE_VABDu16, MVE_VADDVu16acc, MVE_VADDVu16no_acc, MVE_VC...
SStream_concat0(O, ".u16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U16);
break;
case 25:
// MVE_VABAVu32, MVE_VABDu32, MVE_VADDLVu32acc, MVE_VADDLVu32no_acc, MVE_...
SStream_concat0(O, ".u32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U32);
break;
case 26:
// MVE_VABAVu8, MVE_VABDu8, MVE_VADDVu8acc, MVE_VADDVu8no_acc, MVE_VCMPu8...
SStream_concat0(O, ".u8\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U8);
break;
case 27:
// MVE_VADC, MVE_VADCI, MVE_VADD_qr_i32, MVE_VADDi32, MVE_VBICimmi32, MVE...
SStream_concat0(O, ".i32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_I32);
break;
case 28:
// MVE_VADD_qr_i16, MVE_VADDi16, MVE_VBICimmi16, MVE_VCADDi16, MVE_VCLZs1...
SStream_concat0(O, ".i16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_I16);
break;
case 29:
// MVE_VADD_qr_i8, MVE_VADDi8, MVE_VCADDi8, MVE_VCLZs8, MVE_VCMPi8, MVE_V...
SStream_concat0(O, ".i8\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_I8);
break;
case 30:
// MVE_VCTP64, MVE_VSTRD64_qi, MVE_VSTRD64_qi_pre, MVE_VSTRD64_rq, MVE_VS...
@@ -10016,12 +10028,14 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 31:
// MVE_VCVTf16f32bh, MVE_VCVTf16f32th, VCVTBSH, VCVTTSH, VCVTf2h
SStream_concat0(O, ".f16.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
break;
case 32:
// MVE_VCVTf16s16_fix, MVE_VCVTf16s16n, VCVTs2hd, VCVTs2hq, VCVTxs2hd, VC...
SStream_concat0(O, ".f16.s16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16S16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10029,6 +10043,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 33:
// MVE_VCVTf16u16_fix, MVE_VCVTf16u16n, VCVTu2hd, VCVTu2hq, VCVTxu2hd, VC...
SStream_concat0(O, ".f16.u16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16U16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10036,6 +10051,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 34:
// MVE_VCVTf32f16bh, MVE_VCVTf32f16th, VCVTBHS, VCVTTHS, VCVTh2f
SStream_concat0(O, ".f32.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10044,6 +10060,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 35:
// MVE_VCVTf32s32_fix, MVE_VCVTf32s32n, VCVTs2fd, VCVTs2fq, VCVTxs2fd, VC...
SStream_concat0(O, ".f32.s32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32S32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10051,6 +10068,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 36:
// MVE_VCVTf32u32_fix, MVE_VCVTf32u32n, VCVTu2fd, VCVTu2fq, VCVTxu2fd, VC...
SStream_concat0(O, ".f32.u32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32U32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10058,6 +10076,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 37:
// MVE_VCVTs16f16_fix, MVE_VCVTs16f16a, MVE_VCVTs16f16m, MVE_VCVTs16f16n,...
SStream_concat0(O, ".s16.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S16F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10065,6 +10084,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 38:
// MVE_VCVTs32f32_fix, MVE_VCVTs32f32a, MVE_VCVTs32f32m, MVE_VCVTs32f32n,...
SStream_concat0(O, ".s32.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S32F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10072,6 +10092,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 39:
// MVE_VCVTu16f16_fix, MVE_VCVTu16f16a, MVE_VCVTu16f16m, MVE_VCVTu16f16n,...
SStream_concat0(O, ".u16.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U16F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10079,6 +10100,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 40:
// MVE_VCVTu32f32_fix, MVE_VCVTu32f32a, MVE_VCVTu32f32m, MVE_VCVTu32f32n,...
SStream_concat0(O, ".u32.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U32F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10097,16 +10119,19 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 43:
// MVE_VLDRDU64_qi, MVE_VLDRDU64_qi_pre, MVE_VLDRDU64_rq, MVE_VLDRDU64_rq...
SStream_concat0(O, ".u64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U64);
break;
case 44:
// MVE_VMOVimmi64, VADDHNv2i32, VADDv1i64, VADDv2i64, VMOVNv2i32, VMOVv1i...
SStream_concat0(O, ".i64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_I64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
break;
case 45:
// MVE_VMULLBp16, MVE_VMULLTp16
SStream_concat0(O, ".p16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_P16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10117,6 +10142,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 46:
// MVE_VMULLBp8, MVE_VMULLTp8, VMULLp8, VMULpd, VMULpq
SStream_concat0(O, ".p8\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_P8);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10137,6 +10163,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 49:
// VCVTBDH, VCVTTDH
SStream_concat0(O, ".f16.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 2, O);
@@ -10145,6 +10172,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 50:
// VCVTBHD, VCVTTHD
SStream_concat0(O, ".f64.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10153,6 +10181,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 51:
// VCVTDS
SStream_concat0(O, ".f64.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10161,6 +10190,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 52:
// VCVTSD
SStream_concat0(O, ".f32.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10169,6 +10199,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 53:
// VJCVT, VTOSIRD, VTOSIZD, VTOSLD
SStream_concat0(O, ".s32.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S32F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10236,12 +10267,14 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 67:
// VQADDsv1i64, VQADDsv2i64, VQMOVNsuv2i32, VQMOVNsv2i32, VQRSHLsv1i64, V...
SStream_concat0(O, ".s64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
break;
case 68:
// VSHTOD
SStream_concat0(O, ".f64.s16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64S16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10252,6 +10285,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 69:
// VSHTOS
SStream_concat0(O, ".f32.s16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32S16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10262,6 +10296,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 70:
// VSITOD, VSLTOD
SStream_concat0(O, ".f64.s32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64S32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10269,6 +10304,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 71:
// VSITOH, VSLTOH
SStream_concat0(O, ".f16.s32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16S32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10276,6 +10312,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 72:
// VTOSHD
SStream_concat0(O, ".s16.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S16F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10286,6 +10323,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 73:
// VTOSHS
SStream_concat0(O, ".s16.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S16F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10296,6 +10334,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 74:
// VTOSIRH, VTOSIZH, VTOSLH
SStream_concat0(O, ".s32.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_S32F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10303,6 +10342,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 75:
// VTOUHD
SStream_concat0(O, ".u16.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U16F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10313,6 +10353,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 76:
// VTOUHS
SStream_concat0(O, ".u16.f32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U16F32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10323,6 +10364,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 77:
// VTOUIRD, VTOUIZD, VTOULD
SStream_concat0(O, ".u32.f64\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U32F64);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10330,6 +10372,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 78:
// VTOUIRH, VTOUIZH, VTOULH
SStream_concat0(O, ".u32.f16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_U32F16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10337,6 +10380,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 79:
// VUHTOD
SStream_concat0(O, ".f64.u16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64U16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10347,6 +10391,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 80:
// VUHTOS
SStream_concat0(O, ".f32.u16\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F32U16);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10357,6 +10402,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 81:
// VUITOD, VULTOD
SStream_concat0(O, ".f64.u32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F64U32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);
@@ -10364,6 +10410,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 82:
// VUITOH, VULTOH
SStream_concat0(O, ".f16.u32\t");
+ ARM_add_vector_data(MI, ARM_VECTORDATA_F16U32);
printOperand(MI, 0, O);
SStream_concat0(O, ", ");
printOperand(MI, 1, O);

@@ -0,0 +1,48 @@
diff --git a/arch/ARM/ARMGenAsmWriter.inc b/arch/ARM/ARMGenAsmWriter.inc
index 635bfefb0..35f2fe3c8 100644
--- a/arch/ARM/ARMGenAsmWriter.inc
+++ b/arch/ARM/ARMGenAsmWriter.inc
@@ -9870,14 +9870,17 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 1:
// VLD1LNdAsm_16, VLD1LNdWB_fixed_Asm_16, VLD1LNdWB_register_Asm_16, VLD2...
SStream_concat0(O, ".16\t");
+ ARM_add_vector_size(MI, 16);
break;
case 2:
// VLD1LNdAsm_32, VLD1LNdWB_fixed_Asm_32, VLD1LNdWB_register_Asm_32, VLD2...
SStream_concat0(O, ".32\t");
+ ARM_add_vector_size(MI, 32);
break;
case 3:
// VLD1LNdAsm_8, VLD1LNdWB_fixed_Asm_8, VLD1LNdWB_register_Asm_8, VLD2LNd...
SStream_concat0(O, ".8\t");
+ ARM_add_vector_size(MI, 8);
break;
case 4:
// t2LDR_POST_imm, t2LDR_PRE_imm, t2STR_POST_imm, t2STR_PRE_imm
@@ -10024,6 +10027,7 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 30:
// MVE_VCTP64, MVE_VSTRD64_qi, MVE_VSTRD64_qi_pre, MVE_VSTRD64_rq, MVE_VS...
SStream_concat0(O, ".64\t");
+ ARM_add_vector_size(MI, 64);
break;
case 31:
// MVE_VCVTf16f32bh, MVE_VCVTf16f32th, VCVTBSH, VCVTTSH, VCVTf2h
@@ -10207,14 +10211,17 @@ void printInstruction(MCInst *MI, uint64_t Address, SStream *O)
case 54:
// VLD1LNd16, VLD1LNd16_UPD, VLD2LNd16, VLD2LNd16_UPD, VLD2LNq16, VLD2LNq...
SStream_concat0(O, ".16\t{");
+ ARM_add_vector_size(MI, 16);
break;
case 55:
// VLD1LNd32, VLD1LNd32_UPD, VLD2LNd32, VLD2LNd32_UPD, VLD2LNq32, VLD2LNq...
SStream_concat0(O, ".32\t{");
+ ARM_add_vector_size(MI, 32);
break;
case 56:
// VLD1LNd8, VLD1LNd8_UPD, VLD2LNd8, VLD2LNd8_UPD, VLD3DUPd8, VLD3DUPd8_U...
SStream_concat0(O, ".8\t{");
+ ARM_add_vector_size(MI, 8);
break;
case 57:
// VLDR_FPCXTNS_off, VLDR_FPCXTNS_post, VLDR_FPCXTNS_pre, VMSR_FPCXTNS, V...

@@ -0,0 +1,96 @@
## Why the Auto-Sync framework?
Capstone provides a simple API to leverage the LLVM disassemblers, without
having the big footprint of LLVM itself.
It does this by using a stripped down copy of LLVM disassemblers (one for each architecture)
and provides a uniform API to them.
The actual disassembly task (bytes to asm-text and decoded operands) is completely done by
the LLVM code.
Capstone takes the disassembled instructions, adds details to them (operand read/write info etc.)
and organizes them to a uniform structure (`cs_insn`, `cs_detail` etc.).
These objects are then accessible from the API.
Capstone is in C and LLVM is in C++. So to use the disassembler modules of LLVM,
Capstone effectively translates LLVM source files from C++ to C, without changing the semantics.
One could also call it a "disassembler port".
Capstone supports multiple architectures. So whenever LLVM
has a new release and adds more instructions, Capstone needs to update its modules as well.
In the past, the update procedure was done by hand and with some Python scripts.
But the task was tedious and error-prone.
Auto-Sync was created to ease this complicated update procedure.
<hr>
## How LLVM disassemblers work
To effectively use the LLVM disassembler logic, one must understand how it operates.
Each architecture is defined in a so-called `.td` file, that is, a "Target Description" file.
Those files are a declarative description of an architecture.
They are written in a Domain-Specific Language called [TableGen](https://llvm.org/docs/TableGen/).
They contain instructions, registers, processor features, which operands an instruction reads and writes, and more information.
These files are consumed by "TableGen Backends". They parse and process them to generate C++ code.
The generated code is for example: enums, decoding algorithms (for instructions and operands) or
lookup tables for register names or aliases.
Additionally, LLVM has handwritten files. They use the generated code to build the actual instruction classes
and handle architecture specific edge cases.
Capstone uses both of those files. The generated ones as well as the handwritten ones.
## Overview of updating steps
An Auto-Sync update has multiple steps:
**(1)** Changes in the auto-generated C++ files are handled completely automatically.
We have an LLVM fork with patched TableGen backends, so they emit C code.
**(2)** Changes in LLVM's handwritten sources are handled semi-automatically.
For each source file, we search C++ syntax and replace it with the equivalent C syntax.
For this task we have the CppTranslator.
The end result is of course not perfectly valid C code.
It is merely an intermediate file, which still has some C++ syntax in it.
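
A much simplified sketch of this search-and-replace idea follows. It is an assumption-laden illustration: it uses plain regular expressions on text, and the rules in `PATCHES` are made up for this example. It does not show how the CppTranslator itself locates or rewrites syntax.

```python
import re

# Each entry is (pattern, replacement). These rules are illustrative only.
PATCHES = [
    # static_cast<T>(x) -> (T)(x)
    (re.compile(r"static_cast<([^<>]+)>\("), r"(\1)("),
    # nullptr -> NULL
    (re.compile(r"\bnullptr\b"), "NULL"),
    # MCInst &MI -> MCInst *MI (reference parameters become pointers)
    (re.compile(r"\b(MCInst)\s*&\s*(\w+)"), r"\1 *\2"),
    # MI.getOpcode() -> MCInst_getOpcode(MI)
    (re.compile(r"\b(\w+)\.getOpcode\(\)"), r"MCInst_getOpcode(\1)"),
]


def translate(cpp_src: str) -> str:
    """Applies every patch in order and returns the intermediate result."""
    for pattern, replacement in PATCHES:
        cpp_src = pattern.sub(replacement, cpp_src)
    return cpp_src


CPP_SRC = """\
static unsigned getReg(const MCInst &MI, const void *Decoder) {
  if (Decoder == nullptr)
    return 0;
  auto Opc = static_cast<unsigned>(MI.getOpcode());
  return Opc;
}
"""

TRANSLATED = translate(CPP_SRC)
print(TRANSLATED)
```

Note that `auto` survives in the output because no rule covers it. This is exactly the kind of leftover C++ syntax that makes the result an intermediate file.
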
Because this leftover syntax was likely already fixed in the equivalent C file currently in Capstone,
we have a last step.
The translated file is diffed with the corresponding old file in Capstone.
The `Differ` tool parses both files into an abstract syntax tree.
From this AST it picks nodes with the same name and diffs them.
The diff is given to the user, and they can decide which one to accept.
All choices are also recorded and automatically applied next time.
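
The diff-and-remember workflow can be sketched as below. Everything here is a hedged simplification: a regular expression stands in for the AST, `named_nodes`, `choice_key` and `merge` are hypothetical helpers, and the user is replaced by a scripted callback. The actual `Differ` may store and match its choices differently.

```python
import difflib
import hashlib
import re
from collections.abc import Callable

# Crude stand-in for AST nodes: top-level functions whose closing brace is in column 0.
FUNC_RE = re.compile(r"^\w[^\n;{}]*?\b(\w+)\s*\([^)]*\)\s*\{.*?^\}", re.M | re.S)


def named_nodes(src: str) -> dict[str, str]:
    """Maps each function name to the full text of that function."""
    return {m.group(1): m.group(0) for m in FUNC_RE.finditer(src)}


def choice_key(old: str, new: str) -> str:
    """Identifies one old/new pair, so a saved decision can be replayed."""
    return hashlib.sha256((old + "\0" + new).encode()).hexdigest()


def merge(
    old_src: str,
    new_src: str,
    saved: dict[str, str],
    ask: Callable[[str, str], str],
) -> dict[str, str]:
    """Returns the chosen text for every node name found in the new file."""
    old_nodes, new_nodes = named_nodes(old_src), named_nodes(new_src)
    result = {}
    for name, new_text in new_nodes.items():
        old_text = old_nodes.get(name)
        if old_text is None or old_text == new_text:
            result[name] = new_text  # New or unchanged node: nothing to decide.
            continue
        key = choice_key(old_text, new_text)
        if key not in saved:
            diff = "\n".join(
                difflib.unified_diff(
                    old_text.splitlines(), new_text.splitlines(), "old", "new", lineterm=""
                )
            )
            saved[key] = ask(name, diff)  # The callback answers "old" or "new".
        result[name] = old_text if saved[key] == "old" else new_text
    return result


OLD = """\
static int getOpc(MCInst *MI)
{
	return MCInst_getOpcode(MI);
}
"""
NEW = """\
static int getOpc(MCInst *MI)
{
	return MI.getOpcode();
}

static int isNop(MCInst *MI)
{
	return 0;
}
"""

CALLS = []


def ask_user(name: str, diff: str) -> str:
    """Scripted stand-in for the interactive prompt: always keeps the old node."""
    CALLS.append(name)
    return "old"


SAVED: dict[str, str] = {}
FIRST = merge(OLD, NEW, SAVED, ask_user)
SECOND = merge(OLD, NEW, SAVED, ask_user)  # Replays the saved choice without asking.
```

Keying the saved decision on the content of both nodes means a decision is only replayed when exactly the same conflict shows up again.
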
**Example**
> Suppose there is a file `ArchDisassembler.cpp` in LLVM.
> Capstone has the C equivalent `ArchDisassembler.c`.
>
> Now LLVM has a new release, and there were several additions in `ArchDisassembler.cpp`.
>
> Auto-Sync will pass `ArchDisassembler.cpp` to the CppTranslator, which replaces most C++ syntax.
> The result is an intermediate file `transl_ArchDisassembler.cpp`.
>
> The result is close to what we want (C code), but still contains invalid syntax.
> Most of these syntax errors were fixed before. They must have been, because the C file `ArchDisassembler.c`
> is working fine.
>
> So the intermediate file `transl_ArchDisassembler.cpp` is compared to the old `ArchDisassembler.c`.
> The Differ parses both files into an AST and automatically patches all nodes it can.
>
> This effectively automates most of the boring, mechanical work involved in fixing up `transl_ArchDisassembler.cpp`.
> If something new comes up, it asks the user for a decision.
>
> The result is saved to `ArchDisassembler.c`, which is now up-to-date with the newest LLVM release.
>
> In practice this file will still contain syntax errors. But not many, so they can easily be resolved.
**(3)** After (1) and (2), some changes in Capstone-only files follow.
This step is manual work.

View File

@@ -0,0 +1,24 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# Copyright © 2024 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
[project]
name = "autosync"
version = "0.1.0"
dependencies = [
"termcolor >= 2.3.0",
"tree_sitter == 0.22.3",
"tree-sitter-cpp >=0.22.0",
"black >= 24.3.0",
"usort >= 1.0.8",
"setuptools >= 69.2.0",
"ninja >= 1.11.1.1",
"reuse >= 3.0.1",
"clang-format >= 18.1.1",
"lit >= 18.1.8",
]
requires-python = ">= 3.11"
[tool.setuptools]
packages = ["autosync", "autosync.cpptranslator", "autosync.cpptranslator.patches"]
package-dir = {"" = "src"}

View File

@@ -0,0 +1,317 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import argparse
import logging as log
import os
import shutil
import subprocess
import sys
from enum import StrEnum
from pathlib import Path
from autosync.cpptranslator.Configurator import Configurator
from autosync.cpptranslator.CppTranslator import Translator
from autosync.HeaderPatcher import CompatHeaderBuilder, HeaderPatcher
from autosync.Helper import check_py_version, convert_loglevel, fail_exit, get_path
from autosync.IncGenerator import IncGenerator
from autosync.MCUpdater import MCUpdater
class USteps(StrEnum):
INC_GEN = "IncGen"
TRANS = "Translate"
DIFF = "Diff"
MC = "MCUpdate"
PATCH_HEADER = "PatchArchHeader"
ALL = "All"
class ASUpdater:
"""
The auto-sync updater.
"""
def __init__(
self,
arch: str,
write: bool,
steps: list[USteps],
inc_list: list,
no_clean: bool,
refactor: bool,
differ_no_auto_apply: bool,
wait_for_user: bool = True,
) -> None:
self.arch = arch
self.write = write
self.no_clean_build = no_clean
self.inc_list = inc_list
self.wait_for_user = wait_for_user
if USteps.ALL in steps:
self.steps = [
USteps.INC_GEN,
USteps.TRANS,
USteps.DIFF,
USteps.MC,
USteps.PATCH_HEADER,
]
else:
self.steps = steps
self.refactor = refactor
self.differ_no_auto_apply = differ_no_auto_apply
self.arch_dir = get_path("{CS_ARCH_MODULE_DIR}").joinpath(self.arch)
if not self.no_clean_build:
self.clean_build_dir()
self.inc_generator = IncGenerator(
self.arch,
self.inc_list,
)
self.mc_updater = MCUpdater(
self.arch,
get_path("{LLVM_MC_TEST_DIR}"),
None,
None,
self.arch == "ARM",
)
def clean_build_dir(self) -> None:
log.info("Clean build directory")
path: Path
for path in get_path("{BUILD_DIR}").iterdir():
log.debug(f"Delete {path}")
if path.is_dir():
shutil.rmtree(path)
else:
os.remove(path)
def patch_main_header(self) -> list:
"""
Patches the main header of the arch with the .inc files.
It returns a list of files it has patched into the main header.
"""
if not self.write:
return []
main_header = get_path("{CS_INCLUDE_DIR}").joinpath(f"{self.arch.lower()}.h")
# Just try every inc file
patched = []
for file in get_path("{C_INC_OUT_DIR}").iterdir():
patcher = HeaderPatcher(main_header, file)
if patcher.patch_header():
# Save the path. This file should not be moved.
patched.append(file)
if self.arch == "AArch64":
# Update the compatibility header
builder = CompatHeaderBuilder(
aarch64_h=main_header,
arm64_h=get_path("{CS_INCLUDE_DIR}").joinpath("arm64.h"),
)
builder.generate_aarch64_compat_header()
return patched
def copy_files(self, path: Path, dest: Path) -> None:
"""
Copies files from path to dest.
If path is a directory it copies all files in it.
If it is a file, it only copies it.
"""
if not self.write:
return
if not dest.is_dir():
fail_exit(f"{dest} is not a directory.")
if path.is_file():
log.debug(f"Copy {path} to {dest}")
shutil.copy(path, dest)
return
for file in path.iterdir():
log.debug(f"Copy {file} to {dest}")
shutil.copy(file, dest)
def check_tree_sitter(self) -> None:
ts_dir = get_path("{VENDOR_DIR}").joinpath("tree-sitter-cpp")
if not ts_dir.exists():
log.info("tree-sitter was not fetched. Cloning it now...")
subprocess.run(
["git", "submodule", "update", "--init", "--recursive"], check=True
)
def translate(self) -> None:
self.check_tree_sitter()
translator_config = get_path("{CPP_TRANSLATOR_CONFIG}")
configurator = Configurator(self.arch, translator_config)
translator = Translator(configurator, self.wait_for_user)
translator.translate()
translator.remark_manual_files()
def diff(self) -> None:
translator_config = get_path("{CPP_TRANSLATOR_CONFIG}")
configurator = Configurator(self.arch, translator_config)
from autosync.cpptranslator.Differ import Differ
differ = Differ(configurator, self.differ_no_auto_apply)
differ.diff()
def update(self) -> None:
if USteps.INC_GEN in self.steps:
self.inc_generator.generate()
if USteps.PATCH_HEADER in self.steps:
if self.write:
patched = self.patch_main_header()
log.info(f"Patched {len(patched)} .inc files into the main header.")
else:
log.info("Patching the main header requires the -w flag.")
if USteps.TRANS in self.steps:
self.translate()
if USteps.DIFF in self.steps:
self.diff()
if USteps.MC in self.steps:
self.mc_updater.gen_all()
if self.write:
# Copy .inc files
log.info(f"Copy .inc files to {self.arch_dir}")
i = 0
arch_header = get_path("{CS_INCLUDE_DIR}").joinpath(
f"{self.arch.lower()}.h"
)
for file in get_path("{C_INC_OUT_DIR}").iterdir():
if HeaderPatcher.file_in_main_header(arch_header, file.name):
continue
self.copy_files(file, self.arch_dir)
i += 1
log.info(f"Copied {i} files")
i = 0
# Diffed files
log.info(f"Copy diffed files to {self.arch_dir}")
for file in get_path("{CPP_TRANSLATOR_DIFF_OUT_DIR}").iterdir():
self.copy_files(file, self.arch_dir)
i += 1
log.info(f"Copied {i} files")
# MC tests
i = 0
mc_dir = get_path("{MC_DIR}").joinpath(self.arch)
log.info(f"Copy MC test files to {mc_dir}")
for file in get_path("{MCUPDATER_OUT_DIR}").iterdir():
self.copy_files(file, mc_dir)
i += 1
log.info(f"Copied {i} files")
exit(0)
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="Auto-Sync-Updater",
description="Capstone's architecture module updater.",
)
parser.add_argument(
"-a",
dest="arch",
help="Name of target architecture.",
choices=["ARM", "PPC", "AArch64", "Alpha", "LoongArch"],
required=True,
)
parser.add_argument(
"-d",
dest="no_clean",
help="Don't clean build dir before updating.",
action="store_true",
)
parser.add_argument(
"-w",
dest="write",
help="Write generated/translated files to arch/<ARCH>/",
action="store_true",
)
parser.add_argument(
"-v",
dest="verbosity",
help="Verbosity of the log messages.",
choices=["debug", "info", "warning", "fatal"],
default="info",
)
parser.add_argument(
"-e",
dest="no_auto_apply",
help="Differ: Do not apply saved diff resolutions. Ask for every diff again.",
action="store_true",
)
parser.add_argument(
"-s",
dest="steps",
help="List of update steps to perform. If omitted, it performs all update steps.",
choices=[
"All",
"IncGen",
"Translate",
"Diff",
"MCUpdate",
"PatchArchHeader",
],
nargs="+",
default=["All"],
)
parser.add_argument(
"--inc-list",
dest="inc_list",
help="Only generate the following inc files.",
choices=[
"All",
"Disassembler",
"AsmWriter",
"RegisterInfo",
"InstrInfo",
"SubtargetInfo",
"Mapping",
"SystemOperand",
],
nargs="+",
type=str,
default=["All"],
)
parser.add_argument(
"--refactor",
dest="refactor",
help="Changes the update behavior to ease refactoring and new implementations.",
action="store_true",
)
parser.add_argument(
"--ci",
dest="wait_for_user",
help="The translator will not wait for user input when printing important logs.",
action="store_false",
)
arguments = parser.parse_args()
return arguments
if __name__ == "__main__":
check_py_version()
args = parse_args()
log.basicConfig(
level=convert_loglevel(args.verbosity),
stream=sys.stdout,
format="%(levelname)-5s - %(message)s",
force=True,
)
Updater = ASUpdater(
args.arch,
args.write,
args.steps,
args.inc_list,
args.no_clean,
args.refactor,
args.no_auto_apply,
args.wait_for_user,
)
Updater.update()

View File

@@ -0,0 +1,268 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import argparse
import logging as log
import re
from pathlib import Path
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="PatchHeaders",
description="Patches generated enums into the main arch header file.",
)
parser.add_argument("--header", dest="header", help="Path header file.", type=Path)
parser.add_argument("--inc", dest="inc", help="Path inc file.", type=Path)
parser.add_argument(
"--aarch64", dest="aarch64", help="aarch64.h header file location", type=Path
)
parser.add_argument(
"--arm64", dest="arm64", help="arm64.h header file location", type=Path
)
parser.add_argument(
"-c", dest="compat", help="Generate compatibility header", action="store_true"
)
parser.add_argument(
"-p", dest="patch", help="Patch inc file into header", action="store_true"
)
arguments = parser.parse_args()
return arguments
def error_exit(msg: str) -> None:
log.fatal(f"{msg}")
exit(1)
class HeaderPatcher:
def __init__(self, header: Path, inc: Path, write_file: bool = True) -> None:
self.header = header
self.inc = inc
self.inc_content: str = ""
self.write_file = write_file
# Gets set to the patched file content if writing to the file is disabled.
self.patched_header_content: str = ""
def patch_header(self) -> bool:
if not self.header.is_file():
error_exit(f"self.header file {self.header.name} does not exist.")
if not self.inc.is_file():
error_exit(f"self.inc file {self.inc.name} does not exist.")
with open(self.header) as f:
header_content = f.read()
if self.inc.name not in header_content:
log.debug(f"{self.inc.name} has no include comments in {self.header.name}")
return False
with open(self.inc) as f:
self.inc_content = f.read()
to_write: dict[str, str] = {}
enum_vals_id = ""
for line in self.inc_content.splitlines():
# No comments and empty lines
if "/*" == line[:2] or not line:
continue
if "#ifdef" in line:
enum_vals_id = line[7:].strip("\n")
to_write[enum_vals_id] = ""
elif "#endif" in line and not enum_vals_id == "NOTGIVEN":
enum_vals_id = ""
elif "#undef" in line:
continue
else:
line = re.sub(r"^(\s+)?", "\t", line)
if not enum_vals_id:
enum_vals_id = "NOTGIVEN"
to_write[enum_vals_id] = line + "\n"
continue
to_write[enum_vals_id] += line + "\n"
for ev_id in to_write.keys():
header_enum_id = f":{ev_id}" if ev_id != "NOTGIVEN" else ""
regex = (
rf"\s*// generated content <{self.inc.name}{header_enum_id}> begin.*(\n)"
rf"(.*\n)*"
rf"\s*// generated content <{self.inc.name}{header_enum_id}> end.*(\n)"
)
if not re.search(regex, header_content):
error_exit(f"Could not locate include comments for {self.inc.name}")
new_content = (
f"\n\t// generated content <{self.inc.name}{header_enum_id}> begin\n"
+ "\t// clang-format off\n\n"
+ to_write[ev_id]
+ "\n\t// clang-format on\n"
+ f"\t// generated content <{self.inc.name}{header_enum_id}> end\n"
)
header_content = re.sub(regex, new_content, header_content)
if self.write_file:
with open(self.header, "w") as f:
f.write(header_content)
else:
self.patched_header_content = header_content
log.info(f"Patched {self.inc.name} into {self.header.name}")
return True
@staticmethod
def file_in_main_header(header: Path, filename: str) -> bool:
with open(header) as f:
header_content = f.read()
return filename in header_content
class CompatHeaderBuilder:
def __init__(self, aarch64_h: Path, arm64_h: Path):
self.aarch64_h = aarch64_h
self.arm64_h = arm64_h
def replace_typedef_struct(self, aarch64_lines: list[str]) -> list[str]:
output = list()
typedef = ""
for line in aarch64_lines:
if typedef:
if not re.search(r"^}\s[\w_]+;", line):
# Skip struct content
continue
type_name = re.findall(r"[\w_]+", line)[0]
output.append(
f"typedef {type_name} {re.sub('aarch64','arm64', type_name)};\n"
)
typedef = ""
continue
if re.search(r"^typedef\s+(struct|union)", line):
typedef = line
continue
output.append(line)
return output
def replace_typedef_enum(self, aarch64_lines: list[str]) -> list[str]:
output = list()
typedef = ""
for line in aarch64_lines:
if typedef:
if not re.search(r"^}\s[\w_]+;", line):
# Replace name
if "AArch64" not in line and "AARCH64" not in line:
output.append(line)
continue
found = re.findall(r"(AArch64|AARCH64)([\w_]+)", line)
entry_name: str = "".join(found[0])
arm64_name = entry_name.replace("AArch64", "ARM64").replace(
"AARCH64", "ARM64"
)
patched_line = re.sub(
r"(AArch64|AARCH64).+", f"{arm64_name} = {entry_name},", line
)
output.append(patched_line)
continue
# We still have LLVM and CS naming conventions mixed
p = re.sub(r"aarch64", "arm64", line)
p = re.sub(r"(AArch64|AARCH64)", "ARM64", p)
output.append(p)
typedef = ""
continue
if re.search(r"^typedef\s+enum", line):
typedef = line
output.append("typedef enum {\n")
continue
output.append(line)
return output
def remove_comments(self, aarch64_lines: list[str]) -> list[str]:
output = list()
for line in aarch64_lines:
if re.search(r"^\s*//", line) and "// SPDX" not in line:
continue
output.append(line)
return output
def replace_aarch64(self, aarch64_lines: list[str]) -> list[str]:
output = list()
in_typedef = False
for line in aarch64_lines:
if in_typedef:
if re.search(r"^}\s[\w_]+;", line):
in_typedef = False
output.append(line)
continue
if re.search(r"^typedef", line):
in_typedef = True
output.append(line)
continue
output.append(re.sub(r"(AArch64|AARCH64)", "ARM64", line))
return output
def replace_include_guards(self, aarch64_lines: list[str]) -> list[str]:
output = list()
for line in aarch64_lines:
if not re.search(r"^#(ifndef|define)", line):
output.append(line)
continue
output.append(re.sub(r"AARCH64", "ARM64", line))
return output
def inject_aarch64_header(self, aarch64_lines: list[str]) -> list[str]:
output = list()
header_inserted = False
for line in aarch64_lines:
if re.search(r"^#include", line):
if not header_inserted:
output.append("#include <capstone/aarch64.h>\n")
header_inserted = True
output.append(line)
return output
def generate_aarch64_compat_header(self) -> None:
"""
Translates the aarch64.h header into the arm64.h header and renames all aarch64 occurrences.
It does simple regex matching and replacing.
"""
log.info("Generate compatibility header")
with open(self.aarch64_h) as f:
aarch64 = f.readlines()
patched = self.replace_typedef_struct(aarch64)
patched = self.replace_typedef_enum(patched)
patched = self.remove_comments(patched)
patched = self.replace_aarch64(patched)
patched = self.replace_include_guards(patched)
patched = self.inject_aarch64_header(patched)
with open(self.arm64_h, "w+") as f:
f.writelines(patched)
if __name__ == "__main__":
args = parse_args()
if (not args.patch and not args.compat) or (args.patch and args.compat):
print("You need to specify either -c or -p")
exit(1)
if args.compat and not (args.aarch64 and args.arm64):
print(
"Generating the arm64 compatibility header requires --arm64 and --aarch64"
)
exit(1)
if args.patch and not (args.inc and args.header):
print("Patching headers requires --inc and --header")
exit(1)
if args.patch:
patcher = HeaderPatcher(args.header, args.inc)
patcher.patch_header()
exit(0)
builder = CompatHeaderBuilder(args.aarch64, args.arm64)
builder.generate_aarch64_compat_header()

View File

@@ -0,0 +1,176 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import hashlib
import logging as log
import shutil
import subprocess
import sys
from pathlib import Path
import termcolor
from tree_sitter import Node
from autosync.PathVarHandler import PathVarHandler
def convert_loglevel(level: str) -> int:
if level == "debug":
return log.DEBUG
elif level == "info":
return log.INFO
elif level == "warning":
return log.WARNING
elif level == "error":
return log.ERROR
elif level == "fatal":
return log.FATAL
elif level == "critical":
return log.CRITICAL
raise ValueError(f'Unknown loglevel "{level}"')
def find_id_by_type(node: Node, node_types: list[str], type_must_match: bool) -> bytes:
"""
Recursively searches for a node sequence with given node types.
A valid sequence is a path from node_n to node_{(n + |node_types|-1)} where
forall i in {0, ..., |node_types|-1}: type(node_{(n + i)}) = node_types_i.
If a node sequence is found, this function returns the text associated with the
last node in the sequence.
:param node: Current node.
:param node_types: List of node types.
:param type_must_match: If true, it is mandatory for the current node that its type matches node_types[0]
:return: The text of the last node in a valid sequence, or an empty byte string if no such sequence exists.
"""
if len(node_types) == 0:
# No ids left to compare to: Nothing found
return b""
# Set true if:
# current node type matches.
# OR
# parent dictates that node type match
type_must_match = node.type == node_types[0] or type_must_match
if type_must_match and node.type != node_types[0]:
# This child has no matching type. Return.
return b""
if len(node_types) == 1 and type_must_match:
if node.type == node_types[0]:
# Found it
return node.text
else:
# Not found. Return to parent
return b""
# If this nodes type matches the first in the list
# we remove this one from the list.
# Otherwise, give the whole list to the child (since our type does not matter).
children_id_types = node_types[1:] if type_must_match else node_types
# Check if any child has a matching type.
for child in node.named_children:
res = find_id_by_type(child, children_id_types, type_must_match)
if res:
# A path from this node matches the id_types!
return res
# None of our children matched the type list.
return b""
def print_prominent_warning(msg: str, wait_for_user: bool = True) -> None:
print("\n" + separator_line_1("yellow"))
print(termcolor.colored("WARNING", "yellow", attrs=["bold"]) + "\n")
print(msg)
print(separator_line_1("yellow"))
if wait_for_user:
input("Press enter to continue...\n")
def term_width() -> int:
return shutil.get_terminal_size()[0]
def print_prominent_info(msg: str, wait_for_user: bool = True) -> None:
print("\n" + separator_line_1("blue"))
print(msg)
print(separator_line_1("blue"))
if wait_for_user:
input("Press enter to continue...\n")
def bold(msg: str, color: str = None) -> str:
if color:
return termcolor.colored(msg, attrs=["bold"], color=color)
return termcolor.colored(msg, attrs=["bold"])
def colored(msg: str, color: str) -> str:
return termcolor.colored(msg, color=color)
def separator_line_1(color: str = None) -> str:
return f"{bold(f'⎼' * int(term_width() / 2), color)}\n"
def separator_line_2(color: str = None) -> str:
return f"{bold(f'═' * int(term_width() / 2), color)}\n"
def get_sha256(data: bytes) -> str:
h = hashlib.sha256()
h.update(data)
return h.hexdigest()
def get_header() -> str:
return (
"/* Capstone Disassembly Engine, http://www.capstone-engine.org */\n"
"/* By Nguyen Anh Quynh <aquynh@gmail.com>, 2013-2022, */\n"
"/* Rot127 <unisono@quyllur.org> 2022-2023 */\n"
"/* Automatically translated source file from LLVM. */\n\n"
"/* LLVM-commit: <commit> */\n"
"/* LLVM-tag: <tag> */\n\n"
"/* Only small edits allowed. */\n"
"/* For multiple similar edits, please create a Patch for the translator. */\n\n"
"/* Capstone's C++ file translator: */\n"
"/* https://github.com/capstone-engine/capstone/tree/next/suite/auto-sync */\n\n"
)
def run_clang_format(out_paths: list[Path]):
for out_file in out_paths:
log.info(f"Format {out_file}")
subprocess.run(
[
"clang-format",
f"-style=file:{get_path('{CS_CLANG_FORMAT_FILE}')}",
"-i",
out_file,
]
)
def get_path(config_path: str) -> Path:
return PathVarHandler().complete_path(config_path)
def test_only_overwrite_path_var(var_name: str, new_path: Path):
"""Don't use outside of testing."""
return PathVarHandler().test_only_overwrite_var(var_name, new_path)
def fail_exit(msg: str) -> None:
"""Logs a fatal message and exits with error code 1."""
log.fatal(msg)
exit(1)
def check_py_version() -> None:
if not sys.hexversion >= 0x030B00F0:
log.fatal("Python >= v3.11 required.")
exit(1)

View File

@@ -0,0 +1,197 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import os
import re
import shutil
import subprocess
from pathlib import Path
from autosync.Helper import fail_exit, get_path
inc_tables = [
{
"name": "Disassembler",
"tblgen_arg": "--gen-disassembler",
"inc_name": "DisassemblerTables",
"only_arch": [],
"lang": ["CCS", "C++"],
},
{
"name": "AsmWriter",
"tblgen_arg": "--gen-asm-writer",
"inc_name": "AsmWriter",
"only_arch": [],
"lang": ["CCS", "C++"],
},
{
"name": "RegisterInfo",
"tblgen_arg": "--gen-register-info",
"inc_name": "RegisterInfo",
"only_arch": [],
"lang": ["CCS"],
},
{
"name": "InstrInfo",
"tblgen_arg": "--gen-instr-info",
"inc_name": "InstrInfo",
"only_arch": [],
"lang": ["CCS"],
},
{
"name": "SubtargetInfo",
"tblgen_arg": "--gen-subtarget",
"inc_name": "SubtargetInfo",
"only_arch": [],
"lang": ["CCS"],
},
{
"name": "Mapping",
"tblgen_arg": "--gen-asm-matcher",
"inc_name": None,
"only_arch": [],
"lang": ["CCS"],
},
{
"name": "SystemOperand",
"tblgen_arg": "--gen-searchable-tables",
"inc_name": None,
"only_arch": ["AArch64", "ARM"],
"lang": ["CCS"],
},
]
class IncGenerator:
def __init__(self, arch: str, inc_list: list) -> None:
self.arch: str = arch
self.inc_list = inc_list # Names of inc files to generate.
self.arch_dir_name: str = "PowerPC" if self.arch == "PPC" else self.arch
self.patches_dir_path: Path = get_path("{INC_PATCH_DIR}")
self.llvm_include_dir: Path = get_path("{LLVM_INCLUDE_DIR}")
self.output_dir: Path = get_path("{BUILD_DIR}")
self.llvm_target_dir: Path = get_path("{LLVM_TARGET_DIR}").joinpath(
f"{self.arch_dir_name}"
)
self.llvm_tblgen: Path = get_path("{LLVM_TBLGEN_BIN}")
self.output_dir_c_inc = get_path("{C_INC_OUT_DIR}")
self.output_dir_cpp_inc = get_path("{CPP_INC_OUT_DIR}")
self.check_paths()
def check_paths(self) -> None:
if not self.llvm_include_dir.exists():
fail_exit(f"{self.llvm_include_dir} does not exist.")
if not self.llvm_target_dir.exists():
fail_exit(f"{self.llvm_target_dir} does not exist.")
if not self.llvm_tblgen.exists():
fail_exit(f"{self.llvm_tblgen} does not exist. Have you built llvm-tblgen?")
if not self.output_dir.exists():
fail_exit(f"{self.output_dir} does not exist.")
if not self.output_dir_c_inc.exists():
log.debug(f"{self.output_dir_c_inc} does not exist. Creating it...")
os.makedirs(self.output_dir_c_inc)
if not self.output_dir_cpp_inc.exists():
log.debug(f"{self.output_dir_cpp_inc} does not exist. Creating it...")
os.makedirs(self.output_dir_cpp_inc)
def generate(self) -> None:
self.gen_incs()
self.move_mapping_files()
def move_mapping_files(self) -> None:
"""
Moves the <ARCH>GenCS files. They are written to CWD (I know, not nice).
We move them manually to the build dir, as long as llvm-capstone doesn't
allow to specify an output dir.
"""
for file in Path.cwd().iterdir():
if re.search(rf"{self.arch}Gen.*\.inc", file.name):
log.debug(f"Move {file} to {self.output_dir_c_inc}")
if self.output_dir_c_inc.joinpath(file.name).exists():
os.remove(self.output_dir_c_inc.joinpath(file.name))
shutil.move(file, self.output_dir_c_inc)
if self.arch == "AArch64":
# We have to rename the file SystemRegister -> SystemOperands
sys_ops_table_file = self.output_dir_c_inc.joinpath(
"AArch64GenSystemRegister.inc"
)
new_sys_ops_file = self.output_dir_c_inc.joinpath(
"AArch64GenSystemOperands.inc"
)
if "SystemOperand" not in self.inc_list:
return
elif not sys_ops_table_file.exists():
fail_exit(
f"{sys_ops_table_file} does not exist. But it should have been generated."
)
if new_sys_ops_file.exists():
os.remove(new_sys_ops_file)
shutil.move(sys_ops_table_file, new_sys_ops_file)
def gen_incs(self) -> None:
for table in inc_tables:
if "All" not in self.inc_list and table["name"] not in self.inc_list:
log.debug(f"Skip {table['name']} generation")
continue
if table["only_arch"] and self.arch not in table["only_arch"]:
continue
log.info(f"Generating {table['name']} tables...")
for lang in table["lang"]:
log.debug(f"Generating {lang} tables...")
td_file = self.llvm_target_dir.joinpath(f"{self.arch}.td")
out_file = f"{self.arch}Gen{table['inc_name']}.inc"
if lang == "CCS":
out_path = self.output_dir_c_inc.joinpath(out_file)
elif lang == "C++":
out_path = self.output_dir_cpp_inc.joinpath(out_file)
else:
raise NotImplementedError(f"{lang} not supported by llvm-tblgen.")
args = []
args.append(str(self.llvm_tblgen))
args.append(f"--printerLang={lang}")
args.append(table["tblgen_arg"])
args.append("-I")
args.append(f"{str(self.llvm_include_dir)}")
args.append("-I")
args.append(f"{str(self.llvm_target_dir)}")
if table["inc_name"]:
args.append("-o")
args.append(f"{str(out_path)}")
args.append(str(td_file))
log.debug(" ".join(args))
try:
subprocess.run(
args,
check=True,
)
except subprocess.CalledProcessError as e:
log.fatal("Generation failed")
raise e
def apply_patches(self) -> None:
"""
Applies all patches of inc files.
Files must be moved to their arch/<ARCH> directory before.
"""
patch_dir = self.patches_dir_path.joinpath(self.arch)
if not patch_dir.exists():
return
for patch in patch_dir.iterdir():
try:
subprocess.run(
["git", "apply", str(patch)],
check=True,
)
except subprocess.CalledProcessError as e:
log.warning(f"Patch {patch.name} did not apply correctly!")
log.warning(f"git apply returned: {e}")
return

View File

@@ -0,0 +1,514 @@
#!/usr/bin/env python3
# Copyright © 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import argparse
import logging as log
import json
import re
import sys
import subprocess as sp
from pathlib import Path
from autosync.Targets import TARGETS_LLVM_NAMING
from autosync.Helper import convert_loglevel, get_path
class LLVM_MC_Command:
def __init__(self, cmd_line: str, mattr: str):
self.cmd: str = ""
self.opts: str = ""
self.file: Path | None = None
self.mattr: str = mattr
self.cmd, self.opts, self.file = self.parse_llvm_mc_line(cmd_line)
if not (self.cmd and self.opts and self.file):
log.warning(f"Could not parse llvm-mc command: {cmd_line}")
elif "--show-encoding" not in self.cmd:
self.cmd = re.sub("llvm-mc", "llvm-mc --show-encoding", self.cmd)
def parse_llvm_mc_line(self, line: str) -> tuple[str | None, str | None, Path | None]:
test_file_base_dir = str(get_path("{LLVM_LIT_TEST_DIR}").absolute())
file = re.findall(rf"{re.escape(test_file_base_dir)}\S+", line)
if not file:
log.warning(f"llvm-mc command doesn't contain a file: {line}")
return None, None, None
test_file = file[0]
cmd = line.replace(test_file, "").strip()
cmd = re.sub(r"\s+", " ", cmd)
arch = re.finditer(r"(triple|arch)[=\s](\S+)", cmd)
mattr = re.finditer(r"(mattr|mcpu)[=\s](\S+)", cmd)
opts = ",".join([m.group(2) for m in arch]) if arch else ""
if mattr:
opts += "" if not opts else ","
opts += ",".join([m.group(2).strip("+") for m in mattr])
return cmd, opts, Path(test_file)
def exec(self) -> sp.CompletedProcess:
with open(self.file, "rb") as f:
content = f.read()
if self.mattr:
# If mattr exists, patch it into the cmd
if "mattr" in self.cmd:
self.cmd = re.sub(
r"mattr[=\s]+", f"mattr={self.mattr} -mattr=", self.cmd
)
else:
self.cmd = re.sub(r"llvm-mc", f"llvm-mc -mattr={self.mattr}", self.cmd)
log.debug(f"Run: {self.cmd}")
result = sp.run(self.cmd.split(" "), input=content, capture_output=True)
return result
def get_opts_list(self) -> list[str]:
opts = self.opts.strip().strip(",")
opts = re.sub(r"[, ]+", ",", opts)
return opts.split(",")
def __str__(self) -> str:
return f"{self.cmd} < {str(self.file.absolute())}"
class MCTest:
"""
A single test. It can contain multiple decoded instructions for a given byte sequence.
In general a MCTest always tests a sequence of instructions in a single .text segment.
"""
def __init__(self, arch: str, opts: list[str], encoding: str, asm_text: str):
self.arch = arch
if arch.lower() in ["arm", "powerpc", "ppc", "aarch64"]:
# ARM, AArch64 and PPC require this option for MC tests.
self.opts = ["CS_OPT_NO_BRANCH_OFFSET"] + opts
else:
self.opts = opts
self.encoding: list[str] = [encoding]
self.asm_text: list[str] = [asm_text]
def extend(self, encoding: str, asm_text: str):
self.encoding.append(encoding)
self.asm_text.append(asm_text)
def __str__(self):
encoding = ",".join(self.encoding)
encoding = re.sub(r"[\[\]]", "", encoding)
encoding = encoding.strip()
encoding = re.sub(r"[\s,]+", ", ", encoding)
yaml_tc = (
" -\n"
" input:\n"
" bytes: [ <ENCODING> ]\n"
' arch: "<ARCH>"\n'
" options: [ <OPTIONS> ]\n"
" expected:\n"
" insns:\n"
)
template = " -\n asm_text: <ASM_TEXT>\n"
insn_cases = ""
for text in self.asm_text:
insn_cases += template.replace("<ASM_TEXT>", f'"{text}"')
yaml_tc = yaml_tc.replace("<ENCODING>", encoding)
yaml_tc = yaml_tc.replace("<ARCH>", f"CS_ARCH_{self.arch.upper()}")
yaml_tc = yaml_tc.replace("<OPTIONS>", ", ".join([f'"{o}"' for o in self.opts]))
yaml_tc += insn_cases
return yaml_tc
class TestFile:
def __init__(
self,
arch: str,
file_path: Path,
opts: list[str] | None,
mc_cmd: LLVM_MC_Command,
unified_test_cases: bool,
):
self.arch: str = arch
self.file_path: Path = file_path
self.opts: list[str] = list() if not opts else opts
self.mc_cmd: LLVM_MC_Command = mc_cmd
# Indexed by .text section count
self.tests: dict[int, list[MCTest]] = dict()
self.init_tests(unified_test_cases)
def init_tests(self, unified_test_cases: bool):
mc_output = self.mc_cmd.exec()
if mc_output.stderr and not mc_output.stdout:
# We can still continue. We just ignore the failed cases.
log.debug(f"llvm-mc cmd stderr: {mc_output.stderr}")
log.debug(f"llvm-mc result: {mc_output}")
text_section = 0 # Counts the .text sections
asm_pat = r"(?P<asm_text>.+)"
enc_pat = r"(\[?(?P<full_enc_string>(?P<enc_bytes>((0x[a-fA-F0-9]{1,2}[, ]{0,2}))+)[^, ]?)\]?)"
for line in mc_output.stdout.splitlines():
line = line.decode("utf8")
if ".text" in line:
text_section += 1
continue
match = re.search(
rf"^\s*{asm_pat}\s*(#|//|@)\s*encoding:\s*{enc_pat}", line
)
if not match:
continue
full_enc_string = match.group("full_enc_string")
if not re.search(r"0x[a-fA-F0-9]{1,2}$", full_enc_string[:-1]):
log.debug(f"Ignore because symbol injection is needed: {line}")
# The encoding string contains symbol information of the form:
# [0xc0,0xe0,A,A,A... or similar. We ignore these for now.
continue
enc_bytes = match.group("enc_bytes").strip()
asm_text = match.group("asm_text").strip()
asm_text = re.sub(r"\t+", " ", asm_text)
asm_text = asm_text.strip()
if not self.valid_byte_seq(enc_bytes):
continue
if text_section in self.tests:
if unified_test_cases:
self.tests[text_section][0].extend(enc_bytes, asm_text)
else:
self.tests[text_section].append(
MCTest(self.arch, self.opts, enc_bytes, asm_text)
)
else:
self.tests[text_section] = [
MCTest(self.arch, self.opts, enc_bytes, asm_text)
]
def has_tests(self) -> bool:
return len(self.tests) != 0
def get_cs_testfile_content(self, only_test: bool) -> str:
content = "\n" if only_test else "test_cases:\n"
for tl in self.tests.values():
content += "\n".join([str(t) for t in tl])
return content
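def num_test_cases(self) -> int:
# Count the test cases of all .text sections, not the number of sections.
return sum(len(tl) for tl in self.tests.values())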
def valid_byte_seq(self, enc_bytes):
match self.arch:
case "AArch64":
# It always needs 4 bytes.
# Otherwise it is likely a reloc or symbol test
return enc_bytes.count("0x") == 4
case _:
return True
def get_multi_mode_filename(self) -> Path:
filename = self.file_path.stem
parent = self.file_path.parent
detailed_name = f"{filename}_{'_'.join(self.opts)}.txt"
detailed_name = re.sub(r"[+-]", "_", detailed_name)
out_path = parent.joinpath(detailed_name)
return Path(out_path)
def get_simple_filename(self) -> Path:
return self.file_path
def __lt__(self, other) -> bool:
return str(self.file_path) < str(other.file_path)
class MCUpdater:
"""
The MCUpdater parses all test files of the LLVM MC regression tests.
Each of those LLVM files can contain several llvm-mc commands which run on the same file.
This is mostly done to test the same file with different CPU features enabled,
so it can test different flavors of assembly etc.
In Capstone, all modules always enable all CPU features (even if this is not
possible in reality).
Because of this, we always parse all llvm-mc commands run on a test file and generate a
TestFile object for each of them, but only write the last one to disk.
Once https://github.com/capstone-engine/capstone/issues/1992 is resolved, we can
write all variants of a test file to disk.
This is already implemented and tested with multi_mode = True.
"""
def __init__(
self,
arch: str,
mc_dir: Path,
excluded: list[str] | None,
included: list[str] | None,
unified_test_cases: bool,
multi_mode: bool = False,
):
self.symbolic_links = list()
self.arch = arch
self.test_dir_link_prefix = f"test_dir_{arch}_"
self.mc_dir = mc_dir
self.excluded = excluded if excluded else list()
self.included = included if included else list()
self.test_files: list[TestFile] = list()
self.unified_test_cases = unified_test_cases
with open(get_path("{MCUPDATER_CONFIG_FILE}")) as f:
self.conf = json.loads(f.read())
# Additional mattr passed to llvm-mc
self.mattr: str = (
",".join(self.conf["additional_mattr"][self.arch])
if self.arch in self.conf["additional_mattr"]
else ""
)
# A list of options which are always added.
self.mandatory_options: list[str] = (
self.conf["mandatory_options"][self.arch]
if self.arch in self.conf["mandatory_options"]
else list()
)
self.multi_mode = multi_mode
def check_prerequisites(self, paths):
for path in paths:
if not path.exists() or not path.is_dir():
raise ValueError(
f"'{path}' does not exist or is not a directory. Cannot generate tests from there."
)
llvm_lit_cfg = get_path("{LLVM_LIT_TEST_DIR}")
if not llvm_lit_cfg.exists():
raise ValueError(
f"Could not find '{llvm_lit_cfg}'. Check {{LLVM_LIT_TEST_DIR}} in path_vars.json."
)
def write_to_build_dir(self):
file_cnt = 0
test_cnt = 0
overwritten = 0
files_written = set()
for test in sorted(self.test_files):
if not test.has_tests():
continue
file_cnt += 1
test_cnt += test.num_test_cases()
if self.multi_mode:
rel_path = str(
test.get_multi_mode_filename().relative_to(
get_path("{LLVM_LIT_TEST_DIR}")
)
)
else:
rel_path = str(
test.get_simple_filename().relative_to(
get_path("{LLVM_LIT_TEST_DIR}")
)
)
filename = re.sub(rf"{self.test_dir_link_prefix}\d+", ".", rel_path)
filename = get_path("{MCUPDATER_OUT_DIR}").joinpath(f"{filename}.yaml")
if filename in files_written:
write_mode = "a"
else:
write_mode = "w+"
filename.parent.mkdir(parents=True, exist_ok=True)
if self.multi_mode and filename.exists():
raise ValueError(
f"The following file exists already: {filename}\n"
"This is not allowed in multi-mode."
)
else:
log.debug(f"Overwrite: {filename}")
overwritten += 1
with open(filename, write_mode) as f:
f.write(test.get_cs_testfile_content(only_test=(write_mode == "a")))
log.debug(f"Write {filename}")
files_written.add(filename)
log.info(
f"Processed {file_cnt} files with {test_cnt} test cases. Generated {len(files_written)} files"
)
if overwritten > 0:
log.warning(
f"Overwrote {overwritten} test files with the same name.\n"
f"These files contain instructions of several different cpu features.\n"
f"You have to use multi-mode to write them into distinct files.\n"
f"The current setting will only keep the last one written.\n"
f"See also: https://github.com/capstone-engine/capstone/issues/1992"
)
def build_test_files(self, mc_cmds: list[LLVM_MC_Command]) -> list[TestFile]:
log.info("Build TestFile objects")
test_files = list()
n_all = len(mc_cmds)
for i, mcc in enumerate(mc_cmds):
print(f"{i + 1}/{n_all} {mcc.file.name}", flush=True, end="\r")
test_files.append(
TestFile(
self.arch,
mcc.file,
mcc.get_opts_list() + self.mandatory_options,
mcc,
self.unified_test_cases,
)
)
return test_files
def run_llvm_lit(self, paths: list[Path]) -> list[LLVM_MC_Command]:
"""
Calls llvm-lit with the given paths to the tests.
It parses the llvm-lit commands to LLVM_MC_Commands.
"""
lit_cfg_dir = get_path("{LLVM_LIT_TEST_DIR}")
llvm_lit_cfg = str(lit_cfg_dir.absolute())
args = ["lit", "-v", "-a", llvm_lit_cfg]
for i, p in enumerate(paths):
slink = lit_cfg_dir.joinpath(f"{self.test_dir_link_prefix}{i}")
self.symbolic_links.append(slink)
log.debug(f"Create link: {slink} -> {p}")
try:
slink.symlink_to(p, target_is_directory=True)
except FileExistsError as e:
print(f"Failed: Link '{slink}' already exists. Please delete it.")
raise e
log.info(f"Run lit: {' '.join(args)}")
cmds = sp.run(args, capture_output=True)
if cmds.stderr:
raise ValueError(f"llvm-lit failed with {cmds.stderr}")
return self.extract_llvm_mc_cmds(cmds.stdout.decode("utf8"))
def extract_llvm_mc_cmds(self, cmds: str) -> list[LLVM_MC_Command]:
log.debug("Parsing llvm-mc commands")
# Get only the RUN lines which have a show-encoding set.
cmd_lines = cmds.splitlines()
log.debug(f"NO FILTER: {cmd_lines}")
matches = list(
filter(
lambda l: (
l
if re.search(r"^RUN.+(show-encoding|disassemble)[^|]+", l)
else None
),
cmd_lines,
)
)
log.debug(f"FILTER RUN: {' '.join(matches)}")
# Don't add tests which are allowed to fail
matches = list(
filter(lambda m: None if re.search(r"not\s+llvm-mc", m) else m, matches)
)
log.debug(f"FILTER not llvm-mc: {' '.join(matches)}")
# Skip object file tests
matches = list(
filter(lambda m: None if re.search(r"filetype=obj", m) else m, matches)
)
log.debug(f"FILTER filetype=obj: {' '.join(matches)}")
# Skip any relocation related tests.
matches = filter(lambda m: None if re.search(r"reloc", m) else m, matches)
# Remove 'RUN: at ...' prefix
matches = map(lambda m: re.sub(r"^RUN: at line \d+: ", "", m), matches)
# Remove redirection
matches = map(lambda m: re.sub(r"\d>&\d", "", m), matches)
# Remove unused arguments
matches = map(lambda m: re.sub(r"-o\s?-", "", m), matches)
# Remove redirection of stderr to a file
matches = map(lambda m: re.sub(r"2>\s?\S+", "", m), matches)
# Remove piping to FileCheck
matches = map(lambda m: re.sub(r"\|\s*FileCheck\s+.+", "", m), matches)
# Remove input stream
matches = map(lambda m: re.sub(r"\s+<", "", m), matches)
all_cmds = list()
for match in matches:
if self.included and not any(
re.search(x, match) is not None for x in self.included
):
continue
if any(re.search(x, match) is not None for x in self.excluded):
continue
llvm_mc_cmd = LLVM_MC_Command(match, self.mattr)
if not llvm_mc_cmd.cmd:
# Invalid
continue
all_cmds.append(llvm_mc_cmd)
log.debug(f"Added: {llvm_mc_cmd}")
log.debug(f"Extracted {len(all_cmds)} llvm-mc commands")
return all_cmds
def gen_all(self):
log.info("Check prerequisites")
disas_tests = self.mc_dir.joinpath(f"Disassembler/{self.arch}")
test_paths = [disas_tests]
self.check_prerequisites(test_paths)
log.info("Generate MC regression tests")
llvm_mc_cmds = self.run_llvm_lit(test_paths)
log.info(f"Got {len(llvm_mc_cmds)} llvm-mc commands to run")
self.test_files = self.build_test_files(llvm_mc_cmds)
for slink in self.symbolic_links:
log.debug(f"Unlink {slink}")
slink.unlink()
self.write_to_build_dir()
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="Test file updater",
description="Synchronizes test files with LLVM",
)
parser.add_argument(
"-d",
dest="mc_dir",
help=f"Path to the LLVM MC test files. Default: {get_path('{LLVM_MC_TEST_DIR}')}",
default=get_path("{LLVM_MC_TEST_DIR}"),
type=Path,
)
parser.add_argument(
"-a",
dest="arch",
help="Name of architecture to update.",
choices=TARGETS_LLVM_NAMING,
required=True,
)
parser.add_argument(
"-e",
dest="excluded_files",
metavar="filename",
nargs="+",
help="File names to exclude from update (can be a regex pattern).",
)
parser.add_argument(
"-i",
dest="included_files",
metavar="filename",
nargs="+",
help="Specific list of file names to update (can be a regex pattern).",
)
parser.add_argument(
"-u",
dest="unified_tests",
action="store_true",
default=False,
help="If set, all instructions of a text segment will be decoded and tested at once. Should be set if instructions depend on each other.",
)
parser.add_argument(
"-v",
dest="verbosity",
help="Verbosity of the log messages.",
choices=["debug", "info", "warning", "fatal"],
default="info",
)
arguments = parser.parse_args()
return arguments
if __name__ == "__main__":
args = parse_args()
log.basicConfig(
level=convert_loglevel(args.verbosity),
stream=sys.stdout,
format="%(levelname)-5s - %(message)s",
force=True,
)
MCUpdater(
args.arch,
args.mc_dir,
args.excluded_files,
args.included_files,
args.unified_tests,
).gen_all()
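The byte list written into each YAML test case comes from the normalization steps in `MCTest.__str__`. The following minimal sketch isolates those steps so they can be tried on their own; the function name `normalize_encoding` is made up for illustration and is not part of the module.

```python
import re


def normalize_encoding(chunks: list[str]) -> str:
    """Join raw llvm-mc encoding strings into one comma separated byte list."""
    encoding = ",".join(chunks)
    # Drop the square brackets llvm-mc prints around encodings.
    encoding = re.sub(r"[\[\]]", "", encoding)
    encoding = encoding.strip()
    # Unify every run of spaces and commas into a single ", ".
    encoding = re.sub(r"[\s,]+", ", ", encoding)
    return encoding


print(normalize_encoding(["0x41 0x10 0x43 0x93", "[0x83,0xfc,0x7f,0x93]"]))
```

Space separated and bracketed, comma separated inputs end up in the same form, which is what allows `extend()` to append further encodings in unified mode.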

@@ -0,0 +1,113 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import json
import logging as log
import re
import subprocess
from pathlib import Path
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
class PathVarHandler(metaclass=Singleton):
def __init__(self) -> None:
try:
res = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
check=True,
stdout=subprocess.PIPE,
)
except subprocess.CalledProcessError:
log.fatal("Could not get repository top level directory.")
exit(1)
repo_root = res.stdout.decode("utf8").strip("\n")
# The main directories
self.paths: dict[str, Path] = dict()
self.paths["{CS_ROOT}"] = Path(repo_root)
self.paths["{AUTO_SYNC_ROOT}"] = Path(repo_root).joinpath("suite/auto-sync/")
self.paths["{AUTO_SYNC_SRC}"] = self.paths["{AUTO_SYNC_ROOT}"].joinpath(
"src/autosync/"
)
path_config_file = self.paths["{AUTO_SYNC_SRC}"].joinpath("path_vars.json")
# Load variables
with open(path_config_file) as f:
vars = json.load(f)
paths = vars["paths"]
self.create_during_runtime = vars["create_during_runtime"]
missing = list()
for p_name, path in paths.items():
resolved = path
for var_id in re.findall(r"\{.+}", resolved):
if var_id not in self.paths:
log.fatal(
f"{var_id} hasn't been added to the PathVarHandler yet. The var must be defined in a previous entry."
)
exit(1)
resolved: str = re.sub(var_id, str(self.paths[var_id]), resolved)
log.debug(f"Set {p_name} = {resolved}")
if not Path(resolved).exists() and (
p_name not in self.create_during_runtime
and p_name not in vars["ignore_missing"]
):
missing.append(resolved)
elif p_name in self.create_during_runtime:
self.create_path(p_name, resolved)
self.paths[p_name] = Path(resolved)
if len(missing) > 0:
log.fatal("Some paths from the config file are missing!")
for m in missing:
log.fatal(f"\t{m}")
exit(1)
def test_only_overwrite_var(self, var_name: str, new_path: Path):
if var_name not in self.paths:
raise ValueError(f"PathVarHandler doesn't have a path for '{var_name}'")
if not new_path.exists():
raise ValueError(f"New path doesn't exist: '{new_path}'")
self.paths[var_name] = new_path
def get_path(self, name: str) -> Path:
if name not in self.paths:
raise ValueError(f"Path variable {name} has no path saved.")
if name in self.create_during_runtime:
self.create_path(name, self.paths[name])
return self.paths[name]
def complete_path(self, path_str: str) -> Path:
resolved = path_str
for p_name in re.findall(r"\{.+}", path_str):
resolved = re.sub(p_name, str(self.get_path(p_name)), resolved)
return Path(resolved)
@staticmethod
def create_path(var_id: str, path: str):
pp = Path(path)
if pp.exists():
return
log.debug(f"Create path {var_id} @ {path}")
postfix = var_id.strip("}").split("_")[-1]
if postfix == "FILE":
if not pp.parent.exists():
pp.parent.mkdir(parents=True)
pp.touch()
elif postfix == "DIR":
pp.mkdir(parents=True)
else:
from autosync.Helper import fail_exit
fail_exit(
f"The var_id: {var_id} must end in _FILE or _DIR. It ends in '{postfix}'"
)
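The variable resolution done in `PathVarHandler.__init__` can be sketched in isolation as follows. This is a simplified illustration, not the class itself: the function `resolve_path_vars` and the variable names `{ROOT}`, `{BUILD_DIR}` and `{OUT_DIR}` are hypothetical, and the sketch uses a non-greedy character class with `str.replace` instead of the `\{.+}` pattern and `re.sub` used above.

```python
import re
from pathlib import Path


def resolve_path_vars(raw: dict[str, str], known: dict[str, Path]) -> dict[str, Path]:
    """Resolve "{VAR}" references. Each entry may only use earlier entries."""
    resolved_paths = dict(known)
    for name, value in raw.items():
        resolved = value
        for var_id in re.findall(r"\{[^}]+\}", value):
            if var_id not in resolved_paths:
                raise KeyError(f"{var_id} must be defined in a previous entry")
            resolved = resolved.replace(var_id, str(resolved_paths[var_id]))
        resolved_paths[name] = Path(resolved)
    return resolved_paths


# Hypothetical entries, in the order they would appear in the config file.
paths = resolve_path_vars(
    {"{BUILD_DIR}": "{ROOT}/build/", "{OUT_DIR}": "{BUILD_DIR}/out/"},
    {"{ROOT}": Path("/repo")},
)
print(paths["{OUT_DIR}"])
```

Because entries are resolved in order, a variable can only refer to variables defined before it, which matches the error message in the original code.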

@@ -0,0 +1,4 @@
# Copyright © 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
TARGETS_LLVM_NAMING = ["ARM", "PowerPC", "Alpha", "AArch64", "LoongArch"]

@@ -0,0 +1,34 @@
# RUN: llvm-mc -triple=aarch64 -mattr=+v8a,+fp-armv8 -disassemble < %s | FileCheck %s
# RUN: llvm-mc -triple=arm64 -mattr=+v8.2a -disassemble < %s | FileCheck %s --check-prefix=CHECK-V82
#------------------------------------------------------------------------------
# Compare and branch (immediate)
#------------------------------------------------------------------------------
# CHECK: sbfx x1, x2, #3, #2
# CHECK: asr x3, x4, #63
# CHECK: asr wzr, wzr, #31
# CHECK: sbfx w12, w9, #0, #1
0x41 0x10 0x43 0x93
0x83 0xfc 0x7f 0x93
0xff 0x7f 0x1f 0x13
0x2c 0x1 0x0 0x13
# CHECK: ubfiz x4, x5, #52, #11
# CHECK: ubfx xzr, x4, #0, #1
# CHECK: ubfiz x4, xzr, #1, #6
# CHECK: lsr x5, x6, #12
0xa4 0x28 0x4c 0xd3
0x9f 0x0 0x40 0xd3
0xe4 0x17 0x7f 0xd3
0xc5 0xfc 0x4c 0xd3
# CHECK: bfi x4, x5, #52, #11
# CHECK: bfxil xzr, x4, #0, #1
# CHECK: bfi x4, xzr, #1, #6
# CHECK-V82: bfc x4, #1, #6
# CHECK: bfxil x5, x6, #12, #52
0xa4 0x28 0x4c 0xb3
0x9f 0x0 0x40 0xb3
0xe4 0x17 0x7f 0xb3
0xc5 0xfc 0x4c 0xb3
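A test file with two RUN lines, like the one above, yields two `TestFile` objects. In multi-mode each of them gets its own output name, derived from the llvm-mc options. The sketch below mirrors `TestFile.get_multi_mode_filename()` as a free function; the input path `AArch64/test_a.txt` is an assumed example name.

```python
import re
from pathlib import Path


def multi_mode_filename(file_path: Path, opts: list[str]) -> Path:
    """Append the options to the file stem and replace '+' and '-' with '_'."""
    detailed_name = f"{file_path.stem}_{'_'.join(opts)}.txt"
    detailed_name = re.sub(r"[+-]", "_", detailed_name)
    return file_path.parent.joinpath(detailed_name)


# Option lists as they appear in the generated YAML files of this commit.
for opts in (["aarch64", "v8a", "+fp-armv8"], ["arm64", "v8.2a"]):
    print(multi_mode_filename(Path("AArch64/test_a.txt"), opts).name)
```

Note that the substitution runs over the whole name, so a `-` or `+` in the original file stem is replaced as well.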

@@ -0,0 +1,9 @@
# The RUN line parsing
# RUN: llvm-mc --disassemble -triple=arm64 < %s | FileCheck %s
[0x00,0x0a,0x31,0xd5]
# CHECK: mrs x0, TRCRSR
[0x80,0x08,0x31,0xd5]
# CHECK: mrs x0, TRCEXTINSELR
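The RUN line of this file is what `MCUpdater.extract_llvm_mc_cmds()` has to reduce to a plain llvm-mc call. The sketch below replays the same regular expressions on one line. It assumes that lit prints commands in the `RUN: at line N: ...` form that the regex in the updater expects; the path `/tmp/test.txt` and the helper names are illustrative only.

```python
import re


def is_candidate(run_line: str) -> bool:
    """Keep RUN lines which show encodings or disassemble."""
    if not re.search(r"^RUN.+(show-encoding|disassemble)[^|]+", run_line):
        return False
    # Skip commands expected to fail, object file tests and relocation tests.
    return not re.search(r"not\s+llvm-mc|filetype=obj|reloc", run_line)


def clean_run_line(run_line: str) -> str:
    substitutions = [
        r"^RUN: at line \d+: ",   # lit prefix
        r"\d>&\d",                # redirection between streams
        r"-o\s?-",                # unused output argument
        r"2>\s?\S+",              # stderr redirected to a file
        r"\|\s*FileCheck\s+.+",   # piping to FileCheck
        r"\s+<",                  # input stream
    ]
    for pattern in substitutions:
        run_line = re.sub(pattern, "", run_line)
    # The final strip is an addition of this sketch.
    return run_line.strip()


line = "RUN: at line 2: llvm-mc --disassemble -triple=arm64 < /tmp/test.txt | FileCheck /tmp/test.txt"
if is_candidate(line):
    print(clean_run_line(line))
```

The order of the substitutions matters: the FileCheck pipe is removed before the input redirection, so the test file path stays the last argument of the command.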

@@ -0,0 +1,41 @@
# RUN: llvm-mc -triple s390x-unknown-unknown -mcpu=z13 --show-encoding %s | FileCheck %s
# RUN: llvm-mc -triple s390x-unknown-unknown -mcpu=z13 -filetype=obj %s | \
# RUN: llvm-readobj -r - | FileCheck %s -check-prefix=CHECK-REL
# RUN: llvm-mc -triple s390x-unknown-unknown -mcpu=z13 -filetype=obj %s | \
# RUN: llvm-objdump -d - | FileCheck %s -check-prefix=CHECK-DIS
# CHECK: larl %r14, target # encoding: [0xc0,0xe0,A,A,A,A]
# CHECK-NEXT: # fixup A - offset: 2, value: target+2, kind: FK_390_PC32DBL
# CHECK-REL: 0x{{[0-9A-F]*2}} R_390_PC32DBL target 0x2
.align 16
larl %r14, target
# CHECK: larl %r14, target@GOT # encoding: [0xc0,0xe0,A,A,A,A]
# CHECK-NEXT: # fixup A - offset: 2, value: target@GOT+2, kind: FK_390_PC32DBL
# CHECK-REL: 0x{{[0-9A-F]*2}} R_390_GOTENT target 0x2
.align 16
larl %r14, target@got
# CHECK: vl %v0, src(%r1) # encoding: [0xe7,0x00,0b0001AAAA,A,0x00,0x06]
# CHECK-NEXT: # fixup A - offset: 2, value: src, kind: FK_390_U12Imm
# CHECK-REL: 0x{{[0-9A-F]*2}} R_390_12 src 0x0
.align 16
vl %v0, src(%r1)
# CHECK: .insn ss,238594023227392,dst(%r2,%r1),src,%r3 # encoding: [0xd9,0x23,0b0001AAAA,A,0b0000BBBB,B]
# CHECK-NEXT: # fixup A - offset: 2, value: dst, kind: FK_390_U12Imm
# CHECK-NEXT: # fixup B - offset: 4, value: src, kind: FK_390_U12Imm
# CHECK-REL: 0x{{[0-9A-F]*2}} R_390_12 dst 0x0
# CHECK-REL: 0x{{[0-9A-F]*4}} R_390_12 src 0x0
.align 16
.insn ss,0xd90000000000,dst(%r2,%r1),src,%r3 # mvck
##S8
# CHECK: asi 0(%r1), src # encoding: [0xeb,A,0x10,0x00,0x00,0x6a]
# CHECK-NEXT: # fixup A - offset: 1, value: src, kind: FK_390_S8Imm
# CHECK-REL: 0x{{[0-9A-F]+}} R_390_8 src 0x0
.align 16
asi 0(%r1),src
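Every encoding in this file contains fixup symbols (`A`, `B`), so `TestFile.init_tests()` is expected to skip all of them. The following sketch reuses the two patterns from `init_tests()` to show the check on single lines. The exact spacing of the sample lines is an assumption about what llvm-mc prints; `parse_line` is a hypothetical helper.

```python
import re
from typing import Optional

ASM_PAT = r"(?P<asm_text>.+)"
ENC_PAT = r"(\[?(?P<full_enc_string>(?P<enc_bytes>((0x[a-fA-F0-9]{1,2}[, ]{0,2}))+)[^, ]?)\]?)"
LINE_RE = re.compile(rf"^\s*{ASM_PAT}\s*(#|//|@)\s*encoding:\s*{ENC_PAT}")


def parse_line(line: str) -> Optional[tuple[str, str]]:
    """Return (enc_bytes, asm_text) or None if the line is not usable."""
    match = LINE_RE.search(line)
    if not match:
        return None
    full_enc_string = match.group("full_enc_string")
    # If the text before the last character does not end in a hex byte,
    # the encoding contains symbols such as [0xc0,0xe0,A,A,A,A].
    if not re.search(r"0x[a-fA-F0-9]{1,2}$", full_enc_string[:-1]):
        return None
    asm_text = re.sub(r"\t+", " ", match.group("asm_text").strip()).strip()
    return match.group("enc_bytes").strip(), asm_text


print(parse_line("\tmrs\tx0, TRCRSR          // encoding: [0x00,0x0a,0x31,0xd5]"))
print(parse_line("\tlarl\t%r14, target       # encoding: [0xc0,0xe0,A,A,A,A]"))
```

The first line is accepted, the second one is rejected because the matched encoding string stops at the first symbol.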

@@ -0,0 +1,3 @@
# CS_ARCH_ARM, CS_MODE_THUMB, None
0x00,0x0a,0x31,0xd5 == mrs x0, TRCRSR

@@ -0,0 +1,240 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
input:
bytes: [ 0x83, 0xfc, 0x7f, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "asr x3, x4, #63"
-
input:
bytes: [ 0xff, 0x7f, 0x1f, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "asr wzr, wzr, #31"
-
input:
bytes: [ 0x2c, 0x01, 0x00, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx w12, w9, #0, #1"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfiz x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfx xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "lsr x5, x6, #12"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfi x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfxil xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfi x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfxil x5, x6, #12, #52"
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
input:
bytes: [ 0x83, 0xfc, 0x7f, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "asr x3, x4, #63"
-
input:
bytes: [ 0xff, 0x7f, 0x1f, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "asr wzr, wzr, #31"
-
input:
bytes: [ 0x2c, 0x01, 0x00, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx w12, w9, #0, #1"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfiz x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfx xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "lsr x5, x6, #12"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfi x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfxil xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfc x4, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,20 @@
test_cases:
-
input:
bytes: [ 0x00, 0x0a, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCRSR"
-
input:
bytes: [ 0x80, 0x08, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCEXTINSELR"

@@ -0,0 +1,64 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93, 0x83, 0xfc, 0x7f, 0x93, 0xff, 0x7f, 0x1f, 0x13, 0x2c, 0x01, 0x00, 0x13, 0xa4, 0x28, 0x4c, 0xd3, 0x9f, 0x00, 0x40, 0xd3, 0xe4, 0x17, 0x7f, 0xd3, 0xc5, 0xfc, 0x4c, 0xd3, 0xa4, 0x28, 0x4c, 0xb3, 0x9f, 0x00, 0x40, 0xb3, 0xe4, 0x17, 0x7f, 0xb3, 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
asm_text: "asr x3, x4, #63"
-
asm_text: "asr wzr, wzr, #31"
-
asm_text: "sbfx w12, w9, #0, #1"
-
asm_text: "ubfiz x4, x5, #52, #11"
-
asm_text: "ubfx xzr, x4, #0, #1"
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
asm_text: "lsr x5, x6, #12"
-
asm_text: "bfi x4, x5, #52, #11"
-
asm_text: "bfxil xzr, x4, #0, #1"
-
asm_text: "bfi x4, xzr, #1, #6"
-
asm_text: "bfxil x5, x6, #12, #52"
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93, 0x83, 0xfc, 0x7f, 0x93, 0xff, 0x7f, 0x1f, 0x13, 0x2c, 0x01, 0x00, 0x13, 0xa4, 0x28, 0x4c, 0xd3, 0x9f, 0x00, 0x40, 0xd3, 0xe4, 0x17, 0x7f, 0xd3, 0xc5, 0xfc, 0x4c, 0xd3, 0xa4, 0x28, 0x4c, 0xb3, 0x9f, 0x00, 0x40, 0xb3, 0xe4, 0x17, 0x7f, 0xb3, 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
asm_text: "asr x3, x4, #63"
-
asm_text: "asr wzr, wzr, #31"
-
asm_text: "sbfx w12, w9, #0, #1"
-
asm_text: "ubfiz x4, x5, #52, #11"
-
asm_text: "ubfx xzr, x4, #0, #1"
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
asm_text: "lsr x5, x6, #12"
-
asm_text: "bfi x4, x5, #52, #11"
-
asm_text: "bfxil xzr, x4, #0, #1"
-
asm_text: "bfc x4, #1, #6"
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,12 @@
test_cases:
-
input:
bytes: [ 0x00, 0x0a, 0x31, 0xd5, 0x80, 0x08, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCRSR"
-
asm_text: "mrs x0, TRCEXTINSELR"
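The difference between this unified file and the separated variant (one test case per instruction) comes from how `init_tests()` groups instructions per `.text` section. A reduced sketch of that grouping, using plain dictionaries instead of `MCTest` objects (the function `group_tests` and the dictionary layout are made up for illustration):

```python
def group_tests(entries: list[tuple[int, str, str]], unified: bool) -> dict[int, list[dict]]:
    """entries holds (text_section, enc_bytes, asm_text) tuples."""
    tests: dict[int, list[dict]] = {}
    for section, enc_bytes, asm_text in entries:
        if section in tests and unified:
            # Unified: extend the first test case of the section.
            tests[section][0]["bytes"].append(enc_bytes)
            tests[section][0]["asm_text"].append(asm_text)
        else:
            # Separated, or first instruction of a section: new test case.
            tests.setdefault(section, []).append(
                {"bytes": [enc_bytes], "asm_text": [asm_text]}
            )
    return tests


entries = [
    (1, "0x00,0x0a,0x31,0xd5", "mrs x0, TRCRSR"),
    (1, "0x80,0x08,0x31,0xd5", "mrs x0, TRCEXTINSELR"),
]
print(len(group_tests(entries, unified=True)[1]))
print(len(group_tests(entries, unified=False)[1]))
```

In unified mode the section holds a single test case with both instructions, in separated mode it holds one test case per instruction.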

@@ -0,0 +1,120 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
input:
bytes: [ 0x83, 0xfc, 0x7f, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "asr x3, x4, #63"
-
input:
bytes: [ 0xff, 0x7f, 0x1f, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "asr wzr, wzr, #31"
-
input:
bytes: [ 0x2c, 0x01, 0x00, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx w12, w9, #0, #1"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfiz x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfx xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "lsr x5, x6, #12"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfi x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfxil xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfi x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,120 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
input:
bytes: [ 0x83, 0xfc, 0x7f, 0x93 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "asr x3, x4, #63"
-
input:
bytes: [ 0xff, 0x7f, 0x1f, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "asr wzr, wzr, #31"
-
input:
bytes: [ 0x2c, 0x01, 0x00, 0x13 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx w12, w9, #0, #1"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfiz x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfx xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xd3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "lsr x5, x6, #12"
-
input:
bytes: [ 0xa4, 0x28, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfi x4, x5, #52, #11"
-
input:
bytes: [ 0x9f, 0x00, 0x40, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfxil xzr, x4, #0, #1"
-
input:
bytes: [ 0xe4, 0x17, 0x7f, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfc x4, #1, #6"
-
input:
bytes: [ 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,20 @@
test_cases:
-
input:
bytes: [ 0x00, 0x0a, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCRSR"
-
input:
bytes: [ 0x80, 0x08, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCEXTINSELR"

@@ -0,0 +1,32 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93, 0x83, 0xfc, 0x7f, 0x93, 0xff, 0x7f, 0x1f, 0x13, 0x2c, 0x01, 0x00, 0x13, 0xa4, 0x28, 0x4c, 0xd3, 0x9f, 0x00, 0x40, 0xd3, 0xe4, 0x17, 0x7f, 0xd3, 0xc5, 0xfc, 0x4c, 0xd3, 0xa4, 0x28, 0x4c, 0xb3, 0x9f, 0x00, 0x40, 0xb3, 0xe4, 0x17, 0x7f, 0xb3, 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "aarch64", "v8a", "+fp-armv8" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
asm_text: "asr x3, x4, #63"
-
asm_text: "asr wzr, wzr, #31"
-
asm_text: "sbfx w12, w9, #0, #1"
-
asm_text: "ubfiz x4, x5, #52, #11"
-
asm_text: "ubfx xzr, x4, #0, #1"
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
asm_text: "lsr x5, x6, #12"
-
asm_text: "bfi x4, x5, #52, #11"
-
asm_text: "bfxil xzr, x4, #0, #1"
-
asm_text: "bfi x4, xzr, #1, #6"
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,32 @@
test_cases:
-
input:
bytes: [ 0x41, 0x10, 0x43, 0x93, 0x83, 0xfc, 0x7f, 0x93, 0xff, 0x7f, 0x1f, 0x13, 0x2c, 0x01, 0x00, 0x13, 0xa4, 0x28, 0x4c, 0xd3, 0x9f, 0x00, 0x40, 0xd3, 0xe4, 0x17, 0x7f, 0xd3, 0xc5, 0xfc, 0x4c, 0xd3, 0xa4, 0x28, 0x4c, 0xb3, 0x9f, 0x00, 0x40, 0xb3, 0xe4, 0x17, 0x7f, 0xb3, 0xc5, 0xfc, 0x4c, 0xb3 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64", "v8.2a" ]
expected:
insns:
-
asm_text: "sbfx x1, x2, #3, #2"
-
asm_text: "asr x3, x4, #63"
-
asm_text: "asr wzr, wzr, #31"
-
asm_text: "sbfx w12, w9, #0, #1"
-
asm_text: "ubfiz x4, x5, #52, #11"
-
asm_text: "ubfx xzr, x4, #0, #1"
-
asm_text: "ubfiz x4, xzr, #1, #6"
-
asm_text: "lsr x5, x6, #12"
-
asm_text: "bfi x4, x5, #52, #11"
-
asm_text: "bfxil xzr, x4, #0, #1"
-
asm_text: "bfc x4, #1, #6"
-
asm_text: "bfxil x5, x6, #12, #52"

@@ -0,0 +1,12 @@
test_cases:
-
input:
bytes: [ 0x00, 0x0a, 0x31, 0xd5, 0x80, 0x08, 0x31, 0xd5 ]
arch: "CS_ARCH_ARCH"
options: [ "arm64" ]
expected:
insns:
-
asm_text: "mrs x0, TRCRSR"
-
asm_text: "mrs x0, TRCEXTINSELR"

@@ -0,0 +1,56 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
#ifndef CAPSTONE_AARCH64_H
#define CAPSTONE_AARCH64_H
#include "cs_operand.h"
inline static unsigned AArch64CC_getNZCVToSatisfyCondCode(AArch64CC_CondCode Code)
{
// NZCV flags encoded as expected by ccmp instructions, ARMv8 ISA 5.5.7.
enum { N = 8, Z = 4, C = 2, V = 1 };
switch (Code) {
default:
assert(0 && "Unknown condition code");
case AArch64CC_EQ:
return Z; // Z == 1
}
}
typedef union {
aarch64_dbnxs dbnxs;
aarch64_exactfpimm exactfpimm;
} aarch64_sysop_imm;
typedef enum aarch64_op_type {
AArch64_OP_SYSALIAS = CS_OP_SPECIAL + 27, // Equal Equal
AArch64_OP_SYSALIASI,
AArch64_OP_SYSALIASII = 0,
AArch64_OP_SYSALIASIII, // Comment
} aarch64_op_type;
typedef enum aarch64_op_type_upper {
AARCH64_OP_SYSALIAS = CS_OP_SPECIAL + 27, // Equal Equal
AARCH64_OP_SYSALIASI,
AARCH64_OP_SYSALIASII = 0,
AARCH64_OP_SYSALIASIII, // Comment
} aarch64_op_type_upper;
#define MAX_AARCH64_OPS 8
/// Instruction structure
typedef struct cs_aarch64 {
AArch64CC_CondCode cc; ///< conditional code for this insn
bool update_flags; ///< does this insn update flags?
bool post_index; ///< only set if writeback is 'True', if 'False' pre-index, otherwise post.
bool is_doing_sme; ///< True if a SME operand is currently edited.
/// Number of operands of this instruction,
/// or 0 when instruction has no operand.
uint8_t op_count;
cs_aarch64_op operands[MAX_AARCH64_OPS]; ///< operands for this instruction.
} cs_aarch64;
#endif

@@ -0,0 +1,41 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
#ifndef CAPSTONE_ARM64_H
#define CAPSTONE_ARM64_H
#include <capstone/aarch64.h>
#include "cs_operand.h"
inline static unsigned ARM64CC_getNZCVToSatisfyCondCode(ARM64CC_CondCode Code)
{
enum { N = 8, Z = 4, C = 2, V = 1 };
switch (Code) {
default:
assert(0 && "Unknown condition code");
case ARM64CC_EQ:
return Z; // Z == 1
}
}
typedef aarch64_sysop_imm arm64_sysop_imm;
typedef enum {
ARM64_OP_SYSALIAS = AArch64_OP_SYSALIAS,
ARM64_OP_SYSALIASI = AArch64_OP_SYSALIASI,
ARM64_OP_SYSALIASII = AArch64_OP_SYSALIASII,
ARM64_OP_SYSALIASIII = AArch64_OP_SYSALIASIII,
} arm64_op_type;
typedef enum {
ARM64_OP_SYSALIAS = AARCH64_OP_SYSALIAS,
ARM64_OP_SYSALIASI = AARCH64_OP_SYSALIASI,
ARM64_OP_SYSALIASII = AARCH64_OP_SYSALIASII,
ARM64_OP_SYSALIASIII = AARCH64_OP_SYSALIASIII,
} arm64_op_type_upper;
#define MAX_ARM64_OPS 8
typedef cs_aarch64 cs_arm64;
#endif

@@ -0,0 +1,12 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
// Include the whole file
// generated content <test_include.inc> begin
// generated content <test_include.inc> end
// Include only a part of the file.
// generated content <test_include.inc:GUARD> begin
// generated content <test_include.inc:GUARD> end

@@ -0,0 +1,58 @@
# SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import unittest
from autosync.HeaderPatcher import CompatHeaderBuilder, HeaderPatcher
from autosync.Helper import get_path
class TestHeaderPatcher(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.hpatcher = HeaderPatcher(
get_path("{HEADER_PATCHER_TEST_HEADER_FILE}"),
get_path("{HEADER_PATCHER_TEST_INC_FILE}"),
write_file=False,
)
cls.compat_gen = CompatHeaderBuilder(
get_path("{HEADER_GEN_TEST_AARCH64_FILE}"),
get_path("{HEADER_GEN_TEST_ARM64_OUT_FILE}"),
)
def test_header_patching(self):
self.hpatcher.patch_header()
self.assertEqual(
self.hpatcher.patched_header_content,
(
"// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>\n"
"// SPDX-License-Identifier: BSD-3\n"
"\n"
"\n"
" // Include the whole file\n"
" // generated content <test_include.inc> begin\n"
" // clang-format off\n"
"\n"
"\tThis part should be included if the whole file is included.\n"
"\n"
" // clang-format on\n"
" // generated content <test_include.inc> end\n"
"\n"
" // Include only a part of the file.\n"
" // generated content <test_include.inc:GUARD> begin\n"
" // clang-format off\n"
"\n"
" Partial include of something\n"
"\n"
" // clang-format on\n"
" // generated content <test_include.inc:GUARD> end\n"
"\n"
),
)
def test_compat_header_gen(self):
self.compat_gen.generate_aarch64_compat_header()
with open(get_path("{HEADER_GEN_TEST_ARM64_FILE}")) as f:
correct = f.read()
with open(get_path("{HEADER_GEN_TEST_ARM64_OUT_FILE}")) as f:
self.assertEqual(f.read(), correct)


@@ -0,0 +1,11 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
#ifdef GUARD
#undef GUARD
Partial include of something
#endif
This part should be included if the whole file is included.
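
The test above fixes what the patched header must look like, but not how the markers get filled. The following is a minimal sketch of that idea, assuming a simple regex-based approach. `split_inc` and `fill_markers` are hypothetical names chosen for illustration; they are not the actual `HeaderPatcher` API, and the real implementation may work differently.

```python
import re


def split_inc(inc_text: str) -> dict[str, str]:
    """Map each guard name to its guarded body; the key "" holds the unguarded rest."""
    sections: dict[str, str] = {}

    def take_guarded(match: re.Match) -> str:
        # Remember the guarded body and remove the whole #ifdef block from the text.
        sections[match.group(1)] = match.group(2).strip("\n")
        return ""

    rest = re.sub(
        r"#ifdef (\w+)\n#undef \1\n(.*?)#endif\n",
        take_guarded,
        inc_text,
        flags=re.DOTALL,
    )
    sections[""] = rest.strip("\n")
    return sections


def fill_markers(header: str, inc_name: str, sections: dict[str, str]) -> str:
    """Replace everything between matching begin/end markers with the section text."""
    name = re.escape(inc_name)
    marker = re.compile(
        rf"(// generated content <{name}(?::(\w+))?> begin)\n"
        rf".*?(// generated content <{name}(?::\w+)?> end)",
        re.DOTALL,
    )

    def fill(match: re.Match) -> str:
        body = sections[match.group(2) or ""]
        return (
            f"{match.group(1)}\n// clang-format off\n\n"
            f"{body}\n\n// clang-format on\n{match.group(3)}"
        )

    return marker.sub(fill, header)


# Hypothetical inputs modelled on the test files shown in this commit.
inc = (
    "#ifdef GUARD\n#undef GUARD\n"
    "Partial include of something\n"
    "#endif\n"
    "Whole-file part\n"
)
header = (
    "// generated content <test_include.inc> begin\n"
    "// generated content <test_include.inc> end\n"
    "// generated content <test_include.inc:GUARD> begin\n"
    "// generated content <test_include.inc:GUARD> end\n"
)
patched = fill_markers(header, "test_include.inc", split_inc(inc))
print(patched)
```

Note that, as in the expected test output, a whole-file include only receives the text outside of any guard, while `<file:GUARD>` receives only the guarded part.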


@@ -0,0 +1,192 @@
# SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging
import os
import sys
import unittest
from pathlib import Path
from autosync.Helper import get_path, test_only_overwrite_path_var
from autosync.MCUpdater import MCUpdater
class TestMCUpdater(unittest.TestCase):
@classmethod
def setUpClass(cls):
logging.basicConfig(
level=logging.DEBUG,
stream=sys.stdout,
format="%(levelname)-5s - %(message)s",
force=True,
)
def test_test_case_gen(self):
"""
To enforce sequential execution, the test cases are run from within this
single test method instead of being registered as separate tests.
"""
self.assertTrue(self.unified_test_cases(), "Failed: unified_test_cases")
self.assertTrue(self.separated_test_cases(), "Failed: separated_test_cases")
self.assertTrue(
self.multi_mode_unified_test_cases(),
"Failed: multi_mode_unified_test_cases",
)
self.assertTrue(
self.multi_mode_separated_test_cases(),
"Failed: multi_mode_separated_test_cases",
)
def unified_test_cases(self):
out_dir = Path(
get_path("{MCUPDATER_TEST_OUT_DIR}").joinpath("merged").joinpath("unified")
)
if not out_dir.exists():
out_dir.mkdir(parents=True)
for file in out_dir.iterdir():
logging.debug(f"Delete old file: {file}")
os.remove(file)
test_only_overwrite_path_var(
"{MCUPDATER_OUT_DIR}",
out_dir,
)
self.updater = MCUpdater("ARCH", get_path("{MCUPDATER_TEST_DIR}"), [], [], True)
self.updater.gen_all()
return self.compare_files(out_dir, ["test_a.txt.yaml", "test_b.txt.yaml"])
def separated_test_cases(self):
out_dir = Path(
get_path("{MCUPDATER_TEST_OUT_DIR}")
.joinpath("merged")
.joinpath("separated")
)
if not out_dir.exists():
out_dir.mkdir(parents=True)
for file in out_dir.iterdir():
logging.debug(f"Delete old file: {file}")
os.remove(file)
test_only_overwrite_path_var(
"{MCUPDATER_OUT_DIR}",
out_dir,
)
self.updater = MCUpdater(
"ARCH", get_path("{MCUPDATER_TEST_DIR}"), [], [], False
)
self.updater.gen_all()
return self.compare_files(out_dir, ["test_a.txt.yaml", "test_b.txt.yaml"])
def multi_mode_unified_test_cases(self):
out_dir = Path(
get_path("{MCUPDATER_TEST_OUT_DIR}").joinpath("multi").joinpath("unified")
)
if not out_dir.exists():
out_dir.mkdir(parents=True)
for file in out_dir.iterdir():
logging.debug(f"Delete old file: {file}")
os.remove(file)
test_only_overwrite_path_var(
"{MCUPDATER_OUT_DIR}",
out_dir,
)
self.updater = MCUpdater(
"ARCH", get_path("{MCUPDATER_TEST_DIR}"), [], [], True, multi_mode=True
)
self.updater.gen_all()
return self.compare_files(
out_dir,
[
"test_a_aarch64_v8a__fp_armv8.txt.yaml",
"test_a_arm64_v8.2a.txt.yaml",
"test_b_arm64.txt.yaml",
],
)
def multi_mode_separated_test_cases(self):
out_dir = Path(
get_path("{MCUPDATER_TEST_OUT_DIR}").joinpath("multi").joinpath("separated")
)
if not out_dir.exists():
out_dir.mkdir(parents=True)
for file in out_dir.iterdir():
logging.debug(f"Delete old file: {file}")
os.remove(file)
test_only_overwrite_path_var(
"{MCUPDATER_OUT_DIR}",
out_dir,
)
self.updater = MCUpdater(
"ARCH", get_path("{MCUPDATER_TEST_DIR}"), [], [], False, multi_mode=True
)
self.updater.gen_all()
return self.compare_files(
out_dir,
[
"test_a_aarch64_v8a__fp_armv8.txt.yaml",
"test_a_arm64_v8.2a.txt.yaml",
"test_b_arm64.txt.yaml",
],
)
def test_no_symbol_tests(self):
out_dir = Path(get_path("{MCUPDATER_TEST_OUT_DIR}").joinpath("no_symbol"))
if not out_dir.exists():
out_dir.mkdir(parents=True)
for file in out_dir.iterdir():
logging.debug(f"Delete old file: {file}")
os.remove(file)
test_only_overwrite_path_var(
"{MCUPDATER_OUT_DIR}",
out_dir,
)
self.updater = MCUpdater(
"ARCH",
get_path("{MCUPDATER_TEST_DIR}"),
[],
[],
False,
)
self.updater.gen_all()
self.assertFalse(
out_dir.joinpath("test_no_symbol.s.txt.yaml").exists(),
"File should not exist",
)
def compare_files(self, out_dir: Path, filenames: list[str]) -> bool:
if not out_dir.is_dir():
logging.error(f"{out_dir} is not a directory.")
return False
parent_name = out_dir.parent.name
expected_dir = (
get_path("{MCUPDATER_TEST_DIR_EXPECTED}")
.joinpath(parent_name)
.joinpath(out_dir.name)
)
if not expected_dir.exists() or not expected_dir.is_dir():
logging.error(f"{expected_dir} is not a directory.")
return False
for file in filenames:
efile = expected_dir.joinpath(file)
if not efile.exists():
logging.error(f"{efile} does not exist")
return False
with open(efile) as f:
logging.debug(f"Read {efile}")
expected = f.read()
afile = out_dir.joinpath(file)
if not afile.exists():
logging.error(f"{afile} does not exist")
return False
with open(afile) as f:
logging.debug(f"Read {afile}")
actual = f.read()
if expected != actual:
logging.error("Files mismatch")
print(f"Expected: {efile}")
print(f"Actual: {afile}\n")
print(f"Expected:\n\n{expected}\n")
print(f"Actual:\n\n{actual}\n")
return False
logging.debug("OK: actual == expected")
return True
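
The generator helpers above all repeat the same steps: create the output directory, delete stale files, then compare the generated files against the expected ones. As a sketch only, this pattern could be factored out as shown below. `reset_dir` and `files_match` are hypothetical helper names and not part of the test suite; unlike the original loop, `reset_dir` skips sub-directories instead of passing them to `os.remove`.

```python
import tempfile
from pathlib import Path


def reset_dir(out_dir: Path) -> Path:
    """Create out_dir if needed and delete the regular files left from a previous run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for entry in out_dir.iterdir():
        if entry.is_file():
            entry.unlink()
    return out_dir


def files_match(actual_dir: Path, expected_dir: Path, filenames: list[str]) -> bool:
    """Return True only if every named file exists in both directories with equal text."""
    for name in filenames:
        actual, expected = actual_dir / name, expected_dir / name
        if not actual.is_file() or not expected.is_file():
            return False
        if actual.read_text() != expected.read_text():
            return False
    return True


# Self-contained demonstration in a temporary directory (file contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    expected_dir = reset_dir(Path(tmp) / "expected")
    actual_dir = reset_dir(Path(tmp) / "merged" / "unified")
    (expected_dir / "test_a.txt.yaml").write_text("test_cases: []\n")
    (actual_dir / "test_a.txt.yaml").write_text("test_cases: []\n")

    same = files_match(actual_dir, expected_dir, ["test_a.txt.yaml"])
    missing = files_match(actual_dir, expected_dir, ["test_b.txt.yaml"])

    # A second reset removes the file generated above.
    reset_dir(actual_dir)
    remaining = sorted(p.name for p in actual_dir.iterdir())
```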


@@ -0,0 +1,77 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import json
import logging as log
from pathlib import Path
import tree_sitter_cpp as ts_cpp
from tree_sitter import Language, Parser
from autosync.Helper import fail_exit
class Configurator:
"""
Holds common setup procedures for the configuration.
It reads the configuration file, loads the tree-sitter C++ language and initializes the Parser.
"""
arch: str
config_path: Path
config: dict = None
ts_cpp_lang: Language = None
parser: Parser = None
def __init__(self, arch: str, config_path: Path) -> None:
self.arch = arch
self.config_path = config_path
self.load_config()
self.ts_set_cpp_language()
self.init_parser()
def get_arch(self) -> str:
return self.arch
def get_cpp_lang(self) -> Language:
if self.ts_cpp_lang:
return self.ts_cpp_lang
self.ts_set_cpp_language()
return self.ts_cpp_lang
def get_parser(self) -> Parser:
if self.parser:
return self.parser
self.init_parser()
return self.parser
def get_arch_config(self) -> dict:
if self.config:
return self.config[self.arch]
self.load_config()
return self.config[self.arch]
def get_general_config(self) -> dict:
if self.config:
return self.config["General"]
self.load_config()
return self.config["General"]
def load_config(self) -> None:
if not self.config_path.exists():
fail_exit(f"Could not load arch config file at '{self.config_path}'")
with open(self.config_path) as f:
conf = json.loads(f.read())
if self.arch not in conf:
fail_exit(
f"{self.arch} has no configuration. Please add one to {self.config_path}!"
)
self.config = conf
def ts_set_cpp_language(self) -> None:
self.ts_cpp_lang = Language(ts_cpp.language())
def init_parser(self) -> None:
log.debug("Init parser")
self.parser = Parser()
self.parser.set_language(self.ts_cpp_lang)
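
The getters above imply a configuration file with one top-level `"General"` entry plus one entry per architecture. The sketch below illustrates that layout. The key names are the ones read elsewhere in this commit (`files_to_translate`, `files_for_template_search`, `templates_with_arg_deduction`, `manually_edited_files`, `nodes_to_diff`); all values, the file paths and the function name `load_arch_config` are assumptions made for illustration. It also raises `KeyError` instead of calling `fail_exit`, so it can run standalone.

```python
import json

# Hypothetical configuration; only the key names are taken from the code above.
EXAMPLE_CONFIG = """
{
  "General": {"nodes_to_diff": []},
  "ARCH": {
    "files_to_translate": [{"in": "path/to/Example.cpp", "out": "Example.c"}],
    "files_for_template_search": [],
    "templates_with_arg_deduction": [],
    "manually_edited_files": []
  }
}
"""


def load_arch_config(text: str, arch: str) -> tuple[dict, dict]:
    """Return (arch specific config, general config) from a JSON document."""
    conf = json.loads(text)
    if arch not in conf:
        raise KeyError(f"{arch} has no configuration.")
    return conf[arch], conf.get("General", {})


arch_conf, general_conf = load_arch_config(EXAMPLE_CONFIG, "ARCH")
out_names = [entry["out"] for entry in arch_conf["files_to_translate"]]
```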


@@ -0,0 +1,538 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import argparse
import logging as log
import sys
from pathlib import Path
import termcolor
from tree_sitter import Language, Node, Parser, Query, Tree
from autosync.cpptranslator.Configurator import Configurator
from autosync.cpptranslator.patches.AddCSDetail import AddCSDetail
from autosync.cpptranslator.patches.AddOperand import AddOperand
from autosync.cpptranslator.patches.Assert import Assert
from autosync.cpptranslator.patches.BitCastStdArray import BitCastStdArray
from autosync.cpptranslator.patches.CheckDecoderStatus import CheckDecoderStatus
from autosync.cpptranslator.patches.ClassConstructorDef import ClassConstructorDef
from autosync.cpptranslator.patches.ClassesDef import ClassesDef
from autosync.cpptranslator.patches.ConstMCInstParameter import ConstMCInstParameter
from autosync.cpptranslator.patches.ConstMCOperand import ConstMCOperand
from autosync.cpptranslator.patches.CppInitCast import CppInitCast
from autosync.cpptranslator.patches.CreateOperand0 import CreateOperand0
from autosync.cpptranslator.patches.CreateOperand1 import CreateOperand1
from autosync.cpptranslator.patches.Data import Data
from autosync.cpptranslator.patches.DeclarationInConditionClause import (
DeclarationInConditionalClause,
)
from autosync.cpptranslator.patches.DecodeInstruction import DecodeInstruction
from autosync.cpptranslator.patches.DecoderCast import DecoderCast
from autosync.cpptranslator.patches.DecoderParameter import DecoderParameter
from autosync.cpptranslator.patches.FallThrough import FallThrough
from autosync.cpptranslator.patches.FeatureBits import FeatureBits
from autosync.cpptranslator.patches.FeatureBitsDecl import FeatureBitsDecl
from autosync.cpptranslator.patches.FieldFromInstr import FieldFromInstr
from autosync.cpptranslator.patches.GetNumOperands import GetNumOperands
from autosync.cpptranslator.patches.GetOpcode import GetOpcode
from autosync.cpptranslator.patches.GetOperand import GetOperand
from autosync.cpptranslator.patches.GetOperandRegImm import GetOperandRegImm
from autosync.cpptranslator.patches.GetRegClass import GetRegClass
from autosync.cpptranslator.patches.GetRegFromClass import GetRegFromClass
from autosync.cpptranslator.patches.GetSubReg import GetSubReg
from autosync.cpptranslator.patches.Includes import Includes
from autosync.cpptranslator.patches.InlineToStaticInline import InlineToStaticInline
from autosync.cpptranslator.patches.IsOptionalDef import IsOptionalDef
from autosync.cpptranslator.patches.IsPredicate import IsPredicate
from autosync.cpptranslator.patches.IsRegImm import IsOperandRegImm
from autosync.cpptranslator.patches.LLVMFallThrough import LLVMFallThrough
from autosync.cpptranslator.patches.LLVMunreachable import LLVMUnreachable
from autosync.cpptranslator.patches.MethodToFunctions import MethodToFunction
from autosync.cpptranslator.patches.MethodTypeQualifier import MethodTypeQualifier
from autosync.cpptranslator.patches.NamespaceAnon import NamespaceAnon
from autosync.cpptranslator.patches.NamespaceArch import NamespaceArch
from autosync.cpptranslator.patches.NamespaceLLVM import NamespaceLLVM
from autosync.cpptranslator.patches.OutStreamParam import OutStreamParam
from autosync.cpptranslator.patches.Override import Override
from autosync.cpptranslator.patches.Patch import Patch
from autosync.cpptranslator.patches.PredicateBlockFunctions import (
PredicateBlockFunctions,
)
from autosync.cpptranslator.patches.PrintAnnotation import PrintAnnotation
from autosync.cpptranslator.patches.PrintRegImmShift import PrintRegImmShift
from autosync.cpptranslator.patches.QualifiedIdentifier import QualifiedIdentifier
from autosync.cpptranslator.patches.ReferencesDecl import ReferencesDecl
from autosync.cpptranslator.patches.RegClassContains import RegClassContains
from autosync.cpptranslator.patches.SetOpcode import SetOpcode
from autosync.cpptranslator.patches.SignExtend import SignExtend
from autosync.cpptranslator.patches.Size import Size
from autosync.cpptranslator.patches.SizeAssignments import SizeAssignment
from autosync.cpptranslator.patches.STIArgument import STIArgument
from autosync.cpptranslator.patches.STIFeatureBits import STIFeatureBits
from autosync.cpptranslator.patches.STParameter import SubtargetInfoParam
from autosync.cpptranslator.patches.StreamOperation import StreamOperations
from autosync.cpptranslator.patches.TemplateDeclaration import TemplateDeclaration
from autosync.cpptranslator.patches.TemplateDefinition import TemplateDefinition
from autosync.cpptranslator.patches.TemplateParamDecl import TemplateParamDecl
from autosync.cpptranslator.patches.TemplateRefs import TemplateRefs
from autosync.cpptranslator.patches.UseMarkup import UseMarkup
from autosync.cpptranslator.patches.UsingDeclaration import UsingDeclaration
from autosync.cpptranslator.TemplateCollector import TemplateCollector
from autosync.Helper import (
convert_loglevel,
get_header,
get_path,
print_prominent_warning,
run_clang_format,
)
class Translator:
ts_cpp_lang: Language = None
parser: Parser = None
template_collector: TemplateCollector = None
src_paths: [Path]
out_paths: [Path]
conf: dict
src = b""
current_src_path_in: Path = None
current_src_path_out: Path = None
tree: Tree = None
# Patch priorities: The larger the number, the later the patch is applied.
# Patches which create templates must always be executed last, because syntax
# inside macros is no longer parsed as such (it is only recognized as macro body).
#
# If a patch must be executed before another patch (because the matching rules depend on it),
# mark this dependency as shown below.
patches: [Patch] = list()
patch_priorities: {str: int} = {
RegClassContains.__name__: 0,
GetRegClass.__name__: 0,
GetRegFromClass.__name__: 0,
CppInitCast.__name__: 0,
BitCastStdArray.__name__: 0,
PrintRegImmShift.__name__: 0,
InlineToStaticInline.__name__: 0,
GetSubReg.__name__: 0,
UseMarkup.__name__: 0,
ConstMCOperand.__name__: 0,
ClassConstructorDef.__name__: 0,
ConstMCInstParameter.__name__: 0,
PrintAnnotation.__name__: 0,
GetNumOperands.__name__: 0,
STIArgument.__name__: 0,
DecodeInstruction.__name__: 0,
FallThrough.__name__: 0,
SizeAssignment.__name__: 0,
FieldFromInstr.__name__: 0,
FeatureBitsDecl.__name__: 0,
FeatureBits.__name__: 0,
STIFeatureBits.__name__: 0,
Includes.__name__: 0,
CreateOperand0.__name__: 0, # ◁───┐ `CreateOperand0` removes most calls to MI.addOperand().
AddOperand.__name__: 1, # ────────┘ The ones left are fixed with the `AddOperand` patch.
CreateOperand1.__name__: 0,
GetOpcode.__name__: 0,
SetOpcode.__name__: 0,
GetOperand.__name__: 0,
GetOperandRegImm.__name__: 0,
IsOperandRegImm.__name__: 0,
SignExtend.__name__: 0,
DecoderParameter.__name__: 0,
UsingDeclaration.__name__: 0,
DecoderCast.__name__: 0,
IsPredicate.__name__: 0,
IsOptionalDef.__name__: 0,
Assert.__name__: 0, # ◁─────────┐ The llvm_unreachable calls are replaced with asserts.
LLVMUnreachable.__name__: 1, # ─┘ Those asserts should stay.
LLVMFallThrough.__name__: 0,
DeclarationInConditionalClause.__name__: 0,
StreamOperations.__name__: 0,
OutStreamParam.__name__: 0, # ◁──────┐ add_cs_detail() is added to printOperand functions with a certain
SubtargetInfoParam.__name__: 0, # ◁──┤ signature. This signature depends on those patches.
MethodToFunction.__name__: 0, # ◁────┤
AddCSDetail.__name__: 1, # ──────────┘
NamespaceAnon.__name__: 0, # ◁─────┐ "llvm" and anonymous namespaces must be removed first,
NamespaceLLVM.__name__: 0, # ◁─────┤ so they don't match in NamespaceArch.
NamespaceArch.__name__: 1, # ──────┘
PredicateBlockFunctions.__name__: 0,
Override.__name__: 0,
Size.__name__: 0,
Data.__name__: 0,
ClassesDef.__name__: 0, # ◁────────┐ Declarations must be extracted first from the classes.
MethodTypeQualifier.__name__: 1, # ┘
# All previous patches can contain qualified identifiers (Ids with the "::" operator) in their search patterns.
# After this patch they are removed.
QualifiedIdentifier.__name__: 2,
ReferencesDecl.__name__: 3, # ◁────┐
CheckDecoderStatus.__name__: 4, # ─┘ Reference declarations must be removed first.
TemplateParamDecl.__name__: 5,
TemplateRefs.__name__: 5,
# Template declarations are replaced with macros.
# Those declarations are parsed as macros afterwards.
TemplateDeclaration.__name__: 5,
# Template definitions are replaced with macros.
# Those template functions are parsed as macros afterwards.
TemplateDefinition.__name__: 6,
}
def __init__(self, configure: Configurator, wait_for_user: bool = False):
self.configurator = configure
self.wait_for_user = wait_for_user
self.arch = self.configurator.get_arch()
self.conf = self.configurator.get_arch_config()
self.conf_general = self.configurator.get_general_config()
self.ts_cpp_lang = self.configurator.get_cpp_lang()
self.parser = self.configurator.get_parser()
self.src_paths: [Path] = [
get_path(sp["in"]) for sp in self.conf["files_to_translate"]
]
t_out_dir: Path = get_path("{CPP_TRANSLATOR_TRANSLATION_OUT_DIR}")
self.out_paths: [Path] = [
t_out_dir.joinpath(sp["out"]) for sp in self.conf["files_to_translate"]
]
self.collect_template_instances()
self.init_patches()
def read_src_file(self, src_path: Path) -> None:
"""Reads the file at src_path into self.src"""
log.debug(f"Read {src_path}")
if not src_path.exists():
log.fatal(f"Could not open the source file '{src_path}'")
exit(1)
with open(src_path) as f:
self.src = bytes(f.read(), "utf8")
def init_patches(self):
log.debug("Init patches")
priorities = dict(
sorted(self.patch_priorities.items(), key=lambda item: item[1])
)
for ptype, p in priorities.items():
match ptype:
case RegClassContains.__name__:
patch = RegClassContains(p)
case GetRegClass.__name__:
patch = GetRegClass(p)
case GetRegFromClass.__name__:
patch = GetRegFromClass(p)
case CppInitCast.__name__:
patch = CppInitCast(p)
case BitCastStdArray.__name__:
patch = BitCastStdArray(p)
case CheckDecoderStatus.__name__:
patch = CheckDecoderStatus(p)
case ReferencesDecl.__name__:
patch = ReferencesDecl(p)
case FieldFromInstr.__name__:
patch = FieldFromInstr(p)
case FeatureBitsDecl.__name__:
patch = FeatureBitsDecl(p)
case FeatureBits.__name__:
patch = FeatureBits(p, bytes(self.arch, "utf8"))
case STIFeatureBits.__name__:
patch = STIFeatureBits(p, bytes(self.arch, "utf8"))
case QualifiedIdentifier.__name__:
patch = QualifiedIdentifier(p)
case Includes.__name__:
patch = Includes(p, self.arch)
case ClassesDef.__name__:
patch = ClassesDef(p)
case CreateOperand0.__name__:
patch = CreateOperand0(p)
case CreateOperand1.__name__:
patch = CreateOperand1(p)
case GetOpcode.__name__:
patch = GetOpcode(p)
case SetOpcode.__name__:
patch = SetOpcode(p)
case GetOperand.__name__:
patch = GetOperand(p)
case SignExtend.__name__:
patch = SignExtend(p)
case TemplateDeclaration.__name__:
patch = TemplateDeclaration(p, self.template_collector)
case TemplateDefinition.__name__:
patch = TemplateDefinition(p, self.template_collector)
case DecoderParameter.__name__:
patch = DecoderParameter(p)
case TemplateRefs.__name__:
patch = TemplateRefs(p)
case TemplateParamDecl.__name__:
patch = TemplateParamDecl(p)
case MethodTypeQualifier.__name__:
patch = MethodTypeQualifier(p)
case UsingDeclaration.__name__:
patch = UsingDeclaration(p)
case NamespaceLLVM.__name__:
patch = NamespaceLLVM(p)
case DecoderCast.__name__:
patch = DecoderCast(p)
case IsPredicate.__name__:
patch = IsPredicate(p)
case IsOptionalDef.__name__:
patch = IsOptionalDef(p)
case Assert.__name__:
patch = Assert(p)
case LLVMFallThrough.__name__:
patch = LLVMFallThrough(p)
case DeclarationInConditionalClause.__name__:
patch = DeclarationInConditionalClause(p)
case OutStreamParam.__name__:
patch = OutStreamParam(p)
case MethodToFunction.__name__:
patch = MethodToFunction(p)
case GetOperandRegImm.__name__:
patch = GetOperandRegImm(p)
case StreamOperations.__name__:
patch = StreamOperations(p)
case SubtargetInfoParam.__name__:
patch = SubtargetInfoParam(p)
case SizeAssignment.__name__:
patch = SizeAssignment(p)
case NamespaceArch.__name__:
patch = NamespaceArch(p)
case NamespaceAnon.__name__:
patch = NamespaceAnon(p)
case PredicateBlockFunctions.__name__:
patch = PredicateBlockFunctions(p)
case FallThrough.__name__:
patch = FallThrough(p)
case DecodeInstruction.__name__:
patch = DecodeInstruction(p)
case STIArgument.__name__:
patch = STIArgument(p)
case GetNumOperands.__name__:
patch = GetNumOperands(p)
case AddOperand.__name__:
patch = AddOperand(p)
case PrintAnnotation.__name__:
patch = PrintAnnotation(p)
case ConstMCInstParameter.__name__:
patch = ConstMCInstParameter(p)
case LLVMUnreachable.__name__:
patch = LLVMUnreachable(p)
case ClassConstructorDef.__name__:
patch = ClassConstructorDef(p)
case ConstMCOperand.__name__:
patch = ConstMCOperand(p)
case UseMarkup.__name__:
patch = UseMarkup(p)
case GetSubReg.__name__:
patch = GetSubReg(p)
case InlineToStaticInline.__name__:
patch = InlineToStaticInline(p)
case AddCSDetail.__name__:
patch = AddCSDetail(p, self.arch)
case PrintRegImmShift.__name__:
patch = PrintRegImmShift(p)
case IsOperandRegImm.__name__:
patch = IsOperandRegImm(p)
case Override.__name__:
patch = Override(p)
case Size.__name__:
patch = Size(p)
case Data.__name__:
patch = Data(p)
case _:
log.fatal(f"Patch type {ptype} not in Patch init routine.")
exit(1)
self.patches.append(patch)
def parse(self, src_path: Path) -> None:
self.read_src_file(src_path)
log.debug("Parse source code")
self.tree = self.parser.parse(self.src, keep_text=True)
def patch_src(self, p_list: [(bytes, Node)]) -> None:
if len(p_list) == 0:
return
# Sort the patches in descending order of their start byte, so the ones
# closest to the end of the file are applied first. This way the byte offsets
# of the snippets before them, which are not patched yet, stay valid.
patches = sorted(p_list, key=lambda x: x[1].start_byte, reverse=True)
new_src = b""
patch: bytes
node: Node
for patch, node in patches:
start_byte: int = node.start_byte
old_end_byte: int = node.end_byte
start_point: (int, int) = node.start_point
old_end_point: (int, int) = node.end_point
new_src = self.src[:start_byte] + patch + self.src[old_end_byte:]
self.src = new_src
d = len(patch) - (old_end_byte - start_byte)
self.tree.edit(
start_byte=start_byte,
old_end_byte=old_end_byte,
new_end_byte=old_end_byte + d,
start_point=start_point,
old_end_point=old_end_point,
new_end_point=(old_end_point[0], old_end_point[1] + d),
)
self.tree = self.parser.parse(new_src, self.tree, keep_text=True)
def apply_patch(self, patch: Patch) -> bool:
"""Tests if the given patch should be applied for the current architecture or file."""
has_apply_only = (
len(patch.apply_only_to["files"]) > 0
or len(patch.apply_only_to["archs"]) > 0
)
has_do_not_apply = (
len(patch.do_not_apply["files"]) > 0 or len(patch.do_not_apply["archs"]) > 0
)
if not (has_apply_only or has_do_not_apply):
# Lists empty.
return True
if has_apply_only:
if self.arch in patch.apply_only_to["archs"]:
return True
elif self.current_src_path_in.name in patch.apply_only_to["files"]:
return True
return False
elif has_do_not_apply:
if self.arch in patch.do_not_apply["archs"]:
return False
elif self.current_src_path_in.name in patch.do_not_apply["files"]:
return False
return True
log.fatal("Logical error.")
exit(1)
def translate(self) -> None:
for self.current_src_path_in, self.current_src_path_out in zip(
self.src_paths, self.out_paths
):
log.info(f"Translate '{self.current_src_path_in}'")
self.parse(self.current_src_path_in)
patch: Patch
for patch in self.patches:
if not self.apply_patch(patch):
log.debug(f"Skip patch {patch.__class__.__name__}")
continue
pattern: str = patch.get_search_pattern()
# Each patch has a capture which includes the whole subtree searched for.
# Additionally, it can include captures within this subtree.
# Here we bundle these captures together.
query: Query = self.ts_cpp_lang.query(pattern)
captures_bundle: [[(Node, str)]] = list()
for q in query.captures(self.tree.root_node):
if q[1] == patch.get_main_capture_name():
# The main capture the patch is looking for.
captures_bundle.append([q])
else:
# A capture which is part of the main capture.
# Add it to the bundle.
captures_bundle[-1].append(q)
log.debug(
f"Patch {patch.__class__.__name__} (to patch: {len(captures_bundle)})."
)
p_list: [(bytes, Node)] = list()
cb: [(Node, str)]
for cb in captures_bundle:
patch_kwargs = self.get_patch_kwargs(patch)
bytes_patch: bytes = patch.get_patch(cb, self.src, **patch_kwargs)
p_list.append((bytes_patch, cb[0][0]))
self.patch_src(p_list)
if self.tree.root_node.type == "ERROR":
log.fatal(
f"Patch {patch.__class__.__name__} corrupts the tree for {self.current_src_path_in.name}!"
)
exit(1)
log.info(f"Patched file at '{self.current_src_path_out}'")
with open(self.current_src_path_out, "w") as f:
f.write(get_header())
f.write(self.src.decode("utf8"))
run_clang_format(self.out_paths)
def collect_template_instances(self):
search_paths = [get_path(p) for p in self.conf["files_for_template_search"]]
temp_arg_deduction = [
p.encode("utf8") for p in self.conf["templates_with_arg_deduction"]
]
self.template_collector = TemplateCollector(
self.parser, self.ts_cpp_lang, search_paths, temp_arg_deduction
)
self.template_collector.collect()
def get_patch_kwargs(self, patch):
if isinstance(patch, Includes):
return {"filename": self.current_src_path_in.name}
return dict()
def remark_manual_files(self) -> None:
manual_edited = self.conf["manually_edited_files"]
msg = ""
if len(manual_edited) > 0:
msg += (
termcolor.colored(
"The following files are too complex to translate! Please check them by hand.",
attrs=["bold"],
)
+ "\n"
)
else:
return
for f in manual_edited:
msg += get_path(f).name + "\n"
print_prominent_warning(msg, self.wait_for_user)
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="CppTranslator",
description="Capstone's C++ to C translator for LLVM source files",
)
parser.add_argument(
"-a",
dest="arch",
help="Name of target architecture.",
choices=["ARM", "PPC", "AArch64", "Alpha"],
required=True,
)
parser.add_argument(
"-v",
dest="verbosity",
help="Verbosity of the log messages.",
choices=["debug", "info", "warning", "fatal"],
default="info",
)
parser.add_argument(
"-c",
dest="config_path",
help="Config file for architectures.",
default="arch_config.json",
type=Path,
)
arguments = parser.parse_args()
return arguments
if __name__ == "__main__":
if not sys.hexversion >= 0x030B00F0:
log.fatal("Python >= v3.11 required.")
exit(1)
args = parse_args()
log.basicConfig(
level=convert_loglevel(args.verbosity),
stream=sys.stdout,
format="%(levelname)-5s - %(message)s",
)
configurator = Configurator(args.arch, args.config_path)
translator = Translator(configurator)
translator.translate()
translator.remark_manual_files()
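
`patch_src` applies the replacements of one patch type from the end of the file towards its beginning. A minimal sketch of why this order matters is shown below, using plain byte offsets instead of tree-sitter nodes. `splice_all` is a hypothetical helper, and the replacement strings are only illustrative; they are not claimed to be the exact output of the `GetOpcode` or `GetNumOperands` patches.

```python
def splice_all(src: bytes, edits: list[tuple[int, int, bytes]]) -> bytes:
    """Apply (start_byte, end_byte, replacement) edits given in original offsets.

    Edits are applied from the highest start offset to the lowest. A replacement
    of a different length only shifts the bytes after it, so the offsets of the
    edits which are still pending (all before it) remain valid.
    """
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        src = src[:start] + replacement + src[end:]
    return src


src = b"MI.getOpcode(); MI.getNumOperands();"
first = b"MI.getOpcode()"
second = b"MI.getNumOperands()"
edits = [
    (src.index(first), src.index(first) + len(first), b"MCInst_getOpcode(MI)"),
    (src.index(second), src.index(second) + len(second), b"MCInst_getNumOperands(MI)"),
]
result = splice_all(src, edits)
```

Applying the same edits front to back would require adjusting every later offset by the length difference of each earlier replacement.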


@@ -0,0 +1,973 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import argparse
import difflib as dl
import json
import logging as log
import subprocess
import sys
import tempfile
from enum import StrEnum
from pathlib import Path
from shutil import copy2
from tree_sitter import Language, Node, Parser, Tree
from autosync.cpptranslator.Configurator import Configurator
from autosync.Helper import (
bold,
colored,
convert_loglevel,
find_id_by_type,
get_path,
get_sha256,
print_prominent_info,
print_prominent_warning,
run_clang_format,
separator_line_1,
separator_line_2,
)
class PatchCoord:
"""Holds the coordinate information of tree-sitter nodes."""
start_byte: int
end_byte: int
start_point: tuple[int, int]
end_point: tuple[int, int]
def __init__(
self,
start_byte: int,
end_byte: int,
start_point: tuple[int, int],
end_point: tuple[int, int],
):
self.start_byte = start_byte
self.end_byte = end_byte
self.start_point = start_point
self.end_point = end_point
def __lt__(self, other):
if not (
(self.start_byte <= other.start_byte and self.end_byte <= other.end_byte)
or (self.start_byte >= other.start_byte and self.end_byte >= other.end_byte)
):
raise IndexError(
f"Coordinates overlap. No comparison possible.\n"
f"a.start_byte = {self.start_byte} a.end_byte = {self.end_byte}\n"
f"b.start_byte = {other.start_byte} b.end_byte = {other.end_byte}\n"
)
return self.end_byte < other.start_byte
def __str__(self) -> str:
return f"s: {self.start_byte} e: {self.end_byte}"
@staticmethod
def get_coordinates_from_node(node: Node):
return PatchCoord(
node.start_byte, node.end_byte, node.start_point, node.end_point
)
class ApplyType(StrEnum):
OLD = "OLD" # Apply version from old file
NEW = "NEW" # Apply version from new file (leave unchanged)
SAVED = "SAVED" # Use saved resolution
EDIT = "EDIT" # Edit patch and apply
SHOW_EDIT = "SHOW_EDIT" # Show the saved edited text.
OLD_ALL = "OLD_ALL" # Apply all versions from old file.
PREVIOUS = "PREVIOUS" # Ignore diff and go to previous
class Patch:
node_id: str
coord: PatchCoord
apply: ApplyType
old: bytes
new: bytes
edit: bytes
old_hash: str
new_hash: str
def __init__(
self,
node_id: str,
old: bytes,
new: bytes,
coord: PatchCoord,
apply: ApplyType,
edit: bytes = None,
) -> None:
if apply == ApplyType.SAVED:
raise NotImplementedError("Not yet implemented.")
self.node_id = node_id
self.apply = apply
self.coord = coord
self.old = old
self.new = new
self.edit = edit
self.old_hash = ""
self.new_hash = ""
if self.old:
self.old_hash = get_sha256(self.old)
if self.new:
self.new_hash = get_sha256(self.new)
def get_persist_info(self) -> dict:
"""Returns a dictionary with the relevant information to back up this patch."""
backup = dict()
backup[self.node_id] = dict()
backup[self.node_id]["apply_type"] = str(self.apply)
backup[self.node_id]["old_hash"] = self.old_hash
backup[self.node_id]["new_hash"] = self.new_hash
backup[self.node_id]["edit"] = self.edit.decode("utf8") if self.edit else ""
return backup
def merge(self, other) -> None:
"""
Merge two patches into one. This is necessary if two old nodes are not present in the new file
and therefore share the same PatchCoord.
"""
if other.new:
raise ValueError("This patch should not have a .new set.")
if not other.old:
raise ValueError("No data in .old")
self.old = other.old + self.old
self.old_hash = get_sha256(self.old)
def __lt__(self, other):
try:
return self.coord < other.coord
except IndexError:
raise IndexError(f"Nodes overlap: {self} - {other}")
def __str__(self) -> str:
return f"{self.node_id} @ {self.coord}"
class Differ:
"""
Diffs the newly translated C++ files against the old version.
The general diffing works like this:

The old and the new file are parsed with tree-sitter into an AST.
Then, we extract all nodes of a specific type from this AST.
Which node types are extracted is defined in "arch_config.json::General::nodes_to_diff".

These nodes (old and new separately) are then sorted descending by their coordinates.
Meaning, nodes at the end of the file come first.
The identifiers of those nodes are saved in a single list.
We iterate over this list of identifiers and decide for each one.

The node id is present as:

    old node & new node  => Text matches?
                              yes => Continue
                              no  => Add new node as Patch (see below)
    only old node        => We save all consecutive old nodes which have _no_ equivalent new node
                            and add them as a single patch
    only new node        => Add patch

Now we have the patch. A persistence file stores the previous decisions about which patch to choose.
We take the node text of the old and new node (or only of a single one) and compare it to our previous decision.
If the text of the nodes didn't change since the last run, we auto-apply the patch.
Otherwise, the user decides:

    - Choose the old node text.
    - Choose the new node text.
    - Open the editor to edit the patch and apply it.
    - Use the stored previous decision.
    - Always select the old nodes.
    - Go back and decide on the node before.

Each decision is saved to the persistence file for later.

The last (optional) step is to write the patches to the new file.
Please note that we always write to the new file in the current version.
"""
ts_cpp_lang: Language = None
parser: Parser = None
translated_files: [Path]
diff_dest_files: [Path] = list()
old_files: [Path]
conf_arch: dict
conf_general: dict
tree: Tree = None
persistence_filepath: Path
saved_patches: dict = None
patches: list[Patch]
current_patch: Patch
cur_old_node: Node | None = None
cur_new_node: Node | None = None
cur_nid: str = None
def __init__(
self,
configurator: Configurator,
no_auto_apply: bool,
testing: bool = False,
check_saved: bool = False,
):
self.configurator = configurator
self.no_auto_apply = no_auto_apply
self.arch = self.configurator.get_arch()
self.conf_arch = self.configurator.get_arch_config()
self.conf_general = self.configurator.get_general_config()
self.ts_cpp_lang = self.configurator.get_cpp_lang()
self.parser = self.configurator.get_parser()
self.differ = dl.Differ()
self.testing = testing
self.check_saved = check_saved
self.diff_out_dir = get_path("{CPP_TRANSLATOR_DIFF_OUT_DIR}")
if self.testing:
t_out_dir: Path = get_path("{DIFFER_TEST_NEW_SRC_DIR}")
self.translated_files = [
t_out_dir.joinpath(sp["out"])
for sp in self.conf_arch["files_to_translate"]
]
self.old_files = [
get_path("{DIFFER_TEST_OLD_SRC_DIR}").joinpath(sp["out"])
for sp in self.conf_arch["files_to_translate"]
]
self.load_persistence_file()
else:
t_out_dir: Path = get_path("{CPP_TRANSLATOR_TRANSLATION_OUT_DIR}")
self.translated_files = [
t_out_dir.joinpath(sp["out"])
for sp in self.conf_arch["files_to_translate"]
]
cs_arch_src: Path = get_path("{CS_ARCH_MODULE_DIR}")
cs_arch_src = cs_arch_src.joinpath(
self.arch if self.arch != "PPC" else "PowerPC"
)
self.old_files = [
cs_arch_src.joinpath(f"{cs_arch_src}/" + sp["out"])
for sp in self.conf_arch["files_to_translate"]
]
self.load_persistence_file()
if check_saved:
self.check_saved_patches()
print("Save file is up-to-date.")
def load_persistence_file(self) -> None:
if self.testing:
self.persistence_filepath = get_path("{DIFFER_TEST_PERSISTENCE_FILE}")
else:
self.persistence_filepath = get_path("{DIFFER_PERSISTENCE_FILE}")
if not self.persistence_filepath.exists():
self.saved_patches = dict()
return
with open(self.persistence_filepath, "rb") as f:
try:
self.saved_patches = json.load(f)
except json.decoder.JSONDecodeError as e:
log.fatal(
f"Persistence file {bold(self.persistence_filepath.name)} corrupt."
)
log.fatal("Delete it or fix it by hand.")
log.fatal(f"JSON Exception: {e}")
exit(1)
def save_to_persistence_file(self) -> None:
print("\nSave choices...\n")
with open(self.persistence_filepath, "w") as f:
json.dump(self.saved_patches, f, indent=2)
def persist_patch(self, filename: Path, patch: Patch) -> None:
"""
:param filename: The filename this patch is saved for.
:param patch: The patch to apply.
"""
if filename.name not in self.saved_patches:
self.saved_patches[filename.name] = dict()
log.debug(f"Save: {patch.get_persist_info()}")
self.saved_patches[filename.name].update(patch.get_persist_info())
def copy_files(self) -> None:
"""
Copy translated files to diff directory for editing.
"""
log.info("Copy files for editing")
diff_dir: Path = self.diff_out_dir
for f in self.translated_files:
dest = diff_dir.joinpath(f.name)
copy2(f, dest)
self.diff_dest_files.append(dest)
def get_diff_intro_msg(
self,
old_filename: Path,
new_filename: Path,
current: int,
total: int,
num_diffs: int,
) -> str:
color_new = self.conf_general["diff_color_new"]
color_old = self.conf_general["diff_color_old"]
return (
f"{bold(f'Diffing files - {current}/{total}')} \n\n"
+ f"{bold('NEW FILE: ', color_new)} {str(new_filename)}\n"
+ f"{bold('OLD FILE: ', color_old)} {str(old_filename)}\n\n"
+ f"{bold('Diffs to process: ')} {num_diffs}\n\n"
+ f"{bold('Changes get written to: ')} {bold('NEW FILE', color_new)}\n"
)
def get_diff_node_id(self, node: Node) -> bytes:
"""
Searches in the nodes children for the identifier node and returns its text.
"""
id_types = [""]
for n in self.conf_general["nodes_to_diff"]:
if n["node_type"] == node.type:
id_types = n["identifier_node_type"]
if not id_types:
log.fatal(
f"Diffing: Node of type {node.type} has no identifier type specified."
)
exit(1)
identifier = ""
for id_type in id_types:
identifier = find_id_by_type(node, id_type.split("/"), False)
if identifier:
break
if not identifier:
log.fatal(f'Diffing: Cannot find node type "{id_types}" in named-children.')
exit(1)
return identifier
def parse_file(self, file: Path) -> dict[str:Node]:
"""
Parse a file and return all nodes which should be diffed.
Nodes are indexed by a unique identifier.
"""
with open(file) as f:
content = bytes(f.read(), "utf8")
tree: Tree = self.parser.parse(content, keep_text=True)
node_types_to_diff = [
n["node_type"] for n in self.conf_general["nodes_to_diff"]
]
content = None
if file.suffix == ".h":
# Header file. Get the content in between the include guard
for n in tree.root_node.named_children:
if n.type == "preproc_ifdef":
content = n
break
if not content:
content = tree.root_node
duplicates = list()
nodes_to_diff = dict()
node: Node
# Get diff candidates and add them to the dict.
for node in content.named_children:
if node.type not in node_types_to_diff:
continue
identifier = self.get_diff_node_id(node).decode("utf8")
if identifier in nodes_to_diff.keys() or identifier in duplicates:
# This happens if the chosen identifier is not unique.
log.info(f"Duplicate {bold(identifier)}: Nodes will not be diffed!")
if identifier in nodes_to_diff.keys():
nodes_to_diff.pop(identifier)
duplicates.append(identifier)
continue
log.debug(f"Add node to diff: {identifier}")
nodes_to_diff[identifier] = node
return nodes_to_diff
def print_diff(self, diff_lines: list[str], node_id: str, current: int, total: int):
new_color = self.conf_general["diff_color_new"]
old_color = self.conf_general["diff_color_old"]
print(separator_line_2())
print(f"{bold('Patch:')} {current}/{total}\n")
print(f"{bold('Node:')} {node_id}")
print(f"{bold('Color:')} {colored('NEW FILE - (Just translated)', new_color)}")
print(
f"{bold('Color:')} {colored('OLD FILE - (Currently in Capstone)', old_color)}\n"
)
print(separator_line_1())
for line in diff_lines:
if line[0] == "+":
print(colored(line, new_color))
elif line[0] == "-":
print(colored(line, old_color))
elif line[0] == "?":
continue
else:
print(line)
print(separator_line_2())
@staticmethod
def no_difference(diff_lines: list[str]) -> bool:
for line in diff_lines:
if line[0] != " ":
return False
return True
def print_prompt(
self, saved_diff_present: bool = False, saved_choice: ApplyType = None
) -> str:
new_color = self.conf_general["diff_color_new"]
old_color = self.conf_general["diff_color_old"]
edited_color = self.conf_general["diff_color_edited"]
saved_selection = self.get_saved_choice_prompt(saved_diff_present, saved_choice)
choice = input(
f"Choice: {colored('O', old_color)}, {bold('o', old_color)}, {bold('n', new_color)}, "
f"{saved_selection}, {colored('e', edited_color)}, {colored('E', edited_color)}, p, q, ? > "
)
return choice
def get_saved_choice_prompt(
self, saved_diff_present: bool = False, saved_choice: ApplyType = None
):
new_color = self.conf_general["diff_color_new"]
old_color = self.conf_general["diff_color_old"]
edited_color = self.conf_general["diff_color_edited"]
saved_color = (
self.conf_general["diff_color_saved"] if saved_diff_present else "dark_grey"
)
saved_selection = f"{bold('s', saved_color)}"
if saved_choice == ApplyType.OLD:
saved_selection += f" ({colored('old', old_color)}) "
elif saved_choice == ApplyType.NEW:
saved_selection += f" ({colored('new', new_color)}) "
elif saved_choice == ApplyType.EDIT:
saved_selection += f" ({colored('edited', edited_color)}) "
elif not saved_choice:
saved_selection += f" ({colored('none', 'dark_grey')}) "
return saved_selection
def print_prompt_help(
self, saved_diff_present: bool = False, saved_choice: ApplyType = None
) -> None:
new_color = self.conf_general["diff_color_new"]
old_color = self.conf_general["diff_color_old"]
edited_color = self.conf_general["diff_color_edited"]
saved_choice = self.get_saved_choice_prompt(saved_diff_present, saved_choice)
print(
f"{colored('O', old_color)}\t\t- Accept ALL old diffs\n"
f"{bold('o', old_color)}\t\t- Accept old diff\n"
f"{bold('n', new_color)}\t\t- Accept new diff\n"
f"{colored('e', edited_color)}\t\t- Edit diff (not yet implemented)\n"
f"{saved_choice}\t- Select saved choice\n"
f"p\t\t- Ignore and go to previous diff\n"
f"q\t\t- Quit (previous selections will be saved)\n"
f"?\t\t- Show this help\n\n"
)
def get_user_choice(
self, saved_diff_present: bool, saved_choice: ApplyType
) -> ApplyType:
while True:
choice = self.print_prompt(saved_diff_present, saved_choice)
if choice not in ["O", "o", "n", "e", "E", "s", "p", "q", "?", "help"]:
print(f"{bold(choice)} is not valid.")
self.print_prompt_help(saved_diff_present, saved_choice)
continue
if choice == "q":
print(f"{bold('Quit...')}")
self.save_to_persistence_file()
exit(0)
elif choice == "o":
return ApplyType.OLD
elif choice == "n":
return ApplyType.NEW
elif choice == "O":
return ApplyType.OLD_ALL
elif choice == "e":
return ApplyType.EDIT
elif choice == "E":
return ApplyType.SHOW_EDIT
elif choice == "s":
return ApplyType.SAVED
elif choice in ["?", "help"]:
self.print_prompt_help(saved_diff_present, saved_choice)
continue
elif choice == "p":
return ApplyType.PREVIOUS
def saved_patch_matches(self, saved: dict) -> bool:
if self.cur_old_node:
old_hash = get_sha256(self.cur_old_node.text)
else:
old_hash = ""
if self.cur_new_node:
new_hash = get_sha256(self.cur_new_node.text)
else:
new_hash = ""
return saved["old_hash"] == old_hash and saved["new_hash"] == new_hash
def create_patch(
self,
coord: PatchCoord,
choice: ApplyType,
saved_patch: dict = None,
edited_text: bytes = None,
):
old = self.cur_old_node.text if self.cur_old_node else b""
new = self.cur_new_node.text if self.cur_new_node else b""
return Patch(
self.cur_nid,
old,
new,
coord,
saved_patch["apply_type"] if saved_patch else choice,
edit=edited_text,
)
def add_patch(
self,
apply_type: ApplyType,
consec_old: int,
old_filepath: Path,
patch_coord: PatchCoord,
saved_patch: dict | None = None,
edited_text: bytes | None = None,
) -> None:
self.current_patch = self.create_patch(
patch_coord, apply_type, saved_patch, edited_text
)
self.persist_patch(old_filepath, self.current_patch)
if consec_old > 1:
# Two or more old nodes are not present in the new file.
# Merge them to one patch.
self.patches[-1].merge(self.current_patch)
else:
self.patches.append(self.current_patch)
def diff_nodes(
self,
old_filepath: Path,
new_nodes: dict[bytes, Node],
old_nodes: dict[bytes, Node],
) -> list[Patch]:
"""
Asks the user for each different node, which version should be written.
It writes the choice to a file, so the previous choice can be applied again if nothing changed.
"""
# Sort list of nodes descending.
# This is necessary because
# a) we need to apply the patches backwards (starting from the end of the file,
# so the coordinates in the file don't change when text is replaced).
# b) If there is an old node, which is not present in the new file, we search for
# a node which is adjacent (random node order wouldn't allow this).
new_nodes = {
k: v
for k, v in sorted(
new_nodes.items(), key=lambda item: item[1].start_byte, reverse=True
)
}
old_nodes = {
k: v
for k, v in sorted(
old_nodes.items(), key=lambda item: item[1].start_byte, reverse=True
)
}
# Collect all node ids of this file
node_ids = set()
for new_node_id, old_node_id in zip(new_nodes.keys(), old_nodes.keys()):
node_ids.add(new_node_id)
node_ids.add(old_node_id)
# The initial patch coordinates point after the last node in the file.
n0 = new_nodes[list(new_nodes.keys())[0]]
PatchCoord(n0.end_byte, n0.end_byte, n0.end_point, n0.end_point)
node_ids = sorted(node_ids)
self.patches = list()
matching_nodes_count = 0
# Counts the number of _consecutive_ old nodes which have no equivalent new node.
# They will be merged to a single patch later
consec_old = 0
choice: ApplyType | None = None
idx = 0
while idx < len(node_ids):
self.cur_nid = node_ids[idx]
self.cur_new_node = (
None if self.cur_nid not in new_nodes else new_nodes[self.cur_nid]
)
self.cur_old_node = (
None if self.cur_nid not in old_nodes else old_nodes[self.cur_nid]
)
n = (
self.cur_new_node.text.decode("utf8").splitlines()
if self.cur_new_node
else [""]
)
o = (
self.cur_old_node.text.decode("utf8").splitlines()
if self.cur_old_node
else [""]
)
diff_lines = list(self.differ.compare(o, n))
if self.no_difference(diff_lines):
log.info(
f"{bold('Patch:')} {idx + 1}/{len(node_ids)} - Nodes {bold(self.cur_nid)} match."
)
matching_nodes_count += 1
idx += 1
continue
if self.cur_new_node:
consec_old = 0
# We always write to the new file. So we always take the coordinates from it.
patch_coord = PatchCoord.get_coordinates_from_node(self.cur_new_node)
else:
consec_old += 1
# If the old node has no equivalent new node,
# we search for the next adjacent old node which also exists in the new nodes.
# The single old node is inserted before the found new one.
old_node_ids = list(old_nodes.keys())
j = old_node_ids.index(self.cur_nid)
while j >= 0 and (old_node_ids[j] not in new_nodes.keys()):
j -= 1
ref_new: Node = (
new_nodes[old_node_ids[j]]
if old_node_ids[j] in new_nodes.keys()
else new_nodes[0]
)
ref_end_byte = ref_new.start_byte
# We always write to the new file. So we always take the coordinates from it.
patch_coord = PatchCoord(
ref_end_byte - 1,
ref_end_byte - 1,
ref_new.start_point,
ref_new.start_point,
)
save_exists = False
saved: dict | None = None
if (
old_filepath.name in self.saved_patches
and self.cur_nid in self.saved_patches[old_filepath.name]
):
saved = self.saved_patches[old_filepath.name][self.cur_nid]
save_exists = True
if self.saved_patch_matches(saved) and not self.no_auto_apply:
apply_type = ApplyType(saved["apply_type"])
self.add_patch(apply_type, consec_old, old_filepath, patch_coord)
log.info(
f"{bold('Patch:')} {idx + 1}/{len(node_ids)} - Auto apply patch for {bold(self.cur_nid)}"
)
idx += 1
continue
if choice == ApplyType.OLD_ALL:
self.add_patch(ApplyType.OLD, consec_old, old_filepath, patch_coord)
idx += 1
continue
self.print_diff(diff_lines, self.cur_nid, idx + 1, len(node_ids))
choice = self.get_user_choice(
save_exists, None if not saved else saved["apply_type"]
)
if choice == ApplyType.OLD:
if not self.cur_old_node:
# No data in old node. Skip
idx += 1
continue
self.add_patch(ApplyType.OLD, consec_old, old_filepath, patch_coord)
elif choice == ApplyType.NEW:
# Already in file. Only save patch.
self.persist_patch(old_filepath, self.create_patch(patch_coord, choice))
elif choice == ApplyType.SAVED:
if not save_exists:
print(bold("Save does not exist."))
continue
self.add_patch(
saved["apply_type"],
consec_old,
old_filepath,
patch_coord,
saved_patch=saved,
edited_text=saved["edit"].encode(),
)
elif choice == ApplyType.SHOW_EDIT:
if not saved or not saved["edit"]:
print(bold("No edited text was saved before."))
input("Press enter to continue...\n")
continue
saved_edited_text = colored(
f'\n{saved["edit"]}\n', self.conf_general["diff_color_edited"]
)
print(saved_edited_text)
input("Press enter to continue...\n")
continue
elif choice == ApplyType.OLD_ALL:
self.add_patch(ApplyType.OLD, consec_old, old_filepath, patch_coord)
elif choice == ApplyType.EDIT:
edited_text = self.edit_patch(diff_lines)
if not edited_text:
continue
self.persist_patch(
old_filepath,
self.create_patch(patch_coord, choice, edited_text=edited_text),
)
elif choice == ApplyType.PREVIOUS:
if idx == 0:
print(bold(f"There is no previous diff for {old_filepath.name}!"))
input("Press enter...")
continue
idx -= 1
continue
idx += 1
log.info(f"Number of matching nodes = {matching_nodes_count}")
return self.patches
def diff(self) -> None:
"""
Diffs certain nodes from the newly translated and old source files to each other.
The user then selects which diff should be written to the new file.
"""
# We do not write to the translated files directly.
self.copy_files()
new_file = dict()
old_file = dict()
i = 0
for old_filepath, new_filepath in zip(self.old_files, self.diff_dest_files):
new_file[i] = dict()
new_file[i]["filepath"] = new_filepath
new_file[i]["nodes"] = self.parse_file(new_filepath)
old_file[i] = dict()
old_file[i]["filepath"] = old_filepath
old_file[i]["nodes"] = self.parse_file(old_filepath)
i += 1
patches = dict()
# diff each file
for k in range(i):
old_filepath = old_file[k]["filepath"]
new_filepath = new_file[k]["filepath"]
diffs_to_process = max(len(new_file[k]["nodes"]), len(old_file[k]["nodes"]))
print_prominent_info(
self.get_diff_intro_msg(
old_filepath, new_filepath, k + 1, i, diffs_to_process
)
)
if diffs_to_process == 0:
continue
patches[new_filepath] = self.diff_nodes(
old_filepath, new_file[k]["nodes"], old_file[k]["nodes"]
)
self.patch_files(patches)
self.save_to_persistence_file()
log.info("Done")
def patch_files(self, file_patches: dict[Path, list[Patch]]) -> None:
log.info("Write patches...")
for filepath, patches in file_patches.items():
patches = sorted(patches, reverse=True)
with open(filepath, "rb") as f:
src = f.read()
patch: Patch
for patch in patches:
start_byte = patch.coord.start_byte
end_byte = patch.coord.end_byte
if patch.apply == ApplyType.OLD:
data = patch.old
elif patch.apply == ApplyType.NEW:
data = patch.new
elif patch.apply == ApplyType.EDIT:
data = patch.edit
else:
print_prominent_warning(f"No data for {patch.apply} defined.")
return
src = src[:start_byte] + data + src[end_byte:]
with open(filepath, "wb") as f:
f.write(src)
run_clang_format(list(file_patches.keys()))
return
def edit_patch(self, diff_lines: list[str]) -> bytes | None:
tmp_file = tempfile.NamedTemporaryFile(suffix=".c", delete=False)
tmp_file_name = tmp_file.name
tmp_file.writelines([line.encode() + b"\n" for line in diff_lines])
tmp_file.write(self.get_edit_explanation())
tmp_file.close()
editor = self.conf_general["patch_editor"]
try:
subprocess.run([editor, tmp_file_name])
except FileNotFoundError:
log.error(f"Could not find editor '{editor}'")
return None
edited_text = b""
with open(tmp_file_name, "rb") as tmp_file:
for line in tmp_file.readlines():
if self.get_separator_line() in line:
break
edited_text += line
tmp_file.close()
return edited_text
@staticmethod
def get_separator_line() -> bytes:
return f"// {'=' * 50}".encode()
def get_edit_explanation(self) -> bytes:
return (
f"{self.get_separator_line().decode('utf8')}\n"
"// Everything below this line will be deleted\n"
"// Edit the file to your liking. The result will be written 'as is' to the source file.\n"
).encode()
def check_saved_patches(self):
new_file = dict()
old_file = dict()
i = 0
for old_filepath, new_filepath in zip(self.old_files, self.translated_files):
new_file[i] = {
"filepath": new_filepath,
"nodes": self.parse_file(new_filepath),
}
old_file[i] = {
"filepath": old_filepath,
"nodes": self.parse_file(old_filepath),
}
i += 1
# diff each file
for k in range(i):
old_filepath = old_file[k]["filepath"]
diffs_to_process = max(len(new_file[k]["nodes"]), len(old_file[k]["nodes"]))
if diffs_to_process == 0:
continue
filename, node_id = self.all_choices_saved(
old_filepath, new_file[k]["nodes"], old_file[k]["nodes"]
)
if not node_id:
# Edge case: all nodes of the file match. Therefore no decision
# was saved, because the user was never asked.
continue
if filename or node_id:
print(
f"{get_path('{DIFFER_PERSISTENCE_FILE}').name} is not up-to-date!\n"
f"{filename} still requires a user decision for node {node_id}.\n"
f"If the file is good as it is, "
f"commit any changes to it and run the Differ again and choose 'O' to save them."
)
exit(1)
def all_choices_saved(
self, old_filepath, new_nodes, old_nodes
) -> tuple[str, str] | tuple[None, None]:
"""Returns a (filename, node_id) pair which is not saved in the save file, or (None, None) if everything is ok."""
if old_filepath.name not in self.saved_patches:
return old_filepath.name, None
new_nodes = {
k: v
for k, v in sorted(
new_nodes.items(), key=lambda item: item[1].start_byte, reverse=True
)
}
old_nodes = {
k: v
for k, v in sorted(
old_nodes.items(), key=lambda item: item[1].start_byte, reverse=True
)
}
# Collect all node ids of this file
node_ids = set()
for new_node_id, old_node_id in zip(new_nodes.keys(), old_nodes.keys()):
node_ids.add(new_node_id)
node_ids.add(old_node_id)
for self.cur_nid in node_ids:
self.cur_new_node = (
None if self.cur_nid not in new_nodes else new_nodes[self.cur_nid]
)
self.cur_old_node = (
None if self.cur_nid not in old_nodes else old_nodes[self.cur_nid]
)
if (
old_filepath.name in self.saved_patches
and self.cur_nid in self.saved_patches[old_filepath.name]
):
saved = self.saved_patches[old_filepath.name][self.cur_nid]
if not self.saved_patch_matches(saved):
return old_filepath.name, self.cur_nid
return None, None
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="Differ",
description="Diffs translated C++ files to previous version.",
)
parser.add_argument(
"-e",
dest="no_auto_apply",
help="Do not apply saved diff resolutions. Ask for every diff again.",
action="store_true",
)
parser.add_argument(
"-a",
dest="arch",
help="Name of target architecture (ignored with -t option)",
choices=["ARM", "PPC", "AArch64", "Alpha", "LoongArch"],
required=True,
)
parser.add_argument(
"-v",
dest="verbosity",
help="Verbosity of the log messages.",
choices=["debug", "info", "warning", "fatal"],
default="info",
)
parser.add_argument(
"-t", dest="testing", help="Run with test configuration.", action="store_true"
)
parser.add_argument(
"--check_saved",
dest="check_saved",
help=f"Check if the patches in {get_path('{DIFFER_PERSISTENCE_FILE}')} are up-to-date.",
action="store_true",
)
arguments = parser.parse_args()
return arguments
if __name__ == "__main__":
if not sys.hexversion >= 0x030B00F0:
log.fatal("Python >= v3.11 required.")
exit(1)
args = parse_args()
log.basicConfig(
level=convert_loglevel(args.verbosity),
stream=sys.stdout,
format="%(levelname)-5s - %(message)s",
)
if args.testing:
cfg = Configurator("ARCH", get_path("{DIFFER_TEST_CONFIG_FILE}"))
else:
cfg = Configurator(args.arch, get_path("{CPP_TRANSLATOR_CONFIG}"))
differ = Differ(
cfg, args.no_auto_apply, testing=args.testing, check_saved=args.check_saved
)
if args.check_saved:
exit(0)
try:
differ.diff()
except Exception as e:
differ.save_to_persistence_file()
raise e


@@ -0,0 +1,166 @@
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->
# C++ Translator
Capstone uses source files from LLVM to disassemble opcodes.
Because LLVM is written in C++ we must translate those files to C.
The task of the `CppTranslator` is to do just that.
The translation will not result in a completely correct C file! But it takes away most of the manual work.
## The configuration file
The configuration for each architecture is set in `arch_config.json`.
The config values have the following meaning:
- `General`: Settings valid for all architectures.
- `diff_color_new`: Color in the `Differ` for translated content.
- `diff_color_old`: Color in the `Differ` for old/current Capstone content.
- `diff_color_saved`: Color in the `Differ` for saved content.
- `diff_color_edited`: Color in the `Differ` for edited content.
- `patch_editor`: Editor to open for patch editing.
- `nodes_to_diff`: List of parse tree nodes which get diffed - *Mind the note below*.
- `node_type`: The `type` of the node to be diffed.
- `identifier_node_type`: Types of child nodes which identify the node during diffing (the identifier must be the same in the translated and the old file!). Types can be of the form `<parent-type>/<child type>`.
- `<ARCH>`: Settings valid for a specific architecture
- `files_to_translate`: A list of file paths to translate.
- `in`: *Path* to a specific source file.
- `out`: The *filename* of the translated file.
- `files_for_template_search`: List of file paths to search for calls to template functions.
- `manually_edited_files`: List of files which are too complicated to translate. The user will be warned about them.
- `templates_with_arg_deduction`: Template functions which use [argument deduction](https://en.cppreference.com/w/cpp/language/template_argument_deduction). Those templates are translated to normal functions, not macro definitions.
_Note_:
- To understand the `nodes_to_diff` setting, check out `Differ.py`.
- Paths can contain `{AUTO_SYNC_ROOT}`, `{CS_ROOT}` and `{CPP_TRANSLATOR_ROOT}`.
They are replaced with the absolute paths to those directories.
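
The placeholder replacement mentioned in the note can be pictured with the short Python sketch below. It is only an illustration: the directory values, the `resolve` helper and the sample `files_to_translate` entry are hypothetical and not taken from `auto-sync` itself.

```python
from pathlib import PurePosixPath

# Hypothetical directory layout. The real absolute paths depend on the checkout.
ROOTS = {
    "{CS_ROOT}": PurePosixPath("/work/capstone"),
    "{AUTO_SYNC_ROOT}": PurePosixPath("/work/capstone/suite/auto-sync"),
    "{CPP_TRANSLATOR_ROOT}": PurePosixPath(
        "/work/capstone/suite/auto-sync/src/autosync/cpptranslator"
    ),
}


def resolve(path_template: str) -> PurePosixPath:
    """Replace every known placeholder with its absolute directory."""
    for placeholder, root in ROOTS.items():
        path_template = path_template.replace(placeholder, str(root))
    return PurePosixPath(path_template)


# A made-up entry shaped like the `files_to_translate` description above.
entry = {"in": "{CS_ROOT}/some/llvm/ARMInstPrinter.cpp", "out": "ARMInstPrinter.c"}
resolved_in = resolve(entry["in"])
print(resolved_in)
```

Only `in` is a path and gets resolved. `out` is a plain filename and is used as is.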
## Translation process
The translation process simply searches for certain syntax and patches it.
To allow searches for complicated patterns we parse the C++ file with Tree-sitter.
Afterward we can use [pattern queries](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries)
to find the syntax we would like to patch.
Here is an overview of the procedure:
- First the source file is parsed with Tree-Sitter.
- Afterward the translator iterates over a number of patches.
For each patch we do the following.
```
Translator                                  Patch
  +---+
  |   |                                     +----+
  |   |  Request pattern to search for      |    |
  |   | ----------------------------------> |    |
  |   |                                     |    |
  |   |  Return pattern                     |    |
  |   | <---------------------------------- |    |
  |   |                                     |    |
  |   | ---+                                |    |
  |   |    | Find                           |    |
  |   |    | captures                       |    |
  |   |    | in src                         |    |
  |   | <--+                                |    |
  |   |                                     |    |
  |   |  Return captures found              |    |
  |   | ----------------------------------> |    |
  |   |                                     |    |
  |   |                                 +-- |    |
  |   |                     Use capture |   |    |
  |   |                     info to     |   |    |
  |   |                     build new   |   |    |
  |   |                     syntax str  |   |    |
  |   |                                 +-> |    |
  |   |                                     |    |
  |   |  Return new syntax string to patch  |    |
  |   | <---------------------------------- |    |
  |   |                                     |    |
  |   | ---+                                |    |
  |   |    | Replace old                    |    |
  |   |    | with new syntax                |    |
  |   |    | at all occurrences             |    |
  |   |    | in the file.                   |    |
  |   | <--+                                |    |
  |   |                                     |    |
  +---+                                     +----+
```
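
The exchange between translator and patch can be sketched in a few lines of Python. This is a simplified stand-in under loud assumptions: it uses regular expressions instead of Tree-sitter queries, and the `RegexPatch` class and both example patches are invented for illustration, not taken from the real patch set.

```python
import re
from typing import Callable


class RegexPatch:
    """Toy patch: a search pattern plus a function which builds the new syntax."""

    def __init__(self, pattern: str, build: Callable[[re.Match], str]):
        self.pattern = re.compile(pattern)
        self.build = build


def translate(src: str, patches: list[RegexPatch]) -> str:
    for patch in patches:
        # "Find captures in src" step of the diagram.
        matches = list(patch.pattern.finditer(src))
        # Replace back to front, so the offsets of earlier matches stay valid.
        for m in reversed(matches):
            src = src[: m.start()] + patch.build(m) + src[m.end() :]
    return src


patches = [
    # C has no `nullptr`.
    RegexPatch(r"\bnullptr\b", lambda m: "NULL"),
    # `static_cast<T>(x)` becomes a C-style cast.
    RegexPatch(
        r"static_cast<([^>]+)>\(([^()]*)\)",
        lambda m: f"(({m.group(1)})({m.group(2)}))",
    ),
]

cpp = "unsigned v = static_cast<unsigned>(x); void *p = nullptr;"
print(translate(cpp, patches))
```

Each patch only knows its pattern and how to build the replacement string. The search and the in-place replacement stay in the translator loop, which mirrors the split of responsibilities shown in the diagram.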
## C++ Template translation
Most of the C++ syntax is simple to translate. Unfortunately, the one exception is C++ templates.
Translating template functions and calls from C++ to C is tricky.
Since each template has a number of actual implementations we do the following.
- A template function definition is translated into a C macro.
- The template parameters get translated to the macro parameters.
- To differentiate the C implementations, the functions follow the naming pattern `fcn_[template_param_0]_[template_param_1]()`
<hr>
**Example**
This C++ template function
```cpp
template<unsigned X>
void fcn() {
unsigned a = X * 8;
}
```
becomes
```c
#define DEFINE_FCN(X) \
void fcn ## _ ## X() { \
unsigned a = X * 8; \
}
```
To define an implementation where `X = 0` we do
```c
DEFINE_FCN(0)
```
To call this implementation we call `fcn_0()`.
_(There is a special case when a template parameter is passed on to a template call. But this is explained in the code.)_
<hr>
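
The renaming of call sites (`fcn<0>()` to `fcn_0()`) can be sketched in Python as follows. This is a minimal illustration which assumes a regular expression is good enough to find the calls. The real translator works on the Tree-sitter parse tree, and the names `TEMPLATE_CALL`, `to_c_call` and `rewrite_calls` are hypothetical.

```python
import re

# Matches `name<args>(`. Nested template arguments are not handled, and a
# comparison written as `a<b>(c)` would be misread as a template call.
TEMPLATE_CALL = re.compile(r"\b(\w+)<([^<>]*)>\s*\(")


def to_c_call(match: re.Match) -> str:
    """Build `name_arg0_arg1(` from a matched `name<arg0, arg1>(`."""
    name = match.group(1)
    args = [a.strip() for a in match.group(2).split(",")]
    return "_".join([name] + args) + "("


def rewrite_calls(src: str) -> str:
    return TEMPLATE_CALL.sub(to_c_call, src)


print(rewrite_calls("fcn<0>(); decode<4, 2>(insn);"))
```

The joined name follows the `fcn_[template_param_0]_[template_param_1]()` pattern described above.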
### Enumerate template instances
In our C++ code a template function can be called with different template parameters.
For each of those calls we need to define a template implementation in C.
To do that we first scan source files for calls to template functions (`TemplateCollector.py` does this).
For each unique call we check the parameter list.
Knowing the parameter list we can now define a C function which uses exactly those parameters.
For the definition we use a macro as above.
<hr>
**Example**
Within this C++ code we see two template function calls:
```cpp
void main() {
fcn<0>();
fcn<4>();
}
```
With the knowledge that parameter `0` was passed to the template once and parameter `4` once,
we can define the implementations with the help of our `DEFINE_FCN` macro.
```c
DEFINE_FCN(0)
DEFINE_FCN(4)
```
Within the C code we can now call those with `fcn_0()` and `fcn_4()`.
<hr>
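
The enumeration step can be sketched in Python like this. It is a rough approximation under stated assumptions: a regular expression replaces the Tree-sitter query used by `TemplateCollector.py`, dependent template parameters are ignored, and the upper-cased `DEFINE_<NAME>` macro naming is inferred from the single `DEFINE_FCN` example above.

```python
import re

# Matches `name<args>(`. Nested template arguments are not handled.
TEMPLATE_CALL = re.compile(r"\b(\w+)<([^<>]*)>\s*\(")


def collect_instances(src: str) -> dict[str, list[tuple[str, ...]]]:
    """Map each template function name to its unique argument lists."""
    instances: dict[str, list[tuple[str, ...]]] = {}
    for m in TEMPLATE_CALL.finditer(src):
        args = tuple(a.strip() for a in m.group(2).split(","))
        seen = instances.setdefault(m.group(1), [])
        if args not in seen:
            # Only the first occurrence of an argument list is recorded.
            seen.append(args)
    return instances


def emit_definitions(instances: dict[str, list[tuple[str, ...]]]) -> list[str]:
    """Emit one macro invocation per unique template instance."""
    return [
        f"DEFINE_{name.upper()}({', '.join(args)})"
        for name, arg_lists in instances.items()
        for args in arg_lists
    ]


cpp = "void main() { fcn<0>(); fcn<4>(); fcn<0>(); }"
for line in emit_definitions(collect_instances(cpp)):
    print(line)
```

The repeated call `fcn<0>()` yields only one definition, because each unique parameter list needs exactly one C implementation.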


@@ -0,0 +1,346 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import re
from pathlib import Path
from tree_sitter import Language, Node, Parser, Query
from autosync.cpptranslator.patches.Helper import get_text
class TemplateRefInstance:
"""
Represents a concrete instance of a template function reference.
E.g. DecodeT2Imm7<shift, 2>
"""
name: bytes
args: bytes
args_list: list
dependent_calls = list()
# Holds the indices of the caller template parameters which set the template parameters
# of this TemplateRefInstance.
# Structure: {caller_name: i, "self_i": k}
#
# Only used if this is an incomplete TemplateRefInstance
# (parameters are set by the template parameters of the calling function).
caller_param_indices: [{str: int}] = list()
def __init__(
self, name: bytes, args: bytes, start_point, start_byte, end_point, end_byte
):
self.name = name
self.args = args
self.start_point = start_point
self.start_byte = start_byte
self.end_point = end_point
self.end_byte = end_byte
self.args_list = TemplateCollector.templ_params_to_list(args)
self.templ_name = name + args
def __eq__(self, other):
return (
self.name == other.name
and self.args == other.args
and any(
[
a == b
for a, b in zip(
self.caller_param_indices, other.caller_param_indices
)
]
)
and self.start_byte == other.start_byte
and self.start_point == other.start_point
and self.end_byte == other.end_byte
and self.end_point == other.end_point
)
def set_dep_calls(self, deps: list):
self.dependent_calls = deps
def get_c_name(self):
return b"_".join([self.name] + self.args_list)
def get_args_for_decl(self) -> list[bytes]:
"""Returns the list of arguments, but removes all single quotes,
because they can not be part of a C identifier.
"""
args_list = [re.sub(b"'", b"", a) for a in self.args_list]
return args_list
class TemplateCollector:
"""
Searches through the given files for calls to template functions.
And creates a list with concrete template instances.
"""
# List of completed template instances indexed by their name.
# One function can have multiple template instances. Depending on the template arguments
template_refs: {bytes: [TemplateRefInstance]} = dict()
# List of incomplete template instances indexed by the **function name they depend on**!
incomplete_template_refs: {bytes: [TemplateRefInstance]} = dict()
sources: [{str: bytes}] = list()
def __init__(
self,
ts_parser: Parser,
ts_cpp: Language,
searchable_files: [Path],
temp_arg_deduction: [bytes],
):
self.parser = ts_parser
self.lang_cpp = ts_cpp
self.searchable_files = searchable_files
self.templates_with_arg_deduction = temp_arg_deduction
def collect(self):
self.read_files()
for x in self.sources:
path = x["path"]
src = x["content"]
log.debug(f"Search for template references in {path}")
tree = self.parser.parse(src, keep_text=True)
query: Query = self.lang_cpp.query(self.get_template_pattern())
capture_bundles = self.get_capture_bundles(query, tree)
for cb in capture_bundles:
templ_name: Node = cb[1][0]
templ_args: Node = cb[2][0]
name = get_text(src, templ_name.start_byte, templ_name.end_byte)
args = get_text(src, templ_args.start_byte, templ_args.end_byte)
ti = TemplateRefInstance(
name,
args,
cb[0][0].start_point,
cb[0][0].start_byte,
cb[0][0].end_point,
cb[0][0].end_byte,
)
log.debug(
f"Found new template ref: {name.decode('utf8')}{args.decode('utf8')}"
)
if not self.contains_template_dependent_param(src, ti, cb[0]):
if name not in self.template_refs:
self.template_refs[name] = list()
# The template function has no parameter which is part of a previous
# template definition. So all template parameters are well-defined.
# Add it to the well-defined list.
if ti not in self.template_refs[name]:
self.template_refs[name].append(ti)
self.resolve_dependencies()

    def resolve_dependencies(self):
        # Resolve dependencies of templates until nothing new was resolved.
        prev_len = 0
        while (
            len(self.incomplete_template_refs) > 0
            and len(self.incomplete_template_refs) != prev_len
        ):
            # Remember how many callers were unresolved before this pass.
            # If the pass resolves nothing, the length is unchanged and the loop ends.
            prev_len = len(self.incomplete_template_refs)
            # Dict with new template calls which were previously incomplete
            # because one or more parameters were unknown.
            new_completed_tcs: {str: list} = dict()
            tc_instance_list: [TemplateRefInstance]
            for caller_name, tc_instance_list in self.template_refs.items():
                # Check if this caller has a dependent template call.
                # In other words: If a template parameter of this caller is given
                # to another template call in the caller's body.
                if caller_name not in self.incomplete_template_refs:
                    # Not in the dependency list. Skip it.
                    continue
                # For each configuration of template parameters we complete a template reference.
                for caller_template in tc_instance_list:
                    incomplete_tc: TemplateRefInstance
                    for incomplete_tc in self.incomplete_template_refs[caller_name]:
                        new_tc: TemplateRefInstance = self.get_completed_tc(
                            caller_template, incomplete_tc
                        )
                        callee_name = new_tc.name
                        if callee_name not in new_completed_tcs:
                            new_completed_tcs[callee_name] = list()
                        if new_tc not in new_completed_tcs[callee_name]:
                            new_completed_tcs[callee_name].append(new_tc)
                del self.incomplete_template_refs[caller_name]

            for templ_name, tc_list in new_completed_tcs.items():
                if templ_name in self.template_refs:
                    self.template_refs[templ_name] += tc_list
                else:
                    self.template_refs[templ_name] = tc_list
        if len(self.incomplete_template_refs) > 0:
            log.info(
                f"Unresolved template calls: {self.incomplete_template_refs.keys()}. Patch them by hand!"
            )

    @staticmethod
    def get_completed_tc(
        tc: TemplateRefInstance, itc: TemplateRefInstance
    ) -> TemplateRefInstance:
        # Copy the incomplete call. The arguments follow the constructor order
        # (name, args, start_point, start_byte, end_point, end_byte).
        new_tc = TemplateRefInstance(
            itc.name,
            itc.args,
            itc.start_point,
            itc.start_byte,
            itc.end_point,
            itc.end_byte,
        )
        for indices in itc.caller_param_indices:
            if tc.name not in indices:
                # Index of other caller function. Skip.
                continue
            caller_i = indices[tc.name]
            self_i = indices["self_i"]
            new_tc.args_list[self_i] = tc.args_list[caller_i]
        new_tc.args = TemplateCollector.list_to_templ_params(new_tc.args_list)
        new_tc.templ_name = new_tc.name + new_tc.args
        return new_tc

    def contains_template_dependent_param(
        self, src, ti: TemplateRefInstance, parse_tree: (Node, str)
    ) -> bool:
        """Check whether one of the template parameters of the given template call
        is a parameter of the caller's template definition.

        Let's assume we find the template call `func_B<X>()`.
        Now look at the context `func_B<X>` is in:

            template<X>
            void func_A() {
                func_B<X>(a)
            }

        Since `X` is a template parameter of `func_A` we have to wait until we see a call
        to `func_A<X>` where `X` gets properly defined.
        Until then we save the TemplateRefInstance of `func_B<X>` in a list of incomplete
        template calls and note that it depends on `func_A`.

        If later a call to function `func_A` is found (with a concrete value for `X`) we can add
        a concrete TemplateRefInstance of `func_B`.

        :param src: The current source code to operate on.
        :param ti: The TemplateRefInstance for which to check dependencies.
        :param parse_tree: The parse tree of the template call.
        :return: True if a dependency was found. False otherwise.
        """
        # Search up to the function definition this call belongs to
        node: Node = parse_tree[0]
        while node.type != "function_definition":
            node = node.parent
        if not node.prev_named_sibling.type == "template_parameter_list":
            # Caller is a normal function definition.
            # Nothing to do here.
            return False

        caller_fcn_id = node.named_children[2].named_children[0]
        caller_fcn_name = get_text(
            src, caller_fcn_id.start_byte, caller_fcn_id.end_byte
        )
        caller_templ_params = get_text(
            src, node.prev_sibling.start_byte, node.prev_sibling.end_byte
        )
        pl = TemplateCollector.templ_params_to_list(caller_templ_params)
        has_parameter_dependency = False
        for i, param in enumerate(pl):
            if param in ti.args_list:
                has_parameter_dependency = True
                ti.caller_param_indices.append(
                    {caller_fcn_name: i, "self_i": ti.args_list.index(param)}
                )

        if not has_parameter_dependency:
            return False

        if caller_fcn_name not in self.incomplete_template_refs:
            self.incomplete_template_refs[caller_fcn_name] = list()
        if ti not in self.incomplete_template_refs[caller_fcn_name]:
            self.incomplete_template_refs[caller_fcn_name].append(ti)
        return True

    def read_files(self):
        for sf in self.searchable_files:
            if not Path.exists(sf):
                log.fatal(f"TemplateCollector: Could not find '{sf}' for search.")
                exit(1)
            log.debug(f"TemplateCollector: Read {sf}")
            with open(sf) as f:
                file = {"path": sf, "content": bytes(f.read(), "utf8")}
                self.sources.append(file)

    @staticmethod
    def get_capture_bundles(query, tree):
        # Group the flat capture list: every "templ_ref" capture opens a new
        # bundle, all following captures belong to that bundle.
        captures_bundle: [[(Node, str)]] = list()
        for q in query.captures(tree.root_node):
            if q[1] == "templ_ref":
                captures_bundle.append([q])
            else:
                captures_bundle[-1].append(q)
        return captures_bundle

    @staticmethod
    def get_template_pattern():
        """
        :return: A pattern which finds template function calls or references.
        """
        return (
            "(template_function"
            "    ((identifier) @name)"
            "    ((template_argument_list) @templ_args)"
            ") @templ_ref"
        )

    @staticmethod
    def templ_params_to_list(templ_params: bytes) -> list[bytes]:
        if not templ_params:
            return list()

        params = templ_params.strip(b"<>").split(b",")
        params = [p.strip() for p in params]
        res = list()
        for p in params:
            if len(p.split(b" ")) == 2:
                # Typename specified for parameter. Remove it.
                # If it was more than one space, it is likely an operation like `size + 1`
                p = p.split(b" ")[1]
            # true and false get resolved to 1 and 0.
            # The parameters are bytes, so compare against bytes literals.
            if p == b"true":
                p = b"1"
            elif p == b"false":
                p = b"0"
            res.append(p)
        return res

    @staticmethod
    def list_to_templ_params(temp_param_list: list) -> bytes:
        return b"<" + b", ".join(temp_param_list) + b">"

    @staticmethod
    def get_macro_c_call(name: bytes, arg_param_list: [bytes], fcn_args: bytes = b""):
        # Builds nested CONCAT(name, CONCAT(arg0, arg1)) calls, followed by fcn_args.
        res = b""
        fa = [name] + arg_param_list
        for x in fa[:-1]:
            res += b"CONCAT(" + x + b", "
        res += fa[-1]
        return res + (b")" * (len(fa) - 1)) + fcn_args

    @staticmethod
    def log_missing_ref_and_exit(func_ref: bytes) -> None:
        log.fatal(
            f"Template collector has no reference for {func_ref}.\n\n"
            f"The possible reasons are:\n"
            "\t\t\t- Not all C++ source files which call this function are listed in the config.\n"
            "\t\t\t- You removed the C++ template syntax from the .td file for this function.\n"
            "\t\t\t- The function is a template with argument deduction and has no `template<...>` preamble. "
            "Add it in the config as exception in this case."
        )
        exit(1)
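
The dependency handling in `contains_template_dependent_param` and `resolve_dependencies` can be pictured with a much smaller model. The following is a minimal sketch under stated assumptions, not the collector itself: it uses plain tuples instead of `TemplateRefInstance`, and the names `resolve`, `known` and `pending` are hypothetical. It keeps passing over the pending callers until a pass resolves nothing new, which is the behavior the comment in `resolve_dependencies` describes.

```python
def resolve(known, pending):
    """Propagate concrete template arguments from callers to their callees.

    known:   {template name: [tuple of concrete args, ...]}
    pending: {caller name: [(callee name, callee args, {callee idx: caller idx})]}
    Both shapes are simplifications invented for this sketch.
    """
    known = {name: list(args) for name, args in known.items()}
    pending = dict(pending)
    progress = True
    while pending and progress:
        progress = False
        for caller in list(pending):
            if caller not in known:
                # No concrete instance of this caller was seen (yet).
                continue
            for callee, callee_args, index_map in pending.pop(caller):
                for caller_args in list(known[caller]):
                    args = list(callee_args)
                    for callee_i, caller_i in index_map.items():
                        # Replace the dependent parameter with the concrete value.
                        args[callee_i] = caller_args[caller_i]
                    completed = tuple(args)
                    if completed not in known.setdefault(callee, []):
                        known[callee].append(completed)
            progress = True
    return known, pending


# func_A<4> and func_A<8> were seen. func_A calls func_B<X>, func_B calls func_C<X, 1>.
# func_D is never instantiated, so its callee stays unresolved.
known, pending = resolve(
    {"func_A": [("4",), ("8",)]},
    {
        "func_A": [("func_B", ("X",), {0: 0})],
        "func_B": [("func_C", ("X", "1"), {0: 0})],
        "func_D": [("func_E", ("T",), {0: 0})],
    },
)
print(known["func_C"])
print(list(pending))
```

Whatever is left in `pending` at the end corresponds to the template calls that the collector reports as needing a patch by hand.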

@@ -0,0 +1,28 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
#include "some_header_I.h"
#include <some_system_header.h>
#define MACRO_A 0
#define MACRO_B 0
#define FCN_MACRO_A(x) function_a(x)
#define FCN_MACRO_B(x) \
function_b(x + 1)
int main() {
int x = 10000 * 71;
return x;
}
void function_a(int x) {
return;
}
void function_b(unsigned x) {
return;
}
void only_in_new() {}

@@ -0,0 +1,29 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
#include "some_header_I.h"
#include <some_system_header.h>
#define MACRO_A 0
#define MACRO_B 0
#define FCN_MACRO_A(x) function_a(x)
#define FCN_MACRO_B(x) \
function_b(x)
int main() {
int x = 71;
return x;
}
void function_a(int x) {
return;
}
void function_b(int x) {
return;
}
void only_in_old_I() {}
void only_in_old_II() {}
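
The two versions of `diff_test_file.c` above differ in three ways: some nodes exist only in the old file, one exists only in the new file, and some exist in both with different text. As a rough sketch of that classification, assuming nodes are keyed by their identifier as the `nid` values in the Differ test suggest, the helper below (`categorize` is a hypothetical name, and plain strings stand in for tree-sitter nodes) sorts the ids into those groups.

```python
def categorize(old_nodes, new_nodes):
    """Sort node ids into the groups a differ has to treat differently."""
    only_old = sorted(old_nodes.keys() - new_nodes.keys())
    only_new = sorted(new_nodes.keys() - old_nodes.keys())
    changed = sorted(
        nid
        for nid in old_nodes.keys() & new_nodes.keys()
        if old_nodes[nid] != new_nodes[nid]
    )
    return only_old, only_new, changed


# Function definitions taken from the two test files, whitespace normalized.
old_nodes = {
    "main": "int main() { int x = 71; return x; }",
    "function_a": "void function_a(int x) { return; }",
    "function_b": "void function_b(int x) { return; }",
    "only_in_old_I": "void only_in_old_I() {}",
    "only_in_old_II": "void only_in_old_II() {}",
}
new_nodes = {
    "main": "int main() { int x = 10000 * 71; return x; }",
    "function_a": "void function_a(int x) { return; }",
    "function_b": "void function_b(unsigned x) { return; }",
    "only_in_new": "void only_in_new() {}",
}

only_old, only_new, changed = categorize(old_nodes, new_nodes)
print(only_old, only_new, changed)
```

Only the nodes in these three groups need a decision about which version to keep; `function_a` is identical in both files and drops out.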

@@ -0,0 +1,35 @@
{
"General": {
"diff_color_new": "green",
"diff_color_old": "light_blue",
"diff_color_saved": "yellow",
"diff_color_edited": "light_magenta",
"patch_editor": "vim",
"nodes_to_diff": [
{
"node_type": "function_definition",
"identifier_node_type": ["function_declarator/identifier"]
},{
"node_type": "preproc_function_def",
"identifier_node_type": ["identifier"]
},{
"node_type": "preproc_include",
"identifier_node_type": ["string_literal", "system_lib_string"]
},{
"node_type": "preproc_define",
"identifier_node_type": ["identifier"]
}
]
},
"ARCH": {
"files_to_translate": [
{
"in": "{DIFFER_TEST_OLD_SRC_DIR}/diff_test_file.c",
"out": "diff_test_file.c"
}
],
"files_for_template_search": [],
"templates_with_arg_deduction": [],
"manually_edited_files": []
}
}

@@ -0,0 +1,7 @@
// SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
int main() {
tfunction<int, int>();
tfunction<int, char>();
}
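
This small source file exists so that the template collector finds two instances of `tfunction`. To illustrate what gets collected, here is a simplified sketch that swaps the tree-sitter query for a regular expression; the regex, `collect_template_refs` and the result layout are assumptions made for this example and are far less robust than the real parser-based search.

```python
import re

# Matches `name<args>(` for template calls without nested angle brackets.
TEMPLATE_CALL = re.compile(rb"(\w+)\s*<([^<>]*)>\s*\(")


def collect_template_refs(src: bytes) -> dict:
    """Map each template name to the distinct argument lists it is called with."""
    refs = {}
    for name, args in TEMPLATE_CALL.findall(src):
        arg_list = [a.strip() for a in args.split(b",")]
        instances = refs.setdefault(name, [])
        if arg_list not in instances:
            # Keep every configuration of template arguments only once.
            instances.append(arg_list)
    return refs


src = b"int main() { tfunction<int, int>(); tfunction<int, char>(); tfunction<int, int>(); }"
refs = collect_template_refs(src)
print(refs)
```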

@@ -0,0 +1,16 @@
{
"General": {
"diff_color_new": "green",
"diff_color_old": "light_blue",
"diff_color_saved": "yellow",
"diff_color_edited": "light_magenta",
"patch_editor": "vim",
"nodes_to_diff": []
},
"ARCH": {
"files_to_translate": [],
"files_for_template_search": ["{PATCHES_TEST_DIR}/template_src.c"],
"templates_with_arg_deduction": [],
"manually_edited_files": []
}
}

@@ -0,0 +1,106 @@
# SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import unittest
from tree_sitter import Node
from autosync.cpptranslator.Configurator import Configurator
from autosync.cpptranslator.Differ import ApplyType, Differ, Patch, PatchCoord
from autosync.cpptranslator.TemplateCollector import TemplateCollector
from autosync.Helper import get_path


class TestHeaderPatcher(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        configurator = Configurator("ARCH", get_path("{DIFFER_TEST_CONFIG_FILE}"))
        cls.ts_cpp_lang = configurator.get_cpp_lang()
        cls.parser = configurator.get_parser()
        cls.template_collector = TemplateCollector(
            configurator.get_parser(), configurator.get_cpp_lang(), [], []
        )
        cls.differ = Differ(configurator, testing=True, no_auto_apply=True)

    def check_persistence(self, nid, expected, apply_type, edited_text):
        new_node: Node = self.new_nodes[nid] if nid in self.new_nodes else None
        old_node: Node = self.old_nodes[nid] if nid in self.old_nodes else None
        if not new_node:
            before_old_node = old_node.start_byte - 1
            coord = PatchCoord(
                before_old_node,
                before_old_node,
                (before_old_node, before_old_node),
                (before_old_node, before_old_node),
            )
        else:
            coord = PatchCoord(
                new_node.start_byte,
                new_node.end_byte,
                new_node.start_point,
                new_node.end_point,
            )
        patch = Patch(
            node_id=nid,
            old=old_node.text if old_node else b"",
            new=new_node.text if new_node else b"",
            coord=coord,
            apply=apply_type,
            edit=edited_text,
        )
        self.assertEqual(patch.get_persist_info(), expected)

    def parse_files(self, filename: str):
        self.old_nodes = self.differ.parse_file(
            get_path("{DIFFER_TEST_OLD_SRC_DIR}").joinpath(filename)
        )
        self.new_nodes = self.differ.parse_file(
            get_path("{DIFFER_TEST_NEW_SRC_DIR}").joinpath(filename)
        )

    def test_patch_persistence(self):
        self.parse_files("diff_test_file.c")

        nid = "function_b"
        expected = {
            f"{nid}": {
                "apply_type": "OLD",
                "edit": "aaaaaaa",
                "new_hash": "e5b3e0e5c6fb1f5f39e5725e464e6dfa3c6a7f1a8a5d104801e1fc10b6f1cc2b",
                "old_hash": "8fc2b2123209c37534bb60c8e38564ed773430b9fc5bca37a0ae73a64b2883ab",
            }
        }
        edited_text: bytes = b"aaaaaaa"
        self.check_persistence(nid, expected, ApplyType.OLD, edited_text)

        nid = "only_in_old_I"
        expected = {
            f"{nid}": {
                "apply_type": "NEW",
                "edit": "",
                "new_hash": "",
                "old_hash": "37431b6fe6707794a8e07902bef6510fc1d10b833db9b1dccc70b1530997b2b1",
            }
        }
        self.check_persistence(nid, expected, ApplyType.NEW, b"")

        self.assertRaises(
            NotImplementedError,
            self.check_persistence,
            nid=nid,
            expected=expected,
            apply_type=ApplyType.SAVED,
            edited_text=b"",
        )

        nid = "function_b"
        expected = {
            f"{nid}": {
                "apply_type": "EDIT",
                "edit": "aaaaaaa\n\n\n\n\n91928",
                "new_hash": "e5b3e0e5c6fb1f5f39e5725e464e6dfa3c6a7f1a8a5d104801e1fc10b6f1cc2b",
                "old_hash": "8fc2b2123209c37534bb60c8e38564ed773430b9fc5bca37a0ae73a64b2883ab",
            }
        }
        edited_text: bytes = b"aaaaaaa\n\n\n\n\n91928"
        self.check_persistence(nid, expected, ApplyType.EDIT, edited_text)
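
The expected dictionaries in this test show the shape of a persisted patch decision: the chosen apply type, the edited text, and one hash each for the old and new node text (an empty string when the node does not exist). The sketch below rebuilds that shape under explicit assumptions. `ApplyKind` and `persist_info` are hypothetical stand-ins for `ApplyType` and `Patch.get_persist_info`, and SHA-256 is only a guess based on the 64 hex digits of the hashes above; the real implementation is not shown in this diff.

```python
import hashlib
from enum import Enum


class ApplyKind(Enum):
    """Hypothetical stand-in for the ApplyType values used in the test."""

    OLD = 1
    NEW = 2
    EDIT = 3


def persist_info(node_id: str, old: bytes, new: bytes, kind: ApplyKind, edit: bytes = b""):
    """Build a JSON-friendly record of a decision made for one node."""

    def digest(text: bytes) -> str:
        # Assumption: SHA-256 of the node text, empty string for a missing node.
        return hashlib.sha256(text).hexdigest() if text else ""

    return {
        node_id: {
            "apply_type": kind.name,
            "edit": edit.decode("utf8"),
            "new_hash": digest(new),
            "old_hash": digest(old),
        }
    }


info = persist_info("only_in_old_I", b"void only_in_old_I() {}", b"", ApplyKind.NEW)
print(info)
```

Storing hashes instead of the text itself keeps the saved file small while still making it possible to notice when either side of a node changed since the decision was recorded.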

@@ -0,0 +1,581 @@
#!/usr/bin/env python3
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-FileCopyrightText: 2024 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import unittest
from tree_sitter import Node, Query
from autosync.cpptranslator import CppTranslator
from autosync.cpptranslator.Configurator import Configurator
from autosync.cpptranslator.patches.AddCSDetail import AddCSDetail
from autosync.cpptranslator.patches.AddOperand import AddOperand
from autosync.cpptranslator.patches.Assert import Assert
from autosync.cpptranslator.patches.BitCastStdArray import BitCastStdArray
from autosync.cpptranslator.patches.CheckDecoderStatus import CheckDecoderStatus
from autosync.cpptranslator.patches.ClassesDef import ClassesDef
from autosync.cpptranslator.patches.ConstMCInstParameter import ConstMCInstParameter
from autosync.cpptranslator.patches.ConstMCOperand import ConstMCOperand
from autosync.cpptranslator.patches.CppInitCast import CppInitCast
from autosync.cpptranslator.patches.CreateOperand0 import CreateOperand0
from autosync.cpptranslator.patches.CreateOperand1 import CreateOperand1
from autosync.cpptranslator.patches.Data import Data
from autosync.cpptranslator.patches.DeclarationInConditionClause import (
DeclarationInConditionalClause,
)
from autosync.cpptranslator.patches.DecodeInstruction import DecodeInstruction
from autosync.cpptranslator.patches.DecoderCast import DecoderCast
from autosync.cpptranslator.patches.DecoderParameter import DecoderParameter
from autosync.cpptranslator.patches.FallThrough import FallThrough
from autosync.cpptranslator.patches.FeatureBits import FeatureBits
from autosync.cpptranslator.patches.FeatureBitsDecl import FeatureBitsDecl
from autosync.cpptranslator.patches.FieldFromInstr import FieldFromInstr
from autosync.cpptranslator.patches.GetNumOperands import GetNumOperands
from autosync.cpptranslator.patches.GetOpcode import GetOpcode
from autosync.cpptranslator.patches.GetOperand import GetOperand
from autosync.cpptranslator.patches.GetOperandRegImm import GetOperandRegImm
from autosync.cpptranslator.patches.GetRegClass import GetRegClass
from autosync.cpptranslator.patches.GetRegFromClass import GetRegFromClass
from autosync.cpptranslator.patches.GetSubReg import GetSubReg
from autosync.cpptranslator.patches.Includes import Includes
from autosync.cpptranslator.patches.InlineToStaticInline import InlineToStaticInline
from autosync.cpptranslator.patches.IsOptionalDef import IsOptionalDef
from autosync.cpptranslator.patches.IsPredicate import IsPredicate
from autosync.cpptranslator.patches.IsRegImm import IsOperandRegImm
from autosync.cpptranslator.patches.LLVMFallThrough import LLVMFallThrough
from autosync.cpptranslator.patches.LLVMunreachable import LLVMUnreachable
from autosync.cpptranslator.patches.Override import Override
from autosync.cpptranslator.patches.MethodToFunctions import MethodToFunction
from autosync.cpptranslator.patches.MethodTypeQualifier import MethodTypeQualifier
from autosync.cpptranslator.patches.NamespaceAnon import NamespaceAnon
from autosync.cpptranslator.patches.NamespaceArch import NamespaceArch
from autosync.cpptranslator.patches.NamespaceLLVM import NamespaceLLVM
from autosync.cpptranslator.patches.OutStreamParam import OutStreamParam
from autosync.cpptranslator.patches.PredicateBlockFunctions import (
PredicateBlockFunctions,
)
from autosync.cpptranslator.patches.PrintAnnotation import PrintAnnotation
from autosync.cpptranslator.patches.PrintRegImmShift import PrintRegImmShift
from autosync.cpptranslator.patches.QualifiedIdentifier import QualifiedIdentifier
from autosync.cpptranslator.patches.ReferencesDecl import ReferencesDecl
from autosync.cpptranslator.patches.RegClassContains import RegClassContains
from autosync.cpptranslator.patches.SetOpcode import SetOpcode
from autosync.cpptranslator.patches.SignExtend import SignExtend
from autosync.cpptranslator.patches.Size import Size
from autosync.cpptranslator.patches.SizeAssignments import SizeAssignment
from autosync.cpptranslator.patches.STIArgument import STIArgument
from autosync.cpptranslator.patches.STIFeatureBits import STIFeatureBits
from autosync.cpptranslator.patches.STParameter import SubtargetInfoParam
from autosync.cpptranslator.patches.StreamOperation import StreamOperations
from autosync.cpptranslator.patches.TemplateDeclaration import TemplateDeclaration
from autosync.cpptranslator.patches.TemplateDefinition import TemplateDefinition
from autosync.cpptranslator.patches.TemplateParamDecl import TemplateParamDecl
from autosync.cpptranslator.patches.TemplateRefs import TemplateRefs
from autosync.cpptranslator.patches.UseMarkup import UseMarkup
from autosync.cpptranslator.patches.UsingDeclaration import UsingDeclaration
from autosync.cpptranslator.TemplateCollector import TemplateCollector
from autosync.Helper import get_path


class TestPatches(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        configurator = Configurator("ARCH", get_path("{PATCHES_TEST_CONFIG}"))
        cls.translator = CppTranslator.Translator(configurator, False)
        cls.ts_cpp_lang = configurator.get_cpp_lang()
        cls.parser = configurator.get_parser()
        cls.template_collector = TemplateCollector(
            configurator.get_parser(), configurator.get_cpp_lang(), [], []
        )

    def check_patching_result(self, patch, syntax, expected, filename=""):
        if filename:
            kwargs = {"filename": filename}
        else:
            kwargs = self.translator.get_patch_kwargs(patch)
        query: Query = self.ts_cpp_lang.query(patch.get_search_pattern())
        captures_bundle: [[(Node, str)]] = list()
        for q in query.captures(self.parser.parse(syntax, keep_text=True).root_node):
            if q[1] == patch.get_main_capture_name():
                captures_bundle.append([q])
            else:
                captures_bundle[-1].append(q)

        self.assertGreater(len(captures_bundle), 0)

        for cb in captures_bundle:
            self.assertEqual(patch.get_patch(cb, syntax, **kwargs), expected)

    def test_addcsdetail(self):
        patch = AddCSDetail(0, "ARCH")
        syntax = b"int i = x; void printThumbLdrLabelOperand(MCInst *MI, unsigned OpNo, SStream *O) { int i = OpNo; }"
        self.check_patching_result(
            patch,
            syntax,
            b"void printThumbLdrLabelOperand(MCInst *MI, unsigned OpNo, SStream *O){ "
            b"add_cs_detail(MI, ARCH_OP_GROUP_ThumbLdrLabelOperand, OpNo); "
            b"int i = OpNo; "
            b"}",
        )

    def test_addoperand(self):
        patch = AddOperand(0)
        syntax = b"MI.addOperand(OPERAND)"
        self.check_patching_result(
            patch,
            syntax,
            b"MCInst_addOperand2(MI, (OPERAND))",
        )

    def test_assert(self):
        patch = Assert(0)
        syntax = b"assert(0 == 0)"
        self.check_patching_result(patch, syntax, b"")

    def test_bitcaststdarray(self):
        patch = BitCastStdArray(0)
        syntax = b"auto S = bit_cast<std::array<int32_t, 2>>(Imm);"
        self.check_patching_result(
            patch,
            syntax,
            b"union {\n"
            b" typeof(Imm) In;\n"
            b" int32_t Out[ 2];\n"
            b"} U_S;\n"
            b"U_S.In = Imm"
            b";\n"
            b"int32_t *S = U_S.Out;",
        )

    def test_checkdecoderstatus(self):
        patch = CheckDecoderStatus(0)
        syntax = b"Check(S, functions())"
        self.check_patching_result(patch, syntax, b"Check(&S, functions())")

    def test_classesdef(self):
        patch = ClassesDef(0)
        syntax = b"""class AArch64Disassembler : public MCDisassembler {
std::unique_ptr<const MCInstrInfo> const MCII;
public:
AArch64Disassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
MCInstrInfo const *MCII)
: MCDisassembler(STI, Ctx), MCII(MCII) {}
~AArch64Disassembler() override = default;
MCDisassembler::DecodeStatus
getInstruction(MCInst &Instr, uint64_t &Size, ArrayRef<uint8_t> Bytes,
uint64_t Address, raw_ostream &CStream) const override;
uint64_t suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
uint64_t Address) const override;
};
"""
        self.check_patching_result(
            patch,
            syntax,
            b"MCDisassembler::DecodeStatus\n"
            b" getInstruction(MCInst &Instr, uint64_t &Size, ArrayRef<uint8_t> Bytes,\n"
            b" uint64_t Address, raw_ostream &CStream) const override;\n"
            b"uint64_t suggestBytesToSkip(ArrayRef<uint8_t> Bytes,\n"
            b" uint64_t Address) const override;\n",
        )

    def test_constmcinstparameter(self):
        patch = ConstMCInstParameter(0)
        syntax = b"void function(const MCInst *MI);"
        expected = b"MCInst *MI"
        self.check_patching_result(patch, syntax, expected)

    def test_constmcoperand(self):
        patch = ConstMCOperand(0)
        syntax = b"const MCOperand op = { 0 };"
        self.check_patching_result(patch, syntax, b"MCOperand op = { 0 };")

    def test_cppinitcast(self):
        patch = CppInitCast(0)
        syntax = b"int(0x0000)"
        self.check_patching_result(patch, syntax, b"((int)(0x0000))")

    def test_createoperand0(self):
        patch = CreateOperand0(0)
        syntax = b"Inst.addOperand(MCOperand::createReg(REGISTER));"
        self.check_patching_result(
            patch,
            syntax,
            b"MCOperand_CreateReg0(Inst, (REGISTER))",
        )

    def test_createoperand1(self):
        patch = CreateOperand1(0)
        syntax = b"MI.insert(I, MCOperand::createReg(REGISTER));"
        self.check_patching_result(
            patch,
            syntax,
            b"MCInst_insert0(MI, I, MCOperand_CreateReg1(MI, (REGISTER)))",
        )

    def test_data(self):
        patch = Data(0)
        syntax = b"Bytes.data()"
        self.check_patching_result(patch, syntax, b"Bytes")

    def test_declarationinconditionclause(self):
        patch = DeclarationInConditionalClause(0)
        syntax = b"if (int i = 0) {}"
        self.check_patching_result(patch, syntax, b"int i = 0;\nif (i)\n{}")

    def test_decodeinstruction(self):
        patch = DecodeInstruction(0)
        syntax = (
            b"decodeInstruction(DecoderTableThumb16, MI, Insn16, Address, this, STI);"
        )
        self.check_patching_result(
            patch,
            syntax,
            b"decodeInstruction_2(DecoderTableThumb16, MI, Insn16, Address, NULL)",
        )

        syntax = b"decodeInstruction(Table[i], MI, Insn16, Address, this, STI);"
        self.check_patching_result(
            patch,
            syntax,
            b"decodeInstruction_2(Table[i], MI, Insn16, Address, NULL)",
        )

    def test_decodercast(self):
        patch = DecoderCast(0)
        syntax = (
            b"const MCDisassembler *Dis = static_cast<const MCDisassembler*>(Decoder);"
        )
        self.check_patching_result(patch, syntax, b"")

    def test_decoderparameter(self):
        patch = DecoderParameter(0)
        syntax = b"void function(const MCDisassembler *Decoder);"
        self.check_patching_result(patch, syntax, b"const void *Decoder")

    def test_fallthrough(self):
        patch = FallThrough(0)
        syntax = b"[[fallthrough]]"
        self.check_patching_result(patch, syntax, b"// fall through")

    def test_featurebitsdecl(self):
        patch = FeatureBitsDecl(0)
        syntax = b"const FeatureBitset &FeatureBits = ((const MCDisassembler*)Decoder)->getSubtargetInfo().getFeatureBits();"
        self.check_patching_result(patch, syntax, b"")

    def test_featurebits(self):
        patch = FeatureBits(0, b"ARCH")
        syntax = b"bool hasD32 = featureBits[ARCH::HasV8Ops];"
        self.check_patching_result(
            patch,
            syntax,
            b"ARCH_getFeatureBits(Inst->csh->mode, ARCH::HasV8Ops)",
        )

    def test_fieldfrominstr(self):
        patch = FieldFromInstr(0)
        syntax = b"unsigned Rm = fieldFromInstruction(Inst16, 0, 4);"
        self.check_patching_result(
            patch,
            syntax,
            b"fieldFromInstruction_2(Inst16, 0, 4)",
        )

        syntax = b"void function(MCInst *MI, unsigned Val) { unsigned Rm = fieldFromInstruction(Val, 0, 4); }"
        self.check_patching_result(
            patch,
            syntax,
            b"fieldFromInstruction_4(Val, 0, 4)",
        )

    def test_getnumoperands(self):
        patch = GetNumOperands(0)
        syntax = b"MI.getNumOperands();"
        self.check_patching_result(patch, syntax, b"MCInst_getNumOperands(MI)")

    def test_getopcode(self):
        patch = GetOpcode(0)
        syntax = b"Inst.getOpcode();"
        self.check_patching_result(patch, syntax, b"MCInst_getOpcode(Inst)")

    def test_getoperand(self):
        patch = GetOperand(0)
        syntax = b"MI.getOperand(0);"
        self.check_patching_result(patch, syntax, b"MCInst_getOperand(MI, (0))")

    def test_getoperandregimm(self):
        patch = GetOperandRegImm(0)
        syntax = b"OPERAND.getReg()"
        self.check_patching_result(patch, syntax, b"MCOperand_getReg(OPERAND)")

    def test_getregclass(self):
        patch = GetRegClass(0)
        syntax = b"MRI.getRegClass(RegClass);"
        expected = b"MCRegisterInfo_getRegClass(Inst->MRI, RegClass)"
        self.check_patching_result(patch, syntax, expected)

    def test_getregfromclass(self):
        patch = GetRegFromClass(0)
        syntax = b"ARCHMCRegisterClasses[ARCH::FPR128RegClassID].getRegister(RegNo);"
        self.check_patching_result(
            patch,
            syntax,
            b"ARCHMCRegisterClasses[ARCH::FPR128RegClassID].RegsBegin[RegNo]",
        )

    def test_getsubreg(self):
        patch = GetSubReg(0)
        syntax = b"MRI.getSubReg(REGISTER);"
        self.check_patching_result(
            patch,
            syntax,
            b"MCRegisterInfo_getSubReg(Inst->MRI, REGISTER)",
        )

    def test_includes(self):
        patch = Includes(0, "TEST_ARCH")
        syntax = b'#include "some_llvm_header.h"'
        self.check_patching_result(
            patch,
            syntax,
            b"#include <stdio.h>\n"
            b"#include <string.h>\n"
            b"#include <stdlib.h>\n"
            b"#include <capstone/platform.h>\n\n"
            b"test_output",
            "filename",
        )

    def test_inlinetostaticinline(self):
        patch = InlineToStaticInline(0)
        syntax = b"inline void FUNCTION() {}"
        self.check_patching_result(
            patch,
            syntax,
            b"static inline void FUNCTION() {}",
        )

    def test_isoptionaldef(self):
        patch = IsOptionalDef(0)
        syntax = b"OpInfo[i].isOptionalDef()"
        self.check_patching_result(
            patch,
            syntax,
            b"MCOperandInfo_isOptionalDef(&OpInfo[i])",
        )

    def test_ispredicate(self):
        patch = IsPredicate(0)
        syntax = b"OpInfo[i].isPredicate()"
        self.check_patching_result(
            patch,
            syntax,
            b"MCOperandInfo_isPredicate(&OpInfo[i])",
        )

    def test_isregimm(self):
        patch = IsOperandRegImm(0)
        syntax = b"OPERAND.isReg()"
        self.check_patching_result(patch, syntax, b"MCOperand_isReg(OPERAND)")

    def test_llvmfallthrough(self):
        patch = LLVMFallThrough(0)
        syntax = b"LLVM_FALLTHROUGH;"
        self.check_patching_result(patch, syntax, b"")

    def test_llvmunreachable(self):
        patch = LLVMUnreachable(0)
        syntax = b'llvm_unreachable("Error msg")'
        self.check_patching_result(patch, syntax, b'assert(0 && "Error msg")')

    def test_methodtofunctions(self):
        patch = MethodToFunction(0)
        syntax = b"void CLASS::METHOD_NAME(int a) {}"
        self.check_patching_result(patch, syntax, b"METHOD_NAME(int a)")

    def test_methodtypequalifier(self):
        patch = MethodTypeQualifier(0)
        syntax = b"void a_const_method() const {}"
        self.check_patching_result(patch, syntax, b"a_const_method()")

    def test_namespaceanon(self):
        patch = NamespaceAnon(0)
        syntax = b"namespace { int a = 0; }"
        self.check_patching_result(patch, syntax, b" int a = 0; ")

    def test_namespacearch(self):
        patch = NamespaceArch(0)
        syntax = b"namespace ArchSpecificNamespace { int a = 0; }"
        self.check_patching_result(
            patch,
            syntax,
            b"// CS namespace begin: ArchSpecificNamespace\n\n"
            b"int a = 0;\n\n"
            b"// CS namespace end: ArchSpecificNamespace\n\n",
        )

    def test_namespacellvm(self):
        patch = NamespaceLLVM(0)
        syntax = b"namespace llvm {int a = 0}"
        self.check_patching_result(patch, syntax, b"int a = 0")

    def test_outstreamparam(self):
        patch = OutStreamParam(0)
        syntax = b"void function(int a, raw_ostream &OS);"
        self.check_patching_result(patch, syntax, b"(int a, SStream *OS)")

    def test_override(self):
        patch = Override(0)
        syntax = b"class a { void function(int a) override; };"
        self.check_patching_result(patch, syntax, b"function(int a)")

    def test_predicateblockfunctions(self):
        patch = PredicateBlockFunctions(0)
        syntax = b"void function(MCInst *MI) { VPTBlock.instrInVPTBlock(); }"
        self.check_patching_result(
            patch,
            syntax,
            b"VPTBlock_instrInVPTBlock(&(MI->csh->VPTBlock))",
        )
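
    # Renamed from a second `test_predicateblockfunctions`, which shadowed the
    # PredicateBlockFunctions test defined before it.
    def test_printannotation(self):
        patch = PrintAnnotation(0)
        syntax = b"printAnnotation();"
        self.check_patching_result(patch, syntax, b"")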

    def test_printregimmshift(self):
        patch = PrintRegImmShift(0)
        syntax = b"printRegImmShift(0)"
        self.check_patching_result(patch, syntax, b"printRegImmShift(Inst, 0)")

    def test_qualifiedidentifier(self):
        patch = QualifiedIdentifier(0)
        syntax = b"NAMESPACE::ID"
        self.check_patching_result(patch, syntax, b"NAMESPACE_ID")

    def test_referencesdecl(self):
        patch = ReferencesDecl(0)
        syntax = b"int &Param = 0;"
        self.check_patching_result(patch, syntax, b"*Param")

    def test_regclasscontains(self):
        patch = RegClassContains(0)
        syntax = b"if (MRI.getRegClass(AArch64::GPR32RegClassID).contains(Reg)) {}"
        self.check_patching_result(
            patch,
            syntax,
            b"MCRegisterClass_contains(MRI.getRegClass(AArch64::GPR32RegClassID), Reg)",
        )

    def test_setopcode(self):
        patch = SetOpcode(0)
        syntax = b"Inst.setOpcode(0)"
        self.check_patching_result(patch, syntax, b"MCInst_setOpcode(Inst, (0))")

    def test_signextend(self):
        patch = SignExtend(0)
        syntax = b"SignExtend32<A>(0)"
        self.check_patching_result(patch, syntax, b"SignExtend32((0), A)")

    def test_size(self):
        patch = Size(0)
        syntax = b"Bytes.size()"
        self.check_patching_result(patch, syntax, b"BytesLen")

    def test_sizeassignments(self):
        patch = SizeAssignment(0)
        syntax = b"void function(int &Size) { Size = 0; }"
        self.check_patching_result(patch, syntax, b"*Size = 0")

    def test_stiargument(self):
        patch = STIArgument(0)
        syntax = b"printSomeOperand(MI, NUM, STI, NUM)"
        self.check_patching_result(patch, syntax, b"(MI, NUM, NUM)")

    def test_stifeaturebits(self):
        patch = STIFeatureBits(0, b"ARCH")
        syntax = b"STI.getFeatureBits()[ARCH::FLAG];"
        self.check_patching_result(
            patch,
            syntax,
            b"ARCH_getFeatureBits(Inst->csh->mode, ARCH::FLAG)",
        )
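
    # Renamed from a second `test_stifeaturebits`, which shadowed the
    # STIFeatureBits test defined before it.
    def test_stparameter(self):
        patch = SubtargetInfoParam(0)
        syntax = b"void function(MCSubtargetInfo &STI);"
        self.check_patching_result(patch, syntax, b"()")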

    def test_streamoperation(self):
        patch = StreamOperations(0)
        syntax = b"{ OS << 'a'; }"
        self.check_patching_result(patch, syntax, b'SStream_concat0(OS, "a");\n')

        syntax = b'{ OS << "aaaa" << "bbbb" << "cccc"; }'
        self.check_patching_result(
            patch,
            syntax,
            b'SStream_concat(OS, "%s%s", "aaaa", "bbbb");\nSStream_concat0(OS, "cccc");',
        )

        syntax = b'{ OS << "aaaa" << \'a\' << "cccc"; }'
        self.check_patching_result(
            patch,
            syntax,
            b'SStream_concat(OS, "%s", "aaaa");\n'
            b"SStream_concat1(OS, 'a');\n"
            b'SStream_concat0(OS, "cccc");',
        )

    def test_templatedeclaration(self):
        patch = TemplateDeclaration(0, self.template_collector)
        syntax = b"template<A, B> void tfunction();"
        self.check_patching_result(
            patch,
            syntax,
            b"#define DECLARE_tfunction(A, B) \\\n"
            b" void CONCAT(tfunction, CONCAT(A, B))();\n"
            b"DECLARE_tfunction(int, int);\n"
            b"DECLARE_tfunction(int, char);\n",
        )

    def test_templatedefinition(self):
        patch = TemplateDefinition(0, self.template_collector)
        syntax = b"template<A, B> void tfunction() {}"
        self.check_patching_result(
            patch,
            syntax,
            b"#define DEFINE_tfunction(A, B) \\\n"
            b" void CONCAT(tfunction, CONCAT(A, B))(){}\n"
            b"DEFINE_tfunction(int, int);\n"
            b"DEFINE_tfunction(int, char);\n",
        )

    def test_templateparamdecl(self):
        patch = TemplateParamDecl(0)
        syntax = b"void function(ArrayRef<uint8_t> x);"
        self.check_patching_result(patch, syntax, b"const uint8_t *x, size_t xLen")

    def test_templaterefs(self):
        patch = TemplateRefs(0)
        syntax = b"TemplateFunction<A, B>();"
        self.check_patching_result(
            patch,
            syntax,
            b"CONCAT(TemplateFunction, CONCAT(A, B))",
        )

    def test_usemarkup(self):
        patch = UseMarkup(0)
        syntax = b"UseMarkup()"
        self.check_patching_result(patch, syntax, b"getUseMarkup()")

    def test_usingdecl(self):
        patch = UsingDeclaration(0)
        syntax = b"using namespace llvm;"
        self.check_patching_result(patch, syntax, b"")
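
Several of the template tests above expect names of the form `CONCAT(TemplateFunction, CONCAT(A, B))`. The following standalone sketch mirrors the two static helpers of `TemplateCollector` that appear to produce this form, rewritten as free functions so they run without tree-sitter. It is an illustration rather than the shipped code, and it compares `true` and `false` against bytes literals because the parameters are bytes.

```python
def templ_params_to_list(templ_params: bytes) -> list:
    """Split `<unsigned N, true>` into its parameter names or values."""
    if not templ_params:
        return []
    res = []
    for p in templ_params.strip(b"<>").split(b","):
        p = p.strip()
        if len(p.split(b" ")) == 2:
            # `typename name`: keep only the name.
            p = p.split(b" ")[1]
        # true and false are resolved to 1 and 0.
        if p == b"true":
            p = b"1"
        elif p == b"false":
            p = b"0"
        res.append(p)
    return res


def get_macro_c_call(name: bytes, arg_param_list: list, fcn_args: bytes = b"") -> bytes:
    """Nest the name and every template argument into CONCAT() calls."""
    parts = [name] + arg_param_list
    res = b""
    for x in parts[:-1]:
        res += b"CONCAT(" + x + b", "
    res += parts[-1]
    return res + b")" * (len(parts) - 1) + fcn_args


params = templ_params_to_list(b"<unsigned N, true>")
call = get_macro_c_call(b"TemplateFunction", [b"A", b"B"])
plain = get_macro_c_call(b"f", [], b"(MI)")
print(params, call, plain)
```

Nesting `CONCAT` this way lets the C preprocessor build one distinct function name per combination of template arguments, which is how a C++ template instance is mapped to a plain C function.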

@@ -0,0 +1,154 @@
{
"General": {
"diff_color_new": "green",
"diff_color_old": "light_blue",
"diff_color_saved": "yellow",
"diff_color_edited": "light_magenta",
"patch_editor": "vim",
"nodes_to_diff": [
{
"node_type": "function_definition",
"identifier_node_type": ["function_declarator/identifier"]
},{
"node_type": "preproc_function_def",
"identifier_node_type": ["identifier"]
},{
"node_type": "preproc_include",
"identifier_node_type": ["string_literal", "system_lib_string"]
},{
"node_type": "preproc_define",
"identifier_node_type": ["identifier"]
}
]
},
"ARM": {
"files_to_translate": [
{
"in": "{LLVM_ROOT}/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp",
"out": "ARMDisassembler.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/ARM/MCTargetDesc/ARMInstPrinter.cpp",
"out": "ARMInstPrinter.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/ARM/MCTargetDesc/ARMInstPrinter.h",
"out": "ARMInstPrinter.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/ARM/MCTargetDesc/ARMAddressingModes.h",
"out": "ARMAddressingModes.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/ARM/Utils/ARMBaseInfo.cpp",
"out": "ARMBaseInfo.c"
}
],
"files_for_template_search": [
"{CPP_INC_OUT_DIR}/ARMGenDisassemblerTables.inc",
"{CPP_INC_OUT_DIR}/ARMGenAsmWriter.inc",
"{LLVM_ROOT}/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp",
"{LLVM_ROOT}/llvm/lib/Target/ARM/MCTargetDesc/ARMInstPrinter.cpp"
],
"templates_with_arg_deduction": [],
"manually_edited_files": [
"{LLVM_ROOT}/llvm/lib/Target/ARM/Utils/ARMBaseInfo.h"
]
},
"PPC": {
"files_to_translate": [
{
"in": "{LLVM_ROOT}/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp",
"out": "PPCDisassembler.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp",
"out": "PPCInstPrinter.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.h",
"out": "PPCInstPrinter.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h",
"out": "PPCMCTargetDesc.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/PowerPC/MCTargetDesc/PPCPredicates.h",
"out": "PPCPredicates.h"
}
],
"files_for_template_search": [
"{CPP_INC_OUT_DIR}/PPCGenDisassemblerTables.inc",
"{LLVM_ROOT}/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp"
],
"templates_with_arg_deduction": [
"decodeRegisterClass"
],
"manually_edited_files": [
"{LLVM_ROOT}/llvm/lib/Target/PowerPC/PPCInstrInfo.h",
"{LLVM_ROOT}/llvm/lib/Target/PowerPC/PPCRegisterInfo.h"
]
},
"AArch64": {
"files_to_translate": [
{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp",
"out": "AArch64Disassembler.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp",
"out": "AArch64InstPrinter.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.h",
"out": "AArch64InstPrinter.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h",
"out": "AArch64AddressingModes.h"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.cpp",
"out": "AArch64BaseInfo.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h",
"out": "AArch64BaseInfo.h"
}
],
"files_for_template_search": [
"{CPP_INC_OUT_DIR}/AArch64GenDisassemblerTables.inc",
"{CPP_INC_OUT_DIR}/AArch64GenAsmWriter.inc",
"{LLVM_ROOT}/llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp",
"{LLVM_ROOT}/llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp"
],
"templates_with_arg_deduction": [
"printImmSVE",
"printAMIndexedWB",
"isSVECpyImm",
"isSVEAddSubImm",
"printVectorIndex"
],
"manually_edited_files": []
},
"Alpha": {
"files_to_translate": [],
"files_for_template_search": [
"{CPP_INC_OUT_DIR}/AlphaGenDisassemblerTables.inc",
"{CPP_INC_OUT_DIR}/AlphaGenAsmWriter.inc"
],
"templates_with_arg_deduction": [],
"manually_edited_files": []
},
"LoongArch": {
"files_to_translate": [
{
"in": "{LLVM_ROOT}/llvm/lib/Target/LoongArch/Disassembler/LoongArchDisassembler.cpp",
"out": "LoongArchDisassembler.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchInstPrinter.cpp",
"out": "LoongArchInstPrinter.c"
},{
"in": "{LLVM_ROOT}/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchInstPrinter.h",
"out": "LoongArchInstPrinter.h"
}
],
"files_for_template_search": [
"{CPP_INC_OUT_DIR}/LoongArchGenDisassemblerTables.inc",
"{CPP_INC_OUT_DIR}/LoongArchGenAsmWriter.inc",
"{LLVM_ROOT}/llvm/lib/Target/LoongArch/Disassembler/LoongArchDisassembler.cpp",
"{LLVM_ROOT}/llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchInstPrinter.cpp"
],
"templates_with_arg_deduction": [
],
"manually_edited_files": []
}
}

View File

@@ -0,0 +1,131 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import re
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import (
get_MCInst_var_name,
get_text,
template_param_list_to_dict,
)
from autosync.cpptranslator.patches.Patch import Patch
class AddCSDetail(Patch):
"""
Adds calls to `add_cs_detail()` for printOperand functions in <ARCH>InstPrinter.c
Patch void printThumbLdrLabelOperand(MCInst *MI, unsigned OpNo, SStream *O) {...}
to void printThumbLdrLabelOperand(MCInst *MI, unsigned OpNo, SStream *O) {
add_cs_detail(MI, ARM_OP_GROUP_ThumbLdrLabelOperand, ...);
...
}
"""
# TODO Simply checking for the passed types would be so much nicer.
# Parameter lists of printOperand() functions we need to add `add_cs_detail()` to.
# Spaces are removed, so we only need to check the letters.
valid_param_lists = [
b"(MCInst*MI,unsignedOpNum,SStream*O)", # Default printOperand parameters.
b"(MCInst*MI,unsignedOpNo,SStream*O)", # ARM - printComplexRotationOp / PPC default
b"(SStream*O,ARM_AM::ShiftOpcShOpc,unsignedShImm,boolgetUseMarkup())", # ARM - printRegImmShift
b"(MCInst*MI,unsignedOpNo,SStream*O,constchar*Modifier)", # PPC - printPredicateOperand
b"(MCInst*MI,uint64_tAddress,unsignedOpNo,SStream*O)", # PPC - printBranchOperand
]
def __init__(self, priority: int, arch: str):
super().__init__(priority)
self.arch = arch
self.apply_only_to = {
"files": [
"ARMInstPrinter.cpp",
"PPCInstPrinter.cpp",
"AArch64InstPrinter.cpp",
"LoongArchInstPrinter.cpp",
],
"archs": list(),
}
def get_search_pattern(self) -> str:
return (
"(function_definition"
" (_)+"
" (function_declarator"
' ((identifier) @fcn_id (#match? @fcn_id "print.*"))'
" ((parameter_list) @p_list)"
" )"
" (compound_statement) @comp_stmt"
") @print_op"
)
def get_main_capture_name(self) -> str:
return "print_op"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
fcn_def: Node = captures[0][0]
params = captures[2][0]
params = get_text(src, params.start_byte, params.end_byte)
if re.sub(b"[\n \t]", b"", params) not in self.valid_param_lists:
return get_text(src, fcn_def.start_byte, fcn_def.end_byte)
fcn_id = captures[1][0]
fcn_id = get_text(src, fcn_id.start_byte, fcn_id.end_byte)
add_cs_detail = self.get_add_cs_detail(src, fcn_def, fcn_id, params)
comp = captures[3][0]
comp = get_text(src, comp.start_byte, comp.end_byte)
return b"void " + fcn_id + params + b"{ " + add_cs_detail + comp.strip(b"{")
def get_add_cs_detail(
self, src: bytes, fcn_def: Node, fcn_id: bytes, params: bytes
) -> bytes:
op_group_enum = (
self.arch.encode("utf8") + b"_OP_GROUP_" + fcn_id[5:]
) # Remove "print" from function id
is_template = fcn_def.prev_sibling.type == "template_parameter_list"
op_num_var_name = (
b"OpNum"
if b"OpNum" in params
else (b"OpNo" if b"OpNo" in params else b"-.-")
)
if not is_template and op_num_var_name in params:
# Standard printOperand() parameters
mcinst_var = get_MCInst_var_name(src, fcn_def)
return (
b"add_cs_detail("
+ mcinst_var
+ b", "
+ op_group_enum
+ b", "
+ op_num_var_name
+ b");"
)
elif op_group_enum == b"ARM_OP_GROUP_RegImmShift":
return b"add_cs_detail(MI, " + op_group_enum + b", ShOpc, ShImm);"
elif is_template and op_num_var_name in params:
mcinst_var = get_MCInst_var_name(src, fcn_def)
templ_p = template_param_list_to_dict(fcn_def.prev_sibling)
cs_args = b""
for tp in templ_p:
op_group_enum = (
b"CONCAT(" + op_group_enum + b", " + tp["identifier"] + b")"
)
cs_args += b", " + tp["identifier"]
return (
b"add_cs_detail("
+ mcinst_var
+ b", "
+ op_group_enum
+ b", "
+ op_num_var_name
+ b" "
+ cs_args
+ b");"
)
log.fatal(f"Case {op_group_enum} not handled.")
exit(1)
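The byte-string handling in `AddCSDetail` can be sketched without tree-sitter. This is a minimal illustration, not the patch itself: the helper names `normalize_param_list` and `op_group_enum` are hypothetical and only mirror the `re.sub(...)` check and the `fcn_id[5:]` slice above.

```python
import re


def normalize_param_list(params: bytes) -> bytes:
    """Drop spaces, tabs and newlines so parameter lists compare by content."""
    return re.sub(rb"[\n \t]", b"", params)


def op_group_enum(arch: str, fcn_id: bytes) -> bytes:
    """Build <ARCH>_OP_GROUP_<Name> from a print<Name> function id."""
    # Hypothetical helper: strips the leading "print", like fcn_id[5:] above.
    return arch.encode("utf8") + b"_OP_GROUP_" + fcn_id[len(b"print"):]


params = b"(MCInst *MI, unsigned OpNo,\n\tSStream *O)"
normalized = normalize_param_list(params)
enum_name = op_group_enum("ARM", b"printThumbLdrLabelOperand")
# The statement that would be placed at the start of the function body.
call = b"add_cs_detail(MI, " + enum_name + b", OpNo);"
```

Only parameter lists whose normalized form appears in `valid_param_lists` are patched; every other `print*` function is returned unchanged.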

View File

@@ -0,0 +1,41 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class AddOperand(Patch):
"""
Patch MI.addOperand(...)
to MCInst_addOperand2(MI, ...)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression "
" (field_expression"
" ((identifier) @inst_var)"
' ((field_identifier) @field_id_op (#eq? @field_id_op "addOperand"))'
" )"
" ((argument_list) @arg_list)"
") @add_operand"
)
return q
def get_main_capture_name(self) -> str:
return "add_operand"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get instruction variable name (MI, Inst)
inst_var: Node = captures[1][0]
# Arguments of addOperand(...)
get_op_args = captures[3][0]
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
args = get_text(src, get_op_args.start_byte, get_op_args.end_byte)
return b"MCInst_addOperand2(" + inst + b", " + args + b")"

View File

@@ -0,0 +1,31 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class Assert(Patch):
"""
Patch Remove asserts
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(expression_statement"
" (call_expression"
' ((identifier) @id (#eq? @id "assert"))'
" (argument_list)"
" )"
") @assert"
)
def get_main_capture_name(self) -> str:
return "assert"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b""

View File

@@ -0,0 +1,84 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class BitCastStdArray(Patch):
"""
Patch auto S = bit_cast<std::array<int32_t, 2>>(Imm);
to union {
typeof(Imm) In;
int32_t Out[2];
} U_S;
U_S.In = Imm;
int32_t *S = U_S.Out;
MSVC doesn't support typeof so it has to be resolved manually.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(declaration"
" (placeholder_type_specifier)"
" (init_declarator"
" (identifier) @arr_name"
" (call_expression"
" (template_function"
' ((identifier) @tfid (#eq? @tfid "bit_cast"))'
" (template_argument_list"
' ((type_descriptor) @td (#match? @td "std::array<.*>"))'
" )"
" )"
" (argument_list) @cast_target"
" )"
" )"
") @array_bit_cast"
)
def get_main_capture_name(self) -> str:
return "array_bit_cast"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
arr_name: bytes = captures[1][0].text
array_type: Node = captures[3][0]
cast_target: bytes = captures[4][0].text.strip(b"()")
array_templ_args: bytes = (
array_type.named_children[0]
.named_children[1]
.named_children[1]
.text.strip(b"<>")
)
arr_type = array_templ_args.split(b",")[0]
arr_len = array_templ_args.split(b",")[1]
return (
b"union {\n"
+ b" typeof("
+ cast_target
+ b") In;\n"
+ b" "
+ arr_type
+ b" Out["
+ arr_len
+ b"];\n"
+ b"} U_"
+ arr_name
+ b";\n"
+ b"U_"
+ arr_name
+ b".In = "
+ cast_target
+ b";\n"
+ arr_type
+ b" *"
+ arr_name
+ b" = U_"
+ arr_name
+ b".Out;"
)
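The text that `BitCastStdArray` emits can be reproduced with plain byte-string formatting. This is a sketch under assumptions: `bit_cast_to_union` is a hypothetical name, and unlike the patch above it strips whitespace around the template arguments.

```python
def bit_cast_to_union(arr_name: bytes, cast_target: bytes, templ_args: bytes) -> bytes:
    """Render the C union that replaces bit_cast<std::array<T, N>>(target)."""
    # templ_args is the text between '<' and '>' of std::array, e.g. b"int32_t, 2".
    arr_type, arr_len = (part.strip() for part in templ_args.split(b",", 1))
    return (
        b"union {\n"
        b"    typeof(%s) In;\n"
        b"    %s Out[%s];\n"
        b"} U_%s;\n"
        b"U_%s.In = %s;\n"
        b"%s *%s = U_%s.Out;"
    ) % (
        cast_target,
        arr_type,
        arr_len,
        arr_name,
        arr_name,
        cast_target,
        arr_type,
        arr_name,
        arr_name,
    )


union_text = bit_cast_to_union(b"S", b"Imm", b"int32_t, 2")
```

The `typeof(...)` in the result still has to be resolved by hand for MSVC, as the class docstring notes.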

View File

@@ -0,0 +1,37 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class CheckDecoderStatus(Patch):
"""
Patch "Check(S, ..."
to "Check(&S, ..."
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
' ((identifier) @fcn_name (#eq? @fcn_name "Check"))'
" ((argument_list) @arg_list)"
") @check_call"
)
def get_main_capture_name(self) -> str:
return "check_call"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
call_expr: Node = captures[0][0]
first_arg: Node = captures[2][0].named_children[0]
call_text = get_text(src, call_expr.start_byte, call_expr.end_byte)
first_arg_text = get_text(src, first_arg.start_byte, first_arg.end_byte)
return call_text.replace(first_arg_text + b",", b"&" + first_arg_text + b",")

View File

@@ -0,0 +1,32 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class ClassConstructorDef(Patch):
"""
Removes Class constructor definitions with a field initializer list.
Removes Class::Class(...) : ... {}
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(function_definition"
" (function_declarator)"
" (field_initializer_list)"
" (compound_statement)"
") @class_constructor"
)
return q
def get_main_capture_name(self) -> str:
return "class_constructor"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b""

View File

@@ -0,0 +1,50 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import re
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class ClassesDef(Patch):
"""
Patch Class definitions
to Method declarations only; the class definition itself is removed.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return "(class_specifier (_)* ((field_declaration_list) @decl_list)*) @class_specifier"
def get_main_capture_name(self) -> str:
return "class_specifier"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
if len(captures) < 2:
# Forward class definition. Ignore it.
return b""
field_decl_list = captures[1][0]
functions = list()
for field_decl in field_decl_list.named_children:
if (
field_decl.type == "field_declaration"
and (
"function_declarator" in [t.type for t in field_decl.named_children]
)
) or field_decl.type == "template_declaration":
# Keep comments
sibling = field_decl.prev_named_sibling
while sibling is not None and sibling.type == "comment":
functions.append(sibling)
sibling = sibling.prev_named_sibling
functions.append(field_decl)
fcn_decl_text = b""
for f in functions:
fcn_decl_text += get_text(src, f.start_byte, f.end_byte) + b"\n"
return fcn_decl_text

View File

@@ -0,0 +1,36 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class ConstMCInstParameter(Patch):
"""
Patch const MCInst *MI
to MCInst *MI
Removes the const qualifier from MCInst parameters because functions like MCInst_getOperand() ignore it anyway.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(parameter_declaration"
" ((type_qualifier) @type_qualifier)"
' ((type_identifier) @type_id (#eq? @type_id "MCInst"))'
" (pointer_declarator) @ptr_decl"
") @mcinst_param"
)
def get_main_capture_name(self) -> str:
return "mcinst_param"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
inst = captures[3][0]
inst = get_text(src, inst.start_byte, inst.end_byte)
return b"MCInst " + inst

View File

@@ -0,0 +1,36 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class ConstMCOperand(Patch):
"""
Patch const MCOperand ...
to MCOperand
Removes the const qualifier from MCOperand declarations. They are ignored by the following functions.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(declaration"
" (type_qualifier)"
' ((type_identifier) @tid (#eq? @tid "MCOperand"))'
" (init_declarator) @init_decl"
") @const_mcoperand"
)
def get_main_capture_name(self) -> str:
return "const_mcoperand"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
init_decl = captures[2][0]
init_decl = get_text(src, init_decl.start_byte, init_decl.end_byte)
return b"MCOperand " + init_decl + b";"

View File

@@ -0,0 +1,36 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class CppInitCast(Patch):
"""
Patch int(...)
to ((int)(...))
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
" (primitive_type) @cast_type"
" (argument_list) @cast_target"
") @cast"
)
def get_main_capture_name(self) -> str:
return "cast"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
cast_type: Node = captures[1][0]
cast_target: Node = captures[2][0]
ctype = get_text(src, cast_type.start_byte, cast_type.end_byte)
ctarget = get_text(src, cast_target.start_byte, cast_target.end_byte)
return b"((" + ctype + b")" + ctarget + b")"

View File

@@ -0,0 +1,59 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import re
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class CreateOperand0(Patch):
"""
Patch Inst.addOperand(MCOperand::createReg(...));
to MCOperand_CreateReg0(...)
(and equivalent for CreateImm)
This is the `0` variant of the CS `CreateReg`/`CreateImm` functions. It is used if the
operand is added via `addOperand()`.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression "
" (field_expression ((identifier) @inst_var"
' (field_identifier) @field_id (#eq? @field_id "addOperand")))'
" (argument_list (call_expression "
" (qualified_identifier ((_) (identifier) @create_fcn))"
" (argument_list) @arg_list"
" )"
" )"
") @create_operand0"
)
def get_main_capture_name(self) -> str:
return "create_operand0"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get name of instruction variable
inst_var: Node = captures[1][0]
# Get 'create[Reg/Imm]'
op_create_fcn: Node = captures[3][0]
# Get arg list
op_create_args: Node = captures[4][0]
# Capstone spells the function with capital letter 'C' for whatever reason.
fcn = re.sub(
b"create",
b"Create",
get_text(src, op_create_fcn.start_byte, op_create_fcn.end_byte),
)
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
args = get_text(src, op_create_args.start_byte, op_create_args.end_byte)
# `args` keeps the parentheses of the original argument list; they are harmless in the emitted C call.
return b"MCOperand_" + fcn + b"0(" + inst + b", " + args + b")"

View File

@@ -0,0 +1,76 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import re
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_MCInst_var_name, get_text
from autosync.cpptranslator.patches.Patch import Patch
class CreateOperand1(Patch):
"""
Patch MI.insert(..., MCOperand::createReg(...));
to MCInst_insert0(..., MCOperand_CreateReg1(...));
(and equivalent for CreateImm)
This is the `1` variant of the CS `CreateReg`/`CreateImm` functions. It is used if the
operand is added via `insert()`.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression "
" (field_expression ((identifier) @MC_id"
' ((field_identifier) @field_id (#match? @field_id "insert")))'
" )"
" (argument_list"
" ((identifier) @inst_var"
" (call_expression"
" (qualified_identifier ((_) (identifier) @create_fcn))"
" (argument_list) @arg_list)"
" )"
" )"
") @create_operand1"
)
def get_main_capture_name(self) -> str:
return "create_operand1"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get instruction variable
inst_var: Node = captures[1][0]
# Get argument of .insert() call
insert_arg: Node = captures[3][0]
# Get 'create[Reg/Imm]'
op_create_fcn: Node = captures[4][0]
# CreateReg/Imm args
op_create_args: Node = captures[5][0]
insert_arg_t = get_text(src, insert_arg.start_byte, insert_arg.end_byte)
# Capstone spells the function with capital letter 'C' for whatever reason.
fcn = re.sub(
b"create",
b"Create",
get_text(src, op_create_fcn.start_byte, op_create_fcn.end_byte),
)
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
args = get_text(src, op_create_args.start_byte, op_create_args.end_byte)
return (
b"MCInst_insert0("
+ inst
+ b", "
+ insert_arg_t
+ b", "
+ b"MCOperand_"
+ fcn
+ b"1("
+ inst
+ b", "
+ args
+ b"))"
)

View File

@@ -0,0 +1,35 @@
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class Data(Patch):
"""
Patch Bytes.data()
to Bytes
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression "
" (field_expression"
" ((identifier) @data_var)"
' ((field_identifier) @field_id_op (#eq? @field_id_op "data"))'
" )"
" ((argument_list) @arg_list)"
") @data"
)
return q
def get_main_capture_name(self) -> str:
return "data"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get operand variable name (Bytes, ArrayRef)
op_var: Node = captures[1][0]
op = get_text(src, op_var.start_byte, op_var.end_byte)
return op

View File

@@ -0,0 +1,50 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_capture_node, get_text
from autosync.cpptranslator.patches.Patch import Patch
class DeclarationInConditionalClause(Patch):
"""
Patch if (DECLARATION) ...
to DECLARATION
if (VAR) ...
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(if_statement"
" (condition_clause"
" (declaration"
" (_)"
" ((identifier) @id)"
" (_)"
" ) @decl"
" )"
" (_) @if_body"
") @condition_clause"
)
def get_main_capture_name(self) -> str:
return "condition_clause"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
cond = get_capture_node(captures, "condition_clause")
for nc in cond.named_children:
if nc.type == "if_statement":
# Skip if statements with else if
return get_text(src, cond.start_byte, cond.end_byte)
declaration = get_capture_node(captures, "decl")
identifier = get_capture_node(captures, "id")
if_body = get_capture_node(captures, "if_body")
identifier = get_text(src, identifier.start_byte, identifier.end_byte)
declaration = get_text(src, declaration.start_byte, declaration.end_byte)
if_body = get_text(src, if_body.start_byte, if_body.end_byte)
res = declaration + b";\nif (" + identifier + b")\n" + if_body
return res
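The rewrite performed by `DeclarationInConditionalClause` is a plain concatenation of three captured text spans. A minimal sketch, assuming the spans were already extracted (the function name `hoist_declaration` is hypothetical):

```python
def hoist_declaration(declaration: bytes, identifier: bytes, if_body: bytes) -> bytes:
    """Move a declaration out of an if-condition and test the declared variable."""
    return declaration + b";\nif (" + identifier + b")\n" + if_body


hoisted = hoist_declaration(
    b"const char *Name = lookup(Val)",  # text of the captured declaration
    b"Name",                            # declared identifier
    b"{ return Name; }",                # body of the if statement
)
```

C before C99 cannot declare a variable inside the condition, which is why the declaration moves in front of the `if`.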

View File

@@ -0,0 +1,53 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class DecodeInstruction(Patch):
"""
Patch decodeInstruction(..., this, STI)
to decodeInstruction_<instr_width>(..., NULL)
It also removes the arguments `this, STI`.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression ("
' (identifier) @fcn_name (#eq? @fcn_name "decodeInstruction")'
" ((argument_list) @arg_list)"
")) @decode_instr"
)
def get_main_capture_name(self) -> str:
return "decode_instr"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
arg_list = captures[2][0]
args_text = get_text(src, arg_list.start_byte, arg_list.end_byte).strip(b"()")
table, mi_inst, opcode_var, address, this, sti = args_text.split(b",")
is_32bit = (
table[-2:].decode("utf8") == "32" or opcode_var[-2:].decode("utf8") == "32"
)
is_16bit = (
table[-2:].decode("utf8") == "16" or opcode_var[-2:].decode("utf8") == "16"
)
args = (
table + b", " + mi_inst + b", " + opcode_var + b", " + address + b", NULL"
)
if is_16bit and not is_32bit:
return b"decodeInstruction_2(" + args + b")"
elif is_32bit and not is_16bit:
return b"decodeInstruction_4(" + args + b")"
else:
# Cannot determine instruction width easily. Only update the call's arguments.
return b"decodeInstruction(" + args + b")"
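The width selection in `DecodeInstruction` depends only on the argument text, so it can be tried out in isolation. This is a sketch with a hypothetical function name; unlike the patch above it trims whitespace around each argument and raises on an unexpected argument count.

```python
def rewrite_decode_call(args_text: bytes) -> bytes:
    """Rewrite the argument text of a decodeInstruction(...) call."""
    parts = [arg.strip() for arg in args_text.split(b",")]
    if len(parts) != 6:
        raise ValueError("expected table, MI, opcode, address, this, STI")
    table, mi_inst, opcode_var, address, _this, _sti = parts

    # The width is guessed from a "16"/"32" suffix on the table or opcode name.
    is_32bit = table.endswith(b"32") or opcode_var.endswith(b"32")
    is_16bit = table.endswith(b"16") or opcode_var.endswith(b"16")

    # `this` and `STI` are dropped; NULL takes their place.
    args = b", ".join([table, mi_inst, opcode_var, address, b"NULL"])
    if is_16bit and not is_32bit:
        return b"decodeInstruction_2(" + args + b")"
    if is_32bit and not is_16bit:
        return b"decodeInstruction_4(" + args + b")"
    # Ambiguous width: keep the name and only fix the arguments.
    return b"decodeInstruction(" + args + b")"


thumb16 = rewrite_decode_call(b"DecoderTableThumb16, MI, Insn16, Address, this, STI")
arm32 = rewrite_decode_call(b"DecoderTableARM32, MI, Insn, Address, this, STI")
unknown = rewrite_decode_call(b"DecoderTable, MI, Insn, Address, this, STI")
```

Splitting on commas assumes that no argument contains a nested call with its own commas; such calls need a manual fix.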

View File

@@ -0,0 +1,38 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class DecoderCast(Patch):
"""
Patch Removes casts like `const MCDisassembler *Dis = static_cast<const MCDisassembler*>(Decoder);`
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(declaration"
" (type_qualifier)*"
' ((type_identifier) @tid (#eq? @tid "MCDisassembler"))'
" (init_declarator"
" (pointer_declarator)"
" (call_expression"
" (template_function)" # static_cast<const MCDisassembler>
" (argument_list"
' ((identifier) @id (#eq? @id "Decoder"))'
" )"
" )"
" )"
") @decoder_cast"
)
def get_main_capture_name(self) -> str:
return "decoder_cast"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b""

View File

@@ -0,0 +1,31 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class DecoderParameter(Patch):
"""
Patch const MCDisassembler *Decoder
to const void *Decoder
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(parameter_declaration"
" ((type_qualifier) @type_qualifier)"
' ((type_identifier) @type_id (#eq? @type_id "MCDisassembler"))'
" (pointer_declarator) @ptr_decl"
") @decoder_param"
)
def get_main_capture_name(self) -> str:
return "decoder_param"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b"const void *Decoder"

View File

@@ -0,0 +1,25 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class FallThrough(Patch):
"""
Patch [[fallthrough]]
to // fall through
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return '(attributed_statement) @attr (#match? @attr "fallthrough")'
def get_main_capture_name(self) -> str:
return "attr"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b"// fall through"

View File

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_MCInst_var_name, get_text
from autosync.cpptranslator.patches.Patch import Patch
class FeatureBits(Patch):
"""
Patch featureBits[FLAG]
to ARCH_getFeatureBits(Inst->csh->mode, FLAG)
"""
def __init__(self, priority: int, arch: bytes):
self.arch = arch
super().__init__(priority)
def get_search_pattern(self) -> str:
# Search for featureBits usage.
return (
"(subscript_expression "
' ((identifier) @id (#match? @id "[fF]eatureBits"))'
" (subscript_argument_list ((qualified_identifier) @qid))"
") @feature_bits"
)
def get_main_capture_name(self) -> str:
return "feature_bits"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get flag name of feature bit.
qualified_id: Node = captures[2][0]
flag = get_text(src, qualified_id.start_byte, qualified_id.end_byte)
mcinst_var_name = get_MCInst_var_name(src, qualified_id)
return (
self.arch
+ b"_getFeatureBits("
+ mcinst_var_name
+ b"->csh->mode, "
+ flag
+ b")"
)

View File

@@ -0,0 +1,30 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class FeatureBitsDecl(Patch):
"""
Patch ... featureBits = ...
to REMOVED
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
# Search for featureBits declarations.
return (
"(declaration (init_declarator (reference_declarator "
'((identifier) @id (#match? @id "[fF]eatureBits"))))) @feature_bits_decl'
)
def get_main_capture_name(self) -> str:
return "feature_bits_decl"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Remove declaration
return b""

View File

@@ -0,0 +1,71 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import re
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_function_params_of_node, get_text
from autosync.cpptranslator.patches.Patch import Patch
class FieldFromInstr(Patch):
"""
Patch fieldFromInstruction(...)
to fieldFromInstruction_<instr_width>(...)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
# Search for fieldFromInstruction() calls.
return (
"(call_expression"
' ((identifier) @fcn_name (#eq? @fcn_name "fieldFromInstruction"))'
" (argument_list ((identifier) @first_arg) (_) (_))"
") @field_from_instr"
)
def get_main_capture_name(self) -> str:
return "field_from_instr"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
ffi_call: Node = captures[0][0]
ffi_first_arg: Node = captures[2][0]
param_list_caller = get_function_params_of_node(ffi_call)
ffi_first_arg_text = get_text(
src, ffi_first_arg.start_byte, ffi_first_arg.end_byte
).decode("utf8")
# Determine width of instruction by the variable name.
if ffi_first_arg_text[-2:] == "32":
inst_width = 4
elif ffi_first_arg_text[-2:] == "16":
inst_width = 2
else:
# Get the Val/Inst parameter.
# Its type determines the instruction width.
inst_param: Node = param_list_caller.named_children[1]
inst_param_text = get_text(src, inst_param.start_byte, inst_param.end_byte)
# Search for the 'Inst' parameter and determine its type
# and with it the width of the instruction.
inst_type = inst_param_text.split(b" ")[0]
if inst_type:
if inst_type in [b"unsigned", b"uint32_t"]:
inst_width = 4
elif inst_type in [b"uint16_t"]:
inst_width = 2
else:
log.fatal(f"Type {inst_type} not handled.")
exit(1)
else:
# Needs manual fix
return get_text(src, ffi_call.start_byte, ffi_call.end_byte)
return re.sub(
rb"fieldFromInstruction",
b"fieldFromInstruction_%d" % inst_width,
get_text(src, ffi_call.start_byte, ffi_call.end_byte),
)
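`FieldFromInstr` picks the width in two steps: first from the variable name, then from the type of the caller's instruction parameter. A minimal sketch of that decision, with hypothetical names (`instr_width`, `rename_call`, `TYPE_WIDTHS`); it returns `None` for an unknown type where the patch above logs a fatal error and exits.

```python
from typing import Optional

# Byte width per C type of the instruction parameter, as listed in the patch.
TYPE_WIDTHS = {b"unsigned": 4, b"uint32_t": 4, b"uint16_t": 2}


def instr_width(first_arg: bytes, inst_param: bytes) -> Optional[int]:
    """Return the instruction width in bytes, or None if it cannot be deduced."""
    if first_arg.endswith(b"32"):
        return 4
    if first_arg.endswith(b"16"):
        return 2
    # Fall back to the type of the caller's parameter, e.g. b"uint16_t Insn".
    inst_type = inst_param.split(b" ")[0]
    return TYPE_WIDTHS.get(inst_type)


def rename_call(call_text: bytes, width: int) -> bytes:
    """Append the width suffix to the function name of the call."""
    return call_text.replace(
        b"fieldFromInstruction", b"fieldFromInstruction_%d" % width, 1
    )


by_name = instr_width(b"Insn32", b"uint16_t Insn32")
by_type = instr_width(b"Val", b"uint16_t Val")
renamed = rename_call(b"fieldFromInstruction(Val, 0, 4)", by_type)
```

The name suffix takes priority over the parameter type, so `Insn32` resolves to 4 bytes even when the declared type is narrower.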

View File

@@ -0,0 +1,38 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetNumOperands(Patch):
"""
Patch MI.getNumOperands()
to MCInst_getNumOperands(MI)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression "
" (field_expression"
" ((identifier) @inst_var)"
' ((field_identifier) @field_id_op (#eq? @field_id_op "getNumOperands"))'
" )"
" ((argument_list) @arg_list)"
") @get_num_operands"
)
return q
def get_main_capture_name(self) -> str:
return "get_num_operands"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get instruction variable name: MI, Inst etc.
inst_var: Node = captures[1][0]
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
return b"MCInst_getNumOperands(" + inst + b")"

View File

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetOpcode(Patch):
"""
Patch Inst.getOpcode()
to MCInst_getOpcode(Inst)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
" (field_expression ("
" ((identifier) @inst_var)"
' ((field_identifier) @field_id (#eq? @field_id "getOpcode")))'
" )"
" (argument_list) @arg_list"
") @get_opcode"
)
def get_main_capture_name(self) -> str:
return "get_opcode"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Instruction variable
inst_var: Node = captures[1][0]
arg_list: Node = captures[3][0]
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
args = get_text(src, arg_list.start_byte, arg_list.end_byte)
if args != b"()":
args = b", " + args
else:
args = b""
return b"MCInst_getOpcode(" + inst + args + b")"

View File

@@ -0,0 +1,41 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetOperand(Patch):
"""
Patch MI.getOperand(...)
to MCInst_getOperand(MI, ...)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression "
" (field_expression"
" ((identifier) @inst_var)"
' ((field_identifier) @field_id_op (#eq? @field_id_op "getOperand"))'
" )"
" ((argument_list) @arg_list)"
") @get_operand"
)
return q
def get_main_capture_name(self) -> str:
return "get_operand"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get instruction variable name (MI, Inst)
inst_var: Node = captures[1][0]
# Arguments of getOperand(...)
get_op_args = captures[3][0]
inst = get_text(src, inst_var.start_byte, inst_var.end_byte)
args = get_text(src, get_op_args.start_byte, get_op_args.end_byte)
return b"MCInst_getOperand(" + inst + b", " + args + b")"

View File

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_capture_node, get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetOperandRegImm(Patch):
"""
Patch OPERAND.getReg()
to MCOperand_getReg(OPERAND)
Same for getImm()
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression"
" (field_expression"
" ((_) @operand)"
' ((field_identifier) @field_id (#match? @field_id "get(Reg|Imm)"))'
" )"
' ((argument_list) @arg_list (#eq? @arg_list "()"))'
") @get_operand"
)
return q
def get_main_capture_name(self) -> str:
return "get_operand"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# The operand
operand: Node = get_capture_node(captures, "operand")
# 'getReg()/getImm()'
get_reg_imm = get_capture_node(captures, "field_id")
fcn = get_text(src, get_reg_imm.start_byte, get_reg_imm.end_byte)
op = get_text(src, operand.start_byte, operand.end_byte)
return b"MCOperand_" + fcn + b"(" + op + b")"

@@ -0,0 +1,45 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import (
get_capture_node,
get_MCInst_var_name,
get_text,
)
from autosync.cpptranslator.patches.Patch import Patch
class GetRegClass(Patch):
"""
Patch MRI.getRegClass(...)
to MCRegisterInfo_getRegClass(Inst->MRI, ...)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression"
" (field_expression"
" (_)"
' ((field_identifier) @field_id (#eq? @field_id "getRegClass"))'
" )"
" ((argument_list) @arg_list)"
") @get_reg_class"
)
return q
def get_main_capture_name(self) -> str:
return "get_reg_class"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
arg_list: Node = get_capture_node(captures, "arg_list")
args = get_text(src, arg_list.start_byte, arg_list.end_byte).strip(b"()")
mcinst_var = get_MCInst_var_name(
src, get_capture_node(captures, "get_reg_class")
)
res = b"MCRegisterInfo_getRegClass(" + mcinst_var + b"->MRI, " + args + b")"
return res

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_capture_node, get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetRegFromClass(Patch):
"""
Patch <ARCH>MCRegisterClasses[<ARCH>::FPR128RegClassID].getRegister(RegNo);
to <ARCH>MCRegisterClasses[<ARCH>::FPR128RegClassID].RegsBegin[RegNo];
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression"
" (field_expression"
' ((_) @operand (#match? @operand ".+MCRegisterClasses.*"))'
' ((field_identifier) @field_id (#eq? @field_id "getRegister"))'
" )"
" (argument_list) @arg_list"
") @get_reg_from_class"
)
return q
def get_main_capture_name(self) -> str:
return "get_reg_from_class"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Table
table: Node = get_capture_node(captures, "operand")
# args
getter_args = get_capture_node(captures, "arg_list")
tbl = get_text(src, table.start_byte, table.end_byte)
args = get_text(src, getter_args.start_byte, getter_args.end_byte)
res = tbl + b".RegsBegin" + args.replace(b"(", b"[").replace(b")", b"]")
return res
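# Illustration only: a sketch of the replacement performed by
# GetRegFromClass.get_patch() above, working on plain byte strings instead of
# tree-sitter nodes. `patch_get_reg_from_class` and the sample inputs are
# hypothetical. Because every parenthesis in the argument list is replaced,
# an argument that is itself a call would be rewritten too; the second sample
# shows that assumed limitation.

```python
def patch_get_reg_from_class(table: bytes, arg_list: bytes) -> bytes:
    # Turn the call's "(...)" into an index "[...]" on the RegsBegin array.
    return table + b".RegsBegin" + arg_list.replace(b"(", b"[").replace(b")", b"]")


simple = patch_get_reg_from_class(
    b"AArch64MCRegisterClasses[AArch64::FPR128RegClassID]", b"(RegNo)"
)
nested = patch_get_reg_from_class(
    b"ARMMCRegisterClasses[ARM::GPRRegClassID]", b"(decode(Insn))"
)
print(simple.decode())
print(nested.decode())
```

# A nested call in the argument list would therefore need a manual fix after
# translation.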

@@ -0,0 +1,41 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_MCInst_var_name, get_text
from autosync.cpptranslator.patches.Patch import Patch
class GetSubReg(Patch):
"""
Patch MRI.getSubReg(...);
to MCRegisterInfo_getSubReg(MI->MRI, ...)
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
" (field_expression ("
" (identifier)"
' ((field_identifier) @field_id (#eq? @field_id "getSubReg")))'
" )"
" (argument_list) @arg_list"
") @get_sub_reg"
)
def get_main_capture_name(self) -> str:
return "get_sub_reg"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# Get arg list
op_create_args: Node = captures[2][0]
args = get_text(src, op_create_args.start_byte, op_create_args.end_byte).strip(
b"()"
)
mcinst_var_name = get_MCInst_var_name(src, op_create_args)
return b"MCRegisterInfo_getSubReg(" + mcinst_var_name + b"->MRI, " + args + b")"

@@ -0,0 +1,230 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
import re
from tree_sitter import Node
from autosync.Helper import fail_exit
def get_function_params_of_node(n: Node) -> Node:
"""
Returns the parameter list of the function definition which contains the given node.
Returns None if the node is not part of a function definition.
"""
fcn_def: Node = n
while fcn_def.type != "function_definition":
if fcn_def.parent is None:
# root node reached
return None
fcn_def = fcn_def.parent
# Get parameter list of the function definition
param_list: Node = None
for child in fcn_def.children:
if child.type == "function_declarator":
param_list = child.children[1]
break
if not param_list:
log.warning(f"Could not find the function's parameter list for {n.text}")
return param_list
def get_MCInst_var_name(src: bytes, n: Node) -> bytes:
"""Searches for the name of the parameter of type MCInst and returns it."""
params = get_function_params_of_node(n)
mcinst_var_name = b""
if params:
for p in params.named_children:
p_text = get_text(src, p.start_byte, p.end_byte)
if b"MCInst" not in p_text:
continue
mcinst_var_name = p_text.split((b"&" if b"&" in p_text else b"*"))[1]
break
if mcinst_var_name == b"":
log.debug("Could not find `MCInst` variable name. Defaulting to `Inst`.")
mcinst_var_name = b"Inst"
return mcinst_var_name
def template_param_list_to_dict(param_list: Node) -> [dict]:
if param_list.type != "template_parameter_list":
log.fatal(
f"Wrong node type '{param_list.type}'. Not 'template_parameter_list'."
)
exit(1)
pl = list()
for c in param_list.named_children:
if c.type == "type_parameter_declaration":
type_decl = {
"prim_type": False,
"type": "",
"identifier": c.children[1].text,
}
pl.append(type_decl)
else:
pl.append(parameter_declaration_to_dict(c))
return pl
def parameter_declaration_to_dict(param_decl: Node) -> dict:
if param_decl.type != "parameter_declaration":
log.fatal(
f"Wrong node type '{param_decl.type}'. Should be 'parameter_declaration'."
)
exit(1)
return {
"prim_type": param_decl.children[0].type == "primitive_type",
"type": param_decl.children[0].text,
"identifier": param_decl.children[1].text,
}
def get_text(src: bytes, start_byte: int, end_byte: int) -> bytes:
"""Workaround for https://github.com/tree-sitter/py-tree-sitter/issues/122"""
return src[start_byte:end_byte]
def namespace_enum(src: bytes, ns_id: bytes, enum: Node) -> bytes:
"""
Prepends the namespace id to every enum member and defines the enum as a type.
Example: namespace_id = "ARM"
enum E { X } -> typedef enum E { ARM_X } ARM_E
"""
enumerator_list: Node = None
type_id: Node = None
primary_tid_set = False
for c in enum.named_children:
if c.type == "enumerator_list":
enumerator_list = c
elif c.type == "type_identifier" and not primary_tid_set:
type_id = c
primary_tid_set = True
if not enumerator_list and not type_id:
log.fatal("Could not find enumerator_list or enum type_identifier.")
exit(1)
tid = get_text(src, type_id.start_byte, type_id.end_byte) if type_id else None
elist = get_text(src, enumerator_list.start_byte, enumerator_list.end_byte)
for e in enumerator_list.named_children:
if e.type == "enumerator":
enum_entry_text = get_text(src, e.start_byte, e.end_byte)
elist = elist.replace(enum_entry_text, ns_id + b"_" + enum_entry_text)
if tid:
new_enum = b"typedef enum " + tid + b" " + elist + b"\n " + ns_id + b"_" + tid
else:
new_enum = b"enum " + b" " + elist + b"\n"
return new_enum
def namespace_fcn_def(src: bytes, ns_id: bytes, fcn_def: Node) -> bytes:
fcn_id: Node = None
for c in fcn_def.named_children:
if c.type == "function_declarator":
fcn_id = c.named_children[0]
break
elif c.named_children and c.named_children[0].type == "function_declarator":
fcn_id = c.named_children[0].named_children[0]
break
if not fcn_id:
# Not a function declaration
return get_text(src, fcn_def.start_byte, fcn_def.end_byte)
fcn_id_text = get_text(src, fcn_id.start_byte, fcn_id.end_byte)
fcn_def_text = get_text(src, fcn_def.start_byte, fcn_def.end_byte)
res = re.sub(re.escape(fcn_id_text), ns_id + b"_" + fcn_id_text, fcn_def_text)
return res
def namespace_struct(src: bytes, ns_id: bytes, struct: Node) -> bytes:
"""
Defines a struct as a type.
Example: namespace_id = "ARM"
struct id { X } -> typedef struct id { X } ARM_id
"""
type_id: Node = None
field_list: Node = None
for c in struct.named_children:
if c.type == "type_identifier":
type_id = c
elif c.type == "base_class_clause":
# Inheritances should be fixed manually.
return get_text(src, struct.start_byte, struct.end_byte)
elif c.type == "field_declaration_list":
field_list = c
if not (type_id and field_list):
log.fatal("Could not find struct type_identifier or field declaration list.")
exit(1)
tid = get_text(src, type_id.start_byte, type_id.end_byte)
fields = get_text(src, field_list.start_byte, field_list.end_byte)
typed_struct = (
b"typedef struct " + tid + b" " + fields + b"\n " + ns_id + b"_" + tid
)
return typed_struct
def parse_function_capture(
capture: list[tuple[Node, str]], src: bytes
) -> tuple[list[bytes], bytes, bytes, bytes, bytes, bytes]:
"""
Parses the capture of a (template) function definition or declaration and returns the byte strings
for each node in the following order:
list[template_args], storage_class_identifiers, return_type_id, function_name, function_params, compound_stmt
If any of those is not present it returns an empty byte string for this position.
"""
temp_args = b""
st_class_ids = b""
ret_type = b""
func_name = b""
func_params = b""
comp_stmt = b""
for node, node_name in capture:
t = get_text(src, node.start_byte, node.end_byte)
match node.type:
case "template_declaration":
continue
case "template_parameter_list":
temp_args += t if not temp_args else b" " + t
case "storage_class_specifier":
st_class_ids += b" " + t
case "type_identifier" | "primitive_type":
ret_type += b" " + t
case "identifier":
func_name += t if not func_name else b" " + t
case "parameter_list":
func_params += t if not func_params else b" " + t
case "compound_statement":
comp_stmt += t if not comp_stmt else b" " + t
case _:
raise NotImplementedError(f"Node type {node.type} not handled.")
from autosync.cpptranslator.TemplateCollector import TemplateCollector
return (
TemplateCollector.templ_params_to_list(temp_args),
st_class_ids,
ret_type,
func_name,
func_params,
comp_stmt,
)
def get_capture_node(captures: [(Node, str)], name: str) -> Node:
"""
Returns the captured node with the given name.
"""
for c in captures:
if c[1] == name:
return c[0]
fail_exit(f'Capture "{name}" is not in captures:\n{captures}')
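# Illustration only: a sketch of how get_capture_node() and get_text() are
# combined by the patches. `FakeNode` is a hypothetical stand-in for
# tree_sitter.Node, the capture list is written by hand, and LookupError
# replaces fail_exit() so the snippet has no autosync dependency. Looking up
# captures by name avoids depending on their position in the list, which
# index-based access such as captures[1][0] relies on.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for tree_sitter.Node; only byte offsets are modelled."""

    start_byte: int
    end_byte: int


def get_text(src: bytes, start_byte: int, end_byte: int) -> bytes:
    return src[start_byte:end_byte]


def get_capture_node(captures: list[tuple[FakeNode, str]], name: str) -> FakeNode:
    for node, capture_name in captures:
        if capture_name == name:
            return node
    # The real helper calls fail_exit(); an exception is used here instead.
    raise LookupError(f'Capture "{name}" is not in captures')


src = b"Op.getReg()"
captures = [
    (FakeNode(0, 11), "get_operand"),  # whole call expression
    (FakeNode(0, 2), "operand"),       # b"Op"
    (FakeNode(3, 9), "field_id"),      # b"getReg"
    (FakeNode(9, 11), "arg_list"),     # b"()"
]

operand = get_capture_node(captures, "operand")
field_id = get_capture_node(captures, "field_id")
op = get_text(src, operand.start_byte, operand.end_byte)
fcn = get_text(src, field_id.start_byte, field_id.end_byte)
patched = b"MCOperand_" + fcn + b"(" + op + b")"
print(patched.decode())
```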

@@ -0,0 +1,300 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
import logging as log
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class Includes(Patch):
"""
Patch LLVM includes
to Capstone includes
"""
include_count = dict()
def __init__(self, priority: int, arch: str):
self.arch = arch
super().__init__(priority)
def get_search_pattern(self) -> str:
return "(preproc_include) @preproc_include"
def get_main_capture_name(self) -> str:
return "preproc_include"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
filename = kwargs["filename"]
if filename not in self.include_count:
self.include_count[filename] = 1
else:
self.include_count[filename] += 1
include_text = get_text(src, captures[0][0].start_byte, captures[0][0].end_byte)
# Special cases, which appear somewhere in the code.
if b"GenDisassemblerTables.inc" in include_text:
return (
b'#include "'
+ bytes(self.arch, "utf8")
+ b'GenDisassemblerTables.inc"\n\n'
)
elif b"GenAsmWriter.inc" in include_text:
return b'#include "' + bytes(self.arch, "utf8") + b'GenAsmWriter.inc"\n\n'
elif b"GenSystemOperands.inc" in include_text:
return (
b'#include "' + bytes(self.arch, "utf8") + b'GenSystemOperands.inc"\n\n'
)
if self.include_count[filename] > 1:
# Only the first include is replaced with all CS includes.
return b""
# All includes which belong to the source files top.
res = get_general_inc()
match self.arch:
case "ARM":
return res + get_ARM_includes(filename) + get_general_macros()
case "PPC":
return res + get_PPC_includes(filename) + get_general_macros()
case "AArch64":
return res + get_AArch64_includes(filename) + get_general_macros()
case "LoongArch":
return res + get_LoongArch_includes(filename) + get_general_macros()
case "TEST_ARCH":
return res + b"test_output"
case _:
log.fatal(f"Includes of {self.arch} not handled.")
exit(1)
def get_general_inc() -> bytes:
return (
b"#include <stdio.h>\n"
+ b"#include <string.h>\n"
+ b"#include <stdlib.h>\n"
+ b"#include <capstone/platform.h>\n\n"
)
def get_PPC_includes(filename: str) -> bytes:
match filename:
case "PPCDisassembler.cpp":
return (
b'#include "../../LEB128.h"\n'
+ b'#include "../../MCDisassembler.h"\n'
+ b'#include "../../MCFixedLenDisassembler.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "PPCLinkage.h"\n'
+ b'#include "PPCMapping.h"\n'
+ b'#include "PPCMCTargetDesc.h"\n'
+ b'#include "PPCPredicates.h"\n\n'
)
case "PPCInstPrinter.cpp":
return (
b'#include "../../LEB128.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "PPCInstrInfo.h"\n'
+ b'#include "PPCInstPrinter.h"\n'
+ b'#include "PPCLinkage.h"\n'
+ b'#include "PPCMCTargetDesc.h"\n'
+ b'#include "PPCMapping.h"\n'
+ b'#include "PPCPredicates.h"\n'
+ b'#include "PPCRegisterInfo.h"\n\n'
)
case "PPCInstPrinter.h":
return (
b'#include "../../LEB128.h"\n'
+ b'#include "../../MCDisassembler.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "PPCMCTargetDesc.h"\n\n'
)
case "PPCMCTargetDesc.h":
return (
b'#include "../../LEB128.h"\n'
+ b'#include "../../MathExtras.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
)
log.fatal(f"No includes given for PPC source file: {filename}")
exit(1)
def get_ARM_includes(filename: str) -> bytes:
match filename:
case "ARMDisassembler.cpp":
return (
b'#include "../../LEB128.h"\n'
+ b'#include "../../MCDisassembler.h"\n'
+ b'#include "../../MCFixedLenDisassembler.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../MathExtras.h"\n'
+ b'#include "../../cs_priv.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "ARMAddressingModes.h"\n'
+ b'#include "ARMBaseInfo.h"\n'
+ b'#include "ARMDisassemblerExtension.h"\n'
+ b'#include "ARMInstPrinter.h"\n'
+ b'#include "ARMLinkage.h"\n'
+ b'#include "ARMMapping.h"\n\n'
+ b"#define GET_INSTRINFO_MC_DESC\n"
+ b'#include "ARMGenInstrInfo.inc"\n\n'
)
case "ARMInstPrinter.cpp":
return (
b'#include "../../Mapping.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "ARMAddressingModes.h"\n'
+ b'#include "ARMBaseInfo.h"\n'
+ b'#include "ARMDisassemblerExtension.h"\n'
+ b'#include "ARMInstPrinter.h"\n'
+ b'#include "ARMLinkage.h"\n'
+ b'#include "ARMMapping.h"\n\n'
+ b"#define GET_BANKEDREG_IMPL\n"
+ b'#include "ARMGenSystemRegister.inc"\n'
)
case "ARMInstPrinter.h":
return (
b'#include "ARMMapping.h"\n\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../utils.h"\n\n'
)
case "ARMBaseInfo.cpp":
return b'#include "ARMBaseInfo.h"\n\n'
case "ARMAddressingModes.h":
return b"#include <assert.h>\n" + b'#include "../../MathExtras.h"\n\n'
log.fatal(f"No includes given for ARM source file: {filename}")
exit(1)
def get_AArch64_includes(filename: str) -> bytes:
match filename:
case "AArch64Disassembler.cpp":
return (
b'#include "../../MCFixedLenDisassembler.h"\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstrDesc.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../LEB128.h"\n'
+ b'#include "../../MCDisassembler.h"\n'
+ b'#include "../../cs_priv.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "AArch64AddressingModes.h"\n'
+ b'#include "AArch64BaseInfo.h"\n'
+ b'#include "AArch64DisassemblerExtension.h"\n'
+ b'#include "AArch64Linkage.h"\n'
+ b'#include "AArch64Mapping.h"\n\n'
+ b"#define GET_INSTRINFO_MC_DESC\n"
+ b'#include "AArch64GenInstrInfo.inc"\n\n'
+ b"#define GET_INSTRINFO_ENUM\n"
+ b'#include "AArch64GenInstrInfo.inc"\n\n'
)
case "AArch64InstPrinter.cpp":
return (
b'#include "../../MCInst.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "AArch64AddressingModes.h"\n'
+ b'#include "AArch64BaseInfo.h"\n'
+ b'#include "AArch64DisassemblerExtension.h"\n'
+ b'#include "AArch64InstPrinter.h"\n'
+ b'#include "AArch64Linkage.h"\n'
+ b'#include "AArch64Mapping.h"\n\n'
+ b"#define GET_BANKEDREG_IMPL\n"
+ b'#include "AArch64GenSystemOperands.inc"\n\n'
+ b"#define CONCATs(a, b) CONCATS(a, b)\n"
+ b"#define CONCATS(a, b) a##b\n\n"
)
case "AArch64InstPrinter.h":
return (
b'#include "AArch64Mapping.h"\n\n'
+ b'#include "../../MCInst.h"\n'
+ b'#include "../../MCRegisterInfo.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../SStream.h"\n'
+ b'#include "../../utils.h"\n\n'
)
case "AArch64BaseInfo.cpp":
return b'#include "AArch64BaseInfo.h"\n\n'
case "AArch64BaseInfo.h":
return (
b'#include "../../utils.h"\n'
+ b"#define GET_REGINFO_ENUM\n"
+ b'#include "AArch64GenRegisterInfo.inc"\n\n'
+ b"#define GET_INSTRINFO_ENUM\n"
+ b'#include "AArch64GenInstrInfo.inc"\n\n'
)
case "AArch64AddressingModes.h":
return b"#include <assert.h>\n" + b'#include "../../MathExtras.h"\n\n'
log.fatal(f"No includes given for AArch64 source file: {filename}")
exit(1)
def get_LoongArch_includes(filename: str) -> bytes:
match filename:
case "LoongArchDisassembler.cpp":
return (
b'#include "../../MCInst.h"\n'
+ b'#include "../../MathExtras.h"\n'
+ b'#include "../../MCInstPrinter.h"\n'
+ b'#include "../../MCDisassembler.h"\n'
+ b'#include "../../MCFixedLenDisassembler.h"\n'
+ b'#include "../../cs_priv.h"\n'
+ b'#include "../../utils.h"\n'
+ b'#include "LoongArchDisassemblerExtension.h"\n'
+ b"#define GET_SUBTARGETINFO_ENUM\n"
+ b'#include "LoongArchGenSubtargetInfo.inc"\n\n'
+ b"#define GET_INSTRINFO_ENUM\n"
+ b'#include "LoongArchGenInstrInfo.inc"\n\n'
+ b"#define GET_REGINFO_ENUM\n"
+ b'#include "LoongArchGenRegisterInfo.inc"\n\n'
)
case "LoongArchInstPrinter.cpp":
return (
b'#include "LoongArchMapping.h"\n'
+ b'#include "LoongArchInstPrinter.h"\n\n'
+ b"#define GET_SUBTARGETINFO_ENUM\n"
+ b'#include "LoongArchGenSubtargetInfo.inc"\n\n'
+ b"#define GET_INSTRINFO_ENUM\n"
+ b'#include "LoongArchGenInstrInfo.inc"\n\n'
+ b"#define GET_REGINFO_ENUM\n"
+ b'#include "LoongArchGenRegisterInfo.inc"\n\n'
)
case "LoongArchInstPrinter.h":
return (
b'#include "../../MCInstPrinter.h"\n' + b'#include "../../cs_priv.h"\n'
)
log.fatal(f"No includes given for LoongArch source file: {filename}")
exit(1)
def get_general_macros():
return (
b"#define CONCAT(a, b) CONCAT_(a, b)\n" b"#define CONCAT_(a, b) a ## _ ## b\n"
)
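# Illustration only: a reduced sketch of the counting logic in
# Includes.get_patch() above. `patch_include` is hypothetical, and the
# placeholder comment returned for the first include stands in for the real
# per-architecture include block. It shows the three outcomes: generated .inc
# includes are always kept (prefixed with the architecture name), the first
# ordinary include of a file is replaced by the whole Capstone include block,
# and every later ordinary include is dropped.

```python
include_count: dict[str, int] = {}

GENERATED_INCS = (
    b"GenDisassemblerTables.inc",
    b"GenAsmWriter.inc",
    b"GenSystemOperands.inc",
)


def patch_include(include_text: bytes, filename: str, arch: str) -> bytes:
    # Count every include seen per file, as the original does.
    include_count[filename] = include_count.get(filename, 0) + 1
    for inc in GENERATED_INCS:
        if inc in include_text:
            return b'#include "' + arch.encode("utf8") + inc + b'"\n\n'
    if include_count[filename] > 1:
        # Only the first include is replaced with all Capstone includes.
        return b""
    # Placeholder for get_general_inc() + get_<ARCH>_includes(filename) + macros.
    return b"/* Capstone includes for " + filename.encode("utf8") + b" */\n"


first = patch_include(b'#include "llvm/MC/MCInst.h"', "ARMInstPrinter.cpp", "ARM")
second = patch_include(b'#include "llvm/MC/MCExpr.h"', "ARMInstPrinter.cpp", "ARM")
generated = patch_include(b'#include "ARMGenAsmWriter.inc"', "ARMInstPrinter.cpp", "ARM")
```

# Note that include_count in the class above is a class attribute, so the
# counters are shared by all Includes instances.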

@@ -0,0 +1,38 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class InlineToStaticInline(Patch):
"""
Adds the `static` storage class specifier to inline function definitions.
Patch inline void FUNCTION(...) {...}
to static inline void FUNCTION(...) {...}
"""
def __init__(self, priority: int):
super().__init__(priority)
self.apply_only_to = {"files": ["ARMAddressingModes.h"], "archs": list()}
def get_search_pattern(self) -> str:
return (
"(function_definition"
' ((storage_class_specifier) @scs (#eq? @scs "inline"))'
" (_)+"
") @inline_def"
)
def get_main_capture_name(self) -> str:
return "inline_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
inline_def = captures[0][0]
inline_def = get_text(src, inline_def.start_byte, inline_def.end_byte)
return b"static " + inline_def

@@ -0,0 +1,40 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class IsOptionalDef(Patch):
"""
Patch OpInfo[i].isOptionalDef()
to MCOperandInfo_isOptionalDef(&OpInfo[i])
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
" (field_expression"
" (subscript_expression"
" ((identifier) @op_info_var)"
" ((_) @index)"
" )"
' ((field_identifier) @fid (#eq? @fid "isOptionalDef"))'
" )"
") @is_optional_def"
)
def get_main_capture_name(self) -> str:
return "is_optional_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
op_info_var = captures[1][0]
index = captures[2][0]
op_info_var = get_text(src, op_info_var.start_byte, op_info_var.end_byte)
index = get_text(src, index.start_byte, index.end_byte)
return b"MCOperandInfo_isOptionalDef(&" + op_info_var + index + b")"

@@ -0,0 +1,40 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class IsPredicate(Patch):
"""
Patch OpInfo[i].isPredicate()
to MCOperandInfo_isPredicate(&OpInfo[i])
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression"
" (field_expression"
" (subscript_expression"
" ((identifier) @op_info_var)"
" ((_) @index)"
" )"
' ((field_identifier) @fid (#eq? @fid "isPredicate"))'
" )"
") @is_predicate"
)
def get_main_capture_name(self) -> str:
return "is_predicate"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
op_info_var = captures[1][0]
index = captures[2][0]
op_info_var = get_text(src, op_info_var.start_byte, op_info_var.end_byte)
index = get_text(src, index.start_byte, index.end_byte)
return b"MCOperandInfo_isPredicate(&" + op_info_var + index + b")"

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class IsOperandRegImm(Patch):
"""
Patch OPERAND.isReg()
to MCOperand_isReg(OPERAND)
Same for isImm()
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
q = (
"(call_expression"
" (field_expression"
" ((_) @operand)"
' ((field_identifier) @field_id (#match? @field_id "is(Reg|Imm)"))'
" )"
" (argument_list)"
") @is_operand"
)
return q
def get_main_capture_name(self) -> str:
return "is_operand"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
# The operand
operand: Node = captures[1][0]
# 'isReg()/isImm()'
get_reg_imm = captures[2][0]
fcn = get_text(src, get_reg_imm.start_byte, get_reg_imm.end_byte)
op = get_text(src, operand.start_byte, operand.end_byte)
return b"MCOperand_" + fcn + b"(" + op + b")"

@@ -0,0 +1,28 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Patch import Patch
class LLVMFallThrough(Patch):
"""
Removes the LLVM_FALLTHROUGH statement.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(expression_statement"
' ((identifier) @id (#eq? @id "LLVM_FALLTHROUGH"))'
") @llvm_fall_through"
)
def get_main_capture_name(self) -> str:
return "llvm_fall_through"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
return b""

@@ -0,0 +1,34 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class LLVMUnreachable(Patch):
"""
Patch llvm_unreachable("Error msg")
to assert(0 && "Error msg")
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(call_expression ("
' (identifier) @fcn_name (#eq? @fcn_name "llvm_unreachable")'
" (argument_list) @err_msg"
")) @llvm_unreachable"
)
def get_main_capture_name(self) -> str:
return "llvm_unreachable"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
err_msg = captures[2][0]
err_msg = get_text(src, err_msg.start_byte, err_msg.end_byte).strip(b"()")
res = b"assert(0 && " + err_msg + b")"
return res

@@ -0,0 +1,45 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class MethodToFunction(Patch):
"""
Removes the qualified identifier of the class from method definitions.
Translating them to functions.
Patch void CLASS::METHOD_NAME(...) {...}
to void METHOD_NAME(...) {...}
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(function_declarator"
" (qualified_identifier"
" (namespace_identifier)"
" (identifier) @method_name"
" )"
" (parameter_list) @param_list"
") @method_def"
)
def get_main_capture_name(self) -> str:
return "method_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
name = captures[1][0]
parameter_list = captures[2][0]
name = get_text(src, name.start_byte, name.end_byte)
parameter_list = get_text(
src, parameter_list.start_byte, parameter_list.end_byte
)
res = name + parameter_list
return res

@@ -0,0 +1,40 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class MethodTypeQualifier(Patch):
"""
Removes type qualifiers such as "const" from method declarators.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(function_declarator"
" (["
" (qualified_identifier)"
" (identifier)"
" ]) @id"
" (parameter_list) @param_list"
" (type_qualifier)"
")"
"@method_type_qualifier"
)
def get_main_capture_name(self) -> str:
return "method_type_qualifier"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
identifier = captures[1][0]
parameter_list = captures[2][0]
identifier = get_text(src, identifier.start_byte, identifier.end_byte)
p_list = get_text(src, parameter_list.start_byte, parameter_list.end_byte)
res = identifier + p_list
return res

@@ -0,0 +1,34 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class NamespaceAnon(Patch):
"""
Patch namespace {CONTENT}
to CONTENT
Only for anonymous or llvm namespaces
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(namespace_definition"
" (declaration_list) @decl_list"
") @namespace_def"
)
def get_main_capture_name(self) -> str:
return "namespace_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
decl_list = captures[1][0]
dl = get_text(src, decl_list.start_byte, decl_list.end_byte).strip(b"{}")
return dl

@@ -0,0 +1,67 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import (
get_text,
namespace_enum,
namespace_fcn_def,
namespace_struct,
)
from autosync.cpptranslator.patches.Patch import Patch
class NamespaceArch(Patch):
"""
Patch namespace ArchSpecificNamespace {CONTENT}
to CONTENT
Patches architecture-specific namespaces. Enums, functions and structs within the namespace get the namespace id prepended.
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(namespace_definition"
" (namespace_identifier)"
" (declaration_list) @decl_list"
") @namespace_def"
)
def get_main_capture_name(self) -> str:
return "namespace_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
namespace = captures[0][0]
decl_list = captures[1][0]
namespace_id = get_text(
src,
namespace.named_children[0].start_byte,
namespace.named_children[0].end_byte,
)
# We need to prepend the namespace id to all enum members, function declarators and struct types.
# Because in the generated files they are accessed via NAMESPACE::X which becomes NAMESPACE_X.
res = b""
for d in decl_list.named_children:
match d.type:
case "enum_specifier":
res += namespace_enum(src, namespace_id, d) + b";\n\n"
case "declaration" | "function_definition":
res += namespace_fcn_def(src, namespace_id, d) + b"\n\n"
case "struct_specifier":
res += namespace_struct(src, namespace_id, d) + b";\n\n"
case _:
res += get_text(src, d.start_byte, d.end_byte) + b"\n"
return (
b"// CS namespace begin: "
+ namespace_id
+ b"\n\n"
+ res
+ b"// CS namespace end: "
+ namespace_id
+ b"\n\n"
)

@@ -0,0 +1,35 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
from tree_sitter import Node
from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch
class NamespaceLLVM(Patch):
"""
Patch namespace llvm {CONTENT}
to CONTENT
Only for anonymous or llvm namespaces
"""
def __init__(self, priority: int):
super().__init__(priority)
def get_search_pattern(self) -> str:
return (
"(namespace_definition"
' (namespace_identifier) @id (#eq? @id "llvm")'
" (declaration_list) @decl_list"
") @namespace_def"
)
def get_main_capture_name(self) -> str:
return "namespace_def"
def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
decl_list = captures[2][0]
dl = get_text(src, decl_list.start_byte, decl_list.end_byte).strip(b"{}")
return dl

@@ -0,0 +1,44 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch


class OutStreamParam(Patch):
    """
    Patches the parameter list only:

    Patch   void function(int a, raw_ostream &OS)
    to      void function(int a, SStream *OS)
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        return (
            "(parameter_list"
            " (_)*"
            " (parameter_declaration"
            ' ((type_identifier) @tid (#eq? @tid "raw_ostream"))'
            " (_)"
            " )"
            " (_)*"
            ") @ostream_param"
        )

    def get_main_capture_name(self) -> str:
        return "ostream_param"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        param_list = list()
        for param in captures[0][0].named_children:
            p_text = get_text(src, param.start_byte, param.end_byte)
            if b"raw_ostream" in p_text:
                p_text = p_text.replace(b"raw_ostream", b"SStream").replace(b"&", b"*")
            param_list.append(p_text)
        res = b"(" + b", ".join(param_list) + b")"
        return res
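
The byte-level rewrite in `get_patch` can be tried without a parser. The following is a minimal sketch, assuming the texts of the parameter nodes have already been extracted; `patch_param_list` is a hypothetical name and not part of autosync.

```python
from typing import List


def patch_param_list(params: List[bytes]) -> bytes:
    """Rebuild a parameter list, turning raw_ostream references into SStream pointers."""
    patched = []
    for p_text in params:
        if b"raw_ostream" in p_text:
            # Only parameters that mention raw_ostream are touched.
            p_text = p_text.replace(b"raw_ostream", b"SStream").replace(b"&", b"*")
        patched.append(p_text)
    return b"(" + b", ".join(patched) + b")"


result = patch_param_list([b"int a", b"raw_ostream &OS"])
```

Other reference parameters in the same list (for example `const MCInst &MI`) pass through unchanged here, since the `&` replacement is limited to the `raw_ostream` parameter.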

@@ -0,0 +1,36 @@
from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch


class Override(Patch):
    """
    Patch   function(args) override
    to      function(args)
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        q = (
            "(function_declarator "
            " ((field_identifier) @declarator)"
            " ((parameter_list) @parameter_list)"
            ' ((virtual_specifier) @specifier (#eq? @specifier "override"))'
            ") @override"
        )
        return q

    def get_main_capture_name(self) -> str:
        return "override"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        # Get function name
        declarator: Node = captures[1][0]
        # Get parameter list
        parameter_list: Node = captures[2][0]
        decl = get_text(src, declarator.start_byte, declarator.end_byte)
        params = get_text(src, parameter_list.start_byte, parameter_list.end_byte)
        return decl + params

@@ -0,0 +1,57 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

import logging as log

from tree_sitter import Node


class Patch:
    priority: int = None

    # List of filenames and architectures this patch applies to or not.
    # Order of testing:
    # 1. apply_only_to.archs
    # 2. apply_only_to.files
    # 3. do_not_apply.archs
    # 4. do_not_apply.files

    # Contains the _in_ filenames and architectures this patch should be applied to. Empty list means all.
    apply_only_to = {"files": list(), "archs": list()}
    # Contains the _in_ filenames and architectures this patch should NOT be applied to.
    do_not_apply = {"files": list(), "archs": list()}

    def __init__(self, priority: int = 0):
        self.priority = priority

    def get_search_pattern(self) -> str:
        """
        Returns a search pattern for the syntax tree of the C++ file.
        The search pattern must be formed according to:
        https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries

        Also, each pattern needs to be assigned a name in order to work.
        See: https://github.com/tree-sitter/py-tree-sitter/issues/77

        :return: The search pattern which matches a part in the syntax tree which will be patched.
        """
        log.fatal("Method must be overridden.")
        exit(1)

    def get_main_capture_name(self) -> str:
        """
        :return: The name of the capture which matches the complete syntax to be patched.
        """
        log.fatal("Method must be overridden.")
        exit(1)

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        """
        Patches the given subtree accordingly and returns the patch as bytes.

        :param captures: The subtree and its name which needs to be patched.
        :param src: The source code currently patched.
        :param **kwargs: Additional arguments the Patch might need.
        :return: The patched version of the code.
        """
        log.fatal("Method must be overridden.")
        exit(1)
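
One thing to keep in mind about `apply_only_to` and `do_not_apply`: both are class-level dictionaries, so every patch shares the same object until an instance rebinds the name. The sketch below shows the difference between rebinding and mutating in place. `PatchSketch`, `Rebinding` and `Mutating` are hypothetical stand-ins written for this illustration, and `"Other.cpp"` is a made-up filename; none of them are part of autosync.

```python
class PatchSketch:
    # Class-level dict, shared by every instance until an instance rebinds the name.
    apply_only_to = {"files": list(), "archs": list()}


class Rebinding(PatchSketch):
    def __init__(self):
        # Binds a new dict on the instance; the shared class dict stays untouched.
        self.apply_only_to = {"files": ["ARMInstPrinter.cpp"], "archs": list()}


class Mutating(PatchSketch):
    def __init__(self):
        # Mutates the shared class dict in place; every other patch sees the change.
        self.apply_only_to["files"].append("Other.cpp")


rebound = Rebinding()
shared_after_rebind = list(PatchSketch.apply_only_to["files"])

mutated = Mutating()
shared_after_mutation = list(PatchSketch.apply_only_to["files"])
```

Assigning a fresh dictionary in `__init__`, as in the `Rebinding` stand-in, keeps the restriction local to one patch.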

@@ -0,0 +1,48 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_MCInst_var_name, get_text
from autosync.cpptranslator.patches.Patch import Patch


class PredicateBlockFunctions(Patch):
    """
    Patch   VPTBlock.instrInVPTBlock()
    to      VPTBlock_instrInVPTBlock(&(MI->csh->VPTBlock))

    And other functions of VPTBlock and ITBlock
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        return (
            "(call_expression "
            " (field_expression"
            ' ((identifier) @block_var (#match? @block_var "[VI][PT]T?Block"))'
            " ((field_identifier) @field_id)"
            " )"
            " ((argument_list) @arg_list)"
            ") @block_fcns"
        )

    def get_main_capture_name(self) -> str:
        return "block_fcns"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        block_var = captures[1][0]
        fcn_id = captures[2][0]
        args = captures[3][0]
        block_var_text = get_text(src, block_var.start_byte, block_var.end_byte)
        fcn_id_text = get_text(src, fcn_id.start_byte, fcn_id.end_byte)
        args_text = get_text(src, args.start_byte, args.end_byte)

        mcinst_var: bytes = get_MCInst_var_name(src, block_var)
        a = b"&(" + mcinst_var + b"->csh->" + block_var_text + b")"
        args_text = args_text.strip(b"()")
        if args_text:
            a += b"," + args_text
        return block_var_text + b"_" + fcn_id_text + b"(" + a + b")"
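
The string assembly in `get_patch` can be followed on plain bytes. This is a minimal sketch, assuming the captured texts are already extracted and that `get_MCInst_var_name` returned `MI`; `build_block_call` is a hypothetical name, and the second call with `(it, mask)` is a made-up argument list for illustration.

```python
def build_block_call(
    block_var: bytes, fcn_id: bytes, args_text: bytes, mcinst_var: bytes
) -> bytes:
    """Turn BLOCK.fcn(args) into BLOCK_fcn(&(MI->csh->BLOCK),args)."""
    # The block object becomes the first argument of the C function.
    first_arg = b"&(" + mcinst_var + b"->csh->" + block_var + b")"
    # Drop the parentheses of the original argument list.
    args_text = args_text.strip(b"()")
    if args_text:
        first_arg += b"," + args_text
    return block_var + b"_" + fcn_id + b"(" + first_arg + b")"


no_args = build_block_call(b"VPTBlock", b"instrInVPTBlock", b"()", b"MI")
with_args = build_block_call(b"ITBlock", b"setITState", b"(it, mask)", b"MI")
```

Note that the original arguments are appended after a bare comma, so the spacing of the output depends on the input text.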

@@ -0,0 +1,29 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

from tree_sitter import Node

from autosync.cpptranslator.patches.Patch import Patch


class PrintAnnotation(Patch):
    """
    Removes printAnnotation(...) calls.
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        return (
            "(call_expression ("
            ' (identifier) @fcn_name (#eq? @fcn_name "printAnnotation")'
            " (argument_list)"
            ")) @print_annotation"
        )

    def get_main_capture_name(self) -> str:
        return "print_annotation"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        return b""

@@ -0,0 +1,36 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_MCInst_var_name, get_text
from autosync.cpptranslator.patches.Patch import Patch


class PrintRegImmShift(Patch):
    """
    Patch   printRegImmShift(...)
    to      printRegImmShift(MI, ...)
    """

    def __init__(self, priority: int):
        super().__init__(priority)
        self.apply_only_to = {"files": ["ARMInstPrinter.cpp"], "archs": list()}

    def get_search_pattern(self) -> str:
        return (
            "(call_expression ("
            ' (identifier) @fcn_name (#eq? @fcn_name "printRegImmShift")'
            " ((argument_list) @arg_list)"
            ")) @print_call"
        )

    def get_main_capture_name(self) -> str:
        return "print_call"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        call: Node = captures[0][0]
        mcinst_var = get_MCInst_var_name(src, call)
        params = captures[2][0]
        params = get_text(src, params.start_byte, params.end_byte)
        return b"printRegImmShift(" + mcinst_var + b", " + params.strip(b"(")

@@ -0,0 +1,36 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch


class QualifiedIdentifier(Patch):
    """
    Patch   NAMESPACE::ID
    to      NAMESPACE_ID
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        return "(qualified_identifier) @qualified_id"

    def get_main_capture_name(self) -> str:
        return "qualified_id"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        if len(captures[0][0].named_children) > 1:
            identifier = captures[0][0].named_children[1]
            identifier = get_text(src, identifier.start_byte, identifier.end_byte)
            namespace = captures[0][0].named_children[0]
            namespace = get_text(src, namespace.start_byte, namespace.end_byte)
        else:
            # The namespace can be omitted. E.g. std::transform(..., ::tolower)
            namespace = b""
            identifier = captures[0][0].named_children[0]
            identifier = get_text(src, identifier.start_byte, identifier.end_byte)
        return namespace + b"_" + identifier
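
The two branches of `get_patch` behave slightly differently, which is easiest to see on plain bytes. The sketch below assumes the texts of the node's named children have already been extracted into a list; `join_qualified` is a hypothetical name, and `ARM_AM::getSORegOpc` is used only as a sample input.

```python
from typing import List


def join_qualified(parts: List[bytes]) -> bytes:
    """Join the named children of a qualified identifier with an underscore."""
    if len(parts) > 1:
        namespace, identifier = parts[0], parts[1]
    else:
        # The namespace was omitted, as in `::tolower`.
        namespace, identifier = b"", parts[0]
    return namespace + b"_" + identifier


with_ns = join_qualified([b"ARM_AM", b"getSORegOpc"])
without_ns = join_qualified([b"tolower"])
```

With an omitted namespace the result keeps a leading underscore (`_tolower`), so such identifiers may need a follow-up fix by hand or by another patch.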

@@ -0,0 +1,39 @@
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3

import re

from tree_sitter import Node

from autosync.cpptranslator.patches.Helper import get_text
from autosync.cpptranslator.patches.Patch import Patch


class ReferencesDecl(Patch):
    """
    Patch   TYPE &Param
    to      TYPE *Param

    Param is optional
    """

    def __init__(self, priority: int):
        super().__init__(priority)

    def get_search_pattern(self) -> str:
        return (
            "["
            "(reference_declarator)"
            "(type_identifier) (abstract_reference_declarator)"
            "] @reference_decl"
        )

    def get_main_capture_name(self) -> str:
        return "reference_decl"

    def get_patch(self, captures: [(Node, str)], src: bytes, **kwargs) -> bytes:
        ref_decl: Node = captures[0][0]
        ref_decl_text = get_text(src, ref_decl.start_byte, ref_decl.end_byte)
        res = re.sub(rb"&", b"*", ref_decl_text)
        return res
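
The substitution in `get_patch` replaces every `&` in the captured text, not only the first one. A minimal sketch on plain bytes, with `ref_to_ptr` as a hypothetical helper name and both inputs made up for illustration:

```python
import re


def ref_to_ptr(decl_text: bytes) -> bytes:
    """Replace each reference marker in a declarator with a pointer marker."""
    return re.sub(rb"&", b"*", decl_text)


single = ref_to_ptr(b"&MI")
rvalue = ref_to_ptr(b"&&Tmp")
```

If an rvalue reference (`&&`) ever reaches this patch, it would come out as a pointer to pointer (`**`), which is probably not the intended C type and would need manual review.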

Some files were not shown because too many files have changed in this diff.