-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64 detection via asm #186
Conversation
b3437a3
to
f64053c
Compare
1a328fb
to
f79425c
Compare
It's a nice addition, thx for the PR. |
will the testing architecture be changed? since there are a lot of tests for x86. |
575015e
to
84b7fca
Compare
No only the code layout for now. It should be a non functional change. |
I take that back, I might have to split the tests as well... I'll make sure it's the less disruptive possible. |
f5651ea
to
9d27eeb
Compare
c9f223c
to
7a0d427
Compare
inline static uint64_t ExtractBitRange(uint64_t reg, uint64_t msb, | ||
uint64_t lsb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed overflow when occurs extract bit range from ID_AA64_*
fb14bc9
to
ffd4b22
Compare
b9c637b
to
cb22995
Compare
@toor1245 does it work on aarch64-darwin by any chance? I tried running it, it hits an "illegal instruction". |
Hi, thanks for the reply! I think no, for now. Right now I have no opportunity to test, and there is no CI to test yet. An issue has been created to add CI jobs for ios, #197, as a temporary solution can be done through sysctlbyname, but I would not like this since there will be a dependence on the OS, and lowers the possibility for further improving the information of a given architecture |
Btw, if this gets merged, are we going to drop all the OS APIs (e.g., |
for Linux / Adnroid, something that has been working for a long time and does not break the public api, I think definitely not now, but on other OS this api may become truncated, what's not good |
Since a new pull request was introduced, I converted this pull request to draft until it was merged #204 to the master branch |
Thx for the work here. I'm keeping an eye on the PR (even though I haven't commented lately). Just to make sure we're on the same page, there should be only one header for aarch64 eventually. I assume you have the two headers in parallel so to check that the results are consistent? |
@gchatelet at the moment, this PR has only one I think we сan to create small pull-requests, since PR too big and hard to review, and leave this PR as a sample. I see that we can create at least:
And as I mentioned above, it will be good if we create a dev branch for implementation. What do you think? |
UpdateI have completed research on aarch64 support for Windows. Access to system registers is limited from EL0 and we can't get information from user mode, to solve this problem we have only one way, we have to write software driver in kernel mode, where we have more access, and make client communicate with driver. here is the result of the test driver of load and unload to read processor info we can use in lib we have to add something like this: HANDLE hDevice = CreateFile(L"\\\\.\\CpuFeatures", GENERIC_READ,
0, NULL, OPEN_EXISTING, 0, NULL);
if (hDevice == INVALID_HANDLE_VALUE)
// handle hDevice == INVALID_HANDLE_VALUE
EL1Info info;
DWORD bytes;
BOOL ok = ReadFile(hDevice, &info, sizeof(EL1Info), &bytes, NULL);
if (!ok)
// handle !ok
if (bytes != sizeof(EL1Info))
// handle bytes != sizeof(EL1Info)
CloseHandle(hDevice); Linux allows reading information from EL1 for the following system registers:
@gchatelet, @Mizux, what do you think? |
here is a sketch of the code, but don't mind the quality and reliability of this code as i have invested and experimented
#include <ntddk.h>
void GetCpuInfo(PVOID Buffer);
#if defined(_M_ARM64)
#include "CpuInfoBase.h"
#include "CpuInfoAarch64.h"
#include "arm64_neon.h"
// https://reviews.llvm.org/D53115
#define SYS_REG(op0, op1, crn, crm, op2) \
( ((op0 & 1) << 14) | \
((op1 & 7) << 11) | \
((crn & 15) << 7) | \
((crm & 15) << 3) | \
((op2 & 7) << 0) )
#define SYS_MIDR_EL1 SYS_REG(3, 0, 0, 0, 0)
void GetCpuInfo(PVOID Buffer) {
UINT64 midr = _ReadStatusReg(SYS_MIDR_EL1);
memcpy((PUINT64)Buffer, &midr, sizeof(UINT64));
}
#endif // defined(_M_ARM64)
#include <ntddk.h>
#include "CpuInfoBase.h"
void CpuFeaturesUnload(PDRIVER_OBJECT DriverObject)
{
KdPrint(("cpu_features driver unload\n"));
UNICODE_STRING symbolicLinkName = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
// Delete symbolic link.
const NTSTATUS deleteSymbolicLinkStatus = IoDeleteSymbolicLink(&symbolicLinkName);
if (!NT_SUCCESS(deleteSymbolicLinkStatus))
{
KdPrint(("Failed to delete symbolink link cpu_features_arm64. Status: (0x%08X)\n", deleteSymbolicLinkStatus));
}
IoDeleteDevice(DriverObject->DeviceObject);
}
NTSTATUS CpuFeaturesDispatchCreateClose(PDEVICE_OBJECT DriverObject, PIRP Irp)
{
UNREFERENCED_PARAMETER(DriverObject);
UNREFERENCED_PARAMETER(Irp);
return STATUS_SUCCESS;
}
NTSTATUS CompleteIrp(PIRP Irp, NTSTATUS status, ULONG_PTR info )
{
Irp->IoStatus.Status = status;
Irp->IoStatus.Information = info;
IoCompleteRequest(Irp, 0);
return status;
}
NTSTATUS CpuFeaturesDispatchRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
UNREFERENCED_PARAMETER(DeviceObject);
PIO_STACK_LOCATION pIoStackLocation = IoGetCurrentIrpStackLocation(Irp);
ULONG requestLength = 0;
PVOID userBuffer = NULL;
requestLength = pIoStackLocation->Parameters.Read.Length;
if (requestLength == 0)
{
return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
}
userBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);
if (!userBuffer)
{
return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
}
GetCpuInfo(userBuffer);
return CompleteIrp(Irp, STATUS_SUCCESS, requestLength);
}
UINT64 ExtractBitRange(UINT64 Reg, UINT64 Msb,
UINT64 lsb) {
const UINT64 bits = Msb - lsb + 1ULL;
const UINT64 mask = (1ULL << bits) - 1ULL;
return Reg >> lsb & mask;
}
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
UNREFERENCED_PARAMETER(RegistryPath);
// Set an unload routine.
DriverObject->DriverUnload = CpuFeaturesUnload;
// Set dispatch routines the driver support.
DriverObject->MajorFunction[IRP_MJ_CREATE] = CpuFeaturesDispatchCreateClose;
DriverObject->MajorFunction[IRP_MJ_CLOSE] = CpuFeaturesDispatchCreateClose;
DriverObject->MajorFunction[IRP_MJ_READ] = CpuFeaturesDispatchRead;
UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\CpuFeatures");
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
PDEVICE_OBJECT DeviceObject = NULL;
NTSTATUS status = STATUS_SUCCESS;
BOOLEAN symLinkCreated = FALSE;
// just midr output
#if defined(_M_ARM64)
KdPrint(("cpu_features aarch64 working!!!\n"));
UINT64 value;
GetCpuInfo(&value);
KdPrint(("%d\n", value));
UINT64 impl = ExtractBitRange(value, 31, 24);
UINT64 variant = ExtractBitRange(value, 23, 20);
UINT64 part = ExtractBitRange(value, 15, 4);
UINT64 revision = ExtractBitRange(value, 3, 0);
KdPrint(("part: %d\n", part));
KdPrint(("variant: %d\n", variant));
KdPrint(("revision: %d\n", revision));
KdPrint(("implementer: %d\n", impl));
#endif
do {
status = IoCreateDevice(DriverObject, 0, &devName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint(("failed to create device (0x%08X)\n", status));
break;
}
DeviceObject->Flags |= DO_DIRECT_IO;
status = IoCreateSymbolicLink(&symLink, &devName);
if (!NT_SUCCESS(status)) {
KdPrint(("failed to create symbolic link (0x%08X)\n", status));
break;
}
symLinkCreated = TRUE;
} while (FALSE);
if (!NT_SUCCESS(status)) {
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
}
return status;
} The biggest disadvantage of this approach is if something is not processed, there will be a blue screen of death |
hi @arkivm, could you please run this code on your M1 to make sure the behavior with EL on macOS is the same as it is on Windows compiler clang. (if possible on macOS _ReadStatusReg) #include <stdint.h>
#include <intrin.h>
#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
( ((op0 & 1) << 14) | \
((op1 & 7) << 11) | \
((crn & 15) << 7) | \
((crm & 15) << 3) | \
((op2 & 7) << 0) )
#define ARM64_CNTVCT ARM64_SYSREG(3, 3, 14, 0, 2)
int main() {
uint64_t cntvct = _ReadStatusReg(ARM64_CNTVCT); // should work
return 0;
} #include <stdint.h>
#include <intrin.h>
#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
( ((op0 & 1) << 14) | \
((op1 & 7) << 11) | \
((crn & 15) << 7) | \
((crm & 15) << 3) | \
((op2 & 7) << 0) )
#define ARM64_MIDR_EL1 ARM64_SYSREG(3, 0, 0, 0, 0)
int main() {
uint64_t midr = _ReadStatusReg(ARM64_MIDR_EL1); // the probability that it will fail the way it is done in user mode
return 0;
} if _ReadStautsReg is not supported for macOS void** tpidrro;
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidrro)); // should work uint64_t midr;
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // the probability that it will fail the way it is done in user mode |
__asm__ volatile ("mrs %0, cntvct_el0" : "=r" (cntvct1)); // works
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidr)); // works
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // fails - illegal instruction I couldn't get the include/intrin.h:12:15: fatal error: 'intrin.h' file not found
#include_next <intrin.h>
^~~~~~~~~~ Let me know if you want me to test something else. |
@arkivm, thanks for the help! |
To add support AARCH64 for other operating systems such as macOS(Apple M1), ios, FreeBSD, Windows has been moved common logic from `src/impl_aarch64_linux_or_android.c` to `src/impl_aarch64__base_implementation.inl`, namely: * Definitions for introspection * `Aarch64Info` kEmptyAarch64Info field Removed include "internal/bit_utils.h" from `src/impl_aarch64_linux_or_android.c`, since this include was not used. Also, include `cpuinfo_aarch64` has been removed from linux implementation and replaced with `impl_aarch64__base_implementation.inl`, this include will be used for all other operating system impl as well Added a compilation check that matches the base X86 implementation Refs: google#121 See also: google#150, google#186, google#204
To add support AARCH64 for other operating systems such as macOS(Apple M1), ios, FreeBSD, Windows has been moved common logic from `src/impl_aarch64_linux_or_android.c` to `src/impl_aarch64__base_implementation.inl`, namely: * Definitions for introspection * `Aarch64Info` kEmptyAarch64Info field Removed include "internal/bit_utils.h" from `src/impl_aarch64_linux_or_android.c`, since this include was not used. Also, include `cpuinfo_aarch64` has been removed from linux implementation and replaced with `impl_aarch64__base_implementation.inl`, this include will be used for all other operating systems impl as well Added a compilation check that matches the base X86 implementation Refs: google#121 See also: google#150, google#186, google#204
To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl" See also: google#268, google#284, google#186
To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl" See also: google#268, google#284, google#186
* Add Windows Arm64 support To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl" See also: #268, #284, #186 * [CMake] Add windows_utils.h to PROCESSOR_IS_AARCH64 * Add detection of armv8.1 atomic instructions * Update note on win-arm64 implementation and move to cpuinfo_aarch64.h * Remove redundant #ifdef CPU_FEATURES_OS_WINDOWS * Add note on FP/SIMD and Cryptographic Extension for win-arm64 * Add comments to Aarch64Info fields Added comments to specify that implementer, part and variant we set 0 for Windows, since Win API does not provide a way to get information. For revision added comment that we use GetNativeSystemInfo
Added an initial implementation, detecting ARM64 through assembly instructions, thereby getting rid of OS binding, which can be used, for example, to detect ARM on Windows.
What do you think?