Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 detection via asm #186

Closed
wants to merge 14 commits into from
Closed

Conversation

toor1245
Copy link
Contributor

@toor1245 toor1245 commented Oct 24, 2021

Added an initial implementation, detecting ARM64 through assembly instructions, thereby getting rid of OS binding, which can be used, for example, to detect ARM on Windows.

What do you think?

@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 5 times, most recently from b3437a3 to f64053c Compare October 24, 2021 18:41
@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 2 times, most recently from 1a328fb to f79425c Compare October 24, 2021 22:54
@gchatelet
Copy link
Collaborator

It's a nice addition, thx for the PR.
That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

@toor1245
Copy link
Contributor Author

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 2 times, most recently from 575015e to 84b7fca Compare October 25, 2021 22:57
@gchatelet
Copy link
Collaborator

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

No only the code layout for now. It should be a non functional change.

@gchatelet
Copy link
Collaborator

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

No only the code layout for now. It should be a non functional change.

I take that back, I might have to split the tests as well... I'll make sure it's the less disruptive possible.

@toor1245 toor1245 marked this pull request as draft October 27, 2021 18:30
@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 6 times, most recently from f5651ea to 9d27eeb Compare November 1, 2021 21:45
@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 4 times, most recently from c9f223c to 7a0d427 Compare November 9, 2021 15:44
Comment on lines +30 to +31
inline static uint64_t ExtractBitRange(uint64_t reg, uint64_t msb,
uint64_t lsb) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed overflow when occurs extract bit range from ID_AA64_*

@toor1245 toor1245 force-pushed the arm_detection_via_asm branch 3 times, most recently from fb14bc9 to ffd4b22 Compare November 9, 2021 22:56
@toor1245 toor1245 marked this pull request as ready for review November 9, 2021 23:12
@toor1245 toor1245 force-pushed the arm_detection_via_asm branch from b9c637b to cb22995 Compare November 16, 2021 10:34
@arkivm
Copy link
Contributor

arkivm commented Nov 17, 2021

@toor1245 does it work on aarch64-darwin by any chance? I tried running it, it hits an "illegal instruction".

@toor1245
Copy link
Contributor Author

@toor1245 does it work on aarch64-darwin by any chance? I tried running it, it hits an "illegal instruction".

Hi, thanks for the reply! I think no, for now. Right now I have no opportunity to test, and there is no CI to test yet. An issue has been created to add CI jobs for ios, #197, as a temporary solution can be done through sysctlbyname, but I would not like this since there will be a dependence on the OS, and lowers the possibility for further improving the information of a given architecture

@arkivm
Copy link
Contributor

arkivm commented Nov 18, 2021

Btw, if this gets merged, are we going to drop all the OS APIs (e.g., sysctl) for arm64 and switch to this version?

@toor1245
Copy link
Contributor Author

Btw, if this gets merged, are we going to drop all the OS APIs (e.g., sysctl) for arm64 and switch to this version?

for Linux / Adnroid, something that has been working for a long time and does not break the public api, I think definitely not now, but on other OS this api may become truncated, what's not good

@toor1245 toor1245 marked this pull request as draft November 18, 2021 12:06
@toor1245
Copy link
Contributor Author

Since a new pull request was introduced, I converted this pull request to draft until it was merged #204 to the master branch

@gchatelet
Copy link
Collaborator

Thx for the work here. I'm keeping an eye on the PR (even though I haven't commented lately). Just to make sure we're on the same page, there should be only one header for aarch64 eventually. I assume you have the two headers in parallel so to check that the results are consistent?

@toor1245
Copy link
Contributor Author

Thx for the work here. I'm keeping an eye on the PR (even though I haven't commented lately). Just to make sure we're on the same page, there should be only one header for aarch64 eventually. I assume you have the two headers in parallel so to check that the results are consistent?

@gchatelet at the moment, this PR has only one include/cpuinfo_aarch64.h header unchanged and added two headers to include/internal as include/internal/cpuid_x86.h, or do we need to move the implementation? Sorry, but I misunderstood the question of the consistency of the two headers.

I think we сan to create small pull-requests, since PR too big and hard to review, and leave this PR as a sample.

I see that we can create at least:

  1. Add new inl file stringize.inl, since often operantion and dependency of define_introspection.inl .
  2. Fix or add override function of ExctractBitRange uint64_t, as there is an overflow when extracting 64 bit info in aarch64.
  3. Add include/internal/cpuid_aarch64.h, src/define_cpuid_aarch64.inl and impl_aarch64__base_implementation.inl where only MIDR_EL1(architecture, revision, part num, variant) will be implemented.
  4. Provide new inline asm testing logic.
  5. Provide PR for each ID_* separately.

And as I mentioned above, it will be good if we create a dev branch for implementation.

What do you think?

@toor1245
Copy link
Contributor Author

Update

I have completed research on aarch64 support for Windows. Access to system registers is limited from EL0 and we can't get information from user mode, to solve this problem we have only one way, we have to write software driver in kernel mode, where we have more access, and make client communicate with driver.

here is the result of the test driver of load and unload
Screenshot (8)

to read processor info we can use _ReadStatusReg, thus we don't have to think about armasm64 integration
https://reviews.llvm.org/D53115
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

in lib we have to add something like this:

HANDLE hDevice = CreateFile(L"\\\\.\\CpuFeatures", GENERIC_READ,
          0, NULL, OPEN_EXISTING, 0, NULL);

 if (hDevice == INVALID_HANDLE_VALUE)
      // handle hDevice == INVALID_HANDLE_VALUE

 EL1Info info;
 DWORD bytes;
 BOOL ok = ReadFile(hDevice, &info, sizeof(EL1Info), &bytes, NULL);
 if (!ok)
     // handle !ok
 if (bytes != sizeof(EL1Info))
     // handle bytes != sizeof(EL1Info)
 CloseHandle(hDevice);

Linux allows reading information from EL1 for the following system registers:

@gchatelet, @Mizux, what do you think?

@toor1245
Copy link
Contributor Author

toor1245 commented Feb 13, 2022

here is a sketch of the code, but don't mind the quality and reliability of this code as i have invested and experimented

CpuInfoBase.h

#include <ntddk.h>

void GetCpuInfo(PVOID Buffer);

CpuInfoAarch64.c

#if defined(_M_ARM64)
#include "CpuInfoBase.h"
#include "CpuInfoAarch64.h"
#include "arm64_neon.h"

// https://reviews.llvm.org/D53115
#define SYS_REG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define SYS_MIDR_EL1 SYS_REG(3, 0, 0, 0, 0)

void GetCpuInfo(PVOID Buffer) {
	UINT64 midr = _ReadStatusReg(SYS_MIDR_EL1);
	memcpy((PUINT64)Buffer, &midr, sizeof(UINT64));
}

#endif // defined(_M_ARM64)

Driver.c

#include <ntddk.h>
#include "CpuInfoBase.h"

void CpuFeaturesUnload(PDRIVER_OBJECT DriverObject)
{
	KdPrint(("cpu_features driver unload\n"));
	
	UNICODE_STRING symbolicLinkName = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
	
	// Delete symbolic link.
	const NTSTATUS deleteSymbolicLinkStatus = IoDeleteSymbolicLink(&symbolicLinkName);
	if (!NT_SUCCESS(deleteSymbolicLinkStatus))
	{
		KdPrint(("Failed to delete symbolink link cpu_features_arm64. Status: (0x%08X)\n", deleteSymbolicLinkStatus));
	}
	IoDeleteDevice(DriverObject->DeviceObject);
}

NTSTATUS CpuFeaturesDispatchCreateClose(PDEVICE_OBJECT DriverObject, PIRP Irp)
{
	UNREFERENCED_PARAMETER(DriverObject);
	UNREFERENCED_PARAMETER(Irp);
	return STATUS_SUCCESS;
}

NTSTATUS CompleteIrp(PIRP Irp, NTSTATUS status, ULONG_PTR info )
{
	Irp->IoStatus.Status = status;
	Irp->IoStatus.Information = info;
	IoCompleteRequest(Irp, 0);
	return status;
}

NTSTATUS CpuFeaturesDispatchRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
	UNREFERENCED_PARAMETER(DeviceObject);
	PIO_STACK_LOCATION pIoStackLocation = IoGetCurrentIrpStackLocation(Irp);
	ULONG requestLength = 0;
	PVOID userBuffer = NULL;
	requestLength = pIoStackLocation->Parameters.Read.Length;
	if (requestLength == 0) 
	{
		return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
	}
		
	userBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);
	if (!userBuffer)
	{
		return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
	}

	GetCpuInfo(userBuffer);
		
	return CompleteIrp(Irp, STATUS_SUCCESS, requestLength);
}

UINT64 ExtractBitRange(UINT64 Reg, UINT64 Msb,
	UINT64 lsb) {
	const UINT64 bits = Msb - lsb + 1ULL;
	const UINT64 mask = (1ULL << bits) - 1ULL;
	return Reg >> lsb & mask;
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
	UNREFERENCED_PARAMETER(RegistryPath);

	// Set an unload routine.
	DriverObject->DriverUnload = CpuFeaturesUnload;

	// Set dispatch routines the driver support.
	DriverObject->MajorFunction[IRP_MJ_CREATE] = CpuFeaturesDispatchCreateClose;
	DriverObject->MajorFunction[IRP_MJ_CLOSE] = CpuFeaturesDispatchCreateClose;
	DriverObject->MajorFunction[IRP_MJ_READ] = CpuFeaturesDispatchRead;
	
	UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\CpuFeatures");
	UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
	PDEVICE_OBJECT DeviceObject = NULL;
	NTSTATUS status = STATUS_SUCCESS;
	BOOLEAN symLinkCreated = FALSE;

// just midr output 
#if defined(_M_ARM64)
	KdPrint(("cpu_features aarch64 working!!!\n"));
	UINT64 value;
	GetCpuInfo(&value);
	KdPrint(("%d\n", value));
	UINT64 impl = ExtractBitRange(value, 31, 24);
	UINT64 variant = ExtractBitRange(value, 23, 20);
	UINT64 part = ExtractBitRange(value, 15, 4);
	UINT64 revision = ExtractBitRange(value, 3, 0);
	KdPrint(("part:	   %d\n", part));
	KdPrint(("variant:     %d\n", variant));
	KdPrint(("revision:    %d\n", revision));
	KdPrint(("implementer: %d\n", impl));
#endif

	do {
		status = IoCreateDevice(DriverObject, 0, &devName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
		if (!NT_SUCCESS(status)) {
			KdPrint(("failed to create device (0x%08X)\n", status));
			break;
		}
		DeviceObject->Flags |= DO_DIRECT_IO;

		status = IoCreateSymbolicLink(&symLink, &devName);
		if (!NT_SUCCESS(status)) {
			KdPrint(("failed to create symbolic link (0x%08X)\n", status));
			break;
		}
		symLinkCreated = TRUE;

	} while (FALSE);

	if (!NT_SUCCESS(status)) {
		if (symLinkCreated)
			IoDeleteSymbolicLink(&symLink);
		if (DeviceObject)
			IoDeleteDevice(DeviceObject);
	}

	return status;
}

The biggest disadvantage of this approach is if something is not processed, there will be a blue screen of death

@toor1245
Copy link
Contributor Author

toor1245 commented Feb 14, 2022

hi @arkivm, could you please run this code on your M1 to make sure the behavior with EL on macOS is the same as it is on Windows

compiler clang. (if possible on macOS _ReadStatusReg)
https://reviews.llvm.org/D53115

#include <stdint.h>
#include <intrin.h>

#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define ARM64_CNTVCT            ARM64_SYSREG(3, 3, 14, 0, 2)

int main() {
   uint64_t cntvct = _ReadStatusReg(ARM64_CNTVCT); // should work
   return 0;
}
#include <stdint.h>
#include <intrin.h>

#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define ARM64_MIDR_EL1         ARM64_SYSREG(3, 0, 0, 0, 0)

int main() {
   uint64_t midr = _ReadStatusReg(ARM64_MIDR_EL1); // the probability that it will fail the way it is done in user mode
   return 0;
}

if _ReadStautsReg is not supported for macOS

void** tpidrro;
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidrro)); // should work
uint64_t midr;
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // the probability that it will fail the way it is done in user mode

@arkivm
Copy link
Contributor

arkivm commented Feb 19, 2022

@toor1245

__asm__ volatile ("mrs %0, cntvct_el0" : "=r" (cntvct1)); // works
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidr));  // works
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // fails - illegal instruction

I couldn't get the_ReadStatusReg intrinsic to work. When I add intrin.h, it was complaining

include/intrin.h:12:15: fatal error: 'intrin.h' file not found
#include_next <intrin.h>
              ^~~~~~~~~~

Let me know if you want me to test something else.

@toor1245
Copy link
Contributor Author

@arkivm, thanks for the help!

toor1245 pushed a commit to toor1245/cpu_features that referenced this pull request Aug 18, 2022
To add support AARCH64 for other operating systems such as macOS(Apple M1), ios, FreeBSD, Windows has been moved common logic from 
`src/impl_aarch64_linux_or_android.c` to `src/impl_aarch64__base_implementation.inl`, namely:
* Definitions for introspection
* `Aarch64Info` kEmptyAarch64Info field

Removed include "internal/bit_utils.h" from `src/impl_aarch64_linux_or_android.c`, since this include was not used. Also, include `cpuinfo_aarch64` has been removed  from linux implementation and replaced with `impl_aarch64__base_implementation.inl`, this include will be used for all other operating system impl as well

Added a compilation check that matches the base X86 implementation

Refs: google#121
See also: google#150, google#186, google#204
toor1245 pushed a commit to toor1245/cpu_features that referenced this pull request Aug 18, 2022
To add support AARCH64 for other operating systems such as macOS(Apple M1), ios, FreeBSD, Windows has been moved common logic from 
`src/impl_aarch64_linux_or_android.c` to `src/impl_aarch64__base_implementation.inl`, namely:
* Definitions for introspection
* `Aarch64Info` kEmptyAarch64Info field

Removed include "internal/bit_utils.h" from `src/impl_aarch64_linux_or_android.c`, since this include was not used. Also, include `cpuinfo_aarch64` has been removed from linux implementation and replaced with `impl_aarch64__base_implementation.inl`, this include will be used for all other operating systems impl as well

Added a compilation check that matches the base X86 implementation

Refs: google#121
See also: google#150, google#186, google#204
@toor1245 toor1245 closed this Nov 3, 2022
toor1245 pushed a commit to toor1245/cpu_features that referenced this pull request Jan 22, 2023
To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl"

See also: google#268, google#284, google#186
toor1245 pushed a commit to toor1245/cpu_features that referenced this pull request Jan 22, 2023
To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl"

See also: google#268, google#284, google#186
gchatelet pushed a commit that referenced this pull request Feb 23, 2023
* Add Windows Arm64 support

To add Windows Arm64 support was added detection of features via Windows API function IsProcessorFeaturePresent. Added _M_ARM64 to detect CPU_FEATURES_AARCH64 macro on Windows. Added initial code for Windows Arm64 testing and provided test for Raspberry PI 4. We can't use "define_introspection_and_hwcaps.inl" as a common file for all operating systems due to msvc compiler error C2099: initializer is not a constant, so as a workaround for Windows I used separate "define_introspection.inl"

See also: #268, #284, #186

* [CMake] Add  windows_utils.h to PROCESSOR_IS_AARCH64

* Add detection of armv8.1 atomic instructions

* Update note on win-arm64 implementation and move to cpuinfo_aarch64.h

* Remove redundant #ifdef CPU_FEATURES_OS_WINDOWS

* Add note on FP/SIMD and Cryptographic Extension for win-arm64

* Add comments to Aarch64Info fields

Added comments to specify that implementer, part and variant we set 0 for Windows, since Win API does not provide a way to get information. For revision added comment that we use GetNativeSystemInfo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants