
Check for unless darwin before including aarch64 SVE support#2

Open
nogginly wants to merge 2 commits into spider-gazelle:main from nogginly:fix-SIMD.instance-macOS

@nogginly

Fixes #1 by including SVE support for aarch64 only when the target platform is not Darwin (macOS).

Here's why (as far as I can make out):

  • Apple Silicon only supports NEON
  • The initialization pulls in the SVE and SVE2 implementations, expecting to be able to discard them in the case statement
  • However, this still compiles instructions that are simply not available on Apple Silicon

The fix wraps all of those SVE/SVE2 cases and requires in {% unless flag?(:darwin) %} ... {% end %} so they are skipped entirely on macOS.
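A minimal sketch of the guard described above, for illustration only. The `backends` array and its string labels are hypothetical stand-ins, not the shard's real code; only `flag?(:aarch64)` and `flag?(:darwin)` are actual Crystal compile-time flags.

```crystal
# Hypothetical illustration: `backends` is not part of the shard.
backends = ["scalar"]

{% if flag?(:aarch64) %}
  backends << "neon"
  {% unless flag?(:darwin) %}
    # In the real shard, the SVE/SVE2 requires and case branches would sit here.
    # On macOS this block is dropped at macro expansion time, so none of its
    # code (including any inline asm it contains) is ever compiled.
    backends << "sve" << "sve2"
  {% end %}
{% end %}

puts backends.join(", ")
```

Because the check happens at compile time, an Apple Silicon build never sees the SVE code at all, which is different from relying on a runtime `case` branch that is compiled but never taken.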

FYI: I originally checked for Linux, but then looked it up and it appears Windows on aarch64 may also support SVE/SVE2. The one thing I know is that Apple Silicon doesn't, so I converted the checks to unless Darwin.

@nogginly
Author

@stakach FYI: this fixes #1, and once merged I'll be able to use SIMD.instance.

I do have a new problem: I am unable to compile the benchmarker. Building it now fails with a different error.

$ shards build benchmarker
Dependencies are satisfied
Building: benchmarker
Error target benchmarker failed to compile:
error: <inline asm>:4:15: invalid operand for instruction
         umax v0.2d, v0.2d, v1.2d

This instruction usage is found in two methods:

  • neon.cr:1779 in #clamp...UInt64...
  • neon.cr:1863 in #max...UInt64...

From searching the web (keep in mind I don't know ARM assembly), it sounds like ARM doesn't support the .2d arrangement for UMAX. Does that make sense?

I don't run into this in practice because I'm using the shard for Float32 vectors.

If you're OK with it, we can merge this so SIMD.instance works on Apple Silicon. I can raise a separate issue for the benchmarker failing on Mac and we can tackle it separately (since this is an existing issue).

@nogginly
Author

nogginly commented Apr 27, 2026

@stakach, I decided to try commenting out max and clamp for UInt64 and replacing them with a scalar fallback. This then led to more errors related to invalid smax and umin ARM instructions using .2d widths. In the end I commented out

  • max(a) for UInt64, Int64
  • min(a) for UInt64, Int64
  • clamp(d,a,lo,hi) for UInt64, Int64

At the top I added the following fallbacks to scalar:

  • max(a:UInt64)
  • max(a:Int64)
  • min(a:UInt64)
  • min(a:Int64)
  • clamp(a:Array(T)...) : T forall T
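
A rough sketch of what scalar fallbacks like these could look like. The module name `ScalarFallback`, the `Slice(T)` parameters, and the exact signatures are assumptions made for illustration; the actual definitions added to neon.cr are not shown in this thread and may differ.

```crystal
# Hypothetical scalar fallbacks; names and signatures are illustrative only.
module ScalarFallback
  extend self

  # Largest element, found with a plain loop (no 64-bit lane SIMD needed).
  def max(a : Slice(T)) : T forall T
    raise ArgumentError.new("empty input") if a.empty?
    result = a[0]
    a.each { |v| result = v if v > result }
    result
  end

  # Smallest element, same approach.
  def min(a : Slice(T)) : T forall T
    raise ArgumentError.new("empty input") if a.empty?
    result = a[0]
    a.each { |v| result = v if v < result }
    result
  end

  # Writes each element of `a`, limited to lo..hi, into `dst`.
  def clamp(dst : Slice(T), a : Slice(T), lo : T, hi : T) : Nil forall T
    raise ArgumentError.new("dst too small") if dst.size < a.size
    a.each_with_index { |v, i| dst[i] = v.clamp(lo, hi) }
  end
end

values = Slice[1_u64, 9_u64, 4_u64]
dst = Slice(UInt64).new(values.size, 0_u64)

puts ScalarFallback.max(values) # 9
puts ScalarFallback.min(values) # 1
ScalarFallback.clamp(dst, values, 3_u64, 8_u64)
puts dst.to_a # [3, 8, 4]
```

Since the methods are generic over `T`, the same bodies would cover both UInt64 and Int64 without emitting any `.2d` min/max instructions by hand.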

After all this 🥳 I am now able to compile benchmarker on my MacBook.

Two questions:

  1. Shall I commit the changes to neon.cr in this PR, or proceed with a separate issue and PR?
  2. Do you mind if I leave the commented-out code in? The 32-bit ops (via .4s) work, but I need more time to look into the 64-bit ones.

@nogginly nogginly changed the title Check for linux before including aarch64 SVE support Check for unless darwin before including aarch64 SVE support Apr 27, 2026
@nogginly
Author

Sooooooo ... one more thing: I ran benchmarker and it looks like using NEON is slower than Scalar most of the time.

Prior to Crystal 1.20, I distinctly remember that not using SIMD was very slow. But now, using Crystal 1.20, Scalar is faster. I see LLVM got a lift in that release (I'm on 22.x) ... does this mean auto-vectorization is working and we don't need to hand-roll SIMD?

Development

Successfully merging this pull request may close these issues.

Inline asm error calling SIMD.instance on macOS
