This repository was archived by the owner on May 16, 2024. It is now read-only.

Foreign definitions for modules Common, CodeUnits and Unsafe#1

Open
postsolar wants to merge 14 commits into purescm:v6.0.1-scm from postsolar:scm/ffi
@postsolar postsolar commented Oct 30, 2023

These modules come together because all of them are needed to run a minimal subset of the tests. The imports depend on the IrRegex PR being merged, as well as the SRFI 152 PR for purescm/purescm.


@anttih anttih left a comment

Small stylistic nitpick: could you use square brackets in the let form for the binding pairs?

Comment thread src/Data/String/CodeUnits.ss Outdated
Comment thread src/Data/String/CodeUnits.ss Outdated
Comment thread src/Data/String/CodeUnits.ss Outdated
Comment thread src/Data/String/Unsafe.ss Outdated

@anttih anttih left a comment

With this we also arrive at the spot where we need to figure out how to deal with the fact that JS strings are UTF-16 encoded code units and Chez uses unicode code points:

> "😁a😁😁".charAt(1)
'\ude01'
> (string-ref "😁a😁😁" 1)
#\a

For example, the CodeUnits module now deals with code points rather than code units. We could actually make it work with code units by converting the string to a bytevector with string->utf16 and indexing into that.
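A minimal sketch of that idea, assuming plain Chez Scheme with no import prefix. The helper names `code-unit-length` and `code-unit-at` are hypothetical and not part of this PR; only `string->utf16`, `bytevector-length` and `bytevector-u16-ref` are standard procedures. The code unit is returned as an integer because a lone surrogate such as `#xDE01` is not a valid Scheme character.

```scheme
(import (chezscheme))

;; Hypothetical helper: number of UTF-16 code units in s.
;; Each code unit occupies two bytes in the encoded bytevector.
(define (code-unit-length s)
  (quotient (bytevector-length (string->utf16 s (endianness big))) 2))

;; Hypothetical helper: the i-th UTF-16 code unit of s as an integer,
;; or #f when i is out of range.
(define (code-unit-at s i)
  (let ([bv (string->utf16 s (endianness big))])
    (and (>= i 0)
         (< (* 2 i) (bytevector-length bv))
         ;; Read with the same endianness that was used for encoding.
         (bytevector-u16-ref bv (* 2 i) (endianness big)))))
```

With the string from the example above, `(code-unit-at "😁a😁😁" 1)` yields `#xDE01`, matching the JS `charAt(1)` result, while `(code-unit-length "😁a😁😁")` is 7 rather than the 4 reported by `string-length`. Re-encoding the whole string on every call is linear in its length, so a real implementation would probably want to avoid repeating it.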

Comment thread src/Data/String/CodeUnits.ss Outdated
Comment thread src/Data/String/Unsafe.ss Outdated
@postsolar
Author

With this we also arrive at the spot where we need to figure out how to deal with the fact that JS strings are UTF-16 encoded code units and Chez uses unicode code points:

Something like this?

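  (define _charAt
    (lambda (just)
      (lambda (nothing)
        (lambda (i)
          (lambda (s)
            (let* ([cus (bvs:string->utf16 s)] ; UTF-16 bytes, two per code unit
                   [ix (* i 2)])               ; byte offset of code unit i
              ;; Bounds-check against the UTF-16 length rather than
              ;; (string-length s), which counts code points.
              ;; Assumes bytevector-length is also imported under the bvs: prefix.
              (if (and (>= i 0) (< ix (bvs:bytevector-length cus)))
                (let ([v (bvs:make-bytevector 2)]
                      [tx (make-transcoder (utf-16-codec))])
                  ;; Copy the selected code unit into a 2-byte buffer
                  (bvs:bytevector-u16-set!
                    v
                    0
                    (bvs:bytevector-u16-ref cus ix (bvs:native-endianness))
                    (bvs:native-endianness))
                  ;; Decode the buffer back into a one-character string
                  (just (string-ref (bytevector->string v tx) 0)))
                nothing)))))))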

Doesn't make sense for them to be commented out if the module they're testing is under review
…countPrefix` in terms of CUs

This temporarily makes the tests fail because the rest of the functions don't expect `length` to return the values it returns now
It was doing it wrt the unicode string's length rather than utf-16 bytevector length
In previous commit I forgot to make them operate on CUs
…artingAt"

Actually they were just fine.

This reverts commit 7e2715d.
@postsolar
Author

Ok I think it's ready.

@postsolar
Author

A note to myself: for SRFI 152, there is potentially a huge performance gain in using the substring primitive instead of take and, especially, drop.
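A rough sketch of what that could look like, assuming plain Chez Scheme. The names `take-chars` and `drop-chars` are hypothetical stand-ins rather than the SRFI 152 procedures themselves, and both index by Scheme characters (code points), not UTF-16 code units.

```scheme
(import (chezscheme))

;; Hypothetical stand-in for a "take": the first n characters of s.
(define (take-chars s n)
  (substring s 0 n))

;; Hypothetical stand-in for a "drop": everything after the first n characters.
(define (drop-chars s n)
  (substring s n (string-length s)))
```

Each call is a single `substring` copy, so `(take-chars "hello" 2)` gives `"he"` and `(drop-chars "hello" 2)` gives `"llo"`. Whether this is actually faster than the SRFI 152 implementation in use would need to be measured.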

2 participants