Foreign definitions for modules Common, CodeUnits and Unsafe by postsolar · Pull Request #1 · purescm/purescript-strings · GitHub

postsolar · 2023-10-30T18:35:21Z

These changes come together because all of them are needed to run a minimal subset of the tests. The imports depend on the IrRegex PR being merged, as well as the SRFI 152 PR for purescm/purescm.

anttih

Small stylistic nitpick: could you use square brackets in the let form for the binding pairs?

anttih

With this we also arrive at the point where we need to figure out how to deal with the fact that JS strings are sequences of UTF-16 code units, whereas Chez strings are sequences of Unicode code points:

> "😁a😁😁".charAt(1)
'\ude01'

> (string-ref "😁a😁😁" 1)
#\a

For example, the CodeUnits module currently deals with code points rather than code units. We could make it work with code units by converting the string to a bytevector with string->utf16 and indexing into that.
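To make the mismatch concrete, here is a minimal JavaScript sketch of the idea, not the PR's implementation. It starts from a list of code points (the view Chez has of a string), encodes them as UTF-16 code units, and indexes into the result. The helper names `codePointsToUtf16` and `codeUnitAt` are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: encode an array of Unicode code points as UTF-16
// code units, roughly what string->utf16 produces on the Chez side.
function codePointsToUtf16(codePoints) {
  const units = [];
  for (const cp of codePoints) {
    if (cp < 0x10000) {
      units.push(cp); // BMP code point: one code unit
    } else {
      const offset = cp - 0x10000;
      units.push(0xd800 + (offset >> 10));   // high surrogate
      units.push(0xdc00 + (offset & 0x3ff)); // low surrogate
    }
  }
  return units;
}

// Hypothetical helper: index by code unit, checking bounds against the
// number of code units rather than the number of code points.
function codeUnitAt(codePoints, i) {
  const units = codePointsToUtf16(codePoints);
  return i >= 0 && i < units.length ? units[i] : null;
}

const codePoints = Array.from("😁a😁😁", (ch) => ch.codePointAt(0));

// Code-point view (what string-ref sees): index 1 is "a".
console.log(String.fromCodePoint(codePoints[1])); // a

// Code-unit view (what charAt sees): index 1 is the low surrogate.
console.log(codeUnitAt(codePoints, 1).toString(16)); // de01
```

The four code points expand to seven code units, which is why the same index can name different things in the two views.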

postsolar · 2023-11-03T15:06:38Z

> With this we also arrive at the point where we need to figure out how to deal with the fact that JS strings are sequences of UTF-16 code units, whereas Chez strings are sequences of Unicode code points:

Something like this?

  (define _charAt
    (lambda (just)
      (lambda (nothing)
        (lambda (i)
          (lambda (s)
            (if (< i (string-length s))
              (let
                ([cus (bvs:string->utf16 s)]
                 [v (bvs:make-bytevector 2)]
                 [ix (* i 2)]
                 [tx (make-transcoder (utf-16-codec))])
                (bvs:bytevector-u16-set!
                  v
                  0
                  (bvs:bytevector-u16-ref cus ix (bvs:native-endianness))
                  (bvs:native-endianness))
                (just (string-ref (bytevector->string v tx) 0)))
              nothing))))))

postsolar · 2023-11-04T00:22:39Z

Ok I think it's ready.

postsolar · 2023-11-05T19:43:33Z

A note to myself: for SRFI 152, there is potentially a huge performance gain in using the substring primitive instead of take and, especially, drop.

postsolar added 5 commits October 30, 2023 19:23

Update workflows

10aa9d4

Add a note about lone surrogates in Test.Data.String.CodePoints

1a4543a

Add spago hash

5ffa4e0

Comment out tests outside the scope of this PR

1a216ee

Add foreign definitions for CodeUnits, Common and Unsafe

3b014ad

These come together because they were all needed to run the minimal test subset.

anttih reviewed Nov 2, 2023

Comment thread src/Data/String/CodeUnits.ss Outdated

Comment thread src/Data/String/CodeUnits.ss Outdated

Comment thread src/Data/String/CodeUnits.ss Outdated

Comment thread src/Data/String/Unsafe.ss Outdated

anttih reviewed Nov 3, 2023

Comment thread src/Data/String/CodeUnits.ss Outdated

Comment thread src/Data/String/Unsafe.ss Outdated

postsolar added 2 commits November 3, 2023 14:06

Brackets + clearer namings

e7c037a

Remove all error handling from Data.String.Unsafe FFI

f15508c

postsolar added 7 commits November 3, 2023 19:12

Bring back tests for Data.String.CodeUnits

89acc59

Doesn't make sense for them to be commented out if the module they're testing is under review

Data.String.CodeUnits: Implement _charAt, _toChar, length and `countPrefix` in terms of CUs

ff66674

This temporarily makes tests fail because the rest of the functions don't expect `length` to return the values it returns now.

Fix bounds checking in Data.String.CodeUnits._charAt

3fcc11e

The bounds check was done against the Unicode string's length rather than the UTF-16 bytevector's length.
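The difference between the two lengths can be sketched in JavaScript as follows. This is an illustration only, under the assumption that the intended check is "index is within the number of UTF-16 code units"; the helper names are hypothetical and the negative-index guard is an addition of this sketch.

```javascript
// Hypothetical helpers contrasting the two ways of measuring a string.
const countCodePoints = (s) => Array.from(s).length; // the count string-length gives in Chez
const countCodeUnits = (s) => s.length;              // the count JS `length` gives

// Bounds check against the code-point count: too strict for code-unit indices.
const inBoundsByCodePoints = (s, i) => i >= 0 && i < countCodePoints(s);

// Bounds check against the code-unit count (half the UTF-16 byte length).
const inBoundsByCodeUnits = (s, i) => i >= 0 && i < countCodeUnits(s);

const sample = "😁a😁😁";
console.log(countCodePoints(sample), countCodeUnits(sample)); // 4 7
console.log(inBoundsByCodePoints(sample, 5)); // false, although code unit 5 exists
console.log(inBoundsByCodeUnits(sample, 5));  // true
```

With the code-point check, any code-unit index from 4 to 6 in this sample would be rejected even though it refers to a real code unit.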

Off-by-one error in countPrefix

190cb33

Implement the rest of Data.String.CodeUnits in terms of CUs

0a71262

Fix indexOf, lastIndexOf, indexOfStartingAt and lastIndexOfStartingAt

7e2715d

In the previous commit I forgot to make them operate on CUs.

Revert "Fix indexOf, lastIndexOf, indexOfStartingAt and lastIndexOfStartingAt"

404720b

Actually they were just fine. This reverts commit 7e2715d.

postsolar wants to merge 14 commits into purescm:v6.0.1-scm from postsolar:scm/ffi

2 participants
