Changelog of @hackage/tiktoken 1.0.3

1.0.3

  • Fix source distribution

    The .tiktoken files for each encoding were not being included correctly in the source distribution uploaded to Hackage.

1.0.2

  • Correctly handle gaps in ranks

    The old implementation assumed that encoding don't have gaps in their ranks, but some do (especially near the end, typically reserved for special tokens). This change fixes the internal implementation to correctly handle those gaps.

  • Fix o200k_base regex to match upstream

    The upstream tiktoken package uses a flavor of regex that subtly differs from the Haskell pcre-light package. Specifically, they differ in whether they treat ideographic space (U+3000) as whitespace (which this change fixes).

    There may be other differences yet to be uncovered, but this is the only one that has arisen so far when comparing to upstream on a large corpus of text.

1.0.1

1.0.0

  • Initial release