Randall Randall
2004-08-25 01:06:23 UTC
I've started on a small library that simplifies Unicode handling.
It's currently intended to be fully portable Common Lisp, and
the functions it defines should conform to the CLHS's definitions.
You can find it at
http://www.randallsquared.com/download/unicode-0.99rc1.lisp .
In order to try it out, you'll need to get
http://www.randallsquared.com/download/tables.lisp
and data from the Unicode consortium, at
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt .
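For anyone curious what the library pulls out of that file: UnicodeData.txt has one record per line, with 15 semicolon-separated fields. Field 0 is the code point in hex, field 1 the name, field 2 the general category, and fields 12, 13 and 14 (counting from 0) the simple uppercase, lowercase and titlecase mappings, which are empty when there is none. The sketch below reads one such record. SPLIT-SEMICOLONS and PARSE-UNICODE-DATA-LINE are hypothetical helpers written for illustration; they are not functions from unicode-0.99rc1.lisp or tables.lisp, and the sketch ignores the "<..., First>"/"<..., Last>" pairs the file uses to stand for whole ranges.

```lisp
(defun split-semicolons (line)
  "Split LINE at each semicolon, keeping empty fields (including a trailing one)."
  (loop with start = 0
        for pos = (position #\; line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun parse-unicode-data-line (line)
  "Hypothetical helper: return (values code-point name general-category
simple-uppercase simple-lowercase) for one UnicodeData.txt record.
Empty case-mapping fields come back as NIL."
  (let ((fields (split-semicolons line)))
    (flet ((hex-or-nil (s)
             ;; Case-mapping fields are either empty or a hex code point.
             (if (string= s "") nil (parse-integer s :radix 16))))
      (values (parse-integer (nth 0 fields) :radix 16)
              (nth 1 fields)
              (nth 2 fields)
              (hex-or-nil (nth 12 fields))
              (hex-or-nil (nth 13 fields))))))
```

For example, the record for U+0041 yields #x41, "LATIN CAPITAL LETTER A", "Lu", NIL (no uppercase mapping) and #x61.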
This basically has all the things I had in mind for 1.0, to wit:
- import and export of UTF-8, UTF-16*, UTF-32*, US-ASCII, and ISO 8859-[1-16];
- most string and character functions implemented;
- the most basic 15100 characters (if UnicodeData.txt is supplied).
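As an illustration of what the UTF-16 side of import/export involves: code points up to U+FFFF (other than the surrogate range) become a single 16-bit unit, while code points from U+10000 to U+10FFFF are split into a high/low surrogate pair, and the units are then written as octets in big- or little-endian order. This is a minimal sketch under the assumption that code points are plain integers; CODE-POINT->UTF-16-UNITS and UTF-16-UNITS->OCTETS are hypothetical names, not the library's API, and a real UTF-16 exporter may also need to emit or honour a byte order mark.

```lisp
(defun code-point->utf-16-units (cp)
  "Hypothetical helper: return a list of one or two 16-bit UTF-16 code units for CP."
  (cond ((or (<= #xD800 cp #xDFFF) (> cp #x10FFFF))
         ;; Surrogates and values above U+10FFFF are not scalar values.
         (error "~X is not a Unicode scalar value." cp))
        ((< cp #x10000) (list cp))
        (t
         ;; Subtract #x10000, then spread the 20 remaining bits over
         ;; a high surrogate (top 10 bits) and a low surrogate (bottom 10).
         (let ((v (- cp #x10000)))
           (list (+ #xD800 (ash v -10))
                 (+ #xDC00 (ldb (byte 10 0) v)))))))

(defun utf-16-units->octets (units &key (big-endian t))
  "Hypothetical helper: serialize 16-bit UNITS as a list of octets,
big-endian (UTF-16BE) by default, little-endian (UTF-16LE) otherwise."
  (loop for u in units
        for hi = (ldb (byte 8 8) u)
        for lo = (ldb (byte 8 0) u)
        if big-endian append (list hi lo)
        else append (list lo hi)))
```

U+1F600, for instance, becomes the pair #xD83D #xDE00, which is the octets #xD8 #x3D #xDE #x00 in big-endian order.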
Things it doesn't have yet, but that are planned for after 1.0:
- conversions: SCSU, &-escaped ASCII, CMUCL characters, OpenMCL characters, etc.;
- inclusion of the other Unicode characters;
- reworking at least some errors into cerrors;
- convenience helpers for reading and writing files and other streams;
- handling of one-to-many case mappings when *ansi-compliant* is NIL;
- maybe more printer methods, though I'm not very familiar with those.
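The one-to-many case item exists because CHAR-UPCASE must return a single character, yet Unicode's SpecialCasing.txt maps some characters to several: LATIN SMALL LETTER SHARP S (U+00DF) uppercases to "SS", and the ligature U+FB01 to "FI". A string-level function can honour those mappings if it is allowed to change the string's length, which is presumably why the feature is tied to *ansi-compliant* being NIL. The sketch below works on lists of integer code points; *SPECIAL-UPCASE* and UPCASE-CODE-POINTS are hypothetical names, and the table is a two-entry excerpt, not the full SpecialCasing.txt data.

```lisp
(defparameter *special-upcase*
  ;; Hypothetical excerpt of unconditional mappings from SpecialCasing.txt.
  '((#x00DF #x0053 #x0053)    ; LATIN SMALL LETTER SHARP S -> "SS"
    (#xFB01 #x0046 #x0049))   ; LATIN SMALL LIGATURE FI    -> "FI"
  "Alist from a code point to the list of code points it uppercases to.")

(defun upcase-code-points (code-points simple-upcase)
  "Hypothetical helper: uppercase CODE-POINTS, letting one code point expand
into several. SIMPLE-UPCASE maps one code point to one code point and is
used whenever *SPECIAL-UPCASE* has no entry."
  (loop for cp in code-points
        for special = (assoc cp *special-upcase*)
        ;; COPY-LIST so the result never shares structure with the literal table.
        if special append (copy-list (rest special))
        else collect (funcall simple-upcase cp)))
```

With an ASCII-only SIMPLE-UPCASE, the three code points of "sße" come back as the four code points of "SSSE".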
A trivial example (trivial because this file contains no multi-byte UTF-8
sequences):
* (with-open-file (f "/Users/randall/unicode-test.txt"
                     :element-type '(unsigned-byte 8))
    (let ((utf8 (make-array 13)))
      (read-sequence utf8 f)
      (utf-8->internal utf8)))
; =>
#(#\U+0054 #\U+0068 #\U+0069 #\U+0073 #\U+0020 #\U+0069 #\U+0073
  #\U+0020 #\U+0061 #\U+0020 #\U+0075 #\U+006E #\U+0069)
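Every octet in that file is below #x80, so each one maps straight to a character. A file with multi-byte sequences exercises more of the decoder: the lead octet says how many octets follow (RFC 3629 allows #xC2-#xDF for two, #xE0-#xEF for three, #xF0-#xF4 for four), and each continuation octet must have the form 10xxxxxx and contributes six bits. The following is a simplified sketch of that step, not the library's UTF-8->INTERNAL: DECODE-UTF-8 and UTF-8-SEQUENCE-LENGTH are hypothetical names, the result is a list of integer code points rather than the library's characters, and overlong three- and four-octet forms and encoded surrogates are not rejected.

```lisp
(defun utf-8-sequence-length (lead)
  "Hypothetical helper: number of octets in the sequence introduced by LEAD."
  (cond ((< lead #x80) 1)
        ((<= #xC2 lead #xDF) 2)
        ((<= #xE0 lead #xEF) 3)
        ((<= #xF0 lead #xF4) 4)
        (t (error "Invalid UTF-8 lead octet ~X." lead))))

(defun decode-utf-8 (octets)
  "Hypothetical helper: decode a vector of octets into a list of code points.
Checks lead and continuation octets only (see the caveats above)."
  (let ((i 0) (n (length octets)) (result '()))
    (loop while (< i n) do
      (let* ((lead (aref octets i))
             (len (utf-8-sequence-length lead))
             ;; Keep only the payload bits of the lead octet.
             (cp (ecase len
                   (1 lead)
                   (2 (ldb (byte 5 0) lead))
                   (3 (ldb (byte 4 0) lead))
                   (4 (ldb (byte 3 0) lead)))))
        (when (> (+ i len) n)
          (error "Truncated UTF-8 sequence at offset ~D." i))
        (loop for j from (1+ i) below (+ i len)
              do (let ((octet (aref octets j)))
                   ;; Continuation octets must look like 10xxxxxx.
                   (unless (= (ldb (byte 2 6) octet) #b10)
                     (error "Bad continuation octet ~X at offset ~D." octet j))
                   (setf cp (logior (ash cp 6) (ldb (byte 6 0) octet)))))
        (push cp result)
        (incf i len)))
    (nreverse result)))
```

For example, #(#xC3 #xA9) decodes to (#xE9), #(#xE2 #x82 #xAC) to (#x20AC), and #(#xF0 #x9F #x98 #x80) to (#x1F600).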
If this is useful for anyone, I'd appreciate bug reports and feature
requests!
--
Randall Randall <***@randallsquared.com>
Property law should use #'EQ , not #'EQUAL .