Commit 3b9c0a5

[66_13] Reasonable herk->utf8 and utf8->herk
## Why
Try to fix the defects of the Cork encoding by introducing the Herk encoding with minimal changes.

Herk encoding is adopted in TMU serialization and deserialization. It is much better than `utf8->cork` and `cork->utf8`, because with those two conversions, two Unicode code points may map to the same Cork code.

This is a breaking change for the TMU format, which is why the version needs to be bumped. But it is not a big change.

## What
1. UTF8 from 00 to 1F should be encoded as `<#0>` to `<#1F>` in Herk encoding
2. UTF8 from A0 to FF should be encoded as `<#A0>` to `<#FF>` in Herk encoding if no Cork encoding is found
    + Will fix copy and paste of © https://symbl.cc/en/00A9-copyright-emoji/ when we use Herk encoding in copy and paste
4. Herk DF should be mapped to U+1E9E
5. Herk 17 should be mapped to U+200B
6. Herk 18 should be mapped to U+2080
7. Herk 1A should be mapped to U+0237
8. Herk 7F should be mapped to U+00AD
9. Bump to TMU 1.0.5

## How to test
### Unit tests on branch-1.2
Before:
```
(utf8->herk (string #\null)) => 
; *** failed ***
; expected result: <#0>

(herk->utf8 (string #\x18)) => ▒
; *** failed ***
; expected result: ₀

(herk->utf8 (string #\x1a)) => ▒
; *** failed ***
; expected result: ȷ

(utf8->herk (string #\x10)) => 
; *** failed ***
; expected result: <#10>

(utf8->herk (utf8->string #u(194 160))) =>  
; *** failed ***
; expected result: <#A0>

(herk->utf8 (string #\xdf)) => �
; *** failed ***
; expected result: ẞ

(utf8->herk (string #\xff)) => �
; *** failed ***
; expected result: <#FF>
```

Now TeXmacs/tests/66_13.scm should work fine!

### Test doc
Several test cases are listed in TeXmacs/tests/tmu/unicode_256.tmu

The bug lies in the TMU reader.
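The two escape rules in the "What" list can be sketched as follows. This is a minimal illustration in Python, not the TeXmacs implementation: the names `HERK_TO_UNICODE_EXCERPT`, `herk_escape`, and `encode_code_point` are hypothetical, the table is only the five pairs named in this commit message, and code points outside the two rules are deliberately left unhandled.

```python
from typing import Dict, Optional

# Excerpt of the Herk -> Unicode pairs named in this commit message.
# Assumption: this is NOT the full 256-entry table from the diff.
HERK_TO_UNICODE_EXCERPT: Dict[int, int] = {
    0x17: 0x200B,
    0x18: 0x2080,
    0x1A: 0x0237,
    0x7F: 0x00AD,
    0xDF: 0x1E9E,
}


def herk_escape(code_point: int) -> str:
    """Format a code point as <#XX>: uppercase hex, no zero padding."""
    return "<#{:X}>".format(code_point)


def encode_code_point(code_point: int,
                      herk_to_unicode: Dict[int, int]) -> Optional[str]:
    """Encode one Unicode code point under the rules listed above.

    Returns a one-character string holding the Herk byte when the table
    has an entry, an escape such as <#A0> for U+0000..U+001F and for
    U+00A0..U+00FF without a table entry, and None for anything else
    (out of scope for this sketch).
    """
    unicode_to_herk = {u: h for h, u in herk_to_unicode.items()}
    if code_point in unicode_to_herk:
        return chr(unicode_to_herk[code_point])
    if code_point <= 0x1F or 0xA0 <= code_point <= 0xFF:
        return herk_escape(code_point)
    return None


# U+00A9 (copyright sign) has no entry in the excerpt, so it is escaped.
print(encode_code_point(0x00A9, HERK_TO_UNICODE_EXCERPT))
```

Because the excerpt is incomplete, a code point such as U+00E9 would be escaped here even though the full table in this commit maps it to Herk E9; passing the complete table avoids that.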
1 parent 5e0fa5e commit 3b9c0a5

6 files changed, +1853 -8 lines changed

@@ -0,0 +1,266 @@
;; Two-way conversions between Cork and Unicode

;; (C) 2003 Felix Breuer, David Allouche
;; 2024 Darcy Shen
;;
;; This software falls under the GNU general public license version 3 or later.
;; It comes WITHOUT ANY WARRANTY WHATSOEVER. For details, see the file LICENSE
;; in the root directory or <http://www.gnu.org/licenses/gpl-3.0.html>.


("#00" "#60")
("#01" "#B4")
("#02" "#02C6") ; modifier letter circumflex accent
("#03" "#02DC") ; small tilde
("#04" "#A8")
("#05" "#02DD")
("#06" "#02DA")
("#07" "#02C7")
("#08" "#02D8")
("#09" "#AF")
("#0A" "#02D9")
("#0B" "#B8")
("#0C" "#02DB")
("#0D" "#201A")
("#0E" "#2039")
("#0F" "#203A")
("#10" "#201C")
("#11" "#201D")
("#12" "#201E")
("#13" "#AB")
("#14" "#BB")
("#15" "#2013")
("#16" "#2014")
("#17" "#200B")
("#18" "#2080")
("#19" "#0131")
("#1A" "#0237")
("#1B" "#FB00")
("#1C" "#FB01")
("#1D" "#FB02")
("#1E" "#FB03")
("#1F" "#FB04")
("#20" "#20")
("#21" "#21")
("#22" "#22")
("#23" "#23")
("#24" "#24")
("#25" "#25") ; percent sign
("#26" "#26")
("#27" "#27")
("#28" "#28")
("#29" "#29")
("#2A" "#2A")
("#2B" "#2B")
("#2C" "#2C")
("#2D" "#2D")
("#2E" "#2E")
("#2F" "#2F")
("#30" "#30")
("#31" "#31")
("#32" "#32")
("#33" "#33")
("#34" "#34")
("#35" "#35")
("#36" "#36")
("#37" "#37")
("#38" "#38")
("#39" "#39")
("#3A" "#3A")
("#3B" "#3B")
("#3C" "#3C") ; less than
("#3D" "#3D")
("#3E" "#3E") ; greater than
("#3F" "#3F")
("#40" "#40")
("#41" "#41")
("#42" "#42")
("#43" "#43")
("#44" "#44")
("#45" "#45")
("#46" "#46")
("#47" "#47")
("#48" "#48")
("#49" "#49")
("#4A" "#4A")
("#4B" "#4B")
("#4C" "#4C")
("#4D" "#4D")
("#4E" "#4E")
("#4F" "#4F")
("#50" "#50")
("#51" "#51")
("#52" "#52")
("#53" "#53")
("#54" "#54")
("#55" "#55")
("#56" "#56")
("#57" "#57")
("#58" "#58")
("#59" "#59")
("#5A" "#5A")
("#5B" "#5B")
("#5C" "#5C")
("#5D" "#5D")
("#5E" "#5E")
("#5F" "#5F")
("#60" "#2018") ; typographic backquote
("#61" "#61")
("#62" "#62")
("#63" "#63")
("#64" "#64")
("#65" "#65")
("#66" "#66")
("#67" "#67")
("#68" "#68")
("#69" "#69")
("#6A" "#6A")
("#6B" "#6B")
("#6C" "#6C")
("#6D" "#6D")
("#6E" "#6E")
("#6F" "#6F")
("#70" "#70")
("#71" "#71")
("#72" "#72")
("#73" "#73")
("#74" "#74")
("#75" "#75")
("#76" "#76")
("#77" "#77")
("#78" "#78")
("#79" "#79")
("#7A" "#7A")
("#7B" "#7B")
("#7C" "#7C")
("#7D" "#7D")
("#7E" "#7E")
("#7F" "#00AD")
("#80" "#0102")
("#81" "#0104")
("#82" "#0106")
("#83" "#010C")
("#84" "#010E")
("#85" "#011A")
("#86" "#0118")
("#87" "#011E")
("#88" "#0139")
("#89" "#013D")
("#8A" "#0141")
("#8B" "#0143")
("#8C" "#0147")
("#8D" "#014A")
("#8E" "#0150")
("#8F" "#0154")
("#90" "#0158")
("#91" "#015A")
("#92" "#0160")
("#93" "#015E")
("#94" "#0164")
("#95" "#0162")
("#96" "#0170")
("#97" "#016E")
("#98" "#0178")
("#99" "#0179")
("#9A" "#017D")
("#9B" "#017B")
("#9C" "#0132")
("#9D" "#0130")
("#9E" "#0111")
("#9F" "#A7")
("#A0" "#0103")
("#A1" "#0105")
("#A2" "#0107")
("#A3" "#010D")
("#A4" "#010F")
("#A5" "#011B")
("#A6" "#0119")
("#A7" "#011F")
("#A8" "#013A")
("#A9" "#013E")
("#AA" "#0142")
("#AB" "#0144")
("#AC" "#0148")
("#AD" "#014B")
("#AE" "#0151")
("#AF" "#0155")
("#B0" "#0159")
("#B1" "#015B")
("#B2" "#0161")
("#B3" "#015F")
("#B4" "#0165")
("#B5" "#0163")
("#B6" "#0171")
("#B7" "#016F")
("#B8" "#FF")
("#B9" "#017A")
("#BA" "#017E")
("#BB" "#017C")
("#BC" "#0133")
("#BD" "#A1")
("#BE" "#BF")
("#BF" "#A3")
("#C0" "#C0")
("#C1" "#C1")
("#C2" "#C2")
("#C3" "#C3")
("#C4" "#C4")
("#C5" "#C5")
("#C6" "#C6")
("#C7" "#C7")
("#C8" "#C8")
("#C9" "#C9")
("#CA" "#CA")
("#CB" "#CB")
("#CC" "#CC")
("#CD" "#CD")
("#CE" "#CE")
("#CF" "#CF")
("#D0" "#D0")
("#D1" "#D1")
("#D2" "#D2")
("#D3" "#D3")
("#D4" "#D4")
("#D5" "#D5")
("#D6" "#D6")
("#D7" "#0152")
("#D8" "#D8")
("#D9" "#D9")
("#DA" "#DA")
("#DB" "#DB")
("#DC" "#DC")
("#DD" "#DD")
("#DE" "#DE")
("#DF" "#1E9E")
("#E0" "#E0")
("#E1" "#E1")
("#E2" "#E2")
("#E3" "#E3")
("#E4" "#E4")
("#E5" "#E5")
("#E6" "#E6")
("#E7" "#E7")
("#E8" "#E8")
("#E9" "#E9")
("#EA" "#EA")
("#EB" "#EB")
("#EC" "#EC")
("#ED" "#ED")
("#EE" "#EE")
("#EF" "#EF")
("#F0" "#F0")
("#F1" "#F1")
("#F2" "#F2")
("#F3" "#F3")
("#F4" "#F4")
("#F5" "#F5")
("#F6" "#F6")
("#F7" "#0153")
("#F8" "#F8")
("#F9" "#F9")
("#FA" "#FA")
("#FB" "#FB")
("#FC" "#FC")
("#FD" "#FD")
("#FE" "#FE")
("#FF" "#DF")
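
The commit message motivates Herk by noting that with Cork two Unicode code points could map to the same code. A table like the one above can be checked for that property with a short script. The following is a rough sketch in Python, used only for illustration: `parse_table` and `build_maps` are hypothetical helper names, the entry syntax is assumed to be exactly the `("#XX" "#YYYY")` form shown in this diff, and `SAMPLE` is a handful of entries copied from the table rather than the whole file.

```python
import re
from typing import Dict, List, Tuple

# One table entry: ("#<herk hex>" "#<unicode hex>"), optionally followed
# by a ';' comment. Assumption: the file contains nothing more complex.
ENTRY = re.compile(r'\("#([0-9A-Fa-f]+)"\s+"#([0-9A-Fa-f]+)"\)')


def parse_table(text: str) -> List[Tuple[int, int]]:
    """Return (herk_code, unicode_code_point) pairs found in the text."""
    pairs = []
    for line in text.splitlines():
        code = line.split(";", 1)[0]  # drop a trailing or full-line comment
        match = ENTRY.search(code)
        if match:
            pairs.append((int(match.group(1), 16), int(match.group(2), 16)))
    return pairs


def build_maps(pairs: List[Tuple[int, int]]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Build both directions and reject any entry that breaks one-to-one."""
    herk_to_unicode: Dict[int, int] = {}
    unicode_to_herk: Dict[int, int] = {}
    for herk, uni in pairs:
        if herk in herk_to_unicode:
            raise ValueError("duplicate Herk code #{:02X}".format(herk))
        if uni in unicode_to_herk:
            raise ValueError("duplicate Unicode target U+{:04X}".format(uni))
        herk_to_unicode[herk] = uni
        unicode_to_herk[uni] = herk
    return herk_to_unicode, unicode_to_herk


SAMPLE = """
;; a few entries copied from the table above
("#00" "#60")
("#02" "#02C6") ; modifier letter circumflex accent
("#60" "#2018") ; typographic backquote
("#DF" "#1E9E")
("#FF" "#DF")
"""

forward, backward = build_maps(parse_table(SAMPLE))
```

With the sample, `forward[0x60]` is `0x2018` while `backward[0x60]` is `0x00`, which shows why the two directions need separate lookups: the same number can be a Herk code on one side and a Unicode code point on the other.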
