Broken link in content - international character handling

The problems below follow from a simple upload of content for a domain named safe://题题, which links to a file named 花园.jpg. The link ends up broken, likely because of the filename and the handling of characters outside the Western-centric ISO-8859-1 range.

  1. dog is barking in percent-encoding
  2. the image is broken

This test follows from the thought about IDN handling in the browser, and the breakage is perhaps a consequence of that. See Tools for Everyone - international and beyond.

Simple upload with

$ safe files put --recursive ./ztest/
FilesContainer created at: "safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc"
+  ./ztest/index.html  safe://hbhybyn4s5osw6z64rhrhb7b7aw5rz59ey3zpkzm946fjbgpq6kio6fawm 
+  ./ztest/style.css   safe://hbhyyyd5i4fhtij9ijqa8oc8p8uze5y47qiqhwx3tfoo1eobchz1m7mr63 
+  ./ztest/花园.jpg    safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz 

$ safe nrs create 题题 --link safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc?v=0
New NRS Map for "safe://题题" created at: "safe://hnyydyw3wnegkg1t39tnoy3crekzf5gospopsjjycxwth6jax7aey6nw6cbqh"
+  题题  safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc?v=0 

So, as noted with the browser using IDN atm, access to the NRS name is fixable by also registering the IDN (Punycode) form:

$ safe nrs create xn--q15aa --link safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc?v=0
New NRS Map for "safe://xn--q15aa" created at: "safe://hnyydywqqmbw8rsdhcpzbyk7j14rbm5jxnfia4rm4358nmbgr67k7i1hhobqh"
+  xn--q15aa  safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc?v=0 
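
For reference, the xn-- form registered above is the Punycode (IDNA) encoding of the Unicode name. A minimal Python sketch of that conversion follows. It assumes the browser applies a standard IDNA-style ToASCII step to the host; Python's built-in idna codec implements the older IDNA 2003 rules, so it only approximates whatever the browser actually does.

```python
# Convert between the Unicode name and its ASCII-compatible (Punycode) form.
# Assumption: the browser performs an IDNA-style conversion on the host part.
unicode_name = "题题"

ascii_name = unicode_name.encode("idna").decode("ascii")
print(ascii_name)                    # xn--q15aa

# Decoding the ASCII form gives the original name back.
round_trip = ascii_name.encode("ascii").decode("idna")
print(round_trip == unicode_name)    # True
```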

Oddly, dog is talking in percent-encoding, e.g. PublicName: "%E6%97%A0%E9%A2%98",
and surely that should not be required?

$ safe dog safe://题题
Native data type: PublishedSeqAppendOnlyData
Version: 0
Type tag: 1100
XOR name: 0xb7536b8bfb7881d328a2b32ce73d04e29958f67d3bc57c6b07977dde1630d2c0
XOR-URL: safe://hnyynys5ig4hm9phedw3ewk313337yutj1s8sxw7hk9dmy6mz5zosgdjcybnc?v=0

Resolved using NRS Map:
PublicName: "%E6%97%A0%E9%A2%98"
Container XOR-URL: safe://hnyydywmu43iru1iy8pha74hhndheb1mkjy8w8m9qo4ojxaebxd3xsx5qabqh
Native data type: PublishedSeqAppendOnlyData
Type tag: 1500
XOR name: 0x173d66a49caa03b798eeb9c10f880c96a480f43afee86a097e10178f2fb3f6ec
Version: 0
+--------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| NRS name/subname   | Created              | Modified             | Link                                                                     |
+--------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| %E6%97%A0%E9%A2%98 | 2020-04-19T11:18:13Z | 2020-04-19T11:18:13Z | safe://hnyynys5ig4hm9phedw3ewk313337yutj1s8sxw7hk9dmy6mz5zosgdjcybnc?v=0 |
+--------------------+----------------------+----------------------+--------------------------------------------------------------------------+
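
The PublicName that dog prints is simply the UTF-8 bytes of a name, percent-encoded. A small Python sketch using only the standard library shows the round trip. It illustrates the encoding itself and makes no claim about how safe-cli handles it internally.

```python
from urllib.parse import quote, unquote

# Value copied from the dog output above.
public_name = "%E6%97%A0%E9%A2%98"

decoded = unquote(public_name)         # percent-decoding, UTF-8 by default
print(decoded)                         # 无题
print(quote(decoded) == public_name)   # True
```

Note that this particular value decodes to 无题 rather than 题题, so the dog output shown may have been captured for a differently named test site.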

But now, visiting safe://题题 or its equivalent safe://xn--q15aa,
the image is broken, perhaps because the image is named 花园.jpg.

Just to note, the image itself is uploaded and available at its XOR-URL, safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz

This is a worry because it suggests that not just the domain but the content will need to be fixed for IDN in some way.

It looks like the filename was stored correctly in the FilesContainer, but the name is not properly encoded/decoded when it is looked up. So the image file is not corrupted; what needs to be fixed is accessing it through its path in the FilesContainer. It also seems that the encoding in NRS is broken. I was able to download the image from safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz

Not entirely sure I follow, as cat suggests the file is in the right place for what index.html requests as src="./花园.jpg":

$ safe cat safe://题题
Files of FilesContainer (version 0) at "safe://题题":
+-------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| Name        | Size   | Created              | Modified             | Link                                                              |
+-------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /index.html | 790    | 2020-04-19T13:00:09Z | 2020-04-19T13:00:09Z | safe://hbhybyn4s5osw6z64rhrhb7b7aw5rz59ey3zpkzm946fjbgpq6kio6fawm |
+-------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /style.css  | 244    | 2020-04-19T13:00:09Z | 2020-04-19T13:00:09Z | safe://hbhyyyd5i4fhtij9ijqa8oc8p8uze5y47qiqhwx3tfoo1eobchz1m7mr63 |
+-------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /花园.jpg   | 382631 | 2020-04-19T13:00:09Z | 2020-04-19T13:00:09Z | safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz |
+-------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
Yeah, I’m not exactly sure where the bug is, but encoding/decoding seems to happen inconsistently when looking things up within NRS and/or FilesContainers. This is what I meant:

$ safe cat safe://题题/花园.jpg
[2020-04-19T16:06:35Z ERROR safe] safe-cli error: [Error] ContentError - No data found for path "/%E8%8A%B1%E5%9B%AD.jpg/" on the FilesContainer at "safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc/%E8%8A%B1%E5%9B%AD.jpg?v=0"
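
The error suggests the lookup is done with the percent-encoded path while the FilesContainer holds the raw UTF-8 name. Here is a hypothetical Python sketch of that mismatch. The dict and the lookup function are stand-ins for illustration only, not the real safe-api code, and the requested path is taken from the error message without its trailing slash.

```python
from urllib.parse import unquote

# Stand-in for the FilesContainer listing shown by `safe cat` (names are raw UTF-8).
files_container = {
    "/index.html": "safe://hbhybyn4s5osw6z64rhrhb7b7aw5rz59ey3zpkzm946fjbgpq6kio6fawm",
    "/style.css": "safe://hbhyyyd5i4fhtij9ijqa8oc8p8uze5y47qiqhwx3tfoo1eobchz1m7mr63",
    "/花园.jpg": "safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz",
}

def lookup(path, decode_first):
    """Look up a URL path, optionally percent-decoding it first."""
    key = unquote(path) if decode_first else path
    return files_container.get(key)

requested = "/%E8%8A%B1%E5%9B%AD.jpg"

print(lookup(requested, decode_first=False))  # None: no key with that encoded name
print(lookup(requested, decode_first=True))   # the link stored for /花园.jpg
```

If this guess is right, decoding the path once before the lookup (or storing names encoded consistently) would be enough to make the image resolve.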
The percent-encoding is also a feature, because it seems it allows impossible code points to be handled.

So, this is an NRS name set to U+FFFE. I note the character is retained here on the forum, which is good to see, since it shows that such handling is possible. You likely cannot see it unless you edit this post.

$ safe dog safe://
Native data type: PublishedSeqAppendOnlyData
Version: 0
Type tag: 1100
XOR name: 0xeebb5de391862fedb258d9e606fd797052abd76fc315468e3002e42d6d6c50d7
XOR-URL: safe://hnyynyzzmszxd1gdn95p1mdc6cbz7xfaffk6zp9btktwqgybqemmpptepqbnc?v=0

Resolved using NRS Map:
PublicName: "%EF%BF%BE"
Container XOR-URL: safe://hnyydywoyccpj5ccrtnx94h3jtzykdgs8fd8ysbo1bnsanj66oaj1snnqabqh
Native data type: PublishedSeqAppendOnlyData
Type tag: 1500
XOR name: 0x200631a9db184889ffd73298dc0a19ac728ce0b061208ad8127de86132b084ec
Version: 0
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| NRS name/subname | Created              | Modified             | Link                                                                     |
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| %EF%BF%BE        | 2020-04-19T19:15:25Z | 2020-04-19T19:15:25Z | safe://hnyynyzzmszxd1gdn95p1mdc6cbz7xfaffk6zp9btktwqgybqemmpptepqbnc?v=0 |
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+

works.

So, undoing that percent-encoding would, in certain cases, feed back to users something beyond valid UTF-8 text.

That file, incidentally, is [UTF-8 decoder capability and stress test], which lists invalid code points that would become a problem on output.

Still, it would be great to see valid Unicode available in URLs and filenames.
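
One way to reconcile those two points would be to decode percent-encoding for display only when the result is safe to show. Below is a rough Python sketch of that idea. It assumes "safe" means well-formed UTF-8 that contains no Unicode noncharacters such as U+FFFE. The display_name helper is hypothetical, and a real policy would likely need to consider more (control characters, for example).

```python
from urllib.parse import unquote_to_bytes

def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def display_name(encoded):
    """Return the decoded name, or the encoded form if decoding is unsafe."""
    try:
        text = unquote_to_bytes(encoded).decode("utf-8")  # strict decoding
    except UnicodeDecodeError:
        return encoded            # not well-formed UTF-8: keep it encoded
    if any(is_noncharacter(ord(ch)) for ch in text):
        return encoded            # noncharacter present: keep it encoded
    return text

print(display_name("%E8%8A%B1%E5%9B%AD.jpg"))  # 花园.jpg
print(display_name("%EF%BF%BE"))               # %EF%BF%BE (kept encoded)
```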

@bochaco note there’s also a similar problem in subnames:

$ safe nrs add матрошка.матрошка.матрошка.матрошка --link safe://hnyynywgmqog75ipmwdmdw7fzgc3tzmegy49wj6xi374xtj8bxajq878ysbnc?v=0
NRS Map updated (version 1): "safe://%D0%BC%D0%B0%D1%82%D1%80%D0%BE%D1%88%D0%BA%D0%B0.%D0%BC%D0%B0%D1%82%D1%80%D0%BE%D1%88%D0%BA%D0%B0.%D0%BC%D0%B0%D1%82%D1%80%D0%BE%D1%88%D0%BA%D0%B0.hnyydyiecc8etme6jgpzr7gs8op44byo6wwhmhurhii3pi6j1yh4h5pwxgbqh"
+  матрошка.матрошка.матрошка.матрошка  safe://hnyynywgmqog75ipmwdmdw7fzgc3tzmegy49wj6xi374xtj8bxajq878ysbnc?v=0

and the NRS Map URL reported there is then hard to make use of.
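
For readability, a URL like the one reported by nrs add could be shown with each percent-encoded label decoded. A small Python sketch follows; the helper name readable_host is made up for illustration, and the host is rebuilt here from the label and XOR name seen in that output.

```python
from urllib.parse import unquote

def readable_host(host):
    """Percent-decode each dot-separated label of a host independently."""
    return ".".join(unquote(label) for label in host.split("."))

label = "%D0%BC%D0%B0%D1%82%D1%80%D0%BE%D1%88%D0%BA%D0%B0"
xor_name = "hnyydyiecc8etme6jgpzr7gs8op44byo6wwhmhurhii3pi6j1yh4h5pwxgbqh"
host = ".".join([label, label, label, xor_name])

# Prints the three decoded labels followed by the untouched XOR name.
print(readable_host(host))
```

Splitting on the dots before decoding means an encoded dot (%2E) inside a label is not mistaken for a label separator while the host is being split.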

The original setup was put at safe://матрошка.матрошка, but I’ve lost the details beyond the fact that the basic files are at:

$ safe cat safe://hnyynywgmqog75ipmwdmdw7fzgc3tzmegy49wj6xi374xtj8bxajq878ysbnc?v=0
Files of FilesContainer (version 0) at "safe://hnyynywgmqog75ipmwdmdw7fzgc3tzmegy49wj6xi374xtj8bxajq878ysbnc?v=0":
+---------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| Name          | Size   | Created              | Modified             | Link                                                              |
+---------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /index.html   | 253    | 2020-05-31T15:21:24Z | 2020-05-31T15:21:24Z | safe://hbhybynsybuxp9edjh8ngn5t1dtytehhy33x5q5k6o6nkxk4t1mpqehehb |
+---------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /style.css    | 115    | 2020-05-31T15:21:24Z | 2020-05-31T15:21:24Z | safe://hbhyyynjteexd8wfn6rq7nm99bduojhgi4ogjh7mosa4r6kmbux1s1x1o9 |
+---------------+--------+----------------------+----------------------+-------------------------------------------------------------------+
| /матрошка.jpg | 214014 | 2020-05-31T15:21:24Z | 2020-05-31T15:21:24Z | safe://hbwybonxnbz73ecy8ut8o489egir5qc8kqwur4rg37o887ue3zcnsnjo9z |
+---------------+--------+----------------------+----------------------+-------------------------------------------------------------------+