The problem below follows from a simple upload of content for a domain named safe://题题, which links to a file named 花园.jpg. The link ends up broken, most likely because of the filename and how characters outside the Western-centric ISO-8859-1 range are handled.
It looks like the filename was stored correctly on the FilesContainer, and the name only fails to be encoded/decoded properly when it is looked up. So the image file is not corrupted; what needs fixing is accessing it through its path in the FilesContainer. It also seems that the encoding is broken in NRS as well. I was able to download the image from safe://hbwybon5otkw5n8onfsu4zwcww4x9buuk57a1z6c4ooxpiy9dr6zgnj1cz
Yeah, I’m not exactly sure where the bug is, but there is some inconsistency in where encoding/decoding happens when looking up within NRS and/or FilesContainers. This is what I meant:
safe cat safe://题题/花园.jpg
[2020-04-19T16:06:35Z ERROR safe] safe-cli error: [Error] ContentError - No data found for path "/%E8%8A%B1%E5%9B%AD.jpg/" on the FilesContainer at "safe://hnyynyzd3c3zapx4yz8485yixxeeuznggcun6kx4e18xsxfhpxbrm1angybnc/%E8%8A%B1%E5%9B%AD.jpg?v=0"
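To illustrate the kind of mismatch the error message suggests, here is a minimal Python sketch. It is not safe-cli's actual code: it assumes the FilesContainer keeps the raw UTF-8 path as its key while the lookup uses the percent-encoded form taken from the URL. The `files_map` dict and the `lookup` helper are hypothetical stand-ins.

```python
from urllib.parse import quote, unquote

# Hypothetical stand-in for a FilesContainer: keys are raw UTF-8 paths.
files_map = {"/花园.jpg": "<image bytes>"}


def lookup(path, decode_first):
    """Look up a URL path, optionally percent-decoding it first."""
    key = unquote(path) if decode_first else path
    return files_map.get(key)


# A URL parser would typically percent-encode the non-ASCII path segment.
encoded_path = "/" + quote("花园.jpg")
print(encoded_path)  # /%E8%8A%B1%E5%9B%AD.jpg

# Looking up the encoded form misses; decoding first finds the file.
print(lookup(encoded_path, decode_first=False))  # None
print(lookup(encoded_path, decode_first=True))   # <image bytes>
```

If something like this is what happens, the fix would be to apply the same encoding on both the store and the lookup side, whichever form is chosen.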
It’s also a feature, because it seems that impossible code points are handled.
So, this is an NRS name set to U+FFFE. I note the character is retained here on the forum, which shows that such handling is possible. You likely cannot see it unless you edit this post.
$ safe dog safe://
Native data type: PublishedSeqAppendOnlyData
Version: 0
Type tag: 1100
XOR name: 0xeebb5de391862fedb258d9e606fd797052abd76fc315468e3002e42d6d6c50d7
XOR-URL: safe://hnyynyzzmszxd1gdn95p1mdc6cbz7xfaffk6zp9btktwqgybqemmpptepqbnc?v=0
Resolved using NRS Map:
PublicName: "%EF%BF%BE"
Container XOR-URL: safe://hnyydywoyccpj5ccrtnx94h3jtzykdgs8fd8ysbo1bnsanj66oaj1snnqabqh
Native data type: PublishedSeqAppendOnlyData
Type tag: 1500
XOR name: 0x200631a9db184889ffd73298dc0a19ac728ce0b061208ad8127de86132b084ec
Version: 0
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| NRS name/subname | Created | Modified | Link |
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+
| %EF%BF%BE | 2020-04-19T19:15:25Z | 2020-04-19T19:15:25Z | safe://hnyynyzzmszxd1gdn95p1mdc6cbz7xfaffk6zp9btktwqgybqemmpptepqbnc?v=0 |
+------------------+----------------------+----------------------+--------------------------------------------------------------------------+
This works.
So undoing that percent-encoding in certain cases would mean feeding back to users something beyond UTF-8.
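As a rough sketch of that concern, the Python below decodes a percent-encoded name for display only when the result contains no Unicode noncharacters such as U+FFFE. This is not how safe-cli behaves. `is_noncharacter` and `display_name` are hypothetical helpers, and keeping the encoded form for noncharacters is an assumed policy.

```python
from urllib.parse import quote, unquote


def is_noncharacter(cp):
    """True for Unicode noncharacters: U+FDD0..U+FDEF and any U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def display_name(encoded):
    """Percent-decode a name unless the result contains a noncharacter."""
    decoded = unquote(encoded)
    if any(is_noncharacter(ord(ch)) for ch in decoded):
        return encoded  # assumed policy: show the encoded form instead
    return decoded


print(quote("\ufffe"))                     # %EF%BF%BE
print(display_name("%EF%BF%BE"))           # %EF%BF%BE
print(display_name("%E9%A2%98%E9%A2%98"))  # 题题
```

With a rule like this, a name such as 题题 would be shown decoded, while the U+FFFE name would stay as `%EF%BF%BE`, matching what `safe dog` prints above.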
That file, incidentally, is the [UTF-8 decoder capability and stress test], which lists invalid code points that would become a problem on output.
Still, it would be great to see valid Unicode available in URLs and filenames.
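For a sense of what such invalid input looks like once percent-encoded, here is a small Python sketch that decodes strictly and rejects anything that is not well-formed UTF-8. The sample sequences are well-known malformed forms chosen here for illustration, not quoted from that file, and `try_decode` is a hypothetical helper.

```python
from urllib.parse import unquote_to_bytes

# Illustrative inputs: three malformed byte sequences and one valid name.
samples = {
    "overlong encoding of '/'": "%C0%AF",
    "UTF-16 surrogate": "%ED%A0%80",
    "lone continuation byte": "%80",
    "valid filename": "%E8%8A%B1%E5%9B%AD.jpg",
}


def try_decode(encoded):
    """Return the decoded string, or None if the bytes are not valid UTF-8."""
    try:
        return unquote_to_bytes(encoded).decode("utf-8")
    except UnicodeDecodeError:
        return None


for label, encoded in samples.items():
    print(label, "->", try_decode(encoded))
```

A strict check like this would let valid names such as 花园.jpg through while leaving malformed input in its encoded form.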