Unity Issue Tracker - Unicode escape sequences are incorrectly parsed when using SerializedObjectReader.Read()

Status: Won't Fix
Votes: 0
Found in [Package]: 3.0.X - Serialization
Issue ID: ECSB-1147
Regression: Yes

Unicode escape sequences are incorrectly parsed when using SerializedObjectReader.Read()

Package: Entity Component System (ECS)

Jun 12, 2024

How to reproduce:
1. Open the attached “serialization_unicode_issue” project
2. In the menu bar, select “Tools → Test”
3. Observe the result in the Console window

Expected result: Unicode sequences are parsed successfully and "TEST" is shown in the Console
Actual result: Unicode sequences are parsed incorrectly and null characters are inserted; the Console output is blank

Reproducible in: 3.0.0-pre.1 (2022.3.32f1), 3.1.1 (2022.3.32f1, 6000.0.5f1)
Not reproducible in: 2.1.2-exp.1 (2021.3.39f1, 2022.3.32f1)

Reproducible on: Windows 11
Not reproducible on: No other environments tested

Note: Also reproducible in Player

Resolution Note:

UnsafePackedBinaryWriter relies on a couple of "streaming" data structures, none of which is guaranteed to hold the whole character stream in a single contiguous buffer.

For unescaped sequences that's fine: you can output the data serially from start to finish. Two-character escape sequences always start with a backslash and aren't too hard to handle either. But Unicode escape sequences are between 3 and 6 characters long and require substantially more handling to be robust and correct.

I threw together a patch to handle Unicode escape sequences, but taking a step back, this is functionality that, were we really to commit to maintaining it, would require a more substantial rewrite of the internals to avoid having to maintain the very fragile "continuation context" needed to pipeline escaped-character state across token boundaries.
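To illustrate what a "continuation context" involves, here is a minimal sketch in Python of a decoder that receives text in arbitrary chunks. It is not the package's code: the class `StreamingUnescaper` and its `feed` method are hypothetical names, and the sketch assumes JSON-style `\uXXXX` escapes with exactly four hex digits, with no handling of surrogate pairs or malformed input.

```python
class StreamingUnescaper:
    # Hypothetical illustration only; not part of any Unity package.
    # Decodes \uXXXX escapes from text that arrives in arbitrary chunks.

    def __init__(self):
        # Continuation context: the start of an escape sequence that was
        # cut off by the end of the previous chunk.
        self._pending = ""

    def feed(self, chunk: str) -> str:
        data = self._pending + chunk
        self._pending = ""
        out = []
        i = 0
        while i < len(data):
            ch = data[i]
            if ch != "\\":
                out.append(ch)
                i += 1
                continue
            # A backslash needs at least one more character to classify it.
            if i + 1 >= len(data):
                self._pending = data[i:]
                break
            if data[i + 1] != "u":
                # Two-character escape: passed through unchanged in this sketch.
                out.append(data[i:i + 2])
                i += 2
                continue
            # Unicode escape: all six characters (\uXXXX) must be present.
            if i + 6 > len(data):
                self._pending = data[i:]
                break
            out.append(chr(int(data[i + 2:i + 6], 16)))
            i += 6
        return "".join(out)


decoder = StreamingUnescaper()
# The escape for "T" is split across two chunks.
chunks = ["\\u00", "54EST"]
result = "".join(decoder.feed(c) for c in chunks)
print(result)  # TEST
```

The `_pending` field is the state that has to survive every chunk boundary. Dropping or mishandling it is the kind of failure that can turn a split escape into garbage output.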

The recommendation is for users to use the StripStringEscapeCharacters mechanism to strip Unicode characters after the data has been linearized.
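Once the data sits in one contiguous string, escape handling no longer needs any cross-boundary state. The Python sketch below shows that idea only; it does not reproduce what StripStringEscapeCharacters does, and the function name `decode_unicode_escapes` is hypothetical. It assumes `\uXXXX` escapes with exactly four hex digits and does not distinguish an escaped backslash followed by `u` from a real Unicode escape.

```python
import re

# Matches a backslash, 'u', and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(linearized: str) -> str:
    # Hypothetical helper: a single pass over the fully linearized string.
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), linearized)


raw = "\\u0054\\u0045\\u0053\\u0054"
decoded = decode_unicode_escapes(raw)
print(decoded)  # TEST
```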
