I recently imported a local Git project into Bitbucket. Everything went well, but when I fixed a minor typo using the online source editor, it re-encoded a non-ASCII character that had been stored as UTF-8. Specifically, it changed the copyright symbol (U+00A9), stored in UTF-8 as the two bytes C2 A9, into the single byte A9, which on its own is not valid UTF-8.
I verified that this is where the character got changed by doing a diff between the two versions, the relevant lines being:
-# Copyright Â© 2018 Center for Advanced Study of Language University of Maryland
+# Copyright © 2018 Center for Advanced Study of Language University of Maryland
The version before my edit contains the byte sequence C2 A9, which the online diff tool displays as a funny kind of A-hat (C2) followed by a copyright symbol (A9). The version after the edit shows just A9, i.e. a single 8-bit character. C2 A9 happens to be exactly the byte sequence UTF-8 uses to encode U+00A9. So it appears that when I edited the file in the online editor, it changed the valid UTF-8 sequence C2 A9 into a lone A9, which, as I said, is invalid UTF-8.
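For anyone who wants to check the byte-level reasoning, here is a minimal Python sketch. It only uses the standard codecs; the helper name `is_valid_utf8` is my own, and the Latin-1 decode is just my guess at why a diff view would draw C2 as an A-hat, not something I know about how Bitbucket renders diffs.

```python
# The copyright sign U+00A9 and its UTF-8 encoding.
copyright_sign = "\u00a9"
utf8_bytes = copyright_sign.encode("utf-8")
print(utf8_bytes.hex())  # c2a9

# Reading those same two bytes as Latin-1 (one byte per character)
# gives an A-circumflex followed by the copyright sign.
latin1_view = utf8_bytes.decode("latin-1")
print(latin1_view)


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\xc2\xa9"))  # True: the proper two-byte sequence
print(is_valid_utf8(b"\xa9"))      # False: a lone continuation byte
```

The lone A9 fails because in UTF-8 any byte in the range 80 to BF is a continuation byte and can only follow a lead byte such as C2.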
I'm using the Firefox browser, and it says the page with the editor is UTF-8. So I don't think it's my browser.
Is the Bitbucket online editor unsafe with Unicode? If so, that's fine; I normally only edit things in my own editor. I just saw a typo and decided to fix it there, but I could instead have pushed the fix from my own computer.
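In case it helps anyone hitting the same thing, this is roughly how the damaged lines in a file can be located locally. It is a sketch rather than a polished tool: `find_invalid_utf8_lines` is a name I made up, and the sample bytes below just mimic the two versions of my copyright line.

```python
def find_invalid_utf8_lines(data):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Sample input: line 2 holds the valid C2 A9 sequence, line 3 the lone A9 byte.
sample = (
    b"# header\n"
    b"# Copyright \xc2\xa9 2018\n"
    b"# Copyright \xa9 2018\n"
)
bad_lines = find_invalid_utf8_lines(sample)
print(bad_lines)  # [(3, b'# Copyright \xa9 2018')]
```

On a real file the bytes would come from something like `open(path, "rb").read()`, where `path` is whichever file was edited online. Reading in binary mode matters, since text mode would raise or substitute characters before the check runs.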