Garbled text display in "Log View"

In SourceTree's Log View, in the datagrid with the columns "Graph", "Description", "Commit", "Author" and "Date", commit messages and author names containing accented Unicode characters are displayed as garbled text with strange characters.

However, in the panel below, which displays the selected commit's full message, the text appears correctly.

Other applications display the text correctly; only SourceTree has this problem.

On Mac OS X 10.8.2

2 answers

Answer accepted

SourceTree supports UTF-8 everywhere, and accented characters (and Japanese, in fact) are very common. Is it possible the encoding used here is something else, perhaps one of the Latin single-byte (extended ASCII) encodings from another platform?

Hello Steve, thank you for your answer.

The commit messages are written in Portuguese, and encoded as UTF-8.

The messages have been written in SourceTree itself or in SmartGit; it makes no difference. If I write accented characters in a commit message in SourceTree, for instance, and then commit, they appear garbled in the Log View.

All other applications (including command-line git) correctly show the messages.

Even SourceTree shows the text correctly in several places; only the Log View's main datagrid is affected, as explained above.

Below I include a screenshot that may help you understand what I'm talking about.

I created a test message containing several accented characters (in this case: ÁèíçããâêÇ) to make the example more obvious.

As you can see, the selected commit message appears correctly in the commit details pane but garbled in the log list. All other messages show the same problem.

The Author names also appear garbled.

I have highlighted some areas of the application's interface where the problems are clearly visible.

I hope this helps to diagnose the problem.

I just copied and pasted the characters from your comment above into a commit in SourceTree, and it worked fine for me.

Are you able to give us a copy of the repo to investigate? You can do it privately at https://support.atlassian.com if you like.

Great! That means there is hope!

Do you have any idea what could be done to investigate this issue further?

The problem is that, as things stand, I will have to give up using SourceTree, because it's too unpleasant to see a Log View full of garbled messages.

Are you sure this isn't a bug in SourceTree that only happens with certain repo configurations? SmartGit and command-line Git show the log fine.

What can I do?

Thanks

Unfortunately, the repo contains a private company project which I cannot divulge. But thank you for your offer!

The problem also occurs on other repos, so it's not just a weird accident with a specific repo.

I decided to investigate the issue further, and after many tests, I discovered the source of the error:

The problem starts with a commit whose author name contains accented characters encoded in a particular way.

When displaying a log that contains such a commit (even just one), SourceTree displays all commit messages and author names with the wrong encoding (it probably starts using the encoding of the offending commit).
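To find which commits trigger this, one option is to scan the raw author names for bytes that are not valid UTF-8. Below is a minimal Python sketch, assuming the log is produced with git log --format=%H%x09%an (one "hash TAB author" line per commit) and assuming Git passes the stored author bytes through unchanged (settings such as i18n.logOutputEncoding might re-encode them). The function names are illustrative only.

```python
import subprocess


def find_non_utf8_authors(log_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return (commit hash, raw author bytes) for every log line whose
    author name cannot be decoded as UTF-8.

    Assumes each line looks like b"<hash>\\t<author name>".
    """
    offenders = []
    for line in log_bytes.splitlines():
        if not line:
            continue
        commit, _, author = line.partition(b"\t")
        try:
            author.decode("utf-8")  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            offenders.append((commit.decode("ascii"), author))
    return offenders


def read_log(repo: str = ".") -> bytes:
    """Run git log and return its raw stdout bytes, undecoded."""
    # No text=True, so Python never tries to decode the output itself.
    return subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%x09%an"],
        check=True,
        capture_output=True,
    ).stdout
```

Usage: find_non_utf8_authors(read_log("/path/to/repo")), where the path is a placeholder, lists every commit whose author name would break strict UTF-8 decoding.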

I exported 2 patches to demonstrate the problem:

In the first patch excerpt (just one line from the patch file), you can see the author name (Cláudio Silva) encoded as UTF-8:

From: =?UTF-8?q?Cla=CC=81udio=20Silva?= <claudio.silva@impactwave.com>

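Here the accented a is written as 3 bytes: a plain "a" (0x61) followed by 0xCC 0x81, which is the UTF-8 encoding of U+0301 COMBINING ACUTE ACCENT. This is valid UTF-8 (á in decomposed, NFD form). This commit causes no problems in SourceTree.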

Now, here's the author name from an offending (error inducing) commit:

From: =?UTF-8?q?Cl=E1udio=20Silva?= <claudio.silva@impactwave.com>

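Here the accented a is written as the single byte 0xE1. U+00E1 (Unicode Character 'LATIN SMALL LETTER A WITH ACUTE') is the right code point for á, but the lone byte 0xE1 is its ISO-8859-1/Windows-1252 encoding; in UTF-8 the same character takes two bytes, 0xC3 0xA1. So although the header declares UTF-8, this byte is not valid UTF-8.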
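The difference between the two headers can be checked byte by byte. This short Python sketch uses the standard library's email.header.decode_header to undo the RFC 2047 Q-encoding; the variable and function names are illustrative.

```python
import unicodedata
from email.header import decode_header

# The two author headers quoted above.
GOOD = "=?UTF-8?q?Cla=CC=81udio=20Silva?="
BAD = "=?UTF-8?q?Cl=E1udio=20Silva?="


def raw_name_bytes(header: str) -> bytes:
    """Undo the Q-encoding and return the name bytes without decoding them."""
    [(data, _charset)] = decode_header(header)
    return data


good = raw_name_bytes(GOOD)
bad = raw_name_bytes(BAD)

# Good header: "a" + U+0301 COMBINING ACUTE ACCENT, valid UTF-8 (NFD form).
decoded = good.decode("utf-8")
assert unicodedata.normalize("NFC", decoded) == "Cl\u00e1udio Silva"

# Precomposed U+00E1 is two bytes in UTF-8, one byte in Latin-1.
assert "\u00e1".encode("utf-8") == b"\xc3\xa1"
assert "\u00e1".encode("latin-1") == b"\xe1"

# So the "bad" header only decodes under a single-byte reading.
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
print(bad.decode("latin-1"))
```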

All it takes is one commit with an author name encoded like this to make SourceTree go mad!

Command-line Git and SmartGit, on the other hand, display the logs just fine and are unaffected by this. At most, the incorrectly encoded characters may appear garbled, but all other text appears fine.

The problematic commit was probably created with another application (perhaps on Windows, with msysgit or SmartGit; I don't know).

So, in conclusion, I believe making SourceTree able to handle incorrectly encoded strings without going nuts would be a nice enhancement to the software.

May I suggest bringing this issue up with the development team?

Best regards.

Would you mind attaching the patch that reproduces this, either here (as an attachment rather than inline, since the encoding seems to have been lost) or to https://jira.atlassian.com/browse/SRCTREE-1285 ? That will make it easier to make sure we test this case directly.

OK, that makes sense; thanks for the detailed analysis. I've seen this problem once before, in fact, and the issue is the way Cocoa deals with character encoding. Basically, if one character in the stream fails UTF-8 decoding, it refuses to decode the entire stream as UTF-8 and falls back on a simpler encoding (which then breaks all the other extended UTF-8 characters). It doesn't appear to be possible to tell it to skip the offending characters. SourceTree loads the log in bulk for performance reasons, which is why this problem can leak across up to 200 lines when it occurs.
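The all-or-nothing behaviour described above can be modelled in a few lines of Python. This is a hypothetical model, not SourceTree's or Cocoa's code: bulk_decode decodes a whole batch as UTF-8 and, on any failure, re-decodes the entire batch with a single-byte fallback. Latin-1 is an assumed stand-in for whatever encoding Cocoa actually falls back on.

```python
def bulk_decode(batch: bytes, fallback: str = "latin-1") -> str:
    """Decode a whole batch as UTF-8, or the whole batch as `fallback`."""
    try:
        return batch.decode("utf-8")
    except UnicodeDecodeError:
        # A single bad byte anywhere sends every line down this path.
        return batch.decode(fallback)


# Two valid UTF-8 lines surrounding one non-UTF-8 author name.
log = (
    "Mensagem com acentua\u00e7\u00e3o\n".encode("utf-8")
    + b"Cl\xe1udio Silva\n"
    + "Outra mensagem: \u00c1\u00e8\u00ed\u00e7\n".encode("utf-8")
)

for line in bulk_decode(log).splitlines():
    print(line)
```

Under this assumed fallback, every valid UTF-8 line comes out garbled (for example ç becomes Ã§) even though only one record in the batch is invalid.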

I'm guessing that SmartGit works because Java is more tolerant of bad encoding. Command-line git is fine because it does one line at a time.

I've tried to find a workaround for this in the past and haven't managed it (without horribly killing performance), but I'll try again. The one previous case became a non-issue because the commit faded into history really fast, but obviously this is more of a problem for you. The problem will go away eventually, once that commit drops out of the first 200 lines of the log (after that, it can no longer make decoding fail for the entire first batch). We'll track it here: https://jira.atlassian.com/browse/SRCTREE-1285
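One possible workaround, sketched here under stated assumptions (hypothetical function name, one record per line, Latin-1 as the guessed legacy encoding), keeps the fast bulk decode and falls back to per-record decoding only when the bulk decode fails, so one bad record cannot change how its neighbours are decoded:

```python
def decode_batch(batch: bytes, fallback: str = "latin-1") -> list[str]:
    """Decode newline-separated records, isolating any non-UTF-8 record."""
    try:
        # Fast path: the whole batch is valid UTF-8 (the common case).
        return batch.decode("utf-8").split("\n")
    except UnicodeDecodeError:
        pass
    # Slow path: decode each record on its own.
    records = []
    for raw in batch.split(b"\n"):
        try:
            records.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Only this record is affected. `fallback` is an assumed guess
            # at its legacy encoding; errors="replace" would be another option.
            records.append(raw.decode(fallback))
    return records
```

Splitting the raw bytes on 0x0A is safe because that byte never occurs inside a multi-byte UTF-8 sequence. Valid batches pay nothing extra; only a batch containing a bad record takes the slower per-record path. Real commit messages span several lines, so an actual log parser would split on a dedicated record separator (for example a NUL byte from %x00) rather than on newlines.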

Done!

Thank you very much for looking into this issue.

Best regards.

Hi, I think I have exactly the same problem. Any news on this?
