Archive for July, 2010

FileMerge and UTF-8

I really like Apple's FileMerge tool for comparing files, but you can run into problems when diffing files that are UTF-8 encoded. When a file contains non-ASCII characters, e.g. umlauts or grave accents, the diff looks like it has garbage in it. For background, see this hint at Mac OS X Hints. Basically, to make sure FileMerge produces a nice diff of UTF-8 encoded files, you need to set the text-encoding extended attribute (com.apple.TextEncoding) to 'UTF-8;134217984', and then your diffs will work correctly. I needed to solve this exact problem for MacHg. A general diff script that uses FileMerge, for whatever revision control system, should therefore set this attribute correctly. I.e. if you are using git, Subversion, Mercurial, Bazaar, or anything else, the diffing script should most likely be setting the text-encoding extended attribute.
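To make the magic value a little less magic, here is a minimal sketch of how it is put together. As far as I can tell, the number after the semicolon is just the decimal form of CoreFoundation's kCFStringEncodingUTF8 constant (0x08000100). The helper names utf8_attr_value and tag_utf8 are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Sketch only: build the attribute value and tag a single file.
# ATTR_NAME is the extended attribute macOS uses for text encoding.
ATTR_NAME="com.apple.TextEncoding"

# 0x08000100 is kCFStringEncodingUTF8; printf converts it to decimal.
utf8_attr_value() {
    printf 'UTF-8;%d' 0x08000100
}

# Hypothetical helper: write the UTF-8 tag onto the file given as $1.
# Requires the macOS xattr command, so it is only defined here, not run.
tag_utf8() {
    xattr -w "$ATTR_NAME" "$(utf8_attr_value)" "$1"
}
```

On a Mac you can inspect what a file currently carries with xattr -p com.apple.TextEncoding followed by the file name; a file without the attribute makes xattr report an error.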

It can be done by adding the following script somewhere on your Unix path, e.g. unzip this bash script and move it to ~/bin/ (assuming this is on your $PATH). This is a typical FileMerge diff script of the kind you can find floating around, e.g. here. However, the important part of this script is at the end, where we have:

# Find the text-encoding extended attribute of each file
leftattributes=$(xattr -p com.apple.TextEncoding "$leftfile" 2>/dev/null)
rightattributes=$(xattr -p com.apple.TextEncoding "$rightfile" 2>/dev/null)

# If the encodings are not UTF-8, then make them UTF-8.
# nocasematch only affects [[ ]] comparisons, so [[ ]] is used instead of [ ].
shopt -s nocasematch
if [[ -z "$leftattributes" || "$leftattributes" != "UTF-8;134217984" ]]; then
    xattr -w com.apple.TextEncoding "UTF-8;134217984" "$leftfile"
fi
if [[ -z "$rightattributes" || "$rightattributes" != "UTF-8;134217984" ]]; then
    xattr -w com.apple.TextEncoding "UTF-8;134217984" "$rightfile"
fi
shopt -u nocasematch

The snippet of code above ensures that the extended attribute is set on each of the files before they are diffed using Apple's opendiff, which in turn calls FileMerge.
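For illustration, the same check-then-tag-then-diff idea can be boiled down to a few lines. This is a sketch under my own assumptions, not the script linked above: the XATTR and OPENDIFF variables and the function names are invented for this example, and it lowercases the current value with tr instead of using shopt nocasematch so that it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Minimal fmdiff-style wrapper (a sketch, not the original script).
# XATTR and OPENDIFF are assumed override hooks, handy for a dry run.
XATTR="${XATTR:-xattr}"
OPENDIFF="${OPENDIFF:-opendiff}"
UTF8_TAG="UTF-8;134217984"

# Tag the file given as $1 as UTF-8 unless it already carries the tag.
# The current value is lowercased so the comparison ignores case.
ensure_utf8() {
    case $("$XATTR" -p com.apple.TextEncoding "$1" 2>/dev/null |
           tr '[:upper:]' '[:lower:]') in
        "utf-8;134217984") ;;   # already tagged, nothing to do
        *) "$XATTR" -w com.apple.TextEncoding "$UTF8_TAG" "$1" ;;
    esac
}

# Tag both sides, then hand every argument on to opendiff unchanged.
fmdiff_sketch() {
    ensure_utf8 "$1"
    ensure_utf8 "$2"
    "$OPENDIFF" "$@"
}
```

The sketch assumes the first two arguments are the files to compare; a real wrapper for Subversion or similar would have to parse the extra options those tools pass along.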

Once it is installed, you can simply call fmdiff in exactly the same way as opendiff.
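As a usage sketch, assuming the script was saved as ~/bin/fmdiff: the small helper below checks that a directory really is on your $PATH, and the comments show a typical invocation. The name dir_on_path and the file names old.txt and new.txt are placeholders I picked for this example.

```shell
#!/bin/sh
# Report whether the directory given as $1 is listed in $PATH.
# (dir_on_path is a hypothetical helper name.)
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a Mac (shown as comments, not run here):
#   chmod +x "$HOME/bin/fmdiff"
#   dir_on_path "$HOME/bin" || echo "add ~/bin to your PATH first"
#   fmdiff old.txt new.txt
```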
