Twitter API encoding is somewhat bonkers

Take a look at the Encoding section in the Twitter API docs:

The Twitter API supports UTF-8 encoding. Please note that angle brackets (“<” and “>”) are entity-encoded to prevent Cross-Site Scripting attacks for web-embedded consumers of JSON API output. The resulting encoded entities do count towards the 140 character limit.

Does anyone notice the weirdness there? Apart from the MAGIC_QUOTES smell.

If I were feeling pathological, I could tweet a message of 140 characters all between the Unicode code-points U+010000-U+10FFFF. I think that would end up as 560 bytes. And I think that would be all fine with Twitter. Which is another way of saying that Twitter would, I assume, be happy to exceed 140 bytes for a message if it were written in, say, Japanese.
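The 560-byte figure is easy to check. Here is a minimal Python 3 sketch; the particular code points (U+1F600 and the hiragana "あ") are arbitrary picks of mine for illustration, and it says nothing about how Twitter itself counts characters.

```python
# Any code point in U+010000-U+10FFFF takes 4 bytes in UTF-8.
# U+1F600 is an arbitrary example from that range.
astral_message = "\U0001F600" * 140
astral_chars = len(astral_message)                  # code points
astral_bytes = len(astral_message.encode("utf-8"))  # bytes on the wire

# Most Japanese text lives in the 3-byte range of UTF-8.
japanese_message = "あ" * 140
japanese_bytes = len(japanese_message.encode("utf-8"))

print(astral_chars, astral_bytes, japanese_bytes)
```

So 140 characters can mean anything from 140 to 560 bytes, depending on the script.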

By contrast, while on my pathological holiday from good sense, I would only be able to tweet a message of 35 angle brackets – just 35 bytes as typed, but 140 characters (and 140 bytes in UTF-8) once each one is encoded as “&lt;” or “&gt;” – because the encoded angle brackets count toward the number of characters. Seems a bit backwards, doesn’t it?
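To make the arithmetic concrete, here is a rough sketch of that counting. The function `entity_encode_angle_brackets` is hypothetical: it is my guess at what the docs describe, not Twitter's actual code.

```python
def entity_encode_angle_brackets(text):
    """Hypothetical stand-in for what the docs describe:
    only "<" and ">" are replaced with HTML entities."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

raw = "<" * 35
encoded = entity_encode_angle_brackets(raw)

raw_bytes = len(raw.encode("utf-8"))          # what I actually typed
encoded_chars = len(encoded)                  # what counts toward the limit
encoded_bytes = len(encoded.encode("utf-8"))

print(raw_bytes, encoded_chars, encoded_bytes)
```

Each bracket costs four characters of the limit, so 35 of them use up the whole tweet.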

Does anyone know the reasoning here? Or are the docs at fault?

Back to the angle-bracket quoting. Just as the PHP folk are finally ending their own embarrassing journey through that silliness, it looks to me like Twitter are now making a similar mistake. JSON should safely encapsulate angle-brackets, so perhaps I don’t understand the problem that they are trying to solve?
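For what it's worth, here is a small sketch of what I mean by JSON encapsulating angle brackets, using Python's standard `json` module. The tweet text is made up, and the `\u003c` replacement is only one possible approach for consumers that paste JSON straight into an HTML page; I am not claiming it is what Twitter should or does do.

```python
import json

# A made-up tweet containing markup.
tweet = {"text": "<script>alert(1)</script>"}

# Angle brackets are ordinary characters inside a JSON string,
# so they survive a round trip untouched.
wire = json.dumps(tweet)
round_trip = json.loads(wire)

# If the JSON text has to be embedded in HTML, the brackets can be
# written with JSON's own \uXXXX escapes. "<" and ">" can only occur
# inside JSON strings, so a plain replace on the output is safe, and
# the decoded data is unchanged.
html_safe_wire = wire.replace("<", "\\u003c").replace(">", "\\u003e")
html_safe_round_trip = json.loads(html_safe_wire)

print(wire)
print(html_safe_wire)
```

Either way the consumer gets back exactly the text that was tweeted, with no entity decoding needed.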

One more question: what if I tweet “&gt;”? When using the API, can that be distinguished from a “>”?
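Here is a sketch of why I suspect the answer is no. It assumes the API encodes only the angle brackets and leaves ampersands alone, which is how I read the docs but have not tested; `entity_encode_angle_brackets` is a hypothetical helper, while `html.escape` and `html.unescape` are from the Python standard library.

```python
import html

def entity_encode_angle_brackets(text):
    """Hypothetical: encodes "<" and ">" only, leaving "&" alone."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

# Two different tweets...
from_bracket = entity_encode_angle_brackets(">")
from_entity = entity_encode_angle_brackets("&gt;")

# ...come out identical, so the consumer cannot tell them apart.
ambiguous = from_bracket == from_entity

# Escaping "&" as well keeps the mapping reversible.
full_bracket = html.escape(">", quote=False)
full_entity = html.escape("&gt;", quote=False)

print(from_bracket, from_entity, ambiguous)
print(full_bracket, full_entity)
```

If ampersands are escaped too, then `html.unescape` recovers each original tweet and the ambiguity goes away.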

(You might have noticed that I have so far been too lazy to experiment with all this stuff; I just wanted to write it down before I forgot. I’ll add a comment if I get the time to play.)


A few months ago I downloaded angL by Ihsahn (Wikipedia), but I noticed that a couple of tracks were broken. One skipped a lot and the other went silent halfway through. I emailed eMusic and followed up a month later. To their credit, they replied and gave me a free track, but said it was down to the record company to provide new tracks.

After a couple of months, I was getting frustrated. I didn’t think eMusic were following up on this, so I emailed Ihsahn’s record company, Candlelight Records, and his management/production company, Mnemosyne, to let them know. I got a couple of replies, including one from Heidi at Mnemosyne – who I’ve just realised is Ihriel, Ihsahn’s wife and fellow artist – apologising and thanking me for telling them.

And this morning I also received angL on CD, sent by post from Mnemosyne, with a hand-written postcard saying “Thanks for helping us out”, and signed “Ihsahn”.

Thanks Heidi and Ihsahn, you’ve totally made my day, and I wish you all the best.

(Side note to everyone else: angL is a bloody brilliant album, so go and get it now :)