11 Sep 2012 00:22
HTTP and character encodings
Ganesh Sittampalam <ganesh <at> earth.li>
2012-09-10 22:22:33 GMT
Hi,

tl;dr: I'd like to remove the String instances from the HTTP package.

The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.

Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding. In particular uploaded data (e.g. in POSTs) gets silently truncated and downloaded data is improperly embedded as one byte per character no matter what encoding the server advertises in the Content-Type header. (https://github.com/haskell/HTTP/issues/28)

I've spent a while investigating the option of making HTTP encode and decode Strings appropriately, but my tentative conclusion is that it's too hard:

- on upload we'd have to pick an encoding by default - probably UTF-8 - and also add it to the Content-Type header which may involve messing with any header supplied by the user. If the user supplied a different encoding in Content-Type then we probably would need to notice and respect that.
- on upload Content-Length may also need to be managed somehow.
- on download we'd need to be able to handle at least common encodings that the server might send, but on Windows even common encodings like [...]
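
To make the breakage described above concrete, here is a minimal sketch of the two behaviours. It is not the HTTP package's own code: the function names (naiveEncode, naiveDecode, utf8Encode, utf8Decode, utf8ContentLength) are hypothetical, and it assumes the bytestring and text packages are available. It contrasts a one-byte-per-character conversion with an explicit UTF-8 conversion done by the caller.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical stand-in for the broken upload path: Char8.pack keeps only
-- the low 8 bits of each Char, so code points above 255 are silently truncated.
naiveEncode :: String -> B.ByteString
naiveEncode = BC.pack

-- Hypothetical stand-in for the broken download path: one Char per byte,
-- whatever encoding the server actually advertised.
naiveDecode :: B.ByteString -> String
naiveDecode = BC.unpack

-- Assumed alternative: the caller encodes to UTF-8 before handing bytes over.
utf8Encode :: String -> B.ByteString
utf8Encode = TE.encodeUtf8 . T.pack

-- Decoding can fail on malformed input, so report that instead of guessing.
utf8Decode :: B.ByteString -> Either String String
utf8Decode = either (Left . show) (Right . T.unpack) . TE.decodeUtf8'

-- Content-Length counts bytes on the wire, not characters in the String.
utf8ContentLength :: String -> Int
utf8ContentLength = B.length . utf8Encode
```

For the single character U+03BB (Greek small lambda), naiveEncode yields the one byte 0xBB, whereas utf8Encode yields the two bytes 0xCE 0xBB, so utf8ContentLength reports 2 although the String has length 1. Feeding those two bytes to naiveDecode gives a two-character String rather than the original one.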