Louis Coder | 27 Jan 2011 16:07

My mp3 tagging code can't handle Unicode tags

Hi.

I am Louis. I wrote some code to tag ID3v2.2 and 2.3 mp3 files.
The tagging code has worked for years, but now some people have files with Unicode tags, and my tagging code does not
WRITE those Unicode tags correctly.

Please have a look at this screenshot I took of my Windows desktop:

www.louis-coder.com/ID3/ID3v2_TAG_WRITING_ERROR.png

In the background, Windows Explorer shows the details of two mp3 files. One is the
original file; the other ("... - Kopie", German for "copy") is the same file after my tagging code read the tag
once and immediately wrote it back.

As you can see, my code destroyed the tag: Explorer only displays Chinese characters.

You can also see a hex editor showing the beginning of both files.

QUESTION: DO YOU FIND ANY ERROR IN THE TAG OF "2 Pac & Notorious B.I.G - Running (Dying To Live) (#1) -
Kopie.mp3" ???

You can download both mp3s here:

www.louis-coder.com/ID3/2 Pac & Notorious B.I.G - Running (Dying To Live) (#1) - Kopie.mp3
www.louis-coder.com/ID3/2 Pac & Notorious B.I.G - Running (Dying To Live) (#1).mp3

(please do not share or listen to these files, just look at the tag).

Note that the copy ("- Kopie") is the file whose tag my code damaged. Also note that the hex editor shows the
byte values in hexadecimal, so '32' is not a space char (which has decimal ASCII code 32) but a char with

Peter Bennett | 28 Jan 2011 00:26

Re: My mp3 tagging code can't handle Unicode tags

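It looks like your text is plain ASCII but the encoding byte specifies Unicode.
If the encoding byte does not match the way the text is actually encoded, you
get exactly this. The encoding byte must be 00 or 01. If it is 00, the text
must be ISO-8859-1 (plain ASCII qualifies); if it is 01, the text must be
Unicode (UTF-16, starting with a byte-order mark). If you write 01 followed by
ASCII bytes, you see the problem that you have: each pair of ASCII bytes is
read as one 16-bit character, and two letters together land in the CJK
ideograph range, which is why Explorer shows Chinese characters.
You really need to determine whether the song's text requires Unicode or not.
If it only uses characters that ISO-8859-1 can represent, you can use
encoding 00; otherwise you should use Unicode (01).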
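
A small Java sketch of that misreading, for illustration only (the class and method names are made up): it encodes ASCII text as single bytes, the way encoding 00 expects, and then decodes the very same bytes as UTF-16, the way a reader does when the encoding byte says 01. Since these bytes carry no BOM, the sketch assumes big-endian order; a reader assuming little-endian pairs the bytes the other way round, which for letter pairs gives CJK characters as well.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encode ASCII text as single bytes (what encoding 00 expects), then decode
    // those same bytes as UTF-16 (what encoding 01 tells a reader to do).
    // Big-endian is assumed here because the bytes carry no BOM.
    static String misread(String asciiText) {
        byte[] singleBytes = asciiText.getBytes(StandardCharsets.ISO_8859_1);
        return new String(singleBytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // An even-length string, so every byte has a partner
        String garbled = misread("Live");
        for (char c : garbled.toCharArray()) {
            // Print each resulting 16-bit code unit and its Unicode block
            System.out.printf("U+%04X %s%n", (int) c, Character.UnicodeBlock.of(c));
        }
    }
}
```

For "Live" the four bytes 4C 69 76 65 become the two code units U+4C69 and U+7665, which sit in the CJK Unified Ideographs Extension A and CJK Unified Ideographs blocks.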

Here is my Java code that does this. Note that for ID3v2.3 only
encoding types 00 and 01 are valid ("ISO-8859-1" and "UTF16"). The other
two values (02 = UTF-16BE, 03 = UTF-8) are valid only in ID3v2.4, which is
not widely supported, so I do not recommend using them.

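     // needs: import java.io.UnsupportedEncodingException;

     // Index = ID3 text-encoding byte: 0 = ISO-8859-1, 1 = UTF-16 with BOM
     // (Java's "UTF16" encoder writes a big-endian BOM, FE FF, first),
     // 2 = UTF-16BE, 3 = UTF-8 (2 and 3 exist only in ID3v2.4)
     static final String[] ENC_TYPES = {"ISO-8859-1", "UTF16",
             "UTF-16BE", "UTF-8"};

     // Attempt to encode in encoding 0; if the text does not survive an
     // ISO-8859-1 round trip (unmappable chars become '?'), use encoding 1.
     // encodingB[0] is updated so the caller writes the matching encoding byte.
     static public byte[] encodeString(String source, byte[] encodingB)
             throws UnsupportedEncodingException {
         byte[] result = source.getBytes(ENC_TYPES[encodingB[0]]);
         if (encodingB[0] == 0) {
             String checkResult = new String(result, ENC_TYPES[encodingB[0]]);
             if (!source.equals(checkResult)) {
                 encodingB[0] = 1;
                 result = source.getBytes(ENC_TYPES[encodingB[0]]);
             }
         }
         return result;
     }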
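
The encoded bytes still have to go into a frame whose encoding byte and size match them. Below is a minimal sketch of one complete ID3v2.3 text frame, assuming a 4-character frame ID such as TIT2 and no frame flags; the class and method names are hypothetical. Instead of the round-trip check in the code above, it picks encoding 00 or 01 with CharsetEncoder.canEncode. It does not cover ID3v2.2 (3-character IDs, 3-byte sizes, no flags) or ID3v2.4 (synchsafe frame sizes).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Id3v23TextFrame {
    // Builds one ID3v2.3 text frame, e.g. TIT2 (title) or TPE1 (artist).
    // Layout: 4-char frame ID, 4-byte big-endian body size (a plain integer
    // in v2.3, not synchsafe), 2 flag bytes, then the body: one encoding
    // byte followed by the text.
    static byte[] buildTextFrame(String frameId, String text) {
        // Encoding 00 if every char fits in ISO-8859-1, otherwise 01
        boolean latin1 = StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
        byte encoding = latin1 ? (byte) 0 : (byte) 1;
        // Java's UTF-16 charset writes a big-endian BOM (FE FF) first,
        // which encoding 01 requires
        Charset cs = latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16;
        byte[] textBytes = text.getBytes(cs);

        int bodySize = 1 + textBytes.length; // encoding byte + text
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(frameId.getBytes(StandardCharsets.ISO_8859_1), 0, 4);
        out.write((bodySize >>> 24) & 0xFF);
        out.write((bodySize >>> 16) & 0xFF);
        out.write((bodySize >>> 8) & 0xFF);
        out.write(bodySize & 0xFF);
        out.write(0); // flags byte 1
        out.write(0); // flags byte 2
        out.write(encoding);
        out.write(textBytes, 0, textBytes.length);
        return out.toByteArray();
    }
}
```

The key point is that the encoding byte is chosen together with the bytes that follow it, so a Latin-1 title is never written behind an 01 byte.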

