Anthony Ferrara | 7 Jul 2012 16:41

[PHP-DEV] Run-tests.php JUnit format issue

Hey all,

I've run into an issue with run-tests.php with the junit format. The XML
that it generates can be invalid because of invalid UTF-8 characters and
invalid XML characters. This means that trying to parse it using something
like Jenkins gives a huge stack-trace because of invalid XML. I've been
digging through how to fix it, and I think I've come up with a solution.
But I'm not too happy with it, so I'd like some feedback.

https://github.com/php/php-src/blob/master/run-tests.php#L2096

Right now, the diff for a failed test is just injected in cdata tags, and
stuck unencoded in the result XML. For tests that are testing invalid UTF-8
bytes (or other character sets), that diff can contain bad byte sequences.

	$diff = empty($diff) ? '' : "<![CDATA[\n " . preg_replace('/\e/', '<esc>', $diff) . "\n]]>";
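
To see why this breaks consumers such as Jenkins, here is a minimal sketch that wraps a diff the same way and hands the result to an XML parser. It assumes PHP's dom extension is loaded; wrap_in_cdata(), is_parseable() and the <failure> element name are hypothetical and used only for illustration.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper that mirrors the CDATA wrapping quoted above.
function wrap_in_cdata(string $diff): string
{
    return "<![CDATA[\n " . preg_replace('/\e/', '<esc>', $diff) . "\n]]>";
}

// Hypothetical helper: true if libxml accepts the string as a document.
function is_parseable(string $xml): bool
{
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    return $ok;
}

$good = '<failure>' . wrap_in_cdata("expected: a\nactual: b") . '</failure>';
$bad  = '<failure>' . wrap_in_cdata("actual: \xFF") . '</failure>';

var_dump(is_parseable($good)); // bool(true)
var_dump(is_parseable($bad));  // bool(false): 0xFF is not valid UTF-8
```

CDATA only switches off markup parsing; it does not relax the character rules, so a raw 0xFF or a control byte such as 0x01 still makes the whole document unparseable.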

What I'm proposing is to escape all non-UTF-8 and non-XML-safe bytes with
their hex value wrapped in <>. So chr(0xFF) (which is invalid in UTF-8) would
become <xFF>.

Now, implementing it is a bit more interesting. I've come up with a single
regex that will do it:

        $diff = preg_replace_callback(
'/(
[\x0-\x8]                                           # Control Characters
| [\xB-\xC]                                        # Invalid XML Characters
| [\xE-\x1F]                                       # Invalid XML Characters
[...]
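
The regex above is cut off in this archive. As a stand-in, here is a minimal sketch of the proposed escaping. It is not the original pattern: escape_unsafe_bytes() is a hypothetical name, and the byte ranges are an assumption based on the usual well-formed UTF-8 table and the XML 1.0 character rules.

```php
<?php
// Hypothetical helper illustrating the proposal; not the regex from the message.
function escape_unsafe_bytes(string $s): string
{
    // Byte-mode pattern (no u modifier, which would reject invalid UTF-8 outright).
    // Group 1 matches text that is well-formed UTF-8 and legal in XML 1.0;
    // group 2 catches any single byte that is not.
    $pattern = '~
        ( [\x09\x0A\x0D\x20-\x7F]++              # tab, LF, CR, and ASCII from space up
        | [\xC2-\xDF][\x80-\xBF]                 # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]             # 3-byte, no overlong forms
        | (?!\xEF\xBF[\xBE\xBF])                 # U+FFFE and U+FFFF are not XML characters
          [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}      # 3-byte, general case
        | \xED[\x80-\x9F][\x80-\xBF]             # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}          # 4-byte, no overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}              # 4-byte, general case
        | \xF4[\x80-\x8F][\x80-\xBF]{2}          # 4-byte, up to U+10FFFF
        )
        | (.)                                    # anything else: one unsafe byte
    ~xs';

    $out = preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is non-empty only when the safe alternative matched.
        return $m[1] !== '' ? $m[1] : sprintf('<x%02X>', ord($m[2]));
    }, $s);

    if ($out === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $out;
}

echo escape_unsafe_bytes("bad byte: \xFF, escape: \x1B"), "\n";
// bad byte: <xFF>, escape: <x1B>
```

Matching in byte mode is deliberate: with the u modifier, preg_replace_callback() fails on the very input this is meant to clean up. Valid multi-byte characters pass through untouched, so only the offending bytes are rewritten.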

Ferenc Kovacs | 28 Jul 2012 01:40

Re: [PHP-DEV] Run-tests.php JUnit format issue

On Sat, Jul 7, 2012 at 4:41 PM, Anthony Ferrara <ircmaxell <at> gmail.com> wrote:

> Hey all,
>
> I've run into an issue with run-tests.php with the junit format. The XML
> that it generates can be invalid because of invalid UTF-8 characters and
> invalid XML characters. This means that trying to parse it using something
> like Jenkins gives a huge stack-trace because of invalid XML. I've been
> digging through how to fix it, and I think I've come up with a solution.
> But I'm not too happy with it, so I'd like some feedback.
>
> https://github.com/php/php-src/blob/master/run-tests.php#L2096
>
> Right now, the diff for a failed test is just injected in cdata tags, and
> stuck unencoded in the result XML. For tests that are testing invalid UTF-8
> bytes (or other character sets), that diff can contain bad byte sequences.
>
>         $diff = empty($diff) ? '' : "<![CDATA[\n " . preg_replace('/\e/',
> '<esc>', $diff) . "\n]]>";
>
>
> What I'm proposing is to escape all non-UTF-8 and non-XML-safe bytes with
> their hex value wrapped in <>. So chr(0xFF) (which is invalid in UTF-8) would
> become <xFF>.
>
> Now, implementing it is a bit more interesting. I've come up with a single
> regex that will do it:
>
>         $diff = preg_replace_callback(
> '/(
[...]

Gustavo Lopes | 28 Jul 2012 16:15

Re: [PHP-DEV] Run-tests.php JUnit format issue

On Sat, 28 Jul 2012 01:40:33 +0200, Ferenc Kovacs <tyra3l <at> gmail.com> wrote:

> PS2: I think there would still be one hiccup in the code: AFAIR we didn't
> handle the case where a CDATA closing sequence (]]>) appears in the test
> output. That could be handled by
> http://www.lshift.net/blog/2007/10/25/xml-cdata-and-escaping I guess.
>

This same solution (temporarily leaving CDATA) can be applied to encoding  
those special characters. See:

http://ideone.com/csB6b
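
For the "]]>" case specifically, "temporarily leaving CDATA" usually means splitting the terminator across two CDATA sections. A minimal sketch of that idea follows; cdata_wrap() is a hypothetical name, and this is not necessarily what the linked snippet does.

```php
<?php
// Hypothetical helper: wrap text in CDATA, closing and reopening the section
// wherever the text itself contains the "]]>" terminator.
function cdata_wrap(string $text): string
{
    // "]]>" becomes "]]" (end of one section) + "]]><![CDATA[" + ">" (start of next).
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo cdata_wrap('if ($a[$b[0]]>1)'), "\n";
// <![CDATA[if ($a[$b[0]]]]><![CDATA[>1)]]>
```

A parser concatenates the adjacent sections, so the consumer sees the original text again, including the literal "]]>".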

If you turned, say, \xF8 into &#xF8;, you would be treating \xF8 as
U+00F8, which is actually represented by a different byte sequence in
UTF-8 (0xC3 0xB8). Of course, the underlying problem is that many tests
are not encoded in UTF-8.
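
The mismatch is easy to check. This short sketch assumes PHP 7 or later for the "\u{...}" string escape; the variable names are illustrative only.

```php
<?php
$byte    = "\xF8";    // the raw byte found in the test output
$char    = "\u{F8}";  // U+00F8 encoded as UTF-8
$decoded = html_entity_decode('&#xF8;', ENT_QUOTES, 'UTF-8');

echo bin2hex($byte), "\n";     // f8
echo bin2hex($char), "\n";     // c3b8
echo bin2hex($decoded), "\n";  // c3b8

// The lone byte is not valid UTF-8 on its own, so matching with /u fails.
var_dump(preg_match('//u', $byte)); // bool(false)
```

So a numeric character reference silently changes what the data means, whereas a visible marker such as <xF8> keeps the original byte value readable without claiming it is any particular character.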

I think Anthony's solution is fine.

-- 
Gustavo Lopes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

