Julian Reschke | 2 Jul 2008 22:52

Microsoft's "I mean it" content-type parameter


Hi,

(crossposted to both the HTTPbis WG's and HTML5 WG's mailing lists...)

looking at 
<http://blogs.msdn.com/ie/archive/2008/07/02/ie8-security-part-v-comprehensive-protection.aspx>:

"MIME-Handling: Sniffing Opt-Out

Next, we’ve provided web-applications with the ability to opt-out of 
MIME-sniffing. Sending the new authoritative=true attribute on the 
Content-Type HTTP response header prevents Internet Explorer from 
MIME-sniffing a response away from the declared content-type."
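
To make the quoted mechanism concrete, here is a minimal sketch in Python of how a receiver might detect such a parameter. The function names are hypothetical, the parsing is deliberately simplified (it does not handle quoted strings that contain semicolons), and nothing here reflects how IE8 actually implements the check.

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (media_type, params).

    Simplified: assumes parameters are separated by ';' and that
    quoted values contain no ';' characters.
    """
    parts = [p.strip() for p in header_value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, value = part.partition("=")
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


def is_authoritative(header_value):
    """True if the header carries authoritative=true (case-insensitive)."""
    _, params = parse_content_type(header_value)
    return params.get("authoritative", "").lower() == "true"
```

For example, `is_authoritative("text/plain; authoritative=true")` is true, while a bare `text/plain` is not.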

Let's ignore the issue of inventing a new media type parameter for all 
new media types for a moment...

It's good that MS recognizes that content-type-sniffing may be bad and 
that they are doing something about it. But is this really the right 
approach?

BR, Julian

Frank Ellermann | 3 Jul 2008 00:55

Re: Microsoft's "I mean it" content-type parameter


Julian Reschke wrote:

>| MIME-Handling: Sniffing Opt-Out

"Opt-out" and "embrace" should be put under Godwin's Law.

For their text/plain example, why can't they add a "try
to render as HTML" OPTIoN?  OE6 can manage this issue
as I want it; admittedly, I wasn't aware that IE6 doesn't.
And IE8 can do whatever makes sense wrt security.

> Let's ignore the issue of inventing a new media type 
> parameter for all new media types for a moment...

<moment for="a" />  No ;-)  As your subject says, this
"I mean it" parameter has the decent charm of RFC 3514:
| The Security Flag in the IPv4 Header (Evil Bit).
| S. Bellovin. 1 April 2003.

> But is this really the right approach?

Assuming that HTTP servers know what they are talking
about is arguably no OPTIoN... :-(

 Frank

Robert Collins | 2 Jul 2008 23:26

Re: Microsoft's "I mean it" content-type parameter

On Wed, 2008-07-02 at 22:52 +0200, Julian Reschke wrote:
> Hi,
> 
> (crossposted to both the HTTPbis WG's and HTML5 WG's mailing lists...)
> 
> looking at 
> <http://blogs.msdn.com/ie/archive/2008/07/02/ie8-security-part-v-comprehensive-protection.aspx>:
> 
> "MIME-Handling: Sniffing Opt-Out
> 
> Next, we’ve provided web-applications with the ability to opt-out of 
> MIME-sniffing. Sending the new authoritative=true attribute on the 
> Content-Type HTTP response header prevents Internet Explorer from 
> MIME-sniffing a response away from the declared content-type."
> 
> Let's ignore the issue of inventing a new media type parameter for all 
> new media types for a moment...
> 
> It's good that MS recognizes that content-type-sniffing may be bad and 
> that they are doing something about it. But is this really the right 
> approach?

If they assume that fixing all the bust clients they have been shipping
for years is infeasible, then I think they would have concluded it's the
right way.

I think it's bogus - it requires every web site author in existence to
change their site to fix a defect in MSIE. That's got to be harder to
deploy than just a hotfix to MSIE to not sniff at all. 'Sorry, bad idea,
fixed in hotfix #12345.'

William A. Rowe, Jr. | 3 Jul 2008 02:20

Re: Microsoft's "I mean it" content-type parameter


Robert Collins wrote:
> On Wed, 2008-07-02 at 22:52 +0200, Julian Reschke wrote:
>>
>> Let's ignore the issue of inventing a new media type parameter for all 
>> new media types for a moment...

And ignore the fact that the content may not be proxied properly?  I think
that's a pretty silly detail to ignore.

>> It's good that MS recognizes that content-type-sniffing may be bad and 
>> that they are doing something about it. But is this really the right 
>> approach?
> 
> If they assume that fixing all the bust clients they have been shipping
> for years is infeasible, then I think they would have concluded it's the
> right way.

Of course, this repairs all the bust clients no more effectively than
changing their behavior to conform to RFC2616 in the first place.

> I think it's bogus - it requires every web site author in existence to
> change their site to fix a defect in MSIE. That's got to be harder to
> deploy than just a hotfix to MSIE to not sniff at all. 'Sorry, bad idea,
> fixed in hotfix #12345.'

Well, at least every administrator.

I find this statement from the blog very telling;


Justin James | 3 Jul 2008 16:42

RE: Microsoft's "I mean it" content-type parameter


> It would be very fun to see the example they cite, I sincerely doubt they
> exist to any legitimate extent today.  Our friends crawling the web could
> probably give us hard numbers.  I suspect the short history goes;
>
>   * some folks start serving files over http:, associate .html with text/html
>
>   * 1000's download ms authoring tools to create default.htm files
>
>   * 100's uploading to these "ancient" servers discover they render as either
>     binary/octet-stream or text/plain
>
>   * MS fixes their client to display .htm files as html
>
> Interestingly, they don't work around the fact that all of these servers are
> also configured to serve index.html and not default.htm.  If they relied on
> the administrators to fix one side of the coin...

William -

There are tons of legitimate use cases here that you have completely overlooked. For example, lots of
server side applications throw out content of a type different from what their file extension would
indicate. For example, the earliest "hit counter" programs were .cgi or .pl files (typically)
generating image/gif or image/jpeg content. The Web servers were set up explicitly to serve the output of
those applications as text/html. And a great many developers had no idea that they needed to change the
Content-type at the code level to make this work. Content sniffing made life easier for these developers.
Indeed, Content-disposition is a brutally critical header for any developer making, say, a file
download application that actually spews forth the bits itself instead of performing a redirection.

J.Ja

Daniel Stenberg | 3 Jul 2008 17:51

RE: Microsoft's "I mean it" content-type parameter


On Thu, 3 Jul 2008, Justin James wrote:

> There are tons of legitimate use cases here that you have completely 
> overlooked. For example, lots of server side applications throw out content 
> of a type different from what their file extension would indicate. For 
> example, the earliest "hit counter" programs were .cgi or .pl files 
> (typically) generating image/gif or image/jpeg content. The Web servers were 
> set up explicitly to serve the output of those applications as text/html. 
> And a great many developers had no idea that they needed to change the 
> Content-type at the code level to make this work. Content sniffing made life 
> easier for these developers.

Uh, that doesn't make sense.

Sure, some scripts output wrong Content-Type. Then no browser can output it 
correctly and thus you fix the server side.

But, this system with bad Content-Type outputs still showing up nicely only 
works if the client *already* does this "sniffing" business and thus they 
more or less encouraged the server-side hackers to remain sloppy.

So this cannot have been a case where the browser adapted to how servers work, 
since servers would hardly ever have worked this way if some browsers didn't 
already support it...

I find this "I promise this time I really mean that the type is what I say" 
attribute hilariously funny.


Sam Ruby | 3 Jul 2008 22:23

RE: Microsoft's "I mean it" content-type parameter

Daniel Stenberg wrote on 07/03/2008 11:51:39 AM:
>
> I find this "I promise this time I really mean that the type is what I say"
> attribute hilariously funny.

That's not exactly what it means.

Consider:

http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml

This is a testcase (you might have guessed such from the URI).  It has been served as application/xml for years.  Sometime during that period a number of uppity browsers one by one decided to throw out the rules that have guided the development of the internet, and decided that they knew better than I did as to how this data was intended to be displayed.  They did so because a number of *other* people don't know how to configure their servers.  And now the HTML5 working group has decided to enshrine this decision.

Because I'm clearly outnumbered, I can understand why the default can't be to support me, despite my not having done anything wrong.  I know some are indignant about this, but I'm not one of them.  I simply want a way to say that despite what other people may have done and may continue to do, I would like to request that the content type I am sending be respected.

Spell it differently, put in verbiage that says that web servers SHOULD NOT enable this by default (and by this, Apache, I mean YOU), I care not.  Just give me a way to specify the MIME type reliably.

- Sam Ruby

Lachlan Hunt | 4 Jul 2008 00:34

Re: Microsoft's "I mean it" content-type parameter


Sam Ruby wrote:
> Daniel Stenberg wrote on 07/03/2008 11:51:39 AM:
>> I find this "I promise this time I really mean that the type is what I say"
>> attribute hilariously funny.
> 
> That's not exactly what it means.
> 
> Consider:
> 
> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml
> 
> This is a testcase (you might have guessed such from the URI).  It has been
> served as application/xml for years.  Sometime during that period a
> number of uppity browsers one by one decided to throw out the rules that
> have guided the development of the internet and that they knew better than
> I did as to how this data was intended to be displayed.

As far as I can see, there is technically nothing at all wrong with the 
way that file is rendered by browsers.  It's served as XML, so browsers 
correctly parse it with an XML parser; they see the Atom namespace and 
then process and render it accordingly.  What are you expecting?  It is 
analogous to the way an XHTML file served as application/xml using the 
XHTML namespace will be processed and rendered as XHTML.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

Jamie Lokier | 3 Jul 2008 23:01

Re: Microsoft's "I mean it" content-type parameter


Sam Ruby wrote:
> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml

> This is a testcase (you might have guessed such from the URI).  It
> has been served as application/xml for years.  Sometime during
> that period a number of uppity browsers one by one decided to throw
> out the rules that have guided the development of the internet and
> that they knew better than I did as to how this data was intended to
> be displayed.

What's the problem with this resource?

I'm assuming that one-by-one uppity browsers includes Firefox, as
there aren't many major browser engines.  The resource seems to behave
fine in Firefox.

-- Jamie

John Kemp | 3 Jul 2008 23:23

Re: Microsoft's "I mean it" content-type parameter


Jamie Lokier wrote:
> Sam Ruby wrote:
>> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml
> 
>> This is a testcase (you might have guessed such from the URI).  It
>> has been served as application/xml for years.  Sometime during
>> that period a number of uppity browsers one by one decided to throw
>> out the rules that have guided the development of the internet and
>> that they knew better than I did as to how this data was intended to
>> be displayed.
> 
> What's the problem with this resource?
> 
> I'm assuming that one-by-one uppity browsers includes Firefox, as
> there aren't many major browser engines.  The resource seems to behave
> fine in Firefox.

Depends what you call fine I suppose.

The content-type is reported (via 'View Page Info') in my Firefox 2 as 
application/xhtml+xml. However, the page is rendered as if it were an 
ATOM feed (which usually has the content-type application/atom+xml IIRC) 
rather than as if it were XHTML.

The user of a user-agent would probably call that fine. The person 
configuring his web server to deliver a particular piece of content with 
a specified content-type might not. Both attitudes seem reasonable.

Regards,

- johnk

> 
> -- Jamie
> 

L. David Baron | 4 Jul 2008 00:17

Re: Microsoft's "I mean it" content-type parameter


On Thursday 2008-07-03 17:23 -0400, John Kemp wrote:
> Jamie Lokier wrote:
>> Sam Ruby wrote:
>>> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml

> The content-type is reported (via 'View Page Info') in my Firefox 2 as  
> application/xhtml+xml. However, the page is rendered as if it were an  
> ATOM feed (which usually has the content-type application/atom+xml IIRC)  
> rather than as if it were XHTML.

That's because the MIME-type dispatch of the application/xhtml+xml
type triggers XML processing by namespace-based dispatch.

Mixed-namespace documents are (or at least were) probably the
"killer app" of switching from HTML to XHTML, but there's been
little standardized so far for MIME type labeling (and handling
thereof) or content negotiation of such documents, so we're stuck
using the MIME types of the constituent languages.

I tried to start some discussion of these issues in the CDF group
back in 2005 (see [1] and [2]), and I think we came to consensus on
some issues (not necessarily agreeing with what I proposed in those
documents), but the charter wasn't really geared towards producing a
spec in that area, and I think it's since been clarified to say that
such issues are clearly out of scope for CDF and should be discussed
by the TAG.

-David

[1] http://damowmow.com/temp/http-conneg-cdi-req.xhtml or
http://lists.w3.org/Archives/Member/member-cdf/2005Feb/att-0090/http-conneg-cdi-req.xhtml

[2] http://dbaron.org/cdi-req/ or
http://lists.w3.org/Archives/Member/member-cdf/2005Oct/att-0181/cdi-req.xhtml

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Sam Ruby | 4 Jul 2008 15:20

Re: Microsoft's "I mean it" content-type parameter

L. David Baron wrote on 07/03/2008 06:17:22 PM:
>
> On Thursday 2008-07-03 17:23 -0400, John Kemp wrote:
> > Jamie Lokier wrote:
> >> Sam Ruby wrote:
> >>> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml
>
> > The content-type is reported (via 'View Page Info') in my Firefox 2 as  
> > application/xhtml+xml. However, the page is rendered as if it were an  
> > ATOM feed (which usually has the content-type application/atom+xml IIRC)  
> > rather than as if it were XHTML.
>
> That's because the MIME-type dispatch of the application/xhtml+xml
> type triggers XML processing by namespace-based dispatch.
>
> Mixed-namespace documents are (or at least were) probably the
> "killer app" of switching from HTML to XHTML, but there's been
> little standardized so far for MIME type labeling (and handling
> thereof) or content negotiation of such documents, so we're stuck
> using the MIME types of the constituent languages.
>
> I tried to start some discussion of these issues in the CDF group
> back in 2005 (see [1] and [2]), and I think we came to consensus on
> some issues (not necessarily agreeing with what I proposed in those
> documents), but the charter wasn't really geared towards producing a
> spec in that area, and I think it's since been clarified to say that
> such issues are clearly out of scope for CDF and should be discussed
> by the TAG.

As the author of much of this content, I would prefer that a means be provided to trigger the display of these pages as XML, complete with displaying the comments.

If I can't trigger the XML display, being able to force a text/plain display would meet my needs.

- Sam Ruby

William A. Rowe, Jr. | 3 Jul 2008 21:18

Re: Microsoft's "I mean it" content-type parameter


Daniel Stenberg wrote:
> 
> On Thu, 3 Jul 2008, Justin James wrote:
> 
>> [...] a great many developers had no idea 
>> that they needed to change the Content-type at the code level to make 
>> this work. Content sniffing made life easier for these developers.
> 
> Uh, that doesn't make sense.
> 
> Sure, some scripts output wrong Content-Type. Then no browser can output 
> it correctly and thus you fix the server side.

Exactly.

> But, this system with bad Content-Type outputs still showing up nicely 
> only works if the client *already* does this "sniffing" business 
> and thus they more or less encouraged the server-side hackers to remain 
> sloppy.
> 
> So this cannot have been a case where the browser adapted to how servers 
> work, since servers would hardly ever have worked this way if some 
> browsers didn't already support it...

Right.  For example, charset sniffing.  It's clear that MS has done the
community a huge disservice by including UTF-7 in the automatic sniffing,
given that it's now impossible to walk around every autodetection pitfall.
Sniffing UTF-7, far more than the base content-type, is the origin of my
grief against Microsoft.  (Well, that and the fact that it's very hard to
serve example files presented as text/plain for inspection when Microsoft
insists every user actually wants the results of that text file.)

If charsets were not sniffed, administrators would be 'forced' to correct
such flaws.  (They might be aware of the flaw in the first place, for that
matter.)  And should they leave the content type unstated, or without a
charset, it would then be perfectly reasonable for the content author to
describe the file in meta tags (once again, correctly).

> I find this "I promise this time I really mean that the type is what I 
> say" attribute hilariously funny.

:)

Justin James | 3 Jul 2008 18:04

RE: Microsoft's "I mean it" content-type parameter


Daniel -

The *entire* Web is founded on sloppy programmers, what makes you think that
this scenario is an exception? If browser vendors didn't create browsers
that accepted any semi-reasonable slop served out, then the Web would still
just be TBL and a few scientists who were used to Postscript being delighted
at how "simple" HTML is (while the public has proven how hard it is to write
valid HTML) to pass around their academic papers. :)

But you are right about there being a chicken/egg issue here in my
particular example. I am *positive* that Apache's behavior of throwing
everything out as text/html unless explicitly specified otherwise with a
MIME mapping or in the headers from a CGI had a lot to do with it, which
Julian already explained.

I don't think the proposal is a good one either, for the record. I also
don't think it is a bad one. It doesn't break anything, and it extends the
protocol in a way that does not cause any problems to existing stuff, and it
will only be used by a small fraction of people.

J.Ja

-----Original Message-----
From: public-html-request@...
[mailto:public-html-request@...] On
Behalf Of Daniel Stenberg
Sent: Thursday, July 03, 2008 11:52 AM
To: 'HTTP Working Group'
Cc: public-html@...
Subject: RE: Microsoft's "I mean it" content-type parameter

On Thu, 3 Jul 2008, Justin James wrote:

> There are tons of legitimate use cases here that you have completely
> overlooked. For example, lots of server side applications throw out content
> of a type different from what their file extension would indicate. For
> example, the earliest "hit counter" programs were .cgi or .pl files
> (typically) generating image/gif or image/jpeg content. The Web servers were
> set up explicitly to serve the output of those applications as text/html.
> And a great many developers had no idea that they needed to change the
> Content-type at the code level to make this work. Content sniffing made life
> easier for these developers.

Uh, that doesn't make sense.

Sure, some scripts output wrong Content-Type. Then no browser can output it 
correctly and thus you fix the server side.

But, this system with bad Content-Type outputs still showing up nicely only 
works if the client *already* does this "sniffing" business and thus they 
more or less encouraged the server-side hackers to remain sloppy.

So this cannot have been a case where the browser adapted to how servers work, 
since servers would hardly ever have worked this way if some browsers didn't 
already support it...

I find this "I promise this time I really mean that the type is what I say" 
attribute hilariously funny.

-- 

  / daniel.haxx.se

Karl Dubost | 4 Jul 2008 03:25

Re: Microsoft's "I mean it" content-type parameter


Le 4 juil. 2008 à 01:04, Justin James a écrit :
> If browser vendors didn't create browsers
> that accepted any semi-reasonable slop served out, then the Web
> would still just be TBL […]

Statement which can't be proved.

> It doesn't break anything, and it extends the protocol in a way that  
> does not cause any problems to existing stuff, and it will only be  
> used by a small fraction of people.

Sniffing content causes issues, for example when you want to serve an  
HTML file with text/plain on *purpose*.  Use case: insert the source  
code of an html document with object or iframe sent as text/plain.

-- 
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool

Justin James | 4 Jul 2008 05:17

RE: Microsoft's "I mean it" content-type parameter


> Sniffing content causes issues, for example when you want to serve an  
> HTML file with text/plain on *purpose*.  Use case: insert the source  
> code of an html document with object or iframe sent as text/plain.

Yes, this is correct. But it does not contradict my statement that this
proposal does not break anything. If anything, it lends weight to the
proposal. After all, browsers are performing sniffing anyways already,
*regardless of whether or not they are supposed to* (a phrase that can be
applied to much of browsers' behavior...). Therefore, this proposal provides
a mechanism for people on the server side to override that behavior in
precisely the scenarios that you describe.

There are situations where content sniffing makes sense. There are
situations where it doesn't. The only way to resolve it is to have a flag
that triggers a "no sniffing mode"; to do it the other way around (with a
flag that *turns on* sniffing mode) would contradict existing behavior and
therefore Break The Web.
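
As a rough illustration of the "no sniffing mode" flag described above, here is a toy sketch in Python. The heuristic and the function names are made up for this example and are far cruder than what any real browser does; the only point is to show the declared type winning when the opt-out is present and being overridden when it is not.

```python
def looks_like_html(body):
    """Crude, hypothetical heuristic: is there an HTML-ish tag near the start?"""
    head = body[:256].lower()
    markers = (b"<html", b"<script", b"<body", b"<!doctype html")
    return any(marker in head for marker in markers)


def effective_type(content_type, body):
    """Return the type a sniffing client might act on for this response."""
    parts = [p.strip().lower() for p in content_type.split(";")]
    declared = parts[0]
    no_sniff = "authoritative=true" in parts[1:]
    if no_sniff:
        return declared       # server opted out: trust the label
    if declared == "text/plain" and looks_like_html(body):
        return "text/html"    # sniffed away from the declared type
    return declared
```

With this sketch, HTML source served as `text/plain` is treated as `text/html` unless the header also carries `authoritative=true`, in which case it stays `text/plain`.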

J.Ja

William A. Rowe, Jr. | 4 Jul 2008 05:46

Re: Microsoft's "I mean it" content-type parameter


Justin James wrote:
> 
> There are situations where content sniffing makes sense. 

Yes.  There is local file content.  There is unintelligent, ftp based
delivery.  These all need some context that doesn't exist behind the
delivery of the content.

> There are
> situations where it doesn't. The only way to resolve it is to have a flag
> that triggers a "no sniffing mode"; to do it the other way around (with a
> flag that *turns on* sniffing mode) would contradict existing behavior and
> therefore Break The Web.

Nonsense.  HTTP/1.1 defined the mechanism to do just this.  The fact that
vendors ignored this, suffered the consequences on vuln-dev and bugtraq,
and will continue to do so until they follow the protocol reiterates that
sniffing has a place, and within a well defined protocol this isn't it.

Justin James | 4 Jul 2008 06:19

RE: Microsoft's "I mean it" content-type parameter


>> There are
>> situations where it doesn't. The only way to resolve it is to have a flag
>> that triggers a "no sniffing mode"; to do it the other way around (with a
>> flag that *turns on* sniffing mode) would contradict existing behavior and
>> therefore Break The Web.

> Nonsense.  HTTP/1.1 defined the mechanism to do just this.  The fact that
> vendors ignored this, suffered the consequences on vuln-dev and bugtraq,
> and will continue to do so until they follow the protocol reiterates that
> sniffing has a place, and within a well defined protocol this isn't it.

It is quite clear that you are ignoring the point here. The point is *not* what the spec says. As you point out,
there is a serious disconnect between reality and the spec. What you are essentially saying is, "if
everyone just followed the spec, everything would be fine." Which is true. But it is also not what
happened. Which is the point.

Getting the current HTML spec to handle the current reality in a way that leaves existing
applications un-broken (regardless of their usage of a non-standard behavior) while also providing an
option for avoiding the non-standard behavior is the challenge here. I am curious what you think would be a
good approach to resolving this situation as it currently stands. And just re-writing the browsers to
stop content sniffing is not a realistic answer, nor one that meets the goals set forth.

J.Ja

William A. Rowe, Jr. | 4 Jul 2008 06:41

Re: Microsoft's "I mean it" content-type parameter


Justin James wrote:
> 
> It is quite clear that you are ignoring the point here. The point is *not* what the spec says. As you point
> out, there is a serious disconnect between reality and the spec. What you are essentially saying is, "if
> everyone just followed the spec, everything would be fine." Which is true. But it is also not what
> happened. Which is the point.

No, I'm observing that a very small percentage of sites would be instantly
broken by such a draconian "course correction" by browser authors.

And a much larger number of vulnerable sites would be "resolved" by such
a correction (in respect to UTF-7 detection particularly, but many other
forms of sniffing in general).

Rather than persisting FUD, I'd challenge you to point out only one
significant site, and a relatively minor site, affected by such a change.
Folks who insist that sniffing is "necessary" really ought to back up the
assertion with hard data, or close the significant vulnerabilities that
persist in the ecosystem.

As mentioned in a previous note, sniffing served a noble purpose for a safer
environment, one that simply doesn't exist.

Justin James | 4 Jul 2008 09:50

RE: Microsoft's "I mean it" content-type parameter


> Rather than persisting FUD, I'd challenge you to point out only one
> significant site, and a relatively minor site, affected by such a change.

Others on the list have already provided some pretty decent examples, no need for me to do so. While those
sites may constitute a minority of the total number of Web sites out there, they are some fairly major
sites. Furthermore, as we have seen in past discussions on this mailing list, if a change we make "breaks"
0.1% of pages, that is still millions of pages, more than we usually like to "break".

> Folks who insist that sniffing is "necessary" really ought to back up the
> assertion with hard data, or close the significant vulnerabilities that
> persist in the ecosystem.

As a technique, it is not "necessary". However, not breaking the Web *is* necessary, or else HTML 5 will
become yet another ignored spec on the pile. So it is now important that we find a way to deal with this
without causing *more* problems.

> As mentioned in a previous note, sniffing served a noble purpose for a safer
> environment, one that simply doesn't exist.

I never disagreed with you here. At best, I think people on this list can provide a guess as to the historical
thinking that led to browsers doing this. You can go ahead and stick to the position of "I don't care about
anyone not following the spec, and I wish browsers wouldn't enable them to write bad code." In which case
you back a vision of the HTML spec that stands in an ivory tower, or is some sort of brass ring (wow, I just
realized the M:tG references there!) attainable by only the elite who use The Right Software and who have
memorized many hundred-plus page specs. Or, you can back a more pragmatic vision, where we work hard to get
a spec that Real World Developers can target. Personally, as a "real world developer", I opt for the latter.

Also, I would greatly appreciate it if you didn't toss FUD accusations around. First of all, personal
attacks are highly unnecessary. Second of all, I am one of the people active on this list without any strong
ties to an industry player. I do not represent a major company. I am not trying to get my employer's decisions
ratified after the fact by the HTML 5 spec. In fact, my current employer has no intention of using either
HTTP or HTML in any of their products, except for possibly as a documentation format (doubtful at that). I
chose to join this group since I felt that the HTML 5 draft that I read was headed in the wrong direction. The
"special interest" that I try my best to represent are real world developers who have to use this spec and
HTML documents, based upon my own experiences with HTML, HTTP, etc. Given my reasons for
participation, I have zero reason to "persist FUD" (as you state). Finally, I am not "persisting FUD" anyways.

So, now that your challenge to me has been answered by others, I once again ask you to present your own
solution to the problem. Not, "well, if only the evil browsers from vendors I don't like would only follow
the spec to the letter, everything would be fine!" But an actual alternative to what has already been
presented (the humorously but accurately titled "Microsoft's 'I mean it' content-type parameter").

J.Ja

Adam Barth | 4 Jul 2008 07:19

Re: Microsoft's "I mean it" content-type parameter


On Thu, Jul 3, 2008 at 9:41 PM, William A. Rowe, Jr.
<wrowe@...> wrote:
> Rather than persisting FUD, I'd challenge you to point out only one
> significant site, and a relatively minor site, affected by such a change.

I encourage you to build a copy of Firefox without content sniffing
and try surfing the web.  I tried this for a while, and I remember
there being a lot of broken sites, including digg.com and united.com.

Adam

Karl Dubost | 4 Jul 2008 07:42

Re: Microsoft's "I mean it" content-type parameter


Le 4 juil. 2008 à 14:19, Adam Barth a écrit :
> there being a lot of broken sites, including digg.com and united.com.

I had to check. No issues so far.

# DIGG

http://web-sniffer.net/?url=http%3A%2F%2Fdigg.com&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

Content-Type:	text/html; charset=UTF-8

took another page
http://web-sniffer.net/?url=http%3A%2F%2Fdigg.com%2Ftech_news%2FThe_new_Digg_actuall_validates_against_the_W3C&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

Content-Type:	text/html; charset=UTF-8

# UNITED

http://web-sniffer.net/?url=http%3A%2F%2Fwww.united.com%2F&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

Content-type:	text/html

-- 
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool

Adam Barth | 4 Jul 2008 07:49

Re: Microsoft's "I mean it" content-type parameter


On Thu, Jul 3, 2008 at 10:42 PM, Karl Dubost <karl@...> wrote:
> Le 4 juil. 2008 à 14:19, Adam Barth a écrit :
>> there being a lot of broken sites, including digg.com and united.com.
>
> I had to check. No issues so far.

It looks like you're just checking the front page.  As I recall, the
Digg issue was in one of its frames.  Instead of seeing what I was
supposed to see, I saw a script tag and a bunch of JavaScript.  On the
United site, it was something that cropped up while I was booking a
flight.

I recommend the experiment I mentioned, compiling a browser without
content sniffing and actually trying to use the web for a reasonable
amount of time.  I tried this a while ago, so it's possible the sites
have changed in the intervening months.

Adam

Julian Reschke | 4 Jul 2008 08:12

Re: Microsoft's "I mean it" content-type parameter


Adam Barth wrote:
> It looks like you're just checking the front page.  As I recall, the
> Digg issue was in one of its frames.  Instead of seeing what I was
> supposed to see, I saw a script tag and a bunch of JavaScript.  On the
> United site, it was something that cropped up while I was booking a
> flight.
> 
> I recommend the experiment I mentioned, compiling a browser without
> content sniffing and actually trying to use the web for a reasonable
> amount of time.  I tried this a while ago, so it's possible the sites
> have changed in the intervening months.

Or switch it off in the browser, when on IE7: 
<http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.

BR, Julian

Adam Barth | 4 Jul 2008 08:18

Re: Microsoft's "I mean it" content-type parameter

On Thu, Jul 3, 2008 at 11:12 PM, Julian Reschke <julian.reschke@...> wrote:
> Adam Barth wrote:
>> I recommend the experiment I mentioned, compiling a browser without
>> content sniffing and actually trying to use the web for a reasonable
>> amount of time.
>
> Or switch it off in the browser, when on IE7:
> <http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.

Oh nice, I didn't know about that.  I've attached an (untested) patch
that I think turns off content sniffing in TOT Firefox for those that
would like to try this out.

Adam
Attachment (no-content-sniffing.patch): application/octet-stream, 8 KiB

Eric Lawrence | 6 Jul 2008 03:22

RE: Microsoft's "I mean it" content-type parameter


Unfortunately, the existing option in the IE6+ Security Zones UI is both poorly named and does not really do
what it implies.  Rather than turning off sniffing altogether, it modifies the behavior only in the case of
an "ambiguous" MIME type.  Specifically, "text/plain" and IIRC "application/octet-stream."

The new authoritative=true attribute introduced for IE8 Beta-2, on the other hand, will be effective for
all MIME types.  You can simply see what behavior change would result if IE were to universally change
behavior by writing a small Fiddler (www.fiddler2.com) response modification rule that sets the
authoritative=true attribute for all HTTP responses.
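
The experiment described above boils down to rewriting the Content-Type header of every response before the browser sees it. The following Python sketch shows only that header transformation; it is not Fiddler rule syntax, the function names are hypothetical, and it assumes parameter values contain no quoted semicolons.

```python
def mark_authoritative(content_type):
    """Append authoritative=true unless the parameter is already present."""
    parts = [part.strip() for part in content_type.split(";")]
    # Parameter names are case-insensitive, so compare in lower case.
    names = [part.split("=", 1)[0].strip().lower() for part in parts[1:]]
    if "authoritative" in names:
        return content_type
    return content_type.rstrip().rstrip(";") + "; authoritative=true"


def rewrite_response_headers(headers):
    """Return a copy of the headers with Content-Type marked authoritative."""
    rewritten = dict(headers)
    for name, value in headers.items():
        if name.lower() == "content-type":
            rewritten[name] = mark_authoritative(value)
    return rewritten
```

Responses without a Content-Type header are left untouched, since there is no declared type to mark as authoritative.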

Please do keep in mind, however, that most folks (even the ultra-web engaged on these lists) see but a small
fraction of the web, especially considering private address space/intranets, etc.

Thanks,

Eric Lawrence
Program Manager
Internet Explorer - Security
________________________________________
From: ietf-http-wg-request@...
[ietf-http-wg-request@...] On Behalf Of Adam Barth [w3c@...]
Sent: Thursday, July 03, 2008 11:18 PM
To: Julian Reschke
Cc: Karl Dubost; HTTP Working Group; HTML WG
Subject: Re: Microsoft's "I mean it" content-type parameter

On Thu, Jul 3, 2008 at 11:12 PM, Julian Reschke <julian.reschke@...> wrote:
> Adam Barth wrote:
>> I recommend the experiment I mentioned, compiling a browser without
>> content sniffing and actually trying to use the web for a reasonable
>> amount of time.
>
> Or switch it off in the browser, when on IE7:
> <http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.

Oh nice, I didn't know about that.  I've attached an (untested) patch
that I think turns off content sniffing in TOT Firefox for those that
would like to try this out.

Adam

Frank Ellermann | 6 Jul 2008 18:00

Re: Microsoft's "I mean it" content-type parameter


Eric Lawrence wrote:

> keep in mind, however, that most folks (even the ultra-web engaged
> on these lists) see but a small fraction of the web, especially 
> considering private address space/intranets, etc.

Yes, but a general philosophical problem with any "do what I mean"
flag applies:

(1) A relevant fraction of the Web got it wrong, using x=y where
    they should have said x=z.
(2) Therefore you couldn't trust that x=y means x=y, introducing
    some "what is x divination".
(3) That annoyed another relevant fraction of the Web who really
    want x=y when they say x=y.
(4) You add an "I mean it" flag for (3), sticking to "divination"
    as default for (1).
(5) In theory protocols, software, and config files are upgraded
    to add those new "I mean it" flags everywhere.  As that is a
    worldwide upgrade stunt you lose a major fraction of the Web
    sticking to (1) or (3) without this flag.
(6) Another major fraction does what you want, among them a part
    of (1) now saying "x=y I mean it" when they clearly want x=z.
(7) SNAFU, your flag made it worse.

Some problems can't be solved in specifications because it's a
problem with folks never reading specifications.

 Frank

Stefan Eissing | 7 Jul 2008 08:58

Re: Microsoft's "I mean it" content-type parameter


When HTTP has solved this problem, could you guys head over to the
SMTP groups and let them know? They have struggled with the
"This-Is-Really-No-Spam" header for quite some time without making a
breakthrough.

//Stefan

On 6 Jul 2008, at 18:00, Frank Ellermann wrote:

>
> Eric Lawrence wrote:
>
>> keep in mind, however, that most folks (even the ultra-web engaged
>> on these lists) see but a small fraction of the web, especially
>> considering private address space/intranets, etc.
>
> Yes, but a general philosophical problem with any "do what I mean"
> flag applies:
>
> (1) A relevant fraction of the Web got it wrong, using x=y where
>     they should have said x=z.
> (2) Therefore you couldn't trust that x=y means x=y, introducing
>     some "what is x divination".
> (3) That annoyed another relevant fraction of the Web who really
>     want x=y when they say x=y.
> (4) You add an "I mean it" flag for (3), sticking to "divination"
>     as default for (1).
> (5) In theory protocols, software, and config files are upgraded
>     to add those new "I mean it" flags everywhere.  As that is a
>     worldwide upgrade stunt you lose a major fraction of the Web
>     sticking to (1) or (3) without this flag.
> (6) Another major fraction does what you want, among them a part
>     of (1) now saying "x=y I mean it" when they clearly want x=z.
> (7) SNAFU, your flag made it worse.
>
> Some problems can't be solved in specifications because it's a
> problem with folks never reading specifications.
>
>  Frank
>
>

--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782


Re: Microsoft's "I mean it" content-type parameter


2008/7/6 Eric Lawrence <ericlaw@...>:
>...
> The new authoritative=true attribute introduced for IE8 Beta-2, on the
> other hand, will be effective for all MIME types.  You can simply see what
> behavior change would result if IE were to universally change behavior by
> writing a small Fiddler (www.fiddler2.com) response modification rule that
> sets the authoritative=true attribute for all HTTP responses.
>

AFAIK, IE8 Beta-2 isn't available for a public download, so we can't
see that behavior with IE.

Michael A. Puls II | 4 Jul 2008 09:00

Re: Microsoft's "I mean it" content-type parameter


On 7/4/08, Adam Barth <w3c@...> wrote:
> On Thu, Jul 3, 2008 at 11:12 PM, Julian Reschke <julian.reschke@...> wrote:
>  > Adam Barth wrote:
>  >> I recommend the experiment I mentioned, compiling a browser without
>  >> content sniffing and actually trying to use the web for a reasonable
>  >> amount of time.
>  >
>
> > Or switch it off in the browser, when on IE7:
>  > <http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.
>
>
> Oh nice, I didn't know about that.  I've attached an (untested) patch
>  that I think turns off content sniffing in TOT Firefox for those that
>  would like to try this out.

For Opera, turning on opera:config#trust%20server%20types might help.

-- 
Michael

Boris Zbarsky | 4 Jul 2008 08:28

Re: Microsoft's "I mean it" content-type parameter


Adam Barth wrote:
> Oh nice, I didn't know about that.  I've attached an (untested) patch
> that I think turns off content sniffing in TOT Firefox for those that
> would like to try this out.

That looks about right, though it also turns off sniffing in cases when
there is no Content-Type header, and for local files...

-Boris

Julian Reschke | 4 Jul 2008 08:23

Re: Microsoft's "I mean it" content-type parameter


Adam Barth wrote:
> On Thu, Jul 3, 2008 at 11:12 PM, Julian Reschke <julian.reschke@...> wrote:
>> Adam Barth wrote:
>>> I recommend the experiment I mentioned, compiling a browser without
>>> content sniffing and actually trying to use the web for a reasonable
>>> amount of time.
>> Or switch it off in the browser, when on IE7:
>> <http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.
> 
> Oh nice, I didn't know about that.  I've attached an (untested) patch
> that I think turns off content sniffing in TOT Firefox for those that
> would like to try this out.

It would be great to have that as an opt-in in FF3 -- it's bad when the 
browser uses content sniffing (or needs to); but it's even worse that 
page authors have no simple way to find out it happened.

BR, Julian

Dave Singer | 4 Jul 2008 16:28

Re: Microsoft's "I mean it" content-type parameter


At 8:23  +0200 4/07/08, Julian Reschke wrote:
>Adam Barth wrote:
>>On Thu, Jul 3, 2008 at 11:12 PM, Julian Reschke 
>><julian.reschke@...> wrote:
>>>Adam Barth wrote:
>>>>I recommend the experiment I mentioned, compiling a browser without
>>>>content sniffing and actually trying to use the web for a reasonable
>>>>amount of time.
>>>Or switch it off in the browser, when on IE7:
>>><http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>.
>>
>>Oh nice, I didn't know about that.  I've attached an (untested) patch
>>that I think turns off content sniffing in TOT Firefox for those that
>>would like to try this out.
>
>It would be great to have that as an opt-in in FF3 -- it's bad when 
>the browser uses content sniffing (or needs to); but it's even worse 
>that page authors have no simple way to find out it happened.
>

I rather suspect that quite a number of sites are unwittingly relying 
on content sniffing right now;  it's not easy to notice that you're 
using it, if you are a page author.

An interesting but hypothetical question is whether they actually 
need to, or whether they could fix the site.  My suspicion is that 
they could easily fix it.

If servers were fixed to omit content-type headers for content when 
they don't know the type (i.e. align with the spec), and browsers 
were fixed to sniff only when content-type was absent, magically 
overnight all at once, I would expect a rough few months as the 
places were found and fixed, and then we'd settle down again.  But it 
isn't going to happen...

-- 
David Singer
Apple/QuickTime

Roy T. Fielding | 3 Jul 2008 20:20

Re: Microsoft's "I mean it" content-type parameter


On Jul 3, 2008, at 9:04 AM, Justin James wrote:
> The *entire* Web is founded on sloppy programmers, what makes you think
> that this scenario is an exception? If browser vendors didn't create
> browsers that accepted any semi-reasonable slop served out, then the Web
> would still just be TBL and a few scientists who were used to Postscript
> being delighted at how "simple" HTML is (while the public has proven how
> hard it is to write valid HTML) to pass around their academic papers. :)

Whoa, talk about revisionist history.  There was no significant sloppiness
in content types on the web until after IE3 came out with that bug, and
the reason they did so had nothing to do with the content on servers.
It was just "simpler" for them to use the same algorithm regardless of
source (what everyone else calls a security hole).  It was only later,
when Netscape and other browsers had problems with content that was served
at sites that only use MSIE for authoring, that sniffing by other browsers
became common on the Web.

> But you are right about there being a chicken/egg issue here in my
> particular example. I am *positive* that Apache's behavior of throwing
> everything out at text/html unless explicitly specified otherwise with a
> MIME mapping or in the headers from a CGI had a lot to do with it, which
> Julian already explained.

No, it had nothing to do with it.  Apache sends text/plain by default
because that is the desired config for files with no type extensions
on Unix filesystems.  It comes from NCSA httpd history and is very hard
to deprecate without breaking valid content that has been correctly
configured.  In any case, it has only had a negative effect on new file
formats, not defined ones like HTML, and has no effect whatsoever on
scripts.  I've never seen a script forget to set its own content-type.

> I don't think the proposal is a good one either, for the record. I also
> don't think it is a bad one. It doesn't break anything, and it extends the
> protocol in a way that does not cause any problems to existing stuff, and
> it will only be used by a small fraction of people.

On the contrary, if MSIE uses it to detect authoritative content, then
Apache will probably be changed to always send that parameter when
browser UA is MSIE.  Apache works around stupid browser bugs and
security holes in browsers, whether they like it or not.

....Roy

Jamie Lokier | 3 Jul 2008 19:41

Re: Microsoft's "I mean it" content-type parameter


Justin James wrote:
> The *entire* Web is founded on sloppy programmers,

Indeed, just look at HTML5 potentially taking over from XHTML.  It's
practically reincarnating sloppiness, admitting defeat with XHTML,
this time trying to formalise how to handle slop so at least we do it
all the same.

(But last time I looked, it didn't handle slop the same way as certain
major browsers, but rather according to what the HTML5 authors thought
would be sensible, so that part seems doomed to be not implemented
according to spec, but as Yet Another incompatible compatibility layer
in real browsers.)

-- Jamie

Julian Reschke | 3 Jul 2008 09:42

Re: Microsoft's "I mean it" content-type parameter


William A. Rowe, Jr. wrote:
> ...
>> If they assume that fixing all the bust clients they have been shipping
>> for years is infeasible, then I think they would have concluded its the
>> right way.
> 
> Of course, this repairs all the bust clients no more effectively than
> changing their behavior to conform to RFC2616 in the first place.
> ...

Many more clients do content sniffing, and the HTML5 draft suggests it's
the right thing to do...

>> I think its bogus - it requires every web site author in existence to
>> change their site to fix a defect in MSIE. Thats got to be harder to
>> deploy than just a hotfix to MSIE to not sniff at all. 'Sorry, bad idea,
>> fixed in hotfix #12345.'
> 
> Well, at least every administrator.
> 
> I find this statement from the blog very telling;
> 
> "For instance, if Internet Explorer finds HTML content in a file delivered
> with the HTTP response header Content-Type: text/plain, IE determines that
> the content should be rendered as HTML. Because of the number of legacy
> servers on the web (e.g. those that serve all files as text/plain)
> MIME-sniffing is an important compatibility feature."
>
> It would be very fun to see the example they cite, I sincerely doubt they
> exist to any legitimate extent today.  Our friends crawling the web could
> probably give us hard numbers.  I suspect the short history goes;
> ...

As a matter of fact, I can't even reproduce that *specific* case with 
IE6 and IE7, see 
<http://hixie.ch/tests/adhoc/http/content-type/013.html>. Not sure what 
I'm missing here...

 > ...
> This makes no more sense than their lifting Content-Disposition into http,
> but there you go, it's there.  Until more major MS customers move entirely
> to Firefox or other alternatives, I don't anticipate this patchwork
> approach changing.  And few content providers are so lucky as to dictate
> their browser client.
> ...

Hm, what does this have to do with Content-Disposition?

> ...

BR, Julian

Julian Reschke | 3 Jul 2008 20:00

Re: Microsoft's "I mean it" content-type parameter


Julian Reschke wrote:
> ...
> As a matter of fact, I can't even reproduce that *specific* case with 
> IE6 and IE7, see 
> <http://hixie.ch/tests/adhoc/http/content-type/013.html>. Not sure what 
> I'm missing here...
> ...

FYI, see

<http://blogs.msdn.com/ie/archive/2008/07/02/ie8-security-part-v-comprehensive-protection.aspx#8684670> 
-- it's because the tag IE is sniffing for does not occur in the first
256 bytes.
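
The behaviour reported here, looking for a tag only within the first 256 bytes, can be illustrated with a short Python sketch. The window size comes from the message above; the tag list and the function name are assumptions made for illustration and do not reproduce IE's actual algorithm.

```python
SNIFF_WINDOW = 256  # bytes inspected, per the explanation above

# Hypothetical tag list; the real set of tags IE looks for is not given here.
HTML_TAGS = (b"<html", b"<head", b"<body", b"<script", b"<title")


def looks_like_html(body):
    """Return True if any known tag starts within the sniffing window."""
    window = body[:SNIFF_WINDOW].lower()
    return any(tag in window for tag in HTML_TAGS)
```

A test page that opens with more than 256 bytes of ordinary text before its first tag would therefore not be treated as HTML, which is consistent with the test case failing to trigger sniffing.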

BR, Julian

Jamie Lokier | 3 Jul 2008 15:29

Re: Microsoft's "I mean it" content-type parameter


Julian Reschke wrote:
> Many more clients do content sniffing, and the HTML5 draft suggests it's
>  the right thing to do...

So this whole question can be rephrased thus:

   Are there significant numbers of servers out there which are
   serving content intended to be rendered as HTML (or other) with
   Content-Type: text/plain?

> >>I think its bogus - it requires every web site author in existence to
> >>change their site to fix a defect in MSIE. Thats got to be harder to
> >>deploy than just a hotfix to MSIE to not sniff at all.

Perhaps.  On the other hand, presumably the IE sniffing heuristic has
worked fine for many years with most sites, therefore it *doesn't*
require most site authors to do anything at all.

> >"For instance, if Internet Explorer finds HTML content in a file delivered
> >with the HTTP response header Content-Type: text/plain, IE determines that
> >the content should be rendered as HTML. Because of the number of legacy
> >servers on the web (e.g. those that serve all files as text/plain)
> >MIME-sniffing is an important compatibility feature."
> >
> >It would be very fun to see the example they cite, I sincerely doubt they
> >exist to any legitimate extent today.

What about all HTTP->FTP proxies?  FTP doesn't have Content-Type.  Do
proxies themselves add Content-Type by sniffing what they are forwarding?

I suspect this IE sniffing came out of compatibility with what the
browser must do when viewing FTP servers and local filesystems, by the
way.  On both of those, the browser must either sniff the content,
and/or sniff the filename, to decide the content type (and charset).
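
"Sniffing the filename" can be as simple as mapping the extension to a media type. A minimal Python sketch using the standard library's `mimetypes` table follows; the wrapper function name is illustrative, and real browsers use their own tables.

```python
import mimetypes


def guess_from_name(path):
    """Guess a media type from a file name or URL path, or return None."""
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type
```

A name without a recognisable extension yields `None`, which is the case where a client has nothing left to go on except the content itself.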

Also, might it be invoked by servers which report *no* Content-Type?

> >Our friends crawling the web could probably give us hard numbers.

Hard numbers would be good.

-- Jamie

  I suspect the short history goes;
> >...
> 
> As a matter of fact, I can't even reproduce that *specific* case with 
> IE6 and IE7, see 
> <http://hixie.ch/tests/adhoc/http/content-type/013.html>. Not sure what 
> I'm missing here...
> 
> > ...
> >This makes no more sense than their lifting Content-Disposition into http,
> >but there you go, it's there.  Until more major MS customers move entirely
> >to Firefox or other alternatives, I don't anticipate this patchwork 
> >approach
> >changing.  And few content providers are so lucky as to dictate their
> >browser client.
> >...
> 
> Hm, what does this have to do with Content-Disposition?
> 
> >...
> 
> BR, Julian
> 

Henrik Nordstrom | 4 Jul 2008 00:36

Re: Microsoft's "I mean it" content-type parameter

On Thu, 2008-07-03 at 14:29 +0100, Jamie Lokier wrote:
> Also, might it be invoked by servers which report *no* Content-Type?

Sniffing IS allowed in HTTP when there is no Content-Type. But if there
is a Content-Type, it MUST be trusted to be correct.

Simple, efficient, beautiful. But it fails when the content author has no
realistic means of controlling the content-type reported by the
server.

Regards
Henrik

Julian Reschke | 3 Jul 2008 15:43

Re: Microsoft's "I mean it" content-type parameter


Jamie Lokier wrote:
> Julian Reschke wrote:
>> Many more clients do content sniffing, and the HTML5 draft suggests it's
>>  the right thing to do...
> 
> So this whole question can be rephrased thus:
> 
>    Are there significant numbers of servers out there which are
>    serving content intended to be rendered as HTML (or other) with
>    Content-Type: text/plain?

I fear so, because of Apache httpd's support for defaulting the content 
type (and the default being text/plain).

See <https://issues.apache.org/bugzilla/show_bug.cgi?id=13986>.

> ...
> Also, might it be invoked by servers which report *no* Content-Type?
> ...

Well, that's totally ok. Servers that do not know the Content-Type of a 
resource should not guess, which in turn allows the recipient to sniff.

> ...

BR, Julian

Dave Singer | 3 Jul 2008 18:09

Re: Microsoft's "I mean it" content-type parameter


At 15:43  +0200 3/07/08, Julian Reschke wrote:
>Jamie Lokier wrote:
>>Julian Reschke wrote:
>>>Many more clients do content sniffing, and the HTML5 draft
>>>suggests it's  the right thing to do...
>>
>>So this whole question can be rephrased thus:
>>
>>    Are there significant numbers of servers out there which are
>>    serving content intended to be rendered as HTML (or other) with
>>    Content-Type: text/plain?
>
>I fear so, because of Apache httpd's support for defaulting the 
>content type (and the default being text/plain).
>
>See <https://issues.apache.org/bugzilla/show_bug.cgi?id=13986>.
>
>>...
>>Also, might it be invoked by servers which report *no* Content-Type?
>>...
>
>Well, that's totally ok. Servers that do not know the Content-Type 
>of a resource should not guess, which in turn allows the recipient 
>to sniff.

but, as far as I can tell, there is no "unknown" content-type, is there?

-- 
David Singer
Apple/QuickTime

Julian Reschke | 3 Jul 2008 18:17

Re: Microsoft's "I mean it" content-type parameter


Dave Singer wrote:
>>> ...
>>> Also, might it be invoked by servers which report *no* Content-Type?
>>> ...
>>
>> Well, that's totally ok. Servers that do not know the Content-Type of 
>> a resource should not guess, which in turn allows the recipient to sniff.
> 
> but, as far as I can tell, there is no "unknown" content-type, is there?

The way to signal "unknown" is not to send a Content-Type header at all. 
As far as I understand, this is what happens with httpd trunk when you 
set the DefaultType to "none".

BR, Julian

Dave Singer | 3 Jul 2008 18:28

Re: Microsoft's "I mean it" content-type parameter


At 18:17  +0200 3/07/08, Julian Reschke wrote:
>Dave Singer wrote:
>>>>...
>>>>Also, might it be invoked by servers which report *no* Content-Type?
>>>>...
>>>
>>>Well, that's totally ok. Servers that do not know the Content-Type 
>>>of a resource should not guess, which in turn allows the recipient 
>>>to sniff.
>>
>>but, as far as I can tell, there is no "unknown" content-type, is there?
>
>The way to signal "unknown" is not to send a Content-Type header at 
>all. As far as I understand, this is what happens with httpd trunk 
>when you set the DefaultType to "none".

or, it seems, "application/octet-stream".  From HTTP 1.1:

Any HTTP/1.1 message containing an entity-body SHOULD include a 
Content-Type header field defining the media type of that body. If 
and only if the media type is not given by a Content-Type field, the 
recipient MAY attempt to guess the media type via inspection of its 
content and/or the name extension(s) of the URI used to identify the 
resource. If the media type remains unknown, the recipient SHOULD 
treat it as type "application/octet-stream".

It does seem as if sniffing when there is a content-type header is 
flat-out forbidden.  I.e. the presence of content-type was supposed 
to serve *exactly* what the "I mean it" extension is doing...
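
The rule quoted above can be written out as a small decision function. The Python sketch below is one reading of that paragraph, not a reference implementation: the function name and the `sniff` callback (anything that inspects the body and returns a media type or None) are assumptions introduced for illustration.

```python
def effective_media_type(headers, body, sniff):
    """Pick the media type following the HTTP/1.1 text quoted above."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # A declared type is used as-is; no guessing is permitted.
            return value.split(";", 1)[0].strip().lower()
    # Only when the header is absent MAY the recipient guess.
    guessed = sniff(body)
    # If the type remains unknown, fall back to application/octet-stream.
    return guessed if guessed else "application/octet-stream"
```

Under this reading a declared text/plain stays text/plain even if the body looks like HTML, which is the behaviour the "I mean it" parameter is trying to restore.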

Next up:  a server that always adds the "I mean it" attribute, even 
when it doesn't, and the subsequent invention of the "No, really, 
come on, you have to believe me, scout's honor, I really truly mean 
it" extension.

-- 
David Singer
Apple/QuickTime

Ian Hickson | 5 Jul 2008 09:16

Why Microsoft's authoritative=true won't work and is a bad idea


On Thu, 3 Jul 2008, Dave Singer wrote:
> 
> Next up:  a server that always adds the "I mean it" attribute, even when 
> it doesn't, and the subsequent invention of the "No, really, come on, 
> you have to believe me, scout's honor, I really truly mean it" 
> extension.

This is exactly why this won't work. Sites will use this correctly, then 
someone will set some default somewhere incorrectly, or copy and paste a 
correct site somehow, or misunderstand a tutorial or something, and deploy 
it without testing in IE8. And it will work fine in all the browsers 
except IE8, and then IE8 will be patched to make this attribute trigger a
slightly different (and smaller) set of content-sniffing instead... except 
that the set won't be quite what was intended, because there will be some 
bug, and then there will be sites that DO test with this patched IE8, but 
end up relying on this slightly different content sniffing...

...and ten years from now we'll have four different content sniffing modes 
with four different ways of triggering it and the next generation will 
look back at 2008 and wonder what we were thinking.

The way out of this mess is containment. We define a strict set of 
Content-Type sniffing rules that are required to render the Web, and we 
get the browsers to converge on only sniffing for those.

That's what the HTML5 spec does by defining strict and precise content 
sniffing rules based on what browsers do now:

   http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#content-type-sniffing

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Chris Wilson | 8 Jul 2008 19:10

RE: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> This is exactly why this won't work. Sites will use this correctly,
> then someone will set some default somewhere incorrectly, or copy and
> paste a correct site somehow, or misunderstand a tutorial or something,
> and deploy it without testing in IE8. And it will work fine in all the
> browsers except IE8, and then IE8 will be patched to make this attribute
> trigger a slightly different (and smaller) set of content-sniffing

No.

> The way out of this mess is containment. We define a strict set of
> Content-Type sniffing rules that are required to render the Web, and we
> get the browsers to converge on only sniffing for those.

Indeed.  And we're providing a way for content providers to opt out of that mess.

-Chris Wilson

Julian Reschke | 5 Jul 2008 09:39

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> On Thu, 3 Jul 2008, Dave Singer wrote:
>> Next up:  a server that always adds the "I mean it" attribute, even when 
>> it doesn't, and the subsequent invention of the "No, really, come on, 
>> you have to believe me, scout's honor, I really truly mean it" 
>> extension.
> 
> This is exactly why this won't work. Sites will use this correctly, then 
> someone will set some default somewhere incorrectly, or copy and paste a 
> correct site somehow, or misunderstand a tutorial or something, and deploy 
> it without testing in IE8. And it will work fine in all the browsers 
 > ...

Well, only if the other UAs do not adopt the proposal.

I'm not saying they should (yet), but why wouldn't it work if all UAs 
did the same thing here?

> except IE8, and then IE8 will be patched to make this attribute trigger a
> slightly different (and smaller) set of content-sniffing instead... except 
> that the set won't be quite what was intended, because there will be some 
> bug, and then there will be sites that DO test with this patched IE8, but 
> end up relying on this slightly different content sniffing...
> 
> ...and ten years from now we'll have four different content sniffing modes 
> with four different ways of triggering it and the next generation will 
> look back at 2008 and wonder what we were thinking.
> 
> 
> The way out of this mess is containment. We define a strict set of 
> Content-Type sniffing rules that are required to render the Web, and we 
> get the browsers to converge on only sniffing for those.
> ...

So you can get the browser vendors to converge on a precise set of 
sniffing rules, but you can't get them to agree on an opt-out?

Sounds inconsistent to me.

BR, Julian

Sam Ruby | 5 Jul 2008 15:16

Re: Why Microsoft's authoritative=true won't work and is a bad idea

Julian Reschke wrote on 07/05/2008 03:39:59 AM:
>
> Ian Hickson wrote:
> > On Thu, 3 Jul 2008, Dave Singer wrote:
> >> Next up:  a server that always adds the "I mean it" attribute, even when
> >> it doesn't, and the subsequent invention of the "No, really, come on,
> >> you have to believe me, scout's honor, I really truly mean it"
> >> extension.
> >
> > This is exactly why this won't work. Sites will use this correctly, then
> > someone will set some default somewhere incorrectly, or copy and paste a
> > correct site somehow, or misunderstand a tutorial or something, and deploy
> > it without testing in IE8. And it will work fine in all the browsers
>  > ...
>
> Well, only if the other UAs do not adopt the proposal.
>
> I'm not saying they should (yet), but why wouldn't it work if all UAs
> did the same thing here?
>
> > except IE8, and then IE8 will be patched to make this attribute trigger a
> > slightly different (and smaller) set of content-sniffing instead... except
> > that the set won't be quite what was intended, because there will be some
> > bug, and then there will be sites that DO test with this patched IE8, but
> > end up relying on this slightly different content sniffing...
> >
> > ...and ten years from now we'll have four different content sniffing modes
> > with four different ways of triggering it and the next generation will
> > look back at 2008 and wonder what we were thinking.
> >
> >
> > The way out of this mess is containment. We define a strict set of
> > Content-Type sniffing rules that are required to render the Web, and we
> > get the browsers to converge on only sniffing for those.
> > ...
>
> So you can get the browser vendors to converge on a precise set of
> sniffing rules, but you can't get them to agree on an opt-out?
>
> Sounds inconsistent to me.

Permit me to rephrase that in the form of a question, based on a live example.  I just changed the content type of feed validator test cases from "text/xml" to "text/plain; charset=utf-8".  I did this with the following:

http://feedvalidator.googlecode.com/svn/trunk/feedvalidator/testcases/.htaccess

I verified the content type returned using:

curl --head http://www.feedvalidator.org/testcases/atom/1.1/brief-noerror.xml

I then fetched the file using IE 7.0.5730.13, Firefox 3.0, Safari 3.1.2, and Opera 9.50. IE and Firefox rendered the content as a feed, Safari as html, and Opera as text/plain.

As I read the spec, content sniffing as defined by sections 2.7.2 (and perhaps 2.7.3, despite the fact that my charset was sent as lower-case utf-8 despite my specifying this parameter using upper case) specifies that content served as "text/plain" effectively is an opt-out from further content sniffing.

This leads to the question: what is the essential difference between "text/plain", which is defined by the spec and is therefore presumed to be workable (despite all the evidence to the contrary), and "authoritative=true", which is being rejected out of hand as unworkable?
 
> BR, Julian

- Sam Ruby

Ian Hickson | 5 Jul 2008 20:50

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Sat, 5 Jul 2008, Julian Reschke wrote:
> > 
> > This is exactly why this won't work. Sites will use this correctly, 
> > then someone will set some default somewhere incorrectly, or copy and 
> > paste a correct site somehow, or misunderstand a tutorial or 
> > something, and deploy it without testing in IE8. And it will work fine 
> > in all the browsers ...
> 
> Well, only if the other UAs do not adopt the proposal.

The only way you could get it to _not_ work in all the other browsers would
be for all the browsers to be updated to support this simultaneously, with 
a simultaneous launch, and have the entire installed base upgraded at the 
same time. In practice, more than 25% of the install base still uses _IE6_ 
today. The amount of time between when a feature can first be used and 
when a feature cannot be copy-and-pasted by an ignorant author who isn't 
using the latest browsers is several _years_. That's plenty of time to 
poison the well and ruin the chances of the new feature getting deployed 
across all browsers.

> > The way out of this mess is containment. We define a strict set of 
> > Content-Type sniffing rules that are required to render the Web, and 
> > we get the browsers to converge on only sniffing for those. ...
> 
> So you can get the browser vendors to converge on a precise set of 
> sniffing rules, but you can't get them to agree on an opt-out?

The precise set is the set that is compatible with rendering the legacy 
content as expected, the minimal subset compatible with what browsers do. 
It can also be changed in response to browser feedback when it is 
discovered that it isn't quite perfect. It is far easier to incrementally 
move towards a set that is trying to be compatible with what the browsers 
already do than it is to get the browsers to jump to an extreme.

On Sat, 5 Jul 2008, Sam Ruby wrote:
> 
> Permit me to rephrase that in the form of a question, based on a live 
> example.  I just changed the content type of feed validator test cases 
> from "text/xml" to "text/plain; charset=utf-8".  I did this with the 
> following:
> 
> http://feedvalidator.googlecode.com/svn/trunk/feedvalidator/testcases/.htaccess
> 
> I then fetched the file using IE 7.0.5730.13, Firefox 3.0, Safari 3.1.2, 
> and Opera 9.50.  IE and Firefox rendered the content as a feed, Safari 
> as html, and Opera as text/plain.
>
> As I read the spec, content sniffing as defined by sections 2.7.2 (and 
> perhaps 2.7.3, despite the fact that my charset was sent as lower-case 
> utf-8 despite my specifying this parameter using upper case) specifies 
> that content served as "text/plain" effectively is an opt-out from 
> further content sniffing.
> 
> This leads to the question: what is the essential difference between 
> "text/plain" as defined by the spec and therefore is presumed to be 
> workable (despite all the evidence to the contrary), and 
> "authoritative=true" which is being rejected out of hand as unworkable.

text/plain might not be workable. If Opera and Safari find they have to 
change as well, then the spec will have to change too.

--

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Julian Reschke | 6 Jul 2008 12:30

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> The only way you could get it to _not_ work in all the other browsers would 
> be for all the browsers to be updated to support this simultaneously, with 
> a simultaneous launch, and have the entire installed base upgraded at the 
> same time. In practice, more than 25% of the install base still uses _IE6_ 
> today. The amount of time between when a feature can first be used and 
> when a feature cannot be copy-and-pasted by an ignorant author who isn't 
> using the latest browsers is several _years_. That's plenty of time to 
> poison the well and ruin the chances of the new feature getting deployed 
> across all browsers.

It seems to me that you're totally missing the point here.

Before HTML5, the specifications told UAs to respect the mime type (see 
HTTP, MIME, WebArch...). UAs are known not to do this, and to vary in how 
they implement sniffing.

That means that today, a content author has no way to ensure that 
recipients will not do content sniffing.

In many cases, it doesn't matter. In some, it does.

Giving content authors more control over what recipients do seems to be a 
good thing to me, even if it only works in one of the major UAs first, 
in particular if that one is known to do the most content-sniffing today.

Yes, I'd prefer all of this not to be necessary. The less sniffing is 
done, the better. Therefore I'd encourage everybody to try to get the 
number of cases as small as possible.

And no, I'm not convinced that a content-type parameter is the best 
approach, in particular as I don't see how it can be registered 
properly. A new response header may be better.

And yes, I'd prefer if Microsoft would submit proposals like this to a 
public forum, instead of just telling us "this is what we're going to 
do" (the canvas way :-).

>>> The way out of this mess is containment. We define a strict set of 
>>> Content-Type sniffing rules that are required to render the Web, and 
>>> we get the browsers to converge on only sniffing for those. ...
>> So you can get the browser vendors to converge on a precise set of 
>> sniffing rules, but you can't get them to agree on an opt-out?
> 
> The precise set is the set that is compatible with rendering the legacy 
> content as expected, the minimal subset compatible with what browsers do. 
> It can also be changed in response to browser feedback when it is 
> discovered that it isn't quite perfect. It is far easier to incrementally 
> move towards a set that is trying to be compatible with what the browsers 
> already do than it is to get the browsers to jump to an extreme.

I wouldn't consider trusting the server supplied content type an "extreme."

> ...
>> This leads to the question: what is the essential difference between 
>> "text/plain" as defined by the spec and therefore is presumed to be 
>> workable (despite all the evidence to the contrary), and 
>> "authoritative=true" which is being rejected out of hand as unworkable.
> 
> text/plain might not be workable. If Opera and Safari find they have to 
> change as well, then the spec will have to change too.

...I don't think this answers Sam's question. What's the difference 
between considering the encoding as input, but not another parameter?

BR, Julian

Sam Ruby | 6 Jul 2008 00:16

Re: Why Microsoft's authoritative=true won't work and is a bad idea

Ian Hickson wrote on 07/05/2008 02:50:23 PM:
>
> On Sat, 5 Jul 2008, Julian Reschke wrote:
> > >
> > > This is exactly why this won't work. Sites will use this correctly,
> > > then someone will set some default somewhere incorrectly, or copy and
> > > paste a correct site somehow, or misunderstand a tutorial or
> > > something, and deploy it without testing in IE8. And it will work fine
> > > in all the browsers ...
> >
> > Well, only if the other UAs do not adopt the proposal.
>
> The only way you could get it to _not_ work in all the other browsers would
> be for all the browsers to be updated to support this simultaneously, with
> a simultaneous launch, and have the entire installed base upgraded at the
> same time. In practice, more than 25% of the install base still uses _IE6_
> today. The amount of time between when a feature can first be used and
> when a feature cannot be copy-and-pasted by an ignorant author who isn't
> using the latest browsers is several _years_. That's plenty of time to
> poison the well and ruin the chances of the new feature getting deployed
> across all browsers.
>
>
> > > The way out of this mess is containment. We define a strict set of
> > > Content-Type sniffing rules that are required to render the Web, and
> > > we get the browsers to converge on only sniffing for those. ...
> >
> > So you can get the browser vendors to converge on a precise set of
> > sniffing rules, but you can't get them to agree on an opt-out?
>
> The precise set is the set that is compatible with rendering the legacy
> content as expected, the minimal subset compatible with what browsers do.
> It can also be changed in response to browser feedback when it is
> discovered that it isn't quite perfect. It is far easier to incrementally
> move towards a set that is trying to be compatible with what the browsers
> already do than it is to get the browsers to jump to an extreme.
>
>
> On Sat, 5 Jul 2008, Sam Ruby wrote:
> >
> > Permit me to rephrase that in the form of a question, based on a live
> > example.  I just changed the content type of feed validator test cases
> > from "text/xml" to "text/plain; charset=utf-8".  I did this with the
> > following:
> >
> > http://feedvalidator.googlecode.com/svn/trunk/feedvalidator/testcases/.htaccess
> >
> > I then fetched the file using IE 7.0.5730.13, Firefox 3.0, Safari 3.1.2,
> > and Opera 9.50.  IE and Firefox rendered the content as a feed, Safari
> > as html, and Opera as text/plain.
> >
> > As I read the spec, content sniffing as defined by sections 2.7.2 (and
> > perhaps 2.7.3, despite the fact that my charset was sent as lower-case
> > utf-8 despite my specifying this parameter using upper case) specifies
> > that content served as "text/plain" effectively is an opt-out from
> > further content sniffing.
> >
> > This leads to the question: what is the essential difference between
> > "text/plain" as defined by the spec and therefore is presumed to be
> > workable (despite all the evidence to the contrary), and
> > "authoritative=true" which is being rejected out of hand as unworkable.
>
> text/plain might not be workable. If Opera and Safari find they have to
> change as well, then the spec will have to change too.

At the present time, four browsers give three different answers, one of which matches the spec.  Changing the spec can't improve upon this situation.

There are only two workable solutions.  One is to declare that this combination of value for _official_ type and parameters and pattern detected in the content itself maps to a specific _sniffed type_, which would require at least two browsers to change.  Another is to declare that this combination is undefined, and thereby may vary based on the browser.

If any variation of the former is pursued, there is no fundamental difference between sniffing for one given HTTP parameter vs another.
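To make the first option concrete, here is a toy sketch of a function from (official type, parameters, detected pattern) to a sniffed type. It is not the HTML5 algorithm or any browser's implementation: the marker list, the 512-byte window, the policy keys, and the "feed" return label are all assumptions invented for illustration. It only shows that a parameter such as authoritative=true would be one more input alongside the type and the content.

```python
# Hypothetical markers a sniffer might look for near the start of the body.
FEED_MARKERS = (b"<rss", b"<feed", b"<rdf:RDF")


def looks_like_feed(body):
    """Toy pattern check over an assumed 512-byte window."""
    return any(marker in body[:512] for marker in FEED_MARKERS)


def sniffed_type(official_type, params, body, policy):
    """Map (official type, parameters, content pattern) to a sniffed type.

    `policy` is a made-up dict describing one user agent's behaviour.
    """
    # One possible rule: a parameter switches sniffing off entirely.
    if params.get("authoritative") == "true" and policy.get("honor_authoritative"):
        return official_type
    # Another possible rule: text/plain that looks like a feed is a feed.
    if official_type == "text/plain" and policy.get("sniff_text_plain_for_feeds"):
        if looks_like_feed(body):
            return "feed"
    return official_type


atom = b'<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>'
print(sniffed_type("text/plain", {"charset": "utf-8"}, atom,
                   {"sniff_text_plain_for_feeds": True}))
print(sniffed_type("text/plain", {"charset": "utf-8"}, atom, {}))
```

Two user agents with different policy values give different answers for the same response, which is the situation described above.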
 
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

- Sam Ruby

Ian Hickson | 6 Jul 2008 01:22

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Sat, 5 Jul 2008, Sam Ruby wrote:
> 
> At the present time, four browsers give three different answers, one of 
> which matches the spec.  Changing the spec can't improve upon this 
> situation.

The spec is not really the point here. The point is interoperability. 
Clearly if browsers don't do the same thing as each other, we don't have 
interoperability and one or more browsers have to change. The spec will 
just change to whatever the browsers decide on. It can help bring the 
browsers together, but that's about it.

> There are only two workable solutions.  One is to declare that this
> combination of value for _official_ type and parameters and pattern
> detected in the content itself maps to a specific _sniffed type_, which
> would require at least two browsers to change.  Another is to declare that
> this combination is undefined, and thereby may vary based on the browser.

Having things vary by browser fails to achieve the only goal here, 
interoperability. So there's only one workable solution.

> If any variation of the former is pursued, there is no fundamental 
> difference between sniffing for one given HTTP parameter vs another.

I agree. So the key is to find a solution that can reach a steady state. 
The "I really mean it" parameter doesn't (since it will end up used on 
pages that aren't labelled correctly, and so other browsers won't support 
it as it would lead to them supporting fewer pages). The idea of having 
browsers converge on the common subset of what they already do to support 
the Web seems like the simplest way of reaching a steady state. That's 
what HTML5 is trying to do now.

--

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Sam Ruby | 6 Jul 2008 01:58

Re: Why Microsoft's authoritative=true won't work and is a bad idea

Ian Hickson <ian-Y1JINVRCvcs@public.gmane.org> wrote on 07/05/2008 07:22:54 PM:

> On Sat, 5 Jul 2008, Sam Ruby wrote:
> >
> > At the present time, four browsers give three different answers, one of
> > which matches the spec.  Changing the spec can't improve upon this
> > situation.
>
> The spec is not really the point here. The point is interoperability.
> Clearly if browsers don't do the same thing as each other, we don't have
> interoperability and one or more browsers have to change. The spec will
> just change to whatever the browsers decide on. It can help bring the
> browsers together, but that's about it.

I think that we are getting rather close to agreeing here.
 
> > There are only two workable solutions.  One is to declare that this
> > combination of value for _official_ type and parameters and pattern
> > detected in the content itself maps to a specific _sniffed type_, which
> > would require at least two browsers to change.  Another is to declare that
> > this combination is undefined, and thereby may vary based on the browser.
>
> Having things vary by browser fails to achieve the only goal here,
> interoperability. So there's only one workable solution.

Slight disagreement here: there are multiple potentially workable solutions.  At best, the one captured by the current draft is but one of them.

> > If any variation of the former is pursued, there is no fundamental
> > difference between sniffing for one given HTTP parameter vs another.
>
> I agree. So the key is to find a solution that can reach a steady state.
> The "I really mean it" parameter doesn't (since it will end up used on
> pages that aren't labelled correctly, and so other browsers won't support
> it as it would lead to them supporting fewer pages).

Any documented solution, including the one in the current draft, suffers from the above.  Pages will be labelled incorrectly.  Yes, even with the rules captured by the current draft of HTML5.

To the extent that the HTML5 document limits itself to documenting consistent error recovery for pages served incorrectly, then there are few issues.  If a page *can* be interpreted as text/plain (and in general, most html pages and feeds can), then there is no reason that the consistent error recovery couldn't provide *some* combination of parameters where the sniffed type matches the official type.  In fact, I would go so far as to say that making sure that this is always the case would be a worthy goal.

> The idea of having
> browsers converge on the common subset of what they already do to support
> the Web seems like the simplest way of reaching a steady state. That's
> what HTML5 is trying to do now.

Agreed.  But again, I will assert that there are multiple potential common subsets.  Furthermore, I will assert that picking a scheme that browsers are willing to converge on is a more important factor than which subset is picked.

Another factor to consider is that the http working group is concerned with more user agents than browsers.  Having the sniffed type not match the official type for content that can be reasonably interpreted using the official type is an issue; anything that can reduce the set of cases for which this occurs would be a good thing.  
 
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

- Sam Ruby

Ian Hickson | 6 Jul 2008 03:53

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Sat, 5 Jul 2008, Sam Ruby wrote:
> > > 
> > > There are only two workable solutions. [a and b]
> >
> > [b] fails to achieve the only goal here, interoperability. So there's 
> > only one workable solution.
> 
> Slight disagreement here: there are multiple potentially workable 
> solutions.

You said there were two, I pointed out why one of those two isn't a 
solution, and now there are multiple? I'm confused.

> > the key is to find a solution that can reach a steady state. The "I 
> > really mean it" parameter doesn't (since it will end up used on pages 
> > that aren't labelled correctly, and so other browsers won't support it 
> > as it would lead to them supporting fewer pages).
> 
> Any documented solution, including the one in the current draft, suffers 
> from the above. Pages will be labelled incorrectly.  Yes, even with the 
> rules captured by the current draft of HTML5.

I think you are missing the key difference here.

With what the spec says, which is the status quo plus or minus the delta 
between implementations, we have already run through the people making the 
mistakes and have already gotten to a pretty stable steady state. Getting 
from here to a state where all the browsers do the same thing is a small 
change.

With a radical new parameter, we are much further from the status quo, and 
so it would take significantly more to get to a steady state. Furthermore, 
since the new parameter is in fact identical in semantic meaning to the 
original Content-Type header, there's not really any reason to believe that 
that final steady state wouldn't look exactly like today's near-steady 
state (except more complicated, since it would have more inputs).

> If a page *can* be interpreted as text/plain (and in general, most html 
> pages and feeds can), then there is no reason that the consistent error 
> recovery couldn't provide *some* combination of parameters where the 
> sniffed type matches the official type.  In fact, I would go so far as 
> to say that making sure that this is always the case would be a worthy 
> goal.

I agree, and indeed currently HTML5 says to honour text/plain in all cases 
where the content is valid text/plain content. Personally I think it's a 
security risk to treat text/plain as anything but.

> Another factor to consider is that the http working group is concerned 
> with more user agents than browsers.

I should hope everyone is. However, that doesn't change anything -- it's 
still the same ecosystem, and the same content. We don't want tools 
treating content different than each other, whether they are Web browsers 
or not.

--

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Julian Reschke | 6 Jul 2008 12:35

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> ...
>> Another factor to consider is that the http working group is concerned 
>> with more user agents than browsers.
> 
> I should hope everyone is. However, that doesn't change anything -- it's 
> still the same ecosystem, and the same content. We don't want tools 
> treating content different than each other, whether they are Web browsers 
> or not.
> ...

Now this is something I totally can agree with.

In which case I'm not sure why it's the HTML working group working on 
this. Seems that W3C and IETF should collaborate on this one.

BR, Julian

Sam Ruby | 6 Jul 2008 05:51

Re: Why Microsoft's authoritative=true won't work and is a bad idea

Ian Hickson wrote on 07/05/2008 09:53:51 PM:
>
> On Sat, 5 Jul 2008, Sam Ruby wrote:
> > > >
> > > > There are only two workable solutions. [a and b]
> > >
> > > [b] fails to achieve the only goal here, interoperability. So there's
> > > only one workable solution.
> >
> > Slight disagreement here: there are multiple potentially workable
> > solutions.
>
> You said there were two, I pointed out why one of those two isn't a
> solution, and now there are multiple? I'm confused.

What I originally said was:

"One is to declare that this combination of value for _official_ type and parameters and pattern detected in the content itself maps to a specific _sniffed type_, which would require at least two browsers to change."

We both agree on that one.  Unfortunately, I would have been clearer if the word "this" were replaced with the words "a given", where what is currently in the HTML5 specification is one possible combination, what is in RFC 2616 is another possible combination, what IE7 currently implements is another, what is proposed for IE8 is another, etc.

Sorry for the confusion.
 
> > > the key is to find a solution that can reach a steady state. The "I
> > > really mean it" parameter doesn't (since it will end up used on pages
> > > that aren't labelled correctly, and so other browsers won't support it
> > > as it would lead to them supporting fewer pages).
> >
> > Any documented solution, including the one in the current draft, suffers
> > from the above. Pages will be labelled incorrectly.  Yes, even with the
> > rules captured by the current draft of HTML5.
>
> I think you are missing the key difference here.
>
> With what the spec says, which is the status quo plus or minus the delta
> between implementations, we have already run through the people making the
> mistakes and have already gotten to a pretty stable steady state.

Simply put, I'm not seeing the "status quo" that you describe here.  Since we seem to be so far apart here, let me put it in the form of a question.  Consider the following content:

http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml

When I visit that page, I would like the text "No errors should be produced by the minimal feed" to be visible.  I've read the spec.  I've tried various combinations of content-type parameters.  I've gotten it to work with Opera 9.50, but have failed to find any combination of values to place in the content-type parameter that works with IE 7.0.5730.13, Firefox 3.0, or Safari 3.1.2.

From my read of the spec, this should be possible.  From my experience with the browsers I have cited, if there is a stable steady state, it is quite a distance from what I read in the spec.

What am I missing?  In particular, what do you recommend for the content type of the testcase I cited above?

- Sam Ruby

Ian Hickson | 6 Jul 2008 10:42

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Sat, 5 Jul 2008, Sam Ruby wrote:
> >
> > With what the spec says, which is the status quo plus or minus the 
> > delta between implementations, we have already run through the people 
> > making the mistakes and have already gotten to a pretty stable steady 
> > state.
> 
> Simply put, I'm not seeing the "status quo" that you describe here. 

Maybe you have only been checking the small set of cases that are not 
interoperable (the "delta between implementations"). By and large, 
browsers all do the same thing. There are certainly many edge cases (and 
some cases that aren't quite so "edge") where browsers differ to some 
extent, but the delta between the various behaviours browsers have today 
and a single description of what various behaviours browsers have today is 
far smaller, almost by definition, than the delta between the various 
behaviours browsers have today and the behaviour of the authoritative=true 
parameter. Hence the latter is further from the steady state.

> http://feedvalidator.org/testcases/atom/1.1/brief-noerror.xml
> 
> When I visit that page, I would like the text "No errors should be 
> produced by the minimal feed" to be visible.

If you would like the document to be processed as plain text, then there 
might not be a good answer for you, sorry. Your use case is incompatible 
with the use case of the many users who want to see feeds sent as 
text/plain handled as feeds. Enough people mislabel their feeds as 
text/plain that in practice documents labeled as text/plain are, in some 
browsers, sniffed for feeds before being treated as plain text.

This is one of the areas where the delta between implementations is 
non-zero, though, and the spec does currently suggest treating that 
document as plain text despite it looking like a feed, since that is what 
some browsers do and it is the better technical solution theoretically.

I would like the aforementioned browsers to change to line up with what 
the spec says. From Boris' comments, apparently even some of the 
developers of those browsers would like to change, but it seems it's not 
that simple. These things rarely are. Maybe the spec will have to change 
instead, and the browsers that handle this as text/plain will have to 
start sniffing for the feed.

Either way, interoperability will hopefully be reached, so that all 
browsers act the same and the users and authors don't have to be surprised 
with differing behaviour.

--

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Geoffrey Sneddon | 6 Jul 2008 13:21

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On 6 Jul 2008, at 09:42, Ian Hickson wrote:

> Enough people mislabel their feeds as
> text/plain that in practice documents labeled as text/plain are, in  
> some
> browsers, sniffed for feeds before being treated as plain text.

Interestingly, I did, partly as an experiment, stop sniffing text/ 
plain in the latest release of SimplePie (which, inevitably, isn't the  
nicest of things to do, seeing as there are tens of thousands of users).  
Next to nothing broke. I know for a fact this couldn't have been done  
a year or two ago: things have certainly moved on in terms of the MIME  
types feeds are served with: all that is supported now is application/ 
xml, text/xml, application/rss+xml, application/atom+xml, application/ 
rdf+xml, and sniffed text/html (this is certainly still needed for  
compat. though). If anyone has the guts, it really would be nice to  
see some of the larger UAs pulling support for feeds served as text/ 
plain.
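The type list above could be expressed as a simple allowlist check, sketched here in Python. This is not SimplePie's code; the function name and the "parse" / "sniff" / "reject" labels are invented for this example, and only the MIME types themselves come from the message.

```python
# Types accepted as feeds outright, per the list in the message above.
FEED_TYPES = frozenset({
    "application/xml",
    "text/xml",
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
})

# Types that are still sniffed for feed content for compatibility.
SNIFFED_TYPES = frozenset({"text/html"})


def feed_handling(media_type):
    """Return a hypothetical handling decision for a declared media type."""
    media_type = media_type.strip().lower()  # media types are case-insensitive
    if media_type in FEED_TYPES:
        return "parse"
    if media_type in SNIFFED_TYPES:
        return "sniff"
    return "reject"


for declared in ("application/atom+xml", "text/html", "text/plain"):
    print(declared, "->", feed_handling(declared))
```

With this list, a document declared as text/plain is no longer examined for feed content at all.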

--
Geoffrey Sneddon
<http://gsnedders.com/>

Julian Reschke | 6 Jul 2008 12:40

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> ...
> If you would like the document to be processed as plain text, then there 
> might not be a good answer for you, sorry. Your use case is incompatible 
> with the use case of the many users who want to see feeds sent as 
> text/plain handled as feeds. Enough people mislabel their feeds as 
> text/plain that in practice documents labeled as text/plain are, in some 
> browsers, sniffed for feeds before being treated as plain text.
> ...

With the current text in HTML5, there's not only no "good answer" but no 
answer at all (except by telling users to configure their UAs to respect 
mime types).

Sam's use case could be made compatible by making the response 
distinguishable from one sent by a misconfigured server.

At this point it seems to me that you are simply not interested in that 
case. Is this correct?

BR, Julian

Ian Hickson | 7 Jul 2008 00:19

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Sun, 6 Jul 2008, Julian Reschke wrote:
> >
> > The precise set is the set that is compatible with rendering the 
> > legacy content as expected, the minimal subset compatible with what 
> > browsers do. It can also be changed in response to browser feedback 
> > when it is discovered that it isn't quite perfect. It is far easier to 
> > incrementally move towards a set that is trying to be compatible with 
> > what the browsers already do than it is to get the browsers to jump to 
> > an extreme.
> 
> I wouldn't consider trusting the server supplied content type an 
> "extreme."

Compared to the status quo, it is an extreme. (If you consider the 
possible implementation space as a multidimensional phase space, and 
consider the current implementations are points in phase space, they are 
all relatively close to each other, and close to HTML5. The position that 
involves no sniffing at all, whether that be HTTP-compliance or this new 
authoritative=true parameter, is far, far from the browsers.)

> > > This leads to the question: what is the essential difference between 
> > > "text/plain" as defined by the spec and therefore is presumed to be 
> > > workable (despite all the evidence to the contrary), and 
> > > "authoritative=true" which is being rejected out of hand as 
> > > unworkable.
> > 
> > text/plain might not be workable. If Opera and Safari find they have 
> > to change as well, then the spec will have to change too.
> 
> ...I don't think this answers Sam's question. What's the difference 
> between considering the encoding as input, but not another parameter?

I've explained multiple times the difference is not in the syntax but in 
the delta from the status quo to the behaviour required by the two 
proposals. One is relatively close to where we are now, and by making 
minor changes to browsers and specs, we can reach an equilibrium. The 
other is so far away that only large changes will reach interoperability, 
and such changes aren't stable, since they would happen over a long time 
period and would result in a large body of legacy content that is 
mislabelled, thus leading us right back into a content-sniffing world as 
we are today.

On Sun, 6 Jul 2008, Julian Reschke wrote:
> > >
> > > Another factor to consider is that the http working group is 
> > > concerned with more user agents than browsers.
> > 
> > I should hope everyone is. However, that doesn't change anything -- 
> > it's still the same ecosystem, and the same content. We don't want 
> > tools treating content different than each other, whether they are Web 
> > browsers or not. ...
> 
> Now this is something I totally can agree with.
> 
> In which case I'm not sure why it's the HTML working group working on 
> this. Seems that W3C and IETF should collaborate on this one.

I would absolutely love it if the relevant groups would take this stuff 
and specify it themselves. However, the HTTP group has already indicated 
that they have no intention of defining the content sniffing rules 
required to be compatible with legacy content. (This is just like the URL 
issue, where the URI group indicated no intention to update the URI specs 
to be compatible with legacy content.) I've no intention of playing blame- 
laying games; if the HTTP group doesn't want to do the work, then we will 
instead. If the HTTP group decides to do the work, I would be very happy 
to remove this stuff from the HTML5 spec.

On Sun, 6 Jul 2008, Julian Reschke wrote:
> > ... If you would like the document to be processed as plain text, then 
> > there might not be a good answer for you, sorry. Your use case is 
> > incompatible with the use case of the many users who want to see feeds 
> > sent as text/plain handled as feeds. Enough people mislabel their 
> > feeds as text/plain that in practice documents labeled as text/plain 
> > are, in some browsers, sniffed for feeds before being treated as plain 
> > text. ...
> 
> With the current text in HTML5, there's not only no "good answer" but no 
> answer at all (except by telling users to configure their UAs to respect 
> mime types).

This problem has nothing to do with the spec, since the spec currently 
requires text/plain to be honoured in this case.

The "bad" answer is for Sam to stuff the top of these text/plain feeds with 
filler content that doesn't get sniffed, so that the sniffing heuristics 
in IE and Firefox get tricked into not seeing the feed content. (So, there 
_is_ an answer, it's just not a good one.)
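A toy sketch of why such padding would work, assuming a sniffer that only inspects a fixed window at the start of the body. The 512-byte window, the marker list, and both function names are assumptions for illustration; real browsers' heuristics differ and are not reproduced here.

```python
SNIFF_WINDOW = 512  # hypothetical number of leading bytes a sniffer inspects
FEED_MARKERS = (b"<rss", b"<feed", b"<rdf:RDF")  # hypothetical markers


def toy_feed_sniffer(body):
    """Return True if a feed marker appears within the sniff window."""
    window = body[:SNIFF_WINDOW]
    return any(marker in window for marker in FEED_MARKERS)


def pad_past_window(body, filler=b"."):
    """Prepend enough filler that the feed markup starts after the window."""
    return filler * SNIFF_WINDOW + b"\n" + body


feed = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
print(toy_feed_sniffer(feed))
print(toy_feed_sniffer(pad_past_window(feed)))
```

The padded document is no longer well-formed XML and shows the filler to every reader, which is part of what makes this a bad answer.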

> Sam's use case could be made compatible by making the response 
> distinguishable from one sent by a misconfigured server.

How is that possible?

> At this point it seems to me that you are simply not interested in that 
> case. Is this correct?

I would love sniffing to go away altogether. I'm so interested in this 
particular use case that HTML5 in fact supports it _despite_ this 
requiring changes from the two biggest browsers. What more can I do?

However, if said browsers ignore me, then I'm not going to just stick my 
head in the sand and pretend like all is well -- the spec will change to 
align with reality. At the end of the day, it's not up to me.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Julian Reschke | 7 Jul 2008 09:33

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
>> I wouldn't consider trusting the server supplied content type an 
>> "extreme."
> 
> Compared to the status quo, it is an extreme. (If you consider the 
> possible implementation space as a multidimensional phase space, and 
> consider the current implementations are points in phase space, they are 
> all relatively close to each other, and close to HTML5. The position that 
> involves no sniffing at all, whether that be HTTP-compliance or this new 
> authoritative=true parameter, is far, far from the browsers.)

It's an "extreme" that is currently allowed in HTML5, remember?

"If the user agent is configured to strictly obey Content-Type headers 
for this resource, then jump to the last step in this set of steps." -- 
<http://www.w3.org/html/wg/html5/#content-type0>

>> ...I don't think this answers Sam's question. What's the difference 
>> between considering the encoding as input, but not another parameter?
> 
> I've explained multiple times the difference is not in the syntax but in 
> the delta from the status quo to the behaviour required by the two 
> proposals. One is relatively close to where we are now, and by making 
> minor changes to browsers and specs, we can reach an equilibrium. The 
> other is so far away that only large changes will reach interoperability, 
> and such changes aren't stable, since they would happen over a long time 
> period and would result in a large body of legacy content that is 
> mislabelled, thus leading us right back into a content-sniffing world as 
> we are today.

It seems you are satisfied with the equilibrium HTML5 defines. Others 
are not, for instance Microsoft.

Many think that the information supplied by the server must be treated 
as authoritative, thus want to reach a *different* equilibrium. That may 
require more changes, but this doesn't mean it can't be done (despite 
what you say).

> On Sun, 6 Jul 2008, Julian Reschke wrote:
>>>> Another factor to consider is that the http working group is 
>>>> concerned with more user agents than browsers.
>>> I should hope everyone is. However, that doesn't change anything -- 
>>> it's still the same ecosystem, and the same content. We don't want 
>>> tools treating content different than each other, whether they are Web 
>>> browsers or not. ...
>> Now this is something I totally can agree with.
>>
>> In which case I'm not sure why it's the HTML working group working on 
>> this. Seems that W3C and IETF should collaborate on this one.
> 
> I would absolutely love it if the relevant groups would take this stuff 
> and specify it themselves. However, the HTTP group has already indicated 

With "it", what exactly do you mean? The thing these groups will agree 
on, or the thing you prefer personally?

> that they have no intention of defining the content sniffing rules 
> required to be compatible with legacy content. (This is just like the URL 

The IETF HTTPbis working group has no mandate to do so. Thus it would 
need to be rechartered, or a new WG would have to start.

> issue, where the URI group indicated no intention to update the URI specs 
> to be compatible with legacy content.) I've no intention of playing blame- 
> laying games; if the HTTP group doesn't want to do the work, then we will 
> instead. If the HTTP group decides to do the work, I would be very happy 
> to remove this stuff from the HTML5 spec.

There is no "URI group" -- there's a list of people subscribed to the 
URI mailing list. That being said, I haven't seen *any* kind of 
consensus that RFC3986 should be changed. I've seen some discussion 
about whether RFC3987bis should expand on the "LEIRI" topic, and it 
seems Martin Dürst was considering that input.

The difference between the sniffing issue and the URI issue is this: 
what a content-type means is totally relevant outside the HTML context; 
how an HTTP response is to be processed needs to be the same everywhere.

On the other hand, what lexical format HTML5 allows internally is 
primarily a problem for the HTML WG to decide. It just needs to define 
how the internal format maps to URI/IRI.

>> With the current text in HTML5, there's not only no "good answer" but no 
>> answer at all (except by telling users to configure their UAs to respect 
>> mime types).
> 
> This problem has nothing to do with the spec, since the spec currently 
> requires text/plain to be honoured in this case.
> 
> The "bad" answer is for Sam to stuff the top of this text/plain feeds with 
> filler content that doesn't get sniffed, so that the sniffing heuristics 
> in IE and Firefox get tricked into not seeing the feed content. (So, there 
> _is_ an answer, it's just not a good one.)

That may be a workaround that works in this case, but I doubt it's 
universally applicable.

>> Sam's use case could be made compatible by making the response 
>> distinguishable from one sent by a misconfigured server.
> 
> How is that possible?

Using Microsoft's proposal or by using a separate header, for instance.
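
As a rough illustration of the first option, the sketch below shows how a consumer might decide whether to sniff. It assumes the parameter is spelled authoritative=true as in the IE blog post; the function names and the "sniffed_type" input are hypothetical, and this is not how any browser is known to implement it.

```python
from email.message import Message
from typing import Optional


def _parse(content_type_header: str) -> Message:
    """Wrap a raw Content-Type value so the stdlib can parse its parameters."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg


def declared_type_is_authoritative(content_type_header: str) -> bool:
    """True if the header carries authoritative=true (case-insensitive)."""
    value = _parse(content_type_header).get_param("authoritative", failobj="")
    return isinstance(value, str) and value.strip().lower() == "true"


def effective_type(content_type_header: str, sniffed_type: Optional[str]) -> str:
    """Pick the declared type when it is marked authoritative, else allow sniffing."""
    declared = _parse(content_type_header).get_content_type()
    if declared_type_is_authoritative(content_type_header):
        return declared
    return sniffed_type or declared
```

With this sketch, a response labelled "text/plain; authoritative=true" stays text/plain even if a sniffer would have guessed a feed type, while an unmarked text/plain response remains subject to sniffing.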

>> At this point it seems to me that you are simply not interested in that 
>> case. Is this correct?
> 
> I would love sniffing to go away altogether. I'm so interested in this 
> particular use case that HTML5 in fact supports it _despite_ this 
> requiring changes from the two biggest browsers. What more can I do?
> 
> However, if said browsers ignore me, then I'm not going to just stick my 
> head in the sand and pretend like all is well -- the spec will change to 
> align with reality. At the end of the day, it's not up to me.

Well, the biggest vendor just put a proposal on the table that would 
make it possible to disable sniffing altogether.

Maybe it would make sense to consider it seriously, instead of 
immediately stating "won't work"?

BR, Julian

Justin James | 7 Jul 2008 15:36

RE: Why Microsoft's authoritative=true won't work and is a bad idea


> There is no "URI group" -- there's a list of people subscribed to the 
> URI mailing list. That being said, I haven't seen *any* kind of 
> consensus that RFC3986 should be changed. I've seen some discussion 
> about whether RFC3987bis should expand on the "LEIRI" topic, and it 
> seems Martin Dürst was considering that input.

It seems to me that the following facts are true:

* The URI group/mailing list is not actively working to update or change the
URI specs.
* Over the last few weeks, it has become clear that the URI specs need to
change for certain aspects of browser behavior and HTML to make sense and/or
work right.
* The current URI/URL/"HTTP URL"/IRI breakout is artificial and can/should
be fixed in the URI spec.

If what Julian says is correct (and I have no reason to doubt it), how do we
get some traction on this issue? Who do we engage? Does it make sense,
instead of trying to do the work of an active URI group within the HTML 5
spec (the "HTTP URL" initiative) for a number of us to get involved with
getting an *active* URI group going and simply working within that framework
on that issue? Yes, it might feel like "packing the court", but if the spec
is in desperate need of some reality-based changes, and there is no *active*
group willing or able to even consider changes, then I don't see any issue
with it.

J.Ja

Henrik Nordstrom | 8 Jul 2008 00:22

RE: Why Microsoft's authoritative=true won't work and is a bad idea

On Mon, 2008-07-07 at 09:36 -0400, Justin James wrote:

> * Over the last few weeks, it has become clear that the URI specs need to
> change for certain aspects of browser behavior and HTML to make sense and/or
> work right.

What's wrong with the HTTP URL specification that makes HTML not make
sense or not work right?

I know some cases where browsers behave oddly wrt Internet URLs in
general (mainly http:// and ftp://), and in all cases so far they are
not following specifications and would behave quite well if they did.

Regards
Henrik

Justin James | 8 Jul 2008 00:56

RE: Why Microsoft's authoritative=true won't work and is a bad idea


> What's wrong with the HTTP URL specification that makes HTML not make
> sense or not work right?
> 
> I know some cases where browsers behave oddly wrt Internet URLs in
> general (mainly http:// and ftp://), and in all cases so far they are
> not following specifications and would behave quite well if they did..

Henrik -

The problem with the concept of HTML specifying its own URLs, from my viewpoint, is that developers need one
standard to follow, not 3 (URI, IRI, HTTP URL). All too often, once you get more than 2 competing
"standards", none of them is actually "standard", yet enough of them get enough traction that they never die. I
truly think that everyone would be better served if there were simply 1 "U|IR*" standard (it's really sad
when a regex is the best way to refer to a group of things...) that developers learn and understand. All of
the debate on this list over having a "U|IR*" standard added to the HTML spec, in order to compensate for
discrepancies between how U|IR*'s are commonly used in HTML and the way the specs read, is
further proof that the specs are broken.

A simple summary of my thoughts:
Any spec which is not properly followed by the majority of developers a majority of the time (where
pertinent, of course) is not a "standard" and is a broken spec. Sometimes, it is broken outside of the spec
itself, such as being sponsored or ratified by an unrecognized body. Other times it is broken within the
spec, like 800 page specs describing a floor sweeping process or something. Sometimes it is just a
marketing problem (like so many of the X* specs, like XHTML, XForms, XPath, and a zillion other X* specs
which few people use).

From what I can tell, the W3C has a very, very hard time producing specs which don't qualify as "broken" by
that measure, and HTML is heading that list.

Imagine if drive manufacturers followed the SATA spec as well as HTML authors followed the HTML spec. We'd
still be using pen and paper. So we need to be asking ourselves, "what's wrong with HTML that no one follows
it?" The answer is not *just* "browsers accept garbage". The answer also includes, "a spec so
long that only a select few people can understand it to the point where they can write valid HTML." In
other words, HTML is broken from the inside.

J.Ja

Henrik Nordstrom | 8 Jul 2008 01:49

RE: Why Microsoft's authoritative=true won't work and is a bad idea

On Mon, 2008-07-07 at 18:56 -0400, Justin James wrote:

> The problem with the concept of HTML specifying its own URLs, from my
> viewpoint, is that developers need one standard to follow, not 3 (URI,
> IRI, HTTP URL). 

But I am still not aware of the problem which triggered this. I linger
on the HTTP WG, not the HTML one, and am therefore unaware of what
problem the HTTP URL/URI/IRI specifications cause for HTML.

> Any spec which is not properly followed by the majority of developers
> a majority of the time (where pertinent, of course) is not a
> "standard" and is a broken spec.

There is a large grey zone there. But yes, if every implementer considers
what the spec says in some area to be nonsense and implements something
other than what the spec says, then the spec is most likely broken. But in
quite many cases it's just a poor choice of language making the intentions
of the specification not so obvious.

The spec is not broken if every implementer implements something else even
though what the spec says is correct, merely because the will to
interoperate with existing/older broken implementations is greater than
the will to keep a sane implementation. And especially not when there are
multiple such areas for historical reasons (of which HTTP has its noticeable
share, with 3.5 generations in less than a handful of years).

> Sometimes, it is broken outside of the spec itself, such as being
> sponsored or ratified by an unrecognized body.

Or implemented before the effects have been properly analyzed.

> Other times it is broken within the spec, like 800 page specs
> describing a floor sweeping process or something.

Yes.. and unfortunately many specifications are heading in that
direction, growing uncontrollably large with huge amounts of legacy
attached.

But quite often it's better to clearly define the original intents using
the original mechanisms and encourage compliance, than to reinvent the
same things again only because most implementers got it wrong the first
time.

> Sometimes it is just a marketing problem (like so many of the X*
> specs, like XHTML, XForms, XPath, and a zillion other X* specs which
> few people use).

Heh

> From what I can tell, the W3C has very, very hard time producing specs
> which don't qualify as "broken" by that measure, and HTML is heading
> that list.

Can't comment. HTML is not my main field, staying mostly in the area of
protocols and bits. But I do still feel a significant gap between HTML
(and related) specifications and user agent implementation, and quite
different gaps depending on implementation... But I still have faith
that things will improve over time if one has a little patience, and
converge towards the specifications instead of diverging even further
apart.

A really big problem is how to get rid of legacy from earlier
specifications whose design choices perhaps weren't the best. Once a
feature gets into a standard and implemented in more than one
implementation it's likely to stay for a considerable time even if it
turned out to be a very bad idea.

Things which are only implemented but not officially standardised, or
only in the standards but never implemented, are a whole lot easier to
change as you can always claim that one of the two is wrong/broken.

Same for when implementations misread specifications, resulting in
unintentional deviations from the specification, most often from not
understanding the specification or how it applies to what they do. Such
mistakes are often relatively easy to get corrected once the right people
are made aware of the issue and why it's important to follow the specs.

Regards
Henrik

Julian Reschke | 8 Jul 2008 09:27

the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea


Henrik Nordstrom wrote:
> On Mon, 2008-07-07 at 18:56 -0400, Justin James wrote:
> 
>> The problem with the concept of HTML specifying its own URLs, from my
>> viewpoint, is that developers need one standard to follow, not 3 (URI,
>> IRI, HTTP URL). 
> 
> But I am still not aware of the problem which triggered this. I linger
> on the HTTP WG, not the HTML one.. and is therefore unaware of what
> problem HTTP URL/URI/IRI specifications cause for HTML.
> ...

See thread at <http://lists.w3.org/Archives/Public/uri/2008Jun/0088.html>.

Key issues:

1) there are non-IRI identifiers in HTML in use (such as using space 
characters)

2) UAs do not use UTF-8 consistently when mapping non-ASCII characters 
in query parameters (they may use the document encoding instead)

3) there is no defined error handling in URI/IRI (I do not agree that 
this is a problem with URI/IRI)

1) and 2) can be solved by defining a transformation from HTML URL to 
IRI. HTML5 currently modifies the parsing rules of IRI instead, which I 
think is the wrong approach.
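
A minimal sketch of what such a transformation could look like for issue 1), assuming the only repairs needed are trimming surrounding whitespace and escaping characters that RFC 3986 does not allow. The function name and the exact safe-character set are illustrative assumptions; it always uses UTF-8, so it does not address issue 2), and it leaves any existing "%" untouched rather than validating escapes.

```python
from urllib.parse import quote

# Reserved characters plus "%", which are passed through unchanged.
# Unreserved characters (letters, digits, "-", ".", "_", "~") are never escaped by quote().
_PASS_THROUGH = "!#$&'()*+,/:;=?@[]%"


def attribute_to_uri_reference(raw: str) -> str:
    """Map a raw attribute value to a URI reference (hypothetical helper)."""
    trimmed = raw.strip(" \t\n\r\f")  # drop leading/trailing HTML whitespace
    # Escape everything else (spaces, non-ASCII as UTF-8 bytes, etc.).
    return quote(trimmed, safe=_PASS_THROUGH)
```

The point of the sketch is the layering: the raw attribute value is mapped once, and the result is then handled by the unmodified URI/IRI rules.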

The other issue that got a lot of discussion is whether the things used 
in HTML should be called "URL", when in reality they are something else.

BR, Julian

Roy T. Fielding | 9 Jul 2008 00:12

Re: the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea


On Jul 8, 2008, at 12:27 AM, Julian Reschke wrote:
> Henrik Nordstrom wrote:
>> On Mon, 2008-07-07 at 18:56 -0400, Justin James wrote:
>>> The problem with the concept of HTML specifying its own URLs,  
>>> from my
>>> viewpoint, is that developers need one standard to follow, not 3  
>>> (URI,
>>> IRI, HTTP URL).
>> But I am still not aware of the problem which triggered this. I  
>> linger
>> on the HTTP WG, not the HTML one.. and is therefore unaware of what
>> problem HTTP URL/URI/IRI specifications cause for HTML.
>> ...
>
> See thread at <http://lists.w3.org/Archives/Public/uri/2008Jun/ 
> 0088.html>.
>
> Key issues:
>
> 1) there are non-IRI identifiers in HTML in use (such as using  
> space characters)

No, there aren't.  The contents of the attribute value is CDATA, not  
an IRI.
How the parser converts the CDATA to a URI string (not IRI string)  
should
be defined by HTML.  The algorithm doesn't even need to be the same for
different element attributes (e.g., some attribute values consist of
space-separated references).  The value doesn't become identifier(s)
until after the conversion of CDATA string to valid URI is complete.

> 2) UAs do not use UTF-8 consistently when mapping non-ASCII  
> characters in query parameters (they may use the document encoding  
> instead)

That's because UTF-8 was not a desired mapping when HTML was defined.
That's why HTML maps query parameters to the document encoding.
I don't see why this is even being argued, since it certainly won't
be changing any time soon.  It makes far more sense to encourage the
use of UTF-8 document encodings.

> 3) there is no defined error handling in URI/IRI (I do not agree  
> that this is a problem with URI/IRI)

Of course not, just as there is no defined error handling for the name
on your birth certificate.  Error handling is always defined by context.

> 1) and 2) can be solved by defining a transformation from HTML URL  
> to IRI. HTML5 currently modifies the parsing rules of IRI instead,  
> which I think is the wrong approach.

The whole discussion is just brain dead.  All of the supposed issues
are about translating raw data into standardized form.  Instead
of simply defining the transform of raw attribute to standardized value,
which is entirely governed by HTML, the editor has chosen to treat the
raw value as some sort of magic final form, reuses the well-known URL
moniker in the most asinine way, and blames the other standards
(which he thankfully has no control over) for not supporting all of the
possible crappy raw data that could be input in an HTML attribute.

We know that just anything is not interoperable.  That's why URI is
limited to a fairly small set of characters and a simple syntax: to
require WWW identifiers to be in a form that is usable worldwide.
That's why HTTP identifiers are limited to URIs.  That's why this
whole discussion about creating new identifiers and new protocols in
HTML is a total waste of time -- the rest of the world does not want
it and will not allow it to be published as HTML5.  Pound the sand
all you like; the network standards will not change because they are
designed to support everyone's needs, not just the selfish desires of
a very small set of browser developers.

....Roy

Stefan Eissing | 8 Jul 2008 10:20

Re: the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea


On 08.07.2008 at 09:27, Julian Reschke wrote:
> The other issue that got a lot of discussion is whether the things  
> used in HTML should be called "URL", when in reality they are  
> something else.

Calling them HREFs (even though they also appear in other attributes)  
would give everyone the right context (HTML) and topic (URLs) without  
the confusion of redefining existing terms.

//Stefan
--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782

Justin James | 8 Jul 2008 15:55

RE: the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea


> > The other issue that got a lot of discussion is whether the things
> > used in HTML should be called "URL", when in reality they are
> > something else.
> 
> Calling them HREFs (even though they also appear in other attributes)
> would give everyone the right context (HTML) and topic (URLs) without
> the confusion of redefining existing terms.

Having nearly identical concepts is the root of this problem, not the nearly
identical names (although that does not help either). There is no need to
have a different spec for URI, IRI, and "HTTP URL", "URL reference", "HREF"
(or whatever this mystery spec is being called). There should be *one* spec
for resource locations. Period.

Besides, defining resource locators is outside the domain of HTML as far as
I am concerned.

J.Ja

Stefan Eissing | 8 Jul 2008 16:10

Re: the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea


On 08.07.2008 at 15:55, Justin James wrote:
> Having nearly identical concepts is the root of this problem, not  
> the nearly
> identical names (although that does not help either). There is no  
> need to
> have a different spec for URI, IRI, and "HTTP URL", "URL  
> reference", "HREF"
> (or whatever this mystery spec is being called). There should be  
> *one* spec
> for resource locations. Period.

The spec for URIs needs to define what URIs are and what not. It  
should *not* define how everything written on the side of a bus  
should be converted into a proper URI, nor what should happen to the  
bus if this does not work.

//Stefan

--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782

Anne van Kesteren | 8 Jul 2008 16:30

Re: the "HTML URL" issue


On Tue, 08 Jul 2008 16:10:34 +0200, Stefan Eissing  
<stefan.eissing@...> wrote:
> The spec for URIs needs to define what URIs are and what not. It should  
> *not* define how everything written on the side of a bus should be  
> converted into a proper URI, nor what should happen to the bus if this  
> does not work.

I don't agree with that, but I'm fine with HTML5 defining it instead.

-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Julian Reschke | 7 Jul 2008 16:05

URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea


Justin James wrote:
>> There is no "URI group" -- there's a list of people subscribed to the 
>> URI mailing list. That being said, I haven't seen *any* kind of 
>> consensus that RFC3986 should be changed. I've seen some discussion 
>> about whether RFC3987bis should expand on the "LEIRI" topic, and it 
>> seems Martin Dürst was considering that input.
> 
> It seems to me that the following facts are true:
> 
> * The URI group/mailing list is not actively working to update or change the
> URI specs.

There is no URI working group. URI is a stable specification (full IETF 
standard), and there's no consensus that anything needs to be done with 
it with respect to "HTML URL".

There are individuals (?) working on a revision of the IRI spec, 
including Martin Dürst. That revision may contain more information about 
what's currently called LEIRI (Legacy Extended IRI), but I don't think 
there's consensus about whether this is really a good idea. Head over to 
the URI mailing list and discuss it, if you're interested.

> * Over the last few weeks, it has become clear that the URI specs need to
> change for certain aspects of browser behavior and HTML to make sense and/or
> work right.

Nope.

What has become clear is that HTML needs to handle a superset of what 
IRI allows, and also needs to special case IRI->URI conversion for query 
components.

That can be done in a separate spec, defining a mapping from "HTTP URL" 
to IRI reference, and then letting the default URI/IRI rules apply.

It's not yet clear whether the same is needed outside HTML. Still 
waiting for examples.

> * The current URI/URL/"HTTP URL"/IRI breakout is artificial and can/should
> be fixed in the URI spec.

Not sure what you call "breakout", and what you want fixed.

> If what Julian says is correct (and I have no reason to doubt it), how do we
> get some traction on this issue? Who do we engage? Does it make sense,
> instead of trying to do the work of an active URI group within the HTML 5
> spec (the "HTTP URL" initiative) for a number of us to get involved with
> getting an *active* URI group going and simply working within that framework
> on that issue? Yes, it might feel like "packing the court", but if the spec
> is in desperate need of some reality-based changes, and there is no *active*
> group willing or able to even consider changes, then I don't see any issue
> with it.

I think HTML5 defining local rules for treatment of identifiers in HTML 
documents is fine. Optimally this is done by defining a mapping to IRI 
(which as far as I understand currently is not the case).

*If* more specifications need the same kind of mapping (and that's still 
an "if" for me), it would make sense to extract these mapping rules into 
a separate spec. Should these specs live in W3C land, it would probably 
make sense to make this a W3C activity.

BR, Julian

Martin Duerst | 8 Jul 2008 08:13

Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea


At 23:05 08/07/07, Julian Reschke wrote:
>
>Justin James wrote:
>>> There is no "URI group" -- there's a list of people subscribed to the URI mailing list. That being said, I
>>> haven't seen *any* kind of consensus that RFC3986 should be changed. I've seen some discussion about
>>> whether RFC3987bis should expand on the "LEIRI" topic, and it seems Martin Dürst was considering
>>> that input.
>> It seems to me that the following facts are true:
>> * The URI group/mailing list is not actively working to update or change the
>> URI specs.
>
>There is no URI working group. URI is a stable specification (full IETF standard), and there's no
>consensus that anything needs to be done with it with respect to "HTML URL".
>
>There are individuals (?) working on a revision of the IRI spec, including Martin Dürst. That
>revision may contain more information about what's currently called LEIRI (Legacy Extended IRI), but I
>don't think there's consensus about whether this is really a good idea.

I think that there is a consensus that LEIRIs are a bad idea.
The current(ly expired) draft actually says so. What there is
no consensus on is whether nevertheless, LEIRIs should be
described in the (future) IRI spec.

>Head over to the URI mailing list and discuss it, if you're interested.
>
>> * Over the last few weeks, it has become clear that the URI specs need to
>> change for certain aspects of browser behavior and HTML to make sense and/or
>> work right.
>
>Nope.

Agreed. "for browser behavior to make sense" is an attempt to justify
such browser behavior from an a-priori (good vs. bad) standpoint.
The current browser behavior, overall, makes sense, but there are
some details where it doesn't make sense. It would be better to write
"for some details of current browser behavior to fit some spec".

>What has become clear is that HTML needs to handle a superset of what IRI allows, and also needs to special
>case IRI->URI conversion for query components.

It may or may not need such a special case. The truth is that some years
ago (less than 10), virtually all existing non-ASCII path information
in (U/I)RIs had to be interpreted in the encoding of the containing page.
This has changed, because people started to pick up on the idea of IRIs,
more and more systems used UTF-8 on the server side, and at least some
people understood that using the encoding of the containing page
made it impossible to treat such identifiers free-standing. Also, a
fallback for paths in legacy encodings is still available (and was always
available): %-encoding.

As long as query URIs are interpreted based on the encoding of the
containing page, they will stay useless without that context. I.e.
they cannot (without further pain) be put into bookmark lists, they
cannot be sent in email, and so on. The only sensible way to make
this possible is to do the same as for the path part, namely use
UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
query parts may be less important than freestanding (U/I)RIs
without query parts, but still, they are often convenient.
However, they won't work if implemented the way HTML5 is currently
describing them. Also, same as for path parts, a fallback for query
parts in legacy encodings is still available (and was always
available): %-encoding.
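
The context dependence is easy to see in a small sketch. The helper name below is made up for illustration; it simply percent-encodes a query string after converting it to bytes in a given encoding, which approximates (but is not taken from) what a browser does when it uses the page encoding.

```python
from urllib.parse import quote


def query_to_uri(query: str, encoding: str) -> str:
    """Percent-encode a query string using the given character encoding."""
    # "=" and "&" are kept literal so the key/value structure survives.
    return quote(query.encode(encoding), safe="=&")


print(query_to_uri("q=Dürst", "utf-8"))       # q=D%C3%BCrst
print(query_to_uri("q=Dürst", "iso-8859-1"))  # q=D%FCrst
```

The same visible identifier yields two different URIs depending on the page it came from, which is exactly why it cannot be bookmarked or mailed without that context. An author who needs the legacy bytes can write the %-encoded form directly, and it then survives unchanged.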

In summary, there are cases where things changed for the better
in the last few years, and there are cases where some solutions
make the Web work better than others.

>That can be done in a separate spec, defining a mapping from "HTTP URL" to IRI reference, and then letting
>the default URI/IRI rules apply.

I'm very much confused by "HTTP URL". In case that's the term that HTML5
currently uses, it should use a different one, to avoid confusion.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@...     

Henry S. Thompson | 8 Jul 2008 16:56

Re: URI/IRI vs HTML-URL


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Duerst wrote:

> I think that there is a consensus that LEIRIs are a bad idea.  The
> current(ly expired) draft actually says so. What there is no
> consensus on is whether nevertheless, LEIRIs should be described in
> the (future) IRI spec.

The reason the W3C XML Core WG asked for the prose labelled "7.0
Legacy Extended IRIs" in the RFC3987bis draft [1] was to have a named
central place to reference for the range of XML specifications which
share a need to specify the conversion of XML system identifiers into
IRIs.  The IRI spec itself seemed to us to be the right place for
this, in terms of both technical and organizational appropriateness.

The WG is happy with the introduction to section 7, which makes clear
that LEIRIs are defined as a necessary bridge for specs which predate
IRIs, not as a mechanism for new specs or languages.

ht

[1] http://www.ietf.org/internet-drafts/draft-duerst-iri-bis-03.txt
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@...
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFIc4AAkjnJixAXWBoRAqBiAJ4yZJOufoNf/FTtp++vJDToRLOB1QCfeXZI
MYMcCTN1RhvIqwLMhFbX5CI=
=+XPm
-----END PGP SIGNATURE-----

Robert J Burns | 8 Jul 2008 13:48

Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea


On Jul 8, 2008, at 9:13 AM, Martin Duerst wrote:
>
> As long as query URIs are interpreted based on the encoding of the
> containing page, they will stay useless without that context. I.e.
> they cannot (without further pain) be put into bookmark lists, they
> cannot be sent in email, and so on. The only sensible way to make
> this possible is to do the same as for the path part, namely use
> UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
> query parts may be less important than freestanding (U/I)RIs
> without query parts, but still, they are often convenient.
> However, they won't work if implemented the way HTML5 is currently
> describing them. Also, same as for path parts, a fallback for query
> parts in legacy encodings is still available (and was always
> available): %-encoding.
>

Some implementations also break the %-encoding fallback: they first try
to reinterpret the %-encoded bytes in the current document encoding and
then translate them where appropriate. For example, if the percent
encoding represents a Unicode code point that also exists in the current
document encoding, the implementation uses the translated byte sequence
instead of the literal percent-encoded bytes. I'm not sure whether this
is an unfixable implementation error or whether we could use HTML5 to
get these implementations back on track, though.
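
To make the dependency on the page encoding concrete, here is a minimal
Python sketch. The sample value and the choice of ISO-8859-1 as the
"legacy" page encoding are assumptions picked purely for illustration;
it only shows that the same query value yields different %-encoded bytes
depending on which encoding is applied first.

```python
from urllib.parse import quote, unquote

query_value = "café"  # illustrative non-ASCII query value

# IRI->URI conversion as proposed for the path part: always UTF-8.
utf8_form = quote(query_value, encoding="utf-8")

# Legacy behaviour: the bytes depend on the encoding of the containing
# page (ISO-8859-1 is an assumption chosen for this example).
latin1_form = quote(query_value, encoding="iso-8859-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9

# A receiver that assumes UTF-8 recovers the first form but not the second.
print(unquote(utf8_form, encoding="utf-8"))
print(unquote(latin1_form, encoding="utf-8", errors="replace"))
```

Without knowing the encoding of the page the link came from, the second
form cannot be decoded reliably, which is why such a URI is not usable
free-standing.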

On Jul 8, 2008, at 11:20 AM, Stefan Eissing wrote:
>
> On 08.07.2008 at 09:27, Julian Reschke wrote:
>> The other issue that got a lot of discussion is whether the things  
>> used in HTML should be called "URL", when in reality they are  
>> something else.
>
> Calling them HREFs (even though they also appear in other  
> attributes) would give everyone the right context (HTML) and topic  
> (URLs) without the confusion of redefining existing terms.

From the relevant RFCs, the term "URL reference" already exists and is
the appropriate term for the value taken by the @href, @cite, @src and
other attributes ("URI reference" or "IRI reference" might also make
sense).

Take care,
Rob

Julian Reschke | 8 Jul 2008 10:03

Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea


Martin Duerst wrote:
> ...
> It may or may not need such a special case. The truth is that some years
> ago (less than 10), virtually all existing non-ASCII path information
> in (U/I)RIs had to be interpreted in the encoding of the containing page.
> This has changed, because people started to pick up on the idea of IRIs,
> more and more systems used UTF-8 on the server side, and at least some
> people understood that using the encoding of the containing page
> made it impossible to treat such identifiers free-standing. Also, a
> fallback for paths in legacy encodings is still available (and was always
> available): %-encoding.
> 
> As long as query URIs are interpreted based on the encoding of the
> containing page, they will stay useless without that context. I.e.
> they cannot (without further pain) be put into bookmark lists, they
> cannot be sent in email, and so on. The only sensible way to make
> this possible is to do the same as for the path part, namely use
> UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
> query parts may be less important than freestanding (U/I)RIs
> without query parts, but still, they are often convenient.
> However, they won't work if implemented the way HTML5 is currently
> describing them. Also, same as for path parts, a fallback for query
> parts in legacy encodings is still available (and was always
> available): %-encoding.
> 
> In summary, there are cases where things changed to the better
> in the last few years, and there are cases where some solutions
> make the Web work better than others.
> ...

Note that HTML5 documents that aren't encoded in UTF-8 (or UTF-16) and
which carry non-ASCII query parameters are currently non-conformant.
(I personally don't think it makes a big difference in practice, as HTML5
normatively defines their handling, so people will rely on that
anyway.)

>> That can be done in a separate spec, defining a mapping from "HTTP URL"
>> to IRI reference, and then letting the default URI/IRI rules apply.
> 
> I'm very much confused by "HTTP URL". In case that's the term that HTML5
> currently uses, it should use a different one, to avoid confusion.

Actually, I wanted to say "HTML URL" (URL as used in HTML5). HTML5
really uses just the term "URL".

BR, Julian

Henrik Nordstrom | 7 Jul 2008 15:19

Re: Why Microsoft's authoritative=true won't work and is a bad idea

[sorry for the missing red thread in this message, please read it in
full before responding]

On Mon, 2008-07-07 at 09:33 +0200, Julian Reschke wrote:

> The IETF HTTPbis working group has no mandate to do so. Thus it would 
> need to be rechartered, or a new WG would have to start.

And from a protocol specification and common sense point of view it would
be the wrong thing to officially allow sniffing even when the
content-type is clearly specified.

HTTP already specifies when sniffing is allowed or not. Major browser
vendors have over time and by intent chosen to ignore this part of the
specifications, and now their ignorance is coming back and biting them
and their users. Does this mean that specifications should change to
allow for these bugs to grow into a standard feature encouraging
ignorance?

It also seems that some noticeable players have lost faith, thinking
that things won't improve over time and things will stay as bad or worse
over time. An attitude I find a bit disturbing when working with
specifications as it means nothing can be changed or fixed other than
documenting how broken the current implementations are today, ending in
the rationale that "UTF7 content sniffing is implemented by some, so it
must be supported by everyone even if completely stupid and current
specifications we all agreed to implement years ago say you MUST NOT".

Yes, it does take some years of effort before any result in these areas at
all is seen, but it's certainly not impossible. I have been fighting
some of these wars, and some hours per year over some years nagging the
right people about something which bothers you can make a difference.

Yes, in the end there will be some old minor sites no longer working
well with newer browsers if sniffing is deprecated. But there will also
be existing major sites working better, being able to use content types
as intended instead of having to find ways around the browsers guessing
game.

HTTP intentionally does not specify how sniffing is to be implemented or
evaluated. That's a client implementation detail as far as HTTP is
concerned, an extra feature to be used when nothing else is known about
the content.
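
As a rough illustration of that rule (RFC 2616, section 7.2.1, permits
guessing the media type if and only if no Content-Type is given, and
suggests application/octet-stream when the type remains unknown), here
is a minimal Python sketch. The function names and the toy sniffer are
hypothetical and only stand in for whatever a real client does.

```python
from typing import Callable, Optional


def may_sniff(content_type: Optional[str]) -> bool:
    """Sniffing is permitted only when no Content-Type was supplied."""
    return content_type is None or content_type.strip() == ""


def effective_type(content_type: Optional[str], body: bytes,
                   sniff: Callable[[bytes], str]) -> str:
    """Return the media type a strictly conforming client would use."""
    if may_sniff(content_type):
        return sniff(body)
    # A declared type is authoritative; drop parameters such as charset.
    return content_type.split(";", 1)[0].strip().lower()


def toy_sniff(body: bytes) -> str:
    """Hypothetical stand-in for a client's private sniffing heuristics."""
    if body.lstrip().lower().startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"


print(effective_type("text/plain; charset=utf-8", b"<html>", toy_sniff))
print(effective_type(None, b"<html>", toy_sniff))
```

How `toy_sniff` works is deliberately left open here, matching the point
that the heuristics are a client implementation detail.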

> > How is that possible?
> 
> Using Microsoft's proposal or by using a separate header, for instance.

That could work, if it weren't for the Apache answer that if such an
extension becomes commonly available then Apache will set it by default.
By the reasoning applied earlier, things would then go back to square
-1, with even more bits on the wire that nobody wants to trust, because
server admins are by definition not to be trusted to make their servers
conform to requirements, or are in general completely ignorant of
whether their content breaks for large parts of their user base because
of this.

My concern about the proposal or added header is the reverse. Yes, it
will enable servers to tell next generation of clients to trust them,
but on the downside it will give more slack to the proponents who
think sniffing is the solution to how to deal with mislabelled content.
It's not a real solution to the problem, in fact it encourages that bug
to grow even bigger, just adding a workaround to be able to ask that bug
to go and hide for a while.

> Well, the biggest vendor just put a proposal on the table that would 
> make it possible to disable sniffing altogether.
> 
> Maybe it would make sense to consider it seriously, instead of 
> immediately stating "won't work"?

It will work, at least temporarily until there is again sufficient
amount of mislabelled content.

The only real long term solution I see to this problem is for major
browser vendors to gradually stop sniffing content even without this
extension. Add "server trust" levels similar to how cookie
black/whitelisting is managed, enabling the browsers to learn (by user
experience) which sites label their content properly and which don't. A
good start on this track is to add a visible indication when mislabelled
content is detected, enabling users to see when there is something wrong
without "destroying the web".
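
A minimal Python sketch of what such per-site trust tracking could look
like follows. Everything in it (the class name, the `min_samples`
threshold, the wording of the warning) is a hypothetical illustration of
the idea, not a description of any existing browser feature.

```python
from collections import defaultdict
from typing import Optional


class LabelTrust:
    """Hypothetical per-origin record of declared vs. detected types."""

    def __init__(self, min_samples: int = 3) -> None:
        self.min_samples = min_samples
        self.seen = defaultdict(int)
        self.mislabelled = defaultdict(int)

    def observe(self, origin: str, declared: str,
                detected: str) -> Optional[str]:
        """Record one response; return a visible warning on mismatch."""
        self.seen[origin] += 1
        if declared != detected:
            self.mislabelled[origin] += 1
            return f"{origin}: declared {declared}, looks like {detected}"
        return None

    def trusted(self, origin: str) -> bool:
        """Origins with enough clean responses would no longer be sniffed."""
        return (self.seen[origin] >= self.min_samples
                and self.mislabelled[origin] == 0)


ledger = LabelTrust(min_samples=2)
ledger.observe("good.example", "text/plain", "text/plain")
ledger.observe("good.example", "text/html", "text/html")
print(ledger.trusted("good.example"))
print(ledger.observe("bad.example", "text/plain", "text/html"))
```

The warning string is the "visible indication" part; the `trusted`
check is where a browser would decide to stop sniffing for that site.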

Regards
Henrik

David Morris | 7 Jul 2008 22:28

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Mon, 7 Jul 2008, Henrik Nordstrom wrote:

> [sorry for the missing red thread in this message, please read it in
...
> experience) which sites label their content properly and which don't. A
> good start on this track is to add a visible indication when mislabelled
> content is detected, enabling users to see when there is something wrong
> without "destroying the web".

Probably the most useful suggestion in this long thread. Unless it is easy
to identify incorrect labeling, it won't happen.

Many major web sites produce invalid javascript. Generally benign. Some
browsers offer an option to report an error when scripting errors are
detected. Of course, it is obvious few of the developers or QA folks
associated with web content and applications enable those error reports.
But when I was leading web technology based development teams, I used to
insist that everyone enable error reporting so as to improve our quality.

Having invalid content of other forms noted as well would enable better
overall quality.

I'd also like to see a standard mechanism by which UAs could automatically
report errors to origin servers with enough data to allow the server
administrator a hope of tracking down the problem. Probably with a
mechanism like Microsoft and Apple use to prompt users for permission to
send error data.

David Morris

Ian Hickson | 7 Jul 2008 23:40

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Mon, 7 Jul 2008, David Morris wrote:
> 
> Having invalid content of other forms noted as well would enable better 
> overall quality.

This isn't a particularly new idea; the main problem with it is that the 
indicator would basically be always showing. Some _conservative_ estimates 
put the number of pages with *syntax errors alone* in the 90% range. If 
you add things like Content-Type errors, other HTTP problems, attribute 
value errors, CSS errors, scripting errors, etc, the number is likely so 
close to 100% that frankly the user will just wonder why his browser is 
sad all the time. (It would be interesting to see if anyone could actually 
find a Web page with more than 10kb of total content that is not in any 
way affiliated with the person who found it and that had absolutely no 
errors of any kind. I'm not convinced there are any.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Julian Reschke | 8 Jul 2008 09:30

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Ian Hickson wrote:
> On Mon, 7 Jul 2008, David Morris wrote:
>> Having invalid content of other forms noted as well would enable better 
>> overall quality.
> 
> This isn't a particularly new idea; the main problem with it is that the 
> indicator would basically be always showing. Some _conservative_ estimates 
> put the number of pages with *syntax errors alone* in the 90% range. If 
> you add things like Content-Type errors, other HTTP problems, attribute 
> value errors, CSS errors, scripting errors, etc, the number is likely so 
> close to 100% that frankly the user will just wonder why his browser is 
> sad all the time. (It would be interesting to see if anyone could actually 
> find a Web page with more than 10kb of total content that is not in any 
> way affiliated with the person who found it and that had absolutely no 
> errors of any kind. I'm not convinced there are any.)

The important point here is that the *page developers* should be able to 
detect the errors.

BR, Julian

Dan Connolly | 8 Jul 2008 00:04

any error-free web pages at all? [was: Why Microsoft's authoritative=true won't work and is a bad idea]


On Mon, 2008-07-07 at 21:40 +0000, Ian Hickson wrote:
> [...] (It would be interesting to see if anyone could actually 
> find a Web page with more than 10kb of total content that is not in any 
> way affiliated with the person who found it and that had absolutely no 
> errors of any kind. I'm not convinced there are any.)

OK... at the risk of further exposing my ignorance... I'll bite.

Today's featured article in wikipedia is ~32k and I don't
see any errors; the W3C markup service
says "This Page Is Valid XHTML 1.0 Transitional!" and
the firefox error console is blank.
http://en.wikipedia.org/wiki/New_York_State_Route_32

Then I did a search for "XHTML CSS web design"
and about half the links get an OK from the markup
validator, e.g.
  http://www.oswd.org/
  http://veerle.duoh.com/

I've seen data that puts the number of valid web
pages at 1 in a million or so, but there are billions
of web pages out there, so I don't think error-free
pages are *that* hard to find.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Philip TAYLOR (Ret'd) | 8 Jul 2008 01:17

Re: any error-free web pages at all? [was: Why Microsoft's authoritative=true won't work and is a bad idea]


Dan Connolly wrote:

> OK... at the risk of further exposing my ignorance... I'll bite.
> 
> Today's featured article in wikipedia is ~32k and I don't
> see any errors; the W3C markup service
> says "This Page Is Valid XHTML 1.0 Transitional!" and
> the firefox error console is blank.
> http://en.wikipedia.org/wiki/New_York_State_Route_32

Yet the page is being served as text/html and should thus
be parsed as such (and not as XHTML).  The <head> region
should therefore terminate at the "/" of

	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

and the ">" of the same should cause an error ("character data not allowed
here" or somesuch).  Not the best example of an error-free page, IMHO, Dan ...

Philip TAYLOR

T.V Raman | 8 Jul 2008 19:02

Re: any error-free web pages at all? [was: Why Microsoft's authoritative=true won't work and is a bad idea]


Actually the algorithm you used to call that page an error takes
me back to the horror days of <b><i>..</b>  HTML where
eventually, correct, well-balanced markup often failed as an
error as the various browsers bent over backwards to "correct"
for errors.

Philip TAYLOR (Ret'd) writes:
 > 
 > 
 > 
 > Dan Connolly wrote:
 > 
 > > OK... at the risk of further exposing my ignorance... I'll bite.
 > > 
 > > Today's featured article in wikipedia is ~32k and I don't
 > > see any errors; the W3C markup service
 > > says "This Page Is Valid XHTML 1.0 Transitional!" and
 > > the firefox error console is blank.
 > > http://en.wikipedia.org/wiki/New_York_State_Route_32
 > 
 > Yet the page is being served as text/html and should thus
 > be parsed as such (and not as XHTML).  The <head> region
 > should therefore terminate at the "/" of
 > 
 > 	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 > 
 > and the ">" of the same should cause an error ("character data not allowed
 > here" or somesuch).  Not the best example of an error-free page, IMHO, Dan ...
 > 
 > Philip TAYLOR

-- 
Best Regards,
--raman

Title:  Research Scientist      
Email:  raman@...
WWW:    http://emacspeak.sf.net/raman/
Google: tv+raman 
GTalk:  raman@..., tv.raman.tv@...
PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc

Ian Hickson | 8 Jul 2008 01:03

Re: any error-free web pages at all?


Moved to www-archive.

On Mon, 7 Jul 2008, Dan Connolly wrote:
> 
> OK... at the risk of further exposing my ignorance... I'll bite.
> 
> Today's featured article in wikipedia is ~32k and I don't see any 
> errors; the W3C markup service says "This Page Is Valid XHTML 1.0 
> Transitional!" and the firefox error console is blank. 
> http://en.wikipedia.org/wiki/New_York_State_Route_32

Content-Type is text/html, but the content is XHTML.

The content has all manner of unregistered extension names and values used 
in HTTP headers, <link rel>, and <link type>, but that's not a huge deal.

The CSS contains the still-non-standard value overflow-x at least twice.

For the <script> elements, the type="" doesn't match the Content-Type.

The Content-Type for scripts is not a registered type.

The scripts use all manner of non-standard APIs, including APIs that even 
HTML5 doesn't (yet) define, like "navigator.userAgent".

The scripts assume that createElement() will return XHTML-namespace nodes, 
which isn't the way the current DOM specs define things.

There is usage of the unregistered javascript: scheme. Said usage violates 
the URI specs.

The injectSpinner() function misuses either alt="" or title="" or both.

ts_makeSortable() uses the wrong alt="" for the <img> as far as I can 
tell.

The IPv6 AAAA connectivity testing section in the Wikipedia-specific 
scripts generates a number of <img> elements whose src="" attributes point 
to non-image content (src="").

The page is chock-full of deprecated presentational markup.

The page has several images with alt="" attributes that don't conform to 
WCAG best practices; some just have the value missing altogether (marking 
content images as decorative); others have captions instead of 
replacement text, etc.

The page has multiple empty paragraphs that are not empty for the 
purposes of future scripts filling them in.

The semantics of the page are somewhat dubious (try looking at the page in 
Lynx or with Firefox with styles disabled -- there are iframes where you 
wouldn't expect, newlines where you wouldn't expect, and odd images next 
to big images, which I assume are supposed to be UI parts).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Dan Connolly | 8 Jul 2008 04:03

Re: any error-free web pages at all?


I expected you'd fill in a few gaps in my education. Thanks.

On Mon, 2008-07-07 at 23:03 +0000, Ian Hickson wrote:
> Moved to www-archive.
[...]

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Boris Zbarsky | 7 Jul 2008 20:21

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Henrik Nordstrom wrote:
> HTTP already specifies when sniffing is allowed or not. Major browser
> vendors have over time and by intent chosen to ignore this part of the
> specifications

Indeed, though as far as I can tell all of them except IE did this in the face 
of the #1 most-commonly-used HTTP server having a "feature" which essentially 
forced them to do it if they were to have a hope of being compatible with 
commonly-used websites. That's for text/plain.  Feed sniffing was more a matter 
of standalone feed readers ignoring Content-Type altogether and treating 
everything as a feed, which meant that there was zero incentive to label feeds 
as such.  When browsers came to implement a feed reader, the status quo was that 
a large fraction of feeds (easily double-digit percentages) was mislabeled.

> and now their ignorance is coming back and biting them
> and their users.

Excuse me?  "Ignorance"?  Everyone involved knew exactly what they were doing. 
There were just no good solutions; the small amount of sniffing added seemed 
like the least bad of a set of bad choices.

> Does this mean that specifications should change to
> allow for these bugs to grow into a standard feature encouraging
> ignorance?

The specifications, the UAs, and the servers should change such that:

1)  The UAs implement the specification.
2)  The servers implement the specification.
3)  The specification defines error-handling.
4)  The ensemble is a stable equilibrium (Ideally no one has incentive to
     change behavior).
5)  At no point in between here and there is a UA required to do something
     that would cause its users to stop using it (an obvious non-starter
     from a UA point of view).
6)  At no point in between here and there is a server required to do
     something that would cause administrators to stop using it (also an
     obvious non-starter, I would think).

I have no opinion as to what the final state should be, subject to the above 
constraints.

> It also seems that some noticeable players have lost faith, thinking
> that things won't improve over time and things will stay as bad or worse
> over time.

That's an empirical observation of the last 10 years, for what it's worth, not 
just a "think".  If you think the next 10 years will somehow be different, I'd 
love to know why.

-Boris

Henrik Nordstrom | 8 Jul 2008 00:46

Re: Why Microsoft's authoritative=true won't work and is a bad idea

On Mon, 2008-07-07 at 14:21 -0400, Boris Zbarsky wrote:

> Excuse me?  "Ignorance"?  Everyone involved knew exactly what they were doing. 
> There were just no good solutions; the small amount of sniffing added seemed 
> like the least bad of a set of bad choices.

I obviously disagree, but that's my opinion.

> The specifications, the UAs, and the servers should change such that:

I'll add

0) The specifications makes sense and unambious to implement

> 1)  The UAs implement the specification.
> 2)  The servers implement the specification.
> 3)  The specification defines error-handling.
> 4)  The ensemble is a stable equilibrium (Ideally no one has incentive to
>      change behavior).
> 5)  At no point in between here and there is a UA required to do something
>      that would cause its users to stop using it (an obvious non-starter
>      from a UA point of view).
> 6)  At no point in between here and there is a server required to do
>      something that would cause administrators to stop using it (also an
>      obvious non-starter, I would think).

Yes, with some reservations for 5 & 6. I do expect UAs and servers to be
willing to correct bugs, even if correcting those bugs would cause some
slight interoperability issues with other broken implementations at the
benefit of enabling correct interoperability with correct
implementations. Even if this results in some users shifting one way or
another.

> I have no opinion as to what the final state should be, subject to the above 
> constraints.

I have some opinions, based on

  - Simplicity.

  - No second-guessing or non-obvious side effects. If something is said
it is said and should be trusted to be correct.

  - Consistent. As few special cases as possible.

> That's an empirical observation of the last 10 years, for what it's worth, not 
> just a "think".  If you think the next 10 years will somehow be different, I'd 
> love to know why.

Been in this business for more than 10 years, and have not yet lost
faith in the ability to work for a more standardized and predictable
computing environment.

But if standardisation discussions in general tend to focus on "making
current broken implementations the standardized status and assuming all
implementations will be broken in the same way" instead of what makes
sense from a long term technical standard point of view then things will
certainly spin in the direction of worse.

Regards
Henrik

Boris Zbarsky | 8 Jul 2008 01:31

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Henrik Nordstrom wrote:
>> Excuse me?  "Ignorance"?  Everyone involved knew exactly what they were doing. 
>> There were just no good solutions; the small amount of sniffing added seemed 
>> like the least bad of a set of bad choices.
> 
> I obviously disagree, but that's my opinion.

You're entitled to it, and I should clarify that the above only applies to the 
cases in which I've been able to see the reasoning process that led to the 
decisions (namely Gecko and Webkit).

> 0) The specifications makes sense and unambious to implement

Assuming you meant "unambiguous", I agree.  If you meant something else, what 
did you mean?

>> 5)  At no point in between here and there is a UA required to do something
>>      that would cause its users to stop using it (an obvious non-starter
>>      from a UA point of view).
>> 6)  At no point in between here and there is a server required to do
>>      something that would cause administrators to stop using it (also an
>>      obvious non-starter, I would think).
> 
> Yes, with some reservations for 5 & 6. I do expect UAs and servers to be
> willing to correct bugs, even if correcting those bugs would cause some
> slight interoperability issues with other broken implementations at the
> benefit of enabling correct interoperability with correct
> implementations. Even if this results in some users shifting one way or
> another.

So you're asking people to shoot themselves in the foot for the common good. 
While some may be willing to, in general that's a tough sell if the shooting is 
significant enough.

Put another way, I can't think of a browser that would be willing to, say, 
sacrifice 5% of market share on this issue.  I suspect sacrificing a single user 
is acceptable.  The line is somewhere in between.

>   - Simplicity.

Which is nice if possible, of course.  Are we talking simplicity of 
specification, of implementation, or of deployment?

>   - No second-guessing or non-obvious side effects. If something is said
> it is said and should be trusted to be correct.

This is nice to have, yes.

>   - Consistent. As few special cases as possible.

Again, this is nice to have.

-Boris

Henrik Nordstrom | 8 Jul 2008 01:55

Re: Why Microsoft's authoritative=true won't work and is a bad idea

On Mon, 2008-07-07 at 19:31 -0400, Boris Zbarsky wrote:

> > 0) The specifications makes sense and unambious to implement
> 
> Assuming you meant "unambiguous", I agree.

I did. Always have a hard time spelling that word for some reason...

> So you're asking people to shoot themselves in the foot for the common good. 
> While some may be willing to, in general that's a tough sell if the shooting is 
> significant enough.

No I am not.

> Put another way, I can't think of a browser that would be willing to, say, 
> sacrifice 5% of market share on this issue.  I suspect sacrificing a single user 
> is acceptable.  The line is somewhere in between.

Yes. The rule is that you sacrifice some share to gain another part and
improve long term stability and reliability.

> >   - Simplicity.
> 
> Which is nice if possible, of course.  Are we talking simplicity of 
> specification, of implementation, or of deployment?

In this discussion at least specification and implementation. Usually
goes hand in hand.

Regards
Henrik

Ian Hickson | 7 Jul 2008 11:43

Re: Why Microsoft's authoritative=true won't work and is a bad idea


On Mon, 7 Jul 2008, Julian Reschke wrote:
> Ian Hickson wrote:
> > > I wouldn't consider trusting the server supplied content type an 
> > > "extreme."
> > 
> > Compared to the status quo, it is an extreme. (If you consider the 
> > possible implementation space as a multidimensional phase space, and 
> > consider the current implementations are points in phase space, they 
> > are all relatively close to each other, and close to HTML5. The 
> > position that involves no sniffing at all, whether that be 
> > HTTP-compliance or this new authoritative=true parameter, is far, far 
> > from the browsers.)
> 
> It's an "extreme" that is currently allowed in HTML5, remember?
>
> "If the user agent is configured to strictly obey Content-Type headers 
> for this resource, then jump to the last step in this set of steps." -- 
> <http://www.w3.org/html/wg/html5/#content-type0>

Yes, you asked for it. It's not an extreme that anyone is going to 
implement in widely distributed software. (The HTML5 spec similarly allows 
UAs to abort with a fatal error when they come across a parse error, but 
you won't see that implemented widely either.)

> It seems you are satisfied with the equilibrium HTML5 defines.

No, I hate the status quo. I wish Content-Type was followed to the letter. 
I just don't see that as a realistic possibility given how the Web works 
today.

> Many think that the information supplied by the server must be treated 
> as authoritative, thus want to reach a *different* equilibrium. That may 
> require more changes, but this doesn't mean it can't be done (despite 
> what you say).

Well, let me know when you succeed.

I worked hard (writing test suites, filing bugs, contacting engineers 
directly, etc) from about 1998 to about 2007 to get multiple vendors to 
change their ways and adopt a more strict implementation of Content-Type 
headers. In fact, I think it's probably not a stretch to say that I've 
done more than anyone else to push for strict Content-Type conformance.

Based on that experience, I don't believe that it is possible to reach 
that state and stay there. It simply isn't feasible given the economics of 
the Web. At least, that's what I've concluded.

Please, prove me wrong.

> > I would absolutely love it if the relevant groups would take this 
> > stuff and specify it themselves. However, the HTTP group has already 
> > indicated
> 
> With "it", what exactly do you mean? The thing these groups will agree 
> on, or the thing you prefer personally?

Anything that can lead to interoperability, the browsers doing what the 
specs say, and the specs being a complete description of what browsers 
have to implement would be good as far as I'm concerned. What exactly the 
specs say isn't my main concern.

Right now, HTTP is incomplete (it doesn't define, e.g., error handling), 
and doesn't match reality (e.g. browsers don't obey Content-Type like HTTP 
says they should).

Whether the browsers change to match HTTP or HTTP changes to match the 
browsers or they both change and meet at a middle ground, I don't mind.

However, if the groups agree on something that is either incomplete (i.e. 
doesn't define precise behaviour for every case, including error cases) or 
is something vendors won't implement, then that's no good.

The same applies to the URI specs.

(Both HTTP and URI mailing lists have had people request these issues be 
addressed; in both cases the requests were dismissed. I don't really care 
enough to fight this; in the meantime, HTML5 will fill in the holes. 
Incidentally, I don't really care about drawing partisan lines around what 
applies to HTML and what applies to browsers and what applies to Other 
Specs and Other Software and so on. As far as I'm concerned there's only 
one Web and we need one coherent set of rules for everything on the Web, 
whether it's HTML or not, and whether it's browsers or not.)

> > > With the current text in HTML5, there's not only no "good answer" 
> > > but no answer at all (except by telling users to configure their UAs 
> > > to respect mime types).
> > 
> > This problem has nothing to do with the spec, since the spec currently 
> > requires text/plain to be honoured in this case.
> > 
> > The "bad" answer is for Sam to stuff the top of his text/plain feeds 
> > with filler content that doesn't get sniffed, so that the sniffing 
> > heuristics in IE and Firefox get tricked into not seeing the feed 
> > content. (So, there _is_ an answer, it's just not a good one.)
> 
> That may be a workaround that works in this case, but I doubt it's 
> universally applicable.

Yes... it's not a "good" answer...
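
To show why the filler trick works at all, and why it is fragile, here
is a toy Python sketch. The 512-byte window and the markers it looks for
are assumptions made for illustration; real browsers use their own
limits and heuristics, which is exactly why the workaround is not
universally applicable.

```python
SNIFF_WINDOW = 512  # assumption: real sniffers use their own limits


def toy_feed_sniff(body: bytes) -> bool:
    """Hypothetical heuristic: look for feed markup near the start only."""
    head = body[:SNIFF_WINDOW].lower()
    return b"<rss" in head or b"<feed" in head


feed = (b'<?xml version="1.0"?>\n'
        b'<feed xmlns="http://www.w3.org/2005/Atom"></feed>')

# Filler that the heuristic ignores pushes the markup out of the window.
padded = b"." * SNIFF_WINDOW + b"\n" + feed

print(toy_feed_sniff(feed))    # True: would be treated as a feed
print(toy_feed_sniff(padded))  # False: text/plain label is left alone
```

A sniffer with a larger window, or one that skips the filler, defeats
the padding again.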

> > > Sam's use case could be made compatible by making the response 
> > > distinguishable from one sent by a misconfigured server.
> > 
> > How is that possible?
> 
> Using Microsoft's proposal or by using a separate header, for instance.

And how do you distinguish someone using the parameter or header correctly 
from one using it in a misconfigured case?

> Well, the biggest vendor just put a proposal on the table that would 
> make it possible to disable sniffing altogether.

Only when a parameter is present, and only if nobody ever misuses it. The 
parameter won't be always included, and it will almost certainly be 
misused. So it certainly won't be possible to disable sniffing altogether, 
and on the long term it almost certainly won't be possible to disable it 
altogether even when the parameter is included.
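
As a minimal sketch of the opt-out being discussed (an illustration only, not IE8's actual logic): the function names `parse_content_type` and `may_sniff` are hypothetical, the parameter parsing is deliberately naive, and treating a missing header as "sniffing allowed" is an assumption.

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params).

    Naive on purpose: it splits on ';' and does not handle
    semicolons inside quoted parameter values.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, val = part.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def may_sniff(content_type_header):
    """Return True if a client honouring the opt-out would still sniff.

    Assumption: sniffing is suppressed only when the response carries
    authoritative=true; every other response remains subject to sniffing.
    """
    if content_type_header is None:
        return True
    _, params = parse_content_type(content_type_header)
    return params.get("authoritative", "").lower() != "true"
```

Note that `may_sniff` has no way to tell a deliberate `authoritative=true` from one added blindly by a misconfigured server, which is the objection raised above.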

> Maybe it would make sense to consider it seriously, instead of 
> immediately stating "won't work"?

Please don't think that just because I can give a list of problems off the 
top of my head, I haven't seriously considered something. This idea was 
considered seriously _years_ ago. It's not a new idea.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Sam Ruby | 7 Jul 2008 03:31

Re: Why Microsoft's authoritative=true won't work and is a bad idea

Ian Hickson wrote on 07/06/2008 06:19:20 PM:
>
> > At this point it seems to me that you are simply not interested in that
> > case. Is this correct?
>
> I would love sniffing to go away altogether. I'm so interested in this
> particular use case that HTML5 in fact supports it _despite_ this
> requiring changes from the two biggest browsers. What more can I do?

Perhaps something like this:

Hey Erik!

I read your post, and see that you have posted to the W3C mailing list:

http://blogs.msdn.com/ie/archive/2008/07/02/ie8-security-part-v-comprehensive-protection.aspx
http://lists.w3.org/Archives/Public/public-html/2008Jul/0088.html

I was wondering if you could expand on the use case that motivated this change, and if you can comment on whether there is any flexibility to discuss alternatives such as a new response header, to be implemented some time post IE8 Beta-2.
 
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

- Sam Ruby

Boris Zbarsky | 5 Jul 2008 19:32

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Sam Ruby wrote:
> I then fetched the file using IE 7.0.5730.13, Firefox 3.0, Safari 3.1.2, 
> and Opera 9.50. IE and Firefox rendered the content as a feed

Note <https://bugzilla.mozilla.org/show_bug.cgi?id=394416> (that is, 
this behavior is specific to Firefox, not to Gecko, and is certainly a 
bug in my view).

-Boris

Frank Ellermann | 5 Jul 2008 11:49

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Julian Reschke wrote:

> Sounds inconsistent to me.

Just for fun, MIME parts can have a Content-Type:

| Content-Type: text/html; (note unusual charset) charset=cp437

2.7.1 step 3 returns nothing 

| Content-Type: text/html; charset= (take) "cp437" (that)

2.7.1 step 5 returns charset (take)
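
Both results can be reproduced with a naive extractor that knows nothing about MIME comments. This is only a sketch: `naive_charset` is a hypothetical helper, it is not the text of the algorithm in section 2.7.1, and its steps are not numbered the way the draft's are.

```python
def naive_charset(value):
    """Extract a charset from a Content-Type value, ignoring MIME comments."""
    # Find the first occurrence of "charset", wherever it appears.
    pos = value.lower().find("charset")
    if pos < 0:
        return None
    i = pos + len("charset")
    # Skip whitespace, then require "=".
    while i < len(value) and value[i].isspace():
        i += 1
    if i >= len(value) or value[i] != "=":
        return None
    i += 1
    # Skip whitespace after "=".
    while i < len(value) and value[i].isspace():
        i += 1
    if i >= len(value):
        return None
    # Quoted value: take everything up to the closing quote.
    if value[i] == '"':
        end = value.find('"', i + 1)
        return value[i + 1:end] if end >= 0 else None
    # Unquoted value: take everything up to whitespace or ";".
    j = i
    while j < len(value) and not value[j].isspace() and value[j] != ";":
        j += 1
    return value[i:j] or None
```

With the first header above, the first "charset" found is the one inside the comment, and it is followed by ")" rather than "=", so nothing is returned. With the second, the comment text "(take)" is read as the unquoted value.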

 Frank

Julian Reschke | 8 Jul 2008 12:03

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Frank Ellermann wrote:
> Julian Reschke wrote:
>  
>> Sounds inconsistent to me.
> 
> Just for fun, MIME parts can have a Content-Type:
> 
> | Content-Type: text/html; (note unusual charset) charset=cp437

You mean "can have a comment"? Looking at RFC2616, Sections 14.17 and 
3.7, that doesn't seem to be correct...

> ...

BR, Julian

Frank Ellermann | 8 Jul 2008 14:58

Re: Why Microsoft's authoritative=true won't work and is a bad idea


Julian Reschke wrote:

>> | Content-Type: text/html; (note unusual charset) charset=cp437

> You mean "can have a comment"? Looking at RFC2616, Sections 14.17
> and 3.7, that doesn't seem to be correct...

...arrghh, those HTTP geeks.  MIME parts are something in e-mail
or NetNews, often rendered by a Web browser if it's text/html ;-)

 Frank

William A. Rowe, Jr. | 3 Jul 2008 21:22

Re: Microsoft's "I mean it" content-type parameter


Dave Singer wrote:
> 
> At 18:17  +0200 3/07/08, Julian Reschke wrote:
>>
>> The way to signal "unknown" is not to send a Content-Type header at 
>> all. As far as I understand, this is what happens with httpd trunk 
>> when you set the DefaultType to "none".

Agreed.

> or, it seems, "application/octet-stream".  From HTTP 1.1:
> 
> Any HTTP/1.1 message containing an entity-body SHOULD include a 
> Content-Type header field defining the media type of that body. If and 
> only if the media type is not given by a Content-Type field, the 
> recipient MAY attempt to guess the media type via inspection of its 
> content and/or the name extension(s) of the URI used to identify the 
> resource. If the media type remains unknown, the recipient SHOULD treat 
> it as type "application/octet-stream".

Your interpretation makes no sense... it does not say that sniffing
application/octet-stream is permitted; it says that the recipient is to
treat it as opaque data.
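
Read literally, the quoted HTTP/1.1 paragraph amounts to something like the following sketch. The function name `effective_media_type` and the `guess` callback are hypothetical; the callback stands in for whatever content or extension inspection a recipient might do.

```python
def effective_media_type(content_type, guess=None):
    """Apply the quoted rule to one response.

    content_type: value of the Content-Type header, or None if absent.
    guess: optional callable returning a guessed media type or None.
    """
    if content_type is not None:
        # Header present: the quoted text leaves no room for guessing,
        # and that includes application/octet-stream.
        return content_type.split(";", 1)[0].strip().lower()
    if guess is not None:
        guessed = guess()
        if guessed:
            return guessed
    # Still unknown: treat as opaque data.
    return "application/octet-stream"
```

Under this reading, an explicit application/octet-stream is returned unchanged, and the guessing branch is reached only when the header is missing entirely.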

> It does seem as if sniffing when there is a content-type header is 
> flat-out forbidden.  I.e. the presence of content-type was supposed to 
> serve *exactly* what the "I mean it" extension is doing...
> 
> Next up:  a server that always adds the "I mean it" attribute, even when 
> it doesn't, and the subsequent invention of the "No, really, come on, 
> you have to believe me, scout's honor, I really truly mean it" extension.

ROFL :)

Julian Reschke | 3 Jul 2008 18:52

Re: Microsoft's "I mean it" content-type parameter


Dave Singer wrote:
> ...
> Next up:  a server that always adds the "I mean it" attribute, even when 
> it doesn't, and the subsequent invention of the "No, really, come on, 
> you have to believe me, scout's honor, I really truly mean it" extension.
> ...

Of course. That's why we usually do not define specific error recovery 
in HTTPbis. Mandating a very specific error recovery (1) may not be the 
right thing for all use cases, (2) blurs the boundary between valid and 
invalid (if the behavior for invalid is mandatory, what's the point in 
producing valid messages?), and (3), as you said, means that in the end 
you'll have to define error recovery for the error recovery...

BR, Julian

Frank Ellermann | 3 Jul 2008 11:35

Re: Microsoft's "I mean it" content-type parameter


Julian Reschke wrote:

> I can't even reproduce that *specific* case with IE6 and IE7,
> see <http://hixie.ch/tests/adhoc/http/content-type/013.html>.
> Not sure what I'm missing here...

...my IE6 says that you missed 014.html on the same server :-)

 Frank

Julian Reschke | 3 Jul 2008 11:42

Re: Microsoft's "I mean it" content-type parameter


Frank Ellermann wrote:
> Julian Reschke wrote:
>  
>> I can't even reproduce that *specific* case with IE6 and IE7,
>> see <http://hixie.ch/tests/adhoc/http/content-type/013.html>.
>> Not sure what I'm missing here...
> 
> ...my IE6 says that you missed 014.html on the same server :-)

Indeed. But it does display as plain text in IE7.

BR, Julian

Julian Reschke | 2 Jul 2008 23:31

Re: Microsoft's "I mean it" content-type parameter


Robert Collins wrote:
> ...
> If they assume that fixing all the bust clients they have been shipping
> for years is infeasible, then I think they would have concluded its the
> right way.
> 
> I think its bogus - it requires every web site author in existence to
> change their site to fix a defect in MSIE. Thats got to be harder to
> deploy than just a hotfix to MSIE to not sniff at all. 'Sorry, bad idea,
> fixed in hotfix #12345.'
> ...

Well, not only MS is guilty of sniffing (although they may have started 
it), and the HTML5 spec has lots of details on how to do it 
(<http://www.w3.org/html/wg/html5/#content-type-sniffing>), although it 
at least allows UAs not to sniff (*).

BR, Julian

(*) Need to check: is this the case throughout the spec, or are there 
exceptions?

