Maurits van Rees | 13 Nov 23:17

basesyndication: xhtml content

Hi again,

The browser/atom.xml.pt file of basesyndication has this comment:

"This body below should really be xhtml instead of semi-encoded
possibly unescaped strange stuff."

According to
http://www.atomenabled.org/developers/syndication/#text
you can say the type is xhtml and wrap a div around the text that you
want in there.  Current content tag of an entry:

     <content type="html" xml:base="" xml:lang="en-US" xml:space="preserve"
              tal:attributes="xml:base feed/getBaseURL">
         <tal:block tal:replace="structure string:&lt;![CDATA["/>
         <tal:block tal:replace="structure feedentry/getBody"/>
         <tal:block tal:replace="structure string:]]&gt;"/>
      </content>

The changes would make the content tag look like this:

      <content type="xhtml" xml:base="" xml:lang="en-US" xml:space="preserve"
               tal:attributes="xml:base feed/getBaseURL">
        <div xmlns="http://www.w3.org/1999/xhtml">
          <tal:block tal:replace="structure feedentry/getBody"/>
        </div>
      </content>

I'm not sure about that xml base, lang and space, but that's how the
page template currently looks.

Tim Hicks | 14 Nov 00:24

Re: basesyndication: xhtml content

Maurits van Rees wrote:
> Hi again,
> 
> The browser/atom.xml.pt file of basesyndication has this comment:
> 
> "This body below should really be xhtml instead of semi-encoded
> possibly unescaped strange stuff."
> 
> According to
> http://www.atomenabled.org/developers/syndication/#text
> you can say the type is xhtml and wrap a div around the text that you
> want in there.  Current content tag of an entry:
> 
>      <content type="html" xml:base="" xml:lang="en-US" xml:space="preserve"
>               tal:attributes="xml:base feed/getBaseURL">
>          <tal:block tal:replace="structure string:&lt;![CDATA["/>
>          <tal:block tal:replace="structure feedentry/getBody"/>
>          <tal:block tal:replace="structure string:]]&gt;"/>
>       </content>
> 
> The changes would make the content tag look like this:
> 
>       <content type="xhtml" xml:base="" xml:lang="en-US" xml:space="preserve"
>                tal:attributes="xml:base feed/getBaseURL">
>         <div xmlns="http://www.w3.org/1999/xhtml">
>           <tal:block tal:replace="structure feedentry/getBody"/>
>         </div>
>       </content>
> 
> I'm not sure about that xml base, lang and space, but that's how the
(Continue reading)

Reinout van Rees | 14 Nov 14:41

Re: basesyndication: xhtml content

Tim Hicks wrote:
> Maurits van Rees wrote:

>> The changes would make the content tag look like this:
>>
>>       <content type="xhtml" xml:base="" xml:lang="en-US" xml:space="preserve"
>>                tal:attributes="xml:base feed/getBaseURL">
>>         <div xmlns="http://www.w3.org/1999/xhtml">
>>           <tal:block tal:replace="structure feedentry/getBody"/>
>>         </div>
>>       </content>
>>
>> I'm not sure about that xml base, lang and space, but that's how the
>> page template currently looks.
>>
>> Well, this works for me.  Thoughts?
> 
> That looks *much* saner to me :), so +1 if there are no compatibility 
> issues from this new structure.

It used to have the "real" html in there, but I reverted it back to a
CDATA section because

* We're not guaranteed to get good xhtml out of a blog entry. I often
got screams about unknown entities like &uuml; or unescaped &'s.

* The feed often didn't display well in bloglines: missing spaces around
<a> and <b> tags and so on.
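
The first point can be reproduced with any XML parser. A minimal sketch,
assuming Python's standard xml.etree.ElementTree as the parser; the helper
name is_well_formed_xhtml_fragment is made up for illustration and is not
part of basesyndication:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_xhtml_fragment(body):
    """Return True if body parses as XML once wrapped in an XHTML div.

    Hypothetical helper: it only checks well-formedness, not validity
    against the XHTML DTD.
    """
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, body)
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True


samples = [
    "<p>Plain ASCII paragraph</p>",
    "<p>M&uuml;nchen</p>",       # HTML entity that plain XML does not define
    "<p>Fish & chips</p>",       # bare ampersand
    "<p>Fish &amp; chips</p>",   # properly escaped ampersand
]
results = [is_well_formed_xhtml_fragment(s) for s in samples]
```

Only the first and last samples parse. A strict feed reader would reject
the other two in the same way if they were emitted inside a type="xhtml"
content element.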

So: good idea in theory, but I got bitten by the practice. Might be
(Continue reading)

Tim Hicks | 14 Nov 15:10

Re: basesyndication: xhtml content

Reinout van Rees wrote:
> Tim Hicks wrote:
>> Maurits van Rees wrote:
> 
>>> The changes would make the content tag look like this:
>>>
>>>       <content type="xhtml" xml:base="" xml:lang="en-US" xml:space="preserve"
>>>                tal:attributes="xml:base feed/getBaseURL">
>>>         <div xmlns="http://www.w3.org/1999/xhtml">
>>>           <tal:block tal:replace="structure feedentry/getBody"/>
>>>         </div>
>>>       </content>
>>>
>>> I'm not sure about that xml base, lang and space, but that's how the
>>> page template currently looks.
>>>
>>> Well, this works for me.  Thoughts?
>> That looks *much* saner to me :), so +1 if there are no compatibility 
>> issues from this new structure.
> 
> It used to have the "real" html in there, but I reverted it back to a
> CDATA section because
> 
> * We're not guaranteed to get good xhtml out of a blog entry. I often
> got screams about unknown entities like &uuml; or unescaped &'s.
> 
> * The feed often didn't display well in bloglines: missing spaces around
> <a> and <b> tags and so on.
> 
> So: good idea in theory, but I got bitten by the practice. Might be
(Continue reading)

Reinout van Rees | 14 Nov 23:29

Re: basesyndication: xhtml content

Tim Hicks wrote:

> Ah, good points.
> 
> It now occurs to me that it may not be correct to *assume* that a call 
> to IFeedEntry.getBody() will return (x)html.  What if someone wants to 
> syndicate some plain text, say?
> 
> Perhaps that means that IFeed should be extended with a getBodyType 
> method, which could return a tuple like (major_type, minor_type).  Does 
> that make any sense?

Not sure if it is handier, but perhaps grab a getXhtml() first. If
that's empty, do a getBody. If it is xhtml: put it in directly,
otherwise put it in a CDATA.

Difference: we don't have to deal with a lot of major/minor types of
input, but just with the two that we support. On the other hand, keeping
the option open is also handy.
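
A minimal sketch of that fallback in Python, under stated assumptions: the
function name atom_content and the stand-in entry objects are hypothetical,
and only getXhtml() and getBody() come from this thread. Splitting a literal
"]]>" inside the body is an extra precaution added here, not something
discussed above.

```python
from types import SimpleNamespace

XHTML_DIV = '<div xmlns="http://www.w3.org/1999/xhtml">%s</div>'


def atom_content(entry):
    """Return (type, payload) for an Atom <content> element.

    Prefer getXhtml(); if it returns nothing, fall back to getBody()
    wrapped in a CDATA section and declared as type "html".
    """
    xhtml = entry.getXhtml()
    if xhtml:
        return ("xhtml", XHTML_DIV % xhtml)
    body = entry.getBody()
    # A CDATA section ends at the first "]]>", so split any literal
    # occurrence across two adjacent sections.
    safe_body = body.replace("]]>", "]]]]><![CDATA[>")
    return ("html", "<![CDATA[%s]]>" % safe_body)


# Stand-in entries for illustration only.
html_entry = SimpleNamespace(
    getXhtml=lambda: "<p>Hello</p>",
    getBody=lambda: "<p>Hello</p>",
)
text_entry = SimpleNamespace(
    getXhtml=lambda: None,
    getBody=lambda: "Fish & chips",
)

html_result = atom_content(html_entry)
text_result = atom_content(text_entry)
```

With this shape the template only has to deal with the two cases we
support, and the decision about what counts as good xhtml stays with the
product that implements getXhtml().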

Brainstorming :-)

Reinout

-- 
Reinout van Rees                       r.van.rees @ zestsoftware.nl
http://vanrees.org/weblog/                  http://zestsoftware.nl/
"Military engineers build missiles. Civil engineers build targets."

Maurits van Rees | 18 Nov 01:03

Re: basesyndication: xhtml content

Reinout van Rees, on 2006-11-14:
> Not sure if it is handier, but perhaps grab a getXhtml() first. If
> that's empty, do a getBody. If it is xhtml: put it in directly,
> otherwise put it in a CDATA.
>
> Difference: we don't have to deal with a lot of major/minor types of
> input, but just with the two that we support. On the other hand, keeping
> the option open is also handy.
>
> Brainstorming :-)

I'd like to keep some momentum going here.  I'm trying to get a new
version for my weblog off the ground. :-)

But I'm not sure how to proceed here.  I don't fully grasp the
relationship between fatsyndication, basesyndication and Quills or
other weblog products and which parts of which products need to be
changed.

Well, let's think this through.  I think the implementation of your
idea would be as follows.  Correct me if I'm wrong.

- In basesyndication, add a function getXhtml() to IFeedEntry.

- In Quills (as example for other weblog products) add a function
  getXhtml() in syndication.py.  This returns the body text if the
  mimetype is text/html and returns None otherwise.

- In fatsyndication do nothing, though possibly adapters/feedentry.py
  could get a getXhtml() function as well and either return the same
(Continue reading)
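
The Quills step from the list above could look roughly like this. It is a
sketch only: the class name WeblogEntryFeedEntry and its constructor are
hypothetical stand-ins, not the real adapter in Quills' syndication.py, and
the mimetype is passed in directly instead of being read from a content
object.

```python
class WeblogEntryFeedEntry:
    """Hypothetical stand-in for a weblog product's feed-entry adapter."""

    def __init__(self, body, mimetype):
        self.body = body
        self.mimetype = mimetype

    def getBody(self):
        """Return the raw body, whatever its mimetype."""
        return self.body

    def getXhtml(self):
        """Return the body only if it was stored as text/html, else None."""
        if self.mimetype == "text/html":
            return self.body
        return None


kupu_entry = WeblogEntryFeedEntry("<p>Written in kupu</p>", "text/html")
rest_entry = WeblogEntryFeedEntry("Written in *reST*", "text/x-rst")

kupu_xhtml = kupu_entry.getXhtml()
rest_xhtml = rest_entry.getXhtml()
```

Note that a text/html mimetype alone does not prove the body is well-formed
xhtml; the sketch simply follows the rule as stated in the list.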

Reinout van Rees | 19 Nov 01:22

Re: basesyndication: xhtml content

Maurits van Rees wrote:

> I implemented that and that seems to work fine.  I added an (x)html
> weblog entry with kupu and a restructured text one.  The atom feed
> correctly puts the first in a div and the second one in a CDATA.
> rss2email didn't complain about this feed.
> 
> Would this be safe to commit?

Sounds OK. It *does* modify the interface, but it seems like a quite
natural change. It puts the responsibility for ensuring good xhtml where
it belongs.

Safe to commit? The number one thing is that it really really needs to
check whether it can safely grab getXhtml(). What I mean: it shouldn't
fail if that method isn't there. Backward compatibility.
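
A minimal sketch of such a check in Python, assuming the lookup happens in
Python code and not in the page template itself. The helper name
get_xhtml_or_none and the two entry classes are hypothetical:

```python
def get_xhtml_or_none(entry):
    """Call entry.getXhtml() if it exists; otherwise return None.

    Entries written against the older interface have no getXhtml(),
    so a missing attribute must not raise.
    """
    getter = getattr(entry, "getXhtml", None)
    if not callable(getter):
        return None
    return getter()


class OldStyleEntry:
    """Implements only the older interface (no getXhtml)."""

    def getBody(self):
        return "<p>legacy body</p>"


class NewStyleEntry(OldStyleEntry):
    """Also provides getXhtml()."""

    def getXhtml(self):
        return "<p>clean xhtml</p>"


old_result = get_xhtml_or_none(OldStyleEntry())
new_result = get_xhtml_or_none(NewStyleEntry())
```

An old-style entry then falls through to getBody() and the CDATA section,
exactly as before the change.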

Reinout

-- 
Reinout van Rees                       r.van.rees @ zestsoftware.nl
http://vanrees.org/weblog/                  http://zestsoftware.nl/
"Military engineers build missiles. Civil engineers build targets."

Maurits van Rees | 20 Nov 20:19

Re: basesyndication: xhtml content

Reinout van Rees, on 2006-11-19:
> Sounds OK. It *does* modify the interface, but it seems like a quite
> natural change. It puts the responsibility for ensuring good xhtml where
> it belongs.
>
> Safe to commit? The number one thing is that it really really needs to
> check whether it can safely grab getXhtml(). What I mean: it shouldn't
> fail if that method isn't there. Backward compatibility.

AFAICT (As far as I can test) this goes fine.

I went ahead and committed it.  See revision 33941 of basesyndication
and revision 33943 of Quills.

If this breaks anything, shout.

-- 
Maurits van Rees | http://maurits.vanrees.org/ [NL]
            Work | http://zestsoftware.nl/
"Do not worry about your difficulties in computers,
 I can assure you mine are still greater."

Tim Hicks | 18 Nov 17:13

Re: basesyndication: xhtml content

Maurits van Rees wrote:
> Reinout van Rees, on 2006-11-14:
>> Not sure if it is handier, but perhaps grab a getXhtml() first. If
>> that's empty, do a getBody. If it is xhtml: put it in directly,
>> otherwise put it in a CDATA.
>>
>> Difference: we don't have to deal with a lot of major/minor types of
>> input, but just with the two that we support. On the other hand, keeping
>> the option open is also handy.
>>
>> Brainstorming :-)
> 
> I'd like to keep some momentum going here.  I'm trying to get a new
> version for my weblog off the ground. :-)
> 
> But I'm not sure how to proceed here.  I don't fully grasp the
> relationship between fatsyndication, basesyndication and Quills or
> other weblog products and which parts of which products need to be
> changed.
> 
> Well, let's think this through.  I think the implementation of your
> idea would be as follows.  Correct me if I'm wrong.
> 
> - In basesyndication, add a function getXhtml() to IFeedEntry.

Yup.

> - In Quills (as example for other weblog products) add a function
>   getXhtml() in syndication.py.  This returns the body text if the
>   mimetype is text/html and returns None otherwise.

Maurits van Rees | 19 Nov 00:25

Re: basesyndication: xhtml content

Tim Hicks, on 2006-11-18:
>> Would this be safe to commit?
>
> I'm not yet convinced that adding a getXhtml (or perhaps getXHTML) 
> method is the right way to go.  Somehow it feels like a special-case 
> pollution of the (conceptual) IFeedEntry interface.
>
> Can you remind me what problem we are trying to solve?  If it's just 
> simplification of the template, then this conditioning will actually 
> make things worse.

Well, there is this comment in the atom feed about the CDATA section
that says:

  "This body below should really be xhtml instead of semi-encoded
  possibly unescaped strange stuff."

That doesn't strike me as very encouraging.

http://www.w3schools.com/xml/xml_cdata.asp says about CDATA:

  Everything inside a CDATA section is ignored by the parser.

  If your text contains a lot of "<" or "&" characters - as program
  code often does - the XML element can be defined as a CDATA section.

This makes me think that CDATA can be handy if you're not sure what
kind of data will end up in your template, but should be avoided if
possible.
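
To make that concrete, here is a small sketch using Python's standard
xml.etree.ElementTree (my choice of parser for illustration, nothing to do
with the page template). More precisely than "ignored": the parser treats
everything in the CDATA section as plain character data and does not
interpret it as markup.

```python
import xml.etree.ElementTree as ET

# The CDATA section holds markup-like text, a bare ampersand and an
# HTML entity that plain XML does not define.
doc = '<content type="html"><![CDATA[<b>Fish & chips</b> &uuml;]]></content>'

element = ET.fromstring(doc)

# The parser hands the CDATA payload back as text, untouched.
text = element.text
# No <b> child element was created, since the tag was never parsed.
children = list(element)
```

So the document is well-formed XML no matter what the body contains, but
the feed reader receives a string it still has to interpret as html
itself.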


