Alexios Giotis | 8 Feb 16:30

Producing & archiving FOP intermediate format

Hi,

I am already storing several million files per month containing FOP intermediate format (FOP_IF), using a
private patched branch based on FOP 1.0. The current use case is performance. If a document is found in the
store containing FOP_IF, then use it and create the final output format (typically PDF). If not, then
start from XML. The retention period of the FOP_IF files is 6 months to 1 year. The XML files are kept for at
least 10 years. In my tests, 85% of the time is spent on layout and the rest on rendering. This has worked
well, especially for big documents (with thousands of pages). I have no worries about the FOP_IF format
and how it will evolve, as I know that the files will be gone after 6 months, or one year at most. And for sure, I can keep an
older version for that long.
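
The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not FOP's Java API: `layout_to_if`, `render_from_if` and the in-memory `store` are hypothetical stand-ins for the layout step, the rendering step and the FOP_IF archive, and writing the FOP_IF back to the store on a miss is an assumption.

```python
from typing import Callable, Dict


def produce_output(
    doc_id: str,
    xml_source: str,
    store: Dict[str, str],
    layout_to_if: Callable[[str], str],
    render_from_if: Callable[[str], bytes],
) -> bytes:
    """Render a document, reusing stored intermediate format when available."""
    fop_if = store.get(doc_id)
    if fop_if is None:
        # Store miss: run the expensive layout step from XML.
        fop_if = layout_to_if(xml_source)
        # Assumption: the result is archived for later requests.
        store[doc_id] = fop_if
    # Rendering to the final format runs in both cases.
    return render_from_if(fop_if)


# Hypothetical stand-ins so the sketch runs on its own.
layout_calls = []


def fake_layout(xml: str) -> str:
    layout_calls.append(xml)
    return f"<if>{xml}</if>"


def fake_render(fop_if: str) -> bytes:
    return fop_if.encode("utf-8")


store: Dict[str, str] = {}
first = produce_output("doc-1", "<xml/>", store, fake_layout, fake_render)
second = produce_output("doc-1", "<xml/>", store, fake_layout, fake_render)
```

If the measured split holds (85% layout, the rest rendering), a store hit should cost roughly 15% of a full run, since only the rendering step executes.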

I am now planning to use FOP in different ways and use cases such as:

1. Bypassing FOP's layout engine and its quirks in XSL-FO input, CPU time and memory. This means directly
creating FOP_IF. With the same effort, I could use PDFBox (or iText 2.x) to create PDF files. But having
FOP_IF, I can also produce AFP, PS and PCL, which I need, and I know of no other open source renderers for them.

2. Longer storage of FOP_IF. Compared to storing XML, it's faster, fewer components are involved until the
final output and it allows for easier versioning. For example, given the same XSL-FO input, FOP 2.0 will
not produce content *identical* to that of FOP 1.0 (I hope somebody will disagree with this :) Compared to
storing PDF, the required space is much less, which matters as I have big volumes on expensive EMC storage. Second, I
retain the flexibility of selecting which parts to render. Not all users have permission to see all parts of
the documents. Also, some users see masked values (e.g. stars in place of a card number).
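
As an illustration of the masking idea, a per-user step could replace sensitive values before rendering. The sketch below is only an assumption of how that might look: it is written in Python, works on plain strings and not on real FOP_IF, and `mask_value` and `can_see_full` are hypothetical names.

```python
def mask_value(value: str, can_see_full: bool, mask_char: str = "*") -> str:
    """Return the value unchanged for permitted users, else stars of equal length."""
    if can_see_full:
        return value
    # Whitespace is kept so the masked text retains the original grouping.
    return "".join(ch if ch.isspace() else mask_char for ch in value)


masked = mask_value("1234 5678 9012 3456", can_see_full=False)
full = mask_value("1234 5678 9012 3456", can_see_full=True)
```

Keeping the masked string the same length as the original means the replacement occupies about the same space as the original text, although exact glyph widths can still differ.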

For both cases, I really need to know your thoughts and plans for FOP_IF. Watching the lists for the last 2 years,
I have not noticed anything related to it.

Greetings,
Alexios Giotis

Alexios Giotis | 14 Feb 00:24

Re: Producing & archiving FOP intermediate format

Any thoughts or comments on this? Of course, I don't expect anybody to make a commitment that it will only change
in backwards-compatible ways.

Alexios


mehdi houshmand | 14 Feb 09:23

Re: Producing & archiving FOP intermediate format

Hi Alex,

I'm not 100% sure what you're asking here, I must say. If it's more of a
general question of backward compatibility, then I think we get back
to a topic of discussion we've been having lately about FOP's API. I'd
argue that IF itself is part of FOP's API (though I wouldn't make the
same argument for the Area-Tree), since it's so widely used and we
need to maintain some semblance of compatibility.

However, and this is a big caveat, at what cost? At what cost do we
want to maintain compatibility? I think if there's a compelling reason
it's OK to break compatibility, which obviously affects users like
yourself. As a user, you have to mitigate that risk, by either locking
down the FOP version or holding onto the XSL-FO (obviously not valid
if you're creating custom IF). As a more general argument, it really
isn't in the interest of the broader user base that FOP 1.0 and 2.0 create
identical output, be that IF or PDF/AFP/whatever; FOP is evolving,
there will be new features added, bugs fixed etc., all of which change
the output. If I'm not mistaken, the new TaggedPDF branch merge will
create IF that's incompatible with previous versions (breaking both
backward and forward compatibility). However, this was well justified,
since the previous accessibility architecture was limited in design and
we were really pushing against the glass ceiling in terms of features.

Apologies if that hasn't been a particularly helpful answer, but if
you're wanting some reassurance that there won't be gratuitous changes
to IF, or some transparency on plans to rip apart IF structure, then I
can tell you we don't plan either. However, we don't have the regression
tests to guarantee backwards compatibility, and I don't think having
them is a good idea.

Alexios Giotis | 14 Feb 13:56

Re: Producing & archiving FOP intermediate format

Hi Mehdi,

Your answer was very helpful. I was not asking for any assurance related to changes. I mostly wanted to know
about any plans to rip apart the IF structure and to get some insight into the way of thinking about compatibility
and future changes.

Regarding long-term archival, it seems (as I expected) that I definitely need to keep the different
versions of FOP, regardless of whether I keep any or all of XMLs + XSLTs / XSL-FO / FOP_IF.
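
One way to track which FOP version belongs to each archived file is a small manifest kept next to the archive. The sketch below is in Python with hypothetical field names and install paths; nothing in it is part of FOP itself.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ArchiveEntry:
    doc_id: str
    artifact: str      # e.g. "XML", "XSL-FO" or "FOP_IF"
    fop_version: str   # FOP version needed to process this artifact


def renderer_for(entry: ArchiveEntry, installed: Dict[str, str]) -> str:
    """Return the install location of the FOP version the entry requires."""
    try:
        return installed[entry.fop_version]
    except KeyError:
        raise LookupError(
            f"no FOP {entry.fop_version} installation kept for {entry.doc_id}"
        ) from None


entry = ArchiveEntry("doc-1", "FOP_IF", "1.0")
installed = {"1.0": "/opt/fop-1.0"}  # hypothetical install locations
manifest_line = json.dumps(asdict(entry), sort_keys=True)
path = renderer_for(entry, installed)
```

Failing loudly on a missing version makes it obvious when an old FOP installation was dropped before its archived files expired.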

Thank you,
Alex


Marcin Tustin | 14 Feb 10:31

Re: Producing & archiving FOP intermediate format

In general, it is a poor idea to rely on undocumented or
software-specific formats for long-term archival purposes. If you
always also store the FOP, then you're likely to be fine.

On Mon, Feb 13, 2012 at 23:24, Alexios Giotis <alex.giotis <at> gmail.com> wrote:
> Any thoughts or comments on this ? Of course, I don't expect anybody to make a commitment that it will change
> in backwards compatible ways.
>
> Alexios
>
>

