19 Nov 2007 10:26
Metadata use by Apache Java projects
Jeremias Maerki <dev <at> jeremias-maerki.ch>
2007-11-19 09:26:47 GMT
(I realize this is heavy cross-posting but it's probably the best way to reach all the players I want to address.)

As you may know, I've started developing an XMP metadata package inside XML Graphics Commons in order to support XMP metadata (and ultimately PDF/A) in Apache FOP. Therefore, I have quite an interest in metadata.

What is XMP? XMP, for those who don't know about it, is based on a subset of RDF to provide a flexible and extensible way of storing/representing document metadata.

Yesterday, I was surprised to discover that Adobe has published an XMP Toolkit with Java support under the BSD license. In contrast to my effort, Adobe's toolkit is quite complete, if maybe a bit more complicated to use. That got me thinking: every project I'm sending this message to is using document metadata in some form:

- Apache XML Graphics: embeds document metadata in the generated files (just FOP at the moment, but Batik is a similar candidate)
- Tika (in incubation): has as one of its main purposes the extraction of metadata
- Sanselan (in incubation): extracts and embeds metadata from/in bitmap images
- PDFBox (incubation in discussion): extracts and embeds XMP metadata from/in PDF files (see also JempBox)

Every one of these projects has its own means to represent metadata in memory. Wouldn't it make sense to have a common approach? I've worked with XMP for some time now and I can say it's ideal to work with. It
At least, that's my impression. Maybe I still just know too
little about RDF. IMO, XMP finds a good compromise between
expressiveness and simplicity. The positive points for Adobe's XMP
toolkit: it is in Java, available now and under a license we can easily
use in Apache projects.
In your point 4, you mention some restrictions you see for XMP. But XMP
is a subset of RDF, so does XMP really restrict you from an RDF point of
view? I didn't really understand that point.
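To make the "XMP is a subset of RDF" point concrete, here is a minimal Java sketch using only the standard library. The class XmpSketch, its QName record and its methods are hypothetical: this is not the API of Adobe's XMP Toolkit, XML Graphics Commons or any other project mentioned in this thread, and it only covers XMP simple (text) properties, not arrays, structs or language alternatives. Each property is stored under a namespace-qualified name and can be printed either as an RDF statement in Turtle syntax (subject = the document, predicate = namespace URI + local name, object = literal) or as the same statements in RDF/XML inside an x:xmpmeta wrapper, which is how an XMP packet is laid out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Adobe's XMP Toolkit API): an in-memory store of
 * XMP simple properties, viewable as RDF triples or as an XMP packet.
 */
class XmpSketch {

    /** Namespace-qualified property name, e.g. dc:format. */
    record QName(String prefix, String namespaceUri, String localName) {}

    static final QName DC_FORMAT =
            new QName("dc", "http://purl.org/dc/elements/1.1/", "format");
    static final QName PDF_PRODUCER =
            new QName("pdf", "http://ns.adobe.com/pdf/1.3/", "Producer");

    // LinkedHashMap keeps insertion order so the output is deterministic;
    // setting the same QName again replaces the earlier value.
    private final Map<QName, String> properties = new LinkedHashMap<>();

    void setSimpleProperty(QName name, String value) {
        properties.put(name, value);
    }

    /** One Turtle statement per property; "<>" denotes the document itself. */
    String toTurtle() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<QName, String> e : properties.entrySet()) {
            QName q = e.getKey();
            String literal = e.getValue()
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n");
            sb.append("<> <").append(q.namespaceUri()).append(q.localName())
              .append("> \"").append(literal).append("\" .\n");
        }
        return sb.toString();
    }

    /** The same properties serialized as RDF/XML inside x:xmpmeta. */
    String toXmpPacket() {
        // One xmlns declaration per distinct prefix (assumes prefixes are unique).
        Map<String, String> namespaces = new LinkedHashMap<>();
        for (QName q : properties.keySet()) {
            namespaces.putIfAbsent(q.prefix(), q.namespaceUri());
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n");
        sb.append("  <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n");
        sb.append("    <rdf:Description rdf:about=\"\"");
        for (Map.Entry<String, String> ns : namespaces.entrySet()) {
            sb.append(" xmlns:").append(ns.getKey())
              .append("=\"").append(escapeXml(ns.getValue())).append('"');
        }
        sb.append(">\n");
        for (Map.Entry<QName, String> e : properties.entrySet()) {
            String tag = e.getKey().prefix() + ":" + e.getKey().localName();
            sb.append("      <").append(tag).append('>')
              .append(escapeXml(e.getValue()))
              .append("</").append(tag).append(">\n");
        }
        sb.append("    </rdf:Description>\n  </rdf:RDF>\n</x:xmpmeta>\n");
        return sb.toString();
    }

    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        XmpSketch meta = new XmpSketch();
        meta.setSimpleProperty(DC_FORMAT, "application/pdf");
        meta.setSimpleProperty(PDF_PRODUCER, "Apache FOP");
        System.out.print(meta.toTurtle());
        System.out.print(meta.toXmpPacket());
    }
}
```

Both outputs carry exactly the same information, which is the sense in which an XMP packet is "just" RDF: a common in-memory model keyed by namespace URI and property name would serve extraction (Tika, Sanselan, PDFBox) and embedding (FOP) alike.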
We'll see how this works out.
Jeremias Maerki
On 20.11.2007 15:25:44 Antoni Mylka wrote:
> Hi Jeremias, tika-dev
>
> My name is Antoni Mylka, I am involved in aperture.sourceforge.net,
> which is addressing similar things as Tika, we got your mail on the
> tika-dev mailing list. I also work for the Nepomuk Social Semantic
> Desktop project, I'm the maintainer of the Nepomuk Information Element
> Ontology. More below.
>
> Your mail addresses four more-or-less orthogonal issues.