Gerd Zechmeister | 31 May 2012 12:42

Extracting German noun forms

Hi,

I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.

Example:
http://de.wiktionary.org/wiki/Haus

I need the data from the box:
Kasus 	Singular 	Plural
Nominativ 	das Haus 	die Häuser
Genitiv 	des Hauses 	der Häuser
Dativ 	dem Haus / dem Hause 	den Häusern
Akkusativ 	das Haus 	die Häuser

Any idea how to get this? Perhaps with a SPARQL query?

regards,
Gerd

--
Gerd Zechmeister
Research & Development Manager

Semantic Web Company GmbH
Mariahilfer Straße 70 / 8
A - 1070 Vienna, Austria
Tel +43 1 402 12 35 - 28
Fax +43 1 402 12 35 - 22
Mobile +43 650 3905697

Christoph Lauer | 31 May 2012 17:12

Re: Extracting German noun forms

> Hi,
> 
> I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
> 
> Example:
> http://de.wiktionary.org/wiki/Haus
> 
> I need the data from the box:
> Kasus 	Singular 	Plural
> Nominativ 	das Haus 	die Häuser
> Genitiv 	des Hauses 	der Häuser
> Dativ 	dem Haus / dem Hause 	den Häusern
> Akkusativ 	das Haus 	die Häuser
> 
> Any idea how to get this? a SPARQL query expression?
> 
> regards,
> Gerd
> 

Hi Gerd,
I suppose you mean the DBpedia dumps of Wiktionary, because the
Wiktionary XML dumps do contain the box data. If so, you're right:
unfortunately the forms are not in there. A SPARQL query won't help
either, since it only returns the same information that is in the
dumps. To add this information you would have to write a
template for the "Entry Layout" as explained on the DBpedia website. I'm
not an expert on that; maybe Jonas can tell you more, or whether
it's even possible. Sorry I can't help you any further :-)

Gerd Zechmeister | 31 May 2012 17:48

Re: Extracting German noun forms

Hi Christoph,

Thanks for your reply! In the meantime we'll investigate here at SWC as well and let you know.

btw: Virtuoso returns an error when querying the endpoint (http://wiktionary.dbpedia.org/sparql)
with the expression below. Is that an encoding issue?

SELECT *
WHERE {
?s ?p ?o FILTER(bif:contains(?o, "häuser"))
}

Regards,
Gerd

----- Original Message -----
From: "Christoph Lauer" <dbpedia <at> online.ms>
To: "The Wiktionary (http://www.wiktionary.org) mailing list" <wiktionary-l <at> lists.wikimedia.org>
Sent: Thursday, 31 May 2012 17:12:16
Subject: Re: [Wiktionary-l] Extracting German noun forms

> Hi,
>
> I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
>
> Example:
> http://de.wiktionary.org/wiki/Haus
>
> I need the data from the box:
> Kasus 	Singular 	Plural

Christoph Lauer | 31 May 2012 21:07

Re: Extracting German noun forms

Hi Gerd,
Apparently the endpoint has problems with umlauts. You'll have to use
the Unicode escape sequence for the letter "ä" instead (\u followed by
four hexadecimal digits, i.e. \u00E4).
Regards,
Christoph
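
The escaping suggested above can be sketched in Python. This is only an illustration: the helper name sparql_escape_non_ascii is made up for this example, and whether the escaped query actually avoids the Virtuoso error was not confirmed in the thread. It relies on the \uXXXX and \UXXXXXXXX codepoint escapes defined by the SPARQL grammar.

```python
def sparql_escape_non_ascii(text):
    """Replace every non-ASCII character with a SPARQL codepoint escape.

    Hypothetical helper for illustration; not part of any library.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                    # plain ASCII stays as-is
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)      # four hex digits
        else:
            out.append("\\U%08X" % code)      # eight hex digits
    return "".join(out)


# Build the query from the thread with the search term escaped.
term = sparql_escape_non_ascii("häuser")
query = 'SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "%s")) }' % term
```

The resulting query string is pure ASCII, so it can be sent to the endpoint without depending on how the client or server handles the request encoding.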

> Hi Christoph,
> 
> Thanks for your reply! In the meantime we'll investigate here at SWC as well and let you know.
> 
> btw: Virtuoso returns an error when querying the endpoint (http://wiktionary.dbpedia.org/sparql) with the expression below. Is that an encoding issue?
> 
> SELECT *
> WHERE {
> ?s ?p ?o FILTER(bif:contains(?o, "häuser"))
> }
> 
> Regards,
> Gerd
> 
> ----- Original Message -----
> From: "Christoph Lauer" <dbpedia <at> online.ms>
> To: "The Wiktionary (http://www.wiktionary.org) mailing list" <wiktionary-l <at> lists.wikimedia.org>
> Sent: Thursday, 31 May 2012 17:12:16
> Subject: Re: [Wiktionary-l] Extracting German noun forms
> 
>> Hi,
>>

Lars Aronsson | 1 Jun 2012 12:08

Re: Extracting German noun forms

On 2012-05-31 12:42, Gerd Zechmeister wrote:
> I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
>
> Example: http://de.wiktionary.org/wiki/Haus
>
> I need the data from the box:
> Kasus 	Singular 	Plural
> Nominativ 	das Haus 	die Häuser

This is provided in the wiki template call

{{Deutsch Substantiv Übersicht
|...
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
...

You can find it in this XML dump (only 50 MB compressed):
http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages-articles.xml.bz2

An old Perl script for parsing the XML dumps can be found here:
http://meta.wikimedia.org/wiki/User:LA2/Extraktor
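
As a rough alternative to the Perl script, the same extraction can be sketched in Python. This is a minimal sketch under assumptions, not the Extraktor's method: the function names iter_pages and extract_noun_forms are made up for this example, the dump is assumed to use the usual MediaWiki export layout (page elements with title and text children), and the parser assumes parameter values contain no nested templates or piped links.

```python
import bz2
import xml.etree.ElementTree as ET

TEMPLATE = "{{Deutsch Substantiv Übersicht"


def iter_pages(dump_path):
    """Yield (title, wikitext) pairs from a pages-articles .xml.bz2 dump."""
    with bz2.open(dump_path, "rb") as handle:
        title = None
        for _event, elem in ET.iterparse(handle, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
            if tag == "title":
                title = elem.text
            elif tag == "text":
                yield title, elem.text or ""
            elif tag == "page":
                elem.clear()  # free memory once a page is finished


def extract_noun_forms(wikitext):
    """Return {parameter: value} for the first template call, or {}.

    Simplification: splits on "|" and stops at the first "}}", so
    nested templates or piped links inside a value would break it.
    """
    start = wikitext.find(TEMPLATE)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    if end == -1:
        return {}
    forms = {}
    for part in wikitext[start + 2:end].split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            forms[key.strip()] = value.strip()
    return forms


# Example using only the two parameters quoted in this message.
sample = """{{Deutsch Substantiv Übersicht
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
}}"""
forms = extract_noun_forms(sample)
```

To process the whole dump, one would loop over iter_pages with the path of the downloaded file and call extract_noun_forms on each page's text, keeping the pages that return a non-empty result.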

-- 
   Lars Aronsson (lars <at> aronsson.se)
   Aronsson Datateknik - http://aronsson.se

_______________________________________________
Wiktionary-l mailing list
Wiktionary-l <at> lists.wikimedia.org

Gerd Zechmeister | 1 Jun 2012 12:30

Re: Extracting German noun forms

Thanks, Lars! This seems to be the right source ;)

----- Original Message -----
From: "Lars Aronsson" <lars <at> aronsson.se>
To: "The Wiktionary (http://www.wiktionary.org) mailing list" <wiktionary-l <at> lists.wikimedia.org>
Sent: Friday, 1 June 2012 12:08:12
Subject: Re: [Wiktionary-l] Extracting German noun forms

On 2012-05-31 12:42, Gerd Zechmeister wrote:
> I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
>
> Example: http://de.wiktionary.org/wiki/Haus
>
> I need the data from the box:
> Kasus 	Singular 	Plural
> Nominativ 	das Haus 	die Häuser

This is provided in the wiki template call

{{Deutsch Substantiv Übersicht
|...
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
...

That you find in this XML dump (only 50 MB compressed),
http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages-articles.xml.bz2

An old Perl script for parsing the XML dumps is found here,
http://meta.wikimedia.org/wiki/User:LA2/Extraktor

Jonas Brekle | 1 Jun 2012 17:29

Re: Extracting German noun forms

Regarding the forms: they are not part of the dataset yet, and
unfortunately it's not very easy to add them. I think it would even require
some enhancement to the extractor (not just the config). But it's on my
to-do list...
However, such "boxes" of word forms are probably easier to extract with
the default DBpedia infobox extractor. Maybe the DBpedia community could
help with that. The biggest problem there would be to determine the
right "context" (i.e. the subject URI)...
I cross-posted this to DBpedia, so they can reply.

Regards,
Jonas

On Friday, 01.06.2012, at 12:08 +0200, Lars Aronsson wrote:
> On 2012-05-31 12:42, Gerd Zechmeister wrote:
> > I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
> >
> > Example: http://de.wiktionary.org/wiki/Haus
> >
> > I need the data from the box:
> > Kasus 	Singular 	Plural
> > Nominativ 	das Haus 	die Häuser
> 
> This is provided in the wiki template call
> 
> {{Deutsch Substantiv Übersicht
> |...
> |Nominativ Singular=das Haus
> |Nominativ Plural=die Häuser
> ...

Jona Christopher Sahnwaldt | 1 Jun 2012 19:21

Re: [Wiktionary-l] Extracting German noun forms

Some thoughts...

On Fri, Jun 1, 2012 at 5:29 PM, Jonas Brekle <jonas.brekle@...> wrote:
> Regarding the forms: they are not part of the dataset yet, and
> unfortunately it's not very easy to add them. I think it would even require
> some enhancement to the extractor (not just the config). But it's on my
> to-do list...
> However, such "boxes" of word forms are probably easier to extract with
> the default DBpedia infobox extractor. Maybe the DBpedia community could
> help with that. The biggest problem there would be to determine the
> right "context" (i.e. the subject URI)...

I think you don't really need to enhance your extractor. Just run the
DBpedia MappingExtractor in addition. You could do the following:

- set up a mappings wiki for DBpedia Wiktionary (1)
- add a mapping for {{Deutsch Substantiv Übersicht}} to the mappings wiki
- during the DBpedia Wiktionary extraction, also run a
MappingExtractor instance that uses the mappings from your mappings
wiki

(1) or add namespaces to the existing mappings wiki - although it's
getting a bit crowded as far as namespaces are concerned :-)

As far as I can tell, DBpedia Wiktionary currently only has subject
URIs for words from en.wiktionary.org, right? So you'd probably have
to add URIs like http://de.wiktionary.dbpedia.org/resource/Haus.

I don't know if properties like "Nominativ Singular=das Haus" should
be extracted as URIs or as literals.
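
To make the literal option concrete, here is a small Python sketch that turns extracted forms into N-Triples lines. It is an illustration only: the subject URI follows the pattern suggested above, but the property namespace http://example.org/property/ is a placeholder, the function name to_ntriples is made up, and none of this reflects what the DBpedia MappingExtractor would actually emit.

```python
from urllib.parse import quote

# Subject pattern as suggested in this message.
SUBJECT_BASE = "http://de.wiktionary.dbpedia.org/resource/"
# Hypothetical placeholder namespace; real property URIs would come
# from whatever mapping is eventually defined.
PROPERTY_BASE = "http://example.org/property/"


def to_ntriples(title, forms):
    """Return one N-Triples line per form, with values as German literals."""
    subject = SUBJECT_BASE + quote(title.replace(" ", "_"))
    lines = []
    for key, value in sorted(forms.items()):
        prop = PROPERTY_BASE + quote(key.replace(" ", "_"))
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s"@de .' % (subject, prop, literal))
    return lines


triples = to_ntriples("Haus", {
    "Nominativ Singular": "das Haus",
    "Nominativ Plural": "die Häuser",
})
```

Emitting URIs instead would mean replacing the quoted literal with a resource for the word form, which in turn requires deciding how to handle the article ("das", "die") that is part of each value.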