Christoph Lauer | 14 May 2012 15:44

more grammatical information in the extraction framework

Hello everyone, and sorry for my imperfect English.
I would like to use the dbpedia-based wiktionary framework (through
dumps or online access) for a program I'm developing, which (tries to)
extract the article name and property from a natural language question
and query DBpedia for the answer. Particularly interesting would
be the extraction of a property from a verb, e.g. if I ask when someone
was "born", the program has to find a connection to the noun "birth",
which can then be further processed. Wiktionary provides that link by
first relating to the base form "to bear" and there under "Etymology 2 -
Verb" the transitive meaning "give birth". The connection in the German
Wiktionary is a little different; there the link to the base form is
under "Grammatische Merkmale" (grammatical properties), and in the base
form of the verb the noun "Geburt" (birth) is found under "Abgeleitete
Begriffe" (derived terms).
I would be very happy if this information could be extracted into the
dbpedia-wiktionary in a unified way for all languages. Unfortunately,
I'm not an expert programmer; it would probably take me weeks to find
my way through Mercurial, Maven and the entire source code of the
framework to do the extraction myself. So I was hoping someone with
more experience with the framework could do it (if it doesn't take weeks
of work :-) ). Thanks in advance!
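
The lookup described above could be sketched roughly as follows in Python. The two dictionaries are invented sample data standing in for whatever the extracted Wiktionary data would provide; they are not actual dump content, and the function name is hypothetical.

```python
# Hypothetical, hand-written sample data; a real program would fill these
# mappings from the extracted Wiktionary data.
BASE_FORM = {"born": "bear", "geboren": "gebären"}
DERIVED_TERMS = {"bear": ["birth"], "gebären": ["Geburt"]}


def related_nouns(word):
    """Follow inflected form -> base form -> derived terms."""
    lemma = BASE_FORM.get(word, word)  # fall back to the word itself
    return DERIVED_TERMS.get(lemma, [])


print(related_nouns("born"))     # ['birth']
print(related_nouns("geboren"))  # ['Geburt']
```

The point of the two separate steps is that both links (form to base form, base form to derived noun) would have to be present in the extracted data for the chain to work.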
With regards,
Christoph Lauer
Lars Aronsson | 14 May 2012 16:11

Re: more grammatical information in the extraction framework

On 2012-05-14 15:44, Christoph Lauer wrote:
> The connection in the german
> wiktionary is a little different, there the link to the base form is
> under "Grammatische Merkmale" (grammatical properties), and in the base
> form of the verb the noun "Geburt" (birth) is found under "Abgeleitete
> Begriffe" (derived terms).
> I would be very happy if these informations could be extracted into the
> dbpedia-wiktionary, in a unified way for all languages.

If you look around the various languages of Wiktionary, you will
find that German is the exception. Most languages follow the
pattern of the English Wiktionary. If you want things to work the
same way for all languages, the German Wiktionary would need
to be restructured from scratch. This is not likely to happen.

Still, the entry for bear (English Wiktionary, etymology 2, verb)
does list "born" as the participle near the headword. There is also
a list of derived terms (bear down, bear up, ...); it just doesn't
list "birth" yet, but I think you are free to add it.

-- 
   Lars Aronsson (lars@...)
   Aronsson Datateknik - http://aronsson.se
Christoph Lauer | 14 May 2012 16:54

Re: more grammatical information in the extraction framework

On 14.05.2012 16:11, Lars Aronsson wrote:
> On 2012-05-14 15:44, Christoph Lauer wrote:
>> The connection in the german
>> wiktionary is a little different, there the link to the base form is
>> under "Grammatische Merkmale" (grammatical properties), and in the base
>> form of the verb the noun "Geburt" (birth) is found under "Abgeleitete
>> Begriffe" (derived terms).
>> I would be very happy if these informations could be extracted into the
>> dbpedia-wiktionary, in a unified way for all languages.
> 
> If you look around the various languages of Wiktionary, you will
> find that German is the exception. Most languages follow the
> pattern of the English Wiktionary. If you want things to work the
> same way for all languages, the German Wiktionary would need
> to be restructured from scratch. This is not likely to happen.
> 
> Still, the entry for bear (English Wiktionary, etymology 2, verb)
> does list "born" as the participle near the headword. There is also
> a list of derived terms (bear down, bear up, ...), it just doesn't
> list "birth" yet, but I think you are free to add it.
> 
> 
Thanks for the information. Too bad the German Wiktionary is the
exception here; it's the Wiktionary I wanted to use :-(
However, my central problem was that none of this information is
available in the RDF dumps or through the SPARQL endpoint
http://wiktionary.dbpedia.org/sparql, neither born -> bear nor bear ->
birth/give birth. I thought maybe someone knows whether there are plans to
import this information. Does the project that creates the dumps
have a name, anyway? Like dbpedia, the project creating the dumps from
[...]

Jonas Brekle | 14 May 2012 18:14

Re: more grammatical information in the extraction framework

wiktionary.dbpedia.org is part of the DBpedia project and is not associated
with Wiktionary or Wikimedia etc.
It doesn't have a special name yet.

That the article "born" is not fully parsed is a bug, as far as I can
see now. When you look at its HTML representation,
http://wiktionary.dbpedia.org/page/born, you can see that only the
language section (English) was parsed; there should be a link to the
part-of-speech (PoS) section too. I will have a look into it soon.

But generally, if data is missing in the wiki, change it there (and wait
for us to make a new dump), or if it's not parsed, have a look at the
configuration XML file. If it's a general problem with the general entry
layout, that's hard to change. But in this case, it's a bug.
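
To check what has actually been extracted for an entry, one option is to list every triple for it through the SPARQL endpoint mentioned in this thread. The Python sketch below only builds the request URL. The `/resource/born` IRI is an assumption made by analogy with the `/page/born` URL above and may not match the real naming scheme; `describe_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://wiktionary.dbpedia.org/sparql"  # endpoint named in this thread
# Assumption: resources use /resource/<word>, mirroring the /page/<word> URL.
RESOURCE = "http://wiktionary.dbpedia.org/resource/born"


def describe_url(resource, limit=50):
    """Build a GET URL that lists every triple with `resource` as subject."""
    query = f"SELECT ?p ?o WHERE {{ <{resource}> ?p ?o }} LIMIT {limit}"
    # `query` is the standard SPARQL protocol parameter name.
    return ENDPOINT + "?" + urlencode({"query": query})


print(describe_url(RESOURCE))
```

The query deliberately uses no vocabulary terms, so it does not depend on knowing which properties the extraction emits; whatever was parsed for the entry shows up as predicate/object pairs.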

Regards,
Jonas

On Monday, 14.05.2012, at 16:54 +0200, Christoph Lauer wrote:
> On 14.05.2012 16:11, Lars Aronsson wrote:
> > On 2012-05-14 15:44, Christoph Lauer wrote:
> >> The connection in the german
> >> wiktionary is a little different, there the link to the base form is
> >> under "Grammatische Merkmale" (grammatical properties), and in the base
> >> form of the verb the noun "Geburt" (birth) is found under "Abgeleitete
> >> Begriffe" (derived terms).
> >> I would be very happy if these informations could be extracted into the
> >> dbpedia-wiktionary, in a unified way for all languages.
> > 
> > If you look around the various languages of Wiktionary, you will

Amgine | 14 May 2012 18:15

Re: more grammatical information in the extraction framework

On 14/05/12 07:54 AM, Christoph Lauer wrote:
> On 14.05.2012 16:11, Lars Aronsson wrote:
>> On 2012-05-14 15:44, Christoph Lauer wrote:
>>> The connection in the german
>>> wiktionary is a little different, there the link to the base form is
>>> under "Grammatische Merkmale" (grammatical properties), and in the base
>>> form of the verb the noun "Geburt" (birth) is found under "Abgeleitete
>>> Begriffe" (derived terms).
>>> I would be very happy if these informations could be extracted into the
>>> dbpedia-wiktionary, in a unified way for all languages.
>>
>> If you look around the various languages of Wiktionary, you will
>> find that German is the exception. Most languages follow the
>> pattern of the English Wiktionary. If you want things to work the
>> same way for all languages, the German Wiktionary would need
>> to be restructured from scratch. This is not likely to happen.
>>
>> Still, the entry for bear (English Wiktionary, etymology 2, verb)
>> does list "born" as the participle near the headword. There is also
>> a list of derived terms (bear down, bear up, ...), it just doesn't
>> list "birth" yet, but I think you are free to add it.
>>
>>
> Thanks for the information. Too bad the german wiktionary makes such
> exceptions there, it's the wiktionary I wanted to use :-(
> However my central problem was that none of these informations aren't
> available in the RDF dumps or through the SPARQL endpoint
> http://wiktionary.dbpedia.org/sparql, neither born -> bear, nor bear ->
> birth/give birth, I thought maybe someone knows if there are plans to
> import these informations. Does the project, which creates the dumps,

Lars Aronsson | 14 May 2012 21:19

Re: more grammatical information in the extraction framework

On 2012-05-14 16:54, Christoph Lauer wrote:
> However my central problem was that none of these informations aren't
> available in the RDF dumps or through the SPARQL endpoint
> http://wiktionary.dbpedia.org/sparql, neither born ->  bear, nor bear ->

Wiktionary is highly concentrated: A few people and a few templates
generate the vast majority of the content. I think I created half
of the Swedish language entries in the English Wiktionary. If the
people (who?) who run dbpedia.org can explain their needs, perhaps
the templates used in Wiktionary can better support the extraction
of structured data. I don't recall getting any feedback from them.

For the purpose of Swedish entries in the English Wiktionary, "född"
(born, geboren) is treated as an adjective (since it is inflected as
an adjective), with its role as participle of the verb being indicated
in the etymology section. The template
{{sv-verb-form-pastpart|föda}}
expands to the text "past participle of föda" and also adds a
category: Swedish past participles,
but it doesn't contain any other mark-up that says this is a
past participle. I have no idea how this is treated by dbpedia.
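
Because the template name and its parameter already carry the relation, an extractor could in principle read it straight from the wikitext. The following Python sketch assumes the template always takes the base verb as its first parameter; the function name is hypothetical, and nothing here says the DBpedia parser works this way.

```python
import re

# Matches e.g. {{sv-verb-form-pastpart|föda}} and captures the base verb.
# Assumption: the base verb is always the first template parameter.
PASTPART = re.compile(r"\{\{sv-verb-form-pastpart\|([^|}]+)\}\}")


def past_participle_of(wikitext):
    """Return the base verbs named by sv-verb-form-pastpart templates."""
    return [m.strip() for m in PASTPART.findall(wikitext)]


print(past_participle_of("{{sv-verb-form-pastpart|föda}}"))  # ['föda']
```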

-- 
   Lars Aronsson (lars@...)
   Aronsson Datateknik - http://aronsson.se
Sebastian Hellmann | 15 May 2012 08:45

Re: more grammatical information in the extraction framework

The DBpedia Wiktionary parser does not have a special use case. It aims 
for flexibility.
The parser can be configured by anyone to fit their use case. It is 
also not limited to Wiktionary; we intend to parse other wikis such as 
http://wikihow.com or http://wikitravel.org as well.

DBpedia Wiktionary follows several visions:
1. if it is possible to get the data that you have put into Wiktionary 
out again, Wiktionary will be strengthened as a central resource.
2. Efforts to extract data from Wiktionary can be focused into one 
collaborative project. Therefore not everybody has to write his/her own 
parser.
3. DBpedia Wiktionary has the potential to become a major hub of 
http://linguistics.okfn.org/resources/llod/, just as DBpedia is the central 
hub of http://richard.cyganiak.de/2007/10/lod/

It will need some more work to improve the config files step by step for 
each language, but it is not unrealistic. During the next week, we will 
add dumps for several more languages. We will migrate the config files 
somewhere user-friendly, so people who want to get data will have no 
need to download and install software or to know Mercurial or Scala.
Sebastian

On 05/14/2012 09:19 PM, Lars Aronsson wrote:
> On 2012-05-14 16:54, Christoph Lauer wrote:
>> However my central problem was that none of these informations aren't
>> available in the RDF dumps or through the SPARQL endpoint
>> http://wiktionary.dbpedia.org/sparql, neither born ->  bear, nor bear ->
>
> Wiktionary is highly concentrated: A few people and a few templates

Christian Meyer | 14 May 2012 17:19

Re: more grammatical information in the extraction framework

Hi Christoph,

if you're interested in accessing the "Abgeleitete Begriffe" (derived terms) from the German
Wiktionary, you could use the JWKTL software [1]. It is a Java library for parsing German and English
Wiktionary dump files and accessing much of the information encoded in Wiktionary in a structured way.

Hope it helps!

Best regards,
Christian

[1] http://www.ukp.tu-darmstadt.de/software/jwktl/

________________________________________
From: wiktionary-l-bounces@... [wiktionary-l-bounces@...] on behalf of
Christoph Lauer [dbpedia@...]
Sent: Monday, 14 May 2012 15:44
To: wiktionary-l@...
Subject: [Wiktionary-l] more grammatical information in the extraction framework

Hello everyone, sorry for my partially bad english.
I would like to use the dbpedia-based wiktionary framework (through
dumps or online access) for a program I'm developing, which (trys to)
extract the article name and property from a natural language question
and queries the dbpedia for the answer. Particularly interesting would
be the extraction of a property from a verb, e.g. if I ask when someone
was "born", the program has to find a connection to the noun "birth",
which can then be further processed. Wiktionary provides that link by
first relating to the base form "to bear" and there under "Etymology 2 -

Christoph Lauer | 14 May 2012 18:58

Re: more grammatical information in the extraction framework

Hi Christian,
Thanks I'll look into it, maybe this is just what I need.
Regards,
Christoph

On 14.05.2012 17:19, Christian Meyer wrote:
> Hi Christoph,
> 
> if you're interested in accessing the "Abgeleitete Begriffe" (derived terms) from the German
> Wiktionary, you could use the JWKTL software [1]. It is a Java library for parsing German and English
> Wiktionary dump files and accessing much of the information encoded in Wiktionary in a structured way.
> 
> Hope it helps!
> 
> Best regards,
> Christian
> 
> [1] http://www.ukp.tu-darmstadt.de/software/jwktl/
> 
> 
> ________________________________________
> From: wiktionary-l-bounces@... [wiktionary-l-bounces@...] on behalf of
> Christoph Lauer [dbpedia@...]
> Sent: Monday, 14 May 2012 15:44
> To: wiktionary-l@...
> Subject: [Wiktionary-l] more grammatical information in the extraction framework
> 
> Hello everyone, sorry for my partially bad english.
> I would like to use the dbpedia-based wiktionary framework (through
