Re: To find a list of actors
Andrew Dunbar <hippytrail <at> gmail.com>
2012-08-14 08:39:28 GMT
On 12 August 2012 09:58, Pavan Kumar <pavankumarstudent <at> yahoo.com> wrote:
> I know wikipedia has the information if lot of actors
> Is there a way I can write a Mediawiki API to get the list of all actors in
> wikipedia and there corresponding links...
> I am asking for a search in wikipedia ))) is it available...
> pl let me know
Well, it's not as simple as you were probably hoping.
There is only one "MediaWiki API", and it is the same on all the
projects related to Wikipedia. This means it is totally unaware of the
content: it knows nothing about encyclopedias or how one might be
formatted to fit in a MediaWiki wiki.
It only knows about the generic concepts and operations of a wiki,
so you can't query for "all actors".
Fortunately, one of the generic concepts of a wiki is that of
"categories", and you can use the API to investigate what's in a
category.
You can query the Category:Actors like so:
This will return not just the pages about actors though, but anything
people have put in the category, some of which may surprise you.
Here's an example in XML:
<cm pageid="35149376" ns="0" title="Tany Youne" />
<cm pageid="35963938" ns="0" title="Donna Wyant" />
<cm pageid="35778902" ns="14" title="Category:Actors by award" />
<categorymembers cmcontinue="..." />
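A query along those lines can be sketched in Python. This is only an illustration under some assumptions of mine: the English Wikipedia endpoint URL, the cmlimit value and the two helper names are not from the message above, and the sample reply is wrapped in an <api> root element so that it parses on its own. Nothing here touches the network; it only builds the URL and sorts a reply into pages and subcategories.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, cmcontinue=None, limit=500):
    """Build a list=categorymembers query URL for one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "xml",
    }
    # Pass the continuation token back to fetch the next batch.
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return API_URL + "?" + urlencode(params)


def split_members(xml_text):
    """Return (pages, subcategories) from a categorymembers XML reply."""
    pages, subcats = [], []
    for cm in ET.fromstring(xml_text).iter("cm"):
        # Namespace 14 is the Category namespace; 0 is ordinary articles.
        if cm.get("ns") == "14":
            subcats.append(cm.get("title"))
        elif cm.get("ns") == "0":
            pages.append(cm.get("title"))
    return pages, subcats


# The sample entries from above, wrapped in a root element for parsing.
SAMPLE = """<api>
<cm pageid="35149376" ns="0" title="Tany Youne" />
<cm pageid="35963938" ns="0" title="Donna Wyant" />
<cm pageid="35778902" ns="14" title="Category:Actors by award" />
</api>"""

if __name__ == "__main__":
    print(category_members_url("Category:Actors"))
    print(split_members(SAMPLE))
```

The ns attribute is what lets you tell ordinary pages from subcategories, and the cmcontinue value is what you would feed back in to get the next batch.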
Most importantly in a category you will usually find subcategories, so
to build up a list of all the actors in Wikipedia you will need to
descend recursively into some of those subcategories.
The biggest problem here is that, contrary to what many people expect,
the categories in Wikipedia are not arranged into a strict hierarchy.
There is not a "tree" of categories but a "graph", connected in all
kinds of whimsical ways.
So you will need to analyse the subcategories of the actor category
yourself and make a hard-coded list of which to include, or you will
need to design some clever heuristics to decide which subcategory
paths to follow and which to ignore.
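To make the graph problem concrete, here is a rough Python sketch of such a walk. Everything in it is hypothetical: the function names, the toy category data, the skip list and the depth limit are mine, and fetch_members stands in for whatever code actually calls the API. The two points it illustrates are remembering which categories you have already visited, so that loops in the graph don't trap you, and having a place to plug in your hard-coded exclusions.

```python
from collections import deque


def collect_pages(root, fetch_members, skip=frozenset(), max_depth=3):
    """Breadth-first walk over a category graph.

    fetch_members(category) must return (pages, subcategories).
    Categories named in skip are never entered.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        cat_pages, subcats = fetch_members(category)
        pages.update(cat_pages)
        if depth >= max_depth:
            continue  # collect pages here but go no deeper
        for sub in subcats:
            # The seen set is what keeps cycles from looping forever.
            if sub not in seen and sub not in skip:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return pages


# Made-up data with a cycle and an off-topic subcategory.
TOY = {
    "Category:Actors": (["Actor A"],
                        ["Category:Actors by award", "Category:Films"]),
    "Category:Actors by award": (["Actor B"], ["Category:Actors"]),
    "Category:Films": (["Some film"], []),
}


def toy_fetch(category):
    return TOY.get(category, ([], []))


if __name__ == "__main__":
    print(collect_pages("Category:Actors", toy_fetch,
                        skip={"Category:Films"}))
```

A fixed skip list is the "hard-coded" option; a heuristic would replace the `sub not in skip` test with whatever rule you come up with.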
Some of this work may or may not have already been turned into an
"ontology" that you can query using SPARQL in DBpedia, which is data
mined from Wikipedia:
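If DBpedia does cover what you need, a query might look roughly like the sketch below. I am assuming the public endpoint at dbpedia.org/sparql, the dbo:Actor class, and foaf:isPrimaryTopicOf as the link back to the Wikipedia page; check these against the current DBpedia data before relying on them. The Python only builds the request URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint for DBpedia.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Assumed vocabulary: dbo:Actor for actors, foaf:isPrimaryTopicOf
# for the corresponding Wikipedia page.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?actor ?page WHERE {
  ?actor a dbo:Actor ;
         foaf:isPrimaryTopicOf ?page .
}
LIMIT 100
"""


def sparql_url(query, fmt="application/sparql-results+json"):
    """Build a GET URL that submits the query to the endpoint."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


if __name__ == "__main__":
    print(sparql_url(QUERY))
```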
Andrew Dunbar (hippietrail)