Pavan Kumar | 9 Aug 2012 08:16

getting data for a topic

Hi all,
I am new to the Wikipedia API.
Can you help me with the following?
I want to get all the content of the "United States of America" article into a text file, without images.
I am looking for a response in text format.

How can I do that?
I am looking for the contents of this page: http://en.wikipedia.org/wiki/United_States

I constructed this URL:
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=united_states&prop=revisions&rvprop=content
But I am not getting what I want. Maybe I am missing something basic.

1. How can I get the content for whatever string I give in the query?
    Please help me with the URL.
2. I am trying to save this in a text file. Can I get the response in a text format other than XML and JSON?
3. In the United_States example, I want to get the first column of the cities table ("Leading population centers"). How can I get that?

-pavi
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Platonides | 9 Aug 2012 13:14

Re: getting data for a topic

On Thu, Aug 9, 2012 at 8:16 AM, Pavan Kumar <pavankumarstudent <at> yahoo.com> wrote:
> I constructed this URL:
> http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=united_states&prop=revisions&rvprop=content
> But I am not getting what I want. Maybe I am missing something basic.

A simple mistake: you are writing the article title in lowercase.
 

Try with http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=United_States&prop=revisions&rvprop=content
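
For anyone who would rather script this request, here is a minimal Python sketch. The helper names `build_content_url` and `extract_wikitext` are hypothetical, and the response layout it reads (`query` > `pages` > page id > `revisions[0]["*"]`) is assumed to be the legacy `format=json` shape; newer `formatversion=2` responses are laid out differently. The wikitext it returns is already plain text, so it can be written to a file as-is.

```python
import json
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"


def build_content_url(title, fmt="json"):
    """Build the URL that asks for the latest revision's wikitext of `title`."""
    params = [
        ("format", fmt),
        ("action", "query"),
        ("titles", title),  # capitalisation matters after the first letter
        ("prop", "revisions"),
        ("rvprop", "content"),
    ]
    return API + "?" + urlencode(params)


def extract_wikitext(response):
    """Return {page title: wikitext} from a decoded legacy-format JSON response."""
    texts = {}
    for page in response.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if revisions:  # pages that do not exist carry no "revisions" key
            texts[page["title"]] = revisions[0]["*"]
    return texts


if __name__ == "__main__":
    # Made-up response shaped like the API output, so no network is needed.
    sample = json.loads(
        '{"query": {"pages": {"1": {"title": "Example",'
        ' "revisions": [{"*": "Some [[wikitext]]"}]}}}}'
    )
    print(build_content_url("United_States"))
    print(extract_wikitext(sample))
```

If a differently capitalised title happens to be a redirect, adding the `redirects` parameter to the query should make the API follow it; that is an addition to the URL above, not something tested here.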


> 3. In the United_States example, I want to get the first column of the
> cities table ("Leading population centers"). How can I get that?

Extracting content from inside the article text will require you to parse the wikitext.
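
As a rough illustration of what "parsing the wikitext" can look like, the Python sketch below pulls the first cell out of every data row of a simple wikitext table. The function name `first_column` and the sample table are made up for illustration. It assumes the table is written literally with `{|`, `|-`, `|` and `||` markup; if the box on the page is produced by a template, or if a row starts with a `!` header cell, this approach will not give the right answer.

```python
def first_column(wikitable):
    """Return the first cell of every data row in a simple wikitext table."""
    cells = []
    row_has_cell = False
    for raw in wikitable.splitlines():
        line = raw.strip()
        if line.startswith("|-"):  # row separator starts a new row
            row_has_cell = False
        elif line.startswith(("{|", "|}", "|+", "!")) or not line.startswith("|"):
            continue  # table start/end, caption, header cells, other text
        elif not row_has_cell:
            # Cells written on one line are separated by "||"; keep the first.
            cell = line[1:].split("||")[0]
            # Drop cell attributes such as 'align="left" |', but leave a pipe
            # that belongs to a [[target|label]] link alone.
            if "|" in cell and "[[" not in cell.split("|")[0]:
                cell = cell.split("|", 1)[1]
            cells.append(cell.strip())
            row_has_cell = True
    return cells


# Hypothetical table, not taken from the real article.
SAMPLE = """{| class="wikitable"
|+ Sample
! City !! Population
|-
| [[City A]] || 100
|-
| align="left" | [[City B|B]] || 200
|}"""

if __name__ == "__main__":
    print(first_column(SAMPLE))
```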



Pavan Kumar | 10 Aug 2012 07:53

Re: getting data for a topic


Thank you for the reply.
With the case change, that worked. But when I try to get the data in JSON, which I think is easier to parse:
http://en.wikipedia.org/w/api.php?format=json&action=query&titles=United_States&prop=revisions&rvprop=content

I see that I am getting a lot of data like:
==
\u0906\u0923\u093f \u092a\u094d\u0930\u0926\u0947\u0936]]\n[[ms:Negeri dan wilayah di India]]\n[[nl:Lijst van staten en territoria van India]]\n[[ne:\u092d\u093e\u0930\u0924\u0915\u093e \u0930\u093e\u091c\u094d\u092f\u0939\u0930\u0941 \u0930 \u0915\u0947\u0928\u094d\u0926\u094d\u0930 \u0936\u093e\u0938\u093f\u0924 \u0930\u093e\u091c\u094d\u092f\u0939\u0930\u0941]]\n[[ja:\u30a4\u30f3\u30c9\u306e\u5730\u65b9\u884c\u653f\u533a\u753b]]\n[[no:Indias delstater og territorier]]\n[[nn:Statar og territorium i India]]\n[[or:\u0b2d\u0b3e\u0b30\u0b24\u0b30

Is my query correct? All I need is to get the Leading population centers.



Tommy Chheng | 10 Aug 2012 07:56

Re: getting data for a topic

If your work focuses on getting structured data, I recommend using dbpedia.org or freebase.com. Both structure Wikipedia data and offer structured query languages.
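
To give an idea of what a structured query might look like, here is a hedged Python sketch for DBpedia only. The endpoint URL `http://dbpedia.org/sparql`, its `format` parameter, and the use of `dct:subject` with the category `States_of_the_United_States` are all assumptions on my part and should be checked against the DBpedia documentation. The helper names are hypothetical. `extract_values` reads the standard SPARQL JSON results layout.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

# Assumed query: resources filed under the "States of the United States" category.
STATES_QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT ?state WHERE { ?state dct:subject dbc:States_of_the_United_States }
"""


def build_sparql_url(query, fmt="application/sparql-results+json"):
    """Encode a SPARQL query as a GET request URL for the endpoint."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query.strip(), "format": fmt})


def extract_values(results, var):
    """Pull the values bound to `var` out of a decoded SPARQL JSON result."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


if __name__ == "__main__":
    print(build_sparql_url(STATES_QUERY))
```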

-- 
Tommy Chheng

Pavan Kumar | 10 Aug 2012 08:28

Re: getting data for a topic

Thanks, Tommy Chheng, for the reply.

My requirement is to get information about the things I post in the query.
Example: I post the following query:

I want to get all the states in the USA.
So I was wondering whether I can get the JSON output and use some tool to extract all the states from it.

I was a bit hesitant to use new APIs, but I will look into those as well.

Can you tell me if there are any other good tools that can process the JSON for me and get the information I am looking for?

Tommy Chheng | 10 Aug 2012 08:39

Re: getting data for a topic

Hi,
Please refer to Freebase or DBpedia for getting that information in JSON form.

Here are some links that might be helpful

I recommend following up on the Freebase or DBpedia message boards.

-- 
Tommy Chheng

Pavan Kumar | 12 Aug 2012 08:58

To find a list of actors

Hi,
I know Wikipedia has information on a lot of actors.
Is there a way I can use the MediaWiki API to get a list of all actors in Wikipedia and their corresponding links?
I am asking for a search in Wikipedia :-) Is it available?

Please let me know.

Andrew Dunbar | 14 Aug 2012 10:39

Re: To find a list of actors

On 12 August 2012 09:58, Pavan Kumar <pavankumarstudent <at> yahoo.com> wrote:
> Hi,
> I know Wikipedia has information on a lot of actors.
> Is there a way I can use the MediaWiki API to get a list of all actors in
> Wikipedia and their corresponding links?
> I am asking for a search in Wikipedia :-) Is it available?
>
> Please let me know.

Well, it's not as simple as you were probably hoping.

There is only one "MediaWiki API", and it is the same on all the
projects related to Wikipedia. It is therefore completely unaware of
the content: it knows nothing about encyclopedias or how one might be
formatted to fit in a MediaWiki wiki.

It only knows about the generic concepts and operations of a wiki.

This means you can't query for "all actors".

Fortunately one of the generic concepts of a wiki is that of
"categories" and you can use the API to investigate what's in a
category:

http://www.mediawiki.org/wiki/API:Categorymembers

You can query the Category:Actors like so:
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Actors

This will return not just the pages about actors though, but anything
people have put in the category, some of which may surprise you.
Here's an example in XML:

<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm pageid="35149376" ns="0" title="Tany Youne" />
      <cm pageid="35963938" ns="0" title="Donna Wyant" />
      <cm pageid="35778902" ns="14" title="Category:Actors by award" />
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="..." />
  </query-continue>
</api>
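
A response like the one above can be split into article pages and subcategories with a few lines of Python. This is only a sketch: the helper names are hypothetical, `format=xml` is added to the request as an assumption, and the parser expects the `query-continue` layout shown in the example rather than any newer continuation format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"
CATEGORY_NS = "14"  # namespace number of "Category:" pages, as in the example


def build_members_url(category, cmcontinue=None):
    """Build a categorymembers request, optionally resuming from cmcontinue."""
    params = [
        ("action", "query"),
        ("list", "categorymembers"),
        ("cmtitle", category),
        ("format", "xml"),
    ]
    if cmcontinue:
        params.append(("cmcontinue", cmcontinue))
    return API + "?" + urlencode(params)


def parse_members(xml_text):
    """Split one response into (pages, subcategories, cmcontinue or None)."""
    root = ET.fromstring(xml_text)
    pages, subcats = [], []
    for cm in root.iter("cm"):
        target = subcats if cm.get("ns") == CATEGORY_NS else pages
        target.append(cm.get("title"))
    cont = root.find("./query-continue/categorymembers")
    token = cont.get("cmcontinue") if cont is not None else None
    return pages, subcats, token
```

To read a whole category, request `build_members_url(category, token)` again for as long as `parse_members` returns a continuation token.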

Most importantly in a category you will usually find subcategories, so
to build up a list of all the actors in Wikipedia you will need to
descend recursively into some of those subcategories.

The biggest problem here is that, contrary to what many people expect,
the categories in Wikipedia are not arranged into a strict hierarchy.
There is not a "tree" of categories but a "graph", connected in all
kinds of whimsical ways.

So you will need to analyse the subcategories of the actor category
yourself and make a hard-coded list of which ones to include, or you
will need to design some clever heuristics to decide which subcategory
paths to follow and which to ignore.
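
Because the categories form a graph, any recursive descent needs to remember where it has already been. The Python sketch below is one possible shape for that, not a tested crawler: `collect_pages`, `fetch_members`, `follow` and `max_depth` are hypothetical names, and `fetch_members` stands in for whatever code actually calls the API and returns `(pages, subcategories)` for a category. The `follow` callback is where a hard-coded list or a heuristic would go.

```python
from collections import deque


def collect_pages(root, fetch_members, follow=lambda title: True, max_depth=3):
    """Breadth-first walk over a category graph, visiting each category once.

    fetch_members(category) must return (pages, subcategories).
    follow(subcategory) decides whether a subcategory is worth descending into.
    """
    seen = {root}  # categories already queued; guards against cycles
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        members, subcats = fetch_members(category)
        pages.update(members)
        if depth >= max_depth:
            continue  # collect this category's pages but go no deeper
        for sub in subcats:
            if sub not in seen and follow(sub):
                seen.add(sub)
                queue.append((sub, depth + 1))
    return pages
```

The depth limit is a blunt safeguard against wandering far away from the starting topic through loosely related subcategories.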

Some of this work may or may not have already been turned into an
"ontology" that you can query using SPARQL in DBpedia, which is data
mined from Wikipedia:

http://dbpedia.org/About

Good luck.
Andrew Dunbar (hippietrail)
