1 Jun 2009 06:56
[jira] Created: (PIG-826) DISTINCT as "Function" rather than statement - High Level Pig
DISTINCT as "Function" rather than statement - High Level Pig
-------------------------------------------------------------
Key: PIG-826
URL: https://issues.apache.org/jira/browse/PIG-826
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz
In SQL, a user would think nothing of doing something like:
{code}
select
COUNT(DISTINCT(user)) as user_count,
COUNT(DISTINCT(country)) as country_count,
COUNT(DISTINCT(url)) as url_count
from
server_logs;
{code}
But in Pig, we'd need to do something like the following, and this is about the most
compact version I could come up with.
{code}
Logs = load 'log' using PigStorage()
as ( user: chararray, country: chararray, url: chararray);
DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);