David Koppstein | 31 Oct 15:28

map of maps

Hi,

I have a question regarding the implementation of maps. Let's say I have
a map of maps, in which the first mapis keyed by strings, and the
second map is keyed by ints. Let's say I have 10 strings, and 1 million
ints per string. Let's say I want to add an element.

let add k1 k2 elem t =
let m2 =
try Fst.find k1 t
with Not_found -> Snd.empty
in
let m2 = Snd.add k2 elem m2 in
Fst.add k1 m2 t

where Fst is the first map and Snd is the second map, k1 is the key of
the first, and k2 is the key of the second.

Every time I do this, am I completely rewriting the int map, which is 1
million ints keys plus their elements? If not, how does the compiler get
around this, and are there still any significant performance issues?
Will using Hashtbls circumvent this problem?

Thanks,
David

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

New business?

Get new customers.

List your web site

in Yahoo! Search.

Yahoo! Groups

Come check out

featured healthy living

groups on Yahoo!

.

__,_._,___
Richard Jones | 2 Nov 15:52
Favicon

Re: map of maps

On Fri, Oct 31, 2008 at 10:32:13AM -0400, David Koppstein wrote:
> let add k1 k2 elem t =
> let m2 =
> try Fst.find k1 t
> with Not_found -> Snd.empty
> in
> let m2 = Snd.add k2 elem m2 in
> Fst.add k1 m2 t
>
> where Fst is the first map and Snd is the second map, k1 is the key of
> the first, and k2 is the key of the second.
>
> Every time I do this, am I completely rewriting the int map, which is 1
> million ints keys plus their elements? If not, how does the compiler get
> around this, and are there still any significant performance issues?

No, you're not rewriting either map. You probably want to have a look
for some resources on 'persistent' data structures to understand
what's "really" happening in memory:

http://en.wikipedia.org/wiki/Purely_functional#Examples_of_purely_functional_data_structures

> Will using Hashtbls circumvent this problem?

Hashtbls aren't persistent, and depending on your application that can
be a good or bad thing. You'll probably want to benchmark the whole
program to find out which it is. (In some circumstances you simply
need to use a persistent structure to ease the programming).

This book by Chris Okasaki is the canonical reference, albeit rather
difficult to follow beyond the first few chaptes:

http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504

BTW, have you thought about using a single map which is keyed on
(string, int) key?

Rich.

--
Richard Jones
Red Hat

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Ads on Yahoo!

Learn more now.

Reach customers

searching for you.

Yahoo! Groups

Stay healthy

and discover other

people who can help.

.

__,_._,___
David Koppstein | 2 Nov 16:27

Re: map of maps

I actually used a string * int key initially, but I thought that it was
very silly to repeat the string a million times when once would suffice.
Anyhow, now Ashish and I have a full-fledged map of maps module.

I think I found the expensive operation -- I am trying to take the
average of some logged values that is associated with a string,int pair.
So, I am taking the log of some float and adding it to the accumulator
-- taking the log every time I insert a value is probably the culprit. I
suppose that one way to circumvent this problem would be to instead
multiply numbers into the accumulator and take the log at the end, since
log(a*b) = log(a) + log(b) but these numbers are pretty big so I am
afraid of overflow.

I guess I could use ocamlprof to make sure that the log is indeed the
culprit.

Thanks for your help, Richard!

David

Richard Jones wrote:
>
> On Fri, Oct 31, 2008 at 10:32:13AM -0400, David Koppstein wrote:
> > let add k1 k2 elem t =
> > let m2 =
> > try Fst.find k1 t
> > with Not_found -> Snd.empty
> > in
> > let m2 = Snd.add k2 elem m2 in
> > Fst.add k1 m2 t
> >
> > where Fst is the first map and Snd is the second map, k1 is the key of
> > the first, and k2 is the key of the second.
> >
> > Every time I do this, am I completely rewriting the int map, which is 1
> > million ints keys plus their elements? If not, how does the compiler
> get
> > around this, and are there still any significant performance issues?
>
> No, you're not rewriting either map. You probably want to have a look
> for some resources on 'persistent' data structures to understand
> what's "really" happening in memory:
>
> http://en.wikipedia.org/wiki/Purely_functional#Examples_of_purely_functional_data_structures
> <http://en.wikipedia.org/wiki/Purely_functional#Examples_of_purely_functional_data_structures>
>
> > Will using Hashtbls circumvent this problem?
>
> Hashtbls aren't persistent, and depending on your application that can
> be a good or bad thing. You'll probably want to benchmark the whole
> program to find out which it is. (In some circumstances you simply
> need to use a persistent structure to ease the programming).
>
> This book by Chris Okasaki is the canonical reference, albeit rather
> difficult to follow beyond the first few chaptes:
>
> http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504
> <http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504>
>
> BTW, have you thought about using a single map which is keyed on
> (string, int) key?
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>
>

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Healthy Living

Learn to live life

to the fullest

on Yahoo! Groups.

Y! Groups blog

the best source

for the latest

scoop on Groups.

.

__,_._,___
Jon Harrop | 2 Nov 17:38
Favicon

Re: map of maps

On Sunday 02 November 2008 15:27:31 David Koppstein wrote:
> I actually used a string * int key initially, but I thought that it was
> very silly to repeat the string a million times when once would suffice.
> Anyhow, now Ashish and I have a full-fledged map of maps module.
>
> I think I found the expensive operation -- I am trying to take the
> average of some logged values that is associated with a string,int pair.
> So, I am taking the log of some float and adding it to the accumulator
> -- taking the log every time I insert a value is probably the culprit. I
> suppose that one way to circumvent this problem would be to instead
> multiply numbers into the accumulator and take the log at the end, since
> log(a*b) = log(a) + log(b) but these numbers are pretty big so I am
> afraid of overflow.

I'd be surprised if the log is the performance bottleneck (you are in
64-bit?).

> I guess I could use ocamlprof to make sure that the log is indeed the
> culprit.

You mean "ocamlopt -p" and "gprof".

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

__._,_.___
Recent Activity
Visit Your Group
Give Back

Yahoo! for Good

Get inspired

by a good cause.

Y! Toolbar

Get it Free!

easy 1-click access

to your groups.

Yahoo! Groups

Start a group

in 3 easy steps.

Connect with others.

.

__,_._,___
Richard Jones | 2 Nov 17:08
Favicon

Re: map of maps

On Sun, Nov 02, 2008 at 10:27:31AM -0500, David Koppstein wrote:
> I actually used a string * int key initially, but I thought that it was
> very silly to repeat the string a million times when once would suffice.

It shouldn't store the string, just a pointer to the string. On the
other hand, you are right that there will be extra overhead (the tuple
of two elements takes three words on its own, and on 64-bit even a
pointer to a string is a large object).

> I guess I could use ocamlprof to make sure that the log is indeed the
> culprit.

Yes, profiling is definitely helpful :-)

Rich.

--
Richard Jones
Red Hat

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Real Food Group

Share recipes,

restaurant ratings

and favorite meals.

Popular Y! Groups

Is your group one?

Check it out and

see.

.

__,_._,___
Ashish Agarwal | 2 Nov 19:28

Re: map of maps

> have you thought about using a single map which is keyed on (string, int)
key
I've been using a map of maps also because it allows more flexible
operations. I can perform operations on all the values, or all the values
for each string. Actually, I've been wondering if there are any
generalizations of this. What if we need three or four keys. I guess a
database is the ultimate answer but a lighter-weight solution would be
better.

On Sun, Nov 2, 2008 at 10:52 AM, Richard Jones <rich <at> annexia.org> wrote:

> On Fri, Oct 31, 2008 at 10:32:13AM -0400, David Koppstein wrote:
> > let add k1 k2 elem t =
> > let m2 =
> > try Fst.find k1 t
> > with Not_found -> Snd.empty
> > in
> > let m2 = Snd.add k2 elem m2 in
> > Fst.add k1 m2 t
> >
> > where Fst is the first map and Snd is the second map, k1 is the key of
> > the first, and k2 is the key of the second.
> >
> > Every time I do this, am I completely rewriting the int map, which is 1
> > million ints keys plus their elements? If not, how does the compiler get
> > around this, and are there still any significant performance issues?
>
> No, you're not rewriting either map. You probably want to have a look
> for some resources on 'persistent' data structures to understand
> what's "really" happening in memory:
>
>
> http://en.wikipedia.org/wiki/Purely_functional#Examples_of_purely_functional_data_structures
>
> > Will using Hashtbls circumvent this problem?
>
> Hashtbls aren't persistent, and depending on your application that can
> be a good or bad thing. You'll probably want to benchmark the whole
> program to find out which it is. (In some circumstances you simply
> need to use a persistent structure to ease the programming).
>
> This book by Chris Okasaki is the canonical reference, albeit rather
> difficult to follow beyond the first few chaptes:
>
>
> http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504
>
> BTW, have you thought about using a single map which is keyed on
> (string, int) key?
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>
>
>

[Non-text portions of this message have been removed]

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Y! Groups blog

The place to go

to stay informed

on Groups news!

Special K Group

on Yahoo! Groups

Learn how others

are losing pounds.

.

__,_._,___
Jon Harrop | 2 Nov 18:48
Favicon

Re: map of maps

On Friday 31 October 2008 14:32:13 David Koppstein wrote:
> Hi,
>
> I have a question regarding the implementation of maps. Let's say I have
> a map of maps, in which the first mapis keyed by strings, and the
> second map is keyed by ints. Let's say I have 10 strings, and 1 million
> ints per string. Let's say I want to add an element.
>
> let add k1 k2 elem t =
> let m2 =
> try Fst.find k1 t
> with Not_found -> Snd.empty
> in
> let m2 = Snd.add k2 elem m2 in
> Fst.add k1 m2 t
>
> where Fst is the first map and Snd is the second map, k1 is the key of
> the first, and k2 is the key of the second.
>
> Every time I do this, am I completely rewriting the int map, which is 1
> million ints keys plus their elements? If not, how does the compiler get
> around this, and are there still any significant performance issues?
> Will using Hashtbls circumvent this problem?

Basically, everything is passed by reference in OCaml so you are not copying
the data structure but referring back to it. After all, there is no point in
copying immutable data.

Note that this is one of the most important foundational concepts in
functional programming.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

New business?

Get new customers.

List your web site

in Yahoo! Search.

Yahoo! Groups

Going Green Zone

Save the planet.

Your resources to go green.

.

__,_._,___

Gmane