code17 | 1 Nov 23:58

How to identify the origin of a message using Unix module functions

Hi,

I'm using OCaml to implement a library for a network application where
many distributed processes are running and communicating. A process can
generate and send messages to others, or pass on messages it receives
*without* any modification.

The only requirement is that when a process gets a message, it must be
able to identify whether this message was previously generated by itself
(and has circulated back), by another process running on the same
machine, or by a process on another machine.

For that, we are allowed to prefix message strings with any kind of tag
when they are generated. The question is: what should the tag look like?
To simplify further, we can assume that all the processes start and end
at the same time, so there are no problems with pid/IP reuse.

My naive attempt is to use hostname + pid as the tag, but two machines
can have the same hostname and still communicate fine if they always
connect through explicit IPs, can't they?
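A minimal sketch of the naive hostname + pid tag, assuming the OCaml
unix library is linked. The helper names `tag_of` and `naive_tag` are
hypothetical; the point is that the tag is built only from
(hostname, pid), so two machines sharing a hostname yield identical
tags whenever their pids happen to coincide.

```ocaml
(* Build a "hostname:pid" tag from explicit parts. *)
let tag_of ~hostname ~pid = Printf.sprintf "%s:%d" hostname pid

(* The naive tag for the current process (needs the unix library). *)
let naive_tag () =
  tag_of ~hostname:(Unix.gethostname ()) ~pid:(Unix.getpid ())

let () =
  (* Two different machines, both named "worker", both running pid 4242:
     their tags are indistinguishable. *)
  let on_machine_1 = tag_of ~hostname:"worker" ~pid:4242
  and on_machine_2 = tag_of ~hostname:"worker" ~pid:4242 in
  assert (on_machine_1 = on_machine_2);
  print_endline (naive_tag ())
```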

If I instead use IP + pid, how can I detect the IP of a running process
without forcing users to always provide one explicitly?
Unix.gethostbyname (Unix.gethostname ()) only returns 127.0.0.1, and
what if a machine has several IPs?
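A small sketch, assuming the OCaml unix library, that lists every
address the resolver associates with the local hostname. What it
returns depends entirely on the machine's resolver configuration
(/etc/hosts, DNS, ...), which is why it often yields only a loopback
address, and a multi-homed host may report several addresses or none
of its externally visible ones. `local_addresses` is a hypothetical
helper name.

```ocaml
(* All addresses the resolver maps to this machine's hostname.
   Returns [] if the hostname itself cannot be resolved. *)
let local_addresses () =
  match Unix.gethostbyname (Unix.gethostname ()) with
  | entry ->
      Array.to_list entry.Unix.h_addr_list
      |> List.map Unix.string_of_inet_addr
  | exception Not_found -> []

let () = List.iter print_endline (local_addresses ())
```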

So the problem turns out to be: what is unique for a host machine in a
networked application? It seems simple, but I don't really know what the
common idiom is. Any suggestion is welcome. Thanks in advance.

--
code17

code17 | 2 Nov 09:27

Re: How to identify the origin of a message using Unix module functions

Here is a more concrete example; I think many people have encountered
the same problem:

Suppose process A opens a local file and gets a file descriptor fd. It
then passes the fd as a message over the network. Now suppose process B
receives the message and tries to read from fd; something bad might
happen unless B is actually A (or was forked from A, etc.):

- It can get an EBADF error, since no such file descriptor is open in
this process.
- Even worse, if it happens to have a local fd open with the same
number, it may succeed in reading but fail later for other, unexpected
reasons.

Now suppose we're allowed to prefix the fd with some tag information to
ensure safety, so that when a process receives this fd, its reading
function can at least check:

- whether it was generated by myself;
- whether it was generated by another process on the same machine
(if you also pack the file information into the tag, then there
might be something this process can do);
- or whether it was generated on another machine (i.e. there is nothing
we can do here, except maybe passing the fd on and on).
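The three-way check above could be sketched as follows. The `tag`
record, its `host_id` field and the `classify` function are all
hypothetical; `host_id` is assumed to be some identifier that is unique
per machine (how to obtain one is exactly the open question). Note
that a process forked from the creator has a different pid and would
be classified as `Same_host` here.

```ocaml
(* Hypothetical tag attached to a message when it is generated. *)
type tag = { host_id : string; pid : int }

type origin = Self | Same_host | Other_host

(* Where did a tagged message come from, relative to the receiver?
   Needs the unix library for Unix.getpid. *)
let classify ~my_host_id (t : tag) =
  if t.host_id <> my_host_id then Other_host
  else if t.pid = Unix.getpid () then Self
  else Same_host
```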

Even if we are forced to use the fd whether or not it was generated on
the local machine, the raised exception can at least carry a much
clearer message than EBADF or the like, such as "Can't read: Fd was
generated on machine XXX".
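A sketch of that clearer failure, assuming a hypothetical `tagged_fd`
record that pairs a descriptor with the name of the machine that
created it, and a hypothetical `Foreign_fd` exception. The check runs
before any read, so a foreign descriptor never reaches the system call
that would fail with EBADF.

```ocaml
(* Hypothetical: a descriptor together with the machine it came from. *)
type tagged_fd = { origin_host : string; fd : Unix.file_descr }

exception Foreign_fd of string

(* Return the raw descriptor only if it was created on this machine;
   otherwise fail with a message naming the originating machine. *)
let local_fd_exn ~my_host t =
  if t.origin_host = my_host then t.fd
  else
    raise
      (Foreign_fd
         (Printf.sprintf "Can't read: Fd was generated on machine %s"
            t.origin_host))
```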

We may still encounter an exception when a local fd was previously open
but is now closed, but this is a problem even in purely local
applications. So it's not a problem newly introduced by the "passing fd
as message" scheme, and hence not my concern.

The question is: what should the tag look like?

Thanks.

Dario Teixeira | 2 Nov 13:59

Re: How to identify the origin of a message using Unix module functions

> My naive attempt is to use hostname + pid as the tag, but
> two machines can have the same hostname and still communicate
> fine if they always connect through explicit IPs,
> can't they?

Hi,

You are looking for some form of UUID:

http://en.wikipedia.org/wiki/UUID
http://erratique.ch/software/uuidm

Hope this helps,
Dario Teixeira

Richard Jones | 2 Nov 15:52

Re: How to identify the origin of a message using Unix module functions

In addition to the other reply, there is an RFC 4122-compliant UUID
generator library for OCaml which you may find useful:

http://alan.petitepomme.net/cwn/2008.06.17.html#1
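To show what an RFC 4122 version-4 (random) UUID looks like, here is a
hand-rolled sketch. It is not the API of the library above; it uses
OCaml's `Random` module, which is not cryptographically strong, so a
real implementation would rather read from /dev/urandom. `uuid_v4` is a
hypothetical name.

```ocaml
(* 16 random bytes with the RFC 4122 version and variant bits forced,
   formatted as 8-4-4-4-12 lowercase hex digits. *)
let uuid_v4 () =
  let b = Bytes.init 16 (fun _ -> Char.chr (Random.int 256)) in
  (* Version: high nibble of byte 6 set to 0100 (version 4). *)
  Bytes.set b 6 (Char.chr ((Char.code (Bytes.get b 6) land 0x0f) lor 0x40));
  (* Variant: top two bits of byte 8 set to 10 (RFC 4122 variant). *)
  Bytes.set b 8 (Char.chr ((Char.code (Bytes.get b 8) land 0x3f) lor 0x80));
  let hex i = Printf.sprintf "%02x" (Char.code (Bytes.get b i)) in
  let span lo hi =
    String.concat "" (List.init (hi - lo) (fun k -> hex (lo + k))) in
  String.concat "-" [ span 0 4; span 4 6; span 6 8; span 8 10; span 10 16 ]

let () =
  Random.self_init ();
  print_endline (uuid_v4 ())
```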

Rich.

--
Richard Jones
Red Hat

code17 | 2 Nov 19:38

Re: How to identify the origin of a message using Unix module functions

Hi,

Thanks Dario and Richard for the suggestions on UUIDs.

It seems that this problem can't be solved universally by simply
combining pid, IP, etc.; UUIDs seem to be designed for exactly that. I
will investigate them later.

Just out of curiosity, what kind of assumptions can we usually make when
dealing with hostname/IP/pid uniqueness in an interconnected application?

All the best

Richard Jones | 2 Nov 22:37

Re: Re: How to identify the origin of a message using Unix module functions

On Sun, Nov 02, 2008 at 07:38:01PM +0100, code17 wrote:
> Just out of curiosity, what kind of assumptions can we usually make when
> dealing with hostname/IP/pid uniqueness in an interconnected application?

In all the cases I can think of that I've personally dealt with, we
have either been able to assign a unique ID to each host or each host
has come with a unique ID of some sort (be it hostname or IP).
E.g. for distributed databases, you can always assume that each RDBMS
host has a unique ID, which you then embed in all serials / object
IDs.
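A minimal sketch of embedding a host ID in every serial. The
`host_id` parameter is assumed to be assigned by whoever deploys the
system (configuration, a database row, ...); this code cannot discover
it. `make_id_generator` is a hypothetical name. The scheme is only
collision-free if each host ID is used by a single generator.

```ocaml
(* Each call of the returned function yields "<host_id>-<n>",
   with n counting up from 1. *)
let make_id_generator ~host_id =
  let counter = ref 0 in
  fun () ->
    incr counter;
    Printf.sprintf "%s-%d" host_id !counter

let () =
  let next_id = make_id_generator ~host_id:"db07" in
  print_endline (next_id ());  (* db07-1 *)
  print_endline (next_id ())   (* db07-2 *)
```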

In the larger case, such as SMTP on the internet, there has always
been a way to assign unique strings to each object -- e.g. this very
message will have a Message-ID header when it is sent, and that will
be generated in a way which is globally unique. Usually it involves
concatenating enough random/unique stuff together, such as hostname,
PID, time, etc. The RFC I mentioned just formalises this process.
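A sketch of the "concatenate enough unique stuff" approach, assuming
the OCaml unix library. The layout (time, pid, some random bits,
hostname) is an illustrative choice, not the format of any particular
mailer or of RFC 4122; `message_id` is a hypothetical name.

```ocaml
(* Message-ID-style identifier: <time.pid.random@hostname> *)
let message_id () =
  Printf.sprintf "<%.6f.%d.%08x@%s>"
    (Unix.gettimeofday ())   (* current time, microsecond resolution *)
    (Unix.getpid ())         (* distinguishes processes on this host *)
    (Random.bits ())         (* 30 random bits, in case the above repeat *)
    (Unix.gethostname ())

let () =
  Random.self_init ();
  print_endline (message_id ())
```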

If you have a good source of random numbers, i.e. /dev/random or
/dev/urandom, then just gather 128 bits of randomness. It's
extremely unlikely that two objects will have the same ID -- in fact
it's _astronomically[1]_ more likely that your code has a mistake than
that two objects will be given the same ID.
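A minimal sketch of gathering 128 bits from /dev/urandom and
hex-encoding them, assuming a Unix-like system that provides that
device; only the standard library is needed. `random_hex_id` is a
hypothetical name.

```ocaml
(* Read [n] bytes (default 16 = 128 bits) from /dev/urandom and
   return them as 2*n lowercase hex digits. *)
let random_hex_id ?(n = 16) () =
  let ic = open_in_bin "/dev/urandom" in
  let bytes =
    Fun.protect ~finally:(fun () -> close_in ic)
      (fun () -> really_input_string ic n)
  in
  String.concat ""
    (List.init n (fun i -> Printf.sprintf "%02x" (Char.code bytes.[i])))

let () = print_endline (random_hex_id ())
```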

Rich.

[1] Or "economically" as in this quote from Feynman:
http://www.quotationspage.com/quote/26930.html

--
Richard Jones
Red Hat

Dario Teixeira | 2 Nov 22:45

Re: Re: How to identify the origin of a message using Unix module functions

> Just out of curiosity, what kind of assumptions can we
> usually make when
> dealing with hostname/IP/pid uniqueness in an
> interconnected application?

Hi,

Generally you can assume that IPs are unique. However, in NATed
environments IP uniqueness is trickier than it seems at first.
You can have two machines in different networks, each with IP
192.168.1.1. There is no IP routing conflict because each
router will NAT that private address into its own external
IP. If, however, your protocol relies on each process
identifying itself, then processes running on each of the two
machines can both claim they are running on a host with IP
192.168.1.1... Nevertheless, I am sure you won't have to worry
about this.

PID uniqueness is also a tricky question, one that depends on
whether the time interval matters. While at any given instant
every process on a machine has a different PID, PID numbers wrap
around, so a process p1 running at time t1 may have the same PID
as a process p2 running at time t2 (if p1 terminated between t1
and t2, of course). If you are running Linux, check
/proc/sys/kernel/pid_max, which tells you the wrap-around number.
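A small sketch of reading that value from OCaml, using only the
standard library. It returns `None` where the file does not exist (on
Unixes other than Linux, for instance); `pid_max` is a hypothetical
name.

```ocaml
(* Linux only: the value at which PIDs wrap around, if available. *)
let pid_max () =
  match open_in "/proc/sys/kernel/pid_max" with
  | exception Sys_error _ -> None
  | ic ->
      let line =
        Fun.protect ~finally:(fun () -> close_in ic)
          (fun () -> input_line ic)
      in
      int_of_string_opt (String.trim line)

let () =
  match pid_max () with
  | Some n -> Printf.printf "PIDs wrap around at %d\n" n
  | None -> print_endline "pid_max not available on this system"
```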

One quick and dirty solution to your problem (dirty in the sense
that it only works in probabilistic, not provable, terms) is to
generate a large (say, 256-bit) random number and use that as
your UUID. Sure, there's always an infinitesimal chance of a
collision, but I reckon there's a higher chance that all the
protons in your body will spontaneously decay before breakfast
tomorrow.
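To put a number on "infinitesimal", the usual birthday-bound estimate
says that among n independently random b-bit identifiers, the
probability of any collision is at most about n^2 / 2^(b+1). A small
sketch that evaluates it (`collision_probability` is a hypothetical
name; the trillion-identifier workload is only an example):

```ocaml
(* Birthday bound: P(any collision among n random [bits]-bit IDs)
   is at most about n^2 / 2^(bits + 1). *)
let collision_probability ~n ~bits =
  (n *. n) /. (2. ** float_of_int (bits + 1))

let () =
  (* e.g. one trillion identifiers *)
  Printf.printf "128-bit: %g\n" (collision_probability ~n:1e12 ~bits:128);
  Printf.printf "256-bit: %g\n" (collision_probability ~n:1e12 ~bits:256)
```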

Cheers,
Dario Teixeira

code17 | 3 Nov 00:03

Re: How to identify the origin of a message using Unix module functions

Again, thanks to both Dario and Richard for the detailed explanations.
Now I understand the problem better.

If we were building a particular application, then there could be some
assumptions/restrictions about its target working environment that we
could make use of, and there might be some encodings that can be
*proved* to be collision-free under those preconditions.

However, in more general cases such as ours (a library with no prior
knowledge of the applications it might be integrated into, and no wish
to impose extra restrictions on them), an encoding based on random
numbers with an infinitesimal collision probability is probably the
right way to go.

I hope I understand it correctly.

Thanks!

