Bernd Petrovitsch | 6 Nov 2005 23:03
Picon
Favicon

[PATCH] Improvements in dot_draw plugin

The attached patch removes lots of duplicated stuff by adding a
ipc_send_str() function.
Furtehr a few typos are corrected and a function and several variables
were made "static".
It is against 0.4.9.

The open questions are:
- Do you actually appreciate such patches?
  And in that format?
- There severe problems there also: The use of sprintf() is dangerous
  since the function doesn't know how long the buffer.
  The obvious fix is to replace it (unconditionally) with the - now
  common on glibc systems - snprintf() (which I could do). The question
  is if non-Linux-OSes have a problem with such a fix.
  I'm also not a friend of strcpy()/strcat() for the same reason.
  And BTW strncpy()/strncat() have the problem that they do not
  guarantee a terminating \0 character in the string.
- Do you want more such "cleanup" patches?
  And can they be done so that *BSD and Win* do not suffer from
  Linux-isms I'm probably introducing?

	Bernd
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services

Attachment (dot-draw.patch): text/x-patch, 5085 bytes
(Continue reading)

Thomas Lopatic | 7 Nov 2005 10:47
Picon

snprintf(), ...

Hi Bernd,

[Bernd suggested to use snprintf() et al. instead of sprintf(), strcat(),
...]

On Win32 olsrd uses msvcrt.dll, which ships with Windows, as its C
library. I've just checked and the version on my system (XP Service Pack
2) has snprintf(), strncpy(), and strncat(). However, I am not sure since
which Windows version these functions exist inside msvcrt.dll. So maybe
we'd break compatibility with earlier Windows versions by using them. But
then again, there should BSD-licensed source code available somewhere for
them, which we could include with olsrd for platforms that do not support
them natively.

-Thomas
Bernd Petrovitsch | 7 Nov 2005 10:56
Picon
Favicon

Re: snprintf(), ...

On Mon, 2005-11-07 at 10:47 +0100, Thomas Lopatic wrote:
[...]
> [Bernd suggested to use snprintf() et al. instead of sprintf(), strcat(),
> ...]
> 
> On Win32 olsrd uses msvcrt.dll, which ships with Windows, as its C
> library. I've just checked and the version on my system (XP Service Pack
> 2) has snprintf(), strncpy(), and strncat(). However, I am not sure since
> which Windows version these functions exist inside msvcrt.dll. So maybe
> we'd break compatibility with earlier Windows versions by using them. But
> then again, there should BSD-licensed source code available somewhere for
> them, which we could include with olsrd for platforms that do not support
> them natively.

A license compatible to the olsrd license is enough since the simple
integration is "put the source into the tree and add a #define for it".
I have to take a closer look into the Makefile to see what's the
simplest solution.

	Bernd
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
Thomas Lopatic | 7 Nov 2005 11:06
Picon

Re: snprintf(), ...

[adding source code for snprintf() to the olsrd source code]

>> But
>> then again, there should BSD-licensed source code available somewhere
>> for
>> them, which we could include with olsrd for platforms that do not
>> support
>> them natively.
>
> A license compatible to the olsrd license is enough since the simple
> integration is "put the source into the tree and add a #define for it".
> I have to take a closer look into the Makefile to see what's the
> simplest solution.

Wait a second... Is this really the case? Would we (or other people) be
allowed to distribute binary-only versions for, for example, Windows if we
used a GPLed implementation of snprintf() in olsrd?

Hmmm. Maybe we should simply use OpenBSD's implementation. We'd then
probably have the world's safest snprintf() in olsrd. :-) And it would
come with a BSD license.

-Thomas
Bernd Petrovitsch | 7 Nov 2005 11:46
Picon
Favicon

Re: snprintf(), ...

On Mon, 2005-11-07 at 11:06 +0100, Thomas Lopatic wrote:
> [adding source code for snprintf() to the olsrd source code]
[...]
> > A license compatible to the olsrd license is enough since the simple
> > integration is "put the source into the tree and add a #define for it".
> > I have to take a closer look into the Makefile to see what's the
> > simplest solution.
> 
> Wait a second... Is this really the case? Would we (or other people) be
> allowed to distribute binary-only versions for, for example, Windows if we
> used a GPLed implementation of snprintf() in olsrd?

Probably no. What does the license.txt actually mean? BSD-like?

And it is actually an interesting question: Am I forced to provide the
source code if the only GPL source is in a separate .c and .h file but
actually the code is not present in the compiled (and handed out) output
because the program links aginst a (system-)library which provides the
desired function(ality).

> Hmmm. Maybe we should simply use OpenBSD's implementation. We'd then
> probably have the world's safest snprintf() in olsrd. :-) And it would

Yes, we all know about the bugs in (OpenBSDs original) openssh years
ago;-)

> come with a BSD license.

I don't care *which* one (as long as I don't have to write one).

(Continue reading)

Andreas Tønnesen | 7 Nov 2005 07:14

Re: [PATCH] Improvements in dot_draw plugin

Hi Bernd,

Thanks, patches are always appreciated :-)
But if you can make them against the current CVS version you'll make
life easier for everybody and you make sure you're not fixing something
that is already fixed.

Regarding the sprintf issue I agree that it is bad, but in this code the
string arguments passed to the %s format is always the result for the
olsr_ip_to_string fuction which I regard to be a "controlled" result.
In the core olsrd code there is (AFAIK) no use of potentially dangerous
string functions, but in plugins such as this I don't think this is a
big problem. But if you'd like to clean it up in a platform independent
way, feel free to do so :-)

- andreas

Bernd Petrovitsch wrote:
> The attached patch removes lots of duplicated stuff by adding a
> ipc_send_str() function.
> Furtehr a few typos are corrected and a function and several variables
> were made "static".
> It is against 0.4.9.
> 
> The open questions are:
> - Do you actually appreciate such patches?
>   And in that format?
> - There severe problems there also: The use of sprintf() is dangerous
>   since the function doesn't know how long the buffer.
>   The obvious fix is to replace it (unconditionally) with the - now
(Continue reading)

Bernd Petrovitsch | 8 Nov 2005 12:34
Picon
Favicon

Re: [PATCH] Improvements in dot_draw plugin

On Mon, 2005-11-07 at 07:14 +0100, Andreas Tønnesen wrote:
[...]
> string functions, but in plugins such as this I don't think this is a
> big problem. But if you'd like to clean it up in a platform independent
> way, feel free to do so :-)

Is "%m" in printf()s and the like considered portable?

	Bernd
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
Thomas Lopatic | 8 Nov 2005 12:43
Picon

Re: [PATCH] Improvements in dot_draw plugin

> Is "%m" in printf()s and the like considered portable?

This doesn't work on Windows. So I am afraid, no.

-Thomas
Bernd Petrovitsch | 8 Nov 2005 13:32
Picon
Favicon

Re: [PATCH] Improvements in dot_draw plugin

On Tue, 2005-11-08 at 12:43 +0100, Thomas Lopatic wrote:
> > Is "%m" in printf()s and the like considered portable?
> 
> This doesn't work on Windows. So I am afraid, no.

Hmm, do you ever plan to implement olsr_syslog (e.g. to write something
into the Win* event log)?

	Bernd
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
Thomas Lopatic | 8 Nov 2005 13:49
Picon

Re: [PATCH] Improvements in dot_draw plugin

> On Tue, 2005-11-08 at 12:43 +0100, Thomas Lopatic wrote:
>> > Is "%m" in printf()s and the like considered portable?
>>
>> This doesn't work on Windows. So I am afraid, no.
>
> Hmm, do you ever plan to implement olsr_syslog (e.g. to write something
> into the Win* event log)?

Hmm, no, not yet. Would that help us with the "%m" issue?

"%m" on Linux is easy, as every API function that we use in olsrd sets
errno. To implement "%m" the C library can thus simply take errno, run it
through strerror() and display the message.

On Windows it's bit more complicated. The Windows port currently uses two
sources of error information instead of the single "errno" macro known
from Linux: WSAGetLastError() and GetLastError(). The former is used to
obtain error messages supplied by the Windows socket implementation, the
latter is used for all other API functions. I could imagine that this is
also one of the reasons why "%m" is not supported by msvcrt.dll. Too many
sources of error information on the OS.

-Thomas
Bernd Petrovitsch | 8 Nov 2005 14:21
Picon
Favicon

Re: [PATCH] Improvements in dot_draw plugin

On Tue, 2005-11-08 at 13:49 +0100, Thomas Lopatic wrote:
> > On Tue, 2005-11-08 at 12:43 +0100, Thomas Lopatic wrote:
> >> > Is "%m" in printf()s and the like considered portable?
> >>
> >> This doesn't work on Windows. So I am afraid, no.
> >
> > Hmm, do you ever plan to implement olsr_syslog (e.g. to write something
> > into the Win* event log)?
> 
> Hmm, no, not yet. Would that help us with the "%m" issue?

No. We should than eliminate %m from olsr_syslog() calls.
Some of them are buggy anyways since using syslog() and printf() both
may change errno if there is an error.

> "%m" on Linux is easy, as every API function that we use in olsrd sets
> errno. To implement "%m" the C library can thus simply take errno, run it
> through strerror() and display the message.

Yup.

> On Windows it's bit more complicated. The Windows port currently uses two
> sources of error information instead of the single "errno" macro known
> from Linux: WSAGetLastError() and GetLastError(). The former is used to
> obtain error messages supplied by the Windows socket implementation, the
> latter is used for all other API functions. I could imagine that this is

So you have to call it outside of the olsr_syslog() call since you need
to know where look (or use two different olsr_syslog() functions).

(Continue reading)

Thomas Lopatic | 14 Nov 2005 22:02
Picon

Re: [PATCH] Improvements in dot_draw plugin

[...]

> Hmm, the obvious replacement is to use `"%s", strerror(errno)`. But this
> doesnt' help either.
> So basically the question is: Should we kill the %m in olsr_syslog()
> calls?
> Or should we kill %m in all possibly non-Unix code?

[...]

I'd suggest that we move to the '"%s", strerror(errno)' combination.
That's what we currently also use for printf()s. Luckily each source
code file currently requires either GetLastError() or WSAGetLastError()
but not both at the same time. So, I define "errno" to be one of these
two functions and "strerror()" to be my substitute function that turns
an error code into an error message. Have a look at parser.c or
socket_parser.c as an example of this.

If we need GetLastError() and WSAGetLastError() in the same source code
file one day, we can think of extending our solution to this problem. :-)

-Thomas
Bernd Petrovitsch | 5 Jan 2006 17:40
Picon
Favicon

Die "%m", die (was Re: [PATCH] Improvements in dot_draw plugin)

On Mon, 2005-11-14 at 22:02 +0100, Thomas Lopatic wrote:
> [...]
> > Hmm, the obvious replacement is to use `"%s", strerror(errno)`. But this
> > doesnt' help either.
> > So basically the question is: Should we kill the %m in olsr_syslog()
> > calls?
> > Or should we kill %m in all possibly non-Unix code?
> [...]
> 
> I'd suggest that we move to the '"%s", strerror(errno)' combination.
> That's what we currently also use for printf()s. Luckily each source
[...]
The attached patch against cvs-current kills all %m which are used for
printf(), syslog() and the like and replaces them with the above
solution.

I also replaced the ones in the Unix-specific directories because
- to be able to check if with a simple 
find -type f -name '*.[ch]' -print0 | xargs -0r fgrep %m /dev/null
- no one should be tempted or able to copy-paste them into "wrong"
  files.
- it actually fixes bugs since several of them are called after
  e.g. OLSR_PRINTF() or perror() and they surely clobber "errno"
  anyway.

The patch will probably break compilation on Win* since it need #defines
at the top.

Also some returned values by ioctl() and similar sys-calls are compared
with "== -1" instead of "< 0" since it is defined that way.
(Continue reading)

Roar Bjørgum Rotvik | 6 Jan 2006 09:54

Re: Die "%m", die (was Re: [PATCH] Improvements in dot_draw plugin)

Bernd Petrovitsch wrote:
> On Mon, 2005-11-14 at 22:02 +0100, Thomas Lopatic wrote:
[ ... ]
> The attached patch against cvs-current kills all %m which are used for
> printf(), syslog() and the like and replaces them with the above
> solution.
> 
> I also replaced the ones in the Unix-specific directories because
> - to be able to check if with a simple 
> find -type f -name '*.[ch]' -print0 | xargs -0r fgrep %m /dev/null
> - no one should be tempted or able to copy-paste them into "wrong"
>   files.
> - it actually fixes bugs since several of them are called after
>   e.g. OLSR_PRINTF() or perror() and they surely clobber "errno"
>   anyway.

I have some comments to this patch.

You indicate that errno may be clobbered by OLSR_PRINTF() or perror(),
but look at this description from "man 3 perror" on linux:
"The error number is taken  from  the external variable errno, which is
set when errors occur but not cleared when non-erroneous calls are made."

That means that if a system call returns error code (normally -1), then
errno is set to new error value. If the functions do not fail and return
an error code, the errno value is unchanged.

And by looking at the man page for fprintf (as OLSR_PRINTF uses), it may
return an negative value but the man page does not specify that the
errno is changed or set to a spesific value.
(Continue reading)

Bernd Petrovitsch | 8 Jan 2006 21:00
Picon
Favicon

Re: Die "%m", die (was Re: [PATCH] Improvements in dot_draw plugin)

On Fri, 2006-01-06 at 09:54 +0100, Roar Bjørgum Rotvik wrote:
> Bernd Petrovitsch wrote:
> > On Mon, 2005-11-14 at 22:02 +0100, Thomas Lopatic wrote:
> [ ... ]
> > The attached patch against cvs-current kills all %m which are used for
> > printf(), syslog() and the like and replaces them with the above
> > solution.
> > 
> > I also replaced the ones in the Unix-specific directories because
> > - to be able to check if with a simple 
> > find -type f -name '*.[ch]' -print0 | xargs -0r fgrep %m /dev/null
> > - no one should be tempted or able to copy-paste them into "wrong"
> >   files.
> > - it actually fixes bugs since several of them are called after
> >   e.g. OLSR_PRINTF() or perror() and they surely clobber "errno"
> >   anyway.
> 
> I have some comments to this patch.

Good, than someone actually reads it;-)

> You indicate that errno may be clobbered by OLSR_PRINTF() or perror(),
> but look at this description from "man 3 perror" on linux:
> "The error number is taken  from  the external variable errno, which is
> set when errors occur but not cleared when non-erroneous calls are made."
> 
> That means that if a system call returns error code (normally -1), then
> errno is set to new error value. If the functions do not fail and return
> an error code, the errno value is unchanged.
> 
(Continue reading)

Roar Bjørgum Rotvik | 9 Jan 2006 10:33

Re: Die "%m", die (was Re: [PATCH] Improvements in dot_draw plugin)

Bernd Petrovitsch wrote:
> On Fri, 2006-01-06 at 09:54 +0100, Roar Bjørgum Rotvik wrote:
> 
>>Bernd Petrovitsch wrote:
>>
>>>On Mon, 2005-11-14 at 22:02 +0100, Thomas Lopatic wrote:
>>
>>[ ... ]
>>
>>>The attached patch against cvs-current kills all %m which are used for
>>>printf(), syslog() and the like and replaces them with the above
>>>solution.
>>>
>>>I also replaced the ones in the Unix-specific directories because
>>>- to be able to check if with a simple 
>>>find -type f -name '*.[ch]' -print0 | xargs -0r fgrep %m /dev/null
>>>- no one should be tempted or able to copy-paste them into "wrong"
>>>  files.
>>>- it actually fixes bugs since several of them are called after
>>>  e.g. OLSR_PRINTF() or perror() and they surely clobber "errno"
>>>  anyway.
>>
>>I have some comments to this patch.
> 
> Good, than someone actually reads it;-)
> 
>>Neither the manpages for fprintf() or perror() says that errno is set if
>>an error occours, so I don't think that errno is clobbered by these
>>functions. Do you know otherwise since you said "they surely clobber
>>"errno" anyway"?
(Continue reading)

Bernd Petrovitsch | 9 Jan 2006 11:01
Picon
Favicon

Re: Die "%m", die

On Mon, 2006-01-09 at 10:33 +0100, Roar Bjørgum Rotvik wrote:
[....]
> But a little note:
> The error string from strerror() is a static string that may be changed
> by either sterror() or perror(). From "man strerror":
> "The strerror() function returns a string describing the error code
> passed in the argument errnum, possibly using the LC_MESSAGES part of
> the current locale to select the appropriate language.  This string must
> not be modified by the application, but may be modified by a subsequent
> call to perror() or strerror(). No library function will modify this
> string."

Ooops.

[...]
> This means that the call to strerror() will store the error string from
> sendto(), but as the man page for strerror() indicates, the call to
> perror() may overwrite this static string and the call to olsr_syslog()
> may in fact return error string from perror()... :)

Yes.

> What about storing the actual errno instead of the string as in this
> example?
> 	if (olsr_sendto(...) < 0)
>  	{
>           const int my_errno = errno;
>  	  perror("sendto(v4)");
> 	  olsr_syslog(OLSR_LOG_ERR, "OLSR: sendto IPv4 %s", strerror (my_errno));
>  	  netbufs[ifp->if_nr]->pending = 0;
(Continue reading)

Thomas Lopatic | 9 Jan 2006 11:34
Picon

Re: Re: Die "%m", die

Hey Bernd,

First of all, thanks for your thorough code review efforts! We greatly
appreciate it that somebody takes the time to clean up the code of olsrd.

[...]

> My strategic problem with the error reporting actually is: I can't see
> in a simple way, what should be reported when and via which means and
> how should it be configurable (at compile-time and at run-time).
> There is perror() at some places, there is the OLSR_FPRINTF macro, there
> is syslog() and in different combination at different places.
> 
> Perhaps Andreas T. or Thomas L. can put some light on this (and I'm not
> so interested in the history but more on the goal or features)?!

Erm, there isn't any goal behind the diversity of error reporting
mechanisms in olsrd, there's only history. :-) So, I agree that making
the error reporting use the same function in all places is a good idea.

> Im my simple world, a function like "void olsr_error(int level, const
> char * const format, ...);"[0] should hide all the magic (probalby with
> configuration variables) if perror() should be used, syslog(), or
> whatever and how the messages should be filtered.

The OLSR_PRINTF macro is in place so that, on platforms that do not have
much CPU power, we can easily remove things like the following from the
code:

OLSR_PRINTF(3, "The result of the really expensive debug operation is
(Continue reading)

Andreas Tønnesen | 9 Jan 2006 11:58

Re: Re: Die "%m", die


> Andreas - did I miss anything?

Nope, I agree. Now just some tiny words of history. There used to be a
olsr_print function, but this was replaced with the macro a couple of
revisions ago for the reasons Thomas put up. Printouts for high
debuglevels tend to be called very intensive and did cost a lot of CPU as
shown by profiling. We discussed having debug options fixed at build time,
but it makes debugging harder so we went with the define solution(which
also allows us to remove the debug statements completly at buildtime).
Also, there is really no big philosophy behind the stderr printout vs.
syslog.  I've mainly just taken the approach that you syslog what you want
an admin to know and possibly report to us(but not create gigantic logs)
and you print what you want people debugging/developing to know. The
syslog set is pretty much always a subset of the print set, but I think
they should be regarded as two different entities.
Now I'm sure this can be handled better but there has not been any effort
put into cleaning this up. Suggestions for changes are always welcome :-)

- Andreas
Bernd Petrovitsch | 9 Jan 2006 15:40
Picon
Favicon

Re: Re: Die "%m", die

On Mon, 2006-01-09 at 11:58 +0100, Andreas Tønnesen wrote:
> > Andreas - did I miss anything?
> 
> Nope, I agree. Now just some tiny words of history. There used to be a

I feared that there is only history.

> olsr_print function, but this was replaced with the macro a couple of
> revisions ago for the reasons Thomas put up. Printouts for high
> debuglevels tend to be called very intensive and did cost a lot of CPU as
> shown by profiling. We discussed having debug options fixed at build time,
> but it makes debugging harder so we went with the define solution(which
> also allows us to remove the debug statements completly at buildtime).
> Also, there is really no big philosophy behind the stderr printout vs.
> syslog.  I've mainly just taken the approach that you syslog what you want
> an admin to know and possibly report to us(but not create gigantic logs)
> and you print what you want people debugging/developing to know. The
> syslog set is pretty much always a subset of the print set, but I think
> they should be regarded as two different entities.
> Now I'm sure this can be handled better but there has not been any effort
> put into cleaning this up. Suggestions for changes are always welcome :-)

Features:
- several debug levels as specified now e.g. in the config file for
  run-time selection.
  Which number range BTW? The numbers should somewhat correlate with
  the LOG_ERR levels of syslog() if we use that (and I don't know about
  the Win view on that).
- a #define for compile-time "selection" of said levels.
- indepently selection of "print to stderr" and "syslog" it.
(Continue reading)

Andreas Tønnesen | 9 Jan 2006 16:12

Re: Re: Die "%m", die


> Features:
> - several debug levels as specified now e.g. in the config file for
>   run-time selection.
>   Which number range BTW? The numbers should somewhat correlate with
>   the LOG_ERR levels of syslog() if we use that (and I don't know about
>   the Win view on that).
> - a #define for compile-time "selection" of said levels.
> - indepently selection of "print to stderr" and "syslog" it.

Sounds fair enough. But should the debug level be a "level" like it is
today, or rather a mask whare we can categorize the debug info better? The
latter may become to complicated(user-wise) unless one is fairly strict
with the allowed amount of flags. Still, it makes it easier for devs to
request the exact output needed.
And should there be a shared parameter for loglevel and debuglevel? IMO if
we are to introduce a common function for logging to syslog and stderr we
should be able to log to both in the same call. Wouldn't this be
problematic if debuglevel is shared with log levels?

- Andreas
Bernd Petrovitsch | 9 Jan 2006 17:04
Picon
Favicon

Re: Re: Die "%m", die

On Mon, 2006-01-09 at 16:12 +0100, Andreas Tønnesen wrote:
> > Features:
> > - several debug levels as specified now e.g. in the config file for
> >   run-time selection.
> >   Which number range BTW? The numbers should somewhat correlate with
> >   the LOG_ERR levels of syslog() if we use that (and I don't know about
> >   the Win view on that).
> > - a #define for compile-time "selection" of said levels.
> > - indepently selection of "print to stderr" and "syslog" it.
> 
> Sounds fair enough. But should the debug level be a "level" like it is
> today, or rather a mask whare we can categorize the debug info better? The

Do we actually have categories that make sense?

> latter may become to complicated(user-wise) unless one is fairly strict
> with the allowed amount of flags. Still, it makes it easier for devs to
> request the exact output needed.
> And should there be a shared parameter for loglevel and debuglevel? IMO if
> we are to introduce a common function for logging to syslog and stderr we
> should be able to log to both in the same call. Wouldn't this be

ACK. I don't think it makes sens to have 3 parameters just to deduce
syslog-level, printf-level and a mask for categories since next to no
one will really understand it (and hey, that's just a tool within the
source and not a primary feature;-).

> problematic if debuglevel is shared with log levels?

That's the question. I wouldn't forget (or ignore) simplicity - both for
(Continue reading)

Andreas Tønnesen | 9 Jan 2006 18:19

Re: Re: Die "%m", die


Bernd Petrovitsch wrote:
>>Sounds fair enough. But should the debug level be a "level" like it is
>>today, or rather a mask whare we can categorize the debug info better? The
> 
> 
> Do we actually have categories that make sense?

 From a devs perspective I certainly think so. It might be interesting 
debugging just the scheduler, NIC changes, neighor set changes,
packet reception etc. without having all the other debug output
automatically applied. The dager is that you might end up with one flag
for every smallest functionallity.
But from a "system" point of view I don't know how useful this is and
it might be cumbersome for end-users to use. But then again one could
create predefined debug levels from these groups which woul create an
easy interface for users...
might be an overkill though :-)

> That's the question. I wouldn't forget (or ignore) simplicity - both for
> the admin/user choosing a debug value/mask and the developer which has
> to decide on the level and/or bits in th mask for every single message.

Yepp.

> To answer that question: IMHO "no". It is quite simple. And I usually in
> my code, the log levels are to the syslog levels. On the first run, the
> values are more or less random since they are development driven, but
> once the code stabilized the messages are marked more for "maintenance"
> use.
(Continue reading)

onelektra | 15 Nov 2005 02:37
Picon
Gravatar

Seems like a bug...

Hi -

I just made a Slackware Package from cvs and applied some changes to my 
olsrd.conf.

HelloValidity is 100.0 but the deamon tells me this:

  *** olsr.org - 0.4.10-pre ***
  Build date: Nov 14 2005

WARNING: wlan0 HELLO validity time set to 1000.0 seconds!
WARNING: eth0 HELLO validity time set to 1000.0 seconds!

That's the relevant part in my olsrd.conf:

LinkQualityWinSize      100

     HelloInterval 10.0
     HelloValidityTime   100.0

     TcInterval 3.0
     TcValidityTime      30.0

     MidInterval 5.0
     MidValidityTime     50.0

     HnaInterval 5.0
     HnaValidityTime     50.0

cu elektra
(Continue reading)

Thomas Lopatic | 15 Nov 2005 11:05
Picon

Re: Seems like a bug...

Hey Elektra.

[...]

> WARNING: wlan0 HELLO validity time set to 1000.0 seconds!
> WARNING: eth0 HELLO validity time set to 1000.0 seconds!
> 
> That's the relevant part in my olsrd.conf:
> 
> LinkQualityWinSize      100
> 
>     HelloInterval 10.0
>     HelloValidityTime   100.0

[...]

It's a feature!! With a LQ window size of 100 and a HELLO interval of 10
seconds olsrd now requires a minimal HELLO validity time of 100 x 10
seconds = 1000 seconds. If you specify a shorter validity time, olsrd
automatically ups this to 1000 seconds. That's what the warning is
about. With a window size of 100 we do not want to lose a link after 100
seconds, i.e. after only 10 missed HELLOs. Or am I missing something here?

Hmmm. Maybe the warning message is a bit misleading as it suggests that
the user has set the validity time to 1000 seconds.

-Thomas
Sven-Ola Tuecke | 15 Nov 2005 12:55
Picon

Re: Seems like a bug...

Thomas,

is it possible to change that, i.e. have a shorter valtime than 
win*interval? Oh yes - what for: Like to have up to 16 minutes measurement 
window for more precision but don't want a dead link in the tables for that 
long time. Do I miss something hiere, or is it just a matter of discarding 
that plausicheck?

Another elektra-suggestion: TCs twice as fast as Hellos for more routing 
stability in case of a lossy link...

LG Sven-Ola

"Thomas Lopatic" <thomas <at> lopatic.de> schrieb im Newsbeitrag 
news:4379B300.3040508 <at> lopatic.de...
> Hey Elektra.
>
> [...]
>
>> WARNING: wlan0 HELLO validity time set to 1000.0 seconds!
>> WARNING: eth0 HELLO validity time set to 1000.0 seconds!
>>
>> That's the relevant part in my olsrd.conf:
>>
>> LinkQualityWinSize      100
>>
>>     HelloInterval 10.0
>>     HelloValidityTime   100.0
>
> [...]
(Continue reading)

Thomas Lopatic | 16 Nov 2005 00:51
Picon

Re: Seems like a bug...

[for ETX olsrd currently enforces a lower limit for the HELLO validity
time based on the HELLO interval and the LQ window size]

> is it possible to change that, i.e. have a shorter valtime than
> win*interval?

Sure. I've just committed a fix to CVS. If the user doesn't specify
anything, the HELLO validity time is window size x HELLO interval. If
the user specifies a HELLO validity time, his or her choice has
precedence over the default.

-Thomas
onelektra | 15 Nov 2005 22:39
Picon
Gravatar

Improvements in the algorithm, and bugs that are fixed in the upcoming release


Hi all -

let me introduce you to some thoughts and off-list discussions that we
had here in Berlin about the occurance of routing loops in a mesh with
olsrd.

I stated the hypothesis that routing loops occur rather local than from
end-to-end in a mesh cloud. Olsrd with LQ-Extension attempts by design
to know all the best routes all over the whole mesh cloud, but it is
likely that it never will be able to achieve this in a mesh that has
more than a handful of nodes. TC informations are likely to be lost on
there way while they propagate over the mesh. And this likelyhood
increases with the number of nodes that forward TC information.

But this fact doesnt necessarily harm the routing of traffic if it
travels a long multihop path. A node at one end of a mesh cloud may have
the illusion to know the exact and best path its packages go when
sending payload to a node that is several hops away. But this
information may be pretty outdated and incomplete.

In fact all we can and have to achieve with the algorithm is a
reasonable choice for the next two or three hops. If our path is 8 hops,
for example, nodes that are more hops away from the node initiating
traffic and closer to the destination have better and more accurate
information about the best path. They dont know what a node that
initiates traffic thinks about the path its packets have to take - they
have more accurate routing information and will look into their routing
table and make a choice based on their knowledge.

(Continue reading)

Aaron Kaplan | 16 Nov 2005 00:29
Gravatar

Re: Improvements in the algorithm, and bugs that are fixed in the upcoming release


sounds like HLSL :)

by the way.... some time ago there was basically two choices of  
working / useful implementations: olsr and aodv. Aodv was just to  
buggy at that time. So - big deal, it became olsr :)))

Now when I look at http://en.wikipedia.org/wiki/Ad_hoc_protocol_list
I thought there might be quite a few very good ideas out there.

My personal favorite metric is: take the shortest route which  
generates the least
signal interference (i.e. hop from channel 1 -> 13 -> 6 ... etc)
Of course this is 802.11 specific...

best regards,
aaron.

On Nov 15, 2005, at 10:39 PM, onelektra wrote:

>
>
> Hi all -
>
> let me introduce you to some thoughts and off-list discussions that we
> had here in Berlin about the occurance of routing loops in a mesh with
> olsrd.
>
> I stated the hypothesis that routing loops occur rather local than  
> from
(Continue reading)

onelektra | 16 Nov 2005 17:55
Picon
Gravatar

Re: Improvements in the algorithm, and bugs that are fixed in the upcoming release

Hi -

> My personal favorite metric is: take the shortest route which  generates 
> the least
> signal interference (i.e. hop from channel 1 -> 13 -> 6 ... etc)
> Of course this is 802.11 specific...

well, this is not exactly what I consider a metric. Using multiple 
wireless interfaces on different frequencies would be really nice. Well, 
you can switch channels, of course. But how would the algorithm 
coordinate that - keeping track who is listening on which channel at a 
certain time? I'd rather go for several interfaces...

cu elektra
Dan | 16 Nov 2005 07:19
Picon
Favicon

RE: Improvements in the algorithm

It's good to see this sort of discussion on this list.  I've been doing some
thinking myself about this.

Basically the problem with OLSR is that it is a flat routing protocol.  That
is, it is not hierarchical.  In a traditional wired network, hierarchy can
be seen in the CIDR subnetting of IP address space.  
Routing in a wired network is achieved like so:
Packet destination address is 10.10.100.1 - the local router has a route for
10.10.0.0/16 and so sends it towards that /16 network.  The next router has
a route for 10.10.96.0/20 and so sends the packet towards that particular
/20 network.  The next router has a route for 10.10.100.0/28 and so sends
the packet to that network - hopefully that network is an Ethernet LAN and
it can then find it's way to 10.10.100.1 by doing an ARP (IP-to-MAC address)
lookup.  With the MAC address, the router on the destination network can
simply send the packet and the switch on the destination network will send
the packet to it's destination host.

There is a problem though.  If the node with the IP address of 10.10.100.1
moves to another area, it may no longer be within cabled range of the
10.10.100.0/28 network.  So it must change it's IP address.  This is also
true of wireless networks that are subnetted the same way - with each
geographical location covering a different range of IP addresses.  If a node
moves across a geographic subnet border it must change it's IP address.  Any
current data connections must be closed and re-established after the new
address is configured.  Also, because it has a new IP address, other nodes
may now not know where to find it.  Traditional methods of keeping track of
dynamic IP addresses, such as dynamic DNS or mobile IP are basically too
slow to keep track of a fast-roaming node, and they require centralised,
human-managed servers to maintain the identity-to-location lookup tables.

(Continue reading)

onelektra | 16 Nov 2005 18:00
Picon
Gravatar

Re: Improvements in the algorithm

Hi -

> Basically the problem with OLSR is that it is a flat routing protocol.  That
> is, it is not hierarchical. 

true, that's what olsr is - a routing protocol acting within a subnet or 
AS - autonomous system. Introducing several subnets would be something 
completely different... I dont expect any ad-hoc wireless network to 
scale on global level ;-) Its nice dreaming about this, though.

cu elektra
Dan | 17 Nov 2005 04:43
Picon
Favicon

RE: Improvements in the algorithm

> I dont expect any ad-hoc wireless network to
> scale on global level ;-) Its nice dreaming about this, though.
> 
> cu elektra

Hi elecktra,

Certainly I don't think wireless is the answer to global connectivity
issues.  In many aspects cables will always be better than radio waves.  But
now with networking so ubiquitous we are starting to see the real
limitations of the IP protocol.  It scales well but handles mobility badly.
Mesh routing is the opposite.  It handles mobility well but scales badly.

Read the bottom couple of paragraphs of my previous post in this list to get
to the good stuff.  It's called DART.  There are "scalable" adhoc routing
protocols that use various methods to build a hierarchy.  I believe your
suggestion closely matches what ZRP does.  Other methods use geographic
information to lighten the routing overhead load - so each mobile node needs
to have a GPS receiver installed.

DART is different.  It revamps the addressing scheme, so that each node has
a static identity, and a dynamic routing address that automatically changes
as the node roams the network.  The advantage of this is that routing tables
stay extremely small (logarithmic to the number of nodes), and address space
is allocated in the most efficient manner possible.  It also means that you
don't necessarily need to have a human-managed IP allocation system -
although IP addresses will be used as the static identities to maintain
compatibility with IP-dependant applications.  Theoretically, DART could be
completely self-organising, able to scale globally with no human
administration or centralised servers to provide critical network services
(Continue reading)

onelektra | 17 Nov 2005 13:21
Picon
Gravatar

Re: Improvements in the algorithm

Hi Dan -

thanks for your comprehensive post. Olsr doesn't scale to a large level 
and it has a flat subnet hierarchy that doesn't scale to a high level. I 
agree. I don't think we have a point of argument here.

The free networks community needs running code - so if someone comes up 
with a promising solution like DART and code that we can run on our 
machines - we are really glad to test it. That was the reason why we 
started to use mobilemesh first and later olsrd. Olsrd looked very 
promising - not because of the fancy algorithms but because there were 
several implementations supporting more than one OS.

We learned that many ideas that have been considered as optimizations in 
olsr harmed the stability of the ad-hoc network. At the moment we have 
disabled the MPR-feature and the hysteresis. We introduced the 
LQ-Extension and currently we are testing a fisheye-algorithm to allow 
TC broadcasts at higher frequency to avoid routing loops.

I'm tempted to rant at the academic approach to research mesh routing. 
In my opinion the scientific aproach spins far too much about the 
question: Is our algorithm scalable? How can we reduce protocol 
overhead? Rather than: Does it work in a wireless network full of 
collisions, lost packages and interference? I will shamelessly smile at 
any researcher that comes up with an new idea that finds shortest path 
routes at best ;-)

To me they seem to be inebriated by the visons popping up when they say 
'scalability'.

(Continue reading)

sebastian | 16 Nov 2005 10:09

Re: Improvements in the algorithm

hiya,

not sure if this is completely on topic, but atleast related.
I'm not sure how olsr handles route switching, but  this could be a
solution if it is a problem such a described below:

  say you(node A) can reach a node(C) through 2 other nodes(B and D), but
  link quality differs, sometimes B is better, sometimes D. at what point
  will this inflict a route change? as soon as one gets only slight
  better then the other? or is there a threshold? currently switching
  routes seems like a bad idea to me(takes up cpu and network traffic?)

another issue I think would be the big routing tables for big
networks(one route per host) seems atleast to take up memory and
increase network traffic(when you tell who you can reach)

now, the sugestion I have would move the resource issue from network and
memory to cpu.
say you have some control over how the network will look ip wise.
ofcourse it needs to be flexible as nodes can move around but most will
not (think city wide roof network for example). now you could choose
that atleast -initially- you give nodes in the west an ip within
10.2.3.0/24 and a node in south an ip in 10.2.4.0/24, for example. doing
this with a regular olsr setup won't gain you anything except you have
more of an idea what ip could be where. now if you have an option or a
plugin in olsr to let it -calculate- subnets for neighbour nodes. a node
in the east would only need one entry to find a node in the west,
initially.. this ofcourse could become less efficient as nodes move a
round a bit, but if the majority will stay in place it will, certainly
for bigger networks be quite a gain. an example to show what I mean:
(Continue reading)

onelektra | 16 Nov 2005 18:21
Picon
Gravatar

Re: Improvements in the algorithm

hi -

>   say you(node A) can reach a node(C) through 2 other nodes(B and D), but
>   link quality differs, sometimes B is better, sometimes D. at what point
>   will this inflict a route change? as soon as one gets only slight
>   better then the other? or is there a threshold? currently switching
>   routes seems like a bad idea to me(takes up cpu and network traffic?)

yes, according to Thomas there is a threshold implemented (personally I 
call it hysteresis).

> another issue I think would be the big routing tables for big
> networks(one route per host) seems atleast to take up memory and
> increase network traffic(when you tell who you can reach)

I see olsrd as a solution for a wireless ad-hoc subnet of up to 100 - 
300 nodes. I expect olsrd to handle this quite well in the not so far 
future. It works quite nice already, but with the stability issues under 
link saturation that I described in my post from yesterday.

This is what I expect olsrd to achieve. In order to build a network that 
goes beyond that one should have a look at the way the internet already 
works. To me there is no point in reinventing RIP, BGP, OSPF. If a 
community wants to have a WAN that interconnects more than a few 
hundered nodes they should have a look at the known art :-)

If one wants to develop a protocol that is able to route 16.7 million 
nodes within one subnet - carry on ;-)

cu elektra
(Continue reading)

Bernd Petrovitsch | 16 Nov 2005 09:53
Picon
Favicon

RE: Improvements in the algorithm

On Wed, 2005-11-16 at 17:19 +1100, Dan wrote:
[...]
> There is a problem though.  If the node with the IP address of 10.10.100.1
> moves to another area, it may no longer be within cabled range of the
> 10.10.100.0/28 network.  So it must change it's IP address.  This is also
> true of wireless networks that are subnetted the same way - with each
> geographical location covering a different range of IP addresses.  If a node
> moves across a geographic subnet border it must change it's IP address.  Any
> current data connections must be closed and re-established after the new
> address is configured.  Also, because it has a new IP address, other nodes
> may now not know where to find it.  Traditional methods of keeping track of
> dynamic IP addresses, such as dynamic DNS or mobile IP are basically too
> slow to keep track of a fast-roaming node, and they require centralised,
> human-managed servers to maintain the identity-to-location lookup tables.

And - especially - the security is the major problem. Think of ssh and
moaning about chnaged IP adresses for a given host key.
So - if you want security - you need also a working PKI as a
prerequisite.

And there is also another thing: At least in Vienna we have lots of
static nodes which basically do not move (since the antenna is mounted
on the top of the roof). For these it makes absolutely sense to have a
network/mask (as opposed to really mobile nodes).

	Bernd

PS: Sorry, I didn't read the rest of the mail up tonow.
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
(Continue reading)

Bernd Petrovitsch | 14 Nov 2005 23:26
Picon
Favicon

Re: [PATCH] Improvements in dot_draw plugin

On Mon, 2005-11-14 at 22:02 +0100, Thomas Lopatic wrote:
[...]
> I'd suggest that we move to the '"%s", strerror(errno)' combination.
> That's what we currently also use for printf()s. Luckily each source
> code file currently requires either GetLastError() or WSAGetLastError()
> but not both at the same time. So, I define "errno" to be one of these
> two functions and "strerror()" to be my substitute function that turns
> an error code into an error message. Have a look at parser.c or
> socket_parser.c as an example of this.

Yes, I stumbled over such #defines.

> If we need GetLastError() and WSAGetLastError() in the same source code
> file one day, we can think of extending our solution to this problem. :-)

OK.

	Bernd
--

-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
Bernd Petrovitsch | 8 Nov 2005 12:07
Picon
Favicon

Re: [PATCH] Improvements in dot_draw plugin

On Mon, 2005-11-07 at 07:14 +0100, Andreas Tønnesen wrote:
[...]
> Thanks, patches are always appreciated :-)
> But if you can make them against the current CVS version you'll make
> life easier for everybody and you make sure you're not fixing something
> that is already fixed.

ACK. And I (tried to) apply to former patch. A similar one against
current cvs is attached.

> Regarding the sprintf issue I agree that it is bad, but in this code the
> string arguments passed to the %s format is always the result for the
> olsr_ip_to_string fuction which I regard to be a "controlled" result.

Yes, of course I don't want to imply that anything is seriously broken
*now* (please insert definition of "seriously broken" here;-).
It is just to be sure that no (hidden) bug is introduced if e.g. some of
the controlled functions (like inet_ntoa()) is replaced at sometime in
the future with not-so-controlled functions (or even worse that parts of
these strings comes from the config-file or command line).
And I don't think that one (especially not me) checks *all* callers of a
function if such a change is made just to check that some buffer is
large enough after the change.

> In the core olsrd code there is (AFAIK) no use of potentially dangerous
> string functions, but in plugins such as this I don't think this is a

I will explain the potential weakness with the patches (even if they are
quite pathological cases).
And gcc (especially gcc-4.0x) has also a lot of nice warnings to find
(Continue reading)

Andreas Tønnesen | 8 Nov 2005 12:51

Re: [PATCH] Improvements in dot_draw plugin


> ACK. And I (tried to) apply to former patch. A similar one against
> current cvs is attached.

The patch is applied to cvs. I'll apply the cvsingore patch later on.

> BTW is there some (not so interesting) plugin or similar part of the
> code where the platform independency can be tried with simply committing
> a change and wait for complaints/bug reports (i.e. the compile breakage
> is not that serious).

Well... I guess if one want's a rather quick response one might as well
break olsrd. But the best way IMO, is probably to put the patch on the
dev-list and ask for confirmations from users of the various systems.

- Andreas

Gmane