Christopher Li | 4 Sep 03:16

Fwd: [PATCH 0/10] Sparse linker

Oops, forgot to CC the list.

Chris

---------- Forwarded message ----------
From: Chris Li <sparse <at> chrisli.org>
Date: Wed, Sep 3, 2008 at 6:08 PM
Subject: Re: [PATCH 0/10] Sparse linker
To: alexey.zaytsev <at> gmail.com

On Wed, Sep 3, 2008 at 2:55 PM,  <alexey.zaytsev <at> gmail.com> wrote:

> more on the subject, I now agree that we should include the
> intermediate code representation into the object files.

Good.

> for this is a four byte overhead prepended to every
> serializable structure by the allocation wrapper. Also, you

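I would rather not have those 4 bytes prepended to every
structure. Serialization is just one short stage in the life cycle
of those C structures. Having permanent extra space
just for that is unnecessary. The 4 bytes of metadata also
limit which C structures you can work on. All you need
is to be able to map a pointer to some serialized object,
to keep track of which objects are tracked and which are not.

After you have serialized the data, the metadata can be dropped
completely. So the price to pay is that for every unknown object
pointer, you need to do a dictionary lookup, and only during
the dumping stage. But that price is actually very small:
when you are dumping objects, you are mostly limited by the disk
anyway. The plus side is that you can work with any object.
You don't need to waste extra memory on serialization
when you are not serializing, and you can leave the
object allocation code unchanged.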
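
A minimal sketch of the side-table idea described above, assuming a fixed-size open-addressing hash table keyed by object address. The names `ptr_map`, `ptr_map_init`, `ptr_map_intern` and `ptr_map_free` are hypothetical and are not taken from sparse or from the patch series. The sketch only illustrates that the bookkeeping can live outside the objects and be discarded once the dump is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical side table: object address -> serial id.
 * The tracked objects themselves carry no extra header. */
struct ptr_map_entry {
    const void *ptr;    /* NULL marks an empty slot */
    unsigned id;
};

struct ptr_map {
    struct ptr_map_entry *slots;
    size_t capacity;    /* power of two, fixed in this sketch */
    size_t used;
};

static size_t hash_ptr(const void *p, size_t capacity)
{
    uintptr_t v = (uintptr_t)p;

    v ^= v >> 16;                   /* fold the high bits down */
    v *= (uintptr_t)0x9E3779B1u;    /* multiplicative scramble */
    v ^= v >> 13;
    return (size_t)v & (capacity - 1);
}

static int ptr_map_init(struct ptr_map *m, size_t capacity)
{
    size_t i;

    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    m->slots = malloc(capacity * sizeof(*m->slots));
    if (!m->slots)
        return -1;
    for (i = 0; i < capacity; i++)
        m->slots[i].ptr = NULL;
    m->capacity = capacity;
    m->used = 0;
    return 0;
}

static void ptr_map_free(struct ptr_map *m)
{
    free(m->slots);
    m->slots = NULL;
    m->capacity = m->used = 0;
}

/* Return the id already assigned to ptr, or assign the next free id.
 * *is_new is set to 1 the first time a pointer is seen, 0 otherwise. */
static unsigned ptr_map_intern(struct ptr_map *m, const void *ptr, int *is_new)
{
    size_t i;

    /* No growth in this sketch: one slot must always stay empty. */
    assert(ptr != NULL && m->used + 1 < m->capacity);
    i = hash_ptr(ptr, m->capacity);
    while (m->slots[i].ptr != NULL) {
        if (m->slots[i].ptr == ptr) {
            *is_new = 0;
            return m->slots[i].id;
        }
        i = (i + 1) & (m->capacity - 1);    /* linear probing */
    }
    m->slots[i].ptr = ptr;
    m->slots[i].id = (unsigned)m->used++;
    *is_new = 1;
    return m->slots[i].id;
}
```

During a dump, each pointer field would go through `ptr_map_intern()`: a new pointer means the target still has to be written out, a known one can be emitted as its id. Once the dump is finished, `ptr_map_free()` releases all of the bookkeeping.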

Alexey Zaytsev | 4 Sep 06:03

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 5:16 AM, Christopher Li <sparse <at> chrisli.org> wrote:
> Oops, forgot to CC the list.
>
> Chris
>
> ---------- Forwarded message ----------
> From: Chris Li <sparse <at> chrisli.org>
> Date: Wed, Sep 3, 2008 at 6:08 PM
> Subject: Re: [PATCH 0/10] Sparse linker
> To: alexey.zaytsev <at> gmail.com
>
>
> On Wed, Sep 3, 2008 at 2:55 PM,  <alexey.zaytsev <at> gmail.com> wrote:
>
>> more on the subject, I now agree that we should include the
>> intermediate code representation into the object files.
>
> Good.
>
>> for this is a four byte overhead prepended to every
>> serializable structure by the allocation wrapper. Also, you
>
> I would rather not have those 4 bytes prepended to every
> structure. Serialization is just one short stage in the life cycle
> of those C structures. Having permanent extra space
> just for that is unnecessary. The 4 bytes of metadata also
> limit which C structures you can work on. All you need
> is to be able to map a pointer to some serialized object,
> to keep track of which objects are tracked and which are not.
>

Christopher Li | 4 Sep 09:27

Re: [PATCH 0/10] Sparse linker

On Wed, Sep 3, 2008 at 9:03 PM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
> If I understand the question right, no. Every "sparse object" .so has a
> "struct ptr_list *symbols" entry (in fact, the only non-static entry) that
> points to the serialized ptr list of the "struct sold_symbol". The linker
> dlopen()'s the .so and hooks to the entry, for every input object file.
> After that, it simply calls ptr_list_concat() on the opened symbol lists,
> and serializes the resulting combined list. There is of course nothing
> wrong if we modify the data obtained from the .so, as it is cow-mmaped.
...
> Well, I serialize the data into C, and then compile it into .so, if
> that was the question. You might want to apply the first patch
> and look at the serialization-test output.

OK. I just realized that you are building a completely different kind
of "linker" than the one I have in mind.

Generating a C source file and letting gcc compile and link it is an
interesting idea, but I think it is a step backwards.

For starters, how do you handle the case where a symbol from your
input file conflicts with a function defined in the loader itself?
If I understand your plan correctly, I don't see how it can handle
the following case:

file a.c:

void foo(void) {
    printf("%p\n", &bar);
}

Alexey Zaytsev | 4 Sep 11:41

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 11:27 AM, Christopher Li <sparse <at> chrisli.org> wrote:
> On Wed, Sep 3, 2008 at 9:03 PM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
>> If I understand the question right, no. Every "sparse object" .so has a
>> "struct ptr_list *symbols" entry (in fact, the only non-static entry) that
>> points to the serialized ptr list of the "struct sold_symbol". The linker
>> dlopen()'s the .so and hooks to the entry, for every input object file.
>> After that, it simply calls ptr_list_concat() on the opened symbol lists,
>> and serializes the resulting combined list. There is of course nothing
>> wrong if we modify the data obtained from the .so, as it is cow-mmaped.
> ...
>> Well, I serialize the data into C, and then compile it into .so, if
>> that was the question. You might want to apply the first patch
>> and look at the serialization-test output.
>
> OK. I just realized that you are building a completely different kind
> of "linker" than the one I have in mind.
>
> Generating a C source file and letting gcc compile and link it is an
> interesting idea, but I think it is a step backwards.
>

No, that's not how it works. ;)
Please compile and run the code. And look at what is actually generated.
Or wait a bit, I'll try to describe the serialization process in more detail.

Christopher Li | 4 Sep 12:35

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
> No, that's not how it works. ;)
> Please compile and run the code. And look at what is actually generated.
> Or wait a bit, I'll try to describe the serialization process in more detail.
>

I did. It generates C *source* code like this:

=============cut =============
#include "test.sparse_declarations.c"

#define NULL ((void *)0)
static struct a_wrapper __a_0 = {
        .payload = {
                .d = 1,
                .b_ptr = &__b_0.payload,
        },
};
static struct b_wrapper __b_0 = {
        .payload = {
                .k = 11,
                .a_ptr = &__a_1.payload,
        },
};
============ paste ===========

I assume you intend to use a real compiler (gcc) to compile
and link that code, no?

I haven't fully understood how you use that piece of C code. But my

Alexey Zaytsev | 4 Sep 15:29

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 2:35 PM, Christopher Li <sparse <at> chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
>> No, that's not how it works. ;)
>> Please compile and run the code. And look at what is actually generated.
>> Or wait a bit, I'll try to describe the serialization process in more detail.
>>
>
> I did. It generates C *source* code like this:
>
> =============cut =============
> #include "test.sparse_declarations.c"
>
> #define NULL ((void *)0)
> static struct a_wrapper __a_0 = {
>        .payload = {
>                .d = 1,
>                .b_ptr = &__b_0.payload,
>        },
> };
> static struct b_wrapper __b_0 = {
>        .payload = {
>                .k = 11,
>                .a_ptr = &__a_1.payload,
>        },
> };
> ============ paste ===========
>
> I assume you intend to use a real compiler (gcc) to compile
> and link that code, no?
>

Alexey Zaytsev | 4 Sep 15:35

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 5:29 PM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
> On Thu, Sep 4, 2008 at 2:35 PM, Christopher Li <sparse <at> chrisli.org> wrote:
>> On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
>>> No, that's not how it works. ;)
>>> Please compile and run the code. And look at what is actually generated.
>>> Or wait a bit, I'll try to describe the serialization process in more detail.
>>>
>>
>> I did. It generates C *source* code like this:
>>
>> =============cut =============
>> #include "test.sparse_declarations.c"
>>
>> #define NULL ((void *)0)
>> static struct a_wrapper __a_0 = {
>>        .payload = {
>>                .d = 1,
>>                .b_ptr = &__b_0.payload,
>>        },
>> };
>> static struct b_wrapper __b_0 = {
>>        .payload = {
>>                .k = 11,
>>                .a_ptr = &__a_1.payload,
>>        },
>> };
>> ============ paste ===========
>>
>> I assume you intend to use a real compiler (gcc) to compile
>> and link that code, no?

Christopher Li | 4 Sep 21:04

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 6:35 AM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:

>> Ok, let me try to explain how the stuff works. Please note that in
>
> Ugh, my pretty code listings got corrupted by the bloody gmail.
> Here is a better version: http://zaytsev.su/explanation.txt

Thanks for your detailed explanation. It just confirms my reading of your
code. I stand by my original feedback:

- Using C source code as the output format is bad and unnecessary.
  It depends on gcc to process the intermediate C source file.

- Using dlopen to load the module does not give fine-grained control
  over which symbols need to be resolved and which don't. The linked
  sparse object code for the whole Linux kernel will be huge. Dynamically
  loading 300M bytes of .so file is not fun.

- I can see you link all the defined symbols together that way. In order to do
  inter-function checks effectively, we need to have the reverse mapping
  as well. It needs to perform tasks like this:
  "Get me a list of the functions that reference spin_lock()".

  If I am writing a spin_lock checker, I can look at who uses spin_lock
  and load only those functions as needed.
  That is much better than scanning every single kernel function to
  search for spin_lock calls.

- The extra 4 bytes per structure of on-disk storage can be eliminated.
  I agree you need some meta data to track the object before you dump
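
The reverse mapping asked for in the third point could look roughly like the sketch below. It assumes the linker records one (callee, caller) pair per reference and sorts them once, so that a checker can later find every user of a symbol with a binary search instead of scanning all functions. The types and functions here (`call_ref`, `build_reverse_index`, `callers_of`) are hypothetical and do not exist in sparse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: "caller contains a reference to callee". */
struct call_ref {
    const char *callee;
    const char *caller;
};

static int cmp_ref(const void *a, const void *b)
{
    const struct call_ref *x = a, *y = b;
    int c = strcmp(x->callee, y->callee);

    return c ? c : strcmp(x->caller, y->caller);
}

/* Build step (done once, at link time in this sketch): sort so that
 * all references to the same callee become contiguous. */
static void build_reverse_index(struct call_ref *refs, size_t n)
{
    qsort(refs, n, sizeof(*refs), cmp_ref);
}

/* Query step: binary-search for the first entry whose callee matches,
 * then count the run of matching entries. Returns NULL if there is none. */
static const struct call_ref *callers_of(const struct call_ref *refs, size_t n,
                                         const char *callee, size_t *count)
{
    size_t lo = 0, hi = n, end;

    assert(count != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (strcmp(refs[mid].callee, callee) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    end = lo;
    while (end < n && strcmp(refs[end].callee, callee) == 0)
        end++;
    *count = end - lo;
    return *count ? &refs[lo] : NULL;
}
```

A spin_lock checker built on such an index would call `callers_of(refs, n, "spin_lock", &count)` and load only the functions it gets back.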

Alexey Zaytsev | 4 Sep 22:21

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 11:04 PM, Christopher Li <sparse <at> chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 6:35 AM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
>
>>> Ok, let me try to explain how the stuff works. Please note that in
>>
>> Ugh, my pretty code listings got corrupted by the bloody gmail.
>> Here is a better version: http://zaytsev.su/explanation.txt
>
> Thanks for your detailed explanation. It just confirms my reading of your
> code. I stand by my original feedback:
>
> - Using C source code as the output format is bad and unnecessary.
>  It depends on gcc to process the intermediate C source file.
>
Mostly ack here, but I still think the C code has two advantages over
binaries: It's easy to read, and it's an easy way to get the shared
library filled with the data, see below.

The huge disadvantage is the time and the memory it takes to compile
the C code.

> - Using dlopen to load the module does not give fine-grained control
>  over which symbols need to be resolved and which don't. The linked
>  sparse object code for the whole Linux kernel will be huge. Dynamically
>  loading 300M bytes of .so file is not fun.

Here I have to disagree. Loading the data from an .so might actually be the
most efficient method. See, the bulk of the data in the .so is simply mmap'ed
read-only, with only the GOT being read-write, and when mapping with
RTLD_LAZY, the pointers are resolved only when you follow them, completely
transparently to us. You don't need the fine-grained control, the OS just does
the right thing for you.

Christopher Li | 4 Sep 23:24

Re: [PATCH 0/10] Sparse linker

On Thu, Sep 4, 2008 at 1:21 PM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
> Mostly ack here, but I still think the C code has two advantages over
> binaries: It's easy to read, and it's an easy way to get the shared
> library filled with the data, see below.

Nothing stops you from having a parsing tool that generates a readable
format from the object dump. But using C source as the primary way to
dump objects is letting the tail wag the dog. The on-disk format should
be optimized for the checker rather than for a human reader.

> The huge disadvantage is the time and the memory it takes to compile
> the C code.

And the run-time dependency on gcc.

> Here I have to disagree. Loading the data from an .so might actually be the
> most efficient method. See, the bulk of data of the .so is simply mmap'ed
> read-only, with only the GOT being read-write, and when mapping with
> RTLD_LAZY, the pointers are resolved only when you follow them, completely
> transparently to us. You don't need the fine-grained control, the OS just does
> the right thing for you. And if the checker needs to look at the bulk
> of the data,

Are you sure?

Quoting the man page:
===================
RTLD_LAZY
    Perform lazy binding. Only resolve symbols as the code that
references them is executed. If the symbol is never referenced, then

Alexey Zaytsev | 5 Sep 11:49

Re: [PATCH 0/10] Sparse linker

On Fri, Sep 5, 2008 at 1:24 AM, Christopher Li <sparse <at> chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 1:21 PM, Alexey Zaytsev <alexey.zaytsev <at> gmail.com> wrote:
>> Mostly ack here, but I still think the C code has two advantages over
>> binaries: It's easy to read, and it's an easy way to get the shared
>> library filled with the data, see below.
>
> Nothing stops you from having a parsing tool that generates a readable
> format from the object dump. But using C source as the primary way to
> dump objects is letting the tail wag the dog. The on-disk format should
> be optimized for the checker rather than for a human reader.
>
>> The huge disadvantage is the time and the memory it takes to compile
>> the C code.
>
> And the run-time dependency on gcc.
>
>> Here I have to disagree. Loading the data from an .so might actually be the
>> most efficient method. See, the bulk of data of the .so is simply mmap'ed
>> read-only, with only the GOT being read-write, and when mapping with
>> RTLD_LAZY, the pointers are resolved only when you follow them, completely
>> transparently to us. You don't need the fine-grained control, the OS just does
>> the right thing for you. And if the checker needs to look at the bulk
>> of the data,
>
> Are you sure?
>
> Quoting the man page:
> ===================
> RTLD_LAZY
>    Perform lazy binding. Only resolve symbols as the code that

Tommy Thorn | 4 Sep 03:54

Re: Fwd: [PATCH 0/10] Sparse linker

Christopher Li wrote:
> I would rather not have those 4 bytes prepended to every
> structure. Serialization is just one short stage in the life cycle
> of those C structures. Having permanent extra space
> just for that is unnecessary. The 4 bytes of metadata also
> limit which C structures you can work on. All you need
> is to be able to map a pointer to some serialized object,
> to keep track of which objects are tracked and which are not.
>
> After you have serialized the data, the metadata can be dropped
> completely. So the price to pay is that for every unknown object
> pointer, you need to do a dictionary lookup, and only during
> the dumping stage. But that price is actually very small:
> when you are dumping objects, you are mostly limited by the disk
> anyway. The plus side is that you can work with any object.
> You don't need to waste extra memory on serialization
> when you are not serializing, and you can leave the
> object allocation code unchanged.
>   

I concur, and just wanted to point out that this technique has been used
in garbage collectors for functional languages for the same reason:
the type information is very small and almost completely static, so there
is no need to replicate it all over the data. It does make marshaling (this
is the common terminology for what Alex calls "serialization") slightly more
complicated.
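
One way to picture this, as a rough sketch and not how any particular garbage collector or sparse itself does it: each structure type gets a single static descriptor listing where its pointer fields are and what type they point to, and the marshaler walks the object graph using those tables instead of a per-object header. The structs `a` and `b` below only mirror the field names from the generated code quoted in this thread; `type_desc`, `ptr_field`, `type_table` and `walk_reachable` are hypothetical names. The sketch also assumes all object pointers share one representation, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum type_id { TYPE_A, TYPE_B, TYPE_COUNT };

/* One pointer field: where it sits and the static type of its target. */
struct ptr_field {
    size_t offset;
    enum type_id target;
};

/* One descriptor per type, shared by every object of that type. */
struct type_desc {
    const char *name;
    size_t size;
    size_t nfields;
    const struct ptr_field *fields;
};

struct b;
struct a { int d; struct b *b_ptr; };
struct b { int k; struct a *a_ptr; };

static const struct ptr_field a_fields[] = { { offsetof(struct a, b_ptr), TYPE_B } };
static const struct ptr_field b_fields[] = { { offsetof(struct b, a_ptr), TYPE_A } };

static const struct type_desc type_table[TYPE_COUNT] = {
    [TYPE_A] = { "a", sizeof(struct a), 1, a_fields },
    [TYPE_B] = { "b", sizeof(struct b), 1, b_fields },
};

#define MAX_SEEN 64

/* Tiny visited set; a real marshaler would use a hash table. */
struct walk_state {
    const void *seen[MAX_SEEN];
    size_t nseen;
};

static int was_seen(const struct walk_state *w, const void *obj)
{
    size_t i;

    for (i = 0; i < w->nseen; i++)
        if (w->seen[i] == obj)
            return 1;
    return 0;
}

/* Count the objects reachable from obj, visiting each one once.
 * Stops descending when the visited set is full. */
static size_t walk_reachable(struct walk_state *w, const void *obj, enum type_id type)
{
    const struct type_desc *desc;
    size_t i, total = 1;

    assert(type < TYPE_COUNT);
    if (obj == NULL || was_seen(w, obj) || w->nseen == MAX_SEEN)
        return 0;
    w->seen[w->nseen++] = obj;
    desc = &type_table[type];
    for (i = 0; i < desc->nfields; i++) {
        void *child;

        /* Read the pointer stored at the described offset. */
        memcpy(&child, (const char *)obj + desc->fields[i].offset, sizeof(child));
        total += walk_reachable(w, child, desc->fields[i].target);
    }
    return total;
}
```

The extra complexity Tommy mentions shows up here as the need to pass the static type along with every pointer, since the object itself no longer says what it is.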

Tommy
