David Finlayson | 3 Sep 00:36

Binary file I/O performance problems

I've been working on my first Smalltalk program which needs to read
and write large c structs from a binary file. I wrote two classes
BinaryStreamReader and BinaryStreamWriter that take a stream and can
read (or write) all of the integer and floating point types I need
(also handles byte-swapping if necessary). I wrote a test program that
focuses on just reading a small (for us) 123 MB data file on disk. The
program takes about 166 seconds to run compared to 1.2 seconds for an
equivalent C version (roughly 140x faster than the Squeak version).

As an example of the style of code I've written, here is the method
that reads an unsigned 32-bit integer:

uint32
	" returns the next unsigned, 32-bit integer from the binary stream "
	" see PositionableStream for original implementation."
	| n a b c d |
	isBigEndian
		ifTrue:
			[ a := stream next.
			b := stream next.
			c := stream next.
			d := stream next ]
		ifFalse:
			[ d := stream next.
			c := stream next.
			b := stream next.
			a := stream next ].
	((((a notNil and: [ b notNil ]) and: [ c notNil ])) and: [ d notNil])
		ifTrue:
			[ n := a.
[...]

Randal L. Schwartz | 3 Sep 01:04

Re: Binary file I/O performance problems

>>>>> "David" == David Finlayson <dfinlayson <at> usgs.gov> writes:

David> 	((((a notNil and: [ b notNil ]) and: [ c notNil ])) and: [ d notNil])
David> 		ifTrue:
David> 			[ n := a.
David> 			n := (n bitShift: 8) + b.
David> 			n := (n bitShift: 8) + c.
David> 			n := (n bitShift: 8) + d ]
David> 		ifFalse: [ n := nil ].

This screams for an "early answer" assistant method, something like:

computeSomething
        a ifNil: [^nil].
        b ifNil: [^nil].
        c ifNil: [^nil].
        d ifNil: [^nil].
        ^(the code with all the bitshifts).

Actually, perhaps even the use of a good detect: would be right here, if you
didn't have a, b, c, d as instvars.  In fact, that's much more likely an array
instead of four instvars, which would simplify all the repeated code.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn <at> stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

David Finlayson | 3 Sep 02:09

Re: Binary file I/O performance problems

Thanks for the style pointers. I'm a scientist, not a programmer, so
it will be rough going while I learn.

What I wanted was an exception (try/except) in case any of the reads
failed. Corrupt files are an expected case that should be handled by
the program. So I can't crash while reading (or writing). Does Squeak
have exceptions? Or is there a Smalltalk pattern for this "try to
execute this, do something else if it fails"? That answer should
probably go into another thread.

David

David Finlayson | 3 Sep 08:00

Re: Binary file I/O performance problems

OK - I made some of the suggested changes. I broke the readers into two parts:

uint32
	"returns the next unsigned, 32-bit integer from the binary
	stream"
	isBigEndian
		ifTrue: [^ self nextBigEndianNumber: 4]
		ifFalse: [^ self nextLittleEndianNumber: 4]

Where nextLittleEndianNumber looks like this:

nextLittleEndianNumber: n
	"Answer the next n bytes as a positive Integer or
	LargePositiveInteger, where the bytes are ordered from least
	significant to most significant.
	Copied from PositionableStream"
	| bytes s |
	[bytes := stream next: n.
	s := 0.
	n
		to: 1
		by: -1
		do: [:i | s := (s bitShift: 8)
						bitOr: (bytes at: i)].
	^ s]
		on: Error
		do: [^ nil]

This (I think) cleans up some of the code smell, but for only marginal
performance improvements. It seems that I may need to implement a
[...]
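
For readers comparing against the C version, here is a minimal sketch of the same byte-assembly step working on bytes that are already in memory. It is not the C program timed above; the name decode_u32 and its big_endian flag are assumptions made for this example.

```c
#include <stdint.h>

/* Assemble a 32-bit unsigned integer from the four bytes at p.
 * big_endian != 0: p[0] is the most significant byte.
 * big_endian == 0: p[0] is the least significant byte.
 * Hypothetical helper for illustration only. */
uint32_t decode_u32(const uint8_t *p, int big_endian)
{
    if (big_endian) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Given the bytes 01 02 03 04, the big-endian reading is 0x01020304 and the little-endian reading is 0x04030201.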

Klaus D. Witzel | 3 Sep 11:53

Re: Binary file I/O performance problems

Hi David,

let me respond in "reverse" order of your points:

> I find it troubling that I am having to write code below the
> abstraction level of C to read and write data from a file.  I thought
> Smalltalk was supposed to free me from this kind of drudgery? Right
> now, Java looks good and Python/Ruby look fantastic by comparison.

The difference with Squeak/Smalltalk is that intermediate-level
routines like #uint32 are available at the Smalltalk language level,
where users can see them, use them and modify them. Smalltalk users see
this openness as an invaluable resource. It has a price, yes.

But Squeak/Smalltalk can go faster, dramatically faster than what you
observed. The .image file (tens to hundreds of MB) is read from disk and
has its endianness fixed in a second or so. Of course this is possible
only because the file is in a ready-to-use format, but it can be a clue
if you want to consider alternative input methods.

> This (I think) cleans up some of the code smell, but for only marginal
> performance improvements. It seems that I may need to implement a
> buffer on the binary stream. Is there a good example on how this
> should be done in the image or elsewhere?

I don't know of a particular example (specialized somehow on your problem  
at hand, for buffered reading of arbitrary "struct"s) but this here is  
easy to do in Squeak:

[...]

David T. Lewis | 4 Sep 02:27

Re: Binary file I/O performance problems

On Tue, Sep 02, 2008 at 11:00:54PM -0700, David Finlayson wrote:
> 
> I find it troubling that I am having to write code below the
> abstraction level of C to read and write data from a file.  I thought
> Smalltalk was supposed to free me from this kind of drudgery?

David,

You're quite right about that. The good news is that you have already
figured out how to profile, and you already know where the performance
problem is. Setting aside for the moment the issue of Squeak's awful
file I/O performance, the quickest solution to your problem may also
be the easiest. As long as the data sets are not too large, just load
the whole file into Squeak first (use FileStream>>contentsOfEntireFile)
and *then* operate on the data.

For example, if you have data in MYDATA.BIN, and you want to load it
into Squeak and read the first 100 bytes, you can do something like this:

	| myFile dataStream |
	myFile := FileStream fileNamed: 'MYDATA.BIN'.
	[dataStream := ReadStream on: myFile contentsOfEntireFile]
		ensure: [myFile ifNotNilDo: [:f | f close]].
	dataStream next: 100.

Once you have the data in memory, things are quite fast. I know this
sounds like an odd way to handle data loading, but it actually works
very well, and buying some memory is a whole lot easier than fixing
Squeak's I/O performance ;)

[...]
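
The same load-everything-first idea, sketched in C for comparison. This is only an illustration under simple assumptions (the file fits in memory and its size fits in a long); read_entire_file is a hypothetical helper, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole file at path into a malloc'd buffer.
 * On success returns the buffer and stores its length in *len_out;
 * on any failure returns NULL. The caller frees the buffer.
 * Hypothetical helper for illustration only. */
unsigned char *read_entire_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0 &&
        (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        /* Allocate at least one byte so an empty file still succeeds. */
        buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);      /* short read: treat as failure */
            buf = NULL;
        }
    }
    fclose(f);
    if (buf != NULL) *len_out = (size_t)size;
    return buf;
}
```

Once the buffer is in hand, all further parsing is plain memory access, and a corrupt or unreadable file shows up as a single NULL check up front.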

David Finlayson | 5 Sep 19:59

Re: Binary file I/O performance problems

I re-wrote the test application to load the test file entirely into
memory before parsing the data. The total time to parse the file
decreased by about 50%. Now that I/O is removed from the picture, the
new bottleneck is turning bytes into integers (and then integers into
Floats).

I know that Smalltalk isn't the common language for number crunching,
but if I can get acceptable performance out of it, then down the road
I would like to tap into the Croquet environment. That is why I am
trying to learn a way that will work.

David

Matthias Berth | 5 Sep 20:07

Re: Binary file I/O performance problems

David,

How many possible float values do you have? Maybe a lookup strategy
for the conversion is feasible...

Cheers

Matthias

On Fri, Sep 5, 2008 at 7:59 PM, David Finlayson <dfinlayson <at> usgs.gov> wrote:
> I re-wrote the test application to load the test file entirely into
> memory before parsing the data. The total time to parse the file
> decreased by about 50%. Now that I/O is removed from the picture, the
> new bottle neck is turning bytes into integers (and then integers into
> Floats).
>
> David
> _______________________________________________
> Beginners mailing list
> Beginners <at> lists.squeakfoundation.org
> http://lists.squeakfoundation.org/mailman/listinfo/beginners
>

David Finlayson | 5 Sep 20:33

Re: Binary file I/O performance problems

For the most part, these numbers represent instrument measurements
(swath bathymetry from sonar systems). Precision ranges from 5 to 10
significant figures depending on the specific instrument being
recorded. So it wouldn't really be practical to form a look-up table
in most cases.

What attracted me to Squeak was that I was on the boat a few months
ago and got a functional navigation system built (sort-of like a
Garmin console on a pleasure boat) in about 2 days (used morphic and
the UDPSocket stuff)! That was awesome.

Then I modified the sonogram class to display sonar backscatter data
(like a black-and-white image of the sea floor) in about 2 hours. Very
cool stuff. The only problem was that the sonar data is time
consuming to parse in Squeak and so the sonogram scrolled about 1 row
per second (our system is collecting data at 8 pings per second). So it
would take me 8 hours to display 1 hour of sonar data.

The distant dream is to paint the sonar data into a Croquet world in
real time where scientists from other stations on the boat (or maybe
over the internet) can see the data rolling in as we collect it. It
would be really cool. Add in our boat as an icon, an ROV (remotely
operated vehicle) and maybe some in-water targets like fish or
whatever and I bet this would be Slashdot stuff! BUT, I need to be
able to get a handle on the speed of Squeak or this won't be
practical.

Maybe I need to write some kind of filter (pre-amplifier) in a
high-performance language that takes the data as it comes in over the
network and then re-broadcasts a decimated data set to Squeak?
[...]

Yoshiki Ohshima | 5 Sep 21:03

Re: Binary file I/O performance problems

At Fri, 5 Sep 2008 11:33:37 -0700,
David Finlayson wrote:
> 
> Then I modified the sonogram class to display sonar backscatter data
> (like a black-and-white image of the sea floor) in about 2 hours. Very
> cools stuff. The only problem was that the sonar data is time
> consuming to parse in Squeak and so the sonogram scrolled about 1 row
> per second (our system is collecting data at 8 pings per second) So it
> would take me 8 hours to display 1 hour of sonar data.

  Ah, cool.  In the OLPC Etoys image, there is a more efficient
version of Sonogram called WsSonogram, and it is about 2 times faster
than the original. If you just add a primitive that takes a float
array, calculates the sqrt of all entries and stores them back into the
array, that will be 4-5 times faster or so.  The code is of course
perfectly portable across platforms (i.e., not tied to OLPC), so
it may be of interest to you.

> The distant dream is to paint the sonar data into a Croquet world in
> real time where scientists from other stations on the boat (or maybe
> over the internet) can see the data rolling in as we collect it. It
> would be really cool. Add in our boat as an icon, an ROV (remotely
> operated vehicle) and maybe some in-water targets like fish or
> whatever and I bet this would be Slashdot stuff! BUT, I need to be
> able to get a handle on the speed of Squeak or this won't be
> practical.

  It could be quite practical with a few extra primitives.  One could
of course imagine using the GPU.  That would be fairly viable.

[...]

Yoshiki Ohshima | 5 Sep 20:44

Re: Binary file I/O performance problems

At Fri, 5 Sep 2008 10:59:03 -0700,
David Finlayson wrote:
> 
> I re-wrote the test application to load the test file entirely into
> memory before parsing the data. The total time to parse the file
> decreased by about 50%. Now that I/O is removed from the picture, the
> new bottle neck is turning bytes into integers (and then integers into
> Floats).
> 
> I know that Smalltalk isn't the common language for number crunching,
> but if I can get acceptable performance out of it, then down the road
> I would like to tap into the Croquet environment. That is why I am
> trying to learn a way that will work.

  If the integers or floats are in the layout of C's int[] or float[],
there is a better chance to make it much faster.

  Look at the method Bitmap>>asByteArray and
Bitmap>>copyFromByteArray:.  You can convert a big array of non-pointer
words from/to a byte array.

  data := (1 to: 1000000) as: FloatArray.
  words := Bitmap new: data size.
  words replaceFrom: 1 to: data size with: data.
  bytes := words asByteArray.

  "and you write out the bytes into a binary file."

  "to get them back:"

[...]
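
In C terms, this kind of bulk conversion amounts to copying a block of bytes straight into a float array instead of decoding one value at a time. A minimal sketch, assuming 4-byte IEEE 754 floats whose bytes are already in the host's byte order; bytes_to_floats is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret count consecutive 4-byte groups in bytes as floats and
 * copy them into out. Assumes the bytes are already in the host's
 * byte order and that float is 4 bytes wide (IEEE 754 single).
 * Hypothetical helper for illustration only. */
void bytes_to_floats(const uint8_t *bytes, float *out, size_t count)
{
    memcpy(out, bytes, count * sizeof(float));
}
```

The whole array is converted by one block copy, so the per-value cost no longer depends on how fast individual bytes can be shifted and combined.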

nicolas cellier | 5 Sep 23:00

Re: Binary file I/O performance problems

Yoshiki Ohshima a écrit :
> At Fri, 5 Sep 2008 10:59:03 -0700,
> David Finlayson wrote:
>> I re-wrote the test application to load the test file entirely into
>> memory before parsing the data. The total time to parse the file
>> decreased by about 50%. Now that I/O is removed from the picture, the
>> new bottle neck is turning bytes into integers (and then integers into
>> Floats).
>>
>> I know that Smalltalk isn't the common language for number crunching,
>> but if I can get acceptable performance out of it, then down the road
>> I would like to tap into the Croquet environment. That is why I am
>> trying to learn a way that will work.
> 
>   If the integers or floats are in the layout of C's int[] or float[],
> there is a better chance to make it much faster.
> 
>   Look at the method Bitmap>>asByteArray and
> Bitmap>>copyFromByteArray:.  You can convert a big array of non-pointer
> words from/to a byte array.
> 
>   data := (1 to: 1000000) as: FloatArray.
>   words := Bitmap new: data size.
>   words replaceFrom: 1 to: data size with: data.
>   bytes := words asByteArray.
> 
>   "and you write out the bytes into a binary file."
> 
>   "to get them back:"
> 
[...]

nicolas cellier | 5 Sep 23:19

Re: Binary file I/O performance problems

nicolas cellier a écrit :
> Yoshiki Ohshima a écrit :
>> At Fri, 5 Sep 2008 10:59:03 -0700,
>> David Finlayson wrote:
>>> I re-wrote the test application to load the test file entirely into
>>> memory before parsing the data. The total time to parse the file
>>> decreased by about 50%. Now that I/O is removed from the picture, the
>>> new bottle neck is turning bytes into integers (and then integers into
>>> Floats).
>>>
>>> I know that Smalltalk isn't the common language for number crunching,
>>> but if I can get acceptable performance out of it, then down the road
>>> I would like to tap into the Croquet environment. That is why I am
>>> trying to learn a way that will work.
>>
>>   If the integers or floats are in the layout of C's int[] or float[],
>> there is a better chance to make it much faster.
>>
>>   Look at the method Bitmap>>asByteArray and
>> Bitmap>>copyFromByteArray:.  You can convert a big array of non-pointer
>> words from/to a byte array.
>>
>>   data := (1 to: 1000000) as: FloatArray.
>>   words := Bitmap new: data size.
>>   words replaceFrom: 1 to: data size with: data.
>>   bytes := words asByteArray.
>>
>>   "and you write out the bytes into a binary file."
>>
>>   "to get them back:"
[...]

David Finlayson | 5 Sep 23:49

Re: Re: Binary file I/O performance problems

Unfortunately, the data is not a simple block of floats. For example,
in C here is how I read a "ping" header block from one of our vendors'
formats:

/* read_xyza_ping: read ping block, returns 1 if successful, EOF if
 * end of file  */
int read_xyza_ping(FILE *fin, XYZA_Ping *pp) {
    int8_t byte[4];

    fread(&pp->linename, sizeof(int8_t), MAX_LINENAME_LEN, fin);
    fread(&pp->pingnum, sizeof(uint32_t), 1, fin);
    fread(&byte, sizeof(int8_t), 4, fin);
    fread(&pp->time, sizeof(double), 1, fin);
    fread(&pp->notxers, sizeof(int32_t), 1, fin);
    fread(&byte, sizeof(int8_t), 4, fin);
    read_posn(fin, &pp->posn);
    fread(&pp->roll, sizeof(double), 1, fin);
    fread(&pp->pitch, sizeof(double), 1, fin);
    fread(&pp->heading, sizeof(double), 1, fin);
    fread(&pp->height, sizeof(double), 1, fin);
    fread(&pp->tide, sizeof(double), 1, fin);
    fread(&pp->sos, sizeof(double), 1, fin);

    if (ferror(fin) != 0) {
        perror("sxpfile: error: (read_xyza_ping)");
        abort();
    }

    // time between 1995 - 2020?
    assert(788936400 < pp->time && pp->time < 1577865600);
[...]

Yoshiki Ohshima | 6 Sep 01:27

Re: Re: Binary file I/O performance problems

At Fri, 5 Sep 2008 14:49:29 -0700,
David Finlayson wrote:
> 
> Unfortunately, the data is not a simple block of floats. For example,
> in C here is how I read a "ping" header block from one of our vendors
> formats:

  I'm sure that there are other implications, but it sounds like you
do need some primitives to make it efficient.  I would make a
primitive that is the equivalent of read_xyza_ping() and fills a Squeak
object, or, if you are dealing with an array of XYZA_Ping structures,
make an array of homogeneous arrays so that all linenames are stored
in a ByteArray, all pingnums are stored in a WordArray, etc.  In this
way, you may still be able to utilize the vector primitives.

-- Yoshiki
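
The "array of homogeneous arrays" layout can be sketched in C as follows. PingRecord is a hypothetical, cut-down stand-in for XYZA_Ping (whose full definition is not shown in this thread), and PingColumns and pings_to_columns are names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced ping record: only three of the fields read by
 * read_xyza_ping() are shown. */
typedef struct {
    uint32_t pingnum;
    double   time;
    double   roll;
} PingRecord;

/* One homogeneous array per field. */
typedef struct {
    size_t    count;
    uint32_t *pingnum;
    double   *time;
    double   *roll;
} PingColumns;

/* Split an array of records into per-field arrays.
 * Returns 0 on success, -1 if allocation fails. */
int pings_to_columns(const PingRecord *pings, size_t count, PingColumns *out)
{
    size_t slots = count > 0 ? count : 1;
    out->count   = count;
    out->pingnum = malloc(slots * sizeof *out->pingnum);
    out->time    = malloc(slots * sizeof *out->time);
    out->roll    = malloc(slots * sizeof *out->roll);
    if (!out->pingnum || !out->time || !out->roll) {
        free(out->pingnum); free(out->time); free(out->roll);
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        out->pingnum[i] = pings[i].pingnum;
        out->time[i]    = pings[i].time;
        out->roll[i]    = pings[i].roll;
    }
    return 0;
}
```

Each column is then a plain contiguous array of one type, which is the shape that bulk array operations expect.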

Herbert König | 6 Sep 11:45

Re: Re: Binary file I/O performance problems

Hello David,

YO>   I'm sure that there are other implications, but it sounds like you
YO> do need some primitives to make it efficient.  I would make a
YO> primitive that is equivalent of read_xyza_ping() that fills a Squeak
YO> object, or if you are dealing with array of XYZA_Ping structure,
YO> making an array of homogeneous arrays so that all linenames are stored
YO> in a ByteArray, all pingnums are stored in a WordArray, etc.  In this
YO> way, you may still be able to utilize the vector primitives.

this approach seems to give a chance of solving the speed problem.

In your original post you talked about 10 significant figures, so be
aware that FloatArray holds only 32-bit floats, with only about 7
significant figures.

The second caveat is that if many of your floats are in the range of
1e-38 (the closest-to-zero number for a 32-bit float), FloatArray gets
very slow (speed degradation by a factor of 8).  I'm talking about
FloatArray>>* and *= here.

Sorry if I sound negative; I just think it's bad to ignore problems that
are known in advance.

-- 
Cheers,

Herbert   
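
The precision caveat is easy to see in C, where float is normally the same 32-bit format. This sketch measures how far a value moves when it is squeezed through a 32-bit float; float_roundtrip_error is a hypothetical helper and the sample value below is made up.

```c
/* Absolute error introduced by storing value in a 32-bit float and
 * reading it back as a double.
 * Hypothetical helper for illustration only. */
double float_roundtrip_error(double value)
{
    double back = (double)(float)value;
    return back > value ? back - value : value - back;
}
```

A 10-figure value such as 1234567.891 comes back as 1234567.875, an error of about 0.016, while a value like 0.5 survives the round trip exactly.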

David Finlayson | 6 Sep 17:29

Re: Re: Binary file I/O performance problems

I have implemented a number of signal processing programs in both C99
and Python (with psyco jit). I have an 8-core Mac Pro workstation
which I can use for parallel processing by launching multiple
instances of the code using Make scripts. An interesting thing
happened when I compared the performance of the C code to the Python
code:

The C code became I/O bound at 4 cores, saturating either the disks or
the memory bus (I am not sure exactly where the bottleneck is). The
Python version never became I/O bound at 8 cores, but it did
close to within a factor of 10 of the performance of the C code. This
suggested to me that if I had enough processors to saturate the I/O,
there would be no speed advantage to writing the code in C.

The next generation of workstations we buy will probably have dozens
of cores but hard drives and memory will only be marginally faster (if
history is any indication). So, if I/O is the rate limiting factor,
not cpu speed, why not look for the most productive programming
environment possible? I've always read that Smalltalk is often
considered the most productive programming environment ever invented.
So I wanted to give it a try. But I am discovering (from the point of
view of a scientist programmer like myself) it lacks a lot in
comparison to Matlab or Python (both high-level) and especially C and
C++ (lots and lots of library code).

I am going to have to weigh the pros and cons of whether it makes
sense to push on with this.

David
[...]

Yoshiki Ohshima | 8 Sep 19:01

Re: Re: Binary file I/O performance problems

At Sat, 6 Sep 2008 08:29:35 -0700,
David Finlayson wrote:
> 
> The next generation of workstations we buy will probably have dozens
> of cores but hard drives and memory will only be marginally faster (if
> history is any indication). So, if I/O is the rate limiting factor,
> not cpu speed, why not look for the most productive programing
> environment possible? I've always read that Smalltalk is often
> considered the most productive programing environment ever invented.
> So I wanted to give it a try. But I am discovering (from the point of
> view of a scientist programmer like myself) it lacks a lot in
> comparison to Matlab or Python (both high-level) and especially C and
> C++ (lots and lots of library code).

  That observation on the sophistication level is quite right.  And
Squeak's moving/compacting GC would give you some more penalty
compared to other implementations when tens of MB to GBs of
data are involved.

> I am going to have to weigh the pros and cons of whether it makes
> since to push on with this.

  We tend to do something just ok for its own need, but listening to the
other people's needs is always fun (and depressing^^).

-- Yoshiki

Yoshiki Ohshima | 5 Sep 23:21

Re: Re: Binary file I/O performance problems

At Fri, 05 Sep 2008 23:00:07 +0200,
nicolas cellier wrote:
> 
> Hi David,
> your applications is exciting my curiosity. Which company/organization 
> are you working for, if not indiscreet?

  I assume the answer is USGS, because of his email address!  Yes, it
sounds like something cool is going on.

-- Yoshiki

David Finlayson | 6 Sep 00:12

Re: Re: Binary file I/O performance problems

Coastal and marine geology, USGS. But this isn't an official project.
Just a pipe dream of mine right now. I am not even sure I am competent
enough to pull it off by myself. However, I figure the best way to get
support for this is to build a semi-working prototype and then show it
off and see what happens.

I do wish Cog were further along though. Without Croquet, VW isn't
really an option. I don't know if other languages support the 3D
collaboration that Croquet promises. Meanwhile, I need to learn more
Smalltalk.

David

Zulq Alam | 3 Sep 01:44

Re: Binary file I/O performance problems

Hi David,

You could try using stream next: 4 to read the 4 bytes in one go:

[StandardFileStream readOnlyFileNamed: 'Base.image' do:
	[:stream |
	[stream atEnd] whileFalse:
		[stream next.
		stream next.
		stream next.
		stream next.]]] timeToRun
" 328505 "

[StandardFileStream readOnlyFileNamed: 'Base.image' do:
	[:stream |
	stream binary.
	[stream atEnd] whileFalse:
		[stream next: 4]]] timeToRun
" 144469 "

If you can, read larger chunks:

[StandardFileStream readOnlyFileNamed: 'Base.image' do:
	[:stream |
	stream binary.
	[stream atEnd] whileFalse:
		[stream next: 2048]]] timeToRun
" 343 "

[StandardFileStream readOnlyFileNamed: 'Base.image' do:
[...]

Herbert König | 3 Sep 09:17

Re: Binary file I/O performance problems

Hello David,

DF> focuses on just reading a small (for us) 123 Mb data file on disk. The
DF> program takes about 166 seconds to run compared to 1.2 seconds for an
DF> equivalent C version (140x faster than Squeak version).

number crunching and raw speed are not the points where Smalltalk
excels.

0 tinyBenchmarks gives '322824716 bytecodes/sec; 8945704 sends/sec'
which is about 9 million sends on my 1.8 GHz Pentium M.

In the browser, when you switch from Source to Byte codes in the
lowest pane (rightmost button), you will see the many sends in your
code. Some of these code fragments (e.g. the arithmetic) would be a
lot faster in any compiled language.

With this you can estimate the performance you can expect.

If it took only one send per byte read from the file, my computer
would take about 10 seconds for 100MB.

That's the price for dynamically looking up the receiver's class for
every send.

So I guess this application is better left for other languages.

Cheers,

Herbert                            mailto:herbertkoenig <at> gmx.net
[...]
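
The back-of-the-envelope estimate above can be written out as a small calculation. This is a sketch under stated assumptions: a fixed number of sends per byte, a constant send rate equal to the tinyBenchmarks figure, and "100MB" taken as 100 * 10^6 bytes. The function name estimated_seconds is invented for the example.

```c
/* Lower-bound estimate of run time in seconds, assuming a fixed number
 * of message sends per byte and a constant send rate.
 * Hypothetical helper for illustration only. */
double estimated_seconds(double bytes, double sends_per_byte,
                         double sends_per_second)
{
    return bytes * sends_per_byte / sends_per_second;
}
```

With 8945704 sends/sec and one send per byte, 100 * 10^6 bytes works out to roughly 11 seconds; every additional send per byte adds about the same amount again.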

Waldemar Schwan | 3 Sep 09:37

moving files on Windows

Hello everyone.

Normally I develop on MacOS 10.5. When I tried to run my code on
Windows Vista, deleting a file threw a

CannotDeleteFileException: Coud not delete the old version of file D:\waldemar\test\movingDestionation\moveMe.txt

Because the error doesn't tell me why the file can't be deleted, I'm
completely stumped. The file is writeable.

What I'm trying to do is move a file from one folder to another. To
accomplish that I create a read-only FileStream on the source file and
force the destination directory to create a new file with the same name
as the source file. After that I use FileDirectory>>copyFile:toFile:.

moveLocalFile: aCBFile3DLocal toMountain: aCBMountain
	| srcDir destDir srcFile destFile |
	srcDir := aCBFile3DLocal file directory fileDirectory.
	destDir := FileDirectory on: aCBMountain path.

	srcFile := srcDir readOnlyFileNamed: aCBFile3DLocal file name.
	srcFile binary.
	destFile := destDir forceNewFileNamed: aCBFile3DLocal file name.
	destFile binary.

	srcDir copyFile: srcFile toFile: destFile.
	srcDir deleteFileNamed:  aCBFile3DLocal file name.

Again: This code works on Mac but doesn't on Windows (Vista), also in
[...]