Jaroslav Hajek | 1 Dec 08:01 2009

Re: FYI: smarter indexing by logical masks

On Sat, Nov 28, 2009 at 8:55 PM, Judd Storrs <storrsjm <at> email.uc.edu> wrote:
On Sat, Nov 28, 2009 at 1:23 AM, Jaroslav Hajek <highegg <at> gmail.com> wrote:
> Not really. Let me explain how this works (not in 3.2.x though). When a
> matrix is first used in index expression, the internal conversion to
> index_vector is, if successful, cached for subsequent uses.

That is even better. I didn't pick up on octave caching the result in
your first email. Now I don't even have to do that manually anymore
which is also a huge deal! :) When you were reporting the two times in
your benchmark, I misunderstood and thought that was about filling the
CPU cache/conditioning along the lines of timeit.m

find() returning doubles is yet another case of WWMWT that I hadn't
noticed. In a sane world index vectors would map immediately to one of
the integer array types based on the architecture. At least that's how
it works in IDL. But you're right, Matlab returns doubles. WITWWMWT D:

Great work!

I suppose it's a heritage from the old days when Matlab had no integers. Now this needs to be kept for backward compatibility. The issue is magnified by the fact that integers are "invasive", in the sense that they force integer results when mixed with reals. So, if "find" suddenly started to return integer results, a lot of computations would break. Further, int64 arithmetic is not implemented in Matlab (at least it wasn't as of 2007 or so), so that would be problematic. A double can be used to index up to 2^53 (about 9e+15) elements, and that seems enough for decades to come. All in all, it wasn't really a bad solution at the time, but I'm sure if they started over, they'd do things differently.
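A minimal sketch of the points above, written as a plain script; the variable names are only illustrative. It checks that "find" returns doubles, that an integer mixed with a double yields a rounded integer result, and that doubles stop distinguishing consecutive integers at 2^53.

```octave
mask = logical ([0 1 1 0 1]);
idx = find (mask);                   % idx = [2 3 5]
assert (class (idx), 'double');      % find returns doubles

% Integers are "invasive": int32 combined with a double gives int32,
% and the result is rounded to the nearest integer.
% int32 (idx) only imitates what an integer-returning find would produce.
half = int32 (idx) / 2;
assert (class (half), 'int32');
assert (double (half), [1 2 3]);     % 1, 1.5, 2.5 are rounded to 1, 2, 3
assert (idx / 2, [1 1.5 2.5]);       % the double result keeps the fractions

% Doubles represent every integer exactly up to 2^53.
assert (2^53 == 9007199254740992);
assert (2^53 - 1 ~= 2^53);           % still distinguishable below the limit
assert (2^53 + 1 == 2^53);           % above it, neighbouring integers collide
```

The second group of assertions is the kind of silent change in results that existing scripts would see if "find" switched to an integer type.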
I even thought about creating a special type that would hold just an idx_vector and pretend to be a double matrix, postponing the conversion until needed. But that would need significant extensions to idx_vector, mimicking the capabilities of arrays (esp. indexing). Scanning real-life scripts, I didn't find enough justification for this change. The index vectors returned by "find" or "sort" are typically used in indexing, which is why they're created with a pre-cached index vector, but they're also often employed in arithmetic.
All in all, the current solution seems like a good compromise to me.
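To illustrate the two kinds of use mentioned above, here is a small sketch with made-up data. The same vector returned by "find" or "sort" serves first as an index and then as an operand in ordinary arithmetic. The caching itself is internal and is not shown here.

```octave
x = [0.3 0 1.7 0 2.2];
idx = find (x);                      % idx = [1 3 5]
assert (x(idx), [0.3 1.7 2.2]);      % used as an index
assert (diff (idx), [2 2]);          % used in arithmetic: gaps between hits
assert (mean (idx), 3);              % used in arithmetic: mean position

y = [3 1 2];
[s, p] = sort (y);                   % s = [1 2 3], p = [2 3 1]
assert (y(p), s);                    % permutation used as an index
pos = (p - 1) / (numel (p) - 1);     % permutation rescaled to [0, 1]
assert (pos, [0.5 1 0]);             % fractional values need a real type
```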

best regards

RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz