Re: Recursive collect() (code included)
On Aug 31, 2012, at 12:54 AM, Juha Jeronen <juha.jeronen@...> wrote:
> On 30/08/12 02:43, Aaron Meurer wrote:
>> On Wed, Aug 29, 2012 at 1:36 AM, Juha Jeronen <juha.jeronen@...> wrote:
>>> I think the functionality should be split into two parts:
>>> 1) The automatic symbol list generation. This could be used as a default for
>>> collect() and rcollect(), if no symbols are given.
>>> In the case of rcollect(), the list should probably be generated at each
>>> level of recursion separately to achieve optimal collection (consider e.g.
>>> "c*a*(b*x + b*y) + c*b*(a*x + a*y)").
>>> I think the symbol list generator could also be exposed in the API, as there
>>> might be some unanticipated uses for it.
>> Is this different from .free_symbols?
> Yes, it produces a particular ordering, which is critical for how
> collect() works.
> I originally used .atoms() (should probably have used .free_symbols in
> 0.7.x), but then wrote this custom function to get the ordering that was
> Primary sort is descending by #occurrences, secondary by how near the
> top of the expression tree the topmost occurrence was seen.
> The idea behind this is heuristic. Loosely speaking, on average it
> should pay off better to collect first by those symbols which have more
> occurrences, since there being so many of each of them, they have the
> most potential for reduction.
> E.g. returning to the trivial example,
> sy.collect( "a*b*c*d + b*c*d + c*d + d", ["d", "c", "b", "a"] )
> => d*(a*b*c + b*c + c + 1)
> sy.collect( "a*b*c*d + b*c*d + c*d + d", ["a", "b", "c", "d"] )
> => a*b*c*d + b*c*d + c*d + d
> (i.e. does nothing) since the original expression is already collected
> w.r.t. "a".
> So, it's critical that in this particular example, the ordering is d, c,
> b, a.
> The secondary sort criterion was added to handle expressions like
> Uin*(1 + u0_x)*(u0/(__uvmag__1__))
> where the optimal collection is to do nothing. There is one of each
> symbol here, but still, what happens depends on the order in which they
> are given to collect(). If we start with u0_x, that's not good...
> In this case, since the expression is not in expanded form, it is
> critical to pick the collection order in such a way that symbols nearer
> the top of the expression tree are chosen first (choosing a "deeper"
> symbol first causes unnecessary expansion).
> Of course, in many cases the input to recursive_collect() will be in
> expanded form, but my opinion is that it would be nice to do something
> semi-sensible also when it isn't.
> (I.e. warn about it in the docstring, somewhat like the warning for
> simplify(): robust behaviour is only guaranteed for expanded
> expressions, but when robustness is not a requirement, the input doesn't
> need to be in expanded form.)
>>> 2) The two ways of recursive collection: current rcollect() vs. the
>>> suggested code. There may be cases where either of them is preferable, so I
>>> think both are needed.
>>> In my opinion, the most logical place for this part of the new functionality
>>> would be rcollect(). Should we add a flag for choosing the mode? But which
>>> mode should be the default? (This is not exactly "deep" vs. "non-deep", but
>>> different recursion strategies.)
>>> Or should we add a new API function, keeping rcollect() as-is? This solution
>>> doesn't seem very clean.
>> Well, I actually would like to have just one function, collect(), but
>> that would require changing the API. So I would try to find a good
>> name for each strategy, and use method="name" in rcollect(). As to
>> what should be the default, I don't know.
> If we don't want to break the API, I think the current method of
> rcollect() should be the default (since otherwise the behaviour of the
> function would change).
If it will make it cleaner, we should consider breaking the API, with
a deprecation on the old one if possible.
> method="depth-first" vs. method="breadth-first"?
> Difficult to type, especially breat...breadh...aaaaahhh! How about
bfs and dfs are common abbreviations for these.
> method="sums" vs. method="all"? (Additionally, method="top" could be
> top-level-only, i.e. the current behaviour of collect().)
method="all" sounds like "apply all the methods".
Top-level only is exactly what deep=False usually does.
> Something else?
> In addition to this, we could add a flag to the existing collect(),
> which would make it behave as rcollect(). Default would be off to avoid
> breaking the API. This way, the user could use the current rcollect()
> API or the new extended collect() API.
> As an added bonus, this would be consistent with other simplifiers,
> which have exactly this behaviour: same function for recursive and
> non-recursive, with "deep" off by default.
In my experience, deep is on by default. Also in my opinion in most
cases it should be, as this is what you want most of the time.
> E.g. deep=True, and if this was enabled, the kwarg "method" would become
> available (just like the suggested change for rcollect), and collect()
> would basically call rcollect(), passing through the value for "method".
> Then rcollect() could internally use collect() in the non-recursive form.
> (Or maybe split collect() into a pure API function and an internal
> worker function to avoid the circular-looking use relationship between
> collect() and rcollect() in the suggested solution. But that depends on
> which you think is clearer; I'm fine with either option.)
Maybe collect(deep=True) can call rcollect, which we can also
deprecate. Then, when the deprecation period ends, we just remove
rcollect from __init__.py.
>> And as I said, the automatic symbols can happen if no symbols are given.
>>> And finally, about rcollect():
>>> - Is there a reason why the call syntax differs from collect()? collect()
>>> takes in a list of syms, while rcollect() uses Python's *args mechanism for
>>> the same (and calls them "vars", instead of "syms" like collect()).
>>> - Suggestion for documentation: rcollect() could be mentioned in the
>>> documentation for collect() to make it easier to find. At least I find it a
>>> bit confusing that many of the other simplifiers come with a "deep" flag,
>>> but for collect() the recursive version is a separate function. (Of course,
>>> technically there is the distinction of descending into function arguments
>>> vs. recursing into sums, so the difference may be justified.)
>>> These are valid points. We try to be consistent with our APIs, but
>>> unfortunately sometimes people either don't know or forget the conventions
>>> when contributing code.
>>> I see. Well, that's understandable :)
>> By the way, it's not so bad that one is called vars and the other
>> syms, but the thing that is bad is that one is *vars and the other is
>> syms. So one takes a list and the other doesn't.
> Mm, yes. I did try to pass in a list to rcollect() at first (as in "ah,
> just prefix the function name with an r"), until reading the
> documentation and noticing the star :)
> Consistent types are of course the most important thing in this regard,
> but I think consistent parameter naming would be a nice bonus. It would
> allow the user-programmer to learn the API faster, if the same kind of
> thing always had the same name.
Agreed. This can be changed throughout SymPy without an API break (in
other words, patches welcome).
> If the names are different, at least I tend to become wary that there is
> probably some subtle difference between what kind of symbols are
> considered "vars" vs. "syms". Only after looking at the source I could
> be sure they were the same thing :)
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To post to this group, send email to sympy@...
> To unsubscribe from this group, send email to sympy+unsubscribe@...
> For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
You received this message because you are subscribed to the Google Groups "sympy" group.
To post to this group, send email to sympy@...
To unsubscribe from this group, send email to sympy+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.