Adam Gundry | 26 Apr 10:08 2013
Picon
Picon

Diving into the records swamp (possible GSoC project)

Hi,

I am hoping to do a GSoC project this year working on GHC, and have been
pointed in the direction of the records issue (in particular, the desire
to overload field names). This has been discussed on-and-off for years,
and while there are lots of ideas [1], little has been implemented in
GHC itself.

The plan would be to implement a solution to the "narrow issue" of
overloaded field names, along the lines of Simon PJ's SORF proposal (on
the wiki). This would provide a basis for experimentation with
first-class record types. While there are still design issues to
resolve, the broad plan is clear and I'm confident it can be implemented
in a summer and without overly restricting future record system designs.

Does this sound like a reasonable strategy? I'd appreciate comments and
criticism, although arguing about the details of the design should
perhaps wait.

(A little about me: I'm a PhD student working on type inference for
Haskell and dependent types, with about four years' Haskell experience
including work on big type-system related projects. I am familiar with
the theory behind GHC, but I haven't worked on the code before.)

Thanks,

Adam Gundry

[1] http://hackage.haskell.org/trac/ghc/wiki/Records
(Continue reading)

AntC | 26 Apr 10:55 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

> Adam Gundry <adam.gundry <at> strath.ac.uk> writes:
> 
> Hi,
> 
> I am hoping to do a GSoC project this year working on GHC, and have been
> pointed in the direction of the records issue (in particular, the desire
> to overload field names).

Heck you're brave!

Are you sure you want to step into the aggravated issue of changing the 
dot operator from being function composition?
Are you going to use explicit type application? ("The type of get is very 
odd.")
Are you going to handle type-changing update?

> 
> The plan would be to implement a solution to the "narrow issue" of
> overloaded field names, along the lines of Simon PJ's SORF proposal

So has <someone> decided that SORF is the best of those many proposals? I 
guess it's because it comes with the SPJ ring of confidence?

Before jumping to that decision, I suggest you/your sponsors consider the 
implications of the "NewAxioms" stuff in GHC Head [2] to 
support 'controlled' overlap.

I think overlap is the only extra feature needed to support the DORF or 
TPDORF proposals. (Plus the syntactic sugar already outlined in that 
proposal.)
(Continue reading)

Aleksandar Dimitrov | 26 Apr 17:49 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

Slightly, off-topic, but just because I've been spending my last couple of days
trying to shoehorn an inheritance-based subytping type system into Haskell
(without full OO-power, so no methods or mutable state.)

> Oleg/Ralf's HList paper covers all the ground for first-class records. It 
> depends heavily on overlaps, which is why the NewAxioms stuff would work 
> in really well.

I've been kicking around the idea of re-implementing HList on the basis of the
new DataKinds [1] extension. I don't know if there'd be much of a need for that,
though. More generally, I'm wondering what can be done on extensible records and
inheritance-based type systems with FC-pro under one's belt…

Here's one thing I don't like about the "current" way HList-based extensible
record are represented (and used in OOHaskell [2]): the access time is linear in
the number of records a certain type has. Somehow just the thought of "reorder
the records in your constructors to make your program go faster" makes me cringe
a little. Oleg and Ralf do hint at implementing some binary search tree based
technique to alleviate the problem, but I have no idea how to go about it,
except extending the compiler.

~ A.

References

[1] Yorgey, Brent A., et al. "Giving Haskell a promotion." Proceedings of the
8th ACM SIGPLAN workshop on Types in language design and implementation. ACM,
2012.
[2] Kiselyov, Oleg, and Ralf Lämmel. "Haskell's overlooked object system." arXiv
preprint cs/0509027 (2005).
(Continue reading)

adam vogt | 26 Apr 20:29 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi Aleksandar,

This library for extensible records does use -XDataKinds:
<http://hackage.haskell.org/package/vinyl>. It doesn't have as many
definitions as HList, but that might be because more recent extensions
are more powerful. Many other libraries are listed
<http://www.haskell.org/haskellwiki/Extensible_record#Libraries_on_hackage>.
I don't know of a comparison between all of them.

--
Adam
AntC | 28 Apr 11:44 2013
Picon

Re: HList records access time [was: Diving into the records swamp]

> Aleksandar Dimitrov <aleks.dimitrov <at> gmail.com> writes:
>

Hi Aleksandar, I was hoping that Oleg himself would answer the second part 
of your post, as he did the part re DataKinds:

> 
> Here's one thing I don't like about the "current" way HList-based 
> extensible record are represented (and used in OOHaskell [2]):
> the access time is linear in
> the number of records a certain type has.
> Somehow just the thought of "reorder the records in your
> constructors to make your program go faster" makes me cringe
> a little.

Yes, it would me!

I thought that usually the instance matching for HList happens at compile 
time(?) So reordering might make compiles faster, but that should be dealt 
with before run-time(?)

I guess it matters whether the 'shape' of the HLists is knowable at 
compile time, or the records are built 'on the fly' at run time.

Here's a possibly relevant idea that I would try for myself if only my day 
job didn't get in the way of my life so much ;-(. 

Instead of Type-Indexed Products (TIP's as the HList paper calls them), 
how about Type-Indexed tuples (TIple's) like this:

(Continue reading)

Adam Gundry | 27 Apr 10:50 2013
Picon
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi AntC,

Thanks for the feedback!

On 26/04/13 09:55, AntC wrote:
>> Adam Gundry <adam.gundry <at> strath.ac.uk> writes:
>> 
>> Hi,
>> 
>> I am hoping to do a GSoC project this year working on GHC, and
>> have been pointed in the direction of the records issue (in
>> particular, the desire to overload field names).
> 
> Heck you're brave!

Or possibly stupid. ;-) That's what I'm trying to figure out. There are
certainly other, less controversial, things I might work on!

Your questions are the kind of detailed issue that I'm tempted to
postpone for the time being, although I guess I can state my prejudices:

> Are you sure you want to step into the aggravated issue of changing 
> the dot operator from being function composition?

I'd prefer to leave dot alone and add a new operator for record
field projection, mostly because I don't relish trying to modify the
parser, but I am open to other opinions.

> Are you going to use explicit type application? ("The type of get is 
> very odd.")
(Continue reading)

Petr Pudlák | 26 Apr 20:53 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi Adam,

very nice idea. As the others, I'm curious why you chose to implement SORF in favor of the other ideas?

I just read the SORF proposal, and I'm a bit concerned about what error messages would GHC issue when someone would type incorrect code involving such records. Currently Haskell's error messages already pose a barrier for newcomers (like "No instance for (Num (a -> a))"), and if records are converted into those very complicated `Has` instances, type errors would be probably undecipherable even for moderate skilled Haskell users. Considering that records are a basic feature of Haskell and something that  people with OOP background are familiar with, this could result in a feature that would without doubts deter many (if not most) newcomers. So do you think it would be possible to implement it in such a way that users get sensible type error messages?

  Best regards,
  Petr



2013/4/26 Adam Gundry <adam.gundry <at> strath.ac.uk>
Hi,

I am hoping to do a GSoC project this year working on GHC, and have been
pointed in the direction of the records issue (in particular, the desire
to overload field names). This has been discussed on-and-off for years,
and while there are lots of ideas [1], little has been implemented in
GHC itself.

The plan would be to implement a solution to the "narrow issue" of
overloaded field names, along the lines of Simon PJ's SORF proposal (on
the wiki). This would provide a basis for experimentation with
first-class record types. While there are still design issues to
resolve, the broad plan is clear and I'm confident it can be implemented
in a summer and without overly restricting future record system designs.

Does this sound like a reasonable strategy? I'd appreciate comments and
criticism, although arguing about the details of the design should
perhaps wait.

(A little about me: I'm a PhD student working on type inference for
Haskell and dependent types, with about four years' Haskell experience
including work on big type-system related projects. I am familiar with
the theory behind GHC, but I haven't worked on the code before.)

Thanks,

Adam Gundry


[1] http://hackage.haskell.org/trac/ghc/wiki/Records

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Adam Gundry | 27 Apr 10:51 2013
Picon
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi Petr,

On 26/04/13 19:53, Petr Pudlák wrote:
> Hi Adam,
> 
> very nice idea. As the others, I'm curious why you chose to implement
> SORF in favor of the other ideas?

As I've commented in a message just now [1], by mentioning SORF I didn't
mean to exclude taking on board the other proposals (particularly
DORF/TPDORF). I do think a type-based approach is the way forward, with
a new sort of constraint that gives a polymorphic type to record
projection. The plans based on changing the name-spacing rules either
need type-based resolution as well, or require field names to be
prefixed with the relevant data type (so they don't fully solve the
problem).

Also, I'm a type theorist, so obviously the type-based solution is the
best one. ;-)

> I just read the SORF proposal, and I'm a bit concerned about what error
> messages would GHC issue when someone would type incorrect code
> involving such records. Currently Haskell's error messages already pose
> a barrier for newcomers (like "No instance for (Num (a -> a))"), and if
> records are converted into those very complicated `Has` instances, type
> errors would be probably undecipherable even for moderate skilled
> Haskell users. Considering that records are a basic feature of Haskell
> and something that  people with OOP background are familiar with, this
> could result in a feature that would without doubts deter many (if not
> most) newcomers. So do you think it would be possible to implement it in
> such a way that users get sensible type error messages?

You're right to raise this as an issue. I think it is essential to get
good error messages, or at least no worse than we have already! Rather
than "No instance for Has r l t" we want it to say something like "Type
r has no field l".  This justifies special treatment of Has constraints
by the compiler, rather than simply desugaring to an encoding.
Constraint-based type inference means this shouldn't be too difficult.

The current description of SORF doesn't really distinguish between the
user's perspective and the implementer's. This is something that we
should change (so it's clear how much users will need to understand);
but perhaps I'll wait and see if my GSoC proposal is accepted first!

>   Best regards,
>   Petr

Thanks for your comments,

Adam

[1] http://www.haskell.org/pipermail/haskell-cafe/2013-April/107876.html
Johan Tibell | 26 Apr 21:46 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi Adam,

Since we have already had *very* long discussions on this topic, I'm worried that I might open a can of worms be weighing in here, but the issue is important enough to me that I will do so regardless.

Instead of endorsing one of the listed proposals directly, I will emphasize the problem, so we don't lose sight of it. The problem people run into *in practice* and complain about in blog posts, on Google+, or privately when we chat about Haskell over beer, is that they would like to write a record definition like this one:

    data Employee = Employee { id :: Int, name :: String }

    printId :: Employee -> IO ()
    printId emp = print $ id emp

but since that doesn't work well in Haskell today due to name collisions, the best practice today is to instead write something like:

    data Employee = Employee { employeeId :: Int, employeeName :: String }

    printId :: Employee -> IO ()
    printId emp = print $ employeeId emp

The downsides of the latter have been discussed elsewhere, but briefly they are:

 * Overly verbose when there's no ambiguity.
 * Ad-hoc prefix is hard to predict (i.e. sometimes abbreviations of the data type name are used).

The important requirement, which might seem a bit obvious, is that any solution to this problem better not be *even more* verbose than the second code snippet above. If I understand the SORF proposal correctly, you would write:

    data Employee = Employee { id :: Int, name :: String }

    printId :: Employee -> IO ()
    printId emp = print $ emp.id

Is that correct or do you have to replace 'Employee' with 'r { id :: Int }' in the type signature of 'printId'?

The discussions about an overhauled record system also involve lots of talk about record sub-typing, extensible records, and other more advanced features. I'd like to point out that there doesn't seem to be a great demand for these features. They might be nice-to-haves or might fall out naturally from a solution to the namespacing problem above, but they are in fact not needed to solve the common problem people have with the Haskell record system.

Cheers,
  Johan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Adam Gundry | 27 Apr 10:52 2013
Picon
Picon

Re: Diving into the records swamp (possible GSoC project)

Hi Johan,

On 26/04/13 20:46, Johan Tibell wrote:
> Hi Adam,
> 
> Since we have already had *very* long discussions on this topic, I'm
> worried that I might open a can of worms be weighing in here, but the
> issue is important enough to me that I will do so regardless.

I'm the one busily opening this particular can. It's good to know it's
an important one though!

Thanks for characterising the problem so neatly:

> Instead of endorsing one of the listed proposals directly, I will
> emphasize the problem, so we don't lose sight of it. The problem people
> run into *in practice* and complain about in blog posts, on Google+, or
> privately when we chat about Haskell over beer, is that they would like
> to write a record definition like this one:
> 
>     data Employee = Employee { id :: Int, name :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ id emp
> 
> but since that doesn't work well in Haskell today due to name
> collisions, the best practice today is to instead write something like:
> 
>     data Employee = Employee { employeeId :: Int, employeeName :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ employeeId emp
> 
> The downsides of the latter have been discussed elsewhere, but briefly
> they are:
> 
>  * Overly verbose when there's no ambiguity.
>  * Ad-hoc prefix is hard to predict (i.e. sometimes abbreviations of the
> data type name are used).
> 
> The important requirement, which might seem a bit obvious, is that any
> solution to this problem better not be *even more* verbose than the
> second code snippet above. If I understand the SORF proposal correctly,
> you would write:
> 
>     data Employee = Employee { id :: Int, name :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ emp.id <http://emp.id>
> 
> Is that correct or do you have to replace 'Employee' with 'r { id :: Int
> }' in the type signature of 'printId'?

That's correct. The most general type (inferred if the annotation is
omitted) will be something like

printId :: r { id :: Int } => r -> IO ()

but you are free to declare a more specific type in the usual way, much
as if the constraint was 'Show r', say.

> The discussions about an overhauled record system also involve lots of
> talk about record sub-typing, extensible records, and other more
> advanced features. I'd like to point out that there doesn't seem to be a
> great demand for these features. They might be nice-to-haves or might
> fall out naturally from a solution to the namespacing problem above, but
> they are in fact not needed to solve the common problem people have with
> the Haskell record system.

Thanks, I take your point. My proposal is to implement a good solution
to the problem you've outlined; I don't think we should go all the way
to extensible records just yet, if at all.

> Cheers,
>   Johan
> 

All the best,

Adam
AntC | 27 Apr 12:23 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

> Johan Tibell <johan.tibell <at> gmail.com> writes:
> 
> Instead of endorsing one of the listed proposals directly, I will 
emphasize the problem, so we don't lose sight of it. The problem people 
run into *in practice* and complain about in blog posts, on Google+, or 
privately when we chat about Haskell over beer, is that they would like to 
write a record definition like this one:
> 
>     data Employee = Employee { id :: Int, name :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ id emp
> 
> but since that doesn't work well in Haskell today due to name
> collisions, ...

[I've a bit more to say on that record definition below.]

Thank you Johan, I agree we should keep clear sight of the problem. So 
let's be a bit more precise: it's not exactly the record declaration that 
causes the name collisions, it's the field selector function that gets 
created automatically. (Note that we can use xDisambiguateRecordFields to 
access fields to, errm, disambiguate.)

So I did put in a separate proposal [3] (and ticket) on that very narrow 
issue. (Simon M pointed out that I probably didn't name it very well!)

Even if we do nothing to advance the "records swamp", PLEASE can we 
provide a compiler option to suppress that function.

I envisage it might facilitate a 'cottage industry' of Template Haskell 
solutions (generating Has instances), which would be a cheap and cheerful 
way to experiment in the design space.

[3] 
http://hackage.haskell.org/trac/ghc/wiki/Records/DeclaredOverloadedRecordFi
elds/NoMonoRecordFields
(There are bound to be some fishhooks, especially around export/import of 
names from a module with no selector functions to one that's expecting 
them.)

[cont from above]
> ... the best practice today is to instead write something like:
> 
>     data Employee = Employee { employeeId :: Int, employeeName :: 
String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ employeeId emp
> 
> The downsides of the latter have been discussed elsewhere, but briefly 
they are:
> 
>  * Overly verbose when there's no ambiguity.
>  * Ad-hoc prefix is hard to predict (i.e. sometimes abbreviations of the 
data type name are used).

I don't entirely agree with your analysis.
 * fields named `id' or `name' are very likely to clash,
   so that's a bad design (_too_ generic).
 * If you've normalised your data model [**],
   you are very likely to want exactly the same field
   in several records
   (for example employeeId in EmployeeNameAddress,
    and in EmployeePay and in EmployeeTimeSheet.)

[And this use case is what TP/DORF is primarily aimed at.]

[**] Do I need to explain what data model normalisation is? I fear that so-
called XML 'databases' mean academics don't get taught normalisation any 
more(?)

AntC

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
AntC | 27 Apr 12:41 2013
Picon

Re: Diving into the records swamp (possible GSoC project)

> Johan Tibell <johan.tibell <at> gmail.com> writes:
> 
> The discussions about an overhauled record system also involve lots of 
talk about record sub-typing, extensible records, and other more advanced 
features. I'd like to point out that there doesn't seem to be a great 
demand for these features. ...

Sorry, Johan, I really have to disagree with that.

There's lot's of Haskell to SQL interfaces that build on HList and its 
extensible record ideas (HDBC for example).

But the usability is not good (as Petr points out, and as Oleg/Ralf 
admitted back in the paper). The type error messages are long and obscure. 

> ... They might be nice-to-haves or might fall out naturally from a 
solution to the namespacing problem above, but they are in fact not needed 
to solve the common problem people have with the Haskell record system.

"the common problem people have" is that the record system is unusable 
[IMO] so doesn't get 'stretched' to see what other difficulties it has. 
There are all sorts of alternative systems (including Lenses) built with 
Template Haskell (and chewing gum and gaffer tape: that's how desperately 
bad is the current situation ;-).

I'm saying that many people find the Haskell record system 'as is' so 
dysfunctional that they give up on it! I feel strongly that as soon as we 
get past the name collissions, there'll be other blockages to using it.

I'd be interested to hear if there are any who can remember the Trex 
system, and how (un)usable it was?

AntC

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane