Simon Peyton-Jones | 1 Mar 11:25 2013

Fixing "breaking packages"

Duncan, Mark, Simon

 

Here’s another cry of pain: http://fluffynukeit.com/2013/02/reflections-after-a-hard-day-in-haskell-gui-land/

 

“Cabal is a frustratingly constraining tool.  Far too frequently I encountered packages that, when trying to install, would say installing this package will break a dozen others.  If not that, then I often would be notified that the dependencies could not be resolved.”

 

What is frustrating is that we KNOW how to fix this, don’t we?  (Allow multiple installations of package P-3.2.5, each depending on different versions of its dependencies.)  We just need to liberate enough effort to do it.
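To make the parenthetical concrete, here is a minimal Python sketch (a toy model, not Cabal's or GHC's actual data structures) of a package database keyed by a per-build instance id rather than by name and version, so that two builds of P-3.2.5 against different dependencies can coexist. The `Instance` and `PackageDB` names and the instance-id strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    version: str
    deps: tuple       # ids of the exact dependency instances it was built against
    instance_id: str  # hypothetical per-build identifier


class PackageDB:
    """Toy package database keyed by instance id, not by (name, version)."""

    def __init__(self):
        self.instances = {}

    def register(self, inst):
        # A second P-3.2.5 built against different dependencies gets its own
        # key, so it is added alongside the first instead of replacing it.
        self.instances[inst.instance_id] = inst

    def lookup(self, name, version):
        return sorted(i.instance_id for i in self.instances.values()
                      if i.name == name and i.version == version)


db = PackageDB()
db.register(Instance("P", "3.2.5", ("binary-A",), "P-3.2.5-111"))
db.register(Instance("P", "3.2.5", ("binary-B",), "P-3.2.5-222"))
print(db.lookup("P", "3.2.5"))  # ['P-3.2.5-111', 'P-3.2.5-222']
```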

 

Indeed, more people seem to be joining in with GHC/Cabal these days.  How hard would it be to write a detailed description of the implementation changes needed to support side-by-side installations, and project-manage a group to do it?

 

Simon

_______________________________________________
ghc-devs mailing list
ghc-devs@...
http://www.haskell.org/mailman/listinfo/ghc-devs
Gabor Greif | 1 Mar 13:56 2013

Re: Fixing "breaking packages"

What I found very confusing is that the sequence

> cabal install wurbel

and

> cabal unpack wurbel
> cd wurbel-0.0
> cabal install

give us *radically* different results. The first one usually fails
when I have a "hand-patched" package (already successfully installed)
that wurbel depends on. The second one will work.

My interpretation is that the former looks at the transitive
dependency tree and thus ignores my fixes that have led to my current
cabal world. The latter only considers direct dependencies, which is
sufficient to resolve everything, since the necessary packages are
present, and the problematic dependencies (from hackage) get dropped.

What we need for the former is a flag that says:
   "do not transitively chase dependencies of already installed packages"

This would greatly enhance the cabal experience for people who want to
try packages with HEAD GHC and thus may accelerate the adoption rate
of new GHC releases w.r.t. hackage.
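Gabor's interpretation and proposed flag can be modelled with a small sketch. This is a toy in Python, not cabal-install's real solver, and `chase_installed` is a hypothetical name for the flag he asks for: when it is off, an already-installed dependency is accepted as-is and its own dependencies are not re-examined.

```python
def deps_to_solve(pkg, index, installed, chase_installed=True):
    """Return the dependencies of pkg that a solver would have to consider.

    index maps a package name to the names it depends on (as declared on
    Hackage); installed is the set of packages already in the user's world.
    """
    to_solve, stack = set(), list(index[pkg])
    while stack:
        dep = stack.pop()
        if dep in to_solve:
            continue
        to_solve.add(dep)
        # Only descend into an installed package when chasing is enabled.
        if chase_installed or dep not in installed:
            stack.extend(index.get(dep, []))
    return to_solve


# Hypothetical index: the hand-patched package is installed, but its
# Hackage metadata still names a dependency that does not build on HEAD.
index = {"wurbel": ["patched"], "patched": ["broken-on-head"], "broken-on-head": []}
installed = {"patched"}
print(sorted(deps_to_solve("wurbel", index, installed)))                         # ['broken-on-head', 'patched']
print(sorted(deps_to_solve("wurbel", index, installed, chase_installed=False)))  # ['patched']
```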

Just some feedback while at it.

Cheers,

    Gabor

Jan Stolarek | 1 Mar 15:02 2013

Re: Fixing "breaking packages"

I'll chip in with my cries. Yesterday I spent 6 hours trying to track down a build problem in GHC 
HEAD. It turned out that it was caused by having two versions of the binary package, which were
installed to satisfy dependencies and silently broke my installation. I spent another hour today 
fixing things. I would like cabal to prevent such things from ever happening, the same way that 
Linux rpm/deb managers keep packages on the system in a consistent state.

Janek
Ian Lynagh | 1 Mar 15:15 2013

Re: Fixing "breaking packages"

On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
>
> fixing things. I would like cabal to prevent such things from
> ever happening, the same way that 
> Linux rpm/deb managers keep packages on the system in a consistent state.

There's one big difference here: rpm/dpkg are only used to install
things by the system administrator. But in the case of Cabal, a user
could install 'mypackage' (in their user package database) and the next
day the sysadmin could install a different instance of 'mypackage' in
the global database.

Thanks
Ian
Jan Stolarek | 1 Mar 17:13 2013

Re: Fixing "breaking packages"

> There's one big difference here: rpm/dpkg are only used to install
> things by the system administrator. But in the case of Cabal, a user
> could install 'mypackage' (in their user package database) and the next
> day the sysadmin could install a different instance of 'mypackage' in
> the global database.
Then we must come up with a way of handling such a situation. The first idea that comes to my head 
is that by default cabal would only use one database: either the global one managed by the system 
administrator or the local user database. The user should be allowed to override the default 
setting and use both package databases (as it is now) with no consistency guarantees.

Janek
Ian Lynagh | 1 Mar 17:31 2013

Re: Fixing "breaking packages"

On Fri, Mar 01, 2013 at 05:13:58PM +0100, Jan Stolarek wrote:
> > There's one big difference here: rpm/dpkg are only used to install
> > things by the system administrator. But in the case of Cabal, a user
> > could install 'mypackage' (in their user package database) and the next
> > day the sysadmin could install a different instance of 'mypackage' in
> > the global database.
> Then we must come up with a way of handling such a situation. The first idea that comes to my head 
> is that by default cabal would only use one database: either the global one managed by the system 
> administrator or the local user database.

Well, that basically means you can't use the local one, as base is in
the global one.

Even if you made it a 3 database system:
* the 'ghc' database, containing base, directory, etc
* the 'system' database, containing any packages from Debian (for example)
* the 'user' database, containing things you install
where you have the choice of (ghc + system) or (ghc + user) then that
means that you can only use packages from your OS if every single
package you want to use is packaged by the OS.
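A toy Python model of Ian's three-database layout shows the limitation he describes: if only the stacks (ghc + system) and (ghc + user) are allowed, a program needing one OS-packaged library and one user-installed library cannot be satisfied by either. The package names are illustrative only.

```python
GHC_DB = {"base", "directory"}
SYSTEM_DB = {"gtk"}      # e.g. packaged by Debian
USER_DB = {"wurbel"}     # installed by the user with cabal

ALLOWED_STACKS = {
    "ghc+system": [GHC_DB, SYSTEM_DB],
    "ghc+user": [GHC_DB, USER_DB],
}


def usable_stacks(wanted):
    """Names of the allowed database stacks that provide every wanted package."""
    return [label for label, stack in ALLOWED_STACKS.items()
            if wanted <= set().union(*stack)]


print(usable_stacks({"base", "gtk"}))            # ['ghc+system']
print(usable_stacks({"base", "gtk", "wurbel"}))  # []
```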

You could imagine changing things so that packages installed by OS
packages aren't actually visible, and there's some way to add them to
your user database (provided that would keep everything consistent).
Perhaps 'cabal install foo' would first check to see if there is a
suitable 'global' foo that it can just register in its database. It
would be a clunkier workflow, but perhaps better than the status quo.
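Ian's suggested workflow could look roughly like the following Python sketch. All names are hypothetical, and the consistency check he mentions is reduced here to a simple version predicate.

```python
def cabal_install(name, acceptable, global_instances, user_db):
    """Hypothetical 'cabal install name' when OS-installed packages are not
    visible by default.

    global_instances maps a package name to (instance_id, version) pairs that
    the OS has installed; user_db maps names to the instance the user sees.
    """
    for instance_id, version in global_instances.get(name, []):
        if acceptable(version):
            # Reuse the OS-provided build: register it, do not recompile.
            user_db[name] = instance_id
            return "registered " + instance_id
    # Nothing suitable globally: build it (stubbed out here).
    instance_id = name + "-local"
    user_db[name] = instance_id
    return "built " + instance_id


global_instances = {"foo": [("foo-1.2-os", (1, 2))]}
user_db = {}
print(cabal_install("foo", lambda v: v >= (1, 0), global_instances, user_db))  # registered foo-1.2-os
print(cabal_install("bar", lambda v: True, global_instances, user_db))         # built bar-local
```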

Thanks
Ian
Max Bolingbroke | 1 Mar 19:24 2013

Re: Fixing "breaking packages"

On 1 March 2013 14:15, Ian Lynagh <ian@...> wrote:
> On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
>>
>> fixing things. I would like cabal to prevent such things from
>> ever happening, the same way that
>> Linux rpm/deb managers keep packages on the system in a consistent state.
>
> There's one big difference here: rpm/dpkg are only used to install
> things by the system administrator. But in the case of Cabal, a user
> could install 'mypackage' (in their user package database) and the next
> day the sysadmin could install a different instance of 'mypackage' in
> the global database.

I thought that "cabal install" should be viewed as installing an
instance of the requested package by recompiling the whole transitive
closure of dependencies from scratch, in a sort of NixOS-like way.
Given this view, Cabal's reuse of already compiled and installed
packages is purely an optimization that can prevent it from
recompiling some things if it is absolutely certain that doing so is
unnecessary. The problem then is just that Cabal is currently brokenly
unable to handle multiple instances of an installed package with the
same name and version. In this view, the existence of local and global
databases is straightforward: packages should always be installed in
the most-accessible DB to which you have write permissions (for
maximum sharing) and should be sourced from whichever is convenient
when they are required.
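Max's Nix-like reading of `cabal install` can be sketched as follows. This is an illustrative Python model, not how Cabal works today: each build is identified by a hash of all its inputs, including the identities of the exact dependency builds, and reusing an existing build is purely a cache hit. The `plan` format and the helper names are assumptions made for the example.

```python
import hashlib


def build_key(name, version, dep_keys):
    """Identify a build by all of its inputs, including exact dependency builds."""
    text = "\0".join([name, version, *sorted(dep_keys)])
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def install(pkg, plan, store, built):
    """Install pkg and its whole dependency closure.

    plan maps a name to (version, [dependency names]) as chosen by a solver.
    store maps build keys to installed builds; anything already present is
    reused, which is only an optimisation since the key covers every input.
    """
    version, deps = plan[pkg]
    dep_keys = [install(d, plan, store, built) for d in deps]
    key = build_key(pkg, version, dep_keys)
    if key not in store:
        built.append(pkg)  # stands in for actually compiling the package
        store[key] = (pkg, version)
    return key


store, built = {}, []
install("app", {"app": ("1.0", ["lib"]), "lib": ("2.0", [])}, store, built)
install("app", {"app": ("1.0", ["lib"]), "lib": ("2.0", [])}, store, built)  # all cache hits
install("app", {"app": ("1.0", ["lib"]), "lib": ("2.1", [])}, store, built)  # app-1.0 rebuilt too
print(built)  # ['lib', 'app', 'lib', 'app']
```

The third call leaves two instances of app-1.0 side by side in the store, which is exactly the situation the current package database cannot represent.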

There are two complicating factors:
 1. Some packages cannot be recompiled by the user (such as base)
which breaks the mental model a bit. This is probably not too
important.
 2. In this view, does "cabal install mylibrary-1.1" actually do
anything useful? The very next program you write that tries to link
against mylibrary-1.1 may end up requiring a differently-compiled
version because of its own dependency constraints. Of course, "cabal
install myexe-1.1" is perfectly useful and well defined, and it should
be the case that if "cabal install my-dep-1 my-dep-2 ... my-dep-N"
immediately precedes "cabal build" of a package with dependencies
(my-dep-i) then compilation of that package should proceed without
requiring any dependencies to be recompiled.

It seems to me that the ideal mental model for "cabal install
mylibrary-1.1" is that it appends to a global mapping from package
name to version which are essentially the packages that are available
when you do "ghc -package mylibrary" and when using ghci. Cabal's
promise should be that it adds the requested package to the global
mapping and then recompiles *everything* on your system as necessary
in order to make it possible for every package in that global mapping
to be imported simultaneously into a GHCi session.
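The invariant behind this "global mapping" can be stated as a check: loading every selected package together must never require two different instances of the same package name. A minimal Python sketch, with a hypothetical database format mapping instance ids to a name and a list of dependency instance ids:

```python
def closure(inst, db):
    """All instances reachable from inst, including inst itself."""
    seen, stack = set(), [inst]
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(db[i]["deps"])
    return seen


def simultaneously_importable(selected, db):
    """True if every selected instance can be loaded into one GHCi session,
    i.e. their combined closure uses at most one instance per package name."""
    chosen = {}
    for top in selected:
        for i in closure(top, db):
            if chosen.setdefault(db[i]["name"], i) != i:
                return False
    return True


db = {
    "c-1": {"name": "c", "deps": []},
    "c-2": {"name": "c", "deps": []},
    "a-1": {"name": "a", "deps": ["c-1"]},
    "b-1": {"name": "b", "deps": ["c-2"]},
    "b-1-rebuilt": {"name": "b", "deps": ["c-1"]},  # b recompiled against c-1
}
print(simultaneously_importable(["a-1", "b-1"], db))          # False
print(simultaneously_importable(["a-1", "b-1-rebuilt"], db))  # True
```

In the model described above, a `False` result is the cue to recompile packages (here, rebuilding b against c-1) until the check passes.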

This seems like a vaguely sensible model of how things *should* work
to me, unless I've overlooked some horrible complication. I know that
Duncan is pretty keen on Nix so the above plan may even be his
final intention. But of course, saying all that is one thing, but
finding the time to implement it quite another...

Max
Johan Tibell | 1 Mar 19:33 2013

Re: Fixing "breaking packages"

On Fri, Mar 1, 2013 at 10:24 AM, Max Bolingbroke <batterseapower@hotmail.com> wrote:
> I thought that "cabal install" should be viewed as installing an
> instance of the requested package by recompiling the whole transitive
> closure of dependencies from scratch, in a sort of NixOS-like way.
> [...]
> This seems like a vaguely sensible model of how things *should* work
> to me, unless I've overlooked some horrible complication.

This is the model I've been arguing for in e.g. http://blog.johantibell.com/2012/03/cabal-of-my-dreams.html

It's the only model I believe scales to e.g. executables that depend on thousands of packages (which happens for us at work). At that number of dependencies building needs to be hermetic. cabal install <lib> should just be a convenience thing you can use if you e.g. want to poke around a library using ghci or need to have the library available when you're offline.
Ian Lynagh | 1 Mar 19:56 2013

Re: Fixing "breaking packages"

On Fri, Mar 01, 2013 at 10:33:39AM -0800, Johan Tibell wrote:
> 
> It's the only model I believe scales to e.g. executables that depend on
> thousands of packages

Debian has approximately 30,000 packages (although admittedly I don't
know how many are libraries), and only needs a single version of each
package.

Having a single version of each package (with Hackage using a system
similar to Debian's releases and 'testing' to define the sets of package
versions) would make life a lot easier:

Library maintainers don't need to worry so much about keeping packages
working with old versions of their dependencies.

Authors know that they can use any 2 packages together, and not have to
worry about one of those packages depending on foo 1.* and the other
depending on foo 2.*.

The intractable problem of testing all combinations of versions of
dependencies, to ensure that packages really do build in all the
circumstances that they claim they do, disappears.
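Under a single-version scheme, checking a release reduces to one pass over the set: each package's dependency constraints are evaluated against the one version the set contains. A Python sketch with hypothetical package names, versions as tuples and constraints as predicates:

```python
def unsatisfied(package_set, constraints):
    """package_set maps each name to the single version in the release.
    constraints maps (name, version) to {dependency name: predicate}.
    Returns the list of (package, dependency) pairs that fail."""
    problems = []
    for name, version in sorted(package_set.items()):
        for dep, ok in constraints.get((name, version), {}).items():
            if dep not in package_set or not ok(package_set[dep]):
                problems.append((name, dep))
    return problems


release = {"foo": (2, 0), "x": (1, 0), "y": (1, 0)}
constraints = {
    ("x", (1, 0)): {"foo": lambda v: v[0] == 1},  # x still wants foo 1.*
    ("y", (1, 0)): {"foo": lambda v: v[0] == 2},  # y wants foo 2.*
}
print(unsatisfied(release, constraints))  # [('x', 'foo')]
```

Once this list is empty, every package in the release shares the same version of each dependency, so any two of them can be used together.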

Thanks
Ian
Don Stewart | 1 Mar 22:03 2013

Re: Fixing "breaking packages"

Debian has a large team curating the packages.

On Mar 1, 2013 6:56 PM, "Ian Lynagh" <ian@...> wrote:
> Debian has approximately 30,000 packages (although admittedly I don't
> know how many are libraries), and only needs a single version of each
> package.
Simon Marlow | 1 Mar 22:19 2013

Re: Fixing "breaking packages"

On 01/03/13 18:24, Max Bolingbroke wrote:
> On 1 March 2013 14:15, Ian Lynagh <ian@...> wrote:
>> On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
>>>
>>> fixing things. I would like cabal to prevent such things from
>>> ever happening, the same way that
>>> Linux rpm/deb managers keep packages on the system in a consistent state.
>>
>> There's one big difference here: rpm/dpkg are only used to install
>> things by the system administrator. But in the case of Cabal, a user
>> could install 'mypackage' (in their user package database) and the next
>> day the sysadmin could install a different instance of 'mypackage' in
>> the global database.
>
> I thought that "cabal install" should be viewed as installing an
> instance of the requested package by recompiling the whole transitive
> closure of dependencies from scratch, in a sort of NixOS-like way.
> Given this view, Cabal's reuse of already compiled and installed
> packages is purely an optimization that can prevent it from
> recompiling some things if it is absolutely certain that doing so is
> unnecessary. The problem then is just that Cabal is currently brokenly
> unable to handle multiple instances of an installed package with the
> same name and version.

Cabal comes under fire a lot, so I'd like to point out that it's not 
just Cabal that can't handle this right now, GHC can't either :-)

And various people have been thinking a lot about how to fix it, there 
was even a SoC project last year to tackle it.  The design notes are here:

http://hackage.haskell.org/trac/ghc/wiki/Commentary/GSoCMultipleInstances

> In this view, the existence of local and global
> databases is straightforward: packages should always be installed in
> the most-accessible DB to which you have write permissions (for
> maximum sharing) and should be sourced from whichever is convenient
> when they are required.

Right - when the DB is semantically just a cache, it doesn't matter 
whether stuff is installed in the global or local database. All those 
problems just go away.

> It seems to me that the ideal mental model for "cabal install
> mylibrary-1.1" is that it appends to a global mapping from package
> name to version which are essentially the packages that are available
> when you do "ghc -package mylibrary" and when using ghci. Cabals
> promise should be that it adds the requested package to the global
> mapping and then recompiles *everything* on your system as necessary
> in order to make it possible for every package in that global mapping
> to be imported simultaneously into a GHCi session.

The new library that you just asked to be installed might be 
incompatible with some other libraries that you asked to be installed, 
and yet you want to be able to use them both with GHCi (just not at the 
same time).  I don't think we should prevent the user from doing that.

So whether "cabal install foo-1.0" should store some state somewhere 
that says the user prefers foo-1.0 over other versions of foo is an 
interesting question.  (see the section "Simplistic Dependency 
Resolution" on the wiki page for some other thoughts on this).  One 
stance is that "cabal install foo-1.0" should do nothing except populate 
the cache; that is, it is semantically a no-op.  To have it modify some 
state breaks this nice no-op notion.

Cheers,
	Simon
Administrator | 2 Mar 00:27 2013

Re: Fixing "breaking packages"

Thanks for the GSoCMultipleInstances link: it is very informative!

It seems that there is a consensus already on what needs to be done
here: GHC and Cabal must support multiple package instances with the
same name and version (package curation and development sandboxing
have their value above and beyond this too). And there also seems
to be a general design of how this needs to be done.

Assuming that a package instance is identified by
{PackageName}-{Version}-{InstanceId} here are some specific comments:

** What are the precise inputs to generating {InstanceId}? This is a
key question and the rest of the design will flow from it.

** When developing a package or multiple packages there is no point in
keeping track of multiple instances (i.e. don't install). Cabal
sandboxing or a local package db where {InstanceId} is a constant is
enough. Cabal will, however, need to find their other package instance
dependencies in the user db or system db.

> [GSoCMultipleInstance] There are three identifiers:
> [GSoCMultipleInstance] XXXX: the identifier appended to the installation directory so that installed
> packages do not clash with each other
> [GSoCMultipleInstance] YYYY: the InstalledPackageId, which is an identifier used to uniquely
> identify a package in the package database.
> [GSoCMultipleInstance] ZZZZ: the ABI hash derived by GHC after compiling the package
** It would be nice to reduce the complexity here and strive for a
single {InstanceId} that together with {PackageName} and {Version} are
used throughout (libs, package.conf.d, etc)
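One way to read the suggestion of a single {InstanceId} is sketched below in Python. Which inputs go into the hash is exactly the open question raised above, so the inputs chosen here (exact dependency instances, configuration flags and a compiler version string) are assumptions, as are all names.

```python
import hashlib
from typing import NamedTuple


class PackageInstance(NamedTuple):
    name: str
    version: str
    instance_id: str

    def __str__(self):
        # {PackageName}-{Version}-{InstanceId}, usable for the library
        # directory and the package.conf.d entry alike.
        return f"{self.name}-{self.version}-{self.instance_id}"


def make_instance_id(name, version, dep_ids, flags, compiler):
    """Hash an assumed set of build inputs into a short identifier."""
    h = hashlib.sha256()
    # Labels keep a dependency and a flag with the same text distinct;
    # sorting makes the id independent of the order inputs are listed in.
    parts = [f"name:{name}", f"version:{version}", f"compiler:{compiler}"]
    parts += sorted(f"dep:{d}" for d in dep_ids)
    parts += sorted(f"flag:{f}" for f in flags)
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()[:8]


# Hypothetical inputs, for illustration only.
iid = make_instance_id("mypkg", "1.0", ["yourpkg-1.1-9876"], ["-fdebug"], "ghc-7.6.2")
inst = PackageInstance("mypkg", "1.0", iid)
print(str(inst) + ".conf")
```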

> [GSoCMultipleInstance] "we need to distinguish between two packages that have identical ABIs but
> different behaviour (e.g. a bug was fixed)"
** This is why the package version {Version} exists. If a bug was
fixed, a proper release process must increase the package version and
the unique hash/id should not try to fix this.

> [GSoCMultipleInstance] "We define a new Cabal Hash that hashes the compilation inputs (the
> LocalBuildInfo and the contents of the source files)"
** I am not sure why hashing the sources here is important: an added
space character could render a different hash but the object file
could be exactly the same.
** There is a paragraph later in the document that describes what could
be the motivation here: installing unreleased packages (a clean
install vs a dirty install).

> [GSoCMultipleInstance] "ZZZZ is recorded in the package database as a new field abi-hash. When two
> packages have identical ZZZZs then they are interface-compatible, and the user might in the future want
> to change a particular dependency to use a different package but the same ZZZZ. We do not want to make
> this change automatically, because even when two packages have identical ZZZZs, they may have different
> behaviour (e.g. bugfixes)."
** It is not clear to me in what cases will this be useful. If my
.cabal defines that I depend on a version 1.2.3 (or a range) this
assumes these dependencies are interface compatible and the
InstallPlan should be able to pick one that makes most sense (same for
bug fixes). I don't deny that this may be an interesting requirement,
but sounds like secondary to me.
** I am a bit confused by who will be responsible for generating this
{InstanceId}: Cabal or GHC? My initial thought was that GHC should be
responsible for defining the required inputs and generating the
appropriate {InstanceId}, especially since it needs to be able to
traverse package dependencies for linking/ghci. However, maybe this is
not an issue since the package DB will simply be a DAG with specific
{InstanceId} pointers between nodes/dependencies?
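If each package-database entry records the exact instance ids it depends on, finding everything a package needs is a plain graph walk with no solving. A Python sketch using hypothetical instance ids and assuming the graph is acyclic:

```python
def dependencies_first(root, db):
    """Return root and its transitive dependencies, each instance listed
    after all of its own dependencies (assumes db is acyclic).

    db maps an instance id to the instance ids it was built against.
    """
    order, done = [], set()

    def visit(inst):
        if inst in done:
            return
        done.add(inst)
        for dep in db[inst]:
            visit(dep)
        order.append(inst)

    visit(root)
    return order


db = {
    "mypkg-1.0-1234": ["yourpkg-1.1-9876", "base-4.6.0.1-0000"],
    "yourpkg-1.1-9876": ["base-4.6.0.1-0000"],
    "base-4.6.0.1-0000": [],
}
print(dependencies_first("mypkg-1.0-1234", db))
# ['base-4.6.0.1-0000', 'yourpkg-1.1-9876', 'mypkg-1.0-1234']
```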

> [GSoCMultipleInstance] The best tool for determining suitable package instances to use as build inputs
> is cabal-install. However, in practice there will be many situations where users will probably not have
> the full cabal-install functionality available:
> [GSoCMultipleInstance] invoking GHCi from the command line,
> [GSoCMultipleInstance] invoking GHC directly from the command line,
> [GSoCMultipleInstance] invoking the configure phase of Cabal (without using cabal-install).
** If the package DB stores a graph of
{PackageName}-{Version}-{InstanceId} packages connected to other
specific package instances (e.g. the mypkg-1.0-1234 package instance
depends on the yourpkg-1.1-9876 package instance), navigating this DAG
is straightforward and I don't see why cabal-install would be needed
here. Maybe the issue is selecting the first package instance based on
a given {PackageName}-{Version} or just {PackageName}? Maybe the
design here should make sure that there are some minimal attributes
that GHC/GHCi can query to decide what initial package instance to
pick.
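The remaining choice, which instance to start from when only a name (or a name and version) is given, could rest on a couple of recorded attributes. In this Python sketch the attributes (`version` as a tuple and an `installed_at` counter) and the rule "highest version, then most recently installed" are purely hypothetical; once the root is chosen, the dependency graph fixes everything else.

```python
def pick_instance(db, name, version=None):
    """Choose a starting instance for a request such as 'use package foo'.

    db is a list of dicts with 'id', 'name', 'version' and 'installed_at'.
    Hypothetical rule: highest version wins, ties go to the newest install.
    """
    candidates = [e for e in db if e["name"] == name
                  and (version is None or e["version"] == version)]
    if not candidates:
        raise LookupError(f"no instance of {name}")
    return max(candidates, key=lambda e: (e["version"], e["installed_at"]))["id"]


db = [
    {"id": "foo-1.0-aaaa", "name": "foo", "version": (1, 0), "installed_at": 1},
    {"id": "foo-1.1-bbbb", "name": "foo", "version": (1, 1), "installed_at": 5},
    {"id": "foo-1.1-cccc", "name": "foo", "version": (1, 1), "installed_at": 9},
]
print(pick_instance(db, "foo"))          # foo-1.1-cccc
print(pick_instance(db, "foo", (1, 0)))  # foo-1.0-aaaa
```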
Simon Peyton-Jones | 1 Mar 23:02 2013

RE: Fixing "breaking packages"

|  I thought that "cabal install" should be viewed as installing an
|  instance of the requested package by recompiling the whole transitive
|  closure of dependencies from scratch, in a sort of NixOS-like way.
|  Given this view, Cabal's reuse of already compiled and installed
|  packages is purely an optimization that can prevent it from
|  recompiling some things if it is absolutely certain that doing so is
|  unnecessary. The problem then is just that Cabal is currently brokenly
|  unable to handle multiple instances of an installed package with the
|  same name and version. 

I believe that what you describe is precisely The Glorious Plan
http://hackage.haskell.org/trac/ghc/wiki/Commentary/GSoCMultipleInstances

It's just that no one has time to do it.  That's why I was raising it (again) to see if anyone has any bright ideas
for un-gluing this particular log-jam.

Simon
Stephen Paul Weber | 1 Mar 21:53 2013

Re: Fixing "breaking packages"

Somebody claiming to be Simon Peyton-Jones wrote:
>"Far too frequently I encountered packages that, when trying to install, 
>would say installing this package will break a dozen others."

I also get this message sometimes, but I never consider it a problem.  
I just add all the packages that would be broken also to the command line, 
and that informs the constraints solver a bit more (and rebuilds some 
things) and then works.

I've actually wished for a switch that just does this, but cut-n-paste the 
packages it tells me about works fine.
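The packages named in that message are, roughly, everything built against an instance that the install would replace. A Python sketch of how such a list arises, and of what the wished-for switch might do, using a hypothetical database that maps each installed package to the ones it was built against:

```python
def would_break(replaced, db):
    """Installed packages that depend, directly or transitively, on the
    package about to be replaced (db maps a package to its dependencies)."""
    broken, changed = set(), True
    while changed:
        changed = False
        for pkg, deps in db.items():
            if pkg not in broken and any(d == replaced or d in broken for d in deps):
                broken.add(pkg)
                changed = True
    return broken


def install_command(target, replaced, db):
    """Hypothetical switch: add the broken packages to the command line so
    the solver rebuilds them too, as done here by cut-and-paste."""
    return " ".join(["cabal", "install", target, *sorted(would_break(replaced, db))])


db = {"binary": [], "a": ["binary"], "b": ["a"], "c": []}
print(install_command("wurbel", "binary", db))  # cabal install wurbel a b
```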

-- 
Stephen Paul Weber, @singpolyma
See <http://singpolyma.net> for how I prefer to be contacted
edition right joseph
