Christopher Done | 11 Oct 22:30 2013

Compiling arbitrary Haskell code

Is there a definitive list of things in GHC that are unsafe to
_compile_ if I were to take an arbitrary module and compile it?

E.g. off the top of my head, things that might be dangerous:

* TemplateHaskell/QuasiQuotes -- obviously
* Are rules safe?
* #includes -- I presume there's some security risk with including any old file?
* FFI -- speaks for itself

I'm interested in the idea of compiling Haskell code on lpaste.org,
for Core, rule firings, maybe even TH expansion, etc. When sandboxing
code that I'm running, it's really easy if I whitelist what code is
available (parsing with HSE, whitelisting imports, extensions). The
problem of infinite loops or too much allocation is fairly
straightforwardly solved by techniques similar to those applied in mueval.
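
A minimal sketch of the time-limit idea, assuming only `timeout` from
System.Timeout and `evaluate` from Control.Exception in base. This is not
mueval's actual implementation, and `evalWithin` is a name made up for
illustration. GHC can only interrupt a thread at allocation points, so a tight
non-allocating loop may outlive the limit, and memory use needs its own limit
on top of this.

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- | Force a value to weak head normal form, giving up after the given
-- number of microseconds. Returns Nothing if the limit was hit first.
-- (Hypothetical helper, not part of mueval.)
evalWithin :: Int -> a -> IO (Maybe a)
evalWithin micros x = timeout micros (evaluate x)
```

Only the outermost constructor is forced here; a caller that wants the whole
result computed inside the time limit has to force it more deeply before
returning it.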

SafeHaskell helps a lot here, but suppose that I want to also allow
TemplateHaskell, GeneralizedNewtypeDeriving and stuff like that,
because a lot of real code uses those. They only seem to be restricted
to prevent cheeky messing with APIs in ways the authors of the APIs
didn't want -- but that shouldn't necessarily be a security problem
for my system, should it? Ideally I'd very strictly whitelist
which modules are allowed to be used (e.g. a version of TH that
doesn't have runIO), and extensions, and then compile any code that
uses them.
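
The strict whitelist described above could be sketched roughly as follows.
`ModuleSummary` and both lists are hypothetical: they stand in for whatever a
real parser such as haskell-src-exts would extract from a module, and the
entries are examples rather than a vetted policy.

```haskell
-- | Hypothetical summary of a parsed module: the extensions it enables
-- and the modules it imports. A real parser would fill this in.
data ModuleSummary = ModuleSummary
  { msExtensions :: [String]
  , msImports    :: [String]
  } deriving Show

-- Example whitelists (assumptions, not a recommended policy).
allowedExtensions :: [String]
allowedExtensions = ["BangPatterns", "LambdaCase", "TupleSections"]

allowedImports :: [String]
allowedImports = ["Prelude", "Data.Char", "Data.List", "Data.Maybe"]

-- | Everything the module asks for that the whitelist does not grant.
violations :: ModuleSummary -> [String]
violations m =
     [ "extension " ++ e | e <- msExtensions m, e `notElem` allowedExtensions ]
  ++ [ "import " ++ i    | i <- msImports m,    i `notElem` allowedImports ]

-- | Deny by default: accept only modules with no violations at all.
accepted :: ModuleSummary -> Bool
accepted = null . violations
```

Anything not listed is rejected, so a newly added GHC extension or library
module stays unavailable until someone deliberately allows it.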

I'd rather not have to set up a VM just to compile Haskell code safely.
I'm willing to put some time in to investigate it, but if there's
already previous work done for this, I'd appreciate any links.

Jason Dagit | 11 Oct 23:41 2013

Re: Compiling arbitrary Haskell code




On Fri, Oct 11, 2013 at 1:30 PM, Christopher Done <chrisdone <at> gmail.com> wrote:
> Is there a definitive list of things in GHC that are unsafe to
> _compile_ if I were to take an arbitrary module and compile it?
>
> E.g. off the top of my head, things that might be dangerous:
>
> * TemplateHaskell/QuasiQuotes -- obviously
> * Are rules safe?
> * #includes -- I presume there's some security risk with including any old file?
> * FFI -- speaks for itself

It really depends on the security properties you want to maintain; those should inform your policy. For example: denial of service vs. leaking information (like a password database) vs. letting your machine become part of a botnet. There are lots of things to consider here.

For example, lambdabot has always disallowed IO and thus needs to disallow unsafeCoerce/unsafePerformIO/unsafeInterleaveIO and anything else that introduces a "backdoor" in the type system. I think the list you have above is a good start, but wouldn't be complete for lambdabot.
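
To make the "backdoor" concrete, here is a small sketch (all names are made up
for illustration) of a function whose type claims purity while evaluating it
performs IO. The side effect is a harmless counter here, but it could just as
well be a file read or a network call, which is why a bot that forbids IO also
has to forbid these functions.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global mutable cell, created with the usual NOINLINE idiom.
{-# NOINLINE callCount #-}
callCount :: IORef Int
callCount = unsafePerformIO (newIORef 0)

-- The type says "pure function", yet evaluating it mutates callCount.
{-# NOINLINE double #-}
double :: Int -> Int
double x = unsafePerformIO $ do
  modifyIORef' callCount (+ 1)
  pure (x * 2)

-- Read the counter back (an ordinary, honest IO action).
calls :: IO Int
calls = readIORef callCount
```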
 

> I'm interested in the idea of compiling Haskell code on lpaste.org,
> for Core, rule firings, maybe even TH expansion, etc. When sandboxing
> code that I'm running, it's really easy if I whitelist what code is
> available (parsing with HSE, whitelisting imports, extensions). The
> problem of infinite loops or too much allocation is fairly
> straightforwardly solved by techniques similar to those applied in mueval.

What type of sandboxing do you plan to use and what limitations does it have? For example, chroot jails can be defeated.
 

> SafeHaskell helps a lot here, but suppose that I want to also allow
> TemplateHaskell, GeneralizedNewtypeDeriving and stuff like that,
> because a lot of real code uses those. They only seem to be restricted
> to prevent cheeky messing with APIs in ways the authors of the APIs
> didn't want -- but that shouldn't necessarily be a security problem
> for my system, should it? Ideally I'd very strictly whitelist
> which modules are allowed to be used (e.g. a version of TH that
> doesn't have runIO), and extensions, and then compile any code that
> uses them.

GND can be used to cause a segfault. I don't know if it can be used to cause a more serious exploit, but I would be concerned that it can. Then again, if you're already allowing TH or arbitrary IO then those are probably much easier places to attack so it may not matter.
 

> I'd rather not have to set up a VM just to compile Haskell code safely.
> I'm willing to put some time in to investigate it, but if there's
> already previous work done for this, I'd appreciate any links.

I don't know how well it's documented, but lambdabot has a long history of restricting the Haskell it accepts to make it safe. Other things to look at: Google Native Client (to see how they approach sandboxing) and geordi, the C++ IRC bot.

In the Native Client case they do fancy tricks with segment registers (to control where the sandboxed process can write to memory) and intercept system calls in the outer part of the process. They handle the case where everything runs in one process in one address space. You could imagine porting the GHC RTS to run in Native Client (didn't someone start on that?) and then using that to sandbox all your Haskell evaluation.
 

> At the end of the day, there's always just supporting a subset of
> Haskell using SafeHaskell. I'm just curious about the more general
> case, for use-cases similar to my own.

I think SafeHaskell is a reasonable starting place, but I don't think it gives you a really strong guarantee yet. Everything that is inferred safe probably is (I don't know of any exploits with that part of SafeHaskell). In practice, you'll probably also want to use some trusted packages, but that requires that none of the stuff you trust is exploitable.
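
For reference, opting a module into Safe Haskell is a one-line pragma. A small
sketch follows; `sortedUnique` is only a placeholder function. With `Safe`
enabled, GHC refuses imports of modules that are marked unsafe, such as
System.IO.Unsafe, so enabling the commented-out import below would make
compilation fail.

```haskell
{-# LANGUAGE Safe #-}

import Data.List (nub, sort)
-- import System.IO.Unsafe (unsafePerformIO)  -- rejected under Safe

-- | Ordinary pure code compiles unchanged under the Safe pragma.
sortedUnique :: [Int] -> [Int]
sortedUnique = sort . nub
```

Whether the modules that do get imported deserve trust is a separate question;
that is the trusted-packages caveat above.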

I hope that helps,
Jason
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Aleksey Khudyakov | 12 Oct 00:29 2013

Re: Compiling arbitrary Haskell code

On 12.10.2013 00:30, Christopher Done wrote:
> Is there a definitive list of things in GHC that are unsafe to
> _compile_ if I were to take an arbitrary module and compile it?
>
> E.g. off the top of my head, things that might be dangerous:
>
> * TemplateHaskell/QuasiQuotes -- obviously
> * Are rules safe?
> * #includes -- I presume there's some security risk with including any old file?
> * FFI -- speaks for itself
>
> I'm interested in the idea of compiling Haskell code on lpaste.org,
> for Core, rule firings, maybe even TH expansion, etc. When sandboxing
> code that I'm running, it's really easy if I whitelist what code is
> available (parsing with HSE, whitelisting imports, extensions). The
> problem of infinite loops or too much allocation is fairly
> straightforwardly solved by techniques similar to those applied in mueval.
>
The OPTIONS_GHC pragma. You can set a custom preprocessor, for example bash,
and the program is then interpreted as a bash script. I think sandboxing the
compiler is a must. There are just too many handles and hooks to cater to all
possible uses. Some of them must be exploitable.
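
One partial mitigation, sketched under the assumption that submissions arrive
as plain source text: reject any file that contains an OPTIONS-style pragma
before it reaches GHC. `hasOptionsPragma` is a made-up helper and a purely
textual check, so it can also fire on pragma-like text inside comments or
strings. It complements a sandbox and does not replace one.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf, stripPrefix, tails)
import Data.Maybe (mapMaybe)

-- | True if the text contains a pragma whose name starts with OPTIONS
-- (OPTIONS_GHC, the older OPTIONS, ...). Pragma names are matched
-- case-insensitively, and whitespace after "{-#" is skipped.
hasOptionsPragma :: String -> Bool
hasOptionsPragma src =
  any (("options" `isPrefixOf`) . dropWhile isSpace) pragmaBodies
  where
    lowered      = map toLower src
    -- Text following every "{-#" opener found anywhere in the source.
    pragmaBodies = mapMaybe (stripPrefix "{-#") (tails lowered)
```

For instance, a file that starts with `{-# OPTIONS_GHC -F -pgmF bash #-}` is
flagged, while one that only has `{-# LANGUAGE CPP #-}` is not. Flags passed on
the GHC command line are outside the scope of this check.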
Johan Tibell | 12 Oct 01:19 2013

Re: Compiling arbitrary Haskell code

Whatever guarantees GHC offers (e.g. using Safe Haskell), I would always run things like these in a sandbox. It's much better for security to disallow everything and then whitelist some things (e.g. let the sandbox communicate with the rest of the world in some limited way) than the other way around.

Same goes for running untrusted code.


Christopher Done | 12 Oct 01:36 2013

Re: Compiling arbitrary Haskell code

On 12 October 2013 01:19, Johan Tibell <johan.tibell <at> gmail.com> wrote:
> Whatever guarantees GHC offers (e.g. using Safe Haskell), I would always run
> things like these in a sandbox. It's much better for security to disallow
> everything and then whitelist some things (e.g. let the sandbox communicate
> with the rest of the world in some limited way) than the other way around.

Yeah, the impression I'm getting is that when compiling pretty much
anything other than simple expressions (a la lambdabot), all
bets are off.
Daniil Frumin | 14 Oct 11:58 2013

Re: Compiling arbitrary Haskell code

For those who are interested (and I already chatted with Chris on IRC), I've implemented a pastebin that is able (among other things) to run arbitrary Haskell code: http://paste.hskll.org/
I've also developed a 'restricted-workers' library for managing processes that should run in a secured environment. I've described some of my endeavors in a blog post: http://parenz.wordpress.com/2013/07/15/interactive-diagrams-gsoc-progress-report/

Bottom line: proper restrictions are hard, the necessary tools operate on a low level, and there are some caveats too.





--
Sincerely yours,
-- Daniil