Best practices for execution of untrusted code

Question

I have a project where I need to allow users to run arbitrary, untrusted Python code (a bit like this) against my server. I'm fairly new to Python and I'd like to avoid making any mistakes that introduce security holes or other vulnerabilities into the system. Are there any best practices, recommended reading, or other pointers you can give me to make my service usable but not abusable?

Here's what I've considered so far:

Remove __builtins__ from the exec context to prohibit use of potentially dangerous packages like os. Users will only be able to use packages I provide to them.
Use threads to enforce a reasonable timeout.
I'd like to limit the total amount of memory that can be allocated within the exec context, but I'm not sure if it's even possible.

There are some alternatives to a straight exec, but I'm not sure which of these would be helpful here:

Using an ast.NodeVisitor to catch any attempt to access unsafe objects. But what objects should I prohibit?
Searching for any double underscores in the input (less graceful than the option above).
Using PyPy or something similar to sandbox the code.

NOTE: I'm aware that there is at least one JavaScript-based interpreter. That will not work in my scenario.

Some starting points for study: blog.delroth.net/2013/03/…, nedbatchelder.com/blog/201206/eval_really_is_dangerous.html, nedbatchelder.com/blog/201302/… and nedbatchelder.com/blog/201302/finding_python_3_builtins.html about breaking out of sandboxes.
@MartijnPieters: Excellent. Probably worthy of an answer, if you summarize each one.
@MartijnPieters Thanks, I'm reading through them now. Highly informative!
Consider also: garbage left on the disk, the network (do not let them send spam or whatever), and permissions on other files (reading your files). Even eject in a while loop can destroy the CD mechanics... I would go for virtualization (jails, KVM, you name it) or at least a user with almost no privileges. Set a reasonable nice value and memory limit to favor your own programs.

Martijn Pieters · Accepted Answer · 2013-03-23 08:58:39Z

Python sandboxing is hard. Python is inherently introspectable, at multiple levels.

This also means that you can find the factory methods for specific types from those types themselves, and construct new low-level objects, which will be run directly by the interpreter without limitation.
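As a small, harmless illustration of that introspection (a sketch, not an exploit): even when `eval()` is given an empty `__builtins__`, an expression can start from a plain literal, walk up to `object`, and list every class loaded in the interpreter. The variable names here are made up for the example.

```python
# Illustration only: an empty __builtins__ does not hide the type hierarchy.
restricted_globals = {"__builtins__": {}}

# Start from a harmless tuple literal, walk up to `object`, then back down
# to every class that currently subclasses it.
expression = "().__class__.__bases__[0].__subclasses__()"
reachable_classes = eval(expression, restricted_globals)

# No builtins were needed to get here; `type` itself is among the results.
print(type in reachable_classes)
print(len(reachable_classes))
```

From a list like this an attacker goes looking for a class whose methods or globals lead back to something useful, which is what the write-ups below do.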

Here are some examples of finding creative ways to break out of Python sandboxes:

Ned Batchelder starts with a demonstration of how dangerous eval() really is. eval() is often used to execute Python expressions, as a primitive and naive sandbox for one-liners.

He then went on to try to apply the same principles to Python 3, eventually succeeding in breaking out, with some helpful pointers.
Pierre Bourdon uses similar techniques to hack a Python system at a hackathon.

The basic idea is always to find a way to create base Python types (functions and classes) and break out of the shell by getting the Python interpreter to execute arbitrary (unchecked!) bytecode.

The same and more applies to the exec statement (exec() function in Python 3).

So, you want to:

Strictly control the byte compilation of the Python code, or at least post-process the bytecode to remove any access to names starting with underscores.

This requires intimate knowledge of how the Python interpreter works and how Python bytecode is structured. Code objects are nested: a module's bytecode only covers the top level of statements, while each function and class has its own bytecode sequence plus metadata, which in turn contains further code objects for nested functions and classes.
You need to whitelist modules that can be used. Carefully.

A Python module contains references to other modules. If you import os, there is a local name os in your module namespace that refers to the os module. This can lead a determined attacker to modules that can help them break out of the sandbox. The pickle module, for example, lets you load arbitrary code objects, so if any path through whitelisted modules leads to the pickle module, you still have a problem.
You need to strictly limit the time quotas. Even the most neutered code can still attempt to run forever, tying up your resources.
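For the time quota in the last point, one possible sketch is to run the code in a separate interpreter process and let `subprocess.run` kill it when the deadline passes. A thread cannot be forcibly stopped from Python, whereas a child process can. The helper name `run_with_timeout` is hypothetical, and this only enforces a time limit; it is not a sandbox by itself.

```python
import subprocess
import sys


def run_with_timeout(source, seconds):
    """Run `source` in a fresh interpreter; return (finished, stdout)."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, completed.stdout


if __name__ == "__main__":
    print(run_with_timeout("print(1 + 1)", 10))
    print(run_with_timeout("while True: pass", 1))
```

The second call never finishes on its own, so it comes back as not finished once the one-second quota runs out.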

Take a look at RestrictedPython, which attempts to give you the strict bytecode control. RestrictedPython transforms Python code into something that lets you control what names, modules and objects are permissible in Python 2.3 through to 2.7.

Whether RestrictedPython is secure enough for your purposes depends on the policies you implement. Not allowing access to names starting with an underscore and strictly whitelisting the modules would be a start.
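To make those two policies concrete, here is a minimal sketch using only the standard `ast` module. It is not RestrictedPython's API; `ALLOWED_MODULES` and `policy_violations` are hypothetical names for this example, and a static check like this is only a first filter, not a sandbox on its own.

```python
import ast

ALLOWED_MODULES = {"math"}  # hypothetical whitelist for this example


def policy_violations(source):
    """Return a list of underscore names and non-whitelisted imports."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            violations.append(f"name {node.id!r}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            violations.append(f"attribute {node.attr!r}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # Compare the top-level package, e.g. "os" for "os.path".
                if alias.name.split(".")[0] not in ALLOWED_MODULES:
                    violations.append(f"import {alias.name!r}")
        elif isinstance(node, ast.ImportFrom):
            top_level = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected outright.
            if node.level or top_level not in ALLOWED_MODULES:
                violations.append(f"import from {node.module!r}")
    return violations


if __name__ == "__main__":
    print(policy_violations("import math\nx = math.sqrt(4)"))
    print(policy_violations("().__class__"))
    print(policy_violations("import os"))
```

Code that passes a check like this still needs the module whitelist enforced at run time and the time limits described above.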

ddyer · Answer 2 · 2013-03-22 21:29:09Z

There is no way you can do this safely.

If you wanted to do something like this safely, you'd have to start with your own implementation of Python that runs in a completely controlled environment, preferably in the user's browser instead of on your system. You might start with Jython (Python for Java) and package it as a Java applet. Since it would be running in the Java sandbox, on the user's machine, your system would be reasonably safe.

The question of safety was for his server, not for the client's machine. Java's potential security hazards, like those for any other web technology, are that the server could be used to deploy programs hazardous to the client.

asked	1 month ago
viewed	210 times
active	1 month ago
