I have a project where I need to allow users to run arbitrary, untrusted Python code (a bit like this) against my server. I'm fairly new to Python and I'd like to avoid making any mistakes that introduce security holes or other vulnerabilities into the system. Are there any best practices, recommended reading, or other pointers you can give me to make my service usable but not abusable?

Here's what I've considered so far:

  • Remove __builtins__ from the exec context to prohibit use of potentially dangerous packages like os. Users will only be able to use packages I provide to them.
  • Use threads to enforce a reasonable timeout.
  • I'd like to limit the total amount of memory that can be allocated within the exec context, but I'm not sure if it's even possible.

There are some alternatives to a straight exec, but I'm not sure which of these would be helpful here:

  • Using an ast.NodeVisitor to catch any attempt to access unsafe objects. But what objects should I prohibit?
  • Searching for any double-underscores in the input. (less graceful than the above option).
  • Using PyPy or something similar to sandbox the code.

NOTE: I'm aware that there is at least one JavaScript-based interpreter. That will not work in my scenario.

@MartijnPieters: Excellent. Probably worthy of an answer, if you summarize each one. – Robert Harvey Mar 22 at 21:14
@MartijnPieters Thanks, I'm reading through them now. Highly informative! – p.s.w.g Mar 22 at 21:20
Consider also: garbage left on the disk, the network (do not let them send spam or whatever), and permissions to other files (reading your files). Even running eject in a while loop can destroy a CD drive's mechanics... I would go for virtualization (jails or some KVM, you name it) or at least a user with almost no privileges. Set a reasonable nice value and memory limit to favor your own programs. – kyticka Mar 22 at 21:27

2 Answers

up vote 5 down vote accepted

Python sandboxing is hard. Python is inherently introspectable, at multiple levels.

This also means that you can find the factory methods for specific types from those types themselves, and construct new low-level objects, which will be run directly by the interpreter without limitation.
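To make the introspection point concrete, here is a minimal sketch, assuming Python 3 and an exec context whose builtins have been emptied as the question proposes. The variable names (env, untrusted, fn_type, and so on) are made up for illustration. The untrusted snippet uses no builtins and no imports, yet it still reaches the function and code types and builds a fresh function object from a code object:

```python
import types

# Execution context with every builtin removed (the question's first idea).
env = {"__builtins__": {}}

# Hypothetical untrusted input: only literals and attribute access.
untrusted = """
fn_type = (lambda: 0).__class__              # the function type
code_type = (lambda: 0).__code__.__class__   # the code-object type
clone = fn_type((lambda: 41 + 1).__code__, {})  # build a new function
result = clone()
"""

exec(untrusted, env)

# The sandboxed code obtained the real interpreter-level types.
print(env["fn_type"] is types.FunctionType)
print(env["code_type"] is types.CodeType)
print(env["result"])
```

Nothing here is harmful by itself, but once code can obtain the code-object type, it can try to assemble code objects the sandbox never inspected.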

Here are some examples of finding creative ways to break out of Python sandboxes:

The basic idea is always to find a way to create base Python types (functions and classes) and break out of the shell by getting the Python interpreter to execute arbitrary (unchecked!) bytecode.

The same and more applies to the exec statement (exec() function in Python 3).

So, you want to:

  • Strictly control the byte compilation of the Python code, or at least post-process the bytecode to remove any access to names starting with underscores.

    This requires intimate knowledge of how the Python interpreter works and how Python bytecode is structured. Code objects are nested: a module's bytecode only covers the top level of statements, and each function and class has its own bytecode sequence plus metadata, which in turn contains further code objects for any nested functions and classes.

  • You need to whitelist modules that can be used. Carefully.

    A Python module contains references to other modules. If you import os, there is a local name os in your module namespace that refers to the os module. This can lead a determined attacker to modules that can help them break out of the sandbox. The pickle module, for example, can be made to execute arbitrary code while loading data, so if any path through whitelisted modules leads to the pickle module, you still have a problem.

  • You need to strictly limit the time quotas. Even the most neutered code can still attempt to run forever, tying up your resources.
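As a rough sketch of the time-quota point, one option is to run the untrusted code in a separate interpreter process and kill it when it overruns. A child process can be terminated from outside, whereas a Python thread cannot be forcibly stopped. The helper name run_with_timeout is made up for this example, and the sketch assumes Python 3.7 or later. It limits wall-clock time only; it does nothing about memory, disk, or network access.

```python
import subprocess
import sys


def run_with_timeout(source, seconds):
    """Run source in a child interpreter; return (finished, stdout).

    finished is False when the child had to be killed for overrunning.
    """
    try:
        completed = subprocess.run(
            # -I starts the child in isolated mode (no user site-packages,
            # PYTHON* environment variables ignored).
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, completed.stdout


print(run_with_timeout("print(6 * 7)", 10))
print(run_with_timeout("while True: pass", 1))
```

The second call returns after roughly one second with finished set to False, because the infinite loop is killed along with its process.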

Take a look at RestrictedPython, which attempts to give you the strict bytecode control. RestrictedPython transforms Python code into something that lets you control what names, modules and objects are permissible in Python 2.3 through to 2.7.

Whether RestrictedPython is secure enough for your purposes depends on the policies you implement. Not allowing access to names starting with an underscore and strictly whitelisting the modules would be a start.
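To illustrate what an underscore policy has to cope with, here is a small sketch in plain Python 3. It is not RestrictedPython's API, and the helper names iter_code_objects and underscore_names are invented for this example. It compiles the source, walks the nested code objects (each function body is a separate code object stored in its parent's constants), and reports every global or attribute name that starts with an underscore:

```python
import types


def iter_code_objects(code):
    """Yield a code object and, recursively, every code object nested in it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def underscore_names(source):
    """Return the underscore-prefixed names referenced anywhere in source."""
    top_level = compile(source, "<untrusted>", "exec")
    hits = set()
    for code in iter_code_objects(top_level):
        # co_names holds the global and attribute names the bytecode uses.
        for name in code.co_names:
            if name.startswith("_"):
                hits.add(name)
    return hits


sample = (
    "def outer():\n"
    "    def inner():\n"
    "        return ().__class__\n"
    "    return inner\n"
)
print(underscore_names(sample))
```

A check of the top-level code object alone would miss the __class__ access here, because it sits two levels down. The sketch is deliberately incomplete: a name passed as a string, for instance to getattr, is stored as a constant rather than in co_names and would not be reported.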

share|improve this answer

There is no way you can do this safely.

If you wanted to do something like this safely, you'd have to start by having your own implementation of Python that runs in a completely controlled environment, preferably in the user's browser instead of on your system. You might start with Jython (Python for Java) and package it as a Java applet. Since it would be running in the Java sandbox, on the user's machine, your system would be reasonably safe.

I loled for "java" and "safe" – grasGendarme Mar 23 at 9:51
The question of safety was for his server, not for the client's machine. Java's potential security hazards, like those for any other web technology, are that the server could be used to deploy programs hazardous to the client. – ddyer Mar 25 at 0:48
