Every few years, the Open Web Application Security Project (OWASP) ranks the almost critical web application security risks. Since the start report, injection risks have always been on top. Amidst all injection types, SQL injection is one of the virtually common attack vectors, and arguably the most dangerous. As Python is one of the most popular programming languages in the world, knowing how to protect against Python SQL injection is critical.

In this tutorial, you're going to learn:

  • What Python SQL injection is and how to prevent it
  • How to compose queries with both literals and identifiers every bit parameters
  • How to safely execute queries in a database

This tutorial is suited for users of all database engines. The examples hither use PostgreSQL, but the results can be reproduced in other database direction systems (such as SQLite, MySQL, Microsoft SQL Server, Oracle, and and so on).

Understanding Python SQL Injection

SQL Injection attacks are such a common security vulnerability that the legendary xkcd webcomic devoted a comic to it:

A humorous webcomic by xkcd about the potential effect of SQL injection
"Exploits of a Mom" (Prototype: xkcd)

Generating and executing SQL queries is a common task. Even so, companies around the world often brand horrible mistakes when it comes to composing SQL statements. While the ORM layer usually composes SQL queries, sometimes you have to write your own.

When you employ Python to execute these queries directly into a database, there'south a risk you could make mistakes that might compromise your arrangement. In this tutorial, you lot'll larn how to successfully implement functions that compose dynamic SQL queries without putting your system at hazard for Python SQL injection.

Setting Up a Database

To get started, you're going to prepare a fresh PostgreSQL database and populate information technology with data. Throughout the tutorial, y'all'll use this database to witness firsthand how Python SQL injection works.

Creating a Database

Get-go, open your trounce and create a new PostgreSQL database owned by the user postgres:

                                                  $                  createdb -O postgres psycopgtest                              

Here yous used the command line selection -O to set the possessor of the database to the user postgres. You also specified the proper name of the database, which is psycopgtest.

Your new database is ready to go! You tin can connect to information technology using psql:

                                                  $                  psql -U postgres -d psycopgtest                  psql (11.2, server 10.5)                  Type "help" for help.                              

Y'all're now connected to the database psycopgtest as the user postgres. This user is also the database owner, so yous'll accept read permissions on every table in the database.

Creating a Table With Information

Next, you need to create a table with some user information and add together data to it:

                                                  psycopgtest=#                  CREATE                  Tabular array                  users                  (                  username                  varchar                  (                  xxx                  ),                  admin                  boolean                  );                  CREATE Table                  psycopgtest=#                  INSERT                  INTO                  users                  (                  username                  ,                  admin                  )                  VALUES                  (                  'ran'                  ,                  truthful                  ),                  (                  'haki'                  ,                  false                  );                  INSERT 0 2                  psycopgtest=#                  SELECT                  *                  FROM                  users                  ;                                      username | admin                  ----------+-------                                      ran      | t                                      haki     | f                  (two rows)                              

The tabular array has ii columns: username and admin. The admin column indicates whether or non a user has administrative privileges. Your goal is to target the admin field and try to abuse it.

Setting Up a Python Virtual Surroundings

Now that you have a database, information technology's time to gear up your Python environment. For stride-by-step instructions on how to practice this, check out Python Virtual Environments: A Primer.

Create your virtual environment in a new directory:

                                                  (~/src)                  $                  mkdir psycopgtest                  (~/src)                  $                                    cd                  psycopgtest                  (~/src/psycopgtest)                  $                  python3 -1000 venv venv                              

Later on you run this control, a new directory called venv will exist created. This directory will store all the packages you install inside the virtual surround.

Connecting to the Database

To connect to a database in Python, you demand a database adapter. Most database adapters follow version 2.0 of the Python Database API Specification PEP 249. Every major database engine has a leading adapter:

To connect to a PostgreSQL database, you lot'll need to install Psycopg, which is the most popular adapter for PostgreSQL in Python. Django ORM uses it by default, and it'south besides supported past SQLAlchemy.

In your final, activate the virtual environment and employ pip to install psycopg:

                                                  (~/src/psycopgtest)                  $                                    source                  venv/bin/activate                  (~/src/psycopgtest)                  $                  python -m pip install psycopg2>=                  2.eight.0                  Collecting psycopg2                                      Using buried https://....                                      psycopg2-ii.8.two.tar.gz                  Installing nerveless packages: psycopg2                                      Running setup.py install for psycopg2 ... done                  Successfully installed psycopg2-ii.8.ii                              

At present y'all're ready to create a connectedness to your database. Hither'south the start of your Python script:

                                                  import                  psycopg2                  connexion                  =                  psycopg2                  .                  connect                  (                  host                  =                  "localhost"                  ,                  database                  =                  "psycopgtest"                  ,                  user                  =                  "postgres"                  ,                  countersign                  =                  None                  ,                  )                  connection                  .                  set_session                  (                  autocommit                  =                  True                  )                              

You used psycopg2.connect() to create the connection. This function accepts the post-obit arguments:

  • host is the IP accost or the DNS of the server where your database is located. In this case, the host is your local automobile, or localhost.

  • database is the proper name of the database to connect to. You desire to connect to the database you created earlier, psycopgtest.

  • user is a user with permissions for the database. In this case, you want to connect to the database as the owner, so y'all pass the user postgres.

  • password is the password for whoever you specified in user. In nigh development environments, users can connect to the local database without a password.

After setting upwards the connectedness, you lot configured the session with autocommit=Truthful. Activating autocommit means you won't have to manually manage transactions past issuing a commit or rollback. This is the default behavior in nigh ORMs. Y'all use this behavior here every bit well then that you can focus on composing SQL queries instead of managing transactions.

Executing a Query

Now that yous take a connection to the database, you're ready to execute a query:

>>>

                                                  >>>                                    with                  connection                  .                  cursor                  ()                  as                  cursor                  :                  ...                                    cursor                  .                  execute                  (                  'SELECT COUNT(*) FROM users'                  )                  ...                                    result                  =                  cursor                  .                  fetchone                  ()                  ...                                    impress                  (                  result                  )                  (2,)                              

You used the connection object to create a cursor. Simply like a file in Python, cursor is implemented equally a context manager. When you lot create the context, a cursor is opened for you to use to send commands to the database. When the context exits, the cursor closes and y'all can no longer use information technology.

While inside the context, yous used cursor to execute a query and fetch the results. In this case, you issued a query to count the rows in the users tabular array. To fetch the result from the query, you executed cursor.fetchone() and received a tuple. Since the query can only return one result, you used fetchone(). If the query were to return more than ane upshot, so y'all'd need to either iterate over cursor or use ane of the other fetch* methods.

Using Query Parameters in SQL

In the previous department, you created a database, established a connection to it, and executed a query. The query y'all used was static. In other words, it had no parameters. Now you'll starting time to use parameters in your queries.

First, you lot're going to implement a part that checks whether or non a user is an admin. is_admin() accepts a username and returns that user's admin status:

                                            # BAD EXAMPLE. DON'T Exercise THIS!                def                is_admin                (                username                :                str                )                ->                bool                :                with                connection                .                cursor                ()                every bit                cursor                :                cursor                .                execute                (                """                                  SELECT                                  admin                                  FROM                                  users                                  WHERE                                  username = '                %s                '                                  """                %                username                )                event                =                cursor                .                fetchone                ()                admin                ,                =                result                render                admin                          

This function executes a query to fetch the value of the admin column for a given username. You used fetchone() to return a tuple with a single outcome. Then, you unpacked this tuple into the variable admin. To test your function, cheque some usernames:

>>>

                                            >>>                                is_admin                (                'haki'                )                Imitation                >>>                                is_admin                (                'ran'                )                Truthful                          

So far so good. The function returned the expected effect for both users. Simply what about non-existing user? Have a look at this Python traceback:

>>>

                                            >>>                                is_admin                (                'foo'                )                Traceback (well-nigh contempo telephone call concluding):                File                "<stdin>", line                ane, in                <module>                File                "<stdin>", line                12, in                is_admin                TypeError:                cannot unpack not-iterable NoneType object                          

When the user does non exist, a TypeError is raised. This is considering .fetchone() returns None when no results are found, and unpacking None raises a TypeError. The only place you can unpack a tuple is where you populate admin from result.

To handle not-existing users, create a special instance for when outcome is None:

                                            # BAD Example. DON'T Exercise THIS!                def                is_admin                (                username                :                str                )                ->                bool                :                with                connection                .                cursor                ()                as                cursor                :                cursor                .                execute                (                """                                  SELECT                                  admin                                  FROM                                  users                                  WHERE                                  username = '                %s                '                                  """                %                username                )                outcome                =                cursor                .                fetchone                ()                                  if                  result                  is                  None                  :                                                  # User does not exist                                                  return                  Faux                                admin                ,                =                issue                return                admin                          

Here, you've added a special case for handling None. If username does not exist, and so the function should return Imitation. Once more, test the office on some users:

>>>

                                            >>>                                is_admin                (                'haki'                )                Faux                >>>                                is_admin                (                'ran'                )                True                >>>                                is_admin                (                'foo'                )                False                          

Dandy! The function tin can now handle non-existing usernames as well.

Exploiting Query Parameters With Python SQL Injection

In the previous example, you used string interpolation to generate a query. Then, you executed the query and sent the resulting string directly to the database. However, at that place's something you may have overlooked during this procedure.

Think back to the username argument you passed to is_admin(). What exactly does this variable correspond? You might assume that username is but a cord that represents an actual user's proper noun. As y'all're about to see, though, an intruder tin can easily exploit this kind of oversight and cause major harm by performing Python SQL injection.

Endeavor to check if the post-obit user is an admin or non:

>>>

                                            >>>                                is_admin                (                "'; select true; --"                )                True                          

Expect… What just happened?

Let'due south take some other await at the implementation. Print out the actual query existence executed in the database:

>>>

                                            >>>                                print                (                "select admin from users where username = '                %due south                '"                %                "'; select true; --"                )                select admin from users where username = ''; select true; --'                          

The resulting text contains three statements. To understand exactly how Python SQL injection works, you need to inspect each part individually. The first argument is as follows:

                                            select                admin                from                users                where                username                =                ''                ;                          

This is your intended query. The semicolon (;) terminates the query, and so the result of this query does not matter. Next upwardly is the 2d statement:

This statement was synthetic by the intruder. Information technology'south designed to e'er render True.

Lastly, yous see this brusque flake of code:

This snippet defuses anything that comes after information technology. The intruder added the comment symbol (--) to turn everything you might have put after the last placeholder into a annotate.

When you execute the function with this argument, it will always render True . If, for example, you use this function in your login folio, an intruder could log in with the username '; select true; --, and they'll be granted access.

If you think this is bad, it could become worse! Intruders with knowledge of your table structure can utilize Python SQL injection to cause permanent harm. For example, the intruder can inject an update statement to modify the data in the database:

>>>

                                            >>>                                is_admin                (                'haki'                )                Imitation                >>>                                is_admin                (                "'; update users set admin = 'true' where username = 'haki'; select truthful; --"                )                True                >>>                                is_admin                (                'haki'                )                True                          

Let'southward break it down again:

This snippet terminates the query, just like in the previous injection. The side by side statement is as follows:

                                            update                users                prepare                admin                =                'true'                where                username                =                'haki'                ;                          

This department updates admin to true for user haki.

Finally, there's this code snippet:

Every bit in the previous example, this piece returns true and comments out everything that follows it.

Why is this worse? Well, if the intruder manages to execute the function with this input, then user haki will go an admin:

                                            psycopgtest=#                select                *                from                users                ;                                  username | admin                ----------+-------                                  ran      | t                                                      haki     | t                                (2 rows)                          

The intruder no longer has to use the hack. They can just log in with the username haki. (If the intruder actually wanted to cause harm, then they could even effect a DROP DATABASE control.)

Before you forget, restore haki back to its original land:

                                            psycopgtest=#                update                users                fix                admin                =                false                where                username                =                'haki'                ;                UPDATE 1                          

And so, why is this happening? Well, what do you know about the username statement? You know it should exist a string representing the username, but you don't actually check or enforce this assertion. This can be dangerous! Information technology's exactly what attackers are looking for when they try to hack your organization.

Crafting Prophylactic Query Parameters

In the previous section, you lot saw how an intruder can exploit your organization and gain admin permissions by using a carefully crafted string. The issue was that you lot allowed the value passed from the customer to be executed directly to the database, without performing any sort of check or validation. SQL injections rely on this type of vulnerability.

Any time user input is used in a database query, there's a possible vulnerability for SQL injection. The fundamental to preventing Python SQL injection is to make sure the value is being used as the programmer intended. In the previous instance, you intended for username to be used as a cord. In reality, information technology was used as a raw SQL argument.

To make sure values are used as they're intended, you lot need to escape the value. For instance, to prevent intruders from injecting raw SQL in the place of a string argument, you can escape quotation marks:

>>>

                                                  >>>                                    # BAD EXAMPLE. DON'T Do THIS!                  >>>                                    username                  =                  username                  .                  replace                  (                  "'"                  ,                  "''"                  )                              

This is merely one example. At that place are a lot of special characters and scenarios to recollect nearly when trying to preclude Python SQL injection. Lucky for you, modernistic database adapters, come with congenital-in tools for preventing Python SQL injection by using query parameters. These are used instead of evidently string interpolation to compose a query with parameters.

Now that you have a amend understanding of the vulnerability, you're ready to rewrite the part using query parameters instead of string interpolation:

                                                                      1                  def                  is_admin                  (                  username                  :                  str                  )                  ->                  bool                  :                                      2                  with                  connection                  .                  cursor                  ()                  as                  cursor                  :                                      three                  cursor                  .                  execute                  (                  """                                      4                                      SELECT                                      5                                      admin                                      half dozen                                      FROM                                      7                                      users                                      8                                      WHERE                                      ix                                                            username =                                        %(username)s                                                        10                                      """                  ,                  {                  11                                      'username'                    :                    username                                    12                  })                  xiii                  result                  =                  cursor                  .                  fetchone                  ()                  14                  15                  if                  effect                  is                  None                  :                  xvi                  # User does not exist                  17                  return                  False                  xviii                  19                  admin                  ,                  =                  result                  20                  return                  admin                              

Hither'due south what's unlike in this example:

  • In line 9, you used a named parameter username to betoken where the username should get. Notice how the parameter username is no longer surrounded by single quotation marks.

  • In line 11, you passed the value of username as the 2d statement to cursor.execute(). The connectedness will use the type and value of username when executing the query in the database.

To examination this function, try some valid and invalid values, including the dangerous string from before:

>>>

                                                  >>>                                    is_admin                  (                  'haki'                  )                  False                  >>>                                    is_admin                  (                  'ran'                  )                  True                  >>>                                    is_admin                  (                  'foo'                  )                  Fake                  >>>                                    is_admin                  (                  "'; select true; --"                  )                  Imitation                              

Amazing! The part returned the expected issue for all values. What'south more, the dangerous string no longer works. To understand why, you lot tin inspect the query generated by execute():

>>>

                                                  >>>                                    with                  connection                  .                  cursor                  ()                  as                  cursor                  :                  ...                                    cursor                  .                  execute                  (                  """                  ...                                                        SELECT                  ...                                                        admin                  ...                                                        FROM                  ...                                                        users                  ...                                                        WHERE                  ...                                                        username =                                    %(username)due south                                    ...                                                        """                  ,                  {                  ...                                    'username'                  :                  "'; select truthful; --"                  ...                                    })                  ...                                    print                  (                  cursor                  .                  query                  .                  decode                  (                  'utf-8'                  ))                  SELECT                                      admin                  FROM                                      users                  WHERE                                      username = '''; select true; --'                              

The connexion treated the value of username as a string and escaped whatsoever characters that might cease the string and introduce Python SQL injection.

Passing Safe Query Parameters

Database adapters usually offer several ways to pass query parameters. Named placeholders are usually the best for readability, just some implementations might benefit from using other options.

Let's take a quick look at some of the right and wrong ways to utilise query parameters. The following lawmaking cake shows the types of queries you'll want to avert:

                                                  # BAD EXAMPLES. DON'T DO THIS!                  cursor                  .                  execute                  (                  "SELECT admin FROM users WHERE username = '"                  +                  username                  +                  '");                  cursor                  .                  execute                  (                  "SELECT admin FROM users WHERE username = '                  %s                  '                                    % u                  sername);                  cursor                  .                  execute                  (                  "SELECT admin FROM users WHERE username = '                  {}                  '"                  .                  format                  (                  username                  ));                  cursor                  .                  execute                  (                  f                  "SELECT admin FROM users WHERE username = '                  {                  username                  }                  '"                  );                              

Each of these statements passes username from the customer directly to the database, without performing any sort of check or validation. This sort of code is ripe for inviting Python SQL injection.

In contrast, these types of queries should be safe for you to execute:

                                                  # Condom EXAMPLES. DO THIS!                  cursor                  .                  execute                  (                  "SELECT admin FROM users WHERE username =                                    %s                  '"                  ,                  (                  username                  ,                  ));                  cursor                  .                  execute                  (                  "SELECT admin FROM users WHERE username =                                    %(username)s                  "                  ,                  {                  'username'                  :                  username                  });                              

In these statements, username is passed equally a named parameter. Now, the database will use the specified blazon and value of username when executing the query, offering protection from Python SQL injection.

Using SQL Limerick

And so far you've used parameters for literals. Literals are values such every bit numbers, strings, and dates. Just what if you have a utilise case that requires composing a dissimilar query—one where the parameter is something else, like a tabular array or column proper name?

Inspired by the previous instance, permit'south implement a office that accepts the name of a table and returns the number of rows in that table:

                                            # BAD Example. DON'T DO THIS!                def                count_rows                (                table_name                :                str                )                ->                int                :                with                connection                .                cursor                ()                equally                cursor                :                cursor                .                execute                (                """                                  SELECT                                  count(*)                                  FROM                                                %(table_name)southward                                                  """                ,                {                'table_name'                :                table_name                ,                })                outcome                =                cursor                .                fetchone                ()                rowcount                ,                =                result                return                rowcount                          

Try to execute the role on your users table:

>>>

                                            Traceback (virtually recent phone call last):                File                "<stdin>", line                1, in                <module>                File                "<stdin>", line                nine, in                count_rows                psycopg2.errors.SyntaxError:                syntax error at or near "'users'"                LINE v:                 'users'                                  ^                          

The control failed to generate the SQL. As you've seen already, the database adapter treats the variable as a cord or a literal. A tabular array name, however, is non a plainly string. This is where SQL composition comes in.

You already know information technology'due south not prophylactic to use string interpolation to etch SQL. Luckily, Psycopg provides a module called psycopg.sql to help you safely compose SQL queries. Permit's rewrite the function using psycopg.sql.SQL():

                                            from                psycopg2                import                sql                def                count_rows                (                table_name                :                str                )                ->                int                :                with                connection                .                cursor                ()                every bit                cursor                :                stmt                =                sql                .                SQL                (                """                                  SELECT                                  count(*)                                  FROM                                                {table_name}                                                  """                )                .                format                (                table_name                =                sql                .                Identifier                (                table_name                ),                )                cursor                .                execute                (                stmt                )                result                =                cursor                .                fetchone                ()                rowcount                ,                =                result                render                rowcount                          

At that place are two differences in this implementation. Start, y'all used sql.SQL() to compose the query. Then, you lot used sql.Identifier() to annotate the argument value table_name. (An identifier is a column or table name.)

Now, attempt executing the function on the users tabular array:

>>>

                                            >>>                                count_rows                (                'users'                )                2                          

Great! Next, allow'south run into what happens when the tabular array does non exist:

>>>

                                            >>>                                count_rows                (                'foo'                )                Traceback (most contempo phone call last):                File                "<stdin>", line                1, in                <module>                File                "<stdin>", line                11, in                count_rows                psycopg2.errors.UndefinedTable:                relation "foo" does not exist                LINE 5:                 "foo"                                  ^                          

The part throws the UndefinedTable exception. In the post-obit steps, y'all'll use this exception as an indication that your function is safe from a Python SQL injection attack.

To put information technology all together, add an selection to count rows in the tabular array upwards to a certain limit. This feature might be useful for very large tables. To implement this, add a LIMIT clause to the query, along with query parameters for the limit'south value:

                                            from                psycopg2                import                sql                def                count_rows                (                table_name                :                str                ,                limit                :                int                )                ->                int                :                with                connection                .                cursor                ()                as                cursor                :                stmt                =                sql                .                SQL                (                """                                  SELECT                                  COUNT(*)                                  FROM (                                  SELECT                                  1                                  FROM                                                {table_name}                                                                      LIMIT                                                                                      {limit}                                                                    ) AS limit_query                                  """                )                .                format                (                table_name                =                sql                .                Identifier                (                table_name                ),                                  limit                  =                  sql                  .                  Literal                  (                  limit                  ),                                )                cursor                .                execute                (                stmt                )                result                =                cursor                .                fetchone                ()                rowcount                ,                =                result                return                rowcount                          

In this code block, you annotated limit using sql.Literal(). As in the previous example, psycopg volition bind all query parameters as literals when using the simple approach. Still, when using sql.SQL(), you need to explicitly annotate each parameter using either sql.Identifier() or sql.Literal().

Execute the function to make certain that it works:

>>>

                                            >>>                                count_rows                (                'users'                ,                one                )                ane                >>>                                count_rows                (                'users'                ,                10                )                two                          

At present that yous meet the function is working, make sure it's as well safety:

>>>

                                            >>>                                count_rows                (                "(select 1) as foo; update users set up admin = true where proper name = 'haki'; --"                ,                i                )                Traceback (about contempo phone call concluding):                File                "<stdin>", line                one, in                <module>                File                "<stdin>", line                18, in                count_rows                psycopg2.errors.UndefinedTable:                relation "(select 1) as foo; update users gear up admin = true where name = '" does not exist                LINE 8:                     "(select 1) as foo; update users set adm...                                  ^                          

This traceback shows that psycopg escaped the value, and the database treated it as a table name. Since a table with this name doesn't be, an UndefinedTable exception was raised and you were not hacked!

Determination

Y'all've successfully implemented a office that composes dynamic SQL without putting your system at risk for Python SQL injection! You've used both literals and identifiers in your query without compromising security.

Y'all've learned:

  • What Python SQL injection is and how it can be exploited
  • How to prevent Python SQL injection using query parameters
  • How to safely compose SQL statements that use literals and identifiers as parameters

Y'all're now able to create programs that can withstand attacks from the exterior. Go along and thwart the hackers!