Web Application Development

Jakub Klinkovský

:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering

Academic Year 2024-2025

Security

Basic security concepts: confidentiality, integrity, and availability of data.

Information security is a broad area that includes activities such as risk analysis, vulnerability analysis, and implementation of various measures.

Every security measure must balance four main factors:

  1. productivity (usability)
  2. cost (financial or time-related)
  3. effectiveness
  4. value of the information to be protected

Security Measures for Web Applications

Areas and techniques we will discuss:

  1. Secure data transfer – HTTPS protocol
  2. Authentication and session
  3. Cross-site scripting (XSS)
  4. Cross-site request forgery (CSRF)
  5. Clickjacking
  6. SQL injection
  7. Validation of uploaded data
  8. Validation of request header data
  9. Security response headers

See Security in Django for more information. Do you know any other techniques?

HTTPS Protocol

HTTPS is a secure transport protocol (HTTP over TLS). It enables:

  1. authentication of the visited server (certificate signed by a certification authority),
  2. encryption of transferred data (ensuring integrity and confidentiality).

Certificates are issued for a given domain (or a group of domains) for a limited period, after which it is necessary to request renewal. Depending on the certification authority, the entire process can be automated and performed for free (e.g. using Let's Encrypt).

For applications deployed in practice, it is always better to prefer the HTTPS protocol over HTTP, as it prevents eavesdropping and data tampering (both from the server to the client and from the client to the server).

Django Framework Configuration for HTTPS

In the settings.py file, you can set a number of parameters related to the HTTPS protocol:

  • SECURE_SSL_REDIRECT – redirection of HTTP requests to HTTPS
  • SECURE_PROXY_SSL_HEADER – for configuration when the application is behind a proxy server that communicates with the outside world via HTTPS, but with the application via HTTP (affects the trustworthiness of the X-Forwarded-Proto header)
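  • parameters for HTTP Strict Transport Security (HSTS), i.e. SECURE_HSTS_SECONDS, SECURE_HSTS_INCLUDE_SUBDOMAINS and SECURE_HSTS_PRELOAD – for applications that should be visited only via HTTPS (the Strict-Transport-Security header tells the browser to use HTTPS directly on later visits, which eliminates the initial HTTP request)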
  • secure cookies (see later)
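A minimal sketch of how these parameters might look in settings.py, assuming the application runs behind a reverse proxy that sets the X-Forwarded-Proto header. The concrete values (for example the one-year HSTS duration) are illustrative assumptions, not recommendations from the lecture:

```python
# settings.py (fragment): illustrative values only

# Redirect every plain-HTTP request to HTTPS.
SECURE_SSL_REDIRECT = True

# Trust the X-Forwarded-Proto header set by the proxy. This is only safe if
# the proxy always overwrites the header on incoming requests.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

# HTTP Strict Transport Security (sent as the Strict-Transport-Security header).
SECURE_HSTS_SECONDS = 60 * 60 * 24 * 365  # one year (assumed value)
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_HSTS_PRELOAD = False
```

A long HSTS duration is hard to take back once browsers have cached it, so it may be reasonable to start with a small value while testing.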

Authentication and Session

Authentication should always take place using the HTTPS protocol. It is easiest to set up the entire application to run strictly over HTTPS.

When using cookies, it is necessary to ensure their security:

  • set the SESSION_COOKIE_SECURE parameter to True (the browser should send them only via HTTPS – important, as the initial request may be via HTTP)
  • cookies should not be shared with untrusted subdomains – a cookie set for a higher-level domain applies to all of its subdomains, and a subdomain can set cookies for the whole domain (see Session security)
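The cookie-related points above might translate into a settings.py fragment like the following sketch. The setting names are Django's; the choice of values is an assumption for an application served only over HTTPS from a single host:

```python
# settings.py (fragment): cookie security, illustrative values

# Send the session cookie only over HTTPS connections.
SESSION_COOKIE_SECURE = True

# Same restriction for the CSRF cookie.
CSRF_COOKIE_SECURE = True

# Hide the session cookie from client-side JavaScript (Django's default).
SESSION_COOKIE_HTTPONLY = True

# None (Django's default) produces a host-only cookie that is not shared
# with subdomains.
SESSION_COOKIE_DOMAIN = None
```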

Password (In)Security

https://explainxkcd.com/wiki/index.php/936:_Password_Strength

Cross-site scripting (XSS)

XSS attacks involve inserting client-side scripts (e.g. JavaScript) into an HTML document of another web application.

Example in the context of a Django application:

<style class={{ var }}>...</style>

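If var contains, for example, the string class1 onmouseover=javascript:func(), the result may be unauthorized JavaScript execution, depending on how the browser handles the imperfect HTML: the value contains no characters that HTML escaping would change, so it ends up as an extra attribute. Quoting the attribute value (class="{{ var }}") fixes this case. Other examples are on Wikipedia.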

Django provides automatic HTML escaping in templates, but it is necessary to pay attention to cases where this protection is disabled (e.g. when displaying HTML data stored in the database).

Untrusted data can come from anywhere (URL parameters, forms, database, cookies, uploaded files,...).
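To illustrate why escaping alone does not help with the unquoted class attribute shown earlier, here is a small sketch that uses html.escape from the Python standard library as a stand-in for template auto-escaping (an assumption made for the demonstration; it is not Django's template engine). The helper names are hypothetical:

```python
from html import escape  # stdlib stand-in for template auto-escaping


def render_unquoted(value: str) -> str:
    # Mirrors <style class={{ var }}>: the value is escaped but not quoted.
    return f"<style class={escape(value)}>...</style>"


def render_quoted(value: str) -> str:
    # Same template with the attribute value placed inside double quotes.
    return f'<style class="{escape(value)}">...</style>'


payload = "class1 onmouseover=javascript:func()"

# Escaping changes nothing here, so the payload adds a second attribute.
unquoted = render_unquoted(payload)

# Inside quotes the whole payload stays part of the class value.
quoted = render_quoted(payload)

# An attempt to break out of the quotes is neutralized by escaping.
breakout = render_quoted('x" onmouseover="func()')

print(unquoted)
print(quoted)
print(breakout)
```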

Cross-site request forgery (CSRF)

Cross-site request forgery (CSRF) attacks involve sending unauthorized commands that originate from a user that the server application trusts.

Examples:

  • HTTP GET method: trivial exploitation (just a link with a URL to a page with parameters)
    This method should never be used to change the application state!
  • HTTP POST method: a form in HTML is needed, exploitation possibilities depend on the data encoding method and potential application protection (see below)
  • other HTTP methods (PUT, DELETE,...) can be used in a browser only if the principles of same-origin policy and cross-origin resource sharing are met, which prevent CSRF attacks (but the application can explicitly deactivate them)

Example of a CSRF attack using the POST method

  • application A contains an insecure form: https://app1.com/form.php
  • page B, where the attacker placed their own HTML code, contains a form that sends data to application A (e.g., using hidden input fields):
    <form action="https://app1.com/form.php" method="post">
        <input type="hidden" name="password" value="not-secret">
        ...
    </form>
    
  • if the attacker can place their own JavaScript on page B, they don't even need a form to send a request to application A

Note that it does not matter whether application A uses HTTPS or whether the form requires authentication: when sending the request, the browser attaches the cookies of application A, including the session ID of the logged-in user (unless the SameSite cookie attribute restricts this).

Protection against CSRF in the Django framework

Django provides protection against most variants of CSRF attacks, but the application must use it. For form protection, it uses a token-based scheme (in the default configuration a variant of the double-submit cookie pattern; with CSRF_USE_SESSIONS = True the secret is stored in the session instead of a cookie):

  • CsrfViewMiddleware sets a CSRF cookie – a random secret value that other sites cannot read; the value changes each time a user logs in
  • a hidden value with the name csrfmiddlewaretoken is added to forms sent by the POST method (see the {% csrf_token %} tag used in templates)
  • when a POST request is received, validation takes place:
    • the value of csrfmiddlewaretoken must match the secret in the CSRF cookie
    • the form value is the secret scrambled with a fresh random mask on every response (protection against BREACH attacks), so multiple different tokens can be valid for a common cookie value
  • Django also checks the Origin header (if present) and, for HTTPS requests without an Origin header, the Referer header, see How it works.
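The idea of comparing a form token with a cookie secret, while allowing many different tokens per secret, can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the XOR masking and the function names are hypothetical and differ from Django's actual implementation.

```python
import hmac
import secrets


def new_secret() -> str:
    # Random value that would be stored in the CSRF cookie.
    return secrets.token_urlsafe(32)


def mask(secret: str) -> str:
    # Hide the secret behind a fresh random pad; the token carries pad + result.
    raw = secret.encode("ascii")
    pad = secrets.token_bytes(len(raw))
    masked = bytes(a ^ b for a, b in zip(raw, pad))
    return pad.hex() + masked.hex()


def unmask(token: str) -> bytes:
    # Reverse of mask(); raises ValueError for tokens that are not valid hex.
    raw = bytes.fromhex(token)
    half = len(raw) // 2
    pad, masked = raw[:half], raw[half:]
    return bytes(a ^ b for a, b in zip(pad, masked))


def is_valid(cookie_secret: str, form_token: str) -> bool:
    # Accept the request only if the form token unmasks to the cookie secret.
    try:
        candidate = unmask(form_token)
    except ValueError:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, cookie_secret.encode("ascii"))


secret = new_secret()
token_a = mask(secret)
token_b = mask(secret)
print(is_valid(secret, token_a), is_valid(secret, token_b))
```

A page on another site can make the browser send the cookie, but it cannot read the cookie value, so it cannot produce a matching form token.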

CSRF protection parameters in the Django framework

CSRF protection has several parameters, e.g., CSRF_COOKIE_SECURE ensures that CSRF cookies are sent only over HTTPS.

This protection method has its limitations:

  • subdomains can set cookies for higher-level domains, setting a suitable pair of cookie + token can bypass protection
  • the only reliable solution is to ensure that subdomains are only accessible to trusted parties (or at least that subdomains cannot set cookies, e.g., static websites)

More details: see OWASP cheatsheet.

Clickjacking

Clickjacking is a type of attack where the attacker convinces the user to click on something other than what the user perceives in their browser (or email client). Technically, it can be done in many ways, e.g.:

  • using cross-site scripting and hidden layers or elements in HTML
  • using frames (<frame> and <iframe> tags in HTML)

Django protection uses the X-Frame-Options header, where the value DENY prevents the use of content in any frame.

A more modern header, Content-Security-Policy, allows greater flexibility (but Django 5.2 has no built-in setting for it; it can be set, for example, with the third-party django-csp package).

More details: see OWASP cheatsheet.

SQL injection

SQL injection is an attack similar to cross-site scripting, where the attacker constructs malicious input data that allows them to run arbitrary SQL code, e.g. an input value that closes a string literal in the query and appends further SQL.

Django protection is ensured by using the QuerySet object, which constructs so-called parametrized queries (input data is separated from the query structure and the database system takes care of their escaping).

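If you write your own SQL queries, pass user input as query parameters (e.g. the params argument of Manager.raw() or cursor.execute()) instead of formatting it into the SQL string; otherwise you must pay attention to proper escaping.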
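The difference between formatting input into the SQL string and passing it as a parameter can be shown with the sqlite3 module from the standard library. This sketch does not use Django; the table and the malicious input are made up for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Input that closes the string literal and appends an always-true condition.
malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the query structure.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is passed separately and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query returns every row in the table, while the parametrized query looks for a user literally named like the malicious string and finds none.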

Validation of uploaded data

Security principles for uploaded files:

  • limit the maximum size (risk of DoS attacks)
  • prevent the web server from executing uploaded files as code (risk of running arbitrary code)
  • validate data that will be displayed within an HTML document
    • e.g., an uploaded file that contains a valid PNG header and further HTML code can allow cross-site scripting attacks when displayed in an <img> tag
    • there is no foolproof solution at the framework level
    • providing uploaded files from a different domain (e.g., usercontent-example.com for example.com) can utilize the blocking of cross-site scripting attacks thanks to the same-origin principle
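The following sketch shows why a size limit combined with a check of the file signature is not sufficient on its own. The size limit and the function name are hypothetical; the PNG signature is the standard eight-byte one:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # hypothetical limit of 2 MiB


def naive_upload_check(data: bytes) -> bool:
    """Accept data that is small enough and starts with the PNG signature."""
    return len(data) <= MAX_UPLOAD_BYTES and data.startswith(PNG_SIGNATURE)


# A valid PNG signature followed by HTML passes the naive check.
polyglot = PNG_SIGNATURE + b"<script>alert(1)</script>"

# The size limit rejects a file that is too large.
oversized = PNG_SIGNATURE + b"\x00" * MAX_UPLOAD_BYTES

print(naive_upload_check(polyglot), naive_upload_check(oversized))
```

Since such a file is accepted, serving uploads from a separate domain remains a useful second line of defense.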

Validation of data in request headers

In certain cases, Django uses data from the Host header to construct a URL.

Django performs validation:

  • escaping to prevent cross-site scripting
  • comparison with the list of ALLOWED_HOSTS in the settings.py file

Note: validated data is only available using the request.get_host() method. In the request.META dictionary, the original, unvalidated value is stored.
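The comparison with ALLOWED_HOSTS can be approximated as follows. This is a simplified sketch of the matching rules (exact match, a leading dot for a domain with its subdomains, and "*" for anything), not Django's code; the function name is hypothetical and port handling is omitted:

```python
def host_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """Return True if host matches any pattern in allowed_hosts."""
    host = host.lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # ".example.com" matches example.com and all of its subdomains.
        if pattern.startswith(".") and (
            host == pattern[1:] or host.endswith(pattern)
        ):
            return True
    return False


ALLOWED_HOSTS = ["example.com", ".example.org"]
print(host_allowed("www.example.org", ALLOWED_HOSTS))
print(host_allowed("evil.com", ALLOWED_HOSTS))
```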

Other security response headers

In many cases, security uses parameters set in response headers (see, for example, the mentioned Strict-Transport-Security or X-Frame-Options).

Other important headers configurable in Django security middleware:

  • Referrer-Policy – instructions for the browser on when (not) to send the Referer header (typo is already in the standard 🙃) for links on the displayed page
  • Cross-Origin-Opener-Policy – allows setting the level of isolation in the browser between top-level documents and cross-origin documents
  • X-Content-Type-Options – the value nosniff prevents the browser from guessing the content type based on the data itself and using the result instead of the Content-Type header

More details: see OWASP cheatsheet.
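As a sketch, the headers mentioned in this lecture correspond to the following settings.py parameters. The values shown are, to the best of my knowledge, the defaults of recent Django versions; treat them as a starting point to verify against the documentation of your version:

```python
# settings.py (fragment): security response headers

# Referrer-Policy header
SECURE_REFERRER_POLICY = "same-origin"

# Cross-Origin-Opener-Policy header
SECURE_CROSS_ORIGIN_OPENER_POLICY = "same-origin"

# X-Content-Type-Options: nosniff
SECURE_CONTENT_TYPE_NOSNIFF = True

# X-Frame-Options header (clickjacking protection)
X_FRAME_OPTIONS = "DENY"
```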

Summary of security in Django

For the development of secure applications, Django uses:

  • cookie security
  • setting security headers in sent responses
    (they configure the behavior of compatible browsers)

And it also provides:

  • tools for data validation (models, forms, headers, images)
  • automatic escaping of characters in templates (HTML) and database queries (SQL)

Deployment

Deployment means running the application on a server for use in the real world.

What is needed:

  • a server with a public IP address and domain
    • own hardware or cloud provider or web hosting
  • installation and configuration of a web server
  • configuration of the application for deployment

Details: How to deploy Django.

Note: for the semestral projects, deployment is not necessary. You can try it just out of curiosity or for your own needs.

Web hosting for Python

Our faculty web hosting does not allow the deployment of Python applications (it only has basic support for ASP and PHP scripts).

Examples of services available for free (with many limitations) and with Python support:

  • and several others...

Interface between the application and the web server

For Python, two standards are used:

  • WSGI (Web Server Gateway Interface) – traditional interface for synchronous applications
  • ASGI (Asynchronous Server Gateway Interface) – modern interface using asynchronous operations in the Python language (keywords async and await)

However, our application does not use the asynchronous features of the Django framework, so it is sufficient to use WSGI.
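To make the WSGI interface more concrete: a WSGI application is a callable that receives the request environment and a start_response callback and returns an iterable of bytes. The sketch below is a toy application, not the one Django generates, and call_app is a hypothetical helper that plays the role of the server:

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI application returning a plain-text response."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call a WSGI application the way a server would (simplified)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/polls/")
print(status, body)
```

Servers such as Gunicorn or uWSGI do the same thing at scale: they import the application object and call it for each incoming request.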

Configuring the application for a WSGI server

Examples of servers: Gunicorn, uWSGI, Apache + mod_wsgi

General procedure:

  1. run the server in the directory where the manage.py script is located (this script will not be run during deployment, though)
  2. configure the server to find the object representing the WSGI application
    • in the default project, it is the application object in the mysite/wsgi.py file (i.e., the mysite.wsgi module)
  3. configure the settings module intended for deployment
    • the default mysite.settings module contains DEBUG = True,
      for deployment it is a good idea to create a separate module
    • for configuration, you can use the DJANGO_SETTINGS_MODULE environment variable (see its use in the mysite/wsgi.py file)
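The default mysite/wsgi.py sets DJANGO_SETTINGS_MODULE with os.environ.setdefault, so a value already present in the environment takes precedence. The sketch below isolates that behavior; the function and the mysite.settings_production module name are hypothetical:

```python
import os


def select_settings_module(environ=os.environ) -> str:
    """Return the settings module, falling back to the development default."""
    # setdefault only writes the value if the variable is not set yet.
    environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
    return environ["DJANGO_SETTINGS_MODULE"]


# No variable set: the default module is used.
print(select_settings_module({}))

# Variable set by the deployment environment: it wins over the default.
print(select_settings_module({"DJANGO_SETTINGS_MODULE": "mysite.settings_production"}))
```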

Deployment checklist

Before actual deployment, it is a good idea to go through the deployment checklist. Some checks can be automated by running the manage.py check --deploy command.

  1. Critical variables in settings: SECRET_KEY, DEBUG
  2. Parameters depending on the environment in which the application will run, e.g.,
    ALLOWED_HOSTS, CACHES, DATABASES
  3. Configuration of serving static files – variables STATIC_ROOT,
    STATIC_URL, MEDIA_ROOT, MEDIA_URL and corresponding web server configuration
  4. HTTPS
  5. Optimization (see below)
  6. Error logging
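The first two items of the checklist are often handled by reading values from the environment instead of storing them in the repository. A minimal sketch, assuming hypothetical environment variable names DJANGO_SECRET_KEY and DJANGO_ALLOWED_HOSTS:

```python
def deployment_settings(env: dict) -> dict:
    """Build critical settings from an environment mapping (e.g. os.environ)."""
    secret_key = env.get("DJANGO_SECRET_KEY")
    if not secret_key:
        # Refuse to start rather than run with a missing or empty key.
        raise RuntimeError("DJANGO_SECRET_KEY must be set for deployment")

    # Comma-separated list of host names, e.g. "example.com,www.example.com".
    hosts = [
        host.strip()
        for host in env.get("DJANGO_ALLOWED_HOSTS", "").split(",")
        if host.strip()
    ]
    return {"SECRET_KEY": secret_key, "DEBUG": False, "ALLOWED_HOSTS": hosts}


example_env = {
    "DJANGO_SECRET_KEY": "placeholder-value",
    "DJANGO_ALLOWED_HOSTS": "example.com, www.example.com",
}
settings = deployment_settings(example_env)
print(settings["DEBUG"], settings["ALLOWED_HOSTS"])
```

In a real settings module, the returned values would be assigned to SECRET_KEY, DEBUG and ALLOWED_HOSTS at module level.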

# Content of Today's Lecture

1. Security
2. Deployment
3. Performance and optimization

Hosting services with Python support:

- [Qovery](https://www.qovery.com/) ([pricing](https://www.qovery.com/pricing), [Django](https://hub.qovery.com/guides/tutorial/quickstart-django/))
- [Circumeo](https://www.circumeo.io/) ([pricing](https://www.circumeo.io/#pricing))

# Performance and optimization

The efficiency of the application must be ensured not only in the development environment – the real world behaves differently (a large number of requests, more data in the database, etc.).

First, you need to determine the characteristics of the environment and the given application:

- benchmarks and profiling – e.g., [Django Debug Toolbar](https://github.com/jazzband/django-debug-toolbar/)
- monitoring – e.g., number of visits and server load

Then optimization of the most affected parts can begin.

In the Django documentation: [Performance and optimization](https://docs.djangoproject.com/en/5.2/topics/performance/)

## Optimizing database access

- the time it takes to process database accesses depends on the amount and type of data stored in it (if the test database is small, even inefficient queries will be processed quickly, but in practice, they will cause significant slowdowns)
- processing an operation directly in the database is typically faster than equivalent code in Python, e.g., `some_objects.count()` vs `len(some_objects)`
- practically all database systems use so-called [indexes](https://en.wikipedia.org/wiki/Database_index) – auxiliary data structures that enable fast searching and sorting by specified attributes (columns in a table)
- primary and unique keys are automatically indexed, additional indexes can be added using [`Meta.indexes`](https://docs.djangoproject.com/en/5.2/ref/models/options/#django.db.models.Options.indexes)
- details see [Database access optimization](https://docs.djangoproject.com/en/5.2/topics/db/optimization/)

## Optimizing by caching

- Django contains a [caching framework](https://docs.djangoproject.com/en/5.2/topics/cache/), i.e., temporary storage of frequently needed data in a location with fast access
- often used are Memcached and Redis programs – caching in operating memory (can also be used for caching data stored in the database)
- data whose acquisition takes a very long time can be cached in the database (permanent storage, but slower access)
- data with a simple structure can be efficiently cached directly in files on disk

Levels of caching:

- entire application (website) (`django.middleware.cache`)
- individual _view_, where caching makes sense (decorator `@cache_page`)
- parts of templates (tag `{% cache... %}`)
- any data (using [low-level API](https://docs.djangoproject.com/en/5.2/topics/cache/#the-low-level-cache-api))

## Optimizing at the HTTP level

- [ConditionalGetMiddleware](https://docs.djangoproject.com/en/5.2/ref/middleware/#django.middleware.http.ConditionalGetMiddleware) – sets response headers [`ETag`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag) and [`Last-Modified`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified) and checks request headers [`If-None-Match`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match) and [`If-Modified-Since`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since) (using data cached on the client side)
- [GZipMiddleware](https://docs.djangoproject.com/en/5.2/ref/middleware/#django.middleware.gzip.GZipMiddleware) – compresses data in the response (can be set in most web servers, e.g., [uWSGI](https://ugu.readthedocs.io/en/latest/compress.html))
- using tools for [minimizing CSS and JavaScript](https://www.imperva.com/learn/performance/minification/)

## Further optimization

- using ASGI instead of WSGI
- using a different language and package for templates (e.g., [Jinja2](https://jinja.palletsprojects.com/en/3.1.x/))
- optimizing application code (e.g., implementing a more efficient algorithm or using a more efficient library)
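Returning to the database point above (`some_objects.count()` vs `len(some_objects)`), the difference can be sketched with plain sqlite3 from the standard library instead of the Django ORM. The table and data are made up for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO items (category) VALUES (?)",
    [("even" if i % 2 == 0 else "odd",) for i in range(1000)],
)

# Counting in the database: a single number is transferred to Python.
count_in_db = conn.execute(
    "SELECT COUNT(*) FROM items WHERE category = ?", ("even",)
).fetchone()[0]

# Counting in Python: every matching row is transferred and stored first.
rows = conn.execute(
    "SELECT id, category FROM items WHERE category = ?", ("even",)
).fetchall()
count_in_python = len(rows)

print(count_in_db, count_in_python)
```

Both approaches give the same number, but the second one moves all matching rows into application memory, which is the kind of cost that only shows up with realistic amounts of data.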