Web Application Development

Jakub Klinkovský

:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering

Academic Year 2024-2025

Download and Upload of Files

How Download Works

The traditional way of downloading files using the HTTP protocol:

  1. The URL specifies the location of the given file on the web
  2. The client sends a request using the GET method
  3. The server sends a response with the requested data
  4. The client processes the response (displaying the multimedia file in the browser or saving the file to the disk)

This process is sufficient for small files that are typically displayed directly in the browser. Requests are sent and processed asynchronously, and there can be a large number of them.

More Efficient Download

To increase efficiency, several extensions to the HTTP protocol have been developed:

  • resuming an interrupted transfer (headers Accept-Ranges, Range, Content-Range, and status code 206 Partial Content)
  • chunked transfer encoding – allows so-called streaming, i.e. sending data whose total size is not known in advance (in HTTP/1.1 using the Transfer-Encoding: chunked header; HTTP/2 has its own framing mechanism)
  • streaming data with adaptive bitrate (see Wikipedia)
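The server side of a range request can be sketched in a few lines of Python. The helper serve_range below is hypothetical and deliberately simplified: it supports a single byte range (bytes=start-end, bytes=start- or bytes=-suffix), answers 206 Partial Content with a Content-Range header, returns 416 Range Not Satisfiable when the range lies outside the file, and otherwise falls back to sending the whole file with status 200. Several ranges in one header are not handled.

```python
import re

# a single byte range: "bytes=start-end", "bytes=start-" or "bytes=-suffix"
RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def serve_range(data: bytes, range_header: str) -> tuple[int, bytes, dict[str, str]]:
    """Hypothetical helper: return (status, body, headers) for a GET with a Range header."""
    total = len(data)
    # fallback: ignore the Range header and send the whole file
    full_response = (200, data, {"Accept-Ranges": "bytes", "Content-Length": str(total)})

    match = RANGE_RE.match(range_header.strip())
    if match is None or match.groups() == ("", ""):
        return full_response  # malformed header or several ranges (not supported here)

    first, last = match.groups()
    if first == "":
        # suffix range: the last N bytes of the file
        start, end = max(total - int(last), 0), total - 1
    elif last and int(last) < int(first):
        return full_response  # invalid range (end before start)
    else:
        start = int(first)
        end = min(int(last), total - 1) if last else total - 1

    if start >= total or end < start:
        # the range does not overlap the file
        return 416, b"", {"Content-Range": f"bytes */{total}"}

    body = data[start:end + 1]
    return 206, body, {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }


status, body, headers = serve_range(b"0123456789", "bytes=2-5")
print(status, body, headers["Content-Range"])  # 206 b'2345' bytes 2-5/10
```

A client that received the first 500 bytes before the connection dropped can resume by sending Range: bytes=500- and appending the returned body to the data it already has.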

For specific applications, there are advanced protocols (e.g. WebRTC for real-time transmission of image and sound between clients).

How Upload Works

  1. The client has a document with a form that contains one or more elements for uploading files (see below)
  2. The client encodes the file data into the body of an HTTP POST or PUT request, which it sends to the server
  3. The server processes the received data (e.g. performs validation, saves it to the database or disk)
  4. The server sends a response containing the result of processing the uploaded data

The actual data transfer using the HTTP protocol is analogous to the case of download.

Upload and Security

An application that allows file uploads is potentially dangerous:

  1. The maximum size of uploaded files should be limited.
  2. Uploaded files should always be treated as static data.
    Never run scripts uploaded from an unverified source!
  3. The type of data should always be validated (e.g., whether a file uploaded as a PNG image really contains image data and not HTML).

There is no universal solution for data validation. We will explain and demonstrate different ways to limit security consequences later.
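As an illustration of the first rule, the sketch below reads an upload in chunks and stops as soon as a size limit is exceeded, so an oversized file is never held in memory as a whole. The names read_limited and UploadTooLarge and the 1 MiB default are hypothetical; in practice, a request size limit is often configured in the web server as well.

```python
import io
from typing import BinaryIO

MAX_UPLOAD_SIZE = 1024 * 1024  # hypothetical limit: 1 MiB


class UploadTooLarge(Exception):
    """Raised when the uploaded data exceeds the allowed size."""


def read_limited(stream: BinaryIO, max_size: int = MAX_UPLOAD_SIZE,
                 chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, but give up as soon as max_size is exceeded."""
    parts = []
    received = 0
    while chunk := stream.read(chunk_size):
        received += len(chunk)
        if received > max_size:
            # stop early instead of buffering the rest of the upload
            raise UploadTooLarge(f"upload exceeds {max_size} bytes")
        parts.append(chunk)
    return b"".join(parts)


print(len(read_limited(io.BytesIO(b"x" * 100), max_size=200)))  # 100
try:
    read_limited(io.BytesIO(b"x" * 300), max_size=200)
except UploadTooLarge as exc:
    print("rejected:", exc)  # rejected: upload exceeds 200 bytes
```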

Forms for File Upload

To upload files, the form must use the multipart/form-data encoding type. This is set using the enctype attribute of the <form> tag.

Example:

<form enctype="multipart/form-data" action="/upload/" method="post">
    ...
</form>
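To see what the browser actually sends for such a form, the following sketch builds a multipart/form-data request body by hand with the standard library. The helper name encode_multipart and the fixed boundary are hypothetical (browsers generate a random boundary); each form field becomes one part, and file parts additionally carry the file name and the declared Content-Type.

```python
def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, str, bytes]],
                     boundary: str = "----demo-boundary-1234") -> tuple[str, bytes]:
    """Return (content_type, body) for a multipart/form-data request.

    fields: plain form fields, name -> value
    files:  file fields, name -> (filename, content_type, data)
    """
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    body = bytearray()

    for name, value in fields.items():
        body += delimiter + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode() + crlf
        body += crlf + value.encode("utf-8") + crlf

    for name, (filename, content_type, data) in files.items():
        body += delimiter + crlf
        body += (f'Content-Disposition: form-data; name="{name}"; '
                 f'filename="{filename}"').encode() + crlf
        body += f"Content-Type: {content_type}".encode() + crlf
        body += crlf + data + crlf

    # the closing delimiter has two extra dashes at the end
    body += delimiter + b"--" + crlf
    return f"multipart/form-data; boundary={boundary}", bytes(body)


content_type, body = encode_multipart(
    {"title": "CV"}, {"file": ("cv.txt", "text/plain", b"hello")})
print(content_type)  # multipart/form-data; boundary=----demo-boundary-1234
print(body.decode())
```

Each part is preceded by a line consisting of two dashes followed by the boundary, and the body ends with the same line followed by two more dashes. The returned content type, including the boundary, is what the browser sends in the Content-Type request header.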

Elements for File Upload Forms

A file is selected using the <input type="file"> tag. Example:

<label for="avatar">Choose a profile picture:</label>

<input type="file"
       id="avatar" name="avatar"
       accept="image/png, image/jpeg">

Attributes:

  • accept – list of accepted data types
  • capture – allows capturing a new file (image, video, sound) with the device's camera or microphone
    (currently supported only by mobile device browsers)
  • multiple – allows selecting multiple files at once

File Upload Forms in Django

Django provides the form fields FileField and ImageField (a class derived from FileField).

Note: for models, there are also FileField and ImageField (see below).

Example:

from django import forms

class UploadFileForm(forms.Form):
    file = forms.FileField()

Note: the max_length argument of FileField limits the length of the file name, not the size of its content!

The allow_empty_file argument determines whether the form accepts files with empty content during validation.

Creating an Empty Form

To display an empty form, a simple function in views.py is sufficient:

from django.shortcuts import render
from . import forms  # the module with UploadFileForm

def index(request):
    variables = {"form": forms.UploadFileForm()}
    return render(request, "demo/index.html", variables)

When rendering the form in the template, it is necessary to specify the enctype attribute of the <form> tag, e.g.:

<form enctype="multipart/form-data" action="{% url 'upload' %}" method="post">
    {% csrf_token %}
    {{ form.as_div }}
    <input type="submit" value="Send">
</form>

Processing a Form with Upload – Checking the HTTP Method

In the views.py file, we create a function that will process only the upload:

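from django.http import HttpResponseNotAllowed, HttpResponseRedirect
from django.shortcuts import render
from django.urls import reverse

from . import forms  # the module with UploadFileForm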

def upload(request):
    # upload must be submitted via the POST method
    if request.method != 'POST':
        # return HTTP 405: Method Not Allowed
        return HttpResponseNotAllowed(["POST"])

    ...

The HttpResponseNotAllowed object represents a response with the HTTP 405 status code. Its argument is the list of permitted methods, which is sent to the client in the Allow header.
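To show what this check looks like on the wire, here is a hypothetical framework-free equivalent written as a WSGI application with the standard library (Django applications can also be deployed through WSGI). The helper call() is an assumption made for the demonstration: it feeds the application a fake request instead of running a real server.

```python
import io
from wsgiref.util import setup_testing_defaults


def upload_app(environ, start_response):
    """Hypothetical WSGI application that accepts uploads only via POST."""
    if environ["REQUEST_METHOD"] != "POST":
        # a 405 response has to list the allowed methods in the Allow header
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"Method Not Allowed"]

    # the encoded form data is in the request body
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"received {len(body)} bytes".encode()]


def call(app, method: str, body: bytes = b""):
    """Run a WSGI app on a fake request; return (status, headers, body)."""
    environ = {"REQUEST_METHOD": method,
               "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    setup_testing_defaults(environ)  # fill in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, dict(headers)

    result = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], result


print(call(upload_app, "GET")[0])  # 405 Method Not Allowed
```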

Processing a Form with Upload – Validation

Next, we create an instance of the form bound to the dictionaries with the request parameters (request.POST) and uploaded files (request.FILES):

    ...

    # create a form with data from the request
    form = forms.UploadFileForm(request.POST, request.FILES)

    # check if the form is valid
    if not form.is_valid():
        # render the template showing errors in the form
        variables = {"form": form}
        return render(request, "demo/index.html", variables)

    ...

Processing a Form with Upload – Saving the File

After validating the form, we process the data of the uploaded files. In this example there is only one file, request.FILES["file"], where the key "file" corresponds to the name of the field in the form. Objects in the FILES dictionary have the type UploadedFile.

    ...

    # do something with the uploaded file
    save_uploaded_file(request.FILES["file"])

    # Always return an HttpResponseRedirect after successfully dealing
    # with POST data. This prevents data from being posted twice if the
    # user hits the <Back> or <Reload> button.
    return HttpResponseRedirect(reverse("index"))

Note: in this example, I do not use ModelForm – the file must be processed manually. ModelForm would take care of saving all data, including files.

Behind the Scenes – Django Upload Handlers

When Django starts processing the request (the request object), the uploaded file data may not be available yet. Attributes of the request, including the uploaded files, are created lazily, i.e. only when they are first accessed.

Accessing the request.FILES dictionary triggers the so-called upload handlers, objects that take care of the actual upload to a temporary location and create UploadedFile objects. The default value of the FILE_UPLOAD_HANDLERS setting is:

FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.MemoryFileUploadHandler',
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]

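If an uploaded file is smaller than 2.5 MB (the default value of the FILE_UPLOAD_MAX_MEMORY_SIZE setting), its content is held in memory (InMemoryUploadedFile); larger files are written to a temporary file on the disk (TemporaryUploadedFile).

It is possible to implement a custom upload handler (e.g. ProgressBarUploadHandler for AJAX).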
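The idea behind the two default handlers, small uploads in memory and large ones in a temporary file, can be sketched with the standard library. The class SpoolingUpload below is hypothetical and only mirrors the concept: it switches to the disk once the received data exceeds the limit, whereas Django's memory handler decides up front from the size of the whole request. Python's tempfile.SpooledTemporaryFile implements the same pattern.

```python
import io
import tempfile

MAX_MEMORY_SIZE = 2_621_440  # 2.5 MB, the default of Django's FILE_UPLOAD_MAX_MEMORY_SIZE


class SpoolingUpload:
    """Hypothetical buffer: memory for small uploads, a temporary file for large ones."""

    def __init__(self, max_memory_size: int = MAX_MEMORY_SIZE):
        self.max_memory_size = max_memory_size
        self.file = io.BytesIO()
        self.on_disk = False

    def receive_chunk(self, chunk: bytes) -> None:
        """Append one chunk of the upload, moving to the disk once the limit is exceeded."""
        if not self.on_disk and self.file.tell() + len(chunk) > self.max_memory_size:
            disk_file = tempfile.TemporaryFile()
            disk_file.write(self.file.getvalue())  # copy what was received so far
            self.file = disk_file
            self.on_disk = True
        self.file.write(chunk)

    def read_all(self) -> bytes:
        self.file.seek(0)
        return self.file.read()

    def close(self) -> None:
        # a TemporaryFile is deleted from the disk when it is closed
        self.file.close()


upload = SpoolingUpload(max_memory_size=10)
upload.receive_chunk(b"12345")
print(upload.on_disk)  # False
upload.receive_chunk(b"678901")
print(upload.on_disk)  # True
upload.close()
```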

Interlude – Working with Files and Directories in Python

The __file__ variable – a string with the path to the source file of the current module

The open() function – opens a file on the disk for reading or writing

Modules for working with files:

  • pathlib (or the older os.path) – the Path class for representing paths in the file system and corresponding operations
  • shutil – high-level operations with files and tree structures on the disk (copying, moving, deleting, archiving)
  • tempfile – creating temporary files and directories
  • filecmp – comparing files and directories

And other low-level modules, see the official overview.

See also ZPRO 2024: exercise 18 and exercise 19.
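A short tour of the modules listed above, working in a throwaway directory so that nothing else on the disk is touched. The file and directory names are arbitrary examples.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

# tempfile: a temporary directory that is deleted at the end of the with block
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # pathlib: join paths with "/" and create nested directories
    uploads = root / "media" / "uploads"
    uploads.mkdir(parents=True, exist_ok=True)
    original = uploads / "report.txt"
    original.write_text("quarterly report\n", encoding="utf-8")

    # shutil: copy the file, including its metadata, into another directory
    backup_dir = root / "backup"
    backup_dir.mkdir()
    copied = Path(shutil.copy2(original, backup_dir))

    # filecmp: compare the contents of both files (shallow=False reads the data)
    same = filecmp.cmp(original, copied, shallow=False)

    # pathlib: find all .txt files below root, recursively
    listing = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))

print(copied.name, same)  # report.txt True
print(listing)            # ['backup/report.txt', 'media/uploads/report.txt']
```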

Completing the Upload Processing

The save_uploaded_file function can be implemented as follows:

from pathlib import Path

from django.conf import settings

def save_uploaded_file(uploaded_file):
    # choose upload directory
    upload_dir = Path(settings.MEDIA_ROOT) / "uploads"

    # create the directory
    upload_dir.mkdir(parents=True, exist_ok=True)

    # set upload file path
    upload_path = upload_dir / uploaded_file.name

    # save the uploaded data
    with open(upload_path, "wb+") as f:
        for chunk in uploaded_file.chunks():
            f.write(chunk)
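One weakness of the function above is that a second upload with the same name overwrites the first file. The sketch below is a hypothetical variant that picks a free name instead (Django's storage classes also avoid overwriting when files are saved through a model, though with a different naming scheme). So that it runs without Django, FakeUploadedFile is an assumed stand-in providing only the name attribute and the chunks() method, and the target directory is passed as an argument instead of being derived from settings.MEDIA_ROOT.

```python
import tempfile
from pathlib import Path


class FakeUploadedFile:
    """Hypothetical stand-in for Django's UploadedFile: only name and chunks()."""

    def __init__(self, name: str, data: bytes):
        self.name = name
        self._data = data

    def chunks(self, chunk_size: int = 64 * 1024):
        for start in range(0, len(self._data), chunk_size):
            yield self._data[start:start + chunk_size]


def available_path(directory: Path, filename: str) -> Path:
    """Return directory/filename, or directory/stem_N.suffix if it already exists."""
    name = Path(filename).name  # drop any directory part, e.g. "../x.txt" -> "x.txt"
    stem, suffix = Path(name).stem, Path(name).suffix
    candidate = directory / name
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def save_uploaded_file(uploaded_file, upload_dir: Path) -> Path:
    upload_dir.mkdir(parents=True, exist_ok=True)
    upload_path = available_path(upload_dir, uploaded_file.name)
    # mode "xb" refuses to overwrite a file that appeared in the meantime
    with open(upload_path, "xb") as f:
        for chunk in uploaded_file.chunks():
            f.write(chunk)
    return upload_path


with tempfile.TemporaryDirectory() as tmp:
    upload_dir = Path(tmp) / "uploads"
    first = save_uploaded_file(FakeUploadedFile("cv.pdf", b"v1"), upload_dir)
    second = save_uploaded_file(FakeUploadedFile("cv.pdf", b"v2"), upload_dir)
    print(first.name, second.name)  # cv.pdf cv_1.pdf
```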

Configuration of Storage for Uploaded Files

In the settings.py file, we define the storage using:

  • MEDIA_ROOT – location of files on the disk
  • MEDIA_URL – part of the URL for accessing files from the client side

E.g.:

MEDIA_ROOT = BASE_DIR / "media"
MEDIA_URL = "media/"

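In urls.py, it is necessary to set the URL mapping for the uploaded files (the static() helper works only in debug mode, i.e. with DEBUG = True; in production, the web server should serve the files from MEDIA_ROOT):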

from django.conf import settings
from django.conf.urls.static import static
urlpatterns = [
    ...
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

File Management and Storage

The default storage for files in Django is the local file system – see MEDIA_ROOT.
In addition, it is possible to implement a custom storage – e.g. for remote systems.
The django-storages project provides an interface for Amazon S3, Dropbox, etc.

More information about file management in Django can be found in the Managing files section.

Uploading Multiple Files

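To enable the upload of multiple files in one field, certain changes are required: a widget that allows selecting multiple files and a form field that validates a list of files. See the Django documentation (Uploading multiple files).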
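At the HTTP level, multiple files selected in one field are simply several parts of the multipart body with the same field name, which is why Django exposes them via request.FILES.getlist(). The sketch below demonstrates this with the standard library's email parser, which understands the MIME multipart format. It is only an illustration with a made-up request body and a hypothetical helper parse_form_files; Django uses its own parser (django.http.multipartparser.MultiPartParser).

```python
from email import message_from_bytes
from email.policy import default

# a hand-written request body: one text field and two files in the field "file"
BOUNDARY = "XYZ"
BODY = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="title"\r\n'
    b"\r\n"
    b"holiday photos\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"first file\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="b.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"second file\r\n"
    b"--XYZ--\r\n"
)


def parse_form_files(body: bytes, boundary: str) -> dict[str, list[tuple[str, bytes]]]:
    """Group uploaded files by field name (a rough analogue of request.FILES)."""
    # the email parser reads the boundary from a Content-Type header, as HTTP does
    header = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
    message = message_from_bytes(header.encode("ascii") + body, policy=default)
    files: dict[str, list[tuple[str, bytes]]] = {}
    for part in message.iter_parts():
        filename = part.get_filename()
        if filename is None:
            continue  # an ordinary form field, not a file
        field = part.get_param("name", header="content-disposition")
        files.setdefault(field, []).append((filename, part.get_payload(decode=True)))
    return files


print(parse_form_files(BODY, BOUNDARY))
# {'file': [('a.txt', b'first file'), ('b.txt', b'second file')]}
```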

Limiting the Type of Files

  1. For the <input type="file"> tag, it is appropriate to use the accept attribute:

    class UploadFileForm(forms.Form):
        file = forms.FileField(
            widget=forms.ClearableFileInput(attrs={"accept": "application/pdf"})
        )
    
  2. On the application side, validation should be performed, at least by checking the content_type attribute of the UploadedFile object in the save_uploaded_file function:

        ...
        if uploaded_file.content_type != "application/pdf":
            raise ValidationError("only PDF files are allowed")
        ...
    

    However, content_type is taken from the request sent by the client and can be forged, so the content of the file should also be verified!
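A minimal content check for PDF files can compare the first bytes of the upload with the PDF signature %PDF-. The helper check_pdf is hypothetical; it raises ValueError so that it runs without Django (in a form or view you would raise ValidationError as above), and it works on any binary file-like object, which should include UploadedFile since it offers read() and seek(). A matching signature is only a first filter, not proof that the whole file is a valid PDF.

```python
import io
from typing import BinaryIO

PDF_SIGNATURE = b"%PDF-"


def check_pdf(stream: BinaryIO, declared_type: str) -> None:
    """Raise ValueError unless the upload is declared as PDF *and* starts like one."""
    if declared_type != "application/pdf":
        raise ValueError("only PDF files are allowed")
    # declared_type comes from the client, so look at the data as well
    header = stream.read(len(PDF_SIGNATURE))
    stream.seek(0)  # rewind, so that the file can still be saved in full
    if header != PDF_SIGNATURE:
        raise ValueError("the file content is not a PDF document")


check_pdf(io.BytesIO(b"%PDF-1.7\n..."), "application/pdf")  # accepted
try:
    check_pdf(io.BytesIO(b"<html>...</html>"), "application/pdf")
except ValueError as exc:
    print("rejected:", exc)  # rejected: the file content is not a PDF document
```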

Object-Oriented Approach to File Uploads and Models

Django allows for the processing of uploaded files in cooperation with models:

  • the FileField class for defining model attributes
  • linking uploaded files with object-relational mapping
  • data is stored on the disk and the path to the file is stored in the database
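The division of labour described in the last point can be imitated with sqlite3 and pathlib. In this hypothetical sketch, the file content is written under a stand-in for MEDIA_ROOT and only its relative name is stored in a table (Django's FileField is likewise stored as a varchar column, 100 characters long by default); the table and column names are made up.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    media_root = Path(tmp)  # stand-in for MEDIA_ROOT
    name = "uploads/notes.txt"  # the name relative to the storage root

    # 1. the file content is written to the disk ...
    (media_root / name).parent.mkdir(parents=True)
    (media_root / name).write_bytes(b"lecture notes")

    # 2. ... and only the relative name goes to the database
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, upload VARCHAR(100))")
    db.execute("INSERT INTO mymodel (upload) VALUES (?)", (name,))

    # 3. reading: query the name, then open the file relative to media_root
    (stored_name,) = db.execute("SELECT upload FROM mymodel WHERE id = 1").fetchone()
    content = (media_root / stored_name).read_bytes()
    db.close()

print(stored_name, content)  # uploads/notes.txt b'lecture notes'
```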

Example: definition in models.py:

from django.db import models

class MyModel(models.Model):
    # file will be uploaded to MEDIA_ROOT/uploads
    upload = models.FileField(upload_to="uploads/")

The upload_to value may also contain strftime() formatting codes, e.g.
upload_to="uploads/%Y/%m/%d/" saves files uploaded on 19 April 2024 to the directory
MEDIA_ROOT/uploads/2024/04/19
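When upload_to is a string, Django formats it with strftime() using the current date and time and then appends the file name. The hypothetical helper below mimics that step with a fixed date, so the result is predictable. (upload_to may also be a callable that receives the model instance and the original file name and returns the whole path.)

```python
import posixpath
from datetime import datetime


def build_upload_name(upload_to: str, filename: str, now: datetime) -> str:
    """Hypothetical helper: expand strftime() codes in upload_to, then add the file name."""
    directory = now.strftime(upload_to)
    # names in the storage use forward slashes on every operating system
    return posixpath.join(directory, filename)


print(build_upload_name("uploads/%Y/%m/%d/", "cv.pdf", datetime(2024, 4, 19)))
# uploads/2024/04/19/cv.pdf
```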

Processing Uploads Using the FileField Model Attribute

The upload function in the views.py file will look the same as in the previous example, except that instead of using the save_uploaded_file function:

    # do something with the uploaded file
    save_uploaded_file(request.FILES["file"])

we create an instance of the model ("file" corresponds to the attribute in the form definition and upload= corresponds to the attribute in the model definition):

    # save the uploaded file to a model
    instance = models.MyModel(upload=request.FILES["file"])
    instance.save()

The save() method takes care of properly storing the data, i.e., saving the file to the disk and the path to the file in the database. If you use ModelForm, you only need to save the form 😄

Working with Files Stored Using a Model

Files stored in this way are accessed using the FieldFile class. E.g.:

f = MyModel.objects.first()     # instance of the MyModel class
f.upload                        # instance of the FieldFile class

Important attributes:

  • name – relative path with respect to the storage
  • path – path to the file in the file system
  • url – relative URL for accessing the given file

Note: in the model definition, the FileField class was used, but the attribute of a model instance is a FieldFile object.
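The three attributes are connected through the storage settings: path is MEDIA_ROOT joined with name, and url is MEDIA_URL joined with the percent-encoded name. The sketch below reproduces this relationship with pathlib and urllib.parse; the settings values are hypothetical, and in Django it is the storage backend (FileSystemStorage by default) that computes path and url. Django prefixes a relative MEDIA_URL such as "media/" with the script prefix, usually "/", which is why "/media/" is used here.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urljoin

MEDIA_ROOT = PurePosixPath("/srv/app/media")  # hypothetical settings values
MEDIA_URL = "/media/"


def file_path(name: str) -> PurePosixPath:
    """path: where the file lives in the file system."""
    return MEDIA_ROOT / name


def file_url(name: str) -> str:
    """url: MEDIA_URL followed by the percent-encoded name."""
    return urljoin(MEDIA_URL, quote(name))


name = "uploads/my cv.pdf"  # name: path relative to the storage root
print(file_path(name))  # /srv/app/media/uploads/my cv.pdf
print(file_url(name))   # /media/uploads/my%20cv.pdf
```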

Example: Displaying a List of Uploaded Files

In the index function in the views.py file, we add a query for records in the database:

def index(request):
    ...

    variables["uploads"] = models.MyModel.objects.all()
    return render(request, "demo/index.html", variables)

In the template, we format the list of files with links:

<ul>
{% for file in uploads %}
    <li><a href="{{ file.upload.url }}">{{ file.upload.name }}</a></li>
{% endfor %}
</ul>

Using the ImageField Class

Instead of the FileField class, you can use the ImageField class in the model or form definition, which provides additional functions for processing images:

  • for the <input> tag, the accept="image/*" attribute is automatically used
  • in the application, the file extension is validated (e.g. .jpg, .png, etc.)
  • validation of image data in the file (but it is not 100%...)
  • uses the Pillow library, which needs to be installed:
    pip install Pillow
    
  • the UploadedFile object has an additional image attribute, which corresponds to an instance of PIL.Image used for validation – the file is closed after validation, so only metadata is available (e.g. attributes format, width, height)
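The metadata mentioned in the last point is stored right in the file header, so it is available without decoding the pixels. To illustrate this, the sketch below reads the format, width and height of a PNG file directly with the standard library. The helper names are hypothetical, tiny_png only builds a minimal valid PNG for the demonstration, and this is not how Django or Pillow validate images. Since it looks at the first 24 bytes only, it also shows why a header check alone cannot prove that a file is a valid image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data and CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png(width: int, height: int) -> bytes:
    """Build a small all-black 8-bit grayscale PNG image (demo data only)."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # each scanline starts with filter type 0, followed by one byte per pixel
    raw = b"".join(b"\x00" + b"\x00" * width for _ in range(height))
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(raw)) + png_chunk(b"IEND", b""))


def png_metadata(data: bytes) -> dict:
    """Read format, width and height from the signature and the IHDR chunk."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return {"format": "PNG", "width": width, "height": height}


print(png_metadata(tiny_png(3, 2)))  # {'format': 'PNG', 'width': 3, 'height': 2}
```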

Example of Using the ImageField Class

# forms.py
from django import forms

class ImageForm(forms.Form):
    img = forms.ImageField()

# views.py (fragment of the view function)
from PIL import Image

form = forms.ImageForm(request.POST, request.FILES)
if form.is_valid():
    img_field = form.cleaned_data["img"]    # instance of UploadedFile
    img = img_field.image                   # instance of PIL.Image.Image
    metadata = {
        "format": img.format,
        "width": img.width,
        "height": img.height,
    }
    # if we need the image data, we must reopen the file
    image = Image.open(img_field)

Another module for working with files, in addition to those listed in the interlude above: fileinput (https://docs.python.org/3/library/fileinput.html) – a helper module for reading from multiple files in one for loop.

Note: the following approach to uploading multiple files works only in older Django versions; current versions raise ValueError when the multiple attribute is set on ClearableFileInput.

  1. Use the multiple attribute for the <input type="file"> tag. In the Django object world, this is done as follows:

    class UploadFileForm(forms.Form):
        file = forms.FileField(
            widget=forms.ClearableFileInput(attrs={"multiple": True})
        )

  2. Process the list of files in the function in views.py:

    def upload(request):
        ...
        # do something with the uploaded files
        for file in request.FILES.getlist("file"):
            save_uploaded_file(file)
        ...