Web Application Development

Jakub Klinkovský

:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering

Academic Year 2024-2025

Database Systems

Basic Concepts

  • Database system is an information system used for storing and working with data.
  • Redundancy is a state where the same data is stored in multiple locations.
  • Inconsistency is a state where only part of the redundant data is updated (some data remains in contradiction with the updated data).
  • Integrity is a state where data corresponds to reality (e.g. age is a positive number, month is a number between 1 and 12).

Development of Database Systems

  • development since 1950
  • 1970-1980 – theory of the relational database model, creation of SQL
  • 1990-2000 – object-oriented data model
  • 2000-2010 – NoSQL (not-only SQL), NewSQL (modernization of SQL)

Database Management System (DBMS)

  • a standalone program that manages data in a database and provides access to it
    (data is typically represented by a file(s) with a fixed structure)
  • allow separation of the application layer from the data layer
    (DBMS defines an interface for accessing data)
  • advantages:
    • reduction of redundancy (with proper database design)
    • ensures consistency (a state without contradictions) and data integrity
    • data sharing, protection of data from misuse (access restriction)
    • data independence from the application, centralized data management
    • a unified approach to data, the possibility of using standard query languages (e.g. structured query language, SQL)

Examples of Database Management Systems

  • relational: MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, Oracle Database
  • columnar: CrateDB, Kudu, InfluxDB, Prometheus (storing time series)
  • object-oriented: OpenAccess, ODABA, Wakanda
  • document-oriented: Apache CouchDB, MongoDB, Oracle NoSQL Database
  • in-memory: Memcached, Redis

Relational Database Model

It is based on the principle of tables and relationships between them.

  • table (entity, relation) consists of a header and a body (degree = number of columns, cardinality = number of rows)
  • attribute (field) = column of the table (domain = data type)
  • record (tuple) = row of the table
  • key = set of attributes of the table that are essential for defining integrity constraints
  • relationship = connection between tables

Types of Keys and Relationships

  • keys: primary (PK), unique (UK), foreign (FK)
  • relationship is a reference between the values of the foreign key (tab. 1) and the primary key (tab. 2)
  • types of relationships: one-to-many (non-unique FK to unique PK), one-to-one (PK is also FK), many-to-many (normalization using an additional table)

Integrity Constraints

  • domain integrity – control of the data type of values in the record (according to the column type)
    • NOT NULL integrity
    • some DBMS allow specifying additional checks (e.g. PostgreSQL)
  • entity integrity – given by the value of the primary key (must be unique)
  • referential integrity – given by the value of the foreign key (must exist a corresponding record in the referenced table)
  • integrity given by the unique key – values in the given table must be unique (but can be NULL)

Note: some DBMS are infamous for ignoring integrity constraints defined in the database (e.g. MySQL, MariaDB).

Normalization of Data

Normalization = gradual transformation of a table into a more suitable shape (using lossless decomposition).

Normalization goals:

  • create a flexible database design
  • properly place attributes in tables
  • properly define the primary key of each table
  • reduce (eliminate) data redundancy
  • maximize data structure stability
  • increase the efficiency of applications using the database
  • reduce maintenance costs

Normal Forms of Relation Schemes

  • 0NF (non-first normal form): unique rows (no duplicate records).
  • 1NF: 0NF + scalar columns (columns cannot contain relations or composite values).
  • 2NF: 1NF + there is no dependency on part of the key (no non-key attribute is partially functionally dependent on the key).
  • 3NF: 2NF + there is no "non-key dependency" (no non-key attribute is transitively dependent on the key).
  • note: there is also EKNF (elementary key normal form) and BCNF (Boyce-Codd normal form) between 3NF and 4NF.
  • 4NF: 3NF + "the table describes one entity" (if a non-key attribute is non-trivially dependent, it must be dependent only on the key).
  • 5NF: 4NF + adding a new substantial (important and "reasonable") constraint does not cause the table to fall into a lower form (usually into 2NF).

Design Requirements

Design requirements for the database schema using normalization:

  • lossless decomposition (combining decomposed tables must result in exactly the same table)
  • preservation of dependencies
  • achieving at least 3NF

Note: some approaches used in practice intentionally perform denormalization for faster DBMS response to queries (which is a technical problem that should be solved with better implementation).

Database Design Levels

Database design has 3 levels:

  1. conceptual level – the goal is to create a semantic model describing reality
    (design of the ERA (entity-relationship-attribute) model, e.g. using creately.com)
  2. logical level – views of data created over the model describing reality
    (selection of required data from a combination of tables)
  3. implementation level – selection of DBMS in which the database will be created
    (and its actual creation)

Example 1

People and their attributes (decomposition of the M:N relationship)

  • each person can have many attributes
  • each property can have many people
  • each property can be had by each person at most once

contain center

Example 2

Categories, commodity items (goods), and price development

  • each item (and each category) belongs to at most one category
  • each price (without VAT) relates to exactly one item at a certain time

contain center

Example 3

Categories and commodity items
(decomposition of the M:N
relationship)

  • unlike example 2, goods can belong to several categories
  • each category still belongs to at most one category

SQL – Language for Relational Database Systems

SQL (Structured Query Language)

  • It is used in most relational database systems.
  • SQL is not only a query language (DQL, data query language), but it serves also other means:
    • DDL (data definition language) – language for defining data
      (e.g. CREATE, DROP)
    • DCL (data control language) – language for data control
      (e.g. GRANT, REVOKE)
    • DML (data manipulation language) – language for manipulating data
      (e.g. INSERT, UPDATE)
  • There are SQL language standards approved by ANSI, but most relational DBMS implement only a certain part of the standard + their own functions ⇒ problems with code portability to another DBMS.

Example – Creating a Table

CREATE TABLE [ IF NOT EXISTS ] table_name
(
  { column_name data_type [ constraints ... ] }
  [, ... ]
)
[ modifiers ... ];

where constraints can be e.g.:

{ NOT NULL |
  NULL |
  CHECK ( expression ) |
  DEFAULT default_expr |
  UNIQUE index_parameters |
  PRIMARY KEY index_parameters |
  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ modifiers ... ] }

Legend: [ text ] – optional part, ... – repetition of the element

Example – Changing the Table Structure

ALTER TABLE [ IF EXISTS ] table_name action [, ...];

where action can be e.g. ADD, MODIFY, DROP, etc. with appropriate parameters.

Using the ALTER TABLE command, you can e.g.:

  • add a new attribute (column)
  • change the definition of an attribute (under certain conditions)
  • remove an attribute
  • add, change, or remove a foreign key
  • add or remove a condition between columns

Example – Removing a Table

DROP TABLE table_name;

Note: using the DROP TABLE command will remove the table along with its data!

A table cannot be removed if it would violate the integrity of the database, e.g.:

  • there are foreign keys in other tables that reference it (referential integrity would be violated)
  • the table is part of a view (VIEW) or procedure (PROCEDURE)

Example – Inserting a Record into a Table

INSERT INTO table_name [ ( column_name [, ...] ) ]
VALUES ( { expression | DEFAULT } [, ...] ) [, ...]
[ modifiers ... ];

Using the INSERT command, you can insert one record into the specified table (in some DBMS, you can insert multiple records at once).

Data that violates the integrity constraints of the database cannot be inserted (e.g. domain, entity, or referential integrity).

Example – Changing Records in a Table

UPDATE table_name [ modifiers ... ]
SET { column_name = { expression | DEFAULT } } [, ...]
[ WHERE condition ]
[ modifiers ... ];

Using the UPDATE command, you can change one or more values in one or more records. The number of affected records is determined by the WHERE condition – if WHERE is absent, the change is made to all records in the table!

Modified records must meet the integrity constraints of the database (e.g. domain, entity, or referential integrity), otherwise the command will fail.

When changing the primary key, the DBMS performs the appropriate actions in the subordinate tables (if ON UPDATE CASCADE).

Example – Deleting Records in a Table

DELETE FROM table_name [ modifiers ... ]
[ WHERE condition ];

Using the DELETE command, you can delete one or more records from the specified table. The number of deleted records is determined by the WHERE condition – if WHERE is absent, all records in the table are deleted!

Deleting records must not violate the referential integrity of the database, otherwise the command will fail.

Foreign keys in subordinate tables can be modified using defined actions (e.g. ON DELETE CASCADE or ON DELETE SET NULL).

Example – Selecting Data from a Table

SELECT <projection>
FROM <data_source>
[ WHERE <restriction> ]
[ GROUP BY <aggregation> [ HAVING <second_restriction> ] ]
[ ORDER BY <ordering> ];

Query execution is performed in the following order:

  1. data sourcerestriction
  2. aggregation
  3. second restriction
  4. sorting
  5. projectionoutput

The result is a "virtual table", which can be used as a data source in further queries.

Operations in Data Selection from a Table

  • Projection:
    • column names from the given data sources (if at least two data sources are used and the same column name is used in both, the column name must be specified in the form <table>.<column>)
    • to select all columns, use *
    • functions can be used, e.g. SUM(<column>)
    • a column or expression can be renamed/named:
      <column or expression> AS <new_name>
  • Data Source = physical or virtual table, or a combination of tables (JOIN). A table can be renamed (e.g. when joining a table with itself). See below.
  • Restriction using a condition (logical expression)
    • Determines which records will be selected (those for which the condition is true, TRUE).
  • Aggregation = expression (typically just the column name, or columns) by which the rows of the table are grouped. See below.
  • Sorting = expression (typically the column name, or columns) by which the rows of the result are sorted.
    • Columns existing in the data source can be used (they do not have to be part of the projection). The first expression is the main sorting criterion, the others are considered secondary.
    • Implicitly, the order is ascending (ASC), for descending, DESC must be added.

Data Source in a Query

The data source in a query can be:

  1. one (specific) table, e.g.
    SELECT * FROM school ORDER BY name;
    
    ⇒ list of all data from the school table, sorted by name
  2. multiple tables (Cartesian product or table join based on relationships), e.g.
    SELECT first_name, last_name
    FROM student JOIN studies ON id_student = who
    WHERE something = 1
    ORDER BY last_name, first_name;
    
    ⇒ list of first and last names of students who study at the school with property something = 1 (sorted by surname, then by given name)

Data Source – Cartesian Product

  1. Cartesian product (CROSS JOIN)
    • syntax: tabA, tabB
    • result: joins rows of both tables "each with each" (if tabA has m columns and M rows and tabB has n columns and N rows, then the resulting Cartesian product contains m + n columns and M · N rows)
    • use only with the correct WHERE condition!!!
  • center
  • center

Data Source – Inner Join

  1. Inner join (INNER JOIN)
    • syntax: tabA JOIN tabB ON <condition>
    • result: joins rows of both tables based on a condition (typically "foreign key=corresponding primary key")
    • DBMS are optimized for JOIN ⇒ it is faster than the Cartesian product
SELECT Customers.customer_id, Customers.first_name, Orders.amount
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer;

center

Data Source – Left and Right Outer Join

  1. Left (outer) join (LEFT JOIN)

    • syntax: tabA LEFT JOIN tabB ON <condition>
    • result: joins table tabA with table tabB so that it also preserves rows from tabA that do not have an equivalent in tabB
    • the order matters!
    • it is slower than the inner join
  2. Right (outer) join (RIGHT JOIN): tabA RIGHT JOIN tabB ON <condition>
    the result is (except for the order of attributes) the same as tabB LEFT JOIN tabA ON <condition>

SELECT Customers.customer_id, Customers.first_name, Orders.amount
FROM Customers
LEFT JOIN Orders
ON Customers.customer_id = Orders.customer;

center

SELECT Customers.customer_id, Customers.first_name, Orders.amount
FROM Customers
RIGHT JOIN Orders
ON Customers.customer_id = Orders.customer;

center

Data Source – Full Outer Join

  1. Full outer join (FULL JOIN)
    • syntax: tabA FULL [OUTER] JOIN tabB ON <condition>
    • result: joins tabA with table tabB so that it also preserves rows from tabB that do not have an equivalent in the second table
    • in practice, it is not used much, some DBMS do not support it
SELECT Customers.customer_id, Customers.first_name, Orders.amount
FROM Customers
FULL OUTER JOIN Orders
ON Customers.customer_id = Orders.customer;

center

contain center

Aggregation – Grouping Rows

The GROUP BY clause is used to group rows into sets. Aggregation functions are then applied to the resulting sets, most commonly:

function result p (usually column name or expression)
COUNT(*) number of records
COUNT(<p>) number of non-NULL values any
SUM(<p>) sum of values p number
MIN(<p>) minimum of values p number, string, date
MAX(<p>) maximum of values p number, string, date
AVG(<p>) arithmetic mean number

Query Examples

List of names and surnames of students whose surname starts with 'Nov', sorted by surname, then by name:

SELECT first_name, last_name FROM student
WHERE last_name LIKE 'Nov%' ORDER BY last_name, first_name;

List of IDs of students who are currently studying at the school with school=3 (order is random):

SELECT student_id FROM studies
WHERE school = 3 AND finished IS NULL;

List of students with the number of schools they have studied (or are studying):

SELECT student_id, MIN(last_name) AS last_name, COUNT(school) AS school_count 
FROM student LEFT JOIN studies ON id_student = student_id
GROUP BY student_id;

Database Systems in the Django Framework

Backend for Database System

  • Django supports the following database systems: PostgreSQL, MariaDB, MySQL, Oracle, SQLite.
  • The default backend is SQLite – a simple database system implemented in C, does not require installation of an external application, the database is realized in one file (see db.sqlite3 in our project).
  • The SQLite database can be manually edited using the SQLite Browser program.
  • An alternative database system can be set in the project's settings.py file, see Django documentation.

Object-Relational Mapping (ORM)

The ORM system provides automatic conversion and synchronization of data between the relational database and the object-oriented programming language.

Terminology and implementation in the Django framework (see Django documentation):

  • model = object representing a specific record in the database (subclass of django.db.models.Model)
  • field = description of model attributes (attribute of the object corresponds to a column in the relational model)
    • type – determines the data type in the database and automatically generated forms
    • parameters – e.g. null, blank, choices, default, help_text, primary_key, unique

Model definitions are made in the models.py file.

Example – ORM for a Simple Model

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

Django creates the following table using SQL:

CREATE TABLE myapp_person (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "first_name" varchar(30) NOT NULL,
    "last_name" varchar(30) NOT NULL
);
  • the table name is prefixed with the application name (myapp)
  • the id column was added automatically (can be disabled)
  • Django generates SQL in the dialect for the given DBMS (in this example, PostgreSQL)

Definition of Relationships Between Models

Relationships between models can be defined using the ForeignKey class:

from django.db import models

class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    instrument = models.CharField(max_length=100)

class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    release_date = models.DateField()
    num_stars = models.IntegerField()

Further classes ManyToManyField and OneToOneField.

Available Attribute Types (fields)

Complete overview: Django documentation.

  • BooleanField, IntegerField, BigIntegerField, FloatField, DecimalField
  • PositiveIntegerField, PositiveBigIntegerField
  • CharField (parameter max_length), TextField (unlimited size)
  • DateField, TimeField, DateTimeField, DurationField (in Python, objects datetime.date, datetime.time, datetime.datetime, datetime.timedelta)
  • others: EmailField, URLField, UUIDField, JSONField, ImageField, FileField, FilePathField, ...

Restrictions on Model Definitions

  1. The attribute name must not be a keyword in the Python language:
    class Example(models.Model):
        pass = models.IntegerField()    # 'pass' is a keyword!
    
  2. The attribute name must not contain two or more underscores in a row:
    class Example(models.Model):
        foo__bar = models.IntegerField() # 'foo__bar' has two underscores!
    
    A pair of underscores has a special meaning in the syntax for queries, see later.
  3. The attribute name must not end with an underscore (for similar reasons).

Database Management Using the manage.py Script

The manage.py script provides several subcommands for database management in the project:

  • dbshell – starts the command-line client for the given database
  • makemigrations – creates files for migrating the database after changing the model specifications – see Django documentation on the migration process
    • files are in directories <app>/migrations/, e.g. demo/migrations/
  • showmigrations – displays the migration status in the project
  • migrate – synchronizes the database status with the set of migrations
  • loaddata – finds data for initialization and inserts it into the database
  • dumpdata – displays all data in the database
  • inspectdb – creates an object model (in Python syntax) for an existing database

ORM Workflow

Development of a new application (or new model):

  1. Define models in the models.py file (see previous slides)
  2. Create corresponding tables in the database using the manage.py script
    • create a so-called migration – a file describing the changes to be made
    • perform migrations (can be chained and applied to different databases)
  3. Start the application (web server)
  4. Further, ORM takes care of the contents of individual tables (creating new objects and inserting them into the database)

Complication: changing the schema of an existing database, if we do not want to lose data.

Completing the Setup of a Django Project

After startproject, we have pending migrations for the admin, auth, etc. apps:

> python manage.py showmigrations 
admin
 [ ] 0001_initial
 [ ] 0002_logentry_remove_auto_add
 [ ] 0003_logentry_add_action_flag_choices
auth
 [ ] 0001_initial
 [ ] 0002_alter_permission_name_max_length
 [ ] 0003_alter_user_email_max_length
 [ ] 0004_alter_user_username_opts
 [ ] 0005_alter_user_last_login_null
 [ ] 0006_require_contenttypes_0002
 [ ] 0007_alter_validators_add_error_messages
 [ ] 0008_alter_user_username_max_length
 [ ] 0009_alter_user_last_name_max_length
 [ ] 0010_alter_group_name_max_length
 [ ] 0011_update_proxy_permissions
 [ ] 0012_alter_user_first_name_max_length
contenttypes
 [ ] 0001_initial
 [ ] 0002_remove_content_type_name
demo
 (no migrations)
sessions
 [ ] 0001_initial

Performing Database Migration

> python manage.py migrate 
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying sessions.0001_initial... OK

Checking Migration Status

> python manage.py showmigrations 
admin
 [X] 0001_initial
 [X] 0002_logentry_remove_auto_add
 [X] 0003_logentry_add_action_flag_choices
auth
 [X] 0001_initial
 [X] 0002_alter_permission_name_max_length
 [X] 0003_alter_user_email_max_length
 [X] 0004_alter_user_username_opts
 [X] 0005_alter_user_last_login_null
 [X] 0006_require_contenttypes_0002
 [X] 0007_alter_validators_add_error_messages
 [X] 0008_alter_user_username_max_length
 [X] 0009_alter_user_last_name_max_length
 [X] 0010_alter_group_name_max_length
 [X] 0011_update_proxy_permissions
 [X] 0012_alter_user_first_name_max_length
contenttypes
 [X] 0001_initial
 [X] 0002_remove_content_type_name
demo
 (no migrations)
sessions
 [X] 0001_initial

Adding Models

  1. Create new models (e.g. from slide 44)
  2. Create a migration:
    > python manage.py makemigrations 
    Migrations for 'demo':
      demo/migrations/0001_initial.py
        - Create model Musician
        - Create model Album
    
  3. Perform migrations:
    > python manage.py migrate
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, demo, sessions
    Running migrations:
      Applying demo.0001_initial... OK
    

Inserting Records into the Database – Admin WebUI

For development purposes, testing data can be inserted:

  1. Using the administrative application – see http://localhost:8000/admin
    • first, it is necessary to create an administrative account using the command
      python manage.py createsuperuser
    • models need to be registered in the admin.py file to be displayed:
      from .models import Musician, Album
      admin.site.register(Musician)
      admin.site.register(Album)
      

Inserting Records into the Database – Shell

  1. Using the interactive shell: python manage.py shell, e.g.:
    >>> from demo.models import *
    
    >>> karel = Musician()
    >>> karel.first_name = "Karel"
    >>> karel.last_name = "Gott"
    >>> karel.instrument = "zpěv"
    >>> karel.save()
    
    >>> album = Album()
    >>> album.artist = karel   # in fact, the database stores only the id
    >>> album.name = "Karel Gott"
    >>> album.release_date = "1973-01-01"
    >>> album.num_stars = 5
    >>> album.save()
    

Retrieving Records from the Database

The Model class has a special attribute objects for managing access to instances:

>>> from demo.models import *

>>> Album.objects.first()     # returns the first object
>>> Album.objects.last()      # returns the last object

>>> Album.objects.all()       # returns a set of all objects
>>> Album.objects.all()[:5]   # returns the first 5 objects

>>> Album.objects.filter(name="Karel Gott")    # filtering by album name
>>> Album.objects.filter(artist__first_name="Karel", artist__last_name="Gott")
                            # filtering through a relationship between tables, connection: '__'

>>> Album.objects.exclude(name="Karel Gott")   # complementary filtering

Further possibilities: see access to objects

Test Data for Database Development

When developing an application, it is not always necessary to create a proper migration and sometimes it is better to start from the beginning (or the last functional state, which is deployed on a public server).

Workflow with restart:

  1. Delete the database (the db.sqlite3 file)
  2. Create a database in the initial state (according to the state of the models.py file)
  • python manage.py makemigrations
  • python manage.py migrate
  1. Initialize the database with test data – so-called fixtures
  2. Work on modifying the models

Inserting Records into the Database – Fixtures

  1. The demo/fixtures/ directory contains test data for the demo application:

    • several files that need to be loaded in the correct order according to dependencies, e.g. python manage.py loaddata genres musicians bands albums

    • file format: see Django documentation

    • definition of data for ManyToManyField: list of PK values that are referenced:

      {
          "model": "demo.Band",
          "pk": 1,
          "fields": {
              "name": "The Beatles",
              "members": [2, 3, 4, 5]
          }
      },