Generalization in DBMS | Why we use it?

Khurram Hanif, March 23, 2023

In a DBMS, generalization is the process of abstracting the attributes that a group of entities has in common and using them to create a more general entity.

For example, let’s say we have different types of vehicles, such as cars, motorcycles, and trucks, each described by attributes such as number of wheels, fuel type, and seating capacity. We can generalize these types into a more general entity called “vehicle,” which holds the attributes that every type shares, such as “number of wheels” and “fuel type,” while attributes that apply to only some types stay with the specific entities.
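As a rough illustration of the idea, the shared attributes can be found by intersecting the attribute sets of the specific entities. The attribute names below, other than “number of wheels”, “fuel type”, and “seating capacity”, are made up for the sketch and do not come from any real schema.

```python
# Hypothetical attribute sets for three vehicle types (illustrative only).
car = {"number_of_wheels", "fuel_type", "seating_capacity"}
motorcycle = {"number_of_wheels", "fuel_type", "has_sidecar"}
truck = {"number_of_wheels", "fuel_type", "cargo_capacity"}

specific_entities = {"car": car, "motorcycle": motorcycle, "truck": truck}

# The generalized "vehicle" entity keeps only what every type shares.
vehicle = set.intersection(*specific_entities.values())

# Each specific entity keeps only the attributes that are not shared.
specific_only = {
    name: attrs - vehicle for name, attrs in specific_entities.items()
}

print(sorted(vehicle))
print({name: sorted(attrs) for name, attrs in specific_only.items()})
```

In a real design this step is a modelling decision made by a person rather than a set operation, but the outcome has the same shape: one general entity plus a small remainder for each specific entity.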

Generalization is often used in database design to create a more efficient and organized database structure by reducing the redundancy of data. It also allows for more flexibility in querying and analyzing the data.

Why Is Generalization Used?

Generalization is used in database design for several reasons:

Reducing redundancy

When there are multiple similar entities in a database, generalization helps to reduce the redundancy of data by creating a more general entity that captures the common attributes of those entities. This results in a more efficient and organized database structure.

Data abstraction

Generalization helps to abstract the common attributes of a group of entities and create a more generalized entity. This allows for more flexibility in querying and analyzing the data, as queries and analyses can be performed on the more generalized entity rather than on each specific entity.

Code reuse

By creating a more generalized entity, code can be reused for multiple entities. For example, if there are multiple types of vehicles in a database, code for calculating fuel efficiency can be reused for each type of vehicle by calling the fuel efficiency function on the generalized “vehicle” entity.
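A minimal sketch of this kind of reuse at the application level, assuming hypothetical `distance_km` and `fuel_used_l` fields and a kilometres-per-litre definition of fuel efficiency (none of these come from the article):

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Generalized entity: attributes shared by every vehicle type."""
    model: str
    distance_km: float   # hypothetical: distance driven
    fuel_used_l: float   # hypothetical: fuel consumed over that distance

    def fuel_efficiency(self) -> float:
        """Kilometres per litre; written once, inherited by all sub-types."""
        if self.fuel_used_l <= 0:
            raise ValueError("fuel_used_l must be positive")
        return self.distance_km / self.fuel_used_l


@dataclass
class Car(Vehicle):
    seating_capacity: int = 5


@dataclass
class Truck(Vehicle):
    cargo_capacity_kg: float = 0.0


fleet = [
    Car("Sedan", distance_km=500.0, fuel_used_l=25.0),
    Truck("Hauler", distance_km=300.0, fuel_used_l=60.0, cargo_capacity_kg=8000.0),
]

# The same method works for every sub-type without being redefined.
efficiencies = {v.model: v.fuel_efficiency() for v in fleet}
print(efficiencies)
```

The calculation lives only on the general `Vehicle` class, so adding a new vehicle type would not require touching it.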

Simplifying database design

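Generalization simplifies the database design by defining the shared attributes once, in a single general entity, instead of repeating them for every specific entity. Depending on how the design is mapped to tables, this does not necessarily reduce the number of tables, but it does make the database easier to manage and maintain.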

Implementation of Generalization

Generalization is a process in database management systems (DBMS) where a set of lower-level entities is combined to form a higher-level entity. This process involves identifying common attributes and relationships among the lower-level entities and using them to create a new entity with fewer attributes and relationships.

Implementing generalization in a DBMS involves the following steps:

Identify lower-level entities: The first step is to identify the lower-level entities that can be combined to form a higher-level entity. These lower-level entities should have common attributes and relationships.
Define the higher-level entity: The higher-level entity should be defined based on the common attributes and relationships of the lower-level entities. The higher-level entity should have fewer attributes and relationships than the lower-level entities.
Create a super-type table: A super-type table is created to store the common attributes of the lower-level entities. This table will serve as the parent table for the lower-level entities.
Create sub-type tables: Sub-type tables are created to store the attributes that are unique to each lower-level entity. These tables will have a foreign key that references the primary key of the super-type table.
Define relationships: Relationships between the super-type and sub-type tables are defined based on the common attributes and relationships of the lower-level entities.
Implement constraints: Constraints are implemented to ensure that data integrity is maintained. For example, a foreign key constraint can be used to ensure that a sub-type table can only reference a valid record in the super-type table.
Query the database: Once the generalization is implemented, the database can be queried to retrieve data from the super-type and sub-type tables.
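The steps above can be sketched with Python’s built-in `sqlite3` module. This is only one possible mapping, borrowing the “vehicle”, “car_attributes”, and “truck_attributes” table names from the vehicle example later in this article. The column names `vehicle_id`, `seating_capacity`, and `cargo_capacity_kg` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Super-type table: attributes common to all vehicles.
CREATE TABLE vehicle (
    vehicle_id   INTEGER PRIMARY KEY,
    manufacturer TEXT NOT NULL,
    model        TEXT NOT NULL,
    vehicle_type TEXT NOT NULL CHECK (vehicle_type IN ('car', 'truck'))
);
-- Sub-type tables: the primary key is also a foreign key to the super-type.
CREATE TABLE car_attributes (
    vehicle_id       INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    seating_capacity INTEGER NOT NULL
);
CREATE TABLE truck_attributes (
    vehicle_id        INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    cargo_capacity_kg REAL NOT NULL
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO vehicle VALUES (1, 'Acme', 'Sedan', 'car')")
conn.execute("INSERT INTO car_attributes VALUES (1, 5)")
conn.execute("INSERT INTO vehicle VALUES (2, 'Acme', 'Hauler', 'truck')")
conn.execute("INSERT INTO truck_attributes VALUES (2, 8000.0)")

# Constraint check: a sub-type row with no matching super-type row is rejected.
try:
    conn.execute("INSERT INTO car_attributes VALUES (99, 4)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Query: join super-type and sub-type to read a complete car record.
cars = conn.execute("""
    SELECT v.model, v.manufacturer, c.seating_capacity
    FROM vehicle AS v
    JOIN car_attributes AS c ON c.vehicle_id = v.vehicle_id
""").fetchall()

print(orphan_rejected, cars)
```

Using the super-type’s key as both primary key and foreign key in each sub-type table means a vehicle can have at most one row of car attributes. Other mappings, such as a single wide table with nullable columns, are also common, and which one fits best depends on the workload.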

Examples

Let’s look at two examples: vehicles and employees.

Example 1: Vehicles

Let’s say we have a database for a car dealership that sells different types of vehicles, such as cars, motorcycles, and trucks. Each type has some attributes of its own, such as seating capacity, alongside attributes it shares with the other types. Instead of repeating the shared attributes in a separate, self-contained table for each type of vehicle, we can use generalization to create a more generalized entity called “vehicle” that includes the common attributes of all types of vehicles.

In this example, the “vehicle” table includes the common attributes of all types of vehicles, such as “engine type”, “model”, “number of wheels”, and “manufacturer”. The “vehicle_type” attribute indicates the kind of vehicle, for example a car or a truck. The attributes specific to each type of vehicle are stored in separate tables, such as “car_attributes” and “truck_attributes”.

Figure: Generalization in DBMS example.

Example 2: Employees

Let’s say we have a database for a company that has different types of employees, such as full-time employees, part-time employees, and contractors. Each type of employee has its own specific attributes, such as hours worked, hourly rate, and contract end date. Instead of repeating the shared attributes in a separate table for each type of employee, we can use generalization to create a more generalized entity called “employee” that includes the common attributes of all types of employees.

In this example, the “employee” table includes the common attributes of all types of employees:

employee table
- employee_id
- first_name
- last_name
- hire_date
- employee_type

The “employee_type” attribute indicates whether the employee is a full-time employee, part-time employee, or contractor.

The specific attributes of each type of employee are stored in separate tables, such as “full_time_employee_attributes”, “part_time_employee_attributes”, and “contractor_attributes”.
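A possible version of this layout, again sketched with Python’s built-in `sqlite3` module. The article does not say which specific attribute belongs to which employee type, so the column assignment below is an assumption, as are the sample rows. The “full_time_employee_attributes” table is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    hire_date     TEXT NOT NULL,
    employee_type TEXT NOT NULL
        CHECK (employee_type IN ('full_time', 'part_time', 'contractor'))
);
-- Assumed split of the specific attributes across sub-type tables.
CREATE TABLE part_time_employee_attributes (
    employee_id  INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    hours_worked REAL NOT NULL,
    hourly_rate  REAL NOT NULL
);
CREATE TABLE contractor_attributes (
    employee_id       INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    contract_end_date TEXT NOT NULL
);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ada", "Stone", "2021-02-01", "full_time"),
        (2, "Ben", "Reyes", "2022-06-15", "part_time"),
        (3, "Cy", "Okoro", "2023-01-10", "contractor"),
    ],
)
conn.execute("INSERT INTO part_time_employee_attributes VALUES (2, 20.0, 18.5)")
conn.execute("INSERT INTO contractor_attributes VALUES (3, '2023-12-31')")

# One query over the super-type covers every kind of employee.
headcount = dict(conn.execute(
    "SELECT employee_type, COUNT(*) FROM employee GROUP BY employee_type"
).fetchall())

# LEFT JOIN pulls in sub-type details only where they exist.
rows = conn.execute("""
    SELECT e.first_name, e.employee_type, c.contract_end_date
    FROM employee AS e
    LEFT JOIN contractor_attributes AS c ON c.employee_id = e.employee_id
    ORDER BY e.employee_id
""").fetchall()

print(headcount)
print(rows)
```

Questions about all employees, such as headcount by type, need only the “employee” table. Type-specific details are joined in only when a query needs them.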

In both examples, generalization allows us to reduce redundancy and simplify the database design by creating a more efficient and organized structure. It also allows for more flexibility in querying and analyzing the data.
