Types of Hashing in DBMS (Static & Dynamic Hashing)

Hashing is a technique used in database management systems to quickly search for and retrieve data from a database. It involves using a hash function to map data, such as a key or value, to a fixed-size index, called a hash code or hash value. This allows for efficient searching and retrieval of data by comparing the hash value of the data to be retrieved with the hash values stored in the database. Hashing can be used in various database structures such as hash tables, hash indexes, and hash maps.

What is Hashing?

Hashing is a technique used in database management systems (DBMS) to store and retrieve data efficiently. It involves using a hash function to map keys to their corresponding data values, and storing the data in a hash table. The hash function takes a key as input and produces a hash value that serves as an index to the location where the corresponding data is stored in the hash table.

Read also: Introduction to Data Models

Hash Organization

Hash organization in a database management system (DBMS) uses a hash function and a hash table to map keys to their corresponding data values. The hash function takes a key as input and produces a hash value that serves as an index to the location where the corresponding data is stored in the hash table.

Bucket

A bucket in hash organization is a location in the hash table where data values are stored. The hash table is divided into a fixed number of buckets or slots, each of which can hold one or more data values. When data is inserted into the table, the hash function is used to calculate the hash value for the key, and the data is placed in the corresponding bucket.

Hash Function

A hash function in hash organization is a mathematical function that takes a key as input and produces a hash value that serves as an index to the location where the corresponding data is stored in the hash table. The hash function must be designed carefully to ensure that it distributes the data evenly across the buckets, reducing the number of collisions, i.e., situations where multiple keys map to the same bucket. In the case of collisions, various collision resolution techniques, such as open addressing or chaining, can be used to resolve the conflict and store the data correctly.

Hash functions can be designed in various ways, such as modulo, mid-square or folding, to produce different results. The choice of the hash function will depend on the specific requirements of the application and the trade-offs between collision resolution, hash value distribution and performance.

Types of Hashing

These are two types of hashing used in DBMS. There are also other variations and combinations of these techniques that can be used depending on the specific requirements of the application.

Static Hashing

Static hashing is a technique used in database management systems where the size and structure of a hash table is fixed and determined at the time of its creation. In this technique, the number of slots in the table does not change as elements are added or removed.

With static hashing, a hash function is used to map keys to the fixed-size hash table. The function takes the key as input, performs some mathematical operations on it, and returns an index that corresponds to a slot in the table. The value associated with the key is then stored in the corresponding slot.

When a value needs to be retrieved, the hash function is used again to calculate the index of the slot where the value should be stored. The value is then retrieved from that slot.

Static hashing can be efficient when the number of elements in the table is known in advance and the distribution of keys is uniform. It also can avoid the overhead of resizing the table and rehashing the keys when the number of elements change. However, if the number of elements in the table changes significantly, static hashing can lead to a lot of collisions (when two keys hash to the same slot) which can slow down the search and retrieval of data.

Dynamic hashing, which allows for the automatic adjustment of the size and structure of the hash table, can be used to resolve this issue.

Static Hashing Example

An example of static hashing is as follows:

Suppose we have a database of product information, where each product has a unique product code as the key. We use a hash function to map the product codes to a fixed-size hash table with 10 slots. The hash function takes the product code, performs some mathematical operations on it, and returns an index between 0 and 9.

We create a hash table with 10 slots, and all slots are empty. As we add product information to the database, the hash function maps each code to a slot in the table. For example, product code P001 is mapped to slot 1, product code P002 is mapped to slot 2, and so on.

Once the product information is stored in the table, we can retrieve it quickly by using the hash function to calculate the index of the slot where the information is stored, and then retrieve it from that slot.

Because the size of the table is fixed and determined at the time of its creation, static hashing can be efficient when the number of elements in the table is known in advance and the distribution of keys is uniform. However, if the number of elements in the table changes significantly, collisions may occur and retrieval of elements could become slow.

Dynamic Hashing

Dynamic hashing is a technique used in database management systems to efficiently manage the size and structure of a hash table. In traditional static hashing, the size of the hash table is fixed and determined at the time of creation. With dynamic hashing, the size of the hash table can change as the number of elements in the table changes.

Dynamic hashing is typically implemented using a technique called extendible hashing. This is a method where the hash table is initially created with a small number of slots. As more elements are added to the table, the number of slots is increased to accommodate the additional data. The hash values are then recalculated to match the new structure of the table.

Dynamic hashing allows the hash table to adapt to changing data sets, providing better performance and more efficient use of memory. It also reduces the likelihood of collisions, which can occur when multiple elements have the same hash value and are stored in the same slot.

Dynamic Hashing Example

An example of dynamic hashing is as follows:

Suppose we have a database of employee records, where each record has a unique employee ID as the key. We use a hash function to map the employee IDs to a fixed-size hash table with 8 slots. The hash function takes the employee ID, performs some mathematical operations on it, and returns an index between 0 and 7.

Initially, the hash table has 8 slots, and all slots are empty. As we add employee records to the database, the hash function maps each ID to a slot in the table. For example, employee ID 12345 is mapped to slot 3, employee ID 56789 is mapped to slot 6, and so on.

As more and more employee records are added, the hash table starts to fill up. Eventually, we reach a point where some of the slots have more than one record, which is known as a collision. To resolve this issue, we can use dynamic hashing.

In dynamic hashing, we can split a slot in half when it becomes too full. For example, if slot 3 becomes too full, we can split it into two new slots, 3A and 3B. We then need to recalculate the hash values for the employee IDs that were originally stored in slot 3 and move them to the appropriate new slot.

By using dynamic hashing, the size of the hash table can be automatically adjusted to accommodate the changing number of records, making the search and retrieval of data more efficient.

What is Hashing in File Structure?

Hashing in file structures is a technique used to organize and manage data stored in files. It involves using a hash function to map the data, such as a key or value, to a fixed-size index, called a hash code or hash value. This index is used to identify the location of the data within the file.

Hashing can be used in various file structures such as hash files, hash tables, and hash indexes. In a hash file, data is stored in a fixed-size block called a bucket, and the hash function is used to map the data to a specific bucket. In a hash table, data is stored in a table with multiple buckets and the hash function is used to map the data to a specific bucket within the table. In a hash index, the hash function is used to map data to an index, which is used to quickly locate the data within the file.

Hashing in file structures allows for efficient searching and retrieval of data by comparing the hash value of the data to be retrieved with the hash values stored in the file. It also allows for quick access to data, as the location of the data can be determined by the hash code, rather than by searching through the entire file.

However, hashing in file structures can also lead to collisions, which occur when multiple elements have the same hash value and are stored in the same location. To avoid collisions, various collision resolution techniques such as open addressing and chaining can be used.

Difference between Indexing and Hashing in DBMS

Indexing and Hashing are two techniques used to improve the performance of database management systems (DBMSs) by allowing for faster data retrieval. However, they differ in several ways:

Purpose: Indexing is used to improve the efficiency of data retrieval operations, such as search, sort and filter, by providing a more direct access path to the data. On the other hand, Hashing is used to improve the efficiency of data insertion, update and deletion operations by allowing for faster data placement and retrieval based on a hash value calculated from the key.
Structure: Indexing involves creating an index data structure, usually a B-Tree or a Hash Table, that maps the values of one or more columns of a table to their location in the table. Hashing, on the other hand, involves transforming a key into a hash value that serves as an index to the location where the corresponding data is stored in the hash table.
Space overhead: Indexing requires additional space to store the index data structure, which can become quite large for large tables. Hashing, on the other hand, has a constant space overhead, which is usually small, regardless of the size of the table.
Update overhead: Updating an indexed table requires updating the index data structure, which can be an expensive operation. Hashing, on the other hand, has a constant update overhead, which is usually small, regardless of the size of the table.
Search efficiency: Indexing provides faster data retrieval operations, as it allows for more direct access to the data based on the index. Hashing, on the other hand, provides faster data insertion, update and deletion operations, as it allows for faster data placement and retrieval based on the hash value.