Hash Buckets - Databricks

Hash Buckets

Glossary Item
« Back to Glossary Index
Source Databricks

In computing, a hash table (hash map) is a data structure that provides virtually direct access to objects based on a key (a unique String or Integer). A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Here are the main features of the key used:

  • The key used can be your SSN, your telephone number, account number, etc
  • Must have unique keys
  • Each key is associated with–mapped to–a value

Hash buckets are used to apportion data items for sorting or lookup purposes The aim of this work is to weaken the linked lists so that searching for a specific item can be accessed within a shorter timeframe.
Hash Buckets

A hash table that uses buckets is actually a combination of an array and a linked list. Each
element in the array (the hash table) is a header for a linked list. All elements that hash
into the same location will be stored in the list.

The hash function assigns each record to the first slot within one of the buckets. In case the slot is occupied, then the bucket slots will be searched sequentially until an open slot is found.

In case a bucket is completely full, the record will get stored in an overflow bucket of infinite capacity at the end of the table. All buckets share the same overflow bucket. However, a good implementation will use a hash function that distributes the records evenly among the buckets so that as few records as possible go into the overflow bucket.

« Back to Glossary Index