IM

ISSA MDOE

Full Stack Software Engineer

Loading0%
2025-12-01 15 min read

The Art of Database Indexing: B-Trees, Hash, and GIN

#Database#PostgreSQL#Performance
👨🏽‍💻
Issa Ally Mdoe
Full Stack Engineer

Indexes are the single most effective way to optimize database performance, yet they are often misunderstood. Adding an index isn't a magic "make it fast" button. You need to understand the underlying data structures to choose the right strategy.

In this guide, we'll explore index types in PostgreSQL and when to use them.

1. B-Tree Indexes (The Default)

B-Tree (Balanced Tree) is the default index type for a reason. It handles equality and range queries efficiently. It keeps data sorted in a balanced tree structure, allowing O(log n) search.

  • Equality: WHERE id = 5
  • Ranges: WHERE age > 18 AND age < 30
  • Sorting: ORDER BY created_at DESC
sql
-- Creating a B-tree index
CREATE INDEX idx_users_email ON users(email);

2. Hash Indexes

Hash indexes are purely for equality checks. They map a value to a 32-bit hash bucket.

  • Exact equality where the value is large (e.g., long strings/URLs), and you only ever search by =.
  • Cannot handle range queries (>, <).
  • Cannot be used for sorting.
  • Historically not crash-safe (fixed in Postgres 10+).
sql
-- Creating a Hash index
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);

3. GIN (Generalized Inverted Index)

This is where Postgres shines. GIN indexes are designed for composite values like Arrays and JSONB. They act like the index at the back of a book, mapping individual elements to the rows that contain them.

  • Full-text search.
  • Querying inside JSONB documents.
  • Array containment checks.
sql
-- Indexing a JSONB column
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- Fast query on JSON SELECT * FROM products WHERE attributes @> '{"color": "red"}'; `

4. GiST (Generalized Search Tree)

GiST is highly extensible and used for complex data types like Geometric data (PostGIS) and Range types.

  • "Nearest neighbor" searches (finding closest locations).
  • Overlapping ranges (e.g., calendar scheduling).

Multicolumn Indexes & The Leftmost Prefix Rule

  • (a)
  • (a, b)
  • (a, b, c)

It CANNOT efficienty use the index for searches on just (b) or (c) without (a). This is the "Leftmost Prefix" rule.

Analyzing Performance

Never guess. Use EXPLAIN ANALYZE to see exactly what the query planner is doing.

sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

Look for "Index Scan" (good) vs "Seq Scan" (bad for large tables). If you see a Sequential Scan on a table with millions of rows, you are missing an index.


Enjoyed this article?
Share it with your network