Top 10 Database Interview Questions
1. What is the difference between SQL and NoSQL databases?
SQL databases are relational databases queried with SQL (Structured Query Language). They store data in tables that follow a predefined schema, with relationships defined by primary and foreign keys, and are well suited to complex queries and multi-statement transactions. Examples include MySQL, PostgreSQL, and Oracle.
NoSQL (Not Only SQL) databases, on the other hand, are non-relational and designed to handle unstructured or semi-structured data. Their schemas are more flexible, and they are often chosen for applications that need to scale horizontally across many servers or sustain very high read/write throughput. NoSQL databases include document-based databases (e.g., MongoDB), key-value stores (e.g., Redis), column-family stores (e.g., Cassandra), and graph databases (e.g., Neo4j).
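To make the schema difference concrete, here is a minimal Python sketch that uses the standard-library `sqlite3` module for the relational side and a plain list of dicts as a stand-in for a document collection. The `customers` table, the `loyalty_tier` field, and the list-of-dicts "collection" are illustrative assumptions; this is not MongoDB's API.

```python
import sqlite3

# Relational side: the table's columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

try:
    # 'loyalty_tier' was never declared, so the database rejects the row.
    conn.execute(
        "INSERT INTO customers (id, name, loyalty_tier) VALUES (2, 'Grace', 'gold')"
    )
    relational_accepted_new_field = True
except sqlite3.OperationalError:
    relational_accepted_new_field = False

# Document side (plain-Python stand-in): each record is a self-describing dict,
# so records in the same collection can carry different fields without a schema change.
documents = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace", "loyalty_tier": "gold"},
]

print(relational_accepted_new_field)  # False
print(sorted(documents[1]))           # ['_id', 'loyalty_tier', 'name']
```

In a real relational database, adding the field would require a schema change such as `ALTER TABLE`.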
2. Explain the concept of normalization in databases.
Normalization is a process in database design aimed at reducing redundancy and improving data integrity by organizing data into tables. The process involves decomposing tables into smaller, related tables and defining relationships between them.
There are several normal forms, including:
- First Normal Form (1NF): Ensures that each column contains only atomic (indivisible) values and each record is unique.
- Second Normal Form (2NF): Builds on 1NF by ensuring that every non-key attribute is fully functionally dependent on the whole primary key, so no attribute depends on only part of a composite key.
- Third Normal Form (3NF): Further refines 2NF by removing transitive dependencies, where non-key attributes depend on other non-key attributes.
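A small worked example of decomposition, sketched with Python's built-in `sqlite3` module. The table and column names (`orders_flat`, `CustomerCity`, and so on) are hypothetical. The flat table repeats each customer's city on every order, so the city depends on `CustomerID` rather than on the order itself (OrderID determines CustomerID, which determines CustomerCity): a transitive dependency that 3NF removes by moving the city into its own `Customers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized design: customer details are repeated on every order row.
conn.execute("""CREATE TABLE orders_flat (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    CustomerCity TEXT,
    Amount REAL)""")
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, 10, "Oslo", 25.0), (2, 10, "Oslo", 40.0), (3, 11, "Lima", 15.0)],
)

# Normalized design: each fact is stored once, linked by CustomerID.
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT)")
conn.execute("""CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    Amount REAL)""")
conn.execute("INSERT INTO Customers SELECT DISTINCT CustomerID, CustomerCity FROM orders_flat")
conn.execute("INSERT INTO Orders SELECT OrderID, CustomerID, Amount FROM orders_flat")

# Moving customer 10 now touches one row instead of one row per order.
cur = conn.execute("UPDATE Customers SET City = 'Bergen' WHERE CustomerID = 10")
print(cur.rowcount)  # 1

flat_rows_for_customer_10 = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE CustomerID = 10"
).fetchone()[0]
print(flat_rows_for_customer_10)  # 2 rows would need the same edit in the flat design
```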
3. What is an index, and why is it important?
An index is a data structure that speeds up data retrieval on a database table at the cost of additional storage and slower writes, since every insert, update, or delete must also maintain the index. Most indexes (commonly B-trees) keep the indexed values in sorted order together with pointers to the corresponding rows, which lets the database engine locate matching rows without scanning the whole table.
Indexes are important because they significantly enhance query performance, especially for large datasets. Common types of indexes include:
- Primary Index: Automatically created on primary keys.
- Secondary Index: Created on columns not part of the primary key.
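- Composite Index: Created on multiple columns to support queries that filter on those columns; in B-tree implementations it is most effective when the query filters on the index's leading column(s).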
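One way to observe an index at work is SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's `sqlite3` module. The exact plan wording varies between SQLite versions, so the sketch only looks for a full scan before the index exists and a search using the index afterwards. The `users` table and the `idx_users_email` secondary index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT id FROM users WHERE email = ?"
before = plan(query, ("user42@example.com",))

# Secondary index on a column that is not part of the primary key.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("user42@example.com",))

print(before)  # e.g. a full SCAN of users
print(after)   # e.g. a SEARCH of users using idx_users_email
```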
4. What is a foreign key, and how does it work?
A foreign key is a column or set of columns in a table that establishes a link between the data in two tables. It refers to the primary key of another table, creating a relationship between the two tables. This relationship enforces referential integrity by ensuring that the value in the foreign key column must match a value in the referenced table’s primary key column or be null.
For example, in a database with Orders and Customers tables, the Orders table might have a CustomerID column that acts as a foreign key referring to the CustomerID primary key in the Customers table.
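The Orders/Customers example can be sketched with Python's `sqlite3` module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection; that line is a SQLite-specific setting. The rows inserted here are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off by default; enable it for this connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID))""")

conn.execute("INSERT INTO Customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (100, 1)")     # matches an existing customer
conn.execute("INSERT INTO Orders VALUES (101, NULL)")  # NULL is allowed

try:
    conn.execute("INSERT INTO Orders VALUES (102, 99)")  # there is no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```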
5. What is a database transaction, and what are its key properties?
A database transaction is a sequence of one or more SQL operations executed as a single unit of work. Transactions ensure that the database remains in a consistent state even in the event of system failures or concurrent access.
The key properties of transactions are encapsulated by the ACID acronym:
- Atomicity: Ensures that a transaction is all-or-nothing: either every operation in it takes effect, or, if any operation fails, the entire transaction is rolled back and none of its changes remain.
- Consistency: Ensures that the database transitions from one valid state to another, maintaining data integrity.
- Isolation: Ensures that concurrent transactions do not interfere with each other, providing a stable view of the data.
- Durability: Ensures that once a transaction is committed, its changes are permanent and survive system crashes.
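Atomicity is the easiest of these properties to demonstrate in a few lines. The sketch below assumes a hypothetical `accounts` table with a `CHECK (balance >= 0)` constraint and uses a Python `sqlite3` connection as a context manager, which commits when the block succeeds and rolls back when it raises. The failed transfer leaves both balances untouched, even though its first `UPDATE` had already run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as a single unit of work.

    `with conn:` commits if the block finishes and rolls back if it raises.
    """
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Violates the CHECK constraint if src would go negative.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, "alice", "bob", 30)       # succeeds: both updates are committed
try:
    transfer(conn, "alice", "bob", 500)  # second UPDATE fails, so the first is undone too
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # {'alice': 70, 'bob': 80}
```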
6. Explain the difference between INNER JOIN and LEFT JOIN.
In SQL, JOIN operations combine rows from two or more tables based on a related column. The primary difference between INNER JOIN and LEFT JOIN lies in how they handle non-matching rows:
- INNER JOIN: Returns only the rows where there is a match in both joined tables. If a row in either table does not have a corresponding match in the other table, it is excluded from the result set.
- LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.
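A minimal comparison of the two joins using Python's `sqlite3` module, with hypothetical data in which one customer (Grace) has no orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0)])

# INNER JOIN: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.Name, o.OrderID FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.Name, o.OrderID""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None in Python) when unmatched.
left = conn.execute("""
    SELECT c.Name, o.OrderID FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.Name, o.OrderID""").fetchall()

print(inner)  # [('Ada', 10), ('Ada', 11)]
print(left)   # [('Ada', 10), ('Ada', 11), ('Grace', None)]
```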
7. What is a stored procedure, and what are its advantages?
A stored procedure is a named collection of SQL statements and optional control-of-flow statements stored in the database; depending on the system, it is compiled ahead of time or has its execution plan cached after the first run. It can be executed with a single call and can accept parameters to dynamically modify its behavior.
Advantages of using stored procedures include:
- Performance: Many database systems compile or cache a stored procedure's execution plan and reuse it, reducing parsing and compilation overhead on repeated calls.
- Reusability: Stored procedures can be reused across different applications and queries, reducing code duplication.
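- Security: Stored procedures can encapsulate business logic and enforce access control, since users can be allowed to execute a procedure without direct access to the underlying tables. Procedures that take parameters, rather than concatenating input into dynamic SQL, also reduce the risk of SQL injection attacks.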
- Maintainability: Changes to business logic can be made in a single place, simplifying maintenance.
8. What is a database view, and how is it used?
A database view is a virtual table based on the result of a SQL query. It does not store data physically but provides a way to present data in a specific format or subset. Views can simplify complex queries, enhance security by restricting access to specific columns or rows, and present data from multiple tables as a unified result.
For example, a view might be created to show only active employees from an Employees table or to combine data from Orders and Customers tables for reporting purposes.
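The active-employees view could look like the following sketch in Python's `sqlite3` module. The `IsActive` flag and `Salary` column are assumptions made for illustration. The view filters rows and leaves out `Salary`, and because it stores only its defining query, changes to `Employees` are visible through it immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, "
    "Salary INTEGER, IsActive INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Ada", 9000, 1), (2, "Grace", 9500, 0), (3, "Linus", 8000, 1)],
)

# The view holds no rows of its own: only active employees, without Salary.
conn.execute("""CREATE VIEW ActiveEmployees AS
    SELECT EmployeeID, Name FROM Employees WHERE IsActive = 1""")

active = conn.execute("SELECT Name FROM ActiveEmployees ORDER BY Name").fetchall()

# Updating the base table is reflected in the view without any refresh step.
conn.execute("UPDATE Employees SET IsActive = 1 WHERE EmployeeID = 2")
active_after = conn.execute("SELECT Name FROM ActiveEmployees ORDER BY Name").fetchall()

print(active)        # [('Ada',), ('Linus',)]
print(active_after)  # [('Ada',), ('Grace',), ('Linus',)]
```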
9. What is denormalization, and when might it be used?
Denormalization is the process of combining normalized tables to improve query performance by reducing the number of joins required. While normalization reduces redundancy and ensures data integrity, it can lead to complex queries and performance overhead. Denormalization aims to strike a balance between data integrity and performance.
Denormalization might be used in scenarios where read performance is critical, such as in reporting or analytical systems. It involves adding redundant data or merging tables to speed up query execution.
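As a sketch of the trade-off, the following Python `sqlite3` example builds a hypothetical `OrderReport` table that copies each customer's `Region` onto their orders, so the regional report runs without a join. The redundancy is the cost: if a customer's region changes, `OrderReport` has to be refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Region TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [(1, "Ada", "EU"), (2, "Grace", "US")])
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)]
)

# Denormalized reporting table: Region is copied onto every order row.
conn.execute("""CREATE TABLE OrderReport AS
    SELECT o.OrderID AS OrderID, o.Amount AS Amount, c.Region AS Region
    FROM Orders o JOIN Customers c ON c.CustomerID = o.CustomerID""")

# The report query now reads a single table, with no join.
totals = conn.execute(
    "SELECT Region, SUM(Amount) FROM OrderReport GROUP BY Region ORDER BY Region"
).fetchall()
print(totals)  # [('EU', 65.0), ('US', 15.0)]
```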
10. What are some common database backup and recovery strategies?
Backup and recovery are crucial for protecting data and ensuring business continuity. Common strategies include:
- Full Backup: A complete copy of the entire database. It is typically performed regularly and serves as the baseline for recovery.
- Incremental Backup: Records only the changes made since the last backup of any type. It minimizes backup size and time, but recovery requires the last full backup plus every incremental backup taken after it, applied in order.
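- Differential Backup: Captures all changes made since the last full backup. Differentials grow as changes accumulate, but recovery needs only the last full backup plus the most recent differential, balancing backup size against recovery time.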
- Point-in-Time Recovery: Restores the database to a specific moment, typically by restoring a full backup and then replaying transaction logs up to the chosen time. This is useful for undoing accidental data modifications.
Implementing a robust backup and recovery strategy involves scheduling regular backups, testing recovery procedures, and storing backups securely.
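The difference between incremental and differential backups shows up at restore time. The Python sketch below models which backups a restore would need under the rules described above. `Backup` and `restore_chain` are hypothetical names, the timestamps are abstract integers, and the sketch models neither any particular backup tool nor log-based point-in-time recovery.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    kind: str      # "full", "incremental", or "differential"
    taken_at: int  # hypothetical abstract timestamp

def restore_chain(backups, target):
    """Hypothetical helper: list the backups needed to restore up to `target`.

    - start from the most recent full backup at or before the target;
    - apply the most recent differential after it, if any (it holds every
      change since that full backup);
    - then apply, in order, every incremental taken after the last applied backup.
    """
    usable = sorted((b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    chain = [fulls[-1]]
    later = [b for b in usable if b.taken_at > chain[0].taken_at]
    diffs = [b for b in later if b.kind == "differential"]
    if diffs:
        chain.append(diffs[-1])
    chain += [b for b in later if b.kind == "incremental" and b.taken_at > chain[-1].taken_at]
    return chain

incremental_plan = [Backup("full", 0), Backup("incremental", 1),
                    Backup("incremental", 2), Backup("incremental", 3)]
differential_plan = [Backup("full", 0), Backup("differential", 1),
                     Backup("differential", 2), Backup("differential", 3)]

print([(b.kind, b.taken_at) for b in restore_chain(incremental_plan, 3)])
# [('full', 0), ('incremental', 1), ('incremental', 2), ('incremental', 3)]
print([(b.kind, b.taken_at) for b in restore_chain(differential_plan, 3)])
# [('full', 0), ('differential', 3)]
```

The incremental schedule needs every link in the chain to be intact, while the differential schedule needs only two backups at the cost of larger differentials.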