Database Normalization
What is Normalization?
- Normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently and reduce repeating data and to reduce the potential for anomalies during data operations.
- The formal classifications used for describing a relational database's level of normalization are called normal forms (abbreviated as NF)
History
- Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper A Relational Model of Data for Large Shared Data Banks Codd stated:
- “There is, in fact, a very
simple elimination procedure which we shall call normalization. Through
decomposition nonsimple domains are replaced by ‘domains
whose elements are atomic (nondecomposable) values.”
What Happens without Normalization
- Information is stored redundantly wasting storage
- A non-normalized database can suffer from data anomalies:
- Update (Modification) anomalies
Anomalies example
Consider the relation:
EMP_PROJ ( Emp_ID, Proj_ID, E_Name, P_Name, No_Hours,
Wage)
- Insert Anomaly
- Cannot insert a project unless an employee is assigned to .
- Inversely - cannot insert an employee unless he/she is assigned to a project.
- Delete Anomaly
- When a project is deleted, it will result in deleting all the employees who work on that project.
- Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
Definitions
- Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) the value of the Primary Key (or its part).
Definitions
- Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
Partial Dependency
Definitions
- Transitive Dependency – when a non-key attribute determines another non-key attribute.
Transitive Dependency
First Normal Form (1NF)
- Each table has a primary key: minimal set of attributes which can uniquely identify a record.
- There are no multivalued attributes or repeating groups in the table.
Bringing a Relation to 1NF
Bringing a Relation to 1NF
- Option 1: Make a determinant (a determinant is any attribute whose value determines other values within a row) of the repeating group or the multivalued attribute a part of the primary key.
Composite Primary Key
Bringing a Relation to 1NF
- Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group will comprise a primary key.
Bringing a Relation to 1NF
Second Normal Form (2NF)
- The database must meet all the requirements of the first normal form.
- There should be no partial dependencies
Bringing a Relation to 2NF
- Goal: Remove Partial Dependencies
Composite Primary Key
Partial Dependency
Bringing a Relation to 2NF
- Remove attributes that are dependent from the part but not the whole of the primary key from the original relation. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key.
Bringing a Relation to 2NF
Third Normal Form (3NF)
- The database must meet all the requirements of the second normal form.
- There should be no transitive dependencies
Bringing a Relation to 3NF
- Goal: Get rid of transitive dependencies.
Transitive Dependency
Bringing a Relation to 3NF
- Remove the attributes, which are dependent on a non-key attribute, from the original relation. For each transitive dependency, create a new relation with the non-key attribute which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.
Bringing a Relation to 3NF
- A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key
- If a table contains only one candidate key, the 3NF and the BCNF are equivalent
- BCNF is a special case of 3NF.
Boyce-Codd Normal Form (BCNF)
Normal Forms: Review
- Unnormalized – There are multivalued attributes or repeating groups
- 1 NF – No multivalued attributes or repeating groups
- 2 NF – 1 NF plus no partial dependencies
- 3 NF – 2 NF plus no transitive dependencies
Example 1: Determine NF
- ISBN Title
- ISBN Publisher
- Publisher Address
- No M/V
attributes, therefore at least 1NF
- No partial
dependencies, therefore at least 2NF
- There is
a transitive dependency (Publisher Address), therefore, not 3NF
Conclusion: The relation is in 2NF
Example 2: Determine NF
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency (Product_ID Description),
therefore not in 2NF
Conclusion: The relation is in 1NF
23