Data Normalisation, Primary Key and Foreign Key
What is Data Normalization?
Wikipedia defines data normalization as "The process of restructuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity"
Put simply, normalization is the process of identifying and arranging data so that nothing is stored twice and, at the same time, no data needed for processing or analysis is lost.
That is hard to grasp from definitions alone. Let's understand the meaning and process of normalization by working through an example.
Data normalization process of a book library: Every library has a list of books and a list of members. At any given time a member may have borrowed one book, several books, or none. Unfortunately, this library cannot afford to buy library management software. The team currently uses a spreadsheet, say MS Excel, to manage its books, members and transaction records. Below is the data sheet of those records.
Transaction spreadsheet of the Library.
Let's look at the data in the transaction table above. It captures the following:
1) Date of transaction: the date the book is issued to the customer/member of the library.
2) Book ID: the unique identification tag of the book.
3) Name of the book.
4) Customer/Member ID: the unique membership number.
5) Name of the customer/member.
6) Exp. date of return: the expected date of return of the book taken by the member.
Basically, data about books, customers and the transactions between a given book and member is captured in a single record, that is, in a single spreadsheet.
There is nothing wrong with capturing the data in this format in a spreadsheet or, for that matter, in any database (RDBMS) software.
But if you want to know the count of members or the count of books in the library, you have to pivot on the respective fields, such as customer name and book name. The count will also go wrong if the name of a book or a member is spelled differently in two different records.
E.g. a book is named "Rich Dad and Poor Dad" in one transaction and "Rich Dad & Poor Dad" in another.
In this case the system would count "Rich Dad and Poor Dad" and "Rich Dad & Poor Dad" as two different books even though they are the same book.
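To see the miscount concretely, here is a small Python sketch. The rows and the book ID are made up for illustration (the original sheet is not reproduced here); only the two spellings come from the example above.

```python
from collections import Counter

# Hypothetical rows from the flat transaction sheet: (book_id, book_name).
# The same physical book was typed with two different spellings.
flat_rows = [
    ("B001", "Rich Dad and Poor Dad"),
    ("B001", "Rich Dad & Poor Dad"),
    ("B001", "Rich Dad and Poor Dad"),
]

# Counting by name treats each spelling as a different book.
titles_by_name = Counter(name for _, name in flat_rows)
distinct_by_name = len(titles_by_name)

# Counting by the unique book ID gives the real number of books.
distinct_by_id = len({book_id for book_id, _ in flat_rows})

print(distinct_by_name)  # 2
print(distinct_by_id)    # 1
```

Counting on the name gives two books, counting on the ID gives one, which is why the rest of this post leans on unique IDs.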
How are such anomalies or errors to be eliminated?
The answer is data normalisation.
In the process of normalisation, the basic data related to books, customers and transactions is separated and stored in separate tables.
The unique tables of customers and books are known as "master tables". The table of members or customers can be named "Customer Master". It captures details of the library's customers or members only: the name of the customer, and the address and contact details if any. You also assign a unique customer ID to every member.
The customer master table below captures the customer ID and the name of the customer. Note that no two customers or members have the same customer ID.
Customer Master Table/Data
Please note the master book table created during the process of normalisation. It captures the book name and the book ID, the book's identification tag.
Master Book Table/Data
Please note the new, improved transaction table produced after normalisation has been done on the library's records.
The new transaction table captures neither the name of the book nor the name of the customer or member; it captures only their unique IDs.
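The split described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dates, the book "A Sample Novel", the ID "N001" and the member "Sita" with ID "1005" are hypothetical, and the variable names are mine.

```python
# Hypothetical flat rows, one per transaction:
# (issue_date, book_id, book_name, customer_id, customer_name, expected_return)
flat_rows = [
    ("2024-01-05", "B001", "Rich Dad Poor Dad", "1004", "Ram", "2024-01-19"),
    ("2024-01-06", "N001", "A Sample Novel", "1005", "Sita", "2024-01-20"),
    ("2024-01-09", "N001", "A Sample Novel", "1004", "Ram", "2024-01-23"),
]

book_master = {}      # book_id -> book name
customer_master = {}  # customer_id -> customer name
transactions = []     # rows that carry IDs only, no names

for issue_date, book_id, book_name, customer_id, customer_name, due in flat_rows:
    # setdefault keeps the first name seen for an ID and ignores later spellings.
    book_master.setdefault(book_id, book_name)
    customer_master.setdefault(customer_id, customer_name)
    transactions.append((issue_date, book_id, customer_id, due))

print(book_master)
print(customer_master)
print(transactions)
```

Each name now lives in exactly one place, and the transaction rows shrink to dates and IDs.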
How does normalisation help in managing data? In simple words, what is the advantage of capturing data this way?
Advantages are as follows:
1) Unique records are identified and stored in separate tables. Imagine the member "Ram", with customer ID "1004", changes his name to "Rama". Without normalisation you would have to change "Ram" to "Rama" in every transaction record, i.e. if there are 1000 transactions against his member ID, you would need to change it 1000 times, and there is a high probability that some get missed. It is also quite possible that the librarian is on leave and the staff member handling transactions that day knows the member as "Ram" and captures his name as "Ram".
If the data is normalised, only the customer ID is captured in the transaction table. We only have to correct the name of the member once, in the master table, against his member ID "1004".
Wow, a lot of effort has been saved, and it also ensures no such errors in future transactions for that member.
2) The second big advantage is the saving in storage space for the transaction table, which needs to be shared or backed up. There is no point in storing the name of the member and the name of the book against every transaction, which would just consume additional space.
3) It's easier to index records (no need to worry about this advantage at this point).
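The first advantage above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the library's actual system: the table names, column names and dates are assumptions, while "Ram", "Rama" and the ID "1004" come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customer_master (customer_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE transactions (issue_date TEXT, book_id TEXT, customer_id TEXT);
    INSERT INTO customer_master VALUES ('1004', 'Ram');
    INSERT INTO transactions VALUES ('2024-01-05', 'B001', '1004');
    INSERT INTO transactions VALUES ('2024-02-10', 'B002', '1004');
""")

# The name is corrected once, in the master table only.
conn.execute("UPDATE customer_master SET name = 'Rama' WHERE customer_id = '1004'")

# Every transaction picks up the new name when joined on the customer ID.
names = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM transactions t
    JOIN customer_master c ON c.customer_id = t.customer_id
""")]
print(names)  # ['Rama', 'Rama']
```

However many transactions exist for member "1004", the update touches a single row.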
I trust you have understood the concept of normalisation. Understanding it will help you in the long run as you work on more complex data.
Now let's understand the concepts of primary key and foreign key.
PRIMARY KEY:
A primary key is nothing but the unique identification of a record in a given table.
The book ID and the customer ID can be defined or treated as primary keys. A record can have only one primary key, and primary keys cannot be duplicated, i.e. no two records in a table can have the same primary key. If our library has two copies of "Rich Dad Poor Dad", the book master table will have two records for the book, one with book ID "B001" and the other with "B002".
The primary key can also be just a unique serial number such as 1, 2, 3, etc. It can also be customised, as done here, where business-related books are identified with the prefix "B" and novels with the prefix "N".
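Here is a small sketch of how a database enforces this, using Python's built-in sqlite3 module. The table and column names are assumptions, and "Some Other Book" is a made-up title; the IDs "B001" and "B002" come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_master (book_id TEXT PRIMARY KEY, book_name TEXT NOT NULL)"
)

# Two copies of the same title are two records, each with its own ID.
conn.execute("INSERT INTO book_master VALUES ('B001', 'Rich Dad Poor Dad')")
conn.execute("INSERT INTO book_master VALUES ('B002', 'Rich Dad Poor Dad')")

# Reusing an existing ID is refused by the database.
try:
    conn.execute("INSERT INTO book_master VALUES ('B001', 'Some Other Book')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book_master").fetchone()[0]
print(duplicate_rejected, book_count)  # True 2
```

The same name may repeat, but the same primary key may not.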
You need not remember the book IDs or customer IDs/membership numbers for all master records. The book ID is printed on the book, and the membership card issued to each member carries the customer ID.
Most RDBMS software can display the name that goes with a given unique ID. When we type the book ID or customer ID into the table or data sheet, the system also displays the name of the book or the customer so that we can validate it.
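As a rough idea of what such a lookup does behind the screen, here is a tiny Python sketch. The function name describe and the unknown ID "B999" are hypothetical; real software would query the master tables instead of using dictionaries.

```python
# Master data keyed by unique ID (values taken from the examples in this post).
book_master = {"B001": "Rich Dad Poor Dad", "B002": "Rich Dad Poor Dad"}
customer_master = {"1004": "Ram"}

def describe(book_id, customer_id):
    """Return the names behind the two IDs; an unknown ID gives None."""
    return book_master.get(book_id), customer_master.get(customer_id)

print(describe("B001", "1004"))  # ('Rich Dad Poor Dad', 'Ram')
print(describe("B999", "1004"))  # (None, 'Ram')
```

Seeing None instead of a name is the cue that the ID was mistyped.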
FOREIGN KEY:
A foreign key is nothing but the primary key of a master table captured in the transaction table. In our case the book ID and customer ID captured in the new, improved transaction table are foreign keys.
The biggest advantage is that the database will not allow you to capture a key or ID that is not part of the referenced master table. It is this foreign key that relates the master table to the transaction table, which helps us in data processing.
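The check described above can be seen in action with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names, column names, dates and the unknown ID "B999" are made up for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE book_master (book_id TEXT PRIMARY KEY, book_name TEXT NOT NULL);
    CREATE TABLE customer_master (customer_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE transactions (
        issue_date      TEXT NOT NULL,
        book_id         TEXT NOT NULL REFERENCES book_master(book_id),
        customer_id     TEXT NOT NULL REFERENCES customer_master(customer_id),
        expected_return TEXT
    );
    INSERT INTO book_master VALUES ('B001', 'Rich Dad Poor Dad');
    INSERT INTO customer_master VALUES ('1004', 'Ram');
""")

# IDs that exist in both master tables are accepted.
conn.execute(
    "INSERT INTO transactions VALUES ('2024-01-05', 'B001', '1004', '2024-01-19')"
)

# A book ID that is not in book_master is refused.
try:
    conn.execute(
        "INSERT INTO transactions VALUES ('2024-01-06', 'B999', '1004', '2024-01-20')"
    )
    unknown_book_rejected = False
except sqlite3.IntegrityError:
    unknown_book_rejected = True

transaction_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(unknown_book_rejected, transaction_count)  # True 1
```

Only the transaction that points at real master records is stored.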
This ability to relate tables through keys is central to relational databases, and the software that manages them is known as a Relational Database Management System (RDBMS).
I trust you enjoyed the reading and were able to grasp the concepts of data normalisation, primary key and foreign key.
I will try to post more examples on this topic to help a larger and more diverse audience, from various fields and backgrounds, understand this concept, which is very important and powerful.
I would also request you to give your feedback or suggestions for improvement on this topic.



