In this blog, we will dive deep into the visualization of data. We will see it from different perspectives and how it helps users and developers to manage enormous databases.
Data Abstraction
Data is stored in the database and also can be viewed by different users. Some parts of data will be seen by specifically some users. A certain percentage of data will be hidden by some percentage of users. This concept is known as data abstraction. Database System provides users with an abstract view of data which hides certain parts of data which simplifies user interaction with the system.
Levels of Data Abstraction
The view level provides the βview of dataβ to the users and hides the irrelevant details such as data relationship, database schema, constraints, security, etc from the user.
The three-schema architecture defines the view of data at three levels:
- The main objective of this architecture is to have an effective separation between the user interface and the physical database. So, the user never has to be concerned regarding the internal storage of the database and it has a simplified interaction with the database system.
1. Physical Level: describes how the data is stored in the hardware. It also describes how the data can be accessed. The physical level shows the data abstraction at the lowest level and it has complex data structures. Only the database administrator operates at this level.
2: Logical Level: Here, the data is stored in the form of the entity set, entities, their data types, the relationship among the entity sets, user operations performed to retrieve or modify the data and certain constraints on the data.
It is the developer and database administrator who operates at the logical or the conceptual level.
3: View Level:It is the highest level of data abstraction and exhibits only a part of the whole database. It exhibits the data in which the user is interested. The view level can describe many views of the same data and also can hide information for security purposes.
Lets take an example where we have to create a database of a college. Now, what entity sets would be involved? Student, Lecturer, Department, Course etc..
Now, the entity sets Student, Lecturer, Department, Course will be stored in the storage as the consecutive blocks of the memory location. This is the physical or internal level and is hidden from the programmers but the database administrator is aware of it. Also specifies the size of the data.
At the logical level, the programmers define the entity sets and the relationship among these entity sets using a programming language like SQL. So, the programmers work at the logical level and even the database administrator also operates at this level.
At the view level, the users have the set of applications that they use to retrieve the data they are interested in. We can have multiple views for a certain type of user only.
Instances and Schemas
Instances: We can define an instance as the information stored in the database at a particular point of time. Let us discuss it with the help of an example.
- As we discussed above the database comprises of several entity sets and the relationship between them. Now, the data in the database keeps on changing with time. As we keep inserting or deleting the data to and from the database.
- Now, at a particular time if we retrieve any information from the database then that corresponds to an instance. Hence, we can say it is a collection of information stored in the database at a particular moment.
Schemas: Whenever we talk about the database the developers have to deal with the definition and the data of the database.The definition of a database comprises of the description of what data it would contain what would be the relationship between the data. This definition is the database schema.
It's basically an overall design of the database. In the table diagram above we can different levels as different schemas.
Hence we can say every database comprises of three schemas:
- Physical Schema
- Logical Schema
- Sub Schemas
Data Independence
It is an ability to modify a schema definition in one level without affecting a schema definition in other levels. Data independence can be classified as shown below:
Logical Data Independence: Logical data independence describes the degree up to which the logical or conceptual schema can be changed without modifying the external schema.
Now, a question arises what is the need to change the data schema at a logical or conceptual level? Well, the changes to data schema at the logical level are made either to enlarge or reduce the database by adding or deleting more entities, entity sets, or changing the constraints on data.
Physical Data Independence: Physical data independence defines the extent up to which the data schema can be changed at the physical or internal level without modifying the data schema at logical and view level.
Well, the physical schema is changed if we add additional storage to the system or we reorganize some files to enhance the retrieval speed of the records.
The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously affect others.
Data Models
Its a collection of conceptional tools used for describing :
- Data
- Data relationship
- Data semantics
- Consistency constraints
There are following four data models used for understanding the structure of the database:
1) Relation Model:
Let us understand this model by taking an example of a flat file
This flat file contains information about the details of customers and in table 2 we can understand that "account-number" is primary key and its a foreign key in table1
This kind of model is the "Relational Model" in which a collection of tables are used to represent both the data and the relationships among the data.
2) Entity-Relationship Model:
It is based on the notion of real-world entities and relationships among them. While formulating real-world scenario into the database model, the ER Model creates entity set, relationship set, general attributes and constraints.
- Entity β An entity in an ER Model is a real-world entity having properties called attributes. Every attribute is defined by its set of values called domain. For example, in a school database, a student is considered as an entity. Student has various attributes like name, age, class, etc.
- Relationship β The logical association among entities is called relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of association between two entities.
3) Object-Based Data Model:
An extension of the ER model with notions of functions, encapsulation, and object identity, as well. This model supports a rich type system that includes structured and collection types. Thus, in 1980s, various database systems following the object-oriented approach were developed. Here, the objects are nothing but the data-carrying its properties.
This model combines the features of object-oriented data model and relational data model. It adds database functionality to object programming languages.
3) Semistructured Data Model
It permits the specifications of data where individual data items of the same type may have different sets of attributes. The XML is widely used to represent semi-structured data.
Database Design
In order to develop a database or an application, what is important is that to consider all the requirements and then representing them in form of a database. It is required to produce an efficient , high quality and minimum cost database .
There are 6 phases in total in database designing:
1. Requirement Collection and Ananlysis:
Its a process of knowing and analyzing the expectations of the users for new database application done by a team of analysts or requirement experts. The requirement specifications such as object oriented analysis and data flow diagramare used to transform the initial requirement experts
2. Conceptual Database Design:
There consist of something called database designer which selects a suitable data model and translates the data requirement resulting from previous phase into a conceptual database schema. The ER diagram is generally used to represent it. Designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another.
Designer can also examine the design and remove any redundant features if present which is called NORMALIZATION.
3. Choice of DBMS:
This depends on many factors like cost, DBMS features and tools, underlying model ,portability etc. Some technical factors that affect the choice of dbms are:
- type of DBMS
- storage structure
- access path
- interface available
- architect it supports
4. Physical Database Design:
- This concerns with physical features like storage structures,file organization and access paths for the database files are specified to achieve good performance.
- Various options for file organization and access path are various types of indexing,clustering of records, hashtag techniques etc.
5. Data Model Mapping:
- Data mapping is the process of matching fields from one database to another. It's the first step to facilitate data migration, data integration, and other data management tasks.
- Data mapping has been a common business function for some time, but as the amount of data and sources increase, the process of data mapping has become more complex, requiring automated tools to make it feasible for large data sets.
6. System Implementation and tuning:
Implementation of the database occurs here. DDL statements of the selected DBMS are complied to create the database schema and database files and finally the database is loaded with the data.
Thank you for reading the blog.ππ»π
Follow for further parts of DBMS, coming soon...
π° Keep Learning !! Keep Sharing !! π°
I would love to connect with you. Give me a Hello on the below socials:
π Feel free to connect with me :
- LinkedIn : linkedin.com/in/abhinav-kumar-9639b4143
- GitHub : github.com/abhinav2712
- Twitter : twitter.com/Abhinav10465018