The first variation, structured data, has a well-defined structure and a consistent order, so it can be easily accessed and used by a person or a program. It is usually stored in a well-defined format, such as the rows and columns of a table in spreadsheets and databases, mainly relational database management systems (RDBMS).
The second variation, semi-structured data, exhibits fewer of these properties. Its most distinctive characteristic is that, although this kind of data has some internal structure, it does not conform to the formal rules of data models like those in RDBMS. The last variation is unstructured data. This kind of data has no predefined structure and does not conform to conventional data models or formal structural rules, although it may carry metadata such as a date and time.
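The three variations can be illustrated with a short Python sketch. The records below are hypothetical examples: a CSV table stands in for structured data, a JSON document for semi-structured data, and free text for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, parsed directly into named fields
# (the shape of a row in an RDBMS table).
structured = io.StringIO("id,name,age\n1,Alice,34\n2,Bob,29\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but the fields vary
# from record to record rather than following a fixed schema.
semi_structured = json.loads(
    '[{"id": 1, "name": "Alice", "tags": ["vip"]},'
    ' {"id": 2, "email": "bob@example.com"}]'
)

# Unstructured: free text with no schema; only the metadata
# (here a timestamp) is readily machine-readable.
unstructured = "2024-05-01 10:32 Customer called about a delayed shipment."

print(rows[0]["name"])              # structured fields accessed by column name
print(semi_structured[1]["email"])  # keys present in one record but not another
```

Notice that only the structured rows guarantee the same fields for every record; the semi-structured records must be inspected key by key, and the unstructured text would need further processing (e.g. text mining) before it can be queried at all.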
Characteristics of big data management
Generally, big data is associated with three main characteristics. These are:
- Volume
As stated earlier, big data is characterized by massive amounts of information generated from sources such as social media, smartphones, business transactions, sensors, images, video and text. These different data formats arrive in massive volumes, measured in terabytes, petabytes or even zettabytes, and can only be managed with advanced big data technologies.
- Variety
Big data comes in many formats and variations: transactional and demographic data such as phone numbers and addresses, as well as photographs, video and audio streams, with more than 80% of it being unstructured.
- Velocity
Information arrives in big data repositories at a fast rate. Velocity describes this speed of data accumulation. It also refers to the speed at which data can be processed and analyzed to extract patterns or insights for use in critical business operations.
Beyond these three V’s, two further V’s are also crucial:
Veracity: This explains the degree of reliability and truth that big data offers regarding relevance, accuracy, and cleanliness.
Value: The main aim of gathering big data in an organization is to analyze it and uncover insights or patterns that can inform decisions and other processes. This is what makes data valuable and able to change an organization for the better.
Big data management architecture
The big data management infrastructure consists of IT infrastructure, tools, configuration, and human resources or talent. These components are fitted together to address the particular needs of an organization; technologies, skills and other external inputs can be combined to manage data.
Basics of data mining
The data mining process involves discovering useful patterns in large volumes of data from different sources. It is also known as knowledge discovery or information harvesting. It aims to extract useful fragments of information from databases and find patterns between streams of data that could not be discerned in the raw data.
Data mining techniques
Data mining can follow two strands. The first is building predictive power: using existing information to forecast future values. The second is building descriptive power: representing the patterns in existing data more clearly. Some data mining techniques include:
- Classification analysis
This entails separating the information in a data set into categories, which are then analyzed.
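A minimal sketch of classification, using hypothetical two-dimensional data: each new point is assigned to the category whose labelled examples it lies closest to on average (a nearest-centroid rule, one of the simplest classification methods).

```python
def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def classify(point, categories):
    """Return the label of the category whose centroid is nearest the point."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    centroids = {label: centroid(pts) for label, pts in categories.items()}
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

# Two labelled categories of hypothetical customers, e.g. low spend vs high spend.
categories = {
    "low": [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)],
    "high": [(5.0, 4.8), (5.2, 5.1), (4.9, 5.3)],
}
print(classify((1.0, 1.1), categories))  # → low
print(classify((5.0, 5.0), categories))  # → high
```

Once each record carries a category label, the categories themselves can be analyzed, which is the point of classification analysis.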
- Association Rule Learning
This technique finds associations between two or more events or properties by looking for an underlying model in the database.
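The idea can be sketched with hypothetical market-basket data: an association rule such as "customers who buy bread also buy milk" is scored by its support (how often both items appear together) and its confidence (how often the consequent appears given the antecedent).

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent occurs in transactions with the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

# Rule: bread -> milk
print(round(support({"bread", "milk"}), 2))       # → 0.6  (3 of 5 baskets)
print(round(confidence({"bread"}, {"milk"}), 2))  # → 0.75 (3 of 4 bread baskets)
```

Rules with high support and confidence are the "underlying model" the technique looks for; full algorithms such as Apriori search the space of itemsets efficiently instead of enumerating every pair.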
- Anomaly analysis
This technique detects anomalies: behaviours or relationships that deviate from previously observed patterns. It is used in fraud detection, health monitoring and identifying disturbances in ecosystems.