
Know More About Managing Big Data

Big data is a topic that many people talk about today. It refers to vast amounts of data that cannot be stored, retrieved and processed using traditional approaches. This data comes in several variations.

Structured data, the first variation, has a well-defined structure and a consistent order. It can be easily accessed and used, and it is usually stored in a well-defined format, such as the rows and columns of a table in spreadsheets and databases, mainly relational database management systems (RDBMS).

Semi-structured data, the second variation, exhibits fewer of these properties. Its most distinctive characteristic is that, while it has some structure, it does not conform to the formal rules of data models like those in RDBMS. The last variation is unstructured data. This kind of data has no predefined structure and does not conform to conventional data models or formal structural rules, although it may carry metadata such as date and time information.
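
To make the distinction concrete, here is a minimal Python sketch (the sample records and field names are illustrative assumptions, not from the article) contrasting the three variations:

```python
import json

# Structured data: a fixed schema with a consistent order, like an RDBMS row.
customer_row = ("C001", "Jane Doe", "+1-555-0100", "2024-01-15")  # id, name, phone, signup date

# Semi-structured data: tagged keys give some structure, but records need not
# share a rigid schema enforced by a data model.
event = json.loads('{"user": "C001", "action": "login", "device": {"os": "iOS"}}')

# Unstructured data: free text with no predefined model, though it may carry
# metadata such as date and time.
support_ticket = {
    "received_at": "2024-01-16T09:30:00Z",  # metadata
    "body": "My dashboard has been loading slowly since yesterday.",
}

print(customer_row[1])               # schema known in advance
print(event["device"]["os"])         # structure discovered at read time
print(support_ticket["body"][:12])   # raw text must be mined for meaning
```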

Characteristics of big data management

Generally, big data is associated with three main characteristics. These are:

  1. Volume

As stated earlier, big data is characterized by massive amounts of information generated from sources like social media, smartphones, business transactions, sensors, images, video and text. These different data formats arrive in massive volumes, measured in terabytes, petabytes or even zettabytes, and can only be managed with advanced big data technologies.

  2. Variety

Big data comes in different formats and variations, including transactional and demographic formats like phone numbers, addresses, photographs, and video and audio streams, with more than 80% of it completely unstructured.

  3. Velocity

Information arrives in big data repositories at a fast rate. Velocity is the speed at which data accumulates; it also refers to the speed at which data can be processed and analyzed to uncover patterns or insights that can be used in critical business operations.
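
As a toy illustration of velocity, here is a minimal Python sketch (purely illustrative; the generator-based approach is an assumption, not something from the article) that computes an insight incrementally as each record arrives, instead of waiting for a complete batch:

```python
# Process records as they arrive rather than in bulk.
def running_mean(stream):
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count  # an up-to-date insight after every record

# Simulated stream of incoming measurements.
for avg in running_mean([10, 12, 11, 50, 13]):
    print(f"running average: {avg:.2f}")
```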

Beyond the three V’s, there are two more V’s that are crucial:

Veracity: This describes the degree of reliability and truth that big data offers in terms of relevance, accuracy and cleanliness.

Value: The main aim of gathering big data in an organization is to analyze it and uncover insights or patterns that can inform decisions and other processes. This is what makes data valuable and able to influence change in an organization for the better.

Big data management architecture

The big data management infrastructure consists of the IT infrastructure, tools, configuration and human talent. All these components are fitted together to address the particular needs of an organization, and technologies, skills and other external inputs can be brought together to manage data.

Basics of data mining

Data mining is the process of discovering useful patterns in large volumes of data from different sources. It is also known as knowledge discovery or information harvesting. It aims to extract fragments of useful information from databases and to find relationships between streams of data that could not be discerned in the raw data.

Data mining techniques

Data mining can follow two strands. The first is predictive: using existing information to forecast future values. The second is descriptive: finding a better representation of the patterns in existing data. Some data mining techniques include:

  1. Classification analysis

This entails separating the basic information in a data set into categories. These categories are then analyzed.
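
As a minimal sketch of classification, the following assumes scikit-learn is available; the toy features (monthly spend, visits per month) and segment labels are invented for the example:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: [monthly_spend, visits_per_month] -> customer segment.
X_train = [[20, 1], [25, 2], [300, 12], [280, 10], [90, 5], [110, 6]]
y_train = ["occasional", "occasional", "frequent", "frequent", "regular", "regular"]

# Fit a decision tree that separates the records into the labeled categories.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Assign a new, unseen record to one of the learned categories.
print(model.predict([[95, 4]]))  # expected: ['regular']
```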

  2. Association rule learning

This technique finds associations between two or more events or properties by looking for underlying patterns in the database.
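
A minimal sketch of the idea in plain Python, computing support and confidence for one candidate rule ("bread implies butter") over made-up transaction data:

```python
# Each transaction is the set of items bought together.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

n = len(transactions)
with_bread = sum(1 for t in transactions if "bread" in t)
with_both = sum(1 for t in transactions if {"bread", "butter"} <= t)

support = with_both / n              # how often the items appear together
confidence = with_both / with_bread  # how often butter follows bread

print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.60, confidence=0.75
```

Real systems use algorithms such as Apriori or FP-Growth to search for all such rules efficiently rather than checking candidates one by one.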

  3. Anomaly analysis

This technique detects anomalies: behaviours or relationships that deviate from previously observed patterns. It is used in fraud detection, health monitoring and identifying disturbances in ecosystems.
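
A minimal anomaly detection sketch in plain Python, using a simple z-score rule over made-up sensor readings (the threshold of 2 standard deviations is an arbitrary illustrative choice):

```python
from statistics import mean, stdev

readings = [21.1, 20.8, 21.3, 20.9, 21.0, 35.7, 21.2, 20.7]  # one suspicious spike

mu, sigma = mean(readings), stdev(readings)

# Flag readings that deviate from the mean by more than 2 standard deviations.
anomalies = [x for x in readings if abs(x - mu) / sigma > 2]

print(anomalies)  # [35.7]
```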

Scott Koegler

Scott Koegler is Executive Editor for Big Data & Analytics Tech Brief

scottkoegler.me/
