The term "data lake" was coined by Pentaho CTO James Dixon in 2010. "If you think of a data mart as a store of bottled water -- cleansed and packaged and structured for easy consumption -- the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples," Dixon explained. In other words, data lakes are storage repositories, located on premises or in the cloud, that hold both structured and unstructured data for later use.
Records stored in these vast repositories were once useful, or were expected to be useful in the future, so they were kept until they were needed. As with most things put away out of sight, data scientists forgot that the data existed or thought about it only from time to time. During these periods of neglect, hackers could steal the valuable data or plant a virus in the existing data. With no data protection in place, the theft often went unnoticed for months or even years. The hack would be discovered only when a data scientist went looking for the data and found that it was gone or altered. By that time, it was too late and data recovery was futile.
As with all data, businesses are held accountable for data loss regardless of where the data is stored. It’s therefore crucial that companies make sure their data is secure, whether it sits in a data warehouse, in a data lake or on a PC. In an article on Built In, Carbon Black Chief Cybersecurity Officer Tom Kellermann warned, “We need to be very, very concerned [about] these massive data lakes. They have become targets not just for traditional hackers and disillusioned individuals, but also for nation states.” Thankfully, there are steps that an organization can take to protect its data.
Kellermann goes on to suggest that platforms storing sensitive data shouldn’t have a permanent administrator. Instead, everyone should have limited access or act as a “temporary administrator,” so that no single standing account can expose the records.
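To make the “temporary administrator” idea concrete, here is a minimal sketch in Python. It is an illustration rather than any vendor's mechanism: the names `AdminGrant`, `grant_temporary_admin` and `can_administer` are hypothetical, as is the 60-minute default. The point it shows is that administrator rights carry an expiry time and simply stop working once that time has passed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AdminGrant:
    """Hypothetical time-boxed administrator grant for a single user."""
    user: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # The grant is valid only strictly before its expiry time.
        return now < self.expires_at


def grant_temporary_admin(user: str, now: datetime, minutes: int = 60) -> AdminGrant:
    """Issue admin rights that lapse after `minutes` (assumed default: 60)."""
    return AdminGrant(user=user, expires_at=now + timedelta(minutes=minutes))


def can_administer(grants: list[AdminGrant], user: str, now: datetime) -> bool:
    """True only if the user holds at least one grant that has not expired."""
    return any(g.user == user and g.is_active(now) for g in grants)


# Example: alice receives 30 minutes of admin access at 09:00 UTC.
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
grants = [grant_temporary_admin("alice", start, minutes=30)]

alice_early = can_administer(grants, "alice", start + timedelta(minutes=10))  # True
alice_late = can_administer(grants, "alice", start + timedelta(minutes=45))   # False
bob_any = can_administer(grants, "bob", start + timedelta(minutes=10))        # False
```

In a real deployment, a check like this would typically live in the platform's identity and access management layer rather than in application code, and each grant would be logged for later review.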
It’s also important to note that cloud providers such as Amazon Web Services aren’t impenetrable, but they offer more protection than a PC or a small company’s own security. If a business wants to keep its data secure but does not have the resources to protect it on its own, investing in a cloud service provider is essential.
Securing data lakes is just as important as securing data that is currently being analyzed. Is your data lake protected?