Sunday, February 5, 2023

Big Data Trends in 2023

Original is posted here (CZ) and here (EN)


Intro

Amid many global socio-political changes, big data and analytics have become essential tools for doing business and ensuring company growth. The ongoing rise of big data, together with cloud computing, has reshaped global tech trends. In 2023, we expect a similar surge in new and innovative technologies that make processes and operations more efficient.


Cloud Adoption 

According to Gartner, 70% of companies have already partially migrated to the cloud, and 95% of new solutions by 2025 will be deployed to the cloud. While this trend will continue in 2023, cloud migration has risks and challenges that should be weighed alongside its benefits.

The first challenge is the risk of vendor lock-in. Once you migrate your system to the cloud, your solution may become tightly coupled to the services of a particular cloud provider. That's why it is essential to think about an exit strategy beforehand and use a cloud-agnostic architecture that keeps further migrations possible. Another potential solution for this challenge is to rely on multi-cloud data platforms such as Snowflake or Databricks.
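One common way to keep an architecture cloud-agnostic is to have application code depend on a small provider-neutral interface rather than on a vendor SDK. The sketch below is only an illustration of that idea: the names ObjectStore, InMemoryStore, LocalDirStore and migrate are hypothetical, and the two backends are local stand-ins for real cloud storage adapters.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List
import tempfile


class ObjectStore(ABC):
    """Hypothetical provider-neutral interface that application code depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryStore(ObjectStore):
    """Stand-in for one provider's storage service."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class LocalDirStore(ObjectStore):
    """Stand-in for another provider, backed by a local directory."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def migrate(source: ObjectStore, target: ObjectStore, keys: List[str]) -> None:
    """Copy objects between backends using only the shared interface."""
    for key in keys:
        target.put(key, source.get(key))


def demo() -> bytes:
    """Move one object from the first backend to the second and read it back."""
    source = InMemoryStore()
    source.put("raw/orders.csv", b"id,amount\n1,10\n")
    with tempfile.TemporaryDirectory() as tmp:
        target = LocalDirStore(Path(tmp))
        migrate(source, target, ["raw/orders.csv"])
        return target.get("raw/orders.csv")
```

Because migrate only uses the interface, an exit strategy in this sketch amounts to writing one more adapter class; the rest of the code stays unchanged.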

Moreover, each cloud provider has its strengths and weaknesses. Sometimes it may be better to choose one provider for machine learning and another for the data warehouse. There is an emerging need for inter-cloud technologies which enable parts of data solutions to seamlessly collaborate across services of different cloud providers (and often with on-premise systems).

Another challenge worth mentioning is that not all data systems can be hosted on public clouds. For instance, regulatory limitations may prevent data from being put on public clouds or make it risky. Companies that still want some of the cloud benefits often decide to run an in-house cloud. In this case, they may benefit from virtualization platforms like OpenStack or use on-premise cloud services like Azure Stack.


Regulatory Demands

According to Gartner, 75% of the world’s population will have their personal data covered by GDPR-like regulations by 2024. Besides an obvious interest in proper data security, this drives greater interest in data governance as a framework for companies to understand and manage their data. In fact, data governance is no longer just an internal request from management that wants to increase efficiency and turn data into an asset; it is becoming a mandatory external requirement.

Data governance consists of many important elements, including the following:

  • A data catalog allows enterprises to track information about all their data assets systematically and ensures that no data is left outside the framework.
  • Data lineage tracks the paths data takes across the company and ensures a shared view of the inputs, outputs and transformations along each path.
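
The two elements above can be sketched together as a tiny in-memory catalog in which every dataset records its direct inputs. This is a minimal illustration, not a real governance tool: the Dataset and Catalog classes and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Dataset:
    """Catalog entry: what the data is, who owns it, and what it is built from."""
    name: str
    owner: str
    description: str = ""
    upstream: List[str] = field(default_factory=list)  # direct input datasets


class Catalog:
    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def lineage(self, name: str) -> Set[str]:
        """Return all direct and indirect upstream datasets of `name`."""
        seen: Set[str] = set()
        stack = list(self._datasets[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._datasets:
                stack.extend(self._datasets[current].upstream)
        return seen

    def uncatalogued(self) -> Set[str]:
        """Return inputs that are referenced but were never registered."""
        referenced = {u for d in self._datasets.values() for u in d.upstream}
        return referenced - set(self._datasets)


catalog = Catalog()
catalog.register(Dataset("crm_export", owner="sales"))
catalog.register(
    Dataset("orders_clean", owner="data-eng", upstream=["crm_export", "erp_orders"])
)
catalog.register(Dataset("revenue_report", owner="finance", upstream=["orders_clean"]))

report_inputs = catalog.lineage("revenue_report")
missing = catalog.uncatalogued()
```

Here report_inputs contains orders_clean, crm_export and erp_orders, while missing contains only erp_orders, which is exactly the kind of asset "left outside the framework" that a catalog is meant to surface.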

Data Democratization


Research shows that companies benefit from giving all employees across the organization access to data, rather than limiting it to specific silos and predefined reports.

The data democratization trend has many components:

  • First, there are various self-service solutions that allow employees to explore data independently. These may be reporting solutions like Power BI or low-code automation tools like Alteryx. Another way is to expose data via APIs and allow research via scripting languages like Python.
  • The second aspect is the disclosure of metadata so that employees understand what data exists in the company and how it can be interpreted. Again, this brings us to the idea of data catalogs.
  • The third important aspect is data literacy. It is not enough to disclose data. It is important to ensure that employees understand and work with data properly.
  • And obviously, security and access separation aspects must be considered.

Artificial Intelligence (AI) Adoption


While AI (along with Big Data) has become a buzzword, we cannot ignore the recent developments in AI areas, such as ChatGPT, DALL-E and other OpenAI models.

Among the potential areas where AI may influence data solutions, it’s worth mentioning the following:

  • Improving data observability through AI tools. These might be used to automate data discovery – for example, to identify sensitive personal data and find entities in data. Another potential area of application is data quality. AI tools may enable further automation of this process by detecting data issues automatically and potentially even fixing them.
  • Augmented analytics simplifies exploratory analysis and leverages AI tools in the data preparation and feature engineering stages.
  • The topic of responsible AI is becoming increasingly important. One should not put an AI model into production without first ensuring the fairness of its input data and being able to explain its output.
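
To make the data discovery idea from the first point more concrete, the sketch below labels columns that appear to hold personal data. It deliberately uses simple regular expressions as a stand-in and is not an AI model; a real tool would more likely rely on trained classifiers or entity recognition. The patterns, the threshold and the classify_columns function are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real detectors need far more robust rules or models.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def classify_columns(
    rows: List[Dict[str, str]], threshold: float = 0.5
) -> Dict[str, str]:
    """Label a column when at least `threshold` of its non-empty values match a pattern."""
    columns = {key for row in rows for key in row}
    labels: Dict[str, str] = {}
    for column in columns:
        values = [row[column] for row in rows if row.get(column)]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for value in values if pattern.fullmatch(value.strip()))
            if hits / len(values) >= threshold:
                labels[column] = label
                break
    return labels


sample = [
    {"id": "1", "contact": "anna@example.com", "mobile": "+420 777 123 456"},
    {"id": "2", "contact": "petr@example.org", "mobile": "+420 608 987 654"},
    {"id": "3", "contact": "n/a", "mobile": ""},
]
sensitive = classify_columns(sample)
```

With this sample, the contact column is labeled as email and the mobile column as phone, while id stays unlabeled. Matching on a share of values rather than on every value tolerates placeholders such as "n/a".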

Development of Data Architectural Approaches


The trends described above clearly indicate that data systems are becoming more and more complex. What’s more, modern companies cannot have a dedicated data system for each silo. Data should flow seamlessly across the company, and the ideal data solution will provide comprehensive data management across enterprise silos. This results in attempts to build an enterprise-level data architecture covering all aspects of data management, such as input data sourcing, data structuring and reporting, data governance and advanced analytics. We see a trend towards a hybrid solution – the data lakehouse – which combines the benefits of classical data warehouses (to deal with structured data) and data lakes (to deal with raw and unstructured data).

While the data lakehouse approach focuses mostly on the technical side of things, it may be extended with the data mesh concept, which focuses on the organizational perspective. Data mesh ensures that domain-oriented data is owned by the respective business functions while remaining discoverable and addressable across the whole organization.