Sunday, October 20, 2024

Review of managerial roles on IT projects - PM, PdM, DM, EM, RM and so on

When working on a large IT project or within a large IT company, you will encounter a lot of individuals in various managerial roles - such as Project Manager, Product Manager, Delivery Manager, Engineering Manager, Resource Manager, and others. But why are there so many different types of managers, and what distinguishes them from one another?



Let’s start with the most familiar role - Project Manager (PM). As defined by PMI [1], “Project managers are organized, goal-oriented professionals who use innovation, creativity, and collaboration to lead projects that make an impact”. PMs are responsible for outlining the project’s scope and timeline, planning, overseeing execution, managing the budget, etc. Project management as a discipline has a long history that predates the IT industry, with roots in architecture and civil engineering.

Given the broad scope of a PM's responsibilities, one might wonder why other types of managers are needed for project delivery. The answer lies in the definition of a project itself. According to PMI, a project is a temporary endeavor undertaken to create a unique product, service, or result. The key point is that a project has a defined start and end, as well as a specific goal. In contrast, a product is an offering designed to meet customer needs or desires, and while a product may be created as part of a project, it typically evolves through a series of projects or simply through incremental changes over time.



This leads us to the next type of managerial role - Product Manager (PdM). A Product Manager is responsible for overseeing the development of a product within an organization. They are tasked with creating the product strategy or product vision, defining business and functional requirements, and managing the product roadmap, including feature releases. In this last aspect, the PdM’s role intersects with that of a Project Manager, though a PdM typically takes a less formalized approach to feature delivery. Product Managers often follow agile methodologies, delivering product features in small, incremental stages, with a flexible and adaptive delivery roadmap.

Additionally, PdMs place a stronger emphasis on the marketing and customer needs aspects of product development, bringing them closer to roles like Marketing Analysts and Business Analysts (BA). However, while these roles focus on specific areas, the PdM spans the analytical, delivery, and strategic dimensions of product development. It’s also worth noting that the Product Owner (PO) role in Scrum is closely aligned with the responsibilities of a Product Manager.



So far, we’ve reviewed the roles of Project Manager (PM) and Product Manager (PdM), which address two distinct types of deliverables: projects and products. But why are additional managerial roles needed?

The first reason is that large organizations rarely focus on a single project. Instead, they typically manage multiple projects, grouped into programs or portfolios. This is where two senior project management roles come into play - Program Manager and Portfolio Manager. Both oversee groups of projects, but the difference is that a program consists of related projects, while a portfolio contains unrelated projects. Program and Portfolio Managers are responsible for the oversight and successful delivery of these project groups, focusing on higher-level aspects such as formal status tracking, resource allocation, budget planning, and return on investment analysis, while staying less involved in individual project execution.



Now, let’s explore more specialized roles, starting with the Delivery Manager (DM). The DM role is often vaguely defined; for example, one definition states that a "Delivery Manager is responsible for the successful delivery of a project or product. They work with a team of developers, designers, and other professionals to ensure it is completed on time, within budget, and to the required quality standards" [2]. How, then, does this differ from the Project Manager role?

We believe that the first key distinction between PM and DM roles is in the expected level of responsibility. In traditional project management, the PM oversees a well-defined team - business analysts who gather and formalize requirements, architects who define the technical solution, developers who implement it, and quality engineers who ensure proper testing. The PM’s role is to create a plan, allocate tasks, and monitor execution, often relying heavily on input from these specialists. Since the PM cannot be an expert in every field, this sometimes leads to a situation where they become more of a Project Administrator, simply tracking formal status updates rather than actively driving execution.

In contrast, the Delivery Manager is primarily focused on ensuring successful and efficient delivery at every stage of the project. While the PM may rely on expert input without challenging it, the DM takes a more hands-on approach, actively pushing the project forward.

The second key distinction is in mindset and cross-disciplinary skills. A DM is expected to challenge assumptions, push the team towards delivery, and possess practical knowledge in the business domain and relevant disciplines such as business analysis and development. This enables the DM to better understand the work being done, assess the quality of deliverables, and anticipate the needs of the development team. While the DM is not required to write code or conduct code reviews, they should have a solid understanding of the development process and be able to evaluate the quality of IT deliverables and the team’s approach.



Let’s turn to the role of Engineering Manager (EM). There are various definitions of this role as well, such as, "an engineering manager ensures that the engineering projects assigned to them are completed, and that general engineering duties are fulfilled" [3]. At first glance, it’s hard to differentiate between the roles of a PM and an EM, since both are focused on ensuring project completion. However, the key distinction is that the EM places a strong emphasis on engineering aspects.

Typically, an EM has a background in development, and the role is often a natural progression for those in Team Lead or Development Lead positions who want to stay engaged with the technical side rather than transitioning entirely into management. We would even go further and describe the Engineering Manager as a sort of Technical Product Manager. Similar to a Product Manager, an EM defines the technical vision for a product and works on delivering the technology roadmap - whether that means managing technical debt or promoting the adoption of best engineering practices.



Finally, let's look at these roles from the perspective of the IT company itself. IT companies can generally be categorized into two broad groups: product companies and service companies. Product companies develop their own IT products and sell them either directly to consumers (B2C) or to other businesses (B2B). In contrast, service companies offer IT services, such as IT support or development as a service. Within this category, outsourcing and outstaffing companies are specific types of service providers. These companies employ large numbers of IT professionals (usually in cost-effective locations) and "sell" or "rent" their expertise to product companies as a quick and scalable way to enhance their IT capabilities.

A unique role often found in these service companies is that of the Resource Manager (RM). The RM serves as a people manager, overseeing the company’s most valuable asset - its workforce. Unlike a Project Manager, the RM is not typically involved in project delivery (although they may sometimes take on dual roles like PM or EM). Instead, the RM focuses on managing the talent pool, which includes supporting career development, handling compensation and performance management, resolving conflicts, and ensuring key performance indicators (KPIs) such as resource utilization, employee attrition, and profitability margins are met.


Sunday, February 5, 2023

Big Data Trends in 2023

Original is posted here (CZ) and here (EN)


Intro

Amid many global socio-political changes, big data and analytics have become essential tools for doing business and ensuring company growth. The ongoing rise of big data, including cloud computing, has reshaped global tech trends. In 2023, we expect a similar surge for new and innovative technologies, which will ensure more efficient processes and operations. 


Cloud Adoption 

According to Gartner, 70% of companies have already partially migrated to the cloud, and by 2025, 95% of new solutions will be deployed in the cloud. While this trend will continue in 2023, cloud migration has risks and challenges that should be considered alongside its benefits.

The first challenge is the risk of vendor lock-in. Once you migrate your system to the cloud, your solution might become tightly coupled to the services of a particular cloud provider. That's why it is essential to think about an exit strategy beforehand and use a cloud-agnostic architecture to make future migrations possible. Another potential solution to this challenge is to rely on multi-cloud data platforms such as Snowflake or Databricks.

Moreover, each cloud provider has its strengths and weaknesses. Sometimes it may be better to choose one provider for machine learning and another for the data warehouse. There is an emerging need for inter-cloud technologies which enable parts of data solutions to seamlessly collaborate across services of different cloud providers (and often with on-premise systems).

Another challenge worth mentioning is that not all data systems can be hosted in public clouds. For instance, regulatory limitations may prevent data from being placed in a public cloud or make it risky. Companies that still want some of the cloud's benefits often decide to build an in-house (private) cloud. In this case, they may benefit from virtualization platforms like OpenStack or use on-premise cloud offerings like Azure Stack.


Regulatory Demands

According to Gartner, 75% of the world’s population will have their personal data covered by GDPR-like regulations by 2024. Besides the obvious interest in proper data security, this drives greater interest in data governance as a framework for companies to understand and manage their data. In fact, data governance becomes not merely an internal ask from management looking to increase efficiency and turn data into an asset, but a mandatory external requirement.

Data governance consists of many important elements, including the following:

  • A data catalogue allows enterprises to systematically track information about all their data assets and ensures that no data is left outside the framework.
  • Data lineage tracks data paths across the company and ensures a shared view of data inputs, outputs and the transformations along the way.

Data Democratization


Research shows that companies benefit from giving all employees across the organization access to data, rather than restricting it to specific silos and predefined reports.

The data democratization trend has many components:

  • First, these are various self-service solutions that allow employees to play with data independently. These may be reporting solutions like PowerBI or low-code automation tools like Alteryx. Another way is to expose data via APIs and allow research via scripting languages like Python.
  • The second aspect is the disclosure of metadata for employees to understand what data exists in the company and how they can be interpreted. Again, this brings us to the idea of data catalogs.
  • The third important aspect is data literacy. It is not enough to disclose data. It is important to ensure that employees understand and work with data properly.
  • And obviously, security and access separation aspects must be considered.

Artificial Intelligence (AI) Adoption


While AI (along with Big Data) has become a buzzword, we cannot ignore the recent developments in AI areas, such as ChatGPT, DALL-E and other OpenAI models.

Among other potential areas where AI may influence Data solutions, it’s worth mentioning the following:

  • Improving data observability through AI tools. These might be used to automate data discovery – for example, identify sensitive personal data, and find entities in data. Another potential area of application is data quality. AI tools may enable further automation of this process via automatic detection of data issues and even potentially automatically fixing them.
  • Augmented analytics simplifies exploratory analysis and leverages AI tools used in data preparation / featuring stages.
  • The topic of responsible AI is becoming increasingly important. One should not put an AI model into production without ensuring the fairness of the input data and being able to explain its output.

Development of Data Architectural Approaches


The trends described above clearly indicate that data systems are becoming more and more complex. Moreover, modern companies cannot afford a dedicated data system for each silo. Data should flow seamlessly across the company, and the ideal data solution provides comprehensive data management across enterprise silos. This results in attempts to build an enterprise-level data architecture covering all aspects of data management, such as input data sourcing, data structuring and reporting, data governance and advanced analytics. We also see a trend towards hybrid solutions - the data lakehouse - which combines the benefits of classical data warehouses (for structured data) and data lakes (for raw and unstructured data).

While the data lakehouse approach focuses mostly on the technical side of things, it can be extended with the data mesh concept, which focuses on the organizational perspective. Data mesh ensures that domain-oriented data is owned by the respective business functions while remaining discoverable and addressable across the whole organization.

Tuesday, March 2, 2021

Typer - yet another approach to type verification in R

Introduction


In this article I would like to describe my new R package - typer - which allows you to describe function input & output parameter types and verify the actual parameter values during execution.

R is a dynamically typed programming language. On the one hand, this simplifies experimenting and writing code in a REPL style. On the other hand, it may cause problems when code is reused. For example, in the absence of type definitions, any value can be passed into a function call, which may cause unexpected behavior.

Here is a simple illustration:

func <- function(a, b) {
    return(a + b)
}

func(1, "1") # This will throw error
func(1, TRUE) # This may result in incorrect behaviour

There are multiple ways to address this problem. The most common one is to perform simple checks on the input parameters at the beginning of the function, as shown below.

func <- function (a, b) {

  if (!is.numeric(a)) stop("a must be numeric")
  if (!is.numeric(b)) stop("b must be numeric")

  return (a + b)
}

func(1, "1") # This will throw clear error
func(1, TRUE) # This will throw clear error

There are also packages which simplify such checks. A good example is the checkmate package. It allows writing easy-to-understand and fast assertions on function input parameter values.

This is how our sample function may look with asserts from the checkmate package:

library(checkmate)

func <- function (a, b) {

  assertNumeric(a)
  assertNumeric(b)

  return (a + b)
}

Typer

The idea of the typer package is to provide the ability to describe function parameters in a declarative way. This information can then be used to verify actual parameter values during function execution. It can also be used for documentation generation (as an alternative to roxygen comments).
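The snippet below is not the actual typer syntax, just a minimal sketch of the underlying idea under my own assumptions: a declarative type specification is attached to a function and checked at call time (the helper with_types and its spec argument are purely illustrative).

# Illustrative only - not the typer API. A declarative spec is attached to a
# function and the actual arguments are checked when the function is called.
with_types <- function(f, spec) {
  function(...) {
    args <- list(...)
    for (name in names(spec)) {
      if (!inherits(args[[name]], spec[[name]])) {
        stop(sprintf("argument '%s' must be of class '%s'", name, spec[[name]]))
      }
    }
    do.call(f, args)
  }
}

add <- with_types(
  function(a, b) a + b,
  spec = list(a = "numeric", b = "numeric")
)

add(a = 1, b = 2)    # 3
add(a = 1, b = "1")  # clear error: argument 'b' must be of class 'numeric'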

Wednesday, February 7, 2018

Batch system database design considerations

Introduction

Continuing the topic of best practices for building batch systems, in this article I would like to describe some considerations related to database design for such systems.

More precisely, I am referring to batch systems in the financial area (such as risk engines), but I believe these principles can be applied to most batch systems. I will also concentrate on relational databases. In theory, a batch system can be built around NoSQL storage, or use a service-oriented / microservices architecture with no central data storage at all. But in practice, I believe relational databases are used for most batch systems. Such databases contain input data for calculations, loaded from external upstream systems. They also store calculation results, which are later used for reporting purposes or extracted to downstream systems. In fact, the design of such databases often follows the traditional Extract-Transform-Load (ETL) approach.

Looking at the wider picture, such batch systems are usually part of a larger enterprise framework. Organizations often use a federated approach: many batch systems, each responsible for a particular business function, communicate with each other by exchanging data feeds.

Batch Database Design Considerations

When creating a database for such a batch system, the following considerations are often taken into account:
  1. Master Data Management. This is an enterprise-wide practice which ensures that important reference data (e.g. client data) are used consistently across the organization and that there is a single point of reference for them. There are many different approaches to setting up a proper MDM solution; for example, this presentation can be used as a reference. But MDM is rather an organization-wide problem, so I will not spend much time reviewing it in this article.
  2. Data Lineage & Traceability. This is the ability to track data sourcing & data flow between different systems and, ideally, within a single system as well. It is not a critical requirement for the functioning of a system or organization per se, but it is often very important for troubleshooting & audit purposes. It should be relatively easy to track data sourcing between systems that follow ETL best practices, but it is usually harder to track data flows automatically within a single system unless special solutions are used.
  3. Data Consistency. From my point of view, this is one of the most important requirements. It is crucial for producing accurate & reproducible batch outputs. I include here consistency between different input data items (e.g. making sure that data pieces are consistent with each other from a timing point of view). I also include here repeatability of calculations (e.g. so that we can re-run calculations for a given combination of input data and get the same results). A minimal schema sketch illustrating this is shown after the list.
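To make the consistency and repeatability point more concrete, here is a minimal sketch of one possible schema fragment. The table and column names are purely illustrative, and the in-memory RSQLite engine is used only for demonstration: input data are loaded as immutable versioned snapshots, and each batch run records exactly which snapshot it consumed, so the run can be reproduced later.

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Input data are stored as immutable, versioned snapshots
dbExecute(con, "
  CREATE TABLE market_data_snapshot (
    snapshot_id INTEGER PRIMARY KEY,
    cob_date    TEXT NOT NULL,
    loaded_at   TEXT NOT NULL
  )")

# Each batch run references the exact snapshot it used, not 'latest' data
dbExecute(con, "
  CREATE TABLE batch_run (
    run_id             INTEGER PRIMARY KEY,
    cob_date           TEXT NOT NULL,
    market_snapshot_id INTEGER NOT NULL
      REFERENCES market_data_snapshot(snapshot_id)
  )")

dbDisconnect(con)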

Saturday, January 6, 2018

Comparing different serialization methods in R

Introduction


In this article I would like to compare different approaches to data serialization available in R. The comparison is done from the point of view of serialization / deserialization performance and the compactness of the required disk space. I perform the analysis for data table objects, since these are the objects I most often need to serialize and deserialize in my practice.

The following approaches are reviewed:
  • Functions saveRDS / readRDS:
    • It supports all R object types and provides as-is serialization / deserialization (with possible nuances for custom reference objects).
    • It supports compressed & uncompressed data storage.
    • Essentially, this is a dump of the object's in-memory representation, so unfortunately it is an R-only serialization format.
  • Package feather:
    • This is a fast and language-agnostic alternative to the RDS format.
    • It uses a column-oriented file format (based on Apache Arrow and the Flatbuffers library).
    • The format is open source and is supported in both R & Python.
  • Package fst:
    • This is another alternative to the RDS & Feather formats which can be used for fast data frame serialization.
    • It supports compression using the LZ4 and ZSTD algorithms.
    • The big advantage of this approach is that it provides full random access to rows & columns of the stored data.
  • Package RProtoBuf:
    • This is the R interface package for the Protocol Buffers serialization method proposed by Google.
    • Usually, this approach is used for serializing relatively small structured objects, but it will be interesting to see how it deals with data table serialization in R.
  • Functions write.csv & read.csv:
    • These are the standard R functions for storing & reading data frames in the text-based CSV format.
    • This approach can easily be applied only to data frame objects, but I've included it in the comparison, since most objects I need to serialize in practice are data tables.
  • Functions fwrite & fread from data.table package:
    • This is another approach for storing & reading data table objects.
    • These functions are much more optimized than the standard ones above, so it is worth comparing them.
  • Package RSQLite:
    • This package provides R interface to SQLite embedded database engine.
    • Although it may be overkill to use this approach for simple data table serialization, I've included the package in the comparison for the sake of completeness. A short sketch of how such a comparison can be set up is shown after this list.
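Below is a minimal sketch of how such a comparison could be set up for two of the approaches (saveRDS/readRDS and fwrite/fread); the data size and file paths are illustrative only, and the same pattern extends to feather, fst, and the other packages above.

library(data.table)
library(microbenchmark)

# Small synthetic data.table used only for illustration
dt <- data.table(id = 1:100000,
                 value = rnorm(100000),
                 label = sample(letters, 100000, replace = TRUE))

rds_file <- tempfile(fileext = ".rds")
csv_file <- tempfile(fileext = ".csv")

# Serialization timings
microbenchmark(
  rds_write = saveRDS(dt, rds_file),
  csv_write = fwrite(dt, csv_file),
  times = 10
)

# Deserialization timings
microbenchmark(
  rds_read = readRDS(rds_file),
  csv_read = fread(csv_file),
  times = 10
)

# Disk footprint of each format (in bytes)
file.size(c(rds_file, csv_file))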

Sunday, June 18, 2017

R packages useful for general purpose development

The goal of this article is to provide a brief overview of R packages that can be useful for general-purpose programming in R. I am going to update this article from time to time to add more useful packages.

  1. testthat. Write unit tests for R code (a tiny usage example is shown after this list). Alternatives - RUnit.
  2. roxygen2. Auto-generate package documentation from specially formatted comments provided inline with R code.
  3. checkmate. Package to implement fast pre-condition checks and asserts for your code.
  4. logging. Package to organize consistent logging in your R scripts and packages. It allows you to specify different handlers for logging messages (e.g. to redirect them to a file, database, or console) and different logging levels (e.g. for debugging or production mode).
  5. lintr. Static code analyzer for R. Can be integrated with the testthat package to check code formatting & style during package compilation, which allows you to enforce a common code style. Alternatives - formatR.
  6. argparse. Package to simplify parsing of command-line arguments for R scripts.
  7. cyclocomp. Calculates the cyclomatic complexity of R functions. Can be used to estimate the number of unit tests required to cover all execution paths within a function.
  8. covr. Calculates test coverage for R functions.
  9. TypeInfo. Prototype R package to specify types for function parameters & return values. This may be a good way to fight problems caused by R's weak typing.
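As a small illustration of item 1, here is what a basic testthat check might look like (the function under test is, of course, made up for this example):

library(testthat)

add_numbers <- function(a, b) a + b

test_that("add_numbers sums its arguments", {
  expect_equal(add_numbers(1, 2), 3)
  expect_error(add_numbers(1, "1"))
})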


Comparing approaches to correlated random numbers generation

Introduction

Correlated random number generation is a crucial part of market data simulation and thus one of the important functions within Monte-Carlo risk engines. The most popular approaches here are Cholesky decomposition, singular value decomposition (SVD), and eigen decomposition (aka spectral decomposition). These approaches have their own advantages and disadvantages. In this article I would like to perform a small comparison of these methods on real-life data.


Approaches to Correlated Random Numbers Generation

In general, correlated random number generation consists of two steps:

1. Decompose the correlation matrix C into a matrix U such that:

C = t(U) %*% U

2. Then correlated random numbers can be generated from a matrix rnd of independent standard normal draws by using the U matrix as follows:

corr_rnd = rnd %*% U

So, let's imagine that we have a variable corr_matrix holding the correlation matrix:
corr_matrix = matrix(c(1.0, 0.3, 0.6,
                       0.3, 1.0, 0.4,
                       0.6, 0.4, 1.0),
                     nrow = 3, ncol = 3)

And we have a matrix rnd with 3 independent series of standard normal random numbers:
rnd = matrix(rnorm(10000 * 3), nrow = 10000, ncol = 3)


Then, for Cholesky decomposition this approach looks as follows:
u = chol(corr_matrix)
corr_rnd = rnd %*% u

For SVD:
svd_m = svd(corr_matrix)
u = svd_m$u %*% diag(sqrt(svd_m$d)) %*% t(svd_m$v)
corr_rnd = rnd %*% u

For Eigen decomposition:
e = eigen(corr_matrix, symmetric = TRUE)
u = e$vectors %*% diag(sqrt(e$values))
corr_rnd = rnd %*% t(u)  # note the transpose: the generated numbers then have correlation u %*% t(u) = corr_matrix
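
Whichever decomposition is used, a quick sanity check is to compare the empirical correlation of the generated numbers with the target matrix; the values should match corr_matrix up to sampling noise:

round(cor(corr_rnd), 2)  # should be close to corr_matrix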