Data Infrastructure Optimisation

Project overview

The company faced significant challenges in managing and integrating data across multiple platforms, leading to inefficiencies in reporting and decision-making processes. To address these issues, the project aimed to develop a robust and future-proof data infrastructure designed to streamline data management, enhance reporting capabilities, and support data-driven decision-making across the organisation.

The client

The client was an online pharmacy in Spain, offering a wide range of health and wellness products to customers nationwide. Committed to making healthcare accessible to everyone, provides an efficient, user-friendly platform for customers to conveniently purchase pharmaceuticals online.

Objectives:

Create a Centralized Data Warehouse

The unified data warehouse is to serve as the single source of truth, where data from all tools and platforms is consolidated. This ensures consistency and reliability in reporting and analysis.

Enable Data-Driven Decision-Making

Establish a reporting structure that enables strategic decision-making based on comprehensive and reliable data in Power BI. The reports should combine data coming from different tools and connect to the BigQuery database.

Unlock Analytical Capabilities

Implement a database layer that supports complex data analysis without the immediate need for dashboard creation. This should provide an overview of key metrics, including acquisition, retention, and monetisation, with the ability to segment users based on their behaviour.

Challenges:

Lack of Centralised Data Warehouse

The company had a BigQuery server where they stored the Google Analytics data, but for all the other tools data was not stored in the warehouse but on manually updated Sharepoint files.

Unmapped Data Relationships

The content and relationships of various data sources were undocumented, complicating data integration and analysis.

Limited data retention in BigQuery

Although Google Analytics data was available in BigQuery, the data retention period was only 60 days, limiting long-term analysis.

BigQuery and Power BI integration

At the beginning of the project no integration was defined between BigQuery and PowerBI, making it necessary to manually download and store raw Google Analytics data

This project focused on resolving these challenges by building a scalable data infrastructure that not only addresses current issues but also lays the groundwork for future growth and analytical needs.

Actions

  • Collaborated with stakeholders to select the most suitable data-warehousing, ELT and data visualisation tools for the project
  • Created the main ETL process using Google Cloud Functions and Google Cloud Storage Bucket
  • Developed custom solutions using Google Cloud Functions and Python to connect to API endpoints and pull all the necessary data to the datawarehouse.
  • Analysed data structures and schemas of each source tool, and created a comprehensive data catalog in Atlan, which established the data relationships between different data tables.
  • Reconfigured Google Analytics data retention to store all historical data in the warehouse
  • Built the SQL queries to create the aggregated reporting data layer
  • Conducted workshops with stakeholders to define dashboard requirements and ensure alignment with business needs.
  • Designed and developed interactive Power BI dashboards that offer drill-down capabilities, multiple levels of aggregation, and user-friendly interfaces.
  • Created a custom segmentation logic based on purchase habits, and made end users able to customise segmentation parameters, then filter the dashboards according to default or customised segment.

Tools I used in this project

Result

200 hours/year Time saving

The automation of ETL processes eliminated the need for manual data handling, leading to a saving of 200 work hours on a yearly basis.

Accelerated data analysis

With the creation of data catalog and aggregated reporting layer, analysts can quickly access and query data, leading to faster insights and more agile decision-making.

7.8% revenue increase in 3 months

The user segmentation led to better targeting and optimization of marketing efforts. This improved ROI on marketing campaigns and enhanced customer retention ultimately resulting in a revenue increaser.