This is the last post in a two-part series in which I have tried to explain approaches to achieving agility with data. If you have not already gone through Part I, then follow this link.
As a reminder, this is what we are trying to achieve by adopting either approach, or a hybrid of the two:
- Optimize Query performance
- Common Query Language
- Central data model for business analysts
- Fast access to data insights
Part I of this series helped us understand the Single Physical Data Store approach. Now we are going to talk about the Logical Data Store approach.
Logical Data Store Approach
In this approach we do not load the data into a single store. Instead, we hand data analysts the ability to construct logical views or data models directly across the various data sources, without the need to lift and shift the data. Logical data models still need to be constructed, but to a large extent this removes the need for developers to be involved up front in the process.
The landscape above tells us that the Single Data Store architecture does place some constraints on agility, and this is what the logical data warehouse architecture is looking to address.
The main theme here is that we are centralizing the data models as opposed to the data itself.
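To make "centralizing the data models as opposed to the data" concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is only an illustration under assumptions: the two attached in-memory databases (`crm` and `billing`) stand in for separate source systems, and the table, column, and view names are hypothetical. A real logical data warehouse would use a federation or virtualization layer rather than SQLite.

```python
import sqlite3

# Autocommit mode, so ATTACH is never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Two independent in-memory databases stand in for separate source systems
# (hypothetical names, for illustration only).
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# The logical model: a view that joins across both sources.
# No rows are copied; the data stays where it lives.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS customer, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
""")


def revenue_by_customer(connection):
    """Query the shared logical model the way an analyst would."""
    rows = connection.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    )
    return dict(rows.fetchall())
```

Because the view is evaluated at query time, a new invoice written to the `billing` source shows up the next time `revenue_by_customer(conn)` is called, with no reload step in between.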
Let us now summarize the approaches across both major themes to achieve agility:
Considerations for the Single Physical Data Store Approach
It brings data to one place and then uses the store to do the transformations
It takes an approach where the lake contains all relevant information in its raw state, ingested on a continuous basis, to cater to multiple personas
If used in conjunction with an ELT architecture, it strikes a fine balance between the developer and analyst communities. The schematization of raw data is helpful and allows analysts to create logical data models within the store after transformation
The extent of development required depends on the choice of ELT infrastructure adopted
It is not a hard choice for the CTO's organization: even with fewer engineering resources you may still achieve quite a lot
- It depends on the architecture the teams followed to bring data into a single store. If a custom connector architecture or an ETL approach was adopted with the wrong choices, the friction of getting data into the store will remain very high
- The cost of storing the data and connecting it to the DWH, along with other investments to standardize the ingestion pipeline architecture, will determine the price of bringing it all together
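The ELT pattern mentioned in the considerations above (land the raw data first, then transform inside the store) can be sketched roughly as follows. This is a toy illustration, not a production pipeline: SQLite stands in for the single store, and the `raw_orders` and `orders_by_country` tables, their columns, and the sample records are all hypothetical.

```python
import sqlite3


def load_raw(conn, records):
    """Extract + Load: land records as-is, with no cleaning on the way in."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", records)


def transform_in_store(conn):
    """Transform: build an analyst-facing model with SQL inside the store."""
    conn.execute("DROP TABLE IF EXISTS orders_by_country")
    conn.execute("""
        CREATE TABLE orders_by_country AS
        SELECT UPPER(TRIM(country))      AS country,
               COUNT(*)                  AS orders,
               SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_orders
        GROUP BY UPPER(TRIM(country))
    """)


conn = sqlite3.connect(":memory:")

# Raw data arrives messy (string amounts, inconsistent country codes)
# and is stored exactly as received.
load_raw(conn, [("o1", "10.5", " in"), ("o2", "4.5", "IN"), ("o3", "7", "us ")])
transform_in_store(conn)

result = conn.execute(
    "SELECT country, orders, total_amount FROM orders_by_country ORDER BY country"
).fetchall()
```

Since the raw table is kept untouched, the transformation can be changed and re-run later without ingesting the data again, which is where much of the balance between developers and analysts comes from.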
Considerations for the Logical Data Store Approach
- It centralizes the data modelling and not the actual raw data store
- It centralizes the modeled data for BI exposure
- It provides for a more self-service BI architecture
- It depends on the maturity of the organization and the type of skill set available to operate this kind of infrastructure
- At what size should this be recommended?
- How much help would multiple businesses need to become self-serve on this model?
- The CTO's organization can make this choice, but it would need Data Ops to work alongside BI to create and enable data models that let you leverage its power. Otherwise it can be reduced to just another ELT infrastructure that may not justify its deployment
Through this mini-series, I hope you have gained a general idea of the various methods by which agility can be achieved to unlock the golden joins (as I call them) that drive maximum value for the organization and provide data when it is needed most.
In my view, to make a choice, try to introspect and define the maturity index of the following four parameters:
- Analyst Org
- Engineering Org
- Current DWH infrastructure
- Data set sizes
In addition, remember that a hybrid approach will always be an option if the organization is quite large and a single centralized working model, one or the other, might not fit all the personas.