If you have a Microsoft ecosystem but have been wanting to take advantage of some of the newer tools on the market, Snowflake’s announcement of their release on Azure today means new opportunities for you to upgrade your analytics platform while not throwing away the investment in and keeping the uniformity of your current environment.
For those who are not familiar, Snowflake is a cloud-based, massively parallel processing (MPP), columnar storage database. It is a newer option for data warehousing that is set up for more efficient querying of large data volumes. It also consumes structured and semi-structured data in a way that traditional relational databases are not designed to do as effectively — think “big data” without the “big overhead”.
With this release, Microsoft companies can evaluate utilizing Snowflake to increase the performance and flexibility of their analytics environment by adding it on top of their existing data integration process. To determine where Snowflake might be a good fit, our Aptitive consultants took a dive into where Snowflake could sit in our current and prospective Microsoft clients’ ecosystem.
Where does Snowflake fit in your current Azure environment?
Snowflake is best used as the home of an analytical layer (or dimensional model for the more technical) that enables reporting. Think of this as the new substitute for products like SQL Server Analysis Services (SSAS).
While we still recommend that you maintain a data integration hub and related process for all of your consuming application needs (think of this as sending consolidated and cleansed data back to each source system to keep them in sync), Snowflake can sit right at the end of that process. Since it is optimized for read activities, it compliments the needs of business intelligence tools like Power BI, Looker, Tableau, Qlik, etc., making it faster for business users to grab the information they need via those tools. Integration and ETL is possible with many tools and services, including Azure Data Factory.
When would you want to use Snowflake?
There are two main ideas behind Snowflake’s competitive advantage when it comes to data warehousing platforms: is its automatic optimization of query execution and the hands-off nature of its maintenance.
Given what we have seen thus far, we recommend Snowflake for two main use cases:
- Developing a more efficient analytics platform
- Creating a platform for flexible data discovery
Our clients that fit within the above two use cases usually had:
- Multiple business users — the more people you have querying your database at one time, the more the database has to be configured to handle that load so as to not lock up other processes. Traditional databases can be scaled up to handle larger reads (think of a query that produces data), but this takes a decent amount of time and effort to achieve and they are more often optimized for writes (think of loading data into a table). In Snowflake, a user can spin up resources for just the one query, then spin it back down right after. This allows for a more modular use of higher power resources.
- Lots of data — if you have ever tried to perform a huge query on your current system, you likely noticed a slow down from your usual processing. Traditional databases are not as optimized for read activities as columnar databases are. This makes options like Snowflake more attractive to those performing heavier analytical queries on a regular basis.
“Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform rowbased systems. ”— MIT Computer Science and Artificial Intelligence Laboratory
- Mix of structured and semi-structured data — though many traditional databases offer options for consuming semi-structured data like (think JSON, XML, etc.), they are not optimized to do so. If you have a mix of structured and semi-structured data, options like Snowflake might be more efficient for your process.
What’s the tradeoff?
Snowflake can be a great option for clients who need a true analytical platform where they can perform data discovery or need to read out a lot of data at once. That said, the following are situations where an alternative to Snowflake might make more sense:
- You need a data hub to keep your various applications in sync (application integration and API endpoints)
- You are not trying to perform complex aggregations or data discovery
- You are invested in an existing solution that does not have many performance or management overhead concerns
So what’s the gist?
Snowflake makes it easy to kick off a data warehouse project with its ease of use and start-small, scale-big approach. Despite this innovative architecture, however, we still recommend applying the same data warehousing fundamentals that you would have in the past.
Yes, Snowflake will allow you to create views on top of messy raw source data, but you will ultimately lose the battle on performance as your data grows in complexity. We suggest approaching a new Snowflake deployment with a vision of your data platform as a whole — not just the analytics layer. Performing all your complex business logic in Snowflake is possible, but due to the columnar architecture, integration use cases are better served in complimentary cloud database platforms, such as Azure SQL.
With a roadmap and strategy in place for both your integration and analytics needs, you will be able to better select the right mix of tools and technologies for your solution. That piece is where we come in. Get in touch with us for a Snowflake planning and whiteboarding session, on the house.