Azure: Synapse Analytics w/ DataBricks and DevOps

Andrew Fogarty

05/07/2021

# load python
library(reticulate)
use_python('C:/Users/Andrew/Anaconda3/')
use_condaenv(condaenv='my_ml', required=TRUE)
library(knitr)

1 Introduction

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.

In this demo, we will walk through the necessary procedures to:

2 Create an Azure Synapse Environment

In order to create an Azure Synapse environment, several preparatory objects are needed:

In the image below, we create the Synapse workspace, storage account, and file system name in one single effort.

3 Inject SQL Tables into our Data Lake

We can now launch Azure Synapse Analytics, which looks like the below image:

To use a modern data warehousing technology like a Data Lake, we need to fill it with data. Data Lakes store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. Synapse can fill our Data Lake with data from just about any source we can imagine, but to do that, we must setup connections between the sources of our data (e.g., SQL tables) and the sink (our Data Lake) so that we can copy the data and have our own versions for analysis.

Our first task is to connect to an Azure SQL Database which has the commonly known AdventureWorks data set already loaded. First, we open our Data tab, select Linked, press the + sign, and then select Integration dataset to open up a new connection to our Azure SQL Database from where we want to copy its data into our Data Lake.

Next, we instantiate our connection by creating a new link to the Azure SQL Database by specifying the Azure subscription it is under, the Azure SQL Server name, the database name, the user name, and the password. Lastly, we test the connection to ensure we entered our login credentials appropriately.