Synapse: Get KeyVault Properties Using Token Library

This post shows how to get Key Vault properties using the Token Library.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session

from pyspark.sql import SparkSession

linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
spark = SparkSession.builder.getOrCreate()

# TokenLibrary resolves the linked service and returns its properties as a map.
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')

print(key_vault_url)
print(token_library.getFullConnectionStringAsMap(linked_service_name))
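getFullConnectionStringAsMap returns a Java map through Py4J, so you can also convert it to a plain Python dict for easier inspection. A minimal sketch; the dict conversion is my addition, not a Synapse API:

# Py4J maps implement the Python mapping protocol, so dict() works on them.
properties = dict(token_library.getFullConnectionStringAsMap(linked_service_name))

for key, value in properties.items():
    print(key, value)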

 

Synapse: SAS Token

This post shows how to get a SAS token from a notebook.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session

from notebookutils import mssparkutils

linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)

# Register the SAS token so Spark can authenticate to the blob endpoint.
spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token)
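Once that config is set, Spark can read from the container over the wasbs scheme. A minimal sketch, assuming a CSV file exists at <PATH_TO_FILE> in the container:

df = spark.read.csv('wasbs://<CONTAINER_NAME>@<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net/<PATH_TO_FILE>.csv', header=True)
df.show()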

 

Synapse: Environment Variables

This post shows how to work with environment variables in Synapse.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session

Get Environment Variable

Note that str is only a type annotation here; spark.conf.get always returns a string, so cast the value if you need another type.

var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')
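spark.conf.get raises an error if the key was never set; passing a default as the second argument avoids that. A small sketch, with <DEFAULT> as a stand-in value:

# Returns '<DEFAULT>' instead of raising when the variable is not set.
var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>', '<DEFAULT>')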
Set Environment Variable
spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')

 

Synapse: List Python Packages

This post shows a few ways to list the installed Python packages.

You can use %pip to list the python packages that are installed.

%pip freeze

However, doing it that way may not show the exact versions that are installed. For a comprehensive list, do the following.

import pkg_resources

# working_set iterates over every distribution in the active environment.
for package in pkg_resources.working_set:
    print(package)
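Note that pkg_resources is deprecated in recent setuptools releases. On newer runtimes the standard library's importlib.metadata gives the same information; a minimal sketch:

from importlib.metadata import distributions

# Print every installed distribution with its exact version.
for dist in distributions():
    print(dist.metadata['Name'], dist.version)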

 

Synapse: Help Command

This post shows how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command will tell you what areas help can assist you with. It will respond with:

  • fs
  • notebook
  • credentials
  • env
from notebookutils import mssparkutils

mssparkutils.help()
FileSystem

If you call help with no arguments, it returns all the options that are available for help. If you pass in a command, it explains that command in greater detail.

from notebookutils import mssparkutils

mssparkutils.fs.help()

mssparkutils.fs.help('cp')

 

Synapse: Mounts

This post shows how to work with mounts in Synapse.

I suggest mounting to an ADLS storage account; that is what the examples below assume.

List Mounts
from notebookutils import mssparkutils

mssparkutils.fs.mounts()
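Each entry in the returned list describes one mount. Assuming the entries expose mountPoint and source fields, you can check whether a mount already exists before creating it; a sketch:

from notebookutils import mssparkutils

# True if the container is already mounted.
already_mounted = any(m.mountPoint == '/mnt/<CONTAINER_NAME>' for m in mssparkutils.fs.mounts())
print(already_mounted)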
Get Mount Path

The output of this command will be '/synfs/<number>/mnt/<CONTAINER_NAME>', where <number> is the current Spark job ID.

from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mount_path = mssparkutils.fs.getMountPath(mount_name)
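Because the mount is exposed through the driver's local file system, the returned path works with plain Python file APIs. A minimal sketch, assuming <FILE_NAME> exists at the mount root:

print(mount_path)  # e.g. /synfs/<number>/mnt/<CONTAINER_NAME>

with open('%s/<FILE_NAME>' % mount_path) as f:
    print(f.read())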
Unmount
from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mssparkutils.fs.unmount(mount_name)
Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

from notebookutils import mssparkutils

container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' # optional; omit this if mounting the container root
linked_service_name = '<LINKED_SERVICE_NAME>'

mssparkutils.fs.mount(
    source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
    mountPoint='/mnt/%s' % container,
    extraConfigs={'linkedService': linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
)
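Once mounted, you can read through the mount from Spark using the synfs scheme, which needs the current job ID in the path. A minimal sketch, assuming a CSV file exists under the mount:

job_id = mssparkutils.env.getJobId()

df = spark.read.csv('synfs:/%s/mnt/%s/<FILE_NAME>.csv' % (job_id, container), header=True)
df.show()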
Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

from notebookutils import mssparkutils

client_id = '<CLIENT_ID>'
tenant_id = '<TENANT_ID>'
container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' # optional; omit this if mounting the container root

# Get the secret as described in Synapse: Get Secret.
secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_NAME>', '<LINKED_SERVICE_KEYVAULT_NAME>')

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": client_id,
  "fs.azure.account.oauth2.client.secret": secret,
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
}

mssparkutils.fs.mount(
  source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
  mountPoint='/mnt/%s' % (container),
  extraConfigs=configs
)

 

Synapse: Get Secret

This post shows how to get a secret from a Key Vault in Synapse.

If you have data exfiltration protection enabled (which is recommended), then you need a managed private endpoint set up to your Key Vault.

You also need to ensure your Synapse Managed Identity has access to your Key Vault.

You also need an un-parameterized linked service created for the Key Vault.

Then you can query your Key Vault to get the secret with the following command.

from notebookutils import mssparkutils

secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_KEY>', '<LINKED_SERVICE_KEYVAULT_NAME>')

 

Databricks: Set Spark Configs

This post shows how to set Spark configs in Databricks or Synapse notebooks.

First you will need a Spark session. Refer to PySpark: Create a Spark Session for more details.

secret = 'value' # I highly suggest you get this from the Key Vault instead of hard-coding it
storage_account = ''
application_id = ''
tenant_id = ''

spark.conf.set('fs.azure.account.auth.type.{}.dfs.core.windows.net'.format(storage_account), 'OAuth')

spark.conf.set('fs.azure.account.oauth.provider.type.{}.dfs.core.windows.net'.format(storage_account), 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider')

spark.conf.set('fs.azure.account.oauth2.client.id.{}.dfs.core.windows.net'.format(storage_account), application_id)

spark.conf.set('fs.azure.account.oauth2.client.secret.{}.dfs.core.windows.net'.format(storage_account), secret)

spark.conf.set('fs.azure.account.oauth2.client.endpoint.{}.dfs.core.windows.net'.format(storage_account), 'https://login.microsoftonline.com/{}/oauth2/token'.format(tenant_id))

If you are running in Databricks, you could add these to the cluster's Spark config at creation time instead, although I recommend doing it in a notebook.

spark.hadoop.fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net <CLIENT_ID>
spark.hadoop.fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net <SECRET>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net https://login.microsoftonline.com/<TENANT_ID>/oauth2/token
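With these configs in place, reads against the storage account authenticate via the service principal. A minimal sketch, assuming a <CONTAINER_NAME> container holding a CSV file:

df = spark.read.csv('abfss://<CONTAINER_NAME>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH_TO_FILE>.csv', header=True)
df.show()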