Topic 3, Mixed Questions
You have an Azure event hub named retailhub that has 16 partitions. Transactions are
posted to retailhub. Each transaction includes the transaction ID, the individual line items,
and the payment details. The transaction ID is used as the partition key.
You are designing an Azure Stream Analytics job to identify potentially fraudulent
transactions at a retail store. The job will use retailhub as the input. The job will output the
transaction ID, the individual line items, the payment details, a fraud score, and a fraud
indicator.
You plan to send the output to an Azure event hub named fraudhub.
You need to ensure that the fraud detection solution is highly scalable and processes
transactions as quickly as possible.
How should you structure the output of the Stream Analytics job? To answer, select the
appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
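For scale, the output is typically structured to mirror the input: fraudhub is given 16 partitions and the transaction ID is reused as the partition key, so every transaction can be scored end to end within a single partition. The following Stream Analytics query is only an illustrative sketch; the input and output aliases (retailhub-input, fraudhub-output), the column names, and the udf.fraudScore function are hypothetical and not part of the question.

-- Hypothetical aliases, columns, and UDF; partition-aligned so the job scales out across the 16 partitions.
SELECT
    TransactionId,
    LineItems,
    PaymentDetails,
    udf.fraudScore(LineItems, PaymentDetails) AS FraudScore,
    CASE WHEN udf.fraudScore(LineItems, PaymentDetails) > 0.8 THEN 1 ELSE 0 END AS FraudIndicator
INTO
    [fraudhub-output]
FROM
    [retailhub-input] PARTITION BY PartitionId

With compatibility level 1.2 or later, the PARTITION BY PartitionId clause can be omitted because Stream Analytics aligns input and output partitions automatically when they share the same partition key.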
You have a C# application that processes data from an Azure IoT hub and performs complex transformations.
You need to replace the application with a real-time solution. The solution must reuse as much code as possible from the existing application.
What should you recommend?
A. Azure Databricks
B. Azure Event Grid
C. Azure Stream Analytics
D. Azure Data Factory
Answer: C. Azure Stream Analytics
Explanation:
Azure Stream Analytics on IoT Edge empowers developers to deploy near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data. User-defined functions (UDFs) are available in C# for IoT Edge jobs, which allows much of the existing C# transformation code to be reused.
Azure Stream Analytics on IoT Edge runs within the Azure IoT Edge framework. Once the
job is created in Stream Analytics, you can deploy and manage it using IoT Hub.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-edge
You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage
Gen2 container to a database in an Azure Synapse Analytics dedicated SQL pool.
Data in the container is stored in the following folder structure.
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
The earliest folder is /in/2021/01/01/00/00. The latest folder is /in/2021/01/15/01/45.
You need to configure a pipeline trigger to meet the following requirements:
Existing data must be loaded.
Data must be loaded every 30 minutes.
Late-arriving data of up to two minutes must be included in the load for the time at which the data should have arrived.
How should you configure the pipeline trigger? To answer, select the appropriate options in
the answer area.
NOTE: Each correct selection is worth one point.
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query data by using a variety of services including Azure Databricks and Azure
Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most
queries will include data from the current year or current month.
Which folder structure should you recommend to support fast queries and simplified folder
security?
A. /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv
B. /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
C. /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
D. /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
Answer: D. /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
Explanation:
There's an important reason to put the date at the end of the directory structure: if you want to lock down certain regions or subject matters to users/groups, you can easily do so with POSIX permissions. Otherwise, if there were a need to restrict a certain security group to viewing just the UK data or certain planes, with the date structure in front a separate permission would be required for numerous directories under every hour directory. Additionally, having the date structure in front would exponentially increase the number of directories as time went on.
Note: In IoT workloads, there can be a great deal of data being landed in the data store that spans across numerous products, devices, organizations, and customers. It's important to pre-plan the directory layout for organization, security, and efficient processing of the data.
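To illustrate the recommended layout, a serverless SQL pool can read the current month with a single wildcard over the day folders, while access is granted once at the subject-area level. This is only a sketch; the storage account (contosolake), container (data), subject area (Sales), and data source (POS) names are hypothetical.

-- Hypothetical account/container/subject-area names; reads all files for January 2021 in one pass.
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/Sales/POS/2021/01/*/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS rows;

Because the subject area is the first folder level, a single POSIX ACL on /Sales secures every date folder beneath it.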
You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.
The data flow already contains the following:
A source transformation.
A Derived Column transformation to set the appropriate types of data.
A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
All valid rows must be written to the destination table.
Truncation errors in the comment column must be avoided proactively.
Any rows containing comment values that will cause truncation errors upon insert
must be written to a file in blob storage.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Answer: A and B.
Explanation:
B: Example: a Conditional Split transformation can define the maximum length of the comment column. Any row whose comment is within the limit goes into a GoodRows stream; any row whose comment exceeds the limit goes into a BadRows stream.
A: A second sink transformation then writes the BadRows stream to a file in blob storage.
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse
Analytics. SalesFact contains sales data from the past 36 months and has the following
characteristics:
Is partitioned by month
Contains one billion rows
Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than
36 months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer,
move the appropriate actions from the list of actions to the answer area and arrange them
in the correct order.
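The fastest approach for this scenario is partition switching, a metadata-only operation, rather than deleting a billion-row slice with DELETE. The sketch below is illustrative only: the column names (OrderDateKey, ProductKey), the boundary values, and the staging table name SalesFact_Work are assumptions, and the staging table must match SalesFact's schema, distribution, and partition boundaries exactly.

-- 1. Create an empty staging table with the same schema, distribution, and partitioning as SalesFact.
--    (Column names, distribution key, and boundary values are hypothetical.)
CREATE TABLE dbo.SalesFact_Work
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20180101, 20180201 /* ...one boundary per month... */))
)
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;

-- 2. Switch the partition holding data older than 36 months into the staging table (metadata only, so it is fast).
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;

-- 3. Drop the staging table to discard the old data.
DROP TABLE dbo.SalesFact_Work;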
You have a self-hosted integration runtime in Azure Data Factory.
The current status of the integration runtime has the following configurations:
Status: Running
Type: Self-Hosted
Version: 4.4.7292.1
Running / Registered Node(s): 1/1
High Availability Enabled: False
Linked Count: 0
Queue Length: 0
Average Queue Duration: 0.00s
The integration runtime has the following node details:
Name: X-M
Status: Running
Version: 4.4.7292.1
Available Memory: 7697MB
CPU Utilization: 6%
Network (In/Out): 1.21KBps/0.83KBps
Concurrent Jobs (Running/Limit): 2/14
Role: Dispatcher/Worker
Credential Status: In Sync
Use the drop-down menus to select the answer choice that completes each statement
based on the information presented.
NOTE: Each correct selection is worth one point.
You are designing an Azure Synapse Analytics dedicated SQL pool.
You need to ensure that you can audit access to Personally Identifiable Information (PII).
What should you include in the solution?
A. dynamic data masking
B. row-level security (RLS)
C. sensitivity classifications
D. column-level security
Answer: D. column-level security
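If column-level security is used, it is configured with column-scoped GRANT and DENY statements in the dedicated SQL pool. The sketch below is illustrative only; the dbo.Customers table, its columns, and the DataAnalysts role are hypothetical.

-- Hypothetical table, columns, and role: expose non-sensitive columns, block the PII columns.
GRANT SELECT ON dbo.Customers (CustomerId, City, Segment) TO DataAnalysts;
DENY SELECT ON dbo.Customers (Email, PhoneNumber) TO DataAnalysts;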
You are planning a streaming data solution that will use Azure Databricks. The solution will
stream sales transaction data from an online store. The solution has the following
specifications:
* The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
* Line total sales amount and line total tax amount will be aggregated in Databricks.
* Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using
Structured Streaming. The solution must minimize duplicate data.
What should you recommend?
A. Append
B. Update
C. Complete
Answer: C. Complete