DAS-C01 Online Practice Questions and Answers

Question 4

A media company has a streaming playback application. The company needs to collect and analyze data to provide near-real-time feedback on playback issues within 30 seconds. The company requires a consumer application to identify playback issues, such as decreased quality during a specified time frame. The data will be streamed in JSON format. The schema can change over time.

Which solution will meet these requirements?

A. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event to invoke an AWS Lambda function to process and analyze the data.

B. Send the data to Amazon Managed Streaming for Apache Kafka. Configure Amazon Kinesis Data Analytics for SQL Application as the consumer application to process and analyze the data.

C. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure Amazon S3 to initiate an event for AWS Lambda to process and analyze the data.

D. Send the data to Amazon Kinesis Data Streams. Configure an Amazon Kinesis Data Analytics for Apache Flink application as the consumer application to process and analyze the data.

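For context on option D, here is a minimal producer-side sketch, assuming a hypothetical stream named playback-events and illustrative event fields: JSON records keyed by session ID land in Kinesis Data Streams, where a Kinesis Data Analytics for Apache Flink consumer can analyze them within seconds while tolerating schema changes.

    import json
    import boto3

    # Hedged sketch: stream name, region, and event fields are assumptions.
    kinesis = boto3.client("kinesis", region_name="us-east-1")

    event = {
        "session_id": "abc-123",      # hypothetical fields; JSON lets the
        "bitrate_kbps": 480,          # schema evolve without redeployment
        "event_type": "quality_drop",
    }

    # Records with the same partition key stay in order within a shard.
    kinesis.put_record(
        StreamName="playback-events",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["session_id"],
    )
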
Question 5

A company has 1 million scanned documents stored as image files in Amazon S3. The documents contain typewritten application forms with information including the applicant first name, applicant last name, application date, application type, and application text. The company has developed a machine learning algorithm to extract the metadata values from the scanned documents. The company wants to allow internal data analysts to analyze and find applications using the applicant name, application date, or application text. The original images should also be downloadable. Cost control is secondary to query performance.

Which solution organizes the images and metadata to drive insights while meeting the requirements?

A. For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.

B. Index the metadata and the Amazon S3 location of the image file in Amazon OpenSearch Service (Amazon Elasticsearch Service). Allow the data analysts to use OpenSearch Dashboards (Kibana) to submit queries to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.

C. Store the metadata and the Amazon S3 location of the image file in an Amazon Redshift table. Allow the data analysts to run ad-hoc queries on the table.

D. Store the metadata and the Amazon S3 location of the image files in an Apache Parquet file in Amazon S3, and define a table in the AWS Glue Data Catalog. Allow data analysts to use Amazon Athena to submit custom queries.

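To illustrate option B, a minimal indexing sketch, assuming the opensearch-py client, a hypothetical domain endpoint, and illustrative field values (authentication is omitted for brevity): each document carries the extracted metadata plus the S3 location of the original image, so analysts can search by name, date, or full text and still download the scan.

    from opensearchpy import OpenSearch

    # Hedged sketch: endpoint, index name, and document values are assumptions.
    client = OpenSearch(
        hosts=[{"host": "search-apps-xxxx.us-east-1.es.amazonaws.com", "port": 443}],
        use_ssl=True,
    )

    doc = {
        "applicant_first_name": "Jane",
        "applicant_last_name": "Doe",
        "application_date": "2021-04-01",
        "application_type": "permit",
        "application_text": "full extracted text goes here",
        "image_location": "s3://scanned-docs/2021/abc.png",  # for download links
    }

    client.index(index="applications", id="abc", body=doc)
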
Question 6

A company is streaming its high-volume billing data (100 MBps) to Amazon Kinesis Data Streams. A data analyst partitioned the data on account_id to ensure that all records belonging to an account go to the same Kinesis shard and order is maintained. While building a custom consumer using the Kinesis Java SDK, the data analyst notices that the messages sometimes arrive out of order for a given account_id. Upon further investigation, the data analyst discovers that the out-of-order messages come from different shards for the same account_id and appear when a stream resize runs.

What is an explanation for this behavior and what is the solution?

A. There are multiple shards in a stream and order needs to be maintained in the shard. The data analyst needs to make sure there is only a single shard in the stream and no stream resize runs.

B. The hash key generation process for the records is not working correctly. The data analyst should generate an explicit hash key on the producer side so the records are directed to the appropriate shard accurately.

C. The records are not being received by Kinesis Data Streams in order. The producer should use the PutRecords API call instead of the PutRecord API call with the SequenceNumberForOrdering parameter.

D. The consumer is not processing the parent shard completely before processing the child shards after a stream resize. The data analyst should process the parent shard completely first before processing the child shards.

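The behavior in option D can be observed directly: after a resize, child shards reference their parent through ParentShardId, and a hand-rolled SDK consumer must read the parent shard to its end before starting the children to preserve per-key order (the Kinesis Client Library does this automatically). A sketch, assuming a hypothetical stream named billing:

    import boto3

    kinesis = boto3.client("kinesis")

    # describe_stream is paginated via HasMoreShards; one page suffices here.
    shards = kinesis.describe_stream(StreamName="billing")["StreamDescription"]["Shards"]

    for shard in shards:
        parent = shard.get("ParentShardId")
        if parent:
            # Drain the parent shard to SHARD_END before reading this child.
            print(f"{shard['ShardId']} is a child of {parent}")
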
Question 7

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.

A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.

Which combination of steps should the data analyst take to meet these requirements? (Choose three.)

A. Convert the log files to Apache Avro format.

B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.

C. Convert the log files to Apache Parquet format.

D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.

E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.

F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.

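As a sketch of how options B, C, and F fit together (table, column, and bucket names are assumptions): once the objects are rewritten as Parquet under date=year-month-day/ prefixes, the table is recreated with PARTITIONED BY, and MSCK REPAIR TABLE discovers the partitions from the key layout.

    import boto3

    athena = boto3.client("athena")
    results = {"OutputLocation": "s3://query-results-bucket/"}  # hypothetical

    ddl = """
    CREATE EXTERNAL TABLE logs (
        message string                      -- columns abbreviated for the sketch
    )
    PARTITIONED BY (`date` string)
    STORED AS PARQUET
    LOCATION 's3://log-bucket/'
    """

    athena.start_query_execution(QueryString=ddl, ResultConfiguration=results)
    athena.start_query_execution(QueryString="MSCK REPAIR TABLE logs",
                                 ResultConfiguration=results)
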
Question 8

A data analytics specialist is building an automated ETL ingestion pipeline using AWS Glue to ingest compressed files that have been uploaded to an Amazon S3 bucket. The ingestion pipeline should support incremental data processing.

Which AWS Glue feature should the data analytics specialist use to meet this requirement?

A. Workflows

B. Triggers

C. Job bookmarks

D. Classifiers

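For context on job bookmarks, a minimal Glue (PySpark) sketch, assuming a hypothetical bucket: the job must be created with --job-bookmark-option set to job-bookmark-enable, and each source needs a transformation_ctx for the bookmark state to attach to; job.commit() persists the state so the next run reads only new files.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://ingest-bucket/incoming/"]},  # assumed
        format="json",
        transformation_ctx="source",  # bookmark state is keyed to this name
    )

    job.commit()  # persists the bookmark; the next run skips processed files
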
Question 9

A media company is using Amazon QuickSight dashboards to visualize its national sales data. The dashboard is using a dataset with these fields: ID, date, time_zone, city, state, country, longitude, latitude, sales_volume, and number_of_items.

To modify ongoing campaigns, the company wants an interactive and intuitive visualization of which states across the country recorded a significantly lower sales volume compared to the national average.

Which addition to the company's QuickSight dashboard will meet this requirement?

A. A geospatial color-coded chart of sales volume data across the country.

B. A pivot table of sales volume data summed up at the state level.

C. A drill-down layer for state-level sales volume data.

D. A drill through to other dashboards containing state-level sales volume data.

Question 10

A technology company has an application with millions of active users every day. The company queries daily usage data with Amazon Athena to understand how users interact with the application. The data includes the date and time, the location ID, and the services used. The company wants to use Athena to run queries to analyze the data with the lowest latency possible.

Which solution meets these requirements?

A. Store the data in Apache Avro format with the date and time as the partition, with the data sorted by the location ID.

B. Store the data in Apache Parquet format with the date and time as the partition, with the data sorted by the location ID.

C. Store the data in Apache ORC format with the location ID as the partition, with the data sorted by the date and time.

D. Store the data in .csv format with the location ID as the partition, with the data sorted by the date and time.

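To make option B concrete, a hedged CTAS sketch (table, column, and bucket names are assumptions): the usage data is rewritten as Parquet, partitioned on the date so that date filters prune partitions, with rows sorted by location ID inside the written files.

    import boto3

    athena = boto3.client("athena")

    ctas = """
    CREATE TABLE usage_parquet
    WITH (
        format = 'PARQUET',
        external_location = 's3://usage-bucket/parquet/',
        partitioned_by = ARRAY['usage_date']        -- partition column goes last
    )
    AS SELECT location_id, services_used, usage_date
       FROM usage_raw
       ORDER BY location_id                         -- sorts rows within files
    """

    athena.start_query_execution(
        QueryString=ctas,
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )
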
Question 11

An online food delivery company wants to optimize its storage costs. The company has been collecting operational data for the last 10 years in a data lake that was built on Amazon S3 by using a Standard storage class. The company does not keep data that is older than 7 years. The data analytics team frequently uses data from the past 6 months for reporting and runs queries on data from the last 2 years about once a month. Data that is more than 2 years old is rarely accessed and is only used for audit purposes.

Which combination of solutions will optimize the company's storage costs? (Choose two.)

A. Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 Standard-Infrequent Access (S3 Standard-IA) storage class. Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Deep Archive storage class.

B. Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class. Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Flexible Retrieval storage class.

C. Use the S3 Intelligent-Tiering storage class to store data instead of the S3 Standard storage class.

D. Create an S3 Lifecycle expiration rule to delete data that is older than 7 years.

E. Create an S3 Lifecycle configuration rule to transition data that is older than 7 years to the S3 Glacier Deep Archive storage class.

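Options A and D map to a single lifecycle configuration; a minimal boto3 sketch, assuming a hypothetical bucket name and using 180/730/2555 days as approximations of 6 months, 2 years, and 7 years:

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="operational-data-lake",  # hypothetical
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-then-expire",
                    "Filter": {"Prefix": ""},  # applies to the whole bucket
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 180, "StorageClass": "STANDARD_IA"},
                        {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},
                    ],
                    "Expiration": {"Days": 2555},  # delete after ~7 years
                }
            ]
        },
    )
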
Question 12

A company is using a single master node in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster to provide a search API to its front-end applications. The company configured an automated process on AWS that monitors the OpenSearch Service cluster and automatically adds data nodes to scale the cluster when needed. During initial load testing, the system reacted to scaling events properly by adding data nodes, but every time a new data node is added, the company experiences a blue/green deployment that disrupts service. The company wants a highly available solution that prevents these service disruptions.

Which solution meets these requirements?

A. Increase the number of OpenSearch Service master nodes from one to two.

B. Configure multi-zone awareness on the OpenSearch Service cluster.

C. Configure the OpenSearch Service cluster to use three dedicated master nodes.

D. Disable OpenSearch Service Auto-Tune and roll back its changes.

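For reference, option C is a one-call configuration change; a sketch assuming a hypothetical domain name and instance type (three dedicated masters let the cluster keep a quorum through node changes):

    import boto3

    opensearch = boto3.client("opensearch")

    opensearch.update_domain_config(
        DomainName="search-api",  # hypothetical
        ClusterConfig={
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": "m5.large.search",  # assumed instance type
            "DedicatedMasterCount": 3,  # odd count preserves quorum
        },
    )
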
Question 13

A company is attempting to use Amazon Athena to run analytics queries. The company is using the UNLOAD statement and has noticed that the result of the query is not appearing in the destination Amazon S3 bucket. The query previously ran successfully, and the company has not changed any configuration since then.

Which reason would prevent the query from running successfully?

A. The destination S3 bucket does not have the output folder.

B. The destination S3 bucket contains files from previous queries.

C. The destination S3 bucket has an S3 Lifecycle policy configured.

D. The destination S3 bucket has versioning enabled.

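For background on option B: Athena's UNLOAD requires the target S3 location to be empty and fails if files from earlier queries remain there. A hedged sketch with assumed table and bucket names:

    import boto3

    athena = boto3.client("athena")

    unload = """
    UNLOAD (SELECT * FROM analytics_table)
    TO 's3://destination-bucket/output/'      -- must contain no objects
    WITH (format = 'PARQUET')
    """

    athena.start_query_execution(
        QueryString=unload,
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )
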
Exam Code: DAS-C01
Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
Last Update: May 10, 2024
Questions: 285
