Microsoft Azure Data Lake Storage Gen2 Read Use Case
Reading a large data set can take a long time. To optimize performance, you can configure the following read operation properties to partition the source and read the partitions concurrently:
Block Size: Partitions a large file or object into smaller parts, each of the specified block size. When you read a large file, consider partitioning it into smaller parts and configuring Concurrent Threads to spawn the required number of threads to process the data in parallel.
Concurrent Threads: The number of concurrent connections used to read data from Microsoft Azure Data Lake Storage Gen2. When you read a large file or object, you can spawn multiple threads to process the data. Default is 10.
You must configure Block Size if you want multiple threads to process data in parallel.
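To illustrate how the two properties interact, the following is a minimal Python sketch of a block-partitioned parallel read. It is not the connector's implementation: it reads a local file as a stand-in for a Microsoft Azure Data Lake Storage Gen2 object, and the names block_ranges, read_block, and parallel_read are hypothetical helpers introduced only for this example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_ranges(total_size, block_size):
    """Split [0, total_size) into (offset, length) parts of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, total_size - offset))
        for offset in range(0, total_size, block_size)
    ]


def read_block(path, offset, length):
    """Read one block. Each call opens its own handle so threads share no state."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def parallel_read(path, block_size, concurrent_threads=10):
    """Read a file block by block with a pool of worker threads.

    Assumption: a local file stands in for the remote object, and the
    default of 10 threads mirrors the documented Concurrent Threads default.
    """
    ranges = block_ranges(os.path.getsize(path), block_size)
    with ThreadPoolExecutor(max_workers=concurrent_threads) as pool:
        # map() yields results in submission order, so blocks rejoin correctly.
        blocks = pool.map(lambda r: read_block(path, *r), ranges)
        return b"".join(blocks)


if __name__ == "__main__":
    # Build a small sample file, then read it back in 1000-byte blocks.
    payload = bytes(range(256)) * 40
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        data = parallel_read(tmp.name, block_size=1000, concurrent_threads=4)
        print(len(data) == len(payload))
    finally:
        os.remove(tmp.name)
```

In this sketch, a block size equal to or larger than the file yields a single part, so only one worker does any reading regardless of the thread count. This is consistent with the requirement to configure Block Size before multiple threads can process data in parallel.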
The following image shows the source properties for parallel read from a large source file: