Table of Contents

  1. Preface
  2. Introduction to Data Engineering Streaming
  3. Data Engineering Streaming Administration
  4. Sources in a Streaming Mapping
  5. Targets in a Streaming Mapping
  6. Streaming Mappings
  7. Window Transformation
  8. Appendix A: Connections
  9. Appendix B: Monitoring REST API Reference
  10. Appendix C: Sample Files

Complex File Execution Parameters

When you write to an HDFS complex file, you can configure how the complex file data object writes to the file. Specify these properties in the execution parameters property of the streaming mapping.
Use execution parameters to configure the following properties:
Rollover properties
When you write to an HDFS complex file, the file rollover process closes the file that is currently being written and creates a new file based on file size or elapsed time. You can configure a time-based rollover, a size-based rollover, or both with the following optional execution parameters:
  • rolloverTime. Rolls the HDFS target file over after a specified period of time has elapsed. Specify the rollover time in hours. For example, specify a value of 1.
  • rolloverSize. Rolls the HDFS target file over when the file reaches a specified size. Specify the size in GB. The default rollover size is 1 GB.
Size-based rollover is the default. If you implement both rollover schemes for a target file, the event that occurs first triggers a rollover. For example, if you set the rollover time to 1 hour and the rollover size to 1 GB, the target service rolls the file over when the file reaches 1 GB even if the 1-hour period has not elapsed.
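The combined rollover behavior can be sketched as follows. This is an illustrative Python sketch, not Informatica code; the function name should_roll_over and its arguments are assumptions that mirror the rolloverTime and rolloverSize execution parameters.

```python
import time

def should_roll_over(opened_at, bytes_written,
                     rollover_time_hours=None, rollover_size_gb=1):
    """Return True when the current target file should be closed.

    Whichever condition is met first -- elapsed time or file size --
    triggers the rollover, matching the behavior described above.
    """
    if rollover_time_hours is not None:
        if time.time() - opened_at >= rollover_time_hours * 3600:
            return True
    # Size-based rollover is the default, with a 1 GB threshold.
    return bytes_written >= rollover_size_gb * 1024 ** 3

# A file opened an hour ago with little data rolls over on time:
print(should_roll_over(time.time() - 3600, 0, rollover_time_hours=1))
# A fresh file that already holds 1 GB rolls over on size:
print(should_roll_over(time.time(), 1024 ** 3))
```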
Pool properties
You can configure the maximum pool size that one Spark executor can use to write to a file. Use the pool.maxTotal execution parameter to specify the pool size. The default pool size is 8.
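The effect of capping the pool can be sketched with a bounded worker pool. This is an illustrative Python sketch, not Informatica code; write_partitions and the stand-in write call are assumptions used only to show how a pool.maxTotal-style limit bounds concurrent writers.

```python
from concurrent.futures import ThreadPoolExecutor

POOL_MAX_TOTAL = 8  # mirrors the default pool.maxTotal of 8

def write_partitions(partitions):
    """Write partitions with at most POOL_MAX_TOTAL concurrent writers.

    len() stands in for a real write call; the pool size caps how many
    writes one executor runs at the same time.
    """
    with ThreadPoolExecutor(max_workers=POOL_MAX_TOTAL) as pool:
        return list(pool.map(len, partitions))

print(write_partitions([[1, 2], [3]]))
```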
Retry interval
You can specify the time interval during which Spark retries creating the target file or writing to it if the first attempt fails. Spark tries a maximum of three times within the interval that you specify. Use the retryTimeout execution parameter to specify the timeout in milliseconds. Default is 30,000 milliseconds.
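The retry behavior, at most three attempts within the retryTimeout window, can be sketched as follows. This is an illustrative Python sketch, not Informatica code; with_retries and its parameters are assumptions modeled on the description above.

```python
import time

def with_retries(operation, retry_timeout_ms=30_000, max_attempts=3):
    """Call operation(), retrying on failure.

    Retries stop after max_attempts tries (three, per the text above)
    or once the retry_timeout_ms window has elapsed, whichever comes first.
    """
    deadline = time.monotonic() + retry_timeout_ms / 1000.0
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as error:
            last_error = error
            if attempt == max_attempts or time.monotonic() >= deadline:
                break
    raise last_error

# A write that fails twice and then succeeds completes within three tries:
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("target not ready")
    return "written"

print(with_retries(flaky_write))
```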