PowerExchange for Amazon S3 User Guide for PowerCenter

Working with Multiple Files

You can read from multiple Amazon S3 sources and write to a single target.
To read multiple files, all files must be available in the same Amazon S3 bucket. When you want to read from multiple sources in the Amazon S3 bucket, you must create a .manifest file that lists all the source files with their absolute paths or directory paths. You must specify the .manifest file name in the following format:
<file_name>.manifest
For example, the .manifest file lists the source files in the following format:
{
  "fileLocations": [
    {
      "URIs": [
        "dir1/dir2/file_1.csv",
        "dir1/dir2/dir4/file_2.csv",
        "dirA/dirB/file_3.csv",
        "dirA/dirB/file_4.csv"
      ]
    },
    {
      "URIPrefixes": [
        "dir1/dir2/",
        "dir1/dir2/"
      ]
    }
  ],
  "settings": {
    "stopOnFail": "true"
  }
}
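The manifest layout shown above can also be generated programmatically. The following Python sketch is illustrative only and is not part of PowerExchange: the helper names build_manifest and write_manifest are hypothetical, and the sketch assumes that the stopOnFail value is stored as the string "true" or "false", as in the example.

```python
import json


def build_manifest(uris, uri_prefixes, stop_on_fail=True):
    """Return a dict that follows the manifest layout from the example."""
    return {
        "fileLocations": [
            {"URIs": list(uris)},
            {"URIPrefixes": list(uri_prefixes)},
        ],
        # Assumption: the flag is written as a string, as in the guide's example.
        "settings": {"stopOnFail": "true" if stop_on_fail else "false"},
    }


def write_manifest(path, manifest):
    """Write the manifest as JSON; the file name must end with .manifest."""
    if not path.endswith(".manifest"):
        raise ValueError("manifest file name must use the <file_name>.manifest format")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)


manifest = build_manifest(
    uris=["dir1/dir2/file_1.csv", "dirA/dirB/file_3.csv"],
    uri_prefixes=["dir1/dir2/"],
)
manifest_text = json.dumps(manifest, indent=2)
```

Call write_manifest("sources.manifest", manifest) to save the result locally before you upload it to the bucket. The file name sources.manifest is a placeholder.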
You can configure the stopOnFail property to display error messages while reading multiple files. Set the value to true if you want the PowerCenter Integration Service to display error messages when the read operation fails for any of the source files. If you set the value to false, the error messages appear only in the session log. The PowerCenter Integration Service skips the file that generated the error and continues to read the other files.
The Data Preview tab displays the data of the first file available in the URIs section of the .manifest file. If the URIs section is empty, the tab displays the first file in the folder specified in URIPrefixes.
You can specify an asterisk (*) wildcard in the file name to fetch files from the Amazon S3 bucket. Use the wildcard to fetch all the files or only the files that match a name pattern. Specify the wildcard character in one of the following formats:
abc*.txt
abc.*
For example, if you specify result*.txt, all files whose names start with the term result and end with the .txt file extension are read. If you specify result.*, all files whose names start with the term result are read regardless of the extension.
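The two patterns can be tried against a list of sample file names. The following Python sketch only illustrates the pattern semantics and does not reproduce the connector's matching logic: the helper names wildcard_to_regex and match_names are hypothetical, and the sketch assumes that the asterisk stands for any run of characters while every other character is matched literally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern in which only * is special into a compiled regex."""
    # Escape every character, then turn each escaped asterisk into "any characters".
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def match_names(pattern, names):
    """Return the names that match the whole pattern, in their original order."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["result_1.txt", "results.txt", "result.csv", "result.txt", "summary.txt"]

# Names that start with "result" and end with ".txt".
txt_matches = match_names("result*.txt", names)

# Names of the form "result.<any extension>".
any_extension_matches = match_names("result.*", names)
```

Under these assumptions, result.* matches result.csv and result.txt. The guide does not state whether the connector also matches a name such as result_1.csv with this pattern, so verify that case against your own bucket.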
Use the wildcard character to specify files from a single folder. For example,
{
  "fileLocations": [
    {
      "URIs": [
        "dir1/dir2/file_1.csv",
        "dir1/dir2/dir4/file_2.csv"
      ]
    },
    {
      "URIPrefixes": [
        "dir1/dir2/",
        "dir1/dir2/"
      ]
    },
    {
      "WildcardURIs": [
        "multiread_wildcard/file_1/*.csv"
      ]
    }
  ],
  "settings": {
    "stopOnFail": "true"
  }
}
You cannot use the wildcard character to specify folder names. For example, the following entries are not valid:
{
  "WildcardURIs": [
    "multiread_wildcard/dir1*/",
    "multiread_wildcard/*/"
  ]
}
PowerExchange for Amazon S3 supports only the asterisk (*) wildcard character.
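The wildcard restrictions can be checked before you upload a .manifest file. The following Python sketch is a hypothetical pre-check and is not part of PowerExchange: the function name is_valid_wildcard_uri is invented for illustration, and the sketch assumes that the characters ?, [, and ] in a file name are intended as wildcards from other tools, which the connector does not support.

```python
def is_valid_wildcard_uri(uri):
    """Return True if the asterisk appears only in the file name part of the URI."""
    folder, _, file_name = uri.rpartition("/")
    if "*" in folder:
        # The wildcard is used in a folder name, which is not allowed.
        return False
    if not file_name:
        # The entry ends with "/", so it names a folder and not a file pattern.
        return False
    # Assumption: treat other common wildcard characters as unsupported.
    return not any(char in file_name for char in "?[]")


entries = [
    "multiread_wildcard/file_1/*.csv",
    "multiread_wildcard/dir1*/",
    "multiread_wildcard/*/",
]
checked = {entry: is_valid_wildcard_uri(entry) for entry in entries}
```

Here only the first entry passes the check. The other two entries are the folder-name patterns that the guide lists as not allowed.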