S3 get list of files in folder

Operation Name

Get File/Folder List

Function overview

Gets a list of files/folders on Amazon S3.

Data Model

The data model of this component is the XML type.

Properties

For information about using variables, refer to "variables".
Basic settings
Item name | Required / Optional | Use of Variables | Description | Remarks
Name | Required | Not available | Enter the name to display on the script canvas.
Required settings
Item name | Required / Optional | Use of Variables | Description | Remarks
Destination | Required | Not available | Select a global resource.
  • [Add]:
    Adds a new global resource.
  • [Edit list]:
    Global resource settings can be edited in "Edit Resource list".
Bucket name | Required | Available | Specify the bucket.
  • If the bucket specified in [Bucket name] contains a large number of files, it may take time to update the list in [Folder path].
    To avoid this, refer to the "Notes" section.
Folder path | Required | Available | Enter the Amazon S3 folder path.
  • The Amazon S3 folder path must be an absolute path.
  • Characters restricted by the DataSpider File System cannot be used, except for the path separator "/".
Recursive processing | Optional | Not available | Select whether to get the files/folders in the specified folder recursively.
  • [Checked]:
    If the specified folder contains folders, gets their files/folders recursively.
  • [Not Checked]: (default)
    Gets a list of the specified folder only.
Include permissions in result | Optional | Not available | Select whether to include file access right information in the results.
  • [Checked]:
    Includes file access right information in the results.
  • [Not Checked]: (default)
    Does not include file access right information in the results.
Data processing method
Item name | Required / Optional | Use of Variables | Description | Remarks
Mass data processing | Required | Not available | Select a data processing method.
  • [Use script settings]: (default)
    Applies the mass data processing setting of the script property to the adapter.
  • [Disable]:
    Mass data processing is not performed.
  • [Enable]:
    Mass data processing is performed.
Comment
Item name | Required / Optional | Use of Variables | Description | Remarks
Comment | Optional | Not available | You can write a short description of this adapter. The description will be reflected in the specifications.

Schema

Input Schema

None.

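Output Schema

<server>
  <bucket name="" status="">
    <file etag="" name="" public="" remotepath="" size="" status="" storageclass="" type="" updated=""/>
  </bucket>
</server>
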
Element Name | Attribute Name | Description | Remarks
server | - |
bucket | - | Appears once for each retrieved bucket.
  | name | Outputs the name of the retrieved bucket.
  | status | Outputs the status of the retrieved bucket.
  • Exist: The bucket exists.
  • ErrorBucketNotFound: The specified bucket does not exist on Amazon S3. The specified [Bucket name] may be incorrect.
  • Error: The bucket could not be retrieved because an error occurred.
file | - | Appears once for each retrieved file/folder.
  | etag | Outputs the ETag of the retrieved file/folder.
  • If the file/folder information could not be retrieved successfully, the value is blank.
  | name | Outputs the name of the retrieved file/folder.
  • If status is "ErrorRemoteFolderNotFound", the value is blank.
  | public | Outputs the access right of the retrieved file/folder.
  • true: Public
  • false: Private
  • If the file/folder information could not be retrieved successfully, the value is blank.
  • If the "READ" permission is granted to the "AllUsers" group, the access right is "Public".
  • If [Include permissions in result] is [Not Checked], the value is blank.
  | remotepath | Outputs the path on Amazon S3 of the retrieved file/folder.
  | size | Outputs the size of the retrieved file/folder, in bytes.
  • If the file/folder information could not be retrieved successfully, the value is blank.
  • For a folder, "0" is output.
  | status | Outputs the status of the retrieved file/folder.
  • Exist: The file/folder exists.
  • Virtual: The folder specified in [Folder path] does not itself exist on Amazon S3, but files and/or folders exist under that path.
  • ErrorRemoteFolderNotFound: The specified [Folder path] does not exist on Amazon S3, or a file is specified.
  • Error: The file/folder information could not be retrieved because an error occurred.
  | storageclass | Outputs the storage class of the retrieved file/folder.
  • STANDARD: Standard
  • REDUCED_REDUNDANCY: RRS (Reduced Redundancy Storage)
  • STANDARD_IA: Standard_IA
  • If the file/folder information could not be retrieved successfully, the value is blank.
  | type | Outputs the type of the retrieved file/folder.
  • File: File
  • Folder: Folder
  | updated | Outputs the last updated date of the retrieved file/folder.
  • If the file/folder information could not be retrieved successfully, the value is blank.
  • Output in the format set in the [xmlfw.daterenderingformat] system property. If it has not been set, the default format "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is used.
    Example: 2007-10-16T13:15:22.738+0900
  • The time zone is the time zone of DataSpiderServer.
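
As a rough illustration of how result data in this shape might be consumed outside DataSpider, the following Python sketch parses a hand-written sample and separates normally listed entries from those whose status begins with "Error". The sample document, the bucket name, the paths, the nesting of server > bucket > file, and the helper names parse_updated and split_by_status are all assumptions made for this example; only the element and attribute names and the default date format come from the table above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data; the nesting server > bucket > file is an assumption.
SAMPLE = """\
<server>
  <bucket name="example-bucket" status="Exist">
    <file etag="abc123" name="report.csv" public="false"
          remotepath="/data/report.csv" size="2048" status="Exist"
          storageclass="STANDARD" type="File"
          updated="2007-10-16T13:15:22.738+0900"/>
    <file etag="" name="logs" public="" remotepath="/data/logs"
          size="0" status="Virtual" storageclass="" type="Folder" updated=""/>
    <file etag="" name="broken.bin" public="" remotepath="/data/broken.bin"
          size="" status="Error" storageclass="" type="File" updated=""/>
  </bucket>
</server>
"""


def parse_updated(value):
    """Parse a timestamp in the default format; blank values become None."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def split_by_status(xml_text):
    """Return (ok_entries, error_entries) as lists of attribute dicts."""
    ok, errors = [], []
    root = ET.fromstring(xml_text)
    for file_el in root.iter("file"):
        entry = dict(file_el.attrib)
        entry["updated"] = parse_updated(entry.get("updated", ""))
        # Statuses beginning with "Error" mark entries that could not be read.
        if entry.get("status", "").startswith("Error"):
            errors.append(entry)
        else:
            ok.append(entry)
    return ok, errors


ok_entries, error_entries = split_by_status(SAMPLE)
```

If [xmlfw.daterenderingformat] has been changed from the default, the format string in parse_updated would need to be adjusted to match.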

Loading schema in Mapper

Schema is loaded automatically.
For details on defining a schema, refer to "Edit Schema".

Mass data processing

Mass data processing is supported.

PSP Usage

PSP is not supported.

Available component variables

Component variable name | Description | Remarks
count | Returns the total number of retrieved folders and files.
  • The value defaults to null.
  • The sum of "folder_count" and "file_count".
folder_count | Returns the number of retrieved folders.
  • The value defaults to null.
file_count | Returns the number of retrieved files.
  • The value defaults to null.
message_category | Stores the category to which the corresponding message code belongs when an error occurs.
  • The value defaults to null.
message_code | Stores the message code corresponding to the error that occurred.
  • The value defaults to null.
message_level | Stores the severity of the message code corresponding to the error that occurred.
  • The value defaults to null.
error_type | Stores the type of the error that occurred.
  • The value defaults to null.
  • The error type is represented in the format shown below.
    Example: java.io.FileNotFoundException
  • The message may vary depending on the DataSpider Servista version.
error_message | Stores the error message for the error that occurred.
  • The value defaults to null.
  • The message may vary depending on the DataSpider Servista version.
error_trace | Stores the stack trace of the error that occurred.
  • The value defaults to null.
  • The message may vary depending on the DataSpider Servista version or the client application used.
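
The relationship between the three counters can be sketched in Python as follows. The function tally and the sample list are hypothetical, and tallying entries by their type value is only an assumption about how such numbers could be derived; the one documented fact used here is that count is the sum of folder_count and file_count.

```python
from collections import Counter


def tally(types):
    """Tally a sequence of "File"/"Folder" type values into three counters."""
    counts = Counter(types)
    file_count = counts["File"]
    folder_count = counts["Folder"]
    return {
        "file_count": file_count,
        "folder_count": folder_count,
        # count is documented as the sum of folder_count and file_count.
        "count": folder_count + file_count,
    }


result = tally(["File", "Folder", "File"])
```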

Specification limitations

None.

Main exceptions

Exception name | Causes | Solution
ResourceNotFoundException
Resource definition could not be found. Name: [] | [Destination] is not specified. | Specify [Destination].
ResourceNotFoundException
Resource definition could not be found. Name: [] | The resource definition selected in [Destination] is not found. | Check the global resource specified in [Destination].
InvalidPropertyConfigurationException
<property name> is not specified. | [<property name>] is not specified. | Specify [<property name>].
Status Code: 403, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: InvalidAccessKeyId, AWS Error Message: The AWS Access Key Id you provided does not exist in our records., S3 Extended Request ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | [Access Key ID] of the global resource specified in [Destination] is incorrect. | Check the settings of the global resource specified in [Destination].
Status Code: 403, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | [Secret Access Key] of the global resource specified in [Destination] is incorrect. | Check the settings of the global resource specified in [Destination].
com.amazonaws.SdkClientException
Unable to execute HTTP request: | [Endpoint] of the global resource specified in [Destination] is incorrect. | Check the settings of the global resource specified in [Destination].
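
When triaging these messages from a log, the mapping in the table above can be expressed as a small lookup. The following Python sketch is only an illustration: the function name setting_to_check, the regular expression, and the shortened sample message are assumptions, and the message text may differ between versions.

```python
import re

# Global resource setting to review for each AWS error code in the table above.
SETTING_BY_ERROR_CODE = {
    "InvalidAccessKeyId": "Access Key ID",
    "SignatureDoesNotMatch": "Secret Access Key",
}

ERROR_CODE_PATTERN = re.compile(r"AWS Error Code: (\w+)")


def setting_to_check(message):
    """Return the global resource setting to review, or None if unrecognized."""
    if "Unable to execute HTTP request" in message:
        return "Endpoint"
    match = ERROR_CODE_PATTERN.search(message)
    if match is None:
        return None
    return SETTING_BY_ERROR_CODE.get(match.group(1))


# Shortened, hypothetical sample in the shape shown in the table.
sample = (
    "Status Code: 403, AWS Request ID: XXXXXXXXXXXXXXXX, "
    "AWS Error Code: SignatureDoesNotMatch, AWS Error Message: ..."
)
hint = setting_to_check(sample)
```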

Notes

  • HTTPS is used for communication between Amazon S3 and this adapter.

  • If the region in which the bucket exists differs from the endpoint setting, it takes time for the status of the bucket and its files/folders to propagate, so you may not get the latest status and the operation may fail.

  • If the bucket specified in [Bucket name] contains a large number of files, it may take time to update the list in [Folder path].
    In this case, set the following property so that items are not shown in the list.
    Key | Location | Description | Remarks
    itemmaxsize | $DATASPIDER_HOME/server/plugin/data_processing/modules/amazon_s3_adapter/META-INF/adapter.properties
    The setting procedure is as follows.
    1. Stop DataSpiderServer.
    2. Open the adapter.properties file, add "itemmaxsize=0", and save it.
    3. Start DataSpiderServer.
    • If you set this property, enter the folder path in [Folder path] directly.

  • If an error unrelated to the connection (an error not listed in "Main exceptions") occurs during execution, the status attribute of the file element for the affected file/folder is set to [Error], the element is output as result data, and processing continues.
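
The edit in step 2 of the itemmaxsize procedure in the notes above could also be scripted. The following Python sketch only transforms the text of a properties file and leaves reading and writing the actual file, with DataSpiderServer stopped, to the caller. The function name set_property and the sample content (including some.other.key) are hypothetical and are not part of the adapter.

```python
def set_property(text, key, value):
    """Return properties text with key set to value, replacing or appending."""
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comment lines untouched.
        if stripped.startswith(("#", "!")):
            continue
        # Match "key=..." or "key = ..." entries.
        if "=" in stripped and stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # The key was not present, so add it at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical existing file content.
original = "# adapter settings\nsome.other.key=1\n"
updated = set_property(original, "itemmaxsize", 0)
```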

How do you get all the files from a folder in the Amazon S3 bucket?

To list all files in a folder of an S3 bucket, use the AWS CLI s3 ls command, passing the full path to the folder and the --recursive option.
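
As a minimal sketch, the command line for such a listing can be assembled in Python like this. The bucket name, the folder, and the helper build_ls_command are placeholders chosen for this example, and the sketch only builds the argument list without running it.

```python
import shlex


def build_ls_command(bucket, folder):
    """Build the argument list for a recursive `aws s3 ls` on a folder."""
    prefix = folder.strip("/")
    # A trailing slash restricts the listing to keys under the folder.
    uri = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
    return ["aws", "s3", "ls", uri, "--recursive"]


command = build_ls_command("example-bucket", "/data/reports")
# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

On a machine where the AWS CLI is installed and configured, the list could be passed to subprocess.run(command, check=True) to execute it.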

How do I see how many files are in an S3 bucket?

Go to AWS Billing, then reports, then AWS Usage reports. Select Amazon Simple Storage Service, then Operation StandardStorage. Then you can download a CSV file that includes a UsageType of StorageObjectCount that lists the item count for each bucket.

How do I view files in an S3 bucket?

In AWS Explorer, expand the Amazon S3 node, and double-click a bucket or open the context (right-click) menu for the bucket and choose Browse. The Browse view shows the contents of the bucket. To upload from this view, choose Upload File or Upload Folder, navigate to the files in the File-Open dialog box, select them, and then choose Open.

How do I extract files from an S3 bucket?

In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it. The procedure for saving the object depends on the browser and operating system that you are using.