File Component

The File Component in DataOps Suite is the initial step in data extraction after configuring the file-based data source. It is used to extract data from supported file formats such as CSV, XML, AVRO, JSON, PARQUET, and COBOL copybooks, and store the extracted data as a dataset for further use in workflows. Once the data is extracted through the File Component, users can proceed to perform various operations such as data processing, validation checks, and reconciliation.
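Conceptually, the extraction step turns a raw file into a row-based dataset that downstream components can consume. The following is an illustrative Python sketch of that idea, not the Suite's actual implementation; the sample content and function name are hypothetical.

```python
# Illustrative sketch (not the Suite's implementation): parse a delimited
# file's contents into an in-memory "dataset" of row dictionaries, the way
# the File component conceptually prepares data for later workflow steps.
import csv
import io

# Hypothetical sample content standing in for a configured CSV file source.
raw = "id,name,amount\n1,Alice,10.5\n2,Bob,7.25\n"

def extract_dataset(text):
    """Parse delimited text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

dataset = extract_dataset(raw)
```

Once extracted in this row-based form, the data can be passed on to processing, validation, or reconciliation steps.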

Prerequisites

Getting Started

After establishing a File source connection in DataOps Suite, create a new Dataflow or open an existing one, add the File component, and then follow the steps below:

The "File" component wizard opens with the Properties page.

File Component Wizard Details

To create or edit a File Component, navigate through the File Component wizard: set up the Properties and File tabs, and then run the component.

Properties

This is the first step in configuring the File component. Here, you define the basic settings for how the component should behave.

A sample screenshot of the "File" component is shown below.


File

The "File" tab in the File Component is where users configure the details needed to read data from the source file.

Folder/File Details

The user must either select a file from storage (using the folder icon) or enter the file path manually. This is the source file to be read. 

For a better file browsing experience, a File Manager has been implemented to support sub-folders and fetch data files or folders from S3, Azure Data Lake Storage (ADLS), Shared, and Local file locations.

The user can upload any file from the local machine to the S3, ADLS, Local, or Shared file location using the Browse file button. Select the folder or file to fetch the data from, and then click the OK button.

A sample screenshot of the File Manager is shown below:


For the Files in the Shared Folder data source connection, it is possible to execute an entire folder. A folder can contain many files, so for reliable output all files should share the same structure. This functionality allows you to efficiently integrate data from multiple files within a single shared folder without needing to specify each file individually.

Example: Let us consider a shared-folder CSV data source connection (e.g., CSV_local) that contains a folder named "Automation_folder".


The folder “Automation_folder” is selected for the execution as shown below.
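The folder execution described above can be sketched as follows. This is an illustrative Python example, not the Suite's implementation; the folder and file names are hypothetical, and the key assumption is that every file in the folder shares the same column layout.

```python
# Illustrative sketch: executing a folder instead of a single file. All CSVs
# in the folder are assumed to have identical columns, so their rows can be
# appended into one combined dataset.
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for the shared "Automation_folder".
folder = Path(tempfile.mkdtemp()) / "Automation_folder"
folder.mkdir()
(folder / "part1.csv").write_text("id,name\n1,Alice\n")
(folder / "part2.csv").write_text("id,name\n2,Bob\n")

def read_folder(path):
    """Read every CSV in the folder and append its rows into one dataset."""
    rows = []
    for file in sorted(path.glob("*.csv")):
        with file.open(newline="") as fh:
            rows.extend(csv.DictReader(fh))
    return rows

combined = read_folder(folder)
```

If the files do not share the same structure, the combined rows would have mismatched columns, which is why a consistent layout matters for folder execution.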


Handling Duplicate File Uploads

When you upload a file to the File Manager, the system checks if a file with the same name already exists in the selected data source location.

If a duplicate is detected, a warning message will appear:

"File Already Exists – Do you want to overwrite the existing file?"


You will then have two options:
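The duplicate check above can be sketched in Python as follows. This is an illustration only; the folder, file name, and function are hypothetical stand-ins for the File Manager's behavior.

```python
# Illustrative sketch of the duplicate-file check: before saving an upload
# into the target location, test whether a same-named file already exists
# and only replace it when overwrite is confirmed.
import tempfile
from pathlib import Path

# Hypothetical target location that already holds a file named data.csv.
target_dir = Path(tempfile.mkdtemp())
existing = target_dir / "data.csv"
existing.write_text("old contents")

def upload(source_text, destination, overwrite=False):
    """Write the upload unless a same-named file exists and overwrite is False."""
    if destination.exists() and not overwrite:
        return "kept existing file"
    destination.write_text(source_text)
    return "uploaded"

first = upload("new contents", existing, overwrite=False)   # declined
second = upload("new contents", existing, overwrite=True)   # confirmed
```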

File Options

The read and write options for the selected file source are displayed. By default, you will see the options used while creating the data source connection. If there is a need to customize the options, use the add, edit, or delete buttons to add new options or modify the existing ones.
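The effect of read options can be sketched as follows. This illustrative Python example treats the options as a plain dictionary; the option names shown (delimiter, header) are hypothetical stand-ins for whatever the data source connection actually defines.

```python
# Illustrative sketch: read options such as the delimiter and header handling
# change how the same raw file content is parsed into rows.
import csv
import io

# Hypothetical options, mimicking those set on a data source connection.
options = {"delimiter": "|", "header": True}
raw = "id|name\n1|Alice\n"

def read_with_options(text, opts):
    """Parse delimited text using the supplied read options."""
    reader = csv.reader(io.StringIO(text), delimiter=opts["delimiter"])
    rows = list(reader)
    if opts["header"]:
        header, rows = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in rows]
    return rows

rows = read_with_options(raw, options)
```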

Encode

Displays the encoding format if the file source connection was created with a specific character encoding.
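Why the encoding matters can be shown with a short, illustrative Python example: the same bytes only decode into the intended characters when read with the encoding the file was written in. The file name and the latin-1 encoding here are hypothetical.

```python
# Illustrative sketch: a file written in one encoding must be read back with
# that same encoding, or non-ASCII characters will be garbled or rejected.
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "latin.csv"
path.write_bytes("id,name\n1,Müller\n".encode("latin-1"))

# Reading with the matching encoding recovers the original text.
decoded = path.read_text(encoding="latin-1")
```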

Result

On the Result page, you can execute the File component and preview the results.

Run a Component 

To run a component, perform any one of the following operations:


When the component runs successfully, the status will be displayed as "Completed".

To preview the results, navigate to the following tabs:

Preview

This tab displays the data of the file. The user can customize the number of rows to display in the output. The default count is 50.

Schema

This tab displays the columns and their data types used in the query. The Download Schema icon allows users to download the schema information of the queried data in CSV format. This is particularly useful for understanding the structure and data types of the columns in the result set, and the downloaded schema can then be used in any further analysis that requires knowledge of the data structure. The user can also filter the columns and their data types for a quick search.
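The kind of output the schema download produces can be sketched as follows. This is an illustrative Python example; the exact columns of the real CSV export are not documented here, so the "column,data_type" layout is an assumption.

```python
# Illustrative sketch: emit a dataset's column names and inferred types as
# CSV text, similar in spirit to what a schema download provides.
import csv
import io

# Hypothetical dataset row used to infer column types.
dataset = [{"id": 1, "name": "Alice", "amount": 10.5}]

def schema_csv(rows):
    """Write 'column,data_type' lines inferred from the first row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["column", "data_type"])
    for col, value in rows[0].items():
        writer.writerow([col, type(value).__name__])
    return out.getvalue()

schema = schema_csv(dataset)
```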

Statements

This tab displays the SQL and Spark query statements. While the component is running, the statements show the status "Queued". Once a statement runs successfully, its status changes to "OK".


A sample screenshot of the output of the file component is shown below.



See Also

Dataflow actions are the fundamental operations the user can perform on dataflows, such as adding components, viewing datasets, switching to diagram view, downloading the full dataset, and accessing more options.

For more information, see the Dataflow Actions topic.

A sample screenshot of the various "Dataflow Actions" (highlighted) is shown below.


© Datagaps. All rights reserved.