📋Data handling in Coretex

In this section we will go through some ways of managing data in Coretex, namely:

  1. Uploading Datasets,

  2. Downloading Datasets,

  3. Combining Datasets,

  4. Duplicating Datasets,

  5. Converting image datasets into Coretex Image Datasets.

Uploading Datasets

There are three different ways to upload a Dataset to Coretex.ai:

  • Web UI,

  • Coretex CLI,

  • Coretex python library.

Web UI

To upload a Dataset using the Web UI, open the Datasets page and click the "New Dataset" button.

After clicking the "New Dataset" button, the following UI is displayed:

The name of the Dataset must be entered, and it has to be unique. If you are already inside a Project, the Project field will be pre-populated with that Project; otherwise you will need to enter the Project too. Type will be pre-populated with the Type of the selected Project, and it cannot be changed.

You can add Samples to a Dataset in two different ways:

  • Drag and drop file(s) into the box outlined with dashed lines,

  • Click on the "Upload File" button. This opens the default system UI for selecting file(s).

The file types which can be selected depend on the Type of the selected Project.

Once file(s) are selected and the other parameters are entered correctly, the "Create Dataset" button is enabled. Clicking it starts the process of creating the Dataset and uploading the file(s). Do not leave the page after clicking "Create Dataset": the Dataset would still be created, but it would be invalid because the file upload would be interrupted.

Coretex CLI

To upload a dataset using Coretex CLI there is one requirement - properly installed and configured Coretex CLI.

Uploading a dataset using the CLI requires three (optionally four) parameters:

  • Dataset name - Must be unique inside a project

  • Project ID

  • Project Type ID - must match the type of the Project which is specified using parameter "Project ID"

  • Optional: Path to dataset directory - if not specified, a prompt will ask whether you want to use the current directory as the dataset directory

Once you have those parameters, uploading a dataset can be done by running the following command:

coretex dataset create "Dataset Name" PROJECT_ID PROJECT_TYPE_ID [PATH_TO_DATASET_FOLDER]
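
When the upload is scripted, the command line can be assembled programmatically. The following is a minimal sketch, assuming the positional order listed above (Dataset name, Project ID, Project Type ID, optional path). The helper name `buildCreateCommand` is hypothetical and is not part of Coretex CLI or the Coretex Python library.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def buildCreateCommand(
    datasetName: str,
    projectId: int,
    projectTypeId: int,
    datasetPath: Optional[Path] = None
) -> List[str]:
    """Builds the argument list for `coretex dataset create` (hypothetical helper)."""

    command = [
        "coretex", "dataset", "create",
        datasetName, str(projectId), str(projectTypeId)
    ]

    # The dataset directory is optional; when it is left out the CLI
    # asks whether the current directory should be used instead
    if datasetPath is not None:
        command.append(str(datasetPath))

    return command


command = buildCreateCommand("Dataset Name", 123, 1, Path("path/to/dataset"))

# shlex.join quotes arguments containing spaces, so the printed
# line can be pasted into a shell as-is
print(shlex.join(command))
```

Once Coretex CLI is installed and configured, the resulting list can be passed to `subprocess.run(command, check = True)`.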

When uploading large datasets, it is a good idea to use Coretex CLI instead of the Web UI.

Coretex Python library

To upload a Dataset using Coretex Python library there are two requirements - configured Coretex CLI and properly installed Python 3.8 or newer.

The Coretex Python library can be installed via pip by running the following command:

pip install coretex

The following script is an example of how a Dataset can be created and its Samples uploaded to Coretex.ai.

from coretex import ComputerVisionDataset, ComputerVisionSample
from coretex.networking import networkManager


# First step - Authenticate with your credentials
response = networkManager.authenticate("your@user.name", "yourPassword123")
if response.hasFailed():
    raise RuntimeError(">> Failed to authenticate")

# Second step - Create Dataset using desired Dataset type
# In this example we are using ComputerVisionDataset
dataset = ComputerVisionDataset.createDataset(
    name = "Dataset Name",
    projectId = 123
)

if dataset is None:
    raise RuntimeError(">> Failed to create Dataset")

# Third step - Add Samples to Dataset using Sample type
# which matches the type of the created Dataset
# In this example we are using ComputerVisionSample
sample = ComputerVisionSample.createComputerVisionSample(
    datasetId = dataset.id,
    imagePath = "path/to/image.png"
)

if sample is None:
    raise RuntimeError(">> Failed to create Sample")

For a more detailed description of the code, visit the Coretex Python library documentation.
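
The script above uploads a single Sample. The following is a minimal sketch of how the same call could be repeated for every image in a folder. The helper names `findImages` and `uploadImages` and the list of extensions are assumptions made for this example; only `ComputerVisionSample.createComputerVisionSample` comes from the script above. Authentication must be done first, as in the first step of that script.

```python
from pathlib import Path
from typing import List

# Which extensions count as images is an assumption of this sketch
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def findImages(directory: Path) -> List[Path]:
    """Returns the image files directly inside `directory`, sorted by path."""

    return sorted(
        path for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def uploadImages(datasetId: int, directory: Path) -> List[Path]:
    """Uploads every image as a Sample and returns the paths which failed."""

    # Imported here so that findImages works without coretex installed
    from coretex import ComputerVisionSample

    failed: List[Path] = []
    for imagePath in findImages(directory):
        sample = ComputerVisionSample.createComputerVisionSample(
            datasetId = datasetId,
            imagePath = str(imagePath)
        )

        # As in the script above, None signals a failed upload
        if sample is None:
            failed.append(imagePath)

    return failed
```

Collecting the failed paths instead of raising on the first error lets the remaining images upload, and the failed ones can be retried afterwards.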

Downloading Datasets

There are three different ways to download a Dataset from Coretex.ai:

  • Web UI,

  • Coretex CLI,

  • Coretex python library.

Web UI

To download a dataset from Coretex.ai using the Web UI, first open the Dataset details screen for the Dataset you want to download. For a Dataset of type Other it should look like this:

The left panel shows a list of the Samples contained in the Dataset. Only the top-level folders represent Samples. To download a Sample, press its download icon. The Web UI can only download a Dataset one Sample at a time.

Only Datasets of type Other can be downloaded from Web UI. To download Datasets of any other type please use either CLI or Coretex python library.

Coretex CLI

To download a dataset using Coretex CLI there is one requirement - properly installed and configured Coretex CLI.

Downloading a dataset using CLI requires only one parameter - Dataset ID.

Coretex CLI command for downloading the Dataset is:

coretex dataset download DATASET_ID

Downloading a Dataset using Coretex CLI stores the Dataset inside the "storage path" which was set during the configuration of the CLI. The default path for Coretex CLI storage is "USER_HOME_DIRECTORY/.coretex".
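
To check where downloaded Datasets end up, the default location can be resolved in a few lines. This is a minimal sketch: the helper name `resolveStoragePath` is hypothetical, and it only mirrors the default described above. It does not read the actual Coretex CLI configuration.

```python
from pathlib import Path
from typing import Optional


def resolveStoragePath(configuredPath: Optional[str] = None) -> Path:
    """Returns the configured storage path, or the documented default."""

    if configuredPath:
        return Path(configuredPath).expanduser()

    # Default described above: USER_HOME_DIRECTORY/.coretex
    return Path.home() / ".coretex"


storagePath = resolveStoragePath()
print(storagePath, "exists" if storagePath.exists() else "does not exist yet")
```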

Coretex Python library

To download a Dataset using Coretex Python library there are two requirements - configured Coretex CLI and properly installed Python 3.8 or newer.

The Coretex Python library can be installed via pip by running the following command:

pip install coretex

The following script is an example of how a Dataset can be downloaded from Coretex.ai.

from coretex import NetworkDataset
from coretex.networking import networkManager


# First step - Authenticate with your credentials
response = networkManager.authenticate("your@user.name", "yourPassword123")
if response.hasFailed():
    raise RuntimeError(">> Failed to authenticate")

# Second step - Fetch the Dataset from Coretex
dataset: NetworkDataset = NetworkDataset.fetchById(objectId = 1)

# Third step - Download the Dataset from Coretex
dataset.download()

Downloading Dataset using Coretex python library stores the Dataset inside the "storage path" which was configured during the configuration of the CLI. The default path for Coretex CLI storage is "USER_HOME_DIRECTORY/.coretex".

Once a Dataset has been downloaded using the Coretex Python library, later download attempts first check the "storage path" for that Dataset. If a local copy exists and is not corrupted, the Dataset is not downloaded again.
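
The general "download only if missing or corrupted" pattern can be sketched as follows. This is an illustration of the idea, not the Coretex implementation: the SHA-256 comparison is an assumed integrity check, and `sha256Of`, `ensureDownloaded` and the `download` callback are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256Of(path: Path) -> str:
    """Returns the hex SHA-256 digest of a file, reading it in chunks."""

    digest = hashlib.sha256()
    with path.open("rb") as file:
        for chunk in iter(lambda: file.read(65536), b""):
            digest.update(chunk)

    return digest.hexdigest()


def ensureDownloaded(
    target: Path,
    expectedHash: str,
    download: Callable[[Path], None]
) -> bool:
    """Downloads `target` only if it is missing or corrupted.

    Returns True if a download was performed, False if the
    existing local copy was reused.
    """

    # A valid local copy exists, so there is nothing to do
    if target.exists() and sha256Of(target) == expectedHash:
        return False

    target.parent.mkdir(parents = True, exist_ok = True)
    download(target)
    return True
```

Here `download` stands in for whatever actually fetches the file; it receives the target path and is expected to write the file there.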

For a more detailed description of the code, visit the Coretex Python library documentation.

Combining Datasets

Dataset combine is a feature used for merging two or more Datasets into a single resulting Dataset. To combine Datasets, first select two or more Datasets on the Datasets page. Doing so changes the "New Dataset" button into a "Combine" button.

Only the Datasets belonging to the same Project can be combined.

After clicking on the "Combine" button the following UI is displayed:

You are required to enter the name of the Dataset which will be created as a combination of selected Datasets. Project and Type are pre-populated and their values cannot be changed.

There are two ways in which the Datasets can be combined:

  • Soft Copy: creates a new Dataset and links all Samples from the selected Datasets to the created Dataset. It requires almost no additional storage and is almost instantaneous, but it locks the resulting combined Dataset as well as the selected Datasets. This prevents any future editing of these Datasets, but in return it minimizes space usage.

  • Hard Copy: creates a new Dataset and duplicates all Samples from the selected Datasets into the created Dataset. Duplicating all Samples requires storage equal to the original size of the selected Datasets, and it takes some time to finish (depending on the size of the selected Datasets). Hard Copy locks neither the resulting combined Dataset nor the original Datasets, so they can be edited afterwards.

If you performed a Hard Copy of the selected Datasets, changing one of the original Datasets does not change the resulting combined Dataset, and vice versa.
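
The difference between the two modes can be illustrated with a small in-memory model. This is a conceptual sketch only: the `Sample` and `Dataset` classes below are simplified stand-ins defined for this example, not the Coretex classes, and `softCopy` / `hardCopy` are hypothetical names.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    labels: List[str] = field(default_factory = list)


@dataclass
class Dataset:
    name: str
    samples: List[Sample] = field(default_factory = list)
    locked: bool = False


def softCopy(name: str, sources: List[Dataset]) -> Dataset:
    """Links the existing Samples into a new Dataset and locks all Datasets involved."""

    linked = [sample for source in sources for sample in source.samples]
    for source in sources:
        source.locked = True

    return Dataset(name, linked, locked = True)


def hardCopy(name: str, sources: List[Dataset]) -> Dataset:
    """Duplicates every Sample, so the result is independent and stays editable."""

    duplicated = [copy.deepcopy(sample) for source in sources for sample in source.samples]
    return Dataset(name, duplicated)


first = Dataset("first", [Sample("a")])
second = Dataset("second", [Sample("b")])

hard = hardCopy("hard", [first, second])
hard.samples[0].labels.append("cat")  # does not affect first.samples[0]

soft = softCopy("soft", [first, second])

print(soft.samples[0] is first.samples[0])  # True: the same Sample is shared
print(hard.samples[0] is first.samples[0])  # False: an independent duplicate
```

Because a Soft Copy shares Samples between Datasets, editing one would silently change the others, which is why the Datasets involved are locked in this model.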

Duplicating Datasets

Dataset duplicate is a feature used to create a copy of a Dataset. It can be used to duplicate a Dataset into a task other than the one it currently resides in. If the Dataset you are duplicating is locked, duplicating it produces an unlocked copy which can be edited further.

To duplicate a Dataset you first need to select a Dataset on the Datasets page. Doing so will change the "New Dataset" button into a "Duplicate" button.

After clicking on the "Duplicate" button the following UI is displayed:

You are required to enter the name of the Dataset which will be created as a result of duplicating the selected Dataset. Project and Type are pre-populated and their values cannot be changed.

Duplicating a Dataset creates a copy of the selected Dataset as well as its Samples. The additional storage occupied by the duplicate is equal to the size of the Dataset selected for duplication. Execution time scales with the size of the Dataset you are duplicating, i.e. bigger Datasets take more time to duplicate.
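
If the Dataset is already available locally, the storage a duplicate would need can be estimated by summing the file sizes in its directory. This is a minimal sketch with a hypothetical helper name; it assumes the size on Coretex.ai roughly matches the size of the local files, which may not hold exactly.

```python
from pathlib import Path


def directorySize(directory: Path) -> int:
    """Returns the total size in bytes of all files under `directory`."""

    return sum(
        path.stat().st_size
        for path in directory.rglob("*")
        if path.is_file()
    )
```

For example, `directorySize(Path("path/to/dataset")) / 1024 ** 2` gives the estimate in mebibytes.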

Converting image datasets into Coretex Image Datasets

To convert an existing image dataset into a Coretex Image Dataset you will need to use Coretex python library.

There are two requirements for using Coretex Python library - properly configured Coretex CLI and properly installed Python 3.8 or newer.

The Coretex Python library can be installed via pip by running the following command:

pip install coretex

The supported formats for conversion of the image datasets into Coretex Image Datasets are:

  • COCO

  • YOLO

  • CreateML

  • Pascal VOC

  • LabelMe

  • Cityscapes

To convert any of these image datasets a simple Python script using Coretex python library can be used:

from coretex import convert, ConverterProcessorType
from coretex.networking import networkManager


# First step - Authenticate with your credentials
response = networkManager.authenticate("your@user.name", "yourPassword123")
if response.hasFailed():
    raise RuntimeError(">> Failed to authenticate")

# Second step - Convert the dataset to Coretex Dataset
dataset = convert(
    type = ConverterProcessorType.coco,
    datasetName = "Converted COCO dataset",
    projectId = 1,
    datasetPath = "path/to/your/coco/dataset"
)

if dataset is None:
    raise RuntimeError(">> Failed to convert coco dataset to Coretex Dataset")

Changing the value of the "type" parameter changes the source format from which your dataset is converted into a Coretex Dataset.
