Data handling in Coretex
In this section we will go through some ways of managing data in Coretex, namely:
Uploading Datasets,
Downloading Datasets,
Combining Datasets,
Duplicating Datasets,
Converting image datasets into Coretex Image Datasets.
There are three different ways to upload a Dataset to Coretex.ai:
Web UI,
Coretex CLI,
Coretex python library.
To upload a Dataset using the Web UI, open the Datasets page, then click the "New Dataset" button.
After clicking the "New Dataset" button, the following UI is displayed:
The name of the Dataset must be entered, and it has to be unique. If you are already inside a Project, the Project field is pre-populated with that Project; otherwise you will need to enter the Project too. Type is pre-populated with the Type of the selected Project and cannot be changed.
You can add Samples to the Dataset in two different ways:
Drag and drop one or more files into the box outlined with dashed lines,
Click the "Upload File" button. This opens the default system UI for selecting files.
The file types that can be selected depend on the Type of the selected Project.
Once files are selected and the other parameters are entered correctly, the "Create Dataset" button is enabled. Clicking it starts creating the Dataset and uploading the files. Do not leave the page after clicking "Create Dataset": the Dataset will still be created, but it will be invalid because the file upload is interrupted.
To upload a dataset using Coretex CLI there is one requirement: a properly installed and configured Coretex CLI.
Uploading a dataset using the CLI requires three (optionally four) parameters:
Dataset name - must be unique inside a Project
Project ID
Project Type ID - must match the Type of the Project specified by the "Project ID" parameter
Optional: Path to dataset directory - if not specified, a prompt is displayed asking whether you want to use the current directory as the dataset directory
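The parameter list above can be captured in a small pre-flight check that runs before the CLI is invoked. This is only an illustrative sketch: `UploadParams` and its methods are hypothetical helpers and not part of Coretex CLI, the positivity check on the IDs is an assumption, and the real CLI asks for confirmation before using the current directory instead of falling back silently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class UploadParams:
    """Hypothetical container for the upload parameters listed above."""

    dataset_name: str
    project_id: int
    project_type_id: int
    dataset_path: Optional[Path] = None  # the optional fourth parameter

    def validate(self) -> None:
        if not self.dataset_name.strip():
            raise ValueError("Dataset name must not be empty")
        # Assumption: IDs are positive integers
        if self.project_id <= 0 or self.project_type_id <= 0:
            raise ValueError("Project ID and Project Type ID must be positive")

    def resolve_dataset_path(self) -> Path:
        # Assumption: use the current directory when no path is given
        path = self.dataset_path if self.dataset_path is not None else Path.cwd()
        if not path.is_dir():
            raise NotADirectoryError(str(path))
        return path
```

Uniqueness of the Dataset name inside the Project can only be verified by Coretex.ai itself, so the sketch does not attempt it.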
Once you have those parameters, uploading a dataset can be done by running the following command:
When uploading large datasets it is a good idea to use Coretex CLI instead of the Web UI.
To upload a Dataset using Coretex Python library there are two requirements - configured Coretex CLI and properly installed Python 3.8 or newer.
The Coretex Python library can be installed via pip by running the following command:
The following script is an example of how a Dataset can be created and its Samples uploaded to Coretex.ai.
For a more detailed description of the code visit Coretex python library documentation.
There are three different ways to download a Dataset from Coretex.ai:
Web UI,
Coretex CLI,
Coretex python library.
To download a dataset from Coretex.ai using Web UI first open the Dataset details screen for the Dataset you want to download. For a Dataset of type Other it should look like this:
On the left panel you can see a list of the Samples contained in the Dataset. Only the top-level folders represent Samples. To download a Sample, press its download icon. Using the Web UI, a Dataset can only be downloaded Sample by Sample.
Only Datasets of type Other can be downloaded from Web UI. To download Datasets of any other type please use either CLI or Coretex python library.
To download a dataset using Coretex CLI there is one requirement - properly installed and configured Coretex CLI.
Downloading a dataset using CLI requires only one parameter - Dataset ID.
Coretex CLI command for downloading the Dataset is:
Downloading a Dataset using Coretex CLI stores the Dataset inside the "storage path" which was configured during the configuration of the CLI. The default path for Coretex CLI storage is "USER_HOME_DIRECTORY/.coretex".
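To locate downloaded Datasets from your own scripts, the default storage location described above can be resolved as in the following sketch. The function `storage_path` is a hypothetical helper and not part of Coretex CLI or the Coretex Python library; it only mirrors the documented default of a ".coretex" directory inside the user's home directory.

```python
from pathlib import Path
from typing import Optional


def storage_path(configured: Optional[str] = None) -> Path:
    """Return the configured storage path, or the documented default."""
    if configured:
        # A custom path chosen during CLI configuration
        return Path(configured).expanduser()
    # Documented default: USER_HOME_DIRECTORY/.coretex
    return Path.home() / ".coretex"
```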
To download a Dataset using Coretex Python library there are two requirements - configured Coretex CLI and properly installed Python 3.8 or newer.
The Coretex Python library can be installed via pip by running the following command:
The following script is an example of how a Dataset can be downloaded from Coretex.ai.
Downloading Dataset using Coretex python library stores the Dataset inside the "storage path" which was configured during the configuration of the CLI. The default path for Coretex CLI storage is "USER_HOME_DIRECTORY/.coretex".
If a Dataset has already been downloaded using the Coretex Python library, the next download attempt first checks the "storage path" for that Dataset. If the Dataset exists there and is not corrupted, it is not downloaded again.
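The "check the local copy first" behaviour described above follows a common caching pattern, sketched below. This is not the library's implementation: how Coretex detects corruption is not described here, so the SHA-256 comparison is an assumption, and `ensure_downloaded` and its `download` callback are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ensure_downloaded(
    local_file: Path,
    expected_sha256: str,
    download: Callable[[Path], None],
) -> bool:
    """Download only if the local copy is missing or corrupted.

    Returns True when a download was performed and False when the
    existing local copy was reused.
    """
    if local_file.is_file() and file_sha256(local_file) == expected_sha256:
        return False  # valid local copy, skip the download
    local_file.parent.mkdir(parents=True, exist_ok=True)
    download(local_file)  # hypothetical callback that writes the file
    return True
```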
For a more detailed description of the code visit Coretex python library documentation.
Dataset combine is a feature used for merging two or more Datasets into a single resulting Dataset. To combine Datasets, first select two or more Datasets on the Datasets page. Doing so changes the "New Dataset" button into a "Combine" button.
Only the Datasets belonging to the same Project can be combined.
After clicking on the "Combine" button the following UI is displayed:
You are required to enter the name of the Dataset which will be created as a combination of selected Datasets. Project and Type are pre-populated and their values cannot be changed.
There are two ways in which the Datasets can be combined:
Soft Copy: Creates a new Dataset and links all Samples from the selected Datasets to the created Dataset. It requires almost no additional storage and is almost instantaneous, but it locks the resulting combined Dataset as well as the selected Datasets. This prevents any future editing of these Datasets, but in return it minimizes space usage.
Hard Copy: Creates a new Dataset and duplicates all Samples from the selected Datasets into the created Dataset. Duplicating all Samples requires storage equal to the original size of the selected Datasets, and it takes some time to finish (depending on the size of the selected Datasets). Hard Copy locks neither the resulting combined Dataset nor the original Datasets, so they can be edited afterwards.
If you performed a Hard Copy of the selected Datasets, changing one of the original Datasets does not change the resulting combined Dataset, and vice versa.
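The difference between the two modes can be illustrated with a small in-memory model. This is a conceptual sketch only: the `Sample` and `Dataset` classes and the functions below are hypothetical and unrelated to the Coretex Python library; they simply mirror the linking, duplication and locking behaviour described above.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    size_bytes: int


@dataclass
class Dataset:
    name: str
    samples: List[Sample] = field(default_factory=list)
    locked: bool = False


def soft_copy(name: str, sources: List[Dataset]) -> Dataset:
    # Link the existing Sample objects: no Sample data is duplicated
    combined = Dataset(name, [s for d in sources for s in d.samples], locked=True)
    for d in sources:
        d.locked = True  # the selected Datasets are locked as well
    return combined


def hard_copy(name: str, sources: List[Dataset]) -> Dataset:
    # Duplicate every Sample: the result and the sources stay editable
    return Dataset(name, [copy.deepcopy(s) for d in sources for s in d.samples])


def additional_storage(result: Dataset, sources: List[Dataset]) -> int:
    """Bytes used by Samples in `result` that are not shared with a source."""
    shared = {id(s) for d in sources for s in d.samples}
    return sum(s.size_bytes for s in result.samples if id(s) not in shared)
```

In this model a hard copy of Datasets holding 10 and 20 bytes of Samples adds 30 bytes, and editing a Sample in the copy leaves the originals untouched, while a soft copy adds nothing but marks every involved Dataset as locked.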
Dataset duplicate is a feature used to create a copy of a Dataset. It can be used to duplicate a Dataset into another task, different from the one it currently resides in. If the Dataset you are duplicating is locked, the resulting duplicate is not locked, so it can be edited further.
To duplicate a Dataset you first need to select a Dataset on the Datasets page. Doing so will change the "New Dataset" button into a "Duplicate" button.
After clicking on the "Duplicate" button the following UI is displayed:
You are required to enter the name of the Dataset which will be created as a result of duplicating the selected Dataset. Project and Type are pre-populated and their values cannot be changed.
Duplicating a Dataset copies the selected Dataset as well as its Samples. The additional storage occupied by the duplicate is equal to the size of the Dataset selected for duplication. Execution time scales with the size of the Dataset you are duplicating, i.e. bigger Datasets take more time to duplicate.
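Because the duplicate occupies as much storage as the original, it can be useful to estimate that size up front. The sketch below assumes a local copy of the Dataset is available as a directory (for example one you are about to upload); `directory_size` is a hypothetical helper, and the size reported by Coretex.ai may differ from the local figure.

```python
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of all regular files under `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
```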
To convert an existing image dataset into a Coretex Image Dataset you will need to use Coretex python library.
There are two requirements for using Coretex Python library - properly configured Coretex CLI and properly installed Python 3.8 or newer.
The Coretex Python library can be installed via pip by running the following command:
The supported formats for conversion of the image datasets into Coretex Image Datasets are:
COCO
YOLO
CreateML
Pascal VOC
LabelMe
Cityscapes
To convert any of these image datasets a simple Python script using Coretex python library can be used:
The value of the "type" parameter determines the source format from which your dataset is converted into a Coretex Image Dataset.