DataFrames
The SDK provides helper methods for pandas DataFrames. They simplify uploading, downloading, and reading tabular datasets.
Upload a DataFrame
Upload a pandas DataFrame directly as a new dataset version:
import pandas as pd

# Create or load your DataFrame
df = pd.DataFrame({
    "feature1": [1, 2, 3, 4, 5],
    "feature2": ["a", "b", "c", "d", "e"],
    "labels": ["cat", "dog", "cat", "dog", "cat"],
    "split": ["train", "train", "train", "valid", "valid"]
})

# Upload as a new dataset
dataset_version = client.upload_df(
    df=df,
    name="my_tabular_dataset",
    split_column="split",
    label_column="labels"
)

Add a new version to an existing dataset:
dataset_version = client.upload_df(
    df=updated_df,
    dataset_id=existing_dataset.id,
    name="v2_more_data",
    split_column="split",
    label_column="labels"
)

| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | The pandas DataFrame to upload |
| dataset_id | str | ID of an existing dataset. Optional; a new dataset is created if omitted |
| name | str | Name for the dataset/version. Default: "my_dataset_version" |
| keep_index | bool | Include the DataFrame index in the upload. Default: False |
| index_label | str | Column name for the index. Default: "index" |
| separator | str | CSV separator. Default: "," |
| split_column | str | Column containing split names. Default: "split" |
| label_column | str | Column containing labels. Default: "labels" |
| multi_label | bool | Whether items can have multiple labels. Default: False |
| label_separator | str | Separator for multiple labels. Default: " " |
Returns: DatasetVersion object
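Before uploading, it can help to check that the split and label columns exist and have no missing values. The sketch below does this with plain pandas and also shows one way to build a multi-label column joined by a space, which matches the default label_separator. The helper name check_upload_columns and the tags column are hypothetical and not part of the SDK.

```python
import pandas as pd


def check_upload_columns(df, split_column="split", label_column="labels"):
    """Hypothetical pre-upload check; not part of the SDK."""
    problems = []
    for column in (split_column, label_column):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif df[column].isna().any():
            problems.append(f"empty values in column: {column}")
    return problems


df = pd.DataFrame({
    "feature1": [1, 2, 3],
    "tags": [["cat", "indoor"], ["dog"], ["cat", "outdoor"]],
    "split": ["train", "train", "valid"],
})

# Join each list of labels into one string, using a space as the separator.
df["labels"] = df["tags"].apply(" ".join)
df = df.drop(columns=["tags"])

problems = check_upload_columns(df)
print(problems)
print(df["labels"].tolist())
```

If the list of problems is empty, the frame could then be passed to upload_df with multi_label=True.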
Download a DataFrame
Download a dataset version directly as a pandas DataFrame:
df = client.download_df(
    dataset_id=my_dataset.id,
    dataset_version_id=my_version.id
)
print(df.head())

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| dataset_version_id | str | The dataset version id |
| download_folder | str | Temporary folder for download. Default: "tmp" |
| separator | str | CSV separator. Default: "," |
| **kwargs | | Additional arguments passed to pd.read_csv() |
Returns: pd.DataFrame
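The extra keyword arguments are regular pd.read_csv() options, so their effect can be tried locally first. The sketch below uses an in-memory CSV (made-up content) to show how dtype and usecols change the result; it does not call the SDK.

```python
import io

import pandas as pd

# Stand-in for a downloaded CSV file; the content is made up for illustration.
csv_text = "id,feature1,labels,split\n007,1,cat,train\n008,2,dog,valid\n"

# Without extra arguments, pandas infers the types, so "007" becomes the integer 7.
inferred = pd.read_csv(io.StringIO(csv_text), sep=",")

# dtype and usecols are regular pd.read_csv() keyword arguments.
typed = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    dtype={"id": str},
    usecols=["id", "labels", "split"],
)

print(inferred["id"].tolist())
print(typed["id"].tolist())
```

Assuming download_df forwards keyword arguments unchanged, as the parameter table states, the same dtype and usecols arguments could be passed to it directly.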
Read a local DataFrame
Read a CSV file from disk into a DataFrame:
df = client.read_df(
    folder="data",
    filename="my_data",
    extension=".csv",
    separator=","
)

| Parameter | Type | Description |
|---|---|---|
| folder | str | Folder containing the file |
| filename | str | Filename without extension |
| extension | str | File extension. Default: ".csv" |
| separator | str | CSV separator. Default: "," |
| **kwargs | | Additional arguments passed to pd.read_csv() |
Returns: pd.DataFrame
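The parameters suggest that the file path is built from folder, filename, and extension. A rough pandas-only equivalent under that assumption is sketched below. The name read_local_csv is hypothetical, and the SDK's actual implementation may differ.

```python
import os
import tempfile

import pandas as pd


def read_local_csv(folder, filename, extension=".csv", separator=",", **kwargs):
    """Hypothetical stand-in: assumes the path is <folder>/<filename><extension>."""
    path = os.path.join(folder, f"{filename}{extension}")
    return pd.read_csv(path, sep=separator, **kwargs)


# Write a small semicolon-separated file to a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    source = pd.DataFrame({"feature1": [1, 2], "labels": ["cat", "dog"]})
    source.to_csv(os.path.join(folder, "my_data.csv"), sep=";", index=False)
    loaded = read_local_csv(folder, "my_data", separator=";")

print(loaded)
```

The separator used for reading has to match the one the file was written with; otherwise pandas returns a single column containing the whole line.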
Working with Structured Datasets
For more control over structured/tabular datasets, use these methods:
Add columns to a version
Configure column names for a structured dataset version:
columns = ["feature1", "feature2", "feature3", "label"]
updated_version = client.add_columns_structured_dataset_version(
    dataset_version=my_version,
    column_names=columns,
    csv_separator=","
)

| Parameter | Type | Description |
|---|---|---|
| dataset_version | DatasetVersion | The dataset version to update |
| column_names | List[str] | List of column names |
| csv_separator | str | Separator for CSV. Default: "," |
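
When the data already lives in a DataFrame, the column list does not have to be typed by hand. The sketch below derives it from the frame and rejects names that are duplicated or contain the separator, since either would likely produce an ambiguous CSV header. The helper column_names_for is hypothetical and not part of the SDK.

```python
import pandas as pd


def column_names_for(df, csv_separator=","):
    """Hypothetical helper: return column names, rejecting ones that could break a CSV header."""
    names = [str(name) for name in df.columns]
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique")
    bad = [name for name in names if csv_separator in name]
    if bad:
        raise ValueError(f"column names contain the separator: {bad}")
    return names


df = pd.DataFrame({"feature1": [1], "feature2": ["a"], "feature3": [0.5], "label": ["x"]})
columns = column_names_for(df)
print(columns)
```

The resulting list could then be passed as column_names.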
Create a structured item
Add a single row to a structured dataset:
item_data = {
    "feature1": 10,
    "feature2": "category_a",
    "feature3": 3.14,
    "label": "positive"
}
item = client.create_structured_dataset_item(
    dataset_version=my_version,
    dataset_split=train_split,
    item=item_data
)

| Parameter | Type | Description |
|---|---|---|
| dataset_version | DatasetVersion | The dataset version |
| dataset_split | DatasetSplit | The split to add the item to |
| item | dict | Dictionary with column values |
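
To add several rows, each DataFrame row first has to become a dictionary of column values. The sketch below groups rows by a split column and converts them with DataFrame.to_dict; it only prepares the dictionaries and does not call the SDK. The column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "feature1": [10, 20, 30],
    "feature2": ["category_a", "category_b", "category_a"],
    "label": ["positive", "negative", "positive"],
    "split": ["train", "train", "valid"],
})

# Group rows by split, then turn each row into a plain dict of column values.
items_by_split = {
    split_name: group.drop(columns=["split"]).to_dict(orient="records")
    for split_name, group in df.groupby("split")
}

for split_name, items in items_by_split.items():
    for item in items:
        # Each dict could be passed as item= to client.create_structured_dataset_item,
        # together with the DatasetSplit object that matches split_name.
        print(split_name, item)
```

For larger tables, uploading the whole frame with upload_df is probably simpler than creating items one by one.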
Get a structured item
Retrieve a structured dataset item as a dictionary:
item_data = client.get_structured_dataset_item(
    dataset_id=my_dataset.id,
    dataset_version_id=my_version.id,
    item_id=item.id
)
print(item_data)
# {'feature1': '10', 'feature2': 'category_a', 'feature3': '3.14', 'label': 'positive'}

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| dataset_version_id | str | The dataset version id |
| item_id | str | The dataset item id |
Returns: dict with column names as keys
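In the example output above, numeric values come back as strings ('10', '3.14'). If that holds for your data, the values can be cast back with a small mapping from column name to type. The mapping column_types and the helper cast_item are hypothetical and not part of the SDK.

```python
# Values as shown in the example output: every value is a string.
item_data = {
    "feature1": "10",
    "feature2": "category_a",
    "feature3": "3.14",
    "label": "positive",
}

# Hypothetical mapping from column name to the type you expect.
column_types = {"feature1": int, "feature3": float}


def cast_item(item, types):
    """Convert the listed columns with the given callables; leave other values unchanged."""
    return {
        key: types[key](value) if key in types else value
        for key, value in item.items()
    }


typed_item = cast_item(item_data, column_types)
print(typed_item)
```

Columns that are missing from the mapping keep their string values.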
Complete Example
import pandas as pd
from seeme import Client

client = Client()
client.login("username", "password")

# Create training data
train_data = pd.DataFrame({
    "age": [25, 30, 35, 40, 45],
    "income": [50000, 60000, 70000, 80000, 90000],
    "category": ["A", "B", "A", "C", "B"],
    "outcome": ["yes", "no", "yes", "yes", "no"],
    "split": ["train", "train", "train", "valid", "valid"]
})

# Upload as a new dataset
version = client.upload_df(
    df=train_data,
    name="customer_classification",
    split_column="split",
    label_column="outcome"
)
print(f"Created dataset version: {version.id}")

# Later, download and continue working
df = client.download_df(
    dataset_id=version.dataset_id,
    dataset_version_id=version.id
)
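
# Add more data and upload a new version
# (additional_data holds placeholder rows; replace them with your own data)
additional_data = pd.DataFrame({
    "age": [50, 55],
    "income": [100000, 110000],
    "category": ["C", "A"],
    "outcome": ["yes", "no"],
    "split": ["train", "valid"]
})
new_data = pd.concat([df, additional_data], ignore_index=True)
new_version = client.upload_df(
    df=new_data,
    dataset_id=version.dataset_id,
    name="v2_expanded",
    split_column="split",
    label_column="outcome"
)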