Set up Google Cloud Storage
Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.
Configure access to your Google Cloud Storage bucket
First, review the information in Cloud storage for projects and Secure access to cloud storage.
Then you will need to complete the following prerequisites:
1. Enable programmatic access to your bucket
See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
2. Set up authentication to your bucket
Your account must have the Service Account Token Creator and Storage Object Viewer roles and storage.buckets.get access permission. See Setting up authentication and IAM permissions for Cloud Storage in the Google Cloud Storage documentation.
note
If you are using WIF, see Service account permissions below.
3. Configure CORS
Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Google Cloud User Guide.
note
This is only required if you are using pre-signed URLs. If you are using proxying, you do not have to configure CORS. For more information, see Pre-signed URLs vs Storage proxies.
Use or modify the following example:
echo '[
{
"origin": ["*"],
"method": ["GET"],
"responseHeader": ["Content-Type","Access-Control-Allow-Origin"],
"maxAgeSeconds": 3600
}
]' > cors-config.json
Replace YOUR_BUCKET_NAME with your actual bucket name in the following command to update CORS for your bucket:
gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME
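The example policy above allows any origin (`*`). As a minimal sketch of the narrower policy described in this step, the following Python snippet generates a `cors-config.json` that allows GET only from your Label Studio host. The function name `build_cors_config` and the host `https://label-studio.example.com` are hypothetical placeholders, not part of Label Studio or gsutil.

```python
import json


def build_cors_config(label_studio_origin: str, max_age_seconds: int = 3600) -> list:
    """Return a GCS CORS policy that allows GET from a single origin."""
    if not label_studio_origin.startswith(("http://", "https://")):
        raise ValueError("origin must include the scheme, for example https://...")
    return [
        {
            # An origin is scheme + host (+ port), without a trailing slash.
            "origin": [label_studio_origin.rstrip("/")],
            "method": ["GET"],
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


if __name__ == "__main__":
    # Hypothetical host name; replace it with the URL of your deployment.
    config = build_cors_config("https://label-studio.example.com")
    with open("cors-config.json", "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

The resulting file can be applied with the same `gsutil cors set` command shown above.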
Google Cloud Storage
Before you begin:
- Review the information in Cloud storage for projects and Secure access to cloud storage.
- Configure access to your bucket.
Google Application Credentials
You will need to provide Google Application Credentials. These will be a JSON file that you input while setting up your storage.
- From the Google Cloud Console, go to IAM & Admin > Service Accounts.
- Select the specific service account you need credentials for. If you don’t have one, create a new one.
- In the service account details, go to the Keys tab and click Add Key > Create new key.
- Select the JSON key type and click Create. The JSON file will be generated and automatically downloaded to your computer.
note
If you're using a service account to authorize access to the Google Cloud Platform, make sure to activate it. See gcloud auth activate-service-account.
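Before pasting the downloaded key into Label Studio, it can help to confirm that the file really is a service account key. The sketch below is one possible check, assuming the standard key layout with `type`, `project_id`, `private_key`, and `client_email` fields. The helper name `check_service_account_key` is hypothetical.

```python
import json

# Fields this sketch expects in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> dict:
    """Validate the key text and return the non-secret identifying fields."""
    key = json.loads(raw_json)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']}")
    # Never print or log the private key itself.
    return {"project_id": key["project_id"], "client_email": key["client_email"]}
```

The returned `project_id` is the value to enter in the Google Project ID field if the bucket lives in the same project as the service account.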
Create a source storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Source Storage.
Select Google Cloud Storage and click Next.
Configure Connection
Complete the following fields and then click Test connection:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Google Application Credentials | Enter the JSON file with the GCS credentials you created to manage authentication for your bucket. On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below. |
| Google Project ID | Enter the ID of your Google project in which the bucket is located (for example, my-label-studio-project). If you're unsure, you can find this in Google Cloud Console under IAM & Admin > Settings. |
| Use pre-signed URLs (On) / Proxy through the platform (Off) | This determines how data from your bucket is loaded. For more information, see Pre-signed URLs vs Storage proxies. |
| Expire pre-signed URLs (minutes) | Control how long pre-signed URLs remain valid. |
Import Settings & Preview
Complete the following fields and then click Load preview to ensure you are syncing the correct data:
| Field | Description |
|---|---|
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Import Method | Select whether you want to create a task for each file in your bucket or whether you would like to use a JSON/JSONL/Parquet file to define the data for each task. |
| File Name Filter | Specify a regular expression to filter bucket objects. Use .* to collect all objects. |
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your bucket. |
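To preview what a File Name Filter would select before syncing, you can try the expression locally. The sketch below assumes the filter is applied from the start of each object name (Python `re.match` semantics). That is an assumption about Label Studio's behavior, and the object names are made-up examples.

```python
import re


def filter_object_names(names, pattern):
    """Return the object names that match the regular expression."""
    regex = re.compile(pattern)
    # re.match anchors at the start of the name; whether Label Studio does
    # the same is an assumption of this sketch.
    return [name for name in names if regex.match(name)]


# Hypothetical bucket contents.
objects = [
    "data-set-1/image-001.jpg",
    "data-set-1/image-002.png",
    "data-set-1/notes.txt",
]

images_only = filter_object_names(objects, r".*\.(jpg|png)$")
everything = filter_object_names(objects, r".*")
```

Here `images_only` keeps the two image files, while `.*` keeps all three objects.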
Review & Confirm
If everything looks correct, click Save & Sync to sync immediately, or click Save to save your settings and sync later.
Tip
You can also use the API to sync import storage.
Create a target storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Target Storage.
Select Google Cloud Storage and click Next.
Complete the following fields:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Google Application Credentials | Enter the JSON file with the GCS credentials you created to manage authentication for your bucket. On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below. |
| Google Project ID | Enter the ID of your Google project in which the bucket is located (for example, my-label-studio-project). If you're unsure, you can find this in Google Cloud Console under IAM & Admin > Settings. |
| Can delete objects from storage | Enable this option if you want to delete annotations stored in the bucket when they are deleted in Label Studio. Your credentials must include the ability to delete bucket objects. |
After adding the storage, click Sync.
Tip
You can also use the API to sync export storage.
Application Default Credentials for enhanced security for GCS
If you use Label Studio on-premises with Google Cloud Storage, you can set up Application Default Credentials to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
The recommended way to do this is to use the GOOGLE_APPLICATION_CREDENTIALS environment variable. For example:
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
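The export above uses a relative file name, which is resolved against the working directory of the Label Studio process. As a rough pre-flight check, the sketch below confirms that the variable is set and points to an existing file. The helper name `resolve_credentials_path` is hypothetical and not part of Label Studio.

```python
import os
from pathlib import Path


def resolve_credentials_path(environ=os.environ) -> Path:
    """Return the absolute path named by GOOGLE_APPLICATION_CREDENTIALS."""
    value = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    # An absolute path avoids surprises when the working directory changes.
    return path.resolve()
```

Running this in the same environment that starts Label Studio shows which file the server would pick up.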
Google Cloud Storage with Workload Identity Federation (WIF)
You can also use Workload Identity Federation (WIF) pools with Google Cloud Storage.
Unlike with application credentials, WIF allows you to use temporary credentials. Each time you make a request to GCS, Label Studio connects to your identity pool to request temporary credentials.
For more information about WIF, see Google Cloud - Workload Identity Federation.
Before you begin:
- Review the information in Cloud storage for projects and Secure access to cloud storage.
- Configure access to your bucket.
Service account permissions
You will need a service account that has the following permissions:
- Bucket: Storage Admin (roles/storage.admin)
- Project: Service Account Token Creator (roles/iam.serviceAccountTokenCreator)
- Project: Storage Object Viewer (roles/storage.objectViewer)
See Create service accounts in the Google Cloud documentation.
Create a Workload Identity Pool
There are several methods you can use to create a WIF pool.
Using Terraform
An example script is provided below. Ensure all required variables are set:
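GCP project variables:
- var.gcp_project_name
- var.gcp_region
- var.label_studio_gcp_sa_name (the name of your existing service account, see Service account permissions above)
SaaS provided by HumanSignal:
- var.aws_account_id = 490065312183
- var.aws_role_name = label-studio-app-production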
Then run:
terraform init
terraform plan
terraform apply
Once applied, you will have a functioning Workload Identity Pool that trusts the Label Studio AWS IAM Role.
## Variables
/* AWS variables are so that AWS-hosted Label Studio resources can reach out to request credentials */
variable "gcp_project_name" {
type = string
description = "GCP Project name"
}
variable "gcp_region" {
type = string
description = "GCP Region"
}
variable "label_studio_gcp_sa_name" {
type = string
description = "GCP Label Studio Service Account Name"
}
variable "aws_account_id" {
type = string
description = "AWS Account ID"
}
variable "aws_role_name" {
type = string
description = "AWS Role name"
}
variable "external_ids" {
type = list(string)
default = []
description = "List of external ids"
}
## Outputs
output "GCP_WORKLOAD_ID" {
value = google_iam_workload_identity_pool_provider.label-studio-provider-jwt.workload_identity_pool_id
}
output "GCP_WORKLOAD_PROVIDER" {
value = google_iam_workload_identity_pool_provider.label-studio-provider-jwt.workload_identity_pool_provider_id
}
## Main
provider "google" {
project = var.gcp_project_name
region = var.gcp_region
}
resource "random_id" "random" {
byte_length = 4
}
locals {
aws_assumed_role = "arn:aws:sts::${var.aws_account_id}:assumed-role/${var.aws_role_name}"
external_id_condition = (
length(var.external_ids) > 0
? format("(attribute.aws_role == \"%s\") && (attribute.external_id in [%s])",
local.aws_assumed_role,
join(", ", formatlist("\"%s\"", var.external_ids))
)
: format("(attribute.aws_role == \"%s\")", local.aws_assumed_role)
)
}
resource "google_iam_workload_identity_pool" "label-studio-pool" {
workload_identity_pool_id = "label-studio-pool-${random_id.random.hex}"
project = var.gcp_project_name
}
resource "google_iam_workload_identity_pool_provider" "label-studio-provider-jwt" {
workload_identity_pool_id = google_iam_workload_identity_pool.label-studio-pool.workload_identity_pool_id
workload_identity_pool_provider_id = "label-studio-jwt-${random_id.random.hex}"
attribute_condition = local.external_id_condition
attribute_mapping = {
"google.subject" = "assertion.arn"
"attribute.aws_account" = "assertion.account"
"attribute.aws_role" = "assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn"
"attribute.external_id" = "assertion.external_id"
}
aws {
account_id = var.aws_account_id
}
}
data "google_service_account" "existing_sa" {
account_id = var.label_studio_gcp_sa_name
}
resource "google_service_account_iam_binding" "label-studio-sa-oidc" {
service_account_id = data.google_service_account.existing_sa.name
role = "roles/iam.workloadIdentityUser"
members = [
"principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.label-studio-pool.name}/attribute.aws_role/${local.aws_assumed_role}"
]
}
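To see which attribute condition the `locals` block in the script above produces, the following Python sketch mirrors the `external_id_condition` expression. It is an illustration only and is not used by Terraform. The function name `build_attribute_condition` and the external ID `example-external-id` are hypothetical.

```python
def build_attribute_condition(aws_account_id, aws_role_name, external_ids=()):
    """Mirror of local.external_id_condition from the Terraform script."""
    assumed_role = f"arn:aws:sts::{aws_account_id}:assumed-role/{aws_role_name}"
    condition = f'(attribute.aws_role == "{assumed_role}")'
    if external_ids:
        # Same as join(", ", formatlist("\"%s\"", var.external_ids)).
        quoted = ", ".join(f'"{external_id}"' for external_id in external_ids)
        condition += f" && (attribute.external_id in [{quoted}])"
    return condition


# Values for the HumanSignal SaaS deployment, taken from the variables above.
without_ids = build_attribute_condition("490065312183", "label-studio-app-production")
with_ids = build_attribute_condition(
    "490065312183", "label-studio-app-production", ["example-external-id"]
)
```

With an empty `external_ids` list, only the AWS role is checked. Adding external IDs narrows the condition further.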
Using the gcloud command line
Replace the bracketed variables ([PROJECT_ID], [POOL_ID], [PROVIDER_ID], etc.) with your own values.
Make sure you escape quotes or use single quotes when necessary.
Create the Workload Identity pool:
gcloud iam workload-identity-pools create [POOL_ID] \
  --project=[PROJECT_ID] \
  --location="global" \
  --display-name="[POOL_DISPLAY_NAME]"
Where:
- [POOL_ID] is the ID that you want to assign to your WIF pool (for example, label-studio-pool-abc123). Note this because you will need to reuse it later.
- [PROJECT_ID] is the ID of your Google Cloud project.
- [POOL_DISPLAY_NAME] is a human-readable name for your pool (optional, but recommended).
Create the provider for AWS.
This allows AWS principals that have the correct external ID and AWS role configured to impersonate the Google Cloud service account. This is necessary because the Label Studio resources making the request are hosted in AWS.
gcloud iam workload-identity-pools providers create-aws [PROVIDER_ID] \
  --workload-identity-pool="[POOL_ID]" \
  --location="global" \
  --account-id="490065312183" \
  --attribute-condition="attribute.aws_role==\"arn:aws:sts::490065312183:assumed-role/label-studio-app-production\"" \
  --attribute-mapping="google.subject=assertion.arn,attribute.aws_account=assertion.account,attribute.aws_role=assertion.arn,attribute.external_id=assertion.external_id"
Where:
- [PROVIDER_ID] is a provider ID (for example, label-studio-app-production).
- [POOL_ID] is the pool ID you specified when creating the Workload Identity pool.
Grant the service account that you created earlier the iam.workloadIdentityUser role.
gcloud iam service-accounts add-iam-policy-binding [SERVICE_ACCOUNT_EMAIL] \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/[PROJECT_NUMBER]/locations/global/workloadIdentityPools/[POOL_ID]/attribute.aws_role/arn:aws:sts::490065312183:assumed-role/label-studio-app-production"
Where:
- [SERVICE_ACCOUNT_EMAIL] is the email associated with your GCS service account (for example, my-service-account@[PROJECT_ID].iam.gserviceaccount.com).
- [PROJECT_NUMBER] is your Google project number. This is different from the project ID. You can find the project number with the following command:
  gcloud projects describe $PROJECT_ID --format="value(projectNumber)"
- [POOL_ID] is the pool ID you specified when creating the Workload Identity pool.
Before setting up your connection in Label Studio, note what you provided for the following variables (you will be asked to provide them):
- [POOL_ID]
- [PROVIDER_ID]
- [SERVICE_ACCOUNT_EMAIL]
- [PROJECT_NUMBER]
- [PROJECT_ID]
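The `--member` value in the binding command is long and easy to mistype. As a small sketch, the Python helper below assembles it from the values you noted. The function name `build_principal_set` and the sample project number are hypothetical. The AWS account ID and role name defaults are the HumanSignal SaaS values used throughout this page.

```python
def build_principal_set(
    project_number,
    pool_id,
    aws_account_id="490065312183",
    aws_role_name="label-studio-app-production",
):
    """Build the principalSet member string for the IAM policy binding."""
    # The project number is numeric; passing the project ID is a common mistake.
    if not str(project_number).isdigit():
        raise ValueError("use the numeric project number, not the project ID")
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/"
        f"attribute.aws_role/arn:aws:sts::{aws_account_id}:assumed-role/{aws_role_name}"
    )


# Hypothetical project number and pool ID.
member = build_principal_set("123456789012", "label-studio-pool-abc123")
```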
Using the Google Cloud Console
Before you begin, ensure you are in the correct project:

From the Google Cloud Console, navigate to IAM & Admin > Workload Identity Pools.
Click Get Started to enable the APIs.
Under Create an identity pool, complete the following fields:
- Name: This is the pool ID (for example, label-studio-pool-abc123). Note this ID because you will need it again later.
- Description: This is the display name for the pool (for example, “Label Studio Pool”).
Under Add a provider pool, complete the following fields:
- Select a provider: Select AWS. This is the location where the Label Studio components responsible for issuing requests are stored.
- Provider name: Enter Label Studio App Production (you can use a different display name, but you need to ensure that the corresponding provider ID is still label-studio-app-production).
- Provider ID: Enter label-studio-app-production.
- AWS Account ID: Enter 490065312183.
Under Configure provider attributes, enter the following:
Click Add condition and then enter the following:
attribute.aws_role=="arn:aws:sts::490065312183:assumed-role/label-studio-app-production"
Click Edit mapping and then add the following:
- google.subject = assertion.arn
- attribute.aws_role = assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn (this might be filled in by default)
- attribute.aws_account = assertion.account
- attribute.external_id = assertion.external_id
Click Save.
Go to IAM & Admin > Service Accounts and find the service account you want to allow AWS (Label Studio) to impersonate. See Service account permissions above.
From the Principals with access tab, click Grant Access.

In the New principals field, add the following:
principalSet://iam.googleapis.com/projects/[PROJECT_NUMBER]/locations/global/workloadIdentityPools/[POOL_ID]/attribute.aws_role/arn:aws:sts::490065312183:assumed-role/label-studio-app-production
Where:
- [PROJECT_NUMBER]: Replace this with your Google project number. This is different from the project ID. To find the project number, go to IAM & Admin > Settings.
- [POOL_ID]: Replace this with the pool ID (the Name you entered when creating the identity pool, for example label-studio-pool-abc123).
Under Assign Roles, use the search field in the Role drop-down menu to find the Workload Identity User role.

Click Save.
Before setting up your connection in Label Studio, note the following (you will be asked to provide them):
- Your pool ID - available from IAM & Admin > Workload Identity Pools
- Your provider ID - available from IAM & Admin > Workload Identity Pools (this should be label-studio-app-production)
- Your service account email - available from IAM & Admin > Service Accounts. Select the service account and the email is listed under Details.
- Your Google project number - available from IAM & Admin > Settings
- Your Google project ID - available from IAM & Admin > Settings
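The `attribute.aws_role` mapping used in the provider configuration strips the session name from an assumed-role ARN so that it can be compared against the role in the attribute condition. The Python sketch below imitates that expression to show the effect. It reflects one reading of the `extract` templates, not Google's implementation, and the session name `session-1` is a made-up example.

```python
def _extract_before(value, suffix):
    """Text before the first occurrence of suffix, or "" if absent."""
    index = value.find(suffix)
    return value[:index] if index >= 0 else ""


def _extract_between(value, start, end):
    """Text between start and the next end, or "" if either is absent."""
    begin = value.find(start)
    if begin < 0:
        return ""
    begin += len(start)
    finish = value.find(end, begin)
    return value[begin:finish] if finish >= 0 else ""


def normalize_aws_role(arn):
    """Imitate the attribute.aws_role mapping expression."""
    if "assumed-role" not in arn:
        return arn
    return (
        _extract_before(arn, "assumed-role/")
        + "assumed-role/"
        + _extract_between(arn, "assumed-role/", "/")
    )


session_arn = (
    "arn:aws:sts::490065312183:assumed-role/label-studio-app-production/session-1"
)
role = normalize_aws_role(session_arn)
```

Under these assumptions, `role` equals the ARN used in the attribute condition, which is why requests from any session of that role satisfy it.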
Create a source storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Source Storage.
Select Google Cloud Storage (WIF Auth) and click Next.
Configure Connection
Complete the following fields and then click Test connection:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Workload Identity Pool ID | This is the ID you specified when creating the Workload Identity Pool. You can find this in Google Cloud Console under IAM & Admin > Workload Identity Pools. |
| Workload Identity Provider ID | This is the ID you specified when setting up the provider. You can find this in Google Cloud Console under IAM & Admin > Workload Identity Pools. |
| Service Account Email | This is the email associated with the service account you set up as part of the prerequisites. You can find it in the Details page of the service account under IAM & Admin > Service Accounts. For example, labelstudio@random-string-382222.iam.gserviceaccount.com. |
| Google Project ID | Your Google project ID. You can find this in Google Cloud Console under IAM & Admin > Settings. |
| Google Project Number | Your Google project number. You can find this in Google Cloud Console under IAM & Admin > Settings. |
| Use pre-signed URLs (On) / Proxy through the platform (Off) | This determines how data from your bucket is loaded. For more information, see Pre-signed URLs vs Storage proxies. |
| Expire pre-signed URLs (minutes) | Control how long pre-signed URLs remain valid. |
Import Settings & Preview
Complete the following fields and then click Load preview to ensure you are syncing the correct data:
| Field | Description |
|---|---|
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Import Method | Select whether you want to create a task for each file in your bucket or whether you would like to use a JSON/JSONL/Parquet file to define the data for each task. |
| File Name Filter | Specify a regular expression to filter bucket objects. Use .* to collect all objects. |
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your bucket. |
Review & Confirm
If everything looks correct, click Save & Sync to sync immediately, or click Save to save your settings and sync later.
Tip
You can also use the API to sync import storage.
Create a target storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Target Storage.
Select Google Cloud Storage (WIF Auth) and click Next.
Complete the following fields:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Workload Identity Pool ID | This is the ID you specified when creating the Workload Identity Pool. You can find this in Google Cloud Console under IAM & Admin > Workload Identity Pools. |
| Workload Identity Provider ID | This is the ID you specified when setting up the provider. You can find this in Google Cloud Console under IAM & Admin > Workload Identity Pools. |
| Service Account Email | This is the email associated with the service account you set up as part of the prerequisites. You can find it in the Details page of the service account under IAM & Admin > Service Accounts. For example, labelstudio@random-string-382222.iam.gserviceaccount.com. |
| Google Project ID | Your Google project ID. You can find this in Google Cloud Console under IAM & Admin > Settings. |
| Google Project Number | Your Google project number. You can find this in Google Cloud Console under IAM & Admin > Settings. |
| Can delete objects from storage | Enable this option if you want to delete annotations stored in the bucket when they are deleted in Label Studio. Your credentials must include the ability to delete bucket objects. |
After adding the storage, click Sync.
Tip
You can also use the API to sync export storage.
Add storage with the Label Studio API
You can also use the API to programmatically create connections. See our API documentation.
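As a rough illustration of what such a call might look like, the sketch below builds (but does not send) a request that creates a GCS source storage. The endpoint path `/api/storages/gcs`, the payload field names, and the `Token` authorization scheme are assumptions based on the Label Studio API reference and may differ in your version, so check the API documentation before relying on them. The host, token, and bucket values are placeholders.

```python
import json
import urllib.request


def build_gcs_storage_request(base_url, api_token, project_id, bucket, prefix=""):
    """Build a POST request for creating a GCS import storage (not sent)."""
    payload = {
        # Field names are assumptions; verify them against the API docs.
        "project": project_id,
        "title": f"GCS: {bucket}",
        "bucket": bucket,
        "prefix": prefix,
        "use_blob_urls": True,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storages/gcs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_gcs_storage_request(
    "https://label-studio.example.com", "YOUR_API_TOKEN", 1, "my-bucket", "data-set-1"
)
```

To actually create the connection, you would pass `request` to `urllib.request.urlopen` and inspect the JSON response.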
IP filtering for enhanced security for GCS
Google Cloud Storage offers bucket IP filtering as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.
Read more about Source storage behind your VPC.
Common Use Cases:
- Restrict bucket access to only your organization’s IP ranges
- Allow access only from specific VPC networks in your infrastructure
- Secure sensitive data by limiting access to known IP addresses
- Control access for third-party integrations by whitelisting their IPs
How to Set Up IP Filtering
- First, create your GCS bucket through the console or CLI
- Create a JSON configuration file to define IP filtering rules. You have two options:
For public IP ranges:
{
  "mode": "Enabled",
  "publicNetworkSource": {
    "allowedIpCidrRanges": [
      "xxx.xxx.xxx.xxx",
      "xxx.xxx.xxx.xxx",
      "xxx.xxx.xxx.xxx/xx"
    ]
  }
}
In this example, the first two entries are individual IP addresses and the third is an IP range in CIDR notation.
note
If you're using Label Studio Enterprise at app.humansignal.com and accessing it from your office network:
- Add Label Studio Enterprise outgoing IP addresses (see IP ranges)
- Add your office network IP range (e.g. 192.168.1.0/24)
- If both Label Studio Enterprise and your office are on the same VPN network (e.g. 10.0.0.0/16), you only need to add that VPN subnet
For VPC network sources:
{
"mode": "Enabled",
"vpcNetworkSources": [
{
"network": "projects/PROJECT_ID/global/networks/NETWORK_NAME",
"allowedIpCidrRanges": [
"RANGE_CIDR"
]
}
]
}
Apply the IP filtering rules to your bucket using the following command:
gcloud alpha storage buckets update gs://BUCKET_NAME --ip-filter-file=IP_FILTER_CONFIG_FILE
To remove IP filtering rules when no longer needed:
gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter
Limitations to Consider
- Maximum of 200 IP CIDR blocks across all rules
- Maximum of 25 VPC networks in the IP filter rules
- Not supported for dual-regional buckets
- May affect access from certain Google Cloud services
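A typo in a CIDR range can lock you out of your own bucket, so it may be worth validating the rules before applying them. The sketch below builds the public IP configuration shown earlier and checks each entry with Python's `ipaddress` module, along with the 200-block limit listed above. The function name `build_public_ip_filter` and the sample addresses (taken from documentation-reserved ranges) are illustrative only.

```python
import ipaddress
import json

MAX_CIDR_BLOCKS = 200  # Limit stated in this section.


def build_public_ip_filter(ranges):
    """Validate IPs/CIDR ranges and return the IP filter configuration."""
    ranges = list(ranges)
    if len(ranges) > MAX_CIDR_BLOCKS:
        raise ValueError(f"at most {MAX_CIDR_BLOCKS} CIDR blocks are allowed")
    for item in ranges:
        # strict=True rejects ranges with host bits set, e.g. 198.51.100.5/24.
        ipaddress.ip_network(item, strict=True)
    return {
        "mode": "Enabled",
        "publicNetworkSource": {"allowedIpCidrRanges": ranges},
    }


if __name__ == "__main__":
    config = build_public_ip_filter(["203.0.113.7", "198.51.100.0/24"])
    with open("ip-filter.json", "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

The generated file can then be passed to the `--ip-filter-file` flag shown above.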