Google Cloud BigQuery Setup

This guide provides instructions for setting up BigQuery to export data from the previously configured dataset to GCS at regular intervals.

Service Account Creation and Role Assignment

ℹ️
This can be performed by users with sufficient permissions (Organization Admin, Project Admin) to create service accounts and grant permissions. For service account permissions and role setup guide, please refer to the corresponding guide document.

  1. Verify that the project for using GCS and BigQuery is correct.
  2. Click create service account.

  1. Enter the service account name (e.g., billing_bigquery_export_account). It’s recommended to use an easily identifiable name.
  2. Click complete.

Select ‘IAM & Admin’ > ‘Roles’ from the navigation menu (☰) in the top left.

  1. Enter the name of the created service account (billing_bigquery_export_account) in the filter field.
  2. Click the searched service account.

  1. Move to the Permissions tab.
  2. Click the permission control button to activate the right sidebar.
  3. Add the following 3 roles (BigQuery Data Viewer, BigQuery Job User, Storage Object Admin).
BigQuery Data Viewer
    - bigquery.tables.get
    - bigquery.tables.getData

BigQuery Job User
    - bigquery.jobs.create

Storage Object Admin
    - storage.objects.create
    - storage.objects.list
    - storage.objects.delete

GCS Bucket Creation

Select ‘Cloud Storage’ > ‘Buckets’ from the navigation menu (☰) in the top left.

⚠️
Region must be the same as the Region of the Billing information dataset configured in BigQuery. Please refer to the previous step guide document.
  1. Confirm project selection at the top and click create.
  2. Enter the bucket name.
  3. Verify that global access prevention is enabled.
  4. Click create to finish.

Scheduled Query Creation

  1. Select ‘BigQuery’ > ‘Scheduled Queries’ from the navigation menu (☰) in the top left.
  2. Confirm the project at the top and click create scheduled query.

  1. Create two schedules by appropriately modifying the General Usage Costs and Adjustment Amount queries according to the guidelines. Although there are two schedules, monthly files will be overwritten with the same pattern.
  2. Execute to verify that the query runs normally.
  3. Select “Schedule” from the top tools.

The queries create the following file tree structure:

            • billing_data_YYYYMM-*.parquet
  • General Usage Costs and Adjustment Amounts

    General usage costs and adjustment amounts differ in terms of collection cycle, data characteristics, update timing, and final cost calculation method.

    • General usage costs are standard usage-based cost data aggregated daily, representing provisional amounts that are not yet finalized. These costs are aggregated almost in real-time as GCP service usage occurs and are added to the dataset daily. At this stage, the costs are not yet discounted or finalized.

    • Adjustment amounts are all variable items reflected to finalize the billing amount after the month closes, including credits, promotions, taxes, and all other items that are ultimately reflected at month-end. Generally, data for the month is finalized in the early days of the following month, and these adjustments can appear as positive (+) or negative (-) amounts.

    Scheduled Query SQL 1. General Usage Costs

    SQL CODE

    This is a query to perform export for general usage costs.

    Enter the following items appropriately:

    -- run_date variable declaration: DATE type with default value set to yesterday's date.
    -- This date serves as the end date for data query period and the basis for file path generation.
    DECLARE run_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
    
    -- start_of_month variable declaration: DATE type with default value set to the first day of the month containing 'run_date' (yesterday).
    -- Example: If today is September 4, 2025, run_date is September 3, and start_of_month is September 1.
    DECLARE start_of_month DATE DEFAULT DATE_TRUNC(run_date, MONTH);
    
    -- EXPORT DATA statement: Exports query results to Google Cloud Storage (GCS).
    EXPORT DATA OPTIONS(
      -- uri: Specifies the GCS path and filename where data will be stored.
      -- The path is dynamically generated based on the year and month of 'run_date' (yesterday).
      uri=CONCAT(
        'gs://{GCS_BUCKET}/{PROJECT_ID}/',
        FORMAT_DATE('%Y/%m', run_date), -- 'YYYY/MM' format folder path (based on yesterday)
        '/billing_data_',
        FORMAT_DATE('%Y%m', run_date),  -- 'YYYYMM' format filename part (based on yesterday)
        '-*.parquet'
      ),
      format='PARQUET',        -- File format is Parquet
      compression='GZIP',      -- Compression method is GZIP
      overwrite=true           -- Allow overwriting existing files
    ) AS
    -- Exports the results of the SELECT query below.
    SELECT *
    FROM `{DATA_SET_ID}`
    WHERE
      -- Only scans data partitions from '1st of this month' to 'yesterday' based on _PARTITIONTIME.
      -- This improves query performance and reduces costs.
      DATE(_PARTITIONTIME) BETWEEN start_of_month AND run_date
      -- Filters data where invoice.month field matches the month containing 'yesterday' ('YYYYMM' format).
      -- Example: If run_date is 2025-09-03, only data with invoice.month = '202509' is selected.
      AND invoice.month = FORMAT_DATE('%Y%m', run_date);

    Scheduled Query SQL 2. Adjustment Amount

    SQL CODE

    This is a query to perform export for adjustment amount usage costs.

    Enter the following items appropriately:

    -- run_date variable declaration: DATE type with default value set to yesterday's date.
    -- CURRENT_DATE() returns today's date, and subtracting INTERVAL 1 DAY calculates yesterday.
    DECLARE run_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
    
    -- prev_month_start variable declaration: DATE type with default value set to the first day of the previous month.
    -- Example: If today is September 4, 2025, then (1) subtract one month from CURRENT_DATE() (August 4, 2025),
    -- (2) use DATE_TRUNC function to convert to the first day of that month (August 1, 2025).
    DECLARE prev_month_start DATE DEFAULT DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH);
    
    -- EXPORT DATA statement: Exports query results to Google Cloud Storage (GCS).
    EXPORT DATA OPTIONS(
      -- uri: Specifies the GCS path and filename where data will be stored.
      -- Uses CONCAT function to dynamically generate the path.
      uri=CONCAT(
        'gs://{GCS_BUCKET}/{PROJECT_ID}/',
        FORMAT_DATE('%Y/%m', prev_month_start), -- Generate 'YYYY/MM' format folder path
        '/billing_data_',
        FORMAT_DATE('%Y%m', prev_month_start),  -- Generate 'YYYYMM' format filename part
        '-*.parquet' -- Wildcard (*) indicating BigQuery can export to multiple files
      ),
      format='PARQUET',        -- File format is specified as Parquet
      compression='GZIP',      -- Compression method is specified as GZIP
      overwrite=true           -- Overwrite setting when files with the same name already exist
    ) AS
    -- Exports the results of the SELECT query below.
    SELECT * -- Selects all columns.
    FROM `{DATA_SET_ID}` -- Target GCP billing data table
    WHERE
      -- _PARTITIONTIME refers to the partition date of the partitioned table.
      -- Using this condition reduces the amount of data scanned, improving query performance and reducing costs.
      -- Here, it limits scanning to only data partitions from '1st of last month' to 'yesterday'.
      DATE(_PARTITIONTIME) BETWEEN prev_month_start AND run_date
      -- Filters data where invoice.month field matches 'last month'.
      -- Used together with _PARTITIONTIME filtering to extract exactly the billing data for the previous month.
      AND invoice.month = FORMAT_DATE('%Y%m', prev_month_start);

    DATA_SET_ID

    DATA_SET_ID can be obtained after the dataset is successfully created in the previous Google Cloud Billing export setup step.

    • Select ‘BigQuery’ > ‘Studio’ from the navigation menu (☰) in the top left.
    • Expand the created dataset and copy the ID from the options of the existing set.
    • You can obtain an ID in the following format:
    {PROJECT_ID}.{DATA_SET_NAME}.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX

    GCS_BUCKET

    GCS_BUCKET can be obtained after the bucket is successfully created in the previous GCS Bucket Creation step.

    • Select Cloud Storage > ‘Bucket’ from the navigation menu (☰) in the top left.
    • Select the bucket that was successfully created in the GCS Bucket Creation step.

    • Click the copy icon next to the path to copy the path.
    • You can copy the following bucket name:
    test_billing_data

    Scheduled Query Schedule Setup

    ℹ️
    Schedulers must be created, one for each query, for a total of 2.
    1. Enter the name of the executed scheduler.
    2. Enter schedule options.
    3. Specify schedule options:
      • Scheduled Query 1: General usage cost export Daily at UTC 23:00 (refer to plugin settings for execution time)
        • days / 23:00 / Start now / End never
      • Scheduled Query 2: Adjustment amount usage cost export Monthly on 1st-10th at UTC 23:00 (refer to plugin settings for execution time)
        • Months / 1,2,3,4,5,6,7,8,9,10 / 23:00 / Start now / End never
    4. Set the execution entity to the service account created in Service Account Creation and Role Assignment .
    5. Click the save button to finish.

    Execution Verification

    1. Move to scheduled queries from the left sidebar menu.
    2. Verify the normal operation of the created scheduler on the screen.
      • The created scheduler will run automatically once initially.
      • If not working properly, you can click the scheduler to check error details or make modifications.

    Execution Progress Verification

    General usage costs are executed daily to collect cumulative monthly data, while adjustment amounts collect the previous month’s data in the early days of the following month after month-end. Data is systematically organized by year/month/day for easy analysis, with general usage costs updating almost in real-time and adjustment amounts updating monthly.

    Collection Period: August 11, 2025 to October 15, 2025 (approximately 3 months)

            • billing_data_202508-000000.parquet
            • billing_data_202508-000001.parquet
            • billing_data_202508-NNNNNN.parquet
            • billing_data_202509-000000.parquet
            • billing_data_202509-000001.parquet
            • billing_data_202509-NNNNNN.parquet
            • billing_data_202510-000000.parquet
            • billing_data_202510-NNNNNN.parquet
  • Scheduled Query 1 (General Usage Costs):

    • Execution Cycle: Runs daily at UTC 23:00
    • Data Range: Usage data from the 1st of the month to the day before execution
    • Collection Results:
      • August: Executed August 11-31, collected August 1-30 data
      • September: Executed September 1-30, collected September 1-30 data
      • October: Executed October 1-15, collected October 1-14 data

    Scheduled Query 2 (Adjustment Amount):

    • Execution Cycle: Runs on the 1st-10th of each month at UTC 23:00
    • Data Range: Previous month’s adjustment data (credits, discounts, taxes, etc.)
    • Collection Results:
      • August adjustment data: Executed September 1-10, collection completed
      • September adjustment data: Executed October 1-10, collection completed
      • October adjustment data: Scheduled to execute November 1-10 (not yet collected)