Data Export

Overview

Data Export is a feature that extracts event data being ingested into Hive Analytics every hour and uploads it to cloud storage.

Data Export delivers raw event data, which you can load directly into your own database or process into whatever form your analysis requires.

Hive Analytics handles the file conversion and transfer, but you must create and register the cloud storage (bucket) in the cloud service you use.

Note

Data is provided per event; split transfer by project is not supported.


What Can You Do?

Data Analysts

  • You can ingest data collected in Hive Analytics directly into your own database, process it into the desired form, and perform in-depth analysis.
  • You can connect raw data to your own BI tools or analytical environment to build custom dashboards.

Developers

  • You can automatically ingest Hive event data into your own data pipeline.
  • By integrating with AWS S3 or GCP Cloud Storage, you can use the latest hourly event data directly from cloud storage.

Quick Start

If you are setting up Data Export for the first time, follow the steps below to complete cloud storage integration.

  1. Create a dedicated bucket for Data Export in the cloud storage you will use (AWS S3 or GCP Cloud Storage) and prepare the authentication key.
  2. Go to the Analytics console > Data > Data Export settings page.
  3. Select the events (logs) to export (up to 10).
  4. Select the storage (AWS S3 / GCP Cloud Storage) and enter the bucket name.
  5. Select the data type (CSV / JSON).
  6. Register the authentication key.

Note

For how to create a bucket and issue an authentication key for each cloud storage provider, refer to Full Features.


Full Features

Data Export Logic

data_export_01.png

Event data stored in BigQuery is converted into files every hour according to the Data Export schedule and uploaded to the registered cloud storage.

Data Basis

  • The selected event data is queried and transferred as files to cloud storage.
  • Data is extracted every hour based on UTC according to the transfer schedule.
    • Example: At 01:00 (UTC) on September 1, 2023, data from 00:00:00 to 00:59:59 (UTC) on September 1, 2023 is extracted and transferred
    • The partitioning basis for the dateTime attribute is set to the previous day relative to the query date.
    • Example: When extracting data from 00:00:00 to 00:59:59 (UTC) on September 1, 2023, the partitioning basis is 00:00:00 (UTC) on August 31, 2023
    • If the dateTime value is earlier than the query time minus 1 day, it will not be included in the export data.
  • Data is queried based on the time it was entered into BigQuery.
    • Based on the bigqueryRegistTimestamp attribute
    • Sample query for data extraction
SELECT *
FROM bigquery_table
WHERE bigqueryRegistTimestamp BETWEEN '2023-09-01 00:00:00' AND '2023-09-01 00:59:59'
  AND dateTime >= '2023-08-31 00:00:00'
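The extraction window and dateTime cutoff above can be sketched in Python (a minimal illustration; the function name `extraction_window` is hypothetical, not part of Hive Analytics):

```python
from datetime import datetime, timedelta

def extraction_window(run_time: datetime):
    """For an hourly run time (UTC), return the bigqueryRegistTimestamp
    range and the dateTime partition cutoff used by the export query."""
    window_end = run_time.replace(minute=0, second=0, microsecond=0)
    window_start = window_end - timedelta(hours=1)
    # Rows whose dateTime is earlier than (query date - 1 day, midnight UTC)
    # are excluded from the export.
    cutoff = (window_start - timedelta(days=1)).replace(hour=0, minute=0, second=0)
    return window_start, window_end - timedelta(seconds=1), cutoff

# Example: the 01:00 (UTC) run on September 1, 2023
start, end, cutoff = extraction_window(datetime(2023, 9, 1, 1, 0))
# start  -> 2023-09-01 00:00:00
# end    -> 2023-09-01 00:59:59
# cutoff -> 2023-08-31 00:00:00
```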

Data Export Settings

data_export_02.png

Select Events

Select the events (logs) to export.

  • You can search and select by entering part of the event name.
  • Up to 10 events can be selected.

Select Storage

You must use cloud storage as the destination for exported data.

Supported clouds:

  • AWS S3
  • GCP Cloud Storage

Location (Bucket Name)

Enter the bucket name for the storage.

  • If the AWS S3 bucket name is s3://s3_bucket_name, enter only s3_bucket_name
  • If the Google Cloud Storage bucket name is gs://google_bucket_name, enter only google_bucket_name
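The scheme-stripping rule above can be sketched as follows (an illustrative helper, not part of the product):

```python
def bucket_name(location: str) -> str:
    """Strip an s3:// or gs:// scheme prefix so only the bare
    bucket name is entered in the Location field."""
    for prefix in ("s3://", "gs://"):
        if location.startswith(prefix):
            return location[len(prefix):].rstrip("/")
    return location.rstrip("/")

print(bucket_name("s3://s3_bucket_name"))      # s3_bucket_name
print(bucket_name("gs://google_bucket_name"))  # google_bucket_name
```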

Data Type

Two data types are provided:

  • CSV
  • JSON

All files are encoded in UTF-8.

File Upload Frequency

Files are uploaded every hour after extracting data for a one-hour range.

  • The time is extracted based on the bigqueryRegistTimestamp attribute value (UTC basis).
    • Example: When data extraction and upload start at 15:00 (UTC), data with bigqueryRegistTimestamp values from 14:00:00 to 14:59:59 (UTC) is extracted.
  • The completion time may vary depending on the number of files and upload volume.

Register Authentication Key

Permission is required to upload data to cloud storage. You must register an authentication key or key file that has data write permissions. The registration method varies by cloud service.

  • S3 - Register the ACCESS_KEY and ACCESS_SECRET_KEY values.
    data_export_03.png
  • GCS - Register the authentication key file.
    data_export_04.png

Cloud Storage Settings

GCP - Google Cloud Storage

The following settings are required to export data to Google Cloud.

  1. Go to Cloud Storage on the Google Cloud Console page.

    data_export_05.png

  2. Create a bucket dedicated to Data Export.
    • Once the bucket name is set, it cannot be changed. If needed, delete the existing bucket and create a new one.
    • We recommend creating a bucket dedicated to Data Export.
  3. Create a service key to provide to Data Export and grant write permission to the bucket.
    1. In the console page, go to IAM & Admin -> Service Accounts.
    2. Click Create Service Account to create a new account.
      • The ID used for the account can be any name you want. (Example: hive_data_transfer_account@projectId.iam.gserviceaccount.com)
        data_export_06.png
      • After creating the account, go to the Keys tab and create a service key.
      • Use Add Key -> Create New Key to generate a JSON key file.
      • Download the created key file and keep it safe.
    3. Go back to Cloud Storage and open the Permissions tab for the created bucket.
        data_export_07.png
      • In the Permissions tab, click Grant Access -> Add Principal and enter the newly created service account ID.
      • Under Role, add the two Cloud Storage roles Storage Object Creator and Storage Object Viewer, then click Confirm.
  4. After all settings are complete, register the service key file on the Hive Analytics Data Export settings page.

AWS - S3

The following settings are required to export data to AWS.

  1. Go to Storage -> S3 on the AWS Console page.
    data_export_08.png
  2. Create a bucket dedicated to Data Export.
    • Once the bucket name is set, it cannot be changed. If needed, delete the existing bucket and create a new one.
    • We recommend using it only for a bucket dedicated to Data Export.
  3. Create a new IAM user dedicated to Data Export.
    • This user should be used only as a dedicated account for Data Export.
  4. Create access keys for the created account. Related information can be found in Managing Access Keys for IAM Users - Creating Access Keys.
    • Store the access key in a secure location.
  5. Add an inline policy for the created account.
    • Refer to the AWS documentation on including inline policies for user groups (console) to create the policy.
    • Select the JSON tab to create the policy, then paste the following JSON code.
    • For YOUR-BUCKET-NAME-HERE, enter the bucket name you created.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::YOUR-BUCKET-NAME-HERE"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::YOUR-BUCKET-NAME-HERE/*"]
    }
  ]
}
  6. After completing all tasks, add the stored access key to the [Analytics console > Data > Data Export] settings.
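If you manage several buckets, the inline policy above can be rendered for a concrete bucket name with a short script (an illustrative sketch; `inline_policy` is not part of any AWS SDK):

```python
import json

def inline_policy(bucket: str) -> str:
    """Render the Data Export inline policy for a concrete bucket name
    (substitutes the bucket into the YOUR-BUCKET-NAME-HERE placeholders)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)
```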

File Storage Format

Data Storage Directory Structure

General file path format:

withhive/data_export/build_type/YYYY/MM/DD/event_name/event_name_YYYY_MM_DD_UUID.file_extension
  • Build Type: One of sandbox or live. Data from sandbox builds is stored under sandbox.
  • YYYY/MM/DD: The year/month/day basis from which the data is extracted. (UTC basis)
  • UUID: A random value used to prevent overwriting due to duplicate file names.
  • File Extension: Depends on the selected file type.
| File Type | Compressed | Final File Name |
| --- | --- | --- |
| json | V | withhive/data_export/build_type/YYYY/MM/DD/event_name/event_name_YYYY_MM_DD_UUID.json.gzip |
| csv | V | withhive/data_export/build_type/YYYY/MM/DD/event_name/event_name_YYYY_MM_DD_UUID.csv.gzip |
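The path format above can be sketched as follows (an illustrative helper; the function name and the example event name are assumptions, not part of Hive Analytics):

```python
import uuid

def export_object_key(build_type: str, date: str, event_name: str, file_type: str) -> str:
    """Build the expected object key for one exported file.
    date is 'YYYY/MM/DD' (UTC); file_type is 'json' or 'csv'."""
    # Random part that prevents overwrites caused by duplicate file names.
    file_id = uuid.uuid4().hex
    return (
        f"withhive/data_export/{build_type}/{date}/{event_name}/"
        f"{event_name}_{date.replace('/', '_')}_{file_id}.{file_type}.gzip"
    )

# Hypothetical event name "login", live build, extracted on 2023-09-01 (UTC)
key = export_object_key("live", "2023/09/01", "login", "json")
```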

File Extensions

  • csv.gzip: A file consisting of data fields separated by commas ( , ), compressed with gzip. Encryption settings are not supported when compressing the file.
  • json.gzip: A file consisting of structured data text using JavaScript object syntax, with one JSON object per line, compressed with gzip. Encryption settings are not supported when compressing the file.
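A json.gzip export can be read as line-delimited JSON using only the Python standard library (a minimal sketch; the helper name is illustrative):

```python
import gzip
import io
import json

def read_json_gzip(raw: bytes) -> list:
    """Parse a line-delimited json.gzip export: one JSON object
    per line, gzip-compressed, UTF-8 encoded."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Round-trip demo with a locally built sample file
sample = gzip.compress(b'{"event": "login"}\n{"event": "logout"}\n')
print(read_json_gzip(sample))  # [{'event': 'login'}, {'event': 'logout'}]
```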

Notes & Tips

  • No backfill for past data: Data Export starts working from the time it is registered. Past data collected before registration will not be sent retroactively.
  • Maximum event limit: The maximum number of selectable events is 10.
  • Total extraction size limit per event: If the extracted data for an event exceeds 500 MB, it is excluded from the transfer. The actual transferred file is compressed to about 15% of the original size.