Saturday, July 2, 2022

Data Architecture, GCP

Data 



BigQuery



App Architecture, GCP

App


App, headless



App, headless, solution



GCP

DNS
domain name system
CDN
content delivery network
Armor
defense against web and DDoS attacks
Apigee Sense
behavior detection to protect APIs
reCAPTCHA Enterprise
protect your website from fraudulent activity, spam, and abuse
VPC
virtual network for Google Cloud resources
NAT
giving private instances internet access
Load Balancing
distributing traffic
global, regional
Run
running containerized apps
serverless, auto scaling, regional
App Engine
apps and backends
serverless, auto scaling, regional
GKE
running containerized apps
auto scaling, HA, regional
Functions
creating functions that respond to cloud events
serverless, auto scaling, regional
SQL
MySQL, PostgreSQL, SQL Server
scalable, HA, regional
Spanner
cloud-native relational database
auto sharding, HA, multi-region
Firestore
cloud-native document database
serverless, auto scaling, multi-region
Memorystore
managed Redis and Memcached
scalable, HA, regional
Pub/Sub
event ingestion and delivery
Storage
object storage
API Gateway
develop, deploy, secure, and manage APIs
Endpoints
Apigee
API management, development, and security platform
Security Command Center
a platform for defending against threats to your Google Cloud assets
Operations suite
Monitoring
infrastructure and application health
Logging
audit, platform, and application logs management
Error Reporting
exception monitoring and alerting
Debugger
app state inspection and in-production debugging
Trace
collecting latency data from an app
Profiler
app performance
Firebase
app development platform
Discovery solutions for Retail
search and recommendation
commercetools
Elastic Path
x2bee
CI/CD


Solution candidates

commercetools
100% cloud-native, 100% API-first and 100% global
Headless, GCP
Elastic Path
The enterprise cloud infrastructure, 99.99% uptime
Headless, AWS
x2bee
Headless, domestic (Korea)

Competitiveness of global commerce solutions


Saturday, June 25, 2022

AI - Monitoring

Data drift

  • What
    • Feature
    • Population
    • Covariate shift
  • How
    • Label enough of the new data to cover any newly introduced classes
    • Retrain the model
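
One way to put a number on the feature drift listed above is the population stability index (PSI). The notes do not name this metric, so treat the following as a minimal sketch under assumptions: the helper name `psi` is hypothetical, bins are equal-width over the baseline range, and the two samples are toy data.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline and a new sample.

    Bins are equal-width over the baseline range (an assumption of this
    sketch); values outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps the logarithm defined when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]       # training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]  # serving-time values, shifted up

print(psi(baseline, baseline))  # identical samples give 0.0
print(psi(baseline, shifted))   # a positive value; larger means more drift
```

Each term of the sum is non-negative, so the index only grows as the two histograms move apart. Where to set the alert threshold is a judgment call that depends on the feature.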

Concept drift

  • What
    • P_t(X, y) != P_{t+1}(X, y)
    • Sudden
    • Gradual
    • Incremental
    • Recurring
  • How
    • Relabel the old data, then retrain the model
    • Use an ensemble approach to train your new model
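
The sudden case above can be illustrated by watching the error rate of a deployed model over consecutive windows of labeled data. This is only a sketch under assumptions: the function names, the window size, and the `tolerance` value are hypothetical, and the stream is toy data rather than anything from these notes.

```python
def windowed_error_rates(y_true, y_pred, window):
    """Error rate of each consecutive, non-overlapping window."""
    rates = []
    for start in range(0, len(y_true) - window + 1, window):
        errors = sum(
            t != p
            for t, p in zip(y_true[start:start + window],
                            y_pred[start:start + window])
        )
        rates.append(errors / window)
    return rates


def first_drift_window(rates, tolerance=0.2):
    """Index of the first window whose error rate exceeds the first
    window's rate by more than `tolerance`, or None if there is none."""
    reference = rates[0]
    for i, rate in enumerate(rates[1:], start=1):
        if rate - reference > tolerance:
            return i
    return None


# Toy stream: the model always predicts 1, and the true label flips to 0
# halfway through, mimicking a sudden change in the relation between X and y.
y_pred = [1] * 40
y_true = [1] * 20 + [0] * 20

rates = windowed_error_rates(y_true, y_pred, window=10)
print(rates)                      # [0.0, 0.0, 1.0, 1.0]
print(first_drift_window(rates))  # 2
```

Gradual, incremental, and recurring drift would need a different comparison (for example against a moving reference), which this sketch does not attempt.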

Prediction drift

Label drift


Training, serving skew

  • TensorFlow Data Validation

Thursday, June 9, 2022

Help 4 Other

 Google Sheets

  • IMPORTRANGE cannot import data from a connected sheet
    • Create a pivot table or an extract from the connected sheet first
    • Then use IMPORTRANGE on that data

Python
  • sqlalchemy.exc.ObjectNotExecutableError
    • Check the version
    • sqlalchemy==1.4.42
  • TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
    • Check the version
    • pandas==1.3.5
  • __init__() got multiple values for argument 'schema'
    • pip install sqlalchemy==1.4.46
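
The "check the version" step can be scripted with the standard library. This is a sketch, not a replacement for a proper requirements file: `parse_version`, `matches_pin`, and `PINS` are hypothetical names, and the comparison looks only at the numeric release parts.

```python
import re
from importlib import metadata


def parse_version(version):
    """Turn '1.4.46' into (1, 4, 46), keeping the leading digits of each part."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def matches_pin(installed, pinned):
    """True when the numeric release parts of both versions are equal."""
    return parse_version(installed) == parse_version(pinned)


# Pins taken from the notes above.
PINS = {"sqlalchemy": "1.4.46", "pandas": "1.3.5"}

for name, pinned in PINS.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed (pin is {pinned})")
        continue
    status = "ok" if matches_pin(installed, pinned) else "differs from pin"
    print(f"{name}: {installed} ({status}, pin is {pinned})")
```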

VS Code
  • (.venv) workspace-vs % python -V
    • zsh: command not found: python
  • Add "python.experiments.optOutFrom": ["pythonTerminalEnvVarActivation"] to settings.json

Monday, May 16, 2022

Help 4 GCP

Cloud Workflows

  • Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials
    • Option 1
      • http.post + googleapis.bigquery.v2.jobs.query
      • call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/queries"}
          headers:
            Content-type: "application/json"
          auth:
            type: OAuth2
            scope: ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
          body:
            query: select * from sheets.sheets_data
            timeoutMs: 200000
            useLegacySql: false
        result: response
      • Google Sheets > Share > Add the service account that runs the query > Viewer > Done
    • Option 2
      • Scheduled query + googleapis.bigquerydatatransfer.v1
      • call: googleapis.bigquerydatatransfer.v1.projects.locations.transferConfigs.startManualRuns
        args:
          parent: ${scheduled_query_name}
          body:
            requestedRunTime: ${time.format(sys.now())}
        result: response
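
For reference, the http.post step in Option 1 corresponds to a plain REST request against the BigQuery queries endpoint. The sketch below only builds that request with the standard library and does not send it. `build_query_request` and the project ID `my-project` are hypothetical, and obtaining an OAuth2 token with the listed scopes is left out.

```python
import json

# Scopes from the workflow step; the Drive scope is what lets the query
# read a Sheets-backed table.
SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/bigquery",
]


def build_query_request(project, query, timeout_ms=200000):
    """Return (url, headers, body) mirroring the http.post step in Option 1."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project}/queries"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(
        {"query": query, "timeoutMs": timeout_ms, "useLegacySql": False}
    )
    return url, headers, body


url, headers, body = build_query_request(
    "my-project",  # hypothetical project ID
    "select * from sheets.sheets_data",
)
print(url)
print(body)
```

A real call would also need an Authorization header carrying a token minted for `SCOPES`, and the service account behind that token still needs Viewer access to the sheet.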

BigQuery

  • Columns are named string_field_0, string_field_1, ... after creating an external table from a Google Sheets file
    • Add a numeric column if all columns are in text format, so the header row can be told apart from the data
  • Permission denied while getting Drive credentials
    • Google Sheets > Share > Add the service account that runs the query > Viewer > Done


Cloud Functions

  • Your client does not have permission to get URL
    • Grant the Cloud Functions Developer role
      • The change can take some time to take effect
  • failed to export: failed to write image to the following tags
    • Use "gcloud beta functions deploy" with "--docker-registry=artifact-registry"
  • Your client does not have permission to get URL /yourUrl from this server
    • auth:
      • type: OIDC
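
Outside Workflows, the same OIDC idea means sending an identity token whose audience is the function URL. The sketch below only builds the metadata-server request and the resulting header, so it runs anywhere, but the metadata server itself is reachable only from inside Google Cloud. The helper names and the function URL are hypothetical.

```python
from urllib.parse import urlencode

# Metadata server endpoint that issues identity tokens for the default
# service account of the running resource.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience):
    """Return (url, headers) for asking the metadata server for an ID token."""
    url = f"{METADATA_IDENTITY_URL}?{urlencode({'audience': audience})}"
    headers = {"Metadata-Flavor": "Google"}
    return url, headers


def bearer_header(id_token):
    """Authorization header to attach when calling the function."""
    return {"Authorization": f"Bearer {id_token}"}


# Hypothetical function URL, used as the token audience.
function_url = "https://us-central1-my-project.cloudfunctions.net/my-function"
token_url, token_headers = identity_token_request(function_url)
print(token_url)
print(bearer_header("<id-token>"))
```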

Cloud Run

  • DefaultCredentialsError: Neither metadata server or valid service account credentials are found
    • Run with a service account
    • Grant it the Cloud Run Invoker role
      • The change can take some time to take effect


Data Fusion

  • The client is not authorized to make this request
    • Add the related role to the Data Fusion service account
    • E.g., Add Cloud SQL Client to service-[project-number]@gcp-sa-datafusion.iam.gserviceaccount.com
  • MongoSocketException, UnknownHostException
    • Use all shard hosts
  • MongoSocketReadException, Prematurely reached end of stream
    • Add ssl=true
  • mongodb-plugins, Authentication failed
    • Add authSource=admin
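
The three MongoDB fixes above (all shard hosts, ssl=true, authSource=admin) can be read as parts of one connection string. The sketch below assembles such a string with the standard library. `mongo_uri`, the host names, and the credentials are hypothetical, and how the pieces map onto the Data Fusion plugin fields is not shown here.

```python
from urllib.parse import quote_plus, urlencode


def mongo_uri(hosts, user, password, database):
    """Build a mongodb:// URI that lists every host explicitly."""
    host_list = ",".join(hosts)
    # Options taken from the notes above.
    options = urlencode({"ssl": "true", "authSource": "admin"})
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host_list}/{database}?{options}"
    )


uri = mongo_uri(
    ["shard-00.example.net:27017", "shard-01.example.net:27017"],  # hypothetical hosts
    "reader",
    "p@ss",
    "shop",
)
print(uri)
```

Percent-encoding the user name and password matters when they contain characters such as `@` or `:`, which would otherwise be read as URI delimiters.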


Data Studio

  • interval 1 day, Invalid formula
    • Use interval 24 hour


Cloud Storage

  • Error getting access token from metadata server at
    • val hadoopConf = spark.sparkContext.hadoopConfiguration
    • hadoopConf.set("google.cloud.auth.service.account.enable", "true")
    • hadoopConf.set("google.cloud.auth.service.account.json.keyfile", "yourKey.json")
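
The same two settings can also be passed as Spark properties, since Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration. The sketch below only builds that property map. `gcs_keyfile_conf` is a hypothetical helper, and `yourKey.json` is the placeholder path from the notes.

```python
def gcs_keyfile_conf(keyfile_path):
    """Spark properties equivalent to the hadoopConf.set calls above."""
    hadoop_props = {
        "google.cloud.auth.service.account.enable": "true",
        "google.cloud.auth.service.account.json.keyfile": keyfile_path,
    }
    # The "spark.hadoop." prefix tells Spark to forward the property
    # to the Hadoop configuration.
    return {f"spark.hadoop.{key}": value for key, value in hadoop_props.items()}


# Print the properties as spark-submit style flags.
for key, value in gcs_keyfile_conf("yourKey.json").items():
    print(f"--conf {key}={value}")
```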