Tuesday, September 3, 2024

Data Architecture, On-premise

Main Only

- PB+ data

- 2016 version



Monday, September 2, 2024

Data Architecture, GCP, I

 Data, Main Only

- serverless, microservices, event-driven pipelines


BigQuery

- easy data for everyone

Wednesday, January 4, 2023

Tip 4 Others

Regexp

Find the shortest matches between two strings

(from ((?!from).)*? on)

Extract table names from SQL

(from|join)\s+(\S+)

Google Sheets

Add missing dates to the list of dates with values

=ARRAYFORMULA({TO_DATE(ROW( INDIRECT("A"&DATEVALUE(MIN(A:A))&":B"&DATEVALUE(MAX(A:A))))), IFERROR(VLOOKUP(TO_DATE(ROW( INDIRECT("A"&DATEVALUE(MIN(A:A))&":B"&DATEVALUE(MAX(A:A))))), A:B, 2, 0))})

Automatic timestamp when a cell on the same row gets updated

function onEdit(e) {

  var ss = SpreadsheetApp.getActiveSheet();

  var r = ss.getActiveCell();

  //1.Change 'Sheet1' to be matching your sheet name

  if (r.getColumn() > 1 && ss.getName()=='Sheet1') { // 2. If Edit is done in any column after Column (A)  And sheet name is Sheet1 then:

var celladdress ='A'+ r.getRowIndex() 

    ss.getRange(celladdress).setValue(new Date()).setNumberFormat("yyyy-MM-dd hh:mm:ss");

  }

};

Automatic sorting when a cell gets updated

function onEdit(event){

  var sheet = event.source.getActiveSheet();

  var editedCell = sheet.getActiveCell();


  var columnToSortBy = 4;

  var tableRange = "B3:E9";


  if(editedCell.getColumn() == columnToSortBy){   

    var range = sheet.getRange(tableRange);

    range.sort( { column : columnToSortBy } );

  }

}


Python

Connect to BigQuery without credential files like JSON

from pydata_google_auth import get_user_credentials
scopes = ["https://www.googleapis.com/auth/bigquery"]
credentials = get_user_credentials(scopes)
//  notebooks running on servers or cloud resources
credentials = get_user_credentials(scopes, use_local_webserver=False)

Increase the density of x-ticks

test = sns.distplot(df.BattlePower_diff, 100, kde=False)

plt.xticks(rotation=90)

test.xaxis.set_major_locator(plt.MaxNLocator(200))


MongoDB

filter between two dates with time zone

{

    $or:[

        {createdAt: {$gte: ISODate("2023-07-26T00:00:00.000+09:00"), $lt: ISODate("2023-07-27T00:00:00.000+09:00")}},

        {updatedAt: {$gte: ISODate("2023-07-26T00:00:00.000+09:00"), $lt: ISODate("2023-07-27T00:00:00.000+09:00")}},

    ]

}


{

    $or:[

        {createdAt: {$gte: ISODate("2023-07-25T15:00:00.000+00:00"), $lt: ISODate("2023-07-26T15:00:00.000+00:00")}},

        {updatedAt: {$gte: ISODate("2023-07-25T15:00:00.000+00:00"), $lt: ISODate("2023-07-26T15:00:00.000+00:00")}},

    ]

}

Monday, December 5, 2022

Dataproc Alert

 Monitoring, Alerting 2 Slack

  • EDIT NOTIFICATION CHANNELS
  • Slack > ADD NEW > ALLOW
  • Slack Channel Name > Cloud Alerting Display Name > SEND TEST NOTIFICATION
    • If a private channel is specified, Invite Google Cloud Monitoring to the channel
  • Save
  • Up to Alerting overview page
  • CREATE POLICY > SELECT A METRIC
  • Cloud Dataproc Cluster > N metrics > Cluster > N metrics > Failed jobs > Apply
  • ADD FILTER > error_type = DATAPROC_SERVER_ERROR > DONE
  • Rolling window function, delta > NEXT
  • Threshold value, 0 > NEXT
  • Notification Channels > Slack, the channel name > OK
  • Alert policy name > NEXT
  • CREATE POLICY

Eventarc, Workflows 2 Slack
  • Eventarc, google.cloud.dataproc.v1.ClusterController.DeleteCluster triggers a Workflow
  • Workflows, check job status > send a slack message if it equals ERROR
  • Sample
  • main:
        params: [input]
        steps:
        - assignArgs:
                assign:
                - projectID: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                - response: ${input}
                - jobId: ${"step-spark-postgresql-2-bigquery-" + text.substring(response.data.resource.labels.cluster_name, 36, 49)}
        - checkOperation:
                switch:
                    - condition: ${"first" in response.data.operation}
                      next: getJob
                next: returnOutput
        - getJob:
                call: http.get
                args:
                    auth:
                        type: OAuth2
                        scopes: https://www.googleapis.com/auth/cloud-platform
                    url: ${"https://dataproc.googleapis.com/v1/projects/" + projectID + "/regions/asia-northeast1/jobs/" + jobId}
                result: response
        - checkStatus:
                switch:
                    - condition: ${response.body.status.state == "ERROR"}
                      next: sendSlack
                next: returnOutput
        - sendSlack:
                call: http.post
                args:
                    headers: {Content-type: application/json, Authorization: Bearer xoxb-your-token}
                    url: https://slack.com/api/chat.postMessage
                    body: { "channel": "your-channel-id", "text": '${"https://console.cloud.google.com/dataproc/jobs/" + jobId + "?region=asia-northeast1&project=" + projectID}'}
        - returnOutput:
                return: ${response}

Saturday, July 2, 2022

AI Architecture, GCP

 


Data Architecture, GCP

Data 



BigQuery



App Architecture, GCP

App


App, headless



App, headless, solution



GCP

DNS
domain name system
CDN
content delivery network
Armor
defense against web and DDoS attacks
Apigee Sense
behavior detection to protect APIs
reCAPTCHA Enterprise
protect your website from fraudulent activity, spam, and abuse
VPC
virtual network for google cloud resources
NAT
giving private instances internet access
Load Balancing
distributing traffic
global, regional
Run
running containerized apps
serverlessauto scalingregional
App Engine
apps and backends
serverlessauto scalingregional
GKE
running containerized apps
auto scalingHAregional
Functions
creating functions that respond to cloud events
serverlessauto scalingregional
SQL
MySQL, PostgreSQL, SQL server
scalableHAregional
Spanner
cloud-native relational database
auto shardingHAmulti region
Firestore
cloud-native document database
serverlessauto scalingmulti region
Memorystore
managed Redis and Memcached
scalableHAregional
pub/sub
event ingestion and delivery
Storageobject storage
API Gateway
develop, deploy, secure, and manage APIs
Endpoints
Apigee
API management, development, and security platform
Security command center
a platform for defending against threats to your google cloud assets
Operations suite
Monitoring
infrastructure and application health
Logging
audit, platform, and application logs management
Error Reporting
exception monitoring and alerting
Debugger
app state inspection and in-production debugging
Trace
collecting latency data from an app
Profiler
app performance
Firebase
app development platform
Discovery solutions for Retail
search and recommendation
commercetools
Elastic Path
x2bee
CI/CD


솔루션 후보

commercetools
100% cloud-native, 100% API-first and 100% global
HeadlessGCP
Elastic Path
The enterprise cloud infrastructure, 99.99% uptime
HeadlessAWS
x2beeHeadless국내

글로벌 커머스 솔루션 경쟁력