데이터 분석가:Applied Data Analytics

Main Quest06_Project_Tableau 태블로 시각화 6-1

뇌건강과 데이터분석가 2025. 3. 30. 20:34

피그마 링크 (암호: aiffel) https://www.figma.com/design/kr6NrmqIG8QTAdVHSYTt6c/Untitled?node-id=0-1&t=iWyFTulKIA4HY72R-1 코랩 링크

공동작업을 위해서는 구글빅쿼리로 작업을 진행하고 데이터마트를 만들어서 서로 공유
장점 : 공동작업과 중간에 업데이트는 쿼리코드만 공유하면 쉽게 데이터나 뷰가 생성된다.
     - 주의 : 사전에 이런정보를 공유하지 않으면 개인의 쿼리코드작업후 저장안된건 공유불가능 빅쿼리는 저장안한 코드는 삭제됨.
         보안책 : 이럴경우 빅쿼리에서 본인이 업데이트한 뷰나 테이블을 CSV로 저장(빅쿼리 에서 저장할 수 있음) 한 후 공유하면됨

이때 아래 처럼 빅쿼리 공유를 구글이메일로 권한을 줘야 액세스 거부를 사전에 통과할 수 있다.
- 아래 -
Google BigQuery 서비스에서 이 요청에 대한 액세스를 거부했습니다. Access Denied: Table aiffel-454815:ecommerce_mart2.v_event_log: Permission bigquery.tables.get denied on table aiffel-454815:ecommerce_mart2.v_event_log (or it may not exist). Google BigQuery 서버 ''에 연결할 수 없습니다. 서버가 실행 중이며 요청된 데이터베이스에 액세스할 수 있는 권한이 있는지 확인하십시오.

작업시 코드 변경 공유 내용 : 


물류팀_A군 :  v_logistics_dashboard 컬럼 추가 및 수정 → v_logistics_dashboard_4_view로 수정

  • v_logistics_dashboardorder
order_item_id order_id product_id user_id inventory_created_at created_at order_item_shipped_at order_item_delivered_at sold_at order_item_returned_at distribution_center_id distribution_center_name latitude longitude city state order_item_status source_type  
1 null null 13844 null 2020-08-16 18:33:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
2 null null 13844 null 2024-07-06 01:43:30 UTC null null null 2024-08-11 11:25:30 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
3 null null 13844 null 2025-01-25 02:17:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
4 null null 13844 null 2021-01-13 04:48:29 UTC null null null 2021-01-14 23:52:29 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
5 null null 13844 null 2025-02-03 15:42:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
6 null null 13844 null 2020-06-02 17:40:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
7 null null 13844 null 2022-10-30 02:53:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
8 null null 13844 null 2022-04-04 09:06:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
9 null null 13844 null 2021-06-27 16:24:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
10 null null 13844 null 2023-04-15 04:39:30 UTC null null null 2023-05-23 00:31:30 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
11 null null 13844 null 2024-05-14 16:24:34 UTC null null null 2024-05-25 12:23:34 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
12 null null 13844 null 2024-06-22 16:05:19 UTC null null null 2024-06-30 04:37:19 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
13 null null 13844 null 2024-09-12 16:54:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
14 null null 13844 null 2024-09-25 09:49:04 UTC null null null 2024-11-18 07:02:04 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
15 null null 13844 null 2023-06-03 02:41:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
16 null null 13844 null 2023-05-06 02:54:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
17 null null 14086 null 2023-10-21 11:54:30 UTC null null null 2023-11-26 01:39:30 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
18 null null 14086 null 2023-11-03 03:09:33 UTC null null null 2023-12-21 12:23:33 UTC null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
19 null null 14086 null 2021-04-25 01:51:00 UTC null null null null null 7 Philadelphia PA 39.95 -75.1667 null null null inventory
  • v_logistics_dashboard_4_view
  • 더보기
order_item_id order_id product_id user_id created_at product_distribution_center_id distribution_center_name latitude longitude city state country name total_orders delayed_orders returned_orders delay_rate return_rate delay_to_return_rate on_time_rate spi_score shipped_at delivered_at returned_at delivery_days delayed
1 33160 22948 17788 18307 2022-08-06 10:54:30 UTC 4 Los Angeles CA 34.05 -118.25 Zaragoza Aragón Spain Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
2 5808 4006 11059 3160 2025-01-30 09:12:24 UTC 4 Los Angeles CA 34.05 -118.25 Riverside California United States Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
3 60229 41535 4492 33264 2019-11-24 22:33:23 UTC 4 Los Angeles CA 34.05 -118.25 Titusville Florida United States Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2019-11-25 10:09:00 UTC null null null null
4 83763 57727 22121 46176 2023-11-10 06:44:34 UTC 4 Los Angeles CA 34.05 -118.25 null Gyeongsangnam-do South Korea Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
5 133342 91994 18075 73699 2022-06-22 10:11:31 UTC 4 Los Angeles CA 34.05 -118.25 Tongliao Hebei China Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2022-06-23 18:36:00 UTC null null null null
6 99745 68729 6494 54881 2024-01-20 15:37:09 UTC 4 Los Angeles CA 34.05 -118.25 Shan Tou Hubei China Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
7 88444 60957 27380 48791 2024-05-11 09:39:26 UTC 4 Los Angeles CA 34.05 -118.25 Châtellerault Nouvelle-Aquitaine France Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
8 92899 64048 23615 51226 2024-12-21 06:11:46 UTC 4 Los Angeles CA 34.05 -118.25 Annandale Virginia United States Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
9 101490 69967 23584 55892 2024-11-19 07:29:53 UTC 4 Los Angeles CA 34.05 -118.25 Maceió Alagoas Brasil Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2024-11-21 13:31:00 UTC null null null null
10 55378 38231 25966 30651 2023-06-26 09:11:12 UTC 4 Los Angeles CA 34.05 -118.25 Shanghai Jiangsu China Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
11 65077 44868 9089 35991 2020-08-26 06:05:09 UTC 4 Los Angeles CA 34.05 -118.25 Sydney New South Wales Australia Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
12 20002 13771 27413 10954 2023-02-15 16:10:49 UTC 4 Los Angeles CA 34.05 -118.25 Sin-le-Noble Hauts-de-France France Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
13 91232 62916 11425 50348 2025-03-14 16:41:56 UTC 4 Los Angeles CA 34.05 -118.25 Houston Texas United States Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2025-03-17 04:40:00 UTC null null null null
14 165001 113969 22596 91332 2025-02-15 11:11:37 UTC 4 Los Angeles CA 34.05 -118.25 Huaihua Beijing China Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
15 131786 90919 22158 72843 2025-03-23 01:38:30.087368 UTC 4 Los Angeles CA 34.05 -118.25 Busan Busan South Korea Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2025-03-21 11:41:18.087368 UTC null null null null
16 6272 4319 7944 3393 2023-11-26 13:04:14 UTC 4 Los Angeles CA 34.05 -118.25 Devizes England United Kingdom Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2023-11-26 03:28:00 UTC null null null null
17 179261 123800 22718 99205 2024-12-17 16:27:39 UTC 4 Los Angeles CA 34.05 -118.25 Sudbury England United Kingdom Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2024-12-16 15:09:00 UTC null null null null
18 74278 51151 2547 40958 2021-09-02 05:22:23 UTC 4 Los Angeles CA 34.05 -118.25 Lakeland Highlands Florida United States Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 null null null null null
19 122576 84576 10622 67785 2021-12-06 22:44:44 UTC 4 Los Angeles CA 34.05 -118.25 Chengdu Hubei China Los Angeles CA 17223 1848 1753 0.10729838007315799 0.10178250014515473 0.28625541125541126 0.892701619926842 0.89490797189804339 2021-12-06 04:32:00 UTC null null null null

 

  • 컬럼 info
  column_name example_value inferred_type
1 order_item_id 33160 int64
2 order_id 22948 int64
3 product_id 17788 int64
4 user_id 18307 int64
5 created_at 2022-08-06 10:54:30 UTC str
6 product_distribution_center_id 4 int64
7 distribution_center_name Los Angeles CA str
8 latitude 34.05 float64
9 longitude -118.25 float64
10 city Zaragoza str
11 state Arag처n str
12 country Spain str
13 name Los Angeles CA str
14 total_orders 17223 int64
15 delayed_orders 1848 int64
16 returned_orders 1753 int64
17 delay_rate 0.10729838 float64
18 return_rate 0.1017825 float64
19 delay_to_return_rate 0.286255411 float64
20 on_time_rate 0.89270162 float64
21 spi_score 0.894907972 float64
22 shipped_at   float
23 delivered_at   float
24 returned_at   float
25 delivery_days   float64
26 delayed   float

고객팀_B양 :  ‘v_user_rfm_segmentation` 뷰 생성 (고객별로 세그멘테이션)

  • 뷰 생성 코드

CREATE OR REPLACE VIEW `aiffel-454815.ecommerce_mart2.v_user_rfm_segmentation` AS

WITH base AS (
  SELECT
    user_id,
    MIN(DATE(order_created_at)) AS first_order_date,
    MAX(DATE(order_created_at)) AS last_order_date,
    COUNT(DISTINCT order_id) AS frequency,
    SUM(retail_price * num_of_item) AS total_revenue,
    AVG(retail_price * num_of_item) AS aov
  FROM `aiffel-454815.ecommerce_mart2.v_marketing_orders`
  WHERE status != 'Cancelled'
  GROUP BY user_id
),

rfm_step1 AS (
  SELECT
    *,
    DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) AS recency,
    DATE_DIFF(last_order_date, first_order_date, WEEK) + 1 AS active_weeks
  FROM base
),

rfm_step2 AS (
  SELECT
    *,
    frequency / NULLIF(active_weeks, 0) AS purchase_freq_per_week,
    total_revenue / NULLIF(frequency, 0) AS arpu
  FROM rfm_step1
),

rfm_calc AS (
  SELECT
    *,
    aov * purchase_freq_per_week * active_weeks AS raw_ltv,
    DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 AS is_churned,
    CASE
      WHEN DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 THEN 
        aov * purchase_freq_per_week * active_weeks * 0.5
      ELSE 
        aov * purchase_freq_per_week * active_weeks
    END AS ltv,
    NTILE(5) OVER (ORDER BY recency ASC) AS r_score,
    NTILE(5) OVER (ORDER BY frequency ASC) AS f_score,
    NTILE(5) OVER (ORDER BY total_revenue ASC) AS m_score
  FROM rfm_step2
),

clustered AS (
  SELECT *,
    r_score + f_score + m_score AS rfm_score,
    CASE
      WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 AND (r_score + f_score + m_score) >= 10 THEN 'VIP'
      WHEN f_score >= 4 THEN '충성고객'
      WHEN r_score <= 2 AND f_score <= 2 AND m_score <= 2 THEN '이탈위험고객'
      ELSE '일반고객'
    END AS cluster
  FROM rfm_calc
)

SELECT
  user_id,
  recency,
  frequency,
  total_revenue,
  aov,
  arpu,
  purchase_freq_per_week,
  active_weeks,
  raw_ltv,
  is_churned,
  ltv,
  r_score,
  f_score,
  m_score,
  rfm_score,
  cluster
FROM clustered


  • (0328) 마트뷰 변경
  • 고객팀_세그먼트 View

    CREATE OR REPLACE VIEW ecommerce_mart2.v_user_rfm_segmentation AS

    -- [1] 유저별 기본 구매 정보 집계 (취소 제외)
    WITH base AS (
      SELECT
        user_id,
        MIN(DATE(order_created_at)) AS first_order_date,
        MAX(DATE(order_created_at)) AS last_order_date,
        COUNT(DISTINCT order_id) AS frequency,
        SUM(sale_price * num_of_item) AS total_revenue,
        AVG(sale_price * num_of_item) AS aov
      FROM ecommerce_mart2.v_marketing_orders
      WHERE status != 'Cancelled'
      GROUP BY user_id
    ),

    -- [2] Recency, Active Weeks 계산
    rfm_step1 AS (
      SELECT
        *,
        DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) AS recency,
        DATE_DIFF(last_order_date, first_order_date, WEEK) + 1 AS active_weeks
      FROM base
    ),

    -- [3] 주당 구매 빈도, ARPU, LTV 계산
    rfm_step2 AS (
      SELECT
        *,
        frequency / NULLIF(active_weeks, 0) AS purchase_freq_per_week,
        total_revenue / NULLIF(frequency, 0) AS arpu
      FROM rfm_step1
    ),

    -- [4] LTV 계산 + 이탈 여부 반영 (60일 이상 미접속 시 LTV 50% 감소)
    rfm_ltv AS (
      SELECT
        *,
        aov * purchase_freq_per_week * active_weeks AS raw_ltv,
        DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 AS is_churned,
        CASE
          WHEN DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 THEN 
            aov * purchase_freq_per_week * active_weeks * 0.5
          ELSE 
            aov * purchase_freq_per_week * active_weeks
        END AS ltv
      FROM rfm_step2
    ),

    -- [5] RFM 점수 부여 (NTILE 사용 시 user_id 중복 방지를 위해 여기서 별도 처리)
    rfm_scored AS (
      SELECT
        *,
        NTILE(5) OVER (ORDER BY recency ASC) AS r_score,
        NTILE(5) OVER (ORDER BY frequency ASC) AS f_score,
        NTILE(5) OVER (ORDER BY total_revenue ASC) AS m_score
      FROM rfm_ltv
    ),

    -- [6] 클러스터 정의
    clustered AS (
      SELECT *,
        r_score + f_score + m_score AS rfm_score,
        CASE
          WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 AND (r_score + f_score + m_score) >= 10 THEN 'VIP'
          WHEN f_score >= 4 THEN '충성고객'
          WHEN r_score <= 2 AND f_score <= 2 AND m_score <= 2 THEN '이탈위험고객'
          ELSE '일반고객'
        END AS cluster
      FROM rfm_scored
    )

    -- [7] 최종 출력
    SELECT
      user_id,
      first_order_date,
      last_order_date,
      recency,
      frequency AS orders,
      total_revenue,
      aov,
      arpu,
      purchase_freq_per_week,
      active_weeks,
      raw_ltv,
      is_churned,
      ltv,
      r_score,
      f_score,
      m_score,
      rfm_score,
      cluster
    FROM clustered;
    + f_score + m_score) >= 10 THEN 'VIP' WHEN f_score >= 4 THEN '충성고객' WHEN r_score <= 2 AND f_score <= 2 AND m_score <= 2 THEN '이탈위험고객' ELSE '일반고객' END AS cluster FROM rfm_scored ) -- [7] 최종 출력 SELECT user_id, first_order_date, last_order_date, recency, frequency AS orders, total_revenue, aov, arpu, purchase_freq_per_week, active_weeks, raw_ltv, is_churned, ltv, r_score, f_score, m_score, rfm_score, cluster FROM clustered;

마트 뷰 수정코드 
-- 1. 전사 공통 데이터 마트 (Fact + Dim 테이블 구성)
-- 가정: 데이터셋들은 BigQuery에 업로드되어 있고, dataset 이름은 `ecommerce_raw` 라고 가정

-- Fact Table: fact_order_item
CREATE OR REPLACE TABLE ecommerce_mart2.fact_order_item AS
SELECT 
  oi.id AS order_item_id,
  oi.product_id,
  oi.order_id,
  o.user_id,
  -- p.cost,
  -- p.category,
  -- p.name AS product_name,
  -- p.brand,
  -- p.retail_price,
  -- p.department,
  -- p.distribution_center_id,
  o.created_at AS order_created_at,
  oi.created_at AS order_item_created_at,
  oi.status,
  oi.shipped_at,
  oi.sale_price,
  oi.delivered_at AS order_item_delivered_at,
  o.delivered_at AS order_deliverd_at, 
  oi.returned_at,
  o.num_of_item
FROM ecommerce_raw.order_items oi
JOIN ecommerce_raw.orders o ON oi.order_id = o.order_id;
-- JOIN ecommerce_raw.product p ON oi.product_id = p.id;

--Fact Table: Inventory_item (재고 분석용)
CREATE OR REPLACE TABLE ecommerce_mart2.fact_inventory_item AS
SELECT
  ii.id AS inventory_item_id,
  ii.product_id,
  ii.created_at AS inventory_created_at,
  ii.sold_at,
  ii.cost,
  ii.product_category AS category,
  ii.product_name,
  ii.product_brand AS brand,
  ii.product_retail_price AS retail_price,
  ii.product_department AS department,
  ii.product_sku AS sku,
  ii.product_distribution_center_id AS distribution_center_id
FROM ecommerce_raw.inventory_items ii;

-- Fact Table: fact_event_log (마케팅팀 세션 분석용)
CREATE OR REPLACE TABLE ecommerce_mart2.fact_event_log AS
SELECT
  id AS event_id,
  user_id,
  session_id,
  created_at AS event_created_at,
  event_type,
  traffic_source,
  browser,
  city,
  state,
  postal_code,
  uri,
  sequence_number
FROM ecommerce_raw.events;

-- Dim Table: dim_user
CREATE OR REPLACE TABLE ecommerce_mart2.dim_user AS
SELECT 
  id AS user_id,
  first_name,
  last_name,
  email,
  age,
  gender,
  city,
  state,
  postal_code,
  country,
  traffic_source,
  created_at AS user_created_at
FROM ecommerce_raw.users;

-- Dim Table: dim_product
CREATE OR REPLACE TABLE ecommerce_mart2.dim_product AS
SELECT 
  id AS product_id,
  name,
  category,
  brand,
  department,
  sku,
  retail_price,
  distribution_center_id
FROM ecommerce_raw.products;

-- Dim Table: dim_distribution_center
CREATE OR REPLACE TABLE ecommerce_mart2.dim_distribution_center AS
SELECT 
  id AS distribution_center_id,
  name,
  latitude,
  longitude
FROM ecommerce_raw.distribution_centers;


-- 2. 팀별 마트용 View 생성 (검사)

-- 운영팀 View
CREATE OR REPLACE VIEW ecommerce_mart2.v_operations_dashboard AS
SELECT 
  f.*,
  u.city,
  u.state,
  p.category,
  p.brand,
  p.name AS product_name,
  p.retail_price AS product_retail_price,
  p.distribution_center_id
FROM ecommerce_mart2.fact_order_item f
LEFT JOIN ecommerce_mart2.dim_user u ON f.user_id = u.user_id
LEFT JOIN ecommerce_mart2.dim_product p ON f.product_id = p.product_id;

-- 마케팅팀 View 1: 주문/고객 중심
CREATE OR REPLACE VIEW ecommerce_mart2.v_marketing_orders AS
SELECT 
  f.user_id,
  f.order_id,
  f.order_created_at,
  u.traffic_source,
  u.gender,
  u.age,
  u.city,
  u.state,
  f.status,
  f.product_id,
  f.returned_at,
  f.num_of_item,
  p.retail_price,
  f.sale_price
FROM `ecommerce_mart.dim_product` p
RIGHT JOIN ecommerce_mart2.fact_order_item f ON p.product_id = f.product_id
LEFT JOIN ecommerce_mart2.dim_user u ON f.user_id = u.user_id
 ;

-- 마케팅팀 View 2: 세션/이벤트 로그 중심
CREATE OR REPLACE VIEW ecommerce_mart2.v_event_log AS
SELECT
  event_id,
  user_id,
  session_id,
  event_created_at,
  event_type,
  traffic_source,
  browser,
  city,
  state,
  postal_code,
  uri,
  sequence_number
FROM ecommerce_mart2.fact_event_log;

-- 판매팀 View
CREATE OR REPLACE VIEW ecommerce_mart2.v_sales_dashboard AS
SELECT 
  f.product_id,
  p.name AS product_name,
  p.category,
  p.brand,
  p.retail_price,
  f.status,
  f.returned_at,
  f.order_item_created_at,
  f.order_item_delivered_at,
  f.num_of_item,
  f.shipped_at AS order_item_shiped_at,
  f.sale_price
FROM ecommerce_mart2.fact_order_item f
LEFT JOIN ecommerce_mart2.dim_product p ON f.product_id = p.product_id;

-- 물류팀 View
CREATE OR REPLACE VIEW ecommerce_mart2.v_logistics_dashboard AS

-- 주문 기반 데이터
SELECT 
  f.order_item_id,
  f.order_id,
  f.product_id,
  f.user_id,
  f.order_item_created_at AS created_at,
  f.shipped_at,
  f.order_item_delivered_at AS delivered_at,
  f.returned_at,
  dp.distribution_center_id,
  dc.name AS distribution_center_name,
  dc.latitude,
  dc.longitude,
  u.city,
  u.state,
  f.status,
  'order' AS source_type  -- 주문 데이터라는 표시
FROM ecommerce_mart2.fact_order_item f
LEFT JOIN ecommerce_mart2.dim_user u ON f.user_id = u.user_id
LEFT JOIN ecommerce_mart2.dim_product dp ON f.product_id = dp.product_id
LEFT JOIN ecommerce_mart2.dim_distribution_center dc ON dp.distribution_center_id = dc.distribution_center_id

UNION ALL

-- 재고 기반 데이터
SELECT
  NULL AS order_item_id,
  NULL AS order_id,
  i.product_id,
  NULL AS user_id,
  i.inventory_created_at AS created_at,
  NULL AS shipped_at,
  i.sold_at AS delivered_at,  -- 실제 판매 시점이 배송 완료 시점과 유사
  NULL AS returned_at,
  i.distribution_center_id,
  dc.name AS distribution_center_name,
  dc.latitude,
  dc.longitude,
  NULL AS city,
  NULL AS state,
  NULL AS status,
  'inventory' AS source_type  -- 재고 데이터라는 표시
FROM ecommerce_mart2.fact_inventory_item i
LEFT JOIN ecommerce_mart2.dim_distribution_center dc ON i.distribution_center_id = dc.distribution_center_id;

-- 고객팀_세그먼트 View

CREATE OR REPLACE VIEW ecommerce_mart2.v_user_rfm_segmentation AS

-- [1] 유저별 기본 구매 정보 집계 (취소 제외)
WITH base AS (
  SELECT
    user_id,
    MIN(DATE(order_created_at)) AS first_order_date,
    MAX(DATE(order_created_at)) AS last_order_date,
    COUNT(DISTINCT order_id) AS frequency,
    SUM(sale_price * num_of_item) AS total_revenue,
    AVG(sale_price * num_of_item) AS aov
  FROM `aiffel-454815.ecommerce_mart2.v_marketing_orders`
  WHERE status != 'Cancelled'
  GROUP BY user_id
),

-- [2] Recency, Active Weeks 계산
rfm_step1 AS (
  SELECT
    *,
    DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) AS recency,
    DATE_DIFF(last_order_date, first_order_date, WEEK) + 1 AS active_weeks
  FROM base
),

-- [3] 주당 구매 빈도, ARPU, LTV 계산
rfm_step2 AS (
  SELECT
    *,
    frequency / NULLIF(active_weeks, 0) AS purchase_freq_per_week,
    total_revenue / NULLIF(frequency, 0) AS arpu
  FROM rfm_step1
),

-- [4] LTV 계산 + 이탈 여부 반영 (60일 이상 미접속 시 LTV 50% 감소)
rfm_ltv AS (
  SELECT
    *,
    aov * purchase_freq_per_week * active_weeks AS raw_ltv,
    DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 AS is_churned,
    CASE
      WHEN DATE_DIFF(DATE('2025-03-27'), last_order_date, DAY) >= 60 THEN 
        aov * purchase_freq_per_week * active_weeks * 0.5
      ELSE 
        aov * purchase_freq_per_week * active_weeks
    END AS ltv
  FROM rfm_step2
),

-- [5] RFM 점수 부여 (NTILE 사용 시 user_id 중복 방지를 위해 여기서 별도 처리)
rfm_scored AS (
  SELECT
    *,
    NTILE(5) OVER (ORDER BY recency ASC) AS r_score,
    NTILE(5) OVER (ORDER BY frequency ASC) AS f_score,
    NTILE(5) OVER (ORDER BY total_revenue ASC) AS m_score
  FROM rfm_ltv
),

-- [6] 클러스터 정의
clustered AS (
  SELECT *,
    r_score + f_score + m_score AS rfm_score,
    CASE
      WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 AND (r_score + f_score + m_score) >= 10 THEN 'VIP'
      WHEN f_score >= 4 THEN '충성고객'
      WHEN r_score <= 2 AND f_score <= 2 AND m_score <= 2 THEN '이탈위험고객'
      ELSE '일반고객'
    END AS cluster
  FROM rfm_scored
)

-- [7] 최종 출력
SELECT
  user_id,
  last_order_date,
  recency,
  frequency AS orders,
  total_revenue,
  aov,
  arpu,
  purchase_freq_per_week,
  active_weeks,
  raw_ltv,
  is_churned,
  ltv,
  r_score,
  f_score,
  m_score,
  rfm_score,
  cluster
FROM clustered;


세일즈&환불팀_C군

  • 세일즈 뷰 변경
      • 판매팀 View CREATE OR REPLACE VIEW ecommerce_mart2.v_sales_dashboard AS SELECT f.product_id, p.name AS product_name, p.category, f.order_id, f.order_created_at, p.brand, u.city, u.state, ii.cost, u.traffic_source, u.age, p.retail_price, f.status, f.returned_at, f.order_item_created_at AS sales_date, f.order_item_delivered_at, f.num_of_item, f.shipped_at AS order_item_shiped_at, f.sale_price FROM ecommerce_mart2.fact_order_item f LEFT JOIN ecommerce_mart2.dim_product p ON f.product_id = p.product_id LEFT JOIN ecommerce_mart2.dim_user u ON f.user_id = u.user_id LEFT JOIN ecommerce_mart2.fact_inventory_item ii ON f.product_id = ii.product_id;

        •  

초기 검토 데이터후보

Airbnb Open Data (이탈리아 피렌체) 2안 선택안함

태블로에서 1순위기준 잼있는거 2순위 인터랙티브하고 동적인 대시보드 구현에 최적화된 데이터 목적 : 실제로 여행을 간다는가정하에 필요한 항목이나 정보를 시각화(태블로 시각화) Airbnb Open Data (이탈리아 피렌체) 대시보드 구조

시계열, 지리적, 가격 기반 분석을 포함한 인터랙티브한 대시보드 구현이 가능. 단, 속도와 데이터 정제 측면에서 사전 전처리와 설계가 중요

1. 장점 (Tableau 대시보드 최적화 관점)

데이터 연계성 높음

listings.csv: 숙소 고유 정보

calendar.csv.gz: 날짜별 예약 가능성과 가격

reviews.csv.gz: 리뷰 내용 및 날짜

neighbourhoods.csv: 지리적 구분 → 이들은 listing_id나 neighbourhood 기준으로 서로 연계할 수 있어 분석 단위 통합이 가능함

시계열 분석 가능

calendar.csv: 일자별 가격, 예약 가능 여부 제공 → 가격 추이, 시즌별 분석 가능

reviews.csv: 리뷰 날짜 기반 → 사용자 경험 변화 분석

지리적 시각화 가능

neighbourhoods.csv 및 .geojson → 지도 시각화 지원

listings.csv에는 위도/경도 포함 → Tableau Map View 완벽 지원

사용자 분석이 가능

리뷰 텍스트, 리뷰 개수, 평점 등 고객 반응 기반의 분석 구현 가능

슈퍼호스트, 응답속도 등 호스트 특성도 분석 가능

2. 단점 및 주의할 점

데이터 양이 많아 Tableau 속도 저하 가능성

특히 calendar.csv는 listing × 날짜 수 만큼 레코드 존재 → 데이터 추출(extract) 방식 추천

리뷰 내용은 자연어로 처리 어려움

리뷰 텍스트는 Tableau에서 직접 분석 어려움 → 사전 텍스트 전처리 필요

결측치 및 형식 불일치 가능성

일부 열에 결측값 많을 수 있음 (ex: 호스트 응답률, 편의시설)

날짜 포맷, 숫자 포맷이 locale에 따라 깨질 수 있음 → 사전 정제 필요

시간 범위 통일 필요

각 데이터셋의 기준 날짜가 상이할 수 있음 → 연도/월 필터는 사전 정의 필요

3. 추가 추천 파생변수 (계산 필드 또는 파생 필드)

Tableau에서 인터랙티브 분석 강화를 위해 파생변수 추가추천

price_cleaned (문자열에서 숫자 추출: $120 → 120)

is_available (calendar 데이터의 available → True/False 처리)

review_per_month (리뷰 수 / 운영 개월 수)

weekday (calendar의 날짜에서 요일 추출)

season (월에 따라 성수기/비수기 분류)

stay_category (숙박 일 수에 따라 단기/중기/장기 분류)

Airbnb Open Data 대시보드 구조

목적: 숙소 유형별 트렌드 분석 및 사용자 맞춤 숙소 필터링

주요 시각화 요소

  1. 숙소 개수 및 평균 가격 KPI
  2. 숙소 위치 맵 (지도 시각화)
    • 가격 범위에 따라 색상 또는 점 크기 조절
  3. 숙소 유형별 가격 분포 (박스플롯 또는 막대그래프)
    • Entire home, Private room 등
  4. 월별 리뷰 수 추이 (시계열)
  5. 호스트 상위 Top 10 및 숙소 수 (가로 막대)

인터랙티브 기능

  • 슬라이서: 가격 범위, 숙소 유형, 지역 선택
  • 지도 클릭 시 해당 위치의 상세 정보 필터링
  • 날짜 필터: 연도/월별 선택
  • 리뷰 수 기준 정렬 기능
  • https://insideairbnb.com/florence/
  • ▶ 이유 : 가격데이터에 대한 세분화가 안되어있어 수익, 지출, 등 회사입장에서 인사이트 도출에 한계가 있음

이렇게 빅쿼리에서 서로 데이터에 대한 기본 데이터는 아래로 정함.

Looker Ecommerce BigQuery Dataset 사


https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset

 

Looker Ecommerce BigQuery Dataset

CSV version of BigQuery Looker Ecommerce Dataset

www.kaggle.com

프로세스

데이터 선정 → EDA → 데이터 마트 구축 → 대시보드 설계 → 대시보드 구축