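Putting big data technology into practice depends on two capabilities: command of the toolchain and the ability to apply it to concrete scenarios. From Python web crawling and Hive data analysis to Flink real-time computing and data warehouse architecture design, the ability to combine these skills has become a core hiring criterion for employers. This book is built around training driven by real projects. It selects four representative training projects to form a stepwise training sequence that covers core scenarios such as offline processing, real-time computing, and data warehouse design, and that strengthens engineering thinking. It integrates mainstream tools including Python crawlers, Hive, Flink, and Kafka, covering the full pipeline of data collection, cleaning, storage, analysis, and visualization. It also incorporates topics examined in big data competitions and aligns with job skill requirements. The book is suitable as a practical training textbook for big data-related majors at colleges and universities, and can also serve as a practical reference for data engineering practitioners.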
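Zhang Zhiwei is an associate professor and director of the Software Engineering Teaching and Research Section at the School of Information Engineering, Suzhou University. He received his PhD in Computer Science and Technology from South China University of Technology. His research interests are data science and big data technology, and artificial intelligence. He has led multiple projects funded by the National Natural Science Foundation of China as well as provincial-level projects, and has written three books.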
Chapter 1 Historical Weather Data Analysis Project
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Historical Weather Data Collection
Task 4 Importing Weather Data into Hive
Task 5 Historical Weather Data Analysis
Task 6 Exporting the Result Metric Tables
Task 7 Data Visualization
Chapter 2 Music Recommendation System
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Dataset and Project Overview
Task 4 Data Loading Module
Task 5 Data Statistics Module
Task 6 Offline Recommendation Module
Task 7 Real-Time Recommendation Module
Chapter 3 E-commerce Offline Data Warehouse
Task 1 Requirements Analysis
Task 2 Data Warehouse Overview and Architecture Analysis
Task 3 Data Sources
Task 4 Data Warehouse Construction
Task 5 Workflow Scheduling
Task 6 Data Visualization
Chapter 4 Smart Community Real-Time Data Warehouse
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Data Sources and Preprocessing
Task 4 Real-Time Computing Framework Configuration
Task 5 Building the DIM Layer
Task 6 Building the ODS Layer
Task 7 Building the DWD Layer
Task 8 Building the DWS Layer
Task 9 Data Visualization and Application
Appendix A Hadoop Deployment and Configuration
Appendix B MySQL Deployment
Appendix C Hive Deployment and Configuration
Appendix D DataX Deployment and Configuration
Appendix E Zookeeper Deployment and Configuration
Appendix F Kafka Deployment and Configuration
Appendix G Flume Deployment and Configuration
Appendix H DolphinScheduler Deployment and Configuration
Appendix I Superset Deployment and Configuration