본 글은 학교수업(이민수 교수님의 '빅데이터응용')을 듣고 복습차 정리한 글입니다.
DataBase Essentials
- Database Application Type
Web(de-facto standard user interface): formatted as HTML + URL + HTTP
- Why?: enable users to access database from anywhere
- Dynamic generation of documents: tailor the display based on user information / up-to-date
- Client Side Scripting: allow documents to be active(=permit flexible interaction) / security problem
- Server-Side Scripting: connecting a database to the web / Caching can improve performance.
- HTTP/Sessions/Cookies(:Session Information)/Servlet
Database 순서
Data
- Storage and indexing
Data Model(Tables)
Query Lang.(SQL) > SQL Query
Query Processing and Optimization(Index, Join)
Query Parser > Query Tree(shows syntax)
Query Optimizer > Query Execution Plan(shows access methods, execution methods)
Query Executer > Query Result
Concurrency Control
Locking
Timestamp based control
Transaction & Recovery
Database 저장 (Storage and Indexing)
Database > (System/User/Temporary)Table space > Segments > Extent
Horizontal Partitioning:
- benefits(backup and recovery/Loading operations in data warehouse/ Query performance benefits)
- Range Partitioning / Hash Partitioning / Composite Partitioning (range+hash) / List Partitioning
Indexing
- B-tree indices / Bitmap Indices / Function-Base Indices / Join Indices / Domain Indices
기본 단어
- Instance: a shared memory area and set of processes that interact with the data files.
- Segments: Data Segments(table data) / Index Segments(index) / Temporary Segment(sorting) / Rollback segments(uncommitted transaction)
Query Processing and Optimization
Execution Methods > Query Tree
- Table Scans
- Joins
Optimization > Query Execution Plan
- Query Transformations: transforms query string into more efficiently executable query string (View Merging, Subquery Flattening, Materialized view rewrite, Star transformation)
- Access Path Selection: Cost-based Optimizer(lowest overall cost) that determines join order, join methods and access paths
Parallel Execution > Query Result
: dividing the work between multiple processes
Parallel Full table scan / Parallel Join / Parallel Sorts
Concurrency Control(동시에 여러개가 작동되게)
: enable multiple reads and writes from different transactions to coexist
- Locking: Row-level Locking, table lock
- Timestamp-based Control: SCN(System Change Number)
Transactions & Recovery
Transaction Property: Atomicity(all or nothing) / Consistency(다 함 or 다 취소) / Isolation(isolated) / Durability(업데이트되야)
Recovery: Redo(: Redo Log) / Undo(: Rollback Segment(처음 데이터로 돌아가기)) + Standby Database
Advanced SQL
- Ranking
- Windowing
Data Analysis
Multidimensional Data & Data WareHousing
- Fact table, Dimension tables : Measure Attributes, Dimension Attributes(Hierarchy) for Cross-tabulation > Data Cube
- Schema: Star Schema, Snowflake Schema
Data Mining: Prediction
- Classification: Decision Tree, Naïve Bayesian, Support Vector Machine, Neural Network
- Regression
- Descriptive Patterns: Associations, Clusters(Hierarchical Clustering)
'Undergraduate lectures' 카테고리의 다른 글
[Mathematics] 수치해석 정리 (0) | 2020.12.18 |
---|---|
[소프트웨어] 소프트웨어공학 정리 (0) | 2020.12.18 |
[Database] 전체 기본 개념 정리 (0) | 2020.07.18 |
[운영체제] 중간/기말고사 틀린 문제 정리 (0) | 2020.07.04 |
[컴퓨터구조] LINUX리눅스 명령어 정리 (0) | 2019.10.24 |