[데이터베이스] 빅데이터응용 정리
Undergraduate lectures

[데이터베이스] 빅데이터응용 정리

본 글은 학교수업(이민수 교수님의 '빅데이터응용')을 듣고 복습차 정리한 글입니다.

 

DataBase Essentials

- Database Application Type

Web(de-facto standard user interface): formatted as HTML + URL + HTTP

- Why?: enable users to access database from anywhere

- Dynamic generation of documents: tailor the display based on user information / up-to-date

- Client Side Scripting: allow documents to be active(=permit flexible interaction) / security problem

- Server-Side Scripting: connecting a database to the web / Caching can improve performance.

- HTTP/Sessions/Cookies(:Session Information)/Servlet

 

Database 순서

Data

- Storage and indexing

Data Model(Tables)

Query Lang.(SQL) > SQL Query

Query Processing and Optimization(Index, Join)

Query Parser > Query Tree(shows syntax)

Query Optimizer > Query Execution Plan(shows access methods, execution methods)

Query Executer > Query Result

Concurrency Control

Locking

Timestamp based control

Transaction & Recovery

 

Database 저장 (Storage and Indexing)

Database > (System/User/Temporary)Table space > Segments > Extent

Horizontal Partitioning: 

- benefits(backup and recovery/Loading operations in data warehouse/ Query performance benefits)

- Range Partitioning / Hash Partitioning / Composite Partitioning (range+hash) / List Partitioning

Indexing

- B-tree indices / Bitmap Indices / Function-Base Indices / Join Indices / Domain Indices

기본 단어

- Instance: a shared memory area and set of processes that interact with the data files.

- Segments: Data Segments(table data) / Index Segments(index) / Temporary Segment(sorting) / Rollback segments(uncommitted transaction)

 

Query Processing and Optimization

Execution Methods > Query Tree

- Table Scans

- Joins

Optimization > Query Execution Plan

- Query Transformations: transforms query string into more efficiently executable query string (View Merging, Subquery Flattening, Materialized view rewrite, Star transformation)

- Access Path Selection: Cost-based Optimizer(lowest overall cost) that determines join order, join methods and access paths

Parallel Execution > Query Result

: dividing the work between multiple processes

Parallel Full table scan / Parallel Join / Parallel Sorts

Concurrency Control(동시에 여러개가 작동되게)

: enable multiple reads and writes from different transactions to coexist

- Locking: Row-level Locking, table lock

- Timestamp-based Control: SCN(System Change Number)

 

Transactions & Recovery

Transaction Property: Atomicity(all or nothing) / Consistency(다 함 or 다 취소) / Isolation(isolated) / Durability(업데이트되야)

Recovery: Redo(: Redo Log) / Undo(: Rollback Segment(처음 데이터로 돌아가기)) + Standby Database

 

Advanced SQL

- Ranking

- Windowing

 

Data Analysis

Multidimensional Data & Data WareHousing

- Fact table, Dimension tables : Measure Attributes, Dimension Attributes(Hierarchy) for Cross-tabulation > Data Cube

- Schema: Star Schema, Snowflake Schema

Data Mining: Prediction

- Classification: Decision Tree, Naïve Bayesian, Support Vector Machine, Neural Network 

- Regression

- Descriptive Patterns: Associations, Clusters(Hierarchical Clustering)

 

 

코드 연습지.docx
0.02MB
Keyword 연습지.docx
0.02MB
BigDataApplication 기말.pdf
0.24MB