mac40 / BDC

Big Data Computing

Big Data Computing

Big Data phenomenon

Technological progress
- storage capacity
- communication bandwidth
- computing power
- Reduction of ICT costs
Digital Universe
- Integration of digital technologies in every human activity
- Scientific research (produces a lot of data)
- Exponential growth of data
Data can be either structured (database records) or unstructured (textual data)

Application Domains

The analysis of large datasets arises in:
- Retailing: product improvement, recommendation systems
- Banking/Finance: fraud detection...
- Telecommunications: user profiling
- Science: validation methods
- Medicine: diagnosis/therapy
- Social studies: IoT

The Four V's of Data

Volume
- size of data poses several computational challenges and requires a data-centric perspective
Velocity
- the data arrive at such a high rate that they cannot be stored and processed offline, but must be processed as a stream
Variety
- large datasets often come unstructured and may relate to very different scenarios
Veracity
- large datasets coming from real-world applications are likely to contain noisy, uncertain data

All points above require a paradigm shift with respect to traditional computing
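
As a small illustration of the Velocity point, the sketch below processes a stream in a single pass with constant memory, without ever storing the items. It is only a minimal example under stated assumptions: the names `RunningMean` and `stream_mean` are hypothetical and do not come from the course material, and the mean is just a stand-in for any statistic that can be updated incrementally.

```python
from typing import Iterable


class RunningMean:
    """Maintain the mean of a stream using O(1) memory (hypothetical helper)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> None:
        # Incremental update: the items themselves are never stored.
        self.count += 1
        self.mean += (value - self.mean) / self.count


def stream_mean(stream: Iterable[float]) -> float:
    """Consume the stream once and return its mean (0.0 for an empty stream)."""
    stats = RunningMean()
    for x in stream:
        stats.update(x)
    return stats.mean


if __name__ == "__main__":
    # A generator stands in for data arriving over time.
    readings = (float(v) for v in [2, 4, 6])
    print(stream_mean(readings))
```

The same pattern (a small state updated per item) applies whenever the input is too fast or too large to be stored first and analysed later.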

Course presentation

Main objectives

Novel computing/programming frameworks for big data processing: theory and practice
- Spark
A sample of key primitives for data analysis
- Rigorous setting (be able to analytically predict what's going to happen)
- Algorithmic solutions with focus on large inputs

Specific Content

Computational Frameworks: MapReduce, Apache Spark
Clustering primitives (Professor's focus)
Graph analysis primitives
Association analysis primitives (Data mining)
Data stream processing
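
To give a flavour of the MapReduce model listed above, here is a minimal sketch of one map-shuffle-reduce round for word counting. It runs sequentially in plain Python and is not Spark or Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for this illustration only.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values that share the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values of one key into a single count."""
    return (key, sum(values))


def word_count(documents: list[str]) -> dict[str, int]:
    """Run one map-shuffle-reduce round sequentially over the documents."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data computing", "big data phenomenon"]
    print(word_count(docs))
```

In a real framework the map and reduce functions would run in parallel on many machines and the shuffle would move data across the network; this sketch only shows the shape of the computation.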

Evaluation

Written exam (26 points)
Homeworks (6+1 points)
- groups of max 3/4 students
- 4 assignments, one every 2/3 weeks
- Use of Apache Spark on individual PCs (assignments 1-3) and CloudVeneto (assignment 4)

Online tools

Moodle: forum, evaluation of homeworks and of written exams
Uniweb: written exam lists, official final grades
Course website: http://www.dei.unipd.it/~capri/BDC/
