Data Analytics with HPC
Dates: 20-21 Jun 2018
Location: Queen's University Belfast
Please note: these materials are based on a previous run of the course and may be subject to change before the course begins, but they will give you an idea of the content to be covered.
Lecture Slides
Unless otherwise indicated all material is Copyright © EPCC, The University of Edinburgh, and is only made available for private study.
Day 1
- 09:00 – 09:30 Arrival/set-up/Welcome
- 09:30 – 10:30 What are data analytics, big data, data science
- 10:30 – 11:00 COFFEE
- 11:00 – 12:00 Data Cleaning
- 12:00 – 13:00 Practical: Data Cleaning notes and practical files
- 13:00 – 14:00 LUNCH
- 14:00 – 14:45 Supervised Learning, feature selection, trees, forests
- 14:45 – 15:30 Naïve Bayes
- 15:30 – 16:00 COFFEE
- 16:00 – 17:00 Naïve Bayes Practical
- 17:00 – CLOSE OF DAY
Day 2
- 09:00 – 10:30 MapReduce / Hadoop
- 10:30 – 11:00 COFFEE
- 11:00 – 11:30 Hadoop demonstrations
- 11:30 – 12:30 Unsupervised learning
- 12:30 – 13:30 LUNCH
- 13:30 – 14:15 Spark
- 14:15 – 14:45 Spark demonstration and examples
- 14:45 – 15:15 COFFEE
- 15:15 – 16:00 Spark, Data streaming
- 16:00 – CLOSE OF COURSE
Exercise Material
Unless otherwise indicated all material is Copyright © EPCC, The University of Edinburgh, and is only made available for private study.
- Data cleaning materials
- Naïve Bayes materials
- Hadoop materials
- Spark demo materials
- Spark k-means walkthrough (pdf)
- Spark k-means walkthrough (Jupyter notebook)
- Spark k-means walkthrough (Jupyter notebook hosted on github)
Course Chat page
The Chat page is a live collaborative online document which we will use to share links, information and comments. All course participants are encouraged to contribute.