
Introduction to Big Data Module

This 3-day DISCnet course aims to: explore how big data techniques can be used to solve massive-scale data analysis problems; introduce students to the theoretical background and practical applications of cloud computing; and teach students to process large datasets using big data techniques. MapReduce will be a key focus, along with other important techniques.

Learning Objectives:
- Understand the theoretical approaches to big data analysis and the design of modern big data processing pipelines.
- Design a big data processing system.
- Successfully analyse large datasets using Python and Spark.


Prerequisites:
Practicals will require programming in Python, as well as use of the UNIX command line / bash shell (e.g., skills learnt during the Software Carpentry course, or equivalent). Students do not need significant experience in Python itself, but some serious programming experience is required, as the course exercises involve writing big data analytics code. This course is not suitable for students who have no practical experience of writing code.

Students who are not confident in Python are expected to use the resources on Python to gain experience before the class. All students are expected to complete the pre-study exercise which looks at lambda expressions in Python. This should be done at least 2 weeks prior to the course to ensure sufficient time for your cloud server accounts to be created.
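As a rough indication of the kind of code the pre-study exercise touches on, the sketch below uses lambda expressions with Python's built-in map and functools.reduce in a map-reduce style. It is not taken from the course materials: the word list and the names words, pairs, add_pair, counts and total_chars are illustrative assumptions only.

```python
from functools import reduce

# Hypothetical sample input; real exercises would use much larger datasets.
words = ["big", "data", "big", "cloud", "data", "big"]

# Map step: a lambda turns each word into a (word, 1) pair.
pairs = list(map(lambda w: (w, 1), words))


def add_pair(counts, pair):
    """Reduce step helper: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# Reduce step: combine all pairs into a single dictionary of word counts.
counts = reduce(add_pair, pairs, {})

# A lambda can also serve directly as the reducing function,
# here summing the lengths of all words.
total_chars = reduce(lambda a, b: a + b, map(lambda w: len(w), words))

print(counts)
print(total_chars)
```

If this reads comfortably, you probably have enough Python for the practicals; if not, work through the Python resources first.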

Pre-course set-up and exercises:

Important Information:

The course is mandatory for DISCnet core students and also open to non-core DISCnet and GRADnet students. The course is aimed at students in Year 1 or 2 of their PhD. The course may also be suitable for students in later years, depending on their computing and programming experience.

 

You will need a laptop computer for the course. To run the virtual machine image, laptops should meet these minimum requirements: a 2-core, 2 GHz CPU, 8 GB RAM, and 30 GB of free disk space.
