Big Data Infrastructure
General information
Course instructor: Assist. Prof. Vedran Miletić
Name of the course: Big Data Infrastructure
Study programme: Graduate University Study Programme Informatics
Status of the course: compulsory for Intelligent and Interactive Systems (IIS) module, elective for other modules
Year of study: 1
ECTS credits and manner of instruction:
- ECTS credits: 6
- Number of class hours (L+E+S): 30+30+0
Course description
Course objectives
The aim of the course is to acquire knowledge about the infrastructure in the background of applications and services of intelligent information systems that work with big data, and to acquire skills in the implementation and maintenance of such infrastructure in the cloud.
Course enrolment requirements
There are no enrolment requirements.
Expected learning outcomes
After successfully fulfilling all the obligations stipulated in the programme, the student is expected to be able to:
O1. Choose distributed architectures for working with big data (e.g. lambda, kappa, delta, etc.) and appropriate tools for such architectures.
O2. Anticipate the cloud infrastructure needs of an intelligent information system, including connections to appropriate interfaces of data, information and knowledge repositories with associated metadata.
O3. Design a model of data management, coordination, message exchange, and interaction in an intelligent information system using appropriate methods and techniques (e.g. distributed databases, caching systems, messaging systems, data streaming systems, etc.), together with a corresponding distributed database model expressed in appropriate data modeling languages and taking into account the specifics of the system architecture.
O4. Recommend technologies for implementing the integration of data, information and knowledge from heterogeneous and distributed data systems that meet the requirements of the given problem.
O5. Choose an appropriate set of cloud technologies (e.g. monolithic and microservice architectures, containers, virtual machines, etc.) for the implementation of an intelligent information system.
O6. Develop intelligent cloud services based on data analytics and artificial intelligence, as well as related interfaces and appropriate documentation.
O7. Develop components of intelligent information systems and associated automated testing procedures using platforms, libraries, frameworks and cloud services as infrastructure.
O8. Implement an intelligent agent that solves the given problem using the specified interfaces, services, applications, interaction mechanisms, and types of behavior suitable for the problem, as well as an agent-based model of the system for simulating its behavior.
Course content
The content of the course consists of topics:
- Reliability, scalability, and maintainability of applications. Data models. Data storage and retrieval. Data encoding for storage and transmission.
- Data replication and partitioning. Transactions. Challenges of distributed systems: faults, unreliability, consistency guarantees, and consensus.
- Development and implementation of cloud-native applications. Cloud data operations. Portability between different clouds. The evolution of monolithic applications into microservices.
- Infrastructure and services for batch and stream data processing. Support services for intelligent information systems and agents.
- Technological trends and the future of large-scale data processing systems.
Manner of instruction
- lectures
- seminars and workshops
- exercises
- distance learning
- fieldwork
- individual assignments
- multimedia and network
- laboratories
- mentorship
- other
Comments
Teaching will combine classroom work with independent work outside the classroom, supported by an e-learning system.
Student responsibilities
Student responsibilities in the course are to:
- Attend classes regularly, participate in all course activities, and follow class-related notifications in the e-learning system.
- Take the continuous knowledge assessments (theoretical and practical colloquia) and pass them.
- Complete practical work (individual or team projects) on assigned topics and defend it.
- Take the final exam and score at least 50% on it.
A detailed scoring system for the course and passing scores for individual activities will be specified in the course syllabus.
Monitoring¹ of student work
- Class attendance: 2
- Class participation:
- Seminar paper:
- Experimental work: 1
- Written exam: 1
- Oral exam: 1
- Essay:
- Research:
- Project:
- Continuous assessment:
- Report:
- Practical work: 1
- Portfolio:
Assessment of learning outcomes in class and at the final exam (procedure and examples)
- A written or online test in which the student will demonstrate understanding of, and the ability to analyze and synthesize, theoretical concepts of distributed systems, heterogeneous data systems, architectures for working with big data, infrastructure of intelligent information systems, and cloud technologies (O1, O2, O4, O5).
- Experimental work with different big data architectures and appropriate tools (e.g. Hadoop, Spark, Kafka, HBase, etc.), aimed at collecting the analytical metrics needed to predict the infrastructure needs of an intelligent information system based on that architecture (O1, O2). Based on the provided infrastructure, the student will design a model of data management, coordination, message exchange, and interaction, and recommend technologies for implementing a heterogeneous and distributed data system (such as distributed relational and non-relational (NoSQL) databases, streaming-based databases (e.g. Kafka), blockchain technologies and/or generalized databases, document databases, and multimedia and object-oriented databases) (O3, O4).
- Practical work, defended orally, in which the student will choose an appropriate set of cloud technologies (such as AWS, Azure, Google Cloud, IBM Cloud, Scaleway, DigitalOcean, Watson, Wit.ai, Botpress, etc.) and use it to develop an intelligent service (e.g. an intelligent agent or an intelligent information system component) based on data analytics and artificial intelligence, together with the associated interfaces (e.g. REST, WebSocket, TCP/UDP, ZMTP, AMQP, XMPP, etc.) and appropriate documentation (O5, O6, O8). As part of the development, the student will also implement automated testing procedures for the cloud service using appropriate techniques (e.g. unit testing, end-to-end testing, penetration testing, ethical hacking, etc.) (O7).
Mandatory literature (at the time of submission of study programme proposal)
- Takada, M. Distributed systems: for fun and profit. (Mixu, 2013). Available online: book.mixu.net/distsys/
- Beyer, B., Jones, C., Petoff, J. & Murphy, N. R. Site Reliability Engineering: How Google Runs Production Systems. (O'Reilly Media, 2016). Available online: sre.google/sre-book/table-of-contents/
- Kleppmann, M. Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems. (O'Reilly Media, 2017).
- Scholl, B., Swanson, T. & Jausovec, P. Cloud Native: Using Containers, Functions, and Data to Build Next-Generation Applications. (O'Reilly Media, 2019).
- Aspnes, J. Notes on Theory of Distributed Systems. (Aspnes, 2021). Available online: cs-www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf
- Materials prepared for learning in the e-learning system.
Optional/additional literature (at the time of submission of the study programme proposal)
- Raman, A., Hoder, C., Bisson, S. & Branscombe, M. Azure AI Services at Scale for Cloud, Mobile, and Edge: Building Intelligent Apps with Azure Cognitive Services and Machine Learning. (O'Reilly Media, 2022).
- Fregly, C. & Barth, A. Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines. (O'Reilly Media, 2021).
- Winder, P. Reinforcement Learning: Industrial Applications of Intelligent Agents. (O'Reilly Media, 2020).
- Adkins, H., Beyer, B., Blankinship, P., Oprea, A., Lewandowski, P. & Stubblefield, A. Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems. (O'Reilly Media, 2020). Available online: sre.google/static/pdf/building_secure_and_reliable_systems.pdf
- Reznik, P., Dobson, J. & Gienow, M. Cloud Native Transformation: Practical Patterns for Innovation. (O'Reilly Media, 2019).
- Arundel, J. & Domingus, J. Cloud Native DevOps with Kubernetes: Building, Deploying, and Scaling Modern Applications in the Cloud. (O'Reilly Media, 2019).
- Newman, S. Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith. (O'Reilly Media, 2019).
- Sridharan, C. Distributed Systems Observability. (O'Reilly Media, 2018).
- Burns, B. Designing Distributed Systems. (O'Reilly Media, 2018).
- Beyer, B., Murphy, N. R., Rensin, D., Kawahara, K. & Thorne, S. The Site Reliability Workbook: Practical Ways to Implement SRE. (O'Reilly Media, 2018). Available online: sre.google/workbook/table-of-contents/
Number of assigned reading copies in relation to the number of students currently attending the course
Title | Number of copies | Number of students |
---|---|---|
Distributed systems: for fun and profit | Available online | 20 |
Site Reliability Engineering: How Google Runs Production Systems | Available online | 20 |
Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems | 1 | 20 |
Cloud Native: Using Containers, Functions, and Data to Build Next-Generation Applications | 1 | 20 |
Notes on Theory of Distributed Systems | Available online | 20 |
Quality monitoring methods that ensure the acquisition of exit knowledge, skills and competences
Periodic evaluations will be carried out to ensure and continuously improve the quality of the course and the study programme (as part of the activities of the Quality Assurance Committee of the Faculty of Informatics and Digital Technologies). In the last week of classes, students will anonymously evaluate the quality of the course. Student success in the course will also be analyzed (the percentage of students who successfully completed the course and their average grade).
¹ Important: Enter the appropriate proportion of ECTS credits for each activity so that the total number of credits equals the ECTS value of the course. Use empty fields for additional activities.