CS 647 Distributed Systems

3 credits

Spring 2023

General Information

Instructor:

Student Learning Information

Course Description

In-depth discussion of fundamental concepts of distributed computer systems. Covers development techniques and runtime challenges, with a focus on reliability and system validation techniques. Subjects discussed include: interprocess communication, remote procedure calls and method invocation, middleware, distributed services, coordination, transactions, replication and weak data consistency models. Significant system-building term project in Java or similar language.

Course Purpose within a Program of Study

Within the revised MSSE program, this will serve as one of 6 possible CS electives (3 required along with 3/6 IS electives) to provide broader knowledge of software engineering. The MSSE degree emphasizes modern practices and techniques to produce reliable software that functions as desired, in a timely manner. Distributed systems are an increasingly important domain as more software systems move to shared cloud infrastructure.

Within the CS PhD program, the course will serve as an elective suitable for any graduate student, but particularly for those with research interests in software engineering, systems, or programming languages.

Statement of Expected Learning

The course objectives are to:

As learning outcomes, students completing this course should be able to:

Course Materials

Required: No textbooks, instructor-selected research papers.

Recommended: Designing Data-Intensive Applications, by Martin Kleppmann. This book is optional, but will provide a nice extended discussion for much of the course material. In addition to the natural option to purchase a hardcopy, it is available in electronic form (with and without DRM, see the book's site), and via O'Reilly's Safari Books Online platform. You can access this via Drexel Libraries' subscription by going here, clicking on "Full Text Online" and signing in with your Drexel credentials.

Required and Supplemental Materials and Technologies

The course will use class Discord channels for general course questions and material questions that may be of interest to everyone in the course (setup questions, assignment clarifications, etc.). If you have already joined the CCI Discord server, you should be added to the class channels automatically. If not, go here to connect a Discord account to CCI's server.

This course will be programming-intensive; you should expect to write a moderate amount of very challenging code.
Assignments must be completed using the Akka library for (distributed) actors on the JVM. You are welcome to use any JVM language you choose (Java, Scala, Kotlin, or any other well-maintained JVM language).
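
For readers new to the actor model, the following is a minimal sketch of the underlying idea in plain Java. It deliberately does not use Akka: the class name CounterActor, its message classes, and the tell/receive methods are hypothetical names chosen for illustration, and Akka's real API differs (consult the Akka documentation for that). The sketch only shows the core property that an actor owns private state and handles one message at a time from a mailbox.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: a hand-rolled "actor" in plain Java, NOT Akka's API.
public class CounterActor {
    // Marker type for messages this actor understands (hypothetical).
    public interface Message {}

    // Asks the actor to add an amount to its counter.
    public static final class Increment implements Message {
        final int amount;
        public Increment(int amount) { this.amount = amount; }
    }

    // Asks the actor for its current count; the reply arrives via a future.
    public static final class GetCount implements Message {
        final CompletableFuture<Integer> replyTo = new CompletableFuture<>();
    }

    // A single-threaded executor stands in for the mailbox: messages are
    // queued and then handled one at a time, in the order they were sent.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Private state, touched only by the mailbox thread, so no locks are needed.
    private int count = 0;

    // Asynchronous send: enqueue the message and return immediately.
    public void tell(Message msg) {
        mailbox.execute(() -> receive(msg));
    }

    // Message handler; runs only on the mailbox thread.
    private void receive(Message msg) {
        if (msg instanceof Increment) {
            count += ((Increment) msg).amount;
        } else if (msg instanceof GetCount) {
            ((GetCount) msg).replyTo.complete(count);
        }
    }

    // Stop accepting new messages; already-queued messages still run.
    public void stop() {
        mailbox.shutdown();
    }

    // Small demo: send two increments, then ask for the total.
    public static int demo() throws Exception {
        CounterActor actor = new CounterActor();
        actor.tell(new Increment(2));
        actor.tell(new Increment(3));
        GetCount query = new GetCount();
        actor.tell(query);
        int result = query.replyTo.get(); // blocks until the actor replies
        actor.stop();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 5
    }
}
```

Note that the caller never reads or writes the counter directly; all interaction goes through messages. Distributed actors extend this same idea to message sends that cross machine boundaries.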

Previous iterations of the course did permit students to use the Akka.NET reimplementation of Akka for .NET (i.e., C# and F#). Unfortunately the two have now diverged quite a bit, so it's no longer possible for me to give a single example structure that works across both platforms.

Projects in the course will require you to write full self-contained projects using a build file for a cross-platform build tool that handles fetching dependencies, compiling your code, and executing the code. On the JVM, any of the major tools is acceptable (ant+ivy, maven, gradle, or sbt). Projects must build from the command line: projects that only build via an IDE will be penalized, though you are welcome to use whatever editors or IDEs you like for writing your code.

You are expected to use the Akka documentation as appropriate. There is a book in the works which covers the latest version of Akka well, but it will not be released in time for the class: Akka in Action, 2nd Edition. Early access to the latest draft (which will be upgraded to permanent access) is available via the publisher, but before deciding to buy that, keep in mind:

  1. You don't need the book to do the class. Lectures will include some introductions to the core concepts, and other aspects of what you'll need are well-covered by documentation.
  2. It is likely the book will be published during the term. In that case, shortly after release it should appear in O'Reilly's online learning platform, which Drexel has a subscription to (so when this happens, you will be able to access it for free with your university login).

Getting Help

The best way to get help is to use the course Discord channels, email the professor, or show up to office hours.

Examples of questions that are best for Discord include:

These are good for Discord because they're questions of general interest which don't go into too much detail about specific code you're writing.

Examples of questions that are best for email or office hours only:

Questions about personal circumstances (e.g., extension requests) should go to the professor only, via email.

Assignments, Assessments, and Evaluations

Graded Assignments and Learning Activities

The course grading is focused on responses to readings, as well as homework projects.

Readings

Each week (except the first) you will need to respond to two research papers, no later than midnight the day before class. Late responses are not accepted, but below there is a policy allowing you to skip a few of these during the term without penalty.

When reading each paper, you should consider if you could write:

Not all of these will make sense for every paper, but most of them are sensible for most papers.

Each week there will be two discussion questions to respond to, one for each paper, to touch on some of the aspects above, and possibly relate papers to others discussed earlier in the term. In those responses, you are welcome and encouraged(!) to raise other issues you might have encountered, especially points of confusion.

Note that different people will often have different takes on the same paper, disagree on whether a choice made by the authors is a strength or weakness, or find different things clarifying or confusing. This is all okay! Everyone has different backgrounds.

If you are confused about some part of the paper, don’t be shy: almost certainly someone else found it confusing, too, and maybe they’re too timid to mention it. By pointing out what was confusing or difficult, we raise the opportunity to discuss it in class and help everyone understand better. I found some parts of these papers difficult or confusing the first time (or two) I read them - this is normal.

Professor Gordon will read the responses in time to adjust lecture to clarify common points of confusion or elaborate on things people found interesting.

You may skip up to 4 reading responses during the term, by submitting, instead of a response, a sentence saying you are using a skip --- by the deadline. Do not email me a request, just submit it. You may do this for any 4 responses, distributed any time in the term. You may skip one response in each of four weeks, both responses in each of two weeks, or take the in-between option of skipping one week completely and half the work on two additional weeks. Assignments for those weeks will simply be omitted when calculating your reading response grade for the term.
A couple things to consider about skips:

  1. The skip only means you do not need to write a response. If it's a paper that is useful for your homework, you may still find it beneficial (or essential) to read the paper, even if you skip the response.
  2. Because using a skip reduces the number of reading grades you have, it makes the remaining grades worth slightly more.

You may apply skips retroactively; if at the end of the term you have a 0 or 1 you'd like to remove, and you have skips remaining, you can apply the skip to that response by emailing me a request in the final week of the term (I will not do this automatically). There is no bonus for having unused skips at the end of the term.

Reading responses are graded on a scale of 0-2, with a possible extra-credit score of 3:

The discussion questions do not have unique right answers. You're expected to answer the questions after some thought. If you feel you do not understand some aspect of the paper well enough to answer the question, you can instead explain your confusion or uncertainty in detail, and such answers can (and usually will) receive full credit if they reflect a serious attempt at understanding and pinpointing the sources of confusion. Even answers that reflect misunderstandings can sometimes earn full credit; the grading is based more on your answer reflecting a serious engagement with the paper and course material than on specific factual understandings.

A Note on Proofs: Many of the papers you will read this term include formal proofs of correctness for an algorithm or protocol. You won't be asked to produce proofs in this class, but you will need to understand them. Some of your homework assignments will have you implementing algorithms from these papers, and understanding the proofs of correctness will help you think about the code you write. More broadly, outside this class, some of the proofs are fundamental impossibility results. It's all well and good to understand that X is impossible, but you're rarely asked to do an X that is known to be impossible. Instead, you're sometimes asked to do Y, where Y and X have some strong similarities. Sometimes Y is simpler than X in a key way that makes it possible. Sometimes Y is actually a variation on X. Understanding the proof of why X is impossible will help you recognize variations on it when you see them.

Homeworks

The homeworks are tentatively on the following:

The late policy for homeworks is as follows: for the term, you have 5 late days to distribute between homework assignments at your discretion, with one restriction: the final homework may not be submitted after the end of the final week of classes.

Each homework should only require a modest amount of code, but that code might be very difficult to write and debug. In addition to coding, each homework will include some kind of reflection or analysis of what you did, generally in an open-ended way.

Grading Matrix

The late/skip policies were described above.

In addition to the late and skip policies, extensions are possible for good reason with reasonable notice. I am aware that students have jobs, family matters, paper deadlines for their PhD, etc., which can interfere with completing assignments. I want your grade to reflect your mastery of the material and quality of work you hand in, not whether or not you were fortunate enough to avoid major life events during the term. If something comes up during the term, let me know. If it's unexpected (e.g., you end up in the ER when you were planning to work on coursework), let me know when you can and we'll figure it out. If it's something you know about in advance (e.g., you must travel for work), let me know as soon as you know, and we can discuss whether we should give you an extension on an assignment. I reserve the right to request supporting evidence for your stated need for an extension (only so far as justifying the existence of a good excuse; e.g., I might ask for a note confirming existence of a health issue interfering with attendance or assignment completion, but I don't need to know the details of the particular health issue).

Academic Integrity, Cheating, and Plagiarism

The list of links at the end of the syllabus includes a link to the University's academic integrity policy. If you haven't actually read it before, you should, because not meaning to plagiarize is not an excuse for plagiarism. This includes not realizing that something needed to be quoted, or being unfamiliar with the idea that paraphrased sentences still require citation (and possibly quotes), or opting to reuse someone else's words or code because you're not confident in the quality of your own.

The general idea is that you should not submit work that is not your own (code or written prose) unless it is properly attributed. Proper attribution includes, but is not limited to, things like putting direct quotes from someone else's writing in quotation marks and citing the source, and giving the source for small snippets of code you might have taken from StackOverflow or similar. Again, you should read the actual university integrity policy.

The University leaves the penalty for cheating, plagiarism, etc. in a course up to the professor. If you cheat in this class, I will give you an F for the term.
I realize that most cheating is a consequence of poor time management, or unexpected or hard-to-manage obligations beyond the class.
That is exactly why you have late days for homeworks, skips for readings, and the course has a fairly flexible extension policy - I want you to succeed, but I want you to do so honestly. If you have any doubts about whether something might cross the line into cheating, please ask me before you do it. The worst I'll say is "No, don't do that." And I'll be glad you asked. This is far better than an F for the term.

To avoid misunderstandings, please do not share pieces of your assignment code via Discord. General questions like "How do I set up an Actor?" or "How do I configure this setting in Akka?" are great questions for Discord. "When this core part of my homework code executes it crashes" is not appropriate for Discord (your classmates should never see your code), but is a great thing to email the professor about.

Two final notes:

Grade Scale

The following scale will be used to convert points to letter grades:

Points     Grade   Points     Grade   Points     Grade
97-100     A+      82-86.99   B       70-71.99   C-
92-96.99   A       80-81.99   B-      67-69.99   D+
90-91.99   A-      77-79.99   C+      60-66.99   D
87-89.99   B+      72-76.99   C       0-59.99    F

Note that the instructor may revise this conversion if/when necessary.

Course Schedule

(This schedule is tentative and may change during the course.)

Most weeks attempt to pair:

Currently the syllabus is final up to and including week 6.

Week by week:

  1. Introduction, Overview, Actors
    • No readings due
  2. Challenges and Time in distributed systems
  3. Strong Consistency
  4. Consensus (Paxos, Raft, etc.)
  5. CAP, FLP, and other impossibilities
  6. Weak and Eventual Consistency
  7. Getting Things Right
  8. Large scale data storage and processing: Hadoop & Spark
  9. Distributed Resource Management
  10. Getting Things Right, Part 2

This reading list is tentative. Some readings in Weeks 7-10 will probably change before it's time to read the papers. Possible additional topics or papers include:

Academic Policies

This course follows university, college, and department policies, including but not limited to:

The instructor(s) may, at his/her/their discretion, change any part of the course before or during the term, including assignments, grade breakdowns, due dates, and schedule. Such changes will be communicated to students via the course web site. This web site should be checked regularly and frequently for such changes and announcements.

Students requesting accommodations due to a disability at Drexel University need to request a current Accommodations Verification Letter (AVL) in the ClockWork database before accommodations can be made. These requests are received by Disability Resources (DR), which then issues the AVL to the appropriate contacts. For additional information, visit the DR website at drexel.edu/oed/disabilityResources/overview/, or contact DR by phone at 215.895.1401 or by email at disability@drexel.edu.