The Log: What every software engineer should know about real-time data's unifying abstraction

Book Details

  • Full Title: The Log: What every software engineer should know about real-time data's unifying abstraction

  • Author: Jay Kreps

  • ISBN/URL: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying; https://web.archive.org/web/20190913020020/https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

  • Reading Period: 2019.08–2019.09.13

  • Source: LinkedIn recommendation

General Review

  • This is an in-depth article explaining the use of log as the medium for data transfer / message brokering in a distributed system.

    • The specific system is Kafka.

    • In particular, the article discusses the problems faced by LinkedIn prior to Kafka, and how Kafka enabled centralize communications across different services producing and consuming data/messages.

  • The article also contains numerous references to various other similar and related technologies.

Specific Takeaways

  • The log is a useful abstraction for centralize data management across multiple producers and consumers of data.

  • When designing new systems, consider using a centralized log for all data, and keep any new sources of data centralized.

To Internalize Now

  • Nil

To Learn/Do Soon

  • More about Kafka, and how it may be useful

  • Pick up a book on Kafka.

  • Read up on RabbitMq.

  • Distributed systems

To Revisit When Necessary

  • How various opensource technologies fit together (or overlap) in the whole data pipeline / system:

    • Zookeeper, Helix, Curator, Mesos, YARN, Lucene, LevelDB, Netty, Jetty, Finagle, rest.li, Avro, Protocol Buffers, Thrift, Kafka, Bookeeper

  • Comprehensive list of references (academic papers, systems, talks and blogs) on:

    • stream processing

    • distributed systems

  • What are the similar problems faced in enterprise software engineering, including:

    • Event Sourcing

    • Change Data Capture

    • Complex Event Processing (CEP)

    • Enterprise Service Bus

Other Resources Referred To