Monday, November 8, 2021

Peer-to-peer applications with Urbit

This article gives an overview of the architecture of Urbit applications. I spent a weekend trying out Urbit, reading documentation, and digging through the source code. I'm always on the lookout for the next milestone system that will change the internet and computing landscape. In particular, I find decentralized and peer-to-peer systems interesting because I have a sense that the internet is not quite right. It could be better if only someone could figure out how to do that and make it mainstream.

Urbit is an operating system and network designed to give users control by running applications on personal servers instead of on centralized servers operated by the application creators. This means data is stored on personal servers and is not immediately accessible to application creators. Both the Urbit operating system and network run on top of existing computing infrastructure. It's not a bare-metal operating system (it runs under Linux, macOS, and Windows) or a new Layer 3 network protocol (it uses UDP). If you want to know more there is an overview of Urbit here.

The operating function

The Urbit kernel, Arvo, is a single-function operating system in the sense of purely functional programming. The operating system function takes the previous state and input events and produces the next state and output events. This means that the state of the system can be saved after each invocation. If there is a power failure or the system otherwise stops execution it's easy to resume it later from the last state.
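To make that model concrete, here is a minimal sketch in Python (not Hoon or Nock, and not how Arvo is actually implemented) of an event loop built around a pure function from (state, event) to (state, effects):

from typing import List, Tuple

State = dict
Event = dict
Effect = dict

def os_function(state: State, event: Event) -> Tuple[State, List[Effect]]:
    # A pure "operating function": previous state + input event -> next state + output effects.
    if event["type"] == "increment":
        new_state = {**state, "counter": state.get("counter", 0) + 1}
        return new_state, [{"type": "log", "message": f"counter is {new_state['counter']}"}]
    return state, []

def run(events: List[Event]) -> State:
    state: State = {}
    for event in events:
        state, effects = os_function(state, event)
        # Because each step is a pure function, the state can be snapshotted here
        # and execution resumed from the last saved state after a power failure.
        for effect in effects:
            print(effect)
    return state

run([{"type": "increment"}, {"type": "increment"}])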

Urbit has a virtual machine and runtime that supports this programming environment. The low-level language is called Nock and the higher-level language is called Hoon. I haven't investigated them in detail, but they appear to support deterministic purely functional programming with I/O and other side-effects kept outside via monads and passing around inputs like the current time.

Applications

Applications, also called agents, follow the purely functional model where they produce the next state as their result. Agents expose their services in three ways:

  1. Peek, a read-only query that fetches data without changing state.
  2. Poke, a command and response similar to a Remote Procedure Call (RPC).
  3. Subscriptions, a stream of updates that may be delivered over time until the subscription is closed.

For example, an application that keeps a counter can define a poke interface for incrementing the counter and a peek interface for querying its value. A subscription can be used to receive an update whenever the counter changes.
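Here is a toy sketch of that counter in Python (invented names, and mutable state for brevity rather than Urbit's purely functional state-passing) just to show the shape of the three interfaces:

from typing import Callable, List

class CounterAgent:
    """Toy sketch of an agent exposing peek, poke, and subscription interfaces."""

    def __init__(self) -> None:
        self.counter = 0
        self.subscribers: List[Callable[[int], None]] = []

    def peek(self) -> int:
        # Read-only query: returns data without changing state.
        return self.counter

    def poke(self, command: str) -> str:
        # Command/response: changes state and notifies subscribers.
        if command == "increment":
            self.counter += 1
            for notify in self.subscribers:
                notify(self.counter)
            return "ok"
        return "unknown command"

    def subscribe(self, notify: Callable[[int], None]) -> None:
        # Stream of updates delivered whenever the counter changes.
        self.subscribers.append(notify)

agent = CounterAgent()
agent.subscribe(lambda value: print(f"update: {value}"))
agent.poke("increment")
print(agent.peek())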

Urbit supports peeks, pokes, and subscriptions over the network. This is how applications running on different personal servers can communicate. If we want to replicate a remote counter we can subscribe to it and then poke our local counter every time an update is received. This replication model leads to the store/hook/view architecture, a way of splitting applications into components that support local state, remote replication, and a user interface. In our counter example the store would be the counter, the hook would be the code that replicates remote counters, and the view would provide any logic needed for the user interface to control the counter.

Interacting with the outside world

User interfaces for applications are typically implemented in Landscape, a web-based user interface for interacting with Urbit from your browser. The user interface can be a modern JavaScript application that communicates with the agent running inside Urbit via the HTTP JSON API. This API supports peeks, pokes, and subscriptions. In other words, the application's backend is implemented as an Urbit agent while the frontend is a regular client-side web application.
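I won't reproduce the exact endpoints here, but the shape of the frontend code is familiar. A sketch with hypothetical URL paths and payloads (not Urbit's real API) might look like this:

import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical local Urbit HTTP endpoint

def poke(agent: str, payload: dict) -> None:
    # Send a command to the agent as a JSON POST request (illustrative paths only).
    req = urllib.request.Request(
        f"{BASE_URL}/poke/{agent}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

def peek(agent: str, path: str) -> dict:
    # Read-only query against the agent (illustrative paths only).
    with urllib.request.urlopen(f"{BASE_URL}/peek/{agent}/{path}") as resp:
        return json.load(resp)

poke("counter", {"action": "increment"})
print(peek("counter", "value"))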

Of course there are also APIs for data storage, networking, HTTP, etc. For example, the weather widget in Landscape fetches the weather from a web service using an HTTP request.

Urbit also supports peer discovery so you can resolve the funny IDs like ~bitbet-bolbel and establish connections to remote Urbit instances. The IDs are allocated hierarchically and ultimately registered on the Ethereum blockchain.

Criticisms

Keep in mind I only spent a weekend investigating Urbit so I don't understand the full system and could be wrong about what I have described. Also, I've spent a lot of time and am therefore invested in Linux and conventional programming environments. Feedback from the Urbit community is welcome, just send me an email or message me on IRC or Matrix.

The application and network model is intended for personal servers. I don't think people want personal servers. It's been tried before by Sandstorm, FreedomBox, and various projects without mainstream success. I think a more interesting model for the powerful devices we run today is one without any "server" at all. Instead of having an always-on server that is hosted somewhere, apps should be able to replicate and sync directly between a laptop and a phone. Having the option to run a personal server for always-on services like chat rooms or file hosting is nice, but many things don't need this. I wish Urbit was less focussed on personal servers and more on apps that replicate and sync directly between each other.

Urbit is obfuscated by the most extreme not invented here (NIH) syndrome I have ever seen. I tried to keep the terminology at a minimum in this article, so it might not be obvious unless you dive into the documentation or source code yourself. Not only is most of it a reinvention of existing stuff but it also uses new terminology for everything. It was disappointing to find that what first appeared like an alien system that might hold interesting discoveries was just a quirky reimplementation of existing concepts.

It will be difficult for Urbit to catch on as a platform since it has no common terminology with existing programming environments. If you want to write an app for Urbit using the Hoon programming language you'll have to wade through a lot of NIH at every level of the stack (programming language, operating system, APIs). There is an argument that reinventing everything allows the whole system to be small and self-contained, but in practice that's not true since Landscape apps are JavaScript web applications. They drag in the entire conventional computing environment that Urbit was supposed to replace. I wonder if the same kind of system can be built on top of a browser plus Deno with WebRTC for the server side, reusing existing technology that is actively developed by teams much larger than Urbit. It seems like a waste because Urbit doesn't really innovate in VMs, programming languages, etc yet the NIH stuff needs to be maintained.

Finally, for a system that is very much exposed to the network, I didn't see a strong discipline or much support for writing secure applications. The degree of network transparency that Urbit agents have also means that they present an attack surface. I would have expected the documentation and APIs/tooling to steer developers in a direction where it's hard to make mistakes. My impression is that a lot of the attack surface in agents is hand coded and security issues could become commonplace when Urbit gains more apps written by a wider developer community.

Despite this criticism I really enjoyed playing with Urbit. It is a cool rabbit hole to go down.

Conclusion

Urbit applications boil down to a relatively familiar interface similar to what can be done with gRPC: command/response, querying data, and subscriptions. The Urbit network allows applications to talk to each other directly in a peer-to-peer fashion. Users run apps on personal servers instead of centralized servers operated by the application creators (like Twitter, Facebook, etc). If Urbit can attract enough early adopters then it could become an interesting operating system and ecosystem that overcomes some of the issues of today's centralized internet. If you're wondering, I think it's worth spending a weekend exploring Urbit!

Friday, October 15, 2021

A new approach to usermode networking with passt

There is a new project called passt that Stefano Brivio has been working on to implement usermode networking, the magic that forwards network packets between QEMU guests and the host network.

passt is designed as a replacement for QEMU's --netdev user (also known as slirp), a feature that is commonly used but not really considered production-ready. What passt improves on is security and performance, finally making usermode networking production-ready. That's all you need to know to try it out but I thought the internals of how passt works are interesting, so this article explains the approach.

Why usermode networking is necessary

Guests send and receive Ethernet frames through emulated network interface cards like virtio-net. Those packets need to be injected into the host network but popular operating systems don't provide an API for sending and receiving Ethernet frames because that poses a security risk (spoofing) or could simply interfere with other applications on the host.

Actually that's not quite true: operating systems do provide specialized APIs for injecting Ethernet frames, but they come with limitations. For example, the Linux tun/tap driver requires additional network configuration steps as well as administrator privileges. Sometimes it's not possible to take advantage of tap due to these limitations and we really need a solution for unprivileged users. That's what usermode networking is about.

Transmuting Ethernet frames to Socket API calls

Since an unprivileged user cannot inject Ethernet frames into the host network, we have to make do with the POSIX Sockets API that is available to unprivileged users. Each Ethernet frame sent by the guest needs to be converted into equivalent Sockets API calls on the host so that the desired effect is achieved even though we weren't able to transmit the original Ethernet frame byte-for-byte. Incoming packets from the external network need to be received via the Sockets API and repackaged into Ethernet frames that the guest network interface card can receive.

In networking parlance this conversion between Ethernet frames and Sockets API calls is a Layer 2 (Data Link Layer)/Layer 4 (Transport Layer) conversion. The Ethernet frames have additional packet headers including the Ethernet header, IP header, and the TCP/UDP header that the Sockets API calls don't include. Careful use of the Sockets API makes it possible to synthesize Ethernet frames that are similar enough to the original ones that the guest can communicate successfully.
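As a rough illustration of what the Layer 2 to Layer 4 conversion looks like for UDP, here is a simplified sketch in Python (not passt's actual C code; it ignores IPv6, fragmentation, checksums, and error handling):

import socket
import struct

def forward_udp_frame(frame: bytes) -> None:
    """Strip the Ethernet/IP/UDP headers from a guest frame and resend the
    payload with the host Sockets API (IPv4 only, no error handling)."""
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:            # not IPv4
        return
    ip = frame[14:]                    # skip the 14-byte Ethernet header
    ihl = (ip[0] & 0x0F) * 4           # IP header length in bytes
    if ip[9] != 17:                    # protocol field: 17 is UDP
        return
    dst_ip = socket.inet_ntoa(ip[16:20])
    udp = ip[ihl:]
    dst_port = struct.unpack("!H", udp[2:4])[0]
    payload = udp[8:]                  # UDP header is 8 bytes

    # The Layer 4 call: an unprivileged SOCK_DGRAM socket carries the payload
    # and the host kernel builds the real headers for us.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, (dst_ip, dst_port))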

For the most part this conversion is straightforward: parse packet headers in one direction and build them in the other. The TCP protocol makes things more interesting though because a TCP connection involves non-trivial state that is normally handled by the TCP/IP stack. For example, data sent over a TCP connection might arrive out of order or some chunks may have been dropped. There is also a state machine for the TCP connection lifecycle including its famous three-way handshake. This means TCP connections must be carefully tracked so that these low-level protocol features can be simulated correctly.

How passt works

Passt runs as an unprivileged host userspace process that is connected to QEMU through --netdev socket, a way to transfer Ethernet frames from QEMU's network interface card emulation to another process like passt. When passt reads an Ethernet frame carrying, say, a UDP message from the guest, it sends the data onwards through an equivalent AF_INET SOCK_DGRAM socket on the host. It also keeps the socket open so replies can be read on the host and then packaged into Ethernet frames that are written to the guest. The effect of this is that guest network communication appears like it's coming from the passt process on the host and integrates nicely into host networking.

How TCP works is a bit more interesting. Since TCP connections require acknowledgement messages for reliable delivery, passt uses the recvmmsg(2) MSG_PEEK flag to fetch data while keeping it queued in the host network stack's rcvbuf until the guest acknowledges it. This avoids extra buffer management code in passt and is part of its strategy of implementing only a subset of TCP. There is no need to duplicate the full TCP/IP stack since the host and guest already have them, but achieving this requires detailed knowledge of TCP so that passt can maintain just enough state.
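The MSG_PEEK technique can be demonstrated with an ordinary socket in Python (a simplified sketch, not passt's implementation): the data stays queued in the kernel's receive buffer until it is read again without the flag.

import socket

def peek_then_consume(conn: socket.socket, guest_acknowledged: bool) -> bytes:
    # Look at the queued data without removing it from the kernel's receive
    # buffer, so it can be resent to the guest later if necessary.
    data = conn.recv(4096, socket.MSG_PEEK)

    if guest_acknowledged:
        # Only once the guest has acknowledged the data do we actually consume
        # it, so the kernel buffer doubles as the retransmission store.
        conn.recv(len(data))
    return data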

Incoming connections are handled by port forwarding. This means passt can bind to port 2222 on the host and forward connections to port 22 inside the guest. This is very useful for usermode networking since the user may not have permission to bind to low-numbered ports on the host or there might already be host services listening on those ports. If you don't want to mess with port forwarding you can use passt's all mode, which simply listens on all non-ephemeral ports (basically a brute force approach).

A few basic network protocols are necessary for network communication: ARP, ICMP, DHCP, DNS, and IPv6 services. Passt offers these because the guest cannot talk to those services on the host network directly. They can be disabled when the guest has knowledge of the network configuration and doesn't need them.

Why passt is unique

By running in a separate process from QEMU and taking a minimalist approach, passt is able to tighten security. Its seccomp filters are stronger than anything the larger QEMU process could do. The code is clean and specifically designed for security and simplicity. I think writing passt in C was a missed opportunity. Some users may rule it out entirely for this reason. Userspace parsing of untrusted network packets should be done in a memory-safe programming language nowadays. Nevertheless, it's a step up from slirp, which has a poor track record of security issues and runs as part of the QEMU process.

I'm excited to see how passt performs in comparison to slirp. passt uses techniques like recvmmsg(2)/sendmmsg(2) to batch message transfer and reads multiple Ethernet frames from the guest in a single syscall to amortize the cost of syscalls over multiple packets. There is no dynamic memory allocation and packet headers are pre-populated to minimize the number of CPU cycles spent in the data path. While this is promising, QEMU's --netdev socket isn't the fastest (packets first take a trip through QEMU's net subsystem queues), but it's still a good trade-off between performance and simplicity/security. Based on reading the code, I think passt will be faster than slirp but I haven't benchmarked it myself.

There is another mode that passt supports for containers instead of virtualization. Although it's not relevant to QEMU, this so-called pasta mode is a cool feature for container networking. In this mode pasta connects a network namespace with the outside world (init namespace) through a tap device. This might become passt's killer feature, because the same software can be used for both virtualization and containers, so why bother investing in two separate solutions?

Conclusion

Passt is a promising replacement for slirp (on Linux hosts at least). It looks like there will finally be a production-ready usermode networking feature for QEMU that is fast and secure. Passt's functionality is generic enough that other projects besides QEMU will be able to use it, which is great because this kind of networking code is non-trivial to develop. I look forward to passt becoming available for use with QEMU in Linux distributions soon!

Thursday, September 16, 2021

KVM Forum 2021 Highlights

Here are highlights from this year's KVM Forum conference, the yearly event around QEMU, KVM, and related communities like VIRTIO and rust-vmm.

Video recordings will be posted soon. In the meantime, here are short summaries of what I learnt. You can find slides to many of these talks in the links below.

vDPA

vDPA is a Linux driver framework for developing hybrid hardware/software VIRTIO devices.

In Hyperscale vDPA, Jason Wang covered ways to create fine-grained virtual devices for virtual machines and containers using vDPA. This means offering a way to slice up a physical device into many virtual devices with some aspects of the virtual device handled in hardware and others in software. This is the direction that networking and accelerator devices are heading in and is actively being discussed in the VIRTIO community. Many different approaches are possible and Jason's talk enumerates some of them:

  • An interface for management commands (a virtio-pci capability, a management virtqueue)
  • DMA isolation (PCIe PASID, a platform-independent device MMU)
  • More than 2048 MSI-X interrupts (virtio-pci capability for VIRTIO-specific MSI-X tables)

Another new vDPA project was presented by Yongji Xie. VDUSE - vDPA Device in Userspace showed how vDPA devices can be implemented in userspace. Although this is roughly the same use case as vhost-user, it has the unique advantage of allowing containers and bare metal to attach devices. An untrusted userspace process emulates the device and the host kernel can either present a vhost device to virtual machines or attach to the userspace device. This is a neat way to develop software devices that can also benefit container workloads.

Stefano Garzarella covered the new unified virtio-blk storage stack in vdpa-blk: Unified Hardware and Software Offload for virtio-blk. The goal is to support hardware virtio-blk devices, an optimized host kernel software device, and still offer QEMU block layer features like qcow2 images. This allows the fast path to go directly to hardware or an optimized in-kernel device while software storage features can still be used when desired via a slow path.

vfio-user

VFIO User - Using VFIO as the IPC Protocol in Multi-process QEMU focussed on the new out-of-process device interface that John Johnson, Jagannathan Raman, and Elena Ufimtseva have been working on together with others. This new protocol allows PCI (and perhaps other busses in the future) devices to be implemented as separate processes. QEMU communicates with the device over a UNIX domain socket. The approach is similar to vhost-user except the protocol messages are based on the Linux VFIO ioctl interface instead of the vhost ioctls.

While vhost-user has been in use for a number of years for VIRTIO-based devices, vfio-user now makes it possible to implement non-VIRTIO devices as separate processes. There were several other talks about vfio-user at KVM Forum 2021 that you can also check out.

virtiofs

In Towards High-availability for Virtio-fs, Jiachen Zheng and Yongji Xie explained how they extended virtiofs to handle crash recovery and live updates. These features are challenging for any program with a lot of state because care must be taken to maintain a consistent snapshot to resume from in the case of a restart. They tackled this by storing Linux file handles and a journal in a shm file. This required some changes to QEMU's virtiofsd data structures that make them suitable for storing in shm, and a journal that makes it possible to provide idempotency for operations like mkdir that would otherwise fail if replayed.

Virtual IOMMUs

Suravee Suthikulpanit and Wei Huang gave a talk titled Analysis of AMD HW-assisted vIOMMU Implementation and Performance. AMD is working on a hardware implementation of a virtual IOMMU that allows guests to specify DMA permissions for guest memory. This functionality is important for VFIO device assignment within guests, for example. Although it can be done in software via emulation of real IOMMUs or the virtio-iommu device that was designed specifically for virtual machines, implementing the vIOMMU in real hardware has performance advantages. One interesting feature of the hardware-assisted vIOMMU is that it natively supports encrypted memory for AMD SEV-SNP guests, something that is slow and clumsy to do in software.

Friday, July 2, 2021

Slides available for "Bring Your Own Virtual Devices: Frameworks for Software and Hardware Device Virtualization"

The PDF slides for my "Bring Your Own Virtual Devices: Frameworks for Software and Hardware Device Virtualization" talk from the 16th Workshop on Virtualization in High-Performance Cloud Computing are now available.

This talk covers out-of-process device interfaces including vhost (kernel), vhost-user, Linux VFIO, mdev, vfio-user, vDPA, and VDUSE. It gives a brief overview of each interface, how it works, and how to develop your own devices.

The growing number of out-of-process device interfaces available in QEMU/KVM can make it hard to understand and compare them. Each of these interfaces is designed for different use cases. For example, whether you want to pass through hardware or implement the device in software, if you want to implement your device in the host kernel or in host userspace, etc. This talk will give you the necessary knowledge to compare these interfaces yourself so you can decide which one is most appropriate for your use case.

For more information about the design of out-of-process device interfaces, see also my previous blog post about requirements for out-of-process devices.

Sunday, June 20, 2021

My performance benchmarking workflow (2021)

Benchmarking computer systems is time-consuming because setting up the necessary environment involves a lot of work. Over time I have built a workflow that mitigates the cost of setting up benchmarks and allows me to analyze performance more effectively. This blog post covers my workflow as of 2021.

Performance investigations often follow these steps:

  1. Set up hardware and software.
  2. Run initial benchmarks to verify that the bottleneck under investigation is being triggered.
  3. Collect a full set of benchmark results and monitoring data.
  4. Analyze results and form a hypothesis about the bottleneck.
  5. Implement a proof-of-concept optimization to test the hypothesis.
  6. Go to Step 3 until the desired benchmark results are reached, keeping those optimizations that helped.

This is a long process that is costly to pause/resume or replicate again in the future. Setting up hardware and software manually is both time-consuming and error-prone. Therefore we don't want to do it more than once. There is a risk that replicating the benchmark on another machine will fail to produce identical results due to differences in environments.

The consequence of high-overhead processes is that we minimize their use since we cannot afford to run through the process as often as we'd like. This means we cannot answer all the performance questions we'd like to and therefore our understanding is limited. We cannot discover all the truths that would enable us to make performance improvements.

A more lightweight process would encourage experimentation and lead to higher productivity.

An ideal workflow

In a low-overhead world I would like to do the following:

  1. Set up hardware and software once only and be able to return to that state again in the future at the press of a button.
  2. Capture the full benchmarking environment so the configuration can be inspected and modified easily.
  3. Store benchmark results so that each run is available for further analysis in the future.

The workflow is actually quite similar to developing code with git:

  1. Create a topic branch for this performance investigation.
  2. Add an environment definition to produce the desired hardware and software state.
  3. Run the benchmark and collect the results.
  4. Commit the environment and results.
  5. Go to Step 2 to modify the environment (e.g. apply proof-of-concept patches to software) and repeat.

Since the performance investigation is captured in a git branch it's easy to switch to another investigation without losing history.

This is actually what I do! Git provides the storage and time machine functionality for easily pausing/resuming or replicating performance investigations.

Ansible

Ansible provides the automation system necessary to put hardware and software into the desired state for benchmarking. Ansible's killer feature is the large ecosystem of modules for handling tasks like installing packages, configuring virtual machines and containers, etc. I find Ansible more productive than Python or shell scripting thanks to Ansible's modules collection.

I've begun collecting Ansible tasks for Linux KVM development in virt-tasks. If you're wondering what the configuration for running a benchmark looks like, here is an Ansible playbook that builds QEMU and a guest kernel, creates a Fedora 34 virtual machine, runs the fio disk I/O benchmark, and collects the results:

---
- hosts: hosts
  tasks:
    - include_tasks: tasks/build-qemu.yml
      vars:
        - repo: https://gitlab.com/qemu-project/qemu.git
        - version: v6.0.0

    - name: create disk image
      include_tasks: tasks/virt-builder-create-image.yml
      vars:
        - os_version: fedora-34
        - size: 32G
        - output: /var/lib/libvirt/images/test.img
        - format: raw

    - name: build guest kernel
      include_tasks: tasks/build-kernel.yml
      vars:
        - repo: https://gitlab.com/stefanha/linux.git
        - version: cpuidle-haltpoll-virtqueue
        - config_src_path: files/.config

    - name: stop vm
      virt:
        name: test
        state: shutdown
      ignore_errors: yes

    - name: start vm
      include_tasks: tasks/start-vm.yml
      vars:
        - xml: "{{ lookup('file', 'files/test.xml') }}"
        - host: 192.168.122.192

- hosts: vms
  tasks:
    - name: install fio and rsync
      dnf:
        state: present
        name:
          - fio
          - rsync

    - name: run fio
      script: files/fio.sh

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-off
        mode: pull

    - name: run fio
      script: files/fio.sh --enable

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-on
        mode: pull

- hosts: hosts
  tasks:
    - name: stop vm
      virt:
        name: test
        state: shutdown

An important point is that the Ansible playbook sets up the full environment, runs all benchmarks, and collects the results. Unlike a playbook that runs a single benchmark this one runs the full suite so that the playbook captures the entire environment that produced the results. This distinction is important when tweaking the benchmark configuration or trying out proof-of-concept optimizations. Each git commit needs to encompass the full environment so that the performance investigation is reproducible and can be resumed in the future.

Jupyter

I recently started using JupyterLab notebooks for data analysis. JupyterLab provides a convenient environment for graphing results and organizing them in documents.

Thanks to the official Jupyter container images you can get a full Python data analysis environment running with just one command:

$ podman run --userns=keep-id -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 --rm -v "$PWD":/home/jovyan/work:z jupyter/scipy-notebook

So far I have only scratched the surface of JupyterLab. It works well for visualizing data although the way I currently use it is not much different from writing a Python matplotlib script and running it from the command-line. In time I'll get a better appreciation for its strengths and weaknesses.
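For example, a notebook cell for comparing the two fio runs from the playbook above could be as simple as the following (assuming fio was run with --output-format=json; the file names are made up):

import json
from pathlib import Path

import matplotlib.pyplot as plt

def read_iops(path: str) -> float:
    # Assumes fio was run with --output-format=json; adjust to match fio.sh.
    data = json.loads(Path(path).read_text())
    return data["jobs"][0]["read"]["iops"]

results = {
    "poll_source=off": read_iops("fio-output/poll_source-off/randread.json"),
    "poll_source=on": read_iops("fio-output/poll_source-on/randread.json"),
}

plt.bar(list(results.keys()), list(results.values()))
plt.ylabel("IOPS (randread)")
plt.title("fio results")
plt.show()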

Conclusion

A git-based workflow that automates benchmark setup and stores the results goes a long way towards mitigating the high overhead of performance investigations. If I'm interrupted or need to switch to a different machine it's easy to resume the investigation. Having the results and details of the environment stored together makes it possible to revisit benchmark runs in the future to reproduce or tweak them. The combination of git, Ansible, and Jupyter achieves this workflow quite well, but if you're familiar with other tools that do the same I'd love to hear about them!

Monday, April 5, 2021

Learning programming languages

You may be productive and comfortable in one programming language but find the idea of learning a new programming language daunting. Or you may know and use multiple programming languages but haven't learnt a new one in a while. Or you might be a programming language geek who is just curious about how others dive into new programming languages and get productive quickly. No matter how easy or difficult it is for you to engage in new programming languages, this article explains how I like to learn new programming languages. Although people learn best in different ways, I hope you'll find my thought process interesting even if you decide to take a different approach.

Background

Language N+1

This article isn't aimed at learning to program. Learning your first programming language is much harder than learning an additional one. The reason is that many abstract concepts are involved in computer programming. When you first encounter programming, most languages require you to understand concepts like iteration, scopes, (im)mutability, arrays, modules, functions, and much more. The good news is that when you learn an additional language you'll already be familiar with common concepts and can therefore take a more streamlined approach in order to get up to speed quickly.

Courses, videos, exercises

There is a lot of educational material online that teaches various programming languages, but I don't find structured courses, videos, or exercises efficient. If you already know common programming concepts and have an idea of what you want to build in the new programming language, then it's more efficient to chart your own course. Materials that let you jump/skip around will let you focus on information that is novel and that you actually need. Working through a series of exercises that someone else designed may be time spent practicing the wrong things since usually you are the one with the best idea of what to practice. Courses, videos, and exercises tend to be an "on the rails" experience where you are exposed to information in a linear fashion whether it's useful at the moment or not.

1. Understanding the computational model

The first question about a new programming language is "what is its computational model?". Sadly, many language manuals and websites do not describe the computational model beyond what programming paradigms are supported (object-oriented, concatenative, functional, logic programming, etc). The actual computational model may only become fully apparent later. Or it might be expressed in too much detail in a language standards document to be of use early on. In any case, it's worthwhile reading the programming language's website for information on the computational model to grasp the big picture.

It's the computational model that you need to understand in order to write programs. Often we think about syntax and language features too much when learning a new language. The computational model informs us how to break down requirements into programs. We approach logic programming differently from object-oriented programming in how we organize data and code. The syntax and to an extent even the language features don't matter.

Understanding the computational model also helps you situate the new programming language relative to others, especially programming languages that you already know. It will give you an idea of how different programming will be and where you'll need to learn new concepts.

2. The language tutorial

After familiarizing yourself with the computational model of the programming language, the next step is to learn the basic syntax and concepts. Most modern programming languages have an official tutorial available online. The tutorial introduces the language elements, usually with short examples, and its table of contents gives an overview of what the language consists of. The tutorial can be completed in a few hours or days. Unlike full courses, official programming language tutorials often lend themselves to non-linear reading, which is helpful when certain aspects of the language are already familiar or will not be relevant to you.

I remember reading the Python tutorial in an afternoon years ago, but watch out: at this point you might be able to write valid syntax but you won't be writing idiomatic code yet. There's that saying "you can write FORTRAN in any language". In order to write programs that are expressed naturally and take advantage of the language effectively, more effort will be necessary.

3. Writing toy programs

After becoming aware of the language elements the next step is to explore how the language works. This can be done by writing small programs. Often these toy programs are familiar tasks you've already solved in other languages. If you want to write games, maybe it's Pong. If you write web applications, it could be a todo list. There are lots of different well-known programs to write.

During the course of writing toy programs you'll encounter syntax errors or issues with the program. Learning to interpret common error messages is important because they will come up in more complicated scenarios later where it can be harder to resolve them if you haven't seen them before.

You'll also hit common tasks for which you need to find solutions in the standard library or language reference manual. Whether it's parsing command-line options, regular expression matching, HTTP requests, or error handling, the language probably has a way of doing it. Toy programs present a simple environment in which to explore the basic facilities of a programming language.

4. Gaining a deeper appreciation for the language

Once you have written some toy programs you'll be able to start writing your own programs that solve new problems. At this stage you start being productive but there is still more to learn. In particular, the language's idioms and patterns must be studied in order to write natural code. Once I have experience with the basics of a language I like to read the source code to the standard library, popular libraries, and popular applications. In the beginning this is hard because they use unfamiliar language features or library dependencies, but after following up on unknown parts of one program, you'll find it becomes easier to read other programs because your knowledge of the language has expanded.

At this point it is also worth looking for style guides, manuals on language idioms, and documentation on common gotchas or anti-patterns. These provide the information you need to think natively in the new programming language. This is what's needed to become fluent in the language and capable of reading and writing real programs confidently.

Although I have presented steps in a linear order, learning complex subjects is often an iterative process. Sometimes I find myself jumping back and forth between steps as my understanding evolves.

Conclusion

Learning a new programming language is time-consuming no matter how you do it. However, it doesn't all need to happen upfront and after a few days of reading the documentation and experimenting with toy programs, it's possible to perform basic tasks. Learning how to use a language effectively by studying popular programs and reading guides is the quickest way I've found to reach fluency. Finally, it just takes practice!

Friday, March 12, 2021

Overcoming fear of public communication in open source

When given the choice between communicating in private or in public, many people opt for private communication. They send questions or unfinished patches to individuals instead of posting on mailing lists, forums, or chat rooms. This post explains why public communication is a faster and more efficient way for developers to communicate in open source communities than private communication. A big factor in the private vs public decision is psychological and I've provided a checklist to give you confidence when communicating in public.

Time and time again I find people initiating discussions through private channels when I know it would be advantageous to have them in public instead. I even keep an email reply template handy asking the sender to reach out to the public mailing list instead of communicating with me in private. Communicating in public is a good default unless you need confidentiality (security bugs, business reasons, etc). So why do people prefer to ask questions or send patches off-list?

Why we fear public communication

I'm interested in understanding why people avoid public communications channels in open source communities. The purpose of mailing lists, chat rooms, and forums is to engage in discussions and share knowledge. Members of these communities are interested in the topic and want to engage in discussion. But there are some common reasons I've found why people don't make use of public communications channels:

  • I don't want to create noise. Mailing lists are home to important discussions between experienced members of the community. Sending a relatively simple question that no expert would need to ask, or an unfinished patch, can make you doubt whether it deserves to be sent at all. This is a fallacy. As long as the question or patches are relevant to the community and you have done your homework (see the checklist below), there is no need to worry about creating noise.
  • I will look dumb or seem like a bad programmer. When you need help it's likely that your understanding is incomplete. We often hold ourselves to artificially high standards when asking for help in public, yet when we observe others interacting in public we don't hold their questions or unfinished code against them. Only if they show a repeated pattern of sloppy work does it damage their reputation. Asking for help in public won't harm your image.
  • Too much traffic. Big open source communities are often so active that no single person can keep track of everything that is going on. Even core members of the community rely on filtering only topics relevant to them. Since filtering becomes necessary at scale anyway, there's no need to worry about sending too many messages.

A note about tone: in the past people were more likely to be put off by the unfriendly tone in chat rooms and mailing lists. Over time the tone seems to have improved in general, probably due to multiple factors like open source becoming more professional, codes of conduct making participants aware of their behavior, etc. I don't want to cover the pros and cons of these factors, but suffice to say that nowadays tone is less of a problem and that's a good thing for everyone.

Why public communication is faster and more efficient

Next let's look at the reasons why public communication is beneficial:

Including the community from the start avoids waste

If you ask for help in private it's possible that the answer you get will not end up being consensus in the community. When you eventually send patches to the community they might disagree with the approach you settled on in private and you'll have to redo your work to get the patches merged. This can be avoided by including the community from the start.

Similarly, you can avoid duplicating work when multiple people are investigating the same problem in private without knowing about each other. If you discuss what you are doing in public then you can collaborate and avoid spending time creating multiple solutions, only one of which can be merged.

Public discussions create a searchable knowledge base

If you find a solution to your problem in private, that doesn't help others who have the same question. Public discussions are typically archived and searchable on the web. When the next person has the same question they will find the answer online and won't need to ask at all!

Additionally, everyone benefits and learns from each other when questions are asked in public. We soak up knowledge by following these discussions. If they are private then we don't have this opportunity.

Get a reply even if the person you asked is unavailable

The person you intended to reach may be away due to timezones, holidays, etc. A private discussion is blocked until they respond to your message. Public discussions, on the other hand, can progress even when the original participants are unavailable. You can get an answer to your question faster by asking in public.

Public activity gives visibility to your work

Private discussions are invisible to your teammates, managers, and other people affected by your progress. When you communicate in public they will be able to plan better, help out when needed, and give you credit for the effort you are putting in.

A checklist before you press Send

Hopefully I have encouraged you to go ahead and ask questions or send unfinished patches in public. Here is a checklist that will help you feel confident about communicating in public:

Before asking questions...

  1. Have you searched the web, documentation, and code?
  2. Have you added printfs, GDB breakpoints, or enabled tracing to understand the behavior of the system?

Before sending patches...

  1. Is the code formatted according to the coding standard?
  2. Are the error cases handled, memory allocation/ownership correctly implemented, and thread safety addressed? Most of the time you can figure these out yourself and forgetting to do so may distract from the topics you want help with.
  3. Did you add todo comments pointing out unfinished aspects of the code? Being upfront about what is missing helps readers understand the status of the code and saves them time trying to distinguish requirements you forgot from things that are simply not yet implemented.
  4. Do the commit descriptions explain the purpose of the code changes and does the cover letter give an overview of the patch series?
  5. Are you using git-format-patch --subject-prefix RFC (same for git-publish) to mark your patches as a "request for comment"? This tells people you are seeking input on unfinished code.

Finally, do you know who to CC on emails or mention in comments/chats? Look up the relevant maintainers or active developers using git-log(1), scripts/get_maintainer.pl, etc.

Conclusion

Private communication can be slower, less efficient, and adds less value to an open source community. For many people public communication feels a little scary and they prefer to avoid it. I hope that by understanding the advantages of public communication you will be motivated to use public communication channels more.