About Me

I am a software engineer at Stripe working remotely from Corvallis, Oregon. Prior to that, I completed my PhD at the University of Massachusetts Amherst while working in the PLASMA lab with Emery Berger. Feel free to contact me!

Research

My research spans software systems, programming languages, and software engineering, with a focus on overcoming challenges posed by web application development. My research interests include, but are not limited to, software debugging, performance, portability, and reliability.

Comedians in cafes getting data: Evaluating timing and adaptivity in real-world robot comedy performance

John Vilk and Naomi T. Fitter

Social robots and autonomous social agents are becoming more ingrained in our everyday lives. Interactive agents from Siri to Anki’s Cozmo robot include the ability to tell jokes to engage users. This ability will build in importance as in-home social agents take on more intimate roles, so it is important to gain a greater understanding of how robots can best use humor. Stand-up comedy provides a naturally-structured experimental context for initial studies of robot humor. In this preliminary work, we aimed to compare audience responses to a robotic stand-up comedian over multiple performances that varied robot timing and adaptivity. Our first study of 22 performances in the wild showed that a robot with good timing was significantly funnier. A second study of 10 performances found that an adaptive performance was not necessarily funnier, although adaptations almost always improved audience perception of individual jokes. The end result of this research provides key clues for how social robots can best engage people with humor.

This work was awarded a Best Paper Award at HRI 2020.

Righting Web Development

John Vilk

This dissertation revisits web development and provides developers with a complete set of development tools with full support for the browser environment. McFly is the first time-traveling debugger for the browser, and lets developers debug web applications and their visual state during time-travel; components of this work shipped in Microsoft’s ChakraCore JavaScript engine. BLeak is the first system for automatically debugging memory leaks in web applications, and provides developers with a ranked list of memory leaks along with the source code responsible for them. BCause constructs a causal graph of a web application’s events, which helps developers understand their code’s behavior. Doppio lets developers run code written in conventional languages in the browser, and Browsix brings Unix into the browser to enable unmodified programs expecting a Unix-like environment to run directly in the browser. Together, these five systems form a solid foundation for web development.

My dissertation was chosen as an Outstanding Doctoral Dissertation by the UMass Amherst College of Information and Computer Sciences.

ReJS: Time-Travel Debugging for Browser-Based Applications

John Vilk, Emery D. Berger, James Mickens, and Mark Marron

Time-traveling debuggers offer the promise of simplifying debugging by letting developers freely step forwards and backwards through a program’s execution. However, web applications present multiple challenges that make time-travel debugging especially difficult. A time-traveling debugger for web applications must accurately reproduce all network interactions, asynchronous events, and visual states observed during the original execution, both while stepping forwards and backwards. This must all be done in the context of a complex and highly multithreaded browser runtime. At the same time, to be practical, a time-traveling debugger must maintain interactive speeds.

This paper presents McFly, the first time-traveling debugger for web applications. McFly departs from previous approaches by operating on a high-level representation of the browser’s internal state. This approach lets McFly provide accurate time-travel debugging - maintaining JavaScript and visual state in sync at all times - at interactive speeds. McFly’s architecture is browser-agnostic, building on web standards supported by all major browsers. We have implemented McFly as an extension to the Microsoft Edge web browser, and core parts of McFly have been integrated into a time-traveling debugger product from Microsoft.

BLeak: Automatically Debugging Memory Leaks in Web Applications

John Vilk and Emery D. Berger

Despite the presence of garbage collection in managed languages like JavaScript, memory leaks remain a serious problem. In the context of web applications, these leaks are especially pervasive and difficult to debug. Web application memory leaks can take many forms, including failing to dispose of unneeded event listeners, repeatedly injecting iframes and CSS files, and failing to call cleanup routines in third-party libraries. Leaks degrade responsiveness by increasing GC frequency and overhead, and can even lead to browser tab crashes by exhausting available memory. Because previous leak detection approaches designed for conventional C, C++ or Java applications are ineffective in the browser environment, tracking down leaks currently requires intensive manual effort by web developers.

This paper introduces BLeak (Browser Leak debugger), the first system for automatically debugging memory leaks in web applications. BLeak’s algorithms leverage the observation that in modern web applications, users often repeatedly return to the same (approximate) visual state (e.g., the inbox view in Gmail). Sustained growth between round trips is a strong indicator of a memory leak. To use BLeak, a developer writes a short script (≈40 LOC) to drive a web application in round trips to the same visual state. BLeak then automatically generates a list of leaks found along with their root causes, ranked by severity. Guided by BLeak, we identify and fix over 50 memory leaks in popular libraries and apps including Airbnb, AngularJS, Google Analytics, Google Maps SDK, and jQuery. BLeak’s median precision is 100%; fixing the leaks it identifies reduces heap growth by an average of 94%, saving from 0.5 MB to 8 MB per round trip.
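The round-trip observation above can be illustrated with a small sketch. This is not BLeak's implementation or API: the `Snapshot` type, the `findGrowingPaths` function, and the path labels are all hypothetical, and the snapshot data is made up for illustration. The sketch assumes one size measurement per heap path after each round trip, and flags paths that grow on every round trip, ranked by total growth.

```typescript
// Hypothetical sketch only; names and data are assumptions, not BLeak's API.
// Each snapshot maps an illustrative heap path to the number of objects
// reachable through it after one round trip to the same visual state.
type Snapshot = Record<string, number>;

interface LeakCandidate {
  path: string;
  growth: number; // count after the last round trip minus the first count
}

// Report paths whose count strictly increases on every round trip,
// ranked by total growth (largest first).
function findGrowingPaths(snapshots: Snapshot[]): LeakCandidate[] {
  const first = snapshots[0];
  if (first === undefined || snapshots.length < 2) return [];
  const candidates: LeakCandidate[] = [];
  for (const path of Object.keys(first)) {
    const initial = first[path] ?? 0;
    let prev = initial;
    let growing = true;
    for (const snapshot of snapshots.slice(1)) {
      const curr = snapshot[path];
      // A path that disappears or fails to grow is not sustained growth.
      if (curr === undefined || curr <= prev) {
        growing = false;
        break;
      }
      prev = curr;
    }
    if (growing) candidates.push({ path, growth: prev - initial });
  }
  return candidates.sort((a, b) => b.growth - a.growth);
}

// Made-up measurements from three round trips.
const roundTrips: Snapshot[] = [
  { listeners: 10, cache: 5, iframes: 1 },
  { listeners: 14, cache: 5, iframes: 2 },
  { listeners: 18, cache: 4, iframes: 3 },
];
const leaks = findGrowingPaths(roundTrips);
console.log(leaks);
```

Here `cache` is not reported because it stops growing, while `listeners` and `iframes` grow on every round trip and are ranked by how much they grew.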

BLeak appeared at PLDI 2018 and was selected as a SIGPLAN Research Highlight.

Browsix: Bridging the Gap Between Unix and the Browser

Bobby Powers, John Vilk, and Emery D. Berger

Applications written to run on conventional operating systems typically depend on OS abstractions like processes, pipes, signals, sockets, and a shared file system. Porting these applications to the web currently requires extensive rewriting or hosting significant portions of code server-side because browsers present a nontraditional runtime environment that lacks OS functionality.

This paper presents Browsix, a framework that bridges the considerable gap between conventional operating systems and the browser, enabling unmodified programs expecting a Unix-like environment to run directly in the browser. Browsix comprises two core parts: (1) a JavaScript-only system that makes core Unix features (including pipes, concurrent processes, signals, sockets, and a shared file system) available to web applications; and (2) extended JavaScript runtimes for C, C++, Go, and Node.js that support running programs written in these languages as processes in the browser. Browsix supports running a POSIX shell, making it straightforward to connect applications together via pipes.

We illustrate Browsix’s capabilities via case studies that demonstrate how it eases porting legacy applications to the browser and enables new functionality. We demonstrate a Browsix-enabled LaTeX editor that operates by executing unmodified versions of pdfLaTeX and BibTeX. This browser-only LaTeX editor can render documents in seconds, making it fast enough to be practical. We further demonstrate how Browsix lets us port a client-server application to run entirely in the browser for disconnected operation. Creating these applications required fewer than 50 lines of glue code and no code modifications, demonstrating how easily Browsix can be used to build sophisticated web applications from existing parts.

This work appeared at ASPLOS 2017.

SurroundWeb: Mitigating Privacy Concerns in a 3D Web Browser

John Vilk, David Molnar, Eyal Ofek, Chris Rossbach, Benjamin Livshits, Alexander Moshchuk, Helen J. Wang, and Ran Gal

Immersive experiences that mix digital and real-world objects are becoming reality, but they raise serious privacy concerns as they require real-time sensor input. SurroundWeb is the first 3D web browser that provides the novel functionality of rendering web content onto a room while tackling many of the inherent privacy challenges. Following the principle of least privilege, we propose three abstractions for immersive rendering:

  1. The room skeleton lets applications place content in response to the physical dimensions and locations of renderable surfaces in a room.
  2. The detection sandbox lets applications declaratively place content near recognized objects in the room without revealing if the object is present.
  3. Satellite screens let applications display content across devices registered with SurroundWeb.

We implement these abstractions in a prototype system, and demonstrate that a wide range of immersive experiences can be implemented with acceptable performance.

SurroundWeb appeared at IEEE S&P 2015.

Doppio: Breaking the Browser Language Barrier

John Vilk and Emery D. Berger

Doppio is a JavaScript-based runtime system that makes it possible to run unaltered applications written in general-purpose languages directly inside the browser. Doppio provides a wide range of runtime services, including a file system that enables local and external (cloud-based) storage, an unmanaged heap, sockets, blocking I/O, and multiple threads. We demonstrate Doppio’s usefulness with two case studies: we extend Emscripten with Doppio, letting it run an unmodified C++ application in the browser with full functionality, and present DoppioJVM, an interpreter that runs unmodified JVM programs directly in the browser. This work appeared at PLDI 2014, won the Distinguished Artifact Award, and was selected as a SIGPLAN Research Highlight!

DoppioJVM is currently used by the University of Illinois at CodeMoo.com to teach students how to program in Java, and Doppio is used by the Software Collection at the Internet Archive to bring historical software to life in the browser.

Side Projects

Occasionally, I find the time to work on some fun side projects.

MakeTypes: TypeScript Types from JSON Examples

MakeTypes lets developers typecheck code that interacts with JSON services by generating TypeScript types from JSON examples. MakeTypes can also optionally perform runtime type checking to enforce these types at runtime, guarding code against unexpected service responses.
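As a rough illustration of the idea, the sketch below pairs a JSON example with a type and a runtime check for it. Everything here is written by hand and is hypothetical: the sample response, the `User` interface, and the `isUser` and `parseUser` functions are assumptions for illustration, not MakeTypes's generated output.

```typescript
// Hypothetical sample response from a JSON service.
const sample = '{"id": 7, "name": "Ada", "tags": ["admin", "dev"]}';

// A type one might derive from that sample.
interface User {
  id: number;
  name: string;
  tags: string[];
}

// Runtime check that an unknown value has the shape of User.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const tags = v["tags"];
  return (
    typeof v["id"] === "number" &&
    typeof v["name"] === "string" &&
    Array.isArray(tags) &&
    (tags as unknown[]).every((t) => typeof t === "string")
  );
}

// Parse a response and reject it if it does not match the expected shape.
function parseUser(json: string): User {
  const parsed: unknown = JSON.parse(json);
  if (!isUser(parsed)) throw new Error("Response does not match User");
  return parsed;
}

const user = parseUser(sample);
console.log(user.name, user.tags.length);
```

With the check in place, a response whose `id` arrives as a string fails in `parseUser` rather than later in unrelated code.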

MakeTypes uses the Common Preferred Shape Relation from Petricek et al. (PLDI 2016).

TypeScript Generator for Dropbox’s Stone

Stone is a language-agnostic specification language for APIs. Stone’s generators can generate SDKs for a variety of languages from a Stone specification, including JavaScript.

I mapped Stone’s type system onto TypeScript’s type system, and implemented a Stone generator that produces TypeScript definition files from a Stone API specification.

This generator provides TypeScript types for the Dropbox JavaScript SDK.

Software Collection at the Internet Archive

The Software Collection at the Internet Archive relies on Doppio’s file system to make a large collection of software playable in the browser. I added support for union mounting into the file system to support layering a mutable file system on top of an immutable zip file, making it possible to load software from a zip file and persist file system changes to in-browser storage.
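A minimal sketch of the union mounting idea follows. It is not Doppio's file system API: the `UnionFS` class, its method names, and the in-memory `Files` maps are hypothetical stand-ins for a read-only zip layer and a writable storage layer. Tracking deletions in a separate set is likewise an assumption about how such a layering could hide files from the lower layer.

```typescript
// Hypothetical sketch of a union (overlay) mount; not Doppio's actual API.
type Files = Record<string, string>;

function has(files: object, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(files, path);
}

class UnionFS {
  // Paths removed by the user that still exist in the read-only layer.
  private deleted: Record<string, true> = {};

  constructor(
    private lower: Files,      // immutable layer, e.g. contents of a zip file
    private upper: Files = {}, // mutable layer, e.g. in-browser storage
  ) {}

  exists(path: string): boolean {
    if (has(this.upper, path)) return true;
    return !has(this.deleted, path) && has(this.lower, path);
  }

  // Reads prefer the mutable layer and fall back to the immutable one.
  readFile(path: string): string {
    if (has(this.upper, path)) return this.upper[path] as string;
    if (this.exists(path)) return this.lower[path] as string;
    throw new Error("ENOENT: " + path);
  }

  // Writes only ever touch the mutable layer.
  writeFile(path: string, data: string): void {
    this.upper[path] = data;
    delete this.deleted[path];
  }

  // Deleting hides a lower-layer file without modifying that layer.
  unlink(path: string): void {
    if (!this.exists(path)) throw new Error("ENOENT: " + path);
    delete this.upper[path];
    if (has(this.lower, path)) this.deleted[path] = true;
  }
}

const zipContents: Files = { "/game.exe": "binary", "/save.dat": "empty" };
const unionFs = new UnionFS(zipContents);
unionFs.writeFile("/save.dat", "level 3");
console.log(unionFs.readFile("/save.dat"), zipContents["/save.dat"]);
```

The lower layer is never written to, so the same zip contents can be reloaded on each visit while changes persist separately in the upper layer.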

JSMESS: Multi Emulator Super System in JavaScript

The JSMESS project aims to port the Multi Emulator Super System (MESS) to JavaScript to enable accurate and embeddable vintage computer and video game console emulators in the browser. This project has enabled the Internet Archive to provide its visitors with interactive demos of vintage and historical software for educational purposes. JSMESS uses Emscripten along with tweaks to the MESS source code to accomplish this feat.

JSMESS is now officially part of MESS/MAME.

SDL Joystick Support in Emscripten

I augmented Emscripten’s SDL emulation with support for the SDL Joystick API using the HTML5 Gamepad API. As a result, SDL applications ported to the web with Emscripten, such as JSMESS, will function appropriately with USB gamepads. This code has been integrated into the Emscripten code base, and is present in Emscripten releases since v1.7.3.

Education

  • Ph.D. in Computer Science, University of Massachusetts, September, 2018.
  • M.S. in Computer Science, University of Massachusetts, 2014.
  • B.S. in Computer Science, Worcester Polytechnic Institute, May, 2011.

Employment

  • Software Engineer, Stripe, 2018 to present.
  • Research Assistant, University of Massachusetts, 2011 to 2018.
  • Teaching Assistant, University of Massachusetts, Spring 2016.
  • Research Intern, Microsoft Research, Summer 2015.
  • Research Intern, Microsoft Research, Summer 2014.
  • Research Intern, Microsoft Research, Summer 2013.
  • Software Engineering Intern, Google, Summer 2012.
    • With the Java Platform Team.
  • Research Intern, MIT Lincoln Laboratory, Summer 2011.
    • In Division 6: Communication Systems.
  • Software Engineering Intern, SoftArtisans Inc., Summer 2010.

Honors & Awards

Professional Service

  • Program Committee Member, PLDI 2020.
  • External Review Committee Member, PLDI 2016, PLDI 2017.
  • Artifact Evaluation Committee Member, OOPSLA 2015, PLDI 2015, OOPSLA 2014.
  • Graduate Representative, University of Massachusetts College of Information and Computer Sciences, 2014-2015.
    • One of four Ph.D. students elected to serve for one year. Attended and voted during faculty meetings, met with faculty candidates, and organized department seminars.
  • Student Volunteer, PLDI 2016, PLDI 2016 PC Meeting, OOPSLA 2014.
  • External Reviewer, USENIX Security 2014, PLDI 2014, OOPSLA 2012, MSPC 2012, HotPar 2012.

Teaching & Mentoring

  • Guest Lecturer, COMPSCI 326: Web Development, University of Massachusetts Amherst, Spring 2016.
  • Teaching Assistant, COMPSCI 326: Web Development, University of Massachusetts Amherst, Spring 2016.
    • As a TA, I co-taught the class with the instructor. I redesigned the curriculum, designed and wrote all of the assignments, graded assignments, held well-attended office hours, and answered hundreds of questions on Piazza. An archived version of the course website and assignments is available online.
  • Research Mentor
    • Muhammad Bhatti, Undergraduate in Computer Science, NUST, Pakistan, Summer 2015.
    • Giles Lavelle, Undergraduate in Computer Science, University of Bristol, Summer 2013.
      • Remotely supervised contributions to Doppio via Google Summer of Code. Extended Doppio’s file system with cloud storage support.
    • Braden McDorman, Undergraduate in Computer Science, University of Oklahoma, Summer 2013.
      • Remotely supervised contributions to Doppio via Google Summer of Code. Extended Doppio with support for outgoing TCP sockets.
  • Mentor, Computing Beyond the Double Bind Mentoring Network, 2015-2016.
    • Provided remote guidance and support to undergraduate women of color in computer science.