https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext
To industry professionals we say: advanced static analyses, like those found in the research literature, can be deployed at scale and deliver value for general code. And to academics we say: from an industrial point of view the subject appears to have many unexplored avenues, and this provides research opportunities to inform future tools.
Deployments #
“diff time” deployment #
- analyzers participate as bots in code review
- make automatic comments when engineer submits code modification
- this kind of deployment lead to 70% fix rate
- traditional (offline or batch) deployment saw a 0% fix rate
- security related issues are pushed to the security engineer on-call for commenting on code modification
Software Development at Facebook #
- there is a main codebase (master)
- this gets altered by modifications submitted by devs
- CI/CD:
- anaylyses run on the code modification and participate by commenting their findings directly in the code review tool
Reporting #
The actioned reports and missed bugs are related to the classic concepts of true positives and false negatives from the academic static analysis literature. A true positive is a report of a potential bug that can happen in a run of the program in question (whether or not it will happen in practice); a false positive is one that cannot happen.
False positives #
the false positive rate is challenging to measure for a large, rapidly changing codebase: it would be extremely time consuming for humans to judge all reports as false or true as the code is changing.
- don’t focus on true positives and false negatives (even if valuable concepts)
- pay more attention to action rate and the observed missed bugs
Actioned reports #
Observable missed bugs #
- has been observed in some way
- but was not reported by an analysis
Tools #
Tools used by Fb to conduct static analysis
Infer #
Infer has its roots in academic research on program analysis with separation logic,5 research, which led to a startup company (Monoidics Ltd.) that was acquired by Facebook in 2013. Infer was open sourced in 2015 (www.fbinfer.com) and is used at Amazon, Spotify, Mozilla, and other companies.
- targets mobile apps
- applied to Java, Objective C and C++
- processes about 10s of millions of Android and Objective C code
- uses analysis logic based on the theory of Separation Logic
- finds errors related to more than 30 types of issues:
- memory safety
- concurrency (deadlocks and starvation)
- security (information flow)
- custom errors (suggested by Fb devs)
Zocolan #
- mainly does “taint” analysis
- builds a dependency graph that related methods to their potential callers
- uses this graph to schedule parallel analyses of individual methods
- deployed for more than 2 years (in 2019), first to security engineers then to software engineers
- report can trigger the security expert to create tasks
- can process over 100-million lines of Hack code in less than 30 minutes
- implements new modular parallel taint analysis algorithm
Lessons learned #
First run #
First deployment was rather batch than continous:
- run once (per night)
- generate list of issues
- assign issues to devs
Results:
- devs didn’t act on the issues assigned
- Fb reduced the false positive rate (down to 20%) but devs still didn’t take actions on issues
Switch to Diff time #
- the response of engineers was at about 70%
- positive rate didn’t change
- but the impact was bigger when the static analysis was deployed at diff time
Human factors #
The success of the diff time came as no surprise to Fb’s devs:
- mental effort of context switch+
- if dev is working on one problem, and the assigned issue is about another one, they must swap out the mental context of the first problem and swap in the second
- by participating as a bot in the code review process, the context switch was kind of solved
- relevance
- sometimes it’s hard to find the right person to assign issues to
- by commenting on a diff that introduces an issue we have a pretty good chance to find the relevant person