Points-to analysis and the real world
One of the pieces of GCC I maintain is the alias analysis, including our field-sensitive points-to analysis. Points-to analysis is one of the most published about areas of compilers, and yet it is rare to find papers that are actually applicable to real world compilers.
To give some idea of numbers, there are probably 5-10 peer reviewed journal/conference published papers on points-to analysis every year. I read all of them. I also read all the research that Google, Citeseer, and other methods can find.
Most of it, sadly is not useful, for a variety of reasons. Those that concentrate on speeding up points-to analysis rarely contribute things that make a real difference. There are a variety of reasons for this:
- Taking shortcuts - A lot of papers deal with speedups that just aren’t sound (leaving out effects of external function calls, assuming random things about variables, visibility, whatever, instead of just handling the full language) and they leave it as an exercise to the reader to make what they’ve done sound. In every case I’ve implemented one of these unsound algorithms, making it sound lost almost all speedup the algorithm purported to give.
- Points-to analysis research that focuses on Java - Java is rather easy as a language to write pointer analysis for. You can’t take the address of fields, you know which functions are really doing allocation of objects, etc. I’ve seen plenty of techniques that work great for huge java programs (which I guess is useful to static java compilers, but JIT’s just aren’t really going to play in this space until memory and time requirements are ridiculously low. They aren’t there yet). These techniques are essentially useless when applied to C, C++, or other common languages, simply because they use language specific assumptions that no longer apply.
IMHO, It would be much better to do work on pointer analysis, that can handle a language like C, and then apply it to Java by making it simpler, rather than coming up with algorithms that only work on Java, and can’t be made to work on something with more complications without losing all the benefit the work done was supposed to bring. - Comparison to ridiculousness - This is more a combined criticism. It turns out a lot of the people write their own implementations of the algorithms they compare against, and it’s obvious they only put a minimal amount of engineering work into them. You then see comparisons against really dumb implementations of other algorithms, or using really bad data structures, and claims of speedups, when if you look at good implementations of the other algorithm, you see the time is usually the same as the new algorithm claims. Part of this is another problem - a lot of papers lack source code that implements their work (to the credit of many researchers, they are almost always happy to give source, but it would be nice to just make it available with the paper, and put in a URL in the paper to the source). It’s interesting to note that for the really good algorithms I have found, when I looked at the source, they often had really good implementations of the algorithms they compared against.
- Misguided notions of what is practical - I think this more or less speaks for itself, but I have read pointer analysis papers that claim “practical” algorithms, where their idea of practical is to take > 1 hour or > 4 gig of memory to analyze 100k of code. You can always claim users are willing to wait, or we shouldn’t run expensive algorithms everywhere. The problem with these claims are:
- That only about 0.1% of users are willing to wait.
- Making algorithms that only run sometimes are a maintenance burden and a can of worms for bug reproduction = unlikely to be implemented without incredibly significant benefit.
- There are algorithms without these requirements that have just as good precision.
This is not to say all the research I read is not applicable, and not a dig at, for example, people doing Java pointer analysis. I have seen good work on Java pointer analysis, where authors explain how to make it applicable to other languages, or at least have some cognizance that it is not useful outside the realm of Java. I am simply pointing out that if you want your work to be used in production compilers, it needs to be implementable without having to spend another 3 years trying to get back the speed you claim when you transform the algorithm to be sound for C/C++/etc.
Just to give examples of good pointer analysis research i’ve seen in the past few years, the work done by Paul H J Kelly and David J Pearce on field-sensitive pointer analysis for C now forms the basis for the algorithm used by GCC. In addition, Calvin Lin and Ben Hardekopf have done an amazing job of attacking the real problems of memory usage and time of points-to algorithms, using implementations they plan on making freely available. Most of their work concentrates not on better solving algorithms, but on very cheap offline detection of much larger classes of pointer and location equivalences. There are plenty of other people doing good work here (particularly in the area of scaling context sensitive analysis), I am just pointing out the ones I remember off the top of my head.
May 14th, 2007 at 5:27 pm
In a nutshell: ideally, successful design anticipates all relevant and possible ways failure can occur. Some people have a paranoidic approach to design, and they imagine that the impossible could happen. Failure in design is often non-quantitative, especially in software engineering, where accidental complexities are essentially non-numerical.
Has anyone ever published historical case studies on these design failures in optimizing compilers? I wouldn’t think so, because generally you implement an algorithm, it sucks, and you throw it away. Although these sorts of failures are not as dramatic as the events at IBM that contributed to the “Software Crisis”, there are probably some lessons that can be learned.
October 2nd, 2007 at 7:34 am
Hi Dan,
It seems that you’re really found of pointer analysis papers.
I really liked your position on current research works.
Since I’m starting research on this topic, it would be great if you could share your state-of-the-art bibliography on pointer analysis (for C, Java, etc.).
Many thanks in advance,
Cheers