prev

next

of 45

View

245Download

8

Embed Size (px)

- Slide 1
- Pointer analysis
- Slide 2
- Pointer Analysis Outline: What is pointer analysis Intraprocedural pointer analysis Interprocedural pointer analysis Andersen and Steensgaard
- Slide 3
- Pointer and Alias Analysis Aliases: two expressions that denote the same memory location. Aliases are introduced by: pointers call-by-reference array indexing C unions
- Slide 4
- Useful for what? Improve the precision of analyses that require knowing what is modified or referenced (eg const prop, CSE ) Eliminate redundant loads/stores and dead stores. Parallelization of code can recursive calls to quick_sort be run in parallel? Yes, provided that they reference distinct regions of the array. Identify objects to be tracked in error detection tools x := *p;... y := *p; // replace with y := x? *x :=...; // is *x dead? x.lock();... y.unlock(); // same object as x?
- Slide 5
- Kinds of alias information Points-to information (must or may versions) at program point, compute a set of pairs of the form p ! x, where p points to x. can represent this information in a points-to graph Alias pairs at each program point, compute the set of of all pairs (e 1,e 2 ) where e 1 and e 2 must/may reference the same memory. Storage shape analysis at each program point, compute an abstract description of the pointer structure. p x y zp
- Slide 6
- Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
- Slide 7
- Flow functions x := a + b in out F x := a+b (in) = x := k in out F x := k (in) =
- Slide 8
- Flow functions x := &y in out F x := &y (in) = x := y in out F x := y (in) =
- Slide 9
- Flow functions *x := y in out F *x := y (in) = x := *y in out F x := *y (in) =
- Slide 10
- Intraprocedural Points-to Analysis Flow functions:
- Slide 11
- Pointers to dynamically-allocated memory Handle statements of the form: x := new T One idea: generate a new variable each time the new statement is analyzed to stand for the new location:
- Slide 12
- Example l := new Cons p := l t := new Cons *p := t p := t
- Slide 13
- Example solved l := new Cons p := l t := new Cons *p := t p := t l p V1 l p tV2 l p V1 t V2 l t V1 p V2 l t V1 p V2 l t V1 p V2V3 l t V1 p V2V3 l t V1 p V2V3
- Slide 14
- What went wrong? Lattice was infinitely tall! Instead, we need to summarize the infinitely many allocated objects in a finite way. introduce summary nodes, which will stand for a whole class of allocated objects. For example: For each new statement with label L, introduce a summary node loc L, which stands for the memory allocated by statement L. Summary nodes can use other criterion for merging.
- Slide 15
- Example revisited & solved S1: l := new Cons p := l S2: t := new Cons *p := t p := t l p S1 l p tS2 l p S1 t S2 l t S1 p S2 l t S1 p S2 Iter 1Iter 2Iter 3
- Slide 16
- Example revisited & solved S1: l := new Cons p := l S2: t := new Cons *p := t p := t l p S1 l p tS2 l p S1 t S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 Iter 1Iter 2Iter 3
- Slide 17
- Array aliasing, and pointers to arrays Array indexing can cause aliasing: a[i] aliases b[j] if: a aliases b and i = j a and b overlap, and i = j + k, where k is the amount of overlap. Can have pointers to elements of an array p := &a[i];...; p++; How can arrays be modeled? Could treat the whole array as one location. Could try to reason about the array index expressions: array dependence analysis.
- Slide 18
- Fields Can summarize fields using per field summary for each field F, keep a points-to node called F that summarizes all possible values that can ever be stored in F Can also use allocation sites for each field F, and each allocation site S, keep a points-to node called (F, S) that summarizes all possible values that can ever be stored in the field F of objects allocated at site S.
- Slide 19
- Summary We just saw: intraprocedural points-to analysis handling dynamically allocated memory handling pointers to arrays But, intraprocedural pointer analysis is not enough. Sharing data structures across multiple procedures is one the big benefits of pointers: instead of passing the whole data structures around, just pass pointers to them (eg C pass by reference). So pointers end up pointing to structures shared across procedures. If you dont do an interproc analysis, youll have to make conservative assumptions functions entries and function calls.
- Slide 20
- Conservative approximation on entry Say we dont have interprocedural pointer analysis. What should the information be at the input of the following procedure: global g; void p(x,y) {... } xyg
- Slide 21
- Conservative approximation on entry Here are a few solutions: xyg locations from alloc sites prior to this invocation global g; void p(x,y) {... } They are all very conservative! We can try to do better. x,y,g & locations from alloc sites prior to this invocation
- Slide 22
- Interprocedural pointer analysis Main difficulty in performing interprocedural pointer analysis is scaling One can use a bottom-up summary based approach (Wilson & Lam 95), but even these are hard to scale
- Slide 23
- Cost: space: store one fact at each prog point time: iteration S1: l := new Cons p := l S2: t := new Cons *p := t p := t l p S1 l p tS2 l p S1 t S2 l t S1 p S2 l t S1 p S2 l t S1 p L2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t L1 p L2 l t S1 p S2 l t S1 p S2 Iter 1Iter 2Iter 3 Example revisited
- Slide 24
- New idea: store one dataflow fact Store one dataflow fact for the whole program Each statement updates this one dataflow fact use the previous flow functions, but now they take the whole program dataflow fact, and return an updated version of it. Process each statement once, ignoring the order of the statements This is called a flow-insensitive analysis.
- Slide 25
- Flow insensitive pointer analysis S1: l := new Cons p := l S2: t := new Cons *p := t p := t
- Slide 26
- Flow insensitive pointer analysis S1: l := new Cons p := l S2: t := new Cons *p := t p := t l p S1 l p tS2 l p S1 t S2 l t S1 p S2
- Slide 27
- Flow sensitive vs. insensitive S1: l := new Cons p := l S2: t := new Cons *p := t p := t l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 Flow-sensitive SolnFlow-insensitive Soln l t S1 p S2
- Slide 28
- What went wrong? What happened to the link between p and S1? Cant do strong updates anymore! Need to remove all the kill sets from the flow functions. What happened to the self loop on S2? We still have to iterate!
- Slide 29
- Flow insensitive pointer analysis: fixed S1: l := new Cons p := l S2: t := new Cons *p := t p := t l p S1 l p tS2 l p S1 t S2 l t S1 p S2 l t S1 p S2 l t S1 p L2 l t S1 p S2 l t S1 p S2 l t S1 p S2 l t L1 p L2 l t S1 p S2 Iter 1Iter 2Iter 3 l t S1 p S2 Final result This is Andersens algorithm 94
- Slide 30
- Flow insensitive loss of precision S1: l := new Cons p := l S2: t := new Cons *p := t p := t l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 Flow-sensitive SolnFlow-insensitive Soln l t S1 p S2
- Slide 31
- Flow insensitive loss of precision Flow insensitive analysis leads to loss of precision! main() { x := &y;... x := &z; } Flow insensitive analysis tells us that x may point to z here! However: uses less memory (memory can be a big bottleneck to running on large programs) runs faster
- Slide 32
- Worst case complexity of Andersen *x = y x abc y def x abc y def Worst case: N 2 per statement, so at least N 3 for the whole program. Andersen is in fact O(N 3 )
- Slide 33
- New idea: one successor per node Make each node have only one successor. This is an invariant that we want to maintain. x a,b,c y d,e,f *x = y x a,b,c y d,e,f
- Slide 34
- x *x = y y More general case for *x = y
- Slide 35
- x *x = y yxyxy More general case for *x = y
- Slide 36
- x x = *y y Handling: x = *y
- Slide 37
- x x = *y yxyxy Handling: x = *y
- Slide 38
- x x = y y x = &y xy Handling: x = y (what about y = x?) Handling: x = &y
- Slide 39
- x x = y yxyxy x = &y xyx y, xy Handling: x = y (what about y = x?) Handling: x = &y get the same for y = x
- Slide 40
- Our favorite example, once more! S1: l := new Cons p := l S2: t := new Cons *p := t p := t 1 2 3 4 5
- Slide 41
- Our favorite example, once more! S1: l := new Cons p := l S2: t := new Cons *p := t p := t l S1 t S2 p l S1 l p l t S2 p l S1,S2 tp 1 2 3 4 5 12 3 l S1 t S2 p 4 5
- Slide 42
- Flow insensitive loss of precision S1: l := new Cons p := l S2: t := new Cons *p := t p := t l t S1 p S2 l t S1 p S2 l t S1 p S2 l t S1 p S2 Flow-sensitive Subset-based Flow-insensitive Subset-based l t S1 p S2 l S1,S2 tp Flow-insensitive Unification- based
- Slide 43
- bar