Commit 0a6b7b7813799f76e1859387688611af05db376c

Authored by bellard
1 parent b314f270

update

git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4581 c046a42c-6fe2-441c-8c8c-71466251a162
Showing 2 changed files with 64 additions and 86 deletions
tcg/README
... ... @@ -16,14 +16,18 @@ from the host, although it is never the case for QEMU.
16 16  
17 17 A TCG "function" corresponds to a QEMU Translated Block (TB).
18 18  
19   -A TCG "temporary" is a variable only live in a given
20   -function. Temporaries are allocated explicitly in each function.
  19 +A TCG "temporary" is a variable only live in a basic
  20 +block. Temporaries are allocated explicitly in each function.
21 21  
22   -A TCG "global" is a variable which is live in all the functions. They
23   -are defined before the functions defined. A TCG global can be a memory
24   -location (e.g. a QEMU CPU register), a fixed host register (e.g. the
25   -QEMU CPU state pointer) or a memory location which is stored in a
26   -register outside QEMU TBs (not implemented yet).
  22 +A TCG "local temporary" is a variable only live in a function. Local
  23 +temporaries are allocated explicitly in each function.
  24 +
  25 +A TCG "global" is a variable which is live in all the functions
  26 +(equivalent of a C global variable). They are defined before the
  27 +functions defined. A TCG global can be a memory location (e.g. a QEMU
  28 +CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
  29 +or a memory location which is stored in a register outside QEMU TBs
  30 +(not implemented yet).
27 31  
28 32 A TCG "basic block" corresponds to a list of instructions terminated
29 33 by a branch instruction.
... ... @@ -32,11 +36,11 @@ by a branch instruction.
32 36  
33 37 3.1) Introduction
34 38  
35   -TCG instructions operate on variables which are temporaries or
36   -globals. TCG instructions and variables are strongly typed. Two types
37   -are supported: 32 bit integers and 64 bit integers. Pointers are
38   -defined as an alias to 32 bit or 64 bit integers depending on the TCG
39   -target word size.
  39 +TCG instructions operate on variables which are temporaries, local
  40 +temporaries or globals. TCG instructions and variables are strongly
  41 +typed. Two types are supported: 32 bit integers and 64 bit
  42 +integers. Pointers are defined as an alias to 32 bit or 64 bit
  43 +integers depending on the TCG target word size.
40 44  
41 45 Each instruction has a fixed number of output variable operands, input
42 46 variable operands and always constant operands.
... ... @@ -44,14 +48,12 @@ variable operands and always constant operands.
44 48 The notable exception is the call instruction which has a variable
45 49 number of outputs and inputs.
46 50  
47   -In the textual form, output operands come first, followed by input
48   -operands, followed by constant operands. The output type is included
49   -in the instruction name. Constants are prefixed with a '$'.
  51 +In the textual form, output operands usually come first, followed by
  52 +input operands, followed by constant operands. The output type is
  53 +included in the instruction name. Constants are prefixed with a '$'.
50 54  
51 55 add_i32 t0, t1, t2 (t0 <- t1 + t2)
52 56  
53   -sub_i64 t2, t3, $4 (t2 <- t3 - 4)
54   -
55 57 3.2) Assumptions
56 58  
57 59 * Basic blocks
... ... @@ -62,9 +64,8 @@ sub_i64 t2, t3, $4 (t2 <- t3 - 4)
62 64 - Basic blocks start after the end of a previous basic block, at a
63 65 set_label instruction or after a legacy dyngen operation.
64 66  
65   -After the end of a basic block, temporaries at destroyed and globals
66   -are stored at their initial storage (register or memory place
67   -depending on their declarations).
  67 +After the end of a basic block, the content of temporaries is
  68 +destroyed, but local temporaries and globals are preserved.
68 69  
69 70 * Floating point types are not supported yet
70 71  
... ... @@ -100,7 +101,7 @@ optimizations:
100 101 is suppressed.
101 102  
102 103 - A liveness analysis is done at the basic block level. The
103   - information is used to suppress moves from a dead temporary to
  104 + information is used to suppress moves from a dead variable to
104 105 another one. It is also used to remove instructions which compute
105 106 dead results. The latter is especially useful for condition code
106 107 optimization in QEMU.
... ... @@ -113,47 +114,6 @@ optimizations:
113 114  
114 115 only the last instruction is kept.
115 116  
116   -- A macro system is supported (may get closer to function inlining
117   - some day). It is useful if the liveness analysis is likely to prove
118   - that some results of a computation are indeed not useful. With the
119   - macro system, the user can provide several alternative
120   - implementations which are used depending on the used results. It is
121   - especially useful for condition code optimization in QEMU.
122   -
123   - Here is an example:
124   -
125   - macro_2 t0, t1, $1
126   - mov_i32 t0, $0x1234
127   -
128   - The macro identified by the ID "$1" normally returns the values t0
129   - and t1. Suppose its implementation is:
130   -
131   - macro_start
132   - brcond_i32 t2, $0, $TCG_COND_EQ, $1
133   - mov_i32 t0, $2
134   - br $2
135   - set_label $1
136   - mov_i32 t0, $3
137   - set_label $2
138   - add_i32 t1, t3, t4
139   - macro_end
140   -
141   - If t0 is not used after the macro, the user can provide a simpler
142   - implementation:
143   -
144   - macro_start
145   - add_i32 t1, t2, t4
146   - macro_end
147   -
148   - TCG automatically chooses the right implementation depending on
149   - which macro outputs are used after it.
150   -
151   - Note that if TCG did more expensive optimizations, macros would be
152   - less useful. In the previous example a macro is useful because the
153   - liveness analysis is done on each basic block separately. Hence TCG
154   - cannot remove the code computing 't0' even if it is not used after
155   - the first macro implementation.
156   -
157 117 3.4) Instruction Reference
158 118  
159 119 ********* Function call
... ... @@ -241,6 +201,10 @@ t0=t1|t2
241 201  
242 202 t0=t1^t2
243 203  
  204 +* not_i32/i64 t0, t1
  205 +
  206 +t0=~t1
  207 +
244 208 ********* Shifts
245 209  
246 210 * shl_i32/i64 t0, t1, t2
... ... @@ -428,3 +392,34 @@ to apply more optimizations because more registers will be free for
428 392 the generated code.
429 393  
430 394 The exception model is the same as the dyngen one.
  395 +
  396 +6) Recommended coding rules for best performance
  397 +
  398 +- Use globals to represent the parts of the QEMU CPU state which are
  399 + often modified, e.g. the integer registers and the condition
  400 + codes. TCG will be able to use host registers to store them.
  401 +
  402 +- Avoid globals stored in fixed registers. They must be used only to
  403 + store the pointer to the CPU state and possibly to store a pointer
  404 + to a register window. The other uses are to ensure backward
  405 + compatibility with dyngen during the porting of a new target to TCG.
  406 +
  407 +- Use temporaries. Use local temporaries only when really needed,
  408 + e.g. when you need to use a value after a jump. Local temporaries
  409 + introduce a performance hit in the current TCG implementation: their
  410 + content is saved to memory at end of each basic block.
  411 +
  412 +- Free temporaries and local temporaries when they are no longer used
  413 + (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
  414 + should free it after it is used. Freeing temporaries does not yield
  415 + a better generated code, but it reduces the memory usage of TCG and
  416 + the speed of the translation.
  417 +
  418 +- Don't hesitate to use helpers for complicated or seldom used target
  419 + instructions. There is little performance advantage in using TCG to
  420 + implement target instructions taking more than about twenty TCG
  421 + instructions.
  422 +
  423 +- Use the 'discard' instruction if you know that TCG won't be able to
  424 + prove that a given global is "dead" at a given program point. The
  425 + x86 target uses it to improve the condition codes optimisation.
... ...
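
Note on the tcg/README hunks above: the liveness analysis they describe runs per basic block and removes instructions which compute dead results. The following is a minimal sketch of that idea, not TCG's actual implementation. The `ToyOp` class, the `mark_dead_ops` function and the variable names `g0`, `g1`, `t2`, `t3`, `t4` are hypothetical and exist only for this illustration. It assumes that globals are live at the end of the block, that plain temporaries are not, and that no instruction has side effects. Local temporaries are not modelled.

```python
from dataclasses import dataclass

# Hypothetical toy IR for illustration; these are not TCG's real data structures.
GLOBALS = {"g0", "g1"}  # assumed globals; every other name is a plain temporary


@dataclass
class ToyOp:
    text: str            # textual form, output operand first
    dst: str             # single output variable
    srcs: tuple          # input variables
    dead: bool = False   # set when the result is never used


def mark_dead_ops(block):
    """Walk one basic block backwards and mark ops whose output is dead."""
    # At the end of a basic block globals are preserved, temporaries are not.
    live = set(GLOBALS)
    dead = []
    for i in range(len(block) - 1, -1, -1):
        op = block[i]
        if op.dst not in live:
            # Nobody reads this result later in the block: drop the op.
            op.dead = True
            dead.append(i)
            continue
        # The op defines dst, so dst is not live before it; its inputs are.
        live.discard(op.dst)
        live.update(op.srcs)
    return sorted(dead)


block = [
    ToyOp("add_i32 t2, g0, g0", "t2", ("g0", "g0")),
    ToyOp("mov_i32 g1, t2", "g1", ("t2",)),          # g1 is overwritten below
    ToyOp("sub_i32 t3, g0, t2", "t3", ("g0", "t2")),
    ToyOp("mov_i32 g1, t3", "g1", ("t3",)),
    ToyOp("and_i32 t4, t3, t3", "t4", ("t3", "t3")),  # t4 is never read
]
dead_indices = mark_dead_ops(block)
```

In this toy block the first write to `g1` is dead because `g1` is written again before anything reads it, which resembles the condition code case the README mentions. The write to `t4` is dead because a plain temporary does not survive the end of the basic block.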
tcg/TODO
1   -- test macro system
  1 +- Add new instructions such as: andnot, ror, rol, setcond, clz, ctz,
  2 + popcnt.
2 3  
3   -- test conditional jumps
  4 +- See if it is worth exporting mul2, mulu2, div2, divu2.
4 5  
5   -- test mul, div, ext8s, ext16s, bswap
6   -
7   -- generate a global TB prologue and epilogue to save/restore registers
8   - to/from the CPU state and to reserve a stack frame to optimize
9   - helper calls. Modify cpu-exec.c so that it does not use global
10   - register variables (except maybe for 'env').
11   -
12   -- fully convert the x86 target. The minimal amount of work includes:
13   - - add cc_src, cc_dst and cc_op as globals
14   - - disable its eflags optimization (the liveness analysis should
15   - suffice)
16   - - move complicated operations to helpers (in particular FPU, SSE, MMX).
17   -
18   -- optimize the x86 target:
19   - - move some or all the registers as globals
20   - - use the TB prologue and epilogue to have QEMU target registers in
21   - pre assigned host registers.
  6 +- Support of globals saved in fixed registers between TBs.
22 7  
23 8 Ideas:
24 9  
25 10 - Move the slow part of the qemu_ld/st ops after the end of the TB.
26 11  
27   -- Experiment: change instruction storage to simplify macro handling
28   - and to handle dynamic allocation and see if the translation speed is
29   - OK.
30   -
31   -- change exception syntax to get closer to QOP system (exception
  12 +- Change exception syntax to get closer to QOP system (exception
32 13 parameters given with a specific instruction).
  14 +
  15 +- Add float and vector support.
... ...