documentation (386405f7) | Commits | gwj / at91sam9263

Commit 386405f78661e0a4f82087196c7b084b8c612b48

Authored by bellard 2003-03-23 21:28:45 +0000

documentation


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@45 c046a42c-6fe2-441c-8c8c-71466251a162


Showing 1 changed file with 305 additions and 0 deletions

qemu-doc.texi 0 → 100644


	1	+\input texinfo @c -*- texinfo -*-
	2	+
	3	+@settitle QEMU x86 Emulator Reference Documentation
	4	+@titlepage
	5	+@sp 7
	6	+@center @titlefont{QEMU x86 Emulator Reference Documentation}
	7	+@sp 3
	8	+@end titlepage
	9	+
	10	+@chapter Introduction
	11	+
	12	+QEMU is an x86 processor emulator. Its purpose is to run x86 Linux
	13	+processes on non-x86 Linux architectures such as PowerPC or ARM. By
	14	+using dynamic translation it achieves a reasonable speed while being
	15	+easy to port to new host CPUs. An obviously interesting x86-only process
	16	+is 'wine' (Windows emulation).
	17	+
	18	+QEMU features:
	19	+
	20	+@itemize
	21	+
	22	+@item User space only x86 emulator.
	23	+
	24	+@item Currently ported to i386 and PowerPC.
	25	+
	26	+@item Using dynamic translation for reasonable speed.
	27	+
	28	+@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
	29	+User space LDT and GDT are emulated.
	30	+
	31	+@item Generic Linux system call converter, including most ioctls.
	32	+
	33	+@item clone() emulation using native CPU clone() to use Linux scheduler for threads.
	34	+
	35	+@item Accurate signal handling by remapping host signals to virtual x86 signals.
	36	+
	37	+@item The virtual x86 CPU is a library (@code{libqemu}) which can be used
	38	+in other projects.
	39	+
	40	+@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}.
	41	+It can be used to test other x86 virtual CPUs.
	42	+
	43	+@end itemize
	44	+
	45	+Current QEMU Limitations:
	46	+
	47	+@itemize
	48	+
	49	+@item Not all x86 exceptions are precise (yet). [Very few programs need that].
	50	+
	51	+@item Not self virtualizable (yet). [You cannot launch qemu with qemu on the same CPU].
	52	+
	53	+@item No support for self modifying code (yet). [Very few programs need that, a notable exception is QEMU itself !].
	54	+
	55	+@item No VM86 mode (yet), although the virtual
	56	+CPU has support for most of it. [VM86 support is useful to launch old 16
	57	+bit DOS programs with dosemu or wine].
	58	+
	59	+@item No SSE/MMX support (yet).
	60	+
	61	+@item No x86-64 support.
	62	+
	63	+@item Some Linux syscalls are missing.
	64	+
	65	+@item The x86 segment limits and access rights are not tested at every
	66	+memory access (and never will be, in order to keep good performance).
	67	+
	68	+@item On non-x86 host CPUs, @code{double}s are used instead of the non-standard
	69	+10 byte @code{long double}s of x86 for floating point emulation to get
	70	+maximum performance.
	71	+
	72	+@end itemize
	73	+
	74	+@chapter Invocation
	75	+
	76	+In order to launch a Linux process, QEMU needs the process executable
	77	+itself and all the target (x86) dynamic libraries used by it. Currently,
	78	+QEMU is not distributed with the necessary packages so that you can test
	79	+it easily on non x86 CPUs.
	80	+
	81	+However, the statically linked x86 binary 'tests/hello' can be used to do a
	82	+first test:
	83	+
	84	+@example
	85	+qemu tests/hello
	86	+@end example
	87	+
	88	+@code{Hello world} should be printed on the terminal.
	89	+
	90	+If you are testing it on an x86 CPU, then you can test it on any process:
	91	+
	92	+@example
	93	+qemu /bin/ls -l
	94	+@end example
	95	+
	96	+@chapter QEMU Internals
	97	+
	98	+@section QEMU compared to other emulators
	99	+
	100	+Unlike bochs [3], QEMU emulates only a user space x86 CPU. It means that
	101	+you cannot launch an operating system with it. The benefit is that it is
	102	+simpler and faster due to the fact that some of the low level CPU state
	103	+can be ignored (in particular, no virtual memory needs to be emulated).
	104	+
	105	+Like Valgrind [2], QEMU does user space emulation and dynamic
	106	+translation. Valgrind is mainly a memory debugger, while QEMU has no
	107	+support for memory debugging (QEMU could be used to detect out-of-bounds
	108	+memory accesses as Valgrind does, but it has no support to track
	109	+uninitialised data as Valgrind does). Valgrind's dynamic translator generates better code than
	110	+QEMU (in particular it does register allocation) but it is closely tied
	111	+to an x86 host.
	112	+
	113	+EM86 [4] is the closest project to QEMU (and QEMU still uses some of its
	114	+code, in particular the ELF file loader). EM86 was limited to an Alpha
	115	+host and used a proprietary and slow interpreter (the interpreter part
	116	+of the FX!32 Digital Win32 code translator [5]).
	117	+
	118	+@section Portable dynamic translation
	119	+
	120	+QEMU is a dynamic translator. When it first encounters a piece of code,
	121	+it converts it to the host instruction set. Usually dynamic translators
	122	+are very complicated and highly CPU dependent. QEMU uses some tricks
	123	+which make it relatively easily portable and simple while achieving good
	124	+performance.
	125	+
	126	+The basic idea is to split every x86 instruction into a few simpler
	127	+instructions. Each simple instruction is implemented by a piece of C
	128	+code (see @file{op-i386.c}). Then a compile time tool (@file{dyngen})
	129	+takes the corresponding object file (@file{op-i386.o}) to generate a
	130	+dynamic code generator which concatenates the simple instructions to
	131	+build a function (see @file{op-i386.h:dyngen_code()}).
	132	+
	133	+In essence, the process is similar to [1], but more work is done at
	134	+compile time.
	135	+
	136	+A key idea to get optimal performance is that constant parameters can
	137	+be passed to the simple operations. For that purpose, dummy ELF
	138	+relocations are generated with gcc for each constant parameter. Then,
	139	+the tool (@file{dyngen}) can locate the relocations and generate the
	140	+appropriate C code to resolve them when building the dynamic code.
	141	+
	142	+That way, QEMU is no more difficult to port than a dynamic linker.
	143	+
	144	+To go even faster, GCC static register variables are used to keep the
	145	+state of the virtual CPU.
	146	+
	147	+@section Register allocation
	148	+
	149	+Since QEMU uses fixed simple instructions, no efficient register
	150	+allocation can be done. However, because RISC CPUs have a lot of
	151	+registers, most of the virtual CPU state can be put in registers without
	152	+doing complicated register allocation.
	153	+
	154	+@section Condition code optimisations
	155	+
	156	+Good CPU condition codes emulation (@code{EFLAGS} register on x86) is a
	157	+critical point to get good performance. QEMU uses lazy condition code
	158	+evaluation: instead of computing the condition codes after each x86
	159	+instruction, it stores just one operand (called @code{CC_SRC}), the
	160	+result (called @code{CC_DST}) and the type of operation (called
	161	+@code{CC_OP}).
	162	+
	163	+@code{CC_OP} is almost never explicitly set in the generated code
	164	+because it is known at translation time.
	165	+
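The lazy scheme described above can be sketched as follows. This is only an
illustration under stated assumptions: the structure, the two operation tags
and the helper names (@code{lazy_add}, @code{lazy_sub}, @code{lazy_zf},
@code{lazy_cf}) are hypothetical and are not taken from the QEMU sources.
Only the idea of storing one operand, the result and the operation type comes
from the text.

```c
#include <stdint.h>

/* Hypothetical operation tags; the real CC_OP values in QEMU differ. */
enum cc_op { CC_OP_ADDL, CC_OP_SUBL };

/* Lazy flag state: one operand, the result, and the operation type. */
struct cc_state {
    uint32_t cc_src;   /* second operand of the last flag-setting operation */
    uint32_t cc_dst;   /* result of that operation */
    enum cc_op cc_op;  /* which operation produced cc_dst */
};

/* Executing an instruction only records what is needed to rebuild flags. */
static uint32_t lazy_add(struct cc_state *s, uint32_t a, uint32_t b)
{
    s->cc_src = b;
    s->cc_dst = a + b;
    s->cc_op = CC_OP_ADDL;
    return s->cc_dst;
}

static uint32_t lazy_sub(struct cc_state *s, uint32_t a, uint32_t b)
{
    s->cc_src = b;
    s->cc_dst = a - b;
    s->cc_op = CC_OP_SUBL;
    return s->cc_dst;
}

/* Flags are derived only when a later instruction actually reads them. */
static int lazy_zf(const struct cc_state *s)
{
    return s->cc_dst == 0;
}

static int lazy_cf(const struct cc_state *s)
{
    switch (s->cc_op) {
    case CC_OP_ADDL:
        /* Unsigned overflow: the wrapped result is smaller than an operand. */
        return s->cc_dst < s->cc_src;
    case CC_OP_SUBL:
        /* Borrow: the first operand (dst + src) was smaller than src. */
        return (uint32_t)(s->cc_dst + s->cc_src) < s->cc_src;
    }
    return 0;
}
```

In this sketch an addition costs three stores, and the carry or zero flag is
only evaluated if a conditional jump or similar instruction asks for it.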
	166	+In order to increase performance, a backward pass is performed on the
	167	+generated simple instructions (see
	168	+@code{translate-i386.c:optimize_flags()}). When it can be proved that
	169	+the condition codes are not needed by the next instructions, no
	170	+condition codes are computed at all.
	171	+
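A backward pass of this kind can be sketched as a simple liveness analysis.
The @code{struct micro_op} layout and the function name
@code{mark_needed_flags} below are assumptions made for illustration; they do
not describe the actual @code{optimize_flags()} implementation. The sketch
also assumes that each flag-writing operation overwrites all condition codes
and that the flags are live at the end of the block.

```c
#include <stddef.h>

/* Hypothetical micro-operation description; not QEMU's real data structure. */
struct micro_op {
    int writes_flags;   /* the op overwrites all condition codes */
    int reads_flags;    /* the op consumes condition codes */
    int compute_flags;  /* output: must this op record flag state? */
};

/*
 * Walk the block backwards, tracking whether the flags are live.
 * Flags are assumed live at the end of the block because the code
 * that runs next is unknown.
 */
static void mark_needed_flags(struct micro_op *ops, size_t n)
{
    int live = 1;

    for (size_t i = n; i-- > 0; ) {
        ops[i].compute_flags = ops[i].writes_flags && live;
        if (ops[i].writes_flags)
            live = 0;   /* earlier flag values are overwritten here */
        if (ops[i].reads_flags)
            live = 1;   /* this op needs the flags produced before it */
    }
}
```

For a sequence such as add, sub, conditional jump, add, the first add is
marked as not needing flag computation because the sub overwrites its flags
before anything reads them.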
	172	+@section Translation CPU state optimisations
	173	+
	174	+The x86 CPU has many internal states which change the way it evaluates
	175	+instructions. In order to achieve a good speed, the translation phase
	176	+considers that some state information of the virtual x86 CPU cannot
	177	+change in it. For example, if the SS, DS and ES segments have a zero
	178	+base, then the translator does not even generate an addition for the
	179	+segment base.
	180	+
	181	+[The FPU stack pointer register is not handled that way yet].
	182	+
	183	+@section Translation cache
	184	+
	185	+A 2MByte cache holds the most recently used translations. For
	186	+simplicity, it is completely flushed when it is full. A translation unit
	187	+contains just a single basic block (a block of x86 instructions
	188	+terminated by a jump or by a virtual CPU state change which the
	189	+translator cannot deduce statically).
	190	+
	191	+[Currently, the translated code is not patched if it jumps to another
	192	+translated code].
	193	+
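The flush-when-full policy can be sketched as below. The buffer size is
deliberately tiny, and all names (@code{struct tcache}, @code{tcache_alloc},
@code{tcache_find}, @code{tcache_flush}) as well as the linear lookup are
hypothetical simplifications, not the data structures QEMU uses.

```c
#include <stddef.h>
#include <stdint.h>

#define CODE_BUF_SIZE 64  /* tiny for illustration; the text describes 2 MB */
#define MAX_BLOCKS    8

/* One translated basic block: guest address plus its slot in the buffer. */
struct tb {
    uint32_t guest_pc;
    size_t offset;
    size_t size;
};

struct tcache {
    uint8_t code[CODE_BUF_SIZE];
    size_t used;
    struct tb blocks[MAX_BLOCKS];
    size_t nblocks;
    unsigned flushes;
};

/* Drop every translation at once instead of evicting selectively. */
static void tcache_flush(struct tcache *c)
{
    c->used = 0;
    c->nblocks = 0;
    c->flushes++;
}

/* Linear lookup by guest address; returns NULL when not translated. */
static struct tb *tcache_find(struct tcache *c, uint32_t pc)
{
    for (size_t i = 0; i < c->nblocks; i++)
        if (c->blocks[i].guest_pc == pc)
            return &c->blocks[i];
    return NULL;
}

/* Reserve room for a new translation, flushing everything if it is full. */
static struct tb *tcache_alloc(struct tcache *c, uint32_t pc, size_t size)
{
    if (size > CODE_BUF_SIZE)
        return NULL;                     /* can never fit */
    if (c->used + size > CODE_BUF_SIZE || c->nblocks == MAX_BLOCKS)
        tcache_flush(c);

    struct tb *b = &c->blocks[c->nblocks++];
    b->guest_pc = pc;
    b->offset = c->used;
    b->size = size;
    c->used += size;
    return b;
}
```

A complete flush keeps the bookkeeping trivial: no free list, no
fragmentation and no per-block eviction order are needed, at the cost of
retranslating hot blocks after each flush.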
	194	+@section Exception support
	195	+
	196	+longjmp() is used when an exception such as division by zero is
	197	+encountered. The host SIGSEGV and SIGBUS signal handlers are used to get
	198	+invalid memory accesses.
	199	+
	200	+[Currently, the virtual CPU cannot retrieve the exact CPU state in some
	201	+exceptions, although it could except for the @code{EFLAGS} register].
	202	+
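The use of longjmp() for synchronous exceptions can be sketched as follows.
The exception numbers and the functions @code{emulate_div} and
@code{run_div} are hypothetical; the sketch only shows how a deeply nested
operation can abandon execution and return to the CPU loop.

```c
#include <setjmp.h>

/* Hypothetical exception numbers, chosen only for this sketch. */
enum { EXCP_NONE = 0, EXCP_DIVZ = 1 };

/* Jump target set up by the CPU loop before running translated code. */
static jmp_buf cpu_loop_env;

/* A simple operation that raises an exception instead of trapping the host. */
static int emulate_div(int a, int b)
{
    if (b == 0)
        longjmp(cpu_loop_env, EXCP_DIVZ);  /* unwind straight to the loop */
    return a / b;
}

/* Returns EXCP_NONE and stores the quotient, or returns the exception. */
static int run_div(int a, int b, int *result)
{
    switch (setjmp(cpu_loop_env)) {
    case 0:
        *result = emulate_div(a, b);
        return EXCP_NONE;
    case EXCP_DIVZ:
        return EXCP_DIVZ;
    default:
        return -1;  /* unknown exception number */
    }
}
```

Calling @code{run_div(1, 0, &r)} in this sketch returns @code{EXCP_DIVZ}
without touching @code{r}, and the caller can then turn the exception into a
guest signal.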
	203	+@section Linux system call translation
	204	+
	205	+QEMU includes a generic system call translator for Linux. It means that
	206	+the parameters of the system calls can be converted to fix the
	207	+endianness and 32/64 bit issues. The IOCTLs are converted with a generic
	208	+type description system (see @file{ioctls.h} and @file{thunk.c}).
	209	+
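The kind of conversion involved can be sketched for a single 32 bit value.
The helper names below (@code{bswap32}, @code{host_is_little_endian},
@code{guest_load32}, @code{guest_long_to_host}) are hypothetical and much
simpler than the generic type description system in @file{thunk.c}; the
sketch only assumes that the x86 guest is little-endian and uses a 32 bit
@code{long}.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Detect host byte order at run time by inspecting the first byte. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Read a 32-bit little-endian guest value from guest memory. */
static uint32_t guest_load32(const void *guest_ptr)
{
    uint32_t v;

    memcpy(&v, guest_ptr, sizeof v);
    return host_is_little_endian() ? v : bswap32(v);
}

/* Widen a guest 32-bit signed long to the host long, which may be 64-bit. */
static long guest_long_to_host(const void *guest_ptr)
{
    return (long)(int32_t)guest_load32(guest_ptr);
}
```

On a little-endian host the load is a plain copy; on a big-endian host the
bytes are swapped. The sign extension step covers the 32/64 bit difference.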
	210	+@section Linux signals
	211	+
	212	+Normal and real-time signals are queued along with their information
	213	+(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
	214	+request is done to the virtual CPU. When it is interrupted, one queued
	215	+signal is handled by generating a stack frame in the virtual CPU as the
	216	+Linux kernel does. The @code{sigreturn()} system call is emulated to return
	217	+from the virtual signal handler.
	218	+
	219	+Some signals (such as SIGALRM) directly come from the host. Other
	220	+signals are synthesized from the virtual CPU exceptions such as SIGFPE
	221	+when a division by zero is done (see @code{main.c:cpu_loop()}).
	222	+
	223	+The blocked signal mask is still handled by the host Linux kernel so
	224	+that most signal system calls can be redirected directly to the host
	225	+Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
	226	+calls need to be fully emulated (see @file{signal.c}).
	227	+
	228	+@section clone() system call and threads
	229	+
	230	+The Linux clone() system call is usually used to create a thread. QEMU
	231	+uses the host clone() system call so that real host threads are created
	232	+for each emulated thread. One virtual CPU instance is created for each
	233	+thread.
	234	+
	235	+The virtual x86 CPU atomic operations are emulated with a global lock so
	236	+that their semantics are preserved.
	237	+
	238	+@section Bibliography
	239	+
	240	+@table @asis
	241	+
	242	+@item [1]
	243	+@url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing
	244	+direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio
	245	+Riccardi.
	246	+
	247	+@item [2]
	248	+@url{http://developer.kde.org/~sewardj/}, Valgrind, an open-source
	249	+memory debugger for x86-GNU/Linux, by Julian Seward.
	250	+
	251	+@item [3]
	252	+@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
	253	+by Kevin Lawton et al.
	254	+
	255	+@item [4]
	256	+@url{http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html}, the EM86
	257	+x86 emulator on Alpha-Linux.
	258	+
	259	+@item [5]
	260	+@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf},
	261	+DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
	262	+Chernoff and Ray Hookway.
	263	+
	264	+@end table
	265	+
	266	+@chapter Regression Tests
	267	+
	268	+In the directory @file{tests/}, various interesting x86 testing programs
	269	+are available. They are used for regression testing.
	270	+
	271	+@section @file{hello}
	272	+
	273	+Very simple statically linked x86 program, just to test QEMU during a
	274	+port to a new host CPU.
	275	+
	276	+@section @file{test-i386}
	277	+
	278	+This program executes most of the 16 bit and 32 bit x86 instructions and
	279	+generates a text output. It can be compared with the output obtained with
	280	+a real CPU or another emulator. The target @code{make test} runs this
	281	+program and a @code{diff} on the generated output.
	282	+
	283	+The Linux system call @code{modify_ldt()} is used to create x86 selectors
	284	+to test some 16 bit addressing and 32 bit with segmentation cases.
	285	+
	286	+@section @file{testsig}
	287	+
	288	+This program tests various signal cases, including SIGFPE, SIGSEGV and
	289	+SIGILL.
	290	+
	291	+@section @file{testclone}
	292	+
	293	+Tests the @code{clone()} system call (basic test).
	294	+
	295	+@section @file{testthread}
	296	+
	297	+Tests the glibc threads (more complicated than @code{clone()} because signals
	298	+are also used).
	299	+
	300	+@section @file{sha1}
	301	+
	302	+It is a simple benchmark. Care must be taken to interpret the results
	303	+because it mostly tests the ability of the virtual CPU to optimize the
	304	+@code{rol} x86 instruction and the condition code computations.
	305	+