================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
---------------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
----------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
------------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than GPL,
as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. Most of the code is written in standard C++ with operating system
services abstracted to a support library. The tools required to build and
test LLVM have been ported to a plethora of platforms.

Some porting problems may exist in the following areas:

* The autoconf/makefile build system relies heavily on UNIX shell tools,
  like the Bourne Shell and sed. Porting to systems without these tools
  (MacOS 9, Plan 9) will require more effort.

What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
--------------------------------------------------------------------------------------------------------

In short: you can't. It's actually kind of a silly question once you grok
what's going on. Basically, in code like:

.. code-block:: llvm

    %result = add i32 %foo, %bar

``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction, since
instructions can simply keep pointers to any other ``Value``\ s that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).
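
To see the same thing from the C++ side, here is a minimal sketch (the helper
function and variable names are purely illustrative, not part of any LLVM
API): the ``Value*`` returned by ``IRBuilder::CreateAdd`` *is* the add
instruction, and later instructions use it simply by holding that pointer as
an operand.

.. code-block:: c++

    #include "llvm/IR/IRBuilder.h"

    using namespace llvm;

    // Hypothetical helper: 'Builder' is already positioned inside a basic
    // block, and 'Foo'/'Bar' are existing i32 Values.
    Value *emitResult(IRBuilder<> &Builder, Value *Foo, Value *Bar) {
      // The returned Value* is the add instruction itself; "result" is only
      // the name it will print with, not a register being stored into.
      Value *Result = Builder.CreateAdd(Foo, Bar, "result");

      // A later instruction references it by taking the pointer as an operand.
      return Builder.CreateMul(Result, Result, "square");
    }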


Source Languages
================

What source languages are supported?
-------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <http://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
--------------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are 3
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries code using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help
a lot, since most languages have strong support for interfacing with C. The
most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.
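
To get a feel for what the first option looks like, here is a minimal sketch
using the stable C API from ``include/llvm-c`` (written as a standalone C++
program for brevity; a foreign-language binding would wrap these same entry
points, and the module/function names are just examples):

.. code-block:: c++

    #include "llvm-c/Core.h"

    int main() {
      // Build a module containing: define i32 @sum(i32, i32)
      LLVMModuleRef M = LLVMModuleCreateWithName("demo");
      LLVMTypeRef I32 = LLVMInt32Type();
      LLVMTypeRef Params[] = {I32, I32};
      LLVMValueRef Sum =
          LLVMAddFunction(M, "sum", LLVMFunctionType(I32, Params, 2, 0));

      // Append an entry block and emit: %tmp = add i32 ...; ret i32 %tmp
      LLVMBuilderRef B = LLVMCreateBuilder();
      LLVMPositionBuilderAtEnd(B, LLVMAppendBasicBlock(Sum, "entry"));
      LLVMBuildRet(B, LLVMBuildAdd(B, LLVMGetParam(Sum, 0),
                                   LLVMGetParam(Sum, 1), "tmp"));

      LLVMDumpModule(M);   // print the textual IR to stderr
      LLVMDisposeBuilder(B);
      LLVMDisposeModule(M);
      return 0;
    }

A binding generated from these headers keeps your front-end in sync with the
in-memory IR without ever printing or parsing ``.ll`` text.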

What support is there for higher level source language constructs for building a compiler?
----------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but will not support the high level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
--------------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
--------------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.
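
For example (a hypothetical compile; the function name is illustrative and the
exact IR depends on the target), the front-end folds ``sizeof`` away before
LLVM ever sees it:

.. code-block:: c++

    // Clang targeting x86_64-linux lowers this to "ret i64 8"; a typical
    // 32-bit target would instead get "ret i32 4". Either way, the choice
    // is already baked into the generated IR.
    long bytes_in_long() { return sizeof(long); }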

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.


Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
------------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``. This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file. The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.
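
Roughly speaking, every translation unit that includes ``<iostream>`` ends up
containing something like the following (a simplified sketch of the standard
library idiom, not the exact library source; the object name is illustrative).
The hidden constructor of this object is what gets registered through
``llvm.global_ctors``:

.. code-block:: c++

    #include <iostream>

    // std::ios_base::Init is the standard "make sure cout/cin/cerr are
    // usable" guard: its constructor initializes the global streams and
    // its destructor flushes them at program exit.
    static std::ios_base::Init StreamGuard;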

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\ s to print values.


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
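
For example (a hypothetical snippet for the demo page), routing the input and
the result through ``volatile`` globals forces the optimizer to keep the
computation:

.. code-block:: c++

    volatile int Input = 21;   // volatile load: cannot be folded away
    volatile int Output;       // volatile store: an observable side effect

    int main() {
      int X = Input;
      Output = X * 2;          // this multiply now survives optimization
      return 0;
    }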


What is this "``undef``" thing that shows up in my code?
----------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

    int X() { int i; return i; }

Is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
--------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function. For
example, this code:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @bar() {
      call void @foo()
      ret void
    }

Is optimized to:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @bar() {
      unreachable
    }

... with "``opt -instcombine -simplifycfg``". This often bites people because
"all their code disappears". Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define internal void @bar(void()* %FP, i1 %cond) {
      br i1 %cond, label %T, label %F
    T:
      call void %FP()
      ret void
    F:
      call fastcc void %FP()
      ret void
    }
    define void @test() {
      %X = or i1 false, false
      call void @bar(void()* @foo, i1 %X)
      ret void
    }

In this example, ``@test`` always passes ``@foo``/``false`` into ``@bar``, which
ensures that it is dynamically called with the right calling convention (thus,
the code is perfectly well defined). If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @test() {
      %X = or i1 false, false
      br i1 %X, label %T.i, label %F.i
    T.i:
      call void @foo()
      br label %bar.exit
    F.i:
      call fastcc void @foo()
      br label %bar.exit
    bar.exit:
      ret void
    }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if ``%X`` was an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }

    define void @test(i1 %X) {
      br i1 %X, label %T.i, label %F.i
    T.i:
      call void @foo()
      br label %bar.exit
    F.i:
      call fastcc void @foo()
      br label %bar.exit
    bar.exit:
      ret void
    }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @test(i1 %X) {
    F.i:
      call fastcc void @foo()
      ret void
    }