================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM. It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information. Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here. The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler. No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level language.

* Source-level languages are often widely different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats. This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects. The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information. As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process. This meta information gives an LLVM user
a way to relate the generated code to the original program source code.

Currently, there are two backend consumers of debug info: DwarfDebug and
CodeViewDebug. DwarfDebug produces DWARF suitable for use with GDB, LLDB, and
other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView,
the Microsoft debug info format, which is usable with Microsoft debuggers such
as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

.. _intro_debugopt:

Debug information and optimizations
-----------------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis. In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves. However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of debugging
  information, allowing them to update the debugging information as they
  perform aggressive optimizations. This means that, with effort, the LLVM
  optimizers could optimize debug code just as well as non-debug code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc.).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities. For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger. Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :doc:`LLVM test-suite <TestSuiteMakefileGuide>` provides a framework to
test the optimizer's handling of debugging information. It can be run like
this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This tests the impact of debugging information on optimization passes. If
debugging information influences the optimization passes, this is reported
as a failure. See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information. In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc.) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g., DWARF or Stabs). It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc.; this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum. The only common features that the LLVM debugger assumes
exist are `source files <LangRef.html#difile>`_ and `program objects
<LangRef.html#diglobalvariable>`_. These abstract objects are used by a
debugger to form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language. :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors are `specialized metadata nodes
<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``.

.. _format_common_intrinsics:

Debugger intrinsic functions
----------------------------

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
track source local variables through optimization and code generation.

``llvm.dbg.addr``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.addr(metadata, metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the address of the variable, typically
a static alloca in the function entry block. The second argument is a
`local variable <LangRef.html#dilocalvariable>`_ containing a description of
the variable. The third argument is a `complex expression
<LangRef.html#diexpression>`_. An ``llvm.dbg.addr`` intrinsic describes the
address of a source variable.

.. code-block:: text

    %i.addr = alloca i32, align 4
    call void @llvm.dbg.addr(metadata i32* %i.addr, metadata !1,
                             metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "i", ...) ; int i
    !2 = !DILocation(...)
    ...
    %buffer = alloca [256 x i8], align 8
    ; The address of i is buffer+64.
    call void @llvm.dbg.addr(metadata [256 x i8]* %buffer, metadata !3,
                             metadata !DIExpression(DW_OP_plus_uconst, 64)), !dbg !4
    !3 = !DILocalVariable(name: "i", ...) ; int i
    !4 = !DILocation(...)

A frontend should generate exactly one call to ``llvm.dbg.addr`` at the point
of declaration of a source variable. Optimization passes that fully promote the
variable from memory to SSA values will replace this call with possibly
multiple calls to ``llvm.dbg.value``. Passes that delete stores are effectively
partial promotion, and they will insert a mix of calls to ``llvm.dbg.value``
and ``llvm.dbg.addr`` to track the source variable value when it is available.
After optimization, there may be multiple calls to ``llvm.dbg.addr`` describing
the program points where the variable lives in memory. All calls for the same
concrete source variable must agree on the memory location.
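
As a rough illustration of how these control-dependent calls combine, the
following C++17 sketch replays the ``llvm.dbg.addr`` and ``llvm.dbg.value``
calls for a single variable in straight-line program order and reports where
the variable can be found at a given point. It is a toy model under simplifying
assumptions (no control flow, locations as plain strings), not LLVM's
implementation; ``DbgRecord``, ``VarLocation``, ``addrsAgree``, and
``locationAt`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <string>
  #include <vector>

  // Hypothetical toy model, not LLVM's API: one debug intrinsic call for a
  // single source variable, listed in straight-line program order.
  enum class DbgKind { Addr, Value };

  struct DbgRecord {
    DbgKind Kind;
    std::string Operand; // A memory location for Addr, an SSA value for Value.
  };

  struct VarLocation {
    bool InMemory;       // true: read memory at Operand; false: Operand is the value.
    std::string Operand;
  };

  // True if every Addr record names the same memory location.
  bool addrsAgree(const std::vector<DbgRecord> &Records) {
    std::optional<std::string> Home;
    for (const DbgRecord &R : Records) {
      if (R.Kind != DbgKind::Addr)
        continue;
      if (Home && *Home != R.Operand)
        return false;
      Home = R.Operand;
    }
    return true;
  }

  // Location after the first `Point` records have executed. Each call is
  // control-dependent, so the most recent one wins; before the first call
  // the variable has no known location.
  std::optional<VarLocation> locationAt(const std::vector<DbgRecord> &Records,
                                        std::size_t Point) {
    std::optional<VarLocation> Loc;
    for (std::size_t I = 0; I < Point && I < Records.size(); ++I)
      Loc = VarLocation{Records[I].Kind == DbgKind::Addr, Records[I].Operand};
    return Loc;
  }

For example, a variable that starts in memory at ``%x.addr``, is tracked as the
SSA value ``%v`` after a pass deletes a store, and then lives in memory again
corresponds to the records ``Addr %x.addr``, ``Value %v``, ``Addr %x.addr``.
``locationAt`` reports ``%v`` after the second record and the memory slot after
the third, and ``addrsAgree`` holds because both ``Addr`` records name the same
slot.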

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata, metadata)

This intrinsic is identical to ``llvm.dbg.addr``, except that there can only be
one call to ``llvm.dbg.declare`` for a given concrete `local variable
<LangRef.html#dilocalvariable>`_. It is not control-dependent, meaning that if
a call to ``llvm.dbg.declare`` exists and has a valid location argument, that
address is considered to be the true home of the variable across its entire
lifetime. This makes it hard for optimizations to preserve accurate debug info
in the presence of ``llvm.dbg.declare``, so we are transitioning away from it,
and we plan to deprecate it in future LLVM releases.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, metadata, metadata)

This intrinsic provides information when a user source variable is set to a new
value. The first argument is the new value (wrapped as metadata). The second
argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
description of the variable. The third argument is a `complex expression
<LangRef.html#diexpression>`_.

An ``llvm.dbg.value`` intrinsic describes the value of a source variable
directly, not its address. Note that the value operand of this intrinsic may
be indirect (i.e., a pointer to the source variable), provided that interpreting
the complex expression derives the direct value.
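
The following toy evaluator, a hypothetical C++ sketch rather than LLVM code,
shows how a complex expression turns the first operand into either an address
or a direct value. It models only two DWARF operations: ``DW_OP_plus_uconst``,
which adds a constant to the top of the expression stack, and ``DW_OP_deref``,
which replaces the top of the stack with the value loaded from that address.
Memory is simulated with a map, and the names ``Op``, ``ExprOp``, ``Memory``,
and ``evaluate`` are assumptions made for illustration.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <vector>

  // Hypothetical model of two DWARF expression operations; not LLVM's API.
  enum class Op { PlusUconst, Deref };

  struct ExprOp {
    Op Opcode;
    std::uint64_t Arg = 0; // Only used by PlusUconst.
  };

  // Simulated target memory: address -> stored value.
  using Memory = std::map<std::uint64_t, std::uint64_t>;

  // Pushes the intrinsic's first operand and applies each operation to the top
  // of the stack. Both modelled operations pop one entry and push one, so a
  // single variable is enough to represent the stack here.
  std::uint64_t evaluate(std::uint64_t Operand, const std::vector<ExprOp> &Expr,
                         const Memory &Mem) {
    std::uint64_t Top = Operand;
    for (const ExprOp &E : Expr) {
      switch (E.Opcode) {
      case Op::PlusUconst: // DW_OP_plus_uconst N: Top = Top + N
        Top += E.Arg;
        break;
      case Op::Deref: // DW_OP_deref: Top = Mem[Top]; throws if unmapped
        Top = Mem.at(Top);
        break;
      }
    }
    return Top;
  }

With ``Memory Mem{{0x1040, 7}}`` and ``%buffer`` at ``0x1000``, the call
``evaluate(0x1000, {{Op::PlusUconst, 64}}, Mem)`` yields the address ``0x1040``
(the ``llvm.dbg.addr`` case), while a pointer operand ``0x1040`` combined with
``{{Op::Deref}}`` yields the direct value ``7``, which is what an indirect
``llvm.dbg.value`` operand must resolve to.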

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function. In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in. In functional languages, values are only
readable after they have been defined. Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information. Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

.. FIXME: Update the following example to use llvm.dbg.addr once that is the
   default in clang.

Compiled to LLVM, this function would be represented like this:

.. code-block:: text

  ; Function Attrs: nounwind ssp uwtable
  define void @foo() #0 !dbg !4 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    store i32 21, i32* %X, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
    store i32 22, i32* %Y, align 4, !dbg !16
    call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    store i32 23, i32* %Z, align 4, !dbg !19
    %0 = load i32, i32* %X, align 4, !dbg !20
    store i32 %0, i32* %Z, align 4, !dbg !21
    %1 = load i32, i32* %Y, align 4, !dbg !22
    store i32 %1, i32* %X, align 4, !dbg !23
    ret void, !dbg !24
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!7, !8, !9}
  !llvm.ident = !{!10}

  !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
  !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
  !2 = !{}
  !3 = !{!4}
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, variables: !2)
  !5 = !DISubroutineType(types: !6)
  !6 = !{null}
  !7 = !{i32 2, !"Dwarf Version", i32 2}
  !8 = !{i32 2, !"Debug Info Version", i32 3}
  !9 = !{i32 1, !"PIC Level", i32 2}
  !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
  !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
  !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
  !13 = !DIExpression()
  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
  !16 = !DILocation(line: 3, column: 9, scope: !4)
  !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)
  !20 = !DILocation(line: 6, column: 11, scope: !18)
  !21 = !DILocation(line: 6, column: 9, scope: !18)
  !22 = !DILocation(line: 8, column: 9, scope: !4)
  !23 = !DILocation(line: 8, column: 7, scope: !4)
  !24 = !DILocation(line: 9, column: 3, scope: !4)


This example illustrates a few important details about LLVM debugging
information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    ; [debug line = 2:9] [debug variable = X]

The first intrinsic ``llvm.dbg.declare`` encodes debugging information for the
variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: text

  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
                              isLocal: false, isDefinition: true, scopeLine: 1,
                              isOptimized: false, variables: !2)

Here ``!14`` is metadata providing `location information
<LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a
`subprogram descriptor <LangRef.html#disubprogram>`_. This way the location
information attached to the intrinsics indicates that the variable ``X`` is
declared at line number 2 in the function-level scope of function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    ; [debug line = 5:11] [debug variable = Z]

The third intrinsic ``llvm.dbg.declare`` encodes debugging information for
variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: text

  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)

Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!18``. The lexical scope itself resides
inside of subprogram ``!4`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.
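
To illustrate that lookup, the sketch below models the scopes from the example
as a parent-linked chain (``foo``, descriptor ``!4``, containing the lexical
block ``!18``) and collects the lines of the instructions whose attached
location falls inside a given scope. The ``Scope`` and ``Location`` structs and
both functions are hypothetical simplifications of ``DISubprogram``,
``DILexicalBlock``, and ``DILocation``, not LLVM's classes.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical toy model: a scope is either a subprogram (no parent) or a
  // lexical block nested inside another scope.
  struct Scope {
    std::string Name;
    const Scope *Parent; // nullptr for a subprogram.
  };

  // Stand-in for the location attached to an instruction with !dbg.
  struct Location {
    unsigned Line;
    unsigned Column;
    const Scope *S;
  };

  // True if Inner is Outer or is (transitively) nested inside it.
  bool isWithin(const Scope *Inner, const Scope *Outer) {
    for (const Scope *S = Inner; S; S = S->Parent)
      if (S == Outer)
        return true;
    return false;
  }

  // Source lines of every instruction whose location lies inside Outer.
  std::vector<unsigned> linesCoveredBy(const std::vector<Location> &Insts,
                                       const Scope *Outer) {
    std::vector<unsigned> Lines;
    for (const Location &L : Insts)
      if (isWithin(L.S, Outer))
        Lines.push_back(L.Line);
    return Lines;
  }

Fed the eight locations ``!14``, ``!16``, and ``!19`` through ``!24`` from the
listing above, ``linesCoveredBy`` returns lines 5, 6, and 6 for the block
``!18`` and all eight instruction lines for ``foo``, because every location in
the block also lies inside the subprogram.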

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content. This allows code generators to trivially support native debuggers by
generating standard DWARF information, and contains enough information for
non-DWARF targets to translate it as needed.

This section describes the forms used to represent C and C++ programs. Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of a few C/C++ constructs and the debug
information that would best describe those constructs. The canonical
references are the ``DINode`` classes defined in
``include/llvm/IR/DebugInfoMetadata.h`` and the implementations of the helper
functions in ``lib/IR/DIBuilder.cpp``.

C/C++ source file information
-----------------------------

``llvm::Instruction`` provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR using
``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.

.. code-block:: c++

  if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
    unsigned Line = Loc->getLine();
    StringRef File = Loc->getFilename();
    StringRef Dir = Loc->getDirectory();
    bool ImplicitCode = Loc->isImplicitCode();
  }

When the flag ``ImplicitCode`` is true, the instruction was added by the
front-end but does not correspond to source code written by the user. For
example:

.. code-block:: c++

  if (MyBoolean) {
    MyObject MO;
    ...
  }

At the end of the scope, ``MyObject``'s destructor is called, but the call is
not written explicitly in the source. This information is useful for avoiding
counters on closing braces when computing code coverage.
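
A coverage tool could use the flag roughly as in the following sketch, which
keeps only the lines that contain at least one user-written instruction.
``InstLoc`` and ``linesToInstrument`` are hypothetical; in a real LLVM-based
tool the line and the flag would come from ``DILocation::getLine()`` and
``DILocation::isImplicitCode()`` as in the snippet earlier in this section.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <vector>

  // Hypothetical stand-in for what a coverage tool reads from each
  // instruction's debug location: its line and the ImplicitCode flag.
  struct InstLoc {
    unsigned Line;
    bool ImplicitCode;
  };

  // Lines that should receive coverage counters. Instructions synthesized by
  // the front-end (e.g. an implicit destructor call attributed to a closing
  // brace) do not make their line count as user code.
  std::set<unsigned> linesToInstrument(const std::vector<InstLoc> &Insts) {
    std::set<unsigned> Lines;
    for (const InstLoc &I : Insts)
      if (!I.ImplicitCode)
        Lines.insert(I.Line);
    return Lines;
  }

Numbering the example so that ``if (MyBoolean) {`` is line 1, ``MyObject MO;``
is line 2, and the closing brace where the destructor runs is line 4, the
inputs ``{1, false}``, ``{2, false}``, and ``{4, true}`` produce the line set
{1, 2}, so no counter is placed on the closing brace.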
				448
Dmitri Gribenko	bbef5ea	2012-11-22 11:56:02 +0000	[diff] [blame]	449	C/C++ global variable information
				450	---------------------------------
				451
Given an integer global variable declared as follows:

.. code-block:: c

  _Alignas(8) int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100, align 8, !dbg !0

  ;;
  ;; List of compile units (the globals are listed by the compile unit).
  ;;
  !llvm.dbg.cu = !{!1}

  ;; Some unrelated metadata.
  !llvm.module.flags = !{!6, !7}
  !llvm.ident = !{!8}

  ;; Define the global variable itself
  !0 = distinct !DIGlobalVariable(name: "MyGlobal", scope: !1, file: !2, line: 1, type: !5, isLocal: false, isDefinition: true, align: 64)

  ;; Define the compile unit.
  !1 = distinct !DICompileUnit(language: DW_LANG_C99, file: !2,
                               producer: "clang version 4.0.0",
                               isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug,
                               enums: !3, globals: !4)

  ;;
  ;; Define the file
  ;;
  !2 = !DIFile(filename: "/dev/stdin",
               directory: "/Users/dexonsmith/data/llvm/debug-info")

  ;; An empty array.
  !3 = !{}

  ;; The Array of Global Variables
  !4 = !{!0}

  ;;
  ;; Define the type
  ;;
  !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

  ;; Dwarf version to output.
  !6 = !{i32 2, !"Dwarf Version", i32 4}

  ;; Debug info schema version.
  !7 = !{i32 2, !"Debug Info Version", i32 3}

  ;; Compiler identification
  !8 = !{!"clang version 4.0.0"}

The ``align`` value in the ``DIGlobalVariable`` description specifies the
variable's alignment when it was forced by the C11 ``_Alignas()`` or C++11
``alignas()`` keywords, or by the compiler attribute
``__attribute__((aligned ()))``. The value is given in bits, so the
``_Alignas(8)`` above becomes ``align: 64``. When this field is missing, the
alignment is considered to be the default. This value is used when producing
the ``DW_AT_alignment`` attribute in the DWARF output.

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the anchor for subprograms.
  ;;
  !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
                     isLocal: false, isDefinition: true, scopeLine: 1,
                     flags: DIFlagPrototyped, isOptimized: false,
                     variables: !2)

  ;;
  ;; Define the subprogram itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) !dbg !4 {
  ...
  }

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables. However, the debugger does not know anything
about the properties defined in Objective C interfaces. The debugger consumes
information generated by the compiler in DWARF format, and that format does not
support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members. A property can be
defined solely by "setter" and "getter" selectors and be calculated anew on
each access, or it can be a direct access to some declared ivar. Finally, it
can have an ivar "automatically synthesized" for it by the compiler, in which
case the property can be referred to in user code directly using the standard
C dereference syntax as well as through the property "dot" syntax, but there is
no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging of these properties, we will add a new DWARF TAG into
the ``DW_TAG_structure_type`` definition for the class to hold the description
of a given property, and a set of DWARF attributes that provide said
description. The property tag will also contain the name and declared type of
the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property. In the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with ``DW_AT_artificial`` set to 1), whose name will be the name used to
access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100: TAG_structure_type [7] *
                AT_APPLE_runtime_class( 0x10 )
                AT_name( "I1" )
                AT_decl_file( "Objc_Property.m" )
                AT_decl_line( 3 )

  0x00000110: TAG_APPLE_property
                AT_name ( "p1" )
                AT_type ( {0x00000150} ( int ) )

  0x00000120: TAG_APPLE_property
                AT_name ( "p2" )
                AT_type ( {0x00000150} ( int ) )

  0x00000130: TAG_member [8]
                AT_name( "_p1" )
                AT_APPLE_property ( {0x00000110} "p1" )
                AT_type( {0x00000150} ( int ) )
                AT_artificial ( 0x1 )

  0x00000140: TAG_member [8]
                AT_name( "n2" )
                AT_APPLE_property ( {0x00000120} "p2" )
                AT_type( {0x00000150} ( int ) )

  0x00000150: AT_type( ( int ) )

Note that the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as shown in the example. But we don't actually
need to know this convention, since we are given the name of the ivar directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation``, e.g. to provide a read-only property
in the interface and a read-write property in the implementation. In that case,
the compiler should emit whichever property declaration will be in force in the
current translation unit.

Developers can decorate a property with attributes, which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd: TAG_structure_type [7] *
                AT_APPLE_runtime_class( 0x10 )
                AT_name( "I1" )
                AT_decl_file( "Objc_Property.m" )
                AT_decl_line( 3 )

  0x000003cd: TAG_APPLE_property
                AT_name ( "p3" )
                AT_APPLE_property_setter ( "myOwnP3Setter:" )
                AT_type( {0x00000147} ( int ) )

  0x000003f3: TAG_member [8]
                AT_name( "_p3" )
                AT_type ( {0x00000147} ( int ) )
                AT_APPLE_property ( {0x000003cd} )
                AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+--------------------------------------+-------+
| Name                                 | Value |
+======================================+=======+
| DW_APPLE_PROPERTY_readonly           | 0x01  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_getter             | 0x02  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_assign             | 0x04  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_readwrite          | 0x08  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_retain             | 0x10  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_copy               | 0x20  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_nonatomic          | 0x40  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_setter             | 0x80  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_atomic             | 0x100 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_weak               | 0x200 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_strong             | 0x400 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_unsafe_unretained  | 0x800 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_nullability        | 0x1000|
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_null_resettable    | 0x2000|
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_class              | 0x4000|
+--------------------------------------+-------+
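
The constants above are distinct powers of two, and the ``pr`` example carries
two of them at once, so ``DW_AT_APPLE_property_attribute`` can be read as a bit
mask. Under that reading, a consumer recovers the individual attributes by
testing each bit. The following C sketch shows one way to do this.
``kPropertyAttrs`` and ``describe_property_attributes`` are hypothetical names
used only for illustration. They are not part of LLVM or LLDB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the "New DWARF Constants" table. */
struct PropertyAttrName {
  uint32_t bit;
  const char *name;
};

static const struct PropertyAttrName kPropertyAttrs[] = {
    {0x01, "readonly"},      {0x02, "getter"},
    {0x04, "assign"},        {0x08, "readwrite"},
    {0x10, "retain"},        {0x20, "copy"},
    {0x40, "nonatomic"},     {0x80, "setter"},
    {0x100, "atomic"},       {0x200, "weak"},
    {0x400, "strong"},       {0x800, "unsafe_unretained"},
    {0x1000, "nullability"}, {0x2000, "null_resettable"},
    {0x4000, "class"},
};

/* Hypothetical helper: writes the names of the bits set in `flags` into `buf`
 * (capacity `len`), separated by single spaces, and returns `buf`. Unknown
 * bits are ignored; if a name does not fit, it and all later names are
 * dropped. */
static char *describe_property_attributes(uint32_t flags, char *buf,
                                          size_t len) {
  size_t used = 0;
  if (len == 0)
    return buf;
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof kPropertyAttrs / sizeof kPropertyAttrs[0];
       ++i) {
    if ((flags & kPropertyAttrs[i].bit) == 0)
      continue;
    size_t name_len = strlen(kPropertyAttrs[i].name);
    size_t needed = name_len + (used ? 1 : 0) + 1; /* separator + NUL */
    if (used + needed > len)
      break;
    if (used)
      buf[used++] = ' ';
    memcpy(buf + used, kPropertyAttrs[i].name, name_len + 1); /* with NUL */
    used += name_len;
  }
  return buf;
}
```

For the ``pr`` property shown earlier, a value of ``0x41`` would decode to
``readonly nonatomic``.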

Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs. The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only. This means no static or hidden
functions show up in the "``.debug_pubnames``". No static variables or private
class variables are in the "``.debug_pubtypes``". Many compilers add different
things to these tables, so we can't rely upon the contents being consistent
between gcc, icc, and clang.

The typical query given by users tends not to match up with the contents of
these tables. For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully qualified
names. Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``". So the name entered in the name table must be demangled in
order to chop it up appropriately, and additional names must be manually
entered into the table to make it effective as a name lookup table for
debuggers to use.
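
The following C sketch shows which extra names such a table needs. It takes a
demangled, fully qualified name and enumerates the shorter suffixes a user is
likely to type. It assumes the name contains no template arguments, so every
``::`` before the parameter list separates two scopes. ``lookup_suffixes`` is a
hypothetical helper. Real demangled names need a proper parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Given a demangled, fully qualified name such as
 * "a::b::c(int,const Foo&) const", records each lookup suffix
 * ("a::b::c", then "b::c", then "c") as a start pointer plus a length,
 * leaving out the parameter list and anything after it. Assumes there are
 * no template arguments, so every "::" before the first '(' separates two
 * scopes. Returns the number of suffixes stored (at most `max`). */
static size_t lookup_suffixes(const char *qualified, const char **starts,
                              size_t *lens, size_t max) {
  const char *end = strchr(qualified, '(');
  if (end == NULL)
    end = qualified + strlen(qualified);

  size_t count = 0;
  const char *p = qualified;
  while (count < max) {
    starts[count] = p;
    lens[count] = (size_t)(end - p);
    ++count;
    /* Move past the next scope separator, if it precedes the parameters. */
    const char *sep = strstr(p, "::");
    if (sep == NULL || sep >= end)
      break;
    p = sep + 2;
  }
  return count;
}
```

For ``a::b::c(int,const Foo&) const``, this yields ``a::b::c``, ``b::c`` and
``c``. These are the search strings users typically enter.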

All debuggers currently ignore the "``.debug_pubnames``" table because of its
inconsistent and useless public-only name content, which makes it a waste of
space in the object file. These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain, and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs. LLDB
uses clang for its expression parsing, where LLDB acts as a PCH (precompiled
header). LLDB is then often asked to look for type "``foo``" or namespace
"``bar``", or list items in namespace "``baz``". Namespaces are not included in
the pubnames or pubtypes tables. Since clang asks a lot of questions when it is
parsing an expression, we need to be very fast when looking up names, as it
happens a lot. Having new accelerator tables that are optimized for very quick
lookups will benefit this type of debugging experience greatly.

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing. We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need. The Name Accelerator Tables were designed to fix
these issues. In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Make the table format extensible so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Have strict rules for the contents of tables

Table size is important, and the accelerator table format should allow the
reuse of strings from common string tables so the strings for the names are not
duplicated. We also want to make sure the table is ready to be used as-is by
simply mapping the table into memory with minimal header parsing.

The name lookups need to be fast and optimized for the kinds of lookups that
debuggers tend to do. Optimally we would like to touch as few parts of the
mapped table as possible when doing a name lookup and be able to quickly find
the name entry we are looking for, or discover there are no matches. In the
case of debuggers, we optimize for lookups that fail most of the time.

Each table that is defined should have strict rules on exactly what is in the
accelerator tables, and those rules should be documented so clients can rely on
the content.

Hash Tables
^^^^^^^^^^^

Standard Hash Tables
""""""""""""""""""""

Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:

.. code-block:: none

  .------------.
  |  HEADER    |
  |------------|
  |  BUCKETS   |
  |------------|
  |  DATA      |
  `------------'

The ``BUCKETS`` are an array of offsets to ``DATA`` for each hash:

.. code-block:: none

  .------------.
  | 0x00001000 | BUCKETS[0]
  | 0x00002000 | BUCKETS[1]
  | 0x00002200 | BUCKETS[2]
  | 0x000034f0 | BUCKETS[3]
  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets - 1]
  '------------'

So for ``BUCKETS[3]`` in the example above, we have an offset into the table,
0x000034f0, which points to a chain of entries for the bucket. Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.

.. code-block:: none

              .------------.
  0x000034f0: | 0x00003500 | next pointer
              | 0x12345678 | 32 bit hash
              | "erase"    | string value
              | data[n]    | HashData for this entry
              |------------|
  0x00003500: | 0x00003550 | next pointer
              | 0x29273623 | 32 bit hash
              | "dump"     | string value
              | data[n]    | HashData for this entry
              |------------|
  0x00003550: | 0x00000000 | next pointer
              | 0x82638293 | 32 bit hash
              | "main"     | string value
              | data[n]    | HashData for this entry
              `------------'

The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present. So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", and it might match ``BUCKETS[3]``. We would need to go
to the offset 0x000034f0 and start looking to see if our 32 bit hash matches.
To do so, we need to read the next pointer, then read the hash, compare it,
and skip to the next entry. Each time we are skipping many bytes in memory and
touching new pages just to do the compare on the full 32 bit hash. All of
these accesses then tell us that we didn't have a match.
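
The following minimal C sketch of the chained scheme makes the cost of a failed
lookup concrete. For brevity, it assumes the entries live in an in-memory array
and that ``next`` holds an array index rather than a file offset, with index 0
reserved to end a chain. ``ChainEntry`` and ``chained_lookup`` are illustrative
names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry in a bucket's chain. To keep the sketch small, `next` is an
 * index into the entry array instead of a file offset, and index 0 is
 * reserved to mean "end of chain". */
struct ChainEntry {
  uint32_t next;    /* index of the next entry, 0 ends the chain */
  uint32_t hash;    /* full 32 bit hash of `name` */
  const char *name; /* the string value */
};

/* Follows the chain that starts at index `first` and returns the entry whose
 * hash and name match, or NULL. `*touched` receives how many entries were
 * read: a failed lookup has to read every entry in the chain. */
static const struct ChainEntry *chained_lookup(const struct ChainEntry *entries,
                                               uint32_t first, uint32_t hash,
                                               const char *name,
                                               size_t *touched) {
  *touched = 0;
  for (uint32_t i = first; i != 0; i = entries[i].next) {
    ++*touched;
    if (entries[i].hash == hash && strcmp(entries[i].name, name) == 0)
      return &entries[i];
  }
  return NULL;
}
```

Take the three entries from the diagram. A lookup for a name that is not
present reads all three, with each read landing on a different part of the
table.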

Name Hash Tables
""""""""""""""""

To solve the issues mentioned above, we have structured the hash tables a bit
differently: a header, buckets, an array of all unique 32 bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:

.. code-block:: none

  .-------------.
  |  HEADER     |
  |-------------|
  |  BUCKETS    |
  |-------------|
  |  HASHES     |
  |-------------|
  |  OFFSETS    |
  |-------------|
  |  DATA       |
  `-------------'

Each entry in the ``BUCKETS`` array of a name table is an index into the
``HASHES`` array. By making all of the full 32 bit hash values contiguous in
memory, we allow ourselves to efficiently check for a match while touching as
little memory as possible. Most often, checking the 32 bit hash values is as
far as the lookup goes. If a hash does match, it is usually a match with no
collisions. So for a table with "``n_buckets``" buckets and "``n_hashes``"
unique 32 bit hash values, we can clarify the contents of the ``BUCKETS``,
``HASHES`` and ``OFFSETS`` as:

.. code-block:: none

  .-------------------------.
  |  HEADER.magic           | uint32_t
  |  HEADER.version         | uint16_t
  |  HEADER.hash_function   | uint16_t
  |  HEADER.bucket_count    | uint32_t
  |  HEADER.hashes_count    | uint32_t
  |  HEADER.header_data_len | uint32_t
  |  HEADER_DATA            | HeaderData
  |-------------------------|
  |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
  |-------------------------|
  |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
  |-------------------------|
  |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
  |-------------------------|
  |  ALL HASH DATA          |
  `-------------------------'

So taking the exact same data from the standard hash example above, we end up
with:

.. code-block:: none

              .------------.
              | HEADER     |
              |------------|
              |          0 | BUCKETS[0]
              |          2 | BUCKETS[1]
              |          5 | BUCKETS[2]
              |          6 | BUCKETS[3]
              |            | ...
              |        ... | BUCKETS[n_buckets - 1]
              |------------|
              | 0x........ | HASHES[0]
              | 0x........ | HASHES[1]
              | 0x........ | HASHES[2]
              | 0x........ | HASHES[3]
              | 0x........ | HASHES[4]
              | 0x........ | HASHES[5]
              | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
              | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
              | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
              | 0x........ | HASHES[9]
              | 0x........ | HASHES[10]
              | 0x........ | HASHES[11]
              | 0x........ | HASHES[12]
              | 0x........ | HASHES[13]
              | 0x........ | HASHES[n_hashes - 1]
              |------------|
              | 0x........ | OFFSETS[0]
              | 0x........ | OFFSETS[1]
              | 0x........ | OFFSETS[2]
              | 0x........ | OFFSETS[3]
              | 0x........ | OFFSETS[4]
              | 0x........ | OFFSETS[5]
              | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
              | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
              | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
              | 0x........ | OFFSETS[9]
              | 0x........ | OFFSETS[10]
              | 0x........ | OFFSETS[11]
              | 0x........ | OFFSETS[12]
              | 0x........ | OFFSETS[13]
              | 0x........ | OFFSETS[n_hashes - 1]
              |------------|
              |            |
              |            |
              |            |
              |            |
              |            |
              |------------|
  0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x00001203 | String offset into .debug_str ("dump")
              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003550: | 0x00001203 | String offset into .debug_str ("main")
              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x........ | HashData[4]
              | 0x........ | HashData[5]
              | 0x........ | HashData[6]
              | 0x........ | HashData[7]
              | 0x........ | HashData[8]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              `------------'

So we still have all of the same data; we just organize it more efficiently for
debugger lookup. If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``. ``BUCKETS[3]`` contains "6", which is the
index into the ``HASHES`` table. We would then compare any consecutive 32 bit
hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3. In the case of a failed lookup, we would access the
memory for ``BUCKETS[3]`` and then compare a few consecutive 32 bit hashes
before we know that we have no match. We don't end up marching through
multiple words of memory, and we keep the number of processor data cache lines
being accessed as small as possible. When two names share a 32 bit hash, as
"``collision``" and "``dump``" do in the example, their entries are stored one
after the other in the data for that hash. A matching hash must therefore still
be confirmed by comparing the strings.
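
The lookup just described can be sketched in C as follows. The sketch assumes
that the ``BUCKETS`` and ``HASHES`` arrays have already been located and are in
host byte order. It treats a bucket value of ``UINT32_MAX`` as empty, which is
how this format marks empty buckets. ``find_hash_index`` is a hypothetical
name. A real consumer would then follow the matching ``OFFSETS`` entry and
compare strings to rule out collisions.

```c
#include <stdint.h>

#define EMPTY_BUCKET UINT32_MAX /* marks a bucket with no hashes */

/* Returns the index into HASHES (and OFFSETS) whose value equals `hash`,
 * or -1 when no such hash exists. Only BUCKETS[hash % n_buckets] and the
 * run of consecutive HASHES entries that belong to that bucket are read. */
static int64_t find_hash_index(const uint32_t *buckets, uint32_t n_buckets,
                               const uint32_t *hashes, uint32_t n_hashes,
                               uint32_t hash) {
  uint32_t bucket = hash % n_buckets;
  uint32_t i = buckets[bucket];
  if (i == EMPTY_BUCKET)
    return -1;
  /* Stop at the first hash that belongs to a different bucket. */
  for (; i < n_hashes && hashes[i] % n_buckets == bucket; ++i) {
    if (hashes[i] == hash)
      return (int64_t)i;
  }
  return -1;
}
```

A failed lookup reads one ``BUCKETS`` word and a short, contiguous run of
``HASHES``. It never touches ``OFFSETS`` or the hash data.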

The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash, which is also used in the ELF ``GNU_HASH`` sections. It is a
very good hash for all kinds of names in programs, with very few hash
collisions.
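
A minimal C version of the Daniel J. Bernstein hash is shown below. It uses the
commonly cited form: start from 5381, then multiply by 33 and add each byte.
LLVM carries its own implementation (``llvm::djbHash`` in
``llvm/Support/DJB.h``). The standalone ``djb_hash`` here is only a sketch, and
it assumes names are NUL-terminated byte strings.

```c
#include <stdint.h>

/* Daniel J. Bernstein's string hash: start from 5381 and, for every byte c,
 * set h = h * 33 + c, letting the value wrap modulo 2^32. */
static uint32_t djb_hash(const char *s) {
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
    h = (h << 5) + h + *p; /* (h << 5) + h is h * 33 */
  return h;
}
```

For example, ``djb_hash("")`` is 5381 and ``djb_hash("a")`` is
5381 * 33 + 97 = 177670.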

Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
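
Once a hash matches, its ``OFFSETS`` entry points at the data for that hash, as
in the example above. For each name, that data holds:

* a ``.debug_str`` offset,
* a 32 bit count,
* that many HashData items.

A string offset of 0 ends the list.

The C sketch below walks that data to find one name. It assumes each HashData
item is a single 32 bit value (for instance, one DIE offset atom). It also
assumes the table is already in host byte order and that ``debug_str`` points
at the start of the ``.debug_str`` section. ``find_name_data`` is a
hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Searches the hash data that one OFFSETS entry points at. For each name the
 * data holds a .debug_str offset, a count, and `count` HashData items; a
 * string offset of 0 ends the list. Assumes one uint32_t per HashData item
 * and host byte order. On a match, stores the items and their count and
 * returns 1; otherwise returns 0. */
static int find_name_data(const uint32_t *data, const char *debug_str,
                          const char *name, const uint32_t **items,
                          uint32_t *count) {
  while (data[0] != 0) {
    uint32_t str_offset = data[0];
    uint32_t n = data[1];
    if (strcmp(debug_str + str_offset, name) == 0) {
      *items = data + 2;
      *count = n;
      return 1;
    }
    data += 2 + n; /* skip this name's items to reach the next name */
  }
  return 0;
}
```

Because 0 terminates the list, no name can live at offset 0 of ``.debug_str``.
This is why the sketch's test data starts that section with an empty string.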

Details
^^^^^^^

These name hash tables are designed to be generic. Specializations of the
table get to define additional data that goes into the header ("``HeaderData``"),
how the string value is stored ("``KeyType``"), and the content of the data for
each hash value.

Header Layout
"""""""""""""

The header has a fixed part and a specialized part. The exact format of the
header is:

.. code-block:: c

  struct Header
  {
    uint32_t   magic;           // 'HASH' magic value to allow endian detection
    uint16_t   version;         // Version number
    uint16_t   hash_function;   // The hash function enumeration that was used
    uint32_t   bucket_count;    // The number of buckets in this hash table
    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                // Specifically the length of the following HeaderData field - this does not
                                // include the size of the preceding fields
    HeaderData header_data;     // Implementation specific header data
  };

The header starts with a 32 bit "``magic``" value, which must be ``'HASH'``
encoded as an ASCII integer (``0x48415348``). This allows the detection of the
start of the hash table and also allows the table's byte order to be determined
so the table can be correctly extracted. The "``magic``" value is followed by a
16 bit ``version`` number, which allows the table to be revised and modified in
the future. The current version number is 1. ``hash_function`` is a
``uint16_t`` enumeration that specifies which hash function was used to produce
this table. The current values for the hash function enumerations include:

.. code-block:: c

  enum HashFunctionType
  {
    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
  };

``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, which is also the number of
offsets contained in the ``OFFSETS`` array. ``header_data_len`` specifies the
size in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.
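
The following C sketch shows how a reader might use ``magic`` to detect the
table's byte order and then decode the remaining fixed fields. It assumes the
fixed fields are packed back to back with no padding. That gives
4 + 2 + 2 + 4 + 4 + 4 = 20 bytes, matching the field sizes of ``struct Header``.
The sketch does not interpret ``HeaderData``. ``ParsedHeader``, ``read_uint``
and ``parse_header`` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_MAGIC 0x48415348u /* 'H','A','S','H' packed high byte first */

struct ParsedHeader {
  int big_endian; /* 1 if the table was written big-endian */
  uint16_t version;
  uint16_t hash_function;
  uint32_t bucket_count;
  uint32_t hashes_count;
  uint32_t header_data_len;
};

/* Reads an unsigned integer of `n` bytes (n <= 4) in the given byte order,
 * consuming the most significant byte first. */
static uint32_t read_uint(const uint8_t *p, size_t n, int big_endian) {
  uint32_t v = 0;
  for (size_t i = 0; i < n; ++i)
    v = (v << 8) | p[big_endian ? i : n - 1 - i];
  return v;
}

/* Decodes the 20 byte fixed part of the header. Returns 0 on success, or -1
 * if the buffer is too short or `magic` matches neither byte order. */
static int parse_header(const uint8_t *buf, size_t size,
                        struct ParsedHeader *h) {
  if (size < 20)
    return -1;
  if (read_uint(buf, 4, 0) == HASH_MAGIC)
    h->big_endian = 0;
  else if (read_uint(buf, 4, 1) == HASH_MAGIC)
    h->big_endian = 1;
  else
    return -1;
  int be = h->big_endian;
  h->version = (uint16_t)read_uint(buf + 4, 2, be);
  h->hash_function = (uint16_t)read_uint(buf + 6, 2, be);
  h->bucket_count = read_uint(buf + 8, 4, be);
  h->hashes_count = read_uint(buf + 12, 4, be);
  h->header_data_len = read_uint(buf + 16, 4, be);
  return 0;
}
```

Trying both byte orders against ``magic`` is what lets one reader handle tables
written on either kind of host.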

Fixed Lookup
""""""""""""

The header is followed by the buckets, hashes, offsets, and hash value data.

.. code-block:: c

  struct FixedTable
  {
    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
  };

``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
``hashes`` array contains all of the 32 bit hash values for all names in the
hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.

This table setup makes it very easy to repurpose these tables to contain
different data, while keeping the lookup mechanism the same for all tables.
This layout also makes it possible to save the table to disk and map it in
later and do very efficient name lookups with little or no parsing.
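
Every part has a size fixed by the header, so a reader can compute where each
array starts without scanning anything. The C sketch below does that
arithmetic. It assumes the 20 byte fixed header is immediately followed by
``header_data_len`` bytes of ``HeaderData`` and then the three arrays, with no
extra padding. ``TableLayout`` and ``table_layout`` are hypothetical names.

```c
#include <stdint.h>

/* Byte offsets of each part of the table, relative to the start of the
 * header. 20 is the size of the fixed header fields (4 + 2 + 2 + 4 + 4 + 4). */
struct TableLayout {
  uint64_t buckets;
  uint64_t hashes;
  uint64_t offsets;
  uint64_t data;
};

static struct TableLayout table_layout(uint32_t header_data_len,
                                       uint32_t bucket_count,
                                       uint32_t hashes_count) {
  struct TableLayout l;
  l.buckets = 20u + (uint64_t)header_data_len;        /* after HeaderData */
  l.hashes = l.buckets + 4u * (uint64_t)bucket_count; /* uint32_t buckets */
  l.offsets = l.hashes + 4u * (uint64_t)hashes_count; /* uint32_t hashes */
  l.data = l.offsets + 4u * (uint64_t)hashes_count;   /* uint32_t offsets */
  return l;
}
```

For example, take ``header_data_len`` = 12, ``bucket_count`` = 4 and
``hashes_count`` = 9. The buckets then start at byte 32, the hashes at 48, the
offsets at 84, and the hash data at 120.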

DWARF lookup tables can be implemented in a variety of ways and can store a lot
of information for each name. We want to make the DWARF tables extensible and
able to store the data efficiently, so we have used some of the DWARF features
that enable efficient data storage to define exactly what kind of data we store
for each name.

The ``HeaderData`` contains a definition of the contents of each ``HashData``
chunk. We might want to store an offset to all of the debug information entries
(DIEs) for each name. To keep things extensible, we create a list of items, or
Atoms, that are contained in the data for each name. First comes the type of
the data in each atom:

.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };

The enumeration values and their meanings are:

.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - the DW_TAG_XXX enumeration value, so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - flags for types (isCXXClass, isObjCClass, ...)

Each atom then specifies its type and how its data is encoded:

.. code-block:: c

  struct Atom
  {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DWARF DW_FORM_XXX defines
  };

The ``form`` field above is from the DWARF specification and defines the exact
encoding of the data for the Atom type. See the DWARF specification for the
``DW_FORM_`` definitions.

.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };

``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also
defines what is contained in each ``HashData`` object. ``Atom.form`` tells us
how large each field will be in the ``HashData``, and ``Atom.type`` tells us
how this data should be interpreted.
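
The following rough sketch shows how ``Atom.form`` determines field sizes. It computes the size of one ``HashData`` entry and decodes a single fixed-size atom, adding ``die_offset_base`` for the ``DW_FORM_ref*`` forms. It assumes little-endian data and handles only the fixed-size ``DW_FORM_data1`` through ``DW_FORM_data8`` and ``DW_FORM_ref1`` through ``DW_FORM_ref8`` forms. ``DW_FORM_ref_udata`` (ULEB128-encoded) is deliberately left out. The helper names are hypothetical.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* DW_FORM_* codes from the DWARF specification (only those used here). */
  enum {
    DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b, DW_FORM_ref1 = 0x11, DW_FORM_ref2 = 0x12,
    DW_FORM_ref4 = 0x13, DW_FORM_ref8 = 0x14, DW_FORM_ref_udata = 0x15
  };

  struct Atom {
    uint16_t type;  /* AtomType enum value */
    uint16_t form;  /* DW_FORM_* code */
  };

  /* Fixed byte size of a value encoded with `form`, or 0 for forms this
     sketch does not handle (for example ULEB128-encoded DW_FORM_ref_udata). */
  static size_t atom_fixed_size(uint16_t form) {
    switch (form) {
    case DW_FORM_data1: case DW_FORM_ref1: return 1;
    case DW_FORM_data2: case DW_FORM_ref2: return 2;
    case DW_FORM_data4: case DW_FORM_ref4: return 4;
    case DW_FORM_data8: case DW_FORM_ref8: return 8;
    default: return 0;
    }
  }

  /* The reference forms that get die_offset_base added. */
  static int atom_form_is_ref(uint16_t form) {
    return form == DW_FORM_ref1 || form == DW_FORM_ref2 || form == DW_FORM_ref4 ||
           form == DW_FORM_ref8 || form == DW_FORM_ref_udata;
  }

  /* Size in bytes of one HashData entry: the sum of its atoms' sizes. */
  static size_t hash_data_entry_size(const struct Atom *atoms, uint32_t atom_count) {
    size_t total = 0;
    for (uint32_t i = 0; i < atom_count; ++i) {
      size_t n = atom_fixed_size(atoms[i].form);
      assert(n != 0 && "variable-size form not handled by this sketch");
      total += n;
    }
    return total;
  }

  /* Decodes one little-endian atom value at `p`, applying die_offset_base
     to the DW_FORM_ref* forms. */
  static uint64_t read_atom(const uint8_t *p, uint16_t form, uint32_t die_offset_base) {
    size_t n = atom_fixed_size(form);
    assert(n != 0 && "variable-size form not handled by this sketch");
    uint64_t value = 0;
    for (size_t i = 0; i < n; ++i)
      value |= (uint64_t)p[i] << (8 * i);
    if (atom_form_is_ref(form))
      value += die_offset_base;
    return value;
  }

Because every atom in this sketch has a fixed size, a reader can step from one ``HashData`` entry to the next by adding ``hash_data_entry_size`` without decoding the individual fields.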

For the current implementations of the "``.apple_names``" (all functions +
globals), the "``.apple_types``" (names of all types that are defined), and
the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
array to be:

.. code-block:: c

  HeaderData.atom_count = 1;
  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
  HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
encoded as a 32-bit value (DW_FORM_data4). This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function, for instance. Future tables could include more information about the
DIE, such as flags indicating whether the DIE is a function, method, block,
or inlined.

The KeyType for the DWARF table is a 32-bit string table offset into the
".debug_str" table. The ".debug_str" section is the string table for the DWARF,
which may already contain copies of all of the strings. This helps make sure,
with help from the compiler, that we reuse the strings between all of the DWARF
sections, and keeps the hash table size down. Another benefit of having the
compiler generate all strings as DW_FORM_strp in the debug info is that
DWARF parsing can be made much faster.

After a lookup is made, we get an offset into the hash data. The hash data
needs to be able to deal with 32-bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c

  uint32_t str_offset
  uint32_t hash_data_count
  HashData[hash_data_count]

If "str_offset" is zero, then the entries for this hash value are done. 99.9% of
the hash data chunks contain a single item (no 32-bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

Current testing with real-world C++ binaries has shown that there is roughly
one 32-bit hash collision per 100,000 name entries.
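
The chain walk described above can be sketched as follows. The sketch makes three assumptions. First, it assumes the ``.apple_names`` layout, in which each ``HashData`` entry is a single ``DW_FORM_data4`` DIE offset. Second, it assumes the chunk is already available as an array of host-order ``uint32_t`` values. Third, it assumes ``debug_str`` holds the contents of the ``.debug_str`` section. ``find_dies_for_name`` is a hypothetical helper, not an LLVM API.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical helper: walks the hash data chunk for one hash value.
     `chunk` points at the first str_offset; each key is followed by a count
     and that many 32-bit DIE offsets, and a zero str_offset ends the chain.
     Returns the number of DIE offsets stored for `name` (0 if the name is
     absent) and copies up to `max_out` of them into `out`. */
  static uint32_t find_dies_for_name(const uint32_t *chunk,
                                     const char *debug_str, size_t debug_str_size,
                                     const char *name,
                                     uint32_t *out, uint32_t max_out) {
    for (;;) {
      uint32_t str_offset = *chunk++;
      if (str_offset == 0)
        return 0;  /* end of hash chain: no key matched */
      uint32_t count = *chunk++;
      assert(str_offset < debug_str_size);
      if (strcmp(debug_str + str_offset, name) == 0) {
        for (uint32_t i = 0; i < count && i < max_out; ++i)
          out[i] = chunk[i];
        return count;
      }
      chunk += count;  /* a colliding name: skip its DIE offsets */
    }
  }

Comparing the actual string, and not just the hash, is what makes the rare 32-bit collision harmless. In the second diagram, a lookup of ``print`` skips over the four ``main`` entries.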

Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables. For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables). All global and static variables should be included,
including those scoped within functions and classes. For example, using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table. All
functions should emit both their full names and their basenames. For C or C++,
the full name is the mangled name (if available), which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename. If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_atomic_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value). For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``.
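
The inclusion rule for ``.apple_types`` can be expressed as a small predicate. This sketch uses a simplified, hypothetical ``DieSummary`` view of a DIE rather than any real DWARF library type. The numeric tag values are the ``DW_TAG_*`` codes from the DWARF specification for the tags listed above.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* DW_TAG_* codes for the tags accepted in .apple_types. */
  static const uint16_t kAppleTypesTags[] = {
    0x01, /* DW_TAG_array_type */         0x02, /* DW_TAG_class_type */
    0x04, /* DW_TAG_enumeration_type */   0x0f, /* DW_TAG_pointer_type */
    0x10, /* DW_TAG_reference_type */     0x12, /* DW_TAG_string_type */
    0x13, /* DW_TAG_structure_type */     0x15, /* DW_TAG_subroutine_type */
    0x16, /* DW_TAG_typedef */            0x17, /* DW_TAG_union_type */
    0x1f, /* DW_TAG_ptr_to_member_type */ 0x20, /* DW_TAG_set_type */
    0x21, /* DW_TAG_subrange_type */      0x24, /* DW_TAG_base_type */
    0x26, /* DW_TAG_const_type */         0x29, /* DW_TAG_file_type */
    0x2b, /* DW_TAG_namelist */           0x2d, /* DW_TAG_packed_type */
    0x35, /* DW_TAG_volatile_type */      0x37, /* DW_TAG_restrict_type */
    0x47, /* DW_TAG_atomic_type */        0x38, /* DW_TAG_interface_type */
    0x3b, /* DW_TAG_unspecified_type */   0x40, /* DW_TAG_shared_type */
  };

  /* Hypothetical summary of the DIE fields the rule looks at. */
  struct DieSummary {
    uint16_t tag;
    const char *name;    /* DW_AT_name, or NULL if absent */
    int is_declaration;  /* non-zero DW_AT_declaration */
  };

  /* Returns 1 if the DIE should get an entry in .apple_types. */
  static int belongs_in_apple_types(const struct DieSummary *die) {
    assert(die != NULL);
    if (die->name == NULL || die->is_declaration)
      return 0;  /* unnamed types and forward declarations are skipped */
    for (size_t i = 0; i < sizeof kAppleTypesTags / sizeof kAppleTypesTags[0]; ++i)
      if (kAppleTypesTags[i] == die->tag)
        return 1;
    return 0;
  }

With the example above, the ``int`` base type passes the check, while the unnamed ``DW_TAG_pointer_type`` does not.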

"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name, this is an anonymous namespace, and
the name should be output as "``(anonymous namespace)``" (without the quotes).
Why? This matches the output of the ``abi::__cxa_demangle()`` function in the
standard C++ library, which demangles mangled names.

Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category. So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
track down all Objective-C methods for an Objective-C class when doing
expressions. It is needed because of the dynamic nature of Objective-C, where
anyone can add methods to a class. The DWARF for Objective-C methods is also
emitted differently from C++ classes: the methods are usually not contained in
the class definition but are scattered across one or more compile units.
Categories can also be defined in different shared libraries. So we need to be
able to quickly find all of the methods and class functions given the
Objective-C class name, or quickly find all methods and class functions for a
class + category name. This table does not contain any selector names; it just
maps Objective-C class names (or class names + category) to all of the methods
and class functions. The selectors are added as function basenames in the
"``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").
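
The hypothetical helper below makes the naming rules concrete. It splits an Objective-C method name into the keys used above: the class name and the class-plus-category name for ``.apple_objc``, and the selector used as the basename in ``.apple_names``. It is a sketch that assumes well-formed names of the form ``-[Class(Category) selector]`` or ``+[Class selector]``. It is not the parser LLVM uses.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  /* Hypothetical helper: splits "-[Class(Category) selector]" (or "+[...]")
     into the class name, the class name with its category (empty if there is
     no category), and the selector. Returns 0 on success, -1 if the name does
     not have the expected shape or an output buffer is too small. */
  static int split_objc_method_name(const char *name,
                                    char *cls, size_t cls_size,
                                    char *cls_cat, size_t cls_cat_size,
                                    char *sel, size_t sel_size) {
    assert(name != NULL);
    size_t len = strlen(name);
    if (len < 5 || (name[0] != '-' && name[0] != '+') || name[1] != '[' ||
        name[len - 1] != ']')
      return -1;
    const char *start = name + 2;  /* "Class(Category) selector]" */
    const char *space = strchr(start, ' ');
    if (space == NULL)
      return -1;
    size_t full_len = (size_t)(space - start);                      /* "Class(Category)" */
    const char *paren = memchr(start, '(', full_len);
    size_t class_len = paren ? (size_t)(paren - start) : full_len;  /* "Class" */
    size_t sel_len = (size_t)((name + len - 1) - (space + 1));      /* up to ']' */
    if (class_len >= cls_size || full_len >= cls_cat_size || sel_len >= sel_size)
      return -1;
    memcpy(cls, start, class_len);
    cls[class_len] = '\0';
    if (paren != NULL) {
      memcpy(cls_cat, start, full_len);
      cls_cat[full_len] = '\0';
    } else {
      cls_cat[0] = '\0';
    }
    memcpy(sel, space + 1, sel_len);
    sel[sel_len] = '\0';
    return 0;
  }

An empty ``cls_cat`` means the method has no category, so only the plain class name gets an ``.apple_objc`` entry.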

Mach-O Changes
""""""""""""""

The section names given above for the Apple hash tables are for non-Mach-O
files. For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"
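
The ``__apple_namespac`` entry follows from Mach-O storing section names in a fixed 16-byte field (``char sectname[16]`` in ``struct section_64``). The hypothetical helper below sketches the mapping from the non-Mach-O names, including the truncation. It only illustrates the naming rule and is not taken from LLVM's MC layer.

.. code-block:: c

  #include <assert.h>
  #include <string.h>

  /* Mach-O stores section names in a fixed 16-byte field
     (char sectname[16] in struct section_64). */
  #define MACHO_SECTNAME_LEN 16

  /* Hypothetical helper: maps ".apple_xxx" to "__apple_xxx", truncated to
     MACHO_SECTNAME_LEN characters. `out` gets one extra byte for a NUL so the
     result can be used as a C string. */
  static void apple_section_to_macho(const char *elf_name,
                                     char out[MACHO_SECTNAME_LEN + 1]) {
    char full[64];
    assert(elf_name[0] == '.' && strlen(elf_name) < sizeof full - 2);
    strcpy(full, "__");
    strcat(full, elf_name + 1);  /* ".apple_names" -> "__apple_names" */
    size_t n = strlen(full);
    if (n > MACHO_SECTNAME_LEN)
      n = MACHO_SECTNAME_LEN;    /* ".apple_namespaces" -> "__apple_namespac" */
    memcpy(out, full, n);
    out[n] = '\0';
  }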

.. _codeview:

CodeView Debug Info Format
==========================

LLVM supports emitting CodeView, the Microsoft debug info format, and this
section describes the design and implementation of that support.

Format Background
-----------------

CodeView as a format is clearly oriented around C++ debugging, and in C++, the
majority of debug information tends to be type information. Therefore, the
overriding design constraint of CodeView is the separation of type information
from other "symbol" information so that type information can be efficiently
merged across translation units. Both type information and symbol information
are generally stored as a sequence of records, where each record begins with a
16-bit record size and a 16-bit record kind.

Type information is usually stored in the ``.debug$T`` section of the object
file. All other debug info, such as line info, string table, symbol info, and
inlinee info, is stored in one or more ``.debug$S`` sections. There may only be
one ``.debug$T`` section per object file, since all other debug info refers to
it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation,
the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing
to the PDB. When using PDBs, symbol information appears to remain in the object
file ``.debug$S`` sections.

Type records are referred to by their index, which is the number of records in
the stream before a given record plus ``0x1000``. Many common basic types, such
as the basic integral types and unqualified pointers to them, are represented
using type indices less than ``0x1000``. Such basic types are built into
CodeView consumers and do not require type records.

Each type record may only contain type indices that are less than its own type
index. This ensures that the graph of type stream references is acyclic. While
the source-level type graph may contain cycles through pointer types (consider a
linked list struct), these cycles are removed from the type stream by always
referring to the forward declaration record of user-defined record types. Only
"symbol" records in the ``.debug$S`` streams may refer to complete,
non-forward-declaration type records.
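
The record framing and type-index numbering described above can be sketched in C. This is a minimal sketch under three stated assumptions. First, the buffer starts at the first type record, so any section signature has already been skipped. Second, fields are little-endian. Third, the 16-bit size counts the bytes that follow the size field (the kind plus the payload), which is our reading of LLVM's ``RecordPrefix``. Record kinds are only collected, not interpreted, and all function names are hypothetical.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  #define FIRST_TYPE_RECORD_INDEX 0x1000u  /* indices below this are built-in types */

  static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
  }

  /* Walks a buffer of type records, storing each record's kind in `kinds`
     (position n holds the record with type index 0x1000 + n). Returns the
     number of records, or -1 if a record is truncated or malformed. */
  static long walk_type_records(const uint8_t *buf, size_t len,
                                uint16_t *kinds, size_t max_kinds) {
    size_t pos = 0;
    long count = 0;
    while (pos < len) {
      if (len - pos < 4)
        return -1;  /* not enough room for the size and kind fields */
      uint16_t size = read_u16le(buf + pos);  /* bytes after this field */
      uint16_t kind = read_u16le(buf + pos + 2);
      if (size < 2 || len - pos - 2 < size)
        return -1;
      if ((size_t)count < max_kinds)
        kinds[count] = kind;
      ++count;
      pos += 2 + (size_t)size;
    }
    return count;
  }

  /* Type index of the n-th (0-based) record in the type stream. */
  static uint32_t type_index_of(size_t n) {
    return FIRST_TYPE_RECORD_INDEX + (uint32_t)n;
  }

  /* Checks that every type index referenced by the record at index `self`
     is smaller than `self`, which keeps the type graph acyclic. */
  static int refs_are_backward(uint32_t self, const uint32_t *refs, size_t n) {
    assert(self >= FIRST_TYPE_RECORD_INDEX);
    for (size_t i = 0; i < n; ++i)
      if (refs[i] >= self)
        return 0;
    return 1;
  }

``refs_are_backward`` encodes the ordering rule from the previous paragraph. Indices below ``0x1000`` name built-in types and always pass. A reference to the record's own index, or to a later one, would indicate a malformed stream.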

Working with CodeView
---------------------

These are instructions for some common tasks for developers working to improve
LLVM's CodeView support. Most of them revolve around using the CodeView dumper
embedded in ``llvm-readobj``.

* Testing MSVC's output::

    $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file
    $ llvm-readobj -codeview foo.obj

* Getting LLVM IR debug info out of Clang::

    $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm

  Use this to generate LLVM IR for LLVM test cases.

* Generate and dump CodeView from LLVM IR metadata::

    $ llc foo.ll -filetype=obj -o foo.obj
    $ llvm-readobj -codeview foo.obj > foo.txt

  Use this pattern in lit test cases and FileCheck the output of llvm-readobj.

Improving LLVM's CodeView support is a process of finding interesting type
records, constructing a C++ test case that makes MSVC emit those records,
dumping the records, understanding them, and then generating equivalent records
in LLVM's backend.

Testing Debug Info Preservation in Optimizations
================================================

The following paragraphs are an introduction to the debugify utility
and examples of how to use it in regression tests to check debug info
preservation after optimizations.

The ``debugify`` utility
------------------------

The ``debugify`` synthetic debug info testing utility consists of two
main parts: the ``debugify`` pass and the ``check-debugify`` pass. They are
meant to be used with ``opt`` for development purposes.

The former applies synthetic debug information to every instruction of the
module, while the latter checks that this DI is still available after an
optimization has occurred, reporting any errors/warnings while doing so.

The instructions are assigned sequentially increasing line locations, and
the values they produce are immediately described by debug value intrinsics
when possible.

For example, here is a module before:

.. code-block:: llvm

  define void @f(i32* %x) {
  entry:
    %x.addr = alloca i32*, align 8
    store i32* %x, i32** %x.addr, align 8
    %0 = load i32*, i32** %x.addr, align 8
    store i32 10, i32* %0, align 4
    ret void
  }

and after running ``opt -debugify`` on it we get:

.. code-block:: text

  define void @f(i32* %x) !dbg !6 {
  entry:
    %x.addr = alloca i32*, align 8, !dbg !12
    call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12
    store i32* %x, i32** %x.addr, align 8, !dbg !13
    %0 = load i32*, i32** %x.addr, align 8, !dbg !14
    call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14
    store i32 10, i32* %0, align 4, !dbg !15
    ret void, !dbg !16
  }

  !llvm.dbg.cu = !{!0}
  !llvm.debugify = !{!3, !4}
  !llvm.module.flags = !{!5}

  !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
  !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
  !2 = !{}
  !3 = !{i32 5}
  !4 = !{i32 2}
  !5 = !{i32 2, !"Debug Info Version", i32 3}
  !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
  !7 = !DISubroutineType(types: !2)
  !8 = !{!9, !11}
  !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
  !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
  !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
  !12 = !DILocation(line: 1, column: 1, scope: !6)
  !13 = !DILocation(line: 2, column: 1, scope: !6)
  !14 = !DILocation(line: 3, column: 1, scope: !6)
  !15 = !DILocation(line: 4, column: 1, scope: !6)
  !16 = !DILocation(line: 5, column: 1, scope: !6)

The following is an example of the ``-check-debugify`` output:

.. code-block:: none

  $ opt -enable-debugify -loop-vectorize llvm/test/Transforms/LoopVectorize/i8-induction.ll -disable-output
  ERROR: Instruction with empty DebugLoc in function f --   %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]

Errors/warnings can range from instructions with an empty debug location to an
instruction having a type that's incompatible with the source variable it
describes, all the way to missing lines and missing debug value intrinsics.

Fixing errors
^^^^^^^^^^^^^

Each of the errors above has a relevant API available to fix it.

* In the case of a missing debug location, use ``Instruction::setDebugLoc``, or
  possibly ``IRBuilder::SetCurrentDebugLocation`` when using a Builder and the
  new location should be reused.

* When a debug value has an incompatible type, ``llvm::replaceAllDbgUsesWith``
  can be used. After a RAUW call an incompatible type error can occur because
  RAUW does not handle widening and narrowing of variables, while
  ``llvm::replaceAllDbgUsesWith`` does. It is also capable of changing the DWARF
  expression used by the debugger to describe the variable. It also prevents
  use-before-def by salvaging or deleting invalid debug values.

* When a debug value is missing, ``llvm::salvageDebugInfo`` can be used when no
  replacement exists, or ``llvm::replaceAllDbgUsesWith`` when a replacement
  exists.

Using ``debugify``
------------------

In order for ``check-debugify`` to work, the DI must be coming from
``debugify``. Thus, modules with existing DI will be skipped.

The most straightforward way to use ``debugify`` is as follows::

  $ opt -debugify -pass-to-test -check-debugify sample.ll

This will inject synthetic DI into ``sample.ll``, run the ``pass-to-test``,
and then check for missing DI.

Some other ways to run debugify are available:

.. code-block:: bash

  # Same as the above example.
  $ opt -enable-debugify -pass-to-test sample.ll

  # Suppresses verbose debugify output.
  $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll

  # Prepend -debugify before and append -check-debugify -strip after
  # each pass on the pipeline (similar to -verify-each).
  $ opt -debugify-each -O2 sample.ll

``debugify`` can also be used to test a backend, e.g.:

.. code-block:: bash

  $ opt -debugify < sample.ll | llc -o -

``debugify`` in regression tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``-debugify`` pass is especially helpful when it comes to testing that
a given pass preserves DI while transforming the module. For this to work,
the ``-debugify`` output must be stable enough to use in regression tests.
Changes to this pass are not allowed to break existing tests.

It allows us to test for DI loss in the same tests that check that the
transformation is actually doing what it should.

Here is an example from ``test/Transforms/InstCombine/cast-mul-select.ll``:

.. code-block:: llvm

  ; RUN: opt < %s -debugify -instcombine -S | FileCheck %s --check-prefix=DBGINFO

  define i32 @mul(i32 %x, i32 %y) {
  ; DBGINFO-LABEL: @mul(
  ; DBGINFO-NEXT:    [[C:%.*]] = mul i32 {{.*}}
  ; DBGINFO-NEXT:    call void @llvm.dbg.value(metadata i32 [[C]]
  ; DBGINFO-NEXT:    [[D:%.*]] = and i32 {{.*}}
  ; DBGINFO-NEXT:    call void @llvm.dbg.value(metadata i32 [[D]]

    %A = trunc i32 %x to i8
    %B = trunc i32 %y to i8
    %C = mul i8 %A, %B
    %D = zext i8 %C to i32
    ret i32 %D
  }

Here we test that the two ``dbg.value`` intrinsics are preserved and
are correctly pointing to the ``[[C]]`` and ``[[D]]`` variables.

.. note::

   When writing this kind of regression test, it is important to make it as
   robust as possible. That's why we should try to avoid hardcoding
   line/variable numbers in check lines. If, for example, you test for a
   ``DILocation`` to have a specific line number, and someone later adds an
   instruction before the one being checked, the test will fail. In cases
   where this can't be avoided (say, if a test wouldn't be precise enough),
   moving the test to its own file is preferred.