Blame - docs/AMDGPUModifierSyntax.rst - platform_external_llvm

Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	1	======================================
				2	Syntax of AMDGPU Instruction Modifiers
				3	======================================
				4
				5	.. contents::
				6	:local:
				7
				8	Conventions
				9	===========
				10
				11	The following notation is used throughout this document:
				12
				13	=================== =============================================================
				14	Notation Description
				15	=================== =============================================================
				16	{0..N} Any integer value in the range from 0 to N (inclusive).
				17	<x> Syntax and meaning of x is explained elsewhere.
				18	=================== =============================================================
				19
				20	.. _amdgpu_syn_modifiers:
				21
				22	Modifiers
				23	=========
				24
				25	DS Modifiers
				26	------------
				27
				28	.. _amdgpu_synid_ds_offset8:
				29
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	30	offset8
				31	~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	32
				33	Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
				34
				35	Used with DS instructions which have 2 addresses.
				36
				37	=================== =====================================================
				38	Syntax Description
				39	=================== =====================================================
				40	offset:{0..0xFF} Specifies an unsigned 8-bit offset as a positive
				41	:ref:`integer number <amdgpu_synid_integer_number>`.
				42	=================== =====================================================
				43
				44	Examples:
				45
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	46	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	47
				48	offset:255
				49	offset:0xff
				50
				51	.. _amdgpu_synid_ds_offset16:
				52
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	53	offset16
				54	~~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	55
				56	Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
				57
				58	Used with DS instructions which have 1 address.
				59
				60	==================== ======================================================
				61	Syntax Description
				62	==================== ======================================================
				63	offset:{0..0xFFFF} Specifies an unsigned 16-bit offset as a positive
				64	:ref:`integer number <amdgpu_synid_integer_number>`.
				65	==================== ======================================================
				66
				67	Examples:
				68
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	69	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	70
				71	offset:65535
				72	offset:0xffff
				73
				74	.. _amdgpu_synid_sw_offset16:
				75
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	76	pattern
				77	~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	78
				79	This is a special modifier which may be used only with the ds_swizzle_b32 instruction.
				80	It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
				81
				82	See AMD documentation for more information.
				83
				84	======================================================= ===========================================================
				85	Syntax Description
				86	======================================================= ===========================================================
				87	offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern.
				88	offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern.
				89
				90	Each number is a lane id.
				91	offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern.
				92
				93	The pattern converts a 5-bit lane id to another
				94	lane id with which the lane interacts.
				95
				96	mask is a 5 character sequence which
				97	specifies how to transform the bits of the
				98	lane id.
				99
				100	The following characters are allowed:
				101
				102	* "0" - set bit to 0.
				103
				104	* "1" - set bit to 1.
				105
				106	* "p" - preserve bit.
				107
				108	* "i" - invert bit.
				109
				110	offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode.
				111
				112	Broadcasts the value of any particular lane to
				113	all lanes in its group.
				114
				115	The first numeric parameter is a group
				116	size and must be equal to 2, 4, 8, 16 or 32.
				117
				118	The second numeric parameter is the index of the
				119	lane being broadcast.
				120
				121	The index must be less than the group size.
				122	offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
				123
				124	Swaps the neighboring groups of
				125	1, 2, 4, 8 or 16 lanes.
				126	offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode.
				127
				128	Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
				129	======================================================= ===========================================================
				130
				131	Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
				132	:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
				133
				134	Examples:
				135
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	136	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	137
				138	offset:255
				139	offset:0xffff
				140	offset:swizzle(QUAD_PERM, 0, 1, 2 ,3)
				141	offset:swizzle(BITMASK_PERM, "01pi0")
				142	offset:swizzle(BROADCAST, 2, 0)
				143	offset:swizzle(SWAP, 8)
				144	offset:swizzle(REVERSE, 30 + 2)
				145
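As a rough illustration of the BITMASK_PERM description above, the following Python sketch applies a 5-character mask to a 5-bit lane id. The function name ``apply_bitmask_perm`` is hypothetical. The sketch also assumes that the leftmost mask character controls the most significant bit of the lane id, which this document does not state; see AMD documentation before relying on it.

```python
def apply_bitmask_perm(mask: str, lane_id: int) -> int:
    """Transform a 5-bit lane id according to a BITMASK_PERM-style mask.

    Assumption (not stated in this document): mask[0] controls bit 4,
    mask[4] controls bit 0.
    """
    if len(mask) != 5 or any(ch not in "01pi" for ch in mask):
        raise ValueError("mask must be 5 characters from '0', '1', 'p', 'i'")
    if not 0 <= lane_id < 32:
        raise ValueError("lane id must fit in 5 bits")

    result = 0
    for pos, ch in enumerate(mask):
        bit = 4 - pos                      # assumed bit controlled by this character
        src = (lane_id >> bit) & 1         # original value of that bit
        if ch == "0":
            out = 0                        # set bit to 0
        elif ch == "1":
            out = 1                        # set bit to 1
        elif ch == "p":
            out = src                      # preserve bit
        else:
            out = src ^ 1                  # invert bit
        result |= out << bit
    return result


if __name__ == "__main__":
    # Lane 6 (0b00110) under the mask from the examples above.
    print(apply_bitmask_perm("01pi0", 6))
```

Under the stated bit-order assumption, a mask of ``ppppp`` leaves every lane id unchanged.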
				146	.. _amdgpu_synid_gds:
				147
				148	gds
				149	~~~
				150
				151	Specifies whether to use GDS or LDS memory (LDS is the default).
				152
				153	======================================== ================================================
				154	Syntax Description
				155	======================================== ================================================
				156	gds Use GDS memory.
				157	======================================== ================================================
				158
				159
				160	EXP Modifiers
				161	-------------
				162
				163	.. _amdgpu_synid_done:
				164
				165	done
				166	~~~~
				167
				168	Specifies whether this is the last export from the shader to the target. By default, the current
				169	instruction does not finish an export sequence.
				170
				171	======================================== ================================================
				172	Syntax Description
				173	======================================== ================================================
				174	done Indicates the last export operation.
				175	======================================== ================================================
				176
				177	.. _amdgpu_synid_compr:
				178
				179	compr
				180	~~~~~
				181
				182	Indicates if the data are compressed (data are not compressed by default).
				183
				184	======================================== ================================================
				185	Syntax Description
				186	======================================== ================================================
				187	compr Data are compressed.
				188	======================================== ================================================
				189
				190	.. _amdgpu_synid_vm:
				191
				192	vm
				193	~~
				194
				195	Specifies valid mask flag state (off by default).
				196
				197	======================================== ================================================
				198	Syntax Description
				199	======================================== ================================================
				200	vm Set valid mask flag.
				201	======================================== ================================================
				202
				203	FLAT Modifiers
				204	--------------
				205
				206	.. _amdgpu_synid_flat_offset12:
				207
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	208	offset12
				209	~~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	210
				211	Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
				212
				213	Cannot be used with global/scratch opcodes. GFX9 only.
				214
				215	================= ======================================================
				216	Syntax Description
				217	================= ======================================================
				218	offset:{0..4095} Specifies a 12-bit unsigned offset as a positive
				219	:ref:`integer number <amdgpu_synid_integer_number>`.
				220	================= ======================================================
				221
				222	Examples:
				223
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	224	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	225
				226	offset:4095
				227	offset:0xff
				228
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	229	.. _amdgpu_synid_flat_offset13s:
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	230
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	231	offset13s
				232	~~~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	233
				234	Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
				235
				236	Can be used with global/scratch opcodes only. GFX9 only.
				237
				238	============================ =======================================================
				239	Syntax Description
				240	============================ =======================================================
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	241	offset:{-4096..4095} Specifies a 13-bit signed offset as an
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	242	:ref:`integer number <amdgpu_synid_integer_number>`.
				243	============================ =======================================================
				244
				245	Examples:
				246
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	247	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	248
				249	offset:-4000
				250	offset:0x10
				251
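The ranges {0..4095} and {-4096..4095} given above are the ranges of a 12-bit unsigned and a 13-bit signed (two's-complement) integer. A minimal Python sketch of the corresponding range checks follows; the helper names ``fits_unsigned`` and ``fits_signed`` are hypothetical and are not part of the assembler.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value is in the range 0 .. 2**bits - 1."""
    return 0 <= value <= (1 << bits) - 1


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is in the range -2**(bits-1) .. 2**(bits-1) - 1."""
    limit = 1 << (bits - 1)
    return -limit <= value <= limit - 1


if __name__ == "__main__":
    print(fits_unsigned(4095, 12))   # offset12 upper bound
    print(fits_signed(-4096, 13))    # offset13s lower bound
    print(fits_signed(4096, 13))     # one past the offset13s upper bound
```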
				252	glc
				253	~~~
				254
				255	See a description :ref:`here<amdgpu_synid_glc>`.
				256
				257	slc
				258	~~~
				259
				260	See a description :ref:`here<amdgpu_synid_slc>`.
				261
				262	tfe
				263	~~~
				264
				265	See a description :ref:`here<amdgpu_synid_tfe>`.
				266
				267	nv
				268	~~
				269
				270	See a description :ref:`here<amdgpu_synid_nv>`.
				271
				272	MIMG Modifiers
				273	--------------
				274
				275	.. _amdgpu_synid_dmask:
				276
				277	dmask
				278	~~~~~
				279
				280	Specifies which channels (image components) are used by the operation. By default, no channels
				281	are used.
				282
				283	=============== =====================================================
				284	Syntax Description
				285	=============== =====================================================
				286	dmask:{0..15} Specifies image channels as a positive
				287	:ref:`integer number <amdgpu_synid_integer_number>`.
				288
				289	Each bit corresponds to one of 4 image
				290	components (RGBA).
				291
				292	If the specified bit value
				293	is 0, the component is not used, value 1 means
				294	that the component is used.
				295	=============== =====================================================
				296
				297	This modifier has some limitations depending on instruction kind:
				298
				299	=================================================== ========================
				300	Instruction Kind Valid dmask Values
				301	=================================================== ========================
				302	32-bit atomic cmpswap 0x3
				303	32-bit atomic instructions except for cmpswap 0x1
				304	64-bit atomic cmpswap 0xF
				305	64-bit atomic instructions except for cmpswap 0x3
				306	gather4 0x1, 0x2, 0x4, 0x8
				307	Other instructions any value
				308	=================================================== ========================
				309
				310	Examples:
				311
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	312	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	313
				314	dmask:0xf
				315	dmask:0b1111
				316	dmask:3
				317
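The dmask description above can be sketched in Python as follows. The function names are hypothetical, and the mapping of bit 0 to R, bit 1 to G, bit 2 to B and bit 3 to A is an assumption; this document only says that each bit corresponds to one of the 4 components.

```python
CHANNELS = "RGBA"  # assumed order: bit 0 -> R, bit 1 -> G, bit 2 -> B, bit 3 -> A


def dmask_channels(dmask: int) -> list:
    """Return the names of the channels enabled by a 4-bit dmask value."""
    if not 0 <= dmask <= 15:
        raise ValueError("dmask must be in the range 0..15")
    return [name for bit, name in enumerate(CHANNELS) if dmask & (1 << bit)]


def is_valid_gather4_dmask(dmask: int) -> bool:
    """gather4 accepts only the single-bit values listed in the table above."""
    return dmask in (0x1, 0x2, 0x4, 0x8)


if __name__ == "__main__":
    print(dmask_channels(0xF))
    print(dmask_channels(3))
    print(is_valid_gather4_dmask(0x3))
```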
				318	.. _amdgpu_synid_unorm:
				319
				320	unorm
				321	~~~~~
				322
				323	Specifies whether the address is normalized or not (the address is normalized by default).
				324
				325	======================== ========================================
				326	Syntax Description
				327	======================== ========================================
				328	unorm Force the address to be unnormalized.
				329	======================== ========================================
				330
				331	glc
				332	~~~
				333
				334	See a description :ref:`here<amdgpu_synid_glc>`.
				335
				336	slc
				337	~~~
				338
				339	See a description :ref:`here<amdgpu_synid_slc>`.
				340
				341	.. _amdgpu_synid_r128:
				342
				343	r128
				344	~~~~
				345
				346	Specifies texture resource size. The default size is 256 bits.
				347
				348	GFX7 and GFX8 only.
				349
				350	=================== ================================================
				351	Syntax Description
				352	=================== ================================================
				353	r128 Specifies a 128-bit texture resource size.
				354	=================== ================================================
				355
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	356	.. WARNING:: Using this modifier should decrease rsrc operand size from 8 to 4 dwords, but the assembler does not currently support this feature.
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	357
				358	tfe
				359	~~~
				360
				361	See a description :ref:`here<amdgpu_synid_tfe>`.
				362
				363	.. _amdgpu_synid_lwe:
				364
				365	lwe
				366	~~~
				367
				368	Specifies LOD warning status (LOD warning is disabled by default).
				369
				370	======================================== ================================================
				371	Syntax Description
				372	======================================== ================================================
				373	lwe Enables LOD warning.
				374	======================================== ================================================
				375
				376	.. _amdgpu_synid_da:
				377
				378	da
				379	~~
				380
				381	Specifies whether an array index must be sent to TA. By default, the array index is not sent.
				382
				383	======================================== ================================================
				384	Syntax Description
				385	======================================== ================================================
				386	da Send an array-index to TA.
				387	======================================== ================================================
				388
				389	.. _amdgpu_synid_d16:
				390
				391	d16
				392	~~~
				393
				394	Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
				395
				396	======================================== ================================================
				397	Syntax Description
				398	======================================== ================================================
				399	d16 Enables 16-bit data mode.
				400
				401	On loads, convert data in memory to 16-bit
				402	format before storing it in VGPRs.
				403
				404	For stores, convert 16-bit data in VGPRs to
				405	32 bits before going to memory.
				406
				407	Note that GFX8.0 does not support data packing.
				408	Each 16-bit data element occupies 1 VGPR.
				409
				410	GFX8.1 and GFX9 support data packing.
				411	Each pair of 16-bit data elements
				412	occupies 1 VGPR.
				413	======================================== ================================================
				414
				415	.. _amdgpu_synid_a16:
				416
				417	a16
				418	~~~
				419
				420	Specifies the size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
				421
				422	======================================== ================================================
				423	Syntax Description
				424	======================================== ================================================
				425	a16 Enables 16-bit image address components.
				426	======================================== ================================================
				427
				428	Miscellaneous Modifiers
				429	-----------------------
				430
				431	.. _amdgpu_synid_glc:
				432
				433	glc
				434	~~~
				435
				436	This modifier has a different meaning for loads, stores, and atomic operations.
				437	The default value is off (0).
				438
				439	See AMD documentation for details.
				440
				441	======================================== ================================================
				442	Syntax Description
				443	======================================== ================================================
				444	glc Set glc bit to 1.
				445	======================================== ================================================
				446
				447	.. _amdgpu_synid_slc:
				448
				449	slc
				450	~~~
				451
				452	Specifies cache policy. The default value is off (0).
				453
				454	See AMD documentation for details.
				455
				456	======================================== ================================================
				457	Syntax Description
				458	======================================== ================================================
				459	slc Set slc bit to 1.
				460	======================================== ================================================
				461
				462	.. _amdgpu_synid_tfe:
				463
				464	tfe
				465	~~~
				466
				467	Controls access to partially resident textures. The default value is off (0).
				468
				469	See AMD documentation for details.
				470
				471	======================================== ================================================
				472	Syntax Description
				473	======================================== ================================================
				474	tfe Set tfe bit to 1.
				475	======================================== ================================================
				476
				477	.. _amdgpu_synid_nv:
				478
				479	nv
				480	~~
				481
				482	Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.
				483
				484	GFX9 only.
				485
				486	======================================== ================================================
				487	Syntax Description
				488	======================================== ================================================
				489	nv Indicates that the instruction operates on
				490	non-volatile memory.
				491	======================================== ================================================
				492
				493	MUBUF/MTBUF Modifiers
				494	---------------------
				495
				496	.. _amdgpu_synid_idxen:
				497
				498	idxen
				499	~~~~~
				500
				501	Specifies whether address components include an index. By default, no components are used.
				502
				503	Can be used together with :ref:`offen<amdgpu_synid_offen>`.
				504
				505	Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
				506
				507	======================================== ================================================
				508	Syntax Description
				509	======================================== ================================================
				510	idxen Address components include an index.
				511	======================================== ================================================
				512
				513	.. _amdgpu_synid_offen:
				514
				515	offen
				516	~~~~~
				517
				518	Specifies whether address components include an offset. By default, no components are used.
				519
				520	Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
				521
				522	Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
				523
				524	======================================== ================================================
				525	Syntax Description
				526	======================================== ================================================
				527	offen Address components include an offset.
				528	======================================== ================================================
				529
				530	.. _amdgpu_synid_addr64:
				531
				532	addr64
				533	~~~~~~
				534
				535	Specifies whether a 64-bit address is used. By default, no address is used.
				536
				537	GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
				538	:ref:`idxen<amdgpu_synid_idxen>` modifiers.
				539
				540	======================================== ================================================
				541	Syntax Description
				542	======================================== ================================================
				543	addr64 A 64-bit address is used.
				544	======================================== ================================================
				545
				546	.. _amdgpu_synid_buf_offset12:
				547
Dmitry Preobrazhensky	78dfeb7	2018-12-28 11:48:23 +0000	[diff] [blame]	548	offset12
				549	~~~~~~~~
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	550
				551	Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
				552
				553	=============================== ======================================================
				554	Syntax Description
				555	=============================== ======================================================
				556	offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive
				557	:ref:`integer number <amdgpu_synid_integer_number>`.
				558	=============================== ======================================================
				559
				560	Examples:
				561
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	562	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	563
				564	offset:0
				565	offset:0x10
				566
				567	glc
				568	~~~
				569
				570	See a description :ref:`here<amdgpu_synid_glc>`.
				571
				572	slc
				573	~~~
				574
				575	See a description :ref:`here<amdgpu_synid_slc>`.
				576
				577	.. _amdgpu_synid_lds:
				578
				579	lds
				580	~~~
				581
				582	Specifies where to store the result: VGPRs or LDS (VGPRs by default).
				583
				584	======================================== ===========================
				585	Syntax Description
				586	======================================== ===========================
				587	lds Store result in LDS.
				588	======================================== ===========================
				589
				590	tfe
				591	~~~
				592
				593	See a description :ref:`here<amdgpu_synid_tfe>`.
				594
				595	.. _amdgpu_synid_dfmt:
				596
				597	dfmt
				598	~~~~
				599
				600	TBD
				601
				602	.. _amdgpu_synid_nfmt:
				603
				604	nfmt
				605	~~~~
				606
				607	TBD
				608
				609	SMRD/SMEM Modifiers
				610	-------------------
				611
				612	glc
				613	~~~
				614
				615	See a description :ref:`here<amdgpu_synid_glc>`.
				616
				617	nv
				618	~~
				619
				620	See a description :ref:`here<amdgpu_synid_nv>`.
				621
				622	VINTRP Modifiers
				623	----------------
				624
				625	.. _amdgpu_synid_high:
				626
				627	high
				628	~~~~
				629
				630	Specifies which half of the LDS word to use. Low half of LDS word is used by default.
				631	GFX9 only.
				632
				633	======================================== ================================
				634	Syntax Description
				635	======================================== ================================
				636	high Use high half of LDS word.
				637	======================================== ================================
				638
				639	VOP1/VOP2 DPP Modifiers
				640	-----------------------
				641
				642	GFX8 and GFX9 only.
				643
				644	.. _amdgpu_synid_dpp_ctrl:
				645
				646	dpp_ctrl
				647	~~~~~~~~
				648
				649	Specifies how data are shared between threads. This is a mandatory modifier.
				650	There is no default value.
				651
				652	Note: The lanes of a wavefront are organized in four banks and four rows.
				653
				654	======================================== ================================================
				655	Syntax Description
				656	======================================== ================================================
				657	quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads.
				658	row_mirror Mirror threads within row.
				659	row_half_mirror Mirror threads within 1/2 row (8 threads).
				660	row_bcast:15 Broadcast 15th thread of each row to next row.
				661	row_bcast:31 Broadcast thread 31 to rows 2 and 3.
				662	wave_shl:1 Wavefront left shift by 1 thread.
				663	wave_rol:1 Wavefront left rotate by 1 thread.
				664	wave_shr:1 Wavefront right shift by 1 thread.
				665	wave_ror:1 Wavefront right rotate by 1 thread.
				666	row_shl:{1..15} Row shift left by 1-15 threads.
				667	row_shr:{1..15} Row shift right by 1-15 threads.
				668	row_ror:{1..15} Row rotate right by 1-15 threads.
				669	======================================== ================================================
				670
				671	Note: Numeric parameters may be specified as either
				672	:ref:`integer numbers<amdgpu_synid_integer_number>` or
				673	:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
				674
				675	Examples:
				676
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	677	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	678
				679	quad_perm:[0, 1, 2, 3]
				680	row_shl:3
				681
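To make the ``quad_perm`` entry more concrete, the Python sketch below computes which lane a given lane would read from. It assumes that lanes are grouped into quads of 4 consecutive lanes and that the value at position ``i`` of the list selects the source lane within the same quad for lane ``i`` of that quad. This reading is an assumption, and the function name ``quad_perm_source`` is hypothetical; see AMD documentation for the authoritative definition.

```python
def quad_perm_source(lane: int, perm: list) -> int:
    """Return the lane that `lane` reads from under quad_perm:[a, b, c, d].

    Assumption: a quad is 4 consecutive lanes, and perm[i] is the index
    (within the quad) of the source lane for the i-th lane of the quad.
    """
    if len(perm) != 4 or any(not 0 <= p <= 3 for p in perm):
        raise ValueError("perm must be four values in the range 0..3")
    if lane < 0:
        raise ValueError("lane must be non-negative")
    quad_base = lane & ~3            # first lane of the quad containing `lane`
    return quad_base | perm[lane & 3]


if __name__ == "__main__":
    # quad_perm:[0, 1, 2, 3] is the identity permutation.
    print([quad_perm_source(lane, [0, 1, 2, 3]) for lane in range(8)])
    # quad_perm:[3, 2, 1, 0] reverses each quad.
    print([quad_perm_source(lane, [3, 2, 1, 0]) for lane in range(8)])
```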
				682	.. _amdgpu_synid_row_mask:
				683
				684	row_mask
				685	~~~~~~~~
				686
				687	Controls which rows are enabled for data sharing. By default, all rows are enabled.
				688
				689	Note: The lanes of a wavefront are organized in four banks and four rows.
				690
				691	======================================== =====================================================
				692	Syntax Description
				693	======================================== =====================================================
				694	row_mask:{0..15} Specifies a row mask as a positive
				695	:ref:`integer number <amdgpu_synid_integer_number>`.
				696
				697	Each of 4 bits in the mask controls one
				698	row (0 - disabled, 1 - enabled).
				699	======================================== =====================================================
				700
				701	Examples:
				702
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	703	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	704
				705	row_mask:0xf
				706	row_mask:0b1010
				707	row_mask:0b1111
				708
				709	.. _amdgpu_synid_bank_mask:
				710
				711	bank_mask
				712	~~~~~~~~~
				713
				714	Controls which banks are enabled for data sharing. By default, all banks are enabled.
				715
				716	Note: The lanes of a wavefront are organized in four banks and four rows.
				717
				718	======================================== =======================================================
				719	Syntax Description
				720	======================================== =======================================================
				721	bank_mask:{0..15} Specifies a bank mask as a positive
				722	:ref:`integer number <amdgpu_synid_integer_number>`.
				723
				724	Each of 4 bits in the mask controls one
				725	bank (0 - disabled, 1 - enabled).
				726	======================================== =======================================================
				727
				728	Examples:
				729
Dmitry Preobrazhensky	23da110	2018-12-17 18:53:10 +0000	[diff] [blame]	730	.. parsed-literal::
Dmitry Preobrazhensky	51120d7	2018-12-17 17:38:11 +0000	[diff] [blame]	731
				732	bank_mask:0x3
				733	bank_mask:0b0011
				734	bank_mask:0b1111
				735
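The interaction of ``row_mask`` and ``bank_mask`` can be sketched in Python as shown below. The layout used here is an assumption that this document does not spell out: a 64-lane wavefront split into 4 rows of 16 consecutive lanes, each row split into 4 banks of 4 consecutive lanes, with bit ``i`` of each mask controlling row or bank ``i``. The function name ``lane_enabled`` is hypothetical.

```python
def lane_enabled(lane: int, row_mask: int = 0xF, bank_mask: int = 0xF) -> bool:
    """Return True if `lane` is enabled by both masks.

    Assumed layout: row = lane // 16, bank = (lane % 16) // 4,
    and bit i of a mask enables row/bank i.
    """
    if not 0 <= lane < 64:
        raise ValueError("lane must be in the range 0..63")
    if not (0 <= row_mask <= 15 and 0 <= bank_mask <= 15):
        raise ValueError("masks must be in the range 0..15")
    row = lane >> 4
    bank = (lane >> 2) & 3
    return bool(row_mask & (1 << row)) and bool(bank_mask & (1 << bank))


if __name__ == "__main__":
    # With bank_mask:0x3, only banks 0 and 1 of each row are enabled.
    print([lane for lane in range(16) if lane_enabled(lane, 0xF, 0x3)])
```

The default arguments mirror the defaults described above, where all rows and all banks are enabled.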
				736	.. _amdgpu_synid_bound_ctrl:
				737
				738	bound_ctrl
				739	~~~~~~~~~~
				740
				741	Controls data sharing when accessing an invalid lane. By default, data sharing with
				742	invalid lanes is disabled.
				743
				744	======================================== ================================================
				745	Syntax Description
				746	======================================== ================================================
				747	bound_ctrl:0 Enables data sharing with invalid lanes.
				748
				749	Accessing data from an invalid lane will
				750	return zero.
				751	======================================== ================================================
				752
				753	VOP1/VOP2/VOPC SDWA Modifiers
				754	-----------------------------
				755
				756	GFX8 and GFX9 only.
				757
				758	clamp
				759	~~~~~
				760
				761	See a description :ref:`here<amdgpu_synid_clamp>`.
				762
				763	omod
				764	~~~~
				765
				766	See a description :ref:`here<amdgpu_synid_omod>`.
				767
				768	GFX9 only.
				769
				770	.. _amdgpu_synid_dst_sel:
				771
				772	dst_sel
				773	~~~~~~~
				774
				775	Selects which bits in the destination are affected. By default, all bits are affected.
				776
				777	======================================== ================================================
				778	Syntax Description
				779	======================================== ================================================
				780	dst_sel:DWORD Use bits 31:0.
				781	dst_sel:BYTE_0 Use bits 7:0.
				782	dst_sel:BYTE_1 Use bits 15:8.
				783	dst_sel:BYTE_2 Use bits 23:16.
				784	dst_sel:BYTE_3 Use bits 31:24.
				785	dst_sel:WORD_0 Use bits 15:0.
				786	dst_sel:WORD_1 Use bits 31:16.
				787	======================================== ================================================
				788
				789
				790	.. _amdgpu_synid_dst_unused:
				791
				792	dst_unused
				793	~~~~~~~~~~
				794
				795	Controls what to do with the bits in the destination which are not selected
				796	by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
				797	By default, unused bits are preserved.
				798
				799	======================================== ================================================
				800	Syntax Description
				801	======================================== ================================================
				802	dst_unused:UNUSED_PAD Pad with zeros.
				803	dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits.
				804	dst_unused:UNUSED_PRESERVE Preserve bits.
				805	======================================== ================================================
				806
				807	.. _amdgpu_synid_src0_sel:
				808
				809	src0_sel
				810	~~~~~~~~
				811
				812	Controls which bits in the src0 are used. By default, all bits are used.
				813
				814	======================================== ================================================
				815	Syntax Description
				816	======================================== ================================================
				817	src0_sel:DWORD Use bits 31:0.
				818	src0_sel:BYTE_0 Use bits 7:0.
				819	src0_sel:BYTE_1 Use bits 15:8.
				820	src0_sel:BYTE_2 Use bits 23:16.
				821	src0_sel:BYTE_3 Use bits 31:24.
				822	src0_sel:WORD_0 Use bits 15:0.
				823	src0_sel:WORD_1 Use bits 31:16.
				824	======================================== ================================================
				825
				826	.. _amdgpu_synid_src1_sel:
				827
				828	src1_sel
				829	~~~~~~~~
				830
				831	Controls which bits in the src1 are used. By default, all bits are used.
				832
				833	======================================== ================================================
				834	Syntax                                   Description
				835	======================================== ================================================
				836	src1_sel:DWORD                           Use bits 31:0.
				837	src1_sel:BYTE_0                          Use bits 7:0.
				838	src1_sel:BYTE_1                          Use bits 15:8.
				839	src1_sel:BYTE_2                          Use bits 23:16.
				840	src1_sel:BYTE_3                          Use bits 31:24.
				841	src1_sel:WORD_0                          Use bits 15:0.
				842	src1_sel:WORD_1                          Use bits 31:16.
				843	======================================== ================================================
				844
				845	.. _amdgpu_synid_sdwa_operand_modifiers:
				846
				847	VOP1/VOP2/VOPC SDWA Operand Modifiers
				848	-------------------------------------
				849
				850	Operand modifiers are not used separately. They are applied to source operands.
				851
				852	GFX8 and GFX9 only.
				853
				854	abs
				855	~~~
				856
				857	See a description :ref:`here<amdgpu_synid_abs>`.
				858
				859	neg
				860	~~~
				861
				862	See a description :ref:`here<amdgpu_synid_neg>`.
				863
				864	.. _amdgpu_synid_sext:
				865
				866	sext
				867	~~~~
				868
				869	Sign-extends the value of a sub-dword operand to fill all 32 bits.
				870	Has no effect on 32-bit operands.
				871
				872	Valid for integer operands only.
				873
				874	======================================== ================================================
				875	Syntax                                   Description
				876	======================================== ================================================
				877	sext(<operand>)                          Sign-extend operand value.
				878	======================================== ================================================
				879
				880	Examples:
				881
				882	.. parsed-literal::
				883
				884	    sext(v4)
				885	    sext(v255)
				886
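The interaction between a source selector (src0_sel or src1_sel) and sext can be sketched in Python as follows. The helper ``select_src`` and the ``SEL_BITS`` table are hypothetical names used only for this illustration. The sketch assumes that a selected sub-dword value is zero-extended unless sext is applied.

```python
# Hypothetical model of SDWA source selection (illustrative names only).
SEL_BITS = {
    "DWORD": (0, 32),   # (lowest bit, width in bits)
    "BYTE_0": (0, 8),
    "BYTE_1": (8, 8),
    "BYTE_2": (16, 8),
    "BYTE_3": (24, 8),
    "WORD_0": (0, 16),
    "WORD_1": (16, 16),
}


def select_src(value, sel="DWORD", sext=False):
    """Extract the selected bits of `value` as a 32-bit unsigned integer."""
    lo, width = SEL_BITS[sel]
    low_mask = (1 << width) - 1
    field = (value >> lo) & low_mask
    # Assumption: without sext the selected field is zero-extended.
    if sext and field & (1 << (width - 1)):
        field |= 0xFFFFFFFF & ~low_mask     # replicate the sign bit upward
    return field
```

With ``sel="DWORD"`` the sign-extension step changes nothing, which matches the statement that sext has no effect on 32-bit operands.
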
				887	VOP3 Modifiers
				888	--------------
				889
				890	.. _amdgpu_synid_vop3_op_sel:
				891
				892	op_sel
				893	~~~~~~
				894
				895	Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
				896	By default, low bits are used for all operands.
				897
				898	The number of values specified with the op_sel modifier must match the number of instruction
				899	operands (both source and destination). The first value controls src0, the second value controls
				900	src1, and so on, except that the last value controls the destination.
				901	The value 0 selects the low bits, while 1 selects the high bits.
				902
				903	Note. The op_sel modifier affects 16-bit operands only. For 32-bit operands, the value specified
				904	by op_sel must be 0.
				905
				906	GFX9 only.
				907
				908	======================================== ============================================================
				909	Syntax                                   Description
				910	======================================== ============================================================
				911	op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
				912	op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
				913	op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
				914	======================================== ============================================================
				915
				916	Examples:
				917
				918	.. parsed-literal::
				919
				920	    op_sel:[0,0]
				921	    op_sel:[0,1]
				922
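To make the position rule concrete, the sketch below splits a VOP3 op_sel list into its source and destination selectors and picks the corresponding 16-bit half of a 32-bit value. Both function names are hypothetical and exist only for this illustration.

```python
def split_op_sel(op_sel):
    """Split a VOP3 op_sel list into (source selectors, destination selector)."""
    if not 2 <= len(op_sel) <= 4 or any(v not in (0, 1) for v in op_sel):
        raise ValueError("op_sel needs 2 to 4 values, each 0 or 1")
    *srcs, dst = op_sel          # the last value always controls the destination
    return srcs, dst


def half(value, sel):
    """Return bits 31:16 of `value` if sel is 1, otherwise bits 15:0."""
    return (value >> 16) & 0xFFFF if sel else value & 0xFFFF
```

For instance, ``op_sel:[0,1]`` on an instruction with one source operand reads the low half of src0 and writes the high half of the destination.
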
				923	.. _amdgpu_synid_clamp:
				924
				925	clamp
				926	~~~~~
				927
				928	The meaning of clamp depends on the instruction.
				929
				930	For v_cmp instructions, the clamp modifier indicates that the compare signals
				931	if a floating point exception occurs. By default, signaling is disabled.
				932	Not supported by GFX7.
				933
				934	For integer operations, the clamp modifier indicates that the result must be clamped
				935	to the largest and smallest representable value. By default, there is no clamping.
				936	Integer clamping is not supported by GFX7.
				937
				938	For floating point operations, the clamp modifier indicates that the result must be clamped
				939	to the range [0.0, 1.0]. By default, there is no clamping.
				940
				941	Note. The clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
				942
				943	======================================== ================================================
				944	Syntax                                   Description
				945	======================================== ================================================
				946	clamp                                    Enables clamping (or signaling).
				947	======================================== ================================================
				948
				949	.. _amdgpu_synid_omod:
				950
				951	omod
				952	~~~~
				953
				954	Specifies if an output modifier must be applied to the result.
				955	By default, no output modifiers are applied.
				956
				957	Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
				958
				959	Output modifiers are valid for f32 and f64 floating point results only.
				960	They must not be used with f16.
				961
				962	Note. v_cvt_f16_f32 is an exception. This instruction produces f16 result
				963	but accepts output modifiers.
				964
				965	======================================== ================================================
				966	Syntax                                   Description
				967	======================================== ================================================
				968	mul:2                                    Multiply the result by 2.
				969	mul:4                                    Multiply the result by 4.
				970	div:2                                    Multiply the result by 0.5.
				971	======================================== ================================================
				972
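The ordering of output modifiers and floating point clamping can be sketched in Python as shown below. The function ``apply_output_modifiers`` is a hypothetical helper, and the sketch ignores rounding, denormal and NaN handling; it only illustrates that the output modifier is applied first and the clamp second.

```python
def apply_output_modifiers(result, omod=None, clamp=False):
    """Scale a floating point result, then optionally clamp it to [0.0, 1.0]."""
    factors = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}
    value = result * factors[omod]          # output modifier is applied first
    if clamp:
        value = min(max(value, 0.0), 1.0)   # clamp is applied afterwards
    return value
```

Because of this ordering, a result of 0.75 with ``mul:2`` and ``clamp`` saturates at 1.0 rather than being clamped first and then doubled.
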
				973	.. _amdgpu_synid_vop3_operand_modifiers:
				974
				975	VOP3 Operand Modifiers
				976	----------------------
				977
				978	Operand modifiers are not used separately. They are applied to source operands.
				979
				980	.. _amdgpu_synid_abs:
				981
				982	abs
				983	~~~
				984
				985	Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
				986	Valid for floating point operands only.
				987
				988	======================================== ================================================
				989	Syntax                                   Description
				990	======================================== ================================================
				991	abs(<operand>)                           Get absolute value of operand.
				992	\\|<operand>\|                           The same as above.
				993	======================================== ================================================
				994
				995	Examples:
				996
				997	.. parsed-literal::
				998
				999	    abs(v36)
				1000	    \\|v36\|
				1001
				1002	.. _amdgpu_synid_neg:
				1003
				1004	neg
				1005	~~~
				1006
				1007	Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
				1008	Valid for floating point operands only.
				1009
				1010	======================================== ================================================
				1011	Syntax                                   Description
				1012	======================================== ================================================
				1013	neg(<operand>)                           Get negative value of operand.
				1014	-<operand>                               The same as above.
				1015	======================================== ================================================
				1016
				1017	Examples:
				1018
				1019	.. parsed-literal::
				1020
				1021	    neg(v[0])
				1022	    -v4
				1023
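Since abs is applied before neg, combining both on one operand always yields a non-positive value. A minimal Python sketch of that ordering follows; ``apply_src_modifiers`` and its parameter names are hypothetical and used only for illustration.

```python
def apply_src_modifiers(value, abs_mod=False, neg_mod=False):
    """Apply abs first, then neg, to a floating point source value."""
    if abs_mod:
        value = abs(value)      # abs is applied first
    if neg_mod:
        value = -value          # neg is applied to the result of abs
    return value
```

With both modifiers set, the inputs 2.5 and -2.5 produce the same result, -2.5.
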
				1024	VOP3P Modifiers
				1025	---------------
				1026
				1027	This section describes modifiers of regular VOP3P instructions.
				1028
				1029	v_mad_mix_f32, v_mad_mixhi_f16 and v_mad_mixlo_f16
				1030	instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
				1031
				1032	GFX9 only.
				1033
				1034	.. _amdgpu_synid_op_sel:
				1035
				1036	op_sel
				1037	~~~~~~
				1038
				1039	Selects the low [15:0] or high [31:16] operand bits as input to the operation
				1040	which results in the lower-half of the destination.
				1041	By default, low bits are used for all operands.
				1042
				1043	The number of values specified by the op_sel modifier must match the number of source
				1044	operands. First value controls src0, second value controls src1 and so on.
				1045
				1046	The value 0 selects the low bits, while 1 selects the high bits.
				1047
				1048	================================= =============================================================
				1049	Syntax                            Description
				1050	================================= =============================================================
				1051	op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
				1052	op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
				1053	op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
				1054	================================= =============================================================
				1055
				1056	Examples:
				1057
				1058	.. parsed-literal::
				1059
				1060	    op_sel:[0,0]
				1061	    op_sel:[0,1,0]
				1062
				1063	.. _amdgpu_synid_op_sel_hi:
				1064
				1065	op_sel_hi
				1066	~~~~~~~~~
				1067
				1068	Selects the low [15:0] or high [31:16] operand bits as input to the operation
				1069	which results in the upper-half of the destination.
				1070	By default, high bits are used for all operands.
				1071
				1072	The number of values specified by the op_sel_hi modifier must match the number of source
				1073	operands. First value controls src0, second value controls src1 and so on.
				1074
				1075	The value 0 selects the low bits, while 1 selects the high bits.
				1076
				1077	=================================== =============================================================
				1078	Syntax                              Description
				1079	=================================== =============================================================
				1080	op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
				1081	op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
				1082	op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
				1083	=================================== =============================================================
				1084
				1085	Examples:
				1086
				1087	.. parsed-literal::
				1088
				1089	    op_sel_hi:[0,0]
				1090	    op_sel_hi:[0,0,1]
				1091
				1092	.. _amdgpu_synid_neg_lo:
				1093
				1094	neg_lo
				1095	~~~~~~
				1096
				1097	Specifies whether to change sign of operand values selected by
				1098	:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
				1099	as input to the operation which results in the lower-half of the destination.
				1100
				1101	The number of values specified by this modifier must match the number of source
				1102	operands. First value controls src0, second value controls src1 and so on.
				1103
				1104	The value 0 indicates that the corresponding operand value is used unmodified,
				1105	the value 1 indicates that negative value of the operand must be used.
				1106
				1107	By default, operand values are used unmodified.
				1108
				1109	This modifier is valid for floating point operands only.
				1110
				1111	================================ ==================================================================
				1112	Syntax                           Description
				1113	================================ ==================================================================
				1114	neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
				1115	neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
				1116	neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
				1117	================================ ==================================================================
				1118
				1119	Examples:
				1120
				1121	.. parsed-literal::
				1122
				1123	    neg_lo:[0]
				1124	    neg_lo:[0,1]
				1125
				1126	.. _amdgpu_synid_neg_hi:
				1127
				1128	neg_hi
				1129	~~~~~~
				1130
				1131	Specifies whether to change sign of operand values selected by
				1132	:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
				1133	as input to the operation which results in the upper-half of the destination.
				1134
				1135	The number of values specified by this modifier must match the number of source
				1136	operands. First value controls src0, second value controls src1 and so on.
				1137
				1138	The value 0 indicates that the corresponding operand value is used unmodified,
				1139	the value 1 indicates that negative value of the operand must be used.
				1140
				1141	By default, operand values are used unmodified.
				1142
				1143	This modifier is valid for floating point operands only.
				1144
				1145	=============================== ==================================================================
				1146	Syntax                          Description
				1147	=============================== ==================================================================
				1148	neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
				1149	neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
				1150	neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
				1151	=============================== ==================================================================
				1152
				1153	Examples:
				1154
				1155	.. parsed-literal::
				1156
				1157	    neg_hi:[1,0]
				1158	    neg_hi:[0,1,1]
				1159
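The way op_sel, op_sel_hi, neg_lo and neg_hi cooperate in a packed operation can be sketched in Python as follows. The function ``packed_op`` is a hypothetical helper; each source is modeled as a ``(low, high)`` pair of floating point values rather than as raw bits. The sketch assumes that neg_lo applies to the values chosen by op_sel (which feed the lower half of the destination) and that neg_hi applies to the values chosen by op_sel_hi (which feed the upper half).

```python
def packed_op(op, srcs, op_sel=None, op_sel_hi=None, neg_lo=None, neg_hi=None):
    """Apply `op` twice: once for the low and once for the high result half.

    Each source in `srcs` is a (low, high) pair of floating point values.
    """
    n = len(srcs)
    # Documented defaults: low halves feed the low result, high halves the high
    # result, and no operand is negated.
    op_sel = [0] * n if op_sel is None else op_sel
    op_sel_hi = [1] * n if op_sel_hi is None else op_sel_hi
    neg_lo = [0] * n if neg_lo is None else neg_lo
    neg_hi = [0] * n if neg_hi is None else neg_hi
    if not all(len(m) == n for m in (op_sel, op_sel_hi, neg_lo, neg_hi)):
        raise ValueError("each modifier needs one value per source operand")

    lo_in = [-s[sel] if neg else s[sel] for s, sel, neg in zip(srcs, op_sel, neg_lo)]
    hi_in = [-s[sel] if neg else s[sel] for s, sel, neg in zip(srcs, op_sel_hi, neg_hi)]
    return (op(*lo_in), op(*hi_in))
```

For example, with a packed addition of ``(1.0, 2.0)`` and ``(10.0, 20.0)``, ``op_sel=[1,0]`` makes the low result use the high half of src0, giving ``(12.0, 22.0)``.
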
				1160	clamp
				1161	~~~~~
				1162
				1163	See a description :ref:`here<amdgpu_synid_clamp>`.
				1164
				1165	.. _amdgpu_synid_mad_mix:
				1166
				1167	VOP3P V_MAD_MIX Modifiers
				1168	-------------------------
				1169
				1170	v_mad_mix_f32, v_mad_mixhi_f16 and v_mad_mixlo_f16 instructions
				1171	use op_sel and op_sel_hi modifiers
				1172	in a manner different from regular VOP3P instructions.
				1173
				1174	See a description below.
				1175
				1176	GFX9 only.
				1177
				1178	.. _amdgpu_synid_mad_mix_op_sel:
				1179
				1180	m_op_sel
				1181	~~~~~~~~
				1182
				1183	This modifier has meaning only for 16-bit source operands as indicated by
				1184	:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
				1185	It selects either the low [15:0] or high [31:16] operand bits
				1186	as input to the operation.
				1187
				1188	The number of values specified by the op_sel modifier must match the number of source
				1189	operands. First value controls src0, second value controls src1 and so on.
				1190
				1191	The value 0 selects the low 16 bits, while 1 selects the high 16 bits.
				1192
				1193	By default, low bits are used for all operands.
				1194
				1195	=============================== ================================================
				1196	Syntax                          Description
				1197	=============================== ================================================
				1198	op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
				1199	=============================== ================================================
				1200
				1201	Examples:
				1202
				1203	.. parsed-literal::
				1204
				1205	    op_sel:[0,1,0]
				1206
				1207	.. _amdgpu_synid_mad_mix_op_sel_hi:
				1208
				1209	m_op_sel_hi
				1210	~~~~~~~~~~~
				1211
				1212	Selects the size of source operands: either 32 bits or 16 bits.
				1213	By default, 32 bits are used for all source operands.
				1214
				1215	The number of values specified by the op_sel_hi modifier must match the number of source
				1216	operands. First value controls src0, second value controls src1 and so on.
				1217
				1218	The value 0 indicates 32 bits, the value 1 indicates 16 bits.
				1219
				1220	The location of 16 bits in the operand may be specified by
				1221	:ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.
				1222
				1223	======================================== ====================================
				1224	Syntax                                   Description
				1225	======================================== ====================================
				1226	op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
				1227	======================================== ====================================
				1228
				1229	Examples:
				1230
				1231	.. parsed-literal::
				1232
				1233	    op_sel_hi:[1,1,1]
				1234
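For the v_mad_mix family, the two modifiers together decide how each source register is read. The Python sketch below models that decision for a single source; ``mad_mix_source_bits`` is a hypothetical helper that returns the operand size and the raw bits selected, without converting them to a floating point value.

```python
def mad_mix_source_bits(value, op_sel, op_sel_hi):
    """Return (size in bits, raw bits) read from a 32-bit source register."""
    if op_sel_hi == 0:
        # 32-bit operand: op_sel has no meaning here.
        return 32, value & 0xFFFFFFFF
    if op_sel == 1:
        return 16, (value >> 16) & 0xFFFF   # 16-bit operand in bits 31:16
    return 16, value & 0xFFFF               # 16-bit operand in bits 15:0
```

So a source with ``op_sel_hi`` set to 1 and ``op_sel`` set to 1 is read as a 16-bit value taken from the high half of the register.
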
				1235	abs
				1236	~~~
				1237
				1238	See a description :ref:`here<amdgpu_synid_abs>`.
				1239
				1240	neg
				1241	~~~
				1242
				1243	See a description :ref:`here<amdgpu_synid_neg>`.
				1244
				1245	clamp
				1246	~~~~~
				1247
				1248	See a description :ref:`here<amdgpu_synid_clamp>`.