=====================================
Syntax of AMDGPU Instruction Operands
=====================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================================
    Notation            Description
    =================== =============================================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 The syntax and meaning of *x* are explained elsewhere.
    =================== =============================================================================

.. _amdgpu_syn_operands:

Operands
========

.. _amdgpu_synid_v:

v
-

Vector registers. There are 256 32-bit vector registers.

A sequence of *vector* registers may be used to operate with more than 32 bits of data.

The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **v**\ <N>                                          A single 32-bit *vector* register.

                                                        *N* must be a decimal integer number.
    **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        Register indices must be specified as decimal integer numbers.
    =================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.

Examples:

.. parsed-literal::

  v255
  v[0]
  v[0:1]
  v[1:1]
  v[0:3]
  v[2*2]
  v[1-1:2-1]
  [v252]
  [v252,v253,v254,v255]

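The conditions on *N* and *K* for *vector* registers can be sketched as a small check.
This Python fragment is only an illustration and is not how the assembler is implemented;
the name ``is_valid_vector_range`` is a hypothetical helper introduced here.

.. code-block:: python

  # Sequence sizes the assembler accepts for vector registers (from the text above).
  VALID_VECTOR_SEQUENCE_SIZES = {1, 2, 3, 4, 8, 16}

  def is_valid_vector_range(n, k):
      """Return True if v[n:k] satisfies the documented conditions."""
      # Both indices must name one of the 256 vector registers.
      if not (0 <= n <= 255 and 0 <= k <= 255):
          return False
      # The first index must not exceed the last one.
      if n > k:
          return False
      # The number of registers in the sequence is K-N+1.
      return (k - n + 1) in VALID_VECTOR_SEQUENCE_SIZES

For instance, ``is_valid_vector_range(252, 255)`` corresponds to ``v[252:255]`` and is accepted,
while ``is_valid_vector_range(0, 4)`` describes a sequence of 5 registers and is rejected.
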
.. _amdgpu_synid_s:

s
-

Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:

    ======= ============================
    GPU     Number of *scalar* registers
    ======= ============================
    GFX7    104
    GFX8    102
    GFX9    102
    ======= ============================

A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.

Pairs of *scalar* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned.

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **s**\ <N>                                          A single 32-bit *scalar* register.

                                                        *N* must be a decimal integer number.
    **s[**\ <N>\ **]**                                  A single 32-bit *scalar* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **s[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        Register indices must be specified as decimal integer numbers.
    =================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

  s0
  s[0]
  s[0:1]
  s[1:1]
  s[0:3]
  s[2*2]
  s[1-1:2-1]
  [s4]
  [s4,s5,s6,s7]

Examples of *scalar* registers with an invalid alignment:

.. parsed-literal::

  s[1:2]
  s[2:5]

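The alignment and range rules for *scalar* registers can be sketched in the same spirit.
This is an illustrative Python fragment under the rules stated in this section, not the assembler's code;
``SCALAR_REGISTER_COUNT``, ``required_alignment`` and ``is_valid_scalar_range`` are hypothetical names.

.. code-block:: python

  # Number of scalar registers per GPU (SMAX), taken from the table above.
  SCALAR_REGISTER_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102}

  # Sequence sizes the assembler accepts for scalar registers.
  VALID_SCALAR_SEQUENCE_SIZES = {1, 2, 4, 8, 16}

  def required_alignment(size):
      """Alignment of the first register for a sequence of the given size."""
      if size == 1:
          return 1   # single registers have no alignment requirement
      if size == 2:
          return 2   # pairs must be even-aligned
      return 4       # sequences of 4 or more must be quad-aligned

  def is_valid_scalar_range(n, k, gpu):
      """Return True if s[n:k] satisfies the documented conditions on this GPU."""
      smax = SCALAR_REGISTER_COUNT[gpu]
      if not (0 <= n <= k < smax):
          return False
      size = k - n + 1
      if size not in VALID_SCALAR_SEQUENCE_SIZES:
          return False
      return n % required_alignment(size) == 0

With this sketch, ``s[0:1]`` and ``s[0:3]`` are accepted, while ``s[1:2]`` (odd first register)
and ``s[2:5]`` (not quad-aligned) are rejected, matching the examples in this section.
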
.. _amdgpu_synid_trap:

trap
----

A set of trap handler registers:

* :ref:`ttmp<amdgpu_synid_ttmp>`
* :ref:`tba<amdgpu_synid_tba>`
* :ref:`tma<amdgpu_synid_tma>`

.. _amdgpu_synid_ttmp:

ttmp
----

Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on the GPU:

    ======= ============================
    GPU     Number of *ttmp* registers
    ======= ============================
    GFX7    12
    GFX8    12
    GFX9    16
    ======= ============================

A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.

Pairs of *ttmp* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned.

    ============================================================ ====================================================================
    Syntax                                                       Description
    ============================================================ ====================================================================
    **ttmp**\ <N>                                                A single 32-bit *ttmp* register.

                                                                 *N* must be a decimal integer number.
    **ttmp[**\ <N>\ **]**                                        A single 32-bit *ttmp* register.

                                                                 *N* may be specified as an
                                                                 :ref:`integer number<amdgpu_synid_integer_number>`
                                                                 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **ttmp[**\ <N>:<K>\ **]**                                    A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                 *N* and *K* may be specified as
                                                                 :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                                 or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                 Register indices must be specified as decimal integer numbers.
    ============================================================ ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

  ttmp0
  ttmp[0]
  ttmp[0:1]
  ttmp[1:1]
  ttmp[0:3]
  ttmp[2*2]
  ttmp[1-1:2-1]
  [ttmp4]
  [ttmp4,ttmp5,ttmp6,ttmp7]

Examples of *ttmp* registers with an invalid alignment:

.. parsed-literal::

  ttmp[1:2]
  ttmp[2:5]

.. _amdgpu_synid_tba:

tba
---

Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tba                64-bit *trap base address* register.                                    GFX7, GFX8
    [tba]              64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
    [tba_lo,tba_hi]    64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
    ================== ======================================================================= =============

The high and low 32 bits of *trap base address* may be accessed as separate registers:

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
    tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
    [tba_lo]           Low 32 bits of *trap base address* register (an alternative syntax).    GFX7, GFX8
    [tba_hi]           High 32 bits of *trap base address* register (an alternative syntax).   GFX7, GFX8
    ================== ======================================================================= =============

Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
but *tba* can be read and written using the *s_get_reg* and *s_set_reg* instructions.

.. _amdgpu_synid_tma:

tma
---

Trap memory address, 64 bits wide.

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tma                64-bit *trap memory address* register.                                  GFX7, GFX8
    [tma]              64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
    [tma_lo,tma_hi]    64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
    ================== ======================================================================= =============

The high and low 32 bits of *trap memory address* may be accessed as separate registers:

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tma_lo             Low 32 bits of *trap memory address* register.                          GFX7, GFX8
    tma_hi             High 32 bits of *trap memory address* register.                         GFX7, GFX8
    [tma_lo]           Low 32 bits of *trap memory address* register (an alternative syntax).  GFX7, GFX8
    [tma_hi]           High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
    ================== ======================================================================= =============

Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
but *tma* can be read and written using the *s_get_reg* and *s_set_reg* instructions.

.. _amdgpu_synid_flat_scratch:

flat_scratch
------------

Flat scratch address, 64 bits wide. Holds the base address of scratch memory.

    ================================== ================================================================
    Syntax                             Description
    ================================== ================================================================
    flat_scratch                       64-bit *flat scratch* address register.
    [flat_scratch]                     64-bit *flat scratch* address register (an alternative syntax).
    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an alternative syntax).
    ================================== ================================================================

The high and low 32 bits of *flat scratch* address may be accessed as separate registers:

    ========================= =========================================================================
    Syntax                    Description
    ========================= =========================================================================
    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
    flat_scratch_hi           High 32 bits of *flat scratch* address register.
    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an alternative syntax).
    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an alternative syntax).
    ========================= =========================================================================

.. _amdgpu_synid_xnack:

xnack
-----

Xnack mask, 64 bits wide. Holds a 64-bit mask identifying the threads that
received an *XNACK* due to a vector memory operation.

.. WARNING:: GFX7 does not support the *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support the *xnack* feature.

\

    ============================== =====================================================
    Syntax                         Description
    ============================== =====================================================
    xnack_mask                     64-bit *xnack mask* register.
    [xnack_mask]                   64-bit *xnack mask* register (an alternative syntax).
    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an alternative syntax).
    ============================== =====================================================

The high and low 32 bits of *xnack mask* may be accessed as separate registers:

    ===================== ==============================================================
    Syntax                Description
    ===================== ==============================================================
    xnack_mask_lo         Low 32 bits of *xnack mask* register.
    xnack_mask_hi         High 32 bits of *xnack mask* register.
    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an alternative syntax).
    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an alternative syntax).
    ===================== ==============================================================

.. _amdgpu_synid_vcc:

vcc
---

Vector condition code, 64 bits wide. A bit mask with one bit per thread;
it holds the result of a vector compare operation.

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc              64-bit *vector condition code* register.
    [vcc]            64-bit *vector condition code* register (an alternative syntax).
    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an alternative syntax).
    ================ =========================================================================

The high and low 32 bits of *vector condition code* may be accessed as separate registers:

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc_lo           Low 32 bits of *vector condition code* register.
    vcc_hi           High 32 bits of *vector condition code* register.
    [vcc_lo]         Low 32 bits of *vector condition code* register (an alternative syntax).
    [vcc_hi]         High 32 bits of *vector condition code* register (an alternative syntax).
    ================ =========================================================================

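The relation between a 64-bit register such as *vcc* and its *_lo* and *_hi* halves can be illustrated
with plain integer arithmetic. This is a minimal Python sketch, assuming only what the tables above state
(*_lo* is the low 32 bits, *_hi* is the high 32 bits); ``split64`` and ``join64`` are hypothetical helper names.

.. code-block:: python

  MASK32 = 0xFFFFFFFF

  def split64(value):
      """Split a 64-bit value into its (lo, hi) 32-bit halves."""
      lo = value & MASK32          # bits 0..31, e.g. vcc_lo
      hi = (value >> 32) & MASK32  # bits 32..63, e.g. vcc_hi
      return lo, hi

  def join64(lo, hi):
      """Rebuild the 64-bit value from its (lo, hi) halves."""
      return ((hi & MASK32) << 32) | (lo & MASK32)

The same relation applies to the other 64-bit operands in this document that expose *_lo* and *_hi* halves.
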
.. _amdgpu_synid_m0:

m0
--

A 32-bit memory register. It has various uses,
including register indexing and bounds checking.

    =========== ===================================================
    Syntax      Description
    =========== ===================================================
    m0          A 32-bit *memory* register.
    [m0]        A 32-bit *memory* register (an alternative syntax).
    =========== ===================================================

.. _amdgpu_synid_exec:

exec
----

Execute mask, 64 bits wide. A bit mask with one bit per thread,
which is applied to vector instructions and controls which threads execute
and which ignore the instruction.

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec                  64-bit *execute mask* register.
    [exec]                64-bit *execute mask* register (an alternative syntax).
    [exec_lo,exec_hi]     64-bit *execute mask* register (an alternative syntax).
    ===================== =================================================================

The high and low 32 bits of *execute mask* may be accessed as separate registers:

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec_lo               Low 32 bits of *execute mask* register.
    exec_hi               High 32 bits of *execute mask* register.
    [exec_lo]             Low 32 bits of *execute mask* register (an alternative syntax).
    [exec_hi]             High 32 bits of *execute mask* register (an alternative syntax).
    ===================== =================================================================

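As a rough illustration of a per-thread mask, the Python sketch below lists the threads that a given
*execute mask* value enables. It assumes that bit *i* of the mask corresponds to thread *i*, which this
document does not state explicitly; ``active_threads`` is a hypothetical helper name.

.. code-block:: python

  def active_threads(exec_mask, wave_size=64):
      """Return the indices of threads whose bit is set in exec_mask.

      Assumption: bit i of the mask controls thread i.
      """
      return [t for t in range(wave_size) if (exec_mask >> t) & 1]

For example, a mask of ``0b1011`` enables threads 0, 1 and 3, and a mask of 0 enables no threads.
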
.. _amdgpu_synid_vccz:

vccz
----

A single-bit flag indicating that :ref:`vcc<amdgpu_synid_vcc>` is all zeros.

.. WARNING:: This operand is not currently supported by the AMDGPU assembler.

.. _amdgpu_synid_execz:

execz
-----

A single-bit flag indicating that :ref:`exec<amdgpu_synid_exec>` is all zeros.

.. WARNING:: This operand is not currently supported by the AMDGPU assembler.

.. _amdgpu_synid_scc:

scc
---

A single-bit flag indicating the result of a scalar compare operation.

.. WARNING:: This operand is not currently supported by the AMDGPU assembler.

lds_direct
----------

A special operand which supplies a 32-bit value
fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.

.. WARNING:: This operand is not currently supported by the AMDGPU assembler.

.. _amdgpu_synid_constant:

constant
--------

A set of integer and floating-point *inline constants*:

* :ref:`iconst<amdgpu_synid_iconst>`
* :ref:`fconst<amdgpu_synid_fconst>`

These operands are encoded as part of the instruction.

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
the assembler selects the latter encoding because it is more efficient.

.. _amdgpu_synid_iconst:

iconst
------

An :ref:`integer number<amdgpu_synid_integer_number>`
encoded as an *inline constant*.

Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

Integer *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_const_conv>`.

    ========== ====================================
    Value      Note
    ========== ====================================
    {0..64}    Positive integer inline constants.
    {-16..-1}  Negative integer inline constants.
    ========== ====================================

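The choice between an integer *inline constant* and a *literal* can be sketched as a simple range test.
This Python fragment only illustrates the rule stated above for the numeric ranges in the table;
it ignores operand type conversion, and ``is_integer_inline_constant`` and ``select_integer_encoding``
are hypothetical helper names.

.. code-block:: python

  def is_integer_inline_constant(value):
      """Return True if value is in {-16..-1} or {0..64}."""
      return -16 <= value <= 64

  def select_integer_encoding(value):
      """Prefer the inline constant encoding whenever it is available."""
      if is_integer_inline_constant(value):
          return "inline constant"
      return "literal"

With this sketch, 64 and -16 are encoded as inline constants, while 65 and -17 fall back to literals.
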
.. WARNING:: GFX7 does not support inline constants for *f16* operands.

There are also symbolic inline constants which provide read-only access to H/W registers.

.. WARNING:: These inline constants are not currently supported by the AMDGPU assembler.

\

    ==================== ============================================ =============
    Syntax               Note                                         Availability
    ==================== ============================================ =============
    shared_base          Base address of shared memory region.        GFX9
    shared_limit         Address of the end of shared memory region.  GFX9
    private_base         Base address of private memory region.       GFX9
    private_limit        Address of the end of private memory region. GFX9
    pops_exiting_wave_id A dedicated counter for POPS.                GFX9
    ==================== ============================================ =============

.. _amdgpu_synid_fconst:

fconst
------

A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
encoded as an *inline constant*.

Only a small fraction of floating-point numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

Floating-point *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_const_conv>`.

    ================================ =================================================== =============
    Value                            Note                                                Availability
    ================================ =================================================== =============
    0.0                              The same as integer constant 0.                     All GPUs
    0.5                              Floating-point constant 0.5                         All GPUs
    1.0                              Floating-point constant 1.0                         All GPUs
    2.0                              Floating-point constant 2.0                         All GPUs
    4.0                              Floating-point constant 4.0                         All GPUs
    -0.5                             Floating-point constant -0.5                        All GPUs
    -1.0                             Floating-point constant -1.0                        All GPUs
    -2.0                             Floating-point constant -2.0                        All GPUs
    -4.0                             Floating-point constant -4.0                        All GPUs
    0.1592                           1.0/(2.0*pi). Use only for 16-bit operands.         GFX8, GFX9
    0.15915494                       1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8, GFX9
    0.159154943091895317852646485335 1.0/(2.0*pi).                                       GFX8, GFX9
    ================================ =================================================== =============

.. WARNING:: GFX7 does not support inline constants for *f16* operands.

Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000545.. _amdgpu_synid_literal:
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000546
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000547literal
548-------
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000549
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000550A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream.
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000551
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000552If a number may be encoded as either
553a :ref:`literal<amdgpu_synid_literal>` or
554an :ref:`inline constant<amdgpu_synid_constant>`,
555assembler selects the latter encoding as more efficient.
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000556
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000557Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
558:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
559:ref:`expressions<amdgpu_synid_expression>`
560(expressions are currently supported for 32-bit operands only).
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000561
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000562A 64-bit literal value is converted by assembler
563to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
564as described :ref:`here<amdgpu_synid_lit_conv>`.
Dmitry Preobrazhensky1eaf2d72018-03-12 15:55:08 +0000565
Dmitry Preobrazhensky51120d72018-12-17 17:38:11 +0000566An instruction may use only one literal but several operands may refer the same literal.

.. _amdgpu_synid_uimm8:

uimm8
-----

An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is encoded as part of the opcode, so using it costs no additional encoding space.

.. _amdgpu_synid_uimm32:

uimm32
------

A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is stored as a separate 32-bit dword in the instruction stream.

.. _amdgpu_synid_uimm20:

uimm20
------

A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.

.. _amdgpu_synid_uimm21:

uimm21
------

A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.

.. WARNING:: The assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_simm21:

simm21
------

A 21-bit signed :ref:`integer number<amdgpu_synid_integer_number>`.

.. WARNING:: The assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_off:

off
---

A special entity which indicates that the value of this operand is not used.

    ================================== ===================================================
    Syntax                             Description
    ================================== ===================================================
    off                                Indicates an unused operand.
    ================================== ===================================================


.. _amdgpu_synid_number:

Numbers
=======

.. _amdgpu_synid_integer_number:

Integer Numbers
---------------

Integer numbers are 64 bits wide.
They may be specified in binary, octal, hexadecimal and decimal formats:

    ============== ====================================
    Format         Syntax
    ============== ====================================
    Decimal        [-]?[1-9][0-9]*
    Binary         [-]?0b[01]+
    Octal          [-]?0[0-7]+
    Hexadecimal    [-]?0x[0-9a-fA-F]+
    \              [-]?(0x)?[0-9][0-9a-fA-F]*[hH]
    ============== ====================================

Examples:

.. parsed-literal::

    -1234
    0b1010
    010
    0xff
    0ffh
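
The formats above can be modelled with a few regular expressions.
The following Python sketch is illustrative only and is not the assembler's lexer:
``parse_integer`` is a hypothetical helper, and the optional ``0x`` of the
suffix form is assumed to be a literal prefix.

.. code-block:: python

    import re

    # Patterns transcribed from the table above; the sign is handled separately.
    # Each entry pairs a pattern with the base of its captured digits.
    _FORMATS = [
        (re.compile(r"0x([0-9a-fA-F]+)"), 16),                # 0xff
        (re.compile(r"(?:0x)?([0-9][0-9a-fA-F]*)[hH]"), 16),  # 0ffh
        (re.compile(r"0b([01]+)"), 2),                        # 0b1010
        (re.compile(r"0([0-7]+)"), 8),                        # 010
        (re.compile(r"([1-9][0-9]*)"), 10),                   # 1234
    ]


    def parse_integer(text):
        """Return the value of an integer token (hypothetical helper)."""
        negative = text.startswith("-")
        body = text[1:] if negative else text
        for pattern, base in _FORMATS:
            match = pattern.fullmatch(body)
            if match:
                value = int(match.group(1), base)
                return -value if negative else value
        raise ValueError("not an integer number: " + text)

With these patterns ``parse_integer("010")`` is read as octal and yields 8,
while ``parse_integer("0ffh")`` and ``parse_integer("0xff")`` both yield 255.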

.. _amdgpu_synid_floating-point_number:

Floating-Point Numbers
----------------------

All floating-point numbers are handled as double (64 bits wide).

Floating-point numbers may be specified in hexadecimal and decimal formats:

    ============== ========================================================== ========================================================
    Format         Syntax                                                     Note
    ============== ========================================================== ========================================================
    Decimal        [-]?[0-9]*[.]?[0-9]*([eE][+-]?[0-9]*)?                     Must include either a decimal separator or an exponent.
    Hexadecimal    [-]?0x[0-9a-fA-F]*([.][0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
    ============== ========================================================== ========================================================

Examples:

.. parsed-literal::

    -1.234
    234e2
    -0x1afp-10
    0x.1afp10
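
To see which values the examples above denote, they can be evaluated with
Python's own number parsing. This is only a sketch: ``parse_float`` is a
hypothetical helper, and it assumes that the hexadecimal exponent is a decimal
power of two, as in C99 hexadecimal floating-point constants.

.. code-block:: python

    def parse_float(text):
        """Return the double denoted by a floating-point token (sketch)."""
        if "0x" in text.lower():
            # float.fromhex accepts an optional sign, "0x", hex digits with an
            # optional fraction, and a "p" exponent given in decimal.
            return float.fromhex(text)
        return float(text)


    examples = ["-1.234", "234e2", "-0x1afp-10", "0x.1afp10"]
    values = [parse_float(token) for token in examples]

Under that assumption ``-0x1afp-10`` is -431/1024 and ``0x.1afp10`` is 107.75.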

.. _amdgpu_synid_expression:

Expressions
===========

An expression specifies an address or a numeric value.
There are two kinds of expressions:

* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.

.. _amdgpu_synid_absolute_expression:

Absolute Expressions
--------------------

The value of an absolute expression remains the same after program relocation.
Absolute expressions must not include unassigned or relocatable values
such as labels.

Examples:

.. parsed-literal::

    x = -1
    y = x + 10

.. _amdgpu_synid_relocatable_expression:

Relocatable Expressions
-----------------------

The value of a relocatable expression depends on program relocation.

Note that use of relocatable expressions is limited to branch targets
and 32-bit :ref:`literals<amdgpu_synid_literal>`.

Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.

Examples:

.. parsed-literal::

    y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
    z = .

Expression Data Type
--------------------

Expressions and operands of expressions are interpreted as 64-bit integers.

Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
However, these operands are also handled as 64-bit integers
using the binary representation of the specified floating-point numbers.
No conversion from floating-point to integer is performed.

Examples:

.. parsed-literal::

    x = 0.1    // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
    y = x + x  // y is a sum of two integer values; it is not equal to 0.2!
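
The integer quoted in the example is the IEEE 754 binary64 bit pattern of 0.1.
The following Python sketch reproduces it; ``double_bits`` is a hypothetical
helper introduced here for illustration.

.. code-block:: python

    import struct


    def double_bits(value):
        """Return the binary64 bit pattern of value as an unsigned integer."""
        return struct.unpack("<Q", struct.pack("<d", value))[0]


    x = double_bits(0.1)   # 4591870180066957722 (0x3FB999999999999A)
    y = x + x              # integer sum of two bit patterns

    # The sum of the bit patterns is not the bit pattern of 0.2.
    y_is_bits_of_0_2 = (y == double_bits(0.2))   # False

This is why ``y = x + x`` does not produce 0.2: the addition is applied to the
integer encodings, not to the floating-point values.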

Syntax
------

Expressions are composed of
:ref:`symbols<amdgpu_synid_symbol>`,
:ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.

Expressions may also use "." which is a reference to the current PC (program counter).

The syntax of expressions is shown below::

    expr ::= expr binop expr | primaryexpr ;

    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;

    binop ::= '&&'
            | '||'
            | '|'
            | '^'
            | '&'
            | '!'
            | '=='
            | '!='
            | '<>'
            | '<'
            | '<='
            | '>'
            | '>='
            | '<<'
            | '>>'
            | '+'
            | '-'
            | '*'
            | '/'
            | '%' ;

    unop ::= '~'
           | '+'
           | '-'
           | '!' ;

.. _amdgpu_synid_expression_bin_op:

Binary Operators
----------------

Binary operators are described in the following table.
They operate on and produce 64-bit integers.
Operators with higher priority are performed first.

    ========== ========= ===============================================
    Operator   Priority  Meaning
    ========== ========= ===============================================
    \*         5         Integer multiplication.
    /          5         Integer division.
    %          5         Integer signed remainder.
    \+         4         Integer addition.
    \-         4         Integer subtraction.
    <<         3         Integer shift left.
    >>         3         Logical shift right.
    ==         2         Equality comparison.
    !=         2         Inequality comparison.
    <>         2         Inequality comparison.
    <          2         Signed less than comparison.
    <=         2         Signed less than or equal comparison.
    >          2         Signed greater than comparison.
    >=         2         Signed greater than or equal comparison.
    \|         1         Bitwise or.
    ^          1         Bitwise xor.
    &          1         Bitwise and.
    &&         0         Logical and.
    ||         0         Logical or.
    ========== ========= ===============================================
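
The priorities in this table differ from those of C and Python, where ``&``
binds more tightly than ``|``. The following Python sketch evaluates a small
subset of expressions by precedence climbing to make the difference visible.
It is not the assembler's parser: the names are hypothetical, comparison and
logical operators are omitted, and two behaviours that the table does not
state are assumed (operators of equal priority group left to right, and
division rounds toward zero).

.. code-block:: python

    import re

    # Priorities copied from the table above (subset only).
    PRIORITY = {"*": 5, "/": 5, "%": 5, "+": 4, "-": 4,
                "<<": 3, ">>": 3, "|": 1, "^": 1, "&": 1}

    MASK64 = (1 << 64) - 1


    def wrap64(value):
        """Reduce value to a signed 64-bit integer."""
        value &= MASK64
        return value - (1 << 64) if value >> 63 else value


    def trunc_div(a, b):
        """Division rounding toward zero (an assumption of this sketch)."""
        q = abs(a) // abs(b)
        return q if (a < 0) == (b < 0) else -q


    OPS = {
        "*": lambda a, b: a * b,
        "/": trunc_div,
        "%": lambda a, b: a - b * trunc_div(a, b),
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "<<": lambda a, b: a << b,
        ">>": lambda a, b: (a & MASK64) >> b,  # logical: zero-fill from the left
        "|": lambda a, b: a | b,
        "^": lambda a, b: a ^ b,
        "&": lambda a, b: a & b,
    }


    def evaluate(text):
        """Evaluate unsigned decimal operands joined by binary operators."""
        tokens = re.findall(r"\d+|<<|>>|[-+*/%|^&]", text)
        pos = 0

        def parse(min_priority):
            nonlocal pos
            left = int(tokens[pos])
            pos += 1
            while pos < len(tokens) and PRIORITY[tokens[pos]] >= min_priority:
                op = tokens[pos]
                pos += 1
                right = parse(PRIORITY[op] + 1)  # assumed left-associative
                left = wrap64(OPS[op](left, right))
            return left

        return parse(0)

Under these assumptions ``evaluate("1 | 2 & 4")`` groups as ``(1 | 2) & 4`` and
yields 0, whereas the same text evaluated as a Python expression yields 1.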

.. _amdgpu_synid_expression_un_op:

Unary Operators
---------------

Unary operators are described in the following table.
They operate on and produce 64-bit integers.

    ========== ===============================================
    Operator   Meaning
    ========== ===============================================
    !          Logical negation.
    ~          Bitwise negation.
    \+         Integer unary plus.
    \-         Integer unary minus.
    ========== ===============================================

.. _amdgpu_synid_symbol:

Symbols
-------

A symbol is a named 64-bit value, representing a relocatable
address or an absolute (non-relocatable) number.

Symbol names have the following syntax:
    ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``

The table below provides several examples of syntax used for symbol definition.

    ================ ==========================================================
    Syntax           Meaning
    ================ ==========================================================
    .globl <S>       Declares a global symbol S without assigning it a value.
    .set <S>, <E>    Assigns the value of an expression E to a symbol S.
    <S> = <E>        Assigns the value of an expression E to a symbol S.
    <S>:             Declares a label S and assigns it the current PC value.
    ================ ==========================================================

A symbol may be used before it is declared or assigned;
unassigned symbols are assumed to be PC-relative.

Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
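
The symbol name syntax given above can be checked mechanically. The following
Python sketch uses that pattern verbatim; ``is_symbol_name`` is a hypothetical
helper and the sample names are made up for illustration.

.. code-block:: python

    import re

    # Pattern copied from the symbol name syntax above.
    SYMBOL_NAME = re.compile(r"[a-zA-Z_.][a-zA-Z0-9_$.@]*")


    def is_symbol_name(text):
        """Return True if text matches the symbol name syntax."""
        return SYMBOL_NAME.fullmatch(text) is not None

Names such as ``.Lloop$1`` match the pattern, while ``1st`` and ``$x`` do not,
because a digit or ``$`` is not allowed as the first character.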

.. _amdgpu_synid_conv:

Conversions
===========

This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
:ref:`symbol<amdgpu_synid_symbol>`
is used for an operand which has a different type or size.

Depending on operand kind, this conversion is performed by either the assembler or AMDGPU hardware:

* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by hardware.
* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by the assembler.

.. _amdgpu_synid_const_conv:

Inline Constants
----------------

.. _amdgpu_synid_int_const_conv:

Integer Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~

Integer :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`integer numbers<amdgpu_synid_integer_number>`;
when used as operands they are truncated to the size of
:ref:`expected operand type<amdgpu_syn_instruction_type>`.
No data type conversions are performed.

Examples:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, -1, 0    // v0 = 0xFFFF
    v_add_f16 v0, -1, 0    // v0 = 0xFFFF (NaN)

    v_add_u32 v0, -1, 0    // v0 = 0xFFFFFFFF
    v_add_f32 v0, -1, 0    // v0 = 0xFFFFFFFF (NaN)
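
The truncation described above, and the reason the floating-point variants
report NaN, can be reproduced with the following Python sketch. It only models
the bit patterns of the operand; the helper names are hypothetical.

.. code-block:: python

    import math
    import struct


    def truncate(value, bits):
        """Keep the low bits of a 64-bit integer; no type conversion."""
        return value & ((1 << bits) - 1)


    def as_f16(bits16):
        """Reinterpret a 16-bit pattern as an IEEE 754 half."""
        return struct.unpack("<e", struct.pack("<H", bits16))[0]


    def as_f32(bits32):
        """Reinterpret a 32-bit pattern as an IEEE 754 single."""
        return struct.unpack("<f", struct.pack("<I", bits32))[0]


    operand16 = truncate(-1, 16)   # 0xFFFF
    operand32 = truncate(-1, 32)   # 0xFFFFFFFF

Both patterns have all exponent bits set and a non-zero mantissa, so
``as_f16(operand16)`` and ``as_f32(operand32)`` are NaN.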

.. _amdgpu_synid_fp_const_conv:

Floating-Point Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Floating-point :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
when used as operands they are converted to a floating-point number of
:ref:`expected operand size<amdgpu_syn_instruction_type>`.

Examples:

.. parsed-literal::

    // GFX9

    v_add_f16 v0, 1.0, 0    // v0 = 0x3C00 (1.0)
    v_add_u16 v0, 1.0, 0    // v0 = 0x3C00

    v_add_f32 v0, 1.0, 0    // v0 = 0x3F800000 (1.0)
    v_add_u32 v0, 1.0, 0    // v0 = 0x3F800000
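
The bit patterns in these examples are the IEEE 754 half and single encodings
of 1.0. A minimal Python sketch, with hypothetical helper names, that
reproduces them:

.. code-block:: python

    import struct


    def f16_bits(value):
        """Return the IEEE 754 half encoding of value."""
        return struct.unpack("<H", struct.pack("<e", value))[0]


    def f32_bits(value):
        """Return the IEEE 754 single encoding of value."""
        return struct.unpack("<I", struct.pack("<f", value))[0]


    half_one = f16_bits(1.0)     # 0x3C00
    single_one = f32_bits(1.0)   # 0x3F800000

The integer instructions in the examples receive the same bit patterns; only
the interpretation of those bits differs.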


.. _amdgpu_synid_lit_conv:

Literals
--------

.. _amdgpu_synid_int_lit_conv:

Integer Literals
~~~~~~~~~~~~~~~~

Integer :ref:`literals<amdgpu_synid_literal>`
are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.

When used as operands they are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.

    ============== ============== =============== ====================================================================
    Expected type  Condition      Result          Note
    ============== ============== =============== ====================================================================
    i16, u16, b16  cond(num,16)   num.u16         Truncate to 16 bits.
    i32, u32, b32  cond(num,32)   num.u32         Truncate to 32 bits.
    i64            cond(num,32)   {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
    u64, b64       cond(num,32)   { 0,num.u32}    Truncate to 32 bits and then zero-extend the result to 64 bits.
    f16            cond(num,16)   num.u16         Use low 16 bits as an f16 value.
    f32            cond(num,32)   num.u32         Use low 32 bits as an f32 value.
    f64            cond(num,32)   {num.u32,0}     Use low 32 bits of the number as high 32 bits
                                                  of the result; low 32 bits of the result are zeroed.
    ============== ============== =============== ====================================================================

The condition *cond(X,S)* indicates if a 64-bit number *X*
can be converted to a smaller size *S* by truncation of upper bits.
There are two cases when the conversion is possible:

* The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its MSB set.

Examples of valid literals:

.. parsed-literal::

    // GFX9
                                             // Literal value after conversion:
    v_add_u16 v0, 0xff00, v0                 // 0xff00
    v_add_u16 v0, 0xffffffffffffff00, v0     // 0xff00
    v_add_u16 v0, -256, v0                   // 0xff00
                                             // Literal value after conversion:
    s_bfe_i64 s[0:1], 0xffefffff, s3         // 0xffffffffffefffff
    s_bfe_u64 s[0:1], 0xffefffff, s3         // 0x00000000ffefffff
    v_ceil_f64_e32 v[0:1], 0xffefffff        // 0xffefffff00000000 (-1.7976922776554302e308)

Examples of invalid literals:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, 0x1ff00, v0                // truncated bits are not all 0 or 1
    v_add_u16 v0, 0xffffffffffff00ff, v0     // truncated bits do not match MSB of the result
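
The conversion table and the condition *cond(X,S)* can be summarised in a few
lines of Python. This is a sketch of the rules as stated in this section, not
the assembler's implementation; ``cond`` mirrors the notation used above and
``convert_int_literal`` is a hypothetical helper that takes the expected type
as a string such as ``"u16"`` or ``"f64"``.

.. code-block:: python

    MASK64 = (1 << 64) - 1


    def cond(num, size):
        """True if the 64-bit number can be truncated to size bits."""
        num &= MASK64                      # two's complement view of num
        upper = num >> size                # the bits that would be dropped
        msb = (num >> (size - 1)) & 1      # MSB of the truncated value
        all_ones = (1 << (64 - size)) - 1
        return upper == 0 or (upper == all_ones and msb == 1)


    def convert_int_literal(num, expected):
        """Return the operand value for an integer literal (sketch)."""
        kind, size = expected[0], int(expected[1:])
        width = min(size, 32)              # 64-bit types use cond(num,32)
        if not cond(num, width):
            raise ValueError("invalid literal for " + expected)
        low = num & ((1 << width) - 1)
        if size < 64:
            return low                     # num.u16 or num.u32
        if kind == "f":
            return low << 32               # {num.u32, 0}
        if kind == "i" and low >> 31:
            return low | (0xFFFFFFFF << 32)  # sign-extend
        return low                         # zero-extend

For example, ``convert_int_literal(-256, "u16")`` returns ``0xff00``, while
``cond(0x1ff00, 16)`` and ``cond(0xffffffffffff00ff, 16)`` are both false,
matching the invalid literals listed above.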

.. _amdgpu_synid_fp_lit_conv:

Floating-Point Literals
~~~~~~~~~~~~~~~~~~~~~~~

Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.

When used as operands they are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.

    ============== ============== ================= =================================================================
    Expected type  Condition      Result            Note
    ============== ============== ================= =================================================================
    i16, u16, b16  cond(num,16)   f16(num)          Convert to f16 and use bits of the result as an integer value.
    i32, u32, b32  cond(num,32)   f32(num)          Convert to f32 and use bits of the result as an integer value.
    i64, u64, b64  false          \-                Conversion disabled because of unclear semantics.
    f16            cond(num,16)   f16(num)          Convert to f16.
    f32            cond(num,32)   f32(num)          Convert to f32.
    f64            true           {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                    zero-fill low 32 bits of the result.

                                                    Note that the result may differ from the original number.
    ============== ============== ================= =================================================================

The condition *cond(X,S)* indicates if an f64 number *X* can be converted
to a smaller *S*-bit floating-point type without overflow or underflow.
Loss of precision is allowed.

Examples of valid literals:

.. parsed-literal::

    // GFX9

    v_add_f16 v1, 65500.0, v2
    v_add_f32 v1, 65600.0, v2

    // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
    // Literal value after conversion:  1.7976922776554302e308 (0x7fefffff00000000)
    v_ceil_f64 v[0:1], 1.7976931348623157e308

Examples of invalid literals:

.. parsed-literal::

    // GFX9

    v_add_f16 v1, 65600.0, v2    // overflow
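
The table above can be approximated in Python with the ``struct`` module,
which rounds a double to half or single precision and raises
``OverflowError`` when the value does not fit. This is a sketch only:
``convert_fp_literal`` is a hypothetical helper, it detects overflow but does
not model the underflow check, and it returns the resulting bit pattern.

.. code-block:: python

    import struct


    def double_bits(value):
        """Return the binary64 bit pattern of value."""
        return struct.unpack("<Q", struct.pack("<d", value))[0]


    def convert_fp_literal(num, expected):
        """Return the operand bits for a floating-point literal (sketch)."""
        size = int(expected[1:])
        if size == 64:
            if expected != "f64":
                raise ValueError("conversion to " + expected + " is disabled")
            return double_bits(num) & 0xFFFFFFFF00000000  # {num.u32.hi, 0}
        pack_fmt, bits_fmt = ("<e", "<H") if size == 16 else ("<f", "<I")
        try:
            packed = struct.pack(pack_fmt, num)  # rounds to f16 or f32
        except OverflowError:
            raise ValueError("literal overflows " + expected)
        return struct.unpack(bits_fmt, packed)[0]

With this sketch 65500.0 converts to the f16 pattern ``0x7BFF`` (65504.0, so
precision is lost but the value is accepted), whereas 65600.0 is rejected for
f16 and accepted for f32.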

.. _amdgpu_synid_exp_conv:

Expressions
~~~~~~~~~~~

Expressions operate on and result in 64-bit integers.

When used as operands they are truncated to
:ref:`expected operand size<amdgpu_syn_instruction_type>`.
No data type conversions are performed.

Examples:

.. parsed-literal::

    // GFX9

    x = 0.1
    v_sqrt_f32 v0, x           // v0 = [low 32 bits of 0.1 (double)]
    v_sqrt_f32 v0, (0.1 + 0)   // the same as above
    v_sqrt_f32 v0, 0.1         // v0 = [0.1 (double) converted to float]
1063 v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float]
1064