Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: skl enable the core and get ROM init #120

Conversation

zhuyingjiang
Copy link

@zhuyingjiang zhuyingjiang commented Sep 10, 2018

enable the kbl DSP core.
get the ROM init.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall needing more comments explaining what each block does.


// skl/kbl enable core and code loader DMA has some difference with apl/cnl
// add the APIs for skl/kbl
static int cl_stream_prepare_skl(struct snd_sof_dev *sdev, unsigned int format,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed to set tag to 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is for debug, this function will be implement in next patch. and if not set a value, checkpatch will give a warning.

@@ -372,3 +372,148 @@ int hda_dsp_cl_boot_firmware(struct snd_sof_dev *sdev)
dev_err(sdev->dev, "error: load fw failed err: %d\n", ret);
return ret;
}

// skl/kbl enable core and code loader DMA has some difference with apl/cnl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always use C style comments /* comment */

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will change that.

}

/* prepare DMA for code loader stream */
tag = cl_stream_prepare_skl(sdev, 0x40, fwsize, &sdev->dmab,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just set tag = 0 at start of this function if it's always going to be 0 ???

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like answered above, the next patch will get the true tag value, this is only for debug.

int tag, ret, i;
u32 hipcie;
u32 reg;
const struct sof_intel_dsp_desc *chip = sdev->hda->desc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structures should always be at the top of the variables.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will change that.

}

static int cl_dsp_init_skl(struct snd_sof_dev *sdev, const void *fwdata,
u32 fwsize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, github can skew the formatting a little, so it's best to always run checkpatch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run the checkpatch, but don't know why here is still not aligned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just github rendering, if checkpatch is OK then you are good.


memcpy(sdev->dmab.area, fwdata, fwsize);

snd_sof_dsp_update_bits(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls comment what we are setting and why

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so every register set need to be comment? Or only the main part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally, every register if its not obvious.

u32 reg;
const struct sof_intel_dsp_desc *chip = sdev->hda->desc;

if (hda_dsp_core_is_enabled(sdev, HDA_DSP_CORE_MASK(0))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to comment what this block does, and why we check core status here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will have comment on that.

goto err;
}
} else {
ret = hda_dsp_core_reset_power_down(sdev, HDA_DSP_CORE_MASK(0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please comment this block

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


static int cl_copy_fw_skl(struct snd_sof_dev *sdev, int tag)
{
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is just for debug ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

struct firmware stripped_firmware;
int ret, tag;

stripped_firmware.data = plat_data->fw->data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to temporarily add an offset here to strip the extended manifest when using the closed source firmware (as it has an extended manifest pre pended ato the start of the FW binary file) otherwise the FW will not authenticate or boot..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ask @keyonjie for extended manifest details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will ask Keyon what is the offset. We still need to check if the firmware is rightly booted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, FW will not boot without the offset change and remove of extended manifest

1. enable the skl core
2. get the ROM int

Signed-off-by: Zhu Yingjiang <yingjiang.zhu@linux.intel.com>
@zhuyingjiang
Copy link
Author

@lgirdwood

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this PR is the correct one as it looks more complete than your other PR ? Can you close the old PR that way I know I'm reviewing the correct one.

unsigned int size, struct snd_dma_buffer *dmab,
int direction)
{
/* the skl cl dma don't use stream tag, ret is for debug */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this func if its not needed.

}

/* prepare DMA for code loader stream */
ret = cl_stream_prepare_skl(sdev, 0x40, fwsize, &sdev->dmab,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic number 0x40, please use a macro

ranj063 pushed a commit to ranj063/linux that referenced this pull request Nov 28, 2018
commit 5c4c450 upstream.

The parameters of v4l2_ctrl_new_std_menu_items() are tricky: instead of
the number of possible values, it requires the number of the maximum
value. In other words, the ARRAY_SIZE() value should be decremented,
otherwise it will go past the array bounds, as warned by KASAN:

[  279.839688] BUG: KASAN: global-out-of-bounds in v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839709] Read of size 8 at addr ffffffffc10a4cb0 by task v4l2-compliance/16676

[  279.839736] CPU: 1 PID: 16676 Comm: v4l2-compliance Not tainted 4.18.0-rc2+ thesofproject#120
[  279.839741] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017
[  279.839743] Call Trace:
[  279.839758]  dump_stack+0x71/0xab
[  279.839807]  ? v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839817]  print_address_description+0x1c9/0x270
[  279.839863]  ? v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839871]  kasan_report+0x237/0x360
[  279.839918]  v4l2_querymenu+0x10d/0x180 [videodev]
[  279.839964]  __video_do_ioctl+0x2c8/0x590 [videodev]
[  279.840011]  ? copy_overflow+0x20/0x20 [videodev]
[  279.840020]  ? avc_ss_reset+0xa0/0xa0
[  279.840028]  ? check_stack_object+0x21/0x60
[  279.840036]  ? __check_object_size+0xe7/0x240
[  279.840080]  video_usercopy+0xed/0x730 [videodev]
[  279.840123]  ? copy_overflow+0x20/0x20 [videodev]
[  279.840167]  ? v4l_enumstd+0x40/0x40 [videodev]
[  279.840177]  ? __handle_mm_fault+0x9f9/0x1ba0
[  279.840186]  ? __pmd_alloc+0x2c0/0x2c0
[  279.840193]  ? __vfs_write+0xb6/0x350
[  279.840200]  ? kernel_read+0xa0/0xa0
[  279.840244]  ? video_usercopy+0x730/0x730 [videodev]
[  279.840284]  v4l2_ioctl+0xa1/0xb0 [videodev]
[  279.840295]  do_vfs_ioctl+0x117/0x8a0
[  279.840303]  ? selinux_file_ioctl+0x211/0x2f0
[  279.840313]  ? ioctl_preallocate+0x120/0x120
[  279.840319]  ? selinux_capable+0x20/0x20
[  279.840332]  ksys_ioctl+0x70/0x80
[  279.840342]  __x64_sys_ioctl+0x3d/0x50
[  279.840351]  do_syscall_64+0x6d/0x1c0
[  279.840361]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  279.840367] RIP: 0033:0x7fdfb46275d7
[  279.840369] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[  279.840474] RSP: 002b:00007ffee1179038 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  279.840483] RAX: ffffffffffffffda RBX: 00007ffee1179180 RCX: 00007fdfb46275d7
[  279.840488] RDX: 00007ffee11790c0 RSI: 00000000c02c5625 RDI: 0000000000000003
[  279.840493] RBP: 0000000000000002 R08: 0000000000000020 R09: 00000000009f0902
[  279.840497] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffee117a5a0
[  279.840501] R13: 00007ffee11790c0 R14: 0000000000000002 R15: 0000000000000000

[  279.840515] The buggy address belongs to the variable:
[  279.840535]  tvp5150_test_patterns+0x10/0xffffffffffffe360 [tvp5150]

Fixes: c43875f ("[media] tvp5150: replace MEDIA_ENT_F_CONN_TEST by a control")
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
naveen-manohar pushed a commit to naveen-manohar/linux that referenced this pull request Jun 22, 2020
commit 1413ef6 upstream.

The struct cdev is embedded in the struct i2c_dev. In the current code,
we would free the i2c_dev struct directly in put_i2c_dev(), but the
cdev is manged by a kobject, and the release of it is not predictable.
So it is very possible that the i2c_dev is freed before the cdev is
entirely released. We can easily get the following call trace with
CONFIG_DEBUG_KOBJECT_RELEASE and CONFIG_DEBUG_OBJECTS_TIMERS enabled.
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x38
  WARNING: CPU: 19 PID: 1 at lib/debugobjects.c:325 debug_print_object+0xb0/0xf0
  Modules linked in:
  CPU: 19 PID: 1 Comm: swapper/0 Tainted: G        W         5.2.20-yocto-standard+ thesofproject#120
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  pstate: 80c00089 (Nzcv daIf +PAN +UAO)
  pc : debug_print_object+0xb0/0xf0
  lr : debug_print_object+0xb0/0xf0
  sp : ffff00001292f7d0
  x29: ffff00001292f7d0 x28: ffff800b82151788
  x27: 0000000000000001 x26: ffff800b892c0000
  x25: ffff0000124a2558 x24: 0000000000000000
  x23: ffff00001107a1d8 x22: ffff0000116b5088
  x21: ffff800bdc6afca8 x20: ffff000012471ae8
  x19: ffff00001168f2c8 x18: 0000000000000010
  x17: 00000000fd6f304b x16: 00000000ee79de43
  x15: ffff800bc0e80568 x14: 79616c6564203a74
  x13: 6e6968207473696c x12: 5f72656d6974203a
  x11: ffff0000113f0018 x10: 0000000000000000
  x9 : 000000000000001f x8 : 0000000000000000
  x7 : ffff0000101294cc x6 : 0000000000000000
  x5 : 0000000000000000 x4 : 0000000000000001
  x3 : 00000000ffffffff x2 : 0000000000000000
  x1 : 387fc15c8ec0f200 x0 : 0000000000000000
  Call trace:
   debug_print_object+0xb0/0xf0
   __debug_check_no_obj_freed+0x19c/0x228
   debug_check_no_obj_freed+0x1c/0x28
   kfree+0x250/0x440
   put_i2c_dev+0x68/0x78
   i2cdev_detach_adapter+0x60/0xc8
   i2cdev_notifier_call+0x3c/0x70
   notifier_call_chain+0x8c/0xe8
   blocking_notifier_call_chain+0x64/0x88
   device_del+0x74/0x380
   device_unregister+0x54/0x78
   i2c_del_adapter+0x278/0x2d0
   unittest_i2c_bus_remove+0x3c/0x80
   platform_drv_remove+0x30/0x50
   device_release_driver_internal+0xf4/0x1c0
   driver_detach+0x58/0xa0
   bus_remove_driver+0x84/0xd8
   driver_unregister+0x34/0x60
   platform_driver_unregister+0x20/0x30
   of_unittest_overlay+0x8d4/0xbe0
   of_unittest+0xae8/0xb3c
   do_one_initcall+0xac/0x450
   do_initcall_level+0x208/0x224
   kernel_init_freeable+0x2d8/0x36c
   kernel_init+0x18/0x108
   ret_from_fork+0x10/0x1c
  irq event stamp: 3934661
  hardirqs last  enabled at (3934661): [<ffff00001009fa04>] debug_exception_exit+0x4c/0x58
  hardirqs last disabled at (3934660): [<ffff00001009fb14>] debug_exception_enter+0xa4/0xe0
  softirqs last  enabled at (3934654): [<ffff000010081d94>] __do_softirq+0x46c/0x628
  softirqs last disabled at (3934649): [<ffff0000100b4a1c>] irq_exit+0x104/0x118

This is a common issue when using cdev embedded in a struct.
Fortunately, we already have a mechanism to solve this kind of issue.
Please see commit 233ed09 ("chardev: add helper function to
register char devs with a struct device") for more detail.

In this patch, we choose to embed the struct device into the i2c_dev,
and use the API provided by the commit 233ed09 to make sure that
the release of i2c_dev and cdev are in sequence.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
aiChaoSONG pushed a commit to aiChaoSONG/linux that referenced this pull request May 6, 2021
Vamshigopal pushed a commit to Vamshigopal/linux that referenced this pull request Jan 24, 2023
[ Upstream commit 031af50 ]

The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960b ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   10:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   20:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   44:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   48:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   4c:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, thesofproject#8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   14:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   24:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, thesofproject#8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   4c:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   50:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   54:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, thesofproject#8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ujfalusi pushed a commit to ujfalusi/sof-linux that referenced this pull request Feb 17, 2023
The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960b ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   10:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   20:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   44:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   48:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   4c:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, thesofproject#8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   14:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   24:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, thesofproject#8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // thesofproject#18
|   4c:   d2800681        mov     x1, #0x34                       // thesofproject#52
|   50:   d2800ac2        mov     x2, #0x56                       // thesofproject#86
|   54:   d2800f03        mov     x3, #0x78                       // thesofproject#120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, thesofproject#8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
vijendarmukunda pushed a commit to vijendarmukunda/linux that referenced this pull request Feb 13, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  thesofproject#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  thesofproject#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  thesofproject#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  thesofproject#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  thesofproject#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  thesofproject#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  thesofproject#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  thesofproject#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  thesofproject#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  thesofproject#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  thesofproject#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  thesofproject#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  thesofproject#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants