Slides of my talk at Open Source Summit NA

Today I delivered a talk at Open Source Summit NA, here in LA, about everything we’ve been doing to support explicit synchronization in the Media and Graphics pipelines in the kernel. You can find the slides here.

The DRM side is already in mainline; V4L2 is now the focus of my work, together with the linux-media community in the kernel. Posts about that should appear on this blog soon.

Collabora Contributions to Linux Kernel 4.10

Linux Kernel v4.10 is out and this time Collabora contributed a total of 39 patches by 10 different developers. You can read more about the v4.10 merge window on LWN.net: part 1, part 2 and part 3.

Now here is a look at the changes made by Collaborans. To begin with, Daniel Stone fixed an issue when waiting for fences in the i915 driver, while Emil Velikov added support for reading the PCI revision from sysfs to improve startup time in some applications.

Emilio López added a set of selftests for the Sync File Framework and Enric Balletbo i Serra added support for the ChromeOS Embedded Controller Sensor Hub. Fabien Lahoudere added support for the NVD9128 simple panel and enabled ULPI phy for USB on i.MX.

Gabriel Krisman fixed spurious CARD_INT interrupts for SD cards that were preventing one of our kernelCI machines from booting. On the graphics side, Gustavo Padovan added Explicit Synchronization support to DRM/KMS.

Martyn Welch added GPIO support for the CP2105 USB serial device, while Nicolas Dufresne fixed Exynos4 FIMC to round up imagesize to the row size for tiled formats; otherwise there would not be enough space to fit the last row of the image. Last but not least, Tomeu Vizoso added a debugfs interface to capture frame CRCs, which is quite helpful for debugging and automated graphics testing.

And now the complete list of Collabora contributions:

Daniel Stone (1):

Emil Velikov (1):

Emilio López (7):

Enric Balletbo i Serra (3):

Fabien Lahoudere (4):

Gabriel Krisman Bertazi (1):

Gustavo Padovan (18):

Martyn Welch (1):

Nicolas Dufresne (1):

Tomeu Vizoso (2):

Mainline Explicit Fencing – part 3

In the last two articles we talked about how Explicit Fencing can help the graphics pipeline in general and what happened in the effort to upstream the Android Sync Framework. Now, in the third post of this series, we will go through the Explicit Fencing implementation in DRM and other elements of the graphics stack.

The DRM implementation builds on top of two kernel infrastructures: struct dma_fence, which represents the fence, and struct sync_file, which provides the file descriptors shared with userspace (as discussed in the previous articles). With fencing, the display infrastructure needs to wait for a signal on that fence before displaying the buffer on the screen. In an Explicit Fencing implementation that fence is sent from userspace to the kernel. The display infrastructure also sends a fence back to userspace, encapsulated in a struct sync_file, that will be signalled when the buffer is scanned out on the screen. The same process happens on the rendering side.

Atomic Modesetting is mandatory, and there are no plans to support the legacy APIs. The fence that DRM will wait on needs to be passed via the IN_FENCE_FD property of each DRM plane, which means the kernel receives one sync_file fd, containing one or more dma_fences, per plane. Remember that in DRM a plane directly relates to a framebuffer, so one can also say that there is one sync_file per framebuffer.
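
To make the per-plane and per-CRTC split concrete, here is a minimal sketch of how userspace might fill an atomic request. It assumes a simplified, hypothetical request type (struct atomic_req, req_add(), add_fences()) that only mimics the (object id, property id, value) triples collected by libdrm's drmModeAtomicAddProperty(). The ids used are placeholders: real property ids have to be looked up by name (for example through drmModeObjectGetProperties()) before the request is committed with drmModeAtomicCommit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an atomic request: a flat list of
 * (object id, property id, value) triples, similar in shape to what
 * libdrm's drmModeAtomicAddProperty() accumulates. */
struct prop_entry {
    uint32_t obj_id;
    uint32_t prop_id;
    uint64_t value;
};

struct atomic_req {
    struct prop_entry entries[16];
    size_t count;
};

/* Append one property; returns 0 on success, -1 if the request is full. */
int req_add(struct atomic_req *req, uint32_t obj, uint32_t prop, uint64_t value)
{
    if (req->count >= sizeof(req->entries) / sizeof(req->entries[0]))
        return -1;
    req->entries[req->count++] = (struct prop_entry){ obj, prop, value };
    return 0;
}

/* Attach one sync_file fd to each plane's IN_FENCE_FD property and point
 * the CRTC's OUT_FENCE_PTR property at *out_fence_fd, where the kernel
 * would store the returned fence fd. Property ids are parameters because
 * real ones must be looked up by name at runtime. */
int add_fences(struct atomic_req *req,
               const uint32_t *plane_ids, const int *in_fence_fds, size_t nplanes,
               uint32_t in_fence_prop, uint32_t crtc_id, uint32_t out_fence_prop,
               int *out_fence_fd)
{
    for (size_t i = 0; i < nplanes; i++) {
        /* Sign-extend so that -1 ("no fence") keeps its meaning. */
        uint64_t v = (uint64_t)(int64_t)in_fence_fds[i];
        if (req_add(req, plane_ids[i], in_fence_prop, v) < 0)
            return -1;
    }
    *out_fence_fd = -1; /* initialized before the kernel fills it in */
    return req_add(req, crtc_id, out_fence_prop, (uint64_t)(uintptr_t)out_fence_fd);
}
```

The point to notice is that IN_FENCE_FD carries an fd value, while OUT_FENCE_PTR carries an address: the kernel writes the new fence fd into the int it points to.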

On the other hand, for the fences created by the kernel and sent back to userspace, the OUT_FENCE_PTR property is used. It is a DRM CRTC property because we only create one dma_fence per CRTC, as all the buffers on it will be scanned out at the same time. The kernel sends this fence back to userspace by writing the fd number to the pointer provided in the OUT_FENCE_PTR property. Note that, unlike what Android did, when the fence signals it means the previous buffer (the buffer removed from the screen) is free for reuse. On Android, when the signal was raised it meant the current buffer was freed. However, the Android folks have already patched SurfaceFlinger to support the Mainline semantics when using Explicit Fencing!
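
These release semantics are easy to get backwards, so here is a toy model of them. It is a sketch under the assumption that the out-fence signals at the vblank where the newly committed buffer takes over the screen, which is exactly when the previous buffer becomes free. All types and functions (struct toy_crtc, crtc_commit(), crtc_vblank()) are hypothetical; no real DRM code is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy fence: just a flag. */
struct toy_fence { bool signaled; };

struct toy_crtc {
    int on_screen;          /* buffer currently scanned out, -1 if none */
    int pending;            /* buffer queued by the last commit, -1 if none */
    struct toy_fence *out;  /* out-fence handed back for the pending commit */
    int released;           /* last buffer freed for reuse, -1 if none */
};

void crtc_init(struct toy_crtc *c)
{
    c->on_screen = c->pending = c->released = -1;
    c->out = NULL;
}

/* Queue a buffer; the caller keeps *out and can pass it to the GPU. */
void crtc_commit(struct toy_crtc *c, int buf, struct toy_fence *out)
{
    out->signaled = false;
    c->pending = buf;
    c->out = out;
}

/* Vblank: the pending buffer goes on screen, the old one is released,
 * and only then does the out-fence signal. */
void crtc_vblank(struct toy_crtc *c)
{
    if (c->pending < 0)
        return;
    c->released = c->on_screen;
    c->on_screen = c->pending;
    c->pending = -1;
    c->out->signaled = true;
    c->out = NULL;
}
```

With two buffers, the fence returned by the commit of buffer 1 is the one that tells the renderer it may reuse buffer 0.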

Nonetheless, that is only one side of the equation: to have the full graphics pipeline running with Explicit Fencing we need to support it on the rendering side as well. As every rendering driver has its own userspace API, we need to add Explicit Fencing support to every single driver. The freedreno driver already has Explicit Fencing support in mainline, and there is work in progress to add support to i915 and virtio_gpu.

On the userspace side, Mesa already supports the EGL_ANDROID_native_fence_sync extension needed to use Explicit Fencing on Android. Libdrm gained headers with wrappers for the sync file IOCTLs. On Android, libsync now supports both the old Android Sync and the Mainline Sync File APIs. And finally, for drm_hwcomposer, patches to use Atomic Modesetting and Explicit Fencing are available but are not upstream yet.
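
Waiting on a sync_file from userspace does not need a dedicated wait call, because the fd reports POLLIN once its fences have signaled. The helper below is a sketch that, as far as I know, closely follows the sync_wait() helper found in libsync; the name fence_wait() is hypothetical. Any pollable fd works for trying it out, so a pipe can stand in for a real sync_file, which would normally come from a driver or from SW_SYNC.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h> /* pipe()/write() for trying it with a stand-in fd */

/* Wait for a sync_file fd to signal. timeout_ms < 0 waits forever.
 * Returns 0 once signaled, -1 with errno = ETIME on timeout,
 * or -1 with errno set on error. */
int fence_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* Retry if a signal interrupted the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    if (ret == 0) {
        errno = ETIME;
        return -1;
    }
    if (ret < 0)
        return -1;
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```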

Validation tests for both Sync Files and fences on the Atomic API were written and added to IGT.

Collabora Contributions to Linux Kernel 4.9

Linux Kernel 4.9 was released this week and once more Collabora developers took part in the kernel development cycle. This time we contributed 37 patches by 11 different developers, our highest number of individual contributors in a kernel release ever. Remember that in the previous release we had our highest total number of contributions. These numbers show how Collabora has been increasing its commitment to contributing to the upstream kernel community.

For those who want to see an overall report of what happened in the 4.9 kernel, take a look at the always good LWN articles: part 1, 2 and 3.

As for Collabora contributions, most of our work was in the DRM and DMABUF subsystems. Andrew Shadura and Daniel Stone added fixes to the AMD and i915 drivers respectively. Emilio López added the missing installation of the sync_file.h uapi header.

Gustavo Padovan advanced a few more steps toward the goal of adding explicit fencing to the DRM subsystem. Besides a few improvements to Sync File and the virtio_gpu driver, he also de-staged the SW_SYNC validation framework, which helps with Sync File testing.

Peter Senna added drm_bridge support to the imx-ldb device, while Tomeu Vizoso improved drm_bridge support in Rockchip’s analogix-dp and added documentation about validation of the DRM subsystem.

Outside of the Graphics world, Enric Balletbo i Serra added support for uploading firmware to the ziirave watchdog device. Fabien Lahoudere and Martyn Welch enabled and improved DMA support for i.MX53 UARTs, allowing the device tree to decide whether DMA is used or not. Martyn also added a fake VMEbus (Versa Module Europa bus) to help with VME driver development.

On the Bluetooth subsystem, Frédéric Dalleau fixed an error code for SCO connections that was causing long timeouts and failures on SCO connection requests. Finally, Robert Foss worked on clearing the pipeline on errors for cdc-wdm USB devices.

Andrew Shadura (1):

Daniel Stone (1):

Emilio López (2):

Enric Balletbo i Serra (1):

Fabien Lahoudere (3):

Frédéric Dalleau (1):

Gustavo Padovan (14):

Martyn Welch (4):

Peter Senna Tschudin (1):

Robert Foss (2):

Tomeu Vizoso (7):

Mainline Explicit Fencing – part 2

In the first post we covered the main concepts behind Explicit Synchronization for the Linux Kernel. Now, in the second post of the series, we are going to look at the Android Sync Framework, the first (out-of-tree) Explicit Fencing implementation for the Linux Kernel.

The Sync Framework was the Android solution to implement Explicit Fencing in AOSP. It uses file descriptors to communicate fencing information between userspace and kernel and between userspace processes.
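
Because fences are plain file descriptors, moving them between processes can use the standard Unix mechanism for passing fds: SCM_RIGHTS ancillary data over a Unix domain socket. The sketch below is generic POSIX socket code with hypothetical helper names (send_fd(), recv_fd()); it is not part of the Sync Framework itself. The receiving side gets a new fd number that refers to the same underlying file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_ctrl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one fd (e.g. a fence fd) with a 1-byte payload. 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new fd, or -1 if none arrived. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```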

In the Sync Framework it all starts with the creation of a Sync Timeline, a struct created for each driver context to represent a monotonically increasing counter. It is the Sync Timeline that guarantees the ordering between fences on the same Timeline. The driver contexts could be different GPU rings, or different displays on your hardware.
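
As a rough illustration of the ordering guarantee, the sketch below models a timeline as a counter that only moves forward, with a point considered signaled once the counter reaches the point's value. The names (struct timeline, struct point, timeline_advance()) are hypothetical, and the model ignores counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-context counter; holds the last signaled value. */
struct timeline {
    unsigned int value;
};

/* A point on a timeline, identified by the value it waits for. */
struct point {
    struct timeline *tl;
    unsigned int value;
};

struct point timeline_create_point(struct timeline *tl, unsigned int value)
{
    struct point pt = { tl, value };
    return pt;
}

/* Signaled once the timeline has reached the point's value. */
bool point_signaled(const struct point *pt)
{
    return pt->tl->value >= pt->value;
}

/* The counter only moves forward, so points on the same timeline
 * always signal in order of their values. */
void timeline_advance(struct timeline *tl, unsigned int inc)
{
    tl->value += inc;
}
```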

Sync Timeline

Then we have Sync Points (sync_pt), the name Android gave to fences; they represent a specific value on the Sync Timeline. When created, a Sync Point is initialized in the Active state, and when it signals, i.e., when the job it was associated with finishes, it transitions to the Signaled state and informs the Sync Timeline so it can update the value of the last signaled Sync Point.
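
The lifecycle just described can be sketched as a two-state machine. This is a hypothetical model of the behavior in the text (enum pt_state, pt_init(), pt_signal()), not Android's actual sync_pt code.

```c
#include <assert.h>

enum pt_state { PT_ACTIVE, PT_SIGNALED };

struct sync_timeline {
    unsigned int last_signaled;  /* value of the last signaled point */
};

struct sync_point {
    struct sync_timeline *tl;
    unsigned int value;
    enum pt_state state;
};

/* New points always start in the Active state. */
void pt_init(struct sync_point *pt, struct sync_timeline *tl, unsigned int value)
{
    pt->tl = tl;
    pt->value = value;
    pt->state = PT_ACTIVE;
}

/* Called when the job tied to this point finishes: move to Signaled
 * and let the timeline record the newest signaled value. */
void pt_signal(struct sync_point *pt)
{
    assert(pt->state == PT_ACTIVE);  /* a point signals only once */
    pt->state = PT_SIGNALED;
    if (pt->value > pt->tl->last_signaled)
        pt->tl->last_signaled = pt->value;
}
```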

Sync Point

To export and import Sync Points to/from userspace, the Sync Fence struct is used. Under the hood the Sync Fence is a Linux file, and we use the Sync Fence to store Sync Point information. To export it to userspace, an unused file descriptor (fd) is associated with the Sync Fence file. Drivers can then use the file descriptor to pass the Sync Point information around.

Sync Fence

The Sync Fence is usually created just after the Sync Point; it then travels through the pipeline, via userspace, until it reaches the driver that is going to wait for it to signal. The Sync Fence signals when the Sync Point inside it signals.

One of the most important features of the Android Sync Framework is the ability to merge Sync Fences into a new Sync Fence containing all Sync Points from both Sync Fences. It can contain as many Sync Points as your resources allow. A merged Sync Fence only signals when all its Sync Points have signaled.
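
Merging and its all-must-signal rule can be sketched as follows, assuming a hypothetical fence type that simply holds pointers to points, with a fixed capacity standing in for the resource limit. Real implementations may treat merged points more cleverly (for example, collapsing points from the same timeline); this sketch just concatenates them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCE_PTS 16

struct pt { bool signaled; };

/* Toy Sync Fence: a bag of pointers to sync points. */
struct fence {
    struct pt *pts[MAX_FENCE_PTS];
    size_t n;
};

/* Merge a and b into a new fence out (which must be a distinct object).
 * Returns 0, or -1 if the result would exceed the capacity. */
int fence_merge(struct fence *out, const struct fence *a, const struct fence *b)
{
    assert(out != a && out != b);
    if (a->n + b->n > MAX_FENCE_PTS)
        return -1;
    out->n = 0;
    for (size_t i = 0; i < a->n; i++)
        out->pts[out->n++] = a->pts[i];
    for (size_t i = 0; i < b->n; i++)
        out->pts[out->n++] = b->pts[i];
    return 0;
}

/* A (merged) fence signals only when every point in it has signaled. */
bool fence_signaled(const struct fence *f)
{
    for (size_t i = 0; i < f->n; i++)
        if (!f->pts[i]->signaled)
            return false;
    return true;
}
```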

Sync Fence with Merged Fences. Here we merge two Sync Points into one Sync Fence.

When it comes to the userspace API, the Sync Framework implements three ioctl calls. The first one waits for a sync_fence to signal. There is also a call to merge two sync_fences into a third, new sync_fence. And finally there is a call to get information about a sync_fence and all its sync_points.

Sync Fence fds are passed to/from the kernel in the calls that ask the kernel to render or display a buffer.

This was intended to be an overview of the Sync Framework, as we will see some of these concepts again in the next article, where we will talk about the effort to add explicit fencing to the mainline kernel. If you want to learn more about the Sync Framework you can find more info here and here.

Collabora Contributions to Linux Kernel 4.8

Linux Kernel 4.8 is out and once more Collabora engineers made a significant contribution to the kernel. For 4.8, Collabora contributed 101 patches by 8 engineers, our record to date for a single kernel release! We’ve also seen the first contribution from Frederic Dalleau since he joined Collabora. LWN.net covered the new features of the new kernel in three different posts, here, here and here.

On the Collabora side of the contributions we touched a few different areas in the kernel. Bob Ham, who recently left Collabora, added support for the Alea I Random Number Generator, while Enric Balletbo improved the audio support on the Rockchip rk3288 SoC. Frederic Dalleau fixed an important memory leak on the Bluetooth stack.

Gustavo Padovan continued his work to add Explicit Synchronization for Buffer Sharing to the kernel. In this release he added fence_array support and prepared the SW_SYNC interfaces for de-staging; SW_SYNC is meant to be used for Explicit Synchronization testing. He also worked on removing some of the legacy functions in drm_irq.c.

Helen Koike added some improvements and cleanups to the ASoC subsystem, mainly in the max9877 and tpa6130a2 drivers. Nicolas Dufresne fixed the bytes-per-line calculation for YUV planes in the uvcvideo driver.

Thierry Escande added many improvements to the NFC digital layer, and Tomeu Vizoso added a new helper for the ChromeOS Embedded Controller and improved usage of DRM Core APIs in the Rockchip driver. He also fixed an issue with the Analogix DP on Rockchip that was not enabling clocks in the correct order.

Bob Ham (2):

Enric Balletbo i Serra (8):

Frederic Dalleau (1):

Gustavo Padovan (50):

Helen Koike (8):

Nicolas Dufresne (1):

Thierry Escande (26):

Tomeu Vizoso (5):

My talk about Mainline Explicit Fencing at XDC 2016!

Last week I was at XDC in Helsinki, where I presented the Explicit Fencing work we’ve been doing on the Mainline Linux Kernel over the last few months. All presentations were livestreamed during the conference and the recordings are available. You can check the video of my presentation. Check out the slides too.

If you want to check the code we’ve been writing, it is available here:

Linux Kernel: https://git.kernel.org/cgit/linux/kernel/git/padovan/linux.git/log/?h=fences

Mesa: https://git.collabora.com/cgit/user/padovan/mesa.git/log/?h=fences

libdrm: https://git.collabora.com/cgit/user/padovan/libdrm.git/log/?h=fences

kmscube: https://github.com/robclark/kmscube/tree/atomic-fence

Soon we will get Explicit Fencing on Android’s drm_hwcomposer as well so expect updates on this blog with more information about that. :)

Also, I would like to take the opportunity to thank Collabora for sponsoring my travel to XDC and Martin Peres for organizing such a great conference. It was my first time attending XDC and my time there was absolutely great: I learnt a lot about what the Graphics community has been doing lately and I met the people doing this work. I was happy to see a lot of interest from many people in the Explicit Fencing work we’ve been doing.

Mainline Explicit Fencing – part 1

When it comes to buffer sharing synchronization in the kernel there are two ways of doing it: Implicit Fencing and Explicit Fencing. The difference between them lies in whether the kernel shares synchronization information with userspace: it is either implicit, with no fencing information provided, or explicit, with all information available to userspace.

The fencing synchronization mechanism allows the sharing of buffers without the risk of a driver or userspace reading an incomplete buffer or writing to a buffer that is still in use somewhere else in the system. Fencing orders these operations so that reads or writes happen only when the buffer is no longer used by other drivers. For example, when a GPU job is queued a fence is associated with the buffer in the job; that fence can be used by other drivers for synchronization purposes, and they won’t use the buffer until a signal from the fence is received. The signal means the buffer is now free to be used. Similarly, the GPU driver can wait for the buffer to leave the screen before rendering to it again.

The central piece here is the fence, an element that is attached to each buffer whenever a request involving the buffer is sent to the kernel. The fence can be used by userspace or other drivers to wait for the work to finish. So once the work is finished the fence signals and the waiter can proceed and do whatever they want with the buffer.
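
A fence with waiters can be sketched in a few lines. This single-threaded model with hypothetical names is loosely inspired by the kernel's dma_fence callback pattern, including returning -ENOENT when a callback is added to a fence that has already signaled, as dma_fence_add_callback() does. It leaves out locking, refcounting and everything else a real fence needs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fence;
typedef void (*fence_cb_fn)(struct fence *f, void *data);

#define MAX_CBS 4

struct fence {
    bool signaled;
    fence_cb_fn cbs[MAX_CBS];
    void *cb_data[MAX_CBS];
    size_t ncbs;
};

/* Register a waiter. Returns -ENOENT if the fence already signaled,
 * so the caller knows it can use the buffer right away. */
int fence_add_cb(struct fence *f, fence_cb_fn fn, void *data)
{
    if (f->signaled)
        return -ENOENT;
    if (f->ncbs == MAX_CBS)
        return -ENOMEM;
    f->cbs[f->ncbs] = fn;
    f->cb_data[f->ncbs] = data;
    f->ncbs++;
    return 0;
}

/* Called by the producer once the work on the buffer is finished:
 * mark the fence signaled and run every registered waiter once. */
void fence_signal(struct fence *f)
{
    if (f->signaled)
        return;
    f->signaled = true;
    for (size_t i = 0; i < f->ncbs; i++)
        f->cbs[i](f, f->cb_data[i]);
    f->ncbs = 0;
}

/* Example waiter: flags the buffer as safe to use. */
void mark_ready(struct fence *f, void *data)
{
    (void)f;
    *(bool *)data = true;
}
```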

While Implicit Fencing helps a lot with buffer synchronization, there are a few cases where the whole desktop compositing could stall. Imagine the following compositor flow: there are 3 buffers to process, A, B and C. A and B are sent for rendering in parallel while C is going to be composed of both A and B. But the compositor will only be notified when both buffers are rendered, so if B takes too long the compositing of the whole desktop will be blocked waiting for B, and C won’t be displayed in time.

A compositor processing two buffers in parallel: with Implicit Fencing, if B takes too long the desktop compositor freezes.

However, with Explicit Fencing the compositor has one fence for each buffer and is notified when each buffer is rendered. So if A renders fast and B takes too long, the compositor can decide not to wait for B and proceed with the scanout of C using the new A and an old version of B. The fencing information allows the compositor to be smart and make decisions, for example to keep the screen from freezing.
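
The compositor's choice can be sketched as a non-blocking selection step. The structure below (struct client_buf, pick_frames()) is hypothetical and assumes the compositor has already checked each client's render fence (for instance by polling its fd with a zero timeout) and stored the result in pending_done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-client state: newest submitted frame plus its fence status,
 * and the newest frame known to be completely rendered. */
struct client_buf {
    int pending_frame;    /* newest frame submitted, -1 if none */
    bool pending_done;    /* has its render fence signaled? */
    int last_good_frame;  /* newest fully rendered frame */
};

/* Pick one frame per client without blocking: take the new frame if
 * its fence has signaled, otherwise reuse the last complete one. */
void pick_frames(struct client_buf *clients, size_t n, int *chosen)
{
    for (size_t i = 0; i < n; i++) {
        struct client_buf *c = &clients[i];
        if (c->pending_frame >= 0 && c->pending_done) {
            c->last_good_frame = c->pending_frame;
            c->pending_frame = -1;
        }
        chosen[i] = c->last_good_frame;
    }
}
```

In the A/B scenario above, A's new frame is used immediately while B keeps showing its previous frame until its fence signals.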

As of today the Linux Kernel only has generic APIs for Implicit Fencing; although some drivers already have Explicit Fencing, their APIs are device specific. Android currently has its own implementation through the Android Sync Framework, which will be explained in the next article.

Explicit Fencing works in a Producer-Consumer fashion. In a GPU rendering + scanout pipeline it synchronizes between the kernel drivers: when submitting a new rendering job to the GPU (Producer side), userspace gets back a fence related to the submitted buffer. That means userspace doesn’t need to block waiting for the job to complete; a signal is sent when the job is finished. As userspace doesn’t need to block and has a fence for the buffer, it can proceed right away with the syscall asking the display hardware (Consumer) to scan out the buffer that is yet to be processed. With explicit fencing the kernel is taught to wait for the fence to signal before starting the scanout process.
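
The key property, userspace queuing both jobs without blocking while the kernel defers the scanout until its dependency signals, can be modeled as below. The job queue, kernel_run_ready() and job_complete() are hypothetical stand-ins for driver internals, not real kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fence { bool signaled; };

struct job {
    const struct fence *in_fence;  /* dependency, NULL if none */
    struct fence *out_fence;       /* signaled when the job completes */
    bool started;
};

/* Kernel side: start every queued job whose input fence has signaled.
 * Userspace never waits here; it only queued the jobs. */
void kernel_run_ready(struct job *jobs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct job *j = &jobs[i];
        if (!j->started && (j->in_fence == NULL || j->in_fence->signaled))
            j->started = true;
    }
}

/* Hardware finished a started job: signal its output fence. */
void job_complete(struct job *j)
{
    assert(j->started);
    j->out_fence->signaled = true;
}
```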

A new fence is returned to userspace when the buffer is submitted to the kernel for scanout on the display hardware; that fence will signal when the buffer is no longer being displayed and is thus ready for reuse by another rendering job. When userspace gets this fence back it can submit a new rendering job to the GPU without waiting. The wait is done on the kernel side by the GPU driver: once the fence signals, rendering to that buffer can start.

The fence travels all the way to userspace and on to the next element in the pipeline. The yellow arrows represent the fences in userspace.

Last but not least, debuggability of the graphics pipeline is improved. Having access to the fence in userspace helps a lot in understanding what is happening in the pipeline. Previously, with Implicit Fencing, there was no information available, so it was hard to figure out what was happening in the pipeline; also, each vendor was implementing their own Implicit Fencing mechanism. Now, with a standard Explicit Fencing mechanism, it is easier to build debug/tracing infrastructure that can be used to investigate issues on any system.

The next article will explain the Android Sync Framework and later the work on mainline to support explicit fencing will be described.

Slides for my LinuxCon talk on Mainline Explicit Fencing

For those of you who are interested, here are the slides of my presentation at LinuxCon North America this week. The conference was great, with very good talks and very interesting meetings on the hallway track.

My presentation covered the effort to create the Explicit Fencing mechanism in the Linux Kernel, which is to be used mainly by the Graphics pipeline. In short, Explicit Fencing is a way to give userspace information about the current state of shared buffers inside the kernel. This is done through fences, which can then be passed around to userspace and/or other kernel drivers for synchronization purposes. This allows both userspace and kernel to wait for kernel jobs to finish without blocking. It also significantly helps the compositor make smarter, more efficient decisions when scheduling frames for display. I’ll be posting an article with more details on it soon. :)

Finally I would like to thank Collabora for sponsoring my travel to LinuxCon.