The Sync Framework was the Android solution to implement Explicit Fencing in AOSP. It uses file descriptors to communicate fencing information between userspace and the kernel, and between userspace processes.
In the Sync Framework it all starts with the creation of a Sync Timeline, a struct created for each driver context to represent a monotonically increasing counter. It is the Sync Timeline that guarantees the ordering between fences in the same Timeline. The driver contexts could be different GPU rings, or different displays on your hardware.
Then we have Sync Points (sync_pt), the name Android gave to fences. They represent a specific value in the Sync Timeline. When created, the Sync Point is initialized in the Active state. When it signals, i.e., the job it was associated with finishes, it transitions to the Signaled state and informs the Sync Timeline to update the value of the last signaled Sync Point.
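The relationship between a Sync Timeline and its Sync Points can be sketched in a few lines of Python. This is only a toy model of the concepts described above, not the kernel code: the class names, the `create_point()` and `signal()` methods and the `state` property are all made up for illustration.

```python
class SyncTimeline:
    """Toy model: one monotonically increasing counter per driver context."""

    def __init__(self, name):
        self.name = name
        self.value = 0  # value of the last signaled sync point

    def create_point(self, value):
        # Hypothetical helper: a point is a specific value on this timeline.
        return SyncPoint(self, value)


class SyncPoint:
    """Toy model of a sync_pt: Active until the timeline reaches its value."""

    def __init__(self, timeline, value):
        self.timeline = timeline
        self.value = value

    def signal(self):
        # The job finished: tell the timeline to move forward, never backward.
        self.timeline.value = max(self.timeline.value, self.value)

    @property
    def state(self):
        reached = self.timeline.value >= self.value
        return "signaled" if reached else "active"
```

In this model, signaling the point with value 2 also makes the point with value 1 read as signaled, which is the ordering guarantee the timeline is there to provide.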
To export and import Sync Points to/from userspace, the Sync Fence struct is used. Under the hood the Sync Fence is a Linux file, and we use the Sync Fence to store Sync Point information. To export it to userspace, an unused file descriptor (fd) is associated with the Sync Fence file. Drivers can then use the file descriptor to pass the Sync Point information around.
The Sync Fence is usually created just after the Sync Point creation. It then travels through the pipeline, via userspace, until it reaches the driver that is going to wait for the Sync Fence to signal. The Sync Fence signals when the Sync Point inside it signals.
One of the most important features of the Android Sync Framework is the ability to merge Sync Fences into a new Sync Fence containing all Sync Points from both Sync Fences. It can contain as many Sync Points as your resources allow. A merged Sync Fence will only signal when all its Sync Points signal.
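The merge semantics can be illustrated with a small standalone Python sketch. It is a simplified model under stated assumptions: the `SyncPoint` and `SyncFence` classes and the `merge()` function are hypothetical names, and the sketch simply keeps every distinct point from both fences.

```python
class SyncPoint:
    """Toy sync point: just a name and a signaled flag."""

    def __init__(self, name):
        self.name = name
        self.signaled = False


class SyncFence:
    """Toy sync fence: a container of one or more sync points."""

    def __init__(self, points):
        self.points = list(points)

    def is_signaled(self):
        # A fence signals only when every point inside it has signaled.
        return all(p.signaled for p in self.points)


def merge(a, b):
    """Return a new fence holding all points of both fences, without duplicates."""
    merged, seen = [], set()
    for point in a.points + b.points:
        if id(point) not in seen:
            seen.add(id(point))
            merged.append(point)
    return SyncFence(merged)
```

A waiter holding the merged fence therefore needs only a single wait, even though the work it depends on may come from different drivers.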
When it comes to the userspace API, the Sync Framework implements three ioctl calls. The first one is to wait on a sync_fence to signal. There is also a call to merge two sync_fences into a third, new sync_fence. And finally there is a call to grab information about the sync_fence and all its sync_points.
The Sync Fence fds are passed to/from the kernel in the calls that ask the kernel to render or display a buffer.
This was intended to be an overview of the Sync Framework, as we will see some of these concepts in the next article, where we will talk about the effort to add explicit fencing to the mainline kernel. If you want to learn more about the Sync Framework you can find more info here and here.
On the Collabora side of the contributions we touched a few different areas in the kernel. Bob Ham, who recently left Collabora, added support for the Alea I Random Number Generator, while Enric Balletbo improved the audio support on the Rockchip rk3288 SoC. Frederic Dalleau fixed an important memory leak on the Bluetooth stack.
Gustavo Padovan continued his work to add Explicit Synchronization for Buffer Sharing to the kernel. In this release he added fence_array support and prepared the SW_SYNC interfaces for de-staging; SW_SYNC is meant to be used for Explicit Synchronization testing. He also worked on removing some of the legacy functions in drm_irq.c from the kernel.
Helen Koike added some improvements and cleanups to the ASoC subsystem, mainly on the max9877 and tpa6130a2 drivers. Nicolas Dufresne fixed the bytes per line calculation on YUV planes in the uvcvideo driver.
Thierry Escande added many improvements to the NFC digital layer, and Tomeu Vizoso added a new helper for the ChromeOS Embedded Controller and improved usage of DRM Core APIs in the Rockchip driver. He also fixed an issue with the Analogix DP on Rockchip that was not enabling clocks in the correct order.
Bob Ham (2):
Enric Balletbo i Serra (8):
Frederic Dalleau (1):
Gustavo Padovan (50):
Helen Koike (8):
Nicolas Dufresne (1):
Thierry Escande (26):
Tomeu Vizoso (5):
If you want to check the code we’ve been writing, it is available here:
Linux Kernel: https://git.kernel.org/cgit/linux/kernel/git/padovan/linux.git/log/?h=fences
Soon we will get Explicit Fencing on Android’s drm_hwcomposer as well so expect updates on this blog with more information about that. :)
Also I would like to take the opportunity to thank Collabora for sponsoring my travel to XDC and Martin Peres for organizing such a great conference. It was my first time attending XDC and my time there was absolutely great. I learnt a lot about what the Graphics community has been doing lately and I met the people doing this work. I was happy to see a lot of interest from many people around the Explicit Fencing work we’ve been doing.
The fencing synchronization mechanism allows the sharing of buffers without the risk of a driver or userspace reading an incomplete buffer or writing to a buffer that is still in use somewhere else in the system. Fencing provides ordering to these operations so that reads or writes happen only when the buffer is no longer used by other drivers. For example, when a GPU job is queued, a fence is associated with the buffer in the job. That fence can be used by other drivers for synchronization purposes: they won’t use the buffer until a signal from the fence is received. The signal means the buffer is now free to be used. Similarly, the GPU driver can wait for the buffer to come off the screen before rendering to it again.
The central piece here is the fence, an element that is attached to each buffer whenever a request involving the buffer is sent to the kernel. The fence can be used by userspace or other drivers to wait for the work to finish. So once the work is finished the fence signals and the waiter can proceed and do whatever they want with the buffer.
While Implicit Fencing helps a lot with buffer synchronization, there are a few cases where the whole desktop compositing could stall. Imagine the following compositor flow: there are 3 buffers to process, A, B and C. A and B are sent for rendering in parallel, while C is going to be composed of both A and B. But the compositor will only be notified when both buffers are rendered, so if B takes too long the compositing of the whole desktop will be blocked waiting for B, and C won’t be displayed in time.
However, with Explicit Fencing the compositor has one fence for each buffer and will be notified when each buffer is rendered. So if A renders fast and B takes too long, the compositor can decide not to wait for B and proceed with the scanout of C using buffer A and an old version of B. The fencing information allows the compositor to be smart and take decisions that, for example, keep the screen from freezing.
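The decision described above can be written down as a tiny Python sketch. It is only an illustration of the idea, assuming the compositor keeps the previous frame of each surface around; the function name `pick_buffers()` and the buffer names are hypothetical.

```python
def pick_buffers(latest, previous, signaled):
    """For each surface, use the newest buffer if its fence has signaled,
    otherwise fall back to the previously displayed buffer.

    latest, previous: dicts mapping surface name -> buffer name
    signaled: set of buffer names whose fence has already signaled
    """
    chosen = {}
    for surface, new_buffer in latest.items():
        if new_buffer in signaled:
            chosen[surface] = new_buffer
        else:
            # Rendering is not finished yet: reuse the old contents
            # instead of stalling the whole composition.
            chosen[surface] = previous[surface]
    return chosen
```

With `latest = {"A": "A1", "B": "B1"}`, `previous = {"A": "A0", "B": "B0"}` and only `A1` signaled, the sketch composes the frame from `A1` and `B0`, which matches the scenario in the text.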
As of today the Linux Kernel only has generic APIs for Implicit Fencing. Some drivers already have Explicit Fencing, but their APIs are device specific. Android currently has its own implementation through the Android Sync Framework, which will be explained in the next article.
Explicit Fencing works in a Consumer-Producer fashion. In a GPU rendering + scanout to the screen pipeline it synchronizes between the kernel drivers: when submitting a new rendering job to the GPU (Producer side), userspace gets back a fence related to the buffer submitted. That means userspace doesn’t need to block waiting for the job to complete; a signal is sent when the job is finished. As userspace doesn’t need to block and has a fence for the buffer, it can proceed right away with the syscall that asks the display hardware (Consumer) to scan out the buffer that is yet to be processed. With explicit fencing the kernel is taught to wait for the fence to signal before starting the scanout process.
A new fence is returned to userspace when the buffer is submitted to the kernel for scanout on the display hardware. That fence will signal when the buffer is no longer being displayed and is thus ready for reuse by another rendering job. When userspace gets this fence back it can submit a new rendering job to the GPU without waiting. The wait is done on the kernel side by the GPU driver: once the fence signals, the rendering on that buffer can be initiated.
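The producer-consumer flow above can be modelled with a short Python sketch. This is a toy simulation, not a kernel or userspace API: the `Fence` class, the `submit()` function and the device and buffer names are all hypothetical, and the "hardware" finishing a job is represented by calling `signal()` by hand.

```python
class Fence:
    """Toy fence: signals once and then runs the callbacks waiting on it."""

    def __init__(self):
        self.signaled = False
        self._callbacks = []

    def add_callback(self, callback):
        if self.signaled:
            callback()
        else:
            self._callbacks.append(callback)

    def signal(self):
        self.signaled = True
        pending, self._callbacks = self._callbacks, []
        for callback in pending:
            callback()


def submit(device, buffer, in_fence, log):
    """Queue work on a device and return its out-fence without blocking.

    The work only starts once in_fence (if any) has signaled, which stands
    in for the wait done on the kernel side.
    """
    out_fence = Fence()

    def start():
        log.append(f"{device} starts on {buffer}")

    if in_fence is None:
        start()
    else:
        in_fence.add_callback(start)
    return out_fence
```

Userspace would call `submit("gpu", ...)`, pass the returned fence straight to `submit("display", ...)`, and then hand the display's fence to the next GPU job, never blocking in between.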
Last but not least, debuggability of the graphics pipeline is improved. Having access to the fence in userspace helps a lot in understanding what is happening in the pipeline. Previously, with Implicit Fencing, there was no information available, so it was hard to figure out what was happening in the pipeline. Also, each vendor was trying to implement their own Implicit Fencing mechanism. Now, with a standard Explicit Fencing mechanism, it is easier to build debug/tracing infrastructure that can be used to investigate issues in any system.
The next article will explain the Android Sync Framework, and later the work on mainline to support explicit fencing will be described.
My presentation covered the effort to create the Explicit Fencing mechanism on the Linux Kernel which is to be used mainly by the Graphics pipeline. In short, Explicit Fencing is a way to give userspace information about the current state of shared buffers inside the kernel. This is done through fences, that can then be passed around to userspace and/or other kernel drivers for synchronization purposes. This allows both userspace and kernel to wait for kernel jobs to finish without blocking. It also significantly helps the compositor take more efficient and smart decisions on scheduling frames to display on the screen. I’ll be posting an article with more details on it soon. :)
Finally I would like to thank Collabora for sponsoring my travel to LinuxCon.
Enric added support for the Analogix anx78xx DRM Bridge and fixed two SD Card related issues on OMAP igep00x0: fix remove/insert detection and enable support to read the write-protect pin.
Gustavo de-staged the sync_file framework (Android Sync framework) that will be used to add explicit fencing support to the graphics pipeline and started work to clean up usage of legacy vblank helpers.
Helen Koike created a separate module for the V4L2 Test Pattern Generator and fixed return errors in the pipeline validation code, while Robert Foss improved the DRM documentation and fixed drm/vc4 (Raspberry Pi) for the case where there is already a pending update when calling atomic_commit.
Tomeu fixed two Rockchip issues while working on the intel-gpu-tools support for other platforms.
Enric Balletbo i Serra (6):
Gustavo Padovan (22):
Helen Koike (3):
Robert Foss (3):
Tomeu Vizoso (2):
As part of Collabora’s continued commitment to further increase its participation in the Linux Kernel, Collabora is actively looking to expand its team of core software engineers. If you’d like to learn more, follow this link.
Here are some highlights of Collabora’s participation in Kernel 4.6:
Andrew Shadura fixed the number of buttons reported on the PenMount 6000 USB touchscreen controller, while Daniel Stone enabled BCM283x family devices in the ARM multi_v7_defconfig and Emilio López added module autoloading for a few sunxi devices.
Enric Balletbo i Serra added boot console output to AM335X (Sitara) and OMAP3-IGEP and fixed audio codec setup on AM335X by using the right external clock. Martyn Welch added the USB device ID for the GE Healthcare cp210x serial device and renamed the reset reason of the Zodiac Watchdog.
Gustavo Padovan cleaned up the Android Sync Framework on the staging tree for further de-staging of the Sync File infrastructure, which will land in 4.7. Most of the work was removing interfaces that won’t be used in mainline. He also added vblank event support for atomic commits in the virtio DRM driver.
Peter Senna improved an error path and added some style fixes to the sisusbvga driver, while Sjoerd Simons enabled wireless on Radxa Rock2 boards, fixed an issue with the brcmfmac sdio driver sometimes timing out with a false positive and fixed some issues with serial output on the Renesas R-Car Porter board.
Tomeu Vizoso changed driver_match_device() to return errors and, in the case of -EPROBE_DEFER, queue the device for deferred probing. He also provided two fixes to the Rockchip DRM driver as part of his work on making intel-gpu-tools work on other platforms.
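The idea behind that change can be shown with a small Python model. It is only a sketch of the control flow described above, not the driver core code: `match_and_queue()` and the `deferred` list are hypothetical, while 517 is the value of the kernel-internal EPROBE_DEFER error code.

```python
EPROBE_DEFER = 517  # kernel-internal error code: "driver requests probe retry"


def match_and_queue(device, match, deferred):
    """Run a match callback that may now return a negative error code.

    Returns True when the driver matches the device and False otherwise.
    A device whose match returns -EPROBE_DEFER is queued for a later retry.
    """
    ret = match(device)
    if ret == -EPROBE_DEFER:
        deferred.append(device)  # try again once more resources are available
        return False
    if ret < 0:
        raise OSError(-ret, "match failed")
    return ret > 0
```

The point of the sketch is that a match callback is no longer limited to answering yes or no: it can also report that it cannot decide yet, and the device is kept for another attempt instead of being dropped.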
Following is a list of all patches submitted by Collabora for this kernel release:
Andrew Shadura (1):
Daniel Stone (1):
Emilio López (4):
Enric Balletbo i Serra (3):
Gustavo Padovan (17):
Martyn Welch (2):
Peter Senna Tschudin (4):
Sjoerd Simons (6):
Tomeu Vizoso (4):
As part of its continued commitment to further increase its participation in the Linux Kernel, Collabora is looking to expand its team of core software engineers. If you’d like to learn more, follow this link.
Here are some highlights of Collabora’s participation in Kernel 4.5:
Daniel Stone improved i915 runtime WARN() messages and fixed an important issue in the component subsystem when component_add() fails. Danilo Cesar made the DRM Docbook ready for Markdown text.
Gustavo Padovan improved the pm_runtime management in the drm/exynos driver and started work on de-staging the Android Sync Framework. On Rockchip, Sjoerd Simons enabled the IR receiver on the RK3288 Radxa Rock 2 Square, added Rockchip audio to multi_v7_defconfig and allowed the RK3288 SPDIF clocks to change their parent. On the net side, Sjoerd added a patch to turn the carrier off on phy attach to avoid unknown states and another patch to add an ethernet0 alias for the RK3288 to help u-boot find this device node.
During his brief time with us at Collabora, Heiko Stübner added the dts file for the veyron-brain board, a shutdown callback to platform variant dwc2 devices for a special clock handling to avoid getting stuck on the reboot/poweroff process and multi_v7_defconfig support to Rockchip’s io-domain driver, crypto module and rk808 clkout module. He also enabled support for veyron minnie touchscreen, adjusted temperature limits on veyron-speedy and fixed the edp-24m clock to be associated to the internal 24MHz oscillator all the time.
Martyn Welch added a driver for the Zodiac Aerospace RAVE Watchdog Processor, while Tomeu Vizoso added a device_is_bound() helper function and a setter for dev.pm_domain that comes with extra checks. Tomeu also added a patch to allow USB devices to remain runtime-suspended when sleeping and another patch to optimize sleep by going direct_complete if the driver has no prepare and PM callbacks. Lastly, Tomeu also fixed a frequency issue in the Tegra devfreq_dev_profile.target callback.
Following is a list of all patches submitted by Collabora for this kernel release:
Daniel Stone (3):
Danilo Cesar Lemes de Paula (1):
Gustavo Padovan (9):
Heiko Stübner (8):
Martyn Welch (2):
Sjoerd Simons (5):
Tomeu Vizoso (5):
On that note, Collabora is hiring experienced kernel hackers to further increase our participation in the Linux Kernel. If you are interested, please drop a line!
In this release Daniel Stone fixed a potential circular deadlock when loading the i915 GuC firmware and an incorrect pipe parameter on drm_crtc_send_vblank_event() that was leading to a WARN_ON. Danilo Cesar Lemes de Paula improved the kernel-doc script to fix an issue with struct drm_modeset_lock not showing up in the final kernel documentation and fixed a fault in the highlight processing by using arrays instead of hashes.
Emilio López enabled the EC verified boot context on Peach boards and added a driver to read/write the nvram’s verified boot context to/from userspace on Chromebook devices. Enric Balletbo i Serra added support for TI’s tps65217 charger driver, while Gustavo Padovan added cursor support to the exynos DRM driver. Javier made some improvements to the Chromebook EC driver.
Sjoerd Simons added rockchip support by default to the ARM multi_v7_defconfig and a driver for the SPDIF audio transceiver on rockchip boards. Tomeu Vizoso removed the regulator_list, which was redundant because the regulator devices can be found through the regulator_class, and fixed a clk reparenting issue on exynos5250 that was preventing the screen from working after the second suspend.
A full list of all commits is provided here:
Daniel Stone (2):
Danilo Cesar Lemes de Paula (3):
Emilio López (3):
Enric Balletbo i Serra (3):
Gustavo Padovan (3):
Javier Martinez Canillas (3):
Sjoerd Simons (17):
Tomeu Vizoso (4):
Danilo worked on the kernel doc scripts to add cross-reference links to the html documentation and arguments documentation in struct bodies, while Sjoerd Simons fixed a clock definition in rockchip and an incorrect udelay usage for the stmmac phy reset delay.
Tomeu fixed gpiolib to defer probe if the pin controller isn’t available and added another fix to chipidea USB to defer probe if usbmisc hasn’t been probed yet. On Tegra, Tomeu worked on support for the gpio-ranges property. Still on Tegra, cpuidle_state.enter_freeze() was added.
Gustavo Padovan did a lot of exynos DRM work, with the most important changes being improvements to atomic modesetting, including the asynchronous atomic commit in exynos. In async mode we just schedule the atomic update and return right away to userspace, in a similar way to how PageFlips work in the old API. In this release the exynos atomic modesetting interface was enabled for userspace usage. Another important set of patches was the removal of the exynos_drm_display and exynos_drm_encoder struct layers, which greatly improved the code, making it cleaner and easier to use. Apart from that there are also a few cleanups and fixes.
Danilo Cesar Lemes de Paula (2):
Gustavo Padovan (36):
Javier Martinez Canillas (1):
Sjoerd Simons (2):
Tomeu Vizoso (7):