cannot boot with linux-libre>=5.7, amdgpu and cryptsetup

Thu Jul 16 03:55:44 UTC 2020

On Wed, 15 Jul 2020 02:59:56 +0000
edgar at openmail.cc wrote:

> Hi. I will be answering as I go to each part
Hi,

> > but despite writing in depth documentation for both
> > Trisquel and Parabola on how to do that,  
> 
> Where is the info for Parabola? (not that it is really needed at this 
> point)
https://libreplanet.org/wiki/Group:Hardware/research/gpu/radeon/How_to_patch_and_test_linux-libre_in_Parabola

It was linked to in the Libreplanet page about radeon:
https://libreplanet.org/wiki/Group:Hardware/research/gpu/radeon

The issue with that tutorial is that it's made for the radeon driver
and not the amdgpu one. In addition, as far as I know, nobody tested it
so it probably contains small mistakes, typos or similar issues.

But in any case it can be a good thing to read as it gives a concrete
example on how to do it for a case that is somewhat similar.

> > In addition I will need you to test a patched linux-libre kernel.
> > In the
> > worst case I can temporarily add a new kernel for that in Parabola
> > to make it easy for you to test the result.  
> 
> I think that I can help with that. In fact, I tried to refactor an
> AUR package for the AMD Raven series so that it uses linux-libre. I
> am able to compile it, but it suffers the same drawback for the
> screen as the regular linux-libre (no wonder, but I had to try).
The tutorial on radeon explains how to do it in a much easier way: you
just download the Linux source code and deblob it with the linux-libre
script.

With Parabola you can simply do the modification directly on the source
code, compile it, and install the module really fast.

Then as the code is in git, you can easily make a patch from your
modifications.
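
As a rough sketch of that step, assuming the kernel tree is a git
checkout with modifications that are not committed yet (the function
name and the file names are made up for this example, and the howto may
do it differently):

```shell
# Hypothetical helper: write the not-yet-staged changes to tracked files
# of a git tree into a patch file.
# $1: path to the source tree, $2: patch file to create.
make_patch() {
    git -C "$1" diff > "$2"
}
```

For instance 'make_patch ~/linux amdgpu-test.patch', where both paths
are only placeholders.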

The next step would be to generate this patch from the linux-libre
deblob script, as explained in the howto.

As it was written with radeon in mind, things might be different with
amdgpu, but it should be a good starting point. Since the method to do
that easily is known, we can focus on the most important point, which
is to make it work on amdgpu and to see whether there are significant
differences in that driver or not.

> > Then, as root, try to load the amdgpu driver and get back logs with
> > the following commands:  
> >> dmesg -c
> >> modprobe amdgpu
> >> dmesg > amdgpu.log
> >> sync
> >> poweroff  
> 
> (see dmesg-5.7.8-amdgpu.log.gz)
> 
> I think that you meant
> 
> #+begin_src bash
> shutdown -P now
> #+end_src
The idea is to try to capture the logs of the failing driver. My
hypothesis is that amdgpu is not different from the radeon driver.

I'll give a bit more background to help you understand better.

In Linux, a driver typically has two entry points:
- There is an init function that is run when the driver is loaded. This
  is not the interesting part here.
- Then there is a probe function that runs when potentially
  compatible hardware is detected. For instance if you plug in a USB
  Ethernet card, the kernel will know that a USB device is plugged in,
  and will then call the probe function from the driver that is
  registered for "CDC Ethernet" class USB devices. This is what is
  interesting to us here. In the case of your GPU, the driver declares
  that it supports a set of PCI vendor IDs, product IDs and classes,
  and so when the GPU is detected, the probe function is run.

Here since we load the module with modprobe, init is run first, then
probe is run right after as the kernel now knows that a new driver is
present and that it supports your GPU.

If my hypothesis is right, the function that fails is the probe
function. That typically results in a black screen with a computer that
still runs: for instance if you set up ssh, you should still be able to
use it.

So for instance for an Ethernet card, if the probe succeeds you get an
eth0 or eth1 interface. If it fails, the driver stays loaded in memory
and the probe function will be called again each time a new USB Ethernet
card is plugged into your computer.
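
One way to check from userspace whether a probe succeeded: once a
driver is bound to a PCI device, the device's directory in sysfs has a
'driver' symlink pointing to that driver. Here is a small sketch (the
function name is made up for this example):

```shell
# Hypothetical helper: print the name of the driver bound to a device,
# or "none" if no driver is bound (for instance after a failed probe).
# $1: sysfs directory of the device.
bound_driver() {
    dev="$1"
    if [ -L "$dev/driver" ]; then
        basename "$(readlink "$dev/driver")"
    else
        echo "none"
    fi
}
```

For instance, over ssh: 'bound_driver /sys/bus/pci/devices/0000:02:00.0'.
That address is only an example; 'lspci' shows the address of your GPU.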

So the probe function does real work: here it takes over control of
your GPU and tries to initialize it, and if it doesn't fail you should
have a working graphical console.

However if it fails, then you typically end up with a black screen
because there is no mechanism to switch back to the old driver (like
vesafb or efifb).

Since we want to patch linux-libre so that the probe doesn't fail and
the driver loads, it's a good idea to try to understand where to patch
it.

The best way to find out is by trial and error: you first start with
the unmodified Parabola kernel, and try to capture the logs right after
loading the amdgpu driver. This means that you will need to capture the
logs that are produced after the screen goes black, as otherwise we
don't have the error message telling us about the failure.

Here is an example for radeon in the tutorial:
> 0000:02:00.0: Missing Free firmware (non-Free firmware loading is
> disabled)
> 0000:02:00.0: Missing Free firmware (non-Free firmware loading is
> disabled)
> si_cp: Failed to load firmware "/*(DEBLOBBED)*/"
> [drm:si_init [radeon]] *ERROR* Failed to load firmware!
> radeon 0000:02:00.0: Fatal error during GPU init
> [...]
> radeon: probe of 0000:02:00.0 failed with error -2
With that we can manage to find where exactly the problem comes from in
the Linux source code and can start trying to fix it in linux-libre.
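
To pick such lines out of a long log, something like the following
sketch could help. The patterns are taken from the radeon example
above, so amdgpu may word its messages differently, and the function
name is made up for this example. As a side note, error -2 is -ENOENT
("No such file or directory"), which fits a firmware file that could
not be loaded.

```shell
# Hypothetical helper: show the log lines that look like a firmware or
# probe failure. The patterns come from the radeon log quoted above and
# are an assumption for amdgpu.
# Arguments: one or more saved log files (reads stdin if none is given).
probe_failures() {
    grep -E 'Missing Free firmware|Failed to load firmware|Fatal error during GPU init|probe of .* failed with error' "$@"
}
```

For instance: 'probe_failures amdgpu.log'.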

So here's a more in depth explanation of the set of commands I pasted
above:
> dmesg -c
This prints the current kernel log and clears it as well. So the next
time you run 'dmesg' or 'dmesg -c' you'll only have the "new" logs.

> modprobe amdgpu
This loads the amdgpu driver, and you typically get a black screen
(else we wouldn't need to patch that kernel).

> dmesg > amdgpu.log
This stores the new kernel log in amdgpu.log, so the file should have
the result of the probe function (possibly with additional things).

> sync
This makes sure that this log is really written to the storage device.

> poweroff
This tells the computer to shut down cleanly to prevent filesystem
corruption. Having a corrupted filesystem isn't fun as it can waste a
lot of time, even if it's on an external USB key that is used just for
this.
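
Since these commands have to be typed blind once the screen goes black,
it may be easier to put them in a function beforehand and run only that
(as root). This is just a sketch of the same five commands; the
function name and the DRY_RUN and LOG variables are made up for this
example:

```shell
# Hypothetical wrapper around the five commands explained above.
# LOG: file to write the kernel log to (default: amdgpu.log).
# DRY_RUN=1: only print the commands instead of running them.
capture_amdgpu_log() {
    log="${LOG:-amdgpu.log}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "dmesg -c" "modprobe amdgpu" "dmesg > $log" "sync" "poweroff"
        return 0
    fi
    dmesg -c > /dev/null   # clear the old kernel log; its output is discarded here
    modprobe amdgpu        # load the driver; the screen is expected to go black
    dmesg > "$log"         # save what was logged since the clear
    sync                   # make sure the log reaches the storage device
    poweroff               # clean shutdown
}
```

Running 'DRY_RUN=1 capture_amdgpu_log' first shows what would be done
without touching anything.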

This is not the only way to get such logs. For instance if you set up
ssh, you can simply type the commands above without being blind,
which is much easier.

Another way would be to use the journalctl command I pasted before
(journalctl -k -b -1) to retrieve such logs. To do that you need to
boot the computer, do the 'modprobe amdgpu', then reboot and retrieve
the logs with that journalctl command.

The '-k' is for kernel logs, and the '-b -1' requests the logs of the
previous boot. So if somehow the logs are written fine, you should have
them.
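
Note that '-b -1' can only work if the journal is kept across reboots.
With systemd's default settings that is the case when the
/var/log/journal directory exists. A small sketch to check that and to
save the log (the function names and the default file name are made up
for this example):

```shell
# Hypothetical helper: succeed if the journal directory exists.
# $1: journal directory (default: the usual systemd location).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Hypothetical helper: save the kernel log of the previous boot.
# $1: output file (default: amdgpu-previous-boot.log).
save_previous_boot_kernel_log() {
    journalctl -k -b -1 > "${1:-amdgpu-previous-boot.log}"
}
```

For instance, after the reboot:
'journal_is_persistent && save_previous_boot_kernel_log'.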

PS: Anyone can create an account for the Libreplanet wiki, so feel free
    to improve the tutorial I started, or create a new tutorial, etc.

Denis.