"external abort on linefetch (0x814)" on Kirkwood 6282 SoC
Rob J. Epping
2017-07-29 10:13:56 UTC
Hi Andrew and list,
Post by Andrew Lunn
So far, i've not been able to reproduce this. I have a 6282 based QNAP
NAS box, with a single disk. Since this is a kernel hacking box, i
tftpboot and don't use an initrd. I've been using the
mvebu_v5_defconfig kernel configuration and i have tried v4.13-rc2,
v4.12, v4.10.0 and v3.9.30. And i have sid for user space.
I'm one of the persons that reported the issue.

I have both a 6281 and a 6282 based device. The 6281 based device (QNAP
TS-219) is on Debian stretch as of July 25th with kernel
linux-image-4.9.0-3-marvell 4.9.30-2+deb9u2 and initramfs modules set to
most. The 6282 based device (QNAP TS-221) is stuck on jessie with kernel
linux-image-4.3.0-0.bpo.1-kirkwood 4.3.5-1~bpo8+1.
Both devices are in use for personal use, so when the OS is up and
running there are processes active causing network and disk activity.

The way I test is by creating an initrd and vmlinuz from the 6281 device
for the 6282 device using the attached script putting the files on a FAT
based USB key mounted under /mnt and booting with the u-boot commands
printed. The command is not complete, it is missing the USB init parts.
Then I move the USB key and the disks over to the 6282 based device and
boot with the vmlinuz and initrd from the USB key.

Last test was done by installing (but not flashing) the same kernel
image on both systems and just moving initrd and vmlinuz over with the
USB key.

As you can see from the script I did try TFTP booting as well. I do
recall having the issues then as well, though it has been a while. Would
it be possible for you to try with a USB key?

Also are you just booting the kernel or are there processes active?
I did notice the last time the system felt sluggish but it took a while
for error messages to appear.
GRTNX,
RobJE
RobJE Debian ARM
2017-07-31 07:29:38 UTC
Post by Andrew Lunn
So i'm thinking this has to be related to bits of hardware i'm not
using. I don't have anything on the PCIe bus, i don't have any USB
devices plugged in, i don't use the mtd devices, etc.
Could somebody who does have the issue describe their system? Could
they pull out all their USB devices and see if that stops the
issues? Remove the driver for PCIe devices, if possible.
This remark triggered me. Booting without USB and PCIe will be a
challenge I'll tackle another day, but looking at the differences in
hardware as observed by the kernel is easy.

Attached are two files containing the lshw output for both the kirkwood
and marvell kernel flavors. Except for the obvious differences like
versions, I find the below differences. I do not know if these
differences are related to the observed issues.

- IRQ is in the 3x range on marvell and 8x range in kirkwood.
- On marvell PCI bridges have additional capabilities: pciexpress and
cap_list
- usbhost:0 and usbhost:1 swapped between marvell and kirkwood.

For reference I also added the 6281 lshw output. lshw versions between
devices are different.
GRTNX,
RobJE
Martin Michlmayr
2018-04-25 11:16:28 UTC
Quite a few Debian users on QNAP are affected by this "external abort
on linefetch" issue. Ian Campbell raised this with Andrew Lunn
(upstream kernel) last year but Andrew couldn't reproduce it:
https://lists.debian.org/debian-arm/2017/07/msg00054.html

RobJE provided additional information but forgot to CC Andrew:
https://lists.debian.org/debian-arm/2017/07/msg00059.html

Timo Jyrinki is happy to run some tests. He's affected and has a
serial console. The bug is still there in the 4.9 kernel we're
shipping with Debian kernel.

Andrew, what information or access do you need so this can be tracked
down?

Thank you!
--
Martin Michlmayr
http://www.cyrius.com/
Timo Jyrinki
2018-05-24 09:40:06 UTC
Post by Martin Michlmayr
Timo Jyrinki is happy to run some tests. He's affected and has a
serial console. The bug is still there in the 4.9 kernel we're
shipping with Debian kernel.
Andrew, what information or access do you need so this can be tracked
down?
Yesterday I tried booting with mem=512M added to the u-boot's setenv
bootargs, and wasn't able to reproduce the problem. Booting again
without the parameter it was there again. I repeated a couple of times
with same results, although sometimes it took some time for the
problem to occur in the normal 1GB RAM use case so I'm not 100% sure
how bulletproof the workaround is. I tried to use at least some
memory by starting Debian installer fetching, logging into it via ssh
etc.

Could someone else try it out? Double-check the parameter worked with
'free'. I'm tempted to make a backup of my current / + flash
partitions and dist-upgrade to stretch. On that note, what would be
the easiest way to set the mem=512M as the default for normal boots?
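
A minimal sketch of the "double-check with free" step, assuming the check is done against /proc/meminfo-style text instead of the output of free. The helper names mem_total_kib and within_mem_cap are hypothetical and only serve the illustration.

```python
import re


def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def within_mem_cap(total_kib, cap_mib):
    """True if the reported total does not exceed a mem=<cap_mib>M limit."""
    return total_kib <= cap_mib * 1024


# Sample text; on a real system, pass the contents of /proc/meminfo instead.
sample = "MemTotal:         511516 kB\nMemFree:          400000 kB\n"
total = mem_total_kib(sample)
print(total, "kB, within mem=512M:", within_mem_cap(total, 512))
```

MemTotal is always somewhat below the mem= value because the kernel reserves memory for itself, so the check only tests an upper bound.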

Andrew wasn't able to reproduce the problem on his 6282 machine. Would
it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
(https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)

-Timo
Andrew Lunn
2018-05-24 12:30:26 UTC
Post by Timo Jyrinki
Andrew wasn't able to reproduce the problem on his 6282 machine. Would
it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
(https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)
Hi Timo

***@qnap:~# cat /proc/meminfo
MemTotal: 511516 kB

So let's think about what this could mean...

Is the 1G implemented using two RAM chips? Do you have photos of your
board? Can you identify the chips? Does u-boot say anything useful
about the RAM?

Could the u-boot you have not be correctly initialising the second RAM
chip? Are you using the stock QNAP/marvell u-boot, or have you
upgraded u-boot?

Is there a hole in the address range between the two RAMs? The kernel
should be able to handle that, but i don't know if you have to tell
it, or if it can figure it out itself. Can you see anything about this
in the kernel logs, or u-boot?

Do we see the physical address being accessed when we get the abort?
Is it in the top 1/2 of the RAM? Could it be a DMA operation which has
gone over the border between the end of the first RAM and the
beginning of the second RAM? Seems a bit unlikely....

Andrew
RobJE Debian ARM
2018-05-27 11:39:35 UTC
Timo's remark about memory triggered me.

I am not convinced it is related to u-boot or memory chips, specifically
because the jessie kernel 4.3.0-0.bpo.1-kirkwood (4.3.5-1~bpo8+1) does not
have these issues. For me the issues started after the flavour change
from kirkwood to marvell.

I tried running stretch 4.16.0-0.bpo.1-marvell (4.16.5-1~bpo9+1) with
mem=512M which was stable for more than 24 hours. Comparing dmesg output
one interesting line was missing in the 512M version:

HighMem zone: 65536 pages, LIFO batch:15

With mem=768M the kernel also boots without any bug or error reports. 768M is
the border where (according to dmesg) HighMem starts. With no mem= (i.e.
using the full 1024M), just booting already prints a lot of error
messages for me.
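
The 768M border can be cross-checked against the dmesg line with a little arithmetic. This is a sketch that assumes 4 KiB pages (the usual page size on this ARM platform); the function names are hypothetical.

```python
PAGE_SIZE = 4096   # bytes; assumption: 4 KiB pages
MIB = 1024 * 1024


def pages_to_mib(pages):
    """Convert a page count from dmesg into MiB."""
    return pages * PAGE_SIZE // MIB


def lowmem_border_mib(total_ram_mib, highmem_pages):
    """RAM below the HighMem zone, i.e. where HighMem starts."""
    return total_ram_mib - pages_to_mib(highmem_pages)


# "HighMem zone: 65536 pages" on a machine with 1024M of RAM
print(pages_to_mib(65536))             # 256
print(lowmem_border_mib(1024, 65536))  # 768
```

So the 65536 HighMem pages account for exactly the top 256M of RAM, which is consistent with mem=768M and mem=512M both avoiding HighMem.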

I think changes in HighMem handling between the kirkwood and marvell
flavours are the cause, though I have no way other than the test above to
confirm this. Maybe the information displayed in the error messages can help
confirm that the issue is related to HighMem?

If there is anything I can test, please let me know.

GRTNX,
RobJE
Andrew Lunn
2018-05-28 16:00:33 UTC
Post by RobJE Debian ARM
With mem=768M the kernel also boots without any bug or error reports. 768M is
the border where (according to dmesg) HighMem starts. With no mem= (i.e.
using the full 1024M) just booting already prints a lot of error
messages for me.
Hi Rob

Since my QNAP only has 512M, there is not too much experimentation i
can do.

Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)". You should then see that the lowmem in the
Virtual kernel memory layout table goes from starting at 0xc0000000 to
starting at 0xB0000000. I hope it will then not use high mem, and
still give you the full 1G of RAM.

Andrew
Jonathan Medhurst
2018-05-29 05:50:16 UTC
Post by Andrew Lunn
Hi Rob
Since my QNAP only has 512M, there is not too much experimentation i
can do.
Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)".
Don't you mean change it to 2G/2G? That's what would be needed to let
the kernel map the whole 1GB of physical RAM in its address region and
so not need the high memory mechanism.
Post by Andrew Lunn
You should then see that the lowmem in the
Virtual kernel memory layout table goes from starting at 0xc0000000 to
starting at 0xB0000000. I hope it will then not use high mem, and
still give you the full 1G of RAM.
--
Tixy
Andrew Lunn
2018-05-29 11:51:44 UTC
Post by Jonathan Medhurst
Post by Andrew Lunn
Hi Rob
Since my QNAP only has 512M, there is not too much experimentation i
can do.
Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)".
Don't you mean change it to 2G/2G? That's what would be needed to let
the kernel map the whole 1GB of physical RAM in its address region and
so not need the high memory mechanism.
Hi Jonathan

The comment says:

        config VMSPLIT_3G_OPT
                depends on !ARM_LPAE
                bool "3G/1G user/kernel split (for full 1G low memory)"

So i'm thinking that means it should support up to 1G of RAM using
this split. It puts the split at 0xB0000000, so it is more like
2.75G/1.25G.
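
The 2.75G/1.25G figure follows directly from the split address. A small sketch, assuming a 32-bit (4 GiB) virtual address space; split_gib is a hypothetical helper name.

```python
GIB = 1 << 30
ADDRESS_SPACE = 1 << 32  # 32-bit virtual address space


def split_gib(page_offset):
    """Return (user, kernel) virtual address space sizes in GiB."""
    user = page_offset / GIB
    kernel = (ADDRESS_SPACE - page_offset) / GIB
    return user, kernel


print(split_gib(0xC0000000))  # (3.0, 1.0)   default 3G/1G split
print(split_gib(0xB0000000))  # (2.75, 1.25) VMSPLIT_3G_OPT
print(split_gib(0x80000000))  # (2.0, 2.0)   2G/2G split
```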

2G/2G would also work, but that is a bigger change. And i don't know
how many devices are being supported by this one kernel. It should be
possible to build one kernel which runs on all ARM v5 machines, not
just Marvell ARM v5 machines. This is the sort of change which will
affect them all. So i wanted to keep the change as small as possible.

Andrew
Tixy
2018-05-29 15:27:08 UTC
Post by Andrew Lunn
Hi Jonathan
        config VMSPLIT_3G_OPT
                depends on !ARM_LPAE
                bool "3G/1G user/kernel split (for full 1G low memory)"
So i'm thinking that means it should support up to 1G of RAM using
this split. It puts the split at 0xB0000000, so it is more like
2.75G/1.25G.
Ah, you are right, I thought you were suggesting VMSPLIT_3G. I didn't
notice that the kernel had sprouted an extra VMSPLIT_3G_OPT option a
couple of years ago.

-- 
Tixy
Timo Jyrinki
2018-06-02 12:36:43 UTC
Post by Andrew Lunn
Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)". You should then see that the lowmem in the
Virtual kernel memory layout table goes from starting at 0xc0000000 to
starting at 0xB0000000. I hope it will then not use high mem, and
still give you the full 1G of RAM.
Could someone give newbie tips on making a bootable kernel that I
could load from u-boot? I tried compiling a Debian kernel simply
with debuild in a stretch chroot, adding VMSPLIT_3G_OPT=y to
config.marvell under debian/, but with the generated vmlinuz I got
"Bad Magic Number" when I tried to load it with u-boot over TFTP.

Given that the installer-armel kernels that do boot from U-Boot also
come in 6281 and 6282 variants while the kernel from the linux package
does not have variants, I'm certainly missing something (and my
free time is severely limited, so I haven't yet found the information
I'd need on my own).

Regardless I've now modified the default bootargs in u-boot with
printenv bootargs -> setenv appending mem=768M -> saveenv, and
dist-upgraded to stretch. It's working flawlessly with 768MB RAM!

Now on stretch I could probably also just install the built deb
packages, but I'd rather do this memory corruption testing from a
"live" session over TFTP instead of booting my regular system with a
test kernel.

-Timo
Ian Campbell
2018-06-02 15:55:13 UTC
Post by Timo Jyrinki
Post by Andrew Lunn
Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)". You should then see that the lowmem in the
Virtual kernel memory layout table goes from starting at 0xc0000000 to
starting at 0xB0000000. I hope it will then not use high mem, and
still give you the full 1G of RAM.
Someone could give newbie tips on making a bootable kernel that I
could load from u-boot. I tried compiling one Debian's kernel simply
with debuild in a stretch chroot, adding VMSPLIT_3G_OPT=y to
config.marvell under debian/, but with the vmlinuz generated I got
"Bad Magic Number" when I tried to load it with u-boot over TFTP.
You need to append a dtb and then encode in u-boot's uImage format.
e.g.

cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage

Now the uImage file ought to be bootable with `bootm`, load it to
0x800000 and an initrd (if using one) to 0xa00000 then `bootm
0x800000`.

Be sure to pick the correct dtb variant for your board, it might boot
with the wrong one but you'll potentially be missing some peripherals
etc.
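
To see why a bare vmlinuz or zImage is rejected, it can help to look at the header that mkimage prepends. The sketch below parses the 64-byte legacy uImage header (big-endian fields, magic 0x27051956) and rejects data without it; that the "Bad Magic Number" message stems from exactly this check is an assumption here, and parse_uimage_header is a hypothetical helper, not a u-boot function.

```python
import struct

UIMAGE_MAGIC = 0x27051956
# magic, header crc, time, data size, load addr, entry point, data crc,
# os, arch, type, compression (1 byte each), 32-byte image name
HEADER = struct.Struct(">7I4B32s")


def parse_uimage_header(data):
    """Parse a legacy uImage header; raise ValueError if the magic is wrong."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a uImage header")
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     _os, _arch, _type, _comp, name) = HEADER.unpack_from(data)
    if magic != UIMAGE_MAGIC:
        raise ValueError("bad magic number 0x%08x" % magic)
    return {
        "size": size,
        "load": load,
        "entry": entry,
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
    }
```

Applied to the first 64 bytes of the uImage built above, the load and entry fields should both read 0x8000, matching the -a and -e options given to mkimage.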
Ian Campbell
2018-06-02 16:33:50 UTC
Post by Ian Campbell
You need to append a dtb and then encode in u-boot's uImage format.
e.g.
cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage
You don't need that `sudo` BTW unless uImage is in a root-only path.
Timo Jyrinki
2018-06-02 18:48:47 UTC
Post by Ian Campbell
You need to append a dtb and then encode in u-boot's uImage format.
e.g.
cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage
Thank you! Now it's all coming back to me, I'm not sure if I've played
with these since Neo FreeRunner times.

So the good news is that with this kernel
kernel-kirkwood-ts219-6282-split3gopt from
https://people.debian.org/~timo/qnap/ (initrd from
http://ftp.debian.org/debian/dists/stretch/main/installer-armel/current/images/kirkwood/network-console/qnap/ts-21x/)
I'm getting full 1GB RAM without the errors!

I do seem to have a problem with networking; I'm not sure whether it is
caused by my custom build or whether VMSPLIT_3G_OPT=y could affect it.

In the same directory I've also included the zImage, in case you want
to combine it with a different dtb than the kirkwood-ts219-6282 one
and create your own uImage.

-Timo
Andrew Lunn
2018-06-02 19:31:08 UTC
Post by Timo Jyrinki
So the good news is that with this kernel
kernel-kirkwood-ts219-6282-split3gopt from
https://people.debian.org/~timo/qnap/ (initrd from
http://ftp.debian.org/debian/dists/stretch/main/installer-armel/current/images/kirkwood/network-console/qnap/ts-21x/)
I'm getting full 1GB RAM without the errors!
Cool. Thanks for testing.

Now, the question is, is this an O.K. workaround? Or do we need to
figure out why highmem breaks on Kirkwood?

Andrew
RobJE Debian ARM
2018-06-05 22:52:57 UTC
Post by Andrew Lunn
Cool. Thanks for testing.
Now, the question is, is this an O.K. workaround? Or do we need to
figure out why highmem breaks on Kirkwood?
For me the most important thing is to have a workable installation that
can be kept up to date. OTOH I'm also interested in what change
introduced the instability.

I did some research into when VMSPLIT_3G_OPT was introduced, which
AFAICT was after linux-image-4.3.0-0.bpo.1-kirkwood (linux-4.3.5).
Version linux-image-4.3.0-0.bpo.1-kirkwood is stable with 1024M memory.

The option VMSPLIT_3G_OPT affects CONFIG_PAGE_OFFSET. For both the
linux-image-4.3.0-0.bpo.1-kirkwood and
linux-image-4.16.0-0.bpo.1-marvell kernels the setting is the same:

***@threis:~$ grep PAGE_OFFSET /boot/config-4.*
/boot/config-4.16.0-0.bpo.1-marvell:CONFIG_PAGE_OFFSET=0xC0000000
/boot/config-4.3.0-0.bpo.1-kirkwood:CONFIG_PAGE_OFFSET=0xC0000000
***@threis:~$

It seems to me there is something else that changed, either after
linux 4.3 or with the flavour change from kirkwood to marvell.
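
The same comparison can be scripted when more config files are involved. A minimal sketch that parses grep-style "path:CONFIG_PAGE_OFFSET=value" lines; the function name page_offsets is hypothetical.

```python
import re

LINE = re.compile(r"(?P<path>[^:]+):CONFIG_PAGE_OFFSET=(?P<value>0x[0-9A-Fa-f]+)$")


def page_offsets(grep_output):
    """Map each config file path to its CONFIG_PAGE_OFFSET as an integer."""
    result = {}
    for line in grep_output.splitlines():
        match = LINE.match(line.strip())
        if match:
            result[match["path"]] = int(match["value"], 16)
    return result


output = """\
/boot/config-4.16.0-0.bpo.1-marvell:CONFIG_PAGE_OFFSET=0xC0000000
/boot/config-4.3.0-0.bpo.1-kirkwood:CONFIG_PAGE_OFFSET=0xC0000000
"""
offsets = page_offsets(output)
# A single distinct value means the two kernels use the same split.
print(len(set(offsets.values())) == 1)
```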

Comparing kernel.org v4.3.5 and v4.7.1, there are some changes to
kirkwood-ts219-6282.dts related to something called mbus and pcie.

Grepping the kernel's arch/arm source for "external abort on linefetch"
finds 2 files: arch/arm/mm/fsr-2level.c and
arch/arm/mach-integrator/pci_v3.c.
fsr relates to the fault status register and pci_v3 to PCI.

I'm wondering if there is a relation between the device-tree changes and
the error messages, that is not triggered when memory size is restricted.

Would it be safe to boot the 4.16 kernel with the 4.3 dtb?

Been at this for way too long, time to go to sleep.
GRTNX,
RobJE
Timo Jyrinki
2018-06-05 17:25:12 UTC
Post by Andrew Lunn
Post by Timo Jyrinki
kernel-kirkwood-ts219-6282-split3gopt from
...
I'm getting full 1GB RAM without the errors!
Now, the question is, is this an O.K. workaround? Or do we need to
figure out why highmem breaks on Kirkwood?
Fine by me, but I'm not really in a position to say much. On a personal
level even mem=768M is an OK workaround; I can live with 768MB RAM and I'm
really happy to be able to use Debian 9.0.

It would be nice to see more testing on the kernel at some point. Maybe a
more proper kernel zImage build with CONFIG_VMSPLIT_3G_OPT=y would be
useful too; I don't know what is causing my wired network breakage.
I did build the current kernel in a qemu emulated chroot environment
instead of on real hw.

Regardless, for testing, if it helps I put the dtb files extracted from
official marvell 4.9 .deb to https://people.debian.org/~timo/qnap/dtb/

In more detail:
wget http://ftp.debian.org/debian/dists/stretch/main/installer-armel/current/images/kirkwood/network-console/qnap/ts-21x/initrd  # or ts-11x, ts-41x
wget https://people.debian.org/~timo/qnap/zImage
wget https://people.debian.org/~timo/qnap/dtb/kirkwood-ts419-6281.dtb  # for example, depending on your model
cat zImage kirkwood-ts419-6281.dtb > x
mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage

Then follow https://www.cyrius.com/debian/kirkwood/qnap/ts-219/uboot/ to
boot the initrd + uImage from serial console.

On TS-221, with my USB to TTL cable it worked for me with minicom with the
following pins connected:
[Image: serial header pin connections on the TS-221]

For the TFTP server (anything in the same network works), I found tftpd-hpa
simple. Just modify TFTP_DIRECTORY in /etc/default/tftpd-hpa and it just
works.

-Timo
Timo Jyrinki
2018-06-06 09:48:30 UTC
Post by Timo Jyrinki
more proper kernel zImage build with CONFIG_VMSPLIT_3G_OPT=y would be useful
...
wget https://people.debian.org/~timo/qnap/zImage
It seems my native kernel build on the QNAP device finished last
night, so here's the zImage for it:

https://people.debian.org/~timo/qnap/zImage_new

I have no time to test it myself now. Just in case it would be somehow better.

-Timo
Ian Campbell
2018-06-09 11:30:59 UTC
(adding debian-kernel, context: external aborts on qnap/marvell systems
with 1G of RAM, avoided with VMSPLIT_3G_OPT=y).
Post by Andrew Lunn
Cool. Thanks for testing.
Now, the question is, is this an O.K. workaround?
Hard to say for sure. IIRC the downside of the VMSPLIT_3G_OPT
workaround is a slightly smaller virtual address space (from 3G down to
2.75G) for the userspace part of a process, which would mean that
applications which really needed the full space would suffer.

There are some use cases which need this; linking large packages comes
immediately to mind, but I don't think Debian runs any armel buildd's
on armel (they are running as chroots on armhf systems).

With only 1G of physical RAM anything using the full 3G would be
already so far into swapping hell that it seems like it would be pretty
unusable. So maybe we can assert that it is unlikely that there is any
real world usage that would be impacted by this change.

Only other things which come to mind are applications which require a
full 3G of address space but which don't populate it all with RAM
somehow (very sparse layouts for dynamic languages perhaps?) or which
are simply buggy with the smaller size (I don't know if there are
precedents on other archs or other arm flavours for this). These seem
unlikely to me, but frankly I'm basing that on no data at all.

Debian uses a Marvell specific kernel, so we don't need to worry about
the impact on other platforms.
Post by Andrew Lunn
Or do we need to figure out why highmem breaks on Kirkwood?
I guess it would be nice from an upstream PoV to know what was going on
-- in particular in case there were to be other more subtle side
effects or corruption possible.

Ian.
Andrew Lunn
2018-06-09 14:23:24 UTC
Post by Ian Campbell
With only 1G of physical RAM anything using the full 3G would be
already so far into swapping hell that it seems like it would be pretty
unusable. So maybe we can assert that it is unlikely that there is any
real world usage that would be impacted by this change.
Hi Ian

That was what i was thinking. In theory, one of the kirkwood SoCs can
have 2GB of RAM. But i've not seen many 1G machines, let alone 2G.
Post by Ian Campbell
Debian uses a Marvell specific kernel, so we don't need to worry about
the impact on other platforms.
That i was not sure about. Are there any plans to merge all ARM v5
kernels together? Then this would affect more machines.
Post by Ian Campbell
Post by Andrew Lunn
Or do we need to figure out why highmem breaks on Kirkwood?
I guess it would be nice from an upstream PoV to know what was going on
-- in particular in case there were to be other more subtle side
effects or corruption possible.
I might be able to hack together a 3.5/0.5G split, so forcing some of
the 512MB of RAM i have in my Kirkwood into highmem. Hopefully i can
then reproduce the issue.
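
A rough model of how much RAM such a split would push into highmem. It assumes that lowmem ends where the vmalloc area begins, taken here as virtual address 0xF0000000 (an assumption that happens to reproduce the 768M border reported earlier in the thread), that RAM starts at physical address 0, and that a 3.5/0.5G split means PAGE_OFFSET=0xE0000000. The names are hypothetical.

```python
MIB = 1 << 20
VMALLOC_MIN = 0xF0000000  # assumption: start of the vmalloc area


def lowmem_highmem_mib(page_offset, ram_mib):
    """Return (lowmem, highmem) in MiB for a given split and RAM size."""
    lowmem_limit = (VMALLOC_MIN - page_offset) // MIB
    low = min(ram_mib, lowmem_limit)
    return low, ram_mib - low


print(lowmem_highmem_mib(0xC0000000, 1024))  # (768, 256)  default split, 1G
print(lowmem_highmem_mib(0xB0000000, 1024))  # (1024, 0)   VMSPLIT_3G_OPT, 1G
print(lowmem_highmem_mib(0xC0000000, 512))   # (512, 0)    default split, 512M
print(lowmem_highmem_mib(0xE0000000, 512))   # (256, 256)  3.5/0.5G hack, 512M
```

Under these assumptions, a 512M machine with the hacked split would place half of its RAM in highmem, which should be enough to exercise the same code paths as a 1G machine with the default split.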

Andrew
Ian Campbell
2018-06-09 16:17:12 UTC
Post by Andrew Lunn
Post by Ian Campbell
Debian uses a Marvell specific kernel, so we don't need to worry about
the impact on other platforms.
That i was not sure about. Are there any plans to merge all ARM v5
kernels together?
Not AFAIK, marvell is the only armv5 flavour left in Debian and armel
is well past the point where more are likely to be added.
Post by Andrew Lunn
Post by Ian Campbell
Post by Andrew Lunn
Or do we need to figure out why highmem breaks on Kirkwood?
I guess it would be nice from an upstream PoV to know what was going on
-- in particular in case there were to be other more subtle side
effects or corruption possible.
I might be able to hack together a 3.5/0.5G split, so forcing some of
the 512MB of RAM i have in my Kirkwood into highmem. Hopefully i can
then reproduce the issue.
A 3.5/0.5 split is a good idea, hadn't occurred to me. None of my QNAP
boxes have more than 512M either.

Ian.
Damien
2018-07-03 20:33:57 UTC
Hi,

Is there any plan to have this fixed kernel in Debian itself, or as
a package?
I'm not skilled in such low-level problems, and I can't get the
serial console working.
--
Regards,
Damien Martins
Martin Michlmayr
2018-07-03 21:27:32 UTC
Post by Damien
Is there any plan to have this fixed kernel in Debian mainstream, or in a
dpkg ?
I think we haven't quite established what the best course of action
is:

1) The config option change works, but some networking issues were
mentioned. Someone needs to figure out whether that's related.

2) Andrew managed to reproduce the issue, so there's hope a real fix
will be found. But maybe I'm getting my hopes up too high ;)
--
Martin Michlmayr
https://www.cyrius.com/
Andrew Lunn
2018-07-04 18:48:23 UTC
Post by Martin Michlmayr
Is there any plan to have this fixed kernel in Debian mainstream, or in a
dpkg ?
I think we haven't quite established what the best course of action
1) The config option change works, but some networking issues were
mentioned. Someone needs to figure out whether that's related.
I would be interested in knowing what the network issues were? They
might be a pointer to what is going wrong with high pages.
Post by Martin Michlmayr
2) Andrew managed to reproduce the issue, so there's hope a real fix
will be found. But maybe I'm getting my hope up too high ;)
I can reproduce it. But none of the kernel debug tools helped me get
any further. I think the next step is to explain the problem to
Russell King and see if he has any ideas.

Andrew
Martin Michlmayr
2018-07-04 20:14:53 UTC
Post by Andrew Lunn
Post by Martin Michlmayr
1) The config option change works, but some networking issues were
mentioned. Someone needs to figure out whether that's related.
I would be interested in knowing what the network issues were? They
might be a pointer to what is going wrong with high pages.
Copying Timo Jyrinki.
--
Martin Michlmayr
https://www.cyrius.com/
Timo Jyrinki
2018-07-22 06:54:07 UTC
Post by Andrew Lunn
Post by Martin Michlmayr
1) The config option change works, but some networking issues were
mentioned. Someone needs to figure out whether that's related.
I would be interested in knowing what the network issues were? They
might be a pointer to what is going wrong with high pages.
"Network doesn't work". I'm not sure what's going on, but the
installer isn't able to enable the network, even though the network
device exists and can be configured. Cable is connected similarly to
normal operation and lights are blinking (and obviously the system was
just booted with TFTP from u-boot too).

I tried now again, this time with the "zImage_new" which was the one
recompiled natively. It didn't make a difference as the symptoms
seemed similar, so I put some logs (slightly manually redacted for
possible unique identifiers) at:
https://people.debian.org/~timo/qnap/split3gopt-logs/

Syslog shows both installer and me trying to get life into the
network. I tried setting IP and default route manually and pinging the
router but nothing.

Adding to my earlier instructions, if one wants to test those kernels
built by me you now need to fetch the older initrd to go along with
them from: http://snapshot.debian.org/archive/debian/20180605T102632Z/dists/stretch/main/installer-armel/20170615%2Bdeb9u3/images/kirkwood/network-console/qnap/ts-21x/initrd

-Timo
