Discussion:
[U-Boot] Older u-boot mangles UBI from ubinize 1.5.2
J Mo
2016-08-11 02:26:54 UTC
Greetings

I am attempting to port LEDE/OpenWRT to a new device, the TRENDnet
TEW-827DRU, which is an IPQ806X-based (AP148) system. It has a NAND flash
for storage with a UBI (kernel + squashfs + ubifs).

When my system attempts to attach the UBI, I see the following error
from linux:


[ 3.781181] ubi0: attaching mtd11
[ 3.835224] UBI: EOF marker found, PEBs from 40 will be erased
[ 3.835384] ubi0: scanning is finished
[ 3.840040] ubi0 error: ubi_read_volume_table: the layout volume was not found
[ 3.844072] ubi0 error: ubi_attach_mtd_dev: failed to attach mtd11, error -22
[ 3.850897] UBI error: cannot attach mtd11



I took this to google and it turns out that Ram Chandra Jangir here had
noted the same issue a few months back, and then I got lucky and saw his
patches yesterday:

https://patchwork.ozlabs.org/patch/657285/
https://patchwork.ozlabs.org/patch/624733/



I emailed Ram and he sent me his boot log and it looks identical to
mine, so I think it's the same issue. (thx again Ram!)

I tried re-flashing my UBI and tftpbooting my kernel before u-boot could
ever get a chance to mangle it, and now I get much further, though I'm
still not able to mount my rootfs for unknown reasons:

[ 3.772502] ubi0: attaching mtd11
[ 3.826477] UBI: EOF marker found, PEBs from 40 will be erased
[ 3.826638] ubi0: scanning is finished
[ 3.872936] ubi0: volume 2 ("rootfs_data") re-sized from 9 to 430 LEBs
[ 3.873734] ubi0: attached mtd11 (name "rootfs", size 64 MiB)
[ 3.878347] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 3.884234] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 3.890936] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 3.897849] ubi0: good PEBs: 512, bad PEBs: 0, corrupted PEBs: 0
[ 3.904627] ubi0: user volume: 3, internal volumes: 1, max. volumes count: 128
[ 3.910815] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 2142265782
[ 3.917902] ubi0: available PEBs: 0, total reserved PEBs: 512, PEBs reserved for bad PEB handling: 40
[ 3.927275] ubi0: background thread "ubi_bgt0d" started, PID 54
[ 3.937007] block ubiblock0_1: created from ubi0:1(rootfs)
[ 3.942096] hctosys: unable to open rtc device (rtc0)
[ 3.956528] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(31,11): error -2
[ 3.956556] Please append a correct "root=" boot option; here are the available partitions:
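
As a sanity check, the sizes reported in the attach log above are mutually consistent. A minimal sketch, using only numbers copied from the log (the variable names are illustrative, not UBI's own): the LEB size should be the PEB size minus the data offset, and 512 PEBs of 128 KiB should account for the 64 MiB partition.

```python
# Values copied from the kernel attach log.
PEB_SIZE = 131072     # "PEB size: 131072 bytes (128 KiB)"
DATA_OFFSET = 4096    # "data offset: 4096"
GOOD_PEBS = 512       # "good PEBs: 512"

# Each PEB loses its EC and VID header pages, leaving the LEB payload.
leb_size = PEB_SIZE - DATA_OFFSET

# Total raw size of the MTD partition in MiB.
device_mib = GOOD_PEBS * PEB_SIZE // (1024 * 1024)

print(leb_size)    # 126976
print(device_mib)  # 64
```

Both results match the log ("LEB size: 126976 bytes", "size 64 MiB"), so the geometry passed to ubinize (-p 128KiB -m 2048) appears to agree with what the kernel detects.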



Any advice on this? Any background information that I can read up on? My
google searches have not come up with much. Ram knew about this, but I
don't know if it's otherwise a known issue.

The process works fine on the OEM system, so I assume this is some
ubinize format change which is incompatible with the older u-boot. Or,
the newer kernel code doesn't know how to deal with the UBI once the
older u-boot has mangled/attached it.

Seems like a backwards incompatibility issue.

Just to be clear, my kernel is inside the UBI, so u-boot must attach the
UBI and read the volume to boot.

Rebuilding and replacing my u-boot is probably not possible for now,
though I do have the OEM source. My device has a jtag, but I have not
tested it and that's out of my league.



Additional info below:

--

My u-boot version: U-Boot 2012.07 [Standard IPQ806X.LN,r40331]

The old OEM ubinize is 1.2 from mtd-utils-1.4.5.

The old OEM kernel is 3.4.103. New kernel is 4.4.15.

LEDE built from commit 21f460a5dbef5e3ec59e2032b5b113fe045b475f

The new LEDE ubinize version is 1.5.2. The ubinize command (via LEDE's
ubinize-image.sh script) to build my image was (paths truncated for
readability):

ubinize-image.sh --kernel .../TEW827DRU-uImage .../root.squashfs
.../lede-ipq806x-TEW827DRU-squashfs-factory.bin.tmp -p 128KiB -m 2048 -E 5

Notably the new LEDE ubinize command uses "-E 5" whereas the old OEM
does not, but I don't think that's related.

The ubinize.ini file looked like:

[kernel]
mode=ubi
vol_id=0
vol_type=dynamic
vol_name=kernel
image=/.../TEW827DRU-uImage
[rootfs]
mode=ubi
vol_id=1
vol_type=dynamic
vol_name=rootfs
image=/.../root.squashfs
[rootfs_data]
mode=ubi
vol_id=2
vol_type=dynamic
vol_name=rootfs_data
vol_size=1MiB
vol_flags=autoresize
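
The "re-sized from 9 to 430 LEBs" line in the second boot log fits this config. A rough sketch, assuming ubinize rounds vol_size up to a whole number of LEBs and taking the 126976-byte LEB size from the kernel log (the names here are illustrative):

```python
LEB_SIZE = 126976            # from the kernel log: "LEB size: 126976 bytes"
VOL_SIZE = 1 * 1024 * 1024   # vol_size=1MiB in the [rootfs_data] section

# Round up to whole LEBs using integer arithmetic.
initial_lebs = (VOL_SIZE + LEB_SIZE - 1) // LEB_SIZE

print(initial_lebs)  # 9
```

With vol_flags=autoresize, UBI then grows the volume into the remaining free PEBs on first attach, which is where the 430 figure in the log comes from.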
Richard Weinberger
2016-08-11 12:22:58 UTC
Hi!
I got that good old feeling... like I just jumped onto a bag of flaming poo.
Ha ha
Understandable. However, we also need to experiment and figure out the
mess left behind by $vendor which often doesn't leave a lot of
reasonable options for 3rd-party firmware to be installed.
With regard to that specific hack, I never truly understood why it was
needed in the first place -- I'm not using it on any UBI-enabled device and
believe it's some kind of work-around to allow ubinized images to be
written via nandwrite, initially in order to support the vendor/stock
sysupgrade-format of a specific device (NETGEAR WNDR4300). Please
correct me or add the missing bits needed to understand the use-case.
It was added to OpenWrt long ago in r38681...r38683 and by now needed
to be fixed several times in r42940, r43287, r44658, r44801 and r44881.
Later on it was re-used by a bunch of other devices, e.g.
bcm4708-netgear-r6250, bcm4708-netgear-r6300-v2,
bcm4708-buffalo-wzr-1750dhp, bcm47081-buffalo-wzr-600dhp2 and probably
some more.
Gabor and Rafal should know more about it and why exactly this is
needed and supposedly cannot be solved without this hack.
I'm also confused about WTF that patch does. If it was device-specific to
comply with OEM-hackery, why apply it generally?
I reckon because it's generic in the sense that it's used by more than
one target (ar71xx, bcm47xx) and we don't do any device/board specific
patching at all.
Hm, I just found another example. I don't know why this didn't turn up in my
searches yesterday since it's a perfect match with the EXACT error. This too
https://patchwork.ozlabs.org/patch/509468/
I think I'll go rip that patch out here in a bit, recompile my image, and
see what happens.
In the end, this will at least give you some consistency in terms
of U-Boot's and the Kernel's UBI implementation. Ie. either both work
or both fail (e.g. to attach a not entirely erased/formatted UBI device
with left-overs from previous uses of the stock fw).
In case you are flashing the firmware using ubiformat, this shouldn't
be a problem anyway.
[...]
Thanks for the insight.
The idea was to have a UBI with three volumes: kernel, rootfs(squashfs), and
the rootfs_data overlay(ubifs).
One of my problems is that someone thought it was a great idea to name the
SMEM NAND UBI partition "rootfs". There's a patch out there which is
supposed to fix that, (rename to "ubi") but it's apparently not working for
me. The auto rootfs selection method might be trying to use the smem/mtd
partition named "rootfs" instead of the UBI volume named "rootfs"?
No, these are two different things and it shouldn't matter. However, in
order to have your UBI device auto-attached without any cmdline
parameters it needs to be named 'ubi', so simply changing the name of
the MTD partition in the device-tree should do the trick.
bootargs = "console=ttyMSM0,115200n8 ubi.mtd=11 root=ubi0:rootfs
rootfstype=squashfs";
Is that not valid? Looks right to me.
squashfs doesn't work on UBI character devices but rather likes block
devices only, just like most filesystems.
Thus, rootfs detection works automagically in OpenWrt/LEDE, just having
a ubi volume named 'rootfs' should do the trick and automatically
decide whether the volume is UBIFS and thus would be mounted similar to
what you tried to do now -- or to create a ubiblock-device and select
that to be mounted as rootfs. In any case, you shouldn't need any
kernel command-line parameters for that, so simply drop everything past
'console=ttyMSM0,115200n8' (and btw, this can also be done nicer by
setting stdout-path rather than hacking the cmdline).
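
The decision described above can be sketched roughly as follows. This is not the code from the OpenWrt/LEDE auto-probing patches; detecting the filesystem by magic number is an assumption made purely for illustration, and the function name and defaults are hypothetical.

```python
# On-flash magic numbers (little-endian byte order).
UBIFS_MAGIC = bytes.fromhex("31181006")  # UBIFS node magic 0x06101831
SQUASHFS_MAGIC = b"hsqs"                 # squashfs superblock magic

def pick_root(volume_head, ubi_num=0, vol_id=1, vol_name="rootfs"):
    """Choose how to mount the UBI volume named 'rootfs'.

    volume_head: the first bytes read from the volume.
    """
    if volume_head.startswith(UBIFS_MAGIC):
        # UBIFS mounts the UBI volume (a character device) directly.
        return {"root": f"ubi{ubi_num}:{vol_name}", "rootfstype": "ubifs"}
    # Anything else (e.g. squashfs) needs a block device on top of the volume.
    return {"root": f"/dev/ubiblock{ubi_num}_{vol_id}", "rootfstype": None}

print(pick_root(SQUASHFS_MAGIC + b"\x00" * 4)["root"])  # /dev/ubiblock0_1
print(pick_root(UBIFS_MAGIC)["root"])                   # ubi0:rootfs
```

For the image in this thread, the squashfs volume has vol_id 1, so the block-device branch would point at ubiblock0_1, the device the kernel log shows being created.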
Right. Depending on whether U-Boot's UBI support or the kernel itself
first touches the freshly-written UBI device things go wrong, because
only the hacked-up OpenWrt/LEDE kernel does the right magic on
firstboot...
The kernel is in the UBI, so u-boot is going to attach it. I can't get
around that without doing major reconstructive surgery to how this thing was
designed to boot.
The number of OpenWRT/LEDE devices that have KERNEL_IN_UBI set are tiny. I
think I only saw one or two others, and they were obscure or dev boards.
This is likely why the issue hasn't come up before, and it could have been a
problem for awhile and nobody noticed.
I do the exact same for all boards on the oxnas target and it works
great. I even store U-Boot's environment inside UBI volumes.
I reckon it really depends on how you flash the device in the first place,
ie. using raw nand-write (which may need the aforementioned hack to
erase the remaining free-space) or using ubiformat (which shouldn't
need that).
I don't know who's to blame. That's why I started this three-way cross
posting clusterfark. =)
Not too bad, at least we get to discuss some forgotten ugliness now
before it starts to affect more people...
I'm most tempted to blame the kernel rather than u-boot. After all, I can
change the kernel, and the old kernel worked fine.
I reckon it's somewhere between the way the image was generated and
written to the flash and then didn't get the right treatment on
first-boot because U-Boot tried to access it before it got fixed-up.
Again, if you just use ubiformat to write the image, you won't need
any EOF-markers or other hacks (ie. thus you also shouldn't include
them in the ubinized image!)
Did you intentionally drop linux-mtd from the CC's after I offered you
to discuss your patches on linux-mtd? ;-)

Thanks
//richard
Daniel Golle
2016-08-11 12:34:10 UTC
Post by Richard Weinberger
Did you intentionally drop linux-mtd from the CC's after I offered you
to discuss your patches on linux-mtd? ;-)
I replied twice, once including all the CC's with the intention to
contribute to the general debate. And once to lede-dev, you and J Mo
intending to support J Mo creating board-support and figuring out how
to work with UBI in OpenWrt/LEDE which I assumed would be considered
noise for most readers of the other lists involved.
Daniel Golle
2016-08-11 12:31:27 UTC
Hi,
Hm, I just found another example. I don't know why this didn't turn up
in my searches yesterday since it's a perfect match with the EXACT
https://patchwork.ozlabs.org/patch/509468/
I think I'll go rip that patch out here in a bit, recompile my image,
and see what happens.
Yep, I just ripped out that patch, rebuilt, and the UBI is working
[ 3.781400] ubi0: attaching mtd11
[ 4.475744] ubi0: scanning is finished
[ 4.490924] ubi0 warning: print_rsvd_warning: cannot reserve enough PEBs
for bad PEB handling, reserved 5, need 40
[ 4.492040] ubi0: attached mtd11 (name "rootfs", size 64 MiB)
[ 4.500155] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976
bytes
[ 4.506033] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
4096
[ 4.519603] ubi0: good PEBs: 512, bad PEBs: 0, corrupted PEBs: 0
[ 4.526430] ubi0: user volume: 3, internal volumes: 1, max. volumes
count: 128
[ 4.532680] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image
sequence number: 1454555262
[ 4.539660] ubi0: available PEBs: 0, total reserved PEBs: 512, PEBs
reserved for bad PEB handling: 5
[ 4.549141] ubi0: background thread "ubi_bgt0d" started, PID 54
[ 4.558711] block ubiblock0_1: created from ubi0:1(rootfs)
[ 4.563771] hctosys: unable to open rtc device (rtc0)
[ 4.576690] VFS: Cannot open root device "ubi0:rootfs" or
unknown-block(31,11): error -2
[ 4.576718] Please append a correct "root=" boot option; here are the
[ 4.583956] 1f00 256 mtdblock0 (driver?)
[ 4.596076] 1f01 1280 mtdblock1 (driver?)
[ 4.601109] 1f02 1280 mtdblock2 (driver?)
[ 4.606144] 1f03 2560 mtdblock3 (driver?)
[ 4.611178] 1f04 1152 mtdblock4 (driver?)
[ 4.616214] 1f05 1152 mtdblock5 (driver?)
[ 4.621249] 1f06 2560 mtdblock6 (driver?)
[ 4.626283] 1f07 2560 mtdblock7 (driver?)
[ 4.631319] 1f08 5120 mtdblock8 (driver?)
[ 4.636352] 1f09 512 mtdblock9 (driver?)
[ 4.641387] 1f0a 512 mtdblock10 (driver?)
[ 4.646423] 1f0b 65536 mtdblock11 (driver?)
[ 4.651544] 1f0c 384 mtdblock12 (driver?)
[ 4.656666] 1f0d 5120 mtdblock13 (driver?)
[ 4.661786] 1f0e 65536 mtdblock14 (driver?)
[ 4.666909] fe00 2728 ubiblock0_1 (driver?)
[ 4.672103] Kernel panic - not syncing: VFS: Unable to mount root fs on
unknown-block(31,11)
My squashfs root isn't mounting but that's another patch/issue.
That's what I told you in the previous mail, removing the rootfs=
parameter from the dts should do the trick, because you just cannot
mount a ubi device (which is a character device in Linux) with a
block-based filesystem (like squashfs). This cannot and won't ever
work and you could either leave it to OpenWrt/LEDE's auto-probing to
figure out what to do based on the rootfs type (non-ubifs vs. ubifs)
or append even more board- and filesystem-specific crap to your cmdline
such as ubiblock=... root=/dev/ubiblock0_1 (however, that then won't
work for ubifs, thus the auto-probing patches).
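
The partition list in the panic output can be decoded to see which device the kernel actually tried to mount. A small sketch, assuming the first column is the usual two hex digits of major number followed by two hex digits of minor number (the helper name is made up for illustration):

```python
def decode_dev(code):
    """Split a 4-hex-digit device code such as '1f0b' into (major, minor)."""
    return int(code[:2], 16), int(code[2:], 16)

print(decode_dev("1f0b"))  # (31, 11)
print(decode_dev("fe00"))  # (254, 0)
```

So unknown-block(31,11) corresponds to the 1f0b entry, mtdblock11, which is the raw 64 MiB MTD partition, whereas the squashfs volume is exposed separately as ubiblock0_1 at (254, 0).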
So that 494-mtd-ubi-add-EOF-marker-support.patch has gotta go or get fixed.
I agree, however, once again, it depends on how you write the ubinized
image to the flash in the first place.
It's almost certainly been fking stuff up for a long time and just nobody
noticed before now because almost nobody has a kernel in their UBI. It
Not true. As I said, I'm using KERNEL_IN_UBI on all oxnas based targets
and also got U-Boot 2014.10 with UBI support touching the flash before
the kernel would fixup anything. Have a look at
target/linux/oxnas/image/Makefile for a 100% working example.
wasn't in OpenWRT AA/12.09, so it wasn't in the QSDK which my device is
based on.
Please read my previous email (I hope you actually received it?) for
more details.



Cheers


Daniel
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
J Mo
2016-08-11 13:15:32 UTC
Post by Daniel Golle
That's what I told you in the previous mail, removing the rootfs=
parameter from the dts should do the trick, because you just cannot
mount a ubi device (which is a character device in Linux) with a
block-based filesystem (like squashfs). This cannot and won't ever
work and you could either leave it to OpenWrt/LEDE's auto-probing to
figure out what to do based on the rootfs type (non-ubifs vs. ubifs)
or append even more board- and filesystem-specific crap to your cmdline
such as ubiblock=... root=/dev/ubiblock0_1 (however, that then won't
work for ubifs, thus the auto-probing patches).
... OH!

Well, I needed some extra intellectual clubbing to catch on.

NOW I remember reading the UBI docs, about gluebi, the fact that volumes
are char devices, and I even seem to remember some ALL CAPS red size-20+
text at the top of the page saying something about it.

Tomorrow I'll go read the docs again, because I know I remember reading
that you could put a RO-squashfs in a UBI volume. I just need to have
it mounted the right way.
Daniel Golle
2016-08-11 13:32:13 UTC
Hi J,
Post by J Mo
Post by Daniel Golle
That's what I told you in the previous mail, removing the rootfs=
parameter from the dts should do the trick, because you just cannot
mount a ubi device (which is a character device in Linux) with a
block-based filesystem (like squashfs). This cannot and won't ever
work and you could either leave it to OpenWrt/LEDE's auto-probing to
figure out what to do based on the rootfs type (non-ubifs vs. ubifs)
or append even more board- and filesystem-specific crap to your cmdline
such as ubiblock=... root=/dev/ubiblock0_1 (however, that then won't
work for ubifs, thus the auto-probing patches).
... OH!
Well, I needed some extra intellectual clubbing to catch on.
NOW I remember reading the UBI docs, about gluebi, the fact that volumes are
char devices, and I even seem to remember some ALL CAPS red size-20+ text at
the top of the page saying something about it.
Tomorrow I'll go read the docs again, because I know I remember reading that
you could put a RO-squashfs in a UBI volume. I just need to have it mounted
the right way.
Exactly. However, this makes mounting a UBIFS volume entirely different
from mounting a volume with any other (read-only) filesystem which
needs a ubiblock device (gluebi has been deprecated in favour of
ubiblock) to be created and subsequently mounted.
The idea of the auto-probing patches [1] was to keep things filesystem-
agnostic, ie. allow for either a single read-write UBIFS rootfs or any
read-only filesystem (e.g. squashfs) which needs ubiblock and have a
UBIFS read-write overlay on top.
In this way, all you have to take care of is *not* to have any rootfs=
or ubi* parameters in your kernel cmdline and all the rest should
happen automagically.


Cheers


Daniel
Heiko Schocher
2016-08-12 04:35:08 UTC
Hello Richard,
Hi!
Post by J Mo
I tried re-flashing my UBI and tftpbooting my kernel before u-boot could
ever get a chance to mangle it, and now I get much further, though I'm still
[ 3.772502] ubi0: attaching mtd11
[ 3.826477] UBI: EOF marker found, PEBs from 40 will be erased
WTF is this?
Reading the corresponding patch makes me very sad.
Post by J Mo
[ 3.826638] ubi0: scanning is finished
[ 3.872936] ubi0: volume 2 ("rootfs_data") re-sized from 9 to 430
LEBs
[ 3.873734] ubi0: attached mtd11 (name "rootfs", size 64 MiB)
[ 3.878347] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976
bytes
[ 3.884234] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size
2048
[ 3.890936] ubi0: VID header offset: 2048 (aligned 2048), data
offset: 4096
[ 3.897849] ubi0: good PEBs: 512, bad PEBs: 0, corrupted PEBs: 0
[ 3.904627] ubi0: user volume: 3, internal volumes: 1, max. volumes
count: 128
[ 3.910815] ubi0: max/mean erase counter: 1/0, WL threshold: 4096,
image sequence number: 2142265782
[ 3.917902] ubi0: available PEBs: 0, total reserved PEBs: 512, PEBs
reserved for bad PEB handling: 40
[ 3.927275] ubi0: background thread "ubi_bgt0d" started, PID 54
[ 3.937007] block ubiblock0_1: created from ubi0:1(rootfs)
[ 3.942096] hctosys: unable to open rtc device (rtc0)
[ 3.956528] VFS: Cannot open root device "ubi0:rootfs" or
unknown-block(31,11): error -2
[ 3.956556] Please append a correct "root=" boot option; here are the
Any advice on this? Any background information that I can read up on? My
google searches have not come up with much. Ram knew about this, but I don't
know if it's otherwise a known issue.
The process works fine on the OEM system, so I assume this is some ubinize
format change which is incompatible with the older u-boot. Or, the newer
kernel code doesn't know how to deal with the UBI once the older u-boot has
mangled/attached it.
Seems like a backwards incompatibility issue.
Since OpenWRT/LEDE folks did more or less a hard fork of UBI I'm
ignoring this issue.
Ufff.... thanks for this info!
If you encounter something like that using vanilla UBI I'm all ears.
That said, I kind of understand that you, OpenWRT/LEDE, have a pile of
patches for auto probing rootfs
and other runtime stuff but touching the UBI on-flash format is beyond funny.
Doing so opens a can of worms and is painful for all parties. There
are customers which build their
products using OpenWrt and when they change the kernel at some point
it will get nasty.
This situation needs to be improved now. I invite you to discuss these
changes here on linux-mtd.
Especially the stuff where you change the on-flash format.
If UBI, or MTD in general, can do a better job in some areas, please
tell us so that a decent solution can be found.
But your ad-hoc hacks need to stop.
Full Ack.

bye,
Heiko
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
J Mo
2016-08-12 05:13:42 UTC
It's all moot now! I accidentally wrecked my u-boot today.

I typoed "nand write ${fileaddr} ${BOOTCONFIG_nand_addr} ${0x800}"

I meant "0x800" instead of the undefined "${0x800}", which u-boot
translated to 0xacc0000.
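
One plausible reading of the typo, sketched in Python rather than in U-Boot's own shell: if an undefined variable expands to an empty string (an assumption about this U-Boot build's hush shell, not something verified here), the length argument simply disappears from the command. The addresses below are hypothetical placeholders; the real values of fileaddr and BOOTCONFIG_nand_addr are not given in the thread.

```python
import re

def expand(command, env):
    """Replace each ${name} with its value, or with '' if name is undefined."""
    return re.sub(r"\$\{([^}]*)\}", lambda m: env.get(m.group(1), ""), command)

# Hypothetical environment; these addresses are placeholders only.
env = {"fileaddr": "0x44000000", "BOOTCONFIG_nand_addr": "0x1340000"}

typed = "nand write ${fileaddr} ${BOOTCONFIG_nand_addr} ${0x800}"
meant = "nand write ${fileaddr} ${BOOTCONFIG_nand_addr} 0x800"

print(expand(typed, env).split())
# ['nand', 'write', '0x44000000', '0x1340000']
print(expand(meant, env).split())
# ['nand', 'write', '0x44000000', '0x1340000', '0x800']
```

With no length given, the command no longer limits the write to 0x800 bytes, which would fit the much larger 0xacc0000 figure reported.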

I guess I'm going to find out if that JTAG header works.
