Discussion:
[U-Boot] OMAP3 NAND ECC bug report
Arno Steffen
2011-10-20 09:06:54 UTC
Permalink
I did tests with OMAP3 uboot. The SW-ECC (testet 1 bit, 4 bit BCH)
doesn't correct errors in environment (during power-up).
Compiling uboot for default HW-ECC - correction works fine.
Testet with TI's PSP 4.02.00.07 (almost like arago latest version).

I modified single bits by adding a patch (thanks Scott Wood) to uboot,
that allows write in raw mode to flash. Single bit error can corrected
with HW-ECC, not with SW-ECC.
This report is only about uboot environment, which is read in startup.
I didn't test it, but from my experience the correction works fine,
beside the initial reading of the environment, as for reading and
starting the kernel.

Imho this is very critical, as reliability is suffered if a device
isn't booting.

Best regards
Arno

PS:
The link below refers to a previous discussion.
http://lists.denx.de/pipermail/u-boot/2011-September/100486.html

in omap_gpmc.c I used this in board_nand_init()

...
nand->chip_delay = 100;
nand->ecc.mode = NAND_ECC_4BIT_SOFT;
nand_curr_device = 0;
omap_nand_switch_ecc(NAND_ECC_4BIT_SOFT, 1);
return 0;
}
Arno Steffen
2011-10-21 09:28:35 UTC
Permalink
I think reason that SW_ECC is not working, is in omap_gpmc.c:

omap_nand_switch_ecc():

case NAND_ECC_SOFT:
nand->ecc.mode = NAND_ECC_SOFT;
/* Use mtd default settings */
nand->ecc.layout = NULL;
printf("SW ECC selected\n");

The ecc struct is not setup?!? Someone has an idea?

@TI stuff: if you are not the right person to address this, can you
please forward?
Thanks

- Arno
Post by Arno Steffen
I did tests with OMAP3 uboot. The SW-ECC (testet 1 bit, 4 bit BCH)
doesn't correct errors in environment (during power-up).
Compiling uboot for default HW-ECC - correction works fine.
Testet with TI's PSP 4.02.00.07 (almost like arago latest version).
I modified single bits by adding a patch (thanks Scott Wood) to uboot,
that allows write in raw mode to flash. Single bit error can corrected
with HW-ECC, not with SW-ECC.
This report is only about uboot environment, which is read in startup.
I didn't test it, but from my experience the correction works fine,
beside the initial reading of the environment, as for reading and
starting the kernel.
Imho this is very critical, as reliability is suffered if a device
isn't booting.
Best regards
Arno
The link below refers to a previous discussion.
http://lists.denx.de/pipermail/u-boot/2011-September/100486.html
in omap_gpmc.c I used this in board_nand_init()
...
? ? ? ?nand->chip_delay = 100;
? ? ? ?nand->ecc.mode = NAND_ECC_4BIT_SOFT;
? ? ? ?nand_curr_device = 0;
? ? ? ?omap_nand_switch_ecc(NAND_ECC_4BIT_SOFT, 1);
? ? ? ?return 0;
}
Arno Steffen
2011-10-25 07:41:06 UTC
Permalink
I am feeling to spam the board but found finally the reason for this behaviour.

nand_read returns with -EUCLEAN in case of correcting errors, and this
will later on reported as BAD NAND, although this error is corrected.

Correct me if I am wrong. What I did is chaning env_nand.c:
while (amount_loaded < CONFIG_ENV_SIZE && offset < end) {
if (nand_block_isbad(&nand_info[0], offset)) {
offset += blocksize;
} else {
char_ptr = &buf[amount_loaded];
- if (nand_read(&nand_info[0], offset, &len, char_ptr)) {
- return 1;
+ err = nand_read(&nand_info[0], offset, &len, char_ptr);
+ if (err) {
+ if (err != -EUCLEAN) { // Bad NAND has been corrected, so no problem
+ return 1;
+ }
}
offset += blocksize;
amount_loaded += len;
}


Correct me if I am wrong, but it seems that all platform are suffered,
not just OMAP.
There will be compared a corrected bits before and after correction,
and it send an error message, if bits are corrected (instead of only
if it can not be corrected).

Best regards
Arno
? ? ? ? ? ? ? ?nand->ecc.mode = NAND_ECC_SOFT;
? ? ? ? ? ? ? ?/* Use mtd default settings */
? ? ? ? ? ? ? ?nand->ecc.layout = NULL;
? ? ? ? ? ? ? ?printf("SW ECC selected\n");
The ecc struct is not setup?!? Someone has an idea?
@TI stuff: if you are not the right person to address this, can you
please forward?
Thanks
?- Arno
Post by Arno Steffen
I did tests with OMAP3 uboot. The SW-ECC (testet 1 bit, 4 bit BCH)
doesn't correct errors in environment (during power-up).
Compiling uboot for default HW-ECC - correction works fine.
Testet with TI's PSP 4.02.00.07 (almost like arago latest version).
I modified single bits by adding a patch (thanks Scott Wood) to uboot,
that allows write in raw mode to flash. Single bit error can corrected
with HW-ECC, not with SW-ECC.
This report is only about uboot environment, which is read in startup.
I didn't test it, but from my experience the correction works fine,
beside the initial reading of the environment, as for reading and
starting the kernel.
Imho this is very critical, as reliability is suffered if a device
isn't booting.
Best regards
Arno
The link below refers to a previous discussion.
http://lists.denx.de/pipermail/u-boot/2011-September/100486.html
in omap_gpmc.c I used this in board_nand_init()
...
? ? ? ?nand->chip_delay = 100;
? ? ? ?nand->ecc.mode = NAND_ECC_4BIT_SOFT;
? ? ? ?nand_curr_device = 0;
? ? ? ?omap_nand_switch_ecc(NAND_ECC_4BIT_SOFT, 1);
? ? ? ?return 0;
}
Scott Wood
2011-10-26 20:26:55 UTC
Permalink
Post by Arno Steffen
I am feeling to spam the board but found finally the reason for this behaviour.
nand_read returns with -EUCLEAN in case of correcting errors, and this
will later on reported as BAD NAND, although this error is corrected.
while (amount_loaded < CONFIG_ENV_SIZE && offset < end) {
if (nand_block_isbad(&nand_info[0], offset)) {
offset += blocksize;
} else {
char_ptr = &buf[amount_loaded];
- if (nand_read(&nand_info[0], offset, &len, char_ptr)) {
- return 1;
+ err = nand_read(&nand_info[0], offset, &len, char_ptr);
+ if (err) {
+ if (err != -EUCLEAN) { // Bad NAND has been corrected, so no problem
+ return 1;
+ }
}
offset += blocksize;
amount_loaded += len;
}
Correct me if I am wrong, but it seems that all platform are suffered,
not just OMAP.
There will be compared a corrected bits before and after correction,
and it send an error message, if bits are corrected (instead of only
if it can not be corrected).
What version of U-Boot are you looking at?

In readenv(), I see nand_read_skip_bad(), not nand_read().
nand_read_skip_bad() handles -EUCLEAN.

-Scott
Arno Steffen
2011-10-28 06:10:35 UTC
Permalink
I have been used u-boot supportet (or better say not suppported) by TI
in their latest PSP/SDK (Juli 2011).
Base is 2010.06. But this issue was also inside older versions.

- Arno
Scott Wood
2011-10-28 16:06:09 UTC
Permalink
Post by Arno Steffen
I have been used u-boot supportet (or better say not suppported) by TI
in their latest PSP/SDK (Juli 2011).
Base is 2010.06. But this issue was also inside older versions.
It's not older versions I'm concerned about, but newer ones. This bug
has been fixed in mainline U-Boot, about a year ago. We don't have a
time machine to put the bug fix in older versions. :-)

-Scott
Arno Steffen
2011-10-31 09:36:33 UTC
Permalink
Sure, Scott, not blame on you. Just frustating, doing bughunting for
issues that have been fixed long time ago. I would expect a bit more
from TI in this case. My fault obviously.
Ok, finally I got it, that's what matters.
- Arno
Post by Arno Steffen
I have been used u-boot supportet (or better say not suppported) by TI
in their latest PSP/SDK (Juli 2011).
Base is 2010.06. But this issue was also inside older versions.
It's not older versions I'm concerned about, but newer ones. ?This bug
has been fixed in mainline U-Boot, about a year ago. ?We don't have a
time machine to put the bug fix in older versions. :-)
-Scott
Ivan Nardi
2011-10-27 15:22:43 UTC
Permalink
Post by Arno Steffen
I modified single bits by adding a patch (thanks Scott Wood) to uboot,
that allows write in raw mode to flash. Single bit error can corrected
with HW-ECC, not with SW-ECC.
Hi all,

I would be very interested in using that patch (it is very useful for
testing ecc algorithms)
In u-boot the command "nand biterr" isn't implemented yet and some
code found in the mailing list is pretty old and never merged (why?)
into mainline (http://www.mail-archive.com/u-boot at lists.denx.de/msg22963.html)

Could you share it?
Thanks a lot

Ivan
Scott Wood
2011-10-27 19:02:56 UTC
Permalink
Post by Ivan Nardi
Post by Arno Steffen
I modified single bits by adding a patch (thanks Scott Wood) to uboot,
that allows write in raw mode to flash. Single bit error can corrected
with HW-ECC, not with SW-ECC.
Hi all,
I would be very interested in using that patch (it is very useful for
testing ecc algorithms)
In u-boot the command "nand biterr" isn't implemented yet and some
code found in the mailing list is pretty old and never merged (why?)
into mainline (http://www.mail-archive.com/u-boot at lists.denx.de/msg22963.html)
Could you share it?
Thanks a lot
raw read/write was recently added to mainline U-Boot:

commit fb3659ac422801ea18f36ef62926df70beeada4c
Author: Marek Vasut <marek.vasut at gmail.com>
Date: Fri Sep 23 15:43:10 2011 +0200

NAND: Add nand read.raw and write.raw commands

These commands should work around various "hardware" ECC and BCH
methods.

Signed-off-by: Marek Vasut <marek.vasut at gmail.com>
Cc: Scott Wood <scottwood at freescale.com>
Cc: Stefano Babic <sbabic at denx.de>
Cc: Wolfgang Denk <wd at denx.de>
Cc: Detlev Zundel <dzu at denx.de>
[scottwood at freescale.com: s/write the page/access the page/]
Signed-off-by: Scott Wood <scottwood at freescale.com>

As for why the biterr patch wasn't merged, it's because nobody ever
resubmitted with a version that fixed the issues raised in review and
applies to the current tree.

-Scott
Loading...