Berkeley CSUA MOTD:Entry 10280
2024/11/23 [General] UID:1000 Activity:popular
11/23   

2003/9/22-24 [Computer/HW/Memory] UID:10280 Activity:nil
9/22    Any reason why an app compiled with -pg (gprof) would try to
        reference mem location 0xffffffff while the same code w/o the profile
        switch works fine?  Both compiled cleanly with -Wall & -pedantic.
        \_ Long shot, but I've got a machine with some bad memory at a high
           location.  If I run multiple memory hogging apps, the last one will
           always crash with ugly memory errors like that.  One day I'll get
           off my ass and replace the bad ram.
           \_ could you try this: http://www.memtest86.com  I'm curious if it works.
              \_ I already did.  It didn't.  I know the ram is bad because
                 crashes only started happening after I swapped some
                 old for new.
You may also be interested in these entries...
1999/11/20-22 [Computer/HW/CPU] UID:16930 Activity:moderate
11/19   Anyone have a link to a description of where the cache is on the
        celeron vs. where it is on the pentium ][? A friend of mine is
        trying to argue that the celeron's is not on the CPU, but is for the
        pentium. That's just wrong, but I want definitive proof to fuck him up.
        \_ I believe in early editions of the celeron, your friend is right.
                However, in later editions of the celeron, they added a
	...
Cache (8192 bytes)
www.memtest86.com -> www.memtest86.com/
CR Clear scroll lock Enables error message scrolling

Memory Sizing

The BIOS in modern PCs will often reserve several sections of memory for its use and also to communicate information to the operating system ie. It is just as important to test these reserved memory blocks as it is for the remainder of memory. For proper operation all of memory needs to function properly regardless of what the eventual use is. For this reason Memtest86 has been designed to test as much memory as is possible. However, safely and reliably detecting all of the available memory has been problematic.

Versions of Memtest86 prior to v29 would probe to find where memory is. This works for the vast majority of motherboards but is not 100% reliable. Sometimes the memory size detection is incorrect and, worse, probing the wrong places can in some cases cause the test to hang or crash.

Starting in version 29 alternative methods are available for determining memory size. By default the test attempts to get the memory size from the BIOS using the e820 method. With e820 the BIOS provides a table of memory segments and identifies what they will be used for. By default Memtest86 will test all of the RAM marked as available and also the area reserved for the ACPI tables. This is safe since the test does not use the ACPI tables and the e820 specifications state that this memory may be reused after the tables have been copied.

Two additional options are available through online configuration options. The first option, BIOS-All, also uses the e820 method to obtain a memory map. However, when this option is selected all of the reserved memory segments are tested, regardless of what their intended use is. Testing has shown that these segments are typically not safe to test. The BIOS-All option is more thorough but could be unstable with some motherboards. The third option for memory sizing is the traditional Probe method. In the majority of cases the BIOS-All and Probe methods will return the same memory map.
For older BIOSes that do not support the e820 method there are two additional methods, e801 and e88, for getting the memory size from the BIOS. These methods only provide the amount of extended memory that is available, not a memory table. When the e801 and e88 methods are used the BIOS-All option will not be available. The MemMap field on the display shows what memory size method is in use. Also the RsvdMem field shows how much memory is reserved and is not being tested.

Memtest is also able to create patterns used by the Linux BadRAM feature. An error message is only displayed for errors with a different address or failing bit pattern.

The test implicitly tests the CPU, L1 and L2 caches as well as the motherboard. It is impossible for the test to determine what causes the failure to occur. However, most failures will be due to a problem with a memory module. When it is not, the only option is to replace parts until the failure is corrected.

Once a memory error has been detected, determining the failing SIMM/DIMM module is not a clear cut procedure. With the large number of motherboard vendors and possible combinations of memory slots it would be difficult if not impossible to assemble complete information about how a particular error would map to a failing memory module. However, there are steps that may be taken to determine the failing module. Here are four techniques that you may wish to use:

1) Removing modules: This is the simplest method for isolating failing modules, but may only be employed when one or more modules can be removed from the system. By selectively removing modules from the system and then running the test you will be able to find the bad modules. Be sure to note exactly which modules are in the system when the test passes and when the test fails. In these situations the components are not necessarily bad but have marginal conditions that when combined with other components will cause errors.
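
Returning to the reporting rule mentioned earlier (a message appears only for an error with a different address or failing bit pattern), here is a minimal sketch of that kind of de-duplication. It assumes that "failing bit pattern" means the XOR of the expected and actual values and that uniqueness is judged on the (address, failing bits) pair; the text does not show how Memtest86 actually does it, and the function name and message format are made up.

```python
def report_errors(observations):
    """Return one message per distinct (address, failing bits) pair.

    observations: iterable of (address, expected, actual) integer tuples.
    Repeats of an already reported pair are silently skipped.
    """
    seen = set()
    messages = []
    for address, expected, actual in observations:
        failing_bits = expected ^ actual  # bits that differ
        if failing_bits == 0:
            continue  # value read back correctly, not an error
        key = (address, failing_bits)
        if key in seen:
            continue  # same address and same bad bits: already reported
        seen.add(key)
        messages.append(f"addr={address:#010x} bad_bits={failing_bits:#010x}")
    return messages


if __name__ == "__main__":
    sample = [
        (0x03E00010, 0xFFFFFFFF, 0xFFFFFFFB),  # bit 2 stuck low
        (0x03E00010, 0xFFFFFFFF, 0xFFFFFFFB),  # exact repeat, suppressed
        (0x03E00010, 0x00000000, 0x00000001),  # same address, new bad bit
        (0x03E00014, 0xFFFFFFFF, 0xFFFFFFFF),  # no error
    ]
    for line in report_errors(sample):
        print(line)
```

Keying on the pair rather than on the address alone keeps a second, different fault at the same address visible.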
There have been numerous reports of errors with only tests 5 and 8 on Athlon systems. Often the memory works in a different system or the vendor insists that it is good. In these cases the memory is not necessarily bad but is not able to operate reliably at Athlon speeds. Sometimes more conservative memory timings on the motherboard will correct these errors. In other cases the only option is to replace the memory with better quality, higher speed memory. On occasion test 5/8 errors will occur even with name brand memory and a quality motherboard.

I am often asked about the reliability of errors reported by Memtest86. In the vast majority of cases errors reported by the test are valid. There are some systems that cause Memtest86 to be confused about the size of memory and it will try to test non-existent memory. This will cause a large number of consecutive addresses to be reported as bad and generally there will be many bits in error. If you have a relatively small number of failing addresses and only one or two bits in error you can be certain that the errors are valid.

Frequently memory vendors question if Memtest86 supports their particular memory type or a chipset. Memtest86 is designed to work with all memory types and all chipsets.

It is possible that a particular error will never show up in normal operation. However, operating with marginal memory is risky and can result in data loss and even disk corruption. Even if there is no overt indication of problems you cannot assume that your system is unaffected. Sometimes intermittent errors can cause problems that do not show up for a long time. You can be sure that Murphy will get you if you know about a memory error and ignore it. For example a faulty CPU that causes Windows to crash will most likely just cause Memtest86 to crash in the same way.

Execution Time

The time required for a complete pass of Memtest86 will vary greatly depending on CPU speed, memory speed and memory size.
Here are the execution times from a Pentium-II-366 with 64 MB of RAM:

    Test 0                    0:05
    Test 1                    0:18
    Test 2                    1:02
    Test 3                    1:38
    Test 4                    8:05
    Test 5                    1:40
    Test 6                    4:24
    Test 7                    6:04
    Total (default tests)    23:16
    Test 8                   12:30
    Test 9                   49:30
    Test 10                  30:34
    Test 11                3:29:40
    Total (all tests)      5:25:30

Memtest86 continues executing indefinitely. The pass counter increments each time that all of the selected tests have been run. Generally a single pass is sufficient to catch all but the most obscure errors. However, for complete confidence when intermittent errors are suspected, testing for a longer period is advised.

Memory Testing Philosophy

There are many good approaches for testing memory. However, many tests simply throw some patterns at memory without much thought or knowledge of the memory architecture or how errors can best be detected. This works fine for hard memory failures but does little to find intermittent errors. BIOS based memory tests are useless for finding intermittent memory errors.

Memory chips consist of a large array of tightly packed memory cells, one for each bit of data. The vast majority of the intermittent failures are a result of interaction between these memory cells. Often writing a memory cell can cause one of the adjacent cells to be written with the same data. In addition there is a never-ending number of possible chip layouts for different chip types and manufacturers, making this strategy impractical. However, there are testing algorithms that can approximate this ideal.

Memtest86 Test Algorithms

Memtest86 uses two algorithms that provide a reasonable approximation of the ideal test strategy above. With chips that are more than one bit wide it is impossible to selectively read or write just one bit. This means that we cannot guarantee that all adjacent cells have been tested for interaction. In this case the best we can do is to use some patterns to ensure that all adjacent cells have at least been written with all possible one and zero combinations.
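
The description of the algorithms did not survive in this cached copy, so what follows is a sketch of moving inversions as it is commonly described (fill with a pattern, sweep upward checking and writing the complement, then sweep downward checking and restoring the pattern), run against a simulated memory. The CoupledMemory class, its fault model (a write to one cell also lands in a neighbour, as in the philosophy section above) and all names are assumptions for illustration; this is not Memtest86's implementation.

```python
class CoupledMemory:
    """Simulated RAM in which a write to `aggressor` also overwrites `victim`."""

    def __init__(self, size, aggressor=None, victim=None):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def __len__(self):
        return len(self.cells)

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value
        if addr == self.aggressor:
            self.cells[self.victim] = value  # simulated cell interaction


def moving_inversions(mem, pattern, width=8):
    """Run one moving-inversions pass; return (addr, expected, got) errors."""
    mask = (1 << width) - 1
    pattern &= mask
    inverse = ~pattern & mask
    errors = []
    size = len(mem)

    # Step 1: fill all of memory with the pattern.
    for addr in range(size):
        mem.write(addr, pattern)

    # Step 2: from the lowest address up, check the pattern, write its complement.
    for addr in range(size):
        got = mem.read(addr)
        if got != pattern:
            errors.append((addr, pattern, got))
        mem.write(addr, inverse)

    # Step 3: from the highest address down, check the complement, restore the pattern.
    for addr in reversed(range(size)):
        got = mem.read(addr)
        if got != inverse:
            errors.append((addr, inverse, got))
        mem.write(addr, pattern)

    return errors


if __name__ == "__main__":
    print(moving_inversions(CoupledMemory(8), 0x55))
    print(moving_inversions(CoupledMemory(8, aggressor=3, victim=4), 0x55))
    print(moving_inversions(CoupledMemory(8, aggressor=4, victim=3), 0x55))
```

In this simulation a victim above its aggressor is caught on the upward sweep, while a victim below its aggressor is only caught on the downward sweep, which is why the pass runs in both directions.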
It can also be seen that caching, buffering and out of order execution will interfere with the moving inversions algorithm and make it less effective. It is possible to turn off cache but the memory buffering in new high performance chips cannot be disabled. To address this limitation a new algo...