Digital Controller IC Minimizes Field Returns Using “Black Box” Tool And Online Diagnostics
Power supply companies can borrow a concept from the aircraft industry: a “Black Box” that monitors operation and stores the data for review if there is a failure. This concept aids failure analysis of field returns, which can be costly in both time and money for companies and their customers. The added time pressure to diagnose the fault and deliver a comprehensive failure analysis can further strain the vendor-customer relationship. Having the proper diagnostic tools to quickly debug and resolve the issue can mean the difference between success and failure of the product. You can configure the ADP1055, an advanced PMBus digital controller IC for isolated power supply systems, to provide the in-circuit Black Box.
With online diagnostics and an in-circuit Black Box, problems can be mitigated, and the lessons learned can lead to more robust design practice and better system knowledge in the long run. The in-circuit Black Box is a data recorder that captures all the relevant and critical information about the power supply prior to a critical event or interrupt. Besides power supplies, you can easily apply this concept to other systems.
Black Box Operation
The Black Box feature of the ADP1055 can record to its EEPROM vital data about the faults that cause the system to shut down. The Black Box diagnostics tool can be considered in two parts. First, the ‘First Flag ID’ feature records the first fault to occur, such as overcurrent, overvoltage, or overtemperature. Second, when the controller encounters such a fault, a snapshot of the telemetry is captured (Fig. 1). This information is saved to the embedded non-volatile EEPROM, where it can later be retrieved for debugging purposes. In the presence of multiple faults, the First Flag ID that caused the system to shut down is captured into the Black Box, along with all the telemetry information.
Because a digitally controlled power supply measures several parameters, the ADP1055 uses dedicated (not multiplexed) sigma-delta ADCs, averaged over time, for each measurement such as voltage, current, and temperature. To ensure that accurate data is captured, the measured quantities are recorded into the Black Box at the moment of shutdown.
This Black Box feature is extremely helpful in troubleshooting a failed system during testing and evaluation. If a system is recalled for failure analysis, it is possible to read this information from the EEPROM to help investigate the root cause of the failure.
Several options are available for recording to the Black Box:
· No recording, Black Box disabled
· Only record telemetry just before the final shutdown
· Record telemetry of final shutdown and all intermittent retry attempts (if device is set to shut down and retry)
· Record telemetry of final shutdown, all retry attempts, and normal power-down operations using the CTRL pin or the OPERATION command (as described by PMBus™)
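The four recording options can be summarized as a small lookup from mode to the events that trigger a record. The sketch below is illustrative only: the enum names and the `should_record` helper are hypothetical and do not correspond to ADP1055 register names or values.

```python
from enum import Enum, auto


class RecordMode(Enum):
    """The four recording options listed above (hypothetical names)."""
    DISABLED = auto()
    FINAL_SHUTDOWN_ONLY = auto()
    FINAL_AND_RETRIES = auto()
    FINAL_RETRIES_AND_POWER_DOWN = auto()


class Event(Enum):
    """Events that may trigger a Black Box record (hypothetical names)."""
    FINAL_SHUTDOWN = auto()
    RETRY_ATTEMPT = auto()
    NORMAL_POWER_DOWN = auto()  # CTRL pin or OPERATION command


# Events recorded in each mode, following the option list in the text.
_RECORDED_EVENTS = {
    RecordMode.DISABLED: frozenset(),
    RecordMode.FINAL_SHUTDOWN_ONLY: frozenset({Event.FINAL_SHUTDOWN}),
    RecordMode.FINAL_AND_RETRIES: frozenset(
        {Event.FINAL_SHUTDOWN, Event.RETRY_ATTEMPT}),
    RecordMode.FINAL_RETRIES_AND_POWER_DOWN: frozenset(
        {Event.FINAL_SHUTDOWN, Event.RETRY_ATTEMPT, Event.NORMAL_POWER_DOWN}),
}


def should_record(mode: RecordMode, event: Event) -> bool:
    """Return True if the given event is recorded in the given mode."""
    return event in _RECORDED_EVENTS[mode]
```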
Black Box Contents
Two pages (PageA, PageB) of the EEPROM are dedicated to storing the Black Box contents. This allows for a total of 16 records (each page comprises eight records of 64 bytes each). The two pages form a circular buffer for recording Black Box information, so older data is overwritten as recording wraps around after every 16 records.
The EEPROM is a page-erase memory, meaning an entire page must be erased before the page can be written to. Due to the page erase requirement of the EEPROM, after writing the eighth record of any page, the next page is automatically erased to allow for continuous Black Box recording.
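The two-page circular buffer can be pictured as a mapping from a running record number to a page and a slot. The following is a minimal sketch, assuming that record numbering starts at 0 in the first slot of PageA and that records are packed back to back; the function name and the byte-offset calculation are illustrative, not taken from the device documentation.

```python
RECORDS_PER_PAGE = 8           # eight records per page (from the text)
RECORD_SIZE_BYTES = 64         # 64 bytes per record (from the text)
PAGES = ("PageA", "PageB")     # the two dedicated EEPROM pages


def locate_record(record_number: int) -> tuple:
    """Map a running record number to (page name, slot, byte offset in page).

    Assumes record 0 lands in slot 0 of PageA and the buffer wraps
    around every 16 records.
    """
    if record_number < 0:
        raise ValueError("record number must be non-negative")
    page_index = (record_number // RECORDS_PER_PAGE) % len(PAGES)
    slot = record_number % RECORDS_PER_PAGE
    return PAGES[page_index], slot, slot * RECORD_SIZE_BYTES
```

Under these assumptions, record 12 falls in slot 4 of PageB, and record 16 wraps back to slot 0 of PageA.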
Each time a record is written in the Black Box, the device increments the record number. Each Black Box write records the PMBus and manufacturer specific registers listed in Fig. 1 and Fig. 2.
A single Black Box record takes about 1.2 ms to program. However, there is an added page-erase time that must be taken into consideration to ensure that the fault recording occurs successfully. In the ADP1055 device (see sidebar, “About The ADP1055”), eight records can be written per page, so whenever the record number is of the form 8n − 1 (n > 0), that is, whenever the last record of a page is written, a page erase operation is initiated on the other page. The erase operation takes an additional 32 ms to complete. Hence every (8n − 1)th write requires a page erase as well, which brings the total recording time to 33.2 ms (1.2 ms + 32 ms). The minimum delay time between each shutdown and retry cycle should therefore be greater than the Black Box programming time, which is 1.2 ms for a normal write and 33.2 ms in the worst-case scenario.
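The timing rule above reduces to a few lines of arithmetic. This is a sketch using the 1.2 ms and 32 ms figures from the text; the function names are hypothetical, and record numbers are assumed to start at 0.

```python
RECORD_WRITE_MS = 1.2    # time to program one record (from the text)
PAGE_ERASE_MS = 32.0     # additional page-erase time (from the text)
RECORDS_PER_PAGE = 8


def needs_page_erase(record_number: int) -> bool:
    """True when the record number has the form 8n - 1 (n > 0): 7, 15, 23, ..."""
    return (record_number + 1) % RECORDS_PER_PAGE == 0


def recording_time_ms(record_number: int) -> float:
    """Total time to record this entry, including any page erase it triggers."""
    total = RECORD_WRITE_MS
    if needs_page_erase(record_number):
        total += PAGE_ERASE_MS
    return total


def retry_delay_is_safe(retry_delay_ms: float) -> bool:
    """Check a shutdown-to-retry delay against the worst-case recording time."""
    return retry_delay_ms > RECORD_WRITE_MS + PAGE_ERASE_MS
```

Checking the retry delay against the worst case, rather than the typical 1.2 ms, keeps every eighth recording from being cut short.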
Fig. 3 shows the timing of the write operation. Another consideration in successful Black Box recording is the loss of the supply voltage, VDD, to the IC. The ADP1055 requires a constant VDD of 3.3 V for normal operation and Black Box operation. Typically, in an isolated DC-DC converter, an auxiliary or always-on supply provides the power to the controller. In other situations, a holdup capacitor on the VDD pin can be used to maintain the voltage above the UVLO threshold.
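As a rough first-order estimate, a holdup capacitor can be sized with the constant-current discharge approximation C = I · t / ΔV, where t is the time the recording needs and ΔV is the allowed droop from the nominal VDD to the UVLO threshold. The sketch below is an assumption-laden illustration: the supply current and the UVLO threshold are not given in this article and must come from the datasheet, and the function name is hypothetical.

```python
def holdup_capacitance_farads(load_current_a: float,
                              holdup_time_s: float,
                              v_start: float,
                              v_min: float) -> float:
    """Estimate the capacitance needed to keep VDD above v_min.

    Uses the constant-current approximation C = I * t / (v_start - v_min).
    All four inputs are design-specific; none are taken from the article
    except the holdup time one would typically choose (the worst-case
    33.2 ms recording time).
    """
    if v_start <= v_min:
        raise ValueError("v_start must be greater than v_min")
    if load_current_a < 0 or holdup_time_s < 0:
        raise ValueError("current and time must be non-negative")
    return load_current_a * holdup_time_s / (v_start - v_min)
```

For example, `holdup_capacitance_farads(i_dd, 33.2e-3, 3.3, v_uvlo)` gives the capacitance for the worst-case recording time, with `i_dd` and `v_uvlo` as placeholders for the datasheet values.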
Black Box Readback
Two dedicated manufacturer-specific commands can be used to read back the contents of the Black Box data stored in the EEPROM. The READ_BLACKBOX_CURR command is a block-read command that returns the current record N (last record saved) with all related data, as defined in the “Black Box Contents” section. The READ_BLACKBOX_PREV command is a block-read command that returns the data for the previous record N−1 (next-to-last record saved). Because these commands are block-read commands, the first byte received is called the BYTE_COUNT and indicates to the PMBus master how many more bytes to read.
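On the host side, handling a block-read response comes down to honoring the leading BYTE_COUNT. The following sketch assumes the raw bytes have already been fetched from the bus by whatever PMBus master is in use; the function name is hypothetical, and no command codes are shown because the article does not list them.

```python
def parse_block_read(raw: bytes) -> bytes:
    """Extract the payload of a block-read response.

    The first byte is BYTE_COUNT; exactly that many data bytes follow.
    Any bytes after the payload (for example a bus-level PEC byte) are
    ignored by this sketch.
    """
    if len(raw) == 0:
        raise ValueError("empty response")
    byte_count = raw[0]
    payload = raw[1:1 + byte_count]
    if len(payload) != byte_count:
        raise ValueError("response shorter than BYTE_COUNT indicates")
    return payload
```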
It is recommended to use the ADP1055 GUI for viewing Black Box data, as it displays the entire Black Box contents in an easy-to-read, user accessible format.
The Black Box feature in the ADP1055 uses packet error checking (PEC) to ensure data validity. A PEC byte at the end of each Black Box record is specific to each record and is calculated using a Cyclic Redundancy Check (CRC) 8 polynomial. In a write to EEPROM, the PEC byte is appended to the data and is the last valid byte of that record. In a read from EEPROM, the header block of each record is used to calculate an expected PEC code, and this internally calculated PEC code is compared to the received PEC byte. If the comparison fails, the PEC_ERR bit in the STATUS_CML register is set, and that record is discarded because the validity of the data has been compromised.
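A bitwise CRC-8 shows how such a PEC byte can be computed and checked. The article says only that a CRC-8 polynomial is used; this sketch assumes the standard SMBus PEC polynomial x^8 + x^2 + x + 1 (0x07) with a zero initial value, which may differ from what the device uses internally for Black Box records. The function names are illustrative.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first (SMBus PEC parameters by default)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pec_matches(header_block: bytes, received_pec: int) -> bool:
    """Compare the PEC calculated over the header block with the stored byte."""
    return crc8(header_block) == received_pec
```

With these parameters, the conventional check string `b"123456789"` yields 0xF4, which is a quick way to confirm the routine.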
Data Detection and Recovery
The Black Box algorithm relies on having sufficient time to save the Black Box data and/or perform a page erase operation to prepare the Black Box for recording. If VDD collapses before the minimum programming time has elapsed, the data in the EEPROM may be corrupted. Likewise, if VDD collapses during an EEPROM erase operation, the data inside the Black Box may be corrupted.
In such a case, the Black Box algorithm can detect that data corruption has occurred, and it attempts to take corrective action to resume proper Black Box recording. Note that the Black Box does not attempt to correct the corrupted data; instead, it disregards the corrupted record and resumes recording at a different record. The description below details this scenario.
During VDD power-up, the Header Block of every record in the two pages of the EEPROM is read to determine whether the record is valid. A record is valid if it passes the following tests:
1. The Header Block and the PEC bytes must not be all 1’s, as that is the initial data of each record following a page erase.
2. The calculated PEC code (using the data from the Header Block) must match the received PEC byte.
3. The record number must fall within the valid record range, that is, it must be greater than the current record number and less than the Maximum Record Number.
If a record fails any of the above tests, that record is considered invalid and is discarded. If the record passes all tests, then the pointer to the last valid record number is updated. The two scenarios below show how a VDD collapse can corrupt data and how the Black Box recovers.
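The three validity tests can be sketched as a single predicate. This is an illustration under stated assumptions, not the device's internal logic: the PEC routine is passed in as `calc_pec` because the exact polynomial is not specified here, erased bytes are assumed to read as 0xFF, and the parameter names are hypothetical.

```python
from typing import Callable


def is_valid_record(header_block: bytes,
                    pec_byte: int,
                    record_number: int,
                    current_record_number: int,
                    max_record_number: int,
                    calc_pec: Callable[[bytes], int]) -> bool:
    """Apply the three power-up validity tests to one Black Box record."""
    # Test 1: header and PEC must not be all 1s (the erased state).
    erased = all(b == 0xFF for b in header_block) and pec_byte == 0xFF
    if erased:
        return False
    # Test 2: the PEC calculated from the header must match the stored byte.
    if calc_pec(header_block) != pec_byte:
        return False
    # Test 3: the record number must lie within the valid range.
    return current_record_number < record_number < max_record_number
```

Ordering the tests this way lets an erased slot be rejected before any PEC calculation is attempted.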
Scenario 1
In the scenario where VDD collapses before a record is completed, the PEC byte (which is the last byte written) will most likely be corrupted. During the scanning process on VDD power-up, this record will fail test 2 and will thus be discarded.
During the scanning process on power-up, if a record on PageA fails test 2, then PageB will be erased and the next record pointer will be the first record of PageB. For example,
PageA has:
0. Valid Rec_No_0
1. Valid Rec_No_1
2. Valid Rec_No_2
3. Valid Rec_No_3
4. Valid Rec_No_4
5. Valid Rec_No_5
6. Valid Rec_No_6
7. Valid Rec_No_7
PageB has:
8. Valid Rec_No_8
9. Valid Rec_No_9
10. Valid Rec_No_10
11. Valid Rec_No_11
12. Invalid Rec_No_12 (corrupted due to loss of VDD; will fail test 2 on power-up)
13. Empty
14. Empty
15. Empty
At the end of the scanning process,
· The last valid record is Rec_No_11 of PageB,
· Invalid Rec_No_12 of PageB is discarded, and the PEC_ERR is set
· PageA will be erased, and the next record for storing is Rec_No_16 of PageA
· READ_BLACKBOX_CURR returns Rec_No_11
· READ_BLACKBOX_PREV returns Rec_No_10
Note that Rec_No_12 through Rec_No_15 of PageB are lost, but this loss is acceptable in order to resume proper Black Box recording.
Scenario 2
In the scenario where VDD collapses before a page-erase is completed, you have the potential of data corruption on the entire page that is being erased. During the scanning process on VDD power-up, all the records of this page may fail test 2 and may also fail test 3, in which case the records will be discarded.
PageA has:
0. Valid Rec_No_0
1. Valid Rec_No_1
2. Valid Rec_No_2
3. Valid Rec_No_3
4. Valid Rec_No_4
5. Valid Rec_No_5
6. Valid Rec_No_6
7. Valid Rec_No_7 (Black Box recording was successful, however, the page-erase was corrupted due to loss of VDD)
PageB has:
8. Corrupted due to loss of VDD during page-erase following Rec_No_7
9. Corrupted due to loss of VDD during page-erase following Rec_No_7
10. Corrupted due to loss of VDD during page-erase following Rec_No_7
11. Corrupted due to loss of VDD during page-erase following Rec_No_7
12. Corrupted due to loss of VDD during page-erase following Rec_No_7
13. Corrupted due to loss of VDD during page-erase following Rec_No_7
14. Corrupted due to loss of VDD during page-erase following Rec_No_7
15. Corrupted due to loss of VDD during page-erase following Rec_No_7
At the end of the scanning process,
· The last valid record is Rec_No_7 of PageA
· Invalid Rec_No_8 through Rec_No_15 of PageB are discarded, and the PEC_ERR is set
· PageB will be erased, and the next record for storing is Rec_No_8 of PageB
· READ_BLACKBOX_CURR returns Rec_No_7
· READ_BLACKBOX_PREV returns Rec_No_6
Note that in this scenario, there is no loss of records, as the incomplete PageB erase operation was re-started on power-up.
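Both scenarios are consistent with one simple recovery rule: find the last valid record, erase the page that follows the page holding it, and resume at the first slot of the erased page. The sketch below encodes that rule as inferred from the two examples; it is not a documented algorithm, the function name is hypothetical, and it assumes the record just before the last valid one is also valid.

```python
RECORDS_PER_PAGE = 8
PAGES = ("PageA", "PageB")


def recover_after_scan(valid_record_numbers: list) -> dict:
    """Derive the post-scan state from the record numbers that passed all tests."""
    if not valid_record_numbers:
        raise ValueError("no valid records found")
    last_valid = max(valid_record_numbers)
    # Resume at the first record of the page after the one holding last_valid.
    next_record = (last_valid // RECORDS_PER_PAGE + 1) * RECORDS_PER_PAGE
    page_to_erase = PAGES[(next_record // RECORDS_PER_PAGE) % len(PAGES)]
    return {
        "READ_BLACKBOX_CURR": last_valid,
        "READ_BLACKBOX_PREV": last_valid - 1,
        "page_to_erase": page_to_erase,
        "next_record": next_record,
    }
```

Feeding in records 0 through 11 reproduces Scenario 1 (PageA erased, next record 16), and records 0 through 7 reproduce Scenario 2 (PageB erased, next record 8).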
EEPROM Lifespan and Data Retention
The EEPROM of the ADP1055 has been designed with the long field lifetimes of power supplies in mind. It has a data retention of up to 15 years at 125 °C. Over the lifetime of a power supply there can also be many writes to the EEPROM, which is another limiting factor for data retention. To improve data reliability of the EEPROM following excessive erase-program cycles, the ADP1055 limits the maximum number of fault records to either 158,000 (recommended when the ambient temperature of the ADP1055 is less than 85 °C) or 16,000 (when the ambient temperature of the ADP1055 is less than 125 °C).
Following each Black Box recording to EEPROM, the current record number is incremented. On the occasion that a fault occurs and the current record number is greater than the maximum record number mentioned above, no additional Black Box recording is allowed because the EEPROM has reached its maximum allowed erase-program cycles and any additional recording is unreliable. The MEM_ERR bit in the STATUS_CML register is set to indicate this condition.
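The record limits translate directly into erase-program cycles per page. The sketch below assumes each of the two pages is erased once for every 16 records written, which follows from the circular-buffer description but is not stated as an endurance figure in the text; the function names are hypothetical.

```python
RECORDS_PER_PAGE = 8
NUM_PAGES = 2


def erase_cycles_per_page(max_records: int) -> int:
    """Approximate erase cycles seen by each page over max_records writes."""
    return max_records // (RECORDS_PER_PAGE * NUM_PAGES)


def recording_allowed(current_record_number: int, max_record_number: int) -> bool:
    """Recording stops once the record number exceeds the configured maximum."""
    return current_record_number <= max_record_number
```

Under this assumption, the 158,000-record limit corresponds to 9,875 erase cycles per page and the 16,000-record limit to 1,000.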
Fig. 4 shows how the ADP1055, an advanced digital DC-DC controller with the Black Box feature, is used in a typical application. It is suitable for topologies such as full bridge, phase shifted, and active clamp forward, and it has several features such as redundant OVP, average and peak overcurrent protection, GPIOs with mini FPGA, and an active snubber. The ADP1055 can also function as a multiphase controller. No additional hardware is required to configure the Black Box. There is no firmware involved, as the ADP1055 is based on a finite state machine (FSM) with dedicated logic for ease of use, so the user does not have to learn any new programming language.
The Black Box feature can also be effectively deployed during the manufacturing flow to detect failures in burn-in and stress testing during product verification and the early stages of pilot production. The Black Box takes debugging of a power supply to the next level and provides a focused guide for troubleshooting a complicated system. This leads to fewer customer failures and improved reliability metrics, such as Mean Time Between Failures (MTBF), through elimination of design issues found by the Black Box recorder.