MPS Minutes 20210624
Present: HH, KL, MR, RC, RN, AD, AL (recorder)
Meeting commenced: 10:02 am
Meeting adjourned: 10:30 am
Post-Mortem and BLM HV
- Ricky found the origin of the trip issue. A summary of Ricky's email follows:
- Ricky has been able to reproduce the same bus pattern, with the data lines held, that Mike and Ricky observed in the live system when the HV trips.
- The direct cause of the HV trip is inaccessible memory.
- With the BLM firmware currently in production, the accessibility of the memory is determined by EPICS using a number of signals and conditions covering several scenarios. These conditions are especially tricky for the diagnostic memory.
- Once the memory accessibility is determined, it in turn controls the visibility of the “Update” button. Some conditions were constructed incorrectly, which in turn causes the “Update” button to be shown while the diagnostic memory is in fact not accessible.
- When the IOC tries to read the memory while it is not accessible, the BLM ignores the memory read requests, so the IOC holds the VME bus for 64 µs until its time-out mechanism kicks in.
- Since the IOC is trying to read 1 million points, it holds the VME bus (or does not release the bus properly) for more than 12 seconds.
- If a regular write to the ISEG control register comes in within these 12 seconds, the data lines are still being held by the IOC at an undesired value (one of the bits held high is the setEmergency bit), so this undesired value gets written to the ISEG board and causes the emergency trip.
- The issue is simply that the Update button is shown when it should not be. It is not some unpredictable VME bus holding from other devices or from code within the IOC.
- The latest firmware, v0.97 from Hubert, has 2 dedicated bits to indicate the memory accessibility. That would definitely help prevent this kind of inaccessible-memory read from happening.
- However, in one specific case that Ricky tested, the problem can still occur: if someone is fast enough to reset the BLM board within the 0.4-second memory read time frame, i.e. quickly hitting the Update button and then the Reset button, the latter half (after the reset) of the memory read becomes inaccessible and will likely cause this ISEG HV trip again.
- Ricky also looked at the VME bus pattern during a normal memory read. The IOC releases the bus properly, and if a write request from another device comes in, it yields to that request.
- Thus, with the latest BLM firmware v0.97, the chance of this ISEG HV trip happening again is greatly reduced.
- To make it impossible, it is proposed to modify the driver to check the accessibility bit and confirm the memory is indeed available, and also to disable the Reset button for the duration of the readout. Using the bit should also avoid bus errors.
- Implementing this has to wait until the new firmware is rolled out.
- Need a time window to perform this work. HH thinks it is in principle ready as soon as time becomes available.
- Need a full day to perform updates for all BLMs.
- Need additional time for testing the system; the system also needs to be recommissioned (on the agenda anyway for this summer).
- Best time to do this to be confirmed with Brandon.
- Ricky added LEDs on the Post-Mortem EPICS pages to indicate the status of the readout.
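The proposed driver change (check the accessibility bit, lock out Reset during readout) could look roughly like the sketch below. This is an illustration only, not the actual driver: `FakeBLM`, `GuardedReader`, `MemoryNotAccessible` and all method names are hypothetical, and the assumption that a reset makes the memory inaccessible is taken from the Update-then-Reset case described above.

```python
from dataclasses import dataclass, field


class MemoryNotAccessible(RuntimeError):
    """Raised instead of issuing a read that the board would ignore."""


@dataclass
class FakeBLM:
    """Hypothetical stand-in for a BLM board (not the real driver API)."""
    accessible: bool = True  # models the firmware accessibility bit
    memory: list = field(default_factory=lambda: list(range(8)))

    def reset(self):
        # Assumption: a reset makes the diagnostic memory inaccessible.
        self.accessible = False


@dataclass
class GuardedReader:
    blm: FakeBLM
    reset_enabled: bool = True  # what a GUI would bind the Reset button to

    def read_all(self):
        self.reset_enabled = False  # lock out Reset for the whole readout
        try:
            points = []
            for i in range(len(self.blm.memory)):
                # Check the bit before each access, so that no read is
                # issued that the board would ignore and the bus wait on.
                if not self.blm.accessible:
                    raise MemoryNotAccessible(f"readout aborted at point {i}")
                points.append(self.blm.memory[i])
            return points
        finally:
            self.reset_enabled = True  # always re-enable Reset afterwards

    def request_reset(self):
        """Return True if the reset was carried out, False if refused."""
        if not self.reset_enabled:
            return False
        self.blm.reset()
        return True
```

In a real driver the bit would presumably be checked per block rather than per point to keep the readout fast; the per-point check here only keeps the sketch short.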
Second stage trip
- See above for firmware status.
EPICS
- Issue with EPICS start-up after a trip or IOC reboot. Even though the threshold, delta and HV values are set to their defaults, one has to toggle each slider before the readback matches the set value. HV won't start ramping until one clicks into the slider bar and adjusts the voltage. Any increment is fine.
- MR: likely one needs to add a script to process individual records to make sure everything comes alive.
- RN: this also needs e-linac time for in-situ testing and verification.
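A script of the kind MR suggests might look like the sketch below. It relies on the standard EPICS behaviour that writing 1 to a record's `.PROC` field forces the record to process. The record names are hypothetical placeholders, and the `caput` callable is passed in (for example `epics.caput` from pyepics) so that the sketch can be tried without a live IOC. Whether this actually cures the slider issue would still need the in-situ test.

```python
def process_records(pv_names, caput):
    """Force each record to process by writing 1 to its .PROC field.

    `caput` is any callable taking (pv_name, value), e.g. epics.caput.
    Returns the list of record names whose write raised an exception.
    """
    failed = []
    for name in pv_names:
        try:
            caput(f"{name}.PROC", 1)
        except Exception:
            # Keep going so one bad record does not block the rest.
            failed.append(name)
    return failed


# Hypothetical record names; the real ones are not given in the minutes.
RECORDS = ["BLM01:THRESHOLD", "BLM01:DELTA", "BLM01:HV:SET"]
```

Usage against a running IOC would be along the lines of `process_records(RECORDS, epics.caput)` after `import epics`, run once after the IOC has finished booting.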
Fibre
- No update. Martin on leave.
- If we get beam in the near future, Andrea and Annika may perform further tests. The other option is to use EACA dark current. Not before the week after the July long weekend.