MPS Minutes 20210624

From MPS Wiki
Revision as of 20:52, 24 June 2021 by Lennarz

Present: HH, KL, MR, RC, RN, AD, AL (recorder)

Meeting commenced: 10:02 am

Meeting adjourned: 10:30 am

Post-Mortem and BLM HV

  • Ricky found the origin of the trip issue. Summary of Ricky's email below:
  • Ricky was able to reproduce the VME bus pattern, with the data lines being held, that Mike and Ricky observed in the live system when the HV trips.
  • The direct cause of the HV trip is inaccessible memory.
  • With the BLM firmware currently in production, the accessibility of the memory is determined by EPICS using a number of signals and conditions for several scenarios. These conditions are especially tricky for determining the accessibility of the diagnostic memory.
  • Once the memory accessibility is determined, it in turn controls the visibility of the “Update” button. Some conditions were constructed incorrectly, which in turn causes the “Update” button to be shown while the diagnostic memory is in fact not accessible.
  • When the IOC tries to read the memory while it is not accessible, the BLM ignores the memory read requests, so the IOC holds the VME bus for 64 µs until its time-out mechanism kicks in.
  • Since the IOC is trying to read 1 million points, it holds the VME bus (or does not release the bus properly) for more than 12 seconds.
  • If a regular write to the ISEG control register comes in within these 12 seconds, the data lines are still being held by the IOC at an undesired value (one of the bits held high is the setEmergency bit), so this undesired value gets written to the ISEG board and causes the emergency trip.
  • The issue is simply that the Update button is shown when it should not be. It is not caused by unpredictable VME bus holding by other devices or by code within the IOC.
  • The latest firmware, v0.97 from Hubert, has 2 dedicated bits to indicate the memory accessibility. That would definitely help prevent this kind of read of inaccessible memory from happening.
  • However, in one specific case that Ricky tested, the problem can still occur: if someone is fast enough to reset the BLM board within the 0.4 second memory read time frame (i.e. hits the Update button and then quickly the Reset button), the latter half of the memory read (after the reset) becomes inaccessible and will likely cause this ISEG HV trip again.
  • Ricky also looked at the VME bus pattern during a normal memory read. The IOC properly releases the bus, and even if a write request from another device comes in, it yields to that request.
  • Thus, with the latest BLM firmware v0.97, the chances of this ISEG HV trip happening again are greatly reduced.
  • To make it impossible, it is proposed to modify the driver to verify that the bit is indeed available and also to disable the Reset button for the duration of the readout. Using the bit should also avoid bus errors.
  • Implementing this has to wait until the new firmware is rolled out.
  • Need a time window to perform this work. HH thinks it is in principle ready as soon as time becomes available.
  • Need a full day to perform updates for all BLMs.
  • Need additional time for testing the system, plus the system needs to be recommissioned (on the agenda anyway for this summer).
  • To be confirmed with Brandon when the best time to do this is.
  • Ricky added LEDs on the Post-Mortem EPICS pages to indicate the status of the readout.
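
The proposed driver change (check the accessibility bits before reading, and keep Reset disabled while a readout is in progress) could look roughly like the sketch below. This is only an illustration, not the actual driver or register map: FakeBLM, GuardedReadout, MEM_ACCESSIBLE_MASK, the chunk size, and the choice to require both bits to be set are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed mask for the two accessibility bits of firmware v0.97.
# The real bit positions and their meaning are not given in the minutes.
MEM_ACCESSIBLE_MASK = 0b11


class MemoryNotAccessible(Exception):
    """Raised instead of issuing reads the board would ignore."""


@dataclass
class FakeBLM:
    """Hypothetical stand-in for a BLM board, for illustration only."""
    status: int = MEM_ACCESSIBLE_MASK
    memory: List[int] = field(default_factory=lambda: list(range(8)))

    def read_status(self) -> int:
        return self.status

    def read_word(self, addr: int) -> int:
        return self.memory[addr]

    def reset(self) -> None:
        # Assumption: after a reset the memory is reported as not accessible.
        self.status = 0


class GuardedReadout:
    """Reads memory in chunks and re-checks accessibility before each chunk."""

    def __init__(self, blm: FakeBLM, chunk: int = 4) -> None:
        self.blm = blm
        self.chunk = chunk
        self.reset_enabled = True  # would drive the Reset button state

    def request_reset(self) -> bool:
        """Reset the board unless a readout is in progress."""
        if not self.reset_enabled:
            return False
        self.blm.reset()
        return True

    def read_all(self, n_words: int) -> List[int]:
        self.reset_enabled = False  # block Reset for the readout time
        try:
            data: List[int] = []
            for start in range(0, n_words, self.chunk):
                status = self.blm.read_status()
                if (status & MEM_ACCESSIBLE_MASK) != MEM_ACCESSIBLE_MASK:
                    # Stop before touching the bus with reads that would time out.
                    raise MemoryNotAccessible(f"memory not accessible at word {start}")
                stop = min(start + self.chunk, n_words)
                data.extend(self.blm.read_word(a) for a in range(start, stop))
            return data
        finally:
            self.reset_enabled = True  # re-enable Reset even on failure
```

Re-checking the bits per chunk rather than once up front is a design choice aimed at the Update-then-Reset case Ricky tested: if the memory becomes inaccessible part way through, the readout stops instead of holding the bus for the remaining reads.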


Second stage trip

  • See above for firmware status.

EPICS

Fibre

  • No update. Martin on leave.
  • If we get beam in the near future, Andrea and Annika may perform further tests. The other option is to use EACA dark current. Not before the week after the July long weekend.

AOB