Original request for OEP contacts

Date: Sun, 16 Feb 1997 17:09 -0800 (PST)
From: GSAbrams@lbl.gov
Subject: Additional Online Contacts Requested
To: Distribution TECHBOARD TB_MAIL EXECBOARD ONLINE
Message-Id: <01IFHP9UQKV600201A@SERV03.SLAC.STANFORD.EDU>
X-Envelope-To: gpdf@cithe502.cithep.caltech.edu, metzler@cithe502.cithep.caltech.edu, WEAVER@CITHE502.CITHEP.CALTECH.EDU

To Managers of BaBar detector subsystems :

As the Online system has evolved, I have had occasion to solicit from each detector subsystem the name of an individual who would serve to coordinate various activities within the Online on behalf of her respective subsystem. To date, both Detector Control (slow controls) and DataFlow (DAQ) are, for the most part, fully staffed with representatives of the various subsystems.

At this time, the Online Event Processing (OEP) subsystem of the Online, headed by Gregory Dubois-Felsmann of Caltech, has reached sufficient maturity in its design to warrant and to require its own representative (contact person, coordinator, code writer, etc.) from each detector subsystem.

I would request that each subsystem consider the tasks and responsibilities described below, and delegate a single individual to serve as their OEP contact (even if the actual work of providing subsystem code for OEP, as described below, will require more than one person's effort). I would appreciate it if you would forward the names of these individuals both to me and to Gregory (gpdf@cithe502.cithep.caltech.edu). I would hope for an effort to identify the OEP contacts during this week's Collaboration Meeting, but in any event would desire that each subsystem respond by the end of February.

THIS IS HOW GREGORY DESCRIBES THE TASKS AND RESPONSIBILITIES -

Responsibilities of the OEP subsystem contacts
==============================================

o Primarily, represent the interests of their subsystems to the OEP coordinator and the OEP group throughout the design, coding, commissioning, and operation epochs of BaBar, and ensure that subsystem-specific code, modules, data, and other necessary components are delivered in accordance with the online and OEP design, code quality standards, and schedules.

o Subsystem contacts may do the above themselves or by arranging for other members of their subsystem group to do the work. In the second instance, the contacts should ensure that the subsystems' various OEP-related efforts remain consistent and that the OEP group is always able to obtain authoritative answers to questions related to a subsystem's participation in OEP.

Some context: An overview of OEP as seen from the detector subsystems
=====================================================================

The Online Event Processing (OEP) subsystem of the online deals with the phase of the data flow subsequent to the assembly of complete events (whether in physics, calibration, or diagnostic running), but short of Prompt Reconstruction.

The consequence of this definition is that the OEP subsystem will be responsible for running the Level 3 trigger code, transforming the data into a standard raw data format, providing event displays, providing rapid-turnaround data quality monitoring (on time scales of seconds to minutes), and, in most cases, running the final stages of detector subsystem electronics, pulser, and source calibrations, as well as diagnostics. Certain types of physics-data-driven calibrations, such as the EMC calibration with Bhabhas, are also expected to execute in the OEP subsystem, and there may be additional special tasks yet to be defined.

OEP activities will run on a subset of the online farm nodes -- during data-taking, as many as will be needed to support Level 3 operation and a reasonable level of monitoring, without incurring deadtime, at up to the nominal 2 kHz Level 1 event rate.

Subsystems will be responsible, through their OEP contacts, for providing the detector subsystem-specific code that will run in the OEP subsystem.

Reliable online operation will require very high levels of code quality, and a system designed to contain and minimize the effects of remaining errors and anomalies. Code to be run in the OEP system will be required to pass a high level of quality-control scrutiny, and is expected to be subject to coding standards that may be more stringent and/or use special methods of error control and exception handling, in comparison with general BaBar offline code.

NB: while there are subsystem-specific aspects to Level 3 and to the event display, these projects are being managed separately from OEP, and so are not envisioned to be the specific responsibility of the subsystem OEP contacts. The Graphics and Level 3 Trigger groups will be responsible for delivering code, while the core OEP group will be responsible for its integration into the online.

The principal areas of responsibility of the subsystem OEP contacts, then, will be the data quality monitoring (known as "Fast Monitoring"), and, in collaboration with each subsystem's "calibration expert" (if different from the OEP contact), the OEP components of calibration. Some of these are discussed in further detail below; however, this list is not exhaustive, and the demands on the OEP contacts will doubtless evolve with time.

Fast Monitoring
===============

Fast Monitoring is a catch-all name for all manner of monitoring and diagnostic code which will be run on event data in OEP, from:

- very simple tests to be done on every event (e.g., monitoring of crystal occupancy in the EMC), through

- progressively more detailed tests to be done on progressively smaller fractions of the data (e.g., calculation of residuals from drift chamber track fits, L1 trigger simulations), and

- up to and including full reconstruction of a small (probably <10%) sample of the data stream. The final stage will probably include some basic physics checks such as monitoring of the Ks and J/psi masses and mass resolutions.

The finite online farm resources available to OEP will have to be allocated to provide adequate power for Fast Monitoring for each of the detector subsystems; this allocation will include defining the sampling fraction cascade above.

Fast Monitoring will be the first stage in the data flow where data quality can be assessed on whole events. It will, in most cases, be the first line of defense for detecting problems of any subtlety with the content of the raw data. Structural problems with the data may be found by the core DataFlow system, and certain types of content problems may be found by the subsystem feature extraction code in the Read-Out Modules, but, even in these cases, non-trivial analysis and reporting of these problems will probably be handled as part of Fast Monitoring.

Fast Monitoring (and OEP in general) will be the only place in the system to look at full events not included in the 100 Hz (nominal) archival data logging stream that will go to Prompt Reconstruction. By way of example, the baseline online system is not expected to be capable of logging the full Bhabha event rate. Bhabha events will be prescaled for logging, to be sure, but applications requiring high Bhabha statistics will probably be limited to OEP.

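For illustration only, the sampling fraction cascade described above could be sketched in C++ roughly as follows. This is a minimal sketch and not part of the OEP design: the class names (Prescaler, SamplingCascade), the use of simple counting prescales, and the example factors are all assumptions made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration; none of these names come from the OEP design.
// Accepts the first event presented to it and every factor-th event after that.
class Prescaler {
public:
    explicit Prescaler(std::uint32_t factor) : factor_(factor), count_(0) {
        assert(factor >= 1);  // a factor of 1 accepts every event
    }
    bool accept() {
        const bool pass = (count_ == 0);
        count_ = (count_ + 1) % factor_;
        return pass;
    }
private:
    std::uint32_t factor_;
    std::uint32_t count_;
};

// One monitoring stage: a label, its prescaler, and a tally of events processed.
struct MonitorStage {
    std::string name;
    Prescaler prescale;
    std::uint64_t processed;
};

// An event is offered to the stages in order and goes no further than the
// first stage whose prescaler rejects it, so each stage sees a fraction of
// the events that the previous stage processed.
class SamplingCascade {
public:
    void addStage(const std::string& name, std::uint32_t factor) {
        stages_.push_back(MonitorStage{name, Prescaler(factor), 0});
    }
    // Offers one event to the cascade; returns how many stages processed it.
    std::size_t process() {
        std::size_t depth = 0;
        for (MonitorStage& stage : stages_) {
            if (!stage.prescale.accept()) break;
            ++stage.processed;  // the stage's monitoring code would run here
            ++depth;
        }
        return depth;
    }
    std::uint64_t processed(std::size_t i) const { return stages_.at(i).processed; }
private:
    std::vector<MonitorStage> stages_;
};
```

With three stages added at (assumed) factors 1, 10, and 2, every event reaches the first stage, one in 10 reaches the second, and one in 20 (5%) reaches the third. A real allocation would be set by the available farm resources rather than by fixed numbers like these.
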
Fast Monitoring is intended for those diagnostics which require fast response times. The OEP subsystem is being designed to make it possible to get results from high-statistics parts of Fast Monitoring within seconds of the acquisition of the data; the goal is to provide rapid feedback in detecting and debugging problems. Diagnostics (as well as calibrations) which can tolerate response times in the 0.5-2 hour range and only require the archival logging stream's events will be able to be run on a full reconstruction of the entire data volume within the Prompt Reconstruction subsystem.

Quality assurance for the BaBar data, then, will rest to a large extent on subsystems' provision of Fast Monitoring code of sufficient performance and sensitivity.

Calibration
===========

Online electronics, pulser, and source calibrations will involve interoperation between the DataFlow and OEP subsystems, just as for normal data-taking. There will in most cases be subsystem-supplied calibration "user code" executing both in the Read-Out Modules (ROMs) and in OEP. There will, therefore, be a need for coordination between the OEP and DataFlow components of calibration for each subsystem.

The present calibration design envisions that a highly uniform coding framework will be presented across DataFlow and OEP in order to simplify this task. It is in fact intended that almost all calibration "user code" will be able to be run without source code modification in either environment.

In order to reach this goal, the requirements of the various subsystems need to be made clear now, so that they may be accommodated in the design of the calibration framework. This information is being actively solicited by the calibration group, led by David Brown of LBNL. However, ongoing and active subsystem participation will be needed.

To the extent that calibrations will run within OEP, the OEP group will need its subsystem contacts to be able to represent the interests of calibration.

However, each subsystem will also be asked to appoint a single coordinator (or "calibration expert") for online calibration. For some systems the natural connection of calibration with DataFlow may be far stronger than with OEP, or it may be more natural for the calibration expert to be distinct from either the OEP or DataFlow contact, so we are prepared to work with any of the three models {OEP=Calibration, DataFlow=Calibration, or three separate persons}.

The key for OEP is that, as stated above, it still must always be possible, regardless of the management model, to know where to go for prompt and authoritative responses to matters arising in respect of a subsystem's participation in OEP.

Other issues
============

The standard raw data format will be defined through cooperation among the Online, Reconstruction and Simulation groups. The OEP subsystem contacts will be involved in this process.

The OEP group will provide common tools to be used by all the subsystems to the greatest extent possible -- for instance, a distributed histogram package that allows collection of data across all the OEP online farm nodes. The OEP subsystem contacts will be asked to assist in identifying, as soon as possible, requirements for this package (to support Fast Monitoring and calibration) and other plausible areas for this sort of common tool development.

Additional details
==================

o In the design phase, beginning immediately and extending roughly through Spring 1997, the subsystem contacts should participate in developing the detailed requirements for the OEP system insofar as these depend on detector subsystem needs. In this context, they will probably have to inform their subsystem colleagues of the roles and requirements of OEP, actively identify the subsystems' OEP user constituencies, and collect, coordinate, and present their needs to the OEP group.

o There will be regular OEP meetings (by video- or tele-conference), probably beginning every other week during the design phase, which will generally need to be attended by all of the subsystem contacts or by appropriate representatives they may designate. It is viewed as essential by the OEP group that the maximum reasonable commonality be extracted from the subsystems' requirements, and that this can best be achieved in an inter-subsystem group. (The OEP group will, however, be sensitive to the need to avoid stretching the subsystems' efforts on a Procrustean bed of excessive commonality.)

o In subsequent phases, the subsystem contacts will be responsible for refining the requirements as experience with the online and the detector subsystems accumulates, for delivering code and other necessary subsystem-specific components of the system, and for commissioning of those components. During commissioning and detector operation, the OEP contacts should seek to channel and prioritize requests from their subsystems to the OEP group. OEP meetings will continue as needed, addressing issues that arise, and, as before, attempting to maintain commonality within the system.

o During detector operation, there will necessarily be limited CPU and other resources available for OEP operation within the online. These resources will have to be distributed both within and among the subsystems, and the OEP contacts can be expected to have an essential role in the process of setting appropriate priorities, especially in respect of Fast Monitoring.

Further information on the OEP subsystem is available in the requirements and conceptual design documents available on the Web in the October 1996 online review area.

Qualifications
==============

As a manager, the OEP contact needs to have the full confidence of the subsystem's system manager(s), be able to speak authoritatively on subsystem issues relating to OEP or immediately identify someone who can, and preferably should be vested with the authority to obtain, within reason, the resources required to ensure that the subsystem's part of OEP remains on schedule.

It is almost essential that the contact be involved intimately with calibration, reconstruction, or analysis of the subsystem's data, in order to have the necessary perspective.

Any subsystem personnel contributing code to OEP should have as an absolute minimum a Paul-Kunz-course-level knowledge of C++, and ideally well beyond that. Most useful would be exposure to modern object-oriented programming techniques, possibly through the commercial courses that are being made available to BaBar members, and/or prior experience in collaborative C++ or other OO programming. The OEP group will, however, strive to provide interfaces for subsystem-specific coding that are readily understandable and/or closely analogous or identical to ones used elsewhere in the experiment, including the offline.

**************************************************************************

This concludes the OEP description from Gregory Dubois-Felsmann. I would like to append some additional remarks on calibration -

Another subsystem contact person who is now needed is one for the subsystem calibration. Calibration (user) code will be exercised both in the ROMs and in OEP, so the designated individual may well be one already designated as an Online contact. The especial expertise and interest of the subsystem calibration contact should be directed toward the response of the subsystem, detector and/or front end electronics, and the detailed understanding of that response using the calibration framework of BaBar.

At present, the calibration framework is under design within both DataFlow and OEP.
An effort to construct a general BaBar calibration scheme within this framework is being led by David Brown (LBNL) within the context of the perceived DIRC requirements. To achieve the appropriate level of generality to be applicable to all BaBar subsystems, input is being sought actively from the other subsystems delineating their special needs and requirements, in the context of the February operations workshop and collaboration meeting's calibration sessions.

I would therefore further request that the subsystem managers forward to me, and to David Brown (Dave_Brown@lbl.gov), the name of their calibration expert. Again, the desirable time scale for this action is preferably during the Collaboration Meeting, or by the end of this month.

Regards,
Gerry
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Gerald (Gerry) S. Abrams          Phone: (510) 486-7188
Lawrence Berkeley Laboratory      FAX:   (510) 486-5101
MS 50A/2160                       Secretary :
1 Cyclotron Road                         (510) 486-5201
Berkeley, CA 94720                GSAbrams@lbl.gov

Distribution to: DISTRIBUTION

Gregory Dubois-Felsmann
Last modified: Thu Jun 5 18:59:53 PDT