Original request for OEP contacts
Date: Sun, 16 Feb 1997 17:09 -0800 (PST)
From: GSAbrams@lbl.gov
Subject: Additional Online Contacts Requested
To: Distribution TECHBOARD TB_MAIL EXECBOARD ONLINE
Message-Id: <01IFHP9UQKV600201A@SERV03.SLAC.STANFORD.EDU>
X-Envelope-To: gpdf@cithe502.cithep.caltech.edu,
metzler@cithe502.cithep.caltech.edu, WEAVER@CITHE502.CITHEP.CALTECH.EDU
To Managers of BaBar detector subsystems:
As the Online system has evolved, I have had occasion to solicit from each
detector subsystem the name of an individual who would serve to coordinate
various activities within the Online on behalf of her respective subsystem.
To date, both Detector Control (slow controls) and DataFlow (DAQ) are, for
the most part, fully staffed with representatives of the various subsystems.
At this time, the Online Event Processing (OEP) subsystem of the Online,
headed by Gregory Dubois-Felsmann of Caltech, has reached sufficient
maturity in its design to warrant and to require its own representative
(contact person, coordinator, code writer, etc.) from each detector
subsystem. I would request that each subsystem consider the tasks and
responsibilities described below, and delegate a single individual to serve
as their OEP contact (even if the actual work of providing subsystem code
for OEP, as described below, will require more than one person's effort).
I would appreciate it if you would forward the names of these individuals
both to me and to Gregory (gpdf@cithe502.cithep.caltech.edu). I would hope
for an effort to identify the OEP contacts during this week's Collaboration
Meeting, but in any event would desire that each subsystem respond by the
end of February.
THIS IS HOW GREGORY DESCRIBES THE TASKS AND RESPONSIBILITIES -
Responsibilities of the OEP subsystem contacts
==============================================
o Primarily, represent the interests of their subsystems to the OEP
coordinator and the OEP group throughout the design, coding, commissioning,
and operation epochs of BaBar, and ensure that subsystem-specific code,
modules, data, and other necessary components are delivered in accordance
with the online and OEP design, code quality standards, and schedules.
o Subsystem contacts may do the above themselves or by arranging for
other members of their subsystem group to do the work. In the second
instance, the contacts should ensure that the subsystems' various
OEP-related efforts remain consistent and that the OEP group is always
able to obtain authoritative answers to questions related to a subsystem's
participation in OEP.
Some context: An overview of OEP as seen from the detector subsystems
=====================================================================
The Online Event Processing (OEP) subsystem of the online deals with the
phase of the data flow subsequent to the assembly of complete events
(whether in physics, calibration, or diagnostic running), but short of
Prompt Reconstruction.
The consequence of this definition is that the OEP subsystem will be
responsible for running the Level 3 trigger code, transforming the data
into a standard raw data format, providing event displays, providing
rapid-turnaround data quality monitoring (on time scales of seconds to
minutes), and, in most cases, running the final stages of detector
subsystem electronics, pulser, and source calibrations, as well as
diagnostics. Certain types of physics data driven calibrations such as
the EMC calibration with Bhabhas are also expected to execute in the OEP
subsystem, and there may be additional special tasks yet to be defined.
OEP activities will run on a subset of the online farm nodes -- during
data-taking, as many as will be needed to support Level 3 operation and a
reasonable level of monitoring, without incurring deadtime, at up to the
nominal 2 kHz Level 1 event rate.
Subsystems will be responsible, through their OEP contacts, for providing
the detector subsystem-specific code that will run in the OEP subsystem.
Reliable online operation will require very high levels of code quality,
and a system designed to contain and minimize the effects of remaining
errors and anomalies. Code to be run in the OEP system will be
required to pass a high level of quality-control scrutiny, and is
expected to be subject to coding standards that may be more stringent
and/or use special methods of error control and exception handling, in
comparison with general BaBar offline code.
NB: while there are subsystem-specific aspects to Level 3 and to the
event display, these projects are being managed separately from OEP, and
so are not envisioned to be the specific responsibility of the subsystem
OEP contacts. The Graphics and Level 3 Trigger groups will be responsible
for delivering code, while the core OEP group will be responsible for its
integration into the online.
The principal areas of responsibility of the subsystem OEP contacts,
then, will be the data quality monitoring (known as "Fast Monitoring"),
and, in collaboration with each subsystem's "calibration expert" (if
different from the OEP contact), the OEP components of calibration. Some
of these are discussed in further detail below; however, this list is not
exhaustive, and the demands on the OEP contacts will doubtless evolve
with time.
Fast Monitoring
===============
Fast Monitoring is a catch-all name for all manner of monitoring and
diagnostic code which will be run on event data in OEP, from:
- very simple tests to be done on every event (e.g., monitoring of
crystal occupancy in the EMC), through
- progressively more detailed tests to be done on progressively
smaller fractions of the data (e.g., calculation of residuals from
drift chamber track fits, L1 trigger simulations), and
- up to and including full reconstruction of a small (probably <10%)
sample of the data stream.
The final stage will probably include some basic physics checks such as
monitoring of the Ks and J/psi masses and mass resolutions.
The finite online farm resources available to OEP will have to be
allocated to provide adequate power for Fast Monitoring for each of the
detector subsystems; this allocation will include defining the sampling
fraction cascade above.
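As a rough illustration of the sampling fraction cascade described above, the following C++ sketch runs each monitoring stage on every Nth event. The class names, stage names, and prescale values are hypothetical and are not taken from the OEP design; deterministic every-Nth selection is assumed here only for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One stage of a hypothetical monitoring cascade. The stage runs on every
// `prescale`-th event; prescale == 1 means every event.
struct MonitorStage {
    std::string name;
    std::uint32_t prescale;
    std::uint64_t eventsSeen;
};

class SamplingCascade {
public:
    void addStage(const std::string& name, std::uint32_t prescale) {
        assert(prescale > 0);
        stages_.push_back(MonitorStage{name, prescale, 0});
    }

    // Offer one event to every stage; returns how many stages ran on it.
    std::size_t process(std::uint64_t eventNumber) {
        std::size_t ran = 0;
        for (MonitorStage& stage : stages_) {
            if (eventNumber % stage.prescale == 0) {
                ++stage.eventsSeen;  // real subsystem code would run here
                ++ran;
            }
        }
        return ran;
    }

    const std::vector<MonitorStage>& stages() const { return stages_; }

private:
    std::vector<MonitorStage> stages_;
};
```

With illustrative prescales of 1, 10, and 20, feeding 2000 consecutive events (one second at the nominal Level 1 rate) would exercise the three stages 2000, 200, and 100 times respectively.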
Fast Monitoring will be the first stage in the data flow where data
quality can be assessed on whole events. It will, in most
cases, be the first line of defense for detecting problems of any
subtlety with the content of the raw data. Structural problems with the
data may be found by the core DataFlow system, and certain types of
content problems may be found by the subsystem feature extraction code
in the Read-Out Modules, but, even in these cases, non-trivial
analysis and reporting of these problems will probably be handled as part
of Fast Monitoring.
Fast Monitoring (and OEP in general) will be the only place in the
system to look at full events not included in the 100 Hz (nominal)
archival data logging stream that will go to Prompt Reconstruction.
By way of example, the baseline online system is not expected to be
capable of logging the full Bhabha event rate. Bhabha events will be
prescaled for logging, to be sure, but applications requiring high Bhabha
statistics will probably be limited to OEP.
Fast Monitoring is intended for those diagnostics which require fast
response times. The OEP subsystem is being designed to make it possible
to get results from high-statistics parts of Fast Monitoring within
seconds of the acquisition of the data; the goal is to provide rapid
feedback in detecting and debugging problems. Diagnostics (as well as
calibrations) which can tolerate response times in the 0.5-2 hour range
and only require the archival logging stream's events will be able to be
run on a full reconstruction of the entire data volume within the Prompt
Reconstruction subsystem.
Quality assurance for the BaBar data, then, will rest to a large extent
on subsystems' provision of Fast Monitoring code of sufficient performance
and sensitivity.
Calibration
===========
Online electronics, pulser, and source calibrations will involve
interoperation between the DataFlow and OEP subsystems, just as for
normal data-taking. There will in most cases be subsystem-supplied
calibration "user code" executing both in the Read-Out Modules (ROMs)
and in OEP. There will, therefore, be a need for coordination between
the OEP and DataFlow components of calibration for each subsystem.
The present calibration design envisions that a highly uniform coding
framework will be presented across DataFlow and OEP in order to simplify
this task. It is in fact intended that almost all calibration "user
code" will be able to be run without source code modification in either
environment.
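One way such source-level portability could be arranged is for the user code to depend only on an abstract interface that both environments implement a driver for. The C++ sketch below is purely illustrative: every name in it (CalibUserCode, MeanPulseHeight, Sample, runInReadOutModule, runInOep) is hypothetical, and it is not the calibration framework's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical environment-neutral interface seen by subsystem user code.
class CalibUserCode {
public:
    virtual ~CalibUserCode() {}
    virtual void accumulate(std::size_t channel, double pulseHeight) = 0;
    virtual double result(std::size_t channel) const = 0;
};

// Example user code: mean pulse height per channel.
class MeanPulseHeight : public CalibUserCode {
public:
    explicit MeanPulseHeight(std::size_t nChannels)
        : sum_(nChannels, 0.0), n_(nChannels, 0) {}

    void accumulate(std::size_t channel, double pulseHeight) override {
        assert(channel < sum_.size());
        sum_[channel] += pulseHeight;
        ++n_[channel];
    }

    // Mean for one channel; 0.0 if the channel received no samples.
    double result(std::size_t channel) const override {
        assert(channel < sum_.size());
        if (n_[channel] == 0) return 0.0;
        return sum_[channel] / static_cast<double>(n_[channel]);
    }

private:
    std::vector<double> sum_;
    std::vector<std::size_t> n_;
};

struct Sample {
    std::size_t channel;
    double pulseHeight;
};

// Stand-in for a Read-Out Module host: sees a flat stream of samples.
void runInReadOutModule(CalibUserCode& code,
                        const std::vector<Sample>& stream) {
    for (const Sample& s : stream) code.accumulate(s.channel, s.pulseHeight);
}

// Stand-in for an OEP host: sees samples grouped into assembled events.
void runInOep(CalibUserCode& code,
              const std::vector<std::vector<Sample> >& events) {
    for (const std::vector<Sample>& event : events)
        for (const Sample& s : event) code.accumulate(s.channel, s.pulseHeight);
}
```

In this sketch the same MeanPulseHeight class is handed unchanged to either host; only the drivers differ in how they obtain the data.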
In order to reach this goal, the requirements of the various subsystems
need to be made clear now, so that they may be accommodated in the design
of the calibration framework. This information is being actively solicited
by the calibration group, led by David Brown of LBNL. However, ongoing
and active subsystem participation will be needed.
To the extent that calibrations will run within OEP, the OEP group will
need its subsystem contacts to be able to represent the interests of
calibration. However, each subsystem will also be asked to appoint a
single coordinator (or "calibration expert") for online calibration.
For some systems the natural connection of calibration with DataFlow may
be far stronger than with OEP, or it may be more natural for the
calibration expert to be distinct from either the OEP or DataFlow
contact, so we are prepared to work with any of the three models
{OEP=Calibration, DataFlow=Calibration, or three separate persons}.
The key for OEP is that, as stated above, it still must always be possible,
regardless of the management model, to know where to go for prompt and
authoritative responses to matters arising in respect of a subsystem's
participation in OEP.
Other issues
============
The standard raw data format will be defined through cooperation among
the Online, Reconstruction and Simulation groups. The OEP subsystem
contacts will be involved in this process.
The OEP group will provide common tools to be used by all the subsystems
to the greatest extent possible -- for instance, a distributed histogram
package that allows collection of data across all the OEP online farm
nodes. The OEP subsystem contacts will be asked to assist in identifying,
as soon as possible, requirements for this package (to support Fast
Monitoring and calibration) and other plausible areas for this sort of
common tool development.
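To make the idea of collecting histogram data across farm nodes concrete, here is a minimal C++ sketch in which each node fills a local histogram and a collector adds them bin by bin. The Histogram1D class and its methods are hypothetical stand-ins, not the interface of the planned package, and the transport of histograms between nodes is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-binning 1D histogram, as each farm node might fill locally.
class Histogram1D {
public:
    Histogram1D(std::size_t nBins, double lo, double hi)
        : lo_(lo), hi_(hi), counts_(nBins, 0) {
        assert(nBins > 0 && hi > lo);
    }

    void fill(double x) {
        if (x < lo_ || x >= hi_) return;  // out-of-range entries are dropped
        std::size_t bin = static_cast<std::size_t>(
            (x - lo_) / (hi_ - lo_) * static_cast<double>(counts_.size()));
        if (bin >= counts_.size()) bin = counts_.size() - 1;  // rounding guard
        ++counts_[bin];
    }

    // Add another node's histogram bin by bin; the binning must match.
    void merge(const Histogram1D& other) {
        assert(counts_.size() == other.counts_.size());
        assert(lo_ == other.lo_ && hi_ == other.hi_);
        for (std::size_t i = 0; i < counts_.size(); ++i)
            counts_[i] += other.counts_[i];
    }

    std::uint64_t binContent(std::size_t i) const { return counts_.at(i); }

    // Total number of in-range entries.
    std::uint64_t entries() const {
        std::uint64_t total = 0;
        for (std::size_t i = 0; i < counts_.size(); ++i) total += counts_[i];
        return total;
    }

private:
    double lo_;
    double hi_;
    std::vector<std::uint64_t> counts_;
};
```

Because bin-wise addition is associative and commutative, the per-node histograms can be combined in any order, which is what makes this kind of collection straightforward to distribute.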
Additional details
==================
o In the design phase, beginning immediately and extending roughly
through Spring 1997, the subsystem contacts should participate in
developing the detailed requirements for the OEP system insofar as
these depend on detector subsystem needs. In this context, they will
probably have to inform their subsystem colleagues of the roles and
requirements of OEP, actively identify the subsystems' OEP user
constituencies, and collect, coordinate, and present their needs to
the OEP group.
o There will be regular OEP meetings (by video- or tele-conference),
probably beginning every other week during the design phase, which will
generally need to be attended by all of the subsystem contacts or by
appropriate representatives they may designate. It is viewed as
essential by the OEP group that the maximum reasonable commonality be
extracted from the subsystems' requirements, and that this can best be
achieved in an inter-subsystem group. (The OEP group will, however, be
sensitive to the need to avoid stretching the subsystems' efforts on a
Procrustean bed of excessive commonality.)
o In subsequent phases, the subsystem contacts will be responsible
for refining the requirements as experience with the online and
the detector subsystems accumulates, for delivering code and other
necessary subsystem-specific components of the system, and for
commissioning of those components. During commissioning and
detector operation, the OEP contacts should seek to channel and
prioritize requests from their subsystems to the OEP group. OEP
meetings will continue as needed, addressing issues that arise, and,
as before, attempting to maintain commonality within the system.
o During detector operation, there will necessarily be limited CPU
and other resources available for OEP operation within the online.
These resources will have to be distributed both within and among
the subsystems, and the OEP contacts can be expected to have an
essential role in the process of setting appropriate priorities,
especially in respect of Fast Monitoring.
Further information on the OEP subsystem is available in the
requirements and conceptual design documents available on the Web in
the October 1996 online review area.
Qualifications
==============
As a manager, the OEP contact needs to have the full confidence of the
subsystem's system manager(s), be able to speak authoritatively on
subsystem issues relating to OEP or immediately identify someone who can,
and preferably should be vested with the authority to obtain, within
reason, the resources required to ensure that the subsystem's part of OEP
remains on schedule. It is almost essential that the contact be involved
intimately with calibration, reconstruction, or analysis of the
subsystem's data, in order to have the necessary perspective.
Any subsystem personnel contributing code to OEP should have as an
absolute minimum a Paul-Kunz-course-level knowledge of C++, and ideally
well beyond that. Most useful would be exposure to modern object-oriented
programming techniques, possibly through the commercial courses that are
being made available to BaBar members, and/or prior experience in
collaborative C++ or other OO programming. The OEP group will, however,
strive to provide interfaces for subsystem-specific coding that are
readily understandable and/or closely analogous or identical to ones
used elsewhere in the experiment, including the offline.
**************************************************************************
This concludes the OEP description from Gregory Dubois-Felsmann. I would
like to append some additional remarks on calibration -
Another subsystem contact person who is now needed is one for the
subsystem calibration. Calibration (user) code will be exercised both in
the ROMs and in OEP, so the designated individual may well be one already
designated as an Online contact. The especial expertise and interest of
the subsystem calibration contact should be directed toward the response
of the subsystem, detector and/or front end electronics, and the detailed
understanding of that response using the calibration framework of BaBar.
At present, the calibration framework is under design within both
DataFlow and OEP. An effort to construct a general BaBar calibration scheme
within this framework is being led by David Brown (LBNL) within the context
of the perceived DIRC requirements. To achieve the appropriate level of
generality to be applicable to all BaBar subsystems, input is being sought
actively from the other subsystems delineating their special needs and
requirements, in the context of the February operations workshop and
collaboration meeting's calibration sessions.
I would therefore further request that the subsystem managers
forward to me, and to David Brown (Dave_Brown@lbl.gov), the name of their
calibration expert. Again, the desired time scale for this action is
during the Collaboration Meeting, or by the end of this month.
Regards,
Gerry
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gerald (Gerry) S. Abrams Phone: (510) 486-7188
Lawrence Berkeley Laboratory FAX: (510) 486-5101
MS 50A/2160 Secretary :
1 Cyclotron Road (510) 486-5201
Berkeley, CA 94720 GSAbrams@lbl.gov
Distribution to: DISTRIBUTION
Gregory Dubois-Felsmann
Last modified: Thu Jun 5 18:59:53 PDT