The LHC Olympics Black Box Page ----- Winter 2006 Version



Postscript: the Winter Olympics were by all accounts a rousing success:

  1. an easy signal suitable for beginners
  2. a more challenging signal for those who feel ready
  3. an easy signal inside of a moderately realistic standard model background --- the next level of difficulty


Stay Tuned: The Next Olympics will be August 24th-25th at the KITP in Santa Barbara.
This page is now retired; all additional updates can be found here.


WARNING
Much of the information on this page about the black boxes is now obsolete.
All of the information that is not obsolete is now located on the new Wiki,
which you can access from here.

Welcome!!
  
At this site you will find the new black boxes and calibration samples for the LHC Olympics!


The goal of the LHC Olympics data challenge is for participants to try to figure out what is in each black box --- what underlying model has generated this data?


Why participate?  There is no prize for winning --- this is not a competitive exercise! [well, not particularly competitive anyway.]   Instead, this is supposed to be fun and instructive at the same time!   The black box analyses are only a means to an end.... The real goals here include individual and community preparation for the LHC era, and the invention of new tools and techniques which will be valuable for experimentalists and theorists when the real data starts to arrive.  Indeed, we hope there will be many "winners", with different approaches to the data.

                                                                             
For the February 2006 round of the LHC Olympics (to be held at CERN February 9th and 10th, 2006),

each black box contains simulated data that would be generated at LHC by a new physics model, processed through a simulation of an LHC-like detector; note the black boxes contain no standard model backgrounds.  For each black box, we have provided
                                             

Please Note: Suggested Guidelines for Participants in the LHC OLYMPICS

So that the LHC Olympics can be a useful and enjoyable exercise for people with a wide variety of backgrounds, we suggest that no public announcements of "solutions" to the data challenge be made available in advance of our February conference --- it would spoil the fun!  However, other forms of communication, such as
are fine and indeed encouraged, as they will contribute to the success of the LHC Olympics effort.

      Thanks!!!  [from the organizers]

    
                                                                                                            

Here they are:
         NOTE THE FORMAT FOR THESE OLD FILES DIFFERS FROM THOSE OF THE THIRD OLYMPICS AND BEYOND!!!

The black boxes and calibration samples are in the form of large data files.  Each has its own website where you can learn more about it, and where you can look at plots of some distributions and tables of some inclusive signatures.  You can also download the data and do some analysis yourself.

Below you can find general information that applies to all of the black boxes --- or more specific information on how to use the plots and tables that the black box creators have provided, how to interpret and use the data files, more details on how the data files were generated, and features/issues with the detector simulation.
                                                                                
I. Black box "classic", which contains 20 [Not! see below] inverse femtobarns of data generated with the same [Not! see below] physics model as was used in the summer 2005 data challenge, which had only 2 inverse femtobarns of data. (See the important WARNING below.)  [See the new warning, as of 2/04/06, on the following link.]  The black box raw data files, plots and tables extracted from the data, and creators' comments can be found HERE.
                                                                                
II. Black box "uw1", which contains 25 inverse femtobarns of data of a new signal.  The black box raw data files, plots and tables extracted from the data, and creators' comments can be found HERE.
                                                                                
III. Black box "harvardbb", which contains two sets of files, one with 5 inverse femtobarns of data of a new signal, and one with 40 inverse femtobarns of the same signal.  The black box raw data files, plots and tables extracted from the data, and creators' comments can be accessed HERE.
                                                                                
The calibration samples are
                                                                                
A. a pure ttbar sample.
                                                                                
B. a diboson sample (WW, WZ, and ZZ production.)
                                                                                
Comments on how to use calibration samples are HERE.

                                              
 

 

The Plots and Tables ---
      
The creators of the black boxes have provided some information about "inclusive signatures", including plots of "kinematic distributions" and tables that show the numbers of events with certain characteristics. 

What is an inclusive signature?  Inclusive signatures are basically anything that can be measured from the full aggregate data (including all types of events) generated through a new physics model.  This is to be understood as opposed to what one might like to measure (masses of individual new particles, branching fractions of individual new particles) but which may not be possible to extract experimentally, particularly at a hadron collider.  For example, it is relatively easy to measure the mass of a resonance, such as a Z' that decays to a pair of jets or a pair of Z bosons.  But in models, such as supersymmetry, where the decay products of most new particles include an invisible particle that escapes the detector, such mass measurements are not in general possible.  In such cases, for example, events in which gluinos are produced cannot cleanly be separated from events where squarks are produced; it is only possible to add all these events together and inclusively consider the resulting signals. 

A kinematic distribution, in its simplest form, is simply a plot of the number of objects, or object-pairs, or events, as a function of a simple kinematic variable. 

For instance, one might want to know the number of events containing an electron with transverse momentum equal to p_T.  (Why?  Because it gives a measure of how much energy is being dumped, by a new physics process, into its decay products, among which are the electrons which the detector is seeing.)  So do exactly that --- plot the number of events as a function of p_T.  Of course, to make the plot, one has to make some decisions (and to interpret the plot, you have to know what its creator decided.)  When you plot the number of events versus the p_T of all electrons, do you include events with one and only one electron? or events with two electrons?  do you include events with both an electron and a muon?  What angular range can the detector measure electrons in?  Should you trim the edges of this angular range to avoid areas where the detector might make mistakes in its momentum measurements?  etc.   These subtleties can be very important for your interpretation of the plots, and you should read the captions closely.
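
As a minimal sketch of how such decisions enter a plot, the Python fragment below histograms electron p_T under one assumed set of choices: exactly one accepted electron per event, an angular acceptance of |pseudorapidity| < 2.5, and 20 GeV bins. The event layout (tuples of type, eta, phi, pt), the label "e", and every cut value are hypothetical illustrations, not the conventions used by the black box creators.

```python
from collections import Counter

# Hypothetical cuts chosen only for illustration.
ETA_MAX = 2.5      # assumed angular acceptance for electrons
BIN_WIDTH = 20.0   # bin width in GeV

def electron_pt_histogram(events):
    """Count events per p_T bin, keeping events with exactly one accepted electron."""
    histogram = Counter()
    for event in events:
        electrons = [obj for obj in event
                     if obj[0] == "e" and abs(obj[1]) < ETA_MAX]
        if len(electrons) != 1:
            continue  # decision: skip events with zero or several accepted electrons
        pt = electrons[0][3]
        bin_low = int(pt // BIN_WIDTH) * BIN_WIDTH  # lower edge of the bin
        histogram[bin_low] += 1
    return histogram

# Toy events: each object is (type, eta, phi, pt in GeV).
toy_events = [
    [("e", 0.3, 1.0, 45.0), ("jet", 1.2, 2.0, 90.0)],
    [("e", -1.1, 0.5, 52.0)],
    [("e", 0.1, 0.2, 30.0), ("e", 0.4, 3.0, 25.0)],  # two electrons: skipped
    [("e", 3.0, 0.2, 70.0)],                          # outside acceptance: skipped
]
hist = electron_pt_histogram(toy_events)
```

Changing any one of these choices (for instance, allowing two-electron events) changes the histogram, which is why the captions matter.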

A classic example of a simple kinematic distribution is a plot of the number of events with a muon and an antimuon versus the invariant mass of the muon-antimuon pair.  A plot of this distribution in the standard model will show a huge peak near the mass of the Z boson, as well as a peak at the J/Psi charmonium resonance.  A key question to ask of a signal is whether there are Z bosons being generated either in association with or in the decays of whatever new particles are being produced --- and a simple way to answer it is to make this plot.  (However, you should keep in mind that in the real LHC the standard model will infect all signals, so you'll always see some Zs; the question of determining whether the Zs are all from standard model processes or whether some come from new physics processes will be a very challenging one.)  Other important questions can be answered using this plot; see for example Hinchliffe and Paige's or Baer and Tata's approach to supersymmetric models (and note that their methods don't require supersymmetry and are much more widely applicable.)
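
For readers who want to build this plot themselves, here is a small sketch of the invariant mass of a pair, computed from the transverse momentum, pseudorapidity and azimuthal angle of each object. It assumes both objects can be treated as massless, which is a good approximation for muons at these energies; the function name is our own.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two objects, each treated as massless."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors

# Two back-to-back muons at zero pseudorapidity, 45.6 GeV each:
# the pair mass is twice the p_T, i.e. 91.2 GeV, close to the Z mass.
mass = invariant_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
```

Filling a histogram with this quantity for every muon-antimuon pair gives the distribution described above.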

Other classic distributions include: the number of events versus the missing transverse energy in the events, or versus the sum of the magnitudes of the transverse momenta of all high-momentum objects in the event, or versus the number of b-tagged jets in the events.

Tables that show numbers of events of a certain class are even simpler.   For example, how many events in the sample contain two positively charged muons and no other leptons?  How many contain two positively charged muons and at least three jets?  How many contain two positively charged muons each of which has at least 100 GeV of transverse momentum?  etc.  Tremendous amounts of information lurk in these basic numbers... if you can find it.
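
The three example questions above can be sketched as a single counting function. The event layout used here (a list of dictionaries with "type", "charge" and "pt" keys) is a hypothetical one chosen for readability; it is not the format of the data files.

```python
def two_positive_muons(event, min_jets=0, min_muon_pt=0.0):
    """True if the event has exactly two leptons, both positively charged muons,
    each above min_muon_pt (GeV), plus at least min_jets jets."""
    leptons = [o for o in event if o["type"] in ("electron", "muon", "tau")]
    muons = [o for o in leptons if o["type"] == "muon"]
    jets = [o for o in event if o["type"] == "jet"]
    return (len(leptons) == 2 and len(muons) == 2
            and all(m["charge"] > 0 for m in muons)
            and all(m["pt"] >= min_muon_pt for m in muons)
            and len(jets) >= min_jets)

def mu(charge, pt):
    return {"type": "muon", "charge": charge, "pt": pt}

jet = {"type": "jet", "charge": 0, "pt": 80.0}
electron = {"type": "electron", "charge": -1, "pt": 30.0}

toy_events = [
    [mu(+1, 120.0), mu(+1, 150.0), jet, jet, jet],
    [mu(+1, 40.0), mu(+1, 110.0)],
    [mu(+1, 120.0), mu(-1, 130.0)],            # opposite charges
    [mu(+1, 120.0), mu(+1, 130.0), electron],  # an extra lepton
]

n_basic = sum(two_positive_muons(e) for e in toy_events)
n_three_jets = sum(two_positive_muons(e, min_jets=3) for e in toy_events)
n_hard_muons = sum(two_positive_muons(e, min_muon_pt=100.0) for e in toy_events)
```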

However, one must be very careful!  The detector does not detect electrons, muons and taus with the same "efficiency" (because of detector response to these particles and because of the isolation cuts imposed, which will differ.)  Also, taus that decay leptonically count as electrons and muons, though the electrons and muons in the decay have somewhat lower energy than the parent tau (which affects the answer to the question we asked above: How many events contain two positively charged muons each of which has at least 100 GeV of transverse momentum?)  The number of jets depends on how you define a "jet".  The number of b-tagged jets will depend on how b-tagging works.  And for any of these tables, the number of anything you'd like to count depends on how the detector selects events to store on file --- "triggering".  This sounds very complicated; what's a naive theorist to do?

The key, of course, is to ask questions where these complications cancel.  Certain ratios are less sensitive than others to these details, for instance.  Correlations between the answers to certain questions may not be sensitive to these details.  Learning how to ask the right questions, questions whose answers have content and small systematic uncertainties, is part of learning how to interpret the data from a hadronic collider such as the LHC.

By looking at the plots and tables, you should be able to extract a lot of information about the physics generating the signal.  You may find, however, that you want more information than we've provided.  At this point you may just want to stop and wait for the next LHC Olympics conference, or you may want to find a friend who is willing to "play experimentalist", study the data files, and provide you with the additional information that you'd like to have.   Collaborations of formal theorists and either phenomenologists or experimentalists are very effective in this regard!  Or you can try to play with the data files yourself.  You may wish to write your own software to do this, or you may wish to learn to use ROOT, or you might want to try a user-friendly software package, especially designed by the Harvard group for black boxes (though please note it has not yet been fully vetted by the LHC Olympics committee and should be used with appropriate caution.)


                                           
The Data Files --- How to read them: 

WARNING: THIS APPLIES ONLY TO THE DATA FILES FOR THE FIRST AND SECOND OLYMPICS!!!  SOME IMPORTANT DETAILS HAVE CHANGED.  ALL CORRECT INFORMATION HAS BEEN REPRODUCED IN THE NEW WIKI, WHICH YOU CAN ACCESS FROM HERE.


The data files in the black box and calibration samples are ordinary text files with rows of numbers; you can just read them by eye, without any conversion software.  Interpreting them by eye is also straightforward.  Here's how the files work.

The files consist of a long list of "events", individual proton-proton collisions that have generated a spray of particles inside the detector.  Each "event" represents the detector's output (very crudely, and considerably processed) in a particular proton-proton collision that happened to produce something sufficiently interesting that it merited storing permanently.  [How does the detector decide what is "sufficiently interesting"?  This is the crucial issue of triggering!]
                                                                               
Each event consists of a set of rows in the data file.  Each row corresponds to an "object" [a lepton, photon, jet, or missing transverse momentum.]                                                                              
An example of a top-antitop pair production event, with the top decaying semileptonically and the antitop decaying hadronically:
  
   1  2   -1.419  2.873     24.94     1.00    0.0    0.0          an isolated muon, positively charged, with 25 GeV of transverse momentum
   2  4   -0.804  2.307    130.99    16.14   10.0    1.0       a heavy-flavor jet (presumably a b quark jet) with 131 GeV of transverse momentum, an invariant mass of 16 GeV, and 10 charged tracks
   3  4    1.046  4.245     82.75    14.11    2.0    0.0         an ordinary jet  with 83 GeV of transverse momentum, an invariant mass of 14 GeV, and 2 charged tracks
   4  4    1.247  5.996     78.72    13.75   14.0    1.0        a heavy-flavor jet (presumably a b quark jet) with 79 GeV of transverse momentum, an invariant mass of 14 GeV, and 14 charged tracks
   5  4   -2.154  3.884     13.85     5.83    3.0    0.0          an ordinary jet  with 14 GeV of transverse momentum, an invariant mass of 6 GeV, and 3 charged tracks, at a very small angle to the beampipe
   6  6    0.000  6.245     92.14     0.00    0.0    0.0           the "missing transverse energy" in the event, 92 GeV, from a combination of the muon neutrino in the event and possible mismeasurements
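
If you would rather read the rows with a program than by eye, a minimal Python sketch might look like the following. The column meanings are inferred only from the annotated example above and should be treated as assumptions: column 2 is an object type (2 = muon, 4 = jet, 6 = missing transverse energy in this example), columns 3 to 5 are pseudorapidity, azimuthal angle and transverse momentum, column 6 is the jet invariant mass (and apparently the charge for the muon), column 7 is the number of tracks, and column 8 is the heavy-flavor tag. The field names are our own.

```python
from typing import NamedTuple

class DetectorObject(NamedTuple):
    index: int             # column 1: object number within the event
    type_code: int         # column 2: 2 = muon, 4 = jet, 6 = missing ET (from the example)
    eta: float             # column 3: pseudorapidity
    phi: float             # column 4: azimuthal angle
    pt: float              # column 5: transverse momentum in GeV
    mass_or_charge: float  # column 6: jet mass; assumed to hold the charge for the muon
    ntrack: float          # column 7: number of charged tracks (for jets)
    hf_tag: float          # column 8: 1.0 if heavy-flavor tagged

def parse_row(line):
    """Parse the first eight whitespace-separated fields; trailing text is ignored."""
    fields = line.split()
    return DetectorObject(int(fields[0]), int(fields[1]),
                          *(float(x) for x in fields[2:8]))

EXAMPLE_EVENT = """\
   1  2   -1.419  2.873     24.94     1.00    0.0    0.0   an isolated muon
   2  4   -0.804  2.307    130.99    16.14   10.0    1.0
   3  4    1.046  4.245     82.75    14.11    2.0    0.0
   4  4    1.247  5.996     78.72    13.75   14.0    1.0
   5  4   -2.154  3.884     13.85     5.83    3.0    0.0
   6  6    0.000  6.245     92.14     0.00    0.0    0.0
"""

event = [parse_row(line) for line in EXAMPLE_EVENT.splitlines() if line.strip()]
tagged_jets = [o for o in event if o.type_code == 4 and o.hf_tag == 1.0]
```

This sketch handles a single event only; how events are separated from one another in the full files is not shown in the example above, so it is not attempted here.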



Photons

A photon is detected as energy in the electromagnetic calorimeter, with no high-transverse-momentum track, and little energy in the hadronic calorimeter.  Isolation cuts are used to reduce backgrounds, such as a pi-zero decaying to photon pairs, or an electron if its charged track is missed.

Electrons

An electron is detected as energy in the electromagnetic calorimeter, with a high-transverse-momentum track pointing toward it, and little energy in the hadronic calorimeter.  Isolated charged pions can give electron-like signals, as can photons or neutral pions if a charged pion happens to point in the same direction and happens not to leave much energy in the hadronic calorimeter.  Electron isolation cuts are used to reduce backgrounds and remove electrons from heavy-flavor decays.

Muons

A muon leaves little energy in the calorimeters, has a track, and travels all the way to the muon-detection system outside the calorimeters.   Muons are rarely faked, though "punchthrough", where a hadron fails to leave all its energy in the hadronic calorimeter and punches through to the muon system, giving a fake muon, can be a problem in some circumstances.  Muon  isolation cuts are used to reduce backgrounds and remove muons from heavy-flavor decays.

Hadronically-Decaying Taus

Tau leptons decay about 1/3 of the time to either an electron or a muon plus neutrinos.  In this case, they cannot be distinguished from electrons or muons and appear in the detector as objects of electron and muon type.

The rest of the time taus decay to quarks plus a neutrino.  The quarks immediately turn into hadrons.  Because of the small mass of the tau, almost all of the tau's decays are to a pair of light quarks.  More precisely, the most common decays of the tau are to a neutrino plus
In the first two cases a single charged track, but one that leaves energy in the hadronic calorimeter and is clearly not an electron, is the result  --- a "1-prong" tau.  Any hadronic or electromagnetic energy is clustered in a very narrow cone surrounding the charged track (more precise statements accounting for the curvature of the track are unimportant here.)  In the third case, three tracks result --- a "3-prong" tau.   Thus, what appears in the detector is a very narrow jet, with invariant mass no greater than 2 GeV, and with 1 or 3 tracks.  Such an object is unlikely to be an ordinary QCD jet (fake rates are not small, however) and a reasonably large number of taus will look like this (so efficiencies are not bad.)  No one really knows what the fake rates and efficiencies will actually be at the LHC, though tau detection is expected to be quite good because of the excellent spatial resolution of the calorimeters.  Detailed detector simulations are currently underway by the CMS and ATLAS collaborations and should help clarify this issue.

Jets 

Jets are the most common and most problematic objects in hadronic collider physics.  We cannot do proper justice to this extremely complex problem here.  See, for example, [we will add references.]  Crudely, jets are defined to be, well, jets of particles (as measured through tracks and through energy in both calorimeters) that fit inside a cone (in azimuth and pseudorapidity) of R=0.7, where R is the sum in quadrature of the differences in azimuthal angle and in pseudorapidity from the centroid of the jet.   However, this is completely ambiguous; a precise algorithm for defining jets, and resolving ambiguities, is needed.
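
To make the definition of R concrete, here is a short sketch of the distance between two directions in azimuth and pseudorapidity. The only subtlety is that the azimuthal difference must be wrapped, since angles just above 0 and just below 2 pi are close to each other. The function names are our own.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the range (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Sum in quadrature of the pseudorapidity and azimuthal differences."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# Two directions on either side of phi = 0 are only 0.2 apart, not ~6.08.
separation = delta_r(0.0, 0.1, 0.0, 2.0 * math.pi - 0.1)
```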

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE FIRST AND SECOND OLYMPICS

For the PGS detector simulations used here, we have chosen the following algorithm (this is neither ideal nor a long-term solution, but it will have to do for now...)

Jets are defined using a cone algorithm centered on the highest Et tower (cell in eta-phi space) or "seed", i.e. cells within R=Conesize of this "seed" are included in the jet. Once such a jet is defined, the center of the jet may no longer be the "seed" tower. Treating each cell within the jet as a massless particle, a jet 4-vector is defined. If two jet  4-vectors are separated by less than R=Conesize, then these jets are merged into one jet. An artificial shoulder may appear in delta-R distributions as a result. Currently "Conesize" = 0.7, which is a common choice of jet theorists.
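
The steps just described can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the PGS code: every leftover cell is allowed to act as a seed (no seed threshold), cells are (Et, eta, phi) tuples, and merging is repeated until no two jet axes are closer than the cone size.

```python
import math

CONE_SIZE = 0.7  # the value quoted in the text

def delta_r(eta1, phi1, eta2, phi2):
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def four_vector(et, eta, phi):
    """Massless four-vector (px, py, pz, E) for one calorimeter cell."""
    return (et * math.cos(phi), et * math.sin(phi),
            et * math.sinh(eta), et * math.cosh(eta))

def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

def axis(v):
    """Pseudorapidity and azimuth of a four-vector's direction."""
    px, py, pz, _ = v
    return math.asinh(pz / math.hypot(px, py)), math.atan2(py, px) % (2.0 * math.pi)

def cluster(cells, cone=CONE_SIZE):
    """Seeded-cone clustering of (et, eta, phi) cells into jet four-vectors."""
    remaining = sorted(cells, reverse=True)  # highest Et first
    jets = []
    while remaining:
        seed = remaining[0]
        inside = [c for c in remaining
                  if delta_r(c[1], c[2], seed[1], seed[2]) < cone]
        remaining = [c for c in remaining if c not in inside]
        jet = (0.0, 0.0, 0.0, 0.0)
        for cell in inside:
            jet = add(jet, four_vector(*cell))
        jets.append(jet)
    merged = True
    while merged:  # merge any two jets whose axes are closer than the cone size
        merged = False
        for i in range(len(jets)):
            for j in range(i + 1, len(jets)):
                if delta_r(*axis(jets[i]), *axis(jets[j])) < cone:
                    jets[i] = add(jets[i], jets[j])
                    del jets[j]
                    merged = True
                    break
            if merged:
                break
    return jets

# The 5 GeV cell lies 0.8 from the seed but within 0.7 of the shifted jet axis,
# so it starts as its own jet and is then merged.
toy_cells = [(100.0, 0.0, 1.0), (60.0, 0.0, 1.6), (5.0, 0.0, 1.8), (50.0, 2.0, 4.0)]
jets = cluster(toy_cells)
```

The toy input shows how the jet center can move away from the seed tower, which is what makes the merging step matter.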


Missing Transverse Energy

What is meant by this term? And how precisely is it defined in the context of the PGS detector? 

"Missing transverse energy" is not missing energy; indeed, what is "transverse energy"?!  Precisely stated, it is the magnitude of the missing transverse momentum in an event. 

Energy conservation cannot be used in a hadronic collider, because so much energy is carried off in unmeasurable particles --- remnants of the shattered initial protons --- inside or very near the beampipe.  For the same reason, momentum conservation along the beampipe cannot be used.  However, momentum conservation transverse to the beampipe should work.  A failure of momentum conservation in the transverse plane suggests
However, experimentally there is more than one way to define the missing momentum, because there are multiple measurements of momentum and they don't always agree.  Here is what we do, using our version of PGS:
   
Missing-Et is defined by summing (as a vector) the directed transverse energy deposited in all of the calorimeter cells (treating each cell as a massless particle) --- this combines, ideally, the momenta of all photons, electrons, hadronically-decaying taus, and jets --- and adding to this the transverse momenta of any muons, whose energy is measured using the muon detection system.   The magnitude of the resultant vector is the "missing transverse energy".

A caution: muon detection works only out to |pseudorapidity|=2, whereas the calorimeter extends to |pseudorapidity|=4, so muons at large |pseudorapidity| (very near the beampipe) can cause additional missed transverse momentum!
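
The definition above can be sketched as follows. This is an illustration under our own assumptions rather than the PGS code: calorimeter cells are given as (Et, phi) pairs, muons as (pt, eta, phi) triples, and a muon beyond |pseudorapidity| = 2 is simply not seen. The missing momentum points opposite to the visible vector sum, and its magnitude equals the magnitude of that sum.

```python
import math

MUON_ETA_MAX = 2.0  # muon coverage quoted in the text

def missing_et(cells, muons):
    """Return (magnitude, azimuth) of the missing transverse momentum.

    cells: iterable of (et, phi) calorimeter deposits, each treated as massless
    muons: iterable of (pt, eta, phi) muons
    """
    sum_x = sum(et * math.cos(phi) for et, phi in cells)
    sum_y = sum(et * math.sin(phi) for et, phi in cells)
    for pt, eta, phi in muons:
        if abs(eta) <= MUON_ETA_MAX:  # muons outside the coverage go unrecorded
            sum_x += pt * math.cos(phi)
            sum_y += pt * math.sin(phi)
    magnitude = math.hypot(sum_x, sum_y)
    azimuth = math.atan2(-sum_y, -sum_x) % (2.0 * math.pi)  # opposite the visible sum
    return magnitude, azimuth

cells = [(100.0, 0.0)]
met_central, _ = missing_et(cells, [(60.0, 0.5, math.pi)])  # muon is measured
met_forward, _ = missing_et(cells, [(60.0, 2.5, math.pi)])  # muon is missed
```

In the second call the forward muon escapes detection, so the apparent missing transverse energy grows from 40 GeV to 100 GeV, which is the effect the caution describes.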



Kinematics

Here we define some of the key kinematic variables used above:



Column Seven

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE FIRST AND SECOND OLYMPICS

The information stored in this column is mostly for advanced users.  "R" is defined here.
All of the above tracks must be above a threshold of 1 GeV. Look at the PGS code for more details.

Heavy Flavor Tagging

Sometimes this is called "b-tagging", since the main goal of tagging is usually to detect bottom quarks, but in fact significant numbers of charm quarks get detected this way also.  The key feature of bottom and charm quarks is that they both live just long enough to usually decay at a measurable distance from the initial collision point.  When a hadron containing a bottom or charm quark decays after travelling a few millimeters from the collision point, the charged particles created in the decay can form a "displaced vertex", or at the very least, they do not point back to the collision point --- they have a nonzero "impact parameter".  The decays also can produce muons (which are harder to fake than electrons, so they are preferentially used) which are close to the jet.  The observation within a jet of a displaced vertex, tracks with nonzero impact parameter, and/or a single muon all give evidence that a heavy quark was somewhere in the jet. 

It is expected that about 50 (15) percent of jets containing bottom (charm) quarks will be "tagged" at LHC, while about 1 percent of other jets are tagged by accident --- "mistags".  However, one cannot take these numbers at face value.  First, adjustments in the tagging algorithm can increase or decrease all three tagging rates; certain analyses may need very pure samples, demanding very "tight" tagging requirements, whereas others may need high statistics, in which case "loose" requirements would be used.  Second, a single number is not a proper estimate of a tagging rate; the tagging probabilities for bottom, charm, and non-heavy-flavor jets depend on the jet's transverse momentum and pseudorapidity (among other things, such as the luminosity.) 
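
To get a rough feeling for what these rates imply, one can estimate the chance that an event has at least one tagged jet. The sketch below assumes flat rates of 50, 15 and 1 percent and treats each jet as tagged independently; both are simplifications that, as just explained, should not be taken at face value.

```python
def prob_at_least_one_tag(n_bottom, n_charm, n_light,
                          eff_bottom=0.50, eff_charm=0.15, mistag=0.01):
    """Probability that at least one jet is tagged, assuming flat, independent rates."""
    prob_no_tag = ((1.0 - eff_bottom) ** n_bottom
                   * (1.0 - eff_charm) ** n_charm
                   * (1.0 - mistag) ** n_light)
    return 1.0 - prob_no_tag

# An event with two bottom jets and two light jets:
# 1 - 0.5 * 0.5 * 0.99 * 0.99 = 0.754975
p_tagged = prob_at_least_one_tag(n_bottom=2, n_charm=0, n_light=2)
```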

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE FIRST AND SECOND OLYMPICS

The PGS detector used in the black box samples was set to have tagging probabilities of the following type:
where "Et" is the magnitude of the jet's "transverse energy" (constructed from the transverse momentum and the invariant mass of the jet.)  The additional factors of 1.1 account for using detected soft leptons from the heavy-flavor decays to boost the tagging efficiency.    (There is a subtlety concerning the somewhat crude way that PGS implements b-tagging which lowers this rate slightly; see our detailed discussion of PGS itself.)  Tagging efficiency falls off rapidly for jets near the beampipe; in our current implementation of PGS, no jet at |pseudorapidity| > 2 can be tagged.

Note also that using "1 percent" to describe the probability of mistags is inherently ambiguous.  High-energy gluons can quite often produce charm or bottom quark pairs as they form a jet.  When this happens, the term "mistag" is unclear --- when the experimentalists say they have a 1 percent mistag rate, does it include this effect?  The answer is no. Experimentalists mean that the probability of mistagging the gluon jet when it does not split into charm or bottom is 1 percent.  The overall tagging probability for a gluon is probably closer to 3 percent, 1 percent each for a mistag, a split to charm followed by a tag, or a split to bottom followed by a tag.  While this naively seems like a small effect, it is sometimes very important.



Lepton Isolation:

Leptons are a very important sign of potential new physics, since, naively, QCD processes don't generate leptons.  But this isn't really true.  Jets generate leptons, or apparent leptons, in several ways.  First, a charged pion overlaid on a pi-zero, which decays to two photons, can look just like an electron: a track with electromagnetic energy.  Most of the time there's hadronic energy too, which disfavors identifying this as an electron, but fluctuations happen and sometimes the hadronic energy isn't registered.  So we get a "fake" electron.  It's harder to fake muons, but not impossible.  Of course, a fake electron will generally be inside, or near, a jet, since other hadrons typically will accompany the pion-fake-electron.  So if we demand the electron be isolated --- that there be no nearby tracks or energy in the calorimeter --- we are probably looking at a real electron.  Probably.

Another way to get an electron or muon is from the production of a bottom or charm quark, which has a certain probability of decaying to a lepton.  Such a lepton typically is also inside a jet formed from the rest of the shower of particles that are created as the bottom quark discovers it is confined.  But the kick from the bottom quark decay tends to knock the leptons out of the jets a little bit, and occasionally they will be isolated enough to be indistinguishable from "prompt" leptons from W bosons, or Z bosons, or other new sources.   Again, an isolation requirement reduces, though it does not eliminate, the chance of mistaking a lepton that comes from a nearby heavy quark jet for a prompt lepton.

Lepton isolation requirements are generally different for electrons and muons, and in any case the efficiency with which a detector detects muons and electrons will be different.  Do not expect the numbers of muons and electrons to be equal, even within a standard model calibration sample!  Instead, you need to learn something about how the lepton isolation efficiency affects signals in order to draw correct conclusions about the underlying physics.

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE FIRST AND SECOND OLYMPICS

The current version of PGS used in these blackboxes has a new lepton isolation criterion compared to the PGS used for summer 2005's black box, and EXPECT THIS TO CHANGE AGAIN WHEN THE NEW VERSION OF PGS BECOMES AVAILABLE.





 Black Box General Information                                                                               
                                                                                
These types of blackboxes and the calibration samples are generated in three steps (through a single cross-linked computer program, which will soon be available to participants --- see below.)
                                                                                
1) Feynman diagrams (matrix elements) are calculated to obtain the rate for  a particular short-distance physics production process, such as quark-antiquark annihilation into two photons [this can be done with CompHEP or Pythia or Herwig or MadGraph/MadEvent or ALPGEN or other matrix element programs.]  A caution about such data generation can be found HERE.

2) the short distance physics is "evolved" to long-distance physics, accounting for the conversion of quarks and gluons into jets of hadrons, decays of tau leptons, and other processes of importance [this is done, for these blackboxes, with Pythia 6.324, though other programs including Herwig are available for this purpose.]  Problems with this stage are commented on HERE.
                                                                               
3) the resulting hadrons, leptons, and photons are run through a program called "PGS" (Pretty Good Simulation), written by John Conway (UC Davis), which serves as a simulated detector. Jet reconstruction and lepton identification are done at this stage.  The output of PGS is the blackbox data, or calibration sample data, that you are downloading.  See the warning BELOW.  The data files can be read by eye and are easily interpreted.
                                  

Some Very Important Supplemental Information





Some Aspects of Event Generators

Most event generators are wonderful for some things but have significant limitations for others.  Some are very easy to use and convenient to run, but will only do 2 -> 2 processes in the main scattering event (which leaves out many 2 -> 3 and 2 -> 4 processes that can be important standard model backgrounds to new physics signals; for example g g -> t tbar Z is a source of large missing energy and leptons.)  Many cannot handle the cascade decays of new particles correctly; they may fail completely (because the phase space integrals required simply take too long) or they may simply discard some important information (such as the correlations between the spins of the new particles and how those correlations propagate into the decay products.)  Some generators that can handle these issues pretty well are unfortunately harder to modify to accommodate new-physics processes.  There is no simple solution here --- it is necessary to understand both the generator you are using and the physical processes (signal and background) that you are simulating, in order to avoid very significant errors.

Moreover, even if your event generator correctly computes tree level amplitudes, this doesn't mean it does the physics right.  Loop corrections are huge in QCD (more precisely, without a loop correction, tree amplitudes suffer from large ambiguities, since they are proportional to a power of a running coupling, whose value is not determined at tree level!)  This can be very roughly dealt with, process by process, by normalizing the rate for each process using a next-to-leading order computation of that rate and hoping the tree-level result is still giving the correct kinematic distributions.  But this is not practical for simulating many processes at once, since typically event generators are not written in such a way that you can easily adjust the normalization of each process by hand.  One should also remember that parton distribution functions are needed for predicting the rate of any given process, but these functions are neither perfectly determined from experiments (especially gluon and heavy quark distributions, which are important at LHC) nor free from effects of loop corrections.  So don't take any one of our black box data files too seriously --- our simulation of the signal from a new physics model is not, for these and other important reasons, what would actually be seen at LHC if this model were a correct description of the real world.  The errors are very hard to quantify without a detailed study of both the signal and standard model backgrounds.

Another modern effort in event generation involves the MC@NLO project (Monte Carlo at Next-to-Leading Order).


Some Aspects of Showering and Hadronization

[to be added]

Some Aspects of the PGS Detector

[to be added]

Triggering

Changed 12/07/05: Thank you to Patrick Meade, Csaba Csaki, Christian Spethmann and others at Cornell for their questions, studies and comments.

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE SECOND OLYMPICS

What is triggering?  Why is it necessary? How is it implemented [very, very crudely!] here? 

The rate of collisions at the Tevatron or the LHC is many orders of magnitude too large for a record of every collision to be stored.  The detectors are so enormous, with so many data channels, that to store the record of a single collision requires a stunning amount of memory.  Moreover, recording the events takes time.  Roughly 100,000,000 events per second occur, but only about 100 of these can be recorded. 

So how do experimentalists decide how to select 10^2 out of 10^8 events each second?  The detector must contain an elaborate "trigger" as part of its hardware and software which does a partial analysis of each collision to decide whether it is sufficiently interesting to be worth recording. 

For instance, if an event has a muon in it, it has a good chance of being interesting.  If there is a large amount of missing transverse momentum, it has a good chance of being interesting.  If there are several jets with a TeV of transverse momentum, it's interesting.  Etc.  So the trigger consists of a set of conditions: if an event appears to satisfy one or more of these conditions, the detector software will trigger a full readout of the detector data.  Otherwise, the event is dumped into the void, and lost forever.

Triggering is all about compromise.  We can't record all events with candidate electrons in them without having to throw away some events that have large missing transverse momentum --- there are just too many.  So we require any interesting electrons to have some minimum amount of transverse momentum... unless, say, there are two leptons in the event, which is rarer, in which case we can lower that minimum, or, say, the event has both an electron and a substantial amount of missing transverse momentum, which is also rare, in which case again the minimum could be lowered.  QCD produces huge numbers of events with high-p_T jets, so we can't record all events with high-p_T jets without running out of storage space for events with muons.  So we might demand that an event, in order to be stored, have at least three jets that satisfy a condition: one jet must have at least 650 GeV of p_T, the next must have at least 300 GeV, and the next at least 150 GeV.  Etc.
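To make the staggered three-jet condition above concrete, here is a minimal Python sketch.  The function name and the plain list-of-numbers input are assumptions made for illustration only; this is not code from PGS or from any experiment's trigger, and the thresholds are simply the example values quoted above.

```python
def passes_three_jet_trigger(jet_pts, thresholds=(650.0, 300.0, 150.0)):
    """Return True if the hardest jets clear the staggered p_T thresholds.

    jet_pts    : transverse momenta of the jets in one event, in GeV
                 (hypothetical input format chosen for this sketch)
    thresholds : minimum p_T for the hardest, second and third jet, in GeV
                 (the example values from the text)
    """
    # Sort jets from hardest to softest so each one is compared
    # against the threshold for its rank.
    ordered = sorted(jet_pts, reverse=True)
    if len(ordered) < len(thresholds):
        return False  # fewer than three jets: cannot satisfy the condition
    return all(pt >= cut for pt, cut in zip(ordered, thresholds))
```

With these example thresholds, an event with jets of 700, 320, 160 and 40 GeV would be kept, while one with three jets of 640, 630 and 620 GeV would be discarded, because its hardest jet falls below 650 GeV even though the other two are far above their thresholds.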

Now here's the problem: this means that the detector records events that pass a rather complicated set of conditions.  Even ignoring the fact that the detector trigger decisions are imperfect, this makes for a very complex situation.  For instance, we cannot easily ask how many events in a new signal have a muon compared to how many events have no leptons.  Or rather, we can ask it, but it doesn't tell us anything, because the trigger conditions for events with muons and for events with no leptons are completely different, and the effect of this difference is very hard to estimate unless, in addition to understanding your detector very well, you have a precise and detailed model of the new physics.... which was precisely what we were trying to construct in the first place!  So the problem is circular, and very difficult to solve.

Triggering is so complicated that we have decided not to address it properly yet in our LHC Olympics workshops.   On the other hand, not to do any triggering at all is to be misleading to the point of silly... we'd end up keeping all sorts of events which would not even be written down for storage by the LHC! 

NEW 12/07/05: The original description of our triggering prescription was incorrect, due to a misunderstanding of the agreed-upon procedure between the author of the website and the executor of the code.  Apologies!  Our original intention was to do something simple and very crude, but at least not totally unrealistic.  Instead, what we have actually done is more realistic than intended (still crude) and much less simple.  (At least it isn't less realistic and less simple.)  We are not providing participants currently with sufficient information to understand the effect of the trigger, and we are working to improve the situation and are discussing workarounds with experts at the present time.  Actually the whole issue is quite interesting and instructive, and we may open up a page on the wiki for further discussions amongst both organizers and participants.

Triggering involves a decision that must be made very rapidly.  Real detectors therefore have to make these decisions based on partial, incomplete and often erroneous information.  This means that interesting events are sometimes missed by the trigger, and conversely, events which seem interesting at first glance may turn out to be less so after being more carefully analyzed.  For instance, an initial look at the calorimeter may reveal a narrow isolated cluster of energy in the hadronic calorimeter that hints at being a tau.  To check if it is likely to be a tau, the triggering system looks to see if there are a small number of tracks in the vicinity (one or three would be expected.)   But because of the limited time available, the detector will reconstruct tracks quickly, using only two projected dimensions (radius and azimuthal angle phi) of the three-dimensional tracking information.  This allows two types of errors, with either sign: (a) the detector may fail to reconstruct a track which is actually present, for example because of tracks which are superposed and crossing when projected onto radius and phi, or (b) the detector will see a track that points at the tau-candidate cluster, but later, with more time for track reconstruction, this turns out to be a coincidence: although the track has the same phi as the cluster, its eta (pseudorapidity) is completely different.  Because of these errors, or more precisely, inefficiencies and fakes, an event with a tau may be thrown away, or an event without a tau may be kept, on the basis of the triggering process. 
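Error type (b) above can be illustrated with a toy Python sketch.  Everything here is an assumption made for illustration: the dictionary format for tracks and clusters, the function names, and the matching window of 0.2 are hypothetical and are not taken from PGS or from any real trigger.

```python
import math

def delta_phi(phi1, phi2):
    """Signed difference between two azimuthal angles, wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def matches_2d(track, cluster, window=0.2):
    """Fast trigger-style match: compares phi only (eta is not available)."""
    return abs(delta_phi(track["phi"], cluster["phi"])) < window

def matches_3d(track, cluster, window=0.2):
    """Slower offline-style match: requires agreement in both phi and eta."""
    return (matches_2d(track, cluster, window)
            and abs(track["eta"] - cluster["eta"]) < window)

# A track at nearly the same phi as the cluster but at a very different eta.
cluster = {"eta": 0.3, "phi": 1.00}
track = {"eta": -1.8, "phi": 1.05}
fake_at_trigger_level = matches_2d(track, cluster) and not matches_3d(track, cluster)
```

In this toy example the phi-only match accepts the track, while the match using both phi and eta rejects it, so the cluster would look like a tau candidate at trigger level but not after full reconstruction.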

Thus it is essential to distinguish between trigger objects (the imperfectly reconstructed electrons, muons, jets, etc. on the basis of which the trigger decision is made) and reconstructed objects (which are the objects that, having been carefully reconstructed, appear in the data files.)  The original intention, to keep things simple, was to base a crude pseudo-trigger on the reconstructed objects, which would make it easy for participants to understand why one event or another passes the trigger.  However, in fact, the PGS used in the current Olympics data set bases its crude pseudo-trigger on the trigger objects.  Consequently, one cannot, by looking at the reconstructed objects in the data file, understand precisely why a given event passed the trigger.  Precisely so as to avoid serious confusions about triggering, the experiments keep track of the trigger objects in any given event, as well as the reconstructed objects, and they keep track of precisely how a triggering decision was made for each event.  Unfortunately, we do not provide this information with our data sets, which is a genuine structural problem that makes life more difficult than it should be.  In particular, it makes analysis and use of the calibration sets much more complicated [as demonstrated clearly to us by the Cornell group -- thanks!]  We will attempt to address and improve these issues in later rounds of the Olympics, but have not yet decided on the best fix for the Winter 06 Olympics.

For nonspecialists, this should not affect a great deal of what you want to do.  Kinematic reconstruction of mass peaks and endpoints should not be much affected, and looking at ratios where trigger decisions and other issues largely cancel (which the experimentalists do all the time) should help a great deal with the analysis.  Simulating a model will be more subtle, however, and you will need to make sure that you use the right trigger, which is not the default trigger for PGS.  We will work toward helping participants with this in the immediate future... stay tuned.  The unofficial HarvardVersion of PGS on the LHC Olympics wiki does have the correct trigger.

For specialists, or anyone wanting to know how the current trigger really works, here's what we are currently aware of.  Caution: we may not have everything quite straight yet; we are waiting for confirmation from the experts that there are no mistakes in what follows.  At the trigger level, the trigger uses 3-dimensional clusters of energy in the calorimeter, hits in the muon chamber, and rudimentary tracking information to make lists of all the following candidate objects:
a) photon candidates
b) electron candidates
c) muon candidates
d) hadronic tau candidates
e) jet candidates
HOWEVER, any cluster of energy that makes it into (a) also makes it into (b), and vice versa, so lists (a) and (b) are the same; also, all objects in (a), (b) and (d) also appear in (e).  Thus: there is no exclusive-or that says that a jet candidate is not a tau candidate, or that an electron candidate is not a photon candidate.  On the contrary: the trigger is as inclusive as possible: all photon candidates are also electron candidates and jet candidates.  But not all jet candidates are electron candidates, because not all jets have a lot of electromagnetic energy compared to their hadronic energy; so the logic is not reversible.

To say it differently: all large clusters (or clusters of clusters) of energy are jet candidates; all narrow clusters with few tracks are tau candidates; all clusters with an abundance of energy in the electromagnetic calorimeter are both photon and electron candidates.  Muon candidates all come from hits in the muon chamber, without an isolation requirement.

Also, the trigger system adds up all the energy in the calorimeters as two-vectors in the plane transverse to the beam, and asks how much transverse momentum is missing; this is the trigger-level missing ``energy''.  Currently, it does not test for the presence of muons (which don't leave much of their energy in the calorimeters) and so missing energy at trigger level may simply be due to muons in the event.  (The reconstruction-level missing energy objects that appear in the data files do not suffer from this issue.)
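The two-vector sum described above can be sketched in a few lines of Python.  The input format (a list of (E_T, phi) calorimeter deposits), the function name, and the 3 GeV muon deposit in the example are assumptions made for illustration; this is not the PGS implementation.

```python
import math

def missing_pt(deposits):
    """Missing transverse momentum from a list of (et, phi) deposits.

    Each deposit is treated as a two-vector in the plane transverse to
    the beam.  The missing momentum is minus the vector sum.
    Returns (magnitude in GeV, azimuthal angle in radians).
    """
    px = sum(et * math.cos(phi) for et, phi in deposits)
    py = sum(et * math.sin(phi) for et, phi in deposits)
    return math.hypot(px, py), math.atan2(-py, -px)

# Toy event: a 100 GeV jet recoiling against a 100 GeV muon.
# The muon is assumed to leave only 3 GeV in the calorimeter.
calorimeter_only = [(100.0, 0.0), (3.0, math.pi)]
met_trigger, _ = missing_pt(calorimeter_only)

# Adding the rest of the muon's momentum by hand balances the event.
with_muon = calorimeter_only + [(97.0, math.pi)]
met_corrected, _ = missing_pt(with_muon)
```

In this toy event nothing invisible was produced, yet the calorimeter-only sum reports about 97 GeV of missing transverse momentum pointing along the muon; once the muon is accounted for, the missing momentum is essentially zero.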

Once the lists of candidates are made, and the trigger-level missing energy is computed, the trigger makes a decision.  Most detectors have a long list of possible decisions.  We will include this actuality in later versions of the LHC Olympics.  Right now, we have a single floating trigger decision process, based on a single criterion (which, in retrospect, given what we actually did, is not intrinsically a great idea, but was a reasonable idea at the time given what was intended by the author... ah well)

For any simulated event, we consider all the trigger-level electrons and muons in the event with p_T >  10 GeV, all trigger-level taus and jets with p_T > 100 GeV, and the trigger-level missing transverse energy if it is greater than 50 GeV.  (We demand that at least one electron, muon, tau or jet pass these cuts; we do not trigger on events with only missing transverse energy and soft jets.)  From the set of trigger objects which pass these cuts, we construct a quantity from the absolute values of the transverse momenta of all objects in all the trigger lists, weighted as follows: [our earlier formula had a typo that weighted the leptons with the number 4.0 instead of 5.0]

ht_sum = sum_{(b),(c)} 5.0 |pt| + sum_{(d)} 0.2 |pt| + sum_{(e)} 0.2 |pt| +  |pt_missing|    

where the sums are over the objects in the trigger lists (b), (c), (d), (e) described above.  Notice list (a) does not appear; it is redundant, since lists (a) and (b) are identical.  Since we are summing over objects in all the trigger object lists, and an electron candidate appears in list (a), list (b) and list (e), it contributes both as a lepton and as a jet if it passes both sets of cuts; similarly taus appear in lists (d) and (e); and muons in list (c) can, as mentioned above, contribute to the missing pt.  Then we demand that

ht_sum > 150 GeV

If this is true, we keep the event; the objects in the event are fully reconstructed and written to the data files.  If not, we discard the event.
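For readers who prefer code to prose, the prescription above can be sketched in Python as follows.  This is our reading of the description, not the actual PGS code: the function names and the input format (one list of trigger-level p_T values per trigger list) are assumptions, and the caller is assumed to supply the lists with their overlaps already in place, so that an electron or tau candidate also appears in the jet list.

```python
def ht_sum(electrons, muons, taus, jets, met):
    """Weighted trigger sum in GeV, following the prescription in the text.

    electrons, muons, taus, jets : trigger-level p_T values in GeV taken
        from lists (b), (c), (d) and (e).  The lists overlap, so an
        electron or tau candidate should also be entered in `jets`.
    met : trigger-level missing transverse energy in GeV.
    Returns None if no electron, muon, tau or jet passes its p_T cut.
    """
    leptons = [abs(pt) for pt in list(electrons) + list(muons) if abs(pt) > 10.0]
    taus_ok = [abs(pt) for pt in taus if abs(pt) > 100.0]
    jets_ok = [abs(pt) for pt in jets if abs(pt) > 100.0]
    # No trigger on missing energy plus soft jets alone.
    if not (leptons or taus_ok or jets_ok):
        return None
    # Missing energy only counts if it exceeds 50 GeV.
    met_term = abs(met) if abs(met) > 50.0 else 0.0
    return 5.0 * sum(leptons) + 0.2 * sum(taus_ok) + 0.2 * sum(jets_ok) + met_term

def passes_trigger(electrons, muons, taus, jets, met, threshold=150.0):
    """Keep the event if the weighted sum exceeds the threshold (GeV)."""
    total = ht_sum(electrons, muons, taus, jets, met)
    return total is not None and total > threshold
```

For example, a single 35 GeV trigger-level muon gives 5.0 * 35 = 175 GeV and passes, while three jets of 200 GeV each give only 0.2 * 600 = 120 GeV and fail.  An event with 400 GeV of missing transverse energy and only soft jets fails as well, because no object passes its cut.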

The condition chosen above is an extremely crude summary of a number of the essential trigger conditions at the CMS and ATLAS detectors, though it still leaves out many details of the real triggering process, and uses taus and b-tags less effectively than is expected to be possible.  (However, concerns have been raised that the weighting of jets is dangerously low; the trigger rejects too many fully-hadronic top pair production events, for example... a point well taken.)  If it had been applied to the reconstructed objects, then it would have been very easy for participants to understand its effects.  But since it was applied to trigger objects, which are not provided to participants, it is essentially impossible --- currently --- to do so.


Comments on this section are welcome!


No Background?!!

A signal without standard model background is not just highly unrealistic, it is potentially deeply misleading. These backgrounds are huge.  Many of the features of a new signal can be swamped beyond repair by standard model processes.  It is easy to invent techniques which will work on a pure signal but will fail when the signal is contaminated by some standard model background.  Etc. 

It is important for participants in the LHC Olympics to think about this issue carefully.  However, with this apology, we proceed in this fashion for the moment because (1) the problem of extracting information from pure signals is already nontrivial, and can serve as a useful learning tool despite its drawbacks, and (2) this is the best that we can do at the moment, for important technical reasons; more on this HERE.  In the future, signals which include backgrounds will be a part of the LHC Olympics data challenges (though signals without backgrounds will still be provided for beginners.)  For the moment, if you want to think about the backgrounds to the current black boxes, some suggestions on how to do so are HERE.


How to use Calibration Samples

[suggestions to come soon]




Crude implementation of Standard Model Backgrounds -- possible methods:

How should I think about standard model backgrounds to these signals?
[answers to come soon]




Why it is so hard to make black boxes with realistic Standard Model Backgrounds --- a challenge for the reader:

It is not at all straightforward to make a suitable standard model background sample for a given black box!!!  Here are just a few of the issues.

All of the largest backgrounds at LHC involve QCD physics, either in full or in part.  For instance, for a signal in which events containing a single lepton plus jets play a part, the dominant background is often from a W boson produced in conjunction with jets.  Fine --- this is a standard model process --- why not just simulate the production of W bosons plus various numbers of quarks and gluons, and be done with it?

The problem is that it isn't possible.  Simulating this process, or rather, set of processes to a satisfactory degree is beyond the state of the art.

One problem is that we simply don't know how to generate these events with good accuracy.  Consider the W plus 4 gluons process... we can calculate the Feynman diagrams using various event generators.  It has a rate of order (alpha_s)^4.  But which alpha_s?  It's a running coupling, and at tree level there's simply no control over the scale \mu at which it should be evaluated.  We need the next-to-leading order process to be calculated also, in order to reduce the dependence on the choice of the scale \mu.  This hasn't been done; it involves a very non-trivial set of loop graphs, which have not yet been calculated.  (W plus three jets is at the cutting edge.)  Consequently, we can only guess at the best choice of \mu, and thus can only guess at the appropriate alpha_s.  Again, alpha_s appears to the fourth power in the rate. So the rate for this one process is only known to a factor of 2 or 3 or so.
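To get a rough feeling for the size of this effect, here is a small Python sketch using the standard one-loop formula for the running of alpha_s.  The choices made here are assumptions for illustration only: alpha_s(M_Z) = 0.118, five quark flavors held fixed at all scales, a central scale equal to M_Z, and a variation of that scale by a factor of two in either direction, which is a common convention rather than anything the physics dictates.

```python
import math

def alpha_s_one_loop(mu, alpha_mz=0.118, mz=91.19, nf=5):
    """One-loop running coupling at scale mu (GeV).

    alpha_mz : assumed value of alpha_s at the Z mass
    mz       : Z mass in GeV
    nf       : number of active quark flavors (held fixed in this sketch)
    """
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    return alpha_mz / (1.0 + b0 * alpha_mz * math.log(mu**2 / mz**2))

def rate_ratio(mu_low, mu_high, power=4):
    """Ratio of tree-level rates proportional to alpha_s**power at two scales."""
    return (alpha_s_one_loop(mu_low) / alpha_s_one_loop(mu_high)) ** power

mz = 91.19
ratio = rate_ratio(mz / 2.0, 2.0 * mz)  # scale varied by a factor of 2 each way
```

Under these assumptions the coupling itself changes by only about 20 percent between the two scales, but the fourth power turns that into a factor of a bit more than 2 in the rate, in line with the estimate above.  A wider range of plausible scales, or more powers of alpha_s, makes the spread larger still.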

This is only the beginning. To produce the background properly, we would need to combine the many W plus one jet, W plus two jet, W plus three jet, W plus four jet, W plus five jet processes.  Loop corrections have only been performed completely for W plus two jets; beyond this point, each of the individual W plus n-jet processes has its own uncertainties, of order a factor of 2 or 3, so the sum of the rates is very uncertain.  More jets means more powers of alpha_s, which reduces the absolute rate but increases the relative uncertainty.   And combining these processes in a consistent way, without double-counting events or incorrectly mixing orders in perturbation theory, is not trivial.

Another problem is purely technical and would be present even if our perturbative knowledge of W plus jets were perfect.  There are so many W's produced at the LHC --- hundreds each second --- that most of them have to be thrown away using the triggering system.  By contrast, a typical new signal might have a few thousand or tens of thousands of events per year.  If an important part of the signal involves a lepton and many jets, we will probably have to impose hard cuts --- i.e., impose strict kinematic conditions on the events --- that preserve much of the signal while discarding almost all of the standard model background.  What standard model background remains (which may still be much larger than the signal) will be a tiny fraction of the W plus jets events that LHC actually produces, and will lie far out on the tail of any standard model kinematic distribution.  

In this context, how should we provide LHC Olympics participants with this particular background?  Practically, we couldn't possibly provide the full W-plus-jets background, since we are talking about data sets which are 1000 to 100,000 times larger than the signal data sets.   But suppose, knowing each signal and its characteristic features, we imposed some cuts in advance, in order to reduce the W-plus-jets data sets down to the small fraction of the events that are the most important backgrounds to a particular signal.  This would mean simulating the tails of the W-plus-jets distributions.  We would still have to simulate huge data sets, of order 1000 times larger than the signal, in order to obtain these tails.  This would take weeks.  Also, the result for each separate process contributing to that background would be uncertain by a factor of at least 3, for the reasons mentioned above, and so the number and type of events remaining, after the stringent cuts that we would need to impose, could be wrong by as much as an order of magnitude or more. 

Meanwhile, although this is the worst of the backgrounds of importance, it is hardly the only one.  There are also important backgrounds from Z plus jets, top quark pairs plus jets, diboson (such as WW plus jets), and pure QCD (jets-only) backgrounds, among others.

Incidentally, a naive theorist might think one need not care about pure QCD light-quark and gluon backgrounds in a sample with leptons.  But this isn't true.  Leptons can be faked, especially hadronic taus.   Even fake electrons and muons, which occur rarely, are important; the number of QCD events is so extraordinarily large that their presence can often be a serious issue.  Also, real leptons that are sometimes isolated are generated in decays of bottom and charm quarks, which are produced in abundance in QCD events.

Finally, even if we could calculate perfectly, in perturbation theory, the W-plus-n-quarks/gluons backgrounds, we always have to account for the fact that n quarks and gluons in a Feynman diagram does not in general equal n jets in a detector.  Making sure we can model the differences successfully is highly nontrivial, involving resummation of showering effects, simulation packages and matching of those packages to data. This has to be done consistently at the one-loop level, if we want to make use of recent loop calculations of Feynman graphs; implementing this in the most important processes at LHC is still at the cutting edge, as in the ongoing MC@NLO project (Monte Carlo at Next-to-Leading Order).  Then there are uncertainties that are smaller, but not unimportant, from the parton distribution functions (pdfs).  For certain questions, the lack of precise knowledge about the gluon pdf and those of the charm and bottom quark can contribute to important uncertainties about backgrounds.  For instance, the b-quark and c-quark content of the W-plus-jets background is not well-known, so we can't at present know with precision the background to new signals that produce leptons in association with bottom or top quarks.

[More to be added later]                                      

Fortunately, the experimentalists will be able to combine data and theory with a lot of cleverness to remove many of these backgrounds with some degree of accuracy.  The crucial question of whether this can be done reliably, and under what circumstances, is hotly debated.  Eventually we hope that the LHC Olympics will include black boxes with reasonably simulated standard model backgrounds, and that these issues will come to the fore in the experts' portion of our workshops.

If these problems, which will afflict the entire LHC enterprise, both worry and interest you, feel free to contact the organizers, especially Matt Strassler or Steve Mrenna.  Many theorists are needed to help with state-of-the-art loop calculations and to help with modern event-generation related projects!





AN IMPORTANT NOTICE FOR PARTICIPANTS CONCERNING PGS:

NOTE THE FOLLOWING INFORMATION APPLIES ONLY TO THE FIRST AND SECOND OLYMPICS
The PGS program, written by John Conway, and modified slightly for LHC Olympics purposes by Steve Mrenna and friends, serves as the "detector".  But PGS, along with our adjusted algorithms for jet and muon reconstruction, has changed since the summer, so analysis done on the old blackbox data must be redone, and recalibrated, essentially from scratch.  [This actually happens in real life; for instance, ask your favorite experimentalist on the DZero detector what they've been doing during summer 05!]

Moreover, PGS WILL CHANGE AGAIN before the Winter 2006 LHC Olympics, and WE WILL CONSEQUENTLY HAVE TO RERUN THE DATA SAMPLES (BLACK BOXES AND CALIBRATION SAMPLES) BEFORE THE WINTER 06 LHC OLYMPICS.  You should periodically check this website for updates.

However, you should be able to do a reasonably effective analysis no matter how the detector performs, as long as you use the relevant calibration samples, which will help you determine how the detector is behaving.  [This is what's done in real life, after all.]

Before the Winter 2006 meeting, we expect to have a public downloadable version of PGS, allowing you to run your favorite models through your own version of the detector simulation.
                                                                               
PGS 4 IS NOW AVAILABLE AND IN THE TESTING PHASE (AS OF JUNE 2006)