Building (Better) Jet Algorithms for Run II – Workshop Results:

 

   For the full report see - www.physics.niu.edu/~blazey/jet_alg/jet_alg.html

 

From G.C. Blazey, J.R. Dittmann, S.D. Ellis, V.D. Elvira, K. Frame, S. Grinstein, R. Hirosky, P. Piegaia, H. Schellman, R. Snihur, V. Sorin, D. Zeppenfeld

 

Here is my brief synopsis: 

 

General Point: Many issues derive from the fact that experimental priorities/needs do not always match theoretical priorities/needs!

 

“History” starts 10 years ago with the “Snowmass Accord” (or the Snowmass Algorithm).  Idea was to have an agreed upon algorithm (hence accord) that everyone would use.  But, in practice, it was flawed –

·       Was not efficient – experimenters used seeds to limit where one looked for jets – this introduces IR sensitivity at NNLO

·       Did not treat issue of overlapping cones – split/merge question

 

A seeded search can miss configurations in which two widely separated energy deposits still fit in one cone but have little energy in between – RSEP, a theory-side parameter, was introduced to mimic this effect

 

Note: there are two logically distinct phases to jet finding:

1.    Identify contents of jets - partons, particles, calorimeter towers – jet algorithm

2.    Add kinematic properties of jet contents (e.g., 4-vectors) to find jet kinematic properties – recombination scheme

 

 

“Big Picture” goal:  do 1% Jet Physics – precision QCD – in Run II,

 

·       for its own sake

 

·       to isolate physics beyond the Standard Model

 

 

Requires control of the Jet Energy to better than 1% (rates fall rapidly with the jet energy)


Requires understanding of jet energy calibration to < 1%

 

Requires understanding of corrections “back to” fixed order perturbation theory to < 1%

 

Requires “full disclosure” of all jet algorithm details

Strongly recommend that both collaborations use the same (or very similar) jet algorithm


History of HIDDEN issues, all of which influence the result

 

·       Energy Cut on towers kept in analysis (e.g., to avoid noise)

·       (Pre)Clustering to find seeds

 

-         Energy Cut on precluster towers

-         Energy cut on clusters

-         Energy cut on seeds kept

 

·       Starting with seeds find stable cones by iteration

-         In JETCLU, “once in a cone, always in a cone”, the “ratchet” or “Velcro” effect

 

·       Overlapping stable cones must be split/merged

 

-         Depends on overlap parameter fmerge

-         Order of operations matters

All of these issues impact the content of the “found” jet

 

-         Shape may not be a cone

-         Number of towers can differ, i.e., different energy

-         Corrections for underlying event must be tower by tower

 

 

Can be more important than the differences between algorithms, e.g., seeded vs. seedless

 

 

Need to learn from the past!


Goals of IDEAL ALGORITHM   (motherhood)

Fully Specified:  including defining in detail any preclustering, merging, and splitting issues

Theoretically Well Behaved:  the algorithm should be infrared and collinear safe (and insensitive) with no ad hoc clustering parameters (e.g., RSEP)

Detector Independence:  there should be no dependence on cell type, numbers, or size

Order Independence: The algorithms should behave equally at the parton, particle, and detector levels.

Uniformity:  everyone uses the same algorithms (to the best possible approximation) – Recommendation: 1 legacy cone, 1 seedless cone, 1 KT algorithm – with E-scheme momentum recombination

 

Theory –

·       Boost invariant results – use variables with appropriate boost properties

·       Kinematic boundary stability – use variables with appropriate energy conservation to allow resummation calculations

⇒      Use E-scheme variables

Experiment –

·       Minimize resolution smearing and angle biases

·       Stability with luminosity – not sensitive to multiple collisions

·       Efficient use of computer resources – but do not let this drive problems with physics issues (e.g., seeds and preclustering)

·       Easy to calibrate – not so worried about size of corrections as with accuracy of corrections

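Cone Algorithm – particles, calorimeter towers, partons in cone of size R, defined in angular space, e.g., Snowmass (η, φ)

CONE center - (η_C, φ_C)

CONE  i ⊂ C iff   $\sqrt{(\eta_i - \eta_C)^2 + (\varphi_i - \varphi_C)^2} \le R$

Energy     $E_T^C = \sum_{i \subset C} E_{T,i}$

Centroid  $\bar{\eta}_C = \frac{1}{E_T^C} \sum_{i \subset C} E_{T,i}\,\eta_i$,   $\bar{\varphi}_C = \frac{1}{E_T^C} \sum_{i \subset C} E_{T,i}\,\varphi_i$

“Flow vector”  $\vec{F}_C = (\bar{\eta}_C - \eta_C,\ \bar{\varphi}_C - \varphi_C)$

 

Jet is defined by “stable” cone:

               $\vec{F}_C = 0$, i.e., $\bar{\eta}_C = \eta_C$ and $\bar{\varphi}_C = \varphi_C$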

 

Stable cones found by iteration:  start with cone anywhere (and, in principle, everywhere), calculate the centroid of this cone, put new cone at centroid, iterate until cone stops “flowing”, i.e., stable ⇒ Proto-jets (prior to split/merge)
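The iteration just described can be sketched in a few lines of Python. This is a minimal illustration, not workshop code: it assumes the Snowmass ET-weighted centroid, towers given as hypothetical (ET, η, φ) tuples, and an arbitrary convergence tolerance `tol`.

```python
import math

def delta_phi(a, b):
    """Signed azimuthal difference a - b, wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def centroid(towers, eta_c, phi_c, R):
    """ET-weighted centroid of the towers within R of (eta_c, phi_c).

    Returns (eta, phi, ET) or None if the cone holds no energy.
    """
    sum_et = sum_eta = sum_dphi = 0.0
    for et, eta, phi in towers:
        dphi = delta_phi(phi, phi_c)
        if math.hypot(eta - eta_c, dphi) <= R:
            sum_et += et
            sum_eta += et * eta
            sum_dphi += et * dphi  # average phi relative to the cone axis
    if sum_et <= 0.0:
        return None
    return sum_eta / sum_et, phi_c + sum_dphi / sum_et, sum_et

def find_stable_cone(towers, eta0, phi0, R=0.7, tol=1e-6, max_iter=100):
    """Move the cone to its own centroid until it stops 'flowing'."""
    eta_c, phi_c = eta0, phi0
    for _ in range(max_iter):
        result = centroid(towers, eta_c, phi_c, R)
        if result is None:
            return None
        eta_new, phi_new, et = result
        if math.hypot(eta_new - eta_c, delta_phi(phi_new, phi_c)) < tol:
            return eta_new, phi_new, et  # proto-jet (before split/merge)
        eta_c, phi_c = eta_new, phi_new
    return None  # did not converge within max_iter (assumed cut-off)
```

Starting the trial cone on any tower of a two-tower cluster, for example, moves it to the ET-weighted position of the pair, where it stays.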

 

Theoretically can look “everywhere” and find all stable cones

Experimentally reduce size of analysis by putting initial cones only at seeds – energetic towers or clusters of towers – thus introducing undesirable IR sensitivity and missing certain possible 2-jets-in-1 configurations

 

Recommended solutions:

 

·       Seedless (but still efficient) cone algorithm – put initial cone at center of every tower

·       Legacy cone algorithm with midpoints between seeds as extra seeds (remove most of IR sensitivity and find most configurations)
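The midpoint idea for the legacy cone can be illustrated as follows. This is only a sketch: the function name is hypothetical, seeds are (η, φ) pairs, and restricting midpoints to pairs closer than 2R is an assumption made here (more distant pairs cannot share a cone), not something stated above.

```python
import math
from itertools import combinations

def delta_phi(a, b):
    """Signed azimuthal difference a - b, wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def midpoint_seeds(seeds, R=0.7):
    """Extra starting points halfway between every pair of nearby seeds.

    seeds: list of (eta, phi). Pairs separated by 2R or more are skipped
    (assumed cut for this sketch).
    """
    extra = []
    for (eta1, phi1), (eta2, phi2) in combinations(seeds, 2):
        dphi = delta_phi(phi2, phi1)
        if math.hypot(eta2 - eta1, dphi) < 2.0 * R:
            # Step half of the wrapped difference so the midpoint is
            # also correct across the phi = +-pi boundary.
            extra.append(((eta1 + eta2) / 2.0, phi1 + dphi / 2.0))
    return extra
```

The extra points are then used as additional initial cone positions alongside the original seeds.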

 

 

 

Compare 3 versions of cone algorithm to test the impact of NOT finding all of the low energy proto-jets in the “flat plains”.  These are presumably just local fluctuations and not the footprints of energetic partons.

1.  Streamlined Seedless – only keep cones that do NOT “flow” outside of original tower

2.  Seedless – keep as preproto-jets only those towers for which centroid is within the original tower then fully process

3.  Fully process all cones that start centered on towers in central region (and which do flow outside of this region) – requires separation cut to avoid “over-merging”

 

Note that leading jet is robust for the cone jets!

 

Compare leading jets:

Algorithm    Leading ET (GeV)    Leading Tower #    2nd Leading ET (GeV)    2nd Leading Tower #
1            25.59               156                17.32                   126
2            25.62               157                16.98                   113
3            25.62               156                14.91                   94
KT D=0.7     28.70               151                16.89                   84
KT D=1.0     32.59               200                26.26                   203

(KT algorithms do not count 0 energy towers)


Streamlined Seedless Algorithm: based on “flowing” idea above – stable cones do NOT flow.

·       Put trial cone at center of every tower (in fiducial volume)

·       Calculate centroid for every cone (computer expensive)

·       If centroid is outside of original tower, drop the cone from the analysis – rapidly converges to only stable cones, i.e., proto-jets
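A single pass of the streamlined seedless selection might look like the sketch below. The tower layout (a list of (ET, η, φ) tower centers, including zero-energy towers), the uniform tower size `d_eta` × `d_phi`, and the ET-weighted centroid are all assumptions made for illustration.

```python
import math

def delta_phi(a, b):
    """Signed azimuthal difference a - b, wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def streamlined_seedless(towers, R=0.7, d_eta=0.1, d_phi=0.1):
    """Keep only trial cones whose centroid stays inside the starting tower.

    towers: list of (et, eta, phi) tower centers.
    Returns a list of (eta, phi, et) cone candidates.
    """
    kept = []
    for _, eta_c, phi_c in towers:           # trial cone on every tower
        sum_et = sum_eta = sum_dphi = 0.0
        for et, eta, phi in towers:          # the expensive inner loop
            dphi = delta_phi(phi, phi_c)
            if math.hypot(eta - eta_c, dphi) <= R:
                sum_et += et
                sum_eta += et * (eta - eta_c)
                sum_dphi += et * dphi
        if sum_et <= 0.0:
            continue                         # empty cone, nothing to keep
        shift_eta, shift_phi = sum_eta / sum_et, sum_dphi / sum_et
        # Drop the cone if its centroid "flows" out of the original tower.
        if abs(shift_eta) <= d_eta / 2.0 and abs(shift_phi) <= d_phi / 2.0:
            kept.append((eta_c + shift_eta, phi_c + shift_phi, sum_et))
    return kept
```

In a row of towers with one dominant deposit, only the cone started on the dominant tower survives; cones started on its neighbours are pulled toward it and dropped.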

 

 

 

E-scheme – (for single partons, particles, towers )

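 CONE  i ⊂ C iff   $\sqrt{(y_i - y_C)^2 + (\varphi_i - \varphi_C)^2} \le R$   (y = rapidity, φ = azimuth)

4-vector      $p^C = (E^C, \vec{p}^{\,C}) = \sum_{i \subset C} (E_i, p_{x,i}, p_{y,i}, p_{z,i})$
”Centroid”    $\bar{y}_C = \frac{1}{2} \ln \frac{E^C + p_z^C}{E^C - p_z^C}$,   $\bar{\varphi}_C = \tan^{-1} \frac{p_y^C}{p_x^C}$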
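As an illustration of E-scheme recombination, the sketch below adds constituent 4-vectors and derives the jet kinematics from the sum. The tuple layout (E, px, py, pz) and the function names are assumptions of this example.

```python
import math

def e_scheme_sum(constituents):
    """Add 4-vectors (E, px, py, pz) component by component."""
    return tuple(sum(c[k] for c in constituents) for k in range(4))

def jet_kinematics(p):
    """Transverse momentum, rapidity, azimuth and mass of a 4-vector."""
    E, px, py, pz = p
    pt = math.hypot(px, py)
    y = 0.5 * math.log((E + pz) / (E - pz))   # requires E > |pz|
    phi = math.atan2(py, px)
    # Guard against tiny negative values from rounding.
    mass = math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))
    return {"pt": pt, "y": y, "phi": phi, "mass": mass}
```

Adding two massless constituents that are not collinear gives a jet with non-zero mass, which is one visible difference from schemes that add ET as a scalar.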

 

 

 

 

In any case, stable cones will overlap some – small effect for leading jet (but does systematically reduce energy).  So must define merge/split phase –

·       process proto-jets in decreasing energy order

·       merge if shared energy > f = 50% of lower proto-jet energy

·       split if shared energy < f = 50% of lower proto-jet energy, award shared energy to “closer” proto-jet
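One way to read the merge/split rules above is sketched here. It is an illustration under several assumptions: proto-jets are sets of tower indices, towers are (ET, η, φ) tuples with positive ET away from the φ = ±π boundary, "closer" means closer to the ET-weighted axis of each proto-jet, and shared towers are awarded one by one.

```python
import math

def et_of(jet, towers):
    """Summed ET of the towers in a proto-jet (set of tower indices)."""
    return sum(towers[i][0] for i in jet)

def axis(jet, towers):
    """ET-weighted (eta, phi) axis; no phi wrap-around (assumed)."""
    et = et_of(jet, towers)
    eta = sum(towers[i][0] * towers[i][1] for i in jet) / et
    phi = sum(towers[i][0] * towers[i][2] for i in jet) / et
    return eta, phi

def split_merge(protojets, towers, f=0.5):
    """Resolve overlaps, processing proto-jets in decreasing ET order."""
    pending = [set(j) for j in protojets]
    final = []
    while pending:
        pending.sort(key=lambda j: et_of(j, towers), reverse=True)
        lead = pending.pop(0)
        # Highest-ET remaining proto-jet that shares towers with lead.
        partner = next((j for j in pending if lead & j), None)
        if partner is None:
            final.append(lead)            # no overlap left: a final jet
            continue
        pending.remove(partner)
        shared = lead & partner
        if et_of(shared, towers) > f * et_of(partner, towers):
            pending.append(lead | partner)            # merge
        else:
            ax_lead, ax_part = axis(lead, towers), axis(partner, towers)
            for i in shared:                          # split
                _, eta, phi = towers[i]
                d_lead = math.hypot(eta - ax_lead[0], phi - ax_lead[1])
                d_part = math.hypot(eta - ax_part[0], phi - ax_part[1])
                if d_lead <= d_part:
                    partner.discard(i)
                else:
                    lead.discard(i)
            pending.extend([lead, partner])
    return final
```

Because the list is re-sorted after every merge or split, the outcome depends on the processing order, which is one reason the order of operations has to be specified.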

 


3rd possibility – give up geometric simplicity and the problems with overlap and use a KT algorithm. 

·       Combine partons, particles or towers pair-wise based on some measure of “closeness”, beginning with low energy first. 

·       Jet identification is unique – no merge/split stage

·       Resulting jets are more amorphous, energy calibration seemed difficult (subtraction for UE?), and analysis can be very computer intensive (time grows like N³)
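A naive pairwise clustering loop makes the N³ scaling concrete: each step scans all pairs (order N²) and there are of order N steps. The closeness measure is not spelled out above, so this sketch assumes the standard inclusive KT form, d_ij = min(pT_i², pT_j²) ΔR_ij² / D² with beam distance d_i = pT_i², and E-scheme merging of (E, px, py, pz) tuples.

```python
import math

def pt2(p):
    """Squared transverse momentum of a 4-vector (E, px, py, pz)."""
    return p[1] ** 2 + p[2] ** 2

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))  # needs E > |pz|

def delta_r2(a, b):
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def kt_cluster(particles, D=1.0):
    """Inclusive KT clustering (assumed measure), naive O(N^3) version."""
    objs = [tuple(p) for p in particles]
    jets = []
    while objs:
        best_d, best_i, best_j = float("inf"), None, None
        for i, a in enumerate(objs):
            if pt2(a) < best_d:                    # beam distance d_i
                best_d, best_i, best_j = pt2(a), i, None
            for j in range(i + 1, len(objs)):      # pair distance d_ij
                b = objs[j]
                d = min(pt2(a), pt2(b)) * delta_r2(a, b) / D ** 2
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_j is None:
            jets.append(objs.pop(best_i))          # promote to a jet
        else:
            b = objs.pop(best_j)                   # larger index first
            a = objs.pop(best_i)
            objs.append(tuple(x + y for x, y in zip(a, b)))  # E-scheme
    return jets
```

Every input ends up in exactly one jet, so no merge/split stage is needed.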

 

Reduce number of “particles” by preclustering algorithm – can also deal with negative (and low) tower energies

·       BUT must be careful not to introduce biases like the seeds did

·       Still some serious work to do here

 

RECOMMEND:

·       Both experiments use legacy (mid-point), seedless and KT algorithms (all three!)

·       Use identical versions except for issues required by physical differences – all of this in preclustering??

·       Use E-scheme variables for jet algorithm and recombination

 

 

 


So our homework assignment has been:

 

·       Develop “common” code for an improved legacy cone algorithm

·       Develop “common” code for a seedless cone algorithm

·       Develop “common” code for KT algorithm

·       Use E-scheme variables throughout

·       Account for differences in detectors with a “sensible” preclustering scheme

How have we done?         Are we done?