Matt Strassler 11/04/11
Did you know that most of the information produced in the proton-proton collisions at the Large Hadron Collider (LHC) is dumped irretrievably in the metaphorical trash bin — sent ingloriously into oblivion — yes, discarded permanently — as quickly as it comes in? By “most,” I don’t mean 75%. I don’t mean 95%. I mean 99.999% to 99.9999% of all the data at the Large Hadron Collider is erased within a second of its being collected.
It sounds crazy; how can a scientific experiment simply ignore the vast majority of its data?! Well, it’s not as insane as it first appears. Nor is it unprecedented; previous generations of hadron colliders have done something similar. And finally, it is absolutely necessary. In this article I’ll tell you why.
There are three main observations that lead to the decision to throw away most of the data. [I’ll describe this in the context of the two large general-purpose experiments at the LHC, called ATLAS and CMS; the details for the other experiments are quite different, though the issues are analogous.]
The first is that it doesn’t hurt (much). Most proton-proton collisions are boring. Certainly 99% of them are extremely dull (to a particle physicist, anyway), and the next 0.99% won’t raise any eyebrows either. Let ’em go. A few hadrons produced, maybe a couple of jets of rather low energy. No big deal. Nothing to see here, folks.
The second is that what particle physicists are looking for at the LHC is certainly something very rare indeed. Profoundly interesting proton-proton collisions (say, ones in which a Higgs particle, or some other hypothetical new particle, might be produced) are exceedingly uncommon, at most one in 10,000,000,000 collisions and perhaps as rare as one in 10,000,000,000,000 collisions. If it were possible to distinguish and separate, in real time, the many dull collisions from those rather few collisions that show characteristics redolent of one of these very rare processes, then the data set with the more interesting collisions would be enriched with rarities. It turns out this can be done, with a reasonable degree of reliability.
Third, there’s really no practical choice. Given that we’re on the lookout for something as rare as one in 10,000,000,000,000 collisions, and discoveries are rarely possible unless a new physical phenomenon has been produced a few dozen times at least, we’ve no choice but to make 1,000,000,000,000,000 collisions or so a year. Accounting for the fact that the LHC isn’t on all of the time, that translates to about 100,000,000 collisions each second! If the experimentalists tried to keep and process all the data from all of those collisions, it would require something in excess of the entire world’s stock of computers — not to mention blowing through the LHC’s budget!
In short, ATLAS and CMS can only afford to process and store about 500 collisions per second. But if the LHC matched this limitation, and only produced 500 collisions per second, only one or two Higgs particles would be made at the LHC each year! Not nearly enough for a discovery to be made!
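The arithmetic behind these numbers can be checked in a few lines. What follows is a rough sketch in Python that uses only the order-of-magnitude figures quoted above; the variable names are mine, and the running time per year is inferred from the two quoted rates rather than taken from any official LHC schedule.

```python
# Order-of-magnitude figures quoted in the article.
COLLISIONS_PER_YEAR = 10**15     # collisions the LHC must make each year
COLLISIONS_PER_SECOND = 10**8    # collision rate while the beams are on
STORED_PER_SECOND = 500          # what ATLAS or CMS can afford to keep
COMMONEST_ODDS = 10**10          # "at most one in 10,000,000,000"
RAREST_ODDS = 10**13             # "as rare as one in 10,000,000,000,000"

# Running time implied by the two rates (an inference, not an official figure).
live_seconds = COLLISIONS_PER_YEAR // COLLISIONS_PER_SECOND

# How often the rarest process of interest would occur in a year.
rare_events_per_year = COLLISIONS_PER_YEAR // RAREST_ODDS

# Share of collisions that has to be thrown away.
kept_fraction = STORED_PER_SECOND / COLLISIONS_PER_SECOND
discarded_percent = 100 * (1 - kept_fraction)

# If the LHC made only as many collisions as can be stored, how many of
# even the commonest interesting events would it produce in a year?
collisions_without_trigger = STORED_PER_SECOND * live_seconds
events_without_trigger = collisions_without_trigger / COMMONEST_ODDS

print(f"running time per year:    {live_seconds:,} seconds")
print(f"rarest events per year:   {rare_events_per_year}")
print(f"discarded:                {discarded_percent:.4f}%")
print(f"events with no trigger:   {events_without_trigger}")
```

With these round inputs, the rarest process occurs about a hundred times a year, the discarded share falls inside the 99.999% to 99.9999% range quoted at the start, and a collider limited to 500 collisions per second would make roughly one interesting event a year at best. These are crude estimates, good to a factor of a few, and should be read that way.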
So there’s no choice. The LHC must vastly overproduce collisions, and the ATLAS and CMS experiments must determine, in real time, if a particular collision looks interesting enough to be worth keeping. Obviously this must be done automatically; no human committee could select, each second, a few hundred out of 100,000,000 collisions for permanent storage! The whole operation has to be done by computers, actually by a combination of hardware and software. The critically important system that carries out this key decision is called “the trigger”. For each collision, either the trigger fires, and the collision is stored, or it doesn’t fire, and the collision is lost for good. Clearly, the trigger had better work as intended.
Now is this trigger system, this strategy of dumping information overboard, really so strange and unfamiliar? Not really. It’s doing something similar to what your brain does every day, say, with faces. Think about it: if you commute by public transport to work each day, or walk to work within a city, your brain probably registers hundreds of faces daily. How many of them can you remember from last week, or last year? Probably just a few, belonging to those people with whom you had a conversation, a confrontation, a collision. It would appear that your brain only bothers to store what it considers relevant. Something — a rapid relevance-determining mechanism, over which you have little conscious control — triggers your brain to put a memory of a face somewhere where you can access it. Most of the memories get shunted to someplace inaccessible, and perhaps are even “overwritten.” And it’s a bad day when you fail to remember the face of someone highly relevant, such as a previous boss or a potential future spouse. Your trigger for remembering a face had better work as intended.
At an LHC (or other hadron collider) experiment, the trigger employs a number of strategies to decide whether a collision looks interesting. And that batch of strategies isn’t fixed in stone; it’s programmable, to a large extent. But it is still automated, and only as intelligent as its programmers. Yes, it is absolutely true: an unwisely programmed trigger can accidentally discard collisions manifesting new physical phenomena — the proverbial baby thrown out with the bathwater.
So particle physicists — the experimentalists who run the detectors and take the data, and the theorists like me who advise them — obsess and debate and argue about the trigger. It’s essential that its strategies and settings be chosen with care, and adjusted properly as the collision rate changes or new information becomes available. As the ultimate and irreversible filter, it’s also a potential cause of disappointment or even disaster, so opinions about it are strong and emotions run high.
What are the principles behind a trigger? What are the typical clues that make a collision seem interesting? The main clues are rarities, especially ones that theorists have good reason to believe might point to new physical phenomena. Here are the classic clues… a collision is more likely to fire the trigger if it produces
- An electron or positron (anti-electron), even of low energy
- A muon or anti-muon, even of low energy
- A photon, even of low energy
- A tau lepton or anti-tau lepton of moderate energy
- Signs of invisible particles of moderate energy
- Jets [manifestations of quarks, antiquarks and gluons] of very high energy
- Many jets of moderate energy
- Jets from bottom quarks of moderate energy
- Multiples or combinations of the above
In the absence of one or more of these rare things, a typical proton-proton collision will just make two or three jets of low energy, or even more commonly a featureless “splat” in which a few dozen hadrons go off haphazardly. These collisions generally just reflect physical processes that we studied long ago, and can’t carry any information about the new phenomena that interest particle physicists, so they are justifiably discarded. (A tiny fraction are kept, just as a cross-check to make sure the trigger is behaving as expected.)
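To make the idea concrete, here is a toy version of such a list of trigger strategies, written in Python. It is only a sketch: the `Candidate` class, the function names, every energy threshold, the meaning of “many” jets and the 1-in-1000 cross-check rate are all invented for illustration and are not real ATLAS or CMS settings. A real trigger combines hardware and software and is far more elaborate.

```python
from dataclasses import dataclass, field
from typing import List

# HYPOTHETICAL thresholds in GeV, chosen only for illustration.
# They are not real ATLAS or CMS trigger settings.
LOW = 20.0
MODERATE = 60.0
VERY_HIGH = 300.0
MANY_JETS = 4      # hypothetical meaning of "many jets"
PRESCALE = 1000    # hypothetical: keep 1 in 1000 dull candidates as a cross-check


@dataclass
class Candidate:
    """Energies (GeV) of the objects reconstructed in one candidate."""
    electrons_muons_photons: List[float] = field(default_factory=list)
    taus: List[float] = field(default_factory=list)
    invisible: float = 0.0  # energy attributed to invisible particles
    jets: List[float] = field(default_factory=list)
    bottom_jets: List[float] = field(default_factory=list)


def trigger_fires(c: Candidate) -> bool:
    """Return True if at least one of the classic clues is present."""
    return (
        any(e >= LOW for e in c.electrons_muons_photons)
        or any(e >= MODERATE for e in c.taus)
        or c.invisible >= MODERATE
        or any(e >= VERY_HIGH for e in c.jets)
        or sum(e >= MODERATE for e in c.jets) >= MANY_JETS
        or any(e >= MODERATE for e in c.bottom_jets)
    )


def keep(c: Candidate, index: int) -> bool:
    """Keep a candidate if the trigger fires, or if it falls in the
    1-in-PRESCALE sample retained to cross-check the trigger."""
    return trigger_fires(c) or index % PRESCALE == 0


dull = Candidate(jets=[15.0, 12.0])            # two low-energy jets
with_lepton = Candidate(electrons_muons_photons=[25.0], jets=[15.0])
print(trigger_fires(dull), trigger_fires(with_lepton))
```

The sketch treats each clue independently, so any combination of clues also fires. It does not model the last item on the list in any deeper way; a real menu might, for instance, apply different thresholds when several clues appear together.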
On a personal note, I’ve spent quite a bit of time worrying about the triggers at the LHC experiments. In 2006 I studied some theories with then-student Kathryn Zurek (now a professor at Michigan) that in some cases predict physical phenomena on which the standard trigger strategies would rarely fire, potentially making certain new phenomena unnecessarily difficult to discover. This work played a minor role in encouraging the ATLAS and CMS experiments to add some trigger strategies to their menus.
Before I finish, let me clean up a small lie that I told along the way to keep things simple. Let me clarify what’s being discarded. To do that, I need to remind you how the LHC beams actually work. The LHC has two beams of protons, orbiting the LHC ring in opposite directions, and colliding at a few predetermined points around the ring. But the beams aren’t continuous; they are made from bunches, hundreds of them, each made from nearly as many protons as there are stars in our home galaxy, the Milky Way. Collisions occur when one of the bunches orbiting clockwise passes through one of the bunches going counterclockwise. Most of the protons miss each other, but a handful of collisions occur. How large a “handful” is depends on what settings the LHC’s operators choose to use. Late this year (2011) there were as many as 20 to 40 collisions in each bunch crossing. [You heard that right: each time two bunches cross, 20 to 40 pairs of protons collide.] That sounds insane too, but it’s not, because most of the time all of those collisions are dull, and very rarely one of them is interesting. The probability that two are interesting is very, very low indeed. Having all those extra collisions at the same time as the one you want is called “pile-up”.
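The claim that two interesting collisions almost never share a bunch crossing can be checked with a rough estimate. The Python sketch below assumes that each collision in a crossing is independently “profoundly interesting” with the one-in-10,000,000,000 odds quoted earlier, and takes 40 collisions per crossing, the upper end of the range above. The independence assumption and the function names are mine.

```python
import math

P_INTERESTING = 1e-10  # "at most one in 10,000,000,000", from the article
N_PILEUP = 40          # upper end of "20 to 40" collisions per crossing


def p_at_least_one(n: int, p: float) -> float:
    """1 - (1 - p)**n, computed in a way that stays accurate for tiny p."""
    return -math.expm1(n * math.log1p(-p))


def p_exactly_two(n: int, p: float) -> float:
    """Binomial probability that exactly two of n collisions are interesting."""
    return math.comb(n, 2) * p**2 * (1 - p) ** (n - 2)


one = p_at_least_one(N_PILEUP, P_INTERESTING)
two = p_exactly_two(N_PILEUP, P_INTERESTING)
print(f"at least one interesting: {one:.2e}")
print(f"exactly two interesting:  {two:.2e}")
print(f"ratio:                    {two / one:.2e}")
```

Under these assumptions a crossing contains an interesting collision a few times in a billion, and for every crossing with two of them there are hundreds of millions with just one. That is why the extra 20 to 40 collisions are a nuisance rather than a source of confusion about which collision was the interesting one.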
Data is collected for each bunch crossing; each time two bunches cross, the detectors measure everything (well, more precisely, as much as possible) about all the particles in all 20 to 40 collisions that occur. The trigger’s decision isn’t whether to keep or discard a particular collision, but a particular bunch crossing. A typical bunch crossing has 20 or so dull collisions, and no interesting ones; if any one of them looks interesting, the whole set of data from that bunch crossing is read out. There’s no time to try to separate out the simultaneous collisions from one another; that has to be done later, long after the trigger has fired.