Data Parking at CMS

27 Responses

Pingback: Non-Standard-Model Higgs Particle Decays: What We Found | Of Particular Significance
Shannon says:

September 10, 2013 at 1:30 AM

Hey, I think your site might be having browser compatibility issues.

When I look at your website in Chrome, it looks fine but
when opening in Internet Explorer, it has some overlapping.
I just wanted to give you a quick heads up! Other than that, amazing blog!

Pingback: Final Day of SEARCH 2013 | Of Particular Significance
Japan-Fishing.net says:

July 26, 2013 at 9:26 AM

Sometimes bad weather can affect the signal, but the total amount of
uptime still allows for optimal use. That is the hardest
part about the process of starting a business. Towards the
Knowledge-Driven Bench-Marking of Autonomic Communications.

Pingback: It’s (not) The End of the World | Of Particular Significance
Tom says:

August 30, 2012 at 10:39 AM

I am trying to get my head around the scale of these interactions. I understand that a proton is pretty small. So when a bunch of protons meets the bunch going the other way, of the two hundred thousand million protons now occupying the same space only 20 collisions happen simultaneously? Of those 20 collisions are they all head on or does it include glancing collisions too? The proton is made up of a soup of gluons and quarks and I imagine a lot of empty space. Even if the proton-proton collision is head on, do some of the colliding protons pass through each other with none of the soup constituents actually hitting another one from the oncoming bunch? Are there fluctuations in the proton where the soup constituents clump to make more dense regions? Is this analogous to two galaxies colliding where there is an extremely low probability of matter from the two colliding galaxies actually making contact?

1. Matt Strassler says:
  
  August 30, 2012 at 12:03 PM
  
  Most of the proton-proton collisions are not head on and glancing blows are common. That’s why we can afford to make 20 collisions at a time; typically at most one or two of them will be head-on and have a (low) chance of producing a new and interesting phenomenon.
  
  The strong nuclear force is so strong that in fact the soup of quarks and gluons (and anti-quarks) inside the proton is very thick. So in fact the collision of two protons (even a glancing blow) always will involve significant interactions among some of those particles.
  
  However, you should think carefully about what you mean by “hitting each other” or “making contact”.
  
  What does “make contact” mean for galaxies? What you mean is that most of the stars pass by each other, interacting gravitationally, but without the clouds of hydrogen gas passing close enough that they interact electromagnetically, as nearby atoms do. “Contact” means, in this case, “electromagnetic interaction among the atoms in the two stars”.
  
  Meanwhile, if two protons whiz within about 10^-15 meters of each other, a collision occurs, and the gluons and so forth inside the proton will interact via the strong nuclear force. What would “make contact” mean here?
  
  Gluons aren’t like stars. They don’t have an atmosphere and a definite size and there’s no additional force that will allow their atmospheres to interact. [Or if there is, it’s probably not relevant at the LHC, because we would probably have already discovered it from the current experiments.] But also they’re quanta — ripples in gluon fields — and so they’re both inherently spread out through their wave-like behavior and also they’re able to do things that stars can’t. Even without coming closer together than their size — into “contact”, where a new contact force makes something new happen — they can still combine in pairs to make, say, a top quark/anti-quark pair, or a Higgs particle. The closer the gluons approach each other, the more likely such processes become; but there’s no sharp boundary. It’s more subtle than that.
  
  For instance, if two stars hit nearly head on, electromagnetic interactions will be extremely important; the stars will merge, leaving a single, bigger star with emission of a lot of gas and heat and light. If two gluons hit nearly head on, they may (a) pass right through each other; (b) scatter off each other at a small angle or a large angle; (c) disappear and create a quark and an anti-quark of any type, plus zero or one or more gluons; (d) disappear and create a Higgs particle; (e) create a Higgs particle without disappearing; (f) make two photons or two W particles through indirect quantum effects; …
  
  So you have to be careful about pushing analogies too far… you’ll end up not entirely wrong but also missing key elements.
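
To get a rough feel for the scale in this exchange, here is a back-of-the-envelope sketch of how many collisions one might expect per bunch crossing. It treats each proton as a target of radius 10^-15 meters (the distance quoted above) and uses the standard overlap formula for two round Gaussian bunches. The beam size of 20 micrometers and the reading of "two hundred thousand million" as 10^11 protons in each bunch are assumptions made for illustration, not figures from the post.

```python
# Rough geometric estimate of proton-proton collisions per bunch crossing.
PROTONS_PER_BUNCH = 1e11   # ASSUMED: the question's 2e11 protons split between two bunches
PROTON_REACH_M = 1e-15     # "about 10^-15 meters", from the reply above
BEAM_SIGMA_M = 2e-5        # ASSUMED transverse beam size where the bunches cross


def collisions_per_crossing(n1, n2, reach_m, beam_sigma_m):
    """Expected collisions when two round Gaussian bunches cross head on.

    Uses collisions = n1 * n2 * area / (4 * pi * sigma**2), where `area`
    is the target area pi * reach**2 presented by each proton, so the
    factors of pi cancel.
    """
    return n1 * n2 * reach_m**2 / (4.0 * beam_sigma_m**2)


estimate = collisions_per_crossing(
    PROTONS_PER_BUNCH, PROTONS_PER_BUNCH, PROTON_REACH_M, BEAM_SIGMA_M
)
fraction_colliding = estimate / PROTONS_PER_BUNCH

print(f"collisions per crossing: about {estimate:.1f}")
print(f"fraction of each bunch that collides: about {fraction_colliding:.1e}")
```

With these inputs the estimate lands within an order of magnitude of the 20 collisions mentioned in the question, which is about as much as a purely geometric picture can be expected to deliver. The point is only that the bunches are enormously wider than a proton, so almost every proton misses.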
  
Mike Hall says:

August 14, 2012 at 9:38 AM

How does the trigger discriminate, timewise, between the various collisions that take place? If there are 20 million bunch crossings per second and ~20 collisions per crossing, and assuming that the bunches must be separated with space between each bunch in the LHC ring, then the time discrimination must be in the nanosecond regime if the trigger is to be able to allocate individual detection events to a specific collision. These detectors are big pieces of equipment and signal wiring paths must have delay times of a comparable amount.

1. Matt Strassler says:
  
  August 14, 2012 at 10:31 AM
  
  Yes, it takes light, and most of the particles produced in a collision, many tens of nanoseconds to cross from the collision point to the outside of the ATLAS detector. Many of the electronic signals take even longer. And every part of this immense detector is timed to a fraction of a nanosecond. If for some reason any part of the timing goes off on the millions of electronic channels, the resulting data is bad and can’t be used.
  
  Do not underestimate these people.
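
The timing numbers in this exchange can be checked with a few lines of arithmetic. The sketch below converts 20 million crossings per second into a bunch spacing and compares it with the time light needs to reach the outside of the detector. The detector dimensions are approximate values assumed here for illustration; they are not given in the thread.

```python
# Compare the spacing between bunch crossings with light travel time in the detector.
SPEED_OF_LIGHT_M_PER_NS = 0.299792458
CROSSINGS_PER_SECOND = 20e6       # from the question above
DETECTOR_RADIUS_M = 12.5          # ASSUMED: approximate ATLAS outer radius
DETECTOR_HALF_LENGTH_M = 23.0     # ASSUMED: approximate ATLAS half-length


def bunch_spacing_ns(crossings_per_second):
    """Time between consecutive bunch crossings, in nanoseconds."""
    return 1e9 / crossings_per_second


def light_travel_ns(distance_m):
    """Time for light to cover the given distance, in nanoseconds."""
    return distance_m / SPEED_OF_LIGHT_M_PER_NS


spacing = bunch_spacing_ns(CROSSINGS_PER_SECOND)
print(f"bunch spacing: {spacing:.0f} ns")

for label, distance in (
    ("outward to the radius", DETECTOR_RADIUS_M),
    ("along the beam to the end", DETECTOR_HALF_LENGTH_M),
):
    t = light_travel_ns(distance)
    print(f"{label}: {t:.0f} ns, or {t / spacing:.1f} bunch spacings")
```

Under these assumptions the travel time is comparable to, or longer than, the gap between crossings, so particles from one crossing can still be in flight when the next crossing happens. That is why each channel has to be timed so precisely.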
  
Plato says:

August 14, 2012 at 6:16 AM

Matt writes “But the trigger is only as smart and unbiased as the people who program it, and there’s always the risk of throwing out the gold with the gravel, or at least being less efficient at keeping the gold than one would like. Everyone in the field knows this, and the experimentalists spend a lot of their time and personnel worrying about and tinkering with and testing the triggering strategies that they use.”

This had been on my mind in terms of the parameters you use in calorimeter realizations. Is this not of concern considering the trigger used defines for the investigators what fits the parameters or not, has to exist as a new signal. How does one determine that then?

Best,

1. Matt Strassler says:
  
  August 14, 2012 at 8:58 AM
  
  The parameters used to make sure the measuring devices, such as the calorimeter, are working properly do not have much to do with trigger. The story of how one does this is long and complicated, but the trigger plays a minor role. There are always some classes of events (such as events with an electron of sufficiently high momentum) that will always be selected by the trigger, and that’s always enough information to test out and calibrate the equipment.
  
  As one of a dozen examples: An electron’s momentum can be measured by its track’s curvature in the magnetic field of the experiment, and its energy [which should be the same as its momentum, times c, the speed of light] can be measured in the calorimeter. Z particles, which decay to electron-positron pairs, have a known mass, are made abundantly (many per second) and are triggered with ease when they decay this way. Between these various facts one can already calibrate the electromagnetic calorimeter and (partially) the tracker.
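
As a toy illustration of the Z-based calibration idea described above, the sketch below reconstructs the mass of an electron-positron pair from the two measured energies and the angle between them, then derives a single correction factor that brings the average back to the known Z mass. It treats the electrons as massless (energy equal to momentum times c) and ignores the Z's natural width and detector resolution. The events and the 3% miscalibration are invented for the example, and the function names are hypothetical.

```python
import math

Z_MASS_GEV = 91.1876  # known Z boson mass


def pair_mass_gev(e1_gev, e2_gev, opening_angle_rad):
    """Invariant mass of two effectively massless particles (E = pc)."""
    return math.sqrt(2.0 * e1_gev * e2_gev * (1.0 - math.cos(opening_angle_rad)))


def energy_scale_correction(pair_masses_gev):
    """Factor by which to multiply calorimeter energies so that the
    average reconstructed pair mass matches the Z mass."""
    mean_mass = sum(pair_masses_gev) / len(pair_masses_gev)
    return Z_MASS_GEV / mean_mass


# Toy events (hypothetical): true energies of back-to-back pairs from Z decays.
true_events = [
    (Z_MASS_GEV / 2.0, Z_MASS_GEV / 2.0, math.pi),
    (60.0, Z_MASS_GEV**2 / (4.0 * 60.0), math.pi),
]

# ASSUMED miscalibration: the calorimeter reads every energy 3% low.
MISCALIBRATION = 0.97
measured_masses = [
    pair_mass_gev(MISCALIBRATION * e1, MISCALIBRATION * e2, angle)
    for e1, e2, angle in true_events
]

correction = energy_scale_correction(measured_masses)
print(f"mean measured mass: {sum(measured_masses) / len(measured_masses):.2f} GeV")
print(f"energy scale correction: {correction:.4f}")
```

Because the pair mass scales linearly with a common energy scale, the correction simply undoes the assumed 3% shift. A real calibration would fit the full mass peak and would cross-check against the track curvature, as described in the reply.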
  
Gmalva says:

August 13, 2012 at 5:21 PM

I wonder if someone is thinking of using cloud computing to increase the data processing capacity at LHC. If the data we currently do not process has truly valuable information, why not just invest more money and buy a huge amount of extra computational power from the cloud? I am not sure what the cost model is at LHC, but I doubt that operating the collider is cheaper than renting a few thousand virtual machines. Amazon EC2 cloud is made up of almost half-a-million Linux servers… there’s a lot of capacity to buy around at a low cost.

1. Matt Strassler says:
  
  August 13, 2012 at 5:35 PM
  
  I doubt that’s the bottleneck as far as cost, but I’m not expert enough to answer. Still looking for someone to answer the good questions you’re all asking. The super-geeks among you can perhaps learn something from http://cdsweb.cern.ch/record/838359/files/lhcc-2005-023.pdf, though it may be a bit out of date in its details.
  
2. embercadero says:
  
  August 13, 2012 at 8:33 PM
  
  Thinking logically: it is not probable that computational capacity is the bottleneck here. If they were able to store more data in real time, they would do that, because once it’s stored, any analysis on it can be done anytime and anywhere, even a few years from now, and it still may be valuable. The only logical explanation for such a limit is that these 350+300 stored events per second are the most that current storage systems are able to store in real time. And with that problem no clouds can help, only more dollars spent on newer and faster hardware.
  
3. Xezlec says:
  
  August 13, 2012 at 11:10 PM
  
  See my response to “B” above, in which I point out that the internet isn’t cost-effective for very high-bandwidth data. And in fact, I wasn’t even taking into account the hugely greater bandwidth of *unprocessed* data.
  
  Also, the compute farms at CERN already use many thousands of computers in a very high-speed network. If you click the (awesome!) link Prof. Strassler shared and check out section 2.6, page 13, you’ll see a diagram showing that the processing is already done in a cloud made up of multiple processing centers arranged in a multi-tiered dedicated network. CERN has no need for Amazon. Since they’re using them all the time, they can probably get lower costs by building their own clouds than by paying for time from someone else’s. Why rent when you can buy?
  
4. Plato says:
  
  August 14, 2012 at 6:20 AM
  
  My guess is that considering Seti or LIGO operations as examples, LHC could configure data in that way across many computers?
  
  1. Matt Strassler says:
    
    August 14, 2012 at 8:11 AM
    
    You have to realize that the LHC computer people are already using state of the art equipment and methods. I am not a computing expert and do not know the reasons why they do and don’t do things, but you cannot imagine that they are uninformed and have not considered these options.
    
  2. Xezlec says:
    
    August 14, 2012 at 8:24 PM
    
    Seti and LIGO are both examples of applications that are very CPU-intensive but involve only relatively small amounts of data. Again, this is exactly what grids are good for, but it does not describe the LHC data, which, again, is a very large amount of data and therefore probably very expensive to push out to the internet. This sounds like it has all the hallmarks of a bandwidth-bound, rather than CPU-bound, task. This is not what public computing grids are good for. Of course, the LHC already does send data to many computers, thousands of them, just not over the internet, which from the sound of it would be much too slow.
    
  3. Plato says:
    
    August 14, 2012 at 11:46 PM
    
    Gerard ’t Hooft: No ‘Quantum Computer’ will ever be able to outperform a ‘scaled up classical computer.’ http://online.itp.ucsb.edu/online/kitp25/thooft/oh/22.html
    
  4. Plato says:
    
    August 15, 2012 at 12:00 AM
    
    “The HLT (High Level Trigger) [http://www.lhc-closer.es/php/index.php?i=1&s=3&p=13&e=0] has access to all data. At the 1 MHz output rate of Level-0 the remaining analogue data is digitized and all data is stored for the time needed to process the Level algorithm. This algorithm is implemented on an online trigger farm composed of up to 2000 PCs.
    
    The HLT algorithm is divided in two sequential phases called HLT1 and HLT2. HLT1 applies a progressive, partial reconstruction seeded by the L0 candidates. Different reconstruction sequences (called alleys) with different algorithms and selection cuts are applied according to the L0 candidate type. The HLT runs very complex physics tests to look for specific signatures, for instance matching tracks to hits in the muon chambers, or spotting photons through their high energy but lack of charge. Overall, from every one hundred thousand events per second they select just tens of events and the remaining tens of thousands are thrown out. We are left with only the collision events that might teach us something new about physics.”
    
Markus Harder says:

August 13, 2012 at 5:09 PM

Interesting article. Two questions:
1) Why not store as much data as the parking lot allows? Processing could be postponed not only to 2013 but to some years later, when computing power has increased.
2) Would it make sense to store a (small) fraction of the bunch-crossings without applying any filter, to avoid any bias? Or at least with only a very basic filter? I think only in this way can you make really sure you do not throw away something interesting that turns out, possibly only in the future, to be of interest.

1. Matt Strassler says:
  
  August 13, 2012 at 5:28 PM
  
  1) I’m not sure of the precise pros and cons here. The experts will have to explain it. But there are financial costs.
  
  2) They do that. In fact they store all sorts of things to make sure they understand the filtering done by the trigger. But it doesn’t help: it does not allow you to “make really sure you do not throw away something interesting”. The unbiased (or minimally biased) sample will merely contain a vast number of typical proton-proton collisions that go “splat”, with no high-energy mini-collisions involving any pair of quarks, antiquarks or gluons. Remember only one in 5,000,000,000 collisions makes a Higgs particle, and you’re only able to store about that many bunch-crossings per year. Your unbiased sample can never be big enough to allow you to make or discover something new — which is why you need the trigger in the first place.
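
The argument in point 2 can be made concrete with a quick expected-count estimate. The sketch below takes the one-in-5,000,000,000 figure and the "about that many bunch-crossings per year" storage limit from the reply, together with the roughly 20 collisions per crossing mentioned elsewhere in this thread. The 1% share of storage given to unfiltered crossings is a made-up number for illustration, and detection efficiency and decay modes are ignored entirely.

```python
# Expected number of Higgs particles in an unfiltered (unbiased) sample.
HIGGS_FRACTION = 1 / 5_000_000_000          # from the reply: one collision in 5 billion
STORED_CROSSINGS_PER_YEAR = 5_000_000_000   # from the reply: "about that many" per year
COLLISIONS_PER_CROSSING = 20                # from elsewhere in this thread
UNBIASED_SHARE = 0.01                       # ASSUMED share of storage kept unfiltered


def expected_higgs(stored_crossings, collisions_per_crossing=COLLISIONS_PER_CROSSING):
    """Average number of Higgs particles produced in the stored crossings."""
    return stored_crossings * collisions_per_crossing * HIGGS_FRACTION


all_unbiased = expected_higgs(STORED_CROSSINGS_PER_YEAR)
small_slice = expected_higgs(STORED_CROSSINGS_PER_YEAR * UNBIASED_SHARE)

print(f"entire yearly storage, unfiltered: about {all_unbiased:.0f} Higgs particles")
print(f"{UNBIASED_SHARE:.0%} of storage, unfiltered: about {small_slice:.1f} Higgs particles")
```

Even if every stored crossing were unfiltered, the sample would hold only a couple of dozen Higgs particles buried among billions of ordinary collisions, and a realistic unfiltered slice would on average hold less than one.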
  
B says:

August 13, 2012 at 3:28 PM

What will it take to allow more than 350 results/second to be processed? I assume this is tied to computing capacity so this number might increase over time with improvements in computer hardware.

To give us an idea of the computational requirements, how long would it take a desktop PC to process 1 result? What is the data size of 1 unprocessed result? Has there been any thought of a CMS@Home type of project to offload some processing to willing participants?

1. Matt Strassler says:
  
  August 13, 2012 at 5:30 PM
  
  I’m sure it will improve over time; I expect there will be some increases by the time the LHC restarts after the shut-down that begins next year.
  
  Experts will have to answer your other questions; I’m sure they have thought about distributing computing power to the public and that there’s probably a fundamental obstruction to doing that.
  
2. Xezlec says:
  
  August 13, 2012 at 11:00 PM
  
  Grid computing is great for projects that are very CPU-intensive but require very little bandwidth. Remember that the internet is about the slowest thing you can imagine, when compared to the high-performance networking you find in a serious compute cluster. In fact, the best high-speed internet access an ordinary person can buy in most areas is probably 10 times slower than even the cheapest network card you can buy for a PC nowadays (unless somebody out there is still selling those old 10 Mb/s cards).
  
  Those LHC events are pretty high-bandwidth, so anything containing the word “internet” is probably out of the question for that reason alone. A single event is 1.5 MB, so 350 of those a second is over 500 MB/s or 4 Gb/s. That’s the equivalent of 100 DS3 connections, which would run more than $100k a month according to one website I just found (and that sounds believable given other numbers I’ve heard). Assuming that’s accurate, it’s the cost equivalent of buying hundreds of desktop computers every month, so a grid is probably no better than just buying their own computers. (Disclaimer: I’ve never built a grid or anything, I’m just trying to take a stab at a possible answer.)
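
The bandwidth arithmetic above is easy to reproduce. The sketch below uses the event size and rate quoted in this comment, decimal units (1 GB = 1000 MB), and the nominal DS3 line rate of 44.736 Mb/s. It deliberately leaves out the cost estimate, which depends on pricing not given here.

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.
EVENT_SIZE_MB = 1.5        # from the comment: size of one stored event
EVENTS_PER_SECOND = 350    # from the comment: stored event rate
DS3_MBIT_PER_S = 44.736    # nominal DS3 line rate


def stream_rates(event_size_mb, events_per_second):
    """Return (megabytes per second, gigabits per second) for a steady event stream."""
    mb_per_s = event_size_mb * events_per_second
    gbit_per_s = mb_per_s * 8 / 1000  # 8 bits per byte, 1000 megabits per gigabit
    return mb_per_s, gbit_per_s


mb_per_s, gbit_per_s = stream_rates(EVENT_SIZE_MB, EVENTS_PER_SECOND)
ds3_lines = gbit_per_s * 1000 / DS3_MBIT_PER_S

print(f"data rate: {mb_per_s:.0f} MB/s, or {gbit_per_s:.1f} Gb/s")
print(f"equivalent DS3 connections: about {ds3_lines:.0f}")
```

The result agrees with the figures in the comment: a little over 500 MB/s, roughly 4 Gb/s, and on the order of 100 DS3 lines.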
  
  1. David Schaich says:
    
    August 15, 2012 at 5:40 PM
    
    Yes, there’s definitely no place for any distributed computing within the trigger pipeline, since the way must be cleared immediately for the next data, which come in continuously when the machine is running. However, after the data has been parked or stored, I don’t see any fundamental obstacle to using donated computing like that corralled by the Berkeley Open Infrastructure for Network Computing (BOINC) for SETI@home, LHC@home and the LHC@home Test4Theory. I suspect this approach simply isn’t of much use for the experiments given (1) the sheer amount of data, (2) what I expect to be a relatively small amount of processing per byte, and (3) the existing capabilities of the Worldwide LHC Computing Grid:
    http://wlcg.web.cern.ch
    
    As an aside, I actually first heard of BOINC when I was at CERN (as a summer student in 2005). The computing folks at CERN are quite familiar with all of these models, and they spent a lot of time and energy figuring out what would work best for their needs.
    
    This article was the first I’d heard about Data Parking or Delayed Data Streaming. I came out of my LHC physics classes a few years back with the (apparently mistaken, or at least outdated) impression that the ~350 events/second limit was set by the available bandwidth to storage, rather than by any pre-storage processing. So it’s quite a pleasant surprise for me to learn that there is some other place to shove additional data, and that the experiments are making good use of it.
    
Pingback: The Trigger and the Parking Lot | Of Particular Significance