
Computing


 

How was the web born?

In 1989, Tim Berners-Lee, a British scientist working at CERN, wrote a proposal for an information management system for the Organization. The proposal was based on a marriage between the internet and hypertext. Tim Berners-Lee's boss was sufficiently interested to write 'vague, but exciting' on the cover and to encourage Berners-Lee to continue. At the same time, another CERN scientist, Robert Cailliau, had been thinking along similar lines and the two of them joined forces: Berners-Lee as the technical wizard and Cailliau as the Web's chief advocate.

By the end of 1990, Berners-Lee had defined the fundamental concepts of the Web - HTTP, HTML and the URL - and a prototype graphical web browser, server and web-page editor were up and running. European physics institutes put up the first web servers in 1991, and on 12 December of that year the Web crossed the Atlantic, when the Stanford Linear Accelerator Center, which has close links with CERN, put the USA's first web server into service.

At the time Tim Berners-Lee was working on his internet vision, what operating system was he designing it to work on? UNIX?

There was no choice of operating system: what mattered were the Internet protocols (TCP/IP), together with the HTTP protocol that Berners-Lee invented. Berners-Lee used the NeXTSTEP operating system for development. NeXTSTEP was built by the NeXT computer company - its Interface Builder tool originated with Jean-Marie Hullot in Paris - and it later evolved into Mac OS X. So the first server, browser and editor ran on a NeXT machine, which itself ran a BSD-derived Unix. Soon afterwards, Robert Cailliau ported the browser to Mac OS 7 and 8, a student ported it to Windows 3.1, and a colleague ported it to DEC VMS.

The Web was certainly designed to work everywhere TCP/IP was available, since its primary goal was to make information accessible independently of the operating system.
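
To make that concrete, here is a minimal sketch in Python that fetches a page using nothing more than a TCP socket and an HTTP request - exactly the combination described above. The host example.com is only a placeholder, and the request uses the later HTTP/1.0 form rather than the even simpler 1990 original; any operating system with a TCP/IP stack can run it.

import socket

# Minimal sketch: an HTTP GET over a plain TCP connection.
# The host is a placeholder; any web server reachable over TCP/IP will do,
# which is precisely the operating-system independence the Web relies on.
HOST, PORT = "example.com", 80

with socket.create_connection((HOST, PORT)) as sock:
    request = (
        "GET / HTTP/1.0\r\n"
        f"Host: {HOST}\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    response = b""
    while chunk := sock.recv(4096):
        response += chunk

# Print the status line and the start of the page.
print(response.decode("utf-8", errors="replace")[:500])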

 


 

What is the GRID?

Just as the Web was CERN's response to a new wave of scientific collaboration at the end of the 1980s, the GRID is the answer to the worldwide particle physics community's need for shared data analysis. With the LHC, the CERN experiments will have to handle petabytes of information (1 petabyte = 10¹⁵ bytes; each year the LHC experiments will produce enough data to fill a stack of CDs 20 km tall) - more than even the most advanced computers currently available can cope with. The proposed solution is the GRID, a very powerful tool that ties computing resources distributed around the world into a single computing service for all requesting applications. A rapid and natural spin-off of the GRID has been the development of a Europe-wide database of mammograms, for epidemiological as well as teaching purposes. Today, 32 Mbytes are needed to store a mammogram image, giving a total of 128 Mbytes per person per visit.
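
As a rough cross-check of the CD-stack comparison, the short Python calculation below uses assumed round figures - roughly 15 petabytes of data per year, 700 MB per CD and a 1.2 mm disc thickness, none of which come from the text above - and lands at a stack of the same order as the 20 km quoted.

# Back-of-the-envelope check of the "stack of CDs" comparison.
# The annual volume, CD capacity and disc thickness are assumed round figures.
ANNUAL_DATA_BYTES = 15e15     # ~15 petabytes per year (assumption)
CD_CAPACITY_BYTES = 700e6     # standard 700 MB CD-R (assumption)
CD_THICKNESS_M = 1.2e-3       # 1.2 mm per disc (assumption)

cds_per_year = ANNUAL_DATA_BYTES / CD_CAPACITY_BYTES
stack_height_km = cds_per_year * CD_THICKNESS_M / 1000

print(f"{cds_per_year:,.0f} CDs per year, a stack about {stack_height_km:.0f} km tall")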

 


I have heard about new protocols being developed at CERN for faster data transmission. What is this all about?

On 20 April 2004, CERN transferred data to the California Institute of Technology so fast that the feat earned a place in the Guinness Book of Records. And since then, transfer rates have only got faster...

Data travelled the 11 000 km from CERN to California at over 6.25 gigabits per second - the equivalent of a full DVD roughly every six seconds.

CERN works on optimising data transfer protocols so that the switches and routers on computer networks can move data more quickly.
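
A quick check of that DVD comparison, taking the 4.7 GB capacity of a single-layer DVD as an assumed round figure:

# How long does one DVD take at the record rate quoted above?
RATE_BITS_PER_S = 6.25e9      # 6.25 gigabits per second
DVD_BYTES = 4.7e9             # single-layer DVD capacity (assumption)

seconds_per_dvd = DVD_BYTES * 8 / RATE_BITS_PER_S
print(f"One DVD every {seconds_per_dvd:.1f} seconds")   # about 6 seconds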

 


Is there a public implementation of the "high speed" protocols used or implemented at CERN?

As with the World Wide Web, other network technologies and tools are set to become part of everyday life. As an example, consider two data transfer systems used for Grid applications, both publicly available: the first is called bbftp, the second GridFTP. bbftp was widely used to transfer large files (several gigabytes, from Europe to the USA and vice versa) within the DataTAG project, a project to test the interoperability of the European and American Grids. The second, GridFTP, is a project that produces high-performance, secure, reliable data transfer technologies optimised for high-bandwidth wide-area networks; it is part of the Globus project (www.globus.org). A short usage sketch follows the reference links below.

bbftp reference website: http://doc.in2p3.fr/bbftp/
GridFTP reference website: http://www.globus.org/toolkit/docs/4.0/data/gridftp/
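
As a minimal sketch of what using GridFTP can look like in practice, the Python snippet below simply calls the Globus Toolkit's globus-url-copy client. It assumes that client is installed and that a valid Grid proxy certificate is already in place; the remote host and file paths are placeholders.

import subprocess

# Drive a GridFTP transfer by calling the globus-url-copy command-line client.
# The remote host and paths below are placeholders, not real endpoints.
source = "file:///data/run0001.dat"
destination = "gsiftp://gridftp.example.org/storage/run0001.dat"

subprocess.run(
    [
        "globus-url-copy",
        "-vb",        # report transferred bytes and throughput
        "-p", "4",    # use 4 parallel TCP streams
        source,
        destination,
    ],
    check=True,
)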


 

We would like to know whether the data is distributed to the tiers by quantity or by quality. And if it is distributed by quality, what characteristics is that based on?

The data from the LHC experiments is distributed to the collaborating computer centres in the following way. A copy of the raw data from the detector is distributed amongst the 11 large Tier 1 centres, as well as being stored at CERN. In this way we ensure that there are two copies of the original data from the experiments. Initial analysis of that data happens at CERN and at the Tier 1 centres, and produces smaller data sets which are used by the physicists to perform a variety of different analyses. These smaller data sets are distributed from the Tier 1 centres to the so-called Tier 2 centres, of which there are about 150. In this way the data is made available and accessible to the several thousand physicists participating in the experiments. The answer to your question, then, is really that the distribution to the various Tiers depends on the stage of processing of the data, and not specifically on quality or quantity measures.
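
As a purely schematic illustration of that flow (not the actual LHC data-management software), the Python sketch below gives each raw file one copy at CERN and one at a Tier 1 centre, and replicates the smaller derived data sets to a few Tier 2 centres. The centre names and the random placement are simplifications; only the counts (11 Tier 1 centres, about 150 Tier 2 centres) come from the text above.

import random

# Schematic model of the tiered distribution: the site names and the
# random placement are simplifications, not the real computing model.
TIER1_CENTRES = [f"tier1-{i:02d}" for i in range(11)]
TIER2_CENTRES = [f"tier2-{i:03d}" for i in range(150)]

def place_raw_file(filename):
    """Raw detector data: one copy stays at CERN, a second copy
    goes to one of the 11 Tier 1 centres."""
    return {"CERN": filename, random.choice(TIER1_CENTRES): filename}

def distribute_derived(dataset, n_replicas=3):
    """Smaller derived data sets are replicated to several of the
    roughly 150 Tier 2 centres for physics analysis."""
    return {site: dataset for site in random.sample(TIER2_CENTRES, n_replicas)}

print(place_raw_file("run0001.raw"))
print(distribute_derived("run0001.reduced"))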
