1. Introduction

1.1 Yet another UNIX guide?

The aim of this course is to provide a very gentle introduction to the UNIX operating system for those who are more used to working in graphical computer environments such as Microsoft Windows or Mac OS and who are therefore unfamiliar with UNIX. An internet search will quickly reveal a multitude of existing material on a similar theme, so why re-invent the wheel here? The motivation is to gather together in one place all of the information needed to allow users to get started on local UNIX systems at Liverpool and, in particular, the chadwick HPC cluster. Clearly this requires specific local knowledge that is not available in other guides. Furthermore, this course aims to provide users with only the basic understanding that they need to get started with UNIX and no more. Many courses dive straight into fairly advanced topics such as process control, pipes and output redirection which, although fundamental to UNIX, are not required to begin with and may scare off novices.

If you have only ever used operating systems such as Windows and Mac OS on personal computers and laptops, the UNIX operating system may at first sight seem slightly forbidding and difficult to master. You may even wonder whether it is actually worth the trouble of learning at all! If you are a researcher who works in an area which requires significant amounts of computation (e.g. bioinformatics/statistics, computational chemistry/biology, physics and engineering) then the answer to this is a resounding YES! Many specialised research computing systems make use of UNIX, including most of the world's supercomputers, and a large amount of freely available research software has been developed either solely or primarily for UNIX. UNIX also provides unrivalled facilities for manipulating, searching and sorting data that are useful in many disciplines (particularly the data-intensive analyses found in the life sciences and biosciences).

The course starts with a brief historical perspective. If you are in a hurry, you can skip over this section and go straight to the local introduction section. Knowing where UNIX came from, however, can help you understand why it works the way it does and how it came to be so popular. We then move on to describe how to log in to local UNIX systems and how to enter (and edit) some simple commands. Understanding the UNIX filesystem is the next topic since this is fundamental to the rest of the course, and with that out of the way we'll move on to explaining how to manipulate files and directories (folders) in some detail. We'll also have a look at controlling how information is presented by the various UNIX commands, find out where to go for online help and look at how to transfer files to and from a UNIX machine.

The aim here is to provide a very "hands-on" tutorial introduction, so please try the examples and exercises as you go along. A theoretical approach to UNIX is likely to be about as successful as learning how to ride a bike from a book, so dive in and experiment as much as possible. You can expect to make a few mistakes along the way but no permanent damage will be done. Maybe you'll even delete a file by accident (it happens to the best of us at one time or another!). Not to worry though - just remember the bike analogy. There might be a few mishaps to begin with but pretty soon you'll be cruising along safely without having to worry about falling off and wondering why you found it so difficult in the first place.

1.2 A brief(ish) history of UNIX

The UNIX operating system has its origins way back in the early 1970s, when computers would take up the entire floor of a building and the idea of having a computer on your desktop (not to mention in a mobile phone!) was strictly in the realm of science fiction. These huge mainframe systems were generally used for business applications and their price tag put them far beyond the means of university research groups. The invention of the microprocessor (essentially a computer on a single silicon "chip") in 1971 paved the way for a new generation of smaller machines called mini-computers, aimed at supporting a handful of simultaneous users. It was these systems that were quickly adopted by research groups.

Originally all operating systems were created using a very primitive programming language called assembly, whose instructions corresponded directly to the operations carried out by the hardware. This was difficult to write, cumbersome and prone to errors. Assembly was also used in the original version of UNIX created in 1969; however, in 1973 it was completely rewritten in the newly invented C programming language (which is still used today). This turned out to be a landmark event in computing as it meant that the same operating system could be run on different computer hardware without having to be rewritten from scratch each time. In this way, the operating system could essentially outlive its original hardware, thus guaranteeing its long-term existence.

Although for many years an American telecommunications company called AT&T owned the rights to the UNIX operating system (which was developed at their Bell Research Labs), many other computer companies soon began to produce their own versions of UNIX. Since these did not evolve from a common software base, differences between each vendor's version of UNIX soon started to appear. This led in the late 1980s and early 1990s to major squabbles over whose version of UNIX should be adopted as the standard one, and even today this has still not been fully resolved.

At around the same time, two developments occurred which also had a major influence on the development and use of UNIX. The first initially seemed quite unremarkable - the creation of a new open standard for computer communication called TCP/IP. Until then, computer vendors each adopted their own proprietary networking standards, meaning that, for example, an IBM computer could only "talk" to another IBM computer. Vendors quickly incorporated the software underpinning TCP/IP into the UNIX operating system, so any computer running UNIX could now talk to any other, regardless of who supplied the original system. This led to an explosion in network connectivity, with smaller computer networks becoming part of larger national networks which in turn were part of still larger international ones. These networks of networks were soon to form the backbone of the global Internet which we know today.

The second development came about through work on human-computer interfaces carried out by Xerox at their Palo Alto Research Centre (PARC) in the early 1970s. Previously, the most efficient way of using a computer was to type in a series of slightly cryptic looking commands and wait for the computer to respond - possibly with something equally cryptic (if this sounds bad, the previous method was to punch holes in pieces of card for the computer to read!). Obviously something more user-friendly was needed if computing was ever to break into the mainstream, and Xerox hit upon the then-novel idea of the user being able to interact with the computer by moving a device (later called a mouse) to point at icons on a graphical display. This was a revolutionary idea but unfortunately ahead of its time. Few computer installations had terminals that supported graphical displays, almost none had a mouse interface and in any case there was insufficient processing power to support such a complex operating system interface.

The so-called WIMP (windows, icons, mice and pointers) interface might have languished as a research curiosity had it not been for work carried out at MIT in 1984. By now UNIX was widely adopted and many users accessed central UNIX systems through their own desktop machines called workstations (also running UNIX). In contrast to the "dumb" terminals used in mainframe installations, these workstations had significant computing power and the ability to display complicated graphics. MIT realised that the complex "number crunching" required in many scientific applications could be done on a powerful central system and the results displayed locally on a workstation. The whole thing could be tied together by a graphical operating system interface similar to that developed at Xerox PARC. Analogous to the TCP/IP network standard, communication between the workstation and the remote system would conform to an open standard so that, for example, an HP workstation could be used to access an IBM central server. The overall system was dubbed the X Windows standard and was widely adopted by UNIX vendors.

X Windows was a deliberately minimal standard which did not specify how the graphical windows and icons themselves were displayed or how people would use them. This allowed vendors free rein to develop their own window managers. Attempts to establish a standard graphical interface (similar to, say, Windows) failed and soon UNIX vendors were more or less at war over which was the "one true UNIX". This lack of co-operation between vendors hampered the development of a truly sophisticated graphical interface for UNIX and to this day it is fair to say that UNIX graphical environments lag a long way behind their Microsoft and Apple counterparts in terms of usability.

The next character to appear in the UNIX story arrived in 1981 in the guise of the IBM Personal Computer. This was not the first personal computer designed for a single user and small enough to sit comfortably on an office desk, but it quickly became the most popular, with many other manufacturers making IBM PC clones thanks to IBM's open licensing policy. Initial machines were extremely expensive by modern standards but improving technology and economies of scale quickly brought prices down to a point where they were affordable by non-business users, including university researchers.

The original operating system for the IBM PC was a command line affair created by Microsoft in 1981 and called DOS (Disk Operating System). This was very primitive compared to UNIX but pretty much all that the original hardware could support. Rapid increases in microprocessor power and the development of special graphics processors, though, allowed Microsoft to produce a graphical operating system interface for the PC, which was launched as Windows 1.0 in 1985. This was fairly "clunky" and unreliable by modern standards but advances in software and hardware eventually led to a sophisticated operating system which was fairly easy and intuitive to use. As had happened with UNIX, Microsoft (eventually) incorporated TCP/IP networking so that the PC could now "talk" to other computers such as servers, mainframes and research systems (and by extension anything on the global Internet).

Soon the graphics capabilities of PCs matched those of the original UNIX workstations, and the development of special software (such as eXceed) allowed users to log in to central UNIX machines, interact with them using X Windows and display graphical results locally on the PC. Since a PC was generally a fraction of the cost of a UNIX workstation and could perform other tasks such as word processing, users quite reasonably started to wonder about the logic of buying a UNIX workstation. This being the case, UNIX workstations pretty much disappeared from desktops save for specialised use in research labs, and UNIX retreated into computer server rooms, replaced by Microsoft operating systems on the desktop.

And that would have been almost the end of the UNIX story had it not been for one final character - a Finnish software engineer called Linus Torvalds. For reasons best known to himself, Torvalds decided in 1991 to create from scratch his very own version of UNIX for the PC (strictly speaking, he only wrote the core of the operating system called the "kernel", but we'll keep it simple here). This, with a certain amount of self-promotion, he called Linux (pronounced as in "Lynne Ucks"). More remarkable still was that he made the Linux software freely available and modifiable by anyone interested - a move that would have seemed like commercial suicide to most computer companies at the time.

The fact that anyone could contribute to Linux quickly allowed bugs to be spotted and removed and improvements made to the operating system, to the benefit of all. Initial interest in Linux was confined to a fairly small group of amateur computer enthusiasts ("geeks" might be a less flattering description); however, large computer companies soon started to realise the commercial benefits of using Linux (not least lower development costs) and started supporting it on their own systems.

Other companies also provided their own versions of Linux with support packages thrown in (at a cost). Today Linux is used in a huge variety of systems, from large supercomputers through web servers to PCs and even mobile phones in the form of the Android operating system. A freely available version of UNIX also forms the basis of the Mac OS X operating system and its descendants. Strictly speaking, Linux and UNIX are two separate entities; however, there are enough similarities for them to be treated as one for the purposes of this course.

So that's the UNIX story in a biggish nutshell. With UNIX being used on so many systems it may seem strange that it still remains fairly obscure to many people. That, however, probably has more to do with the enormous ubiquity and pervasiveness of Microsoft Windows, which is found on the vast majority of the world's computers (but not mobile phones). Hopefully though you will now appreciate why UNIX is so important and why it is useful to be able to use it. Hence this course!

1.3 Local considerations

As indicated earlier, the aim of this guide is to provide just enough information for users new to UNIX to get started on local UNIX systems and the chadwick cluster in particular. In the interests of simplicity, this means that it is not a complete guide to all UNIX systems, although most of what is covered here applies more generally. As described in the historical introduction, there are a multitude of slightly different versions of UNIX developed by different computer vendors over the years. There is also a version of UNIX called Linux which, although initially developed for the PC, is now used on a wide range of platforms from mobile phones to supercomputers.

Originally the only way to use a UNIX system was to type in commands and wait for the computer to display the results in text format - a so-called command line interface. UNIX systems now support a variety of graphical interfaces superficially similar to Microsoft Windows or Mac OS; however, just to add to the confusion, these graphical interfaces vary from vendor to vendor and even exactly the same operating system will have a variety of graphical environments to choose from. If that wasn't enough variety, UNIX operating systems also support a variety of different command line interfaces called shells.

All this variety may be starting to give you a headache but DON'T PANIC!!! Most, if not all, of the information in this guide will apply to every UNIX system you are likely to encounter. To keep things simple we are going to stick to the UNIX command line interface rather than the graphical interfaces. If you are used to using Microsoft Windows etc. this may seem like hard work to begin with but stick with it - the command line turns out to be very powerful, especially when it comes to manipulating large numbers of files (and even large numbers of large files).

Again in the interests of simplicity, we'll stick to chadwick (the important differences between this and other local systems will be added later). This uses the Red Hat Enterprise Linux operating system which, for most intents and purposes, can be considered a standard UNIX operating system. The command line interface uses something called the bash shell, which stands for Bourne Again Shell since it was based on an earlier shell called the Bourne Shell (unfortunately this is an example of typical UNIX humour - you can see why it hasn't caught on on the comedy circuit). The shell is actually a very important part of a UNIX system but we won't go into much detail about it here. In fact the only reason for mentioning the shell is that other systems may use different shells. In particular, many UNIX textbooks describe the so-called C-shell, whose syntax is based on the C programming language (more UNIX humour - groan!). This is subtly different from bash so tread carefully.
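If you are ever unsure which shell a system has given you, a quick check (just a sketch here - the exact output will vary from system to system) is to print the SHELL environment variable once you are logged in:

    echo $SHELL        # prints the path of your login shell, e.g. /bin/bash

On chadwick this should report bash; on a system that uses the C-shell you would typically see something like /bin/csh or /bin/tcsh instead. Don't worry about what an "environment variable" is for now - this will be covered later.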

After that rather lengthy preamble it's time to dive in and start using UNIX in earnest, but just before that there may be some additional software you need to install on your computer first. This is described in the next section.