Files and directories
After this chapter, the students can move around in a computer’s directory system using command-line commands and understand the relation between its visualisations in the graphical interface and on the command line. They can also explain why moving a file/directory from one place to another is computationally different from copying a file/directory from one place to another, and can rationalise when it is better to create hard or symbolic links to a file instead of multiple copies of the same file.
Computers show the data in a structured file system consisting of directories and files. This is not how the data are actually stored in the computer’s storage systems, and the file system is just a layer that makes locating the data easier for the operating system and the human using it. For efficient use of computers, it is useful to understand the basics of how the computers store the data.
Moving around and listing the contents
Directories can be seen as boxes inside other boxes, each able to contain either files or other directories. Moving between directories may be easier to understand if one thinks of them as actual objects with addresses, like houses in a village. Let’s assume that we have a village with three houses, each having four rooms, and a shopping centre with two shops:
village/
├── house1
│   ├── bedroom
│   ├── kitchen
│   ├── livingroom
│   └── office
├── house2
│   ├── bedroom
│   ├── kitchen
│   ├── livingroom
│   └── office
├── house3
│   ├── bedroom
│   ├── kitchen
│   ├── livingroom
│   └── office
└── shoppingcentre
    ├── bookstore
    └── grocerystore
On Unix, some characters (such as $
, {
, }
etc.) have special meanings and should not be used in the file or directory names. Secondly, the Unix commands consist of the program name (such as ls
) and its arguments (e.g., -l house1 house2
) which are separated by spaces (the full command would thus be ls -l house1 house2
). As the space is a separator, it is a bad practice to use spaces in the file and directory names: one can use them but that makes life unnecessarily hard.
Above, I have simply left the spaces out and written e.g. shoppingcentre
. A common practice is to replace space with the underscore and write shopping_centre
or use the camel-case and write shoppingCentre
or ShoppingCentre
. It is a good practice to use only alphabet letters (a-z
, A-Z
), numbers (0-9
), underscores (_
) and dots (.
) in the file and directory names. The operating system doesn’t care about the file endings but for us humans, it is good to use informative endings that follow common naming practices: .txt
for text, .csv
for comma-separated values etc.
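The trouble caused by spaces in names can be tried out safely. The following is a minimal sketch, assuming a throw-away directory created with mktemp -d; the directory names are only illustrations and not part of the course data.

```shell
# Work in a throw-away directory so that nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A name containing a space must be quoted every time it is used.
mkdir "shopping centre" shopping_centre

# Unquoted, the shell splits the name into two arguments, "shopping" and
# "centre"; neither exists, so ls fails.
if ls shopping centre > /dev/null 2>&1; then
    unquoted="worked"
else
    unquoted="failed"
fi

# Quoted, the name is passed to ls as one argument.
if ls "shopping centre" > /dev/null 2>&1; then
    quoted="worked"
else
    quoted="failed"
fi

echo "unquoted: $unquoted, quoted: $quoted"
```

The name with the underscore needs no such care, which is why it is the more convenient choice.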
We can get “in front of” this village with the command
> cd ~/IntSciCom/village
Here, cd
means change to directory and the text after it gives the path (or address) of the directory starting from the home directory, short-handed with ~
. We can list the contents of the directory with the command ls
:
> ls
house1 house2 house3 shoppingcentre
we see the three houses and the shopping centre. We don’t need to enter the directory to see its contents but can list it by giving its name as an argument to the ls
command:
> ls house1
bedroom kitchen livingroom office
The forward slashes (/
) separate the different levels (directories) in the file path. If the path ends at a directory (and the path could thus be extended with more directories or a file), one can either write a trailing slash or leave it out. This command is equivalent to one above:
> ls house1/
bedroom kitchen livingroom office
In fact, we can list the contents of all three directories at once either by giving the name of each directory in the command:
> ls house1 house2 house3
house1:
bedroom kitchen livingroom office
house2:
bedroom kitchen livingroom office
house3:
bedroom kitchen livingroom office
or using wildcards that match multiple different characters:
> ls house?
house1:
bedroom kitchen livingroom office
house2:
bedroom kitchen livingroom office
house3:
bedroom kitchen livingroom office
Here, ?
means any single character. Another often used wildcard is *
and it matches either ‘nothing’, ‘any character’ or ‘any combination of multiple characters’. We can see that *
matches everything with the command:
> ls *
house1:
bedroom kitchen livingroom office
house2:
bedroom kitchen livingroom office
house3:
bedroom kitchen livingroom office
shoppingcentre:
bookstore grocerystore
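It is the shell, not ls, that turns a wildcard into a list of names, and echo is a simple way to see that list. The sketch below assumes a throw-away copy of the village created with mktemp -d; it also shows the bracket pattern, which matches exactly one of the characters listed inside the brackets.

```shell
# Build a small stand-in for the village in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir house1 house2 house3 shoppingcentre

# echo prints its arguments as the shell hands them over,
# so it shows what each pattern expands to.
single=$(echo house?)      # ? matches exactly one character
subset=$(echo house[12])   # [12] matches one character, either 1 or 2
all=$(echo *)              # * matches any name not starting with a dot

echo "$single"
echo "$subset"
echo "$all"
```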
Efficient use of wildcards is a core skill in command-line working but one has to be aware of their dangerous sides as well. Let’s assume that we would like to remove the directory house1/
and all its content. One way to do that would be:
> rm -r house1/*
Strictly speaking, this command removes only the contents of house1/ and leaves the empty directory in place (rm -r house1, without the asterisk, would remove the directory as well), but it is a typical command for emptying a directory. However, if we make a typo and insert a space before the asterisk:
> rm -r house1/ *
we get a warning rm: cannot remove 'house1': No such file or directory
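and will find out that all directories have been removed! Here, the shell expanded the lone asterisk to the name of every directory, so rm removed house1/ first and was then asked to remove house1, house2, house3 and shoppingcentre; the second attempt on house1
gives the error that no such directory exists any more.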
For a beginner (and even for more experienced users), it’s a good practice to test the result of wildcard expansion with a safer command first. One could first give the command
> ls -R house1/*
and check that these are indeed the files and directories that one wants to remove. If they are, one can then replace ls
with rm -r
in the command.
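Another harmless way to preview a risky command is to put echo in front of it: the shell expands the wildcards as usual but the command is only printed, not run. This is a sketch in a throw-away directory (created with mktemp -d) with a reduced, hypothetical version of the village.

```shell
# A reduced copy of the village in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p house1/bedroom house1/kitchen house2 house3

# With echo in front, nothing is removed; we only see what rm would receive.
intended=$(echo rm -r house1/*)
typo=$(echo rm -r house1/ *)

echo "$intended"
echo "$typo"
```

The second line of output lists every directory in the village, which reveals the effect of the stray space before anything has been deleted.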
The Linux command-line environment has a built-in documentation system that can be accessed with the command man <prog_name>
. For example, the command:
> man rm
reveals that rm
is a program to “remove files or directories” and that its argument -r
removes “directories and their contents recursively”. Press “q” to quit and then find out the meaning of the argument in ls -R
.
Absolute and relative file paths
We can move directly into a directory inside another directory by giving its full path:
> cd house1/office/
When moving between directories, one may get confused about the current location. Depending on the system used and its settings, the shell program may tell the name of the directory in the command prompt. One can always print the absolute path to the current directory with the command pwd
(print working directory):
> pwd
/users/username/IntSciCom/village/house1/office
Given that, one could move around by always giving the full path to the target directory. We could move from office
to bedroom
with the command:
> cd /users/$USER/IntSciCom/village/house1/bedroom/
> pwd
/users/username/IntSciCom/village/house1/bedroom
Above, we have used ‘username’ as a part of the file path. That is just a placeholder and is actually replaced by one’s own username. Elsewhere, we have used ‘$USER’. That is a variable that holds the user’s username and, when the command is executed, is replaced by that. Because of that, the command containing the variable should work for every user although the real file path is different for each of us.
This looks very complicated and there’s a much easier way to refer to the parent directory – or one step backwards on the path. One step backwards is ..
and these can be combined:
> cd ../kitchen/
> pwd
/users/username/IntSciCom/village/house1/kitchen
> cd ../../house2/office/../livingroom/
> pwd
/users/username/IntSciCom/village/house2/livingroom
Here, the second cd
command moves out of kitchen (../
); then out of house1 (../
); then into house2 (house2/
) and into office (office/
); then out of office (../
); and finally into livingroom (livingroom
). It is of course unnecessary to go first to office, come out of there (..
), and then go to livingroom, and one would normally go directly to the correct destination. This detour is shown just to demonstrate that it can be done.
As ..
means one step backwards on the path, .
means this directory:
> cd ..
> pwd
/users/username/IntSciCom/village/house2
> ls .
bedroom kitchen livingroom office
> ls ..
house1 house2 house3 shoppingcentre
I have a UH Linux computer (running Cubbli, a variant of Ubuntu Linux) and on my system, the root of the file system looks like this:
> ls /
bin cdrom cubbli22-gold etc lib lib64 lost+found mnt proc run snap sys usr
boot cs dev home lib32 libx32 media opt root sbin srv tmp var
One doesn’t need to care about this except for the path symbol /
that represents the root of the file structure tree.
The work and life of a regular Linux user typically happens in the branch starting with /home
or /users
. (On my computer it is called /home
but on the CSC computers it is called /users
and we stick to that now.) In that directory, each user then has a directory of their own, known as home directory, and only they can see and manipulate the files in that directory. On the UH computers, this personal directory is named after the account name, used for email and other things in the UH systems. For the user called “fakename”, the home directory would be /users/fakename
(or /home/fakename
on UH Cubbli).
As the home directory is so important, there are shorthands that help to use it. The command:
cd
(with no additional arguments) changes to the home directory. The symbol ~
(called tilde) is a shorthand for the home directory:
cd ~
is equivalent to the previous command.
Similarly,
ls ~/
is equivalent to
ls /users/$USER/
Here, the variable $USER
holds the username of the current user, and that is substituted differently in the commands of different users, e.g., as /users/fakename/
.
On macOS, the root of the file system is /
and the home directory is /Users/fakename
. Windows has no single root directory, as each drive (such as C:) has a root of its own; the home directory is typically C:\Users\fakename
.
Reading files
Let’s move back to house1
:
> cd ~/IntSciCom/village/house1
We learn that bedroom
contains a directory called notebook
and that contains two files:
> ls bedroom/
notebook
> ls bedroom/notebook/
Shakespeare_Hamlet.txt Shakespeare_Macbeth.txt
The ending .txt
suggests (but doesn’t guarantee) that they are text files. We get some information e.g. with the commands ls -l
(where -l
indicates the long format), file
and wc
(meaning word count):
> ls -l bedroom/notebook/
total 8
-rw-rw---- 1 username pepr_username 56 Mar 6 15:48 Shakespeare_Hamlet.txt
-rw-rw---- 1 username pepr_username 18 Mar 6 15:48 Shakespeare_Macbeth.txt
> file bedroom/notebook/Shakespeare_Hamlet.txt
bedroom/notebook/Shakespeare_Hamlet.txt: ASCII text
> wc bedroom/notebook/Shakespeare_Hamlet.txt
1 12 56 bedroom/notebook/Shakespeare_Hamlet.txt
On the output of ls -l
, the first character -
means that it is a regular file; the line would start with d
if the target were a directory. The next three characters indicate what the owner of the file can do with it: rw-
means that one can read the file and write to the file but not execute it (thus, it is not a program that runs and does something); the following three characters tell the permissions of the group members (read and write, no execute) and then all other users (here ---
; we’ll revisit these later). Then comes the owner of the file and their group, and finally the date when the file was last modified; the numbers 56
and 18
in-between give the sizes of the files, 56 and 18 bytes (in this case 56 and 18 characters).
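The fields described above can be picked out one at a time. This is a small sketch in a throw-away directory (mktemp -d), assuming a one-line file with the same text as the Hamlet note; the argument -d of ls, not used elsewhere in this chapter, makes ls describe the directory itself instead of listing its contents.

```shell
# Recreate a small notebook in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir notebook
echo "A story about Danish bloke (and his dad who is a ghost)" > notebook/Shakespeare_Hamlet.txt

# The first character of the long listing tells the type of the entry.
dir_type=$(ls -ld notebook | cut -c1)
file_type=$(ls -l notebook/Shakespeare_Hamlet.txt | cut -c1)

# The fifth column of the long listing is the size in bytes.
size=$(ls -l notebook/Shakespeare_Hamlet.txt | awk '{print $5}')

echo "directory: $dir_type, file: $file_type, size: $size bytes"
```

The permission characters themselves depend on the settings of the system and may differ from those shown in the text.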
The command file
determines the file type and here says that it is “ASCII text” (consisting of standard characters) that we can easily read.
The command wc
tells that the file consists of one row (in fact, it has one newline character ending the row), 12 words and 56 characters. If one would like to get just one or two of the counts, we could specify that with additional arguments:
> wc -w bedroom/notebook/Shakespeare_Hamlet.txt
12 bedroom/notebook/Shakespeare_Hamlet.txt
See man wc
to find the other arguments. (Press q
to quit reading the manual.)
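The three counts can also be requested separately. The sketch below recreates the one-line note in a throw-away directory (mktemp -d) and feeds it to wc through standard input with <, which is an assumption of this example: read that way, wc prints only the number and not the file name.

```shell
# Recreate the short note in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'A story about Danish bloke (and his dad who is a ghost)\n' > Shakespeare_Hamlet.txt

# -l counts newline characters, -w words and -c bytes.
lines=$(wc -l < Shakespeare_Hamlet.txt)
words=$(wc -w < Shakespeare_Hamlet.txt)
bytes=$(wc -c < Shakespeare_Hamlet.txt)

echo "lines: $lines, words: $words, bytes: $bytes"
```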
As we learned that the files are small text files, we can safely look at them more closely. cat
(meaning catenate) is the most basic command to read and print the contents of files. The name of the command comes from its usage for concatenating the contents of files into new files:
> cd bedroom/notebook
> cat Shakespeare_Hamlet.txt Shakespeare_Macbeth.txt > my_notes.txt
Here, > my_notes.txt
means that the output of concatenation is written to a new file called my_notes.txt
. The properties of this new file are quite predictable given the input:
> ls -l my_notes.txt
-rw-rw---- 1 username pepr_username 74 Mar 6 15:55 my_notes.txt
> file my_notes.txt
my_notes.txt: ASCII text
> wc my_notes.txt
2 15 74 my_notes.txt
If we do not direct the contents of the concatenation command into a new file, it is printed on the screen:
> cat Shakespeare_Hamlet.txt
A story about Danish bloke (and his dad who is a ghost)
cat
is an important command but it is not practical for reading all text files. To see that, we can go to the directory bookshelf
inside the directory office
:
> cd ../../office/bookshelf/
If that fails for some reason, you can get there also through the full path:
> cd ~/IntSciCom/village/house1/office/bookshelf/
Now, we can see that the files are much larger:
> ls -l
total 332
-rw-rw---- 1 username pepr_username 206763 Mar 6 15:48 Shakespeare_Hamlet.txt
-rw-rw---- 1 username pepr_username 130397 Mar 6 15:48 Shakespeare_Macbeth.txt
> file *
Shakespeare_Hamlet.txt: Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
Shakespeare_Macbeth.txt: Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
> wc *
7079 34988 206763 Shakespeare_Hamlet.txt
4544 21427 130397 Shakespeare_Macbeth.txt
11623 56415 337160 total
Starting from the bottom, we see that the files have 7079 and 4544 lines and altogether 337,160 characters. They are text files using the Unicode characters and CRLF as the end-of-line mark, revealing that they were created on a Windows system.
We could print the contents of the file with cat
but it is impossible to read the text as fast as it scrolls on the screen:
> cat Shakespeare_Hamlet.txt
(too much to show...)
To make it readable, we pipe the output of cat
to the program less
:
> cat Shakespeare_Hamlet.txt | less
You can now scroll up and down with arrow keys, press space for the next screenful of text and finally quit reading the text with “q”.
If you missed it above and seem to have got stuck on less
, press the key q on your keyboard to quit the program.
In fact, we don’t need two programs to read a file but can do it directly with less
:
> less Shakespeare_Hamlet.txt
less
can do much more than paginate the text and has, for example, a built-in text search function. If one opens the Hamlet file (the command above), one can then start the search with the character /
and write the text after that, e.g. /To be
. This finds the first occurrence of those words; one can keep moving to the next occurrence with n
, browse around using arrows etc., or quit with q
.
cat
, more
and less
The cat
command is unsuitable for reading long texts and very early programs were developed to show one screenful of text at a time. One of these programs was more
, the name coming from the text such as --More--(1%)
appearing at the bottom of the screen. The text here means that one has seen 1% of the contents and one can see more by pressing the spacebar. The early more
was very simple and could only go forward in the text. Someone developed a better program for the same task and, instead of calling it something like better-more
named it as less
.
Nowadays, less
is a massive program and, in addition to plain text, may read compressed text files, pdf documents and many more formats. As shown below, the less
manual has 1510 lines of text and it takes time to learn all its features:
> man less | wc
1510 12603 87719
However, the standard behaviour is sufficient for most tasks.
cat filename | less
?
Why can’t we always use less
for reading the files?
The two programs have their own strengths and it’s the Unix-way to combine different tools to get the intended outcome. For example, above we learned that the files have CRLF as the end-of-line mark. Linux typically reads them fine (as here), but specific programs may be more picky and wrong end-of-line characters may cause frustrating problems. cat
has argument -A
to show all characters, also those not typically printed as visible characters (these include e.g. backspace, tab, carriage return and newline). Combining cat
and less
, we can do:
> cat -A Shakespeare_Hamlet.txt | less
and see that every line ends with ^M$
, meaning CR and LF. In comparison, the file that we created above:
> cat -A ../../bedroom/notebook/my_notes.txt | less
has lines ending with $
, meaning LF only, as is the standard on Unix systems. The combination cat -A filename | less
can be useful when things do not work as intended and one suspects that the text file could have something wrong. These commands reveal e.g. the difference between a TAB character (shown as ^I
) and multiple spaces.
And what do CR and LF stand for? In brief, computer systems evolved from mechanical devices and writing text evolved from mechanical typewriters. On those, starting a new line required returning the type element to the beginning of the line (carriage return, or CR) and then moving the paper up to the next line (line feed, or LF). Different operating systems then adopted CR, LF or CR+LF to indicate a newline in text files. Unix (LF) and Windows (CR+LF) use different end-of-line characters.
Yes, LF and CRLF are characters although we don’t typically see anything visible on the screen. Computer languages typically use \n
to mark the newline (depending on the operating system, either LF or CRLF) and the same applies also for bash. The command:
> echo -e "First line.\nSecond line."
First line.
Second line.
has one continuous string (within double quotes) as an argument but then prints the text on two lines. The control character \n
is executed and, as a result, the writing moves to the beginning of the next line. Similarly, a multi-line text file would be stored in memory as a long sequence of characters and its true appearance could only be seen when it’s printed out and all the control characters are converted to their true form.
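The difference between the two end-of-line conventions can be made visible with small test files. The following is a sketch under a few assumptions: the file names are made up, the files are written to a throw-away directory (mktemp -d), and od -c is used instead of cat -A because it prints every byte by its name. The last step, deleting the CR characters with tr, is one possible way to convert a file, not necessarily the one preferred on every system.

```shell
# Create the same two lines with Windows (CRLF) and Unix (LF) line ends.
workdir=$(mktemp -d)
cd "$workdir"
printf 'First line.\r\nSecond line.\r\n' > windows.txt
printf 'First line.\nSecond line.\n' > unix.txt

# od -c prints every byte, so the otherwise invisible \r and \n show up.
od -c windows.txt
od -c unix.txt

# Each CRLF line carries one byte more than the LF version.
win_bytes=$(wc -c < windows.txt)
unix_bytes=$(wc -c < unix.txt)
echo "windows.txt: $win_bytes bytes, unix.txt: $unix_bytes bytes"

# Deleting every CR turns the Windows-style file into a Unix-style one.
tr -d '\r' < windows.txt > converted.txt
cmp converted.txt unix.txt && echo "converted.txt is identical to unix.txt"
```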
Moving and copying files
In this example, the directories have descriptive names (such as village, house1 and bedroom) but they are actually all technically similar and the whole house2
could be moved into notebook
. Directories can hold many sub-directories and files (on a typical Linux system, a directory can hold approximately 4 billion files), but it is rarely practical and efficient to have lots of files and sub-directories inside one directory. It is easier for both humans and the computer operating system to find information when it is placed in structured directory systems consisting of several layers of sub-directories.
The directories can naturally hold files of different sizes, on a Linux system from 0 to \(2^{44}-1\) bytes in size (that is, up to about 16 terabytes or 16,000 gigabytes). One of the reasons for directories to be able to hold such large files is that the files are actually not held inside the directory; the directory only contains a link to the actual data held elsewhere in the storage device.
We can clarify this with an example. Here we have our village, the buildings, the rooms, the bookshelf and finally the book files:
The directory names (and their contents) and the file names are stored in the storage device (SSD, hard drive, USB disk etc.) but they take little space; the files (here, the books) are pointers to locations in the storage device and the actual data are stored there, shown as solid blue and red blocks.
Given that, moving a file from one place to another is an easy operation and only needs to move the pointer. So, starting from this:
> tree house[12]/office/
house1/office/
└── bookshelf
    ├── Shakespeare_Hamlet.txt
    └── Shakespeare_Macbeth.txt
house2/office/
└── bookshelf
the command:
> cd /users/$USER/IntSciCom/village
> mv house1/office/bookshelf/Shakespeare_Macbeth.txt house2/office/bookshelf/Shakespeare_Macbeth.txt
moves the book file to the bookshelf in the office of house2:
> tree house[12]/office/
house1/office/
└── bookshelf
    └── Shakespeare_Hamlet.txt
house2/office/
└── bookshelf
    └── Shakespeare_Macbeth.txt
Even if the book file is huge in size, the operation would be easy as the actual data are not moved anywhere, just the pointer to the data. The situation is very different if one copies the file:
> cp house2/office/bookshelf/Shakespeare_Macbeth.txt house1/office/bookshelf/Shakespeare_Macbeth.txt
as now the same data are stored twice and, depending on the size of the file, writing the copy can take a long time:
> tree house[12]/office/
house1/office/
└── bookshelf
    ├── Shakespeare_Hamlet.txt
    └── Shakespeare_Macbeth.txt
house2/office/
└── bookshelf
    └── Shakespeare_Macbeth.txt
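That mv only moves the pointer while cp writes new data can be checked from the identifier that the file system gives to each piece of stored data, known as the inode number and printed by ls -i. This is a minimal sketch, assuming a throw-away directory (mktemp -d) on a single file system and a one-line placeholder instead of the real book.

```shell
# A reduced copy of the two offices in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p house1/office/bookshelf house2/office/bookshelf
echo "placeholder for the text of the book" > house1/office/bookshelf/Shakespeare_Macbeth.txt

# ls -i prints the inode number in front of the name.
before=$(ls -i house1/office/bookshelf/Shakespeare_Macbeth.txt | awk '{print $1}')

# Moving within one file system keeps the same inode: the data stay put.
mv house1/office/bookshelf/Shakespeare_Macbeth.txt house2/office/bookshelf/Shakespeare_Macbeth.txt
after_mv=$(ls -i house2/office/bookshelf/Shakespeare_Macbeth.txt | awk '{print $1}')

# Copying writes the data again, so the copy gets an inode of its own.
cp house2/office/bookshelf/Shakespeare_Macbeth.txt house1/office/bookshelf/Shakespeare_Macbeth.txt
after_cp=$(ls -i house1/office/bookshelf/Shakespeare_Macbeth.txt | awk '{print $1}')

echo "original: $before, after mv: $after_mv, new copy: $after_cp"
```

The actual numbers differ from one system to another; what matters is that the first two are equal and the third is not.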
Note that the behaviour is different when the directories are located on different storage devices (or on different partitions, to be precise), such as on the computer’s main storage device and a USB drive. Then, the data is first copied to the new location and then deleted from the old location (shown here with pale colours and dotted lines):
Often, it is practical to have the same data available in different places and this can be achieved without making multiple copies of it: one can make links to files and directories elsewhere in the same storage device. These links can be either hard or soft, also known as symbolic. A hard link creates a new pointer to the same location on the storage device:
> rm house2/office/bookshelf/Shakespeare_Macbeth.txt
> ln house1/office/bookshelf/Shakespeare_Macbeth.txt house2/office/bookshelf/Shakespeare_Macbeth.txt
> tree house[12]/office/
house1/office/
└── bookshelf
    ├── Shakespeare_Hamlet.txt
    └── Shakespeare_Macbeth.txt
house2/office/
└── bookshelf
    └── Shakespeare_Macbeth.txt
> ls -l house[12]/office/bookshelf
house1/office/bookshelf:
total 332
-rw-rw---- 1 username pepr_username 206763 Mar 6 15:48 Shakespeare_Hamlet.txt
-rw-rw---- 2 username pepr_username 130397 Mar 6 15:59 Shakespeare_Macbeth.txt
house2/office/bookshelf:
total 128
-rw-rw---- 2 username pepr_username 130397 Mar 6 15:59 Shakespeare_Macbeth.txt
It would seem natural to prefer this option of hard-linking but it has some downsides: typically we want to be able to remove files (and directories) and free the space in the storage device for new data. In this situation, removing the Macbeth book in house1 wouldn’t free the space on the storage device because the Macbeth book in house2 still points to that data. The space would be only freed when both pointers are removed.
How do we know how many hard links a particular piece of data has? Those with sharp eyes may have spotted that the ls -l
command above prints -rw-rw---- 1
for Hamlet but -rw-rw---- 2
for Macbeth. The numbers 1 and 2 tell the count of pointers to that particular data.
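The link count and its effect on removing files can be followed step by step. The sketch below works in a throw-away directory (mktemp -d) with a one-line placeholder file; the shortened directory names are only illustrations.

```shell
# Two directories and one small file in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir house1 house2
echo "placeholder for the text of the book" > house1/Shakespeare_Macbeth.txt

# Create a second name (a hard link) for the same data.
ln house1/Shakespeare_Macbeth.txt house2/Shakespeare_Macbeth.txt

# The second column of ls -l is the number of names pointing to the data.
count_linked=$(ls -l house1/Shakespeare_Macbeth.txt | awk '{print $2}')

# Removing one name leaves the data reachable through the other one.
rm house1/Shakespeare_Macbeth.txt
survivor=$(cat house2/Shakespeare_Macbeth.txt)
count_after_rm=$(ls -l house2/Shakespeare_Macbeth.txt | awk '{print $2}')

echo "links before rm: $count_linked, after rm: $count_after_rm"
echo "still readable: $survivor"
```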
Instead of hard links, it is often practical to create symbolic links. Creating them is often easiest from the place where one wants to see the linked file; one can then give the target of the link using the relative path, that is, with ..
for each step backwards in the path (here, omitting the link name and thus using the target name):
> rm house2/office/bookshelf/Shakespeare_Macbeth.txt
> cd house2/office/bookshelf/
> ln -s ../../../house1/office/bookshelf/Shakespeare_Macbeth.txt
> ls
Shakespeare_Macbeth.txt
> cd ../../../
> tree house[12]/office/
house1/office/
└── bookshelf
    ├── Shakespeare_Hamlet.txt
    └── Shakespeare_Macbeth.txt
house2/office/
└── bookshelf
    └── Shakespeare_Macbeth.txt -> ../../../house1/office/bookshelf/Shakespeare_Macbeth.txt
Alternatively, symbolic links can be created using the absolute path (here, renaming the link as Another_Macbeth.txt
):
> cd house2/office/bookshelf/
> ln -s /users/$USER/IntSciCom/village/house1/office/bookshelf/Shakespeare_Macbeth.txt Another_Macbeth.txt
> ls
Another_Macbeth.txt Shakespeare_Macbeth.txt
They can also be created outside the target directory:
> cd ../../../
> ln -s /users/$USER/IntSciCom/village/house1/office/bookshelf/Shakespeare_Macbeth.txt house2/office/bookshelf/Third_Macbeth.txt
Symbolic links are clearly indicated in the output of ls -l
command, but otherwise, they work (nearly) as any regular files:
> cd /users/$USER/IntSciCom/village
> ls -l house[12]/office/bookshelf/
house1/office/bookshelf/:
total 332
-rw-rw---- 1 username pepr_username 206763 Mar 6 15:48 Shakespeare_Hamlet.txt
-rw-rw---- 1 username pepr_username 130397 Mar 6 15:59 Shakespeare_Macbeth.txt
house2/office/bookshelf/:
total 8
lrwxrwxrwx 1 username pepr_username 90 Mar 6 16:12 Another_Macbeth.txt -> /users/username/IntSciCom/village/house1/office/bookshelf/Shakespeare_Macbeth.txt
lrwxrwxrwx 1 username pepr_username 56 Mar 6 16:03 Shakespeare_Macbeth.txt -> ../../../house1/office/bookshelf/Shakespeare_Macbeth.txt
lrwxrwxrwx 1 username pepr_username 90 Mar 6 16:14 Third_Macbeth.txt -> /users/username/IntSciCom/village/house1/office/bookshelf/Shakespeare_Macbeth.txt
> wc -l house[12]/office/bookshelf/*
7079 house1/office/bookshelf/Shakespeare_Hamlet.txt
4544 house1/office/bookshelf/Shakespeare_Macbeth.txt
4544 house2/office/bookshelf/Another_Macbeth.txt
4544 house2/office/bookshelf/Shakespeare_Macbeth.txt
4544 house2/office/bookshelf/Third_Macbeth.txt
25255 total
So, why would one want to use symbolic links instead of hard links?
One good reason is that symbolic links break when the target file is removed. This is useful if one, e.g., has links to a particular dataset in different places and then needs to update this dataset with a new one (e.g., because of errors in the previous version or having additional observations in the new version). When this is done with symbolic links, the removal of the original file is noticed and the erroneous reference to an outdated data file cannot create errors in the analysis in other directories.
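The different behaviour of the two link types after the removal of the original name can be tested directly. This is a sketch with made-up file names in a throw-away directory (mktemp -d); the tests [ -L name ] and [ -e name ] ask whether the name is a symbolic link and whether it leads to something that exists.

```shell
# One data file with a symbolic and a hard link to it.
workdir=$(mktemp -d)
cd "$workdir"
echo "observations, version 1" > data.csv
ln -s data.csv symbolic_link.csv
ln data.csv hard_link.csv

# Remove the original name.
rm data.csv

# The symbolic link still exists as a link but its target is gone.
if [ -L symbolic_link.csv ] && [ ! -e symbolic_link.csv ]; then
    symbolic_state="broken"
else
    symbolic_state="working"
fi

# The hard link still leads to the old data.
if [ -e hard_link.csv ]; then
    hard_state="working"
else
    hard_state="broken"
fi

echo "symbolic link: $symbolic_state, hard link: $hard_state"
```

An analysis reading symbolic_link.csv would now stop with an error, whereas one reading hard_link.csv would silently continue with the outdated data.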
Copies of files or symbolic links?
In data analysis, one should be careful when editing the primary source file and typically it is better to work on a copy of the original data. For that, one clearly needs the cp
command – or a combination of edit commands that take the original file as the input and produce the modified file as the output. We’ll see examples of that in the next section.
Symbolic links can point to a single file or to a directory. It may take time to learn to use the symbolic links efficiently, but it is good to understand the concept and recognise them if working with non-native data. One can do without symbolic links and always give the full path of the files and directories. However, that can be overly complicated and introduces “hard-coded” file paths in the analysis pipelines which then make the re-use of the code files unnecessarily difficult.
Symbolic links are especially useful if the data files are very large and one has the tendency to make multiple copies of the same file in different places in the file system. Then having symbolic links that point to the one original copy saves disk space. Similarly, symbolic links are useful if the data files change over time: one may forget to update one or more copies of the file and different parts of the analyses end up using different versions of the data. Having one central copy and many pointers to that prevents the error.
If the target of symbolic links is removed, the links to the target won’t work. One can avoid the effort of recreating the links after each update of the target by using a chain of symbolic links. We could have a setup like this:
> tree
.
├── dir1
│   ├── data_v1.csv
│   └── my_data.csv -> data_v1.csv
├── dir2
│   └── my_data.csv -> ../dir1/my_data.csv
└── dir3
    └── my_data.csv -> ../dir1/my_data.csv
Here, the actual data (data_v1.csv
) is in dir1
and the file my_data.csv
is a symbolic link to that. In the directories dir2
and dir3
we have a file my_data.csv
that points to ../dir1/my_data.csv
and thus to the file data_v1.csv
.
Now, an updated data file data_v2.csv
is obtained. One can start using this everywhere by removing the file dir1/my_data.csv
(with the command rm dir1/my_data.csv
) and recreating it as a symbolic link to data_v2.csv
; the symbolic links in dir2
and dir3
then link to data_v2.csv
as well:
> tree
.
├── dir1
│   ├── data_v1.csv
│   ├── data_v2.csv
│   └── my_data.csv -> data_v2.csv
├── dir2
│   └── my_data.csv -> ../dir1/my_data.csv
└── dir3
    └── my_data.csv -> ../dir1/my_data.csv
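The whole update can be replayed in a throw-away directory. The following is a minimal sketch of the setup above, assuming one-line placeholder files instead of real data; the directory is created with mktemp -d.

```shell
# Build the chain of symbolic links in a throw-away directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir1 dir2 dir3
echo "version 1" > dir1/data_v1.csv

# The target of a symbolic link is resolved relative to the link's own directory.
ln -s data_v1.csv dir1/my_data.csv
ln -s ../dir1/my_data.csv dir2/my_data.csv
ln -s ../dir1/my_data.csv dir3/my_data.csv
before=$(cat dir3/my_data.csv)

# A new version arrives: only the link in dir1 is removed and recreated.
echo "version 2" > dir1/data_v2.csv
rm dir1/my_data.csv
ln -s data_v2.csv dir1/my_data.csv

# The untouched links in dir2 and dir3 now lead to the new data.
after_dir2=$(cat dir2/my_data.csv)
after_dir3=$(cat dir3/my_data.csv)

echo "before: $before, after: $after_dir2 (dir2) and $after_dir3 (dir3)"
```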
On computer systems, the hierarchically arranged directories are the “address system” for the data files. The “files” we see in the directories are “labels” or “pointers” to the actual data in the storage device; because of that, they can be easily changed or moved around without any effect on the (possibly very large) data. Copying of files from one place to another is computationally different from (and possibly more time-consuming than) moving of files (or their “labels”) in the directory hierarchy. When working with very large data files or with data files that may change and the updates need to be applied on every copy, it is useful to use symbolic links.