A field is the basic element of data. It is characterized by its length and data type. The content of a field is provided by a user or a program. Depending on the file design, fields may be of fixed or variable length. In the latter case, the field often consists of two or three subfields: the actual value to be stored, the name of the field, and in some cases the length of the field. In other cases of variable-length fields, the length of the field is indicated by the use of special demarcation symbols between fields. Most file systems do not support variable-length fields.
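The two ways of representing a variable-length field can be sketched in a few lines. This is only an illustration: the function names, the byte layout (1-byte name length, 2-byte value length), and the choice of the ASCII unit separator as the demarcation symbol are all assumptions, not something prescribed by the text.

```python
import struct

def encode_tagged(name: str, value: bytes) -> bytes:
    """Encode a field as subfields: name, length of value, value.

    Assumed layout: 1-byte name length, name, 2-byte value length, value.
    Names are limited to 255 bytes and values to 65535 bytes.
    """
    name_bytes = name.encode("utf-8")
    return (struct.pack("B", len(name_bytes)) + name_bytes
            + struct.pack(">H", len(value)) + value)

def decode_tagged(buf: bytes, offset: int = 0):
    """Return (name, value, next_offset) for the field starting at offset."""
    name_len = buf[offset]
    offset += 1
    name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (value_len,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    value = buf[offset:offset + value_len]
    return name, value, offset + value_len

# Assumed demarcation symbol: the ASCII unit separator.
DELIMITER = b"\x1f"

def encode_delimited(values):
    """Join field values with the delimiter; values must not contain it."""
    return DELIMITER.join(values)

def decode_delimited(buf: bytes):
    """Split a delimited byte string back into field values."""
    return buf.split(DELIMITER)
```

The tagged form costs a few bytes per field but lets a reader skip fields it does not understand; the delimited form is more compact but requires that the delimiter never appear inside a value.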
A record is a collection of related fields that can be treated as a unit by some application program. Depending on design, records may be of fixed or variable length. A record is of variable length if some of its fields are of variable length or if the number of fields may vary. In the latter case, each field is usually accompanied by a field name. In either case, the entire record usually includes a length field.
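The role of the record-level length field can be shown with a small sketch. It assumes a 4-byte big-endian length prefix in front of each record; that layout and the function names are illustrative choices only.

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    """Write one record preceded by a 4-byte length field (assumed layout)."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield record payloads until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream (or a truncated header)
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)

# Usage: three variable-length records written to an in-memory stream.
stream = io.BytesIO()
for payload in (b"alpha", b"", b"xyz"):
    write_record(stream, payload)
stream.seek(0)
recovered = list(read_records(stream))
```

Because each record announces its own length, a reader can step from record to record without interpreting the fields inside.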
A file is a collection of similar records. Files have unique file names and may be created and deleted. Restrictions on access control usually apply at the file level.
A database is a collection of related data. A database may contain all the information related to an organization or project. The database itself consists of one or more types of files.
Users and applications wish to make use of files. Typical operations that must be supported include the following.
- Retrieve_All : Retrieve all the records of a file. This operation will be required for an application that must process all the information in the file at one time. For example, an application that produces a summary of the information in the file would need to retrieve all records. This operation is often equated with the term sequential processing because all the records are accessed in sequence.
- Retrieve_One : This operation requires the retrieval of just a single record. Interactive, transaction-oriented applications need this operation.
- Retrieve_Next : This operation requires the retrieval of the record that is “next” in some logical sequence to the most recently retrieved record. Some interactive applications, such as filling in forms, may require such an operation. A program that is performing a search may also use this operation.
- Retrieve_Previous : Similar to Retrieve_Next, but in this case the record that is “previous” to the currently accessed record is retrieved.
- Insert_One : Insert a new record into the file. It may be necessary that the new record fit into a particular position to preserve a sequencing of the file.
- Delete_One : Delete an existing record. Certain linkage or other data structures may need to be updated to preserve the sequencing of the file.
- Update_One : Retrieve a record, update one or more of its fields, and rewrite the updated record back into the file. Again, it may be necessary to preserve sequencing when using this operation. If the length of the record has changed, the update operation is generally more difficult than if length is preserved.
- Retrieve_Few : Retrieve a number of records. For example, an application or user may wish to retrieve all records that satisfy a certain set of criteria.
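The operations listed above can be pictured as methods on a single interface. The following is a minimal in-memory sketch, assuming records are dictionaries identified by a sortable key; the class name `RecordFile` and its method names are hypothetical and only mirror the operation names in the list.

```python
import bisect

class RecordFile:
    """In-memory stand-in for a file of records kept in key sequence."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._records = {}   # key -> record (a dict of fields)

    def insert_one(self, key, record):
        if key in self._records:
            raise KeyError(f"duplicate key: {key!r}")
        bisect.insort(self._keys, key)  # keep the key sequence intact
        self._records[key] = record

    def delete_one(self, key):
        del self._records[key]          # raises KeyError if absent
        self._keys.remove(key)

    def update_one(self, key, **changes):
        self._records[key].update(changes)

    def retrieve_one(self, key):
        return self._records[key]

    def retrieve_all(self):
        return [self._records[k] for k in self._keys]

    def retrieve_next(self, key):
        i = bisect.bisect_right(self._keys, key)
        return self._records[self._keys[i]] if i < len(self._keys) else None

    def retrieve_previous(self, key):
        i = bisect.bisect_left(self._keys, key)
        return self._records[self._keys[i - 1]] if i > 0 else None

    def retrieve_few(self, predicate):
        """Return all records that satisfy the given criteria."""
        return [r for r in self.retrieve_all() if predicate(r)]
```

A real file organization would favor some of these operations at the expense of others, which is what distinguishes the organizations discussed later.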
- File Management Systems:-
A file management system is that set of system software that provides services to users and applications related to the use of files. Typically, the only way that a user or application may access files is through the file management system.
Objectives of File Management System:
- To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the operations listed earlier.
- To guarantee, to the extent possible, that the data in the file are valid.
- To optimize performance, both from the system point of view in terms of overall throughput and from the user’s point of view in terms of response time.
- To provide I/O support for a variety of types of storage device.
- To minimize or eliminate the potential for lost or destroyed data.
- To provide a standardized set of I/O interface routines.
- To provide I/O support for multiple users in the case of multiple-user systems.
The following constitute a minimal set of requirements.
- Each user should be able to create, delete, and change files.
- Each user may have controlled access to other users’ files.
- Each user may control what types of accesses are allowed to the user’s files.
- Each user should be able to restructure the user’s files in a form appropriate to the problem.
- Each user should be able to move data between files.
- Each user should be able to back up and recover the user’s files in case of damage.
- Each user should be able to access the user’s files by a symbolic name.
These objectives and requirements should be kept in mind throughout our discussion of file management systems.
- File System Architecture:-
Different systems will be organized differently, but the layered organization described here is reasonably representative. At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. In file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
The next level is referred to as the basic file system, or the physical I/O level, which is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems. It does not understand the content of the data or the structure of the file involved. The basic file system is often considered part of the operating system.
The basic I/O supervisor is responsible for all file I/O initiation and termination. At this level, control structures are maintained that deal with device I/O, scheduling, and file status. The basic I/O supervisor is concerned with the selection of the device on which file I/O is to be performed, on the basis of which file has been selected. It is also concerned with scheduling disk and tape accesses to optimize performance. I/O buffers are assigned and secondary memory is allocated at this level. The basic I/O supervisor is part of the operating system.
Logical I/O is that part of the file system that allows users and applications to access records. Whereas the basic file system deals with blocks of data, the logical I/O module deals with file records. Logical I/O provides a general-purpose record I/O capability and maintains basic data about files.
Finally, the level of the file system closest to the user is usually termed the access method. It provides a standard interface between applications and the file systems and devices that hold the data.
- Functions of File Management:-
Users and application programs interact with the file system by means of commands for creating and deleting files and for performing operations on files. Before performing any operation, the file system must identify and locate the selected file. This requires the use of some sort of directory to describe the location of all files plus their attributes. In addition, most shared systems enforce user access control: only authorized users are allowed to access particular files in particular ways. The basic operations that a user or application may perform on a file are performed at the record level.
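The directory lookup and access check described above can be sketched as follows. The directory layout, the attribute names, the sample file `payroll.dat`, and the users are all made up for illustration; real systems store far more attributes and use richer access-control schemes.

```python
# Hypothetical directory: file name -> location and attributes.
directory = {
    "payroll.dat": {
        "start_block": 120,
        "length_blocks": 14,
        "owner": "alice",
        # Per-user set of permitted access types (assumed scheme).
        "access": {"alice": {"read", "write"}, "bob": {"read"}},
    },
}

def open_file(name, user, mode):
    """Locate a file and verify that the user may access it in this mode.

    Returns (start_block, length_blocks) on success.
    """
    entry = directory.get(name)
    if entry is None:
        raise FileNotFoundError(name)
    if mode not in entry["access"].get(user, set()):
        raise PermissionError(f"{user} may not {mode} {name}")
    return entry["start_block"], entry["length_blocks"]
```

Only after both steps succeed does the file system go on to perform record-level operations on the file.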
Users and applications are concerned with records, but I/O is done on a block basis. Thus, the records of a file must be blocked for output and unblocked after input. To support block I/O of files, several functions are needed. The secondary storage must be managed. This involves allocating files to free blocks on secondary storage and managing free storage so as to know what blocks are available for new files and growth in existing files. Both disk scheduling and file allocation are concerned with optimizing performance. The optimization will depend on the structure of the files and the access patterns.
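Blocking and unblocking are easiest to see with fixed-length records. The sketch below assumes 512-byte blocks and 100-byte records that do not span block boundaries; both sizes and the function names are arbitrary choices for illustration.

```python
BLOCK_SIZE = 512    # bytes per block (assumed)
RECORD_SIZE = 100   # bytes per fixed-length record (assumed)
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE  # 5; the last 12 bytes go unused

def block_records(records):
    """Pack fixed-length records into zero-padded blocks for output."""
    blocks = []
    for i in range(0, len(records), RECORDS_PER_BLOCK):
        chunk = b"".join(records[i:i + RECORDS_PER_BLOCK])
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return blocks

def unblock_record(blocks, record_number):
    """Extract one record (numbered from 0) from the block that holds it."""
    block_index, slot = divmod(record_number, RECORDS_PER_BLOCK)
    start = slot * RECORD_SIZE
    return blocks[block_index][start:start + RECORD_SIZE]
```

With these sizes, record 7 lives in block 1 at slot 2, so a single block read is enough to retrieve it.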
The figure given below suggests a division between what might be considered the concerns of the file management system as a separate system utility and the concerns of the operating system, with the point of intersection being record processing.
- The Pile:-
The least complicated form of file organization may be termed the pile. Data are collected in the order in which they arrive. Each record consists of one burst of data. The purpose of the pile is simply to accumulate the mass of data and save it. Records may have different fields, or they may have similar fields in different orders. Thus each field should be self-describing, including a field name as well as a value. The length of each field must be implicitly indicated by delimiters, explicitly included as a subfield, or known as default for that field type.
Because there is no structure to the pile file, record access is by exhaustive search. That is, if we wish to find a record that contains a particular field with a particular value, it is necessary to examine each record in the pile until the desired record is found or the entire file has been searched. If we wish to find all records that contain a particular field or contain that field with a particular value, then the entire file must be searched.
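A pile and its exhaustive search can be mimicked with a list of self-describing records. The sample data and function names below are invented for illustration; dictionaries stand in for records whose fields carry their own names.

```python
# Records in arrival order, each with its own set of named fields.
pile = [
    {"name": "Ada", "dept": "R&D"},
    {"sensor": "t1", "reading": 21.5},
    {"dept": "Sales", "name": "Bob", "phone": "555-0100"},
]

def find_first(pile, field, value):
    """Examine records in order until one has field == value."""
    for record in pile:
        if field in record and record[field] == value:
            return record
    return None  # the entire pile was searched without a match

def find_all_with_field(pile, field):
    """Finding every record with a field always scans the whole pile."""
    return [record for record in pile if field in record]
```

Both functions take time proportional to the size of the pile, which is the cost the more structured organizations try to avoid.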
Pile files are encountered when data are collected and stored before processing or when data are not easy to organize. This type of file uses space well when the stored data vary in size and structure. Pile files are perfectly adequate for exhaustive searches, and are easy to update. However, beyond these limited uses, this type of file is unsuitable for most applications.
- The Sequential File:-
The most common form of file structure is the sequential file. In this type of file, a fixed format is used for records. All records are of the same length, consisting of the same number of fields of fixed length in a particular order. Because the length and position of each field are known, only the values of fields need to be stored; the field name and the length for each field are attributes of the file structure.
One particular field, usually the first field in each record, is referred to as the key field. The key field uniquely identifies the record; thus, key values for different records are always different. The records are stored in key sequence: alphabetical order for a text key and numerical order for a numerical key. Sequential files are typically used in batch applications.
For interactive applications that involve queries or updates of individual records, the sequential file provides poor performance. Access requires the sequential search of the file for a key match. If the entire file, or a large portion of the file, can be brought into main memory at one time, more efficient search techniques are possible.
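The difference between a sequential search and a "more efficient search technique" can be illustrated with binary search, which is one such technique once the records are in memory (the text does not name a specific one). Records are modeled here as `(key, data)` tuples already in key sequence; the names are illustrative.

```python
import bisect

def sequential_search(records, key):
    """Scan in key order; stop as soon as the desired key has been passed."""
    for record in records:
        if record[0] == key:
            return record
        if record[0] > key:
            break
    return None

def binary_search(keys, records, key):
    """Search an in-memory file using a precomputed, sorted list of keys."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

# Usage: the key list is extracted once, when the file is loaded.
records = [(101, "Ada"), (205, "Bob"), (309, "Cy")]
keys = [r[0] for r in records]
```

The sequential version examines on the order of n records per lookup, while the binary version needs about log2(n) comparisons, but only if the file fits in main memory.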
Additions to the file also present problems. Typically, a sequential file is stored in simple sequential ordering of the records within blocks. That is, the physical organization of the file on tape or disk directly matches the logical organization of the file. In this case, the usual procedure is to place new records in a separate pile file, called a log file or transaction file. Periodically, the log file is merged with the main file in batch mode to produce a new file in correct key sequence. An alternative is to physically organize the sequential file as a linked list. One or more records are stored in each physical block. Each block on disk contains a pointer to the next block.
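New records collected in a log file eventually have to be folded back into the main file in key sequence. A minimal sketch of such a batch merge follows, assuming `(key, data)` tuples, a main file already sorted by key, and a log in arrival order; the function name is hypothetical.

```python
import heapq

def merge_log(master, log):
    """Return a new file in key sequence from a sorted master and an unsorted log."""
    sorted_log = sorted(log, key=lambda record: record[0])
    # heapq.merge combines already-sorted inputs in a single pass.
    return list(heapq.merge(master, sorted_log, key=lambda record: record[0]))

# Usage: two new records arrived out of order since the last merge.
master = [(10, "a"), (30, "c")]
log = [(40, "d"), (20, "b")]
new_master = merge_log(master, log)
```

The merge reads each file once, which suits tape and other devices where sequential access is cheap and random access is not.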
- The Indexed Sequential File:-
The most popular approach to overcoming the disadvantages of the sequential file is the indexed sequential file. The indexed sequential file maintains the key characteristic of the sequential file: records are organized in sequence based on a key field. Two features are added: an index to the file to support random access, and an overflow file. The index provides a lookup capability to quickly reach the vicinity of a desired record. The overflow file is similar to the log file used with a sequential file, but it is integrated so that records in the overflow file are located by following a pointer from their predecessor record.
In the simplest indexed sequential structure, a single level of indexing is used. The index in this case is a simple sequential file. Each record in the index file consists of two fields: a key field, which is the same as the key field in the main file, and a pointer into the main file. To find a specific record, the index is searched to find the highest key value that is equal to or precedes the desired key value. The search continues in the main file at the location indicated by the pointer.
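The two-step lookup just described can be sketched with a single-level index over an in-memory main file. The sample data, the choice of one index entry per five records, and the function name are assumptions made for illustration.

```python
import bisect

# Main file in key sequence: keys 10, 20, ..., 200.
main_file = [(k, f"record-{k}") for k in range(10, 210, 10)]

STRIDE = 5  # one index entry per 5 main-file records (assumed)
# Each index entry is (key, pointer), where pointer is a main-file position.
index = [(main_file[i][0], i) for i in range(0, len(main_file), STRIDE)]
index_keys = [key for key, _ in index]

def lookup(key):
    """Find a record via the index, then scan the main file from the pointer."""
    # Highest index key that is equal to or precedes the desired key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # the key precedes every record in the file
    position = index[slot][1]
    for record in main_file[position:]:
        if record[0] == key:
            return record
        if record[0] > key:
            break
    return None
```

Here the index holds the keys 10, 60, 110, and 160, so a lookup of key 130 jumps to position 10 and scans at most a handful of records instead of the whole file.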
Additions to the file are handled in the following manner. Each record in the main file contains an additional field not visible to the application, which is a pointer to the overflow file. When a new record is to be inserted into the file, it is added to the overflow file. The record in the main file that immediately precedes the new record in logical sequence is updated to contain a pointer to the new record in the overflow file. If the immediately preceding record is itself in the overflow file, then the pointer in that record is updated. As with the sequential file, the indexed sequential file is occasionally merged with the overflow file in batch mode.
The indexed sequential file greatly reduces the time required to access a single record without sacrificing the sequential nature of the file. To process the entire file sequentially, the records of the main file are processed in sequence until a pointer to the overflow file is found. Then accessing continues in the overflow file until a null pointer is encountered.
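The insertion rule and the sequential scan through overflow pointers can be sketched together. The `Record` class, the use of list positions as pointers, and the handling of edge cases (duplicate keys, keys smaller than the first record) are assumptions of this sketch rather than details given in the text.

```python
class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.overflow = None  # position in the overflow file, or None (null)

def scan(main_file, overflow_file):
    """Yield every record in key sequence, following overflow pointers."""
    for record in main_file:
        yield record
        pointer = record.overflow
        while pointer is not None:
            chained = overflow_file[pointer]
            yield chained
            pointer = chained.overflow

def insert(main_file, overflow_file, key, data):
    """Add a record to the overflow file and link it from its predecessor."""
    predecessor = None
    for record in scan(main_file, overflow_file):
        if record.key == key:
            raise KeyError(f"duplicate key: {key!r}")
        if record.key > key:
            break
        predecessor = record
    if predecessor is None:
        raise ValueError("key precedes the first record; not handled here")
    new_record = Record(key, data)
    new_record.overflow = predecessor.overflow  # keep any existing chain
    overflow_file.append(new_record)
    predecessor.overflow = len(overflow_file) - 1

# Usage: the main file is never reorganized; new records go to overflow.
main_file = [Record(10, "a"), Record(20, "b"), Record(30, "c")]
overflow_file = []
for key in (15, 12, 17, 35):
    insert(main_file, overflow_file, key, f"new-{key}")
keys_in_sequence = [r.key for r in scan(main_file, overflow_file)]
```

The predecessor search here walks the file from the start for simplicity; in practice the index would be used to reach the vicinity of the insertion point first.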
- The Indexed File:-
The indexed sequential file retains one limitation of the sequential file: effective processing is limited to that which is based on a single field of the file. When it is necessary to search for a record on the basis of some attribute other than the key field, both forms of sequential file are inadequate.
To achieve this flexibility, a structure is needed that employs multiple indexes, one for each type of field that may be the subject of a search. In the general indexed file, the concepts of sequentiality and a single key are abandoned. Records are accessed only through their indexes. The result is that there is now no restriction on the placement of records so long as a pointer in at least one index refers to that record.
Two types of indexes are used. An exhaustive index contains one entry for every record in the main file. The index itself is organized as a sequential file for ease of searching. A partial index contains entries only for records where the field of interest exists. With records of variable length, some records will not contain all fields. When a new record is added to the main file, all the index files must be updated. Indexed files are used mostly in applications where timeliness of information is critical and where data are rarely processed exhaustively. Examples are airline reservation systems and inventory control systems.
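The distinction between an exhaustive and a partial index shows up clearly in a small sketch. It assumes records are dictionaries, every record has an `id` field, and only some have an `email` field; these field names and the helper functions are invented for illustration.

```python
from collections import defaultdict

records = []                  # main file: arrival order, no key sequence
by_id = {}                    # exhaustive index: one entry per record
by_email = defaultdict(list)  # partial index: only records with an email

def add_record(record):
    """Append a record and update every index that applies to it."""
    position = len(records)
    records.append(record)
    by_id[record["id"]] = position
    if "email" in record:
        by_email[record["email"]].append(position)
    return position

def find_by_email(email):
    """Reach records only through the index, never by scanning the file."""
    return [records[p] for p in by_email.get(email, [])]

# Usage: the second record has no email, so the partial index skips it.
add_record({"id": 1, "name": "Ada", "email": "ada@example.com"})
add_record({"id": 2, "name": "Bob"})
```

Every insertion touches each applicable index, which is the update cost paid in exchange for fast lookup on more than one field.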
- The Direct, or Hashed, File:-
The direct, or hashed, file exploits the capability found on disks to directly access any block of a known address. As with sequential and indexed sequential files, a key field is required in each record. However, there is no concept of sequential ordering here. Direct files are often used where very rapid access is required, where records of fixed length are used, and where records are always accessed one at a time. Examples are directories, pricing tables, schedules, and name lists.
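A hashed file can be approximated by computing a block address directly from the key. In the sketch below, the number of blocks, the use of CRC-32 as the hash function, and the handling of collisions by keeping several records in one block are all assumptions; the text does not specify any of them.

```python
import zlib

NUM_BLOCKS = 8  # assumed size of the file, in blocks

class HashedFile:
    def __init__(self, num_blocks=NUM_BLOCKS):
        # Each block holds a short list of (key, record) pairs.
        self.blocks = [[] for _ in range(num_blocks)]

    def _address(self, key: str) -> int:
        """Map a key to a block address; CRC-32 is stable across runs."""
        return zlib.crc32(key.encode("utf-8")) % len(self.blocks)

    def put(self, key, record):
        block = self.blocks[self._address(key)]
        for i, (existing_key, _) in enumerate(block):
            if existing_key == key:
                block[i] = (key, record)  # replace the existing record
                return
        block.append((key, record))

    def get(self, key):
        """One address computation, then a search within a single block."""
        for existing_key, record in self.blocks[self._address(key)]:
            if existing_key == key:
                return record
        return None

# Usage: a small pricing table accessed one record at a time.
prices = HashedFile()
prices.put("widget", 9.99)
prices.put("gadget", 24.5)
prices.put("widget", 10.49)
```

No ordering of keys is maintained, so operations such as Retrieve_Next have no natural meaning for this organization.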