File Processing Systems
Even the earliest business computer systems were used to process business records and produce information. They were generally faster and more accurate than equivalent manual systems. These systems stored groups of records in separate files, and so they were called file processing systems. Although file processing systems are a great improvement over manual systems, they do have the following limitations:
Data is separate and isolated.
Data is often duplicated.
Application programs are dependent on file formats.
It is difficult to represent complex objects using file processing systems.
Data is separate and isolated. Recall that as the marketing manager you needed to relate sales data to customer data. Somehow you need to extract data from both the CUSTOMER and ORDER files and combine it into a single file for processing. To do this, computer programmers determine which parts of each of the files are needed. Then they determine how the files are related to one another, and finally they coordinate the processing of the files so the correct data is extracted. This data is then used to produce the information. Imagine the problems of extracting data from ten or fifteen files instead of just two!
Data is often duplicated. In the record club example, a member’s name, address, and membership number are stored in both files. Although this duplicate data wastes a small amount of file space, that is not the most serious problem with duplicate data. The major problem concerns data integrity. A collection of data has integrity if the data is logically consistent. This means, in part, that duplicated data items agree with one another. Poor data integrity often develops in file processing systems. If a member were to change his or her name or address, then all files containing that data need to be updated. The danger lies in the risk that all files might not be updated, causing discrepancies between the files.
Data integrity problems are serious. If data items differ, inconsistent results will be produced. A report from one application might disagree with a report from another application. At least one of them will be incorrect, but who can tell which one? When this occurs, the credibility of the stored data comes into question.
Application programs are dependent on file formats. In file processing systems, the physical formats of files and records are entered in the application programs that process the files. In COBOL, for example, file formats are written in the DATA DIVISION.
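To make the idea of a file format written into a program concrete, here is a minimal sketch in Python rather than COBOL. The field names, widths, and sample record are assumptions chosen for illustration; they are not the actual CUSTOMER layout. The point is only that the program itself carries the physical description of the record, and every field's position follows from the widths of the fields before it.

```python
# Hypothetical fixed-width layout for a CUSTOMER record: (field name, width).
# These names and widths are illustrative assumptions, not a real file format.
CUSTOMER_LAYOUT = [
    ("member_number", 6),
    ("name", 20),
    ("zip_code", 5),
    ("balance_cents", 8),
]


def parse_customer(record: str) -> dict:
    """Slice one fixed-width record into fields using the hard-coded layout."""
    fields = {}
    offset = 0
    for field_name, width in CUSTOMER_LAYOUT:
        # Each field starts where the previous one ended, so its position
        # depends on the width of every field that precedes it.
        fields[field_name] = record[offset:offset + width].strip()
        offset += width
    return fields


# A sample record built to match the layout above (made-up data).
SAMPLE_RECORD = "000123" + "Mary Jones".ljust(20) + "98105" + "00004250"

customer = parse_customer(SAMPLE_RECORD)
print(customer["name"], customer["balance_cents"])
```

In this sketch the layout lives inside the program, so any other program that reads the same file would need its own matching copy of the field widths.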
The problem with this arrangement is that changes in file formats result in program updates. For example, if the Customer record were modified to expand the ZIP Code field from five to nine digits, all programs that use the Customer record need to be modified, even if they do not use the ZIP Code field. There might be twenty programs that process the CUSTOMER file. A change like this one means that a programmer needs to identify all the affected programs, then modify and retest them. This is both time-consuming and error-prone. It is also very frustrating to have to modify programs that do not even use the field whose format changed.
It is difficult to represent complex objects using file processing systems. This last weakness of file processing systems may seem a bit theoretical, but it is an important shortcoming.