Alternative Dimensional Reduction Methods on the Example of Data Preprocessing to Build a Ship Exhaust Model
Purpose: Systems modeling is one of the basic methods of scientific research. The modeled systems are often multidimensional, which is a serious additional obstacle. Richard Bellman formulated the concept of the "curse of dimensionality", which says that as the dimensionality of a system increases, the difficulties grow geometrically. In this article, the authors consider the problem of dimensionality reduction for the purpose of building a model of a ship's exhaust emissions. It was observed that some of the data are incomplete or have little impact on the baseline variables.
Design/methodology/approach: Two methods of dimensionality reduction were applied (Pearson's linear correlation coefficient and the arc-angle index), and their suitability for this process is discussed.
Findings: The methods used made it possible to obtain information on the significance level of each of the model inputs. How useful each method is also depends on the context, on data availability, and on other factors, so further research in this direction is worthwhile.
Practical implications: Modeling is needed mainly when we must rely on measurement data and the modeled system itself is unknown. Often the only thing the researcher has available is a set of measurement data, which very often lacks metadata. Nevertheless, even basic information about the input and output data is enough to create a model. The problem, however, may be an excess of data. Although storing the data is no longer difficult, transmitting it (e.g. over the Internet) may still be troublesome.
Originality/value: Significance analysis typically relies on methods that take the value of variance into account. One such method is PCA (principal component analysis) (Sorzano 2014, Scholkopf 1997). The originality of the approach described in this article lies instead in building a ranking of the significance of the individual variables.
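The idea of ranking input variables by significance can be illustrated with a minimal sketch based on Pearson's linear correlation coefficient. This is not the authors' implementation: the function names `pearson` and `rank_inputs`, the variable names, and all data values below are hypothetical, and the choice to score a constant column as 0 is an assumption made here for simplicity. The arc-angle index is not shown because its definition is not given in this section.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column carries no linear information.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_inputs(inputs, output):
    """Return (name, |r|) pairs sorted from strongest to weakest linear relation."""
    scores = {name: abs(pearson(col, output)) for name, col in inputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical measurement data; names and values are invented for illustration.
inputs = {
    "engine_speed": [600, 700, 800, 900, 1000],
    "fuel_flow": [10, 14, 13, 19, 20],
    "ambient_temp": [15, 15, 15, 15, 15],
}
emission = [120, 140, 160, 180, 200]  # hypothetical model output

ranking = rank_inputs(inputs, emission)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Inputs at the bottom of such a ranking are candidates for removal before the model is built. Note that the absolute correlation only captures linear dependence, so a low score does not by itself prove that a variable is irrelevant.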