Maximum Entropy Distribution Function and Uncertainty Evaluation Criteria

Marine environmental design parameter extrapolation has important applications in marine engineering and coastal disaster prevention. The distribution models used for environmental design parameters usually pass the hypothesis tests in statistical analysis, but the results calculated from different distribution models often vary widely. In this paper, based on information entropy, overall uncertainty test criteria are studied for commonly used distributions, including the Gumbel, Weibull, and Pearson-III distributions. An improved method for parameter estimation of the maximum entropy distribution model is proposed on the basis of moment estimation. The study shows that the information entropy increases with the number of sample data and with the degree of dispersion, and that the overall uncertainty of the maximum entropy distribution model is minimal compared with the other models.


Research background
Design parameter extrapolation models are used for the statistical analysis of observed data when extrapolating the return level under extreme sea conditions. Several models can be used to calculate the design parameters for marine environmental conditions: the Weibull distribution is commonly used internationally; the Gumbel distribution is recommended in China for extreme tide levels in port engineering codes; and the Pearson-III distribution is recommended in port hydrological codes for meteorological and hydrological elements such as extreme waves. For the same set of measured data in the same marine environment, all three probability distribution models may pass the hypothesis test when checked with probability paper, curve fitting, or distribution function fitting; however, the design parameter values obtained from the different extrapolation models can differ considerably. If a lower design standard is chosen for economic reasons, it may bring hidden risks to marine structures, and an extreme event could then cause losses far greater than the savings; if a more conservative design standard is chosen for safety reasons, it may increase the project investment and cause unnecessary waste. There is as yet no unified standard for selecting the distribution function used to extrapolate design parameters, so the extrapolated return level is subject to large uncertainties.
The reasonable estimation of marine engineering design parameters is closely related to the data sampling method, the method of determining the distribution function, and the accuracy of parameter estimation. The common approach is to organize the measured data into groups and to model the series formed by the maximum value of each group as the extreme series, the so-called Block Maxima Method (BMM). In BMM, the Fisher-Tippett extreme value theorem guarantees that the limiting distribution of the maxima within a group must belong to one of three distributions (Gumbel, Fréchet, and Weibull) or to their generalized form, the generalized extreme value (GEV) distribution (Chen et al., 2019a; Liu et al., 2020; Xu et al., 2018). Because oceanographic measurements are costly to acquire, analyzing only the group maxima wastes a large amount of data, and one group may contain more 'extreme' information than others without that information being fully utilized. To make fuller use of the extreme value information in the data, the Generalized Pareto distribution (GPD) model can be applied to all observations above a sufficiently large threshold; it is therefore also known as the Peaks-Over-Threshold (POT) model (Liu et al., 2019d; Xu and Lei, 2019). E. J. Gumbel evaluated the extreme value theory used in frequency analysis in Statistics of Extremes, stating that extrapolation through extreme value models is a reasonable method for estimating the probability of extreme events (Gumbel, 2011). Muraleedharan et al. (2007) used shallow water wave height data from the Arabian Sea to verify that the Weibull distribution can effectively fit multiple wave height statistics, including the extreme wave height. For the long-term prediction of extreme wind speed, Pavia and O'Brien (2010) used the Weibull distribution to extrapolate the probability distribution of global ocean wind speed data. Lo Brano et al. (2011) also verified that the Pearson-III and Gumbel distributions can reasonably fit the original wind speed data series and quantified the uncertainty in the frequency analysis. Ma and Liu (1979) proposed a composite extreme value model when studying the extreme value distribution of design wave heights in typhoon-affected waters. Dong and Liu (1999) proposed a gray Markov prediction model for annual extreme water levels. Zhang and Xu (2005) proposed a maximum entropy probability distribution function. Wang et al. (2013) proposed a multidimensional composite extreme value distribution model. Chen et al. (2020) studied the statistical characteristics of marine environmental elements in the time and space dimensions within a stochastic process setting and introduced the concept of stopping time into the analysis of storm surges. The design values of environmental parameters obtained from probability distribution functions such as the one-dimensional composite extreme value distribution, the maximum entropy distribution, and the multidimensional composite extreme value distribution necessarily differ. The differences between the values obtained by different methods may appear small in theoretical studies, but their impact in engineering applications can be huge (Wang et al., 2016; Liu et al., 2019c). Adding one foot to a design height can increase the investment by millions of dollars.
Existing studies on design wave height estimation focus on the selection of different types of distribution functions or on improving the estimation of the parameters in those functions, but few researchers have studied how to quantify the errors in the calculation results caused by different distribution models or different parameter estimation methods. Liu et al. (1996) analyzed the sensitivity and uncertainty of the reliability of marine engineering structures and introduced the concept of uncertainty into marine engineering. Sun (2002) summarized the calculation of confidence factors of measurement uncertainty for common distributions. Li et al. (2006) discussed the uncertainty of the entropy of multidimensional random variables. Although these studies address the uncertainty of random variables, they either do not investigate the uncertainty of different parameter estimation methods or do not analyze the accuracy and stability of the calculated results on the basis of an engineering example. Therefore, a criterion for evaluating the uncertainty of design parameter estimates due to different models and different parameter estimation methods is needed to improve the accuracy and stability of design parameter estimation for the marine environment.
The maximum entropy distribution function is derived from the principle of maximum entropy in information theory. It keeps the entropy as large as possible with respect to unknown information while fitting the known data well, which avoids, to some extent, the subjective interference of an artificially selected distribution function (Liu et al., 2019b; Song et al., 2019). Applying the maximum entropy distribution function, obtained by imposing axioms and constraints that express known facts, to estimate the return level is an advance in the theoretical approach. The maximum entropy distribution function uses information entropy to measure uncertainty in the construction process. Information entropy is a concave (∩-convex) measurement function. It is a functional of the distribution function that measures the uncertainty of a system described by a probability distribution, and its value is maximal when all events are equally probable. The maximum entropy theorem ensures the existence of a unique maximum entropy; that is, each random variable in the system corresponds to a unique value of information entropy, and the information entropy of the system as a whole can be obtained through local measurement of the probability structure. The overall uncertainty of the system can thus be described quantitatively, and the most applicable design parameter estimation model can be selected in engineering applications according to the calculated overall uncertainty.
In this paper, based on information entropy, the overall uncertainty test criterion of the distribution function is investigated for the Gumbel, Weibull, and Pearson-III distribution models and other commonly used alternative models, in terms of the number of data samples, the degree of dispersion, and the sampling random error introduced by the estimation of the model parameters.
We also improve the parameter estimation method of the maximum entropy distribution model. We study how to evaluate the uncertainty caused by the choice among different extreme value distributions and the uncertainty caused by the parameter estimators in the model, and we derive a set of nonlinear equations for estimating the parameters of the maximum entropy function from statistical characteristics such as mean, variance, skewness, and kurtosis, which can effectively reduce the computation time of the parameter solution. Using information entropy as the tool for calculating uncertainty, the overall uncertainty of several commonly used alternative models, such as the Gumbel, Weibull, and Pearson-III distribution models, is given. By combining the number of data samples, the degree of dispersion, and the sampling random error introduced by the estimation of model parameters, reference criteria covering the distribution form of the model, method comparison, and selection of the optimal model are proposed to provide a more comprehensive test for calculating the overall uncertainty of the distribution function.

Evaluation method for distribution function uncertainty
The marine environmental factors considered in this paper are wave heights, which are positive random variables, so the daily wave height observations x_1, x_2, …, x_n are treated as a simple random sample: they are mutually independent and share the same distribution F(x) (the original distribution). We therefore set the range of X to [0, ∞) in the Gumbel distribution, the Weibull distribution, and the maximum entropy distribution. Setting q(x) ≡ 0 for x < 0 extends the range of X to (−∞, ∞). The integral constraints only require each integral to be finite; they do not require the integral to equal the same constant for all density functions, so different constants c_i are used in the following calculations. For any independent extreme value random variable, the extreme value theorem implies that the limit distribution must take one of three forms: Gumbel, Fréchet, or Weibull. Fig. 1 and Fig. 2 show the density functions of the three standard distributions for different values of α. If the three distribution families are unified into one generalized extreme value (GEV) distribution, then the data themselves determine, through the shape parameter ξ, the most suitable tail behavior, so there is no need to decide in advance which family to choose. More importantly, estimating the shape parameter ξ also reduces the uncertainty in selecting the appropriate distribution family for the data set itself. Fig. 3 shows the GEV density function for positive, negative, and zero ξ, with μ = 0 and σ = 1.
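The three shapes of the GEV density described above can be sketched numerically. The following is a minimal illustration, assuming the common parameterization in which ξ > 0 gives the Fréchet type, ξ = 0 the Gumbel type, and ξ < 0 the bounded Weibull type; the function name `gev_pdf` and the evaluation points are hypothetical choices, not taken from the paper.

```python
import math

def gev_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Density of the generalized extreme value distribution at x.

    Assumed parameterization: F(x) = exp(-(1 + xi*z)^(-1/xi)), z = (x - mu)/sigma.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # xi = 0 is the Gumbel limit of the family
        t = math.exp(-z)
        return t * math.exp(-t) / sigma
    base = 1.0 + xi * z
    if base <= 0.0:
        # x lies outside the support of the distribution
        return 0.0
    t = base ** (-1.0 / xi)
    return t ** (xi + 1.0) * math.exp(-t) / sigma

# Evaluate the density for positive, zero, and negative shape with mu = 0, sigma = 1
for xi in (0.5, 0.0, -0.5):
    values = [round(gev_pdf(x, 0.0, 1.0, xi), 4) for x in (-1.0, 0.0, 1.0, 3.0)]
    print(xi, values)
```

With a negative shape parameter the density vanishes above μ − σ/ξ, which is the bounded upper tail associated with the Weibull type.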
Information entropy measures the overall information of a source in the average sense. Entropy reflects the degree of disorder, and the uncertainty of random events can be described through the probability distribution function. If the density function of a continuous random variable X is f(x), the information entropy is $H(X) = -\int f(x)\ln f(x)\,dx$. Let the mathematical expectation and standard deviation of X be μ and σ, respectively; the standardized variable is then $Y = (X-\mu)/\sigma$, with density $g(y) = \sigma f(\sigma y + \mu)$ and standardized entropy
$$H(Y) = H(X) - \ln\sigma. \qquad (1)$$
It can be seen from Eq. (1) that the information entropy of the random variable differs from the standardized information entropy by a constant, and this constant is exactly the logarithm of the standard deviation. Therefore, for the same kind of random variable, the information entropy increases as the standard deviation increases. Information entropy indicates the uncertainty of a random variable, whereas the standard deviation or variance represents its degree of dispersion.
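The statement that the entropy and the standardized entropy differ by the logarithm of the standard deviation can be checked numerically. The sketch below uses a normal density purely as a convenient test case (it is not one of the paper's models), and the helper names, the values μ = 2 and σ = 3, and the integration limits are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density, used here only as a convenient test case."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def differential_entropy(pdf, lo, hi, steps=20_000):
    """Approximate -integral of f ln f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        f = pdf(lo + (i + 0.5) * h)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

mu, sigma = 2.0, 3.0  # illustrative values
h_x = differential_entropy(lambda x: normal_pdf(x, mu, sigma),
                           mu - 12.0 * sigma, mu + 12.0 * sigma)
h_y = differential_entropy(lambda y: normal_pdf(y, 0.0, 1.0), -12.0, 12.0)

# The difference of the two entropies should match ln(sigma)
print(h_x - h_y, math.log(sigma))
```

The same check could be repeated with any density for which the integral is finite; only the scale parameter enters the difference.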
The concept of data dispersion has various definitions, depending on the research content and the measurement standard. In this paper, E(x−u | x>u) denotes the expected excess of the measured data over a value u, which is defined as the threshold. It follows from this definition that the length of the measured record, the quantity of data, the extreme value model, and the fluctuation range chosen for the flat part of the trend line all affect the judgment of the degree of data dispersion.
Obviously, the larger the range of the data samples, the larger the uncertainty. When the range of values is bounded, the entropy reaches its maximum when the random variable is uniformly distributed, that is, when it can take any value in the entire range with equal probability. The entropy values for several common distributions are calculated below.
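As a worked illustration of the statement about bounded ranges (the interval endpoints a and b are illustrative symbols, not notation from the paper), the entropy of a uniform density on [a, b] can be written out directly:

```latex
q(x) = \frac{1}{b-a}, \quad a \le x \le b
\;\Longrightarrow\;
H(X) = -\int_a^b \frac{1}{b-a}\,\ln\frac{1}{b-a}\,dx = \ln(b-a).
```

The value ln(b − a) grows with the width of the range, which is consistent with the remark that a larger range of data corresponds to a larger uncertainty.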
Eqs. (2)−(6) give the expressions for the entropy of the different models. Besides the parameters of each model's distribution function, another quantity c_i needs to be calculated. Landwehr et al. (1979), Qu et al. (1987), and Song and Ding (1988) discuss the parameter solving process for the Gumbel, Weibull, and Pearson-III distributions, respectively; Zhang and Xu (2005) give a method for solving the parameters of the maximum entropy model by the method of moments. In this paper, we further improve that method by solving for the parameters from a set of nonlinear equations built on four quantities: mean, variance, skewness, and kurtosis.
The mathematical expectation of the extreme wave height X is denoted by E(X) and its variance by D(X). The formula is then substituted into the constraint condition, and finally we have (let ): And there will be Substituting Eq. (8) into Eq. (9) yields Then we have Use A_m to denote the m-th origin moment. Hence, we have Let , and B_k, S, and K represent the k-th central moment, the skewness, and the kurtosis of the extreme wave height distribution, respectively. According to the definitions and the relationship between the origin moments and the central moments, we can have From Eqs. (14), (16) and (17), we can obtain: We can obtain and from Eq. (18), and then from Eq. (11) we can have the value for , thus, can be obtained if we put all these into Eq. (8).
In application, the threshold is first determined from the flat part of the trend line, and the over-threshold data group is obtained. The origin moments A_m and the central moments B_k are then calculated from this sample. Next, the parameters determined by Eqs. (16)−(18) are calculated, and finally the remaining parameters of the maximum entropy distribution are calculated from Eqs. (8) and (11).
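The four sample statistics that feed the nonlinear equations can be computed as in the following minimal sketch. It assumes the usual definitions (central moments divided by n, skewness B_3 / B_2^1.5, and non-excess kurtosis B_4 / B_2^2, so that a normal sample gives a kurtosis near 3); the function name and the five data values are hypothetical and are not the paper's wave height data.

```python
def sample_moments(data):
    """Return (mean, variance, skewness, kurtosis) of a sample.

    Assumed conventions: central moments B_k use the divisor n,
    skewness = B_3 / B_2**1.5, kurtosis = B_4 / B_2**2 (not excess kurtosis).
    """
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(data) / n  # first origin moment A_1

    def central(k):
        # k-th central moment B_k
        return sum((x - mean) ** k for x in data) / n

    b2, b3, b4 = central(2), central(3), central(4)
    if b2 == 0.0:
        raise ValueError("variance is zero; skewness and kurtosis are undefined")
    return mean, b2, b3 / b2 ** 1.5, b4 / b2 ** 2

# Hypothetical over-threshold values, for illustration only
mean, variance, skewness, kurtosis = sample_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, variance, skewness, kurtosis)
```

These four numbers are the only inputs the improved moment method is described as needing; solving the nonlinear equations themselves is not shown because the equations are not reproduced here.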
It should also be noted that the convergence of the integral of q(x) ln x over (0, ∞) cannot be established in this way. The infinite integral of q(x) is convergent, and ln x is monotonic on (0, ∞) but unbounded, so the conditions of Abel's test are not satisfied and the test does not show that the infinite integral converges. Nevertheless, q(x) ln x → 0 as x → 0 and as x → ∞, which makes it possible to calculate the parameters. In fact, under marine environmental conditions one can always find a sufficiently small N and a sufficiently large M such that the integral can be evaluated over the bounded interval [N, M]. This is acceptable in practical applications because, although an extreme value can in theory be arbitrarily large or small, in practice it is always bounded, which gives Eq. (20). In summary, for the different models and their corresponding distribution functions, the information entropy can be obtained from a few parameters and a definite integral, and the difference between this information entropy and the maximum entropy value serves as the measure of the uncertainty of the model's distribution form.

Research on uncertainty of model parameter estimation
The previous section introduced the influence of the model distribution form on the uncertainty. That result depends directly on the parameters, so parameter estimation also has an important influence on the uncertainty of the model. If the parameters are calculated accurately, the uncertainty will be more realistic; conversely, if the parameter estimates themselves carry a large error, their influence on the uncertainty will be larger. The parameters of all the distribution functions above, except the maximum entropy density function, are estimated by the maximum likelihood method, which is widely used and is regarded as reliable and relatively precise. We therefore use the Monte Carlo method to study the uncertainty of the parameter estimators (Turcin, 2006; Zeng et al., 2016, 2018; Deng, 2020).
The Monte Carlo method is a general term for methods that artificially generate and use random numbers. It is a class of numerical methods for approximately solving problems in mathematics, physics, and engineering by sampling the relevant random variables or stochastic processes. Specifically, it constructs a random variable or stochastic process whose numerical characteristic is the solution of the problem, samples it, and uses the corresponding value calculated from the samples as an approximate solution. Taking the Gumbel model as an example, the steps to determine the parameter uncertainty on a computer using the Monte Carlo method are as follows.

Random number generation
First, the parameters of the model are solved by the maximum likelihood method with an R program, and random numbers following the distribution with those parameters are then generated by computer. The parameters of the Gumbel model obtained by the R program in this article are μ = 7.8186 and σ = 1.4016, and 1000 numbers following the Gumbel distribution with these parameters are generated. Random number generation functions for the Gumbel and Weibull distribution models are provided in the statistical toolbox of MATLAB, while the Pearson-III distribution can be regarded as a special kind of Gamma distribution. In this paper, for each distribution model, the parameters are estimated from the measured data by maximum likelihood with R code, and 10000 sets of random series are then generated in silico, each containing 1000 random numbers from the distribution with those parameters. These numbers therefore follow an extreme value distribution whose parameters are known.
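The random number generation step can be sketched with inverse transform sampling. This is a hedged illustration rather than the paper's R or MATLAB procedure: the helper `gumbel_sample`, the seed, and the use of Python's standard `random` module are assumptions; only the parameter values μ = 7.8186 and σ = 1.4016 come from the text.

```python
import math
import random

MU, SIGMA = 7.8186, 1.4016  # Gumbel location and scale reported in the text

def gumbel_sample(mu, sigma, rng):
    """Draw one Gumbel variate by inverting F(x) = exp(-exp(-(x - mu)/sigma))."""
    u = rng.random()
    while u <= 0.0:  # random() lies in [0, 1); reject 0 to avoid log(0)
        u = rng.random()
    return mu - sigma * math.log(-math.log(u))

rng = random.Random(2021)  # fixed seed, an arbitrary choice for reproducibility
series = [gumbel_sample(MU, SIGMA, rng) for _ in range(1000)]

sample_mean = sum(series) / len(series)
# Mean of a Gumbel distribution: mu + (Euler-Mascheroni constant) * sigma
theoretical_mean = MU + 0.5772156649 * SIGMA
print(round(sample_mean, 3), round(theoretical_mean, 3))
```

Comparing the sample mean with the theoretical mean is a quick sanity check that the generated series follows the intended distribution.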

Data sampling
For each of the 10000 data sets, assume that the design wave height occurs once in a century, that is, take the extreme value corresponding to a probability of 99%. Since each series contains 1000 data points, following the empirical distribution function approach (Liu et al., 2010; Chen et al., 2017), the 990th point of the series sorted in ascending order is the once-in-a-century extreme value. In this way, 10000 once-in-a-century extreme data points are obtained by sampling. When the sample is large enough, the average of the sampled data can also be used as an approximation of the once-in-a-century value. The influence of the estimated model parameters on the model uncertainty can be calculated from these 10000 sampled once-in-a-century extremes. The return level of the Gumbel distribution with the known parameters is calculated; the error between the sampled value and the calculated value then approximately follows the standard normal distribution, and Eq. (21) is used as the sampling error for the estimator.

Uncertainty calculation of the sampled data
Information entropy is used as the tool for measuring uncertainty: count the number of occurrences of x_i, estimate its frequency of occurrence p_i, and calculate the information entropy of the sampling error. For a discrete system composed of n events, the information entropy is $H_n = -\sum_{i=1}^{n} p_i \ln p_i$, where p_i is the probability of event i occurring and $\sum_{i=1}^{n} p_i = 1$. The uncertainty of the model due to the parameter estimator is then obtained. Information entropy is a probabilistic measure of the uncertainty of a system. Through the calculation of information entropy, the uncertainty of each model parameter and the uncertainty of the selected distribution can be described quantitatively. The improved method for solving the maximum entropy distribution parameters only needs four statistical characteristics of the sample (mean, variance, skewness, and kurtosis); once these are obtained, the calculation steps are greatly simplified, which facilitates computer implementation.
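The sampling and entropy steps for the Gumbel model can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the sampling error is taken as the 990th order statistic minus the theoretical 99% quantile, the number of sets and bins is reduced for speed, the bins span the observed error range, and the natural logarithm is used. All helper names and the seed are hypothetical.

```python
import math
import random

MU, SIGMA = 7.8186, 1.4016  # Gumbel parameters reported in the text
N_POINTS, RANK = 1000, 990  # series length and rank of the 99% order statistic
N_SETS = 200                # the text uses 10000 sets; reduced here for speed
N_BINS = 20                 # assumed bin count for this smaller experiment

def gumbel_sample(mu, sigma, rng):
    """Inverse transform sampling from the Gumbel distribution."""
    u = rng.random()
    while u <= 0.0:  # avoid log(0)
        u = rng.random()
    return mu - sigma * math.log(-math.log(u))

def discrete_entropy(counts):
    """H_n = -sum p_i ln p_i with p_i estimated as count / total."""
    total = sum(counts)
    return -sum((m / total) * math.log(m / total) for m in counts if m > 0)

rng = random.Random(7)  # arbitrary fixed seed
return_level = MU - SIGMA * math.log(-math.log(0.99))  # theoretical 99% quantile

# Sampling error of each set: empirical once-in-a-century value minus the quantile
errors = []
for _ in range(N_SETS):
    series = sorted(gumbel_sample(MU, SIGMA, rng) for _ in range(N_POINTS))
    errors.append(series[RANK - 1] - return_level)
mean_error = sum(errors) / len(errors)

# Group the errors into equal-width bins over their observed range
lo, hi = min(errors), max(errors)
width = (hi - lo) / N_BINS
counts = [0] * N_BINS
for e in errors:
    counts[min(int((e - lo) / width), N_BINS - 1)] += 1

entropy = discrete_entropy(counts)
print(round(return_level, 4), round(mean_error, 4), round(entropy, 4))
```

The same loop could be repeated for the other models by swapping the sampler and the quantile function, which would make the resulting error entropies comparable across models.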
Specifically, the ratio of the increase in information entropy to the increase in the number of samples reflects the rate at which the information entropy changes with sample size. The ratio of the increase in information entropy to the increase in variance serves as the rate at which the information entropy changes with sample dispersion. The accuracy of parameter estimation in the distribution model also affects the uncertainty. For the parameter estimators of a distribution model, a large number of random numbers can be generated by the Monte Carlo method, the error between the theoretical and empirical values of the distribution can be calculated, and the information entropy of this sampling error can be obtained; it gives the rate at which the sampling error influences the uncertainty as the number of samples changes. Based on the uncertainty due to sample size, degree of dispersion, and sampling error, together with the uncertainty of the model distribution form, this study proposes a formula for measuring the overall uncertainty of the sample data. In the formula, H(X) is the information entropy of the design parameter estimation model itself, n is the number of sample data, and σ² is the variance of the sample.

Research application of model uncertainty
In this paper, wave height and water level data from Chaolian Island (35.53°N, 120.51°E) in the Yellow Sea covering 26 consecutive years are selected. The data are sampled by the block maxima method, and the wave height is denoted X. The scatter distribution of the data is shown in Fig. 4. In the following, information entropy is used to calculate the uncertainty and the overall uncertainty of the design parameter estimation models for ocean environmental conditions.

Parameter estimation of three distribution models and parameter estimation of the maximum entropy distribution model
Parameter estimates for the GEV distribution can be obtained from the R program:
(μ, σ, ξ) = (6.6250, 1.3354, −0.2367). (24)
The corresponding log-likelihood function value is 44.8596, and the 95% confidence intervals for μ, σ, and ξ are shown in Table 1. The maximum likelihood estimate of ξ is negative, corresponding to a bounded Weibull distribution (Shi, 2006; Escalante et al., 2016). Fig. 5 shows a more precise confidence interval for ξ, [−0.44, 0.02], obtained from the profile likelihood curve, which is often used to estimate the confidence interval of a parameter at a given confidence level. Fig. 6 and Fig. 7 show three fitting diagnostic plots for the GEV distribution: the return level plot, the probability plot, and the quantile plot. The circles in the figures represent data points and the solid lines represent model curves. In Fig. 6, the return level plot shows the curve obtained from the generalized extreme value distribution model together with the theoretical 95% confidence interval, and shows how the data are distributed within the confidence interval.
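To illustrate how a return level follows from the fitted GEV parameters, the sketch below inverts the GEV distribution function. It assumes annual block maxima, so that a T-year return period corresponds to a non-exceedance probability of 1 − 1/T, and the same sign convention for ξ as the text (negative ξ means a bounded upper tail). The function name is hypothetical, and the printed value is an illustration, not a result quoted from the paper.

```python
import math

def gev_return_level(period, mu, sigma, xi):
    """Quantile of the GEV distribution with non-exceedance probability 1 - 1/period."""
    if period <= 1.0:
        raise ValueError("return period must exceed 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the quantile function
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

MU, SIGMA, XI = 6.6250, 1.3354, -0.2367  # estimates reported in Eq. (24)

level_100 = gev_return_level(100.0, MU, SIGMA, XI)
upper_bound = MU - SIGMA / XI  # finite upper end point because XI < 0
print(round(level_100, 3), round(upper_bound, 3))
```

Because the estimated shape parameter is negative, every return level stays below the finite upper end point, which is what a bounded Weibull-type tail implies.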
In Fig. 7, the quantile plot shows how well the observed data fit the model. The empirical cumulative distribution is evaluated at the ordered observations X_1 < X_2 < … < X_k, where i = 1, 2, …, k indexes the observation data. The quantile plot compares the observations with the model quantiles of the generalized extreme value model, where p is the cumulative distribution probability taken at the different observation points, and thus reflects whether the fitted extreme value distribution is reasonable. As can be seen from Figs. 6 and 7, the sequence of observations is in good agreement with the model: all points in the probability plot and the quantile plot lie almost on a straight line, so the test results indicate that the generalized extreme value model cannot be rejected. The return level plot shows that the design wave height values all fall within the confidence interval. Therefore, it is reasonable to use the generalized extreme value distribution to estimate multi-year design wave heights. Other types of hypothesis tests can also be performed. Fig. 8 shows the Weibull probability paper plot, and Fig. 9 shows the Gumbel fitting diagnostic plot of the data. These tests show that the traditional models all pass hypothesis testing, but the design parameters derived from the different models differ, and so do the calculated design criteria. These model-dependent differences leave room for choice in model selection and in the inference of return period values (Xue and Chen, 2007; Jiang et al., 2019). Table 2 and Table 3 give the statistical characteristics of the wave height sequence and the K−S test values of the different distributions.
The skewness of the annual extreme wave height sequence is S = 0.1295 > 0, indicating that the distribution is positively skewed: the right tail is longer and the density is concentrated on the left. The kurtosis is K = 3.1874 > 3, indicating that the sample density curve is steeper than the normal distribution density curve. These statistical features are detailed in Fig. 10.

Information entropy corresponding to the uncertainty of the distribution model
The uncertainty of the distribution form of a design parameter estimation model depends, as shown above, only on the parameters, and the nonlinear equations for solving the relevant parameters have been given. The information entropy of each distribution model, that is, the uncertainty of each model, is given in Table 4 for the 20-year data (the 1st group of data) and the 26-year data (the 2nd group of data). The difference between the information entropy of the maximum entropy model and that of each commonly used distribution model is used as a measure of uncertainty.
From the two sets of data in Table 4, it can be seen that the maximum entropy model has the largest information entropy among the distribution models, and the maximum difference in information entropy between the models reaches 0.5166, which is almost half of the information entropy of the Gumbel distribution model, the one with the smallest information entropy. This indicates that, compared with the Gumbel, Weibull, and Pearson-III models, the maximum entropy model is the one with the least human interference and thus corresponds to the largest uncertainty, which can better reflect the real conditions of the marine environment. In addition, the information entropy of each model increases as the amount of data increases, and the difference between the information entropy of each model and that of the maximum entropy model decreases as the amount of data increases, which indicates that the uncertainty of a model is related to the sample size. The larger the number of samples, the larger the uncertainty of the model, and this increase brings the uncertainty of the other models closer to that of the maximum entropy model.
CHEN Bai-yu et al. China Ocean Eng., 2021, Vol. 35, No. 2, P. 238-249

Information entropy of parameter estimation error
Finally, the uncertainty of the model is studied by considering the model parameter estimators. The accuracy of the parameter estimation has a great influence on the information entropy of each distribution model. The parameter values of the common distribution models are therefore obtained by the maximum likelihood method. The steps described in Section 3 are used to generate random numbers that meet the requirements and to calculate the information entropy of the estimation errors of the distribution parameters for each model (Table 5). It should be noted that the 10000 random errors of a distribution model are all different, so it is meaningless to calculate the frequency of occurrence of individual values directly. In this study, the errors are therefore evenly divided into 100 groups, the number of occurrences in each group is recorded as m_i, and the ratio of m_i to the total number of errors is the occurrence frequency of each group. The information entropy generated by the sampling random error is then calculated by Eq. (21). The detailed grouping histogram of the sampling error is shown in Fig. 11. As can be seen from Fig. 11, the sampling errors approach a normal distribution when the sample size is large. The histogram peaks of the sampling errors of the Gumbel, Pearson-III, Weibull, and maximum entropy distribution models are located at −0.3, −0.15, −0.05, and near 0, respectively, and the corresponding maximum occurrence counts are 331, 383, 302, and 367. The Pearson-III distribution model has the highest peak frequency and the widest value range. Since the Gumbel and Weibull distribution models both belong to the generalized extreme value family, the shapes of their sampling error histograms are very similar. Since the maximum entropy distribution model itself has the largest uncertainty, its sampling error histogram differs from the normal distribution and shows multiple peaks.
Since the error represents the distance between the estimated value and the true value, it is obvious that the more concentrated the distance distribution is, the better the estimation effect is, and the smaller the error information entropy about the distance will be.
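The link between concentration and error entropy can be illustrated with synthetic errors. The sketch below assumes that both error sets are grouped on one common grid of 100 equal-width bins (if each set were binned over its own range, rescaling the errors would leave the entropy unchanged); the normal errors, the grid limits, the seed, and the function name are hypothetical choices for illustration.

```python
import math
import random

def binned_entropy(values, lo, hi, bins=100):
    """Entropy -sum p_i ln p_i of values grouped into equal-width bins on [lo, hi]."""
    if not values or hi <= lo or bins < 1:
        raise ValueError("need data, hi > lo, and at least one bin")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp values outside the grid
    n = len(values)
    return -sum((m / n) * math.log(m / n) for m in counts if m > 0)

rng = random.Random(11)  # arbitrary fixed seed
narrow = [rng.gauss(0.0, 0.1) for _ in range(10000)]  # concentrated errors
wide = [rng.gauss(0.0, 1.0) for _ in range(10000)]    # dispersed errors

h_narrow = binned_entropy(narrow, -5.0, 5.0)
h_wide = binned_entropy(wide, -5.0, 5.0)
print(round(h_narrow, 3), round(h_wide, 3))
```

On a shared grid the concentrated errors occupy fewer bins and therefore give the smaller entropy, in line with the argument above.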
As can be seen from Table 5, the information entropy of the parameter estimation error is smallest for the Pearson-III distribution, second smallest for the maximum entropy distribution, and largest for the Weibull distribution. From the relationship between information entropy and uncertainty, it is clear that the parameter estimation uncertainty follows the same order. With the parameters of the maximum entropy distribution estimated by the improved method of moments, the information entropy of its parameter estimation error differs from that of the Pearson-III distribution by only 0.82%, which indicates to some extent that the parameter estimation uncertainty of the maximum entropy distribution is relatively small.

4.5 Other information entropy calculation and overall uncertainty of the model
From Eq. (16), the uncertainty and the relative uncertainty of the different distribution functions are obtained (Table 6). For this island, it is therefore reasonable to use the maximum entropy distribution to estimate the design wave height at the 100-year return level. As can be seen from Table 6, the overall model uncertainty of the maximum entropy distribution is smaller than that of the other three distributions, indicating that the maximum entropy distribution is usually preferable for the statistical characterization of measured ocean data with large randomness.

Conclusions
In this paper, we investigated the models used to calculate design parameters for the marine environment and their associated uncertainty, proposed an improved method of moments for determining the parameters of the maximum entropy distribution, and presented a specific formula for calculating the overall uncertainty of commonly used distribution models using information entropy.
The overall uncertainties of the Gumbel distribution, Weibull distribution, Pearson-III distribution and maximum entropy distribution are 7.3615, 2.1505, 2.399 and 1.4632, respectively. The overall uncertainty of the maximum entropy model is therefore the smallest. The results obtained with the maximum entropy model are more consistent with engineering facts, are less affected by human factors, and retain the original statistical characteristics of the measured data to the largest extent.
In terms of parameter estimation uncertainty, the error information entropies of the Gumbel distribution, Weibull distribution, Pearson-III distribution and maximum entropy distribution are 3.9561, 4.0402, 3.8211 and 3.9238, respectively. The error information entropy of the maximum entropy distribution is close to that of the Pearson-III distribution, which has the smallest value. This indicates to a certain extent that the parameter estimation method proposed in this paper is not only convenient but also reasonable, and that it can be adopted in engineering practice. Therefore, among the models considered, the maximum entropy distribution is the most suitable for the statistical analysis of measured data in marine engineering.
The uncertainty of the model itself, the uncertainty of the model parameter estimation, and the overall uncertainty of the model are compared and analyzed to determine the most suitable distribution model for statistical analysis, and the superiority of the maximum entropy distribution in deriving extreme ocean wave heights is illustrated from the perspective of uncertainty. Additionally, the uncertainty measure of this paper can be applied to other distribution functions and to different research fields.
The Monte-Carlo method is used in this paper for the parameter estimation required to calculate the model uncertainty. How to train the model and evaluate its parameters with the help of deep learning on real measurement data with temporal factors is a topic for future experimental studies. In addition, if the random variables used in previous research are replaced by a series of random processes, and a time dimension is added to the space dimension, then design parameters for the marine environment, such as wave height and surge, can be discussed in both time and space. This is also an area for further study.