Investigation of probabilistic models for forecasting the efficiency of proppant hydraulic fracturing technology

A probabilistic method of data analysis is proposed to address the problems that accompany the development of forecasting methods. Using a carbonate object as an example, the application of a probabilistic technique for predicting the effectiveness of proppant hydraulic fracturing (HF) is considered. The increase in oil production of wells was forecast by probabilistic analysis of geological and technological data in different periods of HF implementation. With this method, dimensional indicators were transferred into a single probabilistic space, which made it possible to compare them and to construct individual probabilistic models. The degree of influence of each indicator on HF efficiency was assessed. Probabilistic analysis of the indicators in different periods of HF implementation revealed universal, statistically significant dependencies. These dependencies do not change their parameters and can be used for forecasting in different periods. Criteria for the application of HF technology on a carbonate object were determined. Using the individual probabilistic models, integrated indicators were calculated, and regression equations were constructed on their basis. The equations were used to predict HF efficiency on forecast samples of wells, and correlation coefficients were calculated for each sample. The forecast results correlate with the actual increase, with correlation coefficients r = 0.58-0.67 for the examined samples. Unlike other methods, the probabilistic method is simple and transparent. With its use and with careful selection of wells for HF, the probability of obtaining high efficiency increases significantly.


Introduction.
Time series forecasting is one of the most common formulations of the forecasting problem, and its solution plays an essential role in strategic planning. The application of existing mathematical models and methods for forecasting time series is closely tied to the specifics of the subject area. The problem is solved by creating a forecasting model that adequately describes the process under study.
The class of time series considered here, with regular constant components, arises in subject areas where the influence of various factors is significant. An example is the oil industry, where the profit of a company or a country depends on confidence that oil production plans will be fulfilled. Their fulfillment depends on many factors: technological, determined by the oil production technology; technical, determined by the technical means applied to intensify oil production; and geological, determined by the structure of the space where the residual oil reserves are located. To predict the dynamics of well operation [17, 18, 21], a set of geological and physical factors characterizing the reserve or the region of the well is used. Where time series of this class are present, solving the forecasting problem is an important scientific and technical task.
All statistical and structural time series forecasting models have advantages and disadvantages.
Regression models and methods. The advantages of these models include simplicity, flexibility, and uniformity of analysis. With linear regression models, the forecast result can be obtained faster than with other models. Another advantage is the transparency of the modeling, i.e., all intermediate calculations are available for analysis. The main disadvantage of non-linear regression models is the complexity of determining the type of functional dependence and the main parameters of the model involved in the calculations.
Neural network models and methods. The main advantage of neural network models is nonlinearity, i.e., the ability to establish non-linear relationships between future and actual values of processes. Other important advantages include adaptability, scalability, and consistency in analysis and design. The disadvantages are the lack of transparency in modeling, the complexity of choosing the architecture, high requirements for the consistency of the training sample, the complexity of choosing the training algorithm, and the resource intensity of the training process.
Models based on classification-regression trees. The advantages of models in this class are scalability (which makes fast processing of extremely large amounts of data possible), the speed and unambiguity of the tree learning process, and the possibility of using categorical external factors. The disadvantages are the ambiguity of the algorithm for constructing the tree structure and the complexity of determining when to stop further branching.
The forecasting accuracy of these models cannot be judged in general. The accuracy of forecasting a particular process depends not only on the model, but also on the experience of the researcher, on how well the researcher understands the structure of the particular forecasting model, on the availability of data, and on many other factors. However, the main disadvantages of existing forecasting methods can be grouped as follows: a large number of free parameters requiring identification; different units of the indicators used in forecasting; unavailability of the intermediate calculations performed in the "black box"; and the complexity of assessing the individual degree of influence of each indicator on the output value. As a result, it is often difficult to determine which indicators have the greatest impact on the output and to rank them.
The task of forecasting time series is relevant for many subject areas and is an important part of the daily work of the enterprises of "LUKOIL-Perm" LLC. To solve it, we propose to consider the possibilities of the probabilistic method, which is described in sufficient detail in [16, 19, 22, 23]. In this work, the increase in oil production $q^{HF}_{oil.f}$ is forecast by a probabilistic analysis of geological and technological data, using the experience of applying proppant HF technology at the B3B4 carbonate object (upper Carboniferous sediments) as an example. The geological and technological parameters of wells are known at the selection stage, which makes it possible to calculate the efficiency of the technology in advance and to give recommendations for its implementation.
As of January 1, 2019, 66 proppant HF operations had been performed at the B3B4 site, with an average initial increase in oil production of 7.1 tons/day. Let us divide the pool of wells into two classes: class I, with an increase in oil production $q^{HF}_{oil.act}$ > 7 tons/day (the technology is effective), and class II, with an increase in oil production $q^{HF}_{oil.act}$ < 7 tons/day (the technology is ineffective). Initially, a statistical analysis of more than 50 indicators was performed for each class of wells in the studied development sites. As a result, the parameters that affect the efficiency of proppant HF were established. The importance of the indicators was assessed by comparing the average values in the two classes according to the criterion of information content.
• Geological parameters: [...] piezoconductivity, cm²/s; skin factor S; gamma log values GK, μR/h; neutron gamma log values NGK, μR/h; drilled-in net oil thickness $h_n$, m; relative formation depth $H_{rel}$, m; absolute formation depth $H_{abs}$, m.
• Technological parameters: current formation pressure in the well $P_f$, MPa; cumulative oil production since the beginning of well operation $Q_{oil.c}$, tons; cumulative water production $Q_{w.c}$, tons.
It is proposed to perform probabilistic data analysis and build probabilistic models. Initially, a probabilistic model is built for the wells of 2014-2015: each geological and technological parameter N that has a dimension (MPa, μm², tons, etc.) is converted into a dimensionless value. Converting dimensional values into dimensionless ones makes it possible to compare the indicators with each other in a single probability space. Based on the analysis results, a probabilistic equation is constructed for each indicator. The probabilistic equation makes it possible to determine the criteria for the application of HF technology and to rank the indicators by their degree of influence on the actual increase in oil production $q^{HF}_{oil.act}$.
Let us consider the methodology for constructing individual probabilistic models using the example of the current formation pressure in the well $P_f$. The average values of the indicator for wells where HF is effective and ineffective differ by a factor of 1.1: the average is $P_f$ = 7.7 MPa in class I and $P_f$ = 6.8 MPa in class II. Using this characteristic, the distribution densities of the two classes were investigated: class I values in the first case ($n_1$ = 38) and class II values in the second ($n_2$ = 28). Following the methodology, the first stage of constructing a probabilistic model based on $P_f$ is to build a histogram for classes I and II. The optimal intervals for the values of $P_f$ are calculated by the Sturges formula. To study the proportions of values falling into different intervals of $P_f$, an interval analysis was performed (Fig. 1).
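The interval selection step can be illustrated with a short Python sketch. It assumes the standard form of Sturges' rule, k = 1 + log2(n) (equivalently 1 + 3.322 lg n); the exact variant used by the authors is not preserved in the text, and the function names below are hypothetical.

```python
import math


def sturges_count(n):
    """Number of intervals by Sturges' rule, k = 1 + log2(n) (not rounded)."""
    if n < 1:
        raise ValueError("sample must contain at least one value")
    return 1 + math.log2(n)


def sturges_width(values):
    """Interval width: range of the sample divided by the Sturges count."""
    return (max(values) - min(values)) / sturges_count(len(values))


def interval_edges(values):
    """Equal-width interval edges, with the Sturges count rounded up."""
    lo, hi = min(values), max(values)
    k = math.ceil(sturges_count(len(values)))
    step = (hi - lo) / k
    return [lo + i * step for i in range(k + 1)]


if __name__ == "__main__":
    # Sample sizes quoted in the text: 38 wells in class I, 28 in class II.
    for n in (38, 28):
        print(n, "wells ->", math.ceil(sturges_count(n)), "intervals")
```

Rounding the count up is a design choice of this sketch; rounding to the nearest integer is equally common and gives slightly wider intervals.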
At the next step, the probability that a group belongs to a class is calculated as the ratio $N_g / N_k$, where $N_g$ is the number of $P_f$ values belonging to the group and $N_k$ is the sample size for classes I and II. The calculation results are presented in Table 1. Next, the conditional probability is calculated for each group, so that in each interval the probability of belonging to class I wells is obtained. After that, the interval probabilities of belonging to class I are compared with the average interval values of $P_f$. From the values of $P(P_f)$ and $P_f$, the pair correlation coefficient r is calculated and the regression equation is constructed. The constructed models are then corrected under the condition that the average probability for wells where HF is effective should be greater than 0.5, and for wells where HF is ineffective, less than 0.5. This yields the probabilistic model for $P_f$. Table 2 shows the models of geological and technological parameters for each of the samples. The values of the probabilities for all indicators vary within 0.160-0.840. This indicates that all indicators with different dimensions were transferred to a single probability space using the constructed regression equations. The calculations performed to build individual probabilistic models are available to a specialist and, unlike other methods, do not contain hidden processes.
Table 2. Probabilistic models of belonging to class I (by samples)
Table 2 shows that tg (the tangent of the inclination angle of the regression line) in four cases ranges from +0.095 to +0.120. The value of tg for the indicators $h_n$ and NGK is the largest in comparison with the other indicators. For the geological and technological indicators S, m, $P_f$, $Q_{oil.c}$, $Q_{w.c}$ and $K_{prod}$, tg → 0. The values of tg for the geological parameter $K_{prod}$ change in a narrow range from 0.012 to 0.039. The inclination angle for $K_{prod}$ is smaller than that for NGK, which indicates a weak effect on the output.
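The steps of building an individual probabilistic model can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the formula for the conditional probability is not preserved in the text, so the sketch assumes it is the class I share of an interval divided by the sum of the class I and class II shares; all function names are hypothetical.

```python
import math
from bisect import bisect_right


def interval_shares(values, edges):
    """Share of one class sample in each interval (N_g / N_k)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp the index so that the minimum and maximum values stay in range.
        i = bisect_right(edges, v) - 1
        i = max(0, min(i, len(counts) - 1))
        counts[i] += 1
    return [c / len(values) for c in counts]


def class_one_probability(shares_1, shares_2):
    """ASSUMED form: P(class I | interval) = p1 / (p1 + p2).

    Returns None for intervals that contain no wells of either class.
    """
    return [p1 / (p1 + p2) if p1 + p2 > 0 else None
            for p1, p2 in zip(shares_1, shares_2)]


def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


if __name__ == "__main__":
    # Hypothetical formation pressures (MPa), for illustration only.
    class_1 = [6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
    class_2 = [5.5, 6.0, 6.5, 7.0, 7.5]
    edges = [5.0, 6.0, 7.0, 8.0, 9.0]
    probs = class_one_probability(interval_shares(class_1, edges),
                                  interval_shares(class_2, edges))
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    print(fit_line(mids, probs))
```

The slope `b` returned by `fit_line` plays the role of tg in the text: the closer it is to zero, the weaker the influence of the indicator.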
For wells from the sample of 2014-2015, the value of tg for the parameter $K_{prod}$ is the highest, and with each addition of new wells to the sample tg → 0. Probably, as the number of wells in the sample increases (2019, 2020, ...), at one of the stages tg = 0, i.e., this parameter will no longer affect the HF efficiency. Parameters $Q_{oil.c}$, $Q_{w.c}$ in the [...] The absence of the parameter's influence indicates that most of the wells in classes I and II are located at the same absolute and relative depths. The change in the direction of the dependence for the relative formation depth $H_{rel}$ from plus to minus in Table 2 indicates a change in the ratio of the number of wells belonging to classes I and II. The dependencies constructed for the sample of wells of 2014-2018 have a weak effect on the output indicator: tg → 0 (P(N) → 0.5). The probabilistic method of data analysis is transparent, since in practice it made it possible to build probabilistic models and assess the degree of influence of the indicators on the output indicator $q^{HF}_{oil.act}$.
Input parameters with different dimensions can be compared because they are reduced to a single dimensionless probability space.
The most informative parameters influencing the output parameter are determined. Universal dependencies are revealed and ranked by the value of tg, i.e., the indicators with the greatest influence on the output indicator are identified and ordered accordingly. When tg ≈ 0, the geological and technological parameters have no effect and can be neglected. Thus, only the necessary parameters are retained, which allows further analysis of the universal models.
To achieve an increase in oil production of 7 tons/day, the probability must be more than 50 %. Consequently, the values of the criteria for the application of HF technology lie in the range 0.5-1 ($P(N) \geq 0.5$). Thus, P(N) = 0.5 corresponds to the minimum value of the parameter and P(N) → 1 to the maximum. Universal indicators must meet the following criteria for the use of HF technology at the B3B4 carbonate object: neutron gamma log values NGK 4.1-6.3 μR/h; drilled-in net oil thickness $h_n$ from 4.5 to 2.5 m; skin factor S from -1.4 to -6.6; porosity coefficient m 16.1-19.4 %; current formation pressure in the well $P_f$ 8.2-13.2 MPa; productivity coefficient $K_{prod}$ 4.2-8.5 m³/(day·MPa); cumulative oil production since the beginning of well operation $Q_{oil.c}$ 0682-16523 tons; cumulative water production since the beginning of well operation $Q_{w.c}$ 22810-59802 tons.
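One way to read such criteria is as the parameter values at which an individual model reaches the probabilities 0.5 and 1. The Python sketch below assumes, for illustration only, that an individual model is linear, P(N) = a + b·N; the coefficients in the example are hypothetical and are not taken from Table 2.

```python
def criterion_range(a, b, p_low=0.5, p_high=1.0):
    """Parameter values at which the ASSUMED linear model P(N) = a + b*N
    reaches p_low and p_high. Returned in that order, so for a negative
    slope the first value is larger than the second."""
    if b == 0:
        raise ValueError("a zero slope means the parameter has no influence")
    return (p_low - a) / b, (p_high - a) / b


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    print(criterion_range(a=0.1, b=0.05))   # positive dependence
    print(criterion_range(a=1.0, b=-0.1))   # negative dependence
```

Under this reading, a range written in descending order corresponds to an indicator whose probability of success grows as the indicator decreases.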
Forecasting the increase in oil production. For the joint use of the individual probabilities of the geological and technological indicators, the generalized (integrated) probability $P_{int}$ is calculated, and a regression equation is constructed on its basis. The regression model takes into account the values of $P_{int}$ in combination m = 3, which makes it possible to predict the increase in oil production (for the wells of 2016-2018) with a maximum deviation from the actual values of 2.9 tons/day. Similarly, a regression equation is constructed and the forecast is carried out for the well samples of 2014-2016, 2014-2017, and 2014-2018. Figure 4 shows that the calculated values of the increase in oil production correlate with the actual increase in production rates, with correlation coefficients r = 0.58-0.67 for the examined samples of wells. The greatest discrepancy between the forecast and actual data is observed when predicting the increase for 2016-2018, where the maximum deviation is 2.9 tons/day; in the other samples the deviation is smaller. The smallest discrepancy in predicting the increase in oil production rates is in the training sample of wells of 2014-2018.
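The combination of individual probabilities and the check of forecast quality can be sketched in Python. The formula for the integrated probability is not preserved in the text; the sketch assumes a common way of combining independent probabilities, the product of the P values divided by the sum of that product and the product of the (1 - P) values. The function names and the numbers in the example are hypothetical.

```python
import math


def integrated_probability(probs):
    """ASSUMED combination rule:
    P_int = prod(P_i) / (prod(P_i) + prod(1 - P_i)).

    Undefined when the inputs contain both 0 and 1.
    """
    num = math.prod(probs)
    den = num + math.prod(1 - p for p in probs)
    return num / den


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Three hypothetical individual probabilities for one well (m = 3).
    print(integrated_probability([0.62, 0.55, 0.70]))
    # Hypothetical forecast vs. actual increase in oil production, tons/day.
    forecast = [6.1, 7.4, 8.0, 5.2, 9.3]
    actual = [5.0, 8.1, 7.2, 6.0, 10.5]
    print(pearson_r(forecast, actual))
```

With this rule, probabilities that all lie above 0.5 reinforce each other and probabilities that all lie below 0.5 pull the result down, while a value of exactly 0.5 leaves the result unchanged.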
Conclusion. The main advantages of the probabilistic method of analysis are simplicity and transparency. The method made it possible to significantly reduce the number of free variables and to transfer dimensional indicators into a single probability space.
The description of how individual probabilistic models are constructed demonstrates the transparency of the method and the absence of hidden processes. The value of tg is used to determine which indicators have the greatest impact on the output indicator. Owing to the probabilistic presentation of the data for each parameter, the criteria for the application of HF technology at the B3B4 object have been determined, and the permissible probabilistic limits of the technology application have been identified. Analysis of the forecasts made it possible to assess the reliability of the constructed regression models.
The method is recommended for predicting the effectiveness of other technologies: perforation methods, radial drilling, acid treatment, etc.