Manufacturing Cost Estimation Based on a Deep-learning Method

Abstract: In the era of mass customisation, rapid and accurate estimation of the manufacturing cost of different parts can improve the competitiveness of a product. Owing to the ever-changing functions, complex structures, and unusually complex processing links of parts, regression-model cost estimation methods have difficulty establishing the complex mapping relationships in manufacturing. As a newly emerging technology, deep-learning methods can automatically learn complex mapping relationships and high-level data features from large amounts of data. In this paper, methods for estimating the manufacturing cost by training two-dimensional (2D) and three-dimensional (3D) convolutional neural networks (CNNs) on part images and voxel data are proposed. Furthermore, the effects of different voxel resolutions, fine-tuning strategies, and training data volumes on CNN performance are investigated. It was found that, compared with the 2D CNN, the 3D CNN exhibits excellent performance on the cost estimation regression problem and has high application value.

An accurate estimation of the cost of product parts during the design stage can optimise the design and improve the competitiveness of the product.
A cost estimation is a quantitative estimation of the resource costs required for part processing. Differing from a cost calculation, a cost estimation is made on the premise that the company has not yet obtained the production schedule or manufacturing process data. The estimated costs can be divided into direct and indirect costs. Direct costs are costs that can be directly identified with the parts being produced, such as design costs, material costs, and processing costs. Indirect costs are product-related costs that cannot be directly identified, such as plant leases, equipment depreciation, and sales and administrative expenses. A cost estimation is a typical mathematical regression problem, that is, a process of predicting events by building a model of the complex relationships among the samples. Since the concept of a regression model was first proposed, different regression models have emerged, including linear regression, ridge regression, and logistic regression [2]. Existing cost estimation methods based on regression models can be divided into two categories: parameter fitting methods and analysis estimation methods.
The principle of a parameter fitting estimation method is to complete the data of the product to be quoted using the structural and cost information of similar products, and to derive a mathematical expression for the product quotation from the fitted relationship between the cost statistics and the parameters. This method analyses the various links of the product cost in advance; a cost quotation can then be obtained with a low estimation workload, so the method is widely used in the early stage of product design and can solve the problem of early cost estimation. Its disadvantage is the relatively low accuracy of the cost estimation. Mature algorithms include regression analysis estimation, functional cost estimation, and learning curve estimation.
A regression analysis estimation is a statistical analysis method for determining the quantitative relationship between two or more variables. Rickenbacher et al. [3] proposed a statistical approach for estimating the time needed to complete a building job, in which the model was based on a linear regression analysis and the regression coefficients were estimated from previous building jobs. The calculated building costs were split according to the volume and building height of the parts to obtain the cost of a single part. Mileham and Currie [4] studied a parametric model for production cost estimation during the product design stage. The key was to transform the design parameters into cost characteristic parameters through a multiple regression analysis, thereby estimating the cost using the mapping function obtained from the analysis. This method is well suited to the cost estimation of injection-moulded parts.
A functional cost estimation [5] is a method for estimating the value of a product according to its function. French and Folley [6] studied various cost estimation methods for pressure vessels and rolling bearings. They argued that the cost of a product is determined by its functional characteristics, which are in turn determined by the design parameters. According to the principle of the functional cost method, the cost of a product is then estimated from these parameters.
The principle of the learning curve estimation method [7] is that work efficiency increases at a certain ratio as experience accumulates, so the working time per unit task follows a decreasing curve; the product cost can therefore be estimated from the relationship between cumulative output and unit cost, as illustrated below. Azzouz et al. [8] developed a classification scheme that characterises different scheduling problems under learning effects and compared different modelling approaches and solution algorithms through a literature review.
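As a point of reference, a commonly used form of such a learning curve (an illustration only; the exact model used in [7,8] may differ) is

$$ y_x = y_1 \, x^{\log_2 r}, $$

where $y_x$ is the time (or cost) required for the $x$-th unit, $y_1$ is that of the first unit, and $r$ is the learning rate; for example, $r = 0.8$ means that each doubling of cumulative output reduces the unit time to 80% of its previous value.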
The other category is the analysis estimation method. Its principle is to analyse the activities of the product lifecycle in detail, evaluate the "motivation" (driver) of the cost arising during each activity, and estimate the product cost according to these cost drivers. This method can achieve a higher estimation accuracy.
However, it relies on complete information on the product characteristic values and is therefore suitable for the late stage of product design or after the design is complete. Common cost estimation algorithms of this category include feature- and process-based estimation methods, non-parametric cost estimation methods, and activity-based estimation methods.
The principle of a cost estimation method based on features and processing technology is that a quotation is built up by summing the features that influence it. Each process of a product is treated as a feature, such as tooling, labour, transportation, packaging, manufacturing parameters, and processing equipment. By estimating the manufacturing cost corresponding to each feature, an accurate quotation for the product can be obtained. Ji [9] developed a case-based reasoning (CBR) revision model to predict construction costs through feature counting; the model applies a mathematical formula that improves the prediction accuracy by accounting in advance for the error value of the cost.
The prediction accuracy of this method is high. Rudolph and Emmelmann [10] introduced a cloud-based platform for additive manufacturing that analyses the geometry of a part, determining its volume, surface area, and dimensions. These characteristic factors are used for the cost quotation, which is implemented based on the Standard Triangulation Language (STL) format.
The non-parametric cost estimation method [11] is based on previous operations and the experience of decision makers, combined with planning, calculation, verification, data analysis, and processing; the manufacturing cost is then obtained through a regression analysis of the products using a statistical algorithm. Juszczyk et al. [12] investigated cost estimation based on statistical methods using artificial neural networks (ANNs) and presented a concise comparison of parametric and non-parametric approaches, where the latter require neither assumptions regarding the functional relationships nor an investigation of the underlying rules.
Activity-based costing (ABC) [13] follows the basic principle that "products consume activities, and activities consume resources." It identifies and measures all activities through which an enterprise consumes resources, calculates the cost of the resources consumed by each activity, and thereby estimates the cost of the products. ABC has a high estimation accuracy, although the estimation must be conducted after the product design is complete. Time-driven ABC was developed to overcome the problem that ABC models become either overly complex or unrealistically simple, which is the reason ABC is no longer universally used and has been abandoned by some companies [14]. Wouters and Stecher [15] developed a real-time product cost measurement approach for calculating cost rates and non-productive time, including cases involving a mix of labour and machine times. The cost per unit of time combines the cost per labour hour and the cost per machine hour according to the operational relationships between the machines and operators.
Traditional regression models describe complex mapping relationships using only a small number of samples. Table 1 provides a comparison of the different models. In practice, the diversity of part designs, the processing and manufacturing links, and the extraction of high-level data characteristics in each link pose significant challenges to traditional regression-based cost estimation methods, mainly in the following two respects. ① It is difficult to model complex mapping relations.
The increasing quantity and dimensionality of the data make the mapping relationship more complex. ② The robustness of the data feature expression is poor. Because of the complexity of the actual model structures encountered in practice, the hand-designed features used in traditional regression models can only cope with changes in a single condition. As a newly emerging technology, deep learning can automatically learn complex mapping relationships and high-level data features from large amounts of data. Therefore, a regression model based on a deep neural network can effectively reduce the impact of the aforementioned problems on prediction and improve the regression accuracy.
In this study, part manufacturing cost estimation was investigated using deep-learning technology. The rest of this paper is organised as follows. First, CNN methods are reviewed in Section 2. Next, the generation of image data and voxel data and a data enhancement method are described in Section 3. In Section 4, methods for training a 2D CNN on part images and a 3D CNN on part voxel data are presented. For the 2D CNN, the effect of freezing different convolution layers during fine-tuning on the trained network is studied. For the 3D CNN, the influence of training with different voxel resolutions is investigated. In Section 5, the cost estimation methods based on the 2D CNN and the 3D CNN are discussed, and their advantages and disadvantages are compared.

CNN
Regression models based on deep neural networks can be divided into two types according to the model structure. The first type establishes the regression relationship between the input data and the output data directly, realising an end-to-end regression prediction [16]. The second type first recognises the model features using a deep neural network and then constructs a regression model on those features [17].
According to the regression method, regression models can also be divided into ensemble learning and other types of learning [18]. Ensemble learning trains multiple learners and combines them, which usually achieves better prediction results in practice than a single learner. The other types can diversify the models used to realise multi-objective tasks and flexibly set up learners for different learning objectives, which also achieves good results. In addition, multiple-source regression, sequential regression, adaptive regression, and other regression models are widely used.
The application range of a deep neural network has been extended from a 2D model to a 3D model, i.e., from the original 2D image classification to the object detection and visual search of the 3D object model, which introduces a wider range of applications.
Compared with 2D models, 3D models have more complex representation rules. Most of the methods used to generate geometric data depend on the intermediate representations of the 3D shapes, such as point clouds, voxels, depth maps, and RGB-D.
Polygonal meshes [19] and point clouds [20] describe only the external shape of a 3D model and have difficulty describing its internal features; in addition, they are difficult to process. RGB-D [21] is an image or image channel containing information related to the distance from the surface of the object to the vantage point.
This method needs to determine the region occupied by the object in advance, which consumes computing resources. Depth maps [22] do not directly represent 3D shapes, and it is difficult to express the structure of a 3D object directly from them; moreover, information is lost owing to occlusion. Voxels [23,24] describe the space occupancy of objects in a very simple data form, which is extremely well suited to existing learning methods [25,26]. Wu et al. [27] proposed 3D ShapeNets, which inputs voxel data at a resolution of 30³ into a simple five-layer convolutional deep belief network; a total of 150,000 3D models were classified into 660 categories. As an early stage of the deep learning of 3D models, this method has attracted increasing attention from researchers despite its simple structure and low accuracy. Later, Maturana et al. [28] proposed VoxNet, which uses binary voxel grids and a corresponding 3D CNN architecture for the digital geometric analysis of 3D models. Compared with 3D ShapeNets, VoxNet achieves better recognition results. An advantage of these methods is that they can process 3D data from different sources, including LiDAR point clouds, RGB-D images, depth maps, and polygonal mesh models. More importantly, these studies prove that CNNs can extract the 3D structural features of objects just as they do from 2D data, which widens the application range of CNNs. Thus, more forms of 3D data have begun to be learned with CNNs.
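The voxelisation pipeline used in this study is described later; purely as an illustration of how a part mesh can be turned into such an occupancy grid (the library choice, file name, and exact API usage here are assumptions, not the authors' implementation), one possible sketch in Python is:

```python
import numpy as np
import trimesh  # assumed third-party library for mesh handling

def mesh_to_voxels(stl_path, resolution=64):
    """Convert a part mesh (e.g. an STL file) into a binary occupancy grid
    of shape (resolution, resolution, resolution). Illustrative sketch only."""
    mesh = trimesh.load(stl_path)                 # assumes a single watertight mesh
    # Choose the voxel pitch so the longest bounding-box edge spans `resolution` cells
    pitch = mesh.extents.max() / resolution
    voxels = mesh.voxelized(pitch=pitch).fill()   # surface voxelisation + interior fill
    occupancy = voxels.matrix.astype(np.float32)  # dense boolean grid -> float
    # Pad/crop to an exact cube so every sample has the same input shape
    grid = np.zeros((resolution, resolution, resolution), dtype=np.float32)
    s = tuple(min(a, resolution) for a in occupancy.shape)
    grid[:s[0], :s[1], :s[2]] = occupancy[:s[0], :s[1], :s[2]]
    return grid
```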

2D-CNN
A CNN is a feedforward neural network with convolutional computations and a deep structure, and is one of the representative algorithms of deep learning [29,30]. Its artificial neurons respond to stimuli within a local receptive field, which gives excellent performance in processing large 2D images and 3D models. Fig. 1 shows a typical CNN structure consisting of three convolution layers, three pooling layers, and two fully connected layers. Each convolution layer cooperates with a pooling layer to form a convolution group, and these groups extract features layer by layer.
Regression is then achieved through several fully connected layers, as sketched below.
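A minimal sketch of this typical structure, assuming an input size of 224 × 224 × 3 and arbitrary channel widths (neither is stated as the paper's exact configuration), could be written in Keras as:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_basic_cnn(input_shape=(224, 224, 3)):
    """Three convolution-pooling groups followed by two fully connected
    layers; a single linear unit outputs the regressed cost. Sketch only."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),   # first fully connected layer
        layers.Dense(1),                        # regression output: estimated cost
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mape"])
    return model
```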
As the number of layers and the complexity of a CNN model increase, the error rate of the model decreases. However, training a complex CNN requires a large amount of input data and can take several days. Transfer learning can solve the problem of a long training time [32]. VGG [33] was developed to investigate the relationship between the depth of a CNN and its performance; by repeatedly stacking small 3 × 3 convolution kernels and 2 × 2 max-pooling layers, CNNs of 16 to 19 layers were successfully constructed. The structure of the VGG16 network is shown in Fig. 3.
Here, $m_t$ and $v_t$ represent the first- and second-moment estimates of the gradient, respectively. The stochastic gradient descent method randomly extracts a group of samples and updates the parameters based on those samples.
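For reference, the standard update rules in which these moment estimates appear (assuming the conventional Adam formulation, which is not restated in full here) are

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2, \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{\,t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}, \\
\theta_t &= \theta_{t-1} - \frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon},
\end{aligned}
$$

where $g_t$ is the gradient at step $t$, $\beta_1$ and $\beta_2$ are exponential decay rates, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.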

Image data
The input of the 2D CNN is a picture of a mechanical part. The size of the image is

Experiments
In this study, a CNN was used to estimate the manufacturing costs of the parts. There were more than 12 types of parts, including guide shafts, guide-shaft bearings, positioning guide shafts, fixing rings, rollers, insertion pins, and metal gaskets. Some of the parts are shown in Table 6. The dataset was split into three subsets: training, validation, and testing datasets (60%, 20%, and 20%, respectively). The model was trained using the training dataset and validated using the validation dataset; the specific amounts of data are shown in Table 7. Google's deep-learning framework TensorFlow was applied. The experiments were conducted on the Ubuntu 18.04 operating system, and the computer had the following specifications: an i9 processor, a GeForce 2080Ti graphics card (11 GB of memory), and a 1.2-TB hard drive. The CNN data flow was designed in TensorFlow using Python, as sketched below. The image files and voxel data were input into the 2D CNN and 3D CNN, respectively, as training data for estimating the part manufacturing costs. The convolution kernels were initialised with small random values. Table 8 shows the network structure parameters and the training hyperparameters applied in each experiment.
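A minimal sketch of the data split and input pipeline described above, assuming the samples are stored as NumPy arrays (the file names and array layout are hypothetical, not the authors' data format):

```python
import numpy as np
import tensorflow as tf

# Hypothetical files: `part_inputs.npy` holds image or voxel tensors,
# `part_prices.npy` holds the corresponding manufacturing costs.
inputs = np.load("part_inputs.npy")
prices = np.load("part_prices.npy")

# Shuffle once, then split 60% / 20% / 20% into training / validation / testing sets.
rng = np.random.default_rng(seed=0)
order = rng.permutation(len(inputs))
inputs, prices = inputs[order], prices[order]
n_train, n_val = int(0.6 * len(inputs)), int(0.8 * len(inputs))
x_train, y_train = inputs[:n_train], prices[:n_train]
x_val, y_val = inputs[n_train:n_val], prices[n_train:n_val]
x_test, y_test = inputs[n_val:], prices[n_val:]

# Wrap the arrays as TensorFlow datasets for training.
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1024).batch(32)
val_ds = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(32)
```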

Training Pictures
Image files were input into the 2D CNN, as shown in Table 2. Before training, the data were normalised. In this study, the pretrained VGG16 model was used, and the last fully connected layer was replaced with a single-unit dense layer. The ReLU activation function was used, and the network was fine-tuned. The effects of freezing different numbers of VGG16 convolution layers, namely the first 12, 18, and 24 layers, as well as using the original model, on the training were examined. The frozen layers were used as feature extractors without training, whereas the parameters of the remaining layers were fine-tuned during training. The initial learning rate of the 2D CNN was 10⁻⁴, and the minimum learning rate was 10⁻⁶. When the model did not improve for five consecutive epochs, the learning rate was multiplied by a coefficient of 0.05. When the model did not improve for 40 consecutive epochs, training was stopped. The convergence of the loss function for each training run is shown in Figs. 6-9.
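A hedged sketch of this fine-tuning setup in Keras follows (the layer counting, input size, and epoch limit are illustrative assumptions; only the frozen-layer idea, the single-unit regression head, and the learning-rate and early-stopping rules follow the description above):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_finetuned_vgg16(n_frozen=18, input_shape=(224, 224, 3)):
    """Pretrained VGG16 backbone with its first `n_frozen` layers frozen as
    fixed feature extractors and a single-unit regression head. Sketch only."""
    base = tf.keras.applications.VGG16(include_top=False,
                                       weights="imagenet",
                                       input_shape=input_shape)
    for layer in base.layers[:n_frozen]:
        layer.trainable = False
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1),                      # single-unit dense layer for the cost
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse", metrics=["mape"])
    return model

# Multiply the learning rate by 0.05 after 5 epochs without improvement
# (down to 1e-6), and stop training after 40 epochs without improvement.
lr_schedule = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.05,
                                          patience=5, min_lr=1e-6)
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=40)

model = build_finetuned_vgg16(n_frozen=18)
# model.fit(train_ds, validation_data=val_ds, epochs=200,
#           callbacks=[lr_schedule, early_stop])
```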

Training voxel data
In this experiment, voxel data with resolutions of 64³, 128³, and 256³ were input into the same 3D CNN. Some of the voxel data are shown in Tables 3-5. The data were normalised prior to training. The ReLU activation function was also used in the network.
The initial learning rate of the 3D CNN was 10⁻⁴, and the minimum learning rate was 10⁻⁶. When the model did not improve for five consecutive epochs, the learning rate was multiplied by a coefficient of 0.05. If the model did not improve for ten consecutive epochs, training was stopped. The convergence of the loss function for each training run is shown in Figs. 10-12. Fig. 13 shows the convergence of the loss function when the number of training data was 400,000 and the resolution was 128³.

The estimation accuracy was evaluated using the mean absolute percentage error (MAPE):

$$ \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \times 100\% \qquad (11) $$

Here, $y_i$ represents the real price of the parts, and $\hat{y}_i$ indicates the price predicted by the trained model.

In general, the accuracy of estimating the part manufacturing cost by applying a 2D CNN to part images was far lower than that of learning the part voxel data with a 3D CNN. The images contained only the 2D surface information of the part, and the internal information of the part was not used; thus, the 2D CNN was unable to learn all of the feature information of the part.
The 3D CNN performed well for the part price estimation regression problem. The voxel data contained all geometric features of the part, and a higher voxel resolution corresponded to a larger amount of part information, which was conducive to 3D CNN learning.
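As an illustration of this type of network (the layer counts and channel widths below are assumptions for the sketch, not the paper's exact architecture), a minimal 3D-CNN regressor over a binary voxel grid could look as follows; the same learning-rate and early-stopping rules as above would apply during training:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_voxel_cnn(resolution=64):
    """3D convolutional regressor over a binary occupancy grid of shape
    (resolution, resolution, resolution, 1). Illustrative sketch only."""
    model = models.Sequential([
        layers.Input(shape=(resolution, resolution, resolution, 1)),
        layers.Conv3D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling3D(2),
        layers.Conv3D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling3D(2),
        layers.Conv3D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling3D(2),
        layers.GlobalAveragePooling3D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),                      # estimated manufacturing cost
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse", metrics=["mape"])
    return model
```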

Conclusions
The problem of evaluating part processing costs was studied using a deep-learning approach, and a CNN-based method for estimating the manufacturing cost from part data was proposed. First, 2D images of the parts were generated as the input of a 2D CNN, and a fine-tuned VGG16 model was used to study the effects of freezing the first 12, 18, and 24 layers of the network, as well as the performance of the original model, on the cost estimation. The accuracy was highest when the first 18 layers of the network were frozen. Moreover, a method for training a 3D CNN with voxel data of the parts was presented; the voxel resolutions of the training data were 64³, 128³, and 256³. As the voxel resolution increased, the MAPE of the cost estimation obtained using the trained model decreased. When the number of training data reached 400,000, the estimation accuracy of the 3D CNN model was higher than when 72,594 training data were applied. This method therefore achieves an accurate part manufacturing cost estimation. Finally, a comparison of the two schemes revealed that the accuracy of estimating the manufacturing cost by training the 2D CNN on part images was significantly lower than that of training the 3D CNN on voxel data of the parts.
Because the images contain only the 2D surface information of the parts and lack internal information, the 2D CNN cannot learn all of the feature information of the parts. Voxel data contain all of the geometric feature information of the parts; thus, the features can be easily learned using a 3D CNN. The method of training a 3D CNN on part voxel data to estimate the cost has potential applications in the current machining industry: it can be used for part quotation, greatly reducing the quotation time and improving the quotation accuracy.
For parts that have no special processing requirements (such as surface roughness and accuracy) but complex machining features, this method is highly applicable to cost estimation. Future research will address how to add part processing requirements to the voxel data to further improve the estimation accuracy. The machining requirements of a part are generally specified in 2D drawings; future work will therefore extract these requirements and add them to the 3D model, so that information on the machining requirements is incorporated during the voxelisation process. By training on voxel data containing the machining requirements of the parts, a highly precise cost estimation of any part can be realised.