A multi-objective approach for improving real-time audio peak reduction and speech clarity in a dental drill noise reduction device

Audio signals typically comprise several characteristics to be improved by signal processing, and in practice each of these characteristics has a unique relationship with the controllable system parameters. Quantifying these relationships in a multi-objective (MO) approach enables an improved system setup. In this paper, two novel objective functions of a complex audio signal are determined for the real-time simultaneous improvement of audio peak reduction and speech clarity in a dental drill noise reduction (DDNR) device. The influence of the DDNR system parameters on peak reduction and speech clarity is determined by combining response surface methodology with a desirability function to enable MO optimisation. The results show an average improvement of nearly 30% over the original DDNR device performance. The approach provides an effective means of addressing MO optimisation in other real-time audio signal processing applications with similar peak reduction and speech clarity objectives, particularly where the physical outcome is not easy to evaluate in a virtual environment.


Introduction
Audio signals typically comprise several characteristics that must be improved simultaneously when processed by a signal processing algorithm. In practice, each of these characteristics is affected by the algorithm's system parameters, requiring a multi-objective (MO) approach. This paper addresses two objectives for an audio signal containing desired speech and undesired dental drill noise, namely peak reduction and speech clarity. An objective function based on key signal characteristics is then determined for each. This MO approach to peak reduction and speech clarity produces improved results over the original design. Section 1.1 describes the context of the dental drill noise reduction (DDNR) problem, focusing on the interaction between dental drill noise and speech phonemes to highlight the risk of degrading communication clarity if broadband noise reduction is employed. Section 1.2 briefly describes the DDNR device and the technique it uses for audio signal processing. Section 2 describes how the objectives and their associated objective functions are defined in the context of DDNR. Section 3 describes the application of the MO approach, including how each objective function is linked to the algorithm parameters. Section 4 discusses the results in terms of the practicality and limitations of the approach, and Section 5 concludes the paper.

Dental drill noise and speech signal
Patient fear of dental drill (also known as dental handpiece) noise is a worldwide issue [1] that leads to avoidance of treatment, particularly by the most severe sufferers, and causes distress to many who undergo treatment despite their fear. An investigation in the UK using the Modified Dental Anxiety Scale (MDAS) [2] suggests that dental drill noise is the most significant cause of patient anxiety, highlighting the need to suppress the noise to improve patient comfort. Noise from a dental drill is generated by its rotating mechanical components. High-speed dental drills are either air turbine-driven or electric motor-driven, and their noise frequencies can range from 2 to 14 kHz [3]. Fig. 1 illustrates a typical electric motor-driven dental drill as used in this study. An intermediate shaft is driven by an electric motor normally operating at 150,000 revolutions per minute (RPM), and the burr shaft (cutting tool) normally operates at 200,000 RPM. These rotational speeds correspond to 150,000/60 = 2,500 Hz and 200,000/60 ≈ 3,333 Hz, so significant noise peaks due to mechanical resonance are generated at around 2.5 and 3.3 kHz.

Fig. 1. Illustration of a typical electric dental drill (Adapted from Long Star Handpiece [4])
In a dental environment, verbal communication between dentist and patient is vital to ensure safe and efficient clinical treatment. When the sounds of the English language are plotted on an audiogram, speech phonemes cluster in a banana shape, often referred to as the 'speech banana' [5], as shown in Fig. 2. It is widely used in the assessment of hearing loss [5,6] but can also indicate the frequency distribution of commonly used speech signals. The figure shows that the majority of speech phonemes lie within the range 250 Hz to 5 kHz. A simplified frequency response of an electric motor-driven dental drill under no load is also shown in Fig. 2. It indicates that the noise to be attenuated, particularly the two significant noise peaks, lies within the frequency spectrum of speech signals. In use, the two drill peaks shift to lower frequencies as the drill comes under load and will therefore interfere with a broader range of speech phonemes than shown.

The dental drill noise reduction (DDNR) device
Commercial active noise cancelling (ANC) devices are widely available. However, they are optimised for high performance and stability at frequencies significantly lower (e.g. <1 kHz [7]) than those of dental drill noise, which is typically >3 kHz. Sound insulation devices such as ear defenders and in-the-ear plugs are also available, but they attenuate a broad range of frequencies and would therefore impair the essential verbal communication between dentist and patient. Thus, neither off-the-shelf ANC headphones nor sound insulation devices are suitable for the target application, which needs to reduce dental drill noise whilst maintaining speech communication. This justifies the need for a DDNR device, whose schematic arrangement is shown in Fig. 3.
The mixed speech and drill noise signal is captured by a microphone embedded in the DDNR device and conditioned by a pre-processing unit. The signal then passes to a core adaptive filtering (AF) signal processing unit for noise peak reduction and maintenance of speech. The output of the AF unit passes to the post-processing unit, in which the signal can be conditioned again and converted back to analogue for a headphone speaker, before finally being received by the ear. The main reason for using AF in the DDNR is that it has been applied successfully in noise cancellation for over five decades and is still very popular owing to its low cost and adjustability [8]. AF algorithms share underlying principles with adaptive control methods in system control [9]. The overall concept of AF is that noise buried within a signal can be estimated and subtracted, so that the desired signal at the input is better received at the output. Unlike fixed filters, filtering and subtraction in AF are controlled by an adaptive process, which reduces the risk of distorting the desired signal or increasing the output noise [10]. AF has been applied to noise cancellation in headphones and proven to be effective at low frequencies [7], but there has been very little research on AF for high-frequency noise reduction. In [11][12][13], AF methods are applied within the domain of signal processing but target low-frequency noise (<1 kHz). The potential of adaptive control algorithms for dealing with complex signals and system responses has been demonstrated (e.g. [14][15][16]), but again high-frequency characteristics of the signal or system output were not addressed. An AF system solely for reducing dental drill noise was proposed by Kaymak et al. [17]. The additional need for improved mixed-signal processing capability and for maintaining the clarity of speech signals motivated the development of the device investigated here.
The Least Mean Square (LMS) algorithm is the most frequently applied method for noise cancellation owing to its simplicity and robustness [18]. Many variants of the LMS algorithm exist, including Normalised LMS (NLMS), which was adopted in the DDNR. NLMS offers steadier and faster convergence because its step size µ is normalised by the input signal power and therefore varies over time. In the DDNR, the NLMS algorithm pre-scales the error to prevent overloading and keep the adaptation stable. An illustrative block diagram of the NLMS AF algorithm developed for the application is shown in Fig. 4. The AF algorithm uses multiple filters, represented by a single block in Fig. 4, with each filter denoted as the ith filter. The pre-processed digitised audio input x(n), containing the desired signal (e.g. speech) together with noise (e.g. drill sound), is stored in a delay buffer to synchronise it with the filters. Meanwhile, x(n) is processed by one of the filters according to its array of filter weights wi(n). The filtered output signal yi(n) is then subtracted from the delayed signal d(n) to obtain an error signal e(n). The weights of each filter are continuously adapted using the NLMS method to minimise e(n). The detailed implementation of the AF algorithm is not disclosed, both for reasons of commercial confidentiality and because it is not the key purpose of the paper. Moreover, although AF parameter values are likely to differ in other applications, the objective functions and MO approach described here can still be used.
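For readers unfamiliar with NLMS, the weight update can be sketched as below. This is the generic textbook single-filter form, not the confidential DDNR implementation: the multi-filter structure, the error pre-scaling and the delay buffer of Fig. 4 are not modelled, and the names `nlms_filter`, `mu` and `eps` are our own. At each sample the update is w ← w + µ e(n) x(n) / (ε + ‖x(n)‖²), where x(n) here denotes the tap-delay vector of the most recent inputs.

```python
def nlms_filter(x, d, num_taps, mu=0.5, eps=1e-8):
    """Generic NLMS adaptive FIR filter (textbook form, not the DDNR algorithm).

    x: input samples; d: desired samples of the same length.
    Returns (y, e, w): filter outputs, errors e = d - y, and final weights.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps              # tap-delay line, newest sample first
    y_out, e_out = [], []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]         # shift in the newest input sample
        y = sum(wk * xk for wk, xk in zip(w, buf))
        e = d[n] - y
        power = sum(xk * xk for xk in buf)
        step = mu / (eps + power)       # normalised, hence time-varying, step size
        w = [wk + step * e * xk for wk, xk in zip(w, buf)]
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out, w
```

In the arrangement of Fig. 4, `d` would play the role of the delayed input d(n) and `y` of a filter output yi(n). As a quick check, driving the filter with white noise passed through a known short FIR response should make the weights converge towards that response.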
To improve filtering performance, approaches such as H∞ methods (e.g. [19,20]) claim to guarantee high performance and good robustness, but they are computationally expensive. In addition, when the AF parameters change, the indirect effect of the unchanged pre- and post-processing units on the resulting noise reduction needs to be considered. As a result, the AF algorithm requires careful tuning that accounts for this uncertainty in order to achieve satisfactory output in the physical domain.

Objectives and objective functions
Subjective and objective measurements are both commonly applied in the assessment of digitally transmitted speech and audio signal quality [21]. Until the 1990s, subjective measurement using human focus groups was the standard approach, which is time-consuming and expensive [22]. Most objective measures aim to predict subjective quality using quantitative models, with a focus on speech quality [23]. For example, the perceptual evaluation of speech quality (PESQ) assesses the quality of the output speech signal by comparing it with the original input speech signal [24]. Such measures are designed around the reduction of background noise (e.g. white and factory noise) [25], which lacks the narrow-band, high-frequency ('peaky') characteristics of drill noise. The primary target of the DDNR device is to eliminate unwanted drill noise peaks, with a secondary target of maintaining effective verbal communication. Therefore, new objective measures for evaluating DDNR are required, and the key characteristics of the processed signal need to be investigated. Frequency domain techniques [12] are applied here because they can analyse the frequency components of a signal individually. Power spectral estimation is used to analyse the original and filtered signals in the frequency domain. The MATLAB Signal Processing Toolbox [26] offers a family of spectral analysis methods, of which Welch's power spectral density (PSD) estimate [27] is used in this paper. The digitised sound recordings are 24-bit audio sampled at 48 kHz. Each PSD plot is produced with a window of 4096 samples and an overlap of 2048 samples (50%), giving a frequency bin spacing of 48,000/4096 ≈ 11.7 Hz. An example plot for the original design (i.e. the nominal AF parameter setting) is shown in Fig. 5. Two key objectives suited to the target purpose of the DDNR device are identified, namely peak reduction and speech clarity.
The analysis focuses on the 0 to 5 kHz range, as this is the region where both the drill noise peaks and the speech signal co-exist. In Fig. 5, the original signal is indicated by a red dotted line and the filtered signal by a black solid line.
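The Welch estimate averages windowed periodograms of overlapping segments. A minimal NumPy sketch with the settings quoted above (48 kHz sampling, 4096-sample segments, 2048-sample overlap) follows. The Hann window and per-segment mean removal are assumptions, since the window type is not stated (MATLAB's `pwelch` defaults to a Hamming window); `scipy.signal.welch(x, fs=48000, nperseg=4096, noverlap=2048)` computes a comparable estimate.

```python
import numpy as np

def welch_psd(x, fs=48_000, nperseg=4096, noverlap=2048):
    """One-sided Welch PSD (density scaling, units**2/Hz) with a Hann window."""
    x = np.asarray(x, dtype=float)
    if len(x) < nperseg:
        raise ValueError("signal shorter than one segment")
    step = nperseg - noverlap
    win = np.hanning(nperseg)
    scale = 1.0 / (fs * np.sum(win ** 2))          # density normalisation
    starts = range(0, len(x) - nperseg + 1, step)
    psd = np.zeros(nperseg // 2 + 1)
    for s in starts:
        seg = x[s:s + nperseg]
        seg = seg - seg.mean()                     # constant detrend per segment
        psd += np.abs(np.fft.rfft(seg * win)) ** 2
    psd *= scale / len(starts)                     # average the periodograms
    psd[1:-1] *= 2.0                               # fold in negative frequencies (even nperseg)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd
```

The dB/Hz values plotted in a figure such as Fig. 5 would then be `10 * np.log10(psd)`.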
For the peak reduction objective (objective 1), two significant peaks, labelled P1 and P2, are observed in the figure; their frequencies correspond to the resonant frequencies (approx. 2.5 and 3.3 kHz) stated in Section 1.1. Rather than taking the absolute summit value of P1 and P2 from the plot, a peak is defined as the difference between its summit value (dB/Hz) and the signal floor (dB/Hz). The signal floor is obtained by taking the mean of the signal power across the range 0.25 to 5 kHz, corresponding to the 'speech banana' region. For example, in Fig. 5 the original peak powers for P1 and P2 are denoted P1o and P2o, with their signal floor denoted Flooro. Similarly, the power of the filtered peak P1' is denoted P1f and that of the filtered peak P2' is denoted P2f, with the new signal floor denoted Floorf.
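Assuming the floor is the arithmetic mean of the PSD values (in dB/Hz) over the N frequency bins fk between 0.25 and 5 kHz, the definitions above can be written as follows, with s = o for the original and s = f for the filtered signal. The symbols S_s (PSD in dB/Hz) and ΔP_{i,s} (peak height above the floor) are introduced here for illustration only.

```latex
\mathrm{Floor}_s = \frac{1}{N} \sum_{k:\; 0.25\,\mathrm{kHz} \le f_k \le 5\,\mathrm{kHz}} S_s(f_k),
\qquad
\Delta P_{i,s} = P_{i,s} - \mathrm{Floor}_s,
\qquad s \in \{o, f\}
```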
Using this rationale, the objective function ObjPi for the peak reduction of each peak Pi is obtained by subtracting the filtered peak power from the original peak power, both measured relative to their respective signal floors, as shown in Equation (1), where i denotes the ith peak to be suppressed in the PSD plot; for the drill noise shown in Fig. 5, i = 1 for P1 and i = 2 for P2. For the speech clarity objective (objective 2), an inverse measure based on speech signal loss is used as the objective function, defined as the difference between the original signal floor Flooro and the filtered signal floor Floorf. The mathematical expression for speech clarity, ObjS, is shown in Equation (2).
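Reading Equations (1) and (2) from the definitions above as ObjPi = (Pio − Flooro) − (Pif − Floorf) and ObjS = Flooro − Floorf, both objectives can be sketched on sampled PSDs (lists of dB/Hz values at frequencies in Hz). The ±100 Hz search window around each expected peak frequency and all function names are assumptions for illustration.

```python
def signal_floor(freqs, psd_db, f_lo=250.0, f_hi=5000.0):
    """Mean PSD level (dB/Hz) over [f_lo, f_hi], the assumed 'signal floor'."""
    band = [p for f, p in zip(freqs, psd_db) if f_lo <= f <= f_hi]
    return sum(band) / len(band)

def peak_level(freqs, psd_db, f_centre, half_width=100.0):
    """Summit value (dB/Hz) within +/- half_width Hz of an expected peak."""
    return max(p for f, p in zip(freqs, psd_db) if abs(f - f_centre) <= half_width)

def obj_peak(freqs, psd_orig, psd_filt, f_centre):
    """ObjPi: reduction of the peak height above the floor (larger is better)."""
    rel_o = peak_level(freqs, psd_orig, f_centre) - signal_floor(freqs, psd_orig)
    rel_f = peak_level(freqs, psd_filt, f_centre) - signal_floor(freqs, psd_filt)
    return rel_o - rel_f

def obj_speech(freqs, psd_orig, psd_filt):
    """ObjS: drop in signal floor, an inverse clarity measure (smaller is better)."""
    return signal_floor(freqs, psd_orig) - signal_floor(freqs, psd_filt)
```

For the drill noise of Fig. 5, `obj_peak` would be called with `f_centre` near 2500 Hz for P1 and near 3300 Hz for P2. Note that the peak itself contributes to the band mean, which is the effect the 'frequency data points' alternative in the Discussion tries to avoid.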

Multi-Objective approach
An overview of the MO approach taken for improving real-time audio peak reduction and speech clarity in the DDNR device is shown in Fig. 6. With the two objectives represented by objective functions (Equations (1) and (2)), the next step is analysis and optimisation, starting with an exploration of the influence of the AF algorithm parameters. One way of investigating this is through analytical models evaluated by computer-based simulation, e.g. in MATLAB. Analytical models work well when the problem is relatively simple, but real-world problems are typically too complex for analytical methods. In this study, the device design is complex and its actual acoustic performance at the headphone output to the ear is the primary interest, which makes physical experiments more suitable than analytical models. The number of physical experiments can be kept to a minimum by employing Design of Experiments (DoE), a series of tests in which inputs are varied purposefully to observe their influence on the outcome of a process or system [28]. DoE enables regression models of the objective functions to be obtained and hence output responses to be predicted for given AF parameter values. Response Surface Methodology (RSM) builds on DoE, using statistical techniques to fit surfaces relating responses to several inputs in order to find an optimum configuration [29]. RSM has broad application in fields such as chemical [30], biomedical [31] and engineering sciences [32,33], but its use for improving noise cancellation performance is barely addressed in the literature. Despite criticism regarding the time and resources required [34], RSM can reveal the complex relationships of the physical system, which makes it a suitable candidate for this study. In the context of the DDNR device, the configurable AF parameters correspond to RSM variables, while the scalar values obtained by applying the objective functions correspond to RSM responses.
For the peak reduction objective, there are two noise peaks, P1 and P2, to be attenuated, whose responses RP1 and RP2 are both obtained using the same objective function ObjPi. The speech clarity objective, ObjS, is represented by one response, RS. Each of these three responses has its own regression model determined by RSM (Equations (4) to (6)). These models need to be minimised, maximised or driven towards a specific target by finding a combination of appropriate variable settings. However, changing the variables affects all three responses, resulting in an MO optimisation problem.
The complexity of the relationships between AF parameters and objective functions requires a second-order model, whose general expression is shown in Equation (3). In the equation, the response is represented by y, k is the number of variables, and i and j are indices of the variables. β0 is the model intercept, and βi, βii and βij are the coefficients of xi, xi² and xixj respectively. ε represents the model error.
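From these definitions, the full second-order model of Equation (3) can be written in its standard form as below (our rendering). For the k = 4 variables used later, it has 1 + 4 + 4 + 6 = 15 coefficients.

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2
    + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \varepsilon
```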
Box-Behnken Design (BBD) and Central Composite Design (CCD) are commonly employed RSM methods. CCD can be further categorised into Circumscribed, Inscribed and Faced CCD. Detailed explanations of these methods can be found in [35]. The selection of DoE methods is a case-dependent decision.
Regression analysis examines the adequacy of the regression models obtained from RSM and decides whether they are suitable for predicting responses with reasonable statistical accuracy. The desirability function approach introduced by Derringer and Suich [36] is adopted for multi-objective optimisation owing to its successful use in a range of applications such as manufacturing, chemistry and biology [31,37,38]. The desirability function first converts system performance (RSM responses) into desirability ratings and then uses a single aggregated overall desirability to enable simultaneous optimisation. The desirability function needs to be configured before it is applied to the problem. Optimal variable settings delivering the highest overall desirability rating can then be identified. Validation then needs to be carried out to verify the optimisation outcome.
When applying the MO approach, engineering decisions that contribute to a successful outcome need to be made. For example, the experimental setup should resemble the actual working environment as closely as possible. Therefore, a calibrated GRAS 43AG-1 Ear & Cheek Simulator (ECS) [39] was used to mimic a section of a human head and ear canal, representing the acoustic characteristics of an actual ear. The ECS contains a GRAS RA0045 externally polarised ear simulator conforming to IEC 60318-4, which works very effectively below 10 kHz [40]. The typical transfer impedance of the RA0045 [40] shows that, within the frequency range of interest (0 to 5 kHz), the ear simulator is acceptable for evaluating noise reduction of an electric motor-driven dental drill. The ECS uses a GRAS 40AG 1/2" externally polarised pressure microphone [41] conforming to IEC 61094-4, whose typical frequency response shows that it is suitable up to 20 kHz. A GRAS KB0065 Large Right KEMAR Pinna [42] is also used.
The ECS is powered by a GRAS 12AD unit, which is also the hardware interface between the ECS and a PC. A high-performance Behringer MS16 16-Watt monitor speaker, used at a constant volume setting, played back sound recordings representing the sound source. The speaker's frequency response covers the 80 Hz to 20 kHz range [43], which is again suitable for this study. A Windows PC was used to capture the sound data transmitted through the ECS, and MATLAB applications were developed for data recording and analysis.
The effectiveness of the AF algorithm is assessed by comparing its output with the original combined drill noise and speech signal. Both sounds are transmitted through the DDNR device and its output becomes the signal that is analysed.
The experimental setup for this study is shown in Fig. 7: a schematic illustration of the device operating environment is given in Fig. 7a and the actual experimental layout in Fig. 7b. The device is intended to be placed on the patient's chest while they lie on the treatment chair. An over-the-ear headphone, plugged into the device and worn by the patient, minimises direct acoustic transmission from the noise source to the ear. During treatment, sound is generated simultaneously by both the dental drill and the dentist. The nominal distance from the device microphone to the dental drill and to the dentist is estimated as 250 mm. The distance from the patient's mouth, where the drill will be located, to the ear is estimated as 100 mm horizontally and 50 mm vertically.
In the implemented AF algorithm (see Fig. 4), four adjustable parameters, labelled in brackets in the figure, were identified and used as RSM variables. Again, owing to commercial confidentiality, only an outline explanation is provided here, and these four parameters are referred to as x1 to x4 from now on. Note that these AF parameters are algorithm dependent, so different AF methods may have different variables to explore. Instability of the algorithm output can be observed both in the MATLAB application and from the hardware LED indicators. When the algorithm output becomes unstable during the preliminary experiments, adjustments are made by either increasing the lower limit or decreasing the upper limit of the variables. When all 16 (2^4) combinations of variable extremes yield reasonable filtering output, an attainable design space is established.
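The planning check above can be sketched as enumerating the 2^4 corner settings and reporting those that fail. The stability test `is_stable` is a hypothetical callback standing in for the manual judgement used in the study (MATLAB application and LED indicators); the function name and any limits passed in are illustrative.

```python
from itertools import product

def unstable_corners(lower, upper, is_stable):
    """Return every corner setting (each variable at its lower or upper limit)
    for which the supplied stability check fails; 2**k corners are tested."""
    corners = product(*zip(lower, upper))   # e.g. 16 corners for 4 variables
    return [corner for corner in corners if not is_stable(corner)]
```

If the returned list is non-empty, the limits would be narrowed (a lower limit raised or an upper limit lowered) and the check repeated until it comes back empty.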
The lower and upper limits of the variables identified in the Planning stage are close to their true limits, so any value outside these ranges poses an unacceptable risk of crashing the system. As a result, the inscribed CCD [35] is selected, both to keep the experimental plan within the variable limits and for its capability to model quadratic behaviour of the responses. An inscribed CCD created using the MATLAB ccdesign function for 4 variables requires 24 experimental runs (16 factorial and 8 axial points) plus 12 centre runs (see Appendix 1). These 36 runs are less than 6% of the 625 runs (5^4) of an equivalent five-level full factorial experiment. As each run takes approximately 3 min, applying the inscribed CCD saves approximately 30 hours in total ((625 − 36) × 3 min ≈ 29.5 h). Coded variables, with -1 for the lower limit and +1 for the upper limit, are used owing to commercial confidentiality. The run order is randomised to avoid biased responses. The results collected for each inscribed CCD run can be found in Appendix 1.
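An inscribed CCD in coded units can be constructed as sketched below: the axial points sit on the variable limits (±1), and the factorial corners are shrunk by 1/α so that no run leaves the attainable cube. The rotatable choice α = (2^k)^(1/4) = 2 is an assumption, MATLAB's ccdesign may order points differently, and the seeded shuffle is only an illustration of run-order randomisation.

```python
import math
import random
from itertools import product

def inscribed_ccd(k=4, n_centre=12, alpha=None):
    """Coded inscribed CCD: 2**k scaled factorial points, 2k axial points, centre runs."""
    if alpha is None:
        alpha = math.sqrt(math.sqrt(2 ** k))  # rotatable alpha (assumed): 2.0 for k = 4
    factorial = [[v / alpha for v in corner] for corner in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign                   # axial points lie on the variable limits
            axial.append(point)
    centre = [[0.0] * k for _ in range(n_centre)]
    return factorial + axial + centre

def randomised(design, seed=0):
    """Return a copy of the design with the run order shuffled."""
    runs = [row[:] for row in design]
    random.Random(seed).shuffle(runs)
    return runs

design = inscribed_ccd()
n_runs = len(design)                          # 16 + 8 + 12 = 36
full_factorial = 5 ** 4                       # five coded levels per variable
hours_saved = (full_factorial - n_runs) * 3 / 60
print(n_runs, round(n_runs / full_factorial, 4), round(hours_saved, 2))  # prints: 36 0.0576 29.45
```

Each variable takes the five coded levels -1, -0.5, 0, 0.5 and 1, which is why the comparison is with a 5^4 full factorial.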
The MATLAB fitlm function [44] is used to fit full quadratic regression models to the results collected for the three responses defined by Equations (1) and (2). A summary of the regression analysis can be found at the bottom of Appendix 1, and detailed regression results for each response, including the estimated term coefficients, can be found in Appendix 2. The second-order regression models (linear in their coefficients) derived for the three responses are given as Equations (4) to (6). RP1 and RP2 represent the fitted responses of the peak reduction objective for the first and second noise peaks, and RS represents the fitted response of the speech clarity objective. x1 to x4 are the four RSM variables identified in the previous section. Appendix 1 shows that all three regression models are statistically significant, as all p-values are well below 0.05. The R-squared values of the three models indicate that they explain the variability in the responses well, i.e. they represent the empirical relationships between the AF parameters and noise reduction performance.
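What a full quadratic fit computes can be sketched in NumPy by building the design matrix of Equation (3) explicitly and solving by least squares. This illustrates the technique rather than reproducing fitlm, which also reports p-values and orders its terms differently; the function names are our own.

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    """Columns: 1, x_i, x_i**2, then x_i*x_j for i < j (terms of Equation (3))."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]                                # linear terms
    cols += [X[:, i] ** 2 for i in range(k)]                           # pure quadratic terms
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]  # interactions
    return np.column_stack(cols)

def fit_quadratic(X, y):
    """Ordinary least-squares estimates of the 1 + 2k + k(k-1)/2 coefficients."""
    A = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta

def predict_quadratic(beta, X):
    """Fitted responses for the coded settings in X."""
    return quadratic_design_matrix(X) @ beta
```

With the 36 coded runs of the inscribed CCD as `X` and one response column as `y`, `fit_quadratic` returns 15 coefficients in the order intercept, x1 to x4, x1² to x4², then the six interactions.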
[Equations (4) to (6): regression models for RP1, RP2 and RS; only the final interaction terms of the last model are legible: … 0.09x1x4 − 1.72x2x3 − 0.32x2x4 − 0.14x3x4]
These regression equations are used to predict the responses for the designed experiments, and the predictions are compared with the observed responses for validation. The results are shown in Appendix 3, where the columns for variables x1 to x4 are omitted to save space (see Appendix 1). A noticeably larger difference between predicted and observed responses is found for RP1 than for RP2 and RS. The regression analysis leads to a similar conclusion: the RP1 model has a larger root mean square error and smaller R-squared and adjusted R-squared values. Since all three regression models are statistically significant, as reflected in their small p-values, and have reasonable R-squared and adjusted R-squared values, they were accepted for the multi-objective optimisation in the next step.
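The statistics used in this comparison can be computed from observed and predicted responses as sketched below. Adjusted R-squared and root mean square error use n − p residual degrees of freedom, where p counts all model coefficients including the intercept (p = 15 for the full quadratic model in four variables, leaving 21 degrees of freedom for 36 runs); this follows the usual regression convention rather than a detail stated in the text.

```python
import math

def fit_statistics(observed, predicted, n_coeffs):
    """R^2, adjusted R^2 and RMSE for a model with n_coeffs coefficients (incl. intercept)."""
    n = len(observed)
    mean_y = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_y) ** 2 for o in observed)                # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_coeffs)
    rmse = math.sqrt(sse / (n - n_coeffs))
    return r2, adj_r2, rmse
```

A larger RMSE together with smaller R-squared and adjusted R-squared, as reported for RP1, indicates a poorer fit relative to the other two responses.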
When the desirability function is applied, each response y to be optimised is converted into an individual desirability d whose value lies in the range 0 to 1. If y reaches its target value, d equals 1; if y is outside its acceptable range, d equals 0. Three categories of desirability function are available, depending on the response target type, as shown in Table 1. Larger-the-better (LTB) means that the response becomes more desirable as its value increases, while smaller-the-better (STB) means that the response should be optimised towards a smaller value. Nominal-the-best (NTB) indicates that there is a specific target value to be achieved for a response, and values either too large or too small reduce the desirability. In Table 1, d is the desirability rating of the response to be optimised and y is the quantitative measure of the response. L is the lower limit of the acceptable range of y, applicable to the LTB and NTB scenarios. U is the upper limit of the acceptable range of y, applicable to the STB and NTB scenarios. T is the target value of y, used in all scenarios. r, r1 and r2 are user-specified factors (> 0). When they equal 1, the desirability function rises linearly from 0 at the bounds to 1 at the target. When they are larger than 1, the individual desirability remains relatively small unless the response gets very close to its target value. In other words, the higher the values of r, r1 and r2, the greater the importance of the response being close to its target [45]. A more detailed explanation and examples of desirability functions with different r values can be found in [46].
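The three desirability types can be sketched as piecewise functions in the standard Derringer and Suich form as we understand it. Table 1's formulas are not reproduced in this text, so treat these as an illustration consistent with the description above rather than a transcription of the table.

```python
def d_larger_better(y, L, T, r=1.0):
    """LTB: 0 at or below L, 1 at or above T, ((y - L)/(T - L))**r in between."""
    if y <= L:
        return 0.0
    if y >= T:
        return 1.0
    return ((y - L) / (T - L)) ** r

def d_smaller_better(y, T, U, r=1.0):
    """STB: 1 at or below T, 0 at or above U, ((U - y)/(U - T))**r in between."""
    if y <= T:
        return 1.0
    if y >= U:
        return 0.0
    return ((U - y) / (U - T)) ** r

def d_nominal_best(y, L, T, U, r1=1.0, r2=1.0):
    """NTB: rises from L to T with exponent r1, falls from T to U with exponent r2."""
    if y < L or y > U:
        return 0.0
    if y <= T:
        return ((y - L) / (T - L)) ** r1
    return ((U - y) / (U - T)) ** r2
```

With r = 1 a response halfway between its limit and its target scores 0.5; raising r to 2 lowers that score to 0.25, which is the sense in which larger r places more weight on getting close to the target.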
When more than one response is to be optimised, an individual desirability rating di is obtained for each response yi. When multi-objective optimisation is required, i.e. multiple responses need to be optimised at the same time, an overall desirability D is calculated as the geometric mean of the individual desirabilities (Equation (10)). Given the definitions in Equations (1) and (2), objective peak reduction is larger-the-better (LTB), whilst objective speech clarity, which is represented by an inverse measure, is smaller-the-better (STB).
Table 1. Desirability function types: larger-the-better (LTB), smaller-the-better (STB) and nominal-the-best (NTB). y is the response to be optimised, L is the lower limit, U is the upper limit, T is the target, and r, r1 and r2 are the user-specified factors.
The proposed upper (U), lower (L) and target (T) values for the three responses are presented in Table 2. Physical experiments were conducted to determine viable values for these limits. The lower limit for the Peak 1 reduction response RP1 is set to 5 dB/Hz and the lower limit for the Peak 2 reduction response RP2 to 15 dB/Hz. Targets of 15 dB/Hz and 30 dB/Hz are used for RP1 and RP2 respectively. For the speech clarity response RS, an upper limit of 10 dB/Hz is set, with a target of 0 dB/Hz. The factor r is set to 1 for all desirability functions, which assumes a linear improvement in performance from the limits to the target values.
After the individual desirability di has been obtained for each response, the overall desirability D for all possible combinations of the four variable settings across the entire design space can be estimated using Equation (10) with m equal to 3, as expressed in Equation (11).
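A sketch of Equations (10) and (11) with the Table 2 limits and r = 1 follows, taking Equation (10) as the geometric mean D = (d1 d2 … dm)^(1/m) described in the Discussion. The exhaustive grid search mirrors the idea of evaluating D across the coded design space; the default step of 0.1 is an assumption suggested by the reported optimum, and because Equations (4) to (6) are not reproduced here, the `models` argument (mapping a coded setting to RP1, RP2 and RS) must be supplied by the reader.

```python
from itertools import product

def d_ltb(y, L, T):
    """Linear (r = 1) larger-the-better desirability, clipped to [0, 1]."""
    return min(1.0, max(0.0, (y - L) / (T - L)))

def d_stb(y, T, U):
    """Linear (r = 1) smaller-the-better desirability, clipped to [0, 1]."""
    return min(1.0, max(0.0, (U - y) / (U - T)))

def overall_desirability(r_p1, r_p2, r_s):
    """Geometric mean of three desirabilities using the Table 2 limits and targets."""
    d = [d_ltb(r_p1, 5.0, 15.0), d_ltb(r_p2, 15.0, 30.0), d_stb(r_s, 0.0, 10.0)]
    return (d[0] * d[1] * d[2]) ** (1.0 / 3.0)

def grid_search(models, step=0.1):
    """Evaluate D on a regular grid over the coded cube [-1, 1]**4; return the best point.
    `models(x)` must return (R_P1, R_P2, R_S) for the coded setting x."""
    n = round(2.0 / step)
    levels = [round(-1.0 + i * step, 10) for i in range(n + 1)]
    best_x, best_d = None, -1.0
    for x in product(levels, repeat=4):
        d = overall_desirability(*models(x))
        if d > best_d:
            best_x, best_d = x, d
    return best_x, best_d
```

With a step of 0.1 the search evaluates 21^4 = 194,481 settings, which is small enough to run exhaustively.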
The configuration of x1 to x4 that yields the highest D value should indicate the optimal setting for the AF algorithm. The largest overall desirability, D = 0.6906, is obtained with the variables set to [x1 = 0.1, x2 = 0.6, x3 = 0.6, x4 = -1]. To visualise the results, a 3D response surface and contour plot of D with respect to x1 and x2 are shown in Fig. 8, with x3 set at 0.6 and x4 at -1. In Fig. 8, larger D values indicate better overall desirability and therefore better overall noise reduction performance according to the three responses established.
The optimal variable setting [0.1, 0.6, 0.6, -1] is applied to the AF algorithm to validate the optimisation outcome in the physical domain, and the noise reduction results are obtained in the form of a PSD plot, shown in Fig. 9. The quantified optimised responses are presented in Table 3. The fairly small differences between predicted and observed values for the three responses support the adequacy of the regression models in Equations (4) to (6). Comparing the experimental results with the predicted values, the overall desirability D differs by 11%, which indicates the regression model error. The bottom of Table 3 shows the actual improvement of the optimal configuration over the original settings in the physical domain. The nominal values of the three responses, i.e. the quantitative performance of the original design shown in the table, are obtained by averaging the experimental results of the 12 centre runs of the inscribed CCD. Optimisation achieves an average improvement of 29.6% across the three responses, with substantial improvements of 34% in RP1 and 51% in RS and a slight improvement of 4% in RP2.
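Because RS is smaller-the-better, its improvement has to be counted as a decrease from the nominal value. One plausible way of computing per-response percentage improvements and their average is sketched below; the exact formula behind Table 3 is not stated, and the function names are our own.

```python
def improvement_pct(nominal, optimised, larger_is_better=True):
    """Percentage improvement relative to the nominal value; positive means better."""
    change = optimised - nominal if larger_is_better else nominal - optimised
    return 100.0 * change / abs(nominal)

def mean_improvement(rows):
    """Average improvement over rows of (nominal, optimised, larger_is_better)."""
    values = [improvement_pct(*row) for row in rows]
    return sum(values) / len(values)
```

Here RP1 and RP2 would be passed with `larger_is_better=True` and RS with `larger_is_better=False`.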

Discussion
Compared with previous research on dental drill noise reduction, the outcome of this study is encouraging. For example, in [17] an AF algorithm is applied to suppress a single noise peak at 4.2 kHz from an air turbine-driven dental drill, with no speech present, and in [47,48] AF is applied to reduce the noise of an electric dental drill running at 3.33 kHz, again with no speech signals included in the experiment. The simulation results shown in [3] look promising but are not validated by physical experiments. More importantly, these studies deal with only a single objective, namely peak reduction. In this work, all three responses are optimised together as an MO problem, with an average improvement of 29.6% (Table 3). This demonstrates the potential of the approach developed.
When applying this MO approach, identifying the objectives and defining the objective functions is a crucial step. Different objective functions steer the optimisation in different directions and hence can lead to distinctly different outcomes. For example, Fig. 10 shows the outcome of applying two different AF parameter configurations, corresponding to experiment runs No. 1 (black solid line) and No. 5 (blue dashed line). If a simpler objective function ObjPi* is defined as the direct reduction in peak power from the original to the filtered signal (ObjPi* = power of Pi − power of Pi'), the two modified responses, denoted RP1* and RP2*, yield nearly identical values for the two runs. In other words, ObjPi* rates peak reduction as almost the same for the two experiments, despite the obvious differences in the overall signal. This shortcoming is confirmed by a regression analysis of RP1* and RP2*, whose results are shown in Table 4. Noticeably worse regression statistics, e.g. R² and adjusted R², are observed, especially for RP2*, for which a non-significant model (p > 0.05) is obtained. Fig. 10 also shows that ObjPi* does not necessarily represent an actual reduction of the peaks, as the relative peak power can remain significant compared with the signal floor, e.g. in experiment No. 5 (blue dashed line). This means that the 'peakiness' of the filtered signal remains despite the overall power of the signal being reduced. The peak reduction objective defined in this study addresses the perceived reduction in noise peaks by taking the signal floor into account: a peak is only suppressed when its power relative to the signal floor is reduced, e.g. in experiment run No. 1 (black solid line).
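The difference between the two peak measures can be made concrete with a toy calculation. The dB/Hz values below are hypothetical and are not read from Fig. 10: in run A only the peak drops, in run B the whole spectrum drops by the same amount, yet ObjPi* scores both runs identically.

```python
def obj_peak_relative(peak_o, floor_o, peak_f, floor_f):
    """ObjPi: reduction of the peak height above the signal floor (dB/Hz)."""
    return (peak_o - floor_o) - (peak_f - floor_f)

def obj_peak_absolute(peak_o, peak_f):
    """ObjPi*: direct drop in summit power, ignoring the signal floor."""
    return peak_o - peak_f

# Hypothetical values (dB/Hz), not taken from Fig. 10:
run_a = dict(peak_o=-20.0, floor_o=-60.0, peak_f=-35.0, floor_f=-60.0)  # only the peak drops
run_b = dict(peak_o=-20.0, floor_o=-60.0, peak_f=-35.0, floor_f=-75.0)  # everything drops

for name, run in (("A", run_a), ("B", run_b)):
    print(name, obj_peak_absolute(run["peak_o"], run["peak_f"]), obj_peak_relative(**run))
```

Both runs give ObjPi* = 15, but only run A reduces the peak relative to the floor (ObjPi = 15 versus 0 for run B), which is the 'peakiness' distinction drawn above.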
When defining the objective function for speech clarity, the initial approach selected specific 'frequency data points' from the PSD plot within the region of interest but away from the two identified noise peaks (2.5 and 3.3 kHz), namely 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 4.5 and 5 kHz, to eliminate the effect on signal power of filtering the noise peaks. This was because the narrowband peaks were expected to affect the measured difference between the signal floors. Comparing the 'frequency data points' approach with the signal floor difference approach used in this study gave a 14% difference in the values of the objective function ObjS. Different responses RS collected for ObjS could result in different regression equations and hence a different optimal variable configuration. However, in this study the simplest measure, the signal floor difference, was effective enough to yield meaningful outcomes. Where signal differences at specific frequencies are of interest, the 'frequency data points' approach is more practical.
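The 'frequency data points' alternative can be sketched as sampling the original and filtered PSDs at the listed frequencies and averaging their differences. How the points were combined is not stated, so the simple mean and nearest-bin lookup used here are assumptions, and the function names are our own.

```python
def level_at(freqs, psd_db, f_target):
    """PSD value (dB/Hz) at the bin nearest to f_target (Hz)."""
    k = min(range(len(freqs)), key=lambda i: abs(freqs[i] - f_target))
    return psd_db[k]

def obj_speech_points(freqs, psd_orig, psd_filt,
                      points=(250, 500, 750, 1000, 1500, 2000, 3000, 4000, 4500, 5000)):
    """Mean level drop at selected frequency data points away from the drill peaks."""
    diffs = [level_at(freqs, psd_orig, f) - level_at(freqs, psd_filt, f) for f in points]
    return sum(diffs) / len(diffs)
```

On a spectrum where filtering lowers the broadband level by 2 dB/Hz and also attenuates a peak at 2.5 kHz, this measure returns 2, whereas a band-mean floor difference would also absorb part of the peak reduction.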
Response desirability aggregates multiple responses into a single measure that is easy to calculate and interpret, as found in the literature [37], and the calculations are easily carried out in MATLAB or similar environments. The formula used for the overall desirability D (see Equation (10)) treats all responses as equally important by taking the geometric mean of the individual desirabilities di. This formulation is sufficient in most cases; however, there are situations where unequal weights are preferred to reflect differences in the importance of the responses [38]. An example expression is shown in Equation (15).
where wi indicates the relative importance of each response, with each wi greater than 0 and all weights summing to 1, i.e. w1 + w2 + … + wn = 1 for n responses. If the relative importance of the responses were taken into account in this study, the outcome of the optimisation would be affected. To investigate this, a sensitivity analysis was carried out in which the relative weights of two responses were varied between 0.1 and 0.8 in steps of 0.1, while the weight of the third response was fixed at 0.1. The results are shown in Fig. 11. Fig. 11a illustrates the variation of overall D as the RP2 weight decreases from 0.8 to 0.1 (indicated by a shrinking stacked area) while the RS or RP1 weight simultaneously increases from 0.1 to 0.8 (indicated by a growing stacked area). It can be seen that, as the RP2 weight drops, D tends to increase as the RS weight increases (red squares), whilst D tends to decrease as the RP1 weight increases (blue triangles). Fig. 11b shows the variation of overall D as the RS weight decreases from 0.8 to 0.1 (shrinking stacked area) while the RP1 or RP2 weight simultaneously increases from 0.1 to 0.8 (growing stacked area). Overall D drops in both cases as the RS weight decreases, with a more rapid decline when the RP1 weight increases (purple diamonds). Fig. 11a also shows that when the RP1 weight is set to 0.1, overall D remains above the value obtained using the geometric mean (horizontal dotted line), regardless of how the relative weights of RS and RP2 change. Together, Fig. 11a and Fig. 11b show that overall desirability D increases with the relative weight of response RS, implying that speech clarity is the easiest of the three responses to achieve. D is also more sensitive to changes in the RP1 weight than to changes in the RP2 weight.
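The trends in Fig. 11 follow from a short derivation, assuming the common weighted geometric mean D = Π di^wi with Σ wi = 1. Taking logarithms gives ln D = Σ wi ln di. Moving a weight increment Δ from response j to response k therefore changes ln D by Δ(ln dk − ln dj), so D rises exactly when dk > dj. The sketch below computes D with equal weights (Equation (10)) and with the assumed weighted form, then reproduces the weight sweep described above. The individual desirabilities are hypothetical and are chosen so that dRS > dRP2 > dRP1. They are also held fixed, whereas in the study the optimum, and hence each di, is re-found for every weighting.

```python
import math


def overall_desirability(d, w=None):
    """Overall desirability D from individual desirabilities d_i in [0, 1].

    With w=None, D is the unweighted geometric mean (as in Equation (10)).
    With weights w_i > 0 summing to 1, D = prod(d_i ** w_i); this weighted
    form is an assumption made for this sketch.
    """
    if any(not 0.0 <= di <= 1.0 for di in d):
        raise ValueError("individual desirabilities must lie in [0, 1]")
    if w is None:
        w = [1.0 / len(d)] * len(d)
    if len(w) != len(d) or any(wi <= 0 for wi in w) or not math.isclose(sum(w), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    return math.prod(di ** wi for di, wi in zip(d, w))


def weight_schedule(rising, falling, fixed, names=("RS", "RP1", "RP2")):
    """Weight sets for the sweep: `rising` goes 0.1 -> 0.8, `falling` goes
    0.8 -> 0.1 in steps of 0.1, and `fixed` stays at 0.1 (each set sums to 1)."""
    schedule = []
    for k in range(8):
        w = {rising: round(0.1 * (k + 1), 1), falling: round(0.8 - 0.1 * k, 1), fixed: 0.1}
        schedule.append(tuple(w[n] for n in names))
    return schedule


d = (0.9, 0.5, 0.6)  # hypothetical d_RS, d_RP1, d_RP2
print("equal weights:", round(overall_desirability(d), 3))
# RS weight rises while RP2 weight falls (RP1 fixed at 0.1): D increases.
for w in weight_schedule("RS", "RP2", "RP1"):
    print(w, round(overall_desirability(d, w), 3))
# RP1 weight rises while RP2 weight falls (RS fixed at 0.1): D decreases.
for w in weight_schedule("RP1", "RP2", "RS"):
    print(w, round(overall_desirability(d, w), 3))
```

Under this assumed form, the direction of each curve in Fig. 11 indicates which response already has the higher individual desirability. This is the reasoning behind reading the rising RS curve as speech clarity being the easiest response to achieve.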
Despite the variation observed in overall D as the relative weights of the three responses change, the optimal configurations identified for the system are similar. For example, the optimal points for x1 and x2 for all the weightings studied are depicted in Fig. 11c, superimposed on the contour plot shown in Fig. 8. All points lie within the 0.6 region of the plot, indicating that varying the relative weights amongst the three responses has little effect on the selection of the optimal configuration.
The outcome of this sensitivity analysis, firstly, demonstrates the validity of the desirability function for multi-objective optimisation. This is confirmed by the small variation in overall D despite the changes in the relative weights of the three individual desirabilities. Secondly, in a broader sense, it offers a useful means of investigating the potential effect of response relative weights on the optimal setting of the AF parameters.
Fig. 11. Sensitivity analysis of overall desirability D with regard to different relative weights assigned to the three responses. In (a) and (b), varying responses are represented using different patterns, and the overall D predicted using the geometric mean is indicated by a horizontal black dotted line. In (c), all optimal points identified are depicted by red crosses.

Conclusion
Audio signal processing becomes a multiobjective problem when several key signal characteristics must be improved at the same time. A multiobjective (MO) approach for tackling such problems, using two novel objective functions to define peak reduction and speech clarity, has been presented in this paper and applied to improve the performance of a dental drill noise reduction (DDNR) device. For the peak reduction objective, the objective function is based on the reduction in target noise peak power relative to the signal floor. The objective function for speech clarity is based on a comparison between the signal floor values of the original and filtered signals.
The DDNR employs an adaptive filtering (AF) algorithm based on the Normalised Least Mean Square (NLMS) method to achieve steady and fast convergence. The empirical relationships between the AF parameters and the objectives were explored using response surface methodology (RSM) and expressed as statistical regression equations. The MO approach described has improved DDNR performance in the physical domain by effectively reducing unwanted noise peaks (peak reduction objective) while leaving the speech component of the signal relatively unaffected (speech clarity objective). The MO optimisation yields an average overall improvement of nearly 30% over the original design, including a 51% improvement in the speech clarity objective. A next step will be to apply the proposed approach to a more advanced and complex active noise cancelling (ANC) device.
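For readers unfamiliar with NLMS, the following is a generic, minimal sketch of the update rule in a noise-free system identification setting, where convergence is easy to check. It is not the DDNR filter configuration. The unknown path `h`, step size `mu`, regularisation `eps` and filter length `num_taps` are hypothetical values chosen for illustration, and which parameters the DDNR actually exposes for tuning is not restated here.

```python
import random


def nlms(x, d, num_taps, mu=0.5, eps=1e-6):
    """Normalised LMS adaptive FIR filter.

    x: reference input samples; d: desired samples (same length).
    Returns (error signal e, final tap weights w).
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # tapped delay line, most recent sample first
    e = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]  # shift the new sample into the delay line
        y = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        en = dn - y  # a priori error
        norm = eps + sum(bi * bi for bi in buf)  # input power; eps avoids /0
        step = mu * en / norm  # normalisation makes mu scale-independent
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        e.append(en)
    return e, w


random.seed(0)
h = [0.6, -0.3, 0.1]  # hypothetical unknown FIR path to be identified
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
e, w = nlms(x, d, num_taps=3)
print([round(v, 3) for v in w])
```

Dividing the step by the instantaneous input power is what distinguishes NLMS from plain LMS. It keeps the effective step size stable as the input level varies, which is consistent with the steady and fast convergence cited above.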