Finite-element modeling of vocal fold self-oscillations in interaction with vocal tract: Comparison of incompressible and compressible flow model

Finite-element modeling of self-sustained vocal fold oscillations during voice production has mostly considered the air as incompressible, due to numerical complexity. This study overcomes this limitation and studies the influence of air compressibility on phonatory pressures, flow and vocal fold vibratory characteristics. A two-dimensional finiteelement model is used, which incorporates layered vocal fold structure, vocal fold collisions, large deformations of the vocal fold tissue, morphing the fluid mesh according to the vocal fold motion by the arbitrary LagrangianEulerian approach and vocal tract model of Czech vowel [i:] based on data from magnetic resonance images. Unsteady viscous compressible or incompressible airflow is described by the Navier-Stokes equations. An explicit coupling scheme with separated solvers for structure and fluid domain was used for modeling the fluid-structureacoustic interaction. Results of the simulations show clear differences in the glottal flow and vocal fold vibration waveforms between the incompressible and compressible fluid flow. These results provide the evidence on the existence of the coupling between the vocal tract acoustics and the glottal flow (Level 1 interactions), as well as between the vocal tract acoustics and the vocal fold vibrations (Level 2 interactions). c © 2021 University of West Bohemia.


Introduction
Understanding the physiology and biomechanics of human voice production is the basis of voice and speech sciences. Voice is produced through self-sustained oscillations of the vocal folds (VFs) which are excited by air flowing from the lungs. The VF oscillations modulate the airflow and cause acoustic pressure oscillations; these propagate with the speed of sound through the cavities of the vocal tract (VT). The VT acts as a resonator with multiple resonance frequencies; those alter the primary sound generated at the VF level and provide the sound with the typical properties recognized as a human voice.
The voice production mechanisms should, in principle, involve the forward and backward fluid-structure-acoustic interaction (FSAI). Although the current mathematical models, which can handle such complex interaction, are mostly based on partial differential equations (PDEs), To the best of our knowledge, there has not been any study including the compressible NSEs and complete FSAI involving the self-oscillating VFs. The model presented here is based on previous studies of the authors [28,29,64,65] where an finite-element (FE) model of the vocal organs during phonation of Czech vowels was developed. In this study, new two-dimensional (2D) FE models of flow-induced oscillations of the VFs, in interaction with acoustic spaces of the VT, were constructed to enable calculations using either compressible or incompressible airflow.
Although the 2D assumption carries several simplifications, e.g., different formation of turbulence in the VT, absence of spatial vortices or no anterior-posterior modes of VF vibration, it is used here for the sake of a radical reduction of computational costs [62]. It was shown that, up to 3000 Hz, the influence of the spatial dimension of the VT is rather insignificant [80].
The aim of this study was to directly compare the effect of air compressibility on VFs oscillations using an identical FE mesh, identical boundary conditions and identical FSI formulation for compressible and incompressible airflow model. The computed quantities are compared to data found in the literature.
The description of the computational model is divided into sections concerning the geometry of the VF and VT, FE setup, the material model and parameters, the boundary conditions, and the algorithm of the FSAI implementation including the mathematical formulation of the structural and fluid domains.

Vocal fold and vocal tract geometry
The geometry of the VFs was based on the widely used parametric M5 shape, see Scherer et al. [52]. To take into account the layered structure of the VFs [72], four layers of the VF tissue were included in the model -epithelium, superficial layer of lamina propria (SLP), ligament (intermediate and deep layers of the lamina propria) and muscle [72,73]. The geometry of the VT was based on data registered by the MRI during male phonation of the Czech vowel [i:] [20,80].

Finite element model
A 2D FE model of the VF and of the VT with trachea were developed using the software package ANSYS R Academic Research Mechanical, Release 15.0. The FE models of the four-layered tissue of the VFs and of the airways in the VT for the vowel [i:] are shown in Fig. 1. This figure shows also the details of the fluid mesh in the glottal region (marked by the red rectangle) which contains 12 fluid elements over the glottal width. The dimensions of the models are specified in Table 1. The FE model consists of 5 350 linear 3-node and 4-node structural elements, 10 747 linear 4-node fluid elements and 2 × 24 contact elements, which ensure a symmetric contact pair, i.e., the VF collisions. The structural and contact elements have two degrees of freedom (DOF) in each node (displacement in x and y directions) and fluid elements have three DOFs for incompressible flow (velocities in x and y directions and pressure) and four DOFs for compressible flow (velocities in x and y directions, pressure and temperature).
In comparison with [28,64,65], the fluid mesh was refined to capture smaller eddies in the airflow. The size of the fluid element in the y direction between the VFs was 0.025 mm typically. This size was gradually increasing to approximately 5 mm near the lips and near the entrance to the trachea. Taking into account the condition of 10 nodes (or 9 linear elements) per acoustic  Table 1. The L coordinate shown in c) denotes the length of the whole FE model 175.61 wavelength [7], the fluid mesh can handle frequencies up to 7 844 Hz. This limit is reasonable since the time step value lowers the maximal frequency range to 5 000 Hz due to sampling reasons. Majority of the acoustic energy contained in the spectrum of human voice is located below 5 000 Hz [12,59].

Material used for the model
Each layer of the VFs was considered to be a homogeneous isotropic linear elastic material with properties derived from [35,73] and tuned in [28,29,64,65], see Table 2. Properties of the dry air were set to the parameters known for the ideal gas of 36 • C [6], see Table 3. The speed of sound c air [m · s −1 ], as a function of the equation of state and its absolute temperature, was acquired from the well known equation where κ is the heat capacity ratio given in Table 3, R is the universal gas constant with the value of 287.05 J · kg −1 · K −1 , T is the air temperature in [K] and t is the air temperature in [ • C]. In order to insert the damping of the VF layers to the ANSYS R transient analysis (see Section 2.5), proportional (Rayleigh) damping coefficients α (mass-proportional damping) and β (stiffness-proportional damping) were used. These two coefficients were acquired using the modified formulas from [23] ( The damping coefficients for the SLP layer required as an input a pair of chosen damping ratios b p1 = 0.20 and b p2 = 0.30 [15,37,67] and a pair of natural frequencies f 1 = 70.65 Hz and f 2 = 142.54 Hz of the undamped VF computed by modal analysis in the ANSYS R . Thus, the damping of the SLP layer resulted in α = 60.377 6 s −1 and β = 0.000 6 s. The damping of the other layers of the VFs was set to α = 116.527 9 s −1 and β = 0.000 4 s, which correspond to damping ratios b p1 = b p2 = 0.2 at the above mentioned natural frequencies f 1 and f 2 .

Boundary conditions
Boundary conditions and loading were applied in the same manner in both the FE models. The entrance to the trachea was excited by a constant lung pressure p L = 270 Pa (over the considered atmospheric pressure of the 101 350 Pa) as the only driving parameter. This lung pressure is at the lower end of the physiological values observed in humans [10]. Such a low value of the p L was chosen to avoid excessive fluid mesh distortion in the glottal region observed at higher lung pressures.
Zero pressure was applied at the end of the VT to simulate open lips. Radiation to an open space was not considered in this study, similarly as in other studies [19,43,60,62,83,86,87]. The PML layer of elements used in some models [24,63] could not be applied here because the PML layer is designated for acoustical elements rather than the elements for fluid flow simulations.
Rigid walls of the VT and trachea were set to be totally reflective with no-slip condition for the fluid flow. The VFs were clamped at the lateral ends.

Fluid-structure-acoustic interaction algorithm
The FE model of the FSAI was solved in a time domain using a transient analysis with the time step of 1 · 10 −4 s. The structural model (VFs) was described by a momentum equation [6,13,21] where ρ s is the density of the structural model, u s are the unknown displacements, τ s is the Cauchy stress tensor and f s is the structural load vector. The structural model was coded in ANSYS R Parametric Design Language (APDL) and included large deformations and VF collisions. The generalized Hilber-Hughes-Taylor time integration method (HHT-α method) was used for the time integration of the momentum equation [6]. For this purpose, the momentum equation (4) had to be transformed to a weak formulation and discretized by the FE method, which led to a system of equations where M, C, and K are the structural mass, damping and stiffness matrices,ü(t) is the nodal acceleration vector,u(t) is the nodal velocity vector, u(t) is the nodal displacement vector and F s (t) is the applied load vector. The airflow model in the trachea and the VT was developed in the ANSYS R FLOTRAN TM code. The governing equations consist of the continuity equation, the unsteady viscous compressible NSEs and the energy equation for the adiabatic process.
All of the equations except the continuity equation, which will be discussed later, could be written in a unified form as follows [5,11,82] ∂ ∂t The solved variables are: v x -fluid velocity in x direction, v y -fluid velocity in y direction, Ttemperature and p -pressure. The coefficients C p , K, Q v , g x and g y stand for the specific heat of 1 004 J · kg −1 · K −1 , the thermal conductivity of 2.699 5 · 10 −2 W · m −1 · K −1 , the volumetric heat source and the gravity acceleration components in x and y directions, respectively. The distributed resistances R x and R y represent any source term one may wish to add.
The kinetic energy E k , the viscous work term W v and the viscous dissipation term Φ can be expressed as where the latter two terms are written in the Einstein summation convention. The complete set of the governing equations, except the continuity equation, is closed by the thermodynamic state relations where R is the universal gas constant given in Section 2.3 and T 0 is the total (stagnation) temperature of 309 K. The NSEs and the energy equation (6) were discretized by the Galerkin method of weighted residuals [5] which results in solution of the system of equations the pressure correction equation is derived which enables to use the segregated velocity-pressure solution. This nonlinear solution procedure belongs to a class of semi-implicit method for pressure linked equations (SIMPLE) handled by ANSYS R FLOTRAN TM [5,82]. The enhanced SIMPLEN algorithm was used in the case of compressible fluid and the SIMPLEF algorithm was used in the case of incompressible fluid [5]. The incompressible fluid assumption breaks the linkage between the continuity equation and NSEs on one hand and the energy equation on the other hand [82]. Omitting density variations also simplifies the continuity equation and the NSEs. The energy equation is not solved in such a case and therefore one DOF (the temperature T ) is spared, which leads to shorter solution times. The sufficiently high bulk modulus of 10 15 Pa was used in the incompressible state equation (8a) instead of the term RT [5].
Each DOF (independently on the compressibility) was solved in a sequential order. Each equation was solved with intermediate values of the surrounding DOFs. The solution of all the equations in turn and subsequent update of the values is referred to as global iteration [5]. Only a few global iterations are usually sufficient to achieve default convergence criteria (normalized rate of change of each DOF over the solved domain) of the all DOFs: 1 · 10 −12 for pressure and temperature and 1 · 10 −5 for velocities. The FSAI algorithm is schematically described in Fig. 2. Before initiation of the FSAI computation, the VFs were located 0.2 mm from each other. Then, they were slightly adducted and put into contact. This was realized as a static analysis (the solution of deformation) of the structural model. The VFs were pushed together by a prescribed displacement 0.3 mm at both lateral ends.
Afterwards, the FSAI were solved using explicit coupling scheme with separated solvers for the structural and fluid model in the global iterations [5]. The results of the airflow solution were transferred as loads on the VF surface. This then allowed computing the VF motion. In the last part of the algorithm, the material of the fluid mesh was changed to a very compliant solid material which allowed morphing of the fluid mesh geometry based on the VFs displacement. The arbitrary Lagrangian-Eulerian (ALE) approach was utilized to represent moving boundaries of the fluid domain. The distance between the faces of the VFs was monitored during the solution; when it was below the defined minimal distance (0.15 mm used here), the mesh morphing was stopped at these elements and the flow velocity was set to zero. Under these conditions, the structure and fluid nodes were temporarily decoupled and were allowed to come into contact. Collision between the two VFs was modelled by a symmetric surface-to-surface contact pair of face elements solved by the augmented Lagrangian method implemented in the ANSYS R Academic Research Mechanical, Release 15.0. Coulomb friction was assumed between the contact surfaces. The default friction coefficient of 1.0 was used to achieve sticking during the glottal closed phase.

Studied output and evaluation of results
The results from the structural model were assessed from the selected nodes of the VF surface (see the evaluating points marked in Fig. 1a). Vocal fold edge waveforms were captured from the minimal width of the glottis in the medial-lateral direction in each iteration (VF L min , VF R min ). Also, the vibration characteristics were evaluated from the VF L min , VF R min to capture the maximum glottal width (W max g ), open quotient (OQ), closing quotient (ClQ), speed quotient (SQ) and fundamental frequency of VF vibration (f o ) [44].
The fluid results were evaluated in four points: 30 mm below (point s), 1 mm below (g), 30 mm above the upper surface of the VFs (e), and 5 mm up-stream from the lips (m), recall Fig. 1c.
Electroglottographic (EGG) signal [73] was simulated by calculating the ratio between the number of surface epithelium elements in contact and the total number of the elements (in our case 24) on medial VF surface. All 24 elements in contact produced the EGG value of 1 and none of the 24 elements produced value of 0. In order to avoid discontinuities due to density of the structural mesh, the data were subjected to a Gaussian smoothing implemented in MATLAB R R2017a.  Table 4 compares VF vibration characteristics for the compressible and incompressible flows evaluated from Fig. 3.  The maximum glottal width W max g is 7.97% higher (0.39 mm vs. 0.43 mm) in the case of compressible fluid flow. The higher W max g results from the more compliant environment of the compressible flow compared to incompressible flow and from the structure-acoustic interaction (SAI) below the first acoustic resonance in the VT. The W max g through the both computed variants was less than 0.5 mm which is about 2-4 times less than the usual width of glottis reported in the literature [4,72]. Such an effect is caused by the small subglottic pressure p L considered in this study. The rather small pressure p L influences also other quantities proportionally, more in Section 3.2.

Influence of air compressibility on motion of the vocal folds
A small peak located at the beginning of the opening phase was observed in the compressible solution, see Fig. 3c. This can be attributed to the interaction of the acoustic waves with the VF vibrations: the little peak was missing in the first period of the compressible flow when the acoustic wave was not yet fully present in the VT.
The small peak prolonged open phase and made it unsymmetrical. On that account the OQ from compressible solution falls within the range observed in human male subjects during normal phonation, where the 95% confidence interval of the OQ has been reported to be 0.36-0.76 in the middle of glottis [44], even though the low p L used here applies rather to a soft phonation. More unsymmetrical open phase caused that SQ is higher than men's 95% confidence interval 0.32-1.44 [44] contrary to SQ from incompressible solution, which lies in that interval. Asymmetry of the open phase is also demonstrated by ClQ.
Combination of longer open phase and shorter closed phase in compressible model (visible in Fig. 3c) lead to 2.74% higher fundamental frequency f 0 (133 Hz vs. 137 Hz).
These differences show interaction between acoustics at compressible solution and other domains and therefore illustrates the importance of the SAI for the behavior of the voice source. Similar effect was published in [46], where changes in CQ and ClQ were caused dominantly by the acoustic loading. The acoustic waves are apparent as oscillations in both subglottal (p s ) and epilaryngeal (p e ) pressure signals in the compressible case (Fig. 4b). The peaks occurring in the p s and p e in the incompressible case (Fig. 4a) are caused solely by the VF self-oscillations. In both compressible and incompressible cases, the subglottal pressure p s oscillates around the mean value p L = 270 Pa and the epilaryngeal pressure p e oscillates around 0 Pa, as expected.

Comparison of the simulated waveforms
In both cases, the VFs vibrate with phase differences between the lower and upper margins. This can be observed in the displacements u x of the two nodes VF 1 R and VF 2 R (Fig. 4c, d); recall Fig. 1 for location of these nodes. The glottal width W g is wider in the case of compressible flow as shown also in Fig. 3c.
The flow velocity waveforms v g (Figs. 4e and 4f) are similar to each other, with higher maxima in the case of the incompressible flow. Notice that the velocity maxima occur at the very end of the open phase when the glottal width becomes small. This is mainly caused by the inertance of a fluid column: the fluid flow, see Fig. 4e, leads the pressure, see Fig. 4a, in phase as in [75]. The waveform shapes are very similar to those obtained by Alipour and Scherer [4], only the v g maximum is about two times lower in our case which is related to the four times lower p L used here compared to [4,72]. The shape of the v g is related to the glottal flow volume velocity U g (Figs. 4g and 4h): zero U g occurs during the glottal closure matching the v g .
The simulated EGG signal confirms that the contacting phase is significantly shorter than the decontacting phase; this agrees with the observations of normal EGG waveforms in human phonations for vowel [i:] [16].
The pressures near the mouth p m are shown in Figs. 4k and 4l. Whereas in the incompressible case (Fig. 4k) these pressure waveforms are similar to the epilaryngeal pressures p e (recall Fig. 4a), the compressible case (Fig. 4l) reveals pressure fluctuations related to the acoustic resonances of the VT.
The power spectral densities (PSDs) of the pressures p m are displayed in Figs. 4m and 4n. The compressible case (Fig. 4n) shows formant peaks in PSD which are well within the normal range of formant frequencies F 1 ≈ 215-348 Hz and F 2 ≈ 1952-2558 Hz (marked with green) reported for the Czech vowel [i:] [56]. The formant frequencies approximate the first two resonances of the VT calculated by the 2D and also 3D modal analysis in ANSYS R . In both cases, they were 318 and 1957 Hz, respectively. The boundary conditions were in accordance with the open lips and closed VFs. The simulated formants were validated previously using an experimental setup with the identical geometry of the VT made from plexiglass [32] and also by the sound signal of the same vowel [i:], which was acquired during the scan of the original geometry in MRI [80]. The simulated formants fit the measured [i:] vowel formant ranges from both [32] and [80].
No formant peaks occur for the incompressible flow (Fig. 4m), reflecting the fact that there are no acoustic waves propagating in the VT in this case.

Compressibility influence on air pressures and velocities
The flow velocities in the glottis region, together with the VF shape, are depicted in Fig. 5 for three different time instants at which the VFs take convergent, parallel and divergent crosssectional shapes. The jet deflection of fluid flow above the VFs appears during typical oscillation period always, as shown.   [4].
The changing shape of the VFs during the oscillatory cycle influences the pressure and velocity distribution along the FE model from the trachea to the epilarynx. The pressures shown in Figs. 6a and 6b evince similar dependence: in the trachea, the pressure is close to the preset value of 270 Pa and it drops after passing the VFs. The largest pressure drop occurs during the divergent shape independently of compressibility which corresponds to the observations in [4,19,45,63]. A local minimum of the pressure during the divergent glottis appears just below the upper surface of the VFs. The higher pressure below VFs and the lower above the VFs ensure nonzero airflow, which allows the VF oscillation [4].
The centerline velocities for the incompressible and compressible airflows are shown in Figs. 6c and 6d, respectively. As expected, the highest velocities occur just below the upper VF level in the narrowest part of the glottal channel. The overall shape of the velocity curves is similar for the both computed variants, as is the location of the highest velocity peak during the divergent shape. The location of the velocity peak in the divergent glottal shape matches the local minima of the centerline pressure; the match between the pressure minimum and velocity maximum is a logical consequence of the energy preservation principle and agrees with the data reported in [4,45]. A deeper pressure drop produces higher peak in the centerline velocity.
When the flow passes the VF level, there is a region influenced by vortices in the expanded duct closely above the VF level. Except for the glottis and surrounding areas where the flow velocities can exceed 20 m · s −1 , the velocities are low, around 1 m · s −1 .

Overall discussion and conclusion
This study presents a 2D FE model which simulates airflow-driven self-sustained oscillations of the VFs with collisions, interacting with subglottal and VT, solved numerically using compressible Navier-Stokes equations. The model inherently incorporates the complete fluidstructure-acoustic interactions. To the best of our knowledge, such level of complexity for the modeling of phonatory phenomena has not yet been reported in the literature.
In this study, the model was explored to study the influence of air compressibility on the resulting flow, vibratory and acoustic parameters while keeping all other model parameters constant. In FE modeling studies, the air in the trachea and the VT is commonly considered to be incompressible. This approach simplifies the simulation, but disregards interactions between acoustics and the flow and between acoustics and the VF structure. If we put the acoustic interactions aside, incompressible flow should meet, in general, two conditions: The first onethe flow velocity should be low, under Ma = 0.3 (c. 106 m · s −1 ) [6]. In our study, the maximal flow velocities v g , which appeared in the glottal region, were under 35 m · s −1 (c. Ma = 0.1). These are rather low velocities which meet the condition. They can be related to the rather low lung pressure p L . The values of the v g and p L can be expected to be two to four times higher in simulations related to the most common conditions from healthy human phonations [4, 38-40, 61,72], which still meets the condition on low Mach number. And the second one -the variation of the fluid density in the flow should be minimal [82]. Our simulations with compressible equations reveal that the obtained density variation is nearly thousand times smaller than the mean value of air density. Thus the second condition is met also. Results from our model indicate, however, that the incompressible flow should be used carefully even if these two assumptions are fulfilled. Most importantly, the complete FSAI should be considered for accurate modeling of the phonatory phenomena to include generally non-linear relationships between acoustics and other two fields.
Two types of non-linear source-tract interactions can be distinguished in a phonatory system [75,77]: level 1 interactions, where the glottal flow waveform is influenced by acoustic resonances; and level 2 interactions, where the acoustic resonances influence also the VF oscillations. Our simulations with compressible air demonstrate the influence of VT acoustics on both the glottal flow (Fig. 4) and on the VF oscillations (Fig. 3).
The air compressibility affects the VF motion, glottal flow, and the pressure waves in the VT. The most affected quantity was the open quotient OQ, while the vibration frequency f 0 was affected much less. Both the incompressible and compressible models achieved symmetric and stabilized self-oscillations after a few periods of a transient regime. The VFs changed their shape between convergent and divergent, and the fundamental vibration frequency f 0 matched that of the human VFs [72].
As far as the limitations are concerned, the model presented here had numerical difficulties with the glottal fluid mesh, which was heavily distorted in cases of increased lung pressure values. Therefore the lung pressure was kept rather low here. This limitation is planned to be addressed in future studies.
For simplicity, VT for only one Czech vowel, [i:], was considered here. The formant frequencies of the simulated vowel were in good agreement with the vowel formant frequencies observed in Czech speakers [56], with the results of other experiments based on VT with the identical geometry [32] and with the formant frequencies from the sound signal measured during the MRI scan [80].
In principle, the model allows using any VT shapes for different vowels. This opens up new possibilities for investigating the phonatory phenomena relevant for human phonation including the influence of different VT shapes on the VF oscillations. Information on these phenomena is relevant not only for understanding the physiologic speech processes, but also for speech therapy and singing voice where the source-vocal tract interactions are considered to play an important role [30,49,74,76,81].