Speech emotion recognition (SER) is an important application in Affective Computing and Artificial Intelligence. Recently, there has been significant interest in Deep Neural Networks using speech spectrograms. As the two-dimensional representation of the spectrogram includes more speech characteristics, convolutional neural networks (CNNs) and other advanced image recognition models have been leveraged to learn deep patterns in a spectrogram and perform SER effectively. Accordingly, in this study, we propose a novel SER model based on learning from the utterance-level spectrogram. First, we use the Spatial Pyramid Pooling (SPP) strategy to remove the size constraint associated with the CNN-based image recognition task. Then, the SPP layer is deployed to extract both the global-level prominent feature vector and multi-local-level feature vector, followed by an attention model to weigh the feature vectors. Finally, we apply the ArcFace layer, typically used for face recognition, to the SER task, thereby obtaining improved SER performance. Our model achieved an unweighted accuracy of 67.9% on the IEMOCAP dataset and 77.6% on the EMODB dataset.
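The way spatial pyramid pooling removes the input-size constraint can be illustrated with a minimal sketch. This is not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, and the function names are assumptions made for illustration, and the sketch pools a single 2-D array where a real CNN would pool each feature map.

```python
def max_pool_grid(spec, bins):
    """Split a 2-D array (list of rows) into bins x bins cells and max-pool each cell."""
    rows, cols = len(spec), len(spec[0])
    pooled = []
    for i in range(bins):
        # Integer boundaries that cover the whole axis; each cell has at least one row.
        r0 = (i * rows) // bins
        r1 = max(((i + 1) * rows) // bins, r0 + 1)
        for j in range(bins):
            c0 = (j * cols) // bins
            c1 = max(((j + 1) * cols) // bins, c0 + 1)
            pooled.append(max(spec[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(spec, levels=(1, 2, 4)):
    """Concatenate pooled features from every pyramid level (levels are an assumption).

    The output length is sum(b * b for b in levels), whatever the input size.
    The first entry (level 1) is a global feature; later entries are local ones.
    """
    features = []
    for bins in levels:
        features.extend(max_pool_grid(spec, bins))
    return features


# Two toy "spectrograms" of different sizes (made-up values) map to the same length.
short_utterance = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
long_utterance = [[float((r * c) % 11) for c in range(30)] for r in range(12)]
print(len(spatial_pyramid_pool(short_utterance)), len(spatial_pyramid_pool(long_utterance)))
```

Because every level contributes a fixed number of cells, utterances of any duration yield a vector of the same length, which is what allows a fixed-size classifier head to follow.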
Publications
2025
Due to the complicated nature of Parkinson disease (PD), a number of subjective considerations (eg, staging schemes, clinical assessment tools, or questionnaires) on how best to assess clinical deficits and monitor clinical progression have been published; however, none of these considerations include a comprehensive, objective assessment of all functional areas of neurocognition affected by PD (eg, motor, memory, speech, language, executive function, autonomic function, sensory function, behavior, and sleep). This paper highlights the increasing use of digital health technology (eg, smartphones, tablets, and wearable devices) for the classification, staging, and monitoring of PD. Furthermore, this Viewpoint proposes a foundation for a new staging schema that builds from multiple clinically implemented scales (eg, Hoehn and Yahr Scale and Berg Balance Scale) for ease and homogeneity, while also implementing digital health technology to expand current staging protocols. This proposed staging system foundation aims to provide an objective, symptom-specific assessment of all functional areas of neurocognition via inherent device capabilities (eg, device sensors and human-device interactions). As individuals with PD may manifest different symptoms at different times across the spectrum of neurocognition, the modernization of assessments that include objective, symptom-specific monitoring is imperative for providing personalized medicine and maintaining individual quality of life.
Speech signal analysis to support objective clinical decision-making has gained immense interest, especially in neurological disorders. This research assessed the feasibility of speech analysis for the detection of concussions. Using a speech dataset from 82 concussed and 82 healthy participants, we extracted two speech feature sets focusing on Mel Frequency Cepstral Coefficients (MFCCs) to characterize speech articulation. A machine learning pipeline was developed to discriminate concussion speech from healthy speech by applying Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree (DT) classifiers. All three classifiers trained on the MFCC-based feature set achieved a Matthews correlation coefficient score above 0.5 on the holdout data set. The DT model achieved a 78% sensitivity and 75% specificity. The findings of this research serve as a proof of concept for speech analysis in concussion detection.
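For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and the Matthews correlation coefficient follow from a binary confusion matrix. The labels are made-up toy values, not data from the study, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = concussed, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative even when one class is predicted far more often than the other.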
BACKGROUND: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. Increasing efforts are being made to develop speech-based clinical decision support systems.
OBJECTIVE: This systematic scoping review investigated the technological revolution and recent digital clinical speech signal analysis trends to understand the key concepts and research processes from clinical and technical perspectives.
METHODS: A systematic scoping review was undertaken in 6 databases guided by a set of research questions. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower subset of studies investigating neurological diseases was analyzed using qualitative content analysis.
RESULTS: A total of 389 articles met the initial eligibility criteria, of which 72 (18.5%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. The literature explored the potential of speech feature analysis in diagnosing, differentiating between, assessing the severity of, and monitoring the treatment of neurological conditions. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing-based speech features (such as wavelet transformation-based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and reliability of speech features. Model evaluations primarily focused on analytical validations. A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically.
CONCLUSIONS: The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance.
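As an illustration of the conventional speech features named in the review, the sketch below computes local jitter and local shimmer from per-cycle measurements. It assumes the commonly used "local" definitions (mean absolute difference between consecutive cycles divided by the mean value) and assumes that glottal cycle periods and peak amplitudes have already been extracted by some pitch-tracking step not shown here. The input values are invented.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the fundamental period (dimensionless ratio)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return local_perturbation(peak_amplitudes)


# Invented per-cycle measurements from a sustained vowel, for illustration only.
periods = [0.0100, 0.0110, 0.0100, 0.0110]   # seconds per glottal cycle
amplitudes = [0.80, 0.78, 0.82, 0.80]        # arbitrary linear units
print(local_jitter(periods), local_shimmer(amplitudes))
```

Both measures are ratios, so they are often reported as percentages; sustained phonation tasks are a natural source for them because the speaker is holding pitch and loudness roughly constant.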
OBJECTIVES: This research study aims to advance the staging of Parkinson's disease (PD) by incorporating machine learning to assess and include a broader multifunctional spectrum of neurocognitive symptoms in the staging schemes beyond motor-centric assessments. Specifically, we provide a novel framework to modernize and personalize PD staging more objectively by proposing a hybrid feature scoring approach.
METHODS: We recruited 37 individuals diagnosed with PD, each of whom completed a series of tablet-based neurocognitive tests assessing motor, memory, speech, and executive functions, with tasks ranging in complexity from single-function to multifunctional. Then, the collected data were used to develop a hybrid feature scoring system to calculate a weighted vector for each function. We evaluated the current PD staging schemes and developed a new approach based on the features selected and extracted using random forest and principal component analysis.
RESULTS: Our findings indicate a substantial bias in current PD staging systems toward fine motor skills, that is, other neurological functions (memory, speech, executive function, etc.) do not map into current PD stages as well as fine motor skills do. The results demonstrate that a more accurate and personalized assessment of PD severity could be achieved by including a more exhaustive range of neurocognitive functions in the staging systems either by involving multiple functions in a unified staging score or by designing a function-specific staging system.
CONCLUSION: The proposed hybrid feature score approach provides a comprehensive understanding of PD by highlighting the need for a staging system that covers various neurocognitive functions. This approach could potentially lead to more effective, objective, and personalized treatment strategies. Further, this proposed methodology could be adapted to other neurodegenerative conditions such as Alzheimer's disease or amyotrophic lateral sclerosis.
2024
Mental health (MH) has become a global issue. Digital phenotyping in mental healthcare provides a highly effective, scalable, cost-effective approach to handling global MH problems. We propose an MH monitoring application. The application monitors overall MH based on mood, stress, behavior, and personality. Further, it proposes objective MH assessment from smartphone data and subjective screening of MH via periodic, short, self-report standardized questionnaires.
Sufficient sleep is essential for individual well-being. Inadequate sleep has been shown to have significant negative impacts on our attention, cognition, and mood. The measurement of sleep from in-bed physiological signals has progressed to where commercial devices already incorporate this functionality. However, the prediction of sleep duration from previous awake activity is less studied. Previous studies have used daily exercise summaries, actigraph data, and pedometer data to predict sleep during individual nights. Building upon these, this article demonstrates how to predict a person's long-term average sleep length over the course of 30 days from Fitbit-recorded physical activity data alongside self-report surveys. Recursive Feature Elimination with Random Forest (RFE-RF) is used to extract the feature sets used by the machine learning models, and sex differences in the feature sets and performances of different machine learning models are then examined. The feature selection process demonstrates that previous sleep patterns and physical exercise are the most relevant kinds of features for predicting sleep. Personality and depression metrics were also found to be relevant. When attempting to classify individuals as being long-term sleep-deprived, good performance was achieved across the male, female, and combined data sets, with the highest-performing model achieving an AUC of 0.9762. The best-performing regression model for predicting the average nightly sleep time achieved an R-squared of 0.6861, with other models achieving similar results. When attempting to predict if a person who previously was obtaining sufficient sleep would become sleep-deprived, the best-performing model obtained an AUC of 0.9448.
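The two evaluation metrics quoted above can be made concrete with a short sketch. It computes the area under the ROC curve through its pairwise-ranking interpretation and the coefficient of determination from residuals. The function names and the toy values are illustrative assumptions and are unrelated to the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values: 1 = sleep-deprived, scores are hypothetical model outputs.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
# Toy regression targets: average nightly sleep in hours (made up).
print(r_squared([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, while an R-squared of 0 means the model does no better than always predicting the mean.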
Interoception, sometimes referred to as the 'hidden sense,' communicates the state of internal conditions for autonomic energy regulation and is important for human motor control as well as self-awareness. The insula, the cortex of interoception, integrates internal senses such as hunger, thirst and emotions. With input from the cerebellum and proprioceptive inputs, it creates a vast sensorimotor network essential for static posture and dynamic movement. With humans being bipedal to allow for improved mobility and energy utilization, greater neuromotor control is required to effectively stabilize and control the four postural zones of mass (i.e., head, torso, pelvis, and lower extremities) over the base of support. In a dynamic state, this neuromotor control that maintains verticality is critical, challenging energy management for somatic motor control as well as visceral and autonomic functions. In this perspective article, the authors promote a simple series of posture photographs to allow one to integrate more accurate alignment of their postural zones of mass with respect to the gravity line by correlating cortical interoception with cognitive feedback. Doing this focuses one on their body perception in space compared to the objective images. Strengthening interoceptive postural awareness can shift the net result of each zone of postural mass during day-to-day movement towards stronger posture biomechanics and can serve as an individualized strategy to optimize function, longevity, and rehabilitation.
BACKGROUND: Anterolateral ligament and medial collateral ligament injuries can occur concomitantly with anterior cruciate ligament ruptures. The anterolateral ligament is injured more often than the medial collateral ligament during concomitant anterior cruciate ligament ruptures, although it offers less restraint to knee movement. Comparing the material properties of the medial collateral ligament and anterolateral ligament helps improve our understanding of their structure-function relationship and injury risk before the onset of injury.
METHODS: Eight cadaveric lower extremity specimens were prepared and mechanically tested to failure in a laboratory setting using a hydraulic platform. Strains on the superficial surface of each medial collateral ligament and anterolateral ligament specimen were measured using three-dimensional digital image correlation. Ligament stiffness was measured using ultrasound shear-wave elastography. t-tests were used to test for significant differences in strain, stress, Young's modulus, and stiffness between the two ligaments.
FINDINGS: The medial collateral ligament exhibited greater ultimate failure strain along its longitudinal axis (p = 0.03) and Young's modulus (p < 0.0018) than the anterolateral ligament. Conversely, the anterolateral ligament exhibited greater ultimate failure stress than the medial collateral ligament (p < 0.0001). Medial collateral ligament failure occurred mostly in the proximal aspect of the ligament, while most anterolateral ligament failure occurred in the distal or midsubstance aspect (p = 0.04).
INTERPRETATION: Despite both being ligamentous structures, the medial collateral ligament and anterolateral ligament exhibited distinct material properties during ultimate failure testing. The weaker material properties of the anterolateral ligament likely contribute to higher rates of concomitant injury with anterior cruciate ligament ruptures.
2023
INTRODUCTION: It is well documented that marked weakness of the quadriceps is present after knee joint injury. This joint trauma induces a presynaptic reflex inhibition of musculature surrounding the joint, termed arthrogenic muscle inhibition (AMI). The extent to which anterior cruciate ligament (ACL) injury affects thigh musculature motor unit activity, which may affect restoration of thigh muscle strength after injury, is undetermined.
METHODS: A randomized protocol of knee flexion and extension isometric contractions (10%-50% maximal voluntary isometric contraction) was performed for each leg on 54 subjects with electromyography array electrodes placed on the vastus medialis, vastus lateralis, semitendinosus, and biceps femoris. Longitudinal assessments for motor unit recruitment and average firing rate were acquired at 6-month intervals for 1 year post ACL injury.
RESULTS: The ACL-injured population demonstrated smaller quadriceps and hamstrings motor unit size (assessed via motor unit action potential peak-to-peak amplitude) and altered firing rate activity in both injured and uninjured limbs compared to healthy controls. Motor unit activity remained altered compared to healthy controls at 12 months post ACL reconstruction (ACLR).
DISCUSSION: Motor unit activity was altered after ACLR up to 12 months post-surgery. Further research is warranted to optimize rehabilitation interventions that adequately address altered motor unit activity and improve safety and success with return to sport after ACLR. In the interim, evidence-based clinical reasoning with a focus on development of muscular strength and power capacity should be the impetus behind rehabilitation programming to address motor control deficits.