Collaborative Federated Learning for Healthcare: Multi-Modal COVID-19 Diagnosis at the Edge

Despite significant improvements over the last few years, cloud-based healthcare applications continue to suffer from poor adoption due to their limitations in meeting stringent security, privacy, and quality of service requirements (such as low latency). The edge computing trend, along with techniques for distributed machine learning such as federated learning, has gained popularity as a viable solution in such settings. In this paper, we leverage the capabilities of edge computing in medicine by evaluating the potential of intelligent processing of clinical data at the edge. We utilized the emerging concept of clustered federated learning (CFL) for an automatic COVID-19 diagnosis. We evaluate the performance of the proposed framework under different experimental setups on two benchmark datasets. Promising results are obtained on both datasets resulting in comparable results against the central baseline where the specialized models (i.e., each on a specific image modality) are trained with central data, and improvements of 16% and 11% in overall F1-Scores have been achieved over the trained model trained (using multi-modal COVID-19 data) in the CFL setup on X-ray and Ultrasound datasets, respectively. We also discussed the associated challenges, technologies, and techniques available for deploying ML at the edge in such privacy and delay-sensitive applications.


I. INTRODUCTION
The main motivation for using data storage and computing at the edge stems from the desire to make high quality computing resources available closer to the users and to reduce the need for end devices to exchange private data with centralized servers [?].There is an ongoing trend to deploy machine learning (ML) algorithms at the edge enabling consumers and corporations to enjoy and explore new opportunities in different application domains, such as automotive, security, surveillance, and other smart city services like healthcare [2].This desire is particularly strong in the healthcare industry where the stakes are high and various stakeholders (including consumers, governments, and service providers) have stressed the need for foolproof safeguards for ensuring data security, user privacy, and ethical data use [1].This motivates the use of edge computing in healthcare settings to meet the high Email: adnan.qayyum@itu.edu.pkexpected standards for patient privacy and security as well as stringent requirements for quality of service (high reliability and low latency).Healthcare is a major application domain that can benefit from edge computing.Generally, healthcare centers in remote areas lack advanced medical equipment and other healthcare facilities resulting in poorer access to health services by the people living there.Thanks to the recent advancement in telemedicine, the provision of health services remotely, using audiovisual technology, is a reality now.Large volumes of medical data including ultrasound and X-ray images could be transmitted to major hospitals with advanced diagnosis facilities for diagnostic and training purposes.However, there are several challenges associated with a typical cloud-based infrastructure, such as low bandwidth, high latency, transmission cost, and increasing concerns about data security and privacy.Edge computing, which aims to store and process data locally or closer to edge devices, on the other hand, results in low latency and increased privacy and data security.
A typical IoT environment for smart healthcare is illustrated in Figure 1 in which data collected by different sensors is processed at the edge for different applications using ML techniques.Once the ML predicts an event, the edge devices triggers an action or request service in the cloud.ML algorithms can also be executed concurrently in the cloud as well as at the edge as shown in the figure.However, with local data storage and processing resources (through cloud computing), such applications enjoy a significant improvement in terms of processing time by avoiding networking congestion.More importantly, real-time processing at the edge improves the performance of delay-sensitive applications, such as healthcare, surveillance and automotive, by avoiding any potential latency or delay occurred during data transmission between end devices and the cloud.In addition, ML at the edge results in increased security and privacy as sending data back and forth from the cloud may lead to security threats.
In this paper, we leverage the capabilities of edge computing in medicine by analyzing and evaluating the potential of intelligent processing of clinical visual data related to COVID-19 at the edge allowing the remote healthcare centers to benefit from the multi-modal collaborative learning paradigm without sharing any information about the modality of the local data and the data itself.A number of recent research efforts have focused on diagnosing COVID-19 using AI and data science methods [3]; relatively little work has however focused on using edge AI for COVID-19 diagnosis.
To this aim, we utilize an emerging concept of clustered federated learning (CFL) and propose a CFL-based collaborative learning framework for an automatic multi-modal diagnosis of COVID-19.Our approach is well suited to the task of COVID-19 diagnosis as visual data (i.e., CT scans, X-rays, and ultrasound) is collected at different centers and could be used to build a joint/shared ML model in a cloud-edge infrastructure able to diagnose COVID-19 in both X-ray and Ultrasound images.The proposed framework is evaluated on two benchmark datasets under different experimental setups and we have achieved encouraging results using CFL that are comparable with the baseline results (when the model is trained with central data).We also discuss in detail the potential applications, associated challenges, technologies, tools, and techniques available for deploying ML at the edge in such privacy and delay-sensitive applications.We note that we use the term multi-modal model to represent a single model capable of diagnosing COVID-19 in both X-ray and Ultrasound imagery when provided separately.
The main contributions of the paper are as follows: 1) To highlight the potential of intelligent processing of clinical data at the edge, we propose a collaborative learning framework for COVID-19 diagnosis by leveraging a CFL approach enabling remote healthcare centers to benefit from each other's data without sharing the data itself and associated information.2) We also demonstrate how the performance of conventional FL is affected by the divergence of the distribution of data from different sources (i.e., X-ray and Ultrasound imagery), and how CFL can help to mitigate the adverse impact.3) We also highlight the potential challenges and enabling factors that enable the deployment of ML/DL models to the edge.4) Finally, we elaborate on the open research issues related to deploying ML at the edge for healthcare applications that require further investigation.Organization of the paper: The rest of the paper is organized as follows.Section II-A provides a broad discussion of the related work on automated COVID-19 diagnosis as well as the different challenges encountered in deploying ML on the edge along with a discussion on enabling technologies.The case study for collaborative learning for multi-modal diagnosis of COVID-19 is presented in Section III and results are presented in Section IV.Various open research issues that require further investigation are presented in Section V. Finally, Section VI provides some concluding remarks.

II. BACKGROUND
In this section, we provide background on the related work on automated COVID-19 diagnosis as well as a discussion on the potential challenges that hinder deployment ML at the edge along with the different enabling technologies.

A. Existing Automated COVID-19 Diagnosis Work
COVID-19 has been a strong focus of the research community in 2020, especially after it was declared in March by the World Health Organization (WHO) to be a pandemic, with diverse efforts focusing on diagnosis [4], treatment [5], and the development of the potential vaccine [6].Data science methods-particularly, ML and data visualization techniques-are playing a major role in the international response against the COVID-19 pandemic with some key applications being risk assessment, contact tracking, fake news detection, sentiment analysis, and screening and diagnosis [3].The focus of this paper is on automated screening and diagnosis; we shall discuss next some of the prominent related techniques relying on different types of information (e.g., audio and visual data) that have been proposed.
A number of efforts have focused on automated image analysis in a bid to speed up the COVID-19 diagnosis process [7].To this aim, three different medical imaging modalities, namely computerized tomography (CT), Ultrasound scans, and X-radiation (X-ray), have been mostly exploited.To facilitate research on image-based solutions for COVID-19 diagnosis, several datasets have been collected and made publicly available [7], [9].For instance, Maghdid et al. [9] collected a comprehensive dataset containing a total of 170 X-rays and 361 CT scan images from different sources.Cohen et al. [10] also provide a collection of X-rays and CT scans of confirmed COVID-19 patients.A collection of COVID-19 patients' CT scans has also been made publicly available for research purposes in [11], [12].Born et al. [13], on the other hand, provide a lung ultrasound (POCUS) dataset containing a total of 1103 images including 654 COVID-19, 277 bacterial pneumonia, and 277 healthy controls samples.
A vast majority of the image-based solutions for COVID-19 diagnosis relies on CT scan images.For instance, Wan et al. [14] proposed a deep learning model for extracting COVID-19's specific features/textures in CT scans of confirmed cases to extract useful clinical insight before the pathogenic tests.An evaluation of a reasonable amount of confirmed cases showed encouraging results with an average test accuracy of 73.1%.Butt et al. [15] proposed a two-phase solution for COVID-19 diagnosis in CT scans.Initially, a pre-trained 3D Convolutional Neural Network (CNN) is employed to extract potential infectious regions in CT scans followed by a CNN-based classification framework to classify the candidate regions into COVID-19, influenza, and non-infectious regions.Li et al. [16] also proposed a 3D CNN-based framework to extract both local and global deep features for diagnosis COVID-19 in CT scans.One of the key challenges to CNN-based solution is the unavailability of a large-scale CT scans datasets.In order to deal with the challenge, Afshar et al. [17] proposed a Capsule Networks based deep learning framework, namely COVID-CAPS, for COVID-19 diagnosis in X-ray images.Moreover, to further enhance the capabilities of the proposed model, the authors used an external dataset composed of 94, 323 frontal view chest X-ray images for pre-training and transfer learning purposes.
There are also methods relying on X-ray images for COVID-19 diagnosis.For instance, in [18] a pre-trained deep model is fine-tuned on X-ray images for COVID-19 diagnosis.Similarly, Sethy et al. [19] trained a Support Vector Machine (SVM) classifier on features extracted via ResNet-50 [20] from X-ray images for classification of COVID-19 and non-COVID-19 cases.Ali et al. [21] evaluated the performance of several existing deep models in diagnosing COVID-19 in Xray images.Islam et al. [22] on the other hand proposed a deep framework combining CNNs and Recurrent Neural Networks (RNNs) for diagnosis of COVID-19 in X-ray images.Initially, features are extracted with a CNN, which are then feed into a Long short-term memory (LSTM) for diagnosis/detection purposes.Kassani et al. [23] provide a detailed evaluation of several existing deep models and classification algorithms to find a best combination for COVID-19 diagnosis in both X-ray and CT scans.However, both modalities are treated individually.The deep models are used for feature extraction, which is then fed into different classification algorithms.
Some image-based COVID-19 diagnosis methods also rely on a recently introduced concept of Federated Learning (FL) to ensure data privacy in a collaborative learning environment, where several hospitals can participate in training a global ML model.For instance, in [24] a deep model is collaboratively trained in a federated learning environment on CT scans collected from different sources.Kumar et al. [25] on the other hand proposed a blockchain-FL-based framework for collecting data (CT scans) from different hospitals, and collaboratively training a global deep model.Moreover, several exiting deep models have also been evaluated in the proposed federated learning framework.In [26], a federated learning technique is employed for training a global model on electronic health records from various hospitals to predict mortality within seven days in hospitalized COVID-19 patients.
B. Challenges in Deploying ML at the Edge 1) Resource Scarcity and Heterogeneity: Heterogeneous edge devices with varying computational, storage and communication resources are a major bottleneck for the deployment of ML on the edge.ML algorithms in general and deep learning (DL) in particular require a large amount of computational and processing resources making the deployment of ML impractical in several edge computing applications.DL models are usually large and are computationally expensive, as both the training of a deep model and its inferences are typically performed on power-hungry GPUs and servers while, the edge devices are designed to be operated at low power and usually have frugal memory, therefore, deploying DL models on the edge devices is very challenging.One another important challenge is the availability of a power source at edge device, i.e., a battery with long power backup is always desirable in a typical edge computing network.In addition, the size of the network and systems constraints is also a major challenge that can result in only a few devices being active at a time [27].
2) Network Communication: The heterogeneity of the computational and communication resources also led to slow and unstable communication.In addition to resource heterogeneity, there are other considerations as well, e.g., the Internet upload speed is typically much slower than the download speed [2].Therefore, in an edge computing environment in which ML/DL models are being trained on the client site stable and powerful Internet connection is always desirable, otherwise, the unstable clients will be disconnected from the network that will result in a drop in performance.On the other hand, deploying ML at the edge saves expensive communication, i.e., we do not require the local (raw) data to be transmitted to the cloud/server.
3) Statistical Heterogeneity: The statistical heterogeneity, due to data generated by different types of devices in an edge computing environment, can lead to many efficiency challenges.For instance, the optimization/training of a ML/DL hyperparameters becomes difficult, which directly affect its performance.To address the statistical heterogeneity techniques such meta-learning can be used that can enable devicespecific modeling [28].
4) Privacy and Security Challenges: Despite being able to train joint models with sharing data in a collaborative learning environment using FL, privacy and security challenges arise with the presence of malicious devices.For instance, an adversary can learn sensitive information using the model parameters and the shared model.As shown in [29], privacyrelated information can be inferred from the shared weights even without getting access to the data itself.To restrain leakage of privacy-related information from the shared model, different privacy-preserving techniques can be leveraged, such as cryptographic approaches and differential privacy [30].
5) Adversarial ML: Despite the state of the art performance of ML/DL techniques in solving complex tasks, these techniques have been found vulnerable to carefully crafted adversarial examples [31].In a federated learning setup, a client or multiple clients can be compromised to realize the attacks on the whole network.For instance, local poisoning attacks using compromised attacker devices are presented in [32].The authors demonstrated that their proposed attacks can increase the error rates of the distributively trained model on four real-world datasets.Moreover, a systematic review focused on different adversarial ML attacks and defenses for cloud-hosted ML models can be found in [33].
C. Enabling Technologies: Building Blocks for ML at Edge 1) Schemes for deploying ML at the Edge: In recent years, enormous growth has been observed in the computational power of edge devices, allowing them to play a more important role than just collecting data in IoTs.ML can contribute significantly in fully utilizing the potential of edge devices in numerous exciting applications (e.g., smart healthcare using wearables technologies and AI-empowered sensors, etc.) and turn them into more useful components of an IoT environment [34].ML could be employed at the edge in several ways, such as inference, sensor fusion, transfer learning, generative models, and self-improving devices.In this section, we briefly describe some of the most commonly used schemes.
• Inference: The inference capabilities of ML, which aims predicting unseen objects/classes based on the previous knowledge/trained data, help the IoTs to perform different activities, such as cancer prognosis, brain tumor classification, and other clinical data analysis at the edge devices resulting in reduced latency and bandwidth in telemedicine [35].• Sensor Fusion: ML in conjunction with signal processing algorithms can be used for the fusion of information from different sensors enabling efficient utilization of the available information.With fusion capabilities, individual sensors in an IoT environment can be converted into sophisticated synthetic sensors to solve complex problems more accurately.For instance, in healthcare data from several sensors/sources can be combined efficiently to predict a clinical event, such as heart failure [36].• Transfer Learning: Transfer learning, which aims to reutilize the knowledge of one domain in another domain by fine-tuning a pre-trained model trained on a larger dataset, can help them to learn on a smaller dataset with less computational resources.In an IoT environment and in particular, in healthcare applications where the data is scarcely available, the transfer learning technique can be used to balance workload and latency where the pretrained models are put at the cloud and are shared among edge devices to be fine-tuned for specific tasks [37].• Generative Models: Generative learning can also be useful in edge computing where generative models can be used for the approximation of the original data at the clouds to be used for training models at edge devices for applications with less training samples or to solve complex tasks with minimal computation from the clouds.Generative deep models have already been explored for the generation of synthetic medical images [38].• Self-improving devices: In a typical IoT environment, ML techniques can also be used to enable end devices to optimize their performance and improve continuously based on the collected data and behaviors of other devices.Such strategies help to configure the devices faster which ultimately leads to faster and efficient implementation and deployment.
2) Hardware Optimization Techniques: For successful deployment of ML at the edge, the two critical requirements of edge computing-namely (i) low power consumption, and (ii) high performance-need to be fulfilled.Thus, off-the-shelf solutions are not practical to intelligent processing of data at the edge devices, and custom hardware architectures need to be developed.In this section, we discuss some hardware optimization techniques to optimize hardware resources for deploying ML at the edge.
a) Decentrailized Distributed Computing: In edge computing, computations are completely or largely performed on end devices in a distributed computing fashion.Also, edge computing brings data, applications, and services closer to end devices while eliminating the need for centralized clouds that requires infrastructure decentralization, such kind of decentralization can be efficiently achieved using blockchain technologies [39].Therefore, computational resources can be shared among end/edge devices by employing blockchain and smart contracts technologies thus allowing computational resources demanding ML applications to be deployed at the edge.For instance, different design requirements and challenges in designing a decentralized edge computing and IoT ecosystem are presented in [40].This study is specifically focused on the need of using decentralized trust schemes for the elimination of trust in centralized entities and highlights the potential of using distributed ledger technology, i.e., blockchain for achieving the feature of decentralization.The backbone of blockchain technologies is the distributed consensus mechanism enabling secure communication among trust-less participants without the intervention of a central controlling unit.There are many facets of blockchain with different distributed consensus methods that can be used for edge-centric IoT systems [41].
b) AI Co-Processors: Portable intelligent and dedicated co-processors are considered to be the driving force for deploying AI/ML models at the edge.Different types of specialized processors can be integrated into a single system or chip thus forming a heterogeneous computing paradigm optimized for a specific type of task.In general, AI coprocessors have two common features: (1) enables parallel computing using multiple mini-cores; (2) enables accelerated data fetching using distributed memory that is placed right to mini-cores.

D. Algorithmic Optimization Techniques
The development and advancement of ML algorithms are promising aspects that facilitate the successful application of ML at the edge.In this regard, various algorithms and techniques can be leveraged to enhance and reduce the computation of the parameters in ML models by exploiting different properties such as sparsity.The widely used methods are described below.
• Parameter Efficient Networks: To efficiently deploy ML models at the edge, computation and memory-efficient architectures of ML/DL models are highly desirable.To facilitate embedded ML computing, various architectures of ML models have been proposed in the literature that can be leveraged to deploy ML models on the edge, e.g., Mobile Net [42] and SqueezeNet [43].These architectures are designed with a key focus on reducing computation costs associated with the training and inferences of ML models while maintaining accuracy.• Network Pruning: The literature suggests that a penalty of neurons in the trained model does not contribute towards the final accuracy, therefore, such neurons can be prune to save some memory.Google's Learn2Compress 1 has found that neurons can be reduced by a factor of 2 while retaining an overall accuracy of 97%.To this aim, several algorithms have been proposed in the literature, such as learning important connections and weights among neurons [44] and learning structural sparsity in deep models [45].Moreover, many ML models perform parameter computation using 32-bit float values.On the other hand, edge devices typically operate on 8-bit values or less.Therefore, the model size can be significantly reduced by reducing precision.• Network Distillation: Network distillation is a method for transferring knowledge learned by a larger model to a smaller model.Together with transfer learning, which deals with the transfer of knowledge learned from one domain to another domain, network distillation holds the substantial potential to significantly reduce model size without comprising much on performances in terms of accuracy.In addition, network distillation can be benefited from other hyperparameters tuning algorithms as well.For instance, the distillation method has been successfully used for application-specific and resourceconstrained IoT platforms [46].

III. COVID-19 DIAGNOSIS USING COLLABORATIVE FEDERATED LEARNING
In this section, we consider the problem of developing a single ML model for classification of chest images from multiple sources (such as X-rays and Ultrasound).Consider a 1 https://ai.googleblog.com/2018/05/custom-on-device-ml-models.htmlclustered federated learning (CFL) setup as shown in Figure 2 resembles the actual federated learning settings [47].Clients in each cluster represent the healthcare entities (remote medical imaging facilities) and major hospitals or other government entities (e.g., ministry of health) play the role of the cloud server facilitating the weights aggregation and updates.The problem formulation for collaborative learning is described below.

A. Problem Formulation
In this task, we are interested in learning a shared model M s in a collaborative fashion using clustered federated learning (CFL).As shown in Figure 2, there are two clusters each having different kind of imaging modality, i.e., cluster 1 (C 1 ) has clients having X-ray imagining facility and clients in cluster 2 (C 2 ) posses ultrasound imagining facilities, therefore, each cluster C k is disjoint and has different data distribution D k .Each client m in cluster C k has drawn its samples z k,1 , . . ., z k,m from the distribution D k such that there are no overlapping samples among the clients.We have formulated the problem of collaborative learning as supervised learning problem such that each sample z k,m contains a pair of data sample x k,m and its corresponding class label y k,m , denoted by z k,m = (x k,m , y k,m ).Furthermore, we assume that each client does not have any knowledge either about the identity and data of every other client within the same cluster as well as in the other cluster.The major hospital (aka server) shares a shared model M s and initial weights W 0 with each client of every cluster.After receiving the M s and W 0 , each client trains the shared model (i.e., M s ) using its own local data D k,m , where k = {1, 2} and m denotes the number of clients in each cluster C k .After that, every client in each cluster shares the learned weights W r,k,m to the server, where r denotes the communication round/iteration number.After receiving the weight updates from each client, the server performs federated averaging using Eq. 1.
Where, n denotes the total number of clients participating in the CFL setup (i.e., n = |C 1 | + |C 2 |) and w is a weighting factor that specifies the weight-age given to the weights of each client.Then the server updates the new weights (i.e., update its copy of M s with W r ) and performs the inference using its multi-modal test data (the two modalities, i.e., Xray and Ultrasound are merged to make the testing data multi-modal).After testing the performance of M s at the communication round r, the server shares the updated weights W r with all clients in each cluster and repeats the process until the specified criteria or desired performance is achieved.The algorithm for collaborative multi-modal learning using CFL is presented in Algorithm 1.
Data Description: For this study, two datasets from different sources one containing chest X-ray [10] and chest ultrasound images [13], are used.We formulated the problem as binary classification, i.e., differentiating between COVID-19 chest images and normal chest images.Each dataset is divided into two parts, i.e., a training set and a testing set using a split of 80% and 20%, respectively.The training portion (i.e., 80%) of each dataset is further divided into different parts, depending upon the number of clients in that cluster.The distribution of training and testing data of X-ray and Ultrasound datasets over different classes is shown in Table I.Moreover, the testing sets from both datasets are merged to develop a joint testing set that will be used by the server for the evaluation of the performance of a shared model that is being trained in a collaborative fashion using CFL.
We further note that the datasets used in this study have inter and intra class variability in terms of image size and quality, contrast and brightness level, and positioning of subjects, an example is shown in Figure 3.This is not surprising as these publicly available databases are not standard datasets for COVID-19 detection, and have been curated from different sources [48].Moreover, it is evident from Table I that these datasets are highly imbalanced.These limitations make the training of a generalized model more difficult.Fig. 3: The depiction of inter and intra class variations observed in COVID-19 datasets (X-ray [10] and Ultrasound [13]).2) Model Architecture and Implementation Details: In our experiments, we have used the VGG16 model with one extra convolutional layer and three fully connected layers stacked before its original output layer.Each image is first converted into a gray-scale image, which is then resized to a dimension of 256×256.Moreover, the resized images are normalized before feeding into the model.The model is trained using Adam optimizer with a learning rate of 0.0001 at each client.We use different types of standard data augmentation techniques for training the models.Furthermore, to address the problem of imbalanced classes, we propose to use focal loss [49], which is suited for such issues in binary classification tasks.The focal loss adds a modulating factor (1 − p t ) γ to the standard crossentropy loss, where γ ≥ 0 is a tunable focusing parameter.The α-balanced variant of focal loss is defined in Eq. 2, where α balances the importance of positive/negative examples [49].
We note that for the implementation of the proposed work we used TensorFlow ML library, and all experiments are performed in a simulated environment.The results of the different experiments are described in the next section.

IV. RESULTS AND DISCUSSIONS
In order to show the effectiveness of the proposed multimodal collaborative learning framework for COVID-19 diagnosis, we performed several experiments.On one side, we aim to evaluate and compare the performances of CFL against two baselines, namely (i) specialized FL baseline, and the (ii) multi-modal2 conventional FL.Since CFL aims to tackle the convergence issues of conventional FL schemes due to the diverse distribution of the data, the two baselines, we believe, are appropriate options as a comparison benchmark instead of the state of the art methods for COVID-19 diagnosis.We note that due to the limitations of the dataset, we only consider the divergence in distribution of the data in terms of the nature of the data (i.e., the distribution of ultrasound and X-ray images is different).The first baseline shows the best-case scenario, where separated models for each type of imagery, which we termed as specialized models, are trained in a FL environment.The individual models are trained on X-ray and Ultrasound images with a learning rate of 0.0001 and a batch size of 32 resulting into two separate models one for each modality (i.e., X-ray and Ultrasound).The second baseline represents the experimental setup of a conventional FL environment, where the data is distributed among different clients, and a shared ML model is built in a federated environment.The parameters used in different experiments can be found in Table II.4 provides the experimental results per class and overall (per dataset) results, respectively, in terms of precision, recall, and F1-Score.Since the data set is not balanced, so we believe, alone accuracy is not enough to evaluate the proposed method.For performance evaluation of the three experimental setups (i.e., the two baselines and CFL), we kept the similar experimental setup where we first train the baseline models with a batch size of 16 (for each modality) and then we train the same model in CFL fashion (i.e., using multi-modal settings) with 5 epochs of local training with a batch size of 16.Then we evaluated the collaboratively trained model with the test data from each cluster (modality), i.e., Xray and Ultrasound.As can be seen in the Figure 4, overall comparable results are observed for multi-modal model trained using CFL compared with the specialized two models trained in a conventional FL environment using X-ray and Ultrasound imagery separately.On the other hand, we can see that CFL performance is considerably better than the performance of multi-modal model trained in a conventional federated learning environment.Moreover, it is evident from the figure that a collaboratively trained model is capable of recognizing the test of images from different modalities without having explicit knowledge about these modalities.Moreover, overall better results are obtained on ultrasound images (Figure 4(b)) compared to X-ray imagery (Figure 4(a)) for all models.
In Figure 5, we provide the comparison of the three experimental setups (i.e., specialized models trained in conventional FL settings, multi-modal models trained in a conventional FL and CFL environments) in terms of accuracy and loss at different communication rounds.The figure depicts that the proposed CFL model (which is trained using multi-modal data) provides comparable performance with that of specialized FL models (that are separately trained for each modality).Moreover, it is also evident from the figure that the model trained using multi-modal data in conventional FL settings gets overfitted after 50 communication rounds.On the counter side, the model keeps on learning in CFL setting, though it also tends to show over-fitting behavior at later stage communication round as evident in the Figure 5.The vertical red line shown on Figures 5(a) and 5(b) shows the inflection point beyond which the parameters of the specialized machine learning models of the two clusters (i.e., X-ray and ultrasound) start to diverge from each other.This diversion limits the extent to which the multimodal model can be generalized to fit the underlying multimodal data (i.e., X-ray and ultrasound).Therefore, Figure 5(b) provides the insight that the federated learning rounds should be stopped as soon as the inflection point in the value of the loss function is reached.This inflection point identified the number of rounds beyond which the multimodal machine learning model cannot be enhanced.

A. Lessons Learned
Some key lessons learned from the literature and experiments conducted in this work are: • CFL ensures the privacy of the user's local data, as it does not need to be shared with the server for central training.
• The communication payload of model weights is far less than the payload of sharing actual data, therefore, it saves bandwidth and as well as time.could be in terms of the distribution of negative and positive samples per class as well as in terms of the nature of data samples as detailed earlier.
• This particular use-case demonstrates the potential of the method in medical applications where remote smaller healthcare units can benefit from this collaborative learning method.However, despite these benefits, there are some challenges and limitations as well, e.g., efficiency, security issues, and the optimization of CFL parameters is difficult.Moreover, there is a trade-off in the model performance when we compare model trained using central data and model trained with distributed data using federate learning.In addition, for multi-modal distributively dispersed data, the development of personalized models that are tailored to these modalities is required for local training, which will enhance the efficiency of the shared model and as well as of the models on the client-side.For instance, from our experiments we have learned that the performance of the model being trained in CFL settings start degrading after a particular point (i.e., communication round).Thus highlighting the need of early stopping and the development of optimal stopping criteria except the maximum allowed communication rounds.

A. Developing Personalized Approaches
The edge computing network is potentially more heterogeneous as compared to any other central network and clients in an edge computing network vary due to data acquisition resources, network, computational, and storage resources, etc.Moreover, as discussed above in the paper, clients can significantly vary due to statistical heterogeneity, which is usually a great challenge in realistic settings.For example, as we discussed in the above section, developing a multi-modal collaborative learning framework for COVID-19 diagnosis has efficiency challenges due to the aforementioned heterogeneity issues.Therefore, to handle such heterogeneities, the development of personalized and client-specific ML/DL approaches is required.

B. Adversarialy Robust ML
The edge computing network is more prone to security threats, as edge computing network is an ideal environment for adversaries that aim to get desired outcomes or incentives for breaching the network security and privacy of participating agents.This phenomenon becomes, even more, worse with the integration of ML/DL models that are vulnerable to adversarial attacks, which have been already shown effective for healthcare applications [50].For instance, an adversarial attack on CT scanners in an actual hospital environment by manipulating the hospital's network has already been realized in the literature [51] and threats of adversarial ML for ML and IoT empowered COVID-19 detection systems are highlighted in [52].To restrain the adversarial attacks, different defensive techniques have been proposed in the literature.However, the adversarially robust methods developed so far are attack specific, i.e., they only work for particular attacks for which they were developed and fail to withstand unforeseen attacks.Therefore, the development of adversarially robust ML/DL models is still an open research problem that demands a proportionate amount of interest from the community with the advancement of ML/DL techniques.Moreover, for the successful deployment of ML/DL models on the edge, in particular, for developing robust healthcare applications, the development of adversarially robust models are of utmost importance.

C. Asynchronous Distributed ML
In distributed computing, two approaches are widely used for communication, i.e., synchronous and asynchronous.These approaches are ideal for scenarios where data is instantly available for instance in the central picture archiving and communication system (PACS) of a hospital.However, in realistic settings, the data collection or acquisition might get delayed due to any reason, such as due to some network issue or unavailability of I/O device, etc.Moreover, it is possible that the client (i.e., a small healthcare entity) in an MLbased collaborative computing network is not active at the current iteration/communication round due to some inherent issue, this will result in a delay in the federated parameters update process and will eventually affect the system's overall performance.Therefore, it is worth studying and developing asynchronous approaches for facilitating shared model training for healthcare applications using distributed data.

VI. CONCLUSIONS
This article provides insights on how edge computing and machine learning advances can be used to provide a solution for COVID-19 diagnosis in an efficient privacy-aware manner thereby allowing remote healthcare units to benefit from collaborative learning paradigm without sharing local data.
In particular, we propose using a clustered federated learning (CFL)-based collaborative learning framework to intelligently process visual data at the edge by training a multi-modal ML model capable of diagnoses COVID-19 in both X-ray and Ultrasound imagery.Compared to the conventional FL, CFL is found to better cope with the divergence in distribution of data from different sources (i.e., X-ray and Ultrasound imagery).In the current implementation, we consider the divergence in distribution due to the sources and nature of the data due to the limitations of the datasets.In the future, we will explore how CFL performs in the presence of variances in the distribution of the data in terms of the number of samples per client.

Fig. 1 :
Fig. 1: An illustration of AI/ML at edge in an IoT empowered healthcare environment.

Fig. 2 :
Fig. 2: The proposed clustered federated learning based collaborative learning paradigm (Fig. 2 (c)) versus the method of model training using central data (Fig. 2 (a)) and the conventional federated learning model training method for multiple modalities (Fig. 2 (b)).The term "clients" refers to hospitals, clinics, and medical imaging facilities.

Algorithm 1
Collaborative learning from data of different sources and modality using CFL Input: Shared Model M s , Clusters k, Initial Model Weights W 0 , Set of Clients m, Communication Rounds R, Epochs E, Batch Size B, and Learning Rate η Output: Updated Weights W r Initialize: W 0 , R, E, B, and η for r = 1, ..., R do for i = 1, ..., k in parallel do for j = 1, ..., m in parallel do if r == 1 then

Fig. 4 :
Fig.4: Comparison of clustered federated learning (CFL) with two baselines (i.e., the specialized models (trained with conventional FL independently for each modality) and conventional federated learning (when the model is trained using multi-modal data)) in terms of average values of precision, recall, and F1-score on X-ray and Ultrasound imagery.

Fig. 5 :
Fig.5: Comparison of clustered federated learning (CFL) with the specialized models (trained with conventional FL independently for each modality, i.e., X-ray and Ultrasound) and conventional federated learning (when the model is trained using multi-modal data) over increasing number of communication rounds.

TABLE I :
The distribution of training and testing data of X-ray and Ultrasound datasets over different classes.

TABLE II :
Parameters of clustered federated learning (CFL) experiments.

Table III and
Figure

•
It enables the collaborative learning of multi-modal features by shared model M s without sharing any explicit information about the modality of the local data and the data itself.
• More importantly, compared to the conventional FL, CFL ensures better performance in presence of divergence in the data distribution.The divergence in the distribution

TABLE III :
COMPARISON AGAINST THE TWO BASELINES IN TERMS OF PRECISION, RECALL, AND F1-SCORE.Promising results are obtained by CFL, outperforming the conventional FL while slightly lower performance is obtained compared to Central Baseline with the added advantage of improved privacy and data security.* A separate model is trained in federated learning settings for each modality.