AI/ML solutions for COVID-19 Pandemic
#CellStratAILab #disrupt4.0 #WeCreateAISuperstars #AlwaysUpskilling
Last Saturday (4th Apr ’20), CellStrat AI Lab conducted a Global Code Jam on AI/ML solutions for the ongoing COVID-19 pandemic, which has created serious problems around the world.
With the Code Jam, we tried to address some of the COVID-19 problems with the help of AI/ML solutions.
The Code Jam was led by Dr Purnendu Sekhar Das, Healthcare Consultant and AI Researcher. He presented the overall scenario of the current pandemic and the attendant epidemiological factors around its spread, as well as the protein structure of the virus and how it infiltrates lung cells.
Here were some ideas that were discussed and some projects that were presented :-
COVID-19 : AI/ML Project Ideas
1. Deep Learning/Image Recognition : Help diagnose if a medical image like a digital Chest X-Ray shows COVID-19 pneumonia.
Datasets will include chest X-rays of normal patients and of COVID-19 pneumonia patients. CNN-based image classification can act as a decision support system when a high volume of chest X-rays has to be read.
Possible Solutions – Image Classification, Image Segmentation
Chest X-Ray Analysis for COVID-19 detection (by Bismillah Kani) :-
The coronavirus pandemic is a major challenge to healthcare systems around the world. The number of positive cases is increasing exponentially, and healthcare systems are overwhelmed and close to breaking point. In the last few weeks, AI companies around the world have been working on AI-based solutions to support healthcare systems. One such solution is to detect COVID-19 from chest X-rays or CT scans. Such AI-based tools can be immensely helpful in screening and early detection, in triaging new infections, and in monitoring advancing disease. AI-based analysis of chest X-rays can potentially reduce the burden on radiologists, and in future it may also help predict how critical a patient is and which patients are most likely to need ventilator support.
Chest X-Ray Analysis using Image Segmentation (by Dr Purnendu Sekhar Das) :-
Dr Purnendu showed how U-Net image segmentation can be used to analyze chest X-ray images and mark COVID-19 pneumonia cases.
2. Forecasting model for new cases or incidence/infection rates by area to enable hospitals/health officials to better plan resourcing and response.
The data can be past day-wise data from other countries, along with the preventive measures they took. Data for India can be used to extrapolate from current numbers to the next few weeks.
Possible Solutions – Time-Series, Collaborative Filtering, Clustering, Regression, RL
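As a rough illustration of the extrapolation idea above, here is a minimal sketch that fits an exponential curve to cumulative case counts with a least-squares fit on the log scale. The case numbers are made up for the example, and the function names are our own; a real forecast would need actual data and a model that accounts for interventions.

```python
import math

def fit_exponential(counts):
    """Fit counts[t] ~ a * exp(r * t) by least squares on log(counts)."""
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sxx                       # daily growth rate
    a = math.exp(mean_y - r * mean_x)   # fitted value at day 0
    return a, r

def forecast(a, r, start_day, horizon):
    """Extrapolate the fitted curve for `horizon` days from `start_day`."""
    return [a * math.exp(r * t) for t in range(start_day, start_day + horizon)]

# Hypothetical cumulative case counts (illustrative only, not real data)
daily_cumulative = [100, 120, 144, 173, 207, 249, 299]

a, r = fit_exponential(daily_cumulative)
doubling_time = math.log(2) / r
next_week = forecast(a, r, len(daily_cumulative), 7)

print(f"growth rate per day: {r:.3f}")
print(f"doubling time (days): {doubling_time:.1f}")
print("next 7 days:", [round(v) for v in next_week])
```

A pure exponential only holds early in an outbreak, so a sketch like this is best read as a short-horizon baseline against which richer time-series models can be compared.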
3. Forecasting of possible rates of spread/patient and prognosis/percentage of serious cases by location from COVID-19 data, along with population demographics data from census 2011 and other public services data (like hospital beds and ventilators).
Clusters of high-probability areas can be shown on Google Maps and enhanced with live concentrations of people within the city (so that people can avoid going to crowded places and the government can detect crowded places).
Possible Solutions – RL, Clustering
ICU Decision Support System with Time Series and RL (by Salim Ansari) :-
Salim presented an interesting research note on ICU management which can help decrease mortality rates. A time series made up of the sequence of tests done in an ICU creates a probabilistic environment, which can then be used as a framework for Reinforcement Learning training aimed at decreasing mortality rates.
4. Social Media scraping and analysis to predict possible infection rates. Sometimes Twitter and other social media posts precede a catastrophic increase in symptomatic cases. Possible data source: Twitter data on relevant keywords (cough, fever, cold, flu) in, say, Italy or another target locality, studied in relation to past infection trends and the infection/case rates that closely follow.
Sentiment analysis can also be used to enhance this.
Possible Solutions – NLP text mining with noise reduction, Sentiment Analysis.
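To make the keyword idea concrete, here is a minimal sketch that counts, per day, how many posts mention at least one symptom keyword. The posts are invented for the example and the function name is our own; collecting real posts (for instance through a social media API) is assumed to happen elsewhere.

```python
import re
from collections import Counter

# Keywords taken from the idea above
SYMPTOM_KEYWORDS = {"cough", "fever", "cold", "flu"}

def symptom_mentions_per_day(posts):
    """posts: iterable of (date_string, text).
    Returns a Counter mapping date -> number of posts with any keyword."""
    counts = Counter()
    for day, text in posts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & SYMPTOM_KEYWORDS:
            counts[day] += 1
    return counts

# Hypothetical posts (illustrative only)
sample_posts = [
    ("2020-03-01", "Had a great pizza tonight"),
    ("2020-03-01", "Bad cough and a fever since morning"),
    ("2020-03-02", "Everyone at work has the flu"),
    ("2020-03-02", "Fever again, staying home"),
    ("2020-03-02", "Cold weather in Milan"),  # false positive: weather, not illness
]

counts = symptom_mentions_per_day(sample_posts)
for day in sorted(counts):
    print(day, counts[day])
```

The "Cold weather" post is counted even though it is not about illness, which is exactly the kind of noise that the noise-reduction step would have to filter out.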
5. Mining published literature and PDF documents on COVID-19 from PubMed and other medical literature sources to answer specific questions. Possible resources: Kaggle CORD-19, SQuAD.
Sample tasks can be to answer a question like:
What do we know about COVID-19 risk factors?
What preventive measures are most effective?
Possible Solutions – NLP, Question Answering (SQuAD-style), Knowledge Graph
Development of Knowledge Graph from COVID-19 literature for Data Mining (by Anupam Ranjan) :-
Anupam presented an interesting technique to develop a Knowledge Graph from COVID-19 literature. This can be used to mine useful information about this outbreak by asking questions such as “Which areas are more affected by Coronavirus?”. The knowledge graph can help provide such answers easily.
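The details of Anupam's technique were not captured in this write-up, so the following is only a generic sketch of what a knowledge graph looks like once facts have been extracted: a store of (subject, relation, object) triples that can be queried. The class, the relation names and the region placeholders are all our own assumptions, and the hard part (extracting triples from papers with NLP) is assumed to have happened already.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A tiny triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def query(self, subject, relation):
        """Return all objects linked to `subject` by `relation`, sorted."""
        return sorted(self._edges.get((subject, relation), set()))

kg = KnowledgeGraph()

# Triples as they might come out of a literature-mining step
kg.add("COVID-19", "caused_by", "SARS-CoV-2")
kg.add("COVID-19", "has_symptom", "fever")
kg.add("COVID-19", "has_symptom", "cough")
# Placeholder regions (hypothetical, not real data)
kg.add("COVID-19", "reported_in", "Region A")
kg.add("COVID-19", "reported_in", "Region B")

print(kg.query("COVID-19", "has_symptom"))
print(kg.query("COVID-19", "reported_in"))
```

A question such as "Which areas are affected?" then becomes a lookup on a relation like `reported_in`, once a question-parsing step has mapped the wording to that relation.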
6. Patient level risk prediction for severe COVID-19 disease from clinical features and other patient level data using machine learning.
Data sources for this will be patient-level real-world data, such as EMR and medical claims sources, or any structured medical source; the data will contain past clinical history, including comorbidities and medications used. Records of some patients with severe and with mild COVID-19 disease will be needed to train a model that predicts risk for new patients.
Possible Solutions – Mostly structured, tabular data, so classic ML on tabular data. Sometimes we might also get X-rays or EHR free text.
7. Google Trends or other search engine data to predict infection rates in a particular locality.
Search phrases aligned to COVID-19 symptoms, just like social media posts, can precede an increase in cases.
This approach can also be combined with sentiment analysis to get additional features that might predict rates, and it can add to the social media analysis for any location.
Possible Solutions – Time-series
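One simple way to check whether search interest really leads case counts is a lagged correlation: shift the case series by a few days and see which shift lines up best with the search series. The sketch below does this with a hand-written Pearson correlation. Both series are synthetic (the case series is built as a delayed copy of the search series), and the function names are our own.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_lead(search, cases, max_lag):
    """Find the lag (in days) at which search[t] best matches cases[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        xs = search[:len(search) - lag]
        ys = cases[lag:]
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lag, r)
    return best

# Synthetic data: cases follow the search curve with a 3-day delay
search_interest = [1, 2, 5, 9, 14, 12, 8, 5, 3, 2]
daily_cases = [10, 10, 10, 10, 20, 50, 90, 140, 120, 80]

lag, corr = best_lead(search_interest, daily_cases, max_lag=5)
print(f"search interest leads cases by {lag} days (r = {corr:.2f})")
```

With real data the correlation will be far noisier, and a high value at some lag would only suggest a feature worth feeding into a time-series model, not a causal link.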
8. Open Rapid Prototyping on testing strategies: India has 1.4 B people and maybe only 140k testing kits available. What can be done to build an algorithm that finds the people who most need the COVID-specific test?
Data on patient profiles from previous testing, including those who returned positive results, can be combined with symptoms and other history to come up with a screening algorithm.
Possible Solutions – Triage locations which need testing and who needs testing.
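As a toy version of such a screening algorithm, the sketch below scores each person from a few yes/no features and gives the limited kits to the highest scorers. The feature names and weights are purely hypothetical and not clinically validated; in the approach described above they would be learned from previous test results rather than set by hand.

```python
import heapq

# Hypothetical weights for illustration only, NOT clinically validated
WEIGHTS = {
    "fever": 2.0,
    "cough": 1.0,
    "breathlessness": 3.0,
    "contact_with_case": 3.0,
    "recent_travel": 2.0,
}

def screening_score(profile):
    """Sum the weights of every feature that is present in the profile."""
    return sum(w for feature, w in WEIGHTS.items() if profile.get(feature))

def allocate_kits(profiles, kits):
    """Return the ids of the `kits` highest-scoring people, best first."""
    return heapq.nlargest(kits, profiles,
                          key=lambda pid: screening_score(profiles[pid]))

# Hypothetical people (illustrative only)
profiles = {
    "A": {"fever": True, "cough": True},
    "B": {"breathlessness": True, "contact_with_case": True},
    "C": {},
    "D": {"fever": True, "recent_travel": True, "contact_with_case": True},
}

print(allocate_kits(profiles, kits=2))
```

With 1.4 B people and about 140k kits there is roughly one kit per 10,000 people, so the ranking step matters far more than the exact score values.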
9. Community surveillance via apps or mapping tech: the disease is poised to spread exponentially in the next few weeks. We can apply new tech and some learnings from past disease surveillance techniques to build a working prototype, drawing on ideas proposed by team members, such as the mobile phone app for contact tracing recently implemented by the Singapore govt.
Possible Solutions – Use mobile geo-presence and a mobile app (cell tower triangulation) to create a contact tracing graph and to check population density; carrier dev platforms, Google Maps and MapMyIndia dev platforms
COVID-19 Hotspots via Mobile App (by Niraj Kale) :-
Niraj presented a solution for social distancing and hotspot zones using an Android mobile app. To contain the spread of the coronavirus, it is important to track patients and where they are located. It is also important to alert nearby hospitals about patient information. This can be achieved by tracking people who have arrived from abroad. The data can be further filtered by checking which hospitals have reported patients with symptoms and patients who have tested positive for the coronavirus. This data can be collected and presented in the form of a heatmap to show which localities have a high density of coronavirus patients. The hotspots are the areas that can be locked down, while the remaining areas can be kept open.
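The heatmap step can be sketched without any mapping library: bucket each reported location into a grid cell and flag the cells whose count crosses a threshold. This is not Niraj's implementation, only a minimal illustration; the coordinates, cell size and threshold below are assumptions chosen for the example.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to a grid cell (0.01 degrees is about 1.1 km north-south)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def find_hotspots(locations, min_cases=3, cell_deg=0.01):
    """Count cases per grid cell and keep cells with at least `min_cases`."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in locations)
    return {cell: n for cell, n in counts.items() if n >= min_cases}

# Hypothetical patient locations (illustrative only)
patient_locations = [
    (18.5204, 73.8567),
    (18.5208, 73.8561),
    (18.5206, 73.8569),
    (18.5312, 73.8446),
    (18.4529, 73.8652),
]

hotspots = find_hotspots(patient_locations, min_cases=3)
for cell, n in hotspots.items():
    print(f"hotspot cell {cell}: {n} cases")
```

The per-cell counts are what a map layer would colour; a fixed grid is the simplest choice, and a clustering method could replace it when hotspots do not line up with cell boundaries.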
10. Prediction of COVID-19 cases and recoveries, taking population into consideration. When can we expect net new cases to start reducing in India?
Explanation: Considering the population of the country, the number of cases detected per day and the concentration of people, predict the number of cases that will be diagnosed in the next 1-2 months. Considering the efforts taken by the government and the rate of increase of cases in India, predict when the number of new cases will start reducing.
Possible Solution: Time Series
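A first step towards answering "when do new cases start reducing" is to smooth the daily numbers and look for the peak. The sketch below uses a simple moving average on invented daily counts; the function names are our own. Note that it only detects a peak after it has passed, so it is a monitoring aid rather than a forecast.

```python
def moving_average(values, window):
    """Trailing moving average; the first result covers days 0..window-1."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def peak_day(daily_new, window=3):
    """Return (day index of the smoothed peak, whether the curve has since fallen)."""
    smoothed = moving_average(daily_new, window)
    peak_idx = max(range(len(smoothed)), key=smoothed.__getitem__)
    day = peak_idx + window - 1             # last day covered by the peak window
    past_peak = peak_idx < len(smoothed) - 1
    return day, past_peak

# Hypothetical daily new cases (illustrative only)
daily_new = [10, 14, 13, 20, 26, 24, 31, 29, 25, 27, 21, 18]

day, past_peak = peak_day(daily_new, window=3)
print(f"smoothed peak at day {day}, declining since: {past_peak}")
```

Smoothing matters here because raw daily counts jump around (day 9 is higher than day 8 in the sample), and a single noisy day should not be read as a change in trend.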
11. Predicting commercially available antiviral drugs that may act on novel coronavirus (2019-nCoV)
Explanation: The SMILES strings in the data sets have all been standardized using RDKit. The data sets are collected in one place and can be used to validate the inference of molecular properties by the various proposed machine learning models.
Possible Solution: https://github.com/GLambard/Molecules_Dataset_Collection
12. Which social, environmental and genetic factors escalate the virus, and which slow it down or kill it? Causality or pattern matching. E.g. will heat and humidity slow it down? How will Indians’ bodies/natural immune defence react to it vs Chinese?
Explanation: Causality determination, adverse factor detection for slowing the virus, which other viruses it is close to, medicines likely to cure it, genetic factors that provide defence
Possible Solution: Classification, Regression on temperature levels etc.
13. Collaborative filtering can be applied as a potential method to screen patients
Explanation: Which patients are likely to get it and which are likely to recover easily
Possible Solution: Collaborative Filtering, Classification, Clustering
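One way to read this idea is as user-based collaborative filtering where patients play the role of users and clinical features play the role of items: a missing value for a new patient is estimated from similar patients who have that value. The sketch below is a toy version under that assumption; the feature names, the patients and the `severe` outcome flag are all invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity over the features both patients have recorded."""
    shared = [k for k in u if k in v]
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    nu = math.sqrt(sum(u[k] ** 2 for k in shared))
    nv = math.sqrt(sum(v[k] ** 2 for k in shared))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(target, others, item):
    """Similarity-weighted average of `item` over patients who have it recorded."""
    num = den = 0.0
    for other in others:
        if item not in other:
            continue
        w = cosine(target, other)
        num += w * other[item]
        den += abs(w)
    return num / den if den else None

# Hypothetical patient records (illustrative only)
known_patients = [
    {"fever": 1, "cough": 1, "breathless": 1, "severe": 1},
    {"fever": 1, "cough": 0, "breathless": 0, "severe": 0},
    {"fever": 1, "cough": 1, "breathless": 1, "severe": 1},
]
new_patient = {"fever": 1, "cough": 1, "breathless": 1}

risk = predict(new_patient, known_patients, "severe")
print(f"estimated severity score: {risk:.2f}")
```

The new patient resembles the two severe cases more than the mild one, so the estimate lands closer to 1 than to 0; with real data a classifier trained on the same features may well do better, which is why classification is listed alongside it.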
14. Serological Testing to check who already has antibodies for this virus (having been infected before) and hence some level of immunity against a repeat infection
Possible Solution: Classification, Regression, Clustering
15. Generating COVID-19 related molecules using Generative Modelling techniques
Possible Solution: Molecule synthesis using VAE (Variational AutoEncoders)