Genetic Sequencing and Machine Learning: Anticipating Cardiovascular Disease

Authors: Amina Khalpey, PhD, Parker Wilson, BS, Zain Khalpey, MD, PhD, FACS

Machine learning (ML) has become an increasingly important tool in anticipating cardiovascular disease using genomic data. Properly developed ML algorithms can quickly and accurately sort through the human genome to identify genetic patterns and relationships that may not be immediately apparent to human analysts.

The Genetic Code:

Genetic sequencing is becoming easier and more widely available, especially through next-generation sequencing (NGS).1 NGS encompasses multiple strategies for sequencing DNA, such as whole genome sequencing (WGS), whole exome sequencing (WES) and gene panels.2 Each of these has particular uses, but all have deepened our understanding of the human genome and its power to predict the development of disease. The advent of accessible genetic sequencing, coupled with ML, is revolutionizing the way we as clinicians approach cardiovascular disease diagnosis and treatment.

Polygenic Disease:

Multiple cardiovascular diseases (CVDs), including hypercholesterolemia, channelopathies, cardiomyopathies and congenital diseases, are polygenic in nature, meaning that various genes play roles in their development.3-9 The 2020 Scientific Statement From the American Heart Association (AHA) on Genetic Testing for Inherited Cardiovascular Diseases recommended testing specific genes in certain monogenic CVDs, specifically familial hypercholesterolemia and dilated cardiomyopathy.10 For context, monogenic diseases, which are associated with a single gene, are much easier to analyze and locate in a genome. Polygenic diseases, on the other hand, are much more difficult to pin down because their development relies on multiple genes. Brugada syndrome, for instance, can originate from some eleven different genes, but the AHA recommends testing only for SCN5A.11-12 ML paired with data from NGS can analyze thousands, if not millions, of genomes to identify and group genes, creating genetic profiles for the diseases found in patients.
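
To make the idea of a multi-gene profile concrete, one simple summary is a weighted sum of risk-allele counts across several variants, often called a polygenic score. The Python sketch below is illustrative only. The variant names and weights are hypothetical placeholders, not values from any study cited here, and the function is not the method used by any of the referenced authors.

```python
from typing import Dict

# Hypothetical per-variant effect weights (illustrative numbers only,
# not estimates from any real study).
EFFECT_WEIGHTS: Dict[str, float] = {
    "variant_A": 0.30,
    "variant_B": 0.12,
    "variant_C": -0.05,
}


def polygenic_score(dosages: Dict[str, int],
                    weights: Dict[str, float] = EFFECT_WEIGHTS) -> float:
    """Return the weighted sum of risk-allele counts.

    `dosages` maps a variant name to the number of risk alleles the
    person carries at that variant (0, 1, or 2).
    """
    score = 0.0
    for variant, weight in weights.items():
        # Simplifying assumption: a missing genotype counts as 0 copies.
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1, or 2")
        score += weight * dosage
    return score


# A made-up patient carrying two copies of variant_A and one of variant_B.
patient = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
print(round(polygenic_score(patient), 2))  # 0.72
```

A real score would draw its weights from large association studies and would need validation in the population where it is applied. The sketch only shows why many small genetic effects have to be combined before a polygenic disease can be profiled.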

Precision Healthcare:

If this genetic data were matched with medical history, family history, lifestyle, and environmental factors, algorithms developed with ML could identify individuals who are at high risk of developing cardiovascular disease. This information can be used to develop targeted prevention strategies and to provide individuals with personalized treatment plans. Recent studies have evaluated novel gene loci and their association with heart disease or cardiac pathology. Aung et al described left ventricular phenotypes in a genome-wide analysis of 16,923 patients and identified eight genes that could be predictive of heart failure, including TTN, BAG3 and others.13 Similar machine learning techniques, paired with NGS, make new insights into the origins of disease possible. This should help improve our understanding and anticipation of these conditions.
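
As a rough illustration of how genetic and non-genetic factors might be combined into a single risk estimate, the Python sketch below fits a small logistic regression by gradient descent. Everything in it is an assumption made for demonstration: the cohort is randomly generated, the three features (a genetic score, family history, smoking) and the coefficients used to generate labels are invented, and the model is far simpler than anything suitable for clinical use.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_cohort(n: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Generate made-up patients: [genetic_score, family_history, smoker]."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        genetic_score = rng.gauss(0.0, 1.0)
        family_history = float(rng.random() < 0.30)
        smoker = float(rng.random() < 0.25)
        # Invented relationship, used only to produce synthetic labels.
        logit = -1.5 + 1.0 * genetic_score + 0.8 * family_history + 0.6 * smoker
        labels.append(int(rng.random() < sigmoid(logit)))
        features.append([genetic_score, family_history, smoker])
    return features, labels


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    """Plain batch gradient descent on the logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(x: List[float], w: List[float], b: float) -> float:
    """Estimated probability of disease for one patient."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


X, y = make_synthetic_cohort(400)
w, b = fit_logistic(X, y)

# Two hypothetical patients at opposite ends of the risk spectrum.
high_risk = predict_risk([2.0, 1.0, 1.0], w, b)
low_risk = predict_risk([-2.0, 0.0, 0.0], w, b)
print(f"high-risk profile: {high_risk:.2f}, low-risk profile: {low_risk:.2f}")
```

In practice, a model like this would be trained on real, representative cohorts and evaluated on held-out patients before its risk estimates informed prevention or treatment decisions.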

However, it is important to note that machine learning is not a perfect solution. It is only as good as the data it is trained on, so accurate, complete, and representative datasets are needed for algorithm development. As the saying goes, “garbage in, garbage out.” Long-term prospective studies that include genetic data are still needed to evaluate how CVD develops.

In conclusion, the use of machine learning to anticipate genetic contributions to cardiovascular disease has the potential to revolutionize the way we diagnose and treat these conditions. By analyzing complete genomes produced by modern sequencing technologies, machine learning algorithms have identified, and will continue to identify, complex polygenic patterns and relationships that may not have been apparent to human analysts. While there are challenges associated with using machine learning, the potential benefits are significant, and this technology is likely to play an increasingly important role in healthcare in the years to come.


McKusick V.A., Ruddle F.H. Toward a complete map of the human genome. Genomics. 1987;1:103–106.

Krittanawong C., Johnson K.W., Choi E., Kaplin S., Venner E., Murugan M., Wang Z., Glicksberg B.S., Amos C.I., Schatz M.C., Tang W.H.W. Artificial Intelligence and Cardiovascular Genetics. Life (Basel). 2022;12(2):279.

Novelli G., Predazzi I.M., Mango R., Romeo F., Mehta J.L. Role of genomics in cardiovascular medicine. World J. Cardiol. 2010;2:428–436. doi: 10.4330/wjc.v2.i12.428.

Bertolini S., Pisciotta L., Di Scala L., Langheim S., Bellocchio A., Masturzo P., Cantafora A., Martini S., Averna M., Pes G.M., et al. Genetic polymorphisms affecting the phenotypic expression of familial hypercholesterolemia. Atherosclerosis. 2004;174:57–65. doi: 10.1016/j.atherosclerosis.2003.12.037.

Krittanawong C., Khawaja M., Rosenson R.S., Amos C.I., Nambi V., Lavie C.J., Virani S.S. Association of PCSK9 Variants with the Risk of Atherosclerotic Cardiovascular Disease and Variable Responses to PCSK9 Inhibitor Therapy. Curr. Probl. Cardiol. 2021:101043. doi: 10.1016/j.cpcardiol.2021.101043.

Campuzano O., Beltrán-Álvarez P., Iglesias A., Scornik F., Pérez G., Brugada R. Genetics and cardiac channelopathies. Genet. Med. 2010;12:260–267. doi: 10.1097/GIM.0b013e3181d81636.

Bleumink G.S., Schut A.F., Sturkenboom M.C., Deckers J.W., van Duijn C.M., Stricker B.H. Genetic polymorphisms and heart failure. Genet. Med. 2004;6:465–474. doi: 10.1097/01.GIM.0000144061.70494.95.

Vecoli C., Borghini A., Turchi S., Mercuri A., Andreassi M.G. Genetic polymorphisms of miRNA machinery genes in bicuspid aortic valve and associated aortopathy. Pers. Med. 2021;18:21–29. doi: 10.2217/pme-2020-0082.

Girdauskas E., Geist L., Disha K., Kazakbaev I., Groß T., Schulz S., Ungelenk M., Kuntze T., Reichenspurner H., Kurth I. Genetic abnormalities in bicuspid aortic valve root phenotype: Preliminary results. Eur. J. Cardio-Thorac. Surg. 2017;52:156–162. doi: 10.1093/ejcts/ezx065.

Musunuru K., Hershberger R.E., Day S.M., Klinedinst N.J., Landstrom A.P., Parikh V.N., Prakash S., Semsarian C., Sturm A.C., American Heart Association Council on Genomic and Precision Medicine, et al. Genetic Testing for Inherited Cardiovascular Diseases: A Scientific Statement From the American Heart Association. Circ. Genom. Precis. Med. 2020;13:e000067. doi: 10.1161/HCG.0000000000000067.

Brugada J., Campuzano O., Arbelo E., Sarquella-Brugada G., Brugada R. Present Status of Brugada Syndrome. J. Am. Coll. Cardiol. 2018;72:1046–1059. doi: 10.1016/j.jacc.2018.06.037.

Al-Khatib S.M., Stevenson W.G., Ackerman M.J., Bryant W.J., Callans D.J., Curtis A.B., Deal B.J., Dickfeld T., Field M.E., Fonarow G.C., et al. 2017 AHA/ACC/HRS Guideline for Management of Patients With Ventricular Arrhythmias and the Prevention of Sudden Cardiac Death. Circulation. 2018;138:e272–e391.

Aung N., Vargas J.D., Yang C., Cabrera C.P., Warren H.R., Fung K., Tzanis E., Barnes M.R., Rotter J.I., Taylor K.D., et al. Genome-Wide Analysis of Left Ventricular Image-Derived Phenotypes Identifies Fourteen Loci Associated With Cardiac Morphogenesis and Heart Failure Development. Circulation. 2019;140:1318–1330. doi: 10.1161/CIRCULATIONAHA.119.041161.