Seeing cancer rates decrease with the help of Data Science.


João Ramalho


March 11, 2024

“Historical data and research shows that it is possible to change the world”, claims Our World in Data. When we take the time to look into reliable data and analyze some of the current world problems more deeply, we often get a different view from the one relayed by everyday breaking news. As an example, historical cancer death rates, when adjusted for demographic changes, show that the world is finally making slow progress against cancer.

My family has been affected by this disease, so cancer is more than just cold numbers for me. I nevertheless believe that good-quality data and analysis can play a role at different levels and really help improve everyone's life. By covering long-term trends and adjusting for context, as the historical cancer time series show, data can contribute to a more objective understanding of complex situations and support more effective decision-making. In a very specific analysis, such as brain tumor imaging, it can be essential to achieving diagnostic accuracy.
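The "adjusted for demographic changes" idea mentioned above is, in essence, age-standardization: crude death rates can rise simply because a population grows older, so age-specific rates are weighted by a fixed reference population before years are compared. A minimal sketch, using entirely made-up numbers for illustration:

```python
# Hypothetical illustration of age-standardization. All figures below are
# invented for demonstration and are not real cancer statistics.

# Age-specific cancer death rates per 100,000 people (hypothetical)
rates_1990 = {"0-49": 20, "50-69": 300, "70+": 1200}
rates_2020 = {"0-49": 15, "50-69": 250, "70+": 1000}

# Share of a fixed "standard" reference population in each age group
standard_population = {"0-49": 0.70, "50-69": 0.20, "70+": 0.10}

def age_standardized_rate(rates, standard):
    """Weight each age-specific rate by the standard population's share."""
    return sum(rates[group] * share for group, share in standard.items())

print(age_standardized_rate(rates_1990, standard_population))  # 194.0
print(age_standardized_rate(rates_2020, standard_population))  # 160.5
```

Because both years are weighted by the same reference population, the comparison reflects changes in the disease itself rather than in the age structure of the population.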

Our Data Track volunteer trainer Alain Jungo, PhD, a postdoctoral researcher at the University of Bern, specializes in computer vision and has published several articles on the topic. Alain has strong insights into applying Python, machine learning and deep learning to real-life problems. He now brings some of these insights to our Data Track participants, helping them run their own studies and apply the skills in their future internships and jobs.

I talked with Alain about data analysis topics, but also about learning, coaching and working for a better world in general. Below is an extract from our interview.

João: Hello Alain, I guess that, like almost everyone, you sometimes feel gloomy about the news we get from the media on the environment, economics, peace and so on. How has your knowledge of statistics and data science influenced how you see the world?
Alain: I guess I am like most people when it comes to news… but to counteract the gloomy feeling I try to remind myself that media coverage tends to emphasize extreme, mostly negative events to maximize readership. Therefore, I hold onto the belief that the world is generally doing better and humans are more commendable than the media portrays. My knowledge of data science and statistics has not changed much, except that I probably approach conclusions drawn from data with more skepticism. I’ve come to understand the difficulty of removing human bias from data interpretation. Consequently, I’ve become more cautious in embracing new beliefs.

João: The data field has accelerated again since the arrival of LLMs and their application to software code generation, to the point that some technology leaders claim coding is no longer necessary (e.g. NVIDIA's CEO). How much do you use these technologies in your work, and if you were starting again, would you still go through the difficulties of learning to code in Python, for example?
Alain: Tools like GitHub’s Copilot are very helpful. I rely on Copilot regularly to draft or refine algorithms and plots. The suggested code it offers is typically of high quality, streamlining my workflow and reducing the need for extensive consultations on platforms like Stack Overflow. Nevertheless, utilizing such tools necessitates a solid understanding of code and the ability to validate its accuracy and alignment with the desired outcomes. Therefore, I do not view such code generators as an immediate threat but rather as tools empowering us to pursue our objectives: building solutions and understanding data. Furthermore, these tools contribute to the accessibility of coding education, making the learning process more approachable than ever. Therefore, I would probably still spend time learning how to code today.

João: I’ve just been through your 2019 paper on the importance of collecting large volumes of data to improve diagnostic accuracy (“Deep Learning Versus Classical Regression for Brain Tumor Patient Survival Prediction”). How has data collection evolved in the almost five years since you published it?
Alain: The medical field is highly regulated, so changes usually happen slower compared to other areas. The way data is collected has not changed much in the last five years. However, the real challenge with medical data is not acquiring it; huge amounts of patient data and images are collected and stored daily. More challenging are standardization and availability of data. Lately, progress has been made in both areas. More and more anonymized medical data is now freely available, which has drastically sped up the development of new AI-based solutions. Additionally, initiatives for better harmonization of medical data have been initiated in Switzerland.

João: You’ve been working in the medical field for more than 10 years now. What are the most important competencies you see required in the near future, that you integrate in your sessions with Powercoders participants?
Alain: In my view, it is not so much about technical competencies, as they are bound to evolve continuously. What is crucial is being able to adjust to new scenarios and demands quickly. I also encourage the Powercoders participants not to focus solely on using fancy methods but to be curious about understanding the data they’re dealing with. Getting valuable insights and finding solutions often depends on really understanding the data.

João: It’s been great to follow your work with Powercoders over the last couple of years. Thanks for all the energy, creativity and time you put into it.
Alain: Thank you and all the participants. It has been a pleasure.

The Powercoders Data Track is based on the DataCamp platform, made available through DataCamp Donates. It provides high-quality courses in all the technologies relevant to data professionals, such as Python, R, Julia, SQL, ChatGPT and Power BI.