Of all the resources required to gain insight from big data workflows, perhaps the most important is the human one. A major challenge for the data science community is the training and education of the current and next generation of data scientists. The number of tools, middleware and techniques for data science keeps growing, the necessary computing infrastructure is often within reach, and most of the software tools for big data are freely available. But human talent, in the form of skilled data scientists, is difficult to find. Training data scientists who are well-versed in applying big data technologies to their scientific applications is of critical importance to the future of research and knowledge advancement in any field. Workflows provide an ideal environment to combine and teach the different steps, tools and techniques of data science while keeping the focus on a particular application domain. We develop learning modules that teach workflow-based thinking, capturing the end-to-end process as reusable blocks of knowledge and integrating the tools and technologies used in big data analysis in an intuitive manner. Our workflow-driven approach enables us to teach basic concepts in big data analysis and process management. The learning objectives in our training modules include:

  • Learn about distributed platforms and systems
  • Learn about Cloud and Big Data
  • Learn about scalable workflow tools, e.g., Kepler, Pegasus, Oozie, YARN and Cascading
  • Learn how to make your science reproducible
  • Gain hands-on experience with a range of data science and big data technologies, centered on application case studies

The WorDS team is applying MOOC-based approaches to bring these training modules to a broad audience in a scalable and open fashion. Please follow us on Twitter for announcements about future training activities. WorDS also partners with IDSE and NBCR to participate in the training programs offered by these centers.

