The importance of web scraping and alternative data cannot be overstated. In today's data-driven world, having access to a wide range of information is crucial for businesses looking to stay competitive.
The growth of artificial intelligence (AI) and machine learning (ML) has also played a significant role in the rise of web scraping. Web scraping, alternative data, AI, and ML are already profoundly impacting society and are expected to continue to do so in the coming years.
Four prominent members of the Oxylabs AI & ML Advisory Board share their predictions for 2023: Adi Andrei, Head of Data at SpaceNK; Jonas Kubilius, Co-Founder & CEO of Three Thirds; Ali Chaudhry, Founder of Reinforcement Learning Community; and Pujaa Rajan, Software Engineer at Stripe.
Unseen Machine Learning Capabilities
Adi Andrei, former Engineer at NASA and Lead Data Scientist at Unilever, current Head of Data at SpaceNK, and Advisory Board member at Oxylabs, expects growth in large language models, as well as in self-supervised machine learning methods such as contrastive learning.
“Large Language Models for NLP (like BERT, GPT, and derivatives) will keep improving, and their use will become more pervasive. Also, one pretrained model will be usable with little modification for many functions (sentiment analysis, summarization, word sense disambiguation, etc.),” - shared Adi.
Adi also anticipates that more attention will be paid to contrastive learning. “Its aim is to learn representations of data by contrasting between similar and dissimilar samples. Transformers, text-to-image, and diffusion models require large-scale datasets, and supervised pre-training of large models is extremely expensive. Self-supervised contrastive learning can be used to leverage vast amounts of unlabelled data in order to efficiently pre-train large models,” - explained Adi.
“Furthermore, contrastive search is a related technique which has been shown to significantly improve the output of large language models when used for text generation tasks,” - added Adi.
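To make the idea of contrasting similar and dissimilar samples more concrete, here is a minimal sketch of one common contrastive objective, an InfoNCE-style loss, written in plain Python. It is an illustration only and not something described by the Advisory Board members: the function names, the toy two-dimensional embeddings, and the temperature value are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor embedding.

    The loss is small when the anchor is more similar to its positive
    sample than to any of the negative samples.
    """
    # Similarity scores, with the positive sample first.
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )

    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - logits[0]


# Hypothetical toy embeddings (assumed values, for illustration only).
anchor = [1.0, 0.0]
similar = [0.9, 0.1]
dissimilar = [[0.0, 1.0], [-1.0, 0.0]]

# Correct pairing: the similar sample is treated as the positive.
good = info_nce_loss(anchor, similar, dissimilar)
# Wrong pairing: a dissimilar sample is treated as the positive.
bad = info_nce_loss(anchor, dissimilar[0], [similar, dissimilar[1]])

print(good < bad)
```

In self-supervised settings, the positive is typically a second augmented view of the same unlabelled sample, which is why no manual labels are needed. A training loop would minimize this loss over many anchors so that similar samples end up close together in the embedding space.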
Content generation techniques to become profitable products
Jonas Kubilius, AI Researcher, Marie Skłodowska-Curie Alumnus, Co-founder and CEO at Three Thirds, and member of the Oxylabs Advisory Board, anticipates that Stable Diffusion, GPT-3, GitHub Copilot, and other content generation techniques will increasingly evolve into profitable products used by developers and content creators in real-world applications. He added that we would see increased interest in multi-modal models that can handle text, images, audio, and other inputs for multiple tasks.
“We will start seeing a shift from using AI for static tasks like classification to language-model-driven interactive workflows that help people perform their tasks more efficiently,” - said Jonas.
The adoption of AI within the biotech industry has also been on the rise in recent years. According to Jonas, it is helping improve the speed and accuracy of drug development, which can ultimately benefit patients and the healthcare system as a whole.
“I believe that AI adoption within biotech will keep accelerating, benefiting not only drug discovery efforts but our general understanding of cell biology,” - added Jonas.
AI apps to replace Google
Ali Chaudhry, Postgraduate Teaching Assistant at UCL, Founder of Reinforcement Learning Community, Course Leader of Artificial Intelligence and Machine Learning at Emeritus, Chief Technical Advisor at Infini8AI, and Advisory Board member at Oxylabs, believes that the regulation of AI-powered tools that has emerged in Europe, the US, the UK, and Canada will continue in the coming year, with stricter rules and their implementation.
“We will see a proliferation of apps built on top of AI-generated content (text and images) through tools like DALL·E and Stable Diffusion. It will be interesting to see the impact of open-sourcing Stable Diffusion on the AI community,” - shared Ali.
The recent surge of interest in applying AI and NLP to bots has brought great results.
For instance, OpenAI, a research institute and technology company focused on advancing the field of AI, created the well-known ChatGPT as one of its projects. The technology allows bots to understand and respond to human language, providing more natural and human-like interactions.
“I see ChatGPT replacing Google in many ways and OpenAI emerging as a big tech giant on top of this product. It will be interesting to explore its impact on education, healthcare, and personalized software. It will transform our society in many ways,” - Ali added.
More use cases for AI-powered applications
Pujaa Rajan, Machine Learning Engineer at Stripe, Software Engineer at Paradigm, and Advisory Board member at Oxylabs, expects that we will continue to see more applications and more adoption of generative AI.
“In 2022, I was personally most excited by GitHub Copilot, which is powered by OpenAI Codex, and I use it every day. This was the first of its kind, so I expect to see more improvements and more competition in related technologies in 2023,” - mentioned Pujaa.
“In the same year, we also saw Lensa, which uses Stable Diffusion, used to create photos and content for social media. In 2023, I can imagine this tool being adopted by authors, illustrators, and others,” - Pujaa added.