Midjourney’s surprise: new research on making LLMs write more creatively



Midjourney is best known as one of the leading AI image generators, with nearly 20 million users on its Discord channel according to third-party trackers, and presumably more on its website, but its ambitions are beginning to expand.

Following news in late summer 2024 that the company was building its own computing and AI hardware, this week Midjourney released a new research paper with machine learning experts at New York University (NYU) on training text-based large language models (LLMs), such as Meta’s open-source Llama and Mistral’s eponymous open-source models, to write more creatively.

The collaboration, documented in a new research paper posted on the AI code community Hugging Face, introduces two new techniques, Diversified Direct Preference Optimization (DDPO) and Diversified Odds Ratio Preference Optimization (DORPO), designed to expand the range of possible outputs while maintaining coherence and readability.

For a company best known for its diffusion-based AI image generation models, Midjourney’s new approach to rethinking creativity in text-based LLMs shows that it is not limiting its ambitions to visuals, and that a picture may not, in fact, be worth a thousand words.

Could a Midjourney-native LLM, or a fine-tuned version of an existing LLM, be in the works? I contacted Midjourney founder David Holz, but have yet to hear back.

Regardless of what a first-party Midjourney LLM might offer, the implications of this new research go beyond academic exercise: it could help fuel a new wave of LLM training among enterprise AI teams, product developers, and content creators looking to improve AI-generated text.

It also shows that, despite recent interest in and investment into new multimodal and reasoning language models among AI model providers, there is still a lot of juice left to be squeezed, cognitively and performance-wise, from classic text-focused LLMs.

The problem: AI-generated writing collapses into homogeneous outputs

In domains such as fact-based Q&A or coding assistance, LLMs are expected to generate a single best response.

However, creative writing is open-ended in nature, which means there are many effective responses to a single prompt.

For an example prompt provided by the Midjourney researchers, “Write a story about a dog on the moon,” an LLM could explore multiple different paths, such as:

  • An astronaut’s pet dog accidentally left behind after a lunar mission.
  • A dog who finds itself in a futuristic canine space colony.
  • A stranded dog that befriends an alien species.

Despite this range of possibilities, instruction-tuned LLMs often converge on similar storylines and themes. This happens because:

  1. Post-training techniques prioritize user preferences over originality, reinforcing popular but repetitive responses.
  2. Instruction tuning often smooths out variation, making the model favor “safe” responses over unique ones.
  3. Existing diversity-promoting techniques (such as temperature adjustment) operate only at inference time, rather than being baked into the model’s learning process.

This leads to homogeneous storytelling, where AI-generated creative writing feels repetitive and lacks surprise or depth.
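The third limitation above is worth making concrete: temperature adjustment only reshapes the sampling distribution at inference time, leaving the model’s learned preferences untouched. A minimal pure-Python sketch (the logits are toy values, not any real model’s scores):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Rescale logits by temperature, then apply softmax.
    Higher temperature flattens the distribution (more varied sampling);
    lower temperature sharpens it toward the single most likely token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token scores for three candidate tokens.
logits = [2.0, 1.0, 0.2]
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

# The top token dominates less as temperature rises.
print(max(cold), max(hot))
```

The key point is that this knob changes only how the output distribution is sampled; the distribution the model learned during training stays the same, which is the gap DDPO and DORPO aim to close.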

The solution: modifying post-training methods to prioritize diversity

To overcome these limitations, the researchers introduced DDPO and DORPO, two extensions of existing preference optimization methods. The core innovation in these approaches is the use of deviation, a measure of how much a response differs from other responses to the same prompt, to guide training.

Here is how it works:

  1. During training, the model is given a writing prompt and multiple possible responses.
  2. Each response is compared to the others for the same prompt, and a deviation score is calculated.
  3. Rare but high-quality responses are weighted more heavily in training, encouraging the model to learn from diverse examples.

By incorporating deviation into Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the model learns to produce responses that are high quality but more varied.

This approach ensures that AI-generated stories do not converge on a single predictable structure, but instead explore a wider range of characters, settings, and themes, just as a human writer would.
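The deviation-weighting idea above can be sketched in a few lines of Python. This is an illustrative toy, not the paper’s actual implementation: the 2-D embeddings, the Euclidean distance metric, and the simple multiplicative loss weighting are all assumptions made for demonstration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def deviation_scores(embeddings):
    """Deviation of each response = mean distance to the other responses
    for the same prompt. Rare (atypical) responses score higher."""
    scores = []
    for i, e in enumerate(embeddings):
        others = [emb for j, emb in enumerate(embeddings) if j != i]
        scores.append(sum(euclidean(e, o) for o in others) / len(others))
    return scores

def weighted_pair_losses(pair_losses, deviations):
    """DDPO-style idea (sketch): scale each preference-pair loss by the
    chosen response's deviation, so a rare-but-preferred response
    contributes a larger gradient during training."""
    return [l * d for l, d in zip(pair_losses, deviations)]

# Three toy response embeddings for one prompt; the third is an outlier.
embs = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
devs = deviation_scores(embs)
print(devs)  # the outlier response gets the highest deviation score
```

The design choice to weight at training time (rather than resample at inference time) is what lets the model internalize diversity instead of merely simulating it.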

What Midjourney researchers did to achieve this

The study involved training LLMs on creative writing tasks using a dataset from the subreddit r/writingPrompts, a Reddit community where users post prompts and respond with short stories.

The researchers used two base models for their training:

  • Meta’s Llama-3.1-8B (an 8-billion-parameter model from the Llama 3 series).
  • Mistral-7B-v0.3 (a 7-billion-parameter model from Mistral AI).

They then put these models through the following processes:

  1. Supervised fine-tuning (SFT): The models were first fine-tuned using LoRA (Low-Rank Adaptation) to adjust parameters efficiently.
  2. Preference optimization:
    • DPO and ORPO were used as baselines: standard methods that focus on improving response quality based on user preference signals.
    • DDPO and DORPO were then applied, introducing deviation-based weighting to encourage more unique responses.
  3. Evaluation:
    • Automatic evaluation: Semantic and stylistic diversity was measured using embedding-based techniques.
    • Human evaluation: Judges assessed whether outputs were diverse and engaging compared to those of GPT-4o and Claude 3.5.

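The embedding-based automatic evaluation can be illustrated with a simple corpus-level diversity score: the average cosine distance over all pairs of response embeddings. This is a toy sketch; the vectors and the exact scoring function are assumptions for demonstration, not the paper’s actual metric.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def mean_pairwise_distance(embeddings):
    """Average cosine distance over all pairs of response embeddings:
    a simple diversity score (higher = more semantically varied)."""
    n = len(embeddings)
    dists = [cosine_distance(embeddings[i], embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

# Toy embeddings: one near-identical cluster vs. one spread-out set.
homogeneous = [[1.0, 0.0], [0.99, 0.01], [0.98, 0.02]]
diverse = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(mean_pairwise_distance(homogeneous), mean_pairwise_distance(diverse))
```

In practice the embeddings would come from a sentence-embedding model over full story outputs, but the comparison works the same way: a model whose responses cluster tightly scores low, and a DDPO-style model scores higher.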
Key training results:

  • DDPO significantly outperformed standard DPO in terms of output diversity while maintaining quality.
  • Llama-3.1-8B with DDPO achieved the best balance of quality and diversity, generating responses more varied than those of GPT-4o while maintaining coherence.
  • When dataset size was reduced, the DDPO models still maintained diversity, though they required a certain number of diverse training samples to be fully effective.

Enterprise implications: what does it mean for those using AI to generate creative responses, such as marketing copywriting, corporate storytelling, and film/TV/video game scripts?

For AI teams managing LLM deployments, enhancing output diversity while maintaining quality is a critical challenge. These findings are of great significance to organizations that rely on AI to generate content, such as:

  • Conversational AI and chatbots (ensuring varied and engaging responses).
  • Content marketing and storytelling tools (preventing repetitive AI-generated copy).
  • Game development and narrative design (creating varied dialogue and branching storylines).

For professionals responsible for fine-tuning and deploying models in an enterprise environment, this study provides:

  • A new approach to post-training LLMs that can enhance creativity without sacrificing quality.
  • A practical alternative to inference-time diversity tweaks (such as temperature adjustment), achieved by integrating diversity into the learning process itself.
  • The potential to develop more engaging AI applications, from AI-assisted writing tools to virtual assistants that can dynamically adapt their responses.

For those dealing with AI model orchestration and automation, this study highlights:

  • The importance of tuning models at the training stage, reducing the need for post-processing adjustments after deployment.
  • A way to introduce adaptive storytelling into AI-powered applications, ensuring variability while keeping content quality high.
  • A method for making LLM outputs more human-like, which is essential for applications requiring interactive storytelling, customer engagement, or dynamic content creation.

The future of AI-generated creative projects looks bright

The success of DDPO and DORPO shows that training LLMs with diversity-focused objectives can yield significant improvements in creative writing. Some ideas for applying this include:

  1. Integrating deviation-based learning into enterprise AI models to enhance response diversity in customer-facing applications.
  2. Exploring how these methods apply to other generative tasks, such as AI-powered poetry, screenwriting, or game storytelling.
  3. Developing hybrid training approaches that balance diversity with instruction-following capabilities for AI assistants.

For those interested in applying these techniques, the researchers plan to make their code publicly available in a GitHub repository.

Whether you are fine-tuning LLMs for business applications or optimizing large-scale AI orchestration, this study provides actionable insights into how models can be made more dynamic, engaging, and responsive on creative tasks.

By adopting these techniques, AI teams can move beyond rigid, formulaic outputs, building AI systems that are not only smart but genuinely imaginative.
