Building a Music Video with AI Tools: A Step-by-Step Process
- Ottermation
- Feb 14
- 3 min read
Creating a music video traditionally involves many moving parts: writing lyrics, composing music, designing visuals, and editing the final product. Today, AI tools can simplify and speed up this process, allowing creators to experiment and iterate quickly. I recently built a music video using a suite of AI tools, and I want to share my step-by-step approach. This guide will walk you through how I combined AI-powered writing, music generation, storyboarding, image creation, and video editing to bring a music video to life.
Writing Lyrics with Grok and ChatGPT
The first step was crafting the song’s lyrics. I started by using Grok and ChatGPT to generate initial drafts. Both tools have unique strengths: Grok offers creative prompts and poetic suggestions, while ChatGPT excels at refining language and maintaining flow.
- I began by feeding a theme and mood into Grok to get raw lyric ideas.
- I then imported those ideas into ChatGPT to polish the verses and choruses.
- I made manual tweaks so the lyrics matched the tone I wanted and felt natural.
This back-and-forth process helped me avoid writer’s block and explore different lyrical directions quickly. The key was to treat the AI outputs as drafts, not final versions, and to keep editing until the words felt right. It was also important to keep the song around 1 minute; ChatGPT seemed to be better at hitting this goal than Grok.
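For anyone who would rather script this refinement loop than paste drafts into the chat window, here is a rough sketch using the OpenAI Python SDK. The model name and prompt wording are my assumptions; I did this step in the chat interfaces directly.

```python
# Hypothetical sketch of the ChatGPT polish pass, scripted instead of
# done in the chat UI. Prompt wording and model name are assumptions.

def build_refine_prompt(draft_lyrics: str, target_seconds: int = 60) -> str:
    """Wrap a raw lyric draft (e.g. from Grok) in the polish instructions,
    including the length constraint that kept the song around one minute."""
    return (
        "Polish the verses and choruses below. Keep the tone natural, "
        f"and keep the full song singable in about {target_seconds} seconds.\n\n"
        f"{draft_lyrics}"
    )

def refine_lyrics(draft_lyrics: str) -> str:
    # Requires `pip install openai` and an OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any chat model would work
        messages=[{"role": "user", "content": build_refine_prompt(draft_lyrics)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(build_refine_prompt("Raw verse from Grok goes here..."))
```

The point of separating `build_refine_prompt` is that the length constraint stays consistent across every draft you run through the model.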
Creating Music with Suno
Once the lyrics were ready, I moved on to the music. I used Suno, an AI music generator, to compose the instrumental track.
- I input the mood, genre, and tempo preferences into Suno.
- I ran several generations to explore different beats and styles.
- I evaluated each version for how well it matched the lyrics and overall vibe.
- When a track didn’t fit, I adjusted the lyrics slightly to sync better with the rhythm.
This iterative process took time but was essential to find the right balance between music and words. Suno’s ability to quickly generate multiple options made it easy to experiment with tempo changes and instrumentation.
Storyboarding the Video with Animate and Photoshop
With the song finalized, I planned the visuals. I sketched rough storyboards using Animate and Photoshop.
- Animate helped me create basic motion frames, character positions, and lighting.
- I started cleanup and lighting in Photoshop but soon realized keeping everything in Animate was much faster.
- These rough boards captured the key scenes and transitions I wanted in the video.
After completing the sketches, I fed the storyboard descriptions, along with the boards themselves, into ChatGPT to generate cleaned-up versions of the drawings. I gave ChatGPT specific style instructions to maintain consistency in tone and style. When minor details were off, I corrected them manually in Photoshop to keep the visuals aligned with my vision. ChatGPT had a tough time keeping the fish’s mouth as a fish mouth and would often use the hamster’s nose and mouth instead; the same issue happened once I fed Grok Imagine the fixed images.


Generating Video Frames with Grok Imagine
Next, I used Grok Imagine to create the actual video animations based on the cleaned-up storyboard.
- I input prompts describing each scene, including colors, lighting, and mood.
- Simple scenes generated quickly; more complex shots required multiple attempts and prompt adjustments.
- I saved the best version of each clip for the edit.

With Grok Imagine, it was often better to keep the prompts simple and let it create; if I got too complex with the descriptions, things would go off the rails fast. This ended up being the least time-consuming part of the process, since I generated enough six-second clips that I could swap them in later if anything needed fixing.
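If you are generating many scenes, it can help to template the prompts so every scene stays short but shares the same style tag, which keeps characters consistent across clips. A hypothetical sketch (the style string and scene descriptions are invented for illustration):

```python
# Hypothetical helper for keeping Grok Imagine prompts short and consistent.
# The style suffix and example scenes are made up for illustration.

STYLE = "flat 2D cartoon, warm lighting, consistent character design"

def scene_prompt(action: str, mood: str) -> str:
    """One simple sentence per scene plus a shared style tag; long,
    over-detailed prompts tended to go off the rails."""
    return f"{action}, {mood} mood, {STYLE}"

storyboard = [
    ("hamster strums a guitar on a lily pad", "playful"),
    ("fish pops up singing the chorus", "joyful"),
]

prompts = [scene_prompt(action, mood) for action, mood in storyboard]
for p in prompts:
    print(p)
```

Editing only the `STYLE` constant when the look drifts is far easier than hand-revising dozens of individual prompts.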
Final Editing in Adobe Premiere
After gathering all the video clips and the music track, I imported everything into Adobe Premiere for final editing.
- I synced the video clips with the music track.
- I added transitions and effects to smooth scene changes.
- I adjusted timing to match the song’s tempo and mood.
- I included text overlays generated in Premiere.
Premiere gave me full control to polish the video and fix any last-minute issues. The final export was a music video that combined AI-generated lyrics, music, visuals, and editing into a cohesive piece.
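The timing adjustments mostly come down to simple beat math: one beat lasts 60/BPM seconds, so a clip can be trimmed to a whole number of beats to land the cuts on the rhythm. A small sketch (the BPM values are examples, not my track’s actual tempo):

```python
# Beat math behind cutting clips to the song's tempo.
# BPM values here are examples for illustration.

def beat_length(bpm: float) -> float:
    """Duration of one beat in seconds: 60 / BPM."""
    return 60.0 / bpm

def trim_to_beats(clip_seconds: float, bpm: float) -> float:
    """Trim a clip down to the largest whole number of beats that fits."""
    beat = beat_length(bpm)
    return (clip_seconds // beat) * beat

# A 6-second clip at 120 BPM is exactly 12 beats, so nothing is trimmed.
print(trim_to_beats(6.0, 120))  # 6.0
# A 5.9-second clip at 120 BPM trims back to 11 beats.
print(trim_to_beats(5.9, 120))  # 5.5
```

Six-second clips worked well with the roughly one-minute song because they divide evenly into common tempos.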
Tips for Using AI Tools in Music Video Creation
- Treat AI outputs as starting points, not finished products.
- Be ready to iterate multiple times, especially with music and visuals.
- Use manual editing tools like Photoshop and Premiere to refine AI results.
- Keep your prompts clear and consistent to get better AI outputs. On a few occasions I had ChatGPT create cleaner prompts for images once I started hitting limits on generations.
- Experiment with different AI tools to find the best fit for each stage.