Intro to Generative AI – Notes

Tools and Applications

// Just quick notes from a course in Coursera plus additional findings through ChatGPT and the web search.

ChatGPT

  • Provide a context first. Write a scenario.
  • Provide as much detail as you can for better results. 
  • Then make a request of what you’d like the AI to do for you.

Ex. Context in quotes. Then followed by a request.

“You are a job applicant who just finished interviewing with the VP of Tech for Crazy Hair Glue, Inc.  You feel like the interview went well and got some positive vibes from it.  You’re interviewing for Director of Product position.  Now about an hour passed after the interview and you want to send a thank you email to the interviewer.”  Write the email in this scenario.

Google Gemini

  • Summarize an article, story, etc. from a web page.
  • Provide a URL or the whole text of what you’d like summarized.

Ex. Summarize this article – https://www.wired.com/story/3d-is-back/ – and provide top 3 takeaways.

Image Generation

  • Text-to-Image generation
  • Image-to-image translation
    • Transforming
    • Converting sketch to image
  • Style transfer and fusion
    • Converting painting to photo
  • Inpainting
    • Filling in the missing part
    • Remove an unwanted object
    • DALL-E and Stable Diffusion models provide inpainting capabilities.
  • Outpainting
    • Extend an image
    • Generate larger image

Image generation models

  • DALL-E by OpenAI
  • Stable Diffusion – Open-source model
  • StyleGAN – NVIDIA’s StyleGAN separates the modeling of ‘image content’ and ‘image style,’ enabling precise control over style for manipulating specific features like pose or facial expression.

// Created the content below through ChatGPT.

Image Tools

AppFeaturesBest Used ForCons
Craiyon– AI image generation (text-to-image)- Free and simple UIFun, quick image ideas & experimentationLow resolution, basic quality, watermarking
Freepik– Massive asset library (vectors, photos, icons)- AI image gen & templatesDesigners needing ready-made graphicsMany assets require paid plan, attribution needed for free content
Picsart– Photo editing, collages- AI effects, background remover- Mobile appSocial media graphics, creative photo editingAds in free version, busy interface
Fotor– Photo editing- AI image generator- Filters & design templatesBasic editing & marketing content creationLimited in free version, less advanced than pro editors
DeepArt.io– AI art using neural style transfer- Turns photos into artworkTurning photos into paintings/artistic stylesSlow processing, paid for HD output, limited customization
MidJourney– High-quality AI art generation- Discord-based interactionProfessional-level concept art, illustrationsRequires Discord, no free use now, steep learning curve
MS Copilot– AI assistant built into MS Office apps- Text/image generation, code helpBoosting productivity, AI-enhanced workflowsLimited to MS ecosystem, requires subscription
Adobe Firefly– AI text-to-image & effects- Style customization- Adobe Creative Cloud-linkedHigh-quality branded content & professional designStill in beta for some features, Adobe account required

Audio & Video Tools

AppFeaturesBest Used ForCons
LOVO– AI voiceover generator- Realistic voices- Custom avatars & video supportPodcasts, marketing videos, e-learning narrationLimited customization of voice tone/emotion in free tier
Synthesia– AI avatar videos- Multilingual support- Script-based video creationCorporate training, explainer videos, internal commsRobotic delivery at times, expensive plans
Murf.ai– High-quality voice synthesis- Voice cloning- Slide sync with voiceVoiceovers for ads, training, audiobooksSteep pricing for full features
Listnr– Text-to-speech platform- 900+ voices- Podcast hosting integrationAudio articles, podcast productionFewer editing tools than competitors
Meta AudioCraft– AI-generated music & sound- Research-level tool- Text-to-audio capabilitiesExperimenting with sound design and audio researchNot yet public, limited real-world use cases
Amper Music– AI music composition- Customizable tracks- Royalty-free outputBackground music for content creatorsLimited style options, platform acquisition reduced updates
Magenta– Google’s open-source AI for music/art- Tools for music generation & explorationDevelopers, experimental artists, creative codingTechnical setup required, not user-friendly for non-coders
Descript– Audio/video editing- Overdub AI voice- Podcast & screen recording toolsEditing podcasts, video production, transcriptionResource-heavy on lower-end computers, premium for AI features
Audo AI– One-click audio cleanup- Noise removal and sound levelingCleaning voice recordings, podcast cleanupLimited beyond cleanup, not for deep editing

TL;DR:

  • 🎙️ LOVO: Easy and realistic AI voiceovers — limited free customization.
  • 📹 Synthesia: AI video avatars — great for business, pricey.
  • 🎧 Murf.ai: Premium-quality voiceovers — high cost.
  • 🗣️ Listnr: Fast voice-to-podcast pipeline — fewer edit tools.
  • 🎵 Meta AudioCraft: Cutting-edge AI sound — not yet consumer-ready.
  • 🎼 Amper Music: Quick AI music — limited styles.
  • 🎨 Magenta: Creative AI lab — great for coders, not plug-and-play.
  • ✂️ Descript: Podcasting powerhouse — needs good hardware.
  • 🔊 Audo AI: One-click audio cleanup — not a full editor.

Summary:

  • Best for Voiceover & Video: Synthesia, LOVO, Murf.ai
  • 🎧 Best for Audio Cleanup & Editing: Descript, Audo AI
  • 🎵 Best for Music Generation: Meta AudioCraft, Amper, Magenta
  • 🎙️ Best for Podcasting: Listnr, Descript

Tools for Code Generation

App / ToolFeaturesBest Used ForCons
ChatGPT (by OpenAI)– Multi-language support- Explains code- Can debug- API & plugin integrationGeneral coding help, prototyping, learningMay hallucinate code, needs review by developer
Gemini (by Google)– Deep integration with Google tools- Real-time suggestions- Supports multiple languagesAssisting in Google Cloud development, explanationsLess mature ecosystem than others; limited IDE integration
GitHub Copilot– Real-time coding in IDEs (VS Code, JetBrains)- Trained on GitHub repos- AutocompleteDaily development, autocomplete, boilerplate codeMay insert insecure/outdated code; requires manual checks
PolyCoder– Open-source model- Focus on C language- LightweightC programming, academic useLimited language support; less powerful than commercial models
Watson Code Assistant– Enterprise-grade AI assistant- Natural language to code- IBM Cloud integrationEnterprise teams, legacy modernizationEnterprise-focused; limited to IBM environments
AlphaCode (by DeepMind)– Solves competitive programming problems- Deep learning for code generationCompetitive coding, research useNot publicly available as a tool yet
CodeWhisperer (by AWS)– Integrates with AWS tools- Real-time suggestions- Supports multiple languagesCloud-native development, AWS ecosystemBiased toward AWS use cases; less flexible outside AWS
Quick Summary:
  • 🧠 Best for Learning & Versatility: ChatGPT, Gemini
  • 👨‍💻 Best for Daily Coding in IDEs: GitHub Copilot, CodeWhisperer
  • 🏢 Best for Enterprise Teams: Watson Code Assistant
  • 🧪 Best for Research & Challenges: AlphaCode
  • 🧑‍🏫 Best for Open-Source Experimentation: PolyCoder

Applications

🧑‍🏫 Education
  • Automated Grading: AI grades assignments and quizzes using rubrics.
  • Personalized Feedback: Provides explanations when students answer incorrectly.
  • AI Teaching Assistant (“Ty”):
    • Helps with coding errors and lab issues.
    • Provides bug fixes, hints, and code suggestions.
  • Scalability: Supports large-scale course delivery by reducing the workload for human instructors.

💰 Finance
  • Fraud Detection: Identifies suspicious transactions to prevent financial crimes.
  • Market Analysis: Helps traders analyze large volumes of market data to make better decisions.
  • Customer Support: Powers chatbots for financial institutions to assist users with questions and transactions.
  • Document Processing (JP Morgan): Summarizes and understands legal documents quickly using generative AI.
  • Market Prediction (Goldman Sachs): Predicts financial trends to give traders an edge.

🏥 Healthcare / Medical
  • Medical Image Generation:
    • Creates synthetic images to train machine learning models.
    • Enhances image resolution and detects anomalies.
  • Drug Discovery:
    • Generates molecular structures to speed up research.
    • Identifies new drug candidates (e.g., by in silico medicine).
  • Personalized Medicine:
    • Creates custom amino acid, protein, and genome patterns for individualized treatment.
  • Diagnostics:
    • Improves breast cancer detection using GAN-generated synthetic data.
  • Training:
    • NVIDIA + King’s College: AI-generated synthetic brain MRIs for training radiologists without violating privacy.

🖥️ IT and DevOps
  • (Briefly mentioned) Used for enhancing systems automation and possibly assisting in software development and debugging, especially through AI tools like Ty.

🧬 Other Mentioned Industries (with limited detail)
  • HR: Potential use in automating resume screening and personalized job matching.
  • Marketing: Generative content creation (e.g., ads, social media copy).
  • Entertainment: Creating music, art, and storylines through AI tools.

AI Glossary Table

TermDefinitionExamples
Data AugmentationTechnique to increase dataset size by altering existing data.1. Flipping or rotating images in image classification.2. Adding background noise to speech data.3. Synonym replacement in text datasets.
Deep LearningA type of machine learning using multi-layered neural networks.1. Powering voice assistants like Alexa.2. Detecting cancer in X-rays.3. Translating languages automatically.
Diffusion ModelAI model that generates data by simulating a gradual transformation process.1. Stable Diffusion generating realistic images.2. AudioCraft creating music from text prompts.3. Text-to-image models improving blurry pictures over steps.
Discriminative AIAI that learns to classify input into categories.1. Spam vs. non-spam email detection.2. Fraud detection in banking.3. Diagnosing diseases from symptoms.
Discriminative AI ModelsModels that focus on predicting categories or labels from input data.1. Logistic regression classifying emails.2. BERT identifying sentiment.3. Random forest detecting churn risk.
Foundation ModelsLarge pre-trained models used as a base for many downstream tasks.1. GPT-4 powering AI writing tools.2. CLIP matching text with images.3. PaLM used for question answering.
Generative Adversarial Network (GAN)Two-part AI model where a generator creates data and a discriminator evaluates it.1. Creating fake but realistic celebrity faces.2. Designing new clothes digitally.3. Converting sketches into artwork.
Generative AIAI that creates original content like text, images, or music.1. ChatGPT writing emails or essays.2. DALL·E creating images from captions.3. AIVA composing instrumental music.
Generative AI ModelsAI models that learn data patterns to generate new, similar content.1. GPT writing human-like text.2. MidJourney producing fantasy art.3. Runway ML making video clips.
Generative Pre-trained Transformer (GPT)A type of transformer model trained on large text datasets for generation.1. ChatGPT answering questions.2. GPT-3 writing scripts.3. GPT generating code snippets.
Large Language Models (LLMs)Massive neural networks trained on huge text corpora to understand and generate human language.1. GPT-4 used in ChatGPT.2. Claude used in customer service bots.3. LLaMA used in academic research.
Machine Learning (ML)AI systems that learn from data to make decisions or predictions.1. Netflix recommending shows.2. Gmail sorting spam.3. Predicting loan defaults.
Natural Language Processing (NLP)AI focused on understanding and generating human language.1. Google Translate converting languages.2. Grammarly fixing grammar.3. Chatbots responding to users.
Neural NetworksAlgorithms inspired by the brain’s structure to recognize patterns in data.1. Image recognition in security cameras.2. Forecasting weather patterns.3. Generating handwriting.
PromptThe input or command given to an AI model to guide its output.1. “Write a poem about the ocean” to ChatGPT.2. “A fox in a space suit” to DALL·E.3. “Make a jazz song” to a music AI.
Training DataThe information (text, images, audio, etc.) used to train an AI model.1. Wikipedia pages used for language models.2. Photos labeled with objects for image classifiers.3. Voice recordings for speech models.
TransformersA model architecture using attention mechanisms to handle sequential input like text.1. BERT for understanding text context.2. GPT for text generation.3. T5 for question answering.
Variational Autoencoder (VAE)A model that compresses input into a lower-dimensional form and reconstructs it to generate similar data.1. Generating handwritten digits similar to MNIST.2. Face morphing tools.3. Style transfer in images.