Tools and Applications
// Quick notes from a Coursera course, plus additional findings from ChatGPT and web searches.
ChatGPT
- Provide a context first. Write a scenario.
- Provide as much detail as you can for better results.
- Then state what you’d like the AI to do for you.
Ex. Context in quotes, followed by a request.
“You are a job applicant who just finished interviewing with the VP of Tech for Crazy Hair Glue, Inc. You feel the interview went well and you got some positive vibes from it. You’re interviewing for the Director of Product position. About an hour has passed since the interview, and you want to send a thank-you email to the interviewer.” Write the email in this scenario.
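The context-then-request pattern can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical name, not part of any ChatGPT API, and the resulting string would simply be pasted into the chat box or passed to whatever client you use.

```python
def build_prompt(context: str, request: str) -> str:
    """Combine a scenario (context) with the task (request), context first."""
    context = context.strip()
    request = request.strip()
    if not context or not request:
        raise ValueError("both context and request are required")
    # Quote the context so the scenario is visibly separate from the instruction.
    return f'"{context}"\n\n{request}'


context = (
    "You are a job applicant who just finished interviewing with the VP of Tech "
    "for Crazy Hair Glue, Inc. You are interviewing for the Director of Product "
    "position, and the interview went well."
)
prompt = build_prompt(context, "Write the thank-you email in this scenario.")
print(prompt)
```

Keeping the context and the request as separate arguments makes it easy to reuse one scenario with several different requests.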
Google Gemini
- Summarize an article, story, etc. from a web page.
- Provide a URL or the whole text of what you’d like summarized.
Ex. Summarize this article – https://www.wired.com/story/3d-is-back/ – and provide top 3 takeaways.
Image Generation
- Text-to-Image generation
- Image-to-image translation
  - Transforming
  - Converting sketch to image
- Style transfer and fusion
  - Converting painting to photo
- Inpainting
  - Filling in the missing part
  - Remove an unwanted object
  - DALL-E and Stable Diffusion models provide inpainting capabilities.
- Outpainting
  - Extend an image
  - Generate larger image
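Inpainting and outpainting both come down to telling the model which pixels to keep and which to generate. The sketch below builds such masks in plain Python. The function names and the convention that 1 means "generate" and 0 means "keep" are assumptions made for this example; real tools use their own mask formats.

```python
from typing import List

Grid = List[List[int]]


def inpaint_mask(height: int, width: int, top: int, left: int,
                 box_h: int, box_w: int) -> Grid:
    """Return a mask with 1 inside the box to regenerate and 0 elsewhere."""
    return [
        [1 if top <= r < top + box_h and left <= c < left + box_w else 0
         for c in range(width)]
        for r in range(height)
    ]


def outpaint_mask(height: int, width: int, pad: int) -> Grid:
    """Return a mask for a canvas enlarged by `pad` pixels on every side.

    The original image sits in the centre (0 = keep); the new border is 1.
    """
    new_h, new_w = height + 2 * pad, width + 2 * pad
    return [
        [0 if pad <= r < pad + height and pad <= c < pad + width else 1
         for c in range(new_w)]
        for r in range(new_h)
    ]


# Inpainting: regenerate a 2x2 patch of a 4x4 image (e.g. an unwanted object).
for row in inpaint_mask(4, 4, top=1, left=1, box_h=2, box_w=2):
    print(row)

# Outpainting: extend a 2x2 image by one pixel on each side.
for row in outpaint_mask(2, 2, pad=1):
    print(row)
```

In the first mask the marked region is inside the image; in the second it surrounds the image, which is why outpainting produces a larger picture.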
Image generation models
- DALL-E by OpenAI
- Stable Diffusion – Open-source model
- StyleGAN – NVIDIA’s StyleGAN separates the modeling of ‘image content’ and ‘image style,’ enabling precise control over style for manipulating specific features like pose or facial expression.
// The content below was created with ChatGPT.
Image Tools
| App | Features | Best Used For | Cons |
|---|---|---|---|
| Craiyon | AI image generation (text-to-image); free and simple UI | Fun, quick image ideas & experimentation | Low resolution, basic quality, watermarking |
| Freepik | Massive asset library (vectors, photos, icons); AI image generation and templates | Designers needing ready-made graphics | Many assets require paid plan, attribution needed for free content |
| Picsart | Photo editing and collages; AI effects and background remover; mobile app | Social media graphics, creative photo editing | Ads in free version, busy interface |
| Fotor | Photo editing; AI image generator; filters and design templates | Basic editing & marketing content creation | Limited in free version, less advanced than pro editors |
| DeepArt.io | AI art using neural style transfer; turns photos into artwork | Turning photos into paintings/artistic styles | Slow processing, paid for HD output, limited customization |
| Midjourney | High-quality AI art generation; Discord-based interaction | Professional-level concept art, illustrations | Requires Discord, no free tier at the time of writing, steep learning curve |
| MS Copilot | AI assistant built into MS Office apps; text/image generation and code help | Boosting productivity, AI-enhanced workflows | Limited to MS ecosystem, requires subscription |
| Adobe Firefly | AI text-to-image and effects; style customization; linked to Adobe Creative Cloud | High-quality branded content & professional design | Still in beta for some features, Adobe account required |
Audio & Video Tools
| App | Features | Best Used For | Cons |
|---|---|---|---|
| LOVO | AI voiceover generator; realistic voices; custom avatars and video support | Podcasts, marketing videos, e-learning narration | Limited customization of voice tone/emotion in free tier |
| Synthesia | AI avatar videos; multilingual support; script-based video creation | Corporate training, explainer videos, internal comms | Robotic delivery at times, expensive plans |
| Murf.ai | High-quality voice synthesis; voice cloning; slide sync with voice | Voiceovers for ads, training, audiobooks | Steep pricing for full features |
| Listnr | Text-to-speech platform; 900+ voices; podcast hosting integration | Audio articles, podcast production | Fewer editing tools than competitors |
| Meta AudioCraft | AI-generated music and sound; research-oriented toolkit; text-to-audio capabilities | Experimenting with sound design and audio research | Released as open-source code rather than a consumer app, so technical setup is required; limited real-world use cases |
| Amper Music | AI music composition; customizable tracks; royalty-free output | Background music for content creators | Limited style options, platform acquisition reduced updates |
| Magenta | Google’s open-source AI for music/art; tools for music generation and exploration | Developers, experimental artists, creative coding | Technical setup required, not user-friendly for non-coders |
| Descript | Audio/video editing; Overdub AI voice; podcast and screen recording tools | Editing podcasts, video production, transcription | Resource-heavy on lower-end computers, premium for AI features |
| Audo AI | One-click audio cleanup; noise removal and sound leveling | Cleaning voice recordings, podcast cleanup | Limited beyond cleanup, not for deep editing |
TL;DR:
- 🎙️ LOVO: Easy and realistic AI voiceovers; limited free customization.
- 📹 Synthesia: AI video avatars; great for business, pricey.
- 🎧 Murf.ai: Premium-quality voiceovers; high cost.
- 🗣️ Listnr: Fast text-to-podcast pipeline; fewer edit tools.
- 🎵 Meta AudioCraft: Cutting-edge AI sound; not yet consumer-ready.
- 🎼 Amper Music: Quick AI music; limited styles.
- 🎨 Magenta: Creative AI lab; great for coders, not plug-and-play.
- ✂️ Descript: Podcasting powerhouse; needs good hardware.
- 🔊 Audo AI: One-click audio cleanup; not a full editor.
Summary:
- ✅ Best for Voiceover & Video: Synthesia, LOVO, Murf.ai
- 🎧 Best for Audio Cleanup & Editing: Descript, Audo AI
- 🎵 Best for Music Generation: Meta AudioCraft, Amper, Magenta
- 🎙️ Best for Podcasting: Listnr, Descript
Tools for Code Generation
| App / Tool | Features | Best Used For | Cons |
|---|---|---|---|
| ChatGPT (by OpenAI) | Multi-language support; explains code; can debug; API and plugin integration | General coding help, prototyping, learning | May hallucinate code, needs review by developer |
| Gemini (by Google) | Deep integration with Google tools; real-time suggestions; supports multiple languages | Assisting in Google Cloud development, explanations | Less mature ecosystem than others; limited IDE integration |
| GitHub Copilot | Real-time coding in IDEs (VS Code, JetBrains); trained on GitHub repos; autocomplete | Daily development, autocomplete, boilerplate code | May insert insecure/outdated code; requires manual checks |
| PolyCoder | Open-source model from Carnegie Mellon researchers; trained on several languages and strongest on C; small compared with commercial models | C programming, academic use | Narrower language coverage and less powerful than commercial models |
| Watson Code Assistant | Enterprise-grade AI assistant; natural language to code; IBM Cloud integration | Enterprise teams, legacy modernization | Enterprise-focused; limited to IBM environments |
| AlphaCode (by DeepMind) | Solves competitive programming problems; deep learning for code generation | Competitive coding, research use | Not publicly available as a tool yet |
| CodeWhisperer (by AWS) | Integrates with AWS tools; real-time suggestions; supports multiple languages | Cloud-native development, AWS ecosystem | Biased toward AWS use cases; less flexible outside AWS |
Quick Summary:
- 🧠 Best for Learning & Versatility: ChatGPT, Gemini
- 👨‍💻 Best for Daily Coding in IDEs: GitHub Copilot, CodeWhisperer
- 🏢 Best for Enterprise Teams: Watson Code Assistant
- 🧪 Best for Research & Challenges: AlphaCode
- 🧑‍🏫 Best for Open-Source Experimentation: PolyCoder
Applications
🧑‍🏫 Education
- Automated Grading: AI grades assignments and quizzes using rubrics.
- Personalized Feedback: Provides explanations when students answer incorrectly.
- AI Teaching Assistant (“Ty”):
  - Helps with coding errors and lab issues.
  - Provides bug fixes, hints, and code suggestions.
- Scalability: Supports large-scale course delivery by reducing the workload for human instructors.
💰 Finance
- Fraud Detection: Identifies suspicious transactions to prevent financial crimes.
- Market Analysis: Helps traders analyze large volumes of market data to make better decisions.
- Customer Support: Powers chatbots for financial institutions to assist users with questions and transactions.
- Document Processing (JP Morgan): Summarizes and understands legal documents quickly using generative AI.
- Market Prediction (Goldman Sachs): Predicts financial trends to give traders an edge.
🏥 Healthcare / Medical
- Medical Image Generation:
  - Creates synthetic images to train machine learning models.
  - Enhances image resolution and detects anomalies.
- Drug Discovery:
  - Generates molecular structures to speed up research.
  - Identifies new drug candidates (e.g., by Insilico Medicine).
- Personalized Medicine:
  - Generates amino acid, protein, and genome patterns tailored to individualized treatment.
- Diagnostics:
  - Improves breast cancer detection using GAN-generated synthetic data.
- Training:
  - NVIDIA + King’s College London: AI-generated synthetic brain MRIs for training radiologists without violating privacy.
🖥️ IT and DevOps
- Mentioned only briefly: used to enhance systems automation and possibly to assist with software development and debugging, especially through AI tools like Ty.
🧬 Other Mentioned Industries (with limited detail)
- HR: Potential use in automating resume screening and personalized job matching.
- Marketing: Generative content creation (e.g., ads, social media copy).
- Entertainment: Creating music, art, and storylines through AI tools.
AI Glossary Table
| Term | Definition | Examples |
|---|---|---|
| Data Augmentation | Technique to increase dataset size by altering existing data. | 1. Flipping or rotating images in image classification. 2. Adding background noise to speech data. 3. Synonym replacement in text datasets. |
| Deep Learning | A type of machine learning using multi-layered neural networks. | 1. Powering voice assistants like Alexa. 2. Detecting cancer in X-rays. 3. Translating languages automatically. |
| Diffusion Model | AI model that learns to reverse a gradual noising process and generates data by denoising random noise step by step. | 1. Stable Diffusion generating realistic images. 2. DALL·E 2 generating images from text captions. 3. Text-to-image models refining a noisy image into a sharp one over successive steps. |
| Discriminative AI | AI that learns to classify input into categories. | 1. Spam vs. non-spam email detection. 2. Fraud detection in banking. 3. Diagnosing diseases from symptoms. |
| Discriminative AI Models | Models that focus on predicting categories or labels from input data. | 1. Logistic regression classifying emails. 2. BERT identifying sentiment. 3. Random forest detecting churn risk. |
| Foundation Models | Large pre-trained models used as a base for many downstream tasks. | 1. GPT-4 powering AI writing tools. 2. CLIP matching text with images. 3. PaLM used for question answering. |
| Generative Adversarial Network (GAN) | Two-part AI model where a generator creates data and a discriminator evaluates it. | 1. Creating fake but realistic celebrity faces. 2. Designing new clothes digitally. 3. Converting sketches into artwork. |
| Generative AI | AI that creates original content like text, images, or music. | 1. ChatGPT writing emails or essays. 2. DALL·E creating images from captions. 3. AIVA composing instrumental music. |
| Generative AI Models | AI models that learn data patterns to generate new, similar content. | 1. GPT writing human-like text. 2. Midjourney producing fantasy art. 3. Runway ML making video clips. |
| Generative Pre-trained Transformer (GPT) | A type of transformer model trained on large text datasets for generation. | 1. ChatGPT answering questions. 2. GPT-3 writing scripts. 3. GPT generating code snippets. |
| Large Language Models (LLMs) | Massive neural networks trained on huge text corpora to understand and generate human language. | 1. GPT-4 used in ChatGPT. 2. Claude used in customer service bots. 3. LLaMA used in academic research. |
| Machine Learning (ML) | AI systems that learn from data to make decisions or predictions. | 1. Netflix recommending shows. 2. Gmail sorting spam. 3. Predicting loan defaults. |
| Natural Language Processing (NLP) | AI focused on understanding and generating human language. | 1. Google Translate converting languages. 2. Grammarly fixing grammar. 3. Chatbots responding to users. |
| Neural Networks | Algorithms inspired by the brain’s structure to recognize patterns in data. | 1. Image recognition in security cameras. 2. Forecasting weather patterns. 3. Generating handwriting. |
| Prompt | The input or command given to an AI model to guide its output. | 1. “Write a poem about the ocean” to ChatGPT. 2. “A fox in a space suit” to DALL·E. 3. “Make a jazz song” to a music AI. |
| Training Data | The information (text, images, audio, etc.) used to train an AI model. | 1. Wikipedia pages used for language models. 2. Photos labeled with objects for image classifiers. 3. Voice recordings for speech models. |
| Transformers | A model architecture using attention mechanisms to handle sequential input like text. | 1. BERT for understanding text context. 2. GPT for text generation. 3. T5 for question answering. |
| Variational Autoencoder (VAE) | A model that compresses input into a lower-dimensional form and reconstructs it to generate similar data. | 1. Generating handwritten digits similar to MNIST. 2. Face morphing tools. 3. Style transfer in images. |
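
The Data Augmentation entry in the glossary lists image flipping and synonym replacement as examples. A minimal sketch of both in plain Python follows; the function names and the tiny synonym table are made up for illustration, and real pipelines would normally rely on an image or NLP library instead.

```python
import random
from typing import Dict, List


def flip_horizontal(image: List[List[int]]) -> List[List[int]]:
    """Mirror an image (a grid of pixel values) left to right."""
    return [list(reversed(row)) for row in image]


def replace_synonyms(sentence: str, synonyms: Dict[str, List[str]],
                     rng: random.Random) -> str:
    """Swap each word that has an entry in `synonyms` for a random alternative."""
    words = sentence.split()
    return " ".join(rng.choice(synonyms[w]) if w in synonyms else w for w in words)


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(image))  # [[3, 2, 1], [6, 5, 4]]

# Toy synonym table, invented for this example.
synonyms = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}
rng = random.Random(0)  # fixed seed so runs are repeatable
print(replace_synonyms("the quick dog is happy", synonyms, rng))
```

Each call yields a new training example that carries the same label as the original, which is how augmentation enlarges a dataset without collecting new data.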
