ML_news_private

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Others
File size: 0KB
Downloads: 0
Upload date: 2024-02-08 09:28:59
Uploader: sh-1993

Description: ML news (private)

# ML_news_private

This is just a placeholder; the organized and correct repository is [here](https://github.com/SalvatoreRa/ML-news-of-the-week).

# scheme

# ML news:

## Research

|Link|description|
|---|---|
|[.]() | |
|[.]() | |
|[.]() | |

## News

|Link|description|
|---|---|
|[.]() | |
|[.]() | |
|[.]() | |

## Resources

|Link|description|
|---|---|
|[.]() | |
|[.]() | |
|[.]() | |

## Perspectives

|Link|description|
|---|---|
|[.]() | |
|[.]() | |
|[.]() | |

# ON WORKING

# ML news:

## Research

|Link|description|
|---|---|
|[Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection.](https://arxiv.org/abs/2309.12247) |For fake news detection, a fine-tuned BERT model performs better than an off-the-shelf LLM such as GPT-3.5-turbo. |
|[PAP-REC: Personalized Automatic Prompt for Recommendation Language Model.](https://arxiv.org/abs/2402.00284v1) |PAP-REC introduces a technique that automatically generates personalized prompts to improve the efficacy and efficiency of Recommendation Language Models. |
|[PAM: Prompting Audio-Language Models for Audio Quality Assessment.](https://arxiv.org/abs/2402.00282v1) |PAM uses Audio-Language Models to evaluate audio quality without reference tracks or task-specific training. |
|[AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning.](https://animatelcm.github.io/) |AnimateLCM splits the learning process into two decoupled parts in order to rapidly produce high-quality videos and enhance existing video diffusion models. |
|[Boximator: Generating Rich and Controllable Motions for Video Synthesis.](https://arxiv.org/abs/2402.01566) |Controlling video synthesis is a well-known challenge. This paper proposes guiding generation with boxes and arrows over time, which improves human preference judgments but still leaves the user with imperfect control. |
|[KTO: Model Alignment as Prospect Theoretic Optimization.](https://arxiv.org/abs/2402.01306v1) |Kahneman-Tversky Optimization (KTO) is a new method for aligning AI models with human feedback. Drawing on the prospect theory of Kahneman & Tversky, KTO directly maximizes the utility of generations rather than the likelihood of preferences. |
|[A simple method to reduce hallucination in Large Vision-Language Models.](https://arxiv.org/abs/2402.01345v1) |This study examines the causes of multimodal hallucination, in which large vision-language models (LVLMs) occasionally describe images incorrectly. One important factor is semantic shift bias, especially at paragraph breaks. |
|[CapHuman: Capture Your Moments in Parallel Universes.](https://caphuman.github.io/) |Given only one reference facial photograph, our CapHuman can generate photo-realistic specific individual portraits with content-rich representations and diverse head positions, poses, facial expressions, and illuminations in different contexts. |
|[Nomic Embed: Training a Reproducible Long Context Text Embedder.](https://arxiv.org/abs/2402.01613v1) |Nomic-Embed-Text-V1 is an open-source, fully reproducible text embedding model that performs well on both short-context and long-context tasks. The release is unusually transparent: it provides full access to the model weights, the training code, and a dataset of 235 million text pairs. |
|[SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?](https://arxiv.org/abs/2402.01832) |Training large-scale image models is difficult because of legitimate copyright concerns and the disappearance of large-scale datasets like LAION. This work demonstrates that 30 million synthetic images can be used to train a strong CLIP model. |
|[Rethinking Optimization and Architecture for Tiny Language Models.](https://arxiv.org/abs/2402.02791v2) |This work investigates how to build strong language models with fewer parameters, better suited to mobile devices. |
|[Unified Hallucination Detection for Multimodal Large Language Models.](https://arxiv.org/abs/2402.03190v1) |To address the problem of hallucinations in Multimodal Large Language Models (MLLMs), researchers have created a new benchmark, MHaluBench, for assessing hallucination detection techniques. |
|[InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.](https://invictus717.github.io/InteractiveVideo/) |InteractiveVideo lets users create videos through dynamic interaction. In contrast to conventional techniques, this intuitive framework supports real-time adjustments using text, images, painting, and even drag-and-drop. |
|[DeepSeekMath.](https://github.com/deepseek-ai/deepseek-math) |DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. |

## News

|Link|description|
|---|---|
|[Sakana Awarded Japanese Government Supercomputing Grant.](https://sakana.ai/nedo-grant/) |Sakana AI is one of seven institutions in Japan chosen by the Japanese government to receive a supercomputing grant, intended to encourage the development of foundation AI models and strengthen Japan’s generative AI ecosystem. |
|[Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs.](https://venturebeat.com/ai/hugging-face-launches-open-source-ai-assistant-maker-to-rival-openais-custom-gpts/) |Hugging Face, the New York City-based startup that offers a popular, developer-focused repository for open source AI code and frameworks (and hosted last year’s “Woodstock of AI”), today announced the launch of third-party, customizable Hugging Chat Assistants. |
|[Arc is building an AI agent that browses on your behalf.](https://techcrunch.com/2024/02/01/arc-is-building-an-ai-agent-that-browses-on-your-behalf/) |The Browser Company, which makes the Arc Browser, is building an AI that surfs the web for you and returns the results while bypassing search engines. |
|[Introducing Qwen1.5.](https://qwenlm.github.io/blog/qwen1.5/) |An outstanding collection of multilingual models ranging from 0.5B to 72B parameters. Notably, the smallest model is the first significant sub-1B-parameter language model. |
|[Inside OpenAI’s Plan to Make AI More ‘Democratic’.](https://time.com/6684266/openai-democracy-artificial-intelligence/) |Colin Megill met with Wojciech Zaremba, co-founder of OpenAI, in May 2023 to talk about integrating Polis, an AI-powered public deliberation platform that promotes democratic participation. The collaboration sought to use public feedback to align AI with human values. It led to OpenAI's "Democratic Inputs to AI" project, which investigates AI governance through a $1 million grant program. |
|[Roblox releases real-time AI chat translator.](https://www.theverge.com/2024/2/5/24061495/roblox-generative-ai-chat-translator) |Roblox built an AI model that it says translates text chats so quickly users may not even notice it’s translating the messages of other players at first. It works with 16 languages, including English, French, Japanese, Thai, Polish, and Vietnamese. |
|[OpenAI is adding new watermarks to DALL-E 3.](https://www.theverge.com/2024/2/6/24063954/ai-watermarks-dalle3-openai-content-credentials) |OpenAI says watermarks in image metadata are not perfect, but they help build trust in digital information. |
|[Microsoft Copilot for Sales and Copilot for Service are now generally available.](https://cloudblogs.microsoft.com/dynamics365/bdm/2024/02/01/microsoft-copilot-for-sales-and-copilot-for-service-are-now-generally-available/) |Microsoft's AI-powered Copilot for Sales and Copilot for Service are now generally available. They increase the efficiency of sales and support staff by integrating with CRM platforms like Salesforce. The products promise to improve customer interactions and streamline operations by automating repetitive tasks and providing insights directly within Microsoft 365 apps. Early users, such as Avanade, report considerable time savings and improved client engagement. |
|[First passages of rolled-up Herculaneum scroll revealed.](https://www.nature.com/articles/d41586-024-00346-8) |Researchers used artificial intelligence to decipher the text of 2,000-year-old charred papyrus scripts, unveiling musings on music and capers. |
|[IBM wants to build a 100,000-qubit quantum computer.](https://www.technologyreview.com/2023/05/25/1073606/ibm-wants-to-build-a-100000-qubit-quantum-computer/) |The company wants to make large-scale quantum computers a reality within just 10 years. |

## Resources

|Link|description|
|---|---|
|[aphrodite-engine.](https://github.com/PygmalionAI/aphrodite-engine) |For AI inference workloads, the Aphrodite engine can increase throughput while lowering VRAM requirements. |
|[chatllm-vscode.](https://marketplace.visualstudio.com/items?itemName=locuslab.chatllm-vscode) |ChatLLM is a VSCode extension for interacting with LLM APIs in a flexible and long-form manner. It leverages VSCode's notebook support to do so, creating a new type of notebook file (.chatllm) in which you can interact with an (API-based) LLM system over a long document. |
|[diffusers v0.26.0.](https://github.com/huggingface/diffusers/releases/tag/v0.26.0) |This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for multiple IP-Adapters’ inference with multiple reference images, and more. |
|[Ollama vision models.](https://ollama.ai/blog/vision-models) |Ollama recently introduced support for vision models, including LLaVA 1.6. Its Python and JavaScript packages also offer enhanced vision support. |
|[Image to Music v2.](https://huggingface.co/posts/fffiloni/484223631728087) |A visually appealing pipeline that chains image-to-text, text-to-prompt, and prompt-to-music steps. |
|[3DTopia.](https://github.com/3DTopia/3DTopia) |A two-stage text-to-3D generation model. The first stage uses a diffusion model to quickly generate candidates. The second stage refines the assets chosen from the first stage. |
|[Open Source Alternative to Rabbit.](https://github.com/KillianLucas/01) |A team is developing an open-source version of the Rabbit hardware, complete with language modeling. |
|[NaturalSQL by ChatDB.](https://github.com/cfahlgren1/natural-sql) |NaturalSQL by ChatDB is a series of models with state-of-the-art performance on Text to SQL instructions. |
|[contextual_bandits_tutorial.](https://github.com/facebookresearch/Pearl/blob/main/pearl/tutorials/contextual_bandits/contextual_bandits_tutorial.ipynb) |Meta maintains the RL framework Pearl. This tutorial uses the library to walk through a bandit-based learning problem. |
|[BRIA Background Removal v1.4 Model Card.](https://huggingface.co/briaai/RMBG-1.4) |RMBG v1.4 is our state-of-the-art background removal model, designed to effectively separate foreground from background in a range of categories and image types. This model has been trained on a carefully selected dataset, which includes general stock images, e-commerce, gaming, and advertising content, making it suitable for commercial use cases powering enterprise content creation at scale. |
|[MetaVoice-1B.](https://huggingface.co/metavoiceio/metavoice-1B-v0.1) |A small and powerful text-to-speech model that supports generation and voice cloning. |
|[Latxa.](https://huggingface.co/collections/HiTZ/latxa-65a697e6838b3acc53677304) |Latxa is a collection of foundation models specifically tuned for Basque. |
|[fabric.](https://github.com/danielmiessler/fabric) |An open-source framework for augmenting humans using AI. |

## Perspectives

|Link|description|
|---|---|
|[MIT Paper: AI’s Labor Market Impacts Are Slower Than Expected.](https://aisupremacy.substack.com/p/mit-paper-ais-labor-market-impacts) |The working paper "Beyond AI Exposure: Which Tasks are Cost-Effective to Automate with Computer Vision?", authored by researchers from IBM and MIT, examines the economic feasibility of automating vision-based tasks. It finds that just 23% of them are profitable to automate. In contrast with more disruptive expectations, the report projects a gradual impact on the job market over several years. |
|[How AI Is Helping Us Learn About Birds.](https://themarkup.org/hello-world/2024/02/03/how-ai-is-helping-us-learn-about-birds) |Machine learning is powering new insights into how birds migrate, and forecasts about where they’ll go next. |
|[The Techno-Industrial Revolution.](https://www.notboring.co/p/the-techno-industrial-revolution) |The increasing sophistication of AI tooling and corporate use cases will lead to a growing number of practical uses of the technology. The potential can be viewed through the lens of how AI will significantly increase margins by lowering costs and improving process efficiency. This could open the door to entirely new approaches that were not previously viable because of extremely narrow profit margins. The article examines a couple of such examples. |

# ML news: Week 29 January - 4 February

## Research

|Link|description|
|---|---|
|[Matryoshka Representation Learning.](https://arxiv.org/abs/2205.13147) |The new embeddings from OpenAI can be scaled to fit your needs. This is thought to come from the learning strategy known as the nesting doll (Matryoshka) approach, which learns features at different granularities. |
|[Vivim: a Video Vision Mamba for Medical Video Object Segmentation.](https://arxiv.org/abs/2401.14168v1) |A new framework called Vivim efficiently processes long video sequences for medical video object segmentation. Compared with conventional techniques, Vivim provides faster and more accurate segmentation by compressing spatiotemporal data with a state space model. |
|[Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities.](https://ailab-cvc.github.io/M2PT/) |This study presents a way to improve transformers using irrelevant data from other modalities, e.g., audio data to improve an image model. By connecting the transformers of two distinct modalities, Multimodal Pathway lets a target modality benefit from the strengths of another. |
|[pix2gestalt: Amodal Segmentation by Synthesizing Wholes.](https://gestalt.cs.columbia.edu/) |Pix2Gestalt is a framework for zero-shot amodal segmentation. When an object is partially occluded, it can reconstruct the object's full shape and appearance. Pix2Gestalt, which makes use of large-scale diffusion models, performs exceptionally well in difficult cases, such as artistic images that break convention. |
|[Large-Vocabulary 3D Diffusion Model with Transformer.](https://ziangcao0312.github.io/difftf_pages/) |Generating a wide variety of 3D objects is a significant challenge. This study scales the system to a considerably larger range of objects in each 3D category and uses a modified architecture to improve sampling efficiency. |
|[SliceGPT: Compress Large Language Models by Deleting Rows and Columns.](https://arxiv.org/abs/2401.15024) |Another model-compression work. Importantly, this one works on models as small as Phi-2. According to the paper, deleting rows and columns of weight matrices can remove up to 25% of model parameters while retaining 90% or more of zero-shot task performance. |
|[Learning Universal Predictors.](https://arxiv.org/abs/2401.14953) |Meta-learning is the process of teaching systems to learn from experience and quickly adapt to new tasks. This Google project improves meta-learning with artificial data generated by a Universal Turing Machine and analyzes the results both theoretically and experimentally. |
|[CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion.](https://arxiv.org/abs/2401.14066v1) |CreativeSynth is an artistic image editing technique that seamlessly combines text and image inputs. Its diffusion approach, with specialized attention mechanisms built in, allows fine-grained editing of both style and content while preserving the essential elements of the original artwork. |
|[Annotated Hands for Generative Models.](https://arxiv.org/abs/2401.15075v1) |By adding three extra channels carrying hand annotations to training images, researchers have improved the ability of generative models, such as GANs and diffusion models, to produce realistic hand images. |
|[Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling.](https://arxiv.org/abs/2401.16380) |Many AI systems use "up captioning" to improve labels during training. This work from Apple applies the idea to pre-training by rephrasing C4 as instructions, Q&A pairs, and more. According to the paper, rephrasing speeds up pre-training by roughly 3x, making the model significantly more sample-efficient, albeit at the cost of the rephrasing step itself. |
|[Continual Learning with Pre-Trained Models: A Survey.](https://arxiv.org/abs/2401.16386v1) |This work provides an extensive overview of recent developments in continual learning, which centers on continually adapting to new information while preserving prior knowledge. |
|[MacGNN.](https://github.com/yuanchenbei/macgnn) |This research introduces the MAcro Recommendation Graph (MAG) and Macro Graph Neural Networks (MacGNN). These methods greatly reduce the number of nodes by grouping similar behavior patterns into macro nodes, which addresses the computational cost of Graph Neural Networks. |
|[Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates.](https://www.science.org/doi/10.1126/science.adi3794) |Our framework can support permitting, policy design, and use of machine learning in regulatory implementation problems. |
|[Weaver: Foundation Models for Creative Writing.](https://arxiv.org/abs/2401.17268) |Weaver is a family of models trained specifically for creative writing. On a storytelling benchmark, the largest model (34B params) performs better than GPT-4. |
|[Text Image Inpainting via Global Structure-Guided Diffusion Models.](https://arxiv.org/abs/2401.14832v1) |This study introduces two datasets, for handwritten words and for scene text, along with a benchmark. Using original, damaged, and auxiliary images, the new Global Structure-guided Diffusion Model (GSDM) recovers clean text by exploiting text structure. Both image quality and recognition accuracy show notable gains. |
|[Multi-granularity Correspondence Learning from Long-term Noisy Videos.](https://lin-yijie.github.io/projects/Norton/) |Norton addresses the multi-granularity noisy correspondence problem in video-language studies, offering a n ... ...
