Get Started with OpenAI Shap-E to Generate 3D Objects from Text & Images


OpenAI has just released Shap-E, a generative model for creating 3D assets based on text prompts and images.

It’s capable of generating both textured meshes [1] and neural radiance fields [2], allowing for a diverse range of 3D outputs.

In this tutorial, we’ll walk you through setting up Shap-E on Google Colab (free) [3] and running the code to generate 3D objects from a text prompt and from an image. Thanks to Google Colab, you don’t need a powerful GPU of your own because we’ll use the one provided by Google.

The code we’re running is taken from the openai/shap-e GitHub repository: https://github.com/openai/shap-e

Quick Demo

In this short demo we’re installing and running Shap-E on Google Colab.

Setting up Shap-E on Google Colab

Open Google Colab by visiting https://colab.research.google.com/.

Click on File > New notebook to create a new Colab notebook.

Enable GPU on Google Colab

Next, we need to enable a Graphics Processing Unit (GPU) for the notebook. A GPU is often required for resource-intensive tasks like deep learning.

To enable GPU in Google Colab, follow these steps:

  1. Open your new Colab notebook.
  2. Click on the “Runtime” menu in the top toolbar.
  3. Select “Change runtime type” from the dropdown menu.
  4. In the “Runtime type” dialog, choose “GPU” from the “Hardware accelerator” dropdown.
  5. Click “Save” to apply the changes.
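
To confirm that the runtime change took effect, you can run a quick check in a new cell. This is a minimal sketch: it assumes PyTorch is available in the runtime (Colab normally ships with it), and `describe_accelerator` is an illustrative helper name, not part of Shap-E.

```python
import importlib.util


def describe_accelerator() -> str:
    """Return a short message describing which device PyTorch can use."""
    # Avoid an ImportError if PyTorch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; code will run on the CPU"


print(describe_accelerator())
```

If the message reports no GPU, repeat the steps above and make sure you clicked “Save”.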

Install Shap-E

In Google Colab, we need to first clone the Shap-E repository from GitHub and then install the required packages. To do this, follow these steps:

Step 1. In the first cell of your Colab notebook, paste the following code:

!git clone https://github.com/openai/shap-e.git

This command clones the Shap-E repository from GitHub to your Colab environment. It downloads the code, examples, and required files for you to use Shap-E.

Run the cell by clicking on the play button or pressing Shift + Enter.

[Image: example of running the command.]

Step 2. In a new cell, paste the following code:

%cd shap-e

This command changes the current working directory to the shap-e folder, which is where we cloned the Shap-E repository in the previous step. We need to be inside this folder to install the required packages.

[Image: example of running the command.]

Run the cell by clicking on the play button or pressing Shift + Enter.

Step 3. In another new cell, paste the following code:

!pip install -e .

This command installs the necessary packages for Shap-E in your Colab environment. The -e flag installs the package in “editable” mode, which means that any changes made to the package files will be reflected in the installed package without needing to reinstall it.

Run the cell to complete the installation.

[Image: installing Shap-E.]

Now that the Shap-E repository is cloned and the required packages are installed, you can proceed with generating 3D objects using the code in the following sections.
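
Before moving on, you may want to check that Python can actually find the package. The sketch below uses only the standard library; `find_package_location` is a hypothetical helper name, and the assumption is that the installed package is importable as `shap_e` (the name used by the imports in the next section).

```python
import importlib.util


def find_package_location(name: str):
    """Return where a package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages expose a list of directories; plain modules expose a file path.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


location = find_package_location("shap_e")
if location is None:
    print("shap_e was not found; re-run the install cell")
else:
    print("shap_e will be imported from:", location)
```

With an editable install, the reported location should be inside the cloned shap-e folder rather than a separate copy.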

Generate 3D Objects from Text with Shap-E

To generate a 3D object based on a text prompt, follow these steps:

Step 1. In a new cell in your Colab notebook, paste the following code:

import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 15.0
prompt = "a shark"

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf'  # you can change this to 'stf'
size = 64  # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))

This code sets up the necessary imports, loads the Shap-E models, and configures the generation parameters such as the text prompt and rendering options. The text prompt in this example is "a shark", but you can change it to any object you’d like to generate. Because batch_size is 4, the model generates four candidate objects for the prompt.

Step 2. Run the cell to generate 3D objects based on the text prompt. The output will be displayed as animated GIFs, showing the generated 3D objects from different angles.


You can experiment with different text prompts and rendering options by changing the prompt, render_mode, and size variables in the code.
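
Another value worth experimenting with is guidance_scale, which is set to 15.0 in the text example. The toy sketch below shows how a guidance scale is commonly applied in classifier-free guidance, on the assumption that Shap-E follows this usual formulation. The function `apply_guidance` and the numbers are purely illustrative and are not Shap-E's API.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Blend unconditional and conditional predictions element-wise.

    A scale of 1.0 returns the conditional prediction unchanged; larger
    values push the result further away from the unconditional one.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up predictions, only to show the effect of the scale.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

for scale in (1.0, 3.0, 15.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

In practice, a higher guidance scale tends to follow the prompt more closely at the cost of variety, so it is worth trying a few values.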

Saving the Generated 3D Objects as Meshes

To save the generated 3D objects as mesh files (in PLY format), follow these steps:

Step 1. In a new cell, paste the following code:

from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        decode_latent_mesh(xm, latent).tri_mesh().write_ply(f)

Step 2. Run the cell to save the generated 3D objects as PLY files. These files will be saved in the shap-e folder in your Colab environment.

With batch_size set to 4, they’ll be named example_mesh_0.ply through example_mesh_3.ply.
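
To confirm that the files were written, you can read the text header of each PLY file and report how many vertices and faces it declares. This is a minimal sketch that assumes a standard PLY header (`element <name> <count>` lines, terminated by `end_header`); `read_ply_counts` is an illustrative helper, not part of Shap-E.

```python
import glob


def read_ply_counts(path: str) -> dict:
    """Parse the PLY header and return element counts, e.g. vertex and face."""
    counts = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with a PLY header")
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line == "end_header":
                break  # the mesh data follows; we only need the header
            parts = line.split()
            if len(parts) == 3 and parts[0] == "element":
                counts[parts[1]] = int(parts[2])
    return counts


# Report on every mesh saved by the previous cell (prints nothing if none exist).
for path in sorted(glob.glob("example_mesh_*.ply")):
    print(path, read_ply_counts(path))
```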


Step 3. To download the generated PLY files to your local machine, click on the folder icon on the left sidebar in Colab, navigate to the shap-e folder, and right-click on the PLY files you want to download. Select ‘Download’ to save them to your local machine.
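
If you'd rather download all the meshes at once, one option is to bundle them into a single zip archive first. The sketch below uses only the standard library; `zip_files` and the archive name meshes.zip are illustrative choices. The commented lines at the end use Colab's `google.colab.files.download` helper, which only works inside a Colab notebook.

```python
import glob
import os
import zipfile


def zip_files(pattern: str, archive_path: str) -> list:
    """Add every file matching the pattern to a zip archive; return the paths."""
    paths = sorted(glob.glob(pattern))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store only the file name so the archive has no nested folders.
            archive.write(path, arcname=os.path.basename(path))
    return paths


# Only create the archive if the meshes from the previous step exist.
if glob.glob("example_mesh_*.ply"):
    zipped = zip_files("example_mesh_*.ply", "meshes.zip")
    print(f"Added {len(zipped)} file(s) to meshes.zip")

# In Colab only: uncomment to trigger a browser download.
# from google.colab import files
# files.download("meshes.zip")
```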

Now you can use these generated 3D objects in any 3D modeling software that supports PLY files.

[Image: viewing the mesh in Windows 3D Viewer.]

Generate 3D Objects from Images with Shap-E

You can also generate 3D objects from images with Shap-E.

We’ll use corgi.png, the sample image provided with the Shap-E examples.

[Image: corgi.png]

First, download that image and upload it to the shap-e directory in your Google Colab file browser.

To upload it, hover over the shap-e directory in the file browser on the left and you’ll see a three-dot menu. Click it, click Upload, and select corgi.png.
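
Before running the generation code, it can help to check that the upload landed where the notebook expects it. This is a small sketch using only the standard library; `looks_like_png` is an illustrative helper, and it assumes the notebook's working directory is still the shap-e folder.

```python
import os

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path: str) -> bool:
    """Return True if the file exists and begins with the PNG signature."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


print("corgi.png ready:", looks_like_png("corgi.png"))
```

If this prints False, the file is probably in a different folder or the upload did not finish.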

Next, assuming you enabled GPU and installed Shap-E, run the following code:

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
from shap_e.util.image_util import load_image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('image300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 3.0

image = load_image("corgi.png")  # the file uploaded to the shap-e directory above

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(images=[image] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf' # you can change this to 'stf' for mesh rendering
size = 64 # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))

The result for this image isn’t great. With some tweaking of the parameters, or with other input images, you may get better results.


Conclusion

OpenAI’s Shap-E is a powerful tool that enables users to generate 3D objects from text and images.

By leveraging Google Colab, you can easily set up and run Shap-E without the need for any complicated installations or powerful hardware.

Resources

  1. Meshes are a way to represent 3D objects using a collection of vertices, edges, and faces. These small shapes, usually triangles or quadrilaterals, are combined to form the object’s surface. Meshes are commonly used in 3D modeling and animation.
  2. Neural Radiance Fields (NeRF) represent 3D objects as continuous functions mapping 3D coordinates to color and density values. A neural network learns this representation from 2D images of the object. NeRF can generate high-quality images of the object from different viewpoints and lighting conditions.
  3. Google Colab is an online platform for writing, running, and sharing code, similar to how Google Docs is an online platform for writing and sharing documents. Colab provides an interactive coding environment, where you can write, run, and share code in Python notebooks. It offers built-in access to powerful computing resources, including GPUs and TPUs, which makes it a popular choice for machine learning and data science projects. Just like Google Docs, you can collaborate with others in real-time and easily share your work.
7 Comments
Alan
5 months ago

This was really helpful, thanks

Gary
5 months ago

Nice introduction to the topic!
After giving it a shot, Colab shows this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
in ()
      6
      7 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
----> 8 xm = load_model('transmitter', device=device)
      9 model = load_model('text300M', device=device)
     10 diffusion = diffusion_from_config(load_config('diffusion'))

3 frames
/content/shap-e/shap_e/models/download.py in check_hash(path, expected_hash)
     86     actual_hash = hash_file(path)
     87     if actual_hash != expected_hash:
---> 88         raise RuntimeError(
     89             f"The file {path} should have hash {expected_hash} but has {actual_hash}. "
     90             "Try deleting it and running this call again."

RuntimeError: The file /content/shap-e/shap_e_model_cache/transmitter.pt should have hash af02a0b85a8abdfb3919584b63c540ba175f6ad4790f574a7fef4617e5acdc3b but has 03622d595efd68c5cedf8f1633937fd40ed9a7db900a4c2a3d70493542182878. Try deleting it and running this call again.

april
5 months ago

When I run the file locally with Python, the download of https://openaipublic.azureedge.net/main/shap-e/transmitter.pt gets interrupted. How can I solve this?

Mark
4 months ago

Great article! Is Shap-e still your favorite text to 3D? Thank you!

Sher
2 months ago

Hi!
Thank you very much for this!! As someone who has no background in coding at all, and I have visited other articles, I’ve been facing so many errors, almost gave up! Trying your steps, made it work for the very first time! thank you!! I’m very thankful! Maybe also add that this is only doable in Linux environment for dummies like me. Thanks!
