Notes from NYCML and CHANEL’s Synthetic Media Challenge

Over the spring semester, I participated in the NYC Media Lab's Synthetic Media Challenge, run in collaboration with CHANEL. The challenge brought together teams of faculty and students from schools across New York to explore how new technologies such as GANs can reimagine the future of storytelling and unlock new ways to develop, refine, and perhaps redefine the user experience. Here is an excerpt from the lab's description:

“AI, machine learning, computer vision, and other emerging technologies are changing how media is produced, distributed, and consumed. Within this ever-evolving landscape of new media, the intersection of AI and art, and more specifically storytelling, is an exciting area of innovation and creativity.”

Five teams explored a variety of ways in which emerging tools can push the bounds of non-linear, interactive storytelling, and how these new mediums connect with audiences and open up new forms of creative expression.

My team consisted of Shirin Anlen and me, and together we built an AR face filter designed for video calls on desktop. We wrote code that measures the distance between the user's lips to determine whether they are speaking, and the filter transforms the user's appearance based on their level of participation: flora grows over their face while they stay quiet and slowly disappears once they start speaking. The color of the flora also changes based on the user's timezone, making the effect look slightly different for each person.
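As a rough illustration of that speech check (not the actual Lens Studio script; the threshold and smoothing values below are placeholders assumed for the sketch), the idea boils down to comparing the lip gap against a threshold with a little smoothing so that brief mouth movements don't flip the state:

```python
# Sketch of the speech check: compare the gap between the upper and lower lip
# landmarks against a threshold, smoothed so brief twitches don't count.
SPEAKING_THRESHOLD = 0.05   # lip gap, normalized by face height (illustrative)
SMOOTHING = 0.8             # exponential moving average factor (illustrative)

smoothed_gap = 0.0

def is_speaking(upper_lip_y: float, lower_lip_y: float, face_height: float) -> bool:
    """Return True when the smoothed lip gap suggests the user is talking."""
    global smoothed_gap
    gap = abs(lower_lip_y - upper_lip_y) / face_height
    smoothed_gap = SMOOTHING * smoothed_gap + (1.0 - SMOOTHING) * gap
    return smoothed_gap > SPEAKING_THRESHOLD
```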

In recent years, the ability to create and distribute AR effects has opened up for content creators on several social media platforms such as Facebook, Instagram, and Snapchat. Creations so far vary in their structural and intellectual complexity, ranging from simple visual puns to more elaborate branching narratives like the work of David O'Reilly. Some news organizations have also devoted effort to experimentation on these platforms, for example the New York Times' AR filters created for Instagram (accessible through the NYT Instagram page) and Homeless Realities, a project by the journalism team at USC Annenberg that used Snapchat to tell stories of homeless people in Los Angeles.

For our project this spring, we worked with Snap's Lens Studio to create a camera effect accessible through Snap Camera. The continuously growing capabilities of Lens Studio open up many possibilities for augmenting our daily interactions on video calls. In the future, I'd be interested in employing the software's skeletal tracking as well as the ability to plug in your own ML model for style transfer, classification, or generative imaging.

The lens works as follows: once a face is detected, the code starts measuring the distance between the user's lips to detect speech. If the participant is silent for 30 seconds, leaves start growing on their head. As long as the user is not speaking, the counting continues: flowers grow, pistils appear on the eyelashes, and the cheeks are retouched with green blush. A big flower then appears over one eye, and if the participant remains silent, the growth continues until eventually a solid background takes over and only their eyes remain visible. A complete cycle from start to finish lasts 5 minutes. At any point, if the participant speaks, the counting runs backwards and the growth is reversed.
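In pseudocode terms, the whole cycle reduces to a single counter that runs forward while the participant is silent and backward while they speak. The sketch below uses the timings mentioned above (a 30-second delay, then a 5-minute cycle) but is otherwise an illustrative reconstruction rather than the shipped script:

```python
SILENCE_DELAY = 30.0    # seconds of silence before the first leaves appear
FULL_CYCLE = 300.0      # seconds from the first leaf to the solid background

silence_timer = 0.0     # runs forward while silent, backward while speaking

def update_growth(dt: float, speaking: bool) -> float:
    """Advance or rewind the counter and return a growth fraction in [0, 1]."""
    global silence_timer
    if speaking:
        silence_timer = max(0.0, silence_timer - dt)
    else:
        silence_timer = min(SILENCE_DELAY + FULL_CYCLE, silence_timer + dt)
    # No growth until the silence delay has passed, then a linear ramp to 1.
    return max(0.0, silence_timer - SILENCE_DELAY) / FULL_CYCLE
```

A growth fraction like this can then gate which visual layers (leaves, pistils, blush, the big flower, the background) are visible at any given moment.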

With this prototype we sought to experiment with a few ideas:

  • AR in a multi-participant Zoom call
    Augmented reality filters are not always considered appropriate during work calls, and can even lead to viral-level mishaps, like the case of the Zoom cat lawyer. How can AR be used in the space of the multi-participant Zoom call? We chose to build one speculation that actually highlights participation, or the lack of it, and playfully draws attention to more introverted personalities.
  • A different approach to time in AR effects
    How can we create AR content that is meant to “live” with the user for longer periods of time? Most AR effects are designed to provide an immediate impression, and our question was: what if you could wear a mask that grows and changes throughout the day and responds to your behavior?
  • Real-world data in virtual space
    With the pandemic, we are on video calls more than ever before, crossing borders and time zones. How can we “bring the outside in” and give presence to a participant's geographic location and time of day in the virtual space? One small version of that idea is sketched below.
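As a minimal sketch of that last point, the timezone-driven flora color can be as simple as mapping the participant's local hour of day onto the color wheel; the saturation and brightness values here are arbitrary placeholders, not our actual palette:

```python
import colorsys
from datetime import datetime

def flora_color(local_time: datetime):
    """Map the local hour of day to an RGB color for the flora (illustrative)."""
    hour = local_time.hour + local_time.minute / 60.0
    hue = hour / 24.0                      # sweep the color wheel once per day
    r, g, b = colorsys.hsv_to_rgb(hue, 0.6, 0.9)
    return int(r * 255), int(g * 255), int(b * 255)

print(flora_color(datetime.now()))
```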

The decision to work with flora for the design of this effect was influenced by CHANEL's history, especially the symbol of the Camellia and the way it varies across the globe. The decision to track speech, in combination with the flora, led to the metaphor of the Wallflower.


Other teams in the lab cohort worked on the following projects:

Team Cloud Theory aimed to answer the question “Can machines play like we do?” First, the AI describes the shapes it sees in the clouds (much like we did as children). Then the machine narrator, a GPT-2-based model, spins a tale based on those shapes.
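For readers unfamiliar with this kind of setup, here is a minimal, generic example of turning a shape description into a short tale with the stock GPT-2 checkpoint via Hugging Face transformers; it only illustrates the pattern, not Cloud Theory's own trained model or prompts:

```python
from transformers import pipeline

# Generic illustration: generate a short tale from a cloud-shape description
# using the stock "gpt2" checkpoint (the team used their own trained model).
generator = pipeline("text-generation", model="gpt2")

shape = "a dragon curled around a lighthouse"   # placeholder description
prompt = f"Looking up, I saw a cloud shaped like {shape}. Once upon a time,"
story = generator(prompt, max_new_tokens=80, do_sample=True, temperature=0.9)
print(story[0]["generated_text"])
```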

“Feed” by IMUU is a WebXR experience that explores the notion of the self and the fragmented existence of bodies within the context of surveillance capitalism. The team used machine-generated images in the game's environment and character design.

Team Bigotis Fabra created an iterative tool for machine-generated imagery. The project employs CLIP, a model that encodes both text and images and produces a similarity score between them. The tool also enables uploading an image to condition the generated output; for example, you could upload a logo and the generated content would echo the logo's shape in some form.
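For context, this is roughly how CLIP produces such a text-image similarity score using OpenAI's reference implementation; the prompt and image path are placeholders, and this shows only the scoring step, not the team's full generation tool:

```python
import torch
import clip              # OpenAI's reference package (pip install from github.com/openai/CLIP)
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)   # placeholder path
text = clip.tokenize(["a red camellia logo"]).to(device)                  # placeholder prompt

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the normalized image and text embeddings.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).item()

print(f"CLIP similarity: {similarity:.3f}")
```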

The team GANgang developed a white paper and a workflow for using machine-generated visuals in a way that works around the currently imprecise visual quality of GAN outputs. In this case, they applied style transfer to live-action footage and ran it through several experimental tools, such as a depth-aware convolutional network and motion estimation with optical flow.
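As a small, generic illustration of the optical-flow piece (not the team's actual pipeline or parameters), OpenCV's Farnebäck method estimates per-pixel motion between two consecutive frames, which can then be used to warp stylized frames and keep the style temporally coherent:

```python
import cv2

# Generic illustration: dense optical flow between two consecutive frames.
prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)   # placeholder paths
curr = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

# Farnebäck parameters: pyramid scale, levels, window size, iterations,
# polynomial neighborhood size, polynomial sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape)   # (height, width, 2): a motion vector per pixel
```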