Merry Xmas everyone! It’s giveaway time!

Thank you to all those who participated in my preset giveaway this week! The support makes all the hard work and extra effort worth it!

Without further ado, the randomly drawn winners of my custom Lightroom presets are @l9lee @rchellau @bokeh.jay! Congrats and check your DMs soon for details!

You still have until tomorrow to grab my presets (which this shot was edited with) for 50% off! They’ll be going back to regular price after, so don’t miss out! (at Toronto, Ontario)




Trying to straighten all the lines on this shot is a surefire way to go blind. (at London, United Kingdom)




I’ve gone subway hopping for photos in every city I’ve been to except the one I live in. (at Toronto, Ontario)




I just realized that I can export my entire story all at once now, which means uploading my tutorials to my Facebook page will be a million times easier (it was tedious to stitch all the individual clips together before).

Related: I posted a story this morning deconstructing the edit on yesterday’s shot.

Also related: I uploaded the 3 tutorials from my November feature on @thecreatorclass to my Facebook page this morning too. More to come! (at London, United Kingdom)




I took this shot about a year ago when I had a very different editing style. A ton of faded blacks and, believe it or not, a subtle green tint (unknowingly inherited from the preset I was using at the time). Re-editing it now, I’m happy with the way my style has evolved, though I can already sense that I’m on the brink of evolving it again. And I’m okay with that. (at London, United Kingdom)




This might as well be a Herschel ad. (at London, United Kingdom)




This trip solidified my commitment to learning photography. A lot has happened since this shot was taken.
Can you pinpoint the moment you decided to pursue photography? (at Toronto, Ontario)




A lot to look forward to in 2017. How did 2016 treat you: good or bad? (at San Francisco, California)




Four days from now I’ll be boarding a one-way flight to San Francisco to take on the next evolution of my role at @shopify. Leaving the city that I’ve called home my entire life and the people who have defined everything I am was one of the most uncomfortable decisions I’ve ever had to make. But this wouldn’t be the first time I’ve chased discomfort in my career.

I wrote about my ongoing pursuit of discomfort this morning in hopes of inspiring others to do the things that scare and challenge them this year. You can find the link in my profile.

Happy 2017!

Photo: @jonasll (at San Francisco, California)




Quick survey: on average, what time is it when you check Instagram for the first time on any given day? (Be sure to include your timezone!)

PS: Thank you for all the incredible support on yesterday’s announcement. ❤️ (at Toronto, Ontario)

















Self-promotion

The world has changed. Everything we do is more immediately visible to others than ever before, but much remains the same; the relationships we develop are as important as they always were. This post is a few thoughts on self-promotion, and how to have good relationships as a self-publisher.

Meeting people face to face is ace. They could be colleagues, vendors, or clients; at conferences, coffee shops, or meeting rooms. The hallway and bar tracks at conferences are particularly great. I always come away with a refreshed appreciation for meatspace. However, most of our interactions take place over the Web. On the Web, the lines separating different kinds of relationships are a little blurred. The company trying to get you to buy a product or conference ticket uses the same medium as your friends.

Freelancers and small companies (and co-ops!) can have as much of an impact as big businesses. ‘I publish therefore I am’ could be our new mantra. Hence this post, in a way. Although, I confess, I have discussed these thoughts with friends and thought it was about time I kept my promise to publish them.

Publishing primarily means text and images. Text is the most prevalent. However, much more meaning is conveyed non-verbally. ‘It’s not what you say, it’s how you say it.’

Text can contain non-verbal elements like style — either handwritten or typographic characters — and emoticons, but we don’t control style in Twitter, email, or feeds. Or in any of the main situations where people read what we write (unless it’s our own site). Emoticons are often used in text to indicate tone, pitch, inflection, and emotion like irony, humour, or dismay. They plug gaps in the Latin alphabet’s scope that could be filled with punctuation like the sarcasm mark. By using them, we affirm how important non-verbal communication is.

The other critical non-verbal communication around text is karma. Karma is our reputation, our social capital with our audience of peers, commentators, and customers. It has two distinct parts: Personality and professional reputation. ‘It’s not what was said, it’s who said what.’

So, after that quick brain dump, let me recap:

  • Relationships are everything.
  • We publish primarily in text without the nuance of critical non-verbal communication.
  • Text has non-verbal elements like style and emoticons, but we can only control the latter.
  • Context is also non-verbal communication. Context is karma: Character and professional reputation.

Us Brits are a funny bunch. Traditionally reserved. Hyperbole-shy. At least, in public. We use certain extreme adjectives sparingly for the most part, and usually avoid superlatives if at all possible. We wince a little if we forget and get super-excited. We sometimes prefer ‘spiffing’ accompanied by a wry, ironic smile over an outright ‘awesome’. Both are genuine — one has an extra layer in the inflection cake. However, we take great displeasure in observing blunt marketing messages that try to convince us something is true with massive, lobe-smacking enthusiasm, and some sort of exaggerated adjective-osmosis effect. We poke fun at attempts to be overly cool. We expect a decent level of self-awareness and ring of honesty from people who would sell us stuff. The Web is no exception. In fact, I may go so far as to say that the sensibilities of the Web are fairly closely aligned with British sensibilities. Without, of course, any of our crippling embarrassment. In an age when promoting oneself on the Web is almost required for designers, that’s no bad thing. After all, running smack bang through the middle of the new marketing arts is a large dose of reality; we’re just a bunch of folks telling our story. No manipulation, cool-kid feigned nonchalance, or lobe-smacking enthusiasm required.

Consider what the majority of designers do to promote themselves in this brave new maker-creative culture. People like my friend, Elliot Jay Stocks: making his own magazine, making music, distributing WordPress themes, and writing about his experiences. Yes, it is important for him that he has an audience, and yes, he wants us to buy his stuff, but no, he won’t try to impress or trick us into liking him. It’s our choice. Compare this to traditional advertising that tries to appeal to your demographic with key phrases from your tribe, life-style pitches, and the usual raft of Freudian manipulations. (Sarcasm mark needed here, although I do confess to a soft spot for the more visceral and kitsch Freudian manipulations.)

There is a middle ground between the two though. A dangerous place full of bad surprises: The outfit that seems like a human being. It appears to publish just like you would. They want money in exchange for their amazing stuff they’re super-duper proud of. Then, you find out they’re selling it to you at twice the price it is in the States, or that it crashes every time it closes, or has awful OpenType support. You find out the human being was really a corporate cyborg who sounds like you, but is not of you, and it’s impervious to your appeals to human fairness. Then there are the folks who definitely are human, after all they’re only small, and you know their names. All the non-verbal communication tells you so. Then you peek a little closer —  you see the context — and all they seem to do is talk about themselves, or their business. Their interactions are as carefully crafted as the big companies, and they treat their audience as a captive market. Great spirit forefend they share the bandwidth by celebrating anyone else. They sound like one of us, but act like one of them. Their popularity is inversely proportional to their humanity.

Extreme examples, I know. This is me exploring thoughts though, and harsh light helps define the edges. Feel free to sound off if it offends, but mind your non-verbal communication. :)

That brings me to self-promotion versus self-aggrandisement; there’s a big difference between the two. As independent designers and developer-type people, self-promotion is good, necessary, and often mutually beneficial. It’s about goodwill. It connects us to each other and lubricates the Web. We need it. Self-aggrandisement is coarse, obvious, and often an act of denial; the odour of insecurity or arrogance is nauseating. It is to be avoided.

If you consider the difference between a show-off and a celebrant, perhaps it will be clearer what I’m reaching for:

The very best form of self-promotion is celebration. To celebrate is to share the joy of what you do (and critically also celebrate what others do) and invite folks to participate in the party. To show off is a weakness of character — an act that demands acknowledgement and accolade before the actor can feel the tragic joy of thinking themselves affirmed. To celebrate is to share joy. To show off is to yearn for it.

It’s as tragic as the disdainful, casual arrogance of criticising the output of others less accomplished than oneself. Don’t be lazy now. Critique, if you please. Be bothered to help, or if you can’t hold back, have a little grace by being discreet and respectful. If you’re arrogant enough to think you have the right to treat anyone in the world badly, you grant them the right to reciprocate. Beware.

Celebrants don’t reserve their bandwidth for themselves. They don’t treat their friends like a tricky audience who may throw pennies at them at the end of the performance. They treat them like friends. It’s a pretty simple way of measuring whether what you publish is good: would I do/say/act the same way with my friends? Human scales are always the best scales.

So, this ends. I feel very out of practice at writing. It’s hard after a hiatus. These are a few thoughts that still feel partially formed in my mind, but I hope there was a tiny snippet or two in there that fired off a few neurons in your brain. Not too many, though; it’s early yet. :)





Reversed Logotype

This image shows a particular optical illusion that confronts us every day. Notice the difference between the black text on a white background and the reverse. With reversed type — light text on a darker background — the strokes seem bolder.

Black text on white is very familiar, so we can be forgiven for thinking it correctly proportioned. For familiarity’s sake we can say it is, but there are two effects happening here: The white background bleeds over the black, making the strokes seem thinner. With reversed type the opposite is true: The white strokes bleed over the black background, making them seem bolder.

Punched, backlit letters on a sign outside the Nu Hotel, Brooklyn.

One of the most obvious examples of this is with signs where the letters are punched into the surround then lit from inside. In his article, Designing the ultimate wayfinding typeface, Ralf Herrmann used his own Legibility Test Tool to simulate this effect for road and navigational signs.

One might say that characters are only correctly proportioned with low contrast. Although that may be objectively true, it isn’t a good reason to always set type with low contrast. Type designers have invariably designed around optical illusions and the constraints of different media for us. Low-contrast text can also create legibility and accessibility problems. Fortunately, kind folks like Gez Lemon have provided us with simple tools to check.
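For the curious, WCAG 2.0 defines the contrast ratio those tools check as:

```latex
\text{contrast ratio} = \frac{L_1 + 0.05}{L_2 + 0.05}
```

where \(L_1\) is the relative luminance of the lighter colour and \(L_2\) that of the darker, giving values from 1:1 (no contrast at all) up to 21:1 (black on white). Body text is asked to reach at least 4.5:1.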

As fascinating as optical illusions are —  the disturbing, impossible art of Escher comes to mind — we can design around reversed body type. On the Web, increasing tracking and leading are as simple as increasing the mis-named letter-spacing and line-height in CSS. However, decreasing font weight is a thornier problem. Yes, we will be able to use @font-face to select a variant with a lighter weight, but the core web fonts offer us no options, and there are only a few limited choices with system fonts like Helvetica Neue.
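To make that concrete, a reversed block of body copy might start from something like this. The selector and values here are only illustrative starting points of mine, to be tuned by eye for the typeface in use:

```css
/* Compensating for reversed (light-on-dark) body type.
   Illustrative values only; tune by eye for the typeface in use. */
.reversed {
  background-color: #121212;
  color: #f5f5f5;
  letter-spacing: 0.02em; /* a touch more tracking */
  line-height: 1.6;       /* a touch more leading */
  font-weight: 300;       /* a lighter variant, if the typeface offers one */
}
```

Of course, that last declaration only earns its keep when a lighter weight actually exists; otherwise the browser falls back to the nearest available weight, which is exactly the thorny problem with the core web fonts.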

Reversing a logotype

For logotype there are plenty of options, but it makes me slightly uncomfortable to consider switching to a lighter font for reversed type logos. The typeface itself is not the logotype; the variant is, so switching font could be tricky. Ironically, I’d have to be very sure that there was no perceivable difference using a lighter weight font. Also, with display faces, there’s often not a lighter weight available — a problem I came across designing the Analog logo.

The original Analog logo seen here is an adapted version of Fenway Park by Jason Walcott (Jukebox Type).

The logotype worked well when testing it in black on white. However, I wanted a reversed version, too. That’s when I noticed the impact of the optical illusion:

(Reversed without any adjustment.)

It looked bloated! Objective reality be damned; it simply wouldn’t do. After a few minutes contemplating the carnage of adjusting every control point by hand, I remembered something; eureka!

(Reversed then punched.)

Punching the paths through a background image in Fireworks CS4 removed the illusion. (Select both the path and the background, then use Modify > Combine Paths > Punch.) Is this a bug? I don’t know, but if it is, it’s a useful one for a change!

Modify > Combine Paths > Punch in Fireworks CS4.

N.B. I confess I haven’t tested this in any other Adobe products, but perhaps you will be so bold? (’scuse the pun. :)

Matthew Kump mentions an Illustrator alternative in the comments.

I grinned. I was happy. All was well with the world again. Lovely! Now I could go right ahead and think about colour and I wouldn’t be far from done. This is how it emerged:

A final note on logotype design & illusions

Before we even got to actual type for the Analog logo, we first had to distill what it would convey. In our case, Alan took us through a process to define the brand values and vision. What emerged were keywords and concepts that fed into the final design. The choice of type, colour, and setting were children of that process. Style is the offspring of meaning.

I always work in greyscale for the first iterations of a new logo for a few simple reasons:

  1. The form has to work independently of colour — think printing in greyscale or having the logo viewed by people with a colour-impairment.
  2. It allows for quick testing of various sizes — small, high contrast versions will emphasise rendering and legibility issues at screen resolutions, especially along curves.
  3. I like black and white. :)

I realise that in this day and age the vast majority of logos need to perform primarily on the Web. However, call me old-fashioned, but I still think that they should work in black and white, too.

Brands and display faces emerged with consumer culture during the 19th Century. Logotypes were displayed prominently in high streets, advertising hoardings, and on sign boards. In many instances the message would be in black and white. They were designed to be legible from a distance, at a glance, and to be instantly recognisable. Even with colour, contrast was important.

The same is true for the Web today; only the context has changed, along with the popularity of logomarks and icons. We should always test any logo at low resolutions and sizes, and the brand must still have good contrast (regardless of WCAG 2.0) to be optimal. A combination of colour and form works wonders, but in a world of a million colours where only a handful are named in common parlance, having the right form still seems a smarter choice than trying to own a palette or colour.

A final word

This article was prompted by a happy accident followed by a bit of reading. There are many references to optical illusions in design and typography books. The example image at the start of this article was inspired by one found in the excellent Stop Stealing Sheep and Find Out How Type Works by Erik Spiekermann and E.M. Ginger. There’s also plenty of online material about optical or visual illusions you can dive into. Oh, and don’t forget the work of M. C. Escher!

Human eyes are amazing. In two sets of watery bags we get a wide-angle lens with incredibly sharp focus and ridiculous depth of field. Apparently our brain is even clever enough to compensate for the lag in the signal getting from retina to cortex. I know next to nothing about ocular science. Spending a morning reading and thinking about optical illusions, and contemplating my own view here in the garden office is pretty awe-inspiring. If only my photographs were as good as my eyes, illusions or no.





2010 in Retrospect

Analog, Mapalong, more tries at trans-Atlantic sleep, Cuba, Fontdeck, and my youngest son entering school; it all happened in the last year. At the end of 2007, I wrote up the year very differently. After skipping a couple of years, this is a different wrap-up. To tell the truth I put this together for me, being the very worst of diarists. It meant searching through calendars, Aperture, and elsewhere. I hope it prompts me to keep a better diary. I give you: 2010 in pictures and words:

January

Albany Green, Bristol.

Analog.coop is still fresh after launching in December. We’re still a bit blown away by the response but decide not to do client work, but to make Mapalong instead. We jump through all kinds of hoops trying to make it happen, but ultimately it comes down to our friend and colleague, Chris Shiflett. He gets us going. It snows a lot in Bristol. The snow turns to ice. I slip around, occasionally grumpy, but mostly grinning like an idiot.

February

Morón, Cuba.

My family and I go to Cuba on our first ever all inclusive ‘package’ holiday. It’s a wonderful escape from winter, tempered by surreptitious trips out of the surreal, tourist-only island, to the other Cuba with an unofficial local guide. My boys love the jacuzzi, and sneaking into the gym. Z shoots his first arrow. Just after we return, he turns 4 years old. Now, he wants to go back.

March

DUMBO from the men’s loo at 10 Jay St. — home of Analog NY in Studio 612a.

I visit Chris in Brooklyn to work on Mapalong. We play football. Well, Chris plays. I cripple myself, and limp around a lot. At the same time I meet the irrepressible, Cameron Koczon. We all get drunk on good beer at Beer Table. Life is good. Cameron comes up with the Brooklyn Beta name. It starts to move from idea to action.

Just before Brooklyn, a discussion about First Things First opens during a talk at BathCamp. The follow-ups become passionate with posts like this straw man argument and a vociferous rejoinder.

April and May

In the garden, at home.

The sun comes out. The garden becomes the new studio. Alan Colville and Jon Gibbins stop by as we work on Mapalong. The hunt starts for a co-working space in Bristol. I write pieces about self-promotion and reversed type. Worn out from the sudden burst, I go quiet again.

June

Mild Bunch HQ!

We find a place for our Bristol co-working studio. Mild Bunch HQ is born! I design desks for the first time. Our first co-workers are Adam Robertson, Kester Limb, Eugene Getov, and Ben Coleman. Chris and I meet again across the Atlantic; he makes a flying visit to Bristol. The gentle pressure mounts on fellow Analogger, Jon Gibbins, to come to Bristol, too. Something special begins. Beer Fridays have started.

Fontdeck!

Fontdeck comes out of private beta! Almost 17 months after Rich Rutter and I first talked about a web fonts service in Brighton, the site went live thanks to the hard work of Clearleft and OmniTI. Now it features thousands of fonts prepared for the Web, and many of the best type designers and foundries in the world.

The Ulster Festival programme.

For the first time in around 15 years I visit Belfast. At the invitation of the Standardistas, Chris and Nik, Elliot Stocks and I talk typography at the Ulster Festival of Art and Design. We’re working on the Brooklyn Beta branding, so talk about that with a bit of neuroscience thrown in as food for thought. Belfast truly is a wonderful place with fantastic people. It made it hard to miss Build for the second time later in the year.

June was busier than it felt. :)

July

Mild Bunch summer; Pieminister, Ginger beer, and Milk Stout.

Summer arrived in earnest. X has a blast at his school sports day. I do, too. Mild Bunch HQ is liberally dosed with shared lunches from Herbert’s bakery and Licata’s deli, and beers on balmy evenings outside The Canteen with friends. That’s all the Mild Bunch is, a group of friends with a name that made us laugh; everyone of friendly disposition is welcome!

August

8Faces and .Net magazine.

8 Faces number 1 is published and sells out in a couple of hours. I was lucky enough to be interviewed, and to sweat over trying to narrow my choices. The .Net interview was me answering a few questions thrown my way from folks on Twitter. Great fun. Elliot, Samantha Cliffe, and I had spent a great day wandering around Montpelier taking pictures in the sun earlier in the year. One of her portraits of me appeared in both magazines. Later that month, I write about Web Fonts, Dingbats, Icons, and Unicode. It’s only my fourth post of the year.

Birthday cake made by my wife, Lowri.

Sometimes, some things strip me of words. Thank you.

September

East River Sunrise from 20 stories up at the home of Jessi and Creighton of Workshop.

The whole of Analog heads to Brooklyn for a Mapalong hack week with the Fictive Kin guys. We start to show it to friends and Brooklyn studio mates like Tina (Swiss Miss) who help us heaps. It’s a frantic week. I get to spend a bit of time with my Analog friend Andrei Zmievski who I haven’t seen in the flesh since 2009. Everyone works and plays hard, and we stay in some fantastic places thanks to Cameron and AirBnB.

Cameron Koczon (front), Larry Legend (middle) and Jon Gibbins (far back with funky glove) in Studio 612a during hack week.

Just before I head to NY, Z starts big school. He looks too small to start. He’s 4. How did time pass so fast? I’m still wondering that after I get back.

October

Brooklyn Beta poster.

The whole of Analog, the Mild Bunch HQ, and many others from Bristol, and as far away as Australia and India, head to New York for Brooklyn Beta! A poster whipped together by me, printed in a rush by Rik at Ripe, and transported to NY by Adam Robertson, is given as one of the souvenirs to everyone who comes.

Meanwhile, Jon Gibbins works frantically to get Mapalong ready to give BB an early glimpse of what we’re up to. Two thousand people reserve their usernames before we even go to private beta!

Brooklyn Beta!

Simon Collison giving his Analytical Design workshop on day 1.

Chris and Cameron work tirelessly. Many, many fine people lend a hand. We add some last minute touches to the site, like listing all the crew and attendees as well as the speakers. Cameron shows off Gimme Bar with an hilarious voice-over from Bedrich Rios. Alan narrates Mapalong and we introduce our mapping app to our peers and friends!

Day 2: Chris does technical fixes, Cameron tells jokes, and Cameron Moll waits with great poise for his talk to start.

It’s something we hoped for, but never expected: Brooklyn Beta goes down as one of the best conferences ever in the eyes of veteran conference speakers and attendees. ‘Are you sure you’ve not done this before?’ I hear Jonathan Hoefler of Hoefler & Frere-Jones ask Cameron. It makes me smile. The fact one of our sponsors asked this question in admiration of Chris and Cameron’s work meant a lot to me. I was proud of them, and grateful to everyone who helped it be something truly friendly, open, smart, and special.

Aftermath: Cameron (blurry, in action, centre left) regales us at Mission Dolores; Pat Lauke (left), Lisa Herod (back centre right), Nicholas Sloan (right).

The BB Flickr group has a lot of pictures and links to blog posts. Brooklyn Beta will return again in 2011!

November

Legoland, Windsor.

X turns 7. I realise he really isn’t such a toddler anymore. It took me a while even though he amazes me constantly with his vocabulary and eloquence. His birthday party ensues with a trip to Legoland on the last weekend of the season to watch fireworks and get into trouble. Fun times finding Yoda and the rest of the Star Wars posse battling each other below the Space Shuttle exhibit.

8 Faces

8 Faces number two is published after being announced at Build. Much of the month was spent juggling Mapalong work, and having a great time typesetting the selection spreads for each of the eight faces chosen by the interviewees. That, and worrying with Elliot about how it might print in litho. It all turned out OK. I think.

The .Net Awards, christened the ‘nutmeg’ awards thanks to iPhone auto-correction, take place in London. I’m one of millions of judges. We use it as an excuse for a party. At the end of the month, lots of the Mild Bunch go to see Caribou at The Thekla. Good times.

December

Mapalong!

Mapalong goes into private beta! We start inviting many of the Brooklyn Beta folks, and others who’ve reserved their usernames. Lots of placemarks get added. Lots of feedback comes our way. Bug hunting starts. Next design steps start. We push frequently and add people as we go. Big things are planned for the new year!

Clove heart from Lowri.

The Mild Bunch Christmas do goes off with a bang thanks to Adam Robertson making sure it happened. Folks come from far and wide for a great party in The Big Chill Bar in Bristol. Lowri sneaks shots of Sambuca for the girls onto my tab, and we drink all the Innis and Gunn they have.

A few parties later, and the year draws to a close with a very traditional family Christmas in our house. Wood fires, music, the Christmas tree, and two small boys doing what kids do at Christmas. It’s just about perfect; a tonic to the background strife of the month, with a personal tragedy for me, and illness in my close family. Everything worked out OK. Steam-powered fairground rides, dressing up as dinosaurs, and detox follow with a bit of reflection. New Year’s Eve probably means staying in. Babysitters are like gold dust, but I just found out we have one for tonight, so it looks like our celebration is coming early!

2011

In the new year, I’ll be mostly trying to do the best I can for my family, my colleagues, and myself. The only goals I have are to help my children be everything they can be, make Mapalong everything we wish it to be, and feel that calm, quiet sense of peace in the evening that only comes from a day well done. Other than that I’ll keep my mind open to serendipity. (…and do something about some bits of my site and the typesetting that’s bugging me after writing this. :)

If you made it this far, thank you, and here’s to you and yours in 2011; may the best of your past be the worst of your future!





Ides of March

My friend and colleague, Chris, has shared a spiffing idea, the Ideas of March. He suggests: ‘If we all blog a little more than we normally would this month, maybe we can be reminded of all of the reasons blogs are great.’

But wait, this post is called the Ides of March? Right. As soon as I read what Chris had posted, a twist on the phrase echoed in my memory. The Ides of March is a Roman festival dedicated to the god of war, Mars. Some say it’s on the 15th of March (today). I can’t find a reference that this is accurate relative to the Julian or current Gregorian calendars, so I will use the first full moon instead. This year it will be on Saturday, 19th of March, in four days’ time. Wikipedia has more:

The Ides of March was a festive day dedicated to the god Mars and a military parade was usually held. In modern times, the term Ides of March is best known as the date that Julius Caesar was killed in 44 B.C.

Dramatic stuff. Appropriate in these times, too. Mars may have been the god of war, based on the anarchistic Greek god, Ares, but he represented the pursuit of peace through military strength. A thoroughly debunked method if you ask me, but a pretty neat rationalisation still used today. The military pursues Gaddafi’s version of peace in Libya. Mubarak tried it, and failed, in Egypt. The Ben Ali regime collapsed under protests in Tunisia. Saleh is on his way in Yemen. Right now, Saudi soldiers are deployed in Bahrain to quell protestors fighting for democratic freedom.

Whatever you think about the current strife, one thing is true: Tyrants never last. I’ve been an advocate of Twitter, and its ambient intimacy for almost four years. In that time I’ve seen it buoyed by the innovations of its users. Smart folks using @replies, and retweets that became a part of the fabric, coded into links and threads (sort-of). Other smart people building clients with new ways of looking at the graph. I’ve seen Twitter take the good ideas and do good things with them. Yet now, Twitter isn’t just the platform any longer; it wants to be the clients, too. From URL shortening and tracking, to changes in who can make clients, and how they work. People don’t like it. The same kind of smart people who helped it be successful. The same kind of people who permit benevolent dictators to exist until they become tyrants.

I’m still a fan of the idea of short messages. They are neat, by their nature, but lest Twitter forget, they exist elsewhere, too. They’re a snack between meals. Signposts to feasts. The real banquets are blog posts, though. I’ve learnt more from them in the last ten years than I ever will from 140 characters. That’s why blogs are something to be treasured. Blogs and RSS may be dead according to some, but I like that I disagree. After all, even with this rambling post, you’ve probably learnt something, just like I have writing it. Thanks for the prompt, Chris.

Don’t procrastinate, fire up your editor and share your own ideas of March. Drew, Lorna, and Sean already have. Go on, you know it’s been far too long!





Web Design as Narrative Architecture

Stories are everywhere. When they don’t exist we make up the narrative — we join the dots. We make cognitive leaps and fill in the bits of a story that are implied or missing. The same goes for websites. We make quick judgements based on a glimpse. Then we delve deeper. The narrative unfolds, or we create one as we browse.

Mark Bernstein penned Beyond Usability and Design: The Narrative Web for A List Apart in 2001. He wrote, ‘the reader’s journey through our site is a narrative experience’. I agreed wholeheartedly: Websites are narrative spaces where stories can be enacted, or emerge.

Henry Jenkins, Director of Comparative Media Studies, and Professor of Literature at MIT, wrote Game Design as Narrative Architecture. He suggested we think of game designers ‘less as storytellers than as narrative architects’. I agree, and I think web designers are narrative architects, too. (Along with the multitude of other roles we assume.) Much of what Henry Jenkins wrote applies to modern web design. In particular, he describes two kinds of narratives in game design that are relevant to us:

Enacted narratives are those where:

[…] the story itself may be structured around the character’s movement through space and the features of the environment may retard or accelerate that plot trajectory.

Sites like Amazon, New Adventures, or your portfolio are enacted narrative spaces: Shops or service brochures that want the audience to move through the site towards a specific set of actions like buying something or initiating contact.

Emergent narratives are those where:

[…] spaces are designed to be rich with narrative potential, enabling the story-constructing activity of players.

Sites like Flickr, Twitter, or Dribbble are emergent narrative spaces: Web applications that encourage their audience to use the tools at their disposal to tell their own story. The audience defines how they want to use the narrative space, often with surprising results.

We often build both kinds of narrative spaces. Right now, my friends and I at Analog are working on Mapalong, a new maps-based app that’s just launched into private beta. At its heart Mapalong is about telling our stories. It’s one big map with a set of tools to view the world, add places, share them, and see the places others share. The aim is to help people tell their stories. We want to use three ideas to help you do that: Space (recording places, and annotating them), data (importing stuff we create elsewhere), and time (plotting our journeys, and recording all the places, people, and memories along the way). We know that people will find novel uses for the tools in Mapalong. In fact, we want them to because it will help us refine and build better tools. We work in an agile way because that’s the only way to design an emergent narrative space. Without realising it we’ve become architects of a narrative space, and you probably are, too.

Many projects like shops or brochure sites have fixed costs and objectives. They want to guide the audience to a specific set of actions. The site needs to be an enacted narrative space. Ideally, designers would observe behaviour and iterate. Failing that, a healthy dose of empathy can serve. Every site seeks to teach, educate, or inform. So, a bit of knowledge about people’s learning styles can be useful. I once did a course in one-to-one and small-group training with the Chartered Institute of Personnel and Development. It introduced me to Peter Honey and Alan Mumford’s model, which describes four different learning styles that are useful for us to know. I paraphrase:

  1. Activists like learning as they go; getting stuck in and working it out. They enjoy the here and now, and are happy to be dominated by immediate experiences. They are open-minded, not sceptical, and this tends to make them enthusiastic about anything new.
  2. Reflectors like being guided with time to take it all in and perhaps return later. They like to stand back to ponder experiences and observe them from many different perspectives. They collect data, both first hand and from others, and prefer to think about it thoroughly before coming to a conclusion.
  3. Theorists like to understand and make logical sense of things before they leap in. They think problems through in a vertical, step-by-step logical way. They assimilate disparate facts into coherent theories.
  4. Pragmatists like practical applications of ideas, experiments, and results. They like trying out ideas, theories and techniques to see if they work in practice. They positively search out new ideas and take the first opportunity to experiment with applications.

Usually people share two or more of these qualities. The weight of each can vary depending on the context. So how might learning styles manifest themselves in web browsing behaviour?

  • Activists like to explore, learn as they go, and wander the site working it out. They need good in-context navigation to keep exploring. For example, signposts to related information are optimal for activists. They can just keep going, and going, and exploring until sated.
  • Reflectors are patient and thoughtful. They like to ponder, read, reflect, then decide. Guided tours to orientate them in emergent sites can be a great help. Saving shopping baskets for later, and remembering sessions in enacted sites can also help them.
  • Theorists want logic. Documentation. An understanding of what the site is, and what they might get from it. Clear, detailed information helps a theorist, whatever the space they’re in.
  • Pragmatists get stuck in like activists, but evaluate quickly, and test their assumptions. They are quick, and can be helped by uncluttered concise information, and contextual, logical tools.

An understanding of interactive narrative types and a bit of knowledge about learning styles can be useful concepts for us to bear in mind. I also think they warrant inclusion as part of an articulate designer’s language of web design. If Henry Jenkins is right about games designers, I think he could also be right about web designers: we are narrative architects, designing spaces where stories are told.

The original version of this article first appeared as ‘Jack A Nory’ alongside other, infinitely more excellent articles, in the New Adventures paper of January 2011. It is reproduced with the kind permission of the irrepressible Simon Collison. For a short time, the paper is still available as a PDF!

—∞—





Ampersand, the Aftermath

The first Ampersand web typography conference took place in Brighton last Friday. Ampersand was ace. I’m going to say that again with emphasis: Ampersand was ace! Like the Ready Brek kid from the 80s TV ads, I’m glowing with good vibes.

Imagine you’d just met some of the musicians that created the soundtrack to your life. That’s pretty much how I feel.

Nerves and all…

Photo by Ben Mitchell.

For a long, long time I’ve gazed across at the typography community with something akin to awe at the work they do. I’ve lurked quietly on the ATypI mailing list, in the Typophile forum, and behind the glass dividing my eyes from the blogs, portfolios, and galleries.

I always had a sneaking suspicion the web and type design communities had much in common: Excellence born from actual client work; techniques and skills refined by practice, not in a lab or classroom; a willingness to share and disseminate, most clearly demonstrated at Typophile and through web designers’ own blogs. The people of both professions have a very diverse set of backgrounds, from graphic design all the way through to engineering, or accidentally working in a print shop. We’ve been apprenticed to our work, and Ampersand was a celebration of what we’ve achieved so far and what’s yet to come.

Of course, web design is a new profession. Type design has a history that spans hundreds of years. Nevertheless, both professions are self-actualising. Few courses exist of any real merit. There is no qualifications authority. The work from both arenas succeeds or fails based on whether it works or not.

Ampersand was the first event of its kind. Folks from both communities came together around the mutual fascination, frustration, challenge and opportunity of web type.

Like Brooklyn Beta, the audience was as fantastic as the line up. I met folks like Yves Peters of the FontFeed, Mike Duggan of Microsoft Typography, Jason Smith, Phil Garnham, Fernando Mello, and Emanuela Conidi of Fontsmith, Veronica Burian of TypeTogether, Adam Twardoch of Fontlab and MyFonts, Nick Sherman of Webtype, Mandy Brown of A Book Apart and Typekit, and many, many others. (Sorry for stopping there, but wow, it would be a huge list.)

Rich Rutter

Rich Rutter opened the day on behalf of Clearleft and Fontdeck at the Brighton Dome. Rich and I had talked about a web typography conference before. He just went out and did it. Hats off to him, and people like Sophie Barrett at Clearleft who helped make the day run so smoothly.

Others have written comprehensive, insightful summaries of the day and the talks. Much better than I could, sitting there on the day, rapt, taking no notes. What follows are a few snippets my memory threw out when prodded.

Vincent Connare

Who knew the original letterforms for Comic Sans were inspired by a copy of The Watchmen Vincent Connare had in his office? Or that Vincent, who also designed Trebuchet, considers himself an engineer rather than a type designer, and is currently working on the Ubuntu fonts with colleagues at Dalton Maag?

Jason Santa Maria declared himself a type nerd, and gave a supremely detailed talk about selecting, setting, and understanding web type. Wonderful stuff.

Jason Santa Maria

Jonathan Hoefler talked in rapid, articulate, and precise terms about the work behind the upcoming release of pretty much all of H&FJ’s typefaces as web fonts. (Hooray!) He clearly and wonderfully explained how they took the idea behind their typefaces, and moved them through a design process to produce a final form for a specific purpose. In this case, the web, as a distinct and different environment from print.

Jonathan Hoefler

Photo by Sean Johnson.

I spoke between Jason and Jonathan. Gulp. After staying up until 4am the night before, anxiously working on slides, I was carried along by the privilege and joy of being there, hopefully without too much mumbling or squinting with bleary eyes.

After lunch, David Berlow continued the story of web fonts, taking us on a journey through his own trials and tribulations at Font Bureau when reproducing typefaces for the crude medium of the web. His dry, droll, richly-flavoured delivery was a humorous counterpoint to some controversial asides.

David Berlow

Photo by Jeremy Keith.

John Daggett of Mozilla, editor of the CSS3 Fonts Module, talked with great empathy for web designers about the amazing typographic advances we’re about to see in browsers.

Tim Brown of Typekit followed. Tim calmly and thoroughly advocated the extension of modular scales to all aspects of a web interface, taking values from the body type and building all elements with those values as the common denominator.
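A modular scale of the sort Tim advocates derives every measurement from the body size and a single ratio; a quick sketch (the base size and ratio here are illustrative, not from his talk):

```python
def modular_scale(base, ratio, steps):
    """Sizes above and below `base`, each one power of `ratio` apart."""
    return [round(base * ratio ** n, 2) for n in range(-steps, steps + 1)]

# Body type of 16px with a perfect-fifth ratio (1.5):
sizes = modular_scale(16, 1.5, 2)
# → [7.11, 10.67, 16.0, 24.0, 36.0]
```

Headings, captions, and even layout dimensions can then be picked from this list, so everything shares the body type as a common denominator.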

Finally, Mark Boulton wrapped up the day brilliantly, describing the designer’s role as the mitigator of entropy, reversing the natural trend for things to move from order to chaos, and a theme he’s exploring at the moment: designing from the content out.

Mark Boulton

The tone of the day was fun, thoughtful, articulate, and exacting. All the talks were a mix of anecdotal and observational humour, type nerdery, and most of all an overwhelming commitment to excellence in web typography. It was a journey in itself. Decades of experience from plate and press, screen, and web were being distilled into 45-minute presentations. I loved it.

As always, one of the most enjoyable bits for me was the hallway track. I talked to heaps of people both in the pre- and after-party, and in between the talks on the day itself. I heard stories, ideas, and opinions from print designers, web designers, type designers, font developers, and writers. We talked late into the night. We talked more the next day.

Now that the talking has paused for a while, my thoughts are manifold. I can honestly say, I’ve never been so filled with positivity about where we are, and where we’re going. Web typography is here, it works, and it gets better all the time; one day web and type designers everywhere will wonder, perplexed, as they try to imagine what the web was like before.

Here’s to another Ampersand next year! I’m now going to see if Rich needs any encouragement to do it again. I’m guessing not, but if he does, I aim to provide it, vigorously. I hope I see you there!

Furthermore

Last but not least, did I mention that Rich Rutter, Mark Boulton, and I are writing a book? We are! More on that another time, but until then, follow @webtypography for intermittent updates.





We, Who Are Web Designers

In 2003, my wife Lowri and I went to a christening party. We were friends of the hosts but we knew almost no-one else there. Sitting next to me was a thirty-something woman and her husband, both dressed in the corporate ‘smart casual’ uniform: jersey, knitwear, and ready-faded jeans for her, formal shoes and tucked-in formal shirt for him (plus the jeans of course; that’s the casual bit). Both appeared polite, neutral, and neat in every respect.

I smiled and said hello, and asked how they knew our hosts. The conversation stalled pretty quickly the way all conversations will when only one participant is engaged. I persevered, asked about their children who they mentioned, trying to be a good friend to our hosts by being friendly to other guests. It must have prompted her to reciprocate. With reluctant interest she asked the default question: ‘What do you do?’ I paused, uncertain for a second. ‘I’m a web designer’ I managed after a bit of nervous confusion at what exactly it was that I did. Her face managed to drop even as she smiled condescendingly. ‘Oh. White backgrounds!’ she replied with a mixture of scorn and delight. I paused. ‘Much of the time’, I nodded with an attempt at a self-deprecating smile, trying to maintain the camaraderie of the occasion. ‘What do you do?’ I asked, curious to see where her dismissal was coming from. ‘I’m the creative director for … agency’ she said smugly, overbearingly confident in the knowledge that she had a trump card, and had played it. The conversation was over.

I’d like to say her reaction didn’t matter to me, but it did. It stung to be regarded so disdainfully by someone who I would naturally have considered a colleague. I thought to try and explain. To mention how I started in print, too. To find out why she had such little respect for web design, but that was me wanting to be understood. I already knew why. Anything I said would sound defensive. She may have been rude, but at least she was honest.

I am a web designer. I concentrate not on the party venue, food, music, guest list, or entertainment alone, but on it all. On the feeling people enter with and walk away remembering. That’s my job. It’s probably yours too.

I’m self-actualised, without the stamp of approval from any guild, curriculum authority, or academic institution. I’m web taught. Colleague taught. Empirically taught. Tempered by over fifteen years of failed experiments on late nights with misbehaving browsers. I learnt how to create venues because none existed. I learnt what music to play for the people I wanted at the event, and how to keep them entertained when they arrived. I empathised, failed, re-empathised, and did it again. I make sites that work. That’s my certificate. That’s my validation.

I try, just like you, to imbue my practice with an abiding sense of responsibility for the universality of the Web as Tim Berners-Lee described it. After all, it’s that very universality that’s allowed our profession and the Web to thrive. From Mosaic shipping with <img> tag support in 1993, to the founding of the W3C in 1994, to the Web Standards Project in 1998, and the CSS Zen Garden in 2003, those who care have been instrumental in shaping the Web. Web designers included. In more recent times I look to the web type revolution, driven and curated by web designers, developers, and the typography community. Again, we’re teaching ourselves. The venues are open to all, and getting more amazing by the day.

Apart from the sites we’ve built, all the best peripheral resources that support our work are made by us. We’ve contributed vast amounts of code to our collective toolkit. We’ve created inspirational conferences like Brooklyn Beta, New Adventures, Web Directions, Build, An Event Apart, dConstruct, and Webstock. As a group, we’ve produced, written-for, and supported forward-thinking magazines like A List Apart, 8 Faces, Smashing Mag, and The Manual. We’ve written the books that distill our knowledge either independently or with publishers from our own community like Five Simple Steps and A Book Apart. We’ve created services and tools like jQuery, Fontdeck, Typekit, Hashgrid, Teuxdeux, and Firebug. That’s just a sample. There’s so many I haven’t mentioned. We did these things. What an extraordinary industry.

I know I flushed with anger and embarrassment that day at the christening party. Afterwards, I started to look a little deeper into what I do. I started to ask what exactly it means to be a web designer. I started to realise how extraordinary our community is. How extraordinary this profession is that we’ve created. How good the work is that we do. How delightful it is when it does work; for audiences, clients, and us. How fantastic it is that I help build the Web. Long may that feeling last. May it never go away. There’s so much still to learn, create, and make. This is our party. Hi, I’m Jon; my friends and I are making Mapalong, and I’m a web designer.





Auphonic Leveler 1.8 and Auphonic Multitrack 1.4 Updates

Today we released free updates for the Auphonic Leveler Batch Processor and the Auphonic Multitrack Processor with many algorithm improvements and bug fixes for Mac and Windows.

Changelog

  • Linear Filtering Algorithms to avoid Asymmetric Waveforms:
    New zero-phase Adaptive Filtering Algorithms to avoid asymmetric waveforms.
    In asymmetric waveforms, the positive and negative amplitude values are disproportionate - please see Asymmetric Waveforms: Should You Be Concerned?.
    Asymmetrical waveforms are quite natural and not necessarily a problem. They are particularly common in recordings of speech and vocals, and can be caused by low-end filtering. However, they limit the amount of gain that can be safely applied without introducing distortion or clipping due to aggressive limiting.
  • Noise Reduction Improvements:
    New and improved noise profile estimation algorithms and bug fixes for parallel Noise Reduction Algorithms.
  • Processing Finished Notification on Mac:
    A system notification (including a short glass sound) is now displayed on Mac OS when the Auphonic Leveler or Auphonic Multitrack has finished processing - thanks to Timo Hetzel.
  • Improved Dithering:
    Improved dithering algorithms - using SoX - if a bit-depth reduction is necessary during file export.
  • Auphonic Multitrack Fixes:
    Fixes for ducking and background tracks and for very short music tracks.
  • New Desktop Apps Documentation:
    The documentation of our desktop apps is now integrated in our new help system:
    see Auphonic Leveler Batch Processor and Auphonic Multitrack Processor.
  • Bug Fixes and Audio Algorithm Improvements:
    This release also includes many small bug fixes and all audio algorithms come with improvements and updated classifiers using the data from our Web Service.
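The zero-phase idea behind the new linear filtering can be illustrated with a tiny sketch: run a filter forward, then backward over the reversed signal, so the phase shifts cancel and the waveform shape (and its symmetry) is preserved. This is only an illustration of the technique, not Auphonic's actual implementation:

```python
def highpass_onepole(x, alpha=0.99):
    """First-order high-pass filter; removes DC and low-end content
    that can make a waveform asymmetric."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        out = alpha * (prev_y + s - prev_x)
        y.append(out)
        prev_x, prev_y = s, out
    return y

def zero_phase_highpass(x, alpha=0.99):
    """Forward pass, then a second pass over the reversed signal:
    the two phase responses cancel (zero-phase filtering)."""
    forward = highpass_onepole(x, alpha)
    return highpass_onepole(forward[::-1], alpha)[::-1]

# A signal with a constant (asymmetry-inducing) DC offset plus an
# alternating component:
signal = [0.5 + 0.1 * (-1) ** n for n in range(2000)]
filtered = zero_phase_highpass(signal)
# The offset is removed while the alternating component survives.
```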

About the Auphonic Desktop Apps

We offer two desktop programs that include only our audio algorithms. The algorithms run offline on your device and are exactly the same as those implemented in our Web Service.

The Auphonic Leveler Batch Processor is a batch audio file processor and includes all our (Singletrack) Audio Post Production Algorithms. It can process multiple productions at once.

Auphonic Multitrack includes our Multitrack Post Production Algorithms and requires multiple parallel input audio tracks, which will be analyzed and processed individually as well as combined to create one final mixdown.

Upgrade now

Everyone is encouraged to download the latest binaries:

Please let us know if you have any questions or feedback!







Facebook Live Streaming and Audio/Video Hosting connected to Auphonic

Facebook is not only a social media giant, the company also provides valuable tools for broadcasting. Today we release a connection to Facebook, which allows you to use Facebook's tools for video/audio production and publishing within Auphonic and our connected services.

The following workflows are possible with Facebook and Auphonic:
  • Use Facebook for live streaming, then import, process and distribute the audio/video with Auphonic.
  • Post your Auphonic audio or video productions directly to the news feed of your Facebook Page or User.
  • Use Facebook as a general media hosting service and share the link or embed the audio/video on any webpage (also visible to non-Facebook users).

Connect to Facebook

First, connect a Facebook account on our External Services Page: click on the "Facebook" button.

Select if you want to connect to your personal Facebook User or to a Facebook Page:

It is always possible to remove or edit the connection in your Facebook Settings (Tab Business Integrations).

Import (Live) Videos from Facebook to Auphonic

Facebook Live is an easy (and free) way to stream live videos:

We implemented an interface to use Facebook as an Incoming External Service. Please select a (live or non-live) video from your Facebook Page/User as the source of a production and then process it with Auphonic:

This workflow allows you to use Facebook for live streaming, import and process the audio/video with Auphonic, then publish a podcast and video version of your live video to any of our connected services.

Export from Auphonic to Facebook

Similar to YouTube, it is possible to use Facebook for media file hosting.
Please add your Facebook Page/User as an External Service in your Productions or Presets to upload the Auphonic results directly to Facebook:

Options for the Facebook export:
  • Distribution Settings
    • Post to News Feed: The exported video is posted directly to your news feed / timeline.
    • Exclude from News Feed: The exported video is visible in the videos tab of your Facebook Page/User (see for example Auphonic's video tab), but it is not posted to your news feed (you can do that later if you want).
    • Secret: Only you can see the exported video, it is not shown in the Facebook video tab and it is not posted to your news feed (you can do that later if you want).
  • Embeddable
    Choose if the exported video should be embeddable in third-party websites.

It is always possible to change the distribution/privacy and embeddable options later directly on Facebook. For example, you can export a video to Facebook as Secret and publish it to your news feed whenever you want.


If your production is audio-only, we automatically generate a video track from the Cover Image and (if present) Chapter Images.
Alternatively you can select an Audiogram Output File, if you want to add an Audiogram (audio waveform visualization) to your Facebook video - for details please see Auphonic Audiogram Generator.

Auphonic Title and Description metadata fields are exported to Facebook as well.
If you add Speech Recognition to your production, we create an SRT file with the speech recognition results and add it to your Facebook video as captions.
See the example below.
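For reference, SRT is a simple plain-text cue format; a minimal writer for cues of the kind speech recognition produces might look like this (the cue data below is made up for illustration):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues):
    """cues: list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

captions = to_srt([
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 5.0, "Today we talk about captions."),
])
```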

Facebook Video Hosting Example with Audiogram and Automatic Captions

Facebook can be used as a general video hosting service: even if you export videos as Secret, you will get a direct link to the video which can be shared or embedded in any third-party websites. Users without a Facebook account are also able to view these videos.

In the example below, we automatically generate an Audiogram Video for an audio-only production, use our integrated Speech Recognition system to create captions and export the video as Secret to Facebook.
Afterwards it can be embedded directly into this blog post (enable Captions if they don't show up by default) - for details please see How to embed a video:

It is also possible to just use the generated result URL from Auphonic to share the link to your video (also visible to non-Facebook users):
https://www.facebook.com/auphonic/videos/1687244844638091/

Important Note:
Facebook needs some time to process an exported video (up to a few minutes) and the direct video link won't work before the processing is finished - please try again a bit later!
On Facebook Pages, you can see the processing progress in your Video Library.

Conclusion

Facebook has many broadcasting tools to offer and is a perfect addition to Auphonic.
Both systems and our other external services can be used to create automated processing and publishing workflows. Furthermore, the export and import to/from Facebook is also fully supported in the Auphonic API.
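As a sketch of what such an automated workflow could look like via the API (the endpoint paths and JSON field names here are assumptions based on the Auphonic API style; verify them against the current API reference before use):

```python
API_BASE = "https://auphonic.com/api"  # assumed base URL of the Auphonic API

def build_production_payload(title, facebook_service_uuid):
    """Build the JSON body for a production that publishes its result
    to a connected Facebook service. Field names are assumptions and
    should be checked against the API documentation."""
    return {
        "metadata": {"title": title},
        "outgoing_services": [{"uuid": facebook_service_uuid}],
        "output_files": [{"format": "video"}],
    }

def create_and_start(session, title, facebook_service_uuid):
    """Create a production, then start processing (two POSTs).
    `session` is an authenticated HTTP session, e.g. requests.Session."""
    r = session.post(f"{API_BASE}/productions.json",
                     json=build_production_payload(title, facebook_service_uuid))
    uuid = r.json()["data"]["uuid"]
    session.post(f"{API_BASE}/production/{uuid}/start.json")
    return uuid

payload = build_production_payload("Episode 42", "fb-service-uuid")
```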

Please contact us if you have any questions or further ideas!





Auphonic Audio Inspector Release

At the Subscribe 9 Conference, we presented the first version of our new Audio Inspector:
The Auphonic Audio Inspector is shown on the status page of a finished production and displays details about what our algorithms are changing in audio files.

A screenshot of the Auphonic Audio Inspector on the status page of a finished Multitrack Production.
Please click on the screenshot to see it in full resolution!

It is possible to zoom and scroll within the audio waveforms, and the Audio Inspector can be used to manually check production results and input files.

In this blog post, we will discuss the usage and all current visualizations of the Inspector.
If you just want to try the Auphonic Audio Inspector yourself, take a look at this Multitrack Audio Inspector Example.

Inspector Usage

Control bar of the Audio Inspector with scrollbar, play button, current playback position and length, button to show input audio file(s), zoom in/out, toggle legend and a button to switch to fullscreen mode.

Seek in Audio Files
Click or tap inside the waveform to seek in files. The red playhead will show the current audio position.
Zoom In/Out
Use the zoom buttons ([+] and [-]), the mouse wheel or zoom gestures on touch devices to zoom in/out the audio waveform.
Scroll Waveforms
If zoomed in, use the scrollbar or drag the audio waveform directly (with your mouse or on touch devices).
Show Legend
Click the [?] button to show or hide the Legend, which describes details about the visualizations of the audio waveform.
Show Stats
Use the Show Stats link to display Audio Processing Statistics of a production.
Show Input Track(s)
Click Show Input to show or hide input track(s) of a production: now you can see and listen to input and output files for a detailed comparison. Please click directly on the waveform to switch/unmute a track - muted tracks are grayed out slightly:

Showing four input tracks and the Auphonic output of a multitrack production.

Please click on the fullscreen button (bottom right) to switch to fullscreen mode.
Now the audio tracks use all available screen space to see all waveform details:

A multitrack production with output and all input tracks in fullscreen mode.
Please click on the screenshot to see it in full resolution.

In fullscreen mode, it’s also possible to control playback and zooming with keyboard shortcuts:
Press [Space] to start/pause playback, use [+] to zoom in and [-] to zoom out.

Singletrack Algorithms Inspector

First, we discuss the analysis data of our Singletrack Post Production Algorithms.

The audio levels of output and input files, measured according to the ITU-R BS.1770 specification, are displayed directly as the audio waveform. Click on Show Input to see the input and output file. Only one file is played at a time, click directly on the Input or Output track to unmute a file for playback:

Singletrack Production with opened input file.
See the first Leveler Audio Example to try the audio inspector yourself.
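As a rough illustration of what a BS.1770-style level is, here is a heavily simplified mono sketch. The real measurement additionally applies a K-weighting pre-filter and level gating, which are omitted here:

```python
import math

def loudness_lufs(samples):
    """Simplified BS.1770-style loudness for one mono channel:
    -0.691 + 10*log10(mean square). The full specification first
    applies a K-weighting filter and gates out quiet blocks."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return -0.691 + 10 * math.log10(mean_square)

# A full-scale sine wave has a mean square of 0.5, i.e. roughly
# -3.7 LUFS under this simplification:
sine = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
level = loudness_lufs(sine)
```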

Waveform Segments: Music and Speech (gold, blue)
Music/Speech segments are displayed directly in the audio waveform: Music segments are plotted in gold/yellow, speech segments in blue (or light/dark blue).
Waveform Segments: Leveler High/No Amplification (dark, light blue)
Speech segments can be displayed in normal, dark or light blue: Dark blue means that the input signal was very quiet and contains speech, therefore the Adaptive Leveler has to use a high amplification value in this segment.
In light blue regions, the input signal was very quiet as well, but our classifiers decided that the signal should not be amplified (breathing, noise, background sounds, etc.).

Yellow/orange background segments display leveler fades.

Background Segments: Leveler Fade Up/Down (yellow, orange)
If the volume of an input file changes quickly, the Adaptive Leveler volume curve will increase/decrease quickly as well (= fade); such fades should be placed in speech pauses. Otherwise, if fades are too slow or occur during active speech, one will hear pumping speech artifacts.
Exact fade regions are plotted as yellow (fade up, volume increase) and orange (fade down, volume decrease) background segments in the audio inspector.

Horizontal red lines display noise and hum reduction profiles.

Horizontal Lines: Noise and Hum Reduction Profiles (red)
Our Noise and Hiss Reduction and Hum Reduction algorithms segment the audio file in regions with different background noise characteristics, which are displayed as red horizontal lines in the audio inspector (top lines for noise reduction, bottom lines for hum reduction).
Then a noise print is extracted in each region and a classifier decides if and how much noise reduction is necessary - this is plotted as a value in dB below the top red line.
The hum base frequency (50Hz or 60Hz) and the strength of all its partials are also classified in each region; the value in Hz above the bottom red line indicates the base frequency, and the absence of a bottom red line means no hum reduction is necessary.

You can try the singletrack audio inspector yourself with our Leveler, Noise Reduction and Hum Reduction audio examples.

Multitrack Algorithms Inspector

If our Multitrack Post Production Algorithms are used, additional analysis data is shown in the audio inspector.

The audio levels of the output and all input tracks are measured according to the ITU-R BS.1770 specification and are displayed directly as the audio waveform. Click on Show Input to see all the input files with track labels and the output file. Only one file is played at a time, click directly into the track to unmute a file for playback:

Input Tracks: Waveform Segments, Background Segments and Horizontal Lines
Input tracks are displayed below the output file including their track names. The same data as in our Singletrack Algorithms Inspector is calculated and plotted separately in each input track:
Output Waveform Segments: Multiple Speakers and Music
Each speaker is plotted in a separate, blue-like color - in the example above we have 3 speakers (normal, light and dark blue) and you can see directly in the waveform when and which speaker is active.
Audio from music input tracks is always plotted in gold/yellow in the output waveform, please try not to mix music and speech parts in music tracks (see also Multitrack Best Practice)!

You can try the multitrack audio inspector yourself with our Multitrack Audio Inspector Example or our general Multitrack Audio Examples.

Ducking, Background and Foreground Segments

Music tracks can be set to Ducking, Foreground, Background or Auto - for more details please see Automatic Ducking, Foreground and Background Tracks.

Ducking Segments (light, dark orange)
In Ducking, the level of a music track is reduced if one of the speakers is active, which is plotted as a dark orange background segment in the output track.
Foreground music parts, where no speaker is active and the music track volume is not reduced, are displayed as light orange background segments in the output track.
Background Music Segments (dark orange background)
Here the whole music track is set to Background and won’t be amplified when speakers are inactive.
Background music parts are plotted as dark orange background segments in the output track.
Foreground Music Segments (light orange background)
Here the whole music track is set to Foreground and its level won’t be reduced when speakers are active.
Foreground music parts are plotted as light orange background segments in the output track.

You can try the ducking/background/foreground audio inspector yourself: Fore/Background/Ducking Audio Examples.
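To make the ducking idea more concrete, here is a minimal Python sketch (a toy per-sample model with made-up names - not our actual ducking algorithm, which uses smooth gain ramps and automatic speech detection):

```python
def apply_ducking(music, speech_active, duck_gain=0.25):
    """Reduce the music level while any speaker is active.

    music         -- list of music sample values
    speech_active -- list of booleans, one per sample, True while a speaker talks
    duck_gain     -- linear gain applied during speech (0.25 is roughly -12 dB)

    This is a toy per-sample model; a real ducker smooths the gain changes
    (attack/release ramps) to avoid audible pumping.
    """
    return [s * (duck_gain if active else 1.0)
            for s, active in zip(music, speech_active)]
```

In Foreground/Background mode, the same gain would simply be held constant for the whole track instead of following speech activity.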

Audio Search, Chapters Marks and Video

Audio Search and Transcriptions
If our Automatic Speech Recognition Integration is used, a time-aligned transcription text will be shown above the waveform. You can use the search field to search and seek directly in the audio file.
See our Speech Recognition Audio Examples to try it yourself.
Chapters Marks
Chapter Mark start times are displayed in the audio waveform as black vertical lines.
The current chapter title is written above the waveform - see “This is Chapter 2” in the screenshot above.

A video production with output waveform, input waveform and transcriptions in fullscreen mode.
Please click on the screenshot to see it in full resolution.

Video Display
If you add a Video Format or Audiogram Output File to your production, the audio inspector will also show a separate video track in addition to the audio output and input tracks. The video playback will be synced to the audio of output and input tracks.

Supported Audio Formats

We use the native HTML5 audio element for playback and the aurora.js JavaScript audio decoders to support all common audio formats:

WAV, MP3, AAC/M4A and Opus
These formats are supported in all major browsers: Firefox, Chrome, Safari, Edge, iOS Safari and Chrome for Android.
FLAC
FLAC is supported in Firefox, Chrome, Edge and Chrome for Android - see FLAC audio format.
In Safari and iOS Safari, we use aurora.js to decode FLAC files directly in JavaScript, which works but uses much more CPU compared to native decoding!
ALAC
ALAC is not natively supported by any browser so far, therefore we use aurora.js to decode ALAC files directly in JavaScript. This works but uses much more CPU compared to native decoding!
Ogg Vorbis
Only supported by Firefox, Chrome and Chrome for Android - for details please see Ogg Vorbis audio format.

We suggest using a recent Firefox or Chrome browser for best performance.
Decoding FLAC and ALAC files also works in Safari and iOS with the help of aurora.js, but JavaScript decoders need a lot of CPU and sometimes have problems with exact scrolling and seeking.
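The compatibility rules above can be summarized in a small lookup table (a sketch of the 2018-era support matrix as listed here; actual browser support changes over time, so verify before relying on it):

```python
# Codecs each browser can decode natively, per the list above (2018-era).
NATIVE = {
    "firefox":        {"wav", "mp3", "aac", "opus", "flac", "vorbis"},
    "chrome":         {"wav", "mp3", "aac", "opus", "flac", "vorbis"},
    "edge":           {"wav", "mp3", "aac", "opus", "flac"},
    "safari":         {"wav", "mp3", "aac", "opus"},
    "ios_safari":     {"wav", "mp3", "aac", "opus"},
    "chrome_android": {"wav", "mp3", "aac", "opus", "flac", "vorbis"},
}
# Codecs aurora.js can decode in JavaScript when native support is missing.
JS_FALLBACK = {"flac", "alac"}

def decoder_for(browser, codec):
    """Return 'native', 'aurora.js' or 'unsupported' for a browser/codec pair."""
    if codec in NATIVE.get(browser, set()):
        return "native"
    if codec in JS_FALLBACK:
        return "aurora.js"
    return "unsupported"
```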

Please see our blog post Audio File Formats and Bitrates for Podcasts for more details about audio formats.

Mobile Audio Inspector

Multiple responsive layouts were created to optimize the screen space usage on Android and iOS devices, so that the audio inspector is fully usable on mobile devices as well: tap into the waveform to set the playhead location, scroll horizontally to scroll waveforms, scroll vertically to scroll between tracks, use zoom gestures to zoom in/out, etc.

Unfortunately the fullscreen mode is not available on iOS devices (thanks to Apple), but it works on Android and is a really great way to inspect everything using all the available screen space:

Audio inspector in horizontal fullscreen mode on Android.

Conclusion

Try the Auphonic Audio Inspector yourself: take a look at our Audio Example Page or play with the Multitrack Audio Inspector Example.

The Audio Inspector will be shown in all productions which are created in our Web Service.
It can be used to manually check production results and input files, and to send us detailed feedback about audio processing results.

Please let us know if you have any feedback or questions - more visualizations will be added in the future!








Auphonic Add-ons for Adobe Audition and Adobe Premiere

The new Auphonic Audio Post Production Add-ons for Adobe allow you to use the Auphonic Web Service directly within Adobe Audition and Adobe Premiere (Mac and Windows):

Audition Multitrack Editor with the Auphonic Audio Post Production Add-on.
The Auphonic Add-on can be embedded directly inside the Adobe user interface.


It is possible to export tracks/projects from Audition/Premiere and process them with the Auphonic audio post production algorithms (loudness, leveling, noise reduction - see Audio Examples), use our Encoding/Tagging, Chapter Marks, Speech Recognition and trigger Publishing with one click.
Furthermore, you can import the result file of an Auphonic Production into Audition/Premiere.


Download the Auphonic Audio Post Production Add-ons for Adobe:

Auphonic Add-on for Adobe Audition

Audition Waveform Editor with the Auphonic Audio Post Production Add-on.
Metadata, Marker times and titles will be exported to Auphonic as well.

Export from Audition to Auphonic

You can upload the audio of your current active document (a Multitrack Session or a Single Audio File) to our Web Service.
In case of a Multitrack Session, a mixdown will be computed automatically to create a Singletrack Production in our Web Service.
Unfortunately, it is not possible to export the individual tracks from Audition, which could otherwise be used to create Multitrack Productions.

Metadata and Markers
All metadata (see tab Metadata in Audition) and markers (see tab Marker in Audition and the Waveform Editor Screenshot) will be exported to Auphonic as well.
Marker times and titles are used to create Chapter Marks (Enhanced Podcasts) in your Auphonic output files.
Auphonic Presets
You can optionally choose an Auphonic Preset to use previously stored settings for your production.
Start Production and Upload & Edit Buttons
Click Upload & Edit to upload your audio and create a new Production for further editing. After the upload, a web browser will be started to edit/adjust the production and start it manually.
Click Start Production to upload your audio, create a new Production and start it directly without further editing. A web browser will be started to see the results of your production.
Audio Compression
Uncompressed Multitrack Sessions or audio files in Audition (WAV, AIFF, RAW, etc.) will be compressed automatically with lossless codecs to speed up the upload time without a loss in audio quality.
FLAC is used as lossless codec on Windows and Mac OS (>= 10.13), older Mac OS systems (< 10.13) do not support FLAC and use ALAC instead.
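The codec selection described above could be sketched like this (purely illustrative - the platform labels and version handling here are assumptions of this sketch, not the add-on's actual code):

```python
def lossless_upload_codec(platform, macos_version=None):
    """Pick the lossless codec used to compress uploads, per the rule above.

    platform      -- 'windows' or 'mac' (hypothetical labels for this sketch)
    macos_version -- (major, minor) tuple, e.g. (10, 13); only relevant on Mac
    """
    if platform == "mac" and macos_version is not None and macos_version < (10, 13):
        return "ALAC"  # older macOS versions lack FLAC support
    return "FLAC"
```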

Import Auphonic Productions in Audition

To import the result of an Auphonic Production into Audition, choose the corresponding production and click Import.
The result file will be downloaded from the Auphonic servers and can be used within Audition. If the production contains multiple Output File Formats, the output file with the highest bitrate (or uncompressed/lossless if available) will be chosen.
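The selection rule for the imported file could be sketched as follows (the dict field names are illustrative, not the actual Auphonic API schema):

```python
def pick_import_file(output_files):
    """Choose which production output to import: prefer an uncompressed or
    lossless file if available, otherwise the highest bitrate.

    output_files -- list of dicts like {"format": "mp3", "bitrate": 128}
                    (field names are made up for this sketch)
    """
    lossless = [f for f in output_files if f["format"] in ("wav", "flac", "alac")]
    if lossless:
        return lossless[0]
    return max(output_files, key=lambda f: f.get("bitrate", 0))
```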

Auphonic Add-on for Adobe Premiere

Premiere Video Editor with the Auphonic Audio Post Production Add-on.
The Auphonic Add-on can be embedded directly inside the Adobe Premiere user interface.

Export from Premiere to Auphonic

You can upload the audio of your current Active Sequence in Premiere to our Web Service.

We will automatically create an audio-only mixdown of all enabled audio tracks in your current Active Sequence.
Video/Image tracks are ignored: no video will be rendered or uploaded to Auphonic!
If you want to export a specific audio track, please just mute the other tracks.

Start Production and Upload & Edit Buttons
Click Upload & Edit to upload your audio and create a new Production for further editing. After the upload, a web browser will be started to edit/adjust the production and start it manually.
Click Start Production to upload your audio, create a new Production and start it directly without further editing. A web browser will be started to see the results of your production.
Auphonic Presets
You can optionally choose an Auphonic Preset to use previously stored settings for your production.
Chapter Markers
Chapter Markers in Premiere (not all the other marker types!) will be exported to Auphonic as well and are used to create Chapter Marks (Enhanced Podcasts) in your Auphonic output files.
Audio Compression
The mixdown of your Active Sequence in Premiere will be compressed automatically with lossless codecs to speed up the upload time without a loss in audio quality.
FLAC is used as lossless codec on Windows and Mac OS (>= 10.13), older Mac OS systems (< 10.13) do not support FLAC and use ALAC instead.

Import Auphonic Productions in Premiere

To import the result of an Auphonic Production into Premiere, choose the corresponding production and click Import.
The result file will be downloaded from the Auphonic servers and can be used within Premiere. If the production contains multiple Output File Formats, the output file with the highest bitrate (or uncompressed/lossless if available) will be chosen.

Installation

Install our Add-ons for Audition and Premiere directly on the Adobe Add-ons website:

Auphonic Audio Post Production for Adobe Audition:
https://exchange.adobe.com/addons/products/20433

Auphonic Audio Post Production for Adobe Premiere:
https://exchange.adobe.com/addons/products/20429

The installation requires the Adobe Creative Cloud desktop application and might take a few minutes. Please also try restarting Audition/Premiere if the installation does not work (on Windows, it was once even necessary to restart the computer to trigger the installation).


After the installation, you can start our Add-ons directly in Audition/Premiere:
navigate to Window -> Extensions and click Auphonic Post Production.

Enjoy

Thanks a lot to Durin Gleaves and Charles Van Winkle from Adobe for their great support!

Please let us know if you have any questions or feedback!








New Auphonic Privacy Policy and GDPR Compliance

The new General Data Protection Regulation (GDPR) of the European Union takes effect on May 25th, 2018. We used this opportunity to rework many of our internal data processing structures, remove unnecessary trackers and apply this strict and transparent regulation to all our customers worldwide.

Image from pixabay.com.

At Auphonic, we store as little personal information as possible about your usage and production data.
Here are a few human-readable excerpts from our privacy policy about which information we collect, how we process it, and how long and where we store it - for more details, please see our full Privacy Policy.

Information that we collect

  • Your email address when you create an account.
  • Your files, content, configuration parameters and other information, including your photos, audio or video files, production settings, metadata and emails.
  • Your tokens or authentication information if you choose to connect to any External services.
  • Your subscription plan, credits purchases and production billing history associated with your account, where applicable.
  • Your interactions with us, whether by email, on our blog or on our social media platforms.

We do not process any special categories of data (also commonly referred to as “sensitive personal data”).

How we use and process your Data

  • To authenticate you when you log on to your account.
  • To run your Productions, such that Auphonic can create new media files from your Content according to your instructions.
  • To improve our audio processing algorithms. For this purpose, you agree that your Content may be viewed and/or listened to by an Auphonic employee or any person contracted by Auphonic to work on our audio processing algorithms.
  • To connect your Auphonic account to an External service according to your instructions.
  • To develop, improve and optimize the contents, screen layouts and features of our Services.
  • To follow up on any question and request for assistance or information.

When using our Service, you fully retain any rights that you have with regards to your Content, including copyright.

How long we store your Information

Your Productions and any associated audio or video files will be permanently deleted from our servers, including all metadata and any data from external services, after 21 days (7 days for video productions).
We will, however, keep billing metadata associated with your Productions (how many hours of audio you processed) in an internal database.

Also, we might store selected audio and/or video files (or excerpts thereof) from your Content in an internal storage space for the purpose of improving our audio processing algorithms.

Other information like Presets, connected External services, Account settings etc. will be stored until you delete them or when your account is deleted.

Where we store your Data

All data that we collect from you is stored on secure servers in the European Economic Area (in Germany).

More Information and Contact

For more information please read our full Privacy Policy.

Please do not hesitate to contact us regarding any matter relating to our privacy policy and GDPR compliance!








New Auphonic Transcript Editor and Improved Speech Recognition Services

Back in late 2016, we introduced Speech Recognition at Auphonic. This allows our users to create transcripts of their recordings and, more usefully, makes podcasts searchable.
Now we have integrated two more speech recognition engines: Amazon Transcribe and Speechmatics. Whilst integrating these services, we also took the opportunity to develop a completely new Transcript Editor:

Screenshot of our Transcript Editor with word confidence highlighting and the edit bar.
Try out the Transcript Editor Examples yourself!


The new Auphonic Transcript Editor is included directly in our HTML transcript output file, displays word confidence values to instantly see which sections should be checked manually, supports direct audio playback, HTML/PDF/WebVTT export and allows you to share the editor with someone else for further editing.

The new services, Amazon Transcribe and Speechmatics, offer transcription quality improvements compared to our other integrated speech recognition services.
They also return word confidence values, timestamps and some punctuation, which is exported to our output files.

The Auphonic Transcript Editor

With the integration of the two new services offering improved recognition quality and word timestamps alongside confidence scores, we realized that we could leverage these improvements to give our users easy-to-use transcription editing.
Therefore we developed a new, open source transcript editor, which is embedded directly in our HTML output file and has been designed to make checking and editing transcripts as easy as possible.

Main features of our transcript editor:
  • Edit the transcription directly in the HTML document.
  • Show/hide word confidence, to instantly see which sections should be checked manually (if you use Amazon Transcribe or Speechmatics as speech recognition engine).
  • Listen to audio playback of specific words directly in the HTML editor.
  • Share the transcript editor with others: as the editor is embedded directly in the HTML file (no external dependencies), you can just send the HTML file to someone else to manually check the automatically generated transcription.
  • Export the edited transcript to HTML, PDF or WebVTT.
  • Completely usable on all mobile devices and desktop browsers.

Examples: Try Out the Transcript Editor

Here are two examples of the new transcript editor, taken from our speech recognition audio examples page:

1. Singletrack Transcript Editor Example
Singletrack speech recognition example from the first 10 minutes of Common Sense 309 by Dan Carlin. Speechmatics was used as speech recognition engine without any keywords or further manual editing.
2. Multitrack Transcript Editor Example
A multitrack automatic speech recognition transcript example from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1. Amazon Transcribe was used as speech recognition engine without any further manual editing.
As this is a multitrack production, the transcript includes exact speaker names as well (try to edit them!).

Transcript Editing

By clicking the Edit Transcript button, a dashed box appears around the text. This indicates that the text is now freely editable on this page. Your changes can be saved by using one of the export options (see below).
If you make a mistake whilst editing, you can simply use the undo/redo function of the browser to undo or redo your changes.


When working with multitrack productions, another helpful feature is the ability to change all speaker names at once throughout the whole transcript just by editing one instance. Simply click on a speaker title and change it to the appropriate name; this name will then appear throughout the whole transcript.

Word Confidence Highlighting

Word confidence values are shown visually in the transcript editor, highlighted in shades of red (see screenshot above). The shade of red is dependent on the actual word confidence value: The darker the red, the lower the confidence value. This means you can instantly see which sections you should check/re-work manually to increase the accuracy.

Once you have edited the highlighted text, it will be set to white again, so it’s easy to see which sections still require editing.
Use the button Add/Remove Highlighting to disable/enable word confidence highlighting.

NOTE: Word confidence values are only available from Amazon Transcribe and Speechmatics, not from our other integrated speech recognition services!
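As an illustration, a confidence-to-color mapping along these lines could look like this (the editor's actual palette may differ; this is just a sketch of the "darker red = lower confidence" rule):

```python
def confidence_shade(confidence):
    """Map a word confidence in [0, 1] to a red highlight: the lower the
    confidence, the darker the red. Fully confident or already edited words
    stay white (no highlight). Returns an (r, g, b) tuple.
    """
    if confidence >= 1.0:
        return (255, 255, 255)      # confident words are not highlighted
    level = int(255 * confidence)   # darker red as confidence drops
    return (255, level, level)
```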

Audio Playback

The button Activate/Stop Play-on-click allows you to hear the audio playback of the section you click on (by clicking directly on the word in the transcript editor).
This is helpful in allowing you to check the accuracy of certain words by being able to listen to them directly whilst editing, without having to go back and try to find that section within your audio file.

If you use an External Service in your production to export the resulting audio file, we will automatically use the exported file in the transcript editor.
Otherwise we will use the output file generated by Auphonic. Please note that this file is password protected for the current Auphonic user and will be deleted in 21 days.

If no audio file is available in the transcript editor, or cannot be played because of the password protection, you will see the button Add Audio File to add a new audio file for playback.

Export Formats, Save/Share Transcript Editor

Click on the button Export... to see all export and saving/sharing options:

Save/Share Editor
The Save Editor button stores the whole transcript editor with all its current changes into a new HTML file. Use this button to save your changes for further editing or if you want to share your transcript with someone else for manual corrections (as the editor is embedded directly in the HTML file without any external dependencies).
Export HTML / Export PDF / Export WebVTT
Use one of these buttons to export the edited transcript to HTML (for WordPress, Word, etc.), to PDF (via the browser print function) or to WebVTT (so that the edited transcript can be used as subtitles or imported in web audio players of the Podlove Publisher or Podigee).
Every export format is rendered directly in the browser, no server needed.
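For reference, here is a minimal Python sketch that renders timed text segments in the WebVTT format the editor exports (the editor's actual cue grouping and styling may differ):

```python
def to_webvtt(cues):
    """Render (start_seconds, end_seconds, text) cues as a WebVTT file string."""
    def ts(t):
        # WebVTT timestamps look like HH:MM:SS.mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return "%02d:%02d:%06.3f" % (h, m, s)

    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append("%s --> %s" % (ts(start), ts(end)))
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

The resulting file can be used as subtitles or imported into web audio players such as those of the Podlove Publisher or Podigee, as mentioned above.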

Amazon Transcribe

The first of the two new services, Amazon Transcribe, offers accurate transcriptions in English and Spanish at low costs, including keywords, word confidence, timestamps, and punctuation.

UPDATE 2019:
Amazon Transcribe offers more languages now - please see Amazon Transcribe Features!

Pricing
The free tier offers 60 minutes of free usage a month for 12 months. After that, it is billed monthly at a rate of $0.0004 per second ($1.44/h).
More information is available at Amazon Transcribe Pricing.
Custom Vocabulary (Keywords) Support
Custom Vocabulary (called Keywords in Auphonic) gives you the ability to expand and customize the speech recognition vocabulary, specific to your case (i.e. product names, domain-specific terminology, or names of individuals).
The same feature is also available in the Google Cloud Speech API.
Timestamps, Word Confidence, and Punctuation
Amazon Transcribe returns a timestamp and confidence value for each word so that you can easily locate the audio in the original recording by searching for the text.
It also adds some punctuation, which is combined with our own punctuation and formatting automatically.

The high quality (especially in combination with keywords) and low cost of Amazon Transcribe make it attractive, despite it currently supporting only two languages.
However, the processing time of Amazon Transcribe is much slower compared to all our other integrated services!

Try it yourself:
Connect your Auphonic account with Amazon Transcribe at our External Services Page.

Speechmatics

Speechmatics offers accurate transcriptions in many languages including word confidence values, timestamps, and punctuation.

Many Languages
Speechmatics’ clear advantage is the sheer number of languages it supports (all major European and some Asiatic languages).
It also has a Global English feature, which supports different English accents during transcription.
Timestamps, Word Confidence, and Punctuation
Like Amazon, Speechmatics creates timestamps, word confidence values, and punctuation.
Pricing
Speechmatics is the most expensive speech recognition service at Auphonic.
Pricing starts at £0.06 per minute of audio, purchasable in blocks of £10 or £100. This equates to a starting rate of about $4.78/h. A reduced rate of £0.05 per minute ($3.98/h) is available when purchasing £1,000 blocks.
They offer significant discounts for users requiring higher volumes; at that further reduced price point, the cost is similar to the Google Speech API or lower. If you process a lot of content, you should contact them directly at sales@speechmatics.com and say that you wish to use it with Auphonic.
More information is available at Speechmatics Pricing.

Speechmatics offers high-quality transcripts in many languages. But these features do come at a price: it is the most expensive speech recognition service at Auphonic.

Unfortunately, their existing Custom Dictionary (keywords) feature, which would further improve the results, is not yet available in the Speechmatics API.

Try it yourself:
Connect your Auphonic account with Speechmatics at our External Services Page.

What do you think?

Any feedback about the new speech recognition services, especially about the recognition quality in various languages, is highly appreciated.

We would also like to hear any comments you have on the transcript editor particularly - is there anything missing, or anything that could be implemented better?
Please let us know!







Audio Manipulations and Dynamic Ad Insertion with the Auphonic API

We are pleased to announce a new Audio Inserts feature in the Auphonic API: audio inserts are separate audio files (like intros/outros), which will be inserted into your production at a defined offset.
This blog post shows how one can use this feature for Dynamic Ad Insertion and discusses other Audio Manipulation Methods of the Auphonic API.

API-only Feature

For the general podcasting hobbyist, or even for someone producing a regular podcast, the features that are accessible via our web interface are more than sufficient.

However, some of our users, like podcasting companies who integrate our services as part of their products, asked us for dynamic ad insertions. We teamed up with them to develop a way of making this work within the Auphonic API.

We are therefore pleased to announce audio inserts - a new feature that is now part of our API. Note that it is not available through the web interface; it requires the use of our API.

Before we talk about audio inserts, let's talk about what you need to know about dynamic ad insertion!

Dynamic Ad Insertion

There are two ways of dealing with adverts within podcasts. In the first, adverts are recorded or edited into the podcast and are fixed, or baked in. The second method is to use dynamic insertion, whereby the adverts are not part of the podcast recording/file but can be inserted into the podcast afterwards, at any time.

This second approach would allow you to run new ad campaigns across your entire catalog of shows. As a podcaster this allows you to potentially generate new revenue from your old content.

As a hosting company, dynamic ad insertion allows you to choose up to date and relevant adverts across all the podcasts you host. You can make these adverts relevant by subject or location, for instance.

Your users define the times for the ads in their podcast episodes; you are then in control of the adverts you insert.

Audio Inserts in Auphonic

Whichever approach to adverts you are taking, using audio inserts can help you.

Audio inserts are separate audio files which will be inserted into your main single or multitrack production at your defined offset (in seconds).

When a separate audio file is inserted as part of your production, it creates a gap in the podcast audio file, shifting the audio back by the length of the insert. Helpfully, chapters and other time-based information like transcriptions are also shifted back when an insert is used.

The biggest advantage of this is that Auphonic will apply loudness normalization to the audio insert so, from an audio point of view, it matches the rest of the podcast.

Although created with dynamic ad insertion in mind, this feature can be used for any type of audio insert: adverts, songs, individual parts of a recording, etc. In the case of baked-in adverts, you could upload your already processed advert audio as an insert, without having to edit it into your podcast recording in a separate audio editing application.

Please note that audio inserts should already be edited and processed before being used in a production (this is usually the case with pre-recorded adverts anyway). The only algorithm Auphonic applies to an audio insert is loudness normalization, in order to match the loudness of the entire production. Auphonic does not apply any other processing (i.e. no leveling, noise reduction, etc.).
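The time-shifting behavior for chapters and other time-based metadata can be illustrated with a small Python sketch (this shows the described behavior, not Auphonic's internal implementation):

```python
def shift_times(chapters, insert_offset, insert_length):
    """Shift chapter start times that fall after an audio insert.

    chapters      -- list of (start_seconds, title) tuples
    insert_offset -- where the insert is placed, in seconds
    insert_length -- duration of the inserted audio, in seconds
    """
    return [(start + insert_length if start >= insert_offset else start, title)
            for start, title in chapters]
```

A chapter starting before the insert keeps its time; every chapter at or after the insert offset moves back by the insert's length.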

Audio Inserts Coding Example

Here is a brief overview of how to use our API for audio inserts. Be warned, this section is coding heavy, so if this isn't your thing, feel free to move along to the next section!

You can add audio insert files with a call to https://auphonic.com/api/production/{uuid}/multi_input_files.json, where uuid is the UUID of your production.
Here is an example with two audio inserts from an https URL. The offset/position in the main audio file must be given in seconds:

curl -X POST -H "Content-Type: application/json" \
    https://auphonic.com/api/production/{uuid}/multi_input_files.json \
    -u username:password \
    -d '[
            {
                "input_file": "https://mydomain.com/my_audio_insert_1.wav",
                "type": "insert",
                "offset": 20.5
            },
            {
                "input_file": "https://mydomain.com/my_audio_insert_2.wav",
                "type": "insert",
                "offset": 120.3
            }
        ]'

More details showing how to use audio inserts in our API can be seen here.

Additional API Audio Manipulations

In addition to audio inserts, using the Auphonic API offers a number of other audio manipulation options, which are not available via the web interface:

Cut start/end of audio files: See Docs
In Single-track productions, this feature allows the user to cut the start and/or the end of the uploaded audio file. Crucially, time-based information such as chapters etc. will be shifted accordingly.
Fade In/Out time of audio files: See Docs
This allows you to set the fade in/out time (in ms) at the start/end of output files. The default fade time is 100ms, but values can be set between 0ms and 5000ms.
This feature is also available in our Auphonic Leveler Desktop App.
Adding intro and outro: See Docs
Automatically add intros and outros to your main audio input file, as it is also available in our web interface.
Add multiple intros or outros: See Docs
Using our API, you can also add multiple intros or outros to a production. These intros or outros are played in series.
Overlapping intros/outros: See Docs
This feature allows intros/outros to overlap either the main audio or the following/previous intros/outros.
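As an illustration of the fade in/out option above, here is a toy Python implementation of a linear fade (the API's actual fade shape is not documented here, so treat this as an assumption):

```python
def apply_fades(samples, rate, fade_ms=100):
    """Apply a linear fade-in/fade-out of fade_ms milliseconds (default 100 ms,
    matching the API default above) to a list of audio samples.

    samples -- list of sample values
    rate    -- sample rate in Hz
    """
    n = min(int(rate * fade_ms / 1000), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        g = i / n            # gain ramps from 0 to 1
        out[i] *= g          # fade in at the start
        out[-1 - i] *= g     # fade out at the end
    return out
```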

Conclusion

If you haven't explored our API already, the new audio inserts feature allows for greater flexibility and also dynamic ad insertion.
If you offer online services to podcasters, the Auphonic API would also then allow you to pass on Auphonic's audio processing algorithms to your customers.

If this is of interest to you or you have any new feature suggestions that you feel could benefit your company, please get in touch. We are always happy to extend the functionality of our products!








Leveler Presets, LRA Target and Advanced Audio Parameters (Beta)

Lots of users have asked us for more customization and control over the sound of our audio algorithms, so today we have introduced some advanced algorithm parameters for our singletrack version in a private beta program!

The following new parameters are available:

UPDATE Nov. 2018:
We released a complete rework of the Adaptive Leveler parameters and the description here is not valid anymore!
Please see Auphonic Adaptive Leveler Customization (Beta Update)!

Please join our private beta program and let us know how you use these new features or if you need even more control!

Leveler Presets

Our Adaptive Leveler corrects level differences between speakers, between music and speech and will also apply dynamic range compression to achieve a balanced overall loudness. If you don't know about the Leveler yet, take a look at our Audio Examples.

Leveler presets are essentially complete new leveling algorithms, which we have been working on over the past few months:
our current Leveler tries to normalize all speakers to the same loudness. However, in some cases you might want larger or smaller loudness differences (dynamic range / loudness range) between the speakers and music segments, or more or less compression, etc.
For these use cases, we have developed additional Leveler Presets and the parameter Maximum Loudness Range.

The following Leveler presets are now available:
Preset Medium:
This is our current leveling algorithm as demonstrated in the Audio Examples.
Preset Hard:
The hard preset reacts faster and applies more gain and compression compared to the medium preset. It is built for recordings with extreme loudness differences, for example very quiet questions from the audience in a lecture recording, extremely soft and loud voices within one audio track, etc.
Preset Soft:
This preset reacts more slowly and applies less gain and compression compared to the medium preset. Use it if you want to keep more loudness differences (dynamic narration), if you want your voices to sound "less compressed/processed", for dynamic music (concert/classical recordings), background music, etc.
Preset Softer:
Like soft, but softer :)
Preset Speech Medium, Music Soft:
Uses the medium preset in speech segments and the soft preset in music segments. It is built for music live recordings or dynamic music mixes, where you want to amplify all speakers but keep the loudness differences within and between music segments.
Preset Medium, No Compressor:
Like the medium preset, but only (mid-term) leveling and no (short-term) compression is applied. This preset is optimal if you just use a Maximum Loudness Range Target and want to avoid any additional compression as much as possible.
Please let us know your use case, if you need more/other controls or if anything is confusing. The Leveler presets are still in private beta and can be changed as necessary!

Maximum Loudness Range (LRA) Target

The loudness range (LRA) indicates the variation of loudness over the course of a program and is measured in LU (loudness units) - for more details see Loudness Measurement and Normalization or EBU Tech 3342.

The parameter Max Loudness Range controls how much leveling is applied:
volume changes of our Adaptive Leveler will be restricted so that the loudness range of the output file is below the selected value.
High loudness range values will result in very dynamic output files, low loudness range values in compressed output audio. If the LRA value of your input file is already below the maximum loudness range value, no leveling at all will be applied.

It also matters which Leveler Preset you select: with the soft(er) presets, for example, it won't be possible to achieve very low loudness range targets.

Also, the Max Loudness Range parameter is not as precise a target value as the Loudness Target. The LRA of your output file might be off by a few LU, as it is not always reasonable to reach the exact target value.

Use Cases: The Maximum LRA parameter allows you to control the strength of our leveling algorithms, in combination with the parameter Leveler Preset. This might be used for automatic mixdowns with different LRA values for different target platforms (very compressed ones like mobile devices or Alexa, very dynamic ones like home cinema, etc.).

Maximum True Peak Level

This parameter sets the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Global Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.
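The Auto rule above is simple enough to sketch as a small helper (covering only the two documented cases, not Auphonic's actual implementation):

```python
def auto_true_peak_limit(loudness_target_lufs: float) -> float:
    """Pick a default maximum true peak level (dBTP) from the loudness
    target: -1 dBTP for -23 LUFS (EBU R128) and higher (louder) targets,
    -2 dBTP for -24 LUFS (ATSC A/85) and lower (quieter) targets."""
    return -1.0 if loudness_target_lufs >= -23.0 else -2.0

print(auto_true_peak_limit(-23.0))  # EBU R128 -> -1.0
print(auto_true_peak_limit(-24.0))  # ATSC A/85 -> -2.0
```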

The maximum true peak level parameter is already available in our desktop program.

Better Hum and Noise Reduction Controls

In addition to the parameter (Noise) Reduction Amount, we now offer two more parameters to control the combination of our Noise and Hum Reduction algorithms:
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0), if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

Noise Reduction Amount | Hum Base Frequency | Hum Reduction Amount | Resulting Behavior
Auto                   | Auto               | Auto                 | Automatic hum and noise reduction
Auto or > 0            | *                  | Disabled             | No hum reduction, only denoise
Disabled               | 50Hz               | Auto or > 0          | Force 50Hz hum reduction, no denoise
Disabled               | Auto               | Auto or > 0          | Automatic dehum, no denoise
12dB                   | 60Hz               | Auto or > 0          | Always do dehum (60Hz) and denoise (12dB)
(*: any value)
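The parameter combinations above can be expressed as a tiny decision helper (a sketch mirroring the documented behavior, not the actual implementation):

```python
def dehum_denoise_behavior(noise_amount, hum_freq, hum_amount):
    """Summarize which algorithms run for a given parameter combination.
    Values: 'auto', 'disabled', a number (dB), or '50hz'/'60hz' for the
    hum base frequency."""
    denoise = noise_amount != "disabled"
    dehum = hum_amount != "disabled"
    parts = []
    if dehum:
        freq = "auto-detected" if hum_freq == "auto" else hum_freq
        parts.append(f"dehum ({freq})")
    if denoise:
        parts.append(f"denoise ({noise_amount})")
    return " + ".join(parts) if parts else "no reduction"

print(dehum_denoise_behavior("auto", "auto", "auto"))      # automatic dehum + denoise
print(dehum_denoise_behavior("auto", "auto", "disabled"))  # denoise only
print(dehum_denoise_behavior("disabled", "60hz", 12))      # forced 60Hz dehum only
```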

Advanced Parameters Private Beta and Feedback

At the moment the advanced algorithm parameters are for beta users only. This is to allow us to get user feedback, so we can change the parameters to suit user needs.
Please let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

y6KCBI4yo0 ksIFEsmI1y BDZec2a21V i4XRGLlVm2 0UDxuS0vbu aaBxi35sKN aaiDSZUbmY bu8lPF80Ih eMsSl6Sf8K DaWpsUnyjo
2YM00m8zDW wh7K2pPmSa jCX7mMy2OJ ZGvvhzCpTF HI0lmGhjVO eXqVhN6QLU t4BH0tYcxY LMjQREVuOx emIogTCAth 0OTPNB7Coz
VIFY8STj2f eKzRSWzOyv 40cMMKKCMN oBruOxBkqS YGgPem6Ne7 BaaFG9I1xZ iSC0aNXoLn ZaS4TykKIa l32bTSBbAx xXWraxS40J
zGtwRJeAKy mVsx489P5k 6SZM5HjkxS QmzdFYOIpf 500AHHtEFA 7Kvk6JRU66 z7ATzwado6 4QEtpzeKzC c9qt9Z1YXx pGSrDzbEED
MP3JUTdnlf PDm2MOLJIG 3uDietVFSL 1i7jZX0Y9e zPkSgmAqqP 5OhcmHIZUP E0vNsPxZ4s FzTIyZIG2r 5EywA0M7r5 FMhpcFkVN5
oRLbRGcRmI 2LTh8GlN7h Cjw6Z3cveP fayCewjE55 GbkyX89Lxu 4LpGZGZGgc iQV7CXYwkH pGLyQPgaha e3lhKDRUMs Skrei1tKIa
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the advanced audio algorithm parameters:
Auphonic Algorithm Parameters Private Beta Activation








Resumable File Uploads to Auphonic

Large file uploads in a web browser are problematic, even in 2018. On a poor network connection, uploads can fail and must be retried from the start.

At Auphonic, our users have to upload large audio and video files, or multiple media files when creating a multitrack production. To minimize any potential issues, we integrated various external services which are specialized for large file transfers, like FTP, SFTP, Dropbox, Google Drive, S3, etc.

To further minimize issues, as of today we have also released resumable and chunked direct file uploads in the web browser to auphonic.com.

If you are not interested in the technical details, please just go to the section Resumable Uploads in Auphonic below.

The Problem with Large File Uploads in the Browser

On fragile mobile networks or unstable WiFi connections, file uploads are often interrupted and fail. There are also many areas in the world where connections are quite poor, which makes uploading big files frustrating.

After an interrupted file upload, the web browser must restart the whole upload, which is a problem when it happens in the middle of a 4GB video file upload on a slow connection.
Furthermore, the longer an upload takes, the more likely it is to have a network glitch interrupting the upload, which then has to be retried from the start.

The Solution: Chunked, Resumable Uploads

To avoid user frustration, we need to be able to detect network errors and potentially resume an upload without having to restart it from the beginning.

To achieve this, we have to split a file upload into smaller chunks directly within the web browser, so that these chunks can be sent to the server afterwards.
If an upload fails or the user wants to pause, it is possible to resume it later and only send those chunks that have not already been uploaded.
If there is a network interruption or change, the upload will be retried automatically.

Companies like Dropbox, Google, Amazon AWS etc. all have their own protocols and APIs for chunked uploads, but there are also some open source implementations available which offer resumable uploads:

resumable.js [link]:
"A JavaScript library providing multiple simultaneous, stable and resumable uploads via the HTML5 File API"
This solution is a JavaScript library only and requires that the protocol is implemented on the server as well.
tus.io [link]:
"Open Protocol for Resumable File Uploads"
Tus.io offers a simple, cheap and reusable stack for clients and servers (in many languages). They have a blog with further information about resumable uploads, see tus blog.
plupload [link]:
A JavaScript library, similar to resumable.js, which requires a separate server implementation.

We chose to use resumable.js and developed our own server implementation.
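To make the resume logic concrete, here is a minimal, framework-free sketch of the server-side bookkeeping that the resumable.js protocol implies (an illustration only, not our actual server implementation): the client first asks whether a chunk already exists, then uploads only the missing chunks, in any order.

```python
class ChunkStore:
    """Server-side bookkeeping for one resumable, chunked upload."""

    def __init__(self, identifier: str, total_chunks: int):
        self.identifier = identifier
        self.total_chunks = total_chunks
        self.chunks = {}  # chunk number (1-based) -> bytes

    def has_chunk(self, number: int) -> bool:
        # resumable.js issues a test request per chunk; answering
        # "yes" here lets the client skip it after a resume.
        return number in self.chunks

    def put_chunk(self, number: int, data: bytes) -> None:
        self.chunks[number] = data

    def is_complete(self) -> bool:
        return len(self.chunks) == self.total_chunks

    def assemble(self) -> bytes:
        assert self.is_complete()
        return b"".join(self.chunks[i] for i in range(1, self.total_chunks + 1))

# Simulate an interrupted upload of three chunks:
store = ChunkStore("demo-file", total_chunks=3)
store.put_chunk(1, b"aaa")
store.put_chunk(2, b"bbb")
# ... connection drops; on resume, the client only sends what is missing:
missing = [n for n in range(1, store.total_chunks + 1) if not store.has_chunk(n)]
print(missing)  # [3]
store.put_chunk(3, b"ccc")
print(store.assemble())  # b'aaabbbccc'
```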

Resumable Uploads in Auphonic

If you upload files to a singletrack or multitrack production, you will see the upload progress bar and a pause button, which is one way to pause and resume an upload:

It is also possible to close the browser completely or shut down your computer during the upload, then edit the production and upload the file again later. This will just resume the file upload from the position where it was stopped before.
(Previously uploaded chunks are saved for 24h on our servers, after that you have to start the whole upload again.)

In case of a network problem or if you switch to a different connection, we will resume the upload automatically.
This should solve many problems which were reported by some users in the past!

You can of course also use any of our external services for stable incoming and outgoing file transfers!

Do you still have Uploading Issues?

We hope that uploads to Auphonic are much more reliable now, even on poor connections.

If you still experience any problems, please let us know.
We are very happy about any bug reports and will do our best to fix them!








Auphonic Adaptive Leveler Customization (Beta Update)

In late August, we launched the private beta program of our advanced audio algorithm parameters. After feedback by our users and many new experiments, we are proud to release a complete rework of the Adaptive Leveler parameters:

In the previous version, we based our Adaptive Leveler parameters on the Loudness Range descriptor (LRA), which is included in the EBU R128 specification.
Although it worked, it turned out to be very difficult to set a loudness range target for diverse audio content, which includes speech, background sounds, music parts, etc. The results were not predictable and it was hard to find good target values.
Therefore we developed our own algorithm to measure the dynamic range of audio signals, which works similarly for speech, music and other audio content.

The following advanced parameters for our Adaptive Leveler allow you to customize which parts of the audio should be leveled (foreground, all, speech, music, etc.), how much they should be leveled (dynamic range), and how much micro-dynamics compression should be applied.

To try out the new algorithms, please join our private beta program and let us know your feedback!

Leveler Preset

The Leveler Preset defines which parts of the audio should be adjusted by our Adaptive Leveler:

  • Default Leveler:
    Our classic, default leveling algorithm as demonstrated in the Leveler Audio Examples. Use it if you are unsure.
  • Foreground Only Leveler:
    This preset reacts slower and levels foreground parts only. Use it if you have background speech or background music, which should not be amplified.
  • Fast Leveler:
    A preset which reacts much faster. It is built for recordings with fast and extreme loudness differences, for example, to amplify very quiet questions from the audience in a lecture recording, to balance fast-changing soft and loud voices within one audio track, etc.
  • Amplify Everything:
    Amplify as much as possible. Similar to the Fast Leveler, but also amplifies non-speech background sounds like noise.

Leveler Dynamic Range

Our default Leveler tries to normalize all speakers to a similar loudness so that a consumer in a car or subway doesn't feel the need to reach for the volume control.
However, in other environments (living room, cinema, etc.) or in dynamic recordings, you might want more level differences (Dynamic Range, Loudness Range / LRA) between speakers and within music segments.

The parameter Dynamic Range controls how much leveling is applied: Higher values result in more dynamic output audio files (less leveling). If you want to increase the dynamic range by 3dB (or LU), just increase the Dynamic Range parameter by 3dB.
We also like to call this Loudness Comfort Zone: above a maximum and below a minimum possible level (the comfort zone), no leveling is applied. So if your input file already has a small dynamic range (is within the comfort zone), our leveler will be just bypassed.
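The comfort zone idea can be sketched numerically (an illustrative toy model, not Auphonic's actual leveling algorithm): a segment whose loudness stays within half the allowed dynamic range around the programme loudness is left untouched; anything outside is pulled back to the edge of the zone.

```python
def leveler_gain_db(segment_lufs: float, program_lufs: float,
                    dynamic_range_db: float) -> float:
    """Gain (in dB) a comfort-zone leveler would apply to one segment."""
    half = dynamic_range_db / 2.0
    offset = segment_lufs - program_lufs
    if offset > half:        # too loud -> attenuate down to the zone edge
        return half - offset
    if offset < -half:       # too soft -> amplify up to the zone edge
        return -half - offset
    return 0.0               # inside the comfort zone: leveler bypassed

print(leveler_gain_db(-12.0, -18.0, 6.0))  # 6 LU above, zone is +/-3 -> -3.0
print(leveler_gain_db(-17.0, -18.0, 6.0))  # inside the zone -> 0.0
```

Increasing the Dynamic Range parameter widens the zone, so fewer segments are touched and the output stays more dynamic.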

Example Use Cases:
Higher dynamic range values should be used if you want to keep more loudness differences in dynamic narration or dynamic music recordings (live concert/classical).
It is also possible to utilize this parameter to generate automatic mixdowns with different loudness range (LRA) values for different target environments (very compressed ones like mobile devices or Alexa, very dynamic ones like home cinema, etc.).

Compressor

Controls Micro-Dynamics Compression:
The compressor reduces the volume of short and loud spikes like "p", "t" or laughter (short-term dynamics) and also shapes the sound of your voice (it will sound more or less "processed").
The Leveler, on the other hand, adjusts mid-term level differences, as done by a sound engineer, using the faders of an audio mixer, so that a listener doesn't have to adjust the playback volume all the time.
For more details please see Loudness Normalization and Compression of Podcasts and Speech Audio.

Possible values are:
  • Auto:
    The compressor setting depends on the selected Leveler Preset. Medium compression is used in Foreground Only and Default Leveler presets, Hard compression in our Fast Leveler and Amplify Everything presets.
  • Soft:
    Uses less compression.
  • Medium:
    Our default setting.
  • Hard:
More compression, which especially tries to compress short and extreme level overshoots. Use this preset if you want your voice to sound very processed, or if you have extreme and fast-changing level differences.
  • Off:
    No short-term dynamics compression is used at all, only mid-term leveling. Switch off the compressor if you just want to adjust the loudness range without any additional micro-dynamics compression.

Separate Music/Speech Parameters

Use the switch Separate Music/Speech Parameters (top right) to control all Adaptive Leveler details separately for speech and music segments:

For dialog intelligibility improvements in films and TV, it is important that the speech/dialog level and loudness range is not too soft compared to the overall programme level and loudness range. This parameter allows you to use more leveling in speech parts while keeping music and FX elements less processed.
Note: Speech, music and overall loudness and loudness range of your production are also displayed in our Audio Processing Statistics!

Example Use Cases:
Music live recordings or dynamic music mixes, where you want to amplify all speakers (speech dynamic range should be small) but keep the dynamic range within and between music segments (music dynamic range should be high).
Dialog intelligibility improvements for films and TV, without affecting music and FX elements.

Other Advanced Audio Algorithm Parameters

We also offer advanced audio parameters for our Noise, Hum Reduction and Global Loudness Normalization algorithms:

For more details, please see the Advanced Audio Algorithms Documentation.

Want to know more?

If you want to know more details about our advanced algorithm parameters (especially the leveler parameters), please listen to the following podcast interview with Chris Curran (Podcast Engineering School):
Auphonic’s New Advanced Features, with Georg Holzmann – PES 108

Advanced Parameters Private Beta and Feedback

At the moment the advanced algorithm parameters are for beta users only. This is to allow us to get user feedback, so we can change the parameters to suit user needs.
Please let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

jbwCVpLYrl 6zmLqq8o3z RXYIUbC6al QDmIZLuPKa JIrnGRZBgl SWQOWeZOBD ISeBCA9gTy w5FdsyhZVI qWAvANQ5mC twOjdHrit3
KwnL2Le6jB 63SE2V54KK G32AULFyaM 3H0CLYAwLU mp1GFNVZHr swzvEBRCVa rLcNJHUNZT CGGbL0O4q1 5o5dUjruJ9 hAggWBpGvj
ykJ57cFQSe 0OHAD2u1Dx RG4wSYTLbf UcsSYI78Md Xedr3NPCgK mI8gd7eDvO 0Au4gpUDJB mYLkvKYz1C ukrKoW5hoy S34sraR0BU
J2tlV0yNwX QwNdnStYD3 Zho9oZR2e9 jHdjgUq420 51zLbV09p4 c0cth0abCf 3iVBKHVKXU BK4kTbDQzt uTBEkMnSPv tg6cJtsMrZ
BdB8gFyhRg wBsLHg90GG EYwxVUZJGp HLQ72b65uH NNd415ktFS JIm2eTkxMX EV2C5RAUXI a3iwbxWjKj X1AT7DCD7V y0AFIrWo5l
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the advanced audio algorithm parameters:
Auphonic Algorithm Parameters Private Beta Activation








More Languages for Amazon Transcribe Speech Recognition

Until recently, Amazon Transcribe supported speech recognition in English and Spanish only.
Now they included French, Italian and Portuguese as well - and a few other languages (including German) are in private beta.

Update March 2019:
Now Amazon Transcribe supports German and Korean as well.

The Auphonic Audio Inspector on the status page of a finished Multitrack Production including speech recognition.


Amazon Transcribe is integrated as speech recognition engine within Auphonic and offers accurate transcriptions (compared to other services) at low costs, including keywords / custom vocabulary support, word confidence, timestamps, and punctuation.
See the following AWS blog post and video for more information about recent Amazon Transcribe developments: Transcribe speech in three new languages: French, Italian, and Brazilian Portuguese.

Amazon Transcribe is also a perfect fit if you want to use our Transcript Editor because you will be able to see word timestamps and confidence values to instantly check which section/words should be corrected manually to increase the transcription accuracy:


Screenshot of our Transcript Editor with word confidence highlighting and the edit bar.

These features are also available if you use Speechmatics, but unfortunately not in our other integrated speech recognition services.

About Speech Recognition within Auphonic

Auphonic has built a layer on top of a few external speech recognition services to make audio searchable:
Our classifiers generate metadata during the analysis of an audio signal (music segments, silence, multiple speakers, etc.) to divide the audio file into small and meaningful segments, which are processed by the speech recognition engine. The results from all segments are then combined, and meaningful timestamps, simple punctuation and structuring are added to the resulting text.

To learn more about speech recognition within Auphonic, take a look at our Speech Recognition and Transcript Editor help pages or listen to our Speech Recognition Audio Examples.

A comparison table of our integrated services (price, quality, languages, speed, features, etc.) can be found here: Speech Recognition Services Comparison.

Conclusion

We hope that Amazon and others will continue to add new languages, to get accurate and inexpensive automatic speech recognition in many languages.

Don't hesitate to contact us if you have any questions or feedback about speech recognition or our transcript editor!







Advanced Multitrack Audio Algorithms Release (Beta)

Last weekend, at the Subscribe10 conference, we released Advanced Audio Algorithm Parameters for Multitrack Productions:

We launched our advanced audio algorithm parameters for Singletrack Productions last year. Now these settings (and more) are available for Multitrack Algorithms as well, which gives you detailed control for each track of your production.

The following new parameters are available:

Please join our private beta program and let us know how you use these new features or if you need even more control!

Fore/Background Settings

The parameter Fore/Background controls whether a track should be in foreground, in background, ducked, or unchanged, which is especially important for music or clip tracks.
For more details, please see Automatic Ducking, Foreground and Background Tracks .

We have now added the new option Unchanged and a new parameter to set the level of background segments/tracks:
Unchanged (Foreground):
We sometimes received complaints from users producing very complex music or clip tracks that Auphonic changes their levels too much.
If you set the parameter Fore/Background to the new option Unchanged (Foreground), level relations within this track won’t be changed at all. It will be added to the final mixdown so that foreground/solo parts of this track will be as loud as (foreground) speech from other tracks.
Background Level:
It is now possible to set the level of background segments/tracks (compared to foreground segments) in background and ducking tracks. By default, background and ducking segments are 18dB softer than foreground segments.
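For reference, the 18dB default offset translates to a linear amplitude factor via the standard dB conversion, roughly an eighth of the foreground amplitude:

```python
def db_to_gain(db: float) -> float:
    """Convert a level offset in dB to a linear amplitude factor."""
    return 10 ** (db / 20.0)

# Default background level: 18 dB below foreground segments.
print(round(db_to_gain(-18.0), 4))  # 0.1259
```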

Leveler Parameters

Similar to our Singletrack Advanced Leveler Parameters (see this previous blog post), we have now also released leveling parameters for Multitrack Productions.
The following advanced parameters for our Multitrack Adaptive Leveler can be set for each track and allow you to customize which parts of the audio should be leveled, how much they should be leveled, how much dynamic range compression should be applied and to set the stereo panorama (balance):

Leveler Preset:
Select the Speech or Music Leveler for this track.
If set to Automatic (default), a classifier will decide if this is a music or speech track.
Dynamic Range:
The parameter Dynamic Range controls how much leveling is applied: Higher values result in more dynamic output audio files (less leveling). If you want to increase the dynamic range by 3dB (or LU), just increase the Dynamic Range parameter by 3dB.
For more details, please see Multitrack Leveler Parameters.
Compressor:
Select a preset for Micro-Dynamics Compression: Auto, Soft, Medium, Hard or Off.
The Compressor adjusts short-term dynamics, whereas the Leveler adjusts mid-term level differences.
For more details, please see Multitrack Leveler Parameters.
Stereo Panorama (Balance):
Change the stereo panorama (balance for stereo input files) of the current track.
Possible values: L100, L75, L50, L25, Center, R25, R50, R75 and R100.

If you understand German and want to know more about our Advanced Leveler Parameters and audio dynamics in general, watch our talk at the Subscribe10 conference:
Video: Audio Lautheit und Dynamik.

Better Hum and Noise Reduction Controls

We now offer three parameters to control the combination of our Multitrack Noise and Hum Reduction Algorithms for each input track:
Noise Reduction Amount:
Maximum noise and hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides if and how much noise reduction is necessary (to avoid artifacts). Set to a custom (non-Auto) value if you prefer more noise reduction or want to bypass our classifier.
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0), if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

Noise Reduction Amount | Hum Base Frequency | Hum Reduction Amount | Resulting Behavior
Auto                   | Auto               | Auto                 | Automatic hum and noise reduction
Auto or > 0            | *                  | Disabled             | No hum reduction, only denoise
Disabled               | 50Hz               | Auto or > 0          | Force 50Hz hum reduction, no denoise
Disabled               | Auto               | Auto or > 0          | Automatic dehum, no denoise
12dB                   | 60Hz               | Auto or > 0          | Always do dehum (60Hz) and denoise (12dB)
(*: any value)

Maximum True Peak Level

In the Master Algorithm Settings of your multitrack production, you can set the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.

Full API Support

All advanced algorithm parameters, for Singletrack and Multitrack Productions, are available in our API as well, which allows you to integrate them into your scripts, external workflows and third-party applications.

Singletrack API:
Documentation on how to use the advanced algorithm parameters in our singletrack production API: Advanced Algorithm Parameters
Multitrack API:
Documentation of advanced settings for each track of a multitrack production:
Multitrack Advanced Audio Algorithm Settings
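As a sketch of what an API call might look like, the snippet below builds a JSON payload for the documented productions endpoint (https://auphonic.com/api/productions.json). The advanced-parameter field names shown here (`leveler_preset`, `maximum_lra`) are illustrative assumptions; consult the Advanced Algorithm Parameters documentation linked above for the real field names.

```python
import json

def build_production_request(title: str, leveler_preset: str,
                             maximum_lra: int) -> dict:
    """Assemble a production payload with advanced leveler settings.
    NOTE: 'leveler_preset' and 'maximum_lra' are assumed field names."""
    return {
        "metadata": {"title": title},
        "algorithms": {
            "leveler": True,
            "leveler_preset": leveler_preset,  # assumed field name
            "maximum_lra": maximum_lra,        # assumed field name
        },
    }

payload = build_production_request("Advanced parameters demo", "foreground", 8)
print(json.dumps(payload, indent=2))
# POST this JSON (with HTTP basic auth or an OAuth bearer token) to
# https://auphonic.com/api/productions.json to create the production.
```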

Join the Beta and Send Feedback

Please join our beta and let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

8tZPc3T9pH VAvO8VsDg9 0TwKXBW4Ni kjXJMivtZ1 J9APmAAYjT Zwm6HabuFw HNK5gF8FR5 Do1MPHUyPW CTk45VbV4t xYOzDkEnWP
9XE4dZ0FxD 0Sl3PxDRho uSoRQxmKPx TCI62OjEYu 6EQaPYs7v4 reIJVOwIr8 7hPJqZmWfw kti3m5KbNE GoM2nF0AcN xHCbDC37O5
6PabLBRm9P j2SoI8peiY olQ2vsmnfV fqfxX4mWLO OozsiA8DWo weJw0PXDky VTnOfOiL6l B6HRr6gil0 so0AvM1Ryy NpPYsInFqm
oFeQPLwG0k HmCOkyaX9R G7DR5Sc9Kv MeQLSUCkge xCSvPTrTgl jyQKG3BWWA HCzWRxSrgW xP15hYKEDl 241gK62TrO Q56DHjT3r4
9TqWVZHZLE aWFMSWcuX8 x6FR5OTL43 Xf6tRpyP4S tDGbOUngU0 5BkOF2I264 cccHS0KveO dT29cF75gG 2ySWlYp1kp iJWPhpAimF
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the Multitrack Advanced Audio Algorithm Parameters:
Auphonic Algorithm Parameters Private Beta Activation








The New Loudness Target War

In the classic loudness war, music and radio producers tried to make their recordings as loud as possible, and loudness normalization was introduced to stop that. Now we can see the start of a new loudness target war, where podcasters set their loudness targets higher and higher, mainly triggered by the high target recommendations of platforms like Spotify or Amazon Alexa.
In this article, we will show how to resist the loudness target war and still be compliant with major platforms.

Resist the loudness target war! (Photo by Nayani Teixeira)

What's the problem?

“Two or three years ago it seemed that many stations were finally realizing that better radio could improve ratings. And the major myth brought over from AM radio – that a louder signal, regardless of quality, attracts more listeners – appeared to be losing its strength,” writes Robert Orban. The times of excessively compressed audio, putting loudness over sound quality, were coming to an end. We were hoping the same when we wrote about the CALM Act and EBU R128 in 2012. Those measures were meant to make programs sound more evened out and set a standard for a reasonable loudness level.

Except, Orban's article was published in 1979 (PDF, p. 60 ff.), and he concludes: "The loudness war has escalated, and quality is once again being sacrificed." 40 years later, a new loudness target war is emerging. While, yes, radio and TV stations have widely adopted the new standards, prominent competitors in the audio market are pushing for higher loudness targets once again.

Loudness war: the trend of increasing audio levels in recorded music over time. (Screenshot delamar.de)

Why LOUDER is not better

Historically, the loudness war escalated with the advent of digital technology. Peak level normalization and quasi peak program meters (QPPM) encouraged producers to push audio signals to the limit. Just shy of clipping, signals could now be compressed to the highest possible levels, using multiband compressors and limiters. This lifted quieter signals up, transforming waveforms into bricks, and marketers believed that louder songs on CDs and almost-yelling voices on FM radio would attract listeners. On the other hand, reduced dynamics make audio less interesting and can lead to listener fatigue, as Rip Rowan pointedly illustrated in his 2002 article "Over the Limit":

WHY IS THE LOUDER IS BETTER APPROACH THE WRONG APPROACH? BECAUSE WHEN ALL OF THE SIGNAL IS AT THE MAXIMUM LEVEL, THEN THERE IS NO WAY FOR THE SIGNAL TO HAVE ANY PUNCH. THE WHOLE THING COMES SCREAMING AT YOU LIKE A MESSAGE IN ALL CAPITAL LETTERS. AS WE ALL KNOW, WHEN YOU TYPE IN ALL CAPITAL LETTERS THERE ARE NO CUES TO HELP THE BRAIN MAKE SENSE OF THE SIGNAL, AND THE MIND TIRES QUICKLY OF TRYING TO PROCESS WHAT IS, BASICALLY, WHITE NOISE. LIKEWISE, A SIGNAL THAT JUST PEGS THE METERS CAUSES THE BRAIN TO REACT AS THOUGH IT IS BEING FED WHITE NOISE. WE SIMPLY FILTER IT OUT AND QUIT TRYING TO PROCESS IT.

Hence, many spoken word producers and broadcasters luckily wised up and committed to new standards, based on loudness normalization instead of peak normalization. LUFS, Loudness Units relative to Full Scale, is a unit that measures an audio track's average loudness. All segments of a program can then be normalized to a certain LUFS value. As we have discussed before, -23 LUFS is now the standard for broadcasters of the EBU, which for example has led to advertising segments no longer being much louder than the rest of any particular program. For a short while, it seemed like a peace agreement, or at least a truce, had been achieved in the loudness war.

Human loudness perception is based on average levels instead of peak levels. (Screenshot theproaudiofiles.com)

Loudness Targets and Dynamic Range

However, while LUFS adoption is increasing across the industry, this doesn't mean that people have stopped trying to be louder than everybody else. And indeed, -23 LUFS is not a one-size-fits-all value. We have recommended -16 LUFS for podcasts ourselves. The loudness of a production played in a cinema should differ from one made for earphones and noisy listening environments. Some headphones expect a louder signal and don't have enough gain to work with -23 LUFS under all circumstances.

The closer a production gets to 0 LUFS though, the less dynamic range can be reproduced. The dynamic range also depends on the maximum True Peak level (dBTP), which limits the highest possible peak value. For instance, the aforementioned EBU R128 standard of -23 LUFS also defines a maximum of -1 dBTP. The difference between the dBTP and LUFS levels is called the Peak to Loudness Range (PLR). For EBU R128 the PLR is 22 LU (Loudness Units); a podcast with -16 LUFS and -1 dBTP has a PLR of 15 LU. Thus, the PLR is a measure of the maximum possible dynamic range of an audio production.
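The PLR arithmetic above is simple enough to express directly:

```python
def peak_to_loudness_range(loudness_lufs: float, true_peak_dbtp: float) -> float:
    """PLR = maximum true peak level minus programme loudness, in LU:
    the headroom available for dynamics."""
    return true_peak_dbtp - loudness_lufs

print(peak_to_loudness_range(-23.0, -1.0))  # EBU R128 -> 22.0 LU
print(peak_to_loudness_range(-16.0, -1.0))  # podcast  -> 15.0 LU
print(peak_to_loudness_range(-14.0, -2.0))  # Alexa    -> 12.0 LU
```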

Platforms: Apple, Google, Amazon, and Spotify

Our already pretty high recommendation of -16 LUFS, -1 dBTP for mobile listening was also adopted by Apple's best practices for podcasts and recommendations for Google Assistant. Some are pushing it too far, though. A daily updated analysis by podnews shows that many podcasts are much louder than -16 LUFS.

Distribution of loudness across a selection of podcasts. (Screenshot podnews.net)

This might in part be due to specs published by other competitors in the audio space:

Amazon, for instance, recommends -14 LUFS at -2 dBTP for Alexa skills, meaning a PLR of only 12 LU. While this might work for Alexa's synthesized voice, which doesn't have much variability, it produces a dynamic range too low for spoken word content.
However, Amazon also says that a skill may be rejected if the program loudness is lower than -19 LUFS or higher than -9 LUFS, so a target of -16 LUFS is perfectly fine, which merely means that Alexa's robot voice is 2 LU louder than the audio content.

Spotify normalizes audio to the equivalent of about -14 LUFS, -1 dBTP, but they still use ReplayGain, which is not exactly the same as LUFS but gives similar results. Spotify mainly decreases the volume of overly loud productions, but can also increase the volume on some playback devices if the audio is much too soft.
For pop music, -14 LUFS is acceptable, but for podcasts or classical music, it is too high. However, (pop) music can always be played a bit louder compared to speech, therefore a loudness target of -16 LUFS for podcasts is fine on Spotify as well.
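A sketch of such a platform-side normalization step (illustrative only; Spotify's actual ReplayGain-based processing differs in detail): overly loud content is always turned down to the target, while quiet content is only turned up on playback devices that allow boosting.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float = -14.0,
                          allow_boost: bool = False) -> float:
    """Gain (in dB) a platform would apply to reach its loudness target."""
    gain = target_lufs - measured_lufs
    if gain > 0 and not allow_boost:
        return 0.0  # quiet audio stays quiet on non-boosting devices
    return gain

print(normalization_gain_db(-9.0))                     # loud mix -> -5.0
print(normalization_gain_db(-19.0))                    # quiet, no boost -> 0.0
print(normalization_gain_db(-19.0, allow_boost=True))  # boosting device -> 5.0
```

The takeaway: mastering a podcast louder than the platform target simply gets turned back down, so nothing is gained by exceeding -16 LUFS.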

Make LUFS, not war

If producers just use the highest recommended target and therefore the loudness (target) war goes into another battle, we will hear more compressed, distorted voices once again, lacking emotion and many subtleties that are reflected in the dynamic range of how we speak.

Setting a loudness target higher than -16 LUFS does not improve the listening experience in any way. However, many productions would benefit from sensitively adjusting differences in dynamics throughout the production, lifting up quieter segments, lowering loud segments, and treating speech and music differently. As pointed out in a presentation recently (video, in German), you can do that directly in your DAW or use a leveler to automate dynamics processing.

Instead of just raising the loudness target, adjusting differences in levels can help to make the listening experience much more enjoyable.

Recent album releases suggest that the music industry is still in the middle of the loudness war, often limiting the peak to loudness range to single digit LU values. (By the way, your vinyl-buying friend is right, vinyl records do sound better because they don't work with the highest compression rates.) Hopefully, podcasters, radio producers and other spoken word artists, as well as the platforms that host and publish their productions, can resist the temptation of louder and louder audio.
Make (no more than -16) LUFS, not war!

Conclusion

With some podcasts, smart speakers and streaming platforms trying to be louder than the competition, the listening experience deteriorates. However, podcasts can sound great and loud enough even in noisy environments, when well produced:

  • Never set a loudness target higher than -16 LUFS for spoken word audio.
  • If your audio is too quiet, try lifting up quieter sections and reducing the volume of louder sections, directly in your DAW or by using a leveler.
  • These settings will work fine on all current platforms, including Amazon Alexa and Spotify.
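The second tip above, lifting quiet sections and lowering loud ones, can be sketched as a per-segment gain toward the target. A real leveler computes a continuous gain envelope and treats speech and music differently; the segment loudness values here are hypothetical:

```python
def level_segments(segment_lufs, target=-16.0, max_gain_db=6.0):
    """Return a gain (dB) per segment that nudges each one toward the
    loudness target, clamped so quiet passages (and their noise floor)
    aren't boosted without limit."""
    gains = []
    for loudness in segment_lufs:
        gain = target - loudness
        gains.append(max(-max_gain_db, min(max_gain_db, gain)))
    return gains

# Hypothetical measured segments: music (-12), voice (-16), quiet voice (-24)
gains = level_segments([-12.0, -16.0, -24.0])
```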

Resist the loudness target war!







r

Dynamic Range Processing in Audio Post Production

If listeners find themselves using the volume up and down buttons a lot, level differences within your podcast or audio file are too big.
In this article, we discuss why audio dynamic range processing (or leveling) is more important than loudness normalization, why it depends on factors like the listening environment and the individual character of the content, and why the loudness range descriptor (LRA) is only reliable for speech programs.

Photo by Alexey Ruban.

Why loudness normalization is not enough

Everybody who has lived in an apartment building knows the problem: you want to enjoy a movie late at night, but you're constantly on the edge - not only because of the thrilling story, but because your index finger is hovering over the volume down button of your remote. The next loud sound effect is going to come sooner rather than later, and you want to avoid waking up your neighbors with some gunshot sounds blasting from your TV.

In our previous post, we talked about the overall loudness of a production. While that's certainly important to keep in mind, the loudness target is only an average value, ignoring how much the loudness varies within a production. The loudness target of your movie might be in the ideal range, yet the level differences between a gunshot and someone whispering can still be enormous - having you turn the volume down for the former and up for the latter.

While the average loudness might be perfect, level differences can lead to an unpleasant listening experience.

Of course, this doesn't apply to movies alone. The image above shows a podcast or radio production. The loud section is music, the very quiet section just breathing, and the remaining sections are different voices.

To be clear, we're not saying that the above example is problematic per se. There are many situations where a big difference in levels - a high dynamic range - is justified: for instance, in a movie theater, optimized for listening and without any outside noise, or in classical music.
Also, if the dynamic range is too small, listening can be tiring.

But if you watch the same movie in an outdoor screening in the summer on a beach next to the crashing waves or in the middle of a noisy city, it can be tricky to hear the softer parts.
Spoken word usually has a smaller dynamic range, and if you produce your podcast for a target audience of train or car commuters, the dynamic range should be even smaller, adjusting for the listening situation.

Therefore, hitting the loudness target has less impact on the listening experience than level differences (dynamic range) within one file!
What makes a suitable dynamic range does not only depend on the listening environment, but also on the nature of the content itself. If the dynamic range is too small, the audio can be tiring to listen to, whereas more variability in levels can make a program more interesting, but might not work in all environments, such as a noisy car.

Dynamic range experiment in a car

Wolfgang Rein, audio technician at SWR, a public broadcaster in Germany, did an experiment to test how drivers react to programs with different dynamic ranges. They monitored to what level drivers set the car stereo depending on speed (thus noise level) and audio dynamic range.
While the results are preliminary, it seems like drivers set the volume as low as possible so that they can still understand the content, but don't get distracted by loud sounds.

As drivers adjust the volume to the loudest voice in a program, they won't understand quieter speakers in content with a high dynamic range anymore. To some degree and for short periods of time, they can compensate by focusing more on the radio program, but over time that's tiring. Therefore, if the loudness varies too much, drivers tend to switch to another program rather than adjusting the volume.
Similar results have been found in a study conducted by NPR Labs and Towson University.

On the other hand, the perception was different in pure music programs. When drivers set the volume according to louder parts, they weren't able to hear softer segments or the beginning of a song very well. But that did not matter to them as much and didn't make them want to turn up the volume or switch the program.

Listener's reaction in response to frequent loudness changes. (from John Kean, Eli Johnson, Dr. Ellyn Sheffield: Study of Audio Loudness Range for Consumers in Various Listening Modes and Ambient Noise Levels)

Loudness comfort zone

The reaction of drivers to variable loudness hints at something that BBC sound engineer Mike Thornton calls the loudness comfort zone.

Tests (...) have shown that if the short-term loudness stays within the "comfort zone" then the consumer doesn’t feel the need to reach for the remote control to adjust the volume.
In a blog post, he highlights how the series Blue Planet 2 and Planet Earth 2 might not always have been the easiest to listen to. The graph below shows an excerpt with very loud music, followed by commentary just at the bottom of the green comfort zone. Thornton writes: "with the volume set at a level that was comfortable when the music was playing we couldn’t always hear the excellent commentary from Sir David Attenborough and had to resort to turning on the subtitles to be sure we knew what Sir David was saying!"

Planet Earth 2 Loudness Plot Excerpt. Colored green: comfort zone of +3 to -5LU around the loudness target. (from Mike Thornton: BBC Blue Planet 2 Latest Show In Firing Line For Sound Issues - Are They Right?)

As already mentioned above, a good mix considers the maximum and minimum possible loudness in the target listening environment.
In a movie theater the loudness comfort zone is big (loudness can vary a lot), and loud music is part of the fun, while quiet scenes work just as well. The opposite was true in the aforementioned experiment with drivers, where the loudness comfort zone is much smaller and quiet voices are difficult to understand.

Hence, the loudness comfort zone determines how much dynamic range an audio signal can use in a specific listening environment.

How to measure dynamic range: LRA

When producing audio for various environments, it would be great to have a target value for dynamic range (the difference between the smallest and largest signal values of an audio signal) as well. Then you could simply set a dynamic range target, similar to a loudness target.

Theoretically, the maximum possible dynamic range of a production is defined by the bit-depth of the audio format. A 16-bit recording can have a dynamic range of 96 dB; for 24-bit, it's 144 dB - which is well above the approx. 120 dB the human ear can handle. However, most of those bits are typically being used to get to a reasonable base volume. Picture a glass of water: you want it to be almost full, with some headroom so that it doesn't spill when there's a sudden movement, i.e. a bigger amplitude wave at the top.
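The bit-depth figures above follow from roughly 6 dB of dynamic range per bit:

```python
import math

def max_dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of a linear PCM format:
    20 * log10(2^bits), i.e. about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

cd_range = max_dynamic_range_db(16)     # ~96 dB
studio_range = max_dynamic_range_db(24) # ~144 dB
```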

Determining the dynamic range of a production is easier said than done, though. It depends on which signals are included in the measurement: for example, if something like background music or breathing should be considered at all.
The currently preferred method for broadcasting is called Loudness Range, LRA. It is measured in Loudness Units (LU), and takes into account everything between the 10th and the 95th percentile of a loudness distribution, after an additional gating method. In other words, the loudest 5% and quietest 10% of the audio signal are being ignored. This way, quiet breathing or an occasional loud sound effect won't affect the measurement.
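Stripped of the gating stages, the percentile step of the LRA measurement can be sketched like this. Note that the real EBU R128 measurement works on gated short-term loudness values; this sketch assumes you already have such a list and uses a simple nearest-rank percentile:

```python
def loudness_range(short_term_lufs):
    """Rough LRA sketch in LU: difference between the 95th and 10th
    percentile of a (pre-gated) short-term loudness distribution.
    The real EBU R128 measurement applies absolute and relative
    gating before this step."""
    values = sorted(short_term_lufs)

    def percentile(p):
        # nearest-rank style index into the sorted values
        idx = min(len(values) - 1, int(p / 100 * len(values)))
        return values[idx]

    return percentile(95) - percentile(10)

# 30 hypothetical short-term loudness values from -40 to -11 LUFS
example_lra = loudness_range(list(range(-40, -10)))
```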

Loudness distribution and LRA for the film 'The Matrix'. Figure from EBU Tech Doc 3343 (p.13).

However, the main difficulty is which signals should be included in the loudness range measurement and which ones should be gated. This is unfortunately often very subjective and difficult to define with a purely statistical method like LRA.

Where LRA falls short

Therefore, only pure speech programs give reliable LRA values that are comparable!
For instance, a typical LRA for news programs is 3 LU; for talks and discussions 5 LU is common. LRA values for features, radio dramas, movies or music very much depend on the individual character and might be in the range between 5 and 25 LU.

To further illustrate this, here are some typical LRA values, according to a paper by Thomas Lund (table 2):

Program                                             Loudness Range (LU)
Matrix, full movie                                  25.0
NBC Interstitials, Jan. 2008, all together (3:30)   9.4
Friends Episode 16                                  6.6
Speak Ref., Male, German, SQUAM Trk 54              6.2
Speak Ref., Female, French, SQUAM Trk 51            4.8
Speak Ref., Male, English, Sound Check              3.3
Wish You Were Here, Pink Floyd                      22.1
Gilgamesh, Battle of Titans, Osaka Symph.           19.7
Don’t Cry For Me Arg., Sinead O’Conner              13.7
Beethoven Son in F, Op17, Kliegel & Tichman         12.0
Rock’n Roll Train, AC/DC                            6.0
I.G.Y., Donald Fagen                                3.6

LRA values of music are very unpredictable as well.
For instance, Tom Frampton measured the LRA of songs in multiple genres, and the differences within each genre are quite big. The ten pop songs that he analyzed varied in LRA between 3.7 and 12 LU, country songs between 3.6 and 14.9 LU. In the Electronic genre the individual LRAs were between 3.7 and 15.2 LU. Please see the tables at the bottom of his blog post for more details.

We at Auphonic also tried to base our Adaptive Leveler parameters on the LRA descriptor. Although it worked, it turned out that it is very difficult to set a loudness range target for diverse audio content, which does include speech, background sounds, music parts, etc. The results were not predictable and it was hard to find good target values. Therefore we developed our own algorithm to measure the dynamic range of audio signals.

In conclusion, LRA comparisons are only useful for spoken-word-only productions, and the LRA value is therefore not applicable as a general dynamic range target. The more complex a production gets, the more difficult it is to make any judgment based on the LRA.
This is because the definition of LRA is purely statistical: there is no smart measurement using classifiers that distinguish between music, speech, quiet breathing, background noises and other types of audio. One would need a more intelligent algorithm (as we use in our Adaptive Leveler) that knows which audio segments should be included in and excluded from the measurement.

From theory to application: tools

Loudness and dynamic range are clearly a complicated topic. Luckily, there are tools that can help. To keep short-term loudness in range, a compressor can control sudden changes in loudness - such as p-pops or plosive consonants like t or k. To achieve a good mid-term loudness, i.e. a signal that doesn't leave the comfort zone too often, a leveler is a good option; alternatively, use a fader or manually adjust volume curves. And to make sure that separate productions sound consistent, loudness normalization is the way to go. We have covered all of this in depth before.

Looking at the audio from above again, with an adaptive leveler applied it looks like this:

Leveler example. Output at the top, input with leveler envelope at the bottom.

Now, the voices are evened out and the music is at a comfortable level, while the breathing has not been touched at all.
We recently extended Auphonic's adaptive leveler, so that it is now possible to customize the dynamic range - please see adaptive leveler customization and advanced multitrack audio algorithms.
If you wanted to increase the loudness comfort zone (or dynamic range) of the standard preset by 10 dB (or LU), for example, the envelope would look like this:

Leveler with higher dynamic range, only touching sections with extremely low or extremely high loudness to fit into a specific loudness comfort zone.

When a production is done, our adaptive leveler uses classifiers to also calculate the integrated loudness and loudness range of dialog and music sections separately. This way it is possible to just compare the dialog LRA and loudness of complex productions.

Assessing the LRA and loudness of dialog and music separately.

Conclusion

Getting audio dynamics right is not easy. Yet, it is an important thing to keep in mind, because focusing on loudness normalization alone is not enough. In fact, hitting the loudness target often has less impact on the listening experience than level differences, i.e. audio dynamics.

If the dynamic range is too small, the audio can be tiring to listen to, whereas a bigger dynamic range can make a program more interesting, but might not work in loud environments, such as a noisy train.
Therefore, a good mix adapts the audio dynamic range according to the target listening environment (different loudness comfort zones in cinema, at home, in a car) and according to the nature of the content (radio feature, movie, podcast, music, etc.).

Furthermore, because the definition of the loudness range / LRA is purely statistical, only speech programs give reliable LRA values that are comparable.
More "intelligent" algorithms are in development, which use classifiers to decide which signals should be included and excluded from the dynamic range measurement.

If you understand German, take a look at our presentation about audio dynamic processing in podcasts for further information:







r

Horizontal or/and Vertical Format in Kayak Photography

Like most paddlers, I have a tendency to shoot pictures in a horizontal (landscape) format. It is trickier to shoot in a vertical format from my tippy kayaks, especially when I have to use a paddle to stabilize my camera.




r

Winter Stand Up Paddling on Horsetooth Reservoir

I love paddling on the Horsetooth Reservoir in the cold season. The boat ramps are closed, there is no power boat traffic, and it is usually quiet and calm. Snow and ice can enhance the scenery. A great time to paddle, train, relax or photograph. The Horsetooth stays […]




r

Some Rights Reserved




r

How to Foster Real-Time Client Engagement During Moderated Research

When we conduct moderated research, like user interviews or usability tests, for our clients, we encourage them to observe as many sessions as possible. When clients see us interview their users and hear the responses first-hand, they learn about their users' needs in real time and become more active participants in the process. One way we help clients feel engaged during remote sessions is to establish a real-time communication backchannel that empowers them to flag responses they'd like to dig into further and to share their ideas for follow-up questions.

There are several benefits to establishing a communication backchannel for moderated sessions:

  • Everyone on the team, including both internal and client team members, can be actively involved throughout the data collection process rather than waiting to passively consume findings.
  • Team members can identify follow-up questions in real-time which allows the moderator to incorporate those questions during the current session, rather than just considering them for future sessions.
  • Subject matter experts can identify more detailed and specific follow-up questions that the moderator may not think to ask.
  • Even though the whole team is engaged, a single moderator still maintains control over the conversation which creates a consistent experience for the participant.

If you’re interested in creating your own backchannel, here are some tips to make the process work smoothly:

  • Use the chat tool that is already being used on the project. In most cases, we use a joint Slack workspace for the session backchannel but we’ve also used Microsoft Teams.
  • Create a dedicated channel like #moderated-sessions. Conversation in this channel should be limited to backchannel discussions during sessions. This keeps the communication consolidated and makes it easier for the moderator to stay focused during the session.
  • Keep communication limited. Channel participants should ask basic questions that are easy to consume quickly. Supplemental commentary and analysis should not take place in the dedicated channel.
  • Use emoji responses. The moderator can add a quick thumbs up to indicate that they’ve seen a question.

Introducing backchannels for communication during remote moderated sessions has been a beneficial change to our research process. It not only provides an easy way for clients to stay engaged during the data collection process but also increases the moderator’s ability to focus on the most important topics and to ask the most useful follow-up questions.




r

Concurrency & Multithreading in iOS

Concurrency is the notion of multiple things happening at the same time. This is generally achieved either via time-slicing, or truly in parallel if multiple CPU cores are available to the host operating system. We've all experienced a lack of concurrency, most likely in the form of an app freezing up when running a heavy task. UI freezes don't necessarily occur due to the absence of concurrency — they could just be symptoms of buggy software — but software that doesn't take advantage of all the computational power at its disposal is going to create these freezes whenever it needs to do something resource-intensive. If you've profiled an app hanging in this way, you'll probably see a report that looks like this:

Anything related to file I/O, data processing, or networking usually warrants a background task (unless you have a very compelling excuse to halt the entire program). There aren't many reasons that these tasks should block your user from interacting with the rest of your application. Consider how much better the user experience of your app could be if instead, the profiler reported something like this:

Analyzing an image, processing a document or a piece of audio, or writing a sizeable chunk of data to disk are examples of tasks that could benefit greatly from being delegated to background threads. Let's dig into how we can enforce such behavior into our iOS applications.


A Brief History

In the olden days, the maximum amount of work per CPU cycle that a computer could perform was determined by the clock speed. As processor designs became more compact, heat and physical constraints started becoming limiting factors for higher clock speeds. Consequentially, chip manufacturers started adding additional processor cores on each chip in order to increase total performance. By increasing the number of cores, a single chip could execute more CPU instructions per cycle without increasing its speed, size, or thermal output. There's just one problem...

How can we take advantage of these extra cores? Multithreading.

Multithreading is a mechanism provided by the host operating system that allows the creation and use of n threads. Its main purpose is to provide simultaneous execution of two or more parts of a program in order to utilize all available CPU time. Multithreading is a powerful technique to have in a programmer's toolbelt, but it comes with its own set of responsibilities. A common misconception is that multithreading requires a multi-core processor, but this isn't the case - single-core CPUs are perfectly capable of working on many threads, and we'll take a look in a bit at why threading is a problem in the first place. Before we dive in, let's look at the nuances of what concurrency and parallelism mean using a simple diagram:

In the first situation presented above, we observe that tasks can run concurrently, but not in parallel. This is similar to having multiple conversations in a chatroom, and interleaving (context-switching) between them, but never truly conversing with two people at the same time. This is what we call concurrency. It is the illusion of multiple things happening at the same time when in reality, they're switching very quickly. Concurrency is about dealing with lots of things at the same time. Contrast this with the parallelism model, in which both tasks run simultaneously. Both execution models exhibit multithreading, which is the involvement of multiple threads working towards one common goal. Multithreading is a generalized technique for introducing a combination of concurrency and parallelism into your program.


The Burden of Threads

A modern multitasking operating system like iOS has hundreds of programs (or processes) running at any given moment. However, most of these programs are either system daemons or background processes that have very low memory footprint, so what is really needed is a way for individual applications to make use of the extra cores available. An application (process) can have many threads (sub-processes) operating on shared memory. Our goal is to be able to control these threads and use them to our advantage.

Historically, introducing concurrency to an app has required the creation of one or more threads. Threads are low-level constructs that need to be managed manually. A quick skim through Apple's Threaded Programming Guide is all it takes to see how much complexity threaded code adds to a codebase. In addition to building an app, the developer has to:

  • Responsibly create new threads, adjusting that number dynamically as system conditions change
  • Manage them carefully, deallocating them from memory once they have finished executing
  • Leverage synchronization mechanisms like mutexes, locks, and semaphores to orchestrate resource access between threads, adding even more overhead to application code
  • Mitigate the risks of coding an application that itself assumes most of the costs of creating and maintaining the threads it uses, rather than leaving that to the host OS

This is unfortunate, as it adds enormous levels of complexity and risk without any guarantees of improved performance.


Grand Central Dispatch

iOS takes an asynchronous approach to solving the concurrency problem of managing threads. Asynchronous functions are common in most programming environments, and are often used to initiate tasks that might take a long time, like reading a file from the disk, or downloading a file from the web. When invoked, an asynchronous function executes some work behind the scenes to start a background task, but returns immediately, regardless of how long the original task might take to actually complete.

A core technology that iOS provides for starting tasks asynchronously is Grand Central Dispatch (or GCD for short). GCD abstracts away thread management code and moves it down to the system level, exposing a light API to define tasks and execute them on an appropriate dispatch queue. GCD takes care of all thread management and scheduling, providing a holistic approach to task management and execution, while also providing better efficiency than traditional threads.

Let's take a look at the main components of GCD:

What've we got here? Let's start from the left:

  • DispatchQueue.main: The main thread, or the UI thread, is backed by a single serial queue. All tasks are executed in succession, so it is guaranteed that the order of execution is preserved. It is crucial that you ensure all UI updates are designated to this queue, and that you never run any blocking tasks on it. We want to ensure that the app's run loop (called CFRunLoop) is never blocked in order to maintain the highest framerate. Consequently, the main queue has the highest priority, and any tasks pushed onto this queue will get executed immediately.
  • DispatchQueue.global: A set of global concurrent queues, each of which manage their own pool of threads. Depending on the priority of your task, you can specify which specific queue to execute your task on, although you should resort to using default most of the time. Because tasks on these queues are executed concurrently, it doesn't guarantee preservation of the order in which tasks were queued.

Notice how we're not dealing with individual threads anymore? We're dealing with queues which manage a pool of threads internally, and you will shortly see why queues are a much more sustainable approach to multithreading.

Serial Queues: The Main Thread

As an exercise, let's look at a snippet of code below, which gets fired when the user presses a button in the app. The expensive compute function can be anything. Let's pretend it is post-processing an image stored on the device.

import UIKit

class ViewController: UIViewController {
    @IBAction func handleTap(_ sender: Any) {
        compute()
    }

    private func compute() -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

At first glance, this may look harmless, but if you run this inside of a real app, the UI will freeze completely until the loop is terminated, which will take... a while. We can prove it by profiling this task in Instruments. You can fire up the Time Profiler module of Instruments by going to Xcode > Open Developer Tool > Instruments in Xcode's menu options. Let's look at the Threads module of the profiler and see where the CPU usage is highest.

We can see that the Main Thread is clearly at 100% capacity for almost 5 seconds. That's a non-trivial amount of time to block the UI. Looking at the call tree below the chart, we can see that the Main Thread is at 99.9% capacity for 4.43 seconds! Given that a serial queue works in a FIFO manner, tasks will always complete in the order in which they were inserted. Clearly the compute() method is the culprit here. Can you imagine clicking a button just to have the UI freeze up on you for that long?

Background Threads

How can we make this better? DispatchQueue.global() to the rescue! This is where background threads come in. Referring to the GCD architecture diagram above, we can see that anything that is not the Main Thread is a background thread in iOS. They can run alongside the Main Thread, leaving it fully unoccupied and ready to handle other UI events like scrolling, responding to user events, animating etc. Let's make a small change to our button click handler above:

class ViewController: UIViewController {
    @IBAction func handleTap(_ sender: Any) {
        DispatchQueue.global(qos: .userInitiated).async { [unowned self] in
            self.compute()
        }
    }

    private func compute() -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

Unless specified, a snippet of code will usually default to execute on the Main Queue, so in order to force it to execute on a different thread, we'll wrap our compute call inside of an asynchronous closure that gets submitted to the DispatchQueue.global queue. Keep in mind that we aren't really managing threads here. We're submitting tasks (in the form of closures or blocks) to the desired queue with the assumption that it is guaranteed to execute at some point in time. The queue decides which thread to allocate the task to, and it does all the hard work of assessing system requirements and managing the actual threads. This is the magic of Grand Central Dispatch. As the old adage goes, you can't improve what you can't measure. So we measured our truly terrible button click handler, and now that we've improved it, we'll measure it once again to get some concrete data with regards to performance.

Looking at the profiler again, it's quite clear to us that this is a huge improvement. The task takes an identical amount of time, but this time, it's happening in the background without locking up the UI. Even though our app is doing the same amount of work, the perceived performance is much better because the user will be free to do other things while the app is processing.

You may have noticed that we accessed a global queue of .userInitiated priority. This is an attribute we can use to give our tasks a sense of urgency. If we run the same task on a global queue with a qos attribute of .background, iOS will treat it as a utility task and allocate fewer resources to execute it. So, while we don't have control over when our tasks get executed, we do have control over their priority.

A Note on Main Thread vs. Main Queue

You might be wondering why the Profiler shows "Main Thread" and why we're referring to it as the "Main Queue". If you refer back to the GCD architecture we described above, the Main Queue is solely responsible for managing the Main Thread. The Dispatch Queues section in the Concurrency Programming Guide says that "the main dispatch queue is a globally available serial queue that executes tasks on the application’s main thread. Because it runs on your application’s main thread, the main queue is often used as a key synchronization point for an application."

The terms "execute on the Main Thread" and "execute on the Main Queue" can be used interchangeably.


Concurrent Queues

So far, our tasks have been executed exclusively in a serial manner. DispatchQueue.main is by default a serial queue, and DispatchQueue.global gives you four concurrent dispatch queues depending on the priority parameter you pass in.

Let's say we want to take five images, and have our app process them all in parallel on background threads. How would we go about doing that? We can spin up a custom concurrent queue with an identifier of our choosing, and allocate those tasks there. All that's required is the .concurrent attribute during the construction of the queue.

class ViewController: UIViewController {
    let queue = DispatchQueue(label: "com.app.concurrentQueue", attributes: .concurrent)
    let images: [UIImage] = [UIImage].init(repeating: UIImage(), count: 5)

    @IBAction func handleTap(_ sender: Any) {
        for img in images {
            queue.async { [unowned self] in
                self.compute(img)
            }
        }
    }

    private func compute(_ img: UIImage) -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

Running that through the profiler, we can see that the app is now spinning up 5 discrete threads to parallelize a for-loop.

Parallelization of N Tasks

So far, we've looked at pushing computationally expensive tasks onto background threads without clogging up the UI thread. But what about executing parallel tasks with some restrictions? How can Spotify download multiple songs in parallel while capping the number of simultaneous downloads at three? We can go about this in a few ways, but this is a good time to explore another important construct in multithreaded programming: semaphores.

Semaphores are signaling mechanisms, commonly used to control access to a shared resource. Imagine a scenario where a thread locks access to a certain section of code while executing it, and unlocks it once done so that other threads can execute that section. You see this type of behavior in database writes and reads, for example. What if you want only one thread writing to a database, with all reads blocked during that time? This is a common thread-safety concern addressed by a readers-writer lock. Semaphores let us control concurrency in our app by limiting execution to at most n threads at a time.

let kMaxConcurrent = 3 // Or 1 if you want strictly ordered downloads!
let semaphore = DispatchSemaphore(value: kMaxConcurrent)
let downloadQueue = DispatchQueue(label: "com.app.downloadQueue", attributes: .concurrent)

class ViewController: UIViewController {
    @IBOutlet weak var tableView: UITableView!

    @IBAction func handleTap(_ sender: Any) {
        for i in 0..<15 {
            downloadQueue.async { [unowned self] in
                // Block until one of the 3 download slots frees up
                semaphore.wait()

                // Expensive task
                self.download(i + 1)

                // Update the UI on the main thread, always!
                DispatchQueue.main.async {
                    self.tableView.reloadData()

                    // Release the slot for the next download
                    semaphore.signal()
                }
            }
        }
    }

    func download(_ songId: Int) -> Void {
        var counter = 0

        // Simulate semi-random download times.
        for _ in 0..<Int.random(in: 999999...10000000) {
            counter += songId
        }
    }
}

Notice how we've effectively restricted our download system to at most kMaxConcurrent simultaneous downloads. The moment one download finishes, the thread calls signal(), incrementing the semaphore's counter and allowing a waiting thread to proceed and start downloading the next song. You can apply a similar pattern to database transactions when dealing with concurrent reads and writes.

Semaphores usually aren't necessary for code like the one in our example, but they become more powerful when you need to enforce synchronous behavior while consuming an asynchronous API. The above code would work just as well with a custom OperationQueue and a maxConcurrentOperationCount, but it's a worthwhile tangent regardless.


Finer Control with OperationQueue

GCD is great when you want to dispatch one-off tasks or closures into a queue in a 'set-it-and-forget-it' fashion, and it provides a very lightweight way of doing so. But what if we want to create a repeatable, structured, long-running task that produces associated state or data? And what if we want to model this chain of operations such that they can be cancelled, suspended and tracked, while still working with a closure-friendly API? Imagine, for example, an operation that downloads an image, filters it, and caches the result, where each stage depends on the previous one.

This would be quite cumbersome to achieve with GCD. We want a more modular way of defining a group of tasks while maintaining readability and also exposing a greater amount of control. In this case, we can use Operation objects and queue them onto an OperationQueue, which is a high-level wrapper around DispatchQueue. Let's look at some of the benefits of using these abstractions and what they offer in comparison to the lower-level GCD API:

  • You may want to create dependencies between tasks, and while you could do this via GCD, you're better off defining them concretely as Operation objects, or units of work, and pushing them onto your own queue. This would allow for maximum reusability since you may use the same pattern elsewhere in an application.
  • The Operation and OperationQueue classes have a number of properties that can be observed, using KVO (Key Value Observing). This is another important benefit if you want to monitor the state of an operation or operation queue.
  • Operations can be paused, resumed, and cancelled. Once you dispatch a task using Grand Central Dispatch, you no longer have control or insight into the execution of that task. The Operation API is more flexible in that respect, giving the developer control over the operation's life cycle.
  • OperationQueue allows you to specify the maximum number of queued operations that can run simultaneously, giving you a finer degree of control over the concurrency aspects.
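A minimal sketch of the last two points; the operation bodies here are stand-ins, not the image-processing example used elsewhere in this post:

```swift
import Foundation

let queue = OperationQueue()
queue.maxConcurrentOperationCount = 2   // At most 2 operations run at once

// NSLock guards the shared array, since up to two blocks run in parallel.
let lock = NSLock()
var finished: [Int] = []

let operations: [BlockOperation] = (1...4).map { i in
    BlockOperation {
        lock.lock()
        finished.append(i)
        lock.unlock()
    }
}

queue.addOperations(operations, waitUntilFinished: true)
print(finished.sorted())   // [1, 2, 3, 4]

// Operations can be cancelled before they start; the queue then
// skips the body entirely.
let cancelled = BlockOperation { print("never executes") }
cancelled.cancel()
queue.addOperation(cancelled)
queue.waitUntilAllOperationsAreFinished()
```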

The usage of Operation and OperationQueue could fill an entire blog post, but let's look at a quick example of what modeling dependencies looks like. (GCD can also create dependencies, but you're better off dividing up large tasks into a series of composable sub-tasks.) In order to create a chain of operations that depend on one another, we could do something like this:

class ViewController: UIViewController {
    let queue = OperationQueue()
    var rawImage: UIImage? = nil
    let imageUrl = URL(string: "https://example.com/portrait.jpg")!
    @IBOutlet weak var imageView: UIImageView!

    @IBAction func handleTap(_ sender: Any) {
        let downloadOperation = BlockOperation { [unowned self] in
            let image = Downloader.downloadImageWithURL(url: self.imageUrl)
            OperationQueue.main.addOperation {
                self.rawImage = image
            }
        }

        let filterOperation = BlockOperation { [unowned self] in
            let filteredImage = ImgProcessor.addGaussianBlur(self.rawImage)
            OperationQueue.main.addOperation {
                self.imageView.image = filteredImage
            }
        }

        // filterOperation won't start until downloadOperation finishes.
        filterOperation.addDependency(downloadOperation)

        [downloadOperation, filterOperation].forEach {
            queue.addOperation($0)
        }
    }
}

So why not opt for a higher level abstraction and avoid using GCD entirely? While GCD is ideal for inline asynchronous processing, Operation provides a more comprehensive, object-oriented model of computation for encapsulating all of the data around structured, repeatable tasks in an application. Developers should use the highest level of abstraction possible for any given problem, and for scheduling consistent, repeated work, that abstraction is Operation. Other times, it makes more sense to sprinkle in some GCD for one-off tasks or closures that we want to fire. We can mix both OperationQueue and GCD to get the best of both worlds.


The Cost of Concurrency

DispatchQueue and friends are meant to make it easier for the application developer to execute code concurrently. However, these technologies do not guarantee improvements to the efficiency or responsiveness in an application. It is up to you to use queues in a manner that is both effective and does not impose an undue burden on other resources. For example, it's totally viable to create 10,000 tasks and submit them to a queue, but doing so would allocate a nontrivial amount of memory and introduce a lot of overhead for the allocation and deallocation of operation blocks. This is the opposite of what you want! It's best to profile your app thoroughly to ensure that concurrency is enhancing your app's performance and not degrading it.

We've talked about how concurrency comes at a cost in terms of complexity and allocation of system resources, but introducing concurrency also brings a host of other risks like:

  • Deadlock: A situation where two or more threads are blocked waiting on each other, which can halt the application's run loop entirely. In the context of GCD, be very careful with sync { } calls, as two synchronous operations can easily end up stuck waiting for each other; calling sync on the queue you are currently running on is the classic example.
  • Priority Inversion: A condition where a lower-priority task blocks a higher-priority task from executing, effectively inverting their priorities. Since GCD allows different levels of priority on its background queues, this is a real possibility.
  • Producer-Consumer Problem: A race condition where one thread is creating a data resource while another thread is accessing it. This is a synchronization problem, and can be solved using locks, semaphores, serial queues, or a barrier dispatch if you're using concurrent queues in GCD.
  • ...and many other sorts of locking and data-race conditions that are hard to debug! Thread safety is of the utmost concern when dealing with concurrency.
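As a concrete example of the barrier approach mentioned above, a small sketch of a thread-safe store (the SafeStore type and its queue label are hypothetical, not part of any framework):

```swift
import Foundation

// Concurrent reads, exclusive writes: the .barrier flag makes the write
// block wait for in-flight reads, run alone, then let readers resume.
final class SafeStore {
    private var storage: [String: Int] = [:]
    private let queue = DispatchQueue(label: "com.app.safeStore",
                                      attributes: .concurrent)

    func read(_ key: String) -> Int? {
        queue.sync { storage[key] }          // Reads may run concurrently
    }

    func write(_ key: String, _ value: Int) {
        queue.async(flags: .barrier) {       // Writes run exclusively
            self.storage[key] = value
        }
    }
}

let store = SafeStore()
store.write("plays", 42)
print(store.read("plays") ?? -1)   // 42 — the read waits behind the barrier
```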

Parting Thoughts + Further Reading

If you've made it this far, I applaud you. Hopefully this article gives you a lay of the land when it comes to multithreading techniques on iOS, and how you can use some of them in your app. We didn't get to cover many of the lower-level constructs like locks, mutexes and how they help us achieve synchronization, nor did we get to dive into concrete examples of how concurrency can hurt your app. We'll save those for another day, but you can dig into some additional reading and videos if you're eager to dive deeper.





African American Women Leading in Tech

“Close your eyes and name three people who have impacted the tech industry.”

In all likelihood, that list might be overwhelmingly white and male.

And you are not alone: numerous lists online yield the same results. In recent years, many articles have chronicled the dearth of diversity in tech. Studies have shown the ways in which venture capital firms have systematically underestimated and undervalued innovation coming particularly from women of color. In 2016, only 88 tech startups were led by African American women; by 2018 this number had climbed to a little over 200. Between 2009 and 2017, African American women raised $289MM in venture/angel funding. For perspective, this represents only about 0.07% of the $424.7B in total tech venture funding raised in that same time frame. In 2018, only 34 African American women had ever raised more than a million dollars in venture funding.

When it comes to innovation, it is not unusual for financial value to be the biggest predictor of what is considered innovative. In fact, a now largely controversial list posted by Forbes of America’s most innovative leaders in the fall of 2019 featured 99 men and one woman. Ironically, what was considered innovative was, in fact, very traditional in its presentation. The criteria used for the list was “media reputation for innovation,” social connections, a track record for value creation, and investor expectations for value creation.

The majority of African American women-led startups raise $42,000 from largely informal networks. Criteria weighted on the side of ‘track record for value creation’ and ‘investor expectations for value creation’ devalue the immense contributions of African American women leading the charge on thoughtful and necessary tech. Had Forbes used criteria for innovation that recognized emergent leadership, novel problem-solving, or original thinking outside the circles of already well-known and well-established entrepreneurs, we might have learned something new. Instead, we're basically reminded that "it takes money to make money."

Meanwhile, African American women are the fastest-growing demographic of entrepreneurs in the United States. Their contributions to tech, amongst other fields, are cementing the importance of African American women in the innovation space. And they are doing this within and outside traditional tech frameworks. By becoming familiar with these entrepreneurs and their work, we can elevate their reputation and broaden our collective recognition of innovative leaders.

In honor of Black History Month, we have compiled a list of African American women founders leading the way in tech innovation, from Alabama to the Bay Area. From rethinking energy to debt-forgiveness platforms, these women are crossing boundaries in every field.

Cultivating New Leaders

Photo of Kathryn Finney, courtesy of Forbes.com.

Kathryn Finney founder of Digitalundivided
Kathryn A. Finney is an American author, researcher, investor, entrepreneur, innovator and businesswoman. She is the founder and CEO of digitalundivided, a social enterprise that leads high potential Black and Latinx women founders through the startup pipeline from idea to exit.

Laura Weidman Powers co-founder of Code2040
Laura Weidman Powers is the co-founder and executive director of Code2040, a nonprofit that creates access, awareness, and opportunities for minority engineering talent to ensure their leadership in the innovation economy.

Angelica Ross founder of TransTech Social Enterprises
Angelica Ross is an American businesswoman, actress, and transgender rights advocate. After becoming a self-taught computer coder, she went on to become the founder and CEO of TransTech Social Enterprises, a firm that helps employ transgender people in the tech industry.

Christina Souffrant Ntim co-founder of Global Startup Ecosystem
Christina Souffrant Ntim is the co-founder of the award-winning digital accelerator platform Global Startup Ecosystem, which graduates 1,000+ companies across 90+ countries a year.

Media and Entertainment

Bryanda Law founder of Quirktastic
Bryanda Law is the founder of Quirktastic, a modern media-tech company on a mission to grow the largest and most authentically engaged community of fandom-loving people of color.

Morgan DeBaun founder of Blavity Inc.
Morgan DeBaun is an African American entrepreneur. She is the founder and CEO of Blavity Inc., a portfolio of brands and websites created by and for black millennials.

Cheryl Contee co-founder of Do Big Things
Cheryl Contee is the award-winning CEO and co-founder of Do Big Things, a digital agency that creates new narratives and tech for a new era focused on causes and campaigns.

Photo of Farah Allen, courtesy of The Source Magazine.

Farah Allen founder of The Labz
Farah Allen is the CEO and founder of The Labz, a collaborative workspace that provides automated tracking, rights management, protection—using Blockchain technology—of your music files during and after you create them.

Health/Wellness

Marah Lidey co-founder of Shine
Marah Lidey is the co-founder & co-CEO of Shine. Shine aims to reinvent health and wellness for millennials through messaging technology.

Alicia Thomas co-founder of Dibs
Alicia Thomas is the founder and CEO of Dibs, a B2B digital platform that gives studios quick and easy access to real-time pricing for fitness classes.

Photo of Erica Plybeah, courtesy of BetterTennessee.com

Erica Plybeah Hemphill founder of MedHaul
Erica Plybeah Hemphill is the founder of MedHaul. MedHaul offers cloud-based solutions that ease the burdens of managing patient transportation.

Star Cunningham founder of 4D Healthware
Star Cunningham is the founder and CEO of 4D Healthware. 4D Healthware is patient engagement software that makes personalized medicine possible through connected data.

Kimberly Wilson founder of HUED
Kimberly Wilson is the founder of HUED. HUED is a healthcare technology startup that helps patients find and book appointments with Black and Latinx healthcare providers.

Financial

Viola Llewellyn co-founder of Ovamba Solutions
Viola Llewellyn is the co-founder and the president of Ovamba Solutions, a US-based fintech company that provides micro, small, and medium enterprises in Africa and the Middle East with microfinance through a mobile platform.

NanaEfua Baidoo Afoh-Manin, Briana DeCuir and Joanne Moreau founders of Shared Harvest Fund
NanaEfua, Briana and Joanne are the founders of Shared Harvest Fund. Shared Harvest Fund provides real opportunities for talented people to volunteer away their student loans.

Photo of Sheena Allen, courtesy of People of Color in Tech.

Sheena Allen founder of CapWay
Sheena Allen is best known as the founder and CEO of fintech company and mobile bank CapWay.

Education

Helen Adeosun co-founder of CareAcademy
Helen Adeosun is the co-founder, president and CEO of CareAcademy, a start-up dedicated to professionalizing caregiving through online classes. CareAcademy brings professional development to caregivers at all levels.

Alexandra Bernadotte founder of Beyond 12
Alex Bernadotte is the founder and chief executive officer of Beyond 12, a nonprofit that integrates personalized coaching with intelligent technology to increase the number of traditionally underserved students who earn a college degree.

Shani Dowell founder of Possip
Shani Dowell is the founder of Possip, a platform that simplifies feedback between parents, schools and districts. Learn more at possipit.com.

Kaya Thomas of We Read Too
Kaya Thomas is an American computer scientist, app developer and writer. She is the creator of We Read Too, an iOS app that helps readers discover books for and by people of color.

Kimberly Gray founder of Uvii
Kimberly Gray is the founder of Uvii. Uvii helps students communicate and collaborate on mobile with video, audio, and text.

Nicole Neal co-founder of ProcureK12 by Noodle Markets
Nicole Neal is the co-founder and CEO of ProcureK12 by Noodle Markets. ProcureK12 makes purchasing for education simple. They combine a competitive school supply marketplace with quote request tools and bid management.

Beauty/Fashion/Consumer goods

Regina Gwynn founder of TresseNoire
Regina Gwynn is the co-founder & CEO of TresseNoire, the leading on-location beauty booking app designed for women of color in New York City and Philadelphia.

Camille Hearst co-founder of Kit.
Camille Hearst is the CEO and co-founder of Kit. Kit lets experts create shoppable collections of products so their followers can buy and the experts can make some revenue from what they share.

Photo of Esosa Ighodaro courtesy of Under30CEO.

Esosa Ighodaro co-founder of CoSign Inc.
Esosa Ighodaro is the co-founder of CoSign Inc., which was founded in 2013. CoSign is a mobile application that transfers social media content into commerce giving cash for endorsing and cosigning products and merchandise like clothing, home goods, technology and more.

Environment

Jessica Matthews founder of Uncharted Power
Jessica O. Matthews is a Nigerian-American inventor, CEO and venture capitalist. She is the co-founder of Uncharted Power, which made Soccket, a soccer ball that can be used as a power generator.

Etosha Cave co-founder of Opus 12
Etosha R. Cave is an American mechanical engineer based in Berkeley, California. She is the Co-Founder and Chief Scientific Officer of Opus 12, a startup that recycles carbon dioxide.

Kellee James founder of Mercaris, Inc.
Kellee James is the founder and CEO of Mercaris, Inc., a growing, minority-led start-up that makes efficient trading of organic and non-GMO commodities possible via market data service exchanges and trading platforms.

Workplace

Photo of Lisa Skeete Tatum courtesy of The Philadelphia Citizen.

Lisa Skeete Tatum founder of Landit
Lisa Skeete Tatum is the founder and CEO of Landit, a technology platform created to increase the success and engagement of women in the workplace, and to enable companies to attract, develop, and retain high-potential, diverse talent.

Netta Jenkins and Jacinta Mathis founders of Dipper
Netta Jenkins and Jacinta Mathis are founders of Dipper, a platform that acts as a safe digital space for individuals of color in the workplace.

Sherisse Hawkins founder of Pagedip
Sherisse Hawkins is the visionary and founder of Pagedip. Pagedip is a cloud-based software solution that allows you to bring depth to digital documents, enabling people to read (text), watch (video) and do (interact) all in the same place without ever having to leave the page.

Thkisha DeDe Sanogo founder of MyTAASK
Thkisha DeDe Sanogo is the founder of MyTAASK. MyTAASK is a personal planning platform dedicated to getting stuff done in real-time.

Home

Photo of Jean Brownhill, courtesy of Quartz at Work.

Jean Brownhill founder of Sweeten 
Jean Brownhill is the founder and CEO of Sweeten, an award-winning service that helps homeowners and business owners find and manage the best vetted general contractors for major renovation projects.

Reham Fagiri co-founder of AptDeco
Reham Fagiri is the co-founder of AptDeco. AptDeco is an online marketplace for buying and selling quality preowned furniture with pick up and delivery built into the service.

Stephanie Cummings founder of Please Assist Me
Stephanie Cummings is the founder and CEO of Please Assist Me. Please Assist Me is an apartment task service in Nashville, TN. The organization empowers working professionals by allowing them to outsource their weekly chores to their own personal team.

Law

Kristina Jones co-founder of Court Buddy
Kristina Jones is the co-founder of Court Buddy, a service that matches clients with lawyers.

Sonja Ebron and Debra Slone founders of Courtroom5
Sonja Ebron and Debra Slone are the founders of Courtroom5. Courtroom5 helps you represent yourself in court with tools, training, and community designed for pro se litigants.

Crowdfunding

Zuley Clarke founder of Business Gift Registry
Zuley Clarke is the founder of Business Gift Registry, a crowdfunding platform that lets friends and family support an entrepreneur through gift-giving just like they would support a couple for a wedding.
