Trying to straighten all the lines on this shot is a surefire way to go blind. (at London, United Kingdom)




I’ve gone subway hopping for photos in every city I’ve been to except the one I live in. (at Toronto, Ontario)




I just realized that I can export my entire story all at once now, which means uploading my tutorials to my Facebook page will be a million times easier (it was tedious to stitch all the individual clips together before).
.
Related: I posted a story this morning deconstructing the edit on yesterday’s shot.
.
Also related: I uploaded the 3 tutorials from my November feature on @thecreatorclass to my Facebook page this morning too. More to come! (at London, United Kingdom)




I took this shot about a year ago when I had a very different editing style. A ton of faded blacks and, believe it or not, a subtle green tint (unknowingly inherited from the preset I was using at the time). Re-editing it now, I’m happy with the way my style has evolved, though I can already sense that I’m on the brink of evolving it again. And I’m okay with that. (at London, United Kingdom)




This might as well be a Herschel ad. (at London, United Kingdom)




This trip solidified my resolve to learn photography. A lot has happened since this shot was taken.
Can you pinpoint the moment you decided to pursue photography? (at Toronto, Ontario)




A lot to look forward to in 2017. How did 2016 treat you? (at San Francisco, California)




Four days from now I’ll be boarding a one-way flight to San Francisco to take on the next evolution of my role at @shopify. Leaving the city that I’ve called home my entire life and the people who have defined everything I am was one of the most uncomfortable decisions I’ve ever had to make. But this wouldn’t be the first time I’ve chased discomfort in my career.
.
I wrote about my ongoing pursuit of discomfort this morning in hopes of inspiring others to do the things that scare and challenge them this year. You can find the link in my profile.
.
Happy 2017!
.
Photo: @jonasll (at San Francisco, California)




Quick survey: on average, what time is it when you check Instagram for the first time on any given day? (Be sure to include your timezone!)
.
PS: Thank you for all the incredible support on yesterday’s announcement. ❤️ (at Toronto, Ontario)


Self-promotion

The world has changed. Everything we do is more immediately visible to others than ever before, but much remains the same; the relationships we develop are as important as they always were. This post is a few thoughts on self-promotion, and how to have good relationships as a self-publisher.

Meeting people face to face is ace. They could be colleagues, vendors, or clients; at conferences, coffee shops, or meeting rooms. The hallway and bar tracks at conferences are particularly great. I always come away with a refreshed appreciation for meatspace. However, most of our interactions take place over the Web. On the Web, the lines separating different kinds of relationships are a little blurred. The company trying to get you to buy a product or conference ticket uses the same medium as your friends.

Freelancers and small companies (and co-ops!) can have as much of an impact as big businesses. ‘I publish therefore I am’ could be our new mantra. Hence this post, in a way. Although, I confess, I have discussed these thoughts with friends and thought it was about time I kept my promise to publish them.

Publishing primarily means text and images. Text is the most prevalent. However, much more meaning is conveyed non-verbally. ‘It’s not what you say, it’s how you say it.’

Text can contain non-verbal elements like style — either handwritten or typographic characters — and emoticons, but we don’t control style in Twitter, email, or feeds. Or in any of the main situations where people read what we write (unless it’s our own site). Emoticons are often used in text to indicate tone, pitch, inflection, and emotion like irony, humour, or dismay. They plug gaps in the Latin alphabet’s scope that could be filled with punctuation like the sarcasm mark. By using them, we affirm how important non-verbal communication is.

The other critical non-verbal communication around text is karma. Karma is our reputation, our social capital with our audience of peers, commentators, and customers. It has two distinct parts: Personality, and professional reputation. ‘It’s not what was said, it’s who said what.’

So, after that quick brain dump, let me recap:

  • Relationships are everything.
  • We publish primarily in text without the nuance of critical non-verbal communication.
  • Text has non-verbal elements like style and emoticons, but we can only control the latter.
  • Context is also non-verbal communication. Context is karma: Character and professional reputation.

Us Brits are a funny bunch. Traditionally reserved. Hyperbole-shy. At least, in public. We use certain extreme adjectives sparingly for the most part, and usually avoid superlatives if at all possible. We wince a little if we forget and get super-excited. We sometimes prefer ‘spiffing’ accompanied by a wry, ironic smile over an outright ‘awesome’. Both are genuine — one has an extra layer in the inflection cake. However, we take great displeasure in observing blunt marketing messages that try to convince us something is true with massive, lobe-smacking enthusiasm, and some sort of exaggerated adjective-osmosis effect. We poke fun at attempts to be overly cool. We expect a decent level of self-awareness and ring of honesty from people who would sell us stuff. The Web is no exception. In fact, I may go so far as to say that the sensibilities of the Web are fairly closely aligned with British sensibilities. Without, of course, any of our crippling embarrassment. In an age when promoting oneself on the Web is almost required for designers, that’s no bad thing. After all, running smack bang through the middle of the new marketing arts is a large dose of reality; we’re just a bunch of folks telling our story. No manipulation, cool-kid feigned nonchalance, or lobe-smacking enthusiasm required.

Consider what the majority of designers do to promote themselves in this brave new maker-creative culture. People like my friend, Elliot Jay Stocks: making his own magazine, making music, distributing WordPress themes, and writing about his experiences. Yes, it is important for him that he has an audience, and yes, he wants us to buy his stuff, but no, he won’t try to impress or trick us into liking him. It’s our choice. Compare this to traditional advertising that tries to appeal to your demographic with key phrases from your tribe, life-style pitches, and the usual raft of Freudian manipulations. (Sarcasm mark needed here, although I do confess to a soft spot for the more visceral and kitsch Freudian manipulations.)

There is a middle ground between the two though. A dangerous place full of bad surprises: The outfit that seems like a human being. It appears to publish just like you would. They want money in exchange for their amazing stuff they’re super-duper proud of. Then, you find out they’re selling it to you at twice the price it is in the States, or that it crashes every time it closes, or has awful OpenType support. You find out the human being was really a corporate cyborg who sounds like you, but is not of you, and it’s impervious to your appeals to human fairness. Then there are the folks who definitely are human, after all they’re only small, and you know their names. All the non-verbal communication tells you so. Then you peek a little closer —  you see the context — and all they seem to do is talk about themselves, or their business. Their interactions are as carefully crafted as the big companies, and they treat their audience as a captive market. Great spirit forefend they share the bandwidth by celebrating anyone else. They sound like one of us, but act like one of them. Their popularity is inversely proportional to their humanity.

Extreme examples, I know. This is me exploring thoughts though, and harsh light helps define the edges. Feel free to sound off if it offends, but mind your non-verbal communication. :)

That brings me to self-promotion versus self-aggrandisement; there’s a big difference between the two. As independent designers and developer-type people, self-promotion is good, necessary, and often mutually beneficial. It’s about goodwill. It connects us to each other and lubricates the Web. We need it. Self-aggrandisement is coarse, obvious, and often an act of denial; the odour of insecurity or arrogance is nauseating. It is to be avoided.

If you consider the difference between a show-off and a celebrant, perhaps it will be clearer what I’m reaching for:

The very best form of self-promotion is celebration. To celebrate is to share the joy of what you do (and critically also celebrate what others do) and invite folks to participate in the party. To show off is a weakness of character — an act that demands acknowledgement and accolade before the actor can feel the tragic joy of thinking themselves affirmed. To celebrate is to share joy. To show-off is to yearn for it.

It’s as tragic as the disdainful, casual arrogance of criticising the output of others less accomplished than oneself. Don’t be lazy now. Critique, if you please. Be bothered to help, or if you can’t hold back, have a little grace by being discreet and respectful. If you’re arrogant enough to think you have the right to treat anyone in the world badly, you grant them the right to reciprocate. Beware.

Celebrants don’t reserve their bandwidth for themselves. They don’t treat their friends like a tricky audience who may throw pennies at them at the end of the performance. They treat them like friends. It’s a pretty simple way of measuring whether what you publish is good: would I do/say/act the same way with my friends? Human scales are always the best scales.

So, this ends. I feel very out of practice at writing. It’s hard after a hiatus. These are a few thoughts that still feel partially formed in my mind, but I hope there was a tiny snippet or two in there that fired off a few neurons in your brain. Not too many, though, it’s early yet. :)




Web Fonts, Dingbats, Icons, and Unicode

Yesterday, Cameron Koczon shared a link to the dingbat font, Pictos, by the talented Drew Wilson. Cameron predicted that dingbats will soon be everywhere. Symbol fonts, yes, I thought. Dingbats? No, thanks. Jason Santa Maria replied:

@FictiveCameron I hope not, dingbat fonts sort of spit in the face of accessibility and semantics at the moment. We need better options.

Jason rightly pointed out the accessibility and semantic problems with dingbats. Dingbat fonts map icons to letters or numbers in the character map, so those characters are represented on the page by the icon. That’s what Pictos does. For example, type an ‘a’ on your keyboard, set Pictos as the font for that letter, and the Pictos anchor icon is displayed.
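
To make that concrete, here is a minimal sketch of the kind of markup and CSS being described. The class name and font file path are hypothetical; only the ‘a’-to-anchor mapping comes from the Pictos example above.

    <!-- A minimal sketch of the dingbat approach described above.
         The class name and font file path are hypothetical. -->
    <style>
      @font-face {
        font-family: 'Pictos';
        src: url('fonts/pictos-web.woff') format('woff'); /* hypothetical path */
      }
      .icon {
        font-family: 'Pictos', sans-serif;
      }
    </style>

    <!-- Renders as the Pictos anchor icon, but assistive technology
         still sees only the letter 'a'. -->
    <span class="icon">a</span>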

Other folks suggested SVG and JS might be better, along with other more novel workarounds to hide content from assistive technology like screen readers. All interesting, but in my view either not workable, or just a bit awkward.

Ralf Herrmann has an elegant CSS example that works well in Safari.

Falling down with CSS text-replacement

A CSS solution in an article from Pictos creator, Drew Wilson, relies on the fact that most of his icons are mapped to a character that forms part of the common name for that symbol. The article uses the delete icon as an example, which is mapped to ‘d’. Using :before and :after pseudo-elements, Drew suggests you can kind-of wrangle the markup into something sort-of semantic. However, it starts to fall down fast. For example, a check mark (tick) is mapped to ‘3’. There’s nothing semantic about that. Clever replacement techniques just hide the evidence. It’s a hack. There’s nothing wrong with a hack here and there (as box model veterans well know) but the ends have to justify the means. The end of this story is not good, as a VoiceOver test by Scott at Filament Group shows. In fairness to Drew Wilson, though, he goes on to say that if in doubt, you should do it the old way: use his font to create a background image and deploy it with a negative text-indent.
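
For illustration, the kind of pseudo-element replacement being described looks roughly like this. It is a sketch only: the class name is hypothetical, and the ‘d’-to-delete mapping is the one mentioned above.

    <!-- A rough sketch of the pseudo-element technique described above.
         The class name is hypothetical; in Pictos, 'd' maps to the delete icon. -->
    <style>
      .icon-delete:before {
        font-family: 'Pictos', sans-serif;
        content: 'd'; /* renders as the delete icon, but the 'd' itself carries no meaning */
      }
    </style>

    <a href="#" class="icon-delete">Delete this item</a>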

I agreed with Jason, and mentioned a half-formed idea:

@jasonsantamaria that’s exactly what I was thinking. Proper unicode mapping if possible, perhaps?

The conversation continued, and thanks to Jason, helped me refine the idea into this post.

Jon Hicks flagged a common problem for some Windows users where certain Unicode characters are displayed as ‘missing character’ glyphs, depending on the character. I think most of the problems with dingbats or missing Unicode characters can be solved with web fonts and Unicode.

Rising with Unicode and web fonts

I’d love to be able to use custom icons via optimised web fonts, and to do so accessibly and semantically. This is how it could be done (a rough sketch follows the list):

  1. Map the icons in the font to the existing Unicode code points for those symbols wherever possible.

    Unicode code points already exist for many common symbols. Fonts could be tiny, fast, stand-alone symbol fonts. Existing typefaces could also be extended to contain symbols that match the style of individual widths, variants, slopes, and weights. Imagine a set of Clarendon or Gotham symbols for a moment. Wouldn’t that be a joy to behold?

    Private-use code points could possibly be used if a code point does not exist for a symbol we need. Type designers, iconographers, and foundries might agree a common set of extended symbols. Alternatively, they could be proposed for inclusion in Unicode.

  2. Include the font with font-face.

    This assumes ubiquitous support (as any use of dingbats does) — we’re very nearly there. WOFF is coming to Safari and with a bit more campaigning we may even see WOFF on iPad soon.

  3. In HTML, reference the Unicode code points in UTF-8 using numeric character references.

    Unicode characters have corresponding numerical references. Named entities may not be rendered by XML parsers. Sean Coates reminded me that in many Cocoa apps in OS X the character map is accessible via a simple CMD+ALT+t shortcut. Ralf Herrmann mentioned that unicode characters ‘…have “speaking” descriptions (like Leftwards Arrow) and fall back nicely to system fonts.’
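
Pulling those three steps together, a rough sketch of the approach might look like the following. It assumes a hypothetical ‘Symbols’ web font whose check mark glyph sits at the existing code point U+2713; the file path and class name are made up for illustration.

    <!-- A rough sketch of steps 1–3, assuming a hypothetical 'Symbols' web font
         with a check mark glyph mapped to the existing code point U+2713. -->
    <style>
      @font-face {
        font-family: 'Symbols';
        src: url('fonts/symbols.woff') format('woff'); /* hypothetical file */
      }
      .sym {
        font-family: 'Symbols', sans-serif;
      }
    </style>

    <!-- &#x2713; is the numeric character reference for U+2713 CHECK MARK.
         If the web font is unavailable, it falls back to any system font
         that contains the glyph. -->
    <p>Task complete <span class="sym">&#x2713;</span></p>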

Limitations

  1. Accessibility: Limited Unicode / entity support in assistive devices.

    My friend and colleague Jon Gibbins’s old tests in JAWS 7 show some of the inconsistencies. It seems some characters are read out, some ignored completely, and some read as a question mark. Not great, but perhaps Jon will post more about this in the future.

    Elizabeth Pyatt at Penn State University did some dingbat tests in screen readers. For real Unicode symbols, there are pronunciation files that increase the character repertoire of screen readers, like this file for phonetic characters. Symbols would benefit from one.

  2. Web fonts: font-face not supported.

    If font-face is not supported on certain devices like mobile phones, falling back to system fonts is problematic. Unicode symbols may not be present in any system fonts. If they are, for many designers, they will almost certainly be stylistically suboptimal. It is possible to detect font-face using the Paul Irish technique. Perhaps there could be a way to swap Unicode for images if font-face is not present.

Now, next, and a caveat

I can’t recommend using dingbats like Pictos, but the icons sure are useful as images. Beautifully crafted icon sets delivered as fonts could be very useful for rapidly creating image icons for different-resolution devices like the iPhone 4 and iPad.

Perhaps we could try and formulate a standard set of commonly used icons using the Unicode symbols range as a starting point. I’ve struggled to find a better visual list of the existing symbols than this Unicode symbol chart from Johannes Knabe.

Icons in fonts as Unicode symbols need further testing in assistive devices and with font-face.

Last, but not least, I feel a bit cheeky making these suggestions. A little knowledge is a dangerous thing. Combine it with a bit of imagination, and it can be lethal. I have a limited knowledge about how fonts are created, and about Unicode. The real work would be done by others with deeper knowledge than I. I’d be fascinated to hear from Unicode, accessibility, or font experts to see if this is possible. I hope so. It feels to me like a much more elegant and sustainable solution for scalable icons than dingbat fonts.

For more on Unicode, read this long, but excellent, article recommended by my colleague, Andrei, the architect of Unicode and internationalization support in PHP 6: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.




2010 in Retrospect

Analog, Mapalong, more tries at trans-Atlantic sleep, Cuba, Fontdeck, and my youngest son entering school; it all happened in the last year. At the end of 2007, I wrote up the year very differently. After skipping a couple of years, this is a different wrap-up. To tell the truth I put this together for me, being the very worst of diarists. It meant searching through calendars, Aperture, and elsewhere. I hope it prompts me to keep a better diary. I give you: 2010 in pictures and words:

January

Albany Green, Bristol.

Analog.coop is still fresh after launching in December. We’re still a bit blown away by the response, but decide not to do client work and to make Mapalong instead. We jump through all kinds of hoops trying to make it happen, but ultimately it comes down to our friend and colleague, Chris Shiflett. He gets us going. It snows a lot in Bristol. The snow turns to ice. I slip around, occasionally grumpy, but mostly grinning like an idiot.

February

Morón, Cuba.

My family and I go to Cuba on our first ever all inclusive ‘package’ holiday. It’s a wonderful escape from winter, tempered by surreptitious trips out of the surreal, tourist-only island, to the other Cuba with an unofficial local guide. My boys love the jacuzzi, and sneaking into the gym. Z shoots his first arrow. Just after we return, he turns 4 years old. Now, he wants to go back.

March

DUMBO from the men’s loo at 10 Jay St. — home of Analog NY in Studio 612a.

I visit Chris in Brooklyn to work on Mapalong. We play football. Well, Chris plays. I cripple myself, and limp around a lot. At the same time I meet the irrepressible Cameron Koczon. We all get drunk on good beer at Beer Table. Life is good. Cameron comes up with the Brooklyn Beta name. It starts to move from idea to action.

Just before Brooklyn, a discussion about First Things First opens during a talk at BathCamp. The follow-ups become passionate with posts like this straw man argument and a vociferous rejoinder.

April and May

In the garden, at home.

The sun comes out. The garden becomes the new studio. Alan Colville and Jon Gibbins stop by as we work on Mapalong. The hunt starts for a co-working space in Bristol. I write pieces about self-promotion and reversed type. Worn out from the sudden burst, I go quiet again.

June

Mild Bunch HQ!

We find a place for our Bristol co-working studio. Mild Bunch HQ is born! I design desks for the first time. Our first co-workers are Adam Robertson, Kester Limb, Eugene Getov, and Ben Coleman. Chris and I meet again across the Atlantic; he makes a flying visit to Bristol. The gentle pressure mounts on fellow Analogger Jon Gibbins to come to Bristol, too. Something special begins. Beer Fridays have started.

Fontdeck!

Fontdeck comes out of private beta! Almost 17 months after Rich Rutter and I first talked about a web fonts service in Brighton, the site is live thanks to the hard work of Clearleft and OmniTI. Now it features thousands of fonts prepared for the Web, and many of the best type designers and foundries in the world.

The Ulster Festival programme.

For the first time in around 15 years I visit Belfast. At the invitation of the Standardistas, Chris and Nik, Elliot Stocks and I talk typography at the Ulster Festival of Art and Design. We’re working on the Brooklyn Beta branding, so talk about that with a bit of neuroscience thrown in as food for thought. Belfast truly is a wonderful place with fantastic people. It made it hard to miss Build for the second time later in the year.

June was busier than it felt. :)

July

Mild Bunch summer; Pieminister, Ginger beer, and Milk Stout.

Summer arrived in earnest. X has a blast at his school sports day. I do, too. Mild Bunch HQ is liberally dosed with shared lunches from Herbert’s bakery and Licata’s deli, and beers on balmy evenings outside The Canteen with friends. That’s all the Mild Bunch is, a group of friends with a name that made us laugh; everyone of friendly disposition is welcome!

August

8Faces and .Net magazine.

8 Faces number 1 is published and sells out in a couple of hours. I was lucky enough to be interviewed, and to sweat over trying to narrow my choices. The .Net interview was me answering a few questions thrown my way from folks on Twitter. Great fun. Elliot, Samantha Cliffe, and I had spent a great day wandering around Montpelier taking pictures in the sun earlier in the year. One of her portraits of me appeared in both magazines. Later that month, I write about Web Fonts, Dingbats, Icons, and Unicode. It’s only my fourth post of the year.

Birthday cake made by my wife, Lowri.

Sometimes, some things strip me of words. Thank you.

September

East River Sunrise from 20 stories up at the home of Jessi and Creighton of Workshop.

The whole of Analog heads to Brooklyn for a Mapalong hack week with the Fictive Kin guys. We start to show it to friends and Brooklyn studio mates like Tina (Swiss Miss) who help us heaps. It’s a frantic week. I get to spend a bit of time with my Analog friend Andrei Zmievski who I haven’t seen in the flesh since 2009. Everyone works and plays hard, and we stay in some fantastic places thanks to Cameron and AirBnB.

Cameron Koczon (front), Larry Legend (middle) and Jon Gibbins (far back with funky glove) in Studio 612a during hack week.

Just before I head to NY, Z starts big school. He looks too small to start. He’s 4. How did time pass so fast? I’m still wondering that after I get back.

October

Brooklyn Beta poster.

The whole of Analog, the Mild Bunch HQ and many others from Bristol, and as far away as Australia and India, head to New York for Brooklyn Beta! A poster whipped together by me, printed in a rush by Rik at Ripe, and transported to NY by Adam Robertson, is given as one of the souvenirs to everyone who comes.

Meanwhile, Jon Gibbins works frantically to get Mapalong ready to give BB an early glimpse of what we’re up to. Two thousand people reserve their usernames before we even go to private beta!

Brooklyn Beta!

Simon Collison giving his Analytical Design workshop on day 1.

Chris and Cameron work tirelessly. Many, many fine people lend a hand. We add some last minute touches to the site, like listing all the crew and attendees as well as the speakers. Cameron shows off Gimme Bar with an hilarious voice-over from Bedrich Rios. Alan narrates Mapalong and we introduce our mapping app to our peers and friends!

Day 2: Chris does technical fixes, Cameron tells jokes, and Cameron Moll waits with great poise for his talk to start.

It’s something we hoped, but never expected: Brooklyn Beta goes down as one of the best conferences ever in the eyes of veteran conference speakers and attendees. ‘Are you sure you’ve not done this before?’ I hear Jonathan Hoefler of Hoefler Frere-Jones ask Cameron. It makes me smile. The fact one of our sponsors asked this question in admiration of Chris and Cameron’s work meant a lot to me. I was proud of them, and grateful to everyone who helped it be something truly friendly, open, smart, and special.

Aftermath: Cameron (blurry in action, centre left) regales us at Mission Delores; Pat Lauke (left), Lisa Herod (back centre right), Nicholas Sloan (right).

The BB Flickr group has a lot of pictures and links to blog posts. Brooklyn Beta will return again in 2011!

November

Legoland, Windsor.

X turns 7. I realise he really isn’t such a toddler anymore. It took me a while even though he amazes me constantly with his vocabulary and eloquence. His birthday party ensues with a trip to Legoland on the last weekend of the season to watch fireworks and get into trouble. Fun times finding Yoda and the rest of the Star Wars posse battling each other below the Space Shuttle exhibit.

8 Faces

8 Faces number two is published after being announced at Build. Much of the month was spent juggling Mapalong work, and having a great time typesetting the selection spreads for each of the eight faces chosen by the interviewees. That, and worrying with Elliot how it might print with litho. It all turned out OK. I think.

The .Net Awards, christened the ‘nutmeg’ awards thanks to iPhone auto-correction, take place in London. I’m one of millions of judges. We use it as an excuse for a party. At the end of the month, lots of the Mild Bunch go to see Caribou at The Thekla. Good times.

December

Mapalong!

Mapalong goes into private beta! We start inviting many of the Brooklyn Beta folks, and others who’ve reserved their usernames. Lots of placemarks get added. Lots of feedback comes our way. Bug hunting starts. Next design steps start. We push frequently and add people as we go. Big things are planned for the new year!

Clove heart from Lowri.

The Mild Bunch Christmas do goes off with a bang thanks to Adam Robertson making sure it happened. Folks come from far and wide for a great party in The Big Chill Bar in Bristol. Lowri sneaks shots of Sambuca for the girls onto my tab, and we drink all the Innis and Gunn they have.

A few parties later, and the year draws to a close with a very traditional family Christmas in our house. Wood fires, music, the Christmas tree, and two small boys doing what kids do at Christmas. It’s just about perfect; a tonic to the background strife of the month, with a personal tragedy for me, and illness in my close family. Everything worked out OK. Steam-powered fairground rides, dressing up as dinosaurs, and detox follow, with a bit of reflection. New Year’s Eve probably means staying in. Babysitters are like gold dust, but I just found out we have one for tonight, so it looks like our celebration is coming early!

2011

In the new year, I’ll be mostly trying to do the best I can for my family, my colleagues, and myself. The only goals I have are to help my children be everything they can be, make Mapalong everything we wish it to be, and feel that calm, quiet sense of peace in the evening that only comes from a day well done. Other than that I’ll keep my mind open to serendipity. (…and do something about some bits of my site and the typesetting that’s bugging me after writing this. :)

If you made it this far, thank you, and here’s to you and yours in 2011; may the best of your past be the worst of your future!




Web Design as Narrative Architecture

Stories are everywhere. When they don’t exist we make up the narrative — we join the dots. We make cognitive leaps and fill in the bits of a story that are implied or missing. The same goes for websites. We make quick judgements based on a glimpse. Then we delve deeper. The narrative unfolds, or we create one as we browse.

Mark Bernstein penned Beyond Usability and Design: The Narrative Web for A List Apart in 2001. He wrote, ‘the reader’s journey through our site is a narrative experience’. I agreed wholeheartedly: Websites are narrative spaces where stories can be enacted, or emerge.

Henry Jenkins, Director of Comparative Media Studies, and Professor of Literature at MIT, wrote Game Design as Narrative Architecture. He suggested we think of game designers, ‘less as storytellers than as narrative architects’. I agree, and I think web designers are narrative architects, too. (Along with all the multitude of other roles we assume.) Much of what Henry Jenkins wrote applies to modern web design. In particular, he describes two kinds of narratives in game design that are relevant to us:

Enacted narratives are those where:

[…] the story itself may be structured around the character’s movement through space and the features of the environment may retard or accelerate that plot trajectory.

Sites like Amazon, New Adventures, or your portfolio are enacted narrative spaces: Shops or service brochures that want the audience to move through the site towards a specific set of actions like buying something or initiating contact.

Emergent narratives are those where:

[…] spaces are designed to be rich with narrative potential, enabling the story-constructing activity of players.

Sites like Flickr, Twitter, or Dribbble are emergent narrative spaces: Web applications that encourage their audience to use the tools at their disposal to tell their own story. The audience defines how they want to use the narrative space, often with surprising results.

We often build both kinds of narrative spaces. Right now, my friends and I at Analog are working on Mapalong, a new maps-based app that’s just launched into private beta. At its heart Mapalong is about telling our stories. It’s one big map with a set of tools to view the world, add places, share them, and see the places others share. The aim is to help people tell their stories. We want to use three ideas to help you do that: Space (recording places, and annotating them), data (importing stuff we create elsewhere), and time (plotting our journeys, and recording all the places, people, and memories along the way). We know that people will find novel uses for the tools in Mapalong. In fact, we want them to because it will help us refine and build better tools. We work in an agile way because that’s the only way to design an emergent narrative space. Without realising it we’ve become architects of a narrative space, and you probably are, too.

Many projects like shops or brochure sites have fixed costs and objectives. They want to guide the audience to a specific set of actions. The site needs to be an enacted narrative space. Ideally, designers would observe behaviour and iterate. Failing that, a healthy dose of empathy can serve. Every site seeks to teach, educate, or inform. So, a bit of knowledge about people’s learning styles can be useful. I once did a course in one to one and small group training with the Chartered Institute of Personnel and Development. It introduced me to Peter Honey and Alan Mumford’s model which describes four different learning styles that are useful for us to know. I paraphrase:

  1. Activists like learning as they go; getting stuck in and working it out. They enjoy the here and now, and are happy to be dominated by immediate experiences. They are open-minded, not sceptical, and this tends to make them enthusiastic about anything new.
  2. Reflectors like being guided with time to take it all in and perhaps return later. They like to stand back to ponder experiences and observe them from many different perspectives. They collect data, both first hand and from others, and prefer to think about it thoroughly before coming to a conclusion.
  3. Theorists like to understand and make logical sense of things before they leap in. They think problems through in a vertical, step-by-step logical way. They assimilate disparate facts into coherent theories.
  4. Pragmatists like practical applications of ideas, experiments, and results. They like trying out ideas, theories and techniques to see if they work in practice. They positively search out new ideas and take the first opportunity to experiment with applications.

Usually people share two or more of these qualities. The weight of each can vary depending on the context. So how might learning styles manifest themselves in web browsing behaviour?

  • Activists like to explore, learn as they go, and wander the site working it out. They need good in-context navigation to keep exploring. For example, signposts to related information are optimal for activists. They can just keep going, and going, and exploring until sated.
  • Reflectors are patient and thoughtful. They like to ponder, read, reflect, then decide. Guided tours to orientate them in emergent sites can be a great help. Saving shopping baskets for later, and remembering sessions in enacted sites can also help them.
  • Theorists want logic. Documentation. An understanding of what the site is, and what they might get from it. Clear, detailed information helps a theorist, whatever the space they’re in.
  • Pragmatists get stuck in like activists, but evaluate quickly, and test their assumptions. They are quick, and can be helped by uncluttered concise information, and contextual, logical tools.

An understanding of interactive narrative types and a bit of knowledge about learning styles can be useful concepts for us to bear in mind. I also think they warrant inclusion as part of an articulate designer’s language of web design. If Henry Jenkins is right about games designers, I think he could also be right about web designers: we are narrative architects, designing spaces where stories are told.

The original version of this article first appeared as ‘Jack A Nory’ alongside other, infinitely more excellent articles, in the New Adventures paper of January 2011. It is reproduced with the kind permission of the irrepressible Simon Collison. For a short time, the paper is still available as a PDF!

—∞—




Design Festival, The Setup, and Upcoming Posts

Wow, this has been a busy period. I’m just back from the Ampersand web typography conference in Brighton, and having a catch-up day in Mild Bunch HQ. Just before that I’ve been working flat out. First on Mapalong which was a grass-roots sponsor of Ampersand, and is going great guns. Then on an article for The Manual which is being published soon, and on 8 Faces #3 which is in progress right now. Not to mention the new talk for Ampersand which left me scratching my head and wondering if I was making any sense at all. More on that in a subsequent post.

In the meantime two previous events deserve a mention. (This is me starting more of a journalistic blog. :)

First of all, an interview with Simon Pascal Klien, the typographer and designer who’s curating the Design Festival podcast at the moment. We talked about all things web typography. Pascal cheekily left in a bit of noise from me in the prelude, and that rant pretty much sets the tone for the rest of the conversation. Thanks for your time, Pascal! If anyone reading this would care to listen in, the podcast can be downloaded or played from here:

Secondly, Daniel Bogan of The Setup sent me a few questions about my own tools. My answers are pretty clipped because of time, but you may find it interesting to compare this designer’s setup with your own:

I should note that in the meantime I’ve started writing with Writer, and discovered the great joy of keeping a journal and notes with a Midori Traveler’s Notebook. The latter is part of an ongoing search for Tools for Life. More on that, too, at some point. Here’s my current list of topics I want to write about shortly:

  • Ampersand, the aftermath
  • Marrying a FujiFilm X100
  • No-www
  • Tools for life
  • Paper versus pixels

There, I’ve written it!




Ampersand, the Aftermath

The first Ampersand web typography conference took place in Brighton last Friday. Ampersand was ace. I’m going to say that again with emphasis: Ampersand was ace! Like the Ready Brek kid from the 80s TV ads, I’m glowing with good vibes.

Imagine you’d just met some of the musicians that created the soundtrack to your life. That’s pretty much how I feel.

Nerves and all…

Photo by Ben Mitchell.

For a long, long time I’ve gazed across at the typography community with something akin to awe at the work they do. I’ve lurked quietly on the ATypI mailing list, in the Typophile forum, and behind the glass dividing my eyes from the blogs, portfolios, and galleries.

I always had a sneaking suspicion the web and type design communities had much in common: Excellence born from actual client work; techniques and skills refined by practice, not in a lab or classroom; a willingness to share and disseminate, most clearly demonstrated at Typophile and through web designers’ own blogs. The people of both professions have a very diverse set of backgrounds from graphic design all the way through to engineering, to accidentally working in a print shop. We’ve been apprenticed to our work, and Ampersand was a celebration of what we’ve achieved so far and what’s yet to come.

Of course, web design is a new profession. Type design has a history that spans hundreds of years. Nevertheless, both professions are self-actualising. Few courses exist of any real merit. There is no qualifications authority. The work from both arenas succeeds or fails based on whether it works or not.

Ampersand was the first event of its kind. Folks from both communities came together around the mutual fascination, frustration, challenge, and opportunity of web type.

Like Brooklyn Beta, the audience was as fantastic as the line up. I met folks like Yves Peters of the FontFeed, Mike Duggan of Microsoft Typography, Jason Smith, Phil Garnham, Fernando Mello, and Emanuela Conidi of Fontsmith, Veronica Burian of TypeTogether, Adam Twardoch of Fontlab and MyFonts, Nick Sherman of Webtype, Mandy Brown of A Book Apart and Typekit, and many, many others. (Sorry for stopping there, but wow, it would be a huge list.)

Rich Rutter

Rich Rutter opened the day on behalf of Clearleft and Fontdeck at the Brighton Dome. Rich and I had talked about a web typography conference before. He just went out and did it. Hats off to him, and people like Sophie Barrett at Clearleft who helped make the day run so smoothly.

Others have written comprehensive, insightful summaries of the day and the talks. Much better than I could, sitting there on the day, rapt, taking no notes. What follows are a few snippets my memory threw out when prodded.

Vincent Connare

Who knew the original letterforms for Comic Sans were inspired by a copy of The Watchmen Vincent Connare had in his office? Or that Vincent, who also designed Trebuchet, considers himself an engineer rather than a type designer, and is working at the moment on the Ubuntu fonts with colleagues at Dalton Maag?

Jason Santa Maria declared himself a type nerd, and gave a supremely detailed talk about selecting, setting, and understanding web type. Wonderful stuff.

Jason Santa Maria

Jonathan Hoefler talked in rapid, articulate, and precise terms about the work behind the upcoming release of pretty much all of H&FJ’s typefaces as web fonts. (Hooray!) He clearly and wonderfully explained how they took the idea behind their typefaces, and moved them through a design process to produce a final form for a specific purpose. In this case, the web, as a distinct and different environment from print.

Jonathan Hoefler

Photo by Sean Johnson.

I spoke between Jason and Jonathan. Gulp. After staying up until 4am the night before, anxiously working on slides, I was carried along by the privilege and joy of being there, hopefully without too much mumbling or squinting with bleary eyes.

After lunch, David Berlow continued the story of web fonts, taking us on a journey through his own trials and tribulations at Font Bureau when reproducing typefaces for the crude medium of the web. His dry, droll, richly-flavoured delivery was a humorous counterpoint to some controversial asides.

David Berlow

Photo by Jeremy Keith.

John Daggett of Mozilla, editor of the CSS3 Fonts Module, talked with great empathy for web designers about the amazing typographic advances we’re about to see in browsers.

Tim Brown of Typekit followed. Tim calmly and thoroughly advocated the extension of modular scales to all aspects of a web interface, taking values from the body type and building all elements with those values as the common denominator.

Finally, Mark Boulton wrapped up the day brilliantly, describing the designer’s role as the mitigator of entropy, reversing the natural trend for things to move from order to chaos, and a theme he’s exploring at the moment: designing from the content out.

Mark Boulton

The tone of the day was fun, thoughtful, articulate, and exacting. All the talks were a mix of anecdotal and observational humour, type nerdery, and most of all an overwhelming commitment to excellence in web typography. It was a journey in itself. Decades of experience from plate and press, screen, and web were being distilled into 45-minute presentations. I loved it.

As always, one of the most enjoyable bits for me was the hallway track. I talked to heaps of people both in the pre- and after-party, and in between the talks on the day itself. I heard stories, ideas, and opinions from print designers, web designers, type designers, font developers, and writers. We talked late into the night. We talked more the next day.

Now the talking has paused for a while, my thoughts are manifold. I can honestly say, I’ve never been so filled with positivity about where we are, and where we’re going. Web typography is here, it works, it’s better all the time, and one day web and type designers everywhere will wonder, perplexed, as they try to imagine what the web was like before.

Here’s to another Ampersand next year! I’m now going to see if Rich needs any encouragement to do it again. I’m guessing not, but if he does, I aim to provide it, vigorously. I hope I see you there!

Furthermore

Last but not least, did I mention that Rich Rutter, Mark Boulton, and I are writing a book? We are! More on that another time, but until then, follow @webtypography for intermittent updates.




We, Who Are Web Designers

In 2003, my wife Lowri and I went to a christening party. We were friends of the hosts but we knew almost no-one else there. Sitting next to me was a thirty-something woman and her husband, both dressed in the corporate ‘smart casual’ uniform: Jersey, knitwear, and ready-faded jeans for her, formal shoes and tucked-in formal shirt for him (plus the jeans of course; that’s the casual bit). Both appeared polite, neutral, and neat in every respect.

I smiled and said hello, and asked how they knew our hosts. The conversation stalled pretty quickly, the way all conversations will when only one participant is engaged. I persevered, asked about the children they had mentioned, trying to be a good friend to our hosts by being friendly to other guests. It must have prompted her to reciprocate. With reluctant interest she asked the default question: ‘What do you do?’ I paused, uncertain for a second. ‘I’m a web designer’ I managed after a bit of nervous confusion at what exactly it was that I did. Her face managed to drop even as she smiled condescendingly. ‘Oh. White backgrounds!’ she replied with a mixture of scorn and delight. I paused. ‘Much of the time’, I nodded with an attempt at a self-deprecating smile, trying to maintain the camaraderie of the occasion. ‘What do you do?’ I asked, curious to see where her dismissal was coming from. ‘I’m the creative director for … agency’ she said smugly, overbearingly confident in the knowledge that she had a trump card, and had played it. The conversation was over.

I’d like to say her reaction didn’t matter to me, but it did. It stung to be regarded so disdainfully by someone who I would naturally have considered a colleague. I thought to try and explain. To mention how I started in print, too. To find out why she had such little respect for web design, but that was me wanting to be understood. I already knew why. Anything I said would sound defensive. She may have been rude, but at least she was honest.

I am a web designer. I don’t concentrate on the party venue, food, music, guest list, or entertainment alone, but on it all: on the feeling people enter with and walk away remembering. That’s my job. It’s probably yours too.

I’m self-actualised, without the stamp of approval from any guild, curriculum authority, or academic institution. I’m web taught. Colleague taught. Empirically taught. Tempered by over fifteen years of failed experiments on late nights with misbehaving browsers. I learnt how to create venues because none existed. I learnt what music to play for the people I wanted at the event, and how to keep them entertained when they arrived. I empathised, failed, re-empathised, and did it again. I make sites that work. That’s my certificate. That’s my validation.

I try, just like you, to imbue my practice with an abiding sense of responsibility for the universality of the Web as Tim Berners-Lee described it. After all, it’s that very universality that’s allowed our profession and the Web to thrive. From Mosaic shipping with <img> tag support in 1993, to the founding of the W3C in 1994, to the Web Standards Project in 1998, and the CSS Zen Garden in 2003, those who care have been instrumental in shaping the Web. Web designers included. In more recent times I look to the web type revolution, driven and curated by web designers, developers, and the typography community. Again, we’re teaching ourselves. The venues are open to all, and getting more amazing by the day.

Apart from the sites we’ve built, all the best peripheral resources that support our work are made by us. We’ve contributed vast amounts of code to our collective toolkit. We’ve created inspirational conferences like Brooklyn Beta, New Adventures, Web Directions, Build, An Event Apart, dConstruct, and Webstock. As a group, we’ve produced, written-for, and supported forward-thinking magazines like A List Apart, 8 Faces, Smashing Mag, and The Manual. We’ve written the books that distill our knowledge either independently or with publishers from our own community like Five Simple Steps and A Book Apart. We’ve created services and tools like jQuery, Fontdeck, Typekit, Hashgrid, Teuxdeux, and Firebug. That’s just a sample. There’s so many I haven’t mentioned. We did these things. What an extraordinary industry.

I know I flushed with anger and embarrassment that day at the christening party. Afterwards, I started to look a little deeper into what I do. I started to ask what exactly it means to be a web designer. I started to realise how extraordinary our community is. How extraordinary this profession is that we’ve created. How good the work is that we do. How delightful it is when it does work; for audiences, clients, and us. How fantastic it is that I help build the Web. Long may that feeling last. May it never go away. There’s so much still to learn, create, and make. This is our party. Hi, I’m Jon; my friends and I are making Mapalong, and I’m a web designer.




Anakin

I’m pleased to be able to say that Analog is joining forces with Fictive Kin! We already work together on Brooklyn Beta and other projects. We share the same ethics and ambitions. We have fun together. We’re good friends, and often old friends, too. They asked, we said yes. We had a beer. That’s pretty much how it went.

The Analog adventure that started in 2009 continues, just bigger, and hopefully better, with more fun, inspiration, and collaboration. I like to think of it like assembling Dai-X from Star Fleet, but Voltron may serve if that’s more familiar to you.

I think Cameron came up with the word ‘Anakin’ from our two brands in 2010. It’s been used by us ever since whenever we get together to work or celebrate. It made me smile at the time, and even more so now, just like the echoes of the Analog identity in the lovely new Fictive Kin logo.

If you fancy keeping an eye on what we’re up to, follow @fictivekin. There’s more on the Fictive Kin blog, and in a post by Chris. Thanks for reading. Onwards!




Auphonic Leveler 1.8 and Auphonic Multitrack 1.4 Updates

Today we released free updates for the Auphonic Leveler Batch Processor and the Auphonic Multitrack Processor with many algorithm improvements and bug fixes for Mac and Windows.

Changelog

  • Linear Filtering Algorithms to avoid Asymmetric Waveforms:
    New zero-phase Adaptive Filtering Algorithms to avoid asymmetric waveforms.
    In asymmetric waveforms, the positive and negative amplitude values are disproportionate - please see Asymmetric Waveforms: Should You Be Concerned?
    Asymmetrical waveforms are quite natural and not necessarily a problem. They are particularly common in recordings of speech and vocals, and can be caused by low-end filtering. However, they limit the amount of gain that can be safely applied without introducing distortion or clipping due to aggressive limiting.
  • Noise Reduction Improvements:
    New and improved noise profile estimation algorithms and bug fixes for parallel Noise Reduction Algorithms.
  • Processing Finished Notification on Mac:
    A system notification (including a short glass sound) is now displayed on Mac OS when the Auphonic Leveler or Auphonic Multitrack has finished processing - thanks to Timo Hetzel.
  • Improved Dithering:
    Improved dithering algorithms - using SoX - if a bit-depth reduction is necessary during file export.
  • Auphonic Multitrack Fixes:
    Fixes for ducking and background tracks and for very short music tracks.
  • New Desktop Apps Documentation:
    The documentation of our desktop apps is now integrated in our new help system:
    see Auphonic Leveler Batch Processor and Auphonic Multitrack Processor.
  • Bug Fixes and Audio Algorithm Improvements:
    This release also includes many small bug fixes and all audio algorithms come with improvements and updated classifiers using the data from our Web Service.

About the Auphonic Desktop Apps

We offer two desktop programs which include our audio algorithms only. The algorithms will be computed offline on your device and are exactly the same as implemented in our Web Service.

The Auphonic Leveler Batch Processor is a batch audio file processor and includes all our (Singletrack) Audio Post Production Algorithms. It can process multiple productions at once.

Auphonic Multitrack includes our Multitrack Post Production Algorithms and requires multiple parallel input audio tracks, which will be analyzed and processed individually as well as combined to create one final mixdown.

Upgrade now

Everyone is encouraged to download the latest binaries:

Please let us know if you have any questions or feedback!






Facebook Live Streaming and Audio/Video Hosting connected to Auphonic

Facebook is not only a social media giant; the company also provides valuable tools for broadcasting. Today we release a connection to Facebook, which allows you to use the Facebook tools for video/audio production and publishing within Auphonic and our connected services.

The following workflows are possible with Facebook and Auphonic:
  • Use Facebook for live streaming, then import, process and distribute the audio/video with Auphonic.
  • Post your Auphonic audio or video productions directly to the news feed of your Facebook Page or User.
  • Use Facebook as a general media hosting service and share the link or embed the audio/video on any webpage (also visible to non-Facebook users).

Connect to Facebook

First you have to connect to a Facebook account at our External Services Page: click on the "Facebook" button.

Select if you want to connect to your personal Facebook User or to a Facebook Page:

It is always possible to remove or edit the connection in your Facebook Settings (Tab Business Integrations).

Import (Live) Videos from Facebook to Auphonic

Facebook Live is an easy (and free) way to stream live videos:

We implemented an interface to use Facebook as an Incoming External Service. Please select a (live or non-live) video from your Facebook Page/User as the source of a production and then process it with Auphonic:

This workflow allows you to use Facebook for live streaming, import and process the audio/video with Auphonic, then publish a podcast and video version of your live video to any of our connected services.

Export from Auphonic to Facebook

Similar to YouTube, it is possible to use Facebook for media file hosting.
Please add your Facebook Page/User as an External Service in your Productions or Presets to upload the Auphonic results directly to Facebook:

Options for the Facebook export:
  • Distribution Settings
    • Post to News Feed: The exported video is posted directly to your news feed / timeline.
    • Exclude from News Feed: The exported video is visible in the videos tab of your Facebook Page/User (see for example Auphonic's video tab), but it is not posted to your news feed (you can do that later if you want).
    • Secret: Only you can see the exported video, it is not shown in the Facebook video tab and it is not posted to your news feed (you can do that later if you want).
  • Embeddable
    Choose if the exported video should be embeddable in third-party websites.

It is always possible to change the distribution/privacy and embeddable options later directly on Facebook. For example, you can export a video to Facebook as Secret and publish it to your news feed whenever you want.


If your production is audio-only, we automatically generate a video track from the Cover Image and any Chapter Images.
Alternatively you can select an Audiogram Output File, if you want to add an Audiogram (audio waveform visualization) to your Facebook video - for details please see Auphonic Audiogram Generator.

Auphonic Title and Description metadata fields are exported to Facebook as well.
If you add Speech Recognition to your production, we create an SRT file with the speech recognition results and add it to your Facebook video as captions.
See the example below.

Facebook Video Hosting Example with Audiogram and Automatic Captions

Facebook can be used as a general video hosting service: even if you export videos as Secret, you will get a direct link to the video which can be shared or embedded in any third-party websites. Users without a Facebook account are also able to view these videos.

In the example below, we automatically generate an Audiogram Video for an audio-only production, use our integrated Speech Recognition system to create captions and export the video as Secret to Facebook.
Afterwards it can be embedded directly into this blog post (enable Captions if they don't show up by default) - for details please see How to embed a video:
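
As an illustration, a standard Facebook iframe embed for the example video linked just below looks roughly like this (a sketch only - width, height, and styling are placeholders; Facebook's embed dialog generates the exact code for you):

    <!-- Sketch of a standard Facebook iframe embed for the example video below.
         Width/height are placeholders; use the code from Facebook's embed dialog. -->
    <iframe
      src="https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fauphonic%2Fvideos%2F1687244844638091%2F&amp;show_text=false"
      width="560" height="315"
      style="border: none; overflow: hidden"
      allowfullscreen="true"></iframe>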

It is also possible to just use the generated result URL from Auphonic to share the link to your video (also visible to non-Facebook users):
https://www.facebook.com/auphonic/videos/1687244844638091/

Important Note:
Facebook needs some time to process an exported video (up to a few minutes) and the direct video link won't work before the processing is finished - please try again a bit later!
On Facebook Pages, you can see the processing progress in your Video Library.

Conclusion

Facebook has many broadcasting tools to offer and is a perfect addition to Auphonic.
Both systems and our other external services can be used to create automated processing and publishing workflows. Furthermore, the export and import to/from Facebook is also fully supported in the Auphonic API.

Please contact us if you have any questions or further ideas!




Auphonic Audio Inspector Release

At the Subscribe 9 Conference, we presented the first version of our new Audio Inspector:
The Auphonic Audio Inspector is shown on the status page of a finished production and displays details about what our algorithms are changing in audio files.

A screenshot of the Auphonic Audio Inspector on the status page of a finished Multitrack Production.
Please click on the screenshot to see it in full resolution!

It is possible to zoom and scroll within audio waveforms, and the Audio Inspector can be used to manually check production results and input files.

In this blog post, we will discuss the usage and all current visualizations of the Inspector.
If you just want to try the Auphonic Audio Inspector yourself, take a look at this Multitrack Audio Inspector Example.

Inspector Usage

Control bar of the Audio Inspector with scrollbar, play button, current playback position and length, button to show input audio file(s), zoom in/out, toggle legend and a button to switch to fullscreen mode.

Seek in Audio Files
Click or tap inside the waveform to seek in files. The red playhead will show the current audio position.
Zoom In/Out
Use the zoom buttons ([+] and [-]), the mouse wheel or zoom gestures on touch devices to zoom in/out the audio waveform.
Scroll Waveforms
If zoomed in, use the scrollbar or drag the audio waveform directly (with your mouse or on touch devices).
Show Legend
Click the [?] button to show or hide the Legend, which describes details about the visualizations of the audio waveform.
Show Stats
Use the Show Stats link to display Audio Processing Statistics of a production.
Show Input Track(s)
Click Show Input to show or hide input track(s) of a production: now you can see and listen to input and output files for a detailed comparison. Please click directly on the waveform to switch/unmute a track - muted tracks are grayed out slightly:

Showing four input tracks and the Auphonic output of a multitrack production.

Please click on the fullscreen button (bottom right) to switch to fullscreen mode.
Now the audio tracks use all available screen space to see all waveform details:

A multitrack production with output and all input tracks in fullscreen mode.
Please click on the screenshot to see it in full resolution.

In fullscreen mode, it’s also possible to control playback and zooming with keyboard shortcuts:
Press [Space] to start/pause playback, use [+] to zoom in and [-] to zoom out.

Singletrack Algorithms Inspector

First, we discuss the analysis data of our Singletrack Post Production Algorithms.

The audio levels of output and input files, measured according to the ITU-R BS.1770 specification, are displayed directly as the audio waveform. Click on Show Input to see the input and output file. Only one file is played at a time, click directly on the Input or Output track to unmute a file for playback:

Singletrack Production with opened input file.
See the first Leveler Audio Example to try the audio inspector yourself.

Waveform Segments: Music and Speech (gold, blue)
Music/Speech segments are displayed directly in the audio waveform: Music segments are plotted in gold/yellow, speech segments in blue (or light/dark blue).
Waveform Segments: Leveler High/No Amplification (dark, light blue)
Speech segments can be displayed in normal, dark or light blue: Dark blue means that the input signal was very quiet and contains speech, therefore the Adaptive Leveler has to use a high amplification value in this segment.
In light blue regions, the input signal was very quiet as well, but our classifiers decided that the signal should not be amplified (breathing, noise, background sounds, etc.).

Yellow/orange background segments display leveler fades.

Background Segments: Leveler Fade Up/Down (yellow, orange)
If the volume of an input file changes quickly, the Adaptive Leveler's volume curve has to increase/decrease quickly as well (a fade), and such fades should be placed in speech pauses. If fades are too slow, or happen during active speech, one will hear pumping speech artifacts.
Exact fade regions are plotted as yellow (fade up, volume increase) and orange (fade down, volume decrease) background segments in the audio inspector.

Horizontal red lines display noise and hum reduction profiles.

Horizontal Lines: Noise and Hum Reduction Profiles (red)
Our Noise and Hiss Reduction and Hum Reduction algorithms segment the audio file in regions with different background noise characteristics, which are displayed as red horizontal lines in the audio inspector (top lines for noise reduction, bottom lines for hum reduction).
Then a noise print is extracted in each region and a classifier decides if and how much noise reduction is necessary - this is plotted as a value in dB below the top red line.
The hum base frequency (50Hz or 60Hz) and the strength of all its partials are also classified in each region. The value in Hz above the bottom red line indicates the detected base frequency; if no hum reduction is necessary, no red line is shown.

You can try the singletrack audio inspector yourself with our Leveler, Noise Reduction and Hum Reduction audio examples.

Multitrack Algorithms Inspector

If our Multitrack Post Production Algorithms are used, additional analysis data is shown in the audio inspector.

The audio levels of the output and all input tracks are measured according to the ITU-R BS.1770 specification and are displayed directly as the audio waveform. Click on Show Input to see all the input files with track labels and the output file. Only one file is played at a time, click directly into the track to unmute a file for playback:

Input Tracks: Waveform Segments, Background Segments and Horizontal Lines
Input tracks are displayed below the output file including their track names. The same data as in our Singletrack Algorithms Inspector is calculated and plotted separately in each input track:
Output Waveform Segments: Multiple Speakers and Music
Each speaker is plotted in a separate, blue-like color - in the example above we have 3 speakers (normal, light and dark blue) and you can see directly in the waveform when and which speaker is active.
Audio from music input tracks is always plotted in gold/yellow in the output waveform; please try not to mix music and speech parts in music tracks (see also Multitrack Best Practice)!

You can try the multitrack audio inspector yourself with our Multitrack Audio Inspector Example or our general Multitrack Audio Examples.

Ducking, Background and Foreground Segments

Music tracks can be set to Ducking, Foreground, Background or Auto - for more details please see Automatic Ducking, Foreground and Background Tracks.

Ducking Segments (light, dark orange)
In Ducking, the level of a music track is reduced if one of the speakers is active, which is plotted as a dark orange background segment in the output track.
Foreground music parts, where no speaker is active and the music track volume is not reduced, are displayed as light orange background segments in the output track.
Background Music Segments (dark orange background)
Here the whole music track is set to Background and won’t be amplified when speakers are inactive.
Background music parts are plotted as dark orange background segments in the output track.
Foreground Music Segments (light orange background)
Here the whole music track is set to Foreground and its level won’t be reduced when speakers are active.
Foreground music parts are plotted as light orange background segments in the output track.

You can try the ducking/background/foreground audio inspector yourself: Fore/Background/Ducking Audio Examples.

Audio Search, Chapters Marks and Video

Audio Search and Transcriptions
If our Automatic Speech Recognition Integration is used, a time-aligned transcription text will be shown above the waveform. You can use the search field to search and seek directly in the audio file.
See our Speech Recognition Audio Examples to try it yourself.
Chapters Marks
Chapter Mark start times are displayed in the audio waveform as black vertical lines.
The current chapter title is written above the waveform - see “This is Chapter 2” in the screenshot above.

A video production with output waveform, input waveform and transcriptions in fullscreen mode.
Please click on the screenshot to see it in full resolution.

Video Display
If you add a Video Format or Audiogram Output File to your production, the audio inspector will also show a separate video track in addition to the audio output and input tracks. The video playback will be synced to the audio of output and input tracks.

Supported Audio Formats

We use the native HTML5 audio element for playback and the aurora.js javascript audio decoders to support all common audio formats:

WAV, MP3, AAC/M4A and Opus
These formats are supported in all major browsers: Firefox, Chrome, Safari, Edge, iOS Safari and Chrome for Android.
FLAC
FLAC is supported in Firefox, Chrome, Edge and Chrome for Android - see FLAC audio format.
In Safari and iOS Safari, we use aurora.js to directly decode FLAC files in javascript, which works but uses much more CPU compared to native decoding!
ALAC
ALAC is not supported by any browser so far, therefore we use aurora.js to directly decode ALAC files in javascript. This works but uses much more CPU compared to native decoding!
Ogg Vorbis
Only supported by Firefox, Chrome and Chrome for Android - for details please see Ogg Vorbis audio format.

We suggest using a recent Firefox or Chrome browser for best performance.
Decoding FLAC and ALAC files also works in Safari and iOS with the help of aurora.js, but javascript decoders need a lot of CPU and they sometimes have problems with exact scrolling and seeking.

Please see our blog post Audio File Formats and Bitrates for Podcasts for more details about audio formats.

Mobile Audio Inspector

Multiple responsive layouts were created to optimize the screen space usage on Android and iOS devices, so that the audio inspector is fully usable on mobile devices as well: tap into the waveform to set the playhead location, scroll horizontally to scroll waveforms, scroll vertically to scroll between tracks, use zoom gestures to zoom in/out, etc.

Unfortunately the fullscreen mode is not available on iOS devices (thanks to Apple), but it works on Android and is a really great way to inspect everything using all the available screen space:

Audio inspector in horizontal fullscreen mode on Android.

Conclusion

Try the Auphonic Audio Inspector yourself: take a look at our Audio Example Page or play with the Multitrack Audio Inspector Example.

The Audio Inspector will be shown in all productions which are created in our Web Service.
It can be used to manually check production results and input files, and to send us detailed feedback about audio processing results.

Please let us know if you have any feedback or questions - more visualizations will be added in the future!







n

Auphonic Add-ons for Adobe Audition and Adobe Premiere

The new Auphonic Audio Post Production Add-ons for Adobe allow you to use the Auphonic Web Service directly within Adobe Audition and Adobe Premiere (Mac and Windows):

Audition Multitrack Editor with the Auphonic Audio Post Production Add-on.
The Auphonic Add-on can be embedded directly inside the Adobe user interface.


It is possible to export tracks/projects from Audition/Premiere and process them with the Auphonic audio post production algorithms (loudness, leveling, noise reduction - see Audio Examples), use our Encoding/Tagging, Chapter Marks, Speech Recognition and trigger Publishing with one click.
Furthermore, you can import the result file of an Auphonic Production into Audition/Premiere.


Download the Auphonic Audio Post Production Add-ons for Adobe:

Auphonic Add-on for Adobe Audition

Audition Waveform Editor with the Auphonic Audio Post Production Add-on.
Metadata, Marker times and titles will be exported to Auphonic as well.

Export from Audition to Auphonic

You can upload the audio of your current active document (a Multitrack Session or a Single Audio File) to our Web Service.
In case of a Multitrack Session, a mixdown will be computed automatically to create a Singletrack Production in our Web Service.
Unfortunately, Audition does not allow exporting the individual tracks of a session, which we would otherwise use to create Multitrack Productions.

Metadata and Markers
All metadata (see tab Metadata in Audition) and markers (see tab Marker in Audition and the Waveform Editor Screenshot) will be exported to Auphonic as well.
Marker times and titles are used to create Chapter Marks (Enhanced Podcasts) in your Auphonic output files.
Auphonic Presets
You can optionally choose an Auphonic Preset to use previously stored settings for your production.
Start Production and Upload & Edit Buttons
Click Upload & Edit to upload your audio and create a new Production for further editing. After the upload, a web browser will be started to edit/adjust the production and start it manually.
Click Start Production to upload your audio, create a new Production and start it directly without further editing. A web browser will be started to see the results of your production.
Audio Compression
Uncompressed Multitrack Sessions or audio files in Audition (WAV, AIFF, RAW, etc.) will be compressed automatically with lossless codecs to speed up the upload time without a loss in audio quality.
FLAC is used as the lossless codec on Windows and Mac OS (>= 10.13); older Mac OS systems (< 10.13) do not support FLAC and use ALAC instead.

Import Auphonic Productions in Audition

To import the result of an Auphonic Production into Audition, choose the corresponding production and click Import.
The result file will be downloaded from the Auphonic servers and can be used within Audition. If the production contains multiple Output File Formats, the output file with the highest bitrate (or uncompressed/lossless if available) will be chosen.

Auphonic Add-on for Adobe Premiere

Premiere Video Editor with the Auphonic Audio Post Production Add-on.
The Auphonic Add-on can be embedded directly inside the Adobe Premiere user interface.

Export from Premiere to Auphonic

You can upload the audio of your current Active Sequence in Premiere to our Web Service.

We will automatically create an audio-only mixdown of all enabled audio tracks in your current Active Sequence.
Video/Image tracks are ignored: no video will be rendered or uploaded to Auphonic!
If you want to export a specific audio track, please just mute the other tracks.

Start Production and Upload & Edit Buttons
Click Upload & Edit to upload your audio and create a new Production for further editing. After the upload, a web browser will be started to edit/adjust the production and start it manually.
Click Start Production to upload your audio, create a new Production and start it directly without further editing. A web browser will be started to see the results of your production.
Auphonic Presets
You can optionally choose an Auphonic Preset to use previously stored settings for your production.
Chapter Markers
Chapter Markers in Premiere (not all the other marker types!) will be exported to Auphonic as well and are used to create Chapter Marks (Enhanced Podcasts) in your Auphonic output files.
Audio Compression
The mixdown of your Active Sequence in Premiere will be compressed automatically with lossless codecs to speed up the upload time without a loss in audio quality.
FLAC is used as the lossless codec on Windows and Mac OS (>= 10.13); older Mac OS systems (< 10.13) do not support FLAC and use ALAC instead.

Import Auphonic Productions in Premiere

To import the result of an Auphonic Production into Premiere, choose the corresponding production and click Import.
The result file will be downloaded from the Auphonic servers and can be used within Premiere. If the production contains multiple Output File Formats, the output file with the highest bitrate (or uncompressed/lossless if available) will be chosen.

Installation

Install our Add-ons for Audition and Premiere directly on the Adobe Add-ons website:

Auphonic Audio Post Production for Adobe Audition:
https://exchange.adobe.com/addons/products/20433

Auphonic Audio Post Production for Adobe Premiere:
https://exchange.adobe.com/addons/products/20429

The installation requires the Adobe Creative Cloud desktop application and might take a few minutes. Please also try to restart Audition/Premiere if the installation does not work (on Windows it was once even necessary to restart the computer to trigger the installation).


After the installation, you can start our Add-ons directly in Audition/Premiere:
navigate to Window -> Extensions and click Auphonic Post Production.

Enjoy

Thanks a lot to Durin Gleaves and Charles Van Winkle from Adobe for their great support!

Please let us know if you have any questions or feedback!







n

New Auphonic Privacy Policy and GDPR Compliance

The new General Data Protection Regulation (GDPR) of the European Union comes into force on May 25th, 2018. We used this opportunity to rework many of our internal data processing structures, remove unnecessary trackers, and apply this strict and transparent regulation to all our customers worldwide.

Image from pixabay.com.

At Auphonic we store as little personal information as possible about your usage and production data.
Here are a few human-readable excerpts from our privacy policy about which information we collect, how we process it, and how long and where we store it - for more details please see our full Privacy Policy.

Information that we collect

  • Your email address when you create an account.
  • Your files, content, configuration parameters and other information, including your photos, audio or video files, production settings, metadata and emails.
  • Your tokens or authentication information if you choose to connect to any External services.
  • Your subscription plan, credits purchases and production billing history associated with your account, where applicable.
  • Your interactions with us, whether by email, on our blog or on our social media platforms.

We do not process any special categories of data (also commonly referred to as “sensitive personal data”).

How we use and process your Data

  • To authenticate you when you log on to your account.
  • To run your Productions, such that Auphonic can create new media files from your Content according to your instructions.
  • To improve our audio processing algorithms. For this purpose, you agree that your Content may be viewed and/or listened to by an Auphonic employee or any person contracted by Auphonic to work on our audio processing algorithms.
  • To connect your Auphonic account to an External service according to your instructions.
  • To develop, improve and optimize the contents, screen layouts and features of our Services.
  • To follow up on any question and request for assistance or information.

When using our Service, you fully retain any rights that you have with regards to your Content, including copyright.

How long we store your Information

Your Productions and any associated audio or video files will be permanently deleted from our servers, including all their metadata and possible data from external services, after 21 days (7 days for video productions).
We will, however, keep billing metadata associated with your Productions in an internal database (how many hours of audio you processed).

Also, we might store selected audio and/or video files (or excerpts thereof) from your Content in an internal storage space for the purpose of improving our audio processing algorithms.

Other information like Presets, connected External services, Account settings etc. will be stored until you delete them or when your account is deleted.

Where we store your Data

All data that we collect from you is stored on secure servers in the European Economic Area (in Germany).

More Information and Contact

For more information please read our full Privacy Policy.

Please do not hesitate to contact us regarding any matter relating to our privacy policy and GDPR compliance!







n

Codec2: a whole Podcast on a Floppy Disk

In a previous blog post we talked about the Opus codec, which offers very low bitrates. Another codec seeking to achieve even lower bitrates is Codec 2.

Codec 2 is designed for use with speech only, and although the bitrates are impressive, the results aren’t as clear as Opus, as you can hear in the following audio examples. However, there is some interesting work being done combining Codec 2 with neural networks (WaveNets) that is yielding great results.

Layers of a WaveNet neural network.

Background

Codec 2 is an open source codec designed for speech, and aims for compression rates between 700bps and 3200bps (bits per second).

The man behind it, David Rowe, is an electronic engineer currently living in South Australia. He started the project in September 2009, with the main aim of improving low-cost radio communication for people living in remote areas of the world. With this in mind, he set out to develop a codec that would significantly reduce file sizes and the bandwidth required when streaming.

Another motivation, according to David, was to be free from patented technologies used by closed source codecs, which he believes “require expensive and awkward licenses and are stifling innovation”. His belief is that this work can be done without requiring the use of patent protected codecs, so all his work is open source.

Potential Applications

Rowe’s envisioned applications include VOIP trunking, voice over low bandwidth HF/VHF digital radio (especially for amateur radio, to avoid issues with the use of proprietary codecs), and developing-world and remote-area communications, including military, police and emergency services.

Here at Auphonic, we’re interested in its potential for longer podcasts, presentations and audiobooks, where it would keep storage requirements low and minimize the effect of bad network connections.

How it Works

To achieve the low rates sought, speech has to be reduced to the smallest possible amount of data, which means that the amount of redundant information transmitted has to be minimized.

To do this, Codec 2 uses harmonic sinusoidal speech coding. This splits the speech into 10-30ms segments, called frames. Each frame is then analysed for the fundamental frequency (or pitch) and the number of harmonics that fit into a 4kHz bandwidth. Further, for each of the harmonics within the 4kHz range, the amplitude and phase are recorded.

This information is then coded, and the decoder reconstructs the audio based on this data.
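
To make the idea more concrete, here is a toy sketch (in Python, using numpy) of harmonic sinusoidal analysis for a single frame. This is not the actual Codec 2 implementation, just an illustration of the kind of parameters (pitch, harmonic amplitudes and phases) such a coder extracts and then quantizes:

import numpy as np

def analyse_frame(frame, sample_rate=8000):
    # Window the frame and compute its spectrum
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Crude pitch estimate: strongest spectral peak between 50 Hz and 400 Hz
    voiced = (freqs >= 50) & (freqs <= 400)
    pitch = freqs[voiced][np.argmax(np.abs(spectrum[voiced]))]

    # Record amplitude and phase of each harmonic below 4 kHz
    resolution = sample_rate / len(frame)
    harmonics = []
    k = 1
    while k * pitch < 4000:
        idx = int(round(k * pitch / resolution))
        harmonics.append((k * pitch, abs(spectrum[idx]), np.angle(spectrum[idx])))
        k += 1
    return pitch, harmonics  # roughly the data a sinusoidal coder quantizes

# Example: one 20 ms frame of a synthetic 150 Hz "voice" sampled at 8 kHz
t = np.arange(int(0.02 * 8000)) / 8000.0
frame = sum(np.sin(2 * np.pi * 150 * n * t) / n for n in range(1, 10))
pitch, harmonics = analyse_frame(frame)
print(pitch, len(harmonics))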

Codec 2 Block diagrams - Encoder (left) and Decoder (right)
Figure from Rowetel.

Audio Examples and Comparison with other Codecs

Whilst it all sounds great in theory, how does the reality match up? Let’s have a listen.

Here is a short wav audio file:

intro-orig.wav - 1.3 MB (download):

Applying Codec 2 (without the WaveNet decoder) at the different rates available, 3200bps, 2400bps, 1600bps, 1200bps and 700bps, we get:

3200bps (download):
2400bps (download):
1600bps (download):
1200bps (download):
700bps (download):

These examples show significantly reduced file sizes.
To put that information more meaningfully, here is how much storage you would need for an hour of audio (a quick conversion is sketched after the list):

  • At 3200bps, 1 hour of audio requires only 1.37MB (this would fit on one old 3½-inch floppy disk!)
  • A rate of 2400bps equates to 1.03MB/h
  • A rate of 1600bps equates to 0.68MB/h (Or approximately 2 hours of audio on one floppy disk!)
  • A rate of 1200bps equates to 0.51MB/h
  • A rate of 700bps equates to 0.3MB/h
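
The conversion itself is simple arithmetic; as a quick sanity check in Python:

# Rough storage needed for one hour of audio at a given bitrate
def megabytes_per_hour(bitrate_bps, seconds=3600):
    return bitrate_bps * seconds / 8 / (1024 * 1024)  # bits -> bytes -> MiB

for bps in (3200, 2400, 1600, 1200, 700):
    print(f"{bps} bps -> {megabytes_per_hour(bps):.2f} MB per hour")
# 3200 bps -> 1.37 MB per hour ... 700 bps -> 0.30 MB per hour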

So great compression, but the result is clearly not natural sounding.

As a comparison, here is the same audio as an 8kb/s MP3:

MP3 at 8 kb/s - 23kb file size (download):

The file size is significantly larger than with Codec 2 and the quality is arguably still not usable. You can clearly hear what is sometimes called sizzle - the weird metallic sounds you hear on low quality MP3s.

There is a final codec which is worth comparing, one that seems to capture the two ideals we want, usable quality at low bitrates: Opus.
Because of its convincing low-bitrate performance, Auphonic already offers Opus encoding all the way down to 6 kbps, the lowest bitrate that Opus supports.

Comparing Opus at this 6 kbps rate to the 8kbps MP3 shows a significant improvement - although slightly muffled, it still sounds natural:

Opus at 6kbps (download):

Returning to Codec 2, and purely as a bit of fun, here are some samples of Codec 2 on music! (Note that Codec 2 is not designed for music, it was only ever conceived for use on speech).

Original file (download):
As an 8kbps MP3 (download):

I personally couldn’t listen to the MP3 at this rate, so let’s listen to what Codec 2 does!

Codec 2 at different bitrates:

3200bps (download):
2400bps (download):
1600bps (download):
1200bps (download):
700bps (download):

As you can hear, it is not suitable for this application at all!

Codec 2 and WaveNet

As we have heard, despite the impressive bitrates achieved, the end result is not very natural sounding.
However, where it starts to get more interesting is the work done by W. Bastiaan Kleijn and colleagues (published via the Cornell University Library's arXiv). They have been using Codec 2 running at 2400bps on the encoding side, but replaced the Codec 2 decoder with a WaveNet deep learning generative model (for more information see the paper Wavenet based low rate speech coding).

Here are some samples from the authors:

Male example:
  • Original File
  • Codec 2
  • With WaveNet Decoder
Female example:
  • Original File
  • Codec 2
  • With WaveNet Decoder

Compared to Codec 2, you can hear a significant increase in quality, and if you compare it to the original, there is not a significant decrease in quality.

David Rowe himself has stated that he considers the result to be "a game changer for low bit rate speech coding" and “as good as an 8000bps wideband speech codec”.

Conclusion

Whilst the (original) Codec 2 project represents very interesting work, it is limited, and the end result is not suited for podcasting. Also as we heard in the audio examples, it can only be used for voice recordings, and not music.

However, Codec 2 in combination with a WaveNet decoder improves the quality a lot, and the low bitrate (2400bps) would be extremely interesting for podcast and audiobook distribution as well: one hour of audio would require only 1.03MB of storage!

Auphonic will add support for Codec 2 output files when the WaveNet decoder is in a usable form. For now we have just added support for Codec 2 input files.







n

New Auphonic Transcript Editor and Improved Speech Recognition Services

Back in late 2016, we introduced Speech Recognition at Auphonic. This allows our users to create transcripts of their recordings, and more usefully, it means podcasts become searchable.
Now we have integrated two more speech recognition engines: Amazon Transcribe and Speechmatics. Whilst integrating these services, we also took the opportunity to develop a completely new Transcript Editor:

Screenshot of our Transcript Editor with word confidence highlighting and the edit bar.
Try out the Transcript Editor Examples yourself!


The new Auphonic Transcript Editor is included directly in our HTML transcript output file, displays word confidence values to instantly see which sections should be checked manually, supports direct audio playback, HTML/PDF/WebVTT export and allows you to share the editor with someone else for further editing.

The new services, Amazon Transcribe and Speechmatics, offer transcription quality improvements compared to our other integrated speech recognition services.
They also return word confidence values, timestamps and some punctuation, which is exported to our output files.

The Auphonic Transcript Editor

With the integration of the two new services offering improved recognition quality and word timestamps alongside confidence scores, we realized that we could leverage these improvements to give our users easy-to-use transcription editing.
Therefore we developed a new, open source transcript editor, which is embedded directly in our HTML output file and has been designed to make checking and editing transcripts as easy as possible.

Main features of our transcript editor:
  • Edit the transcription directly in the HTML document.
  • Show/hide word confidence, to instantly see which sections should be checked manually (if you use Amazon Transcribe or Speechmatics as speech recognition engine).
  • Listen to audio playback of specific words directly in the HTML editor.
  • Share the transcript editor with others: as the editor is embedded directly in the HTML file (no external dependencies), you can just send the HTML file to someone else to manually check the automatically generated transcription.
  • Export the edited transcript to HTML, PDF or WebVTT.
  • Completely usable on all mobile devices and desktop browsers.

Examples: Try Out the Transcript Editor

Here are two examples of the new transcript editor, taken from our speech recognition audio examples page:

1. Singletrack Transcript Editor Example
Singletrack speech recognition example from the first 10 minutes of Common Sense 309 by Dan Carlin. Speechmatics was used as speech recognition engine without any keywords or further manual editing.
2. Multitrack Transcript Editor Example
A multitrack automatic speech recognition transcript example from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1. Amazon Transcribe was used as speech recognition engine without any further manual editing.
As this is a multitrack production, the transcript includes exact speaker names as well (try to edit them!).

Transcript Editing

By clicking the Edit Transcript button, a dashed box appears around the text. This indicates that the text is now freely editable on this page. Your changes can be saved by using one of the export options (see below).
If you make a mistake whilst editing, you can simply use the undo/redo function of the browser to undo or redo your changes.


When working with multitrack productions, another helpful feature is the ability to change all speaker names at once throughout the whole transcript just by editing one speaker. Simply click on an instance of a speaker title and change it to the appropriate name, this name will then appear throughout the whole transcript.

Word Confidence Highlighting

Word confidence values are shown visually in the transcript editor, highlighted in shades of red (see screenshot above). The shade of red is dependent on the actual word confidence value: The darker the red, the lower the confidence value. This means you can instantly see which sections you should check/re-work manually to increase the accuracy.

Once you have edited the highlighted text, it will be set to white again, so it’s easy to see which sections still require editing.
Use the button Add/Remove Highlighting to disable/enable word confidence highlighting.

NOTE: Word confidence values are only available in Amazon Transcribe or Speechmatics, not if you use our other integrated speech recognition services!

Audio Playback

The button Activate/Stop Play-on-click allows you to hear the audio playback of the section you click on (by clicking directly on the word in the transcript editor).
This is helpful in allowing you to check the accuracy of certain words by being able to listen to them directly whilst editing, without having to go back and try to find that section within your audio file.

If you use an External Service in your production to export the resulting audio file, we will automatically use the exported file in the transcript editor.
Otherwise we will use the output file generated by Auphonic. Please note that this file is password protected for the current Auphonic user and will be deleted in 21 days.

If no audio file is available in the transcript editor, or cannot be played because of the password protection, you will see the button Add Audio File to add a new audio file for playback.

Export Formats, Save/Share Transcript Editor

Click on the button Export... to see all export and saving/sharing options:

Save/Share Editor
The Save Editor button stores the whole transcript editor with all its current changes into a new HTML file. Use this button to save your changes for further editing or if you want to share your transcript with someone else for manual corrections (as the editor is embedded directly in the HTML file without any external dependencies).
Export HTML / Export PDF / Export WebVTT
Use one of these buttons to export the edited transcript to HTML (for WordPress, Word, etc.), to PDF (via the browser print function) or to WebVTT (so that the edited transcript can be used as subtitles or imported in web audio players of the Podlove Publisher or Podigee).
Every export format is rendered directly in the browser, no server needed.

Amazon Transcribe

The first of the two new services, Amazon Transcribe, offers accurate transcriptions in English and Spanish at low costs, including keywords, word confidence, timestamps, and punctuation.

UPDATE 2019:
Amazon Transcribe offers more languages now - please see Amazon Transcribe Features!

Pricing
The free tier offers 60 minutes of free usage a month for 12 months. After that, it is billed monthly at a rate of $0.0004 per second ($1.44/h).
More information is available at Amazon Transcribe Pricing.
Custom Vocabulary (Keywords) Support
Custom Vocabulary (called Keywords in Auphonic) gives you the ability to expand and customize the speech recognition vocabulary, specific to your use case (e.g. product names, domain-specific terminology, or names of individuals).
The same feature is also available in the Google Cloud Speech API.
Timestamps, Word Confidence, and Punctuation
Amazon Transcribe returns a timestamp and confidence value for each word so that you can easily locate the audio in the original recording by searching for the text.
It also adds some punctuation, which is combined with our own punctuation and formatting automatically.

The high quality (especially in combination with keywords) and low cost of Amazon Transcribe make it attractive, despite it currently supporting only two languages.
However, the processing time of Amazon Transcribe is much slower compared to all our other integrated services!

Try it yourself:
Connect your Auphonic account with Amazon Transcribe at our External Services Page.

Speechmatics

Speechmatics offers accurate transcriptions in many languages including word confidence values, timestamps, and punctuation.

Many Languages
Speechmatics’ clear advantage is the sheer number of languages it supports (all major European and some Asian languages).
It also has a Global English feature, which supports different English accents during transcription.
Timestamps, Word Confidence, and Punctuation
Like Amazon, Speechmatics creates timestamps, word confidence values, and punctuation.
Pricing
Speechmatics is the most expensive speech recognition service at Auphonic.
Pricing starts at £0.06 per minute of audio and can be purchased in blocks of £10 or £100. This equates to a starting rate of about $4.78/h. A reduced rate of £0.05 per minute ($3.98/h) is available if purchasing £1,000 blocks.
They offer significant discounts for users requiring higher volumes. At this further reduced price point it is a similar cost to the Google Speech API (or lower). If you process a lot of content, you should contact them directly at sales@speechmatics.com and say that you wish to use it with Auphonic.
More information is available at Speechmatics Pricing.

Speechmatics offers high-quality transcripts in many languages. But these features do come at a price: it is the most expensive speech recognition service at Auphonic.

Unfortunately, their existing Custom Dictionary (keywords) feature, which would further improve the results, is not available in the Speechmatics API yet.

Try it yourself:
Connect your Auphonic account with Speechmatics at our External Services Page.

What do you think?

Any feedback about the new speech recognition services, especially about the recognition quality in various languages, is highly appreciated.

We would also like to hear any comments you have on the transcript editor particularly - is there anything missing, or anything that could be implemented better?
Please let us know!






n

Audio Manipulations and Dynamic Ad Insertion with the Auphonic API

We are pleased to announce a new Audio Inserts feature in the Auphonic API: audio inserts are separate audio files (like intros/outros), which will be inserted into your production at a defined offset.
This blog post shows how one can use this feature for Dynamic Ad Insertion and discusses other Audio Manipulation Methods of the Auphonic API.

API-only Feature

For the general podcasting hobbyist, or even for someone producing a regular podcast, the features that are accessible via our web interface are more than sufficient.

However, some of our users, like podcasting companies who integrate our services as part of their products, asked us for dynamic ad insertions. We teamed up with them to develop a way of making this work within the Auphonic API.

We are therefore pleased to announce audio inserts - a new feature that has been made part of our API. Note that this feature is not available through the web interface; it requires the use of our API.

Before we talk about audio inserts, let's talk about what you need to know about dynamic ad insertion!

Dynamic Ad Insertion

There are two ways of dealing with adverts within podcasts. In the first, adverts are recorded or edited into the podcast and are fixed, or baked in. The second method is to use dynamic insertion, whereby the adverts are not part of the podcast recording/file but can be inserted into the podcast afterwards, at any time.

This second approach would allow you to run new ad campaigns across your entire catalog of shows. As a podcaster this allows you to potentially generate new revenue from your old content.

As a hosting company, dynamic ad insertion allows you to choose up to date and relevant adverts across all the podcasts you host. You can make these adverts relevant by subject or location, for instance.

Your users can define the times for the ads in their podcast episodes; you are then in control of the adverts you insert.

Audio Inserts in Auphonic

Whichever approach to adverts you are taking, using audio inserts can help you.

Audio inserts are separate audio files which will be inserted into your main single or multitrack production at your defined offset (in seconds).

When a separate audio file is inserted as part of your production, it creates a gap in the podcast audio file, shifting the audio back by the length of the insert. Helpfully, chapters and other time-based information like transcriptions are also shifted back when an insert is used.
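
The time shift itself is simple arithmetic; here is an illustration (not Auphonic's internal code) of how, for example, chapter start times at or after the insert position move back by the insert's duration:

def shift_times(times, insert_offset, insert_length):
    # Everything at or after the insert position is delayed by the insert length
    return [t + insert_length if t >= insert_offset else t for t in times]

chapters = [0.0, 95.0, 260.0]               # chapter start times in seconds
print(shift_times(chapters, 120.3, 30.0))   # -> [0.0, 95.0, 290.0]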

The biggest advantage of this is that Auphonic will apply loudness normalization to the audio insert so, from an audio point of view, it matches the rest of the podcast.

Although created with dynamic ad insertion in mind, this feature can be used for any type of audio inserts: adverts, music songs, individual parts of a recording, etc. In the case of baked-in adverts, you could upload your already processed advert audio as an insert, without having to edit it into your podcast recording using a separate audio editing application.

Please note that audio inserts should already be edited and processed before using them in a production. (This is usually the case with pre-recorded adverts anyway.) The only algorithm that Auphonic applies to an audio insert is loudness normalization, in order to match the loudness of the entire production. Auphonic does not apply any other processing (no leveling, noise reduction, etc.).

Audio Inserts Coding Example

Here is a brief overview of how to use our API for audio inserts. Be warned, this section is coding heavy, so if this isn't your thing, feel free to move along to the next section!

You can add audio insert files with a call to https://auphonic.com/api/production/{uuid}/multi_input_files.json, where uuid is the UUID of your production.
Here is an example with two audio inserts from an https URL. The offset/position in the main audio file must be given in seconds:

curl -X POST -H "Content-Type: application/json" \
    https://auphonic.com/api/production/{uuid}/multi_input_files.json \
    -u username:password \
    -d '[
            {
                "input_file": "https://mydomain.com/my_audio_insert_1.wav",
                "type": "insert",
                "offset": 20.5
            },
            {
                "input_file": "https://mydomain.com/my_audio_insert_2.wav",
                "type": "insert",
                "offset": 120.3
            }
        ]'

More details showing how to use audio inserts in our API can be seen here.
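
If you prefer Python over curl, the same request could look like this (a sketch using the requests library, with the endpoint and payload taken from the example above; the production UUID is a placeholder):

import requests

uuid = "YOUR-PRODUCTION-UUID"  # placeholder: UUID of your production
inserts = [
    {"input_file": "https://mydomain.com/my_audio_insert_1.wav", "type": "insert", "offset": 20.5},
    {"input_file": "https://mydomain.com/my_audio_insert_2.wav", "type": "insert", "offset": 120.3},
]

response = requests.post(
    f"https://auphonic.com/api/production/{uuid}/multi_input_files.json",
    auth=("username", "password"),
    json=inserts,
)
response.raise_for_status()
print(response.json())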

Additional API Audio Manipulations

In addition to audio inserts, using the Auphonic API offers a number of other audio manipulation options, which are not available via the web interface:

Cut start/end of audio files: See Docs
In Single-track productions, this feature allows the user to cut the start and/or the end of the uploaded audio file. Crucially, time-based information such as chapters etc. will be shifted accordingly.
Fade In/Out time of audio files: See Docs
This allows you to set the fade in/out time (in ms) at the start/end of output files. The default fade time is 100ms, but values can be set between 0ms and 5000ms.
This feature is also available in our Auphonic Leveler Desktop App.
Adding intro and outro: See Docs
Automatically add intros and outros to your main audio input file, as it is also available in our web interface.
Add multiple intros or outros: See Docs
Using our API, you can also add multiple intros or outros to a production. These intros or outros are played in series.
Overlapping intros/outros: See Docs
This feature allows intros/outros to overlap either the main audio or the following/previous intros/outros.

Conclusion

If you haven't explored our API yet, the new audio inserts feature is a good reason to: it allows for greater flexibility and enables dynamic ad insertion.
If you offer online services to podcasters, the Auphonic API would also then allow you to pass on Auphonic's audio processing algorithms to your customers.

If this is of interest to you or you have any new feature suggestions that you feel could benefit your company, please get in touch. We are always happy to extend the functionality of our products!







n

Leveler Presets, LRA Target and Advanced Audio Parameters (Beta)

Lots of users have asked us about more customization and control over the sound of our audio algorithms in the past, so today, we have introduced some advanced algorithm parameters for our singletrack version in a private beta program!

The following new parameters are available:

UPDATE Nov. 2018:
We released a complete rework of the Adaptive Leveler parameters and the description here is not valid anymore!
Please see Auphonic Adaptive Leveler Customization (Beta Update)!

Please join our private beta program and let us know how you use these new features or if you need even more control!

Leveler Presets

Our Adaptive Leveler corrects level differences between speakers, between music and speech and will also apply dynamic range compression to achieve a balanced overall loudness. If you don't know about the Leveler yet, take a look at our Audio Examples.

Leveler presets are basically complete new leveling algorithms, which we have been working on in the past few months:
Our current Leveler tries to normalize all speakers to the same loudness. However, in some cases, you might want more or less loudness differences (dynamic range / loudness range) between the speakers and music segments, or more or less compression, etc.
For these use cases, we have developed additional Leveler Presets and the parameter Maximum Loudness Range.

The following Leveler presets are now available:
Preset Medium:
This is our current leveling algorithm as demonstrated in the Audio Examples.
Preset Hard:
The hard preset reacts faster and applies more gain and compression compared to the medium preset. It is built for recordings with extreme loudness differences, for example very quiet questions from the audience in a lecture recording, extremely soft and loud voices within one audio track, etc.
Preset Soft:
This preset reacts slower, applies less gain and compression compared to the medium preset. Use it if you want to keep more loudness differences (dynamic narration), if you want your voices to sound "less compressed/processed", for dynamic music (concert/classical recordings), background music, etc.
Preset Softer:
Like soft, but softer :)
Preset Speech Medium, Music Soft:
Uses the medium preset in speech segments and the soft preset in music segments. It is built for music live recordings or dynamic music mixes, where you want to amplify all speakers but keep the loudness differences within and between music segments.
Preset Medium, No Compressor:
Like the medium preset, but only (mid-term) leveling and no (short-term) compression is applied. This preset is optimal if you just use a Maximum Loudness Range Target and want to avoid any additional compression as much as possible.
Please let us know your use case, if you need more/other controls or if anything is confusing. The Leveler presets are still in private beta and can be changed as necessary!

Maximum Loudness Range (LRA) Target

The loudness range (LRA) indicates the variation of loudness over the course of a program and is measured in LU (loudness units) - for more details see Loudness Measurement and Normalization or EBU Tech 3342.

The parameter Max Loudness Range controls how much leveling is applied:
volume changes of our Adaptive Leveler will be restricted so that the loudness range of the output file is below the selected value.
High loudness range values will result in very dynamic output files, low loudness range values in compressed output audio. If the LRA value of your input file is already below the maximum loudness range value, no leveling at all will be applied.

It is also important which Leveler Preset you select, for example, if you use the soft(er) preset, it won't be possible to achieve very low loudness range targets.

Also, the Max Loudness Range parameter is not as precise a target value as the Loudness Target. The LRA of your output file might be off by a few LU, as it is not reasonable to reach the exact target value.

Use Cases: The Maximum LRA parameter allows you to control the strength of our leveling algorithms, in combination with the parameter Leveler Preset. This might be used for automatic mixdowns with different LRA values for different target platforms (very compressed ones like mobile devices or Alexa, very dynamic ones like home cinema, etc.).

Maximum True Peak Level

This parameter sets the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Global Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.

The maximum true peak level parameter is already available in our desktop program.

Better Hum and Noise Reduction Controls

In addition to the parameter (Noise) Reduction Amount, we now offer two more parameters to control the combination of our Noise and Hum Reduction algorithms:
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0), if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

(Noise Reduction Amount / Hum Base Frequency / Hum Reduction Amount -> resulting behavior)
  • Auto / Auto / Auto -> Automatic hum and noise reduction
  • Auto or > 0 / any / Disabled -> No hum reduction, only denoise
  • Disabled / 50Hz / Auto or > 0 -> Force 50Hz hum reduction, no denoise
  • Disabled / Auto / Auto or > 0 -> Automatic dehum, no denoise
  • 12dB / 60Hz / Auto or > 0 -> Always do dehum (60Hz) and denoise (12dB)

Advanced Parameters Private Beta and Feedback

At the moment the advanced algorithm parameters are for beta users only. This is to allow us to get user feedback, so we can change the parameters to suit user needs.
Please let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

y6KCBI4yo0 ksIFEsmI1y BDZec2a21V i4XRGLlVm2 0UDxuS0vbu aaBxi35sKN aaiDSZUbmY bu8lPF80Ih eMsSl6Sf8K DaWpsUnyjo
2YM00m8zDW wh7K2pPmSa jCX7mMy2OJ ZGvvhzCpTF HI0lmGhjVO eXqVhN6QLU t4BH0tYcxY LMjQREVuOx emIogTCAth 0OTPNB7Coz
VIFY8STj2f eKzRSWzOyv 40cMMKKCMN oBruOxBkqS YGgPem6Ne7 BaaFG9I1xZ iSC0aNXoLn ZaS4TykKIa l32bTSBbAx xXWraxS40J
zGtwRJeAKy mVsx489P5k 6SZM5HjkxS QmzdFYOIpf 500AHHtEFA 7Kvk6JRU66 z7ATzwado6 4QEtpzeKzC c9qt9Z1YXx pGSrDzbEED
MP3JUTdnlf PDm2MOLJIG 3uDietVFSL 1i7jZX0Y9e zPkSgmAqqP 5OhcmHIZUP E0vNsPxZ4s FzTIyZIG2r 5EywA0M7r5 FMhpcFkVN5
oRLbRGcRmI 2LTh8GlN7h Cjw6Z3cveP fayCewjE55 GbkyX89Lxu 4LpGZGZGgc iQV7CXYwkH pGLyQPgaha e3lhKDRUMs Skrei1tKIa
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the advanced audio algorithm parameters:
Auphonic Algorithm Parameters Private Beta Activation







n

Resumable File Uploads to Auphonic

Large file uploads in a web browser are problematic, even in 2018. If working with a poor network connection, uploads can fail and have to be retried from the start.

At Auphonic, our users have to upload large audio and video files, or multiple media files when creating a multitrack production. To minimize any potential issues, we integrated various external services which are specialized for large file transfers, like FTP, SFTP, Dropbox, Google Drive, S3, etc.

To further minimize issues, as of today we have also released resumable and chunked direct file uploads in the web browser to auphonic.com.

If you are not interested in the technical details, please just go to the section Resumable Uploads in Auphonic below.

The Problem with Large File Uploads in the Browser

If using either mobile networks (which remain fragile) or unstable WiFi connections, file uploads are often interrupted and will fail. There are also many areas in the world where connections are quite poor, which makes uploading big files frustrating.

After an interrupted file upload, the web browser must restart the whole upload from the start, which is a problem when it happens in the middle of a 4GB video file upload on a slow connection.
Furthermore, the longer an upload takes, the more likely it is to have a network glitch interrupting the upload, which then has to be retried from the start.

The Solution: Chunked, Resumable Uploads

To avoid user frustration, we need to be able to detect network errors and potentially resume an upload without having to restart it from the beginning.

To achieve this, we have to split a file upload into smaller chunks directly within the web browser, so that these chunks can then be sent to the server afterwards.
If an upload fails or the user wants to pause, it is possible to resume it later and only send those chunks that have not already been uploaded.
If there is a network interruption or change, the upload will be retried automatically.
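
As a rough illustration of this idea (a generic sketch, not Auphonic's actual upload protocol or endpoints), a client could do something like the following, similar in spirit to what libraries such as resumable.js do: check whether a chunk already exists on the server, and upload it only if it does not.

import os
import requests

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks

def upload_resumable(path, endpoint):
    """Upload a file in chunks; skip chunks the server already has (hypothetical endpoint)."""
    filename = os.path.basename(path)
    total_size = os.path.getsize(path)
    with open(path, "rb") as f:
        chunk_number = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            chunk_number += 1
            # Ask the server whether this chunk was already received (resume support)
            check = requests.get(endpoint, params={"file": filename, "chunk": chunk_number})
            if check.status_code == 200:
                continue  # chunk already uploaded, skip it
            # Otherwise upload the chunk
            requests.post(
                endpoint,
                data={"file": filename, "chunk": chunk_number, "total_size": total_size},
                files={"data": (filename, chunk)},
            ).raise_for_status()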

Companies like Dropbox, Google, Amazon AWS etc. all have their own protocols and API's for chunked uploads, but there are also some open source implementations available, which offer resumable uploads:

resumable.js [link]:
"A JavaScript library providing multiple simultaneous, stable and resumable uploads via the HTML5 File API"
This solution is a JavaScript library only and requires that the protocol is implemented on the server as well.
tus.io [link]:
"Open Protocol for Resumable File Uploads"
Tus.io offers a simple, cheap and reusable stack for clients and servers (in many languages). They have a blog with further information about resumable uploads, see tus blog.
plupload [link]:
A JavaScript library, similar to resumable.js, which requires a separate server implementation.

We chose to use resumable.js and developed our own server implementation.

Resumable Uploads in Auphonic

If you upload files to a singletrack or multitrack production, you will see the upload progress bar and a pause button, which is one way to pause and resume an upload:

It is also possible to close the browser completely or shut down your computer during the upload, then edit the production and upload the file again later. This will just resume the file upload from the position where it was stopped before.
(Previously uploaded chunks are saved for 24h on our servers, after that you have to start the whole upload again.)

In case of a network problem or if you switch to a different connection, we will resume the upload automatically.
This should solve many problems which were reported by some users in the past!

You can of course also use any of our external services for stable incoming and outgoing file transfers!

Do you still have Uploading Issues?

We hope that uploads to Auphonic are much more reliable now, even on poor connections.

If you still experience any problems, please let us know.
We are very happy about any bug reports and will do our best to fix them!







n

Auphonic Adaptive Leveler Customization (Beta Update)

In late August, we launched the private beta program of our advanced audio algorithm parameters. After feedback by our users and many new experiments, we are proud to release a complete rework of the Adaptive Leveler parameters:

In the previous version, we based our Adaptive Leveler parameters on the Loudness Range descriptor (LRA), which is included in the EBU R128 specification.
Although it worked, it turned out to be very difficult to set a loudness range target for diverse audio content that includes speech, background sounds, music parts, etc. The results were not predictable and it was hard to find good target values.
Therefore we developed our own algorithm to measure the dynamic range of audio signals, which works similarly for speech, music and other audio content.

The following advanced parameters for our Adaptive Leveler allow you to customize which parts of the audio should be leveled (foreground, all, speech, music, etc.), how much they should be leveled (dynamic range), and how much micro-dynamics compression should be applied.

To try out the new algorithms, please join our private beta program and let us know your feedback!

Leveler Preset

The Leveler Preset defines which parts of the audio should be adjusted by our Adaptive Leveler:

  • Default Leveler:
    Our classic, default leveling algorithm as demonstrated in the Leveler Audio Examples. Use it if you are unsure.
  • Foreground Only Leveler:
    This preset reacts slower and levels foreground parts only. Use it if you have background speech or background music, which should not be amplified.
  • Fast Leveler:
    A preset which reacts much faster. It is built for recordings with fast and extreme loudness differences, for example, to amplify very quiet questions from the audience in a lecture recording, to balance fast-changing soft and loud voices within one audio track, etc.
  • Amplify Everything:
    Amplify as much as possible. Similar to the Fast Leveler, but also amplifies non-speech background sounds like noise.

Leveler Dynamic Range

Our default Leveler tries to normalize all speakers to a similar loudness so that a consumer in a car or subway doesn't feel the need to reach for the volume control.
However, in other environments (living room, cinema, etc.) or in dynamic recordings, you might want more level differences (Dynamic Range, Loudness Range / LRA) between speakers and within music segments.

The parameter Dynamic Range controls how much leveling is applied: Higher values result in more dynamic output audio files (less leveling). If you want to increase the dynamic range by 3dB (or LU), just increase the Dynamic Range parameter by 3dB.
We also like to call this Loudness Comfort Zone: above a maximum and below a minimum possible level (the comfort zone), no leveling is applied. So if your input file already has a small dynamic range (is within the comfort zone), our leveler will be just bypassed.

Example Use Cases:
Higher dynamic range values should be used if you want to keep more loudness differences in dynamic narration or dynamic music recordings (live concert/classical).
It is also possible to utilize this parameter to generate automatic mixdowns with different loudness range (LRA) values for different target environments (very compressed ones like mobile devices or Alexa, very dynamic ones like home cinema, etc.).

Compressor

Controls Micro-Dynamics Compression:
The compressor reduces the volume of short and loud spikes like "p", "t" or laughter (short-term dynamics) and also shapes the sound of your voice (it will sound more or less "processed").
The Leveler, on the other hand, adjusts mid-term level differences, as a sound engineer would do with the faders of an audio mixer, so that a listener doesn't have to adjust the playback volume all the time.
For more details please see Loudness Normalization and Compression of Podcasts and Speech Audio.

Possible values are:
  • Auto:
    The compressor setting depends on the selected Leveler Preset. Medium compression is used in Foreground Only and Default Leveler presets, Hard compression in our Fast Leveler and Amplify Everything presets.
  • Soft:
    Uses less compression.
  • Medium:
    Our default setting.
  • Hard:
    More compression; it especially tries to compress short and extreme level overshoots. Use this preset if you want your voice to sound very processed, or if you have extreme and fast-changing level differences.
  • Off:
    No short-term dynamics compression is used at all, only mid-term leveling. Switch off the compressor if you just want to adjust the loudness range without any additional micro-dynamics compression.

Separate Music/Speech Parameters

Use the switch Separate Music/Speech Parameters (top right) to show separate Adaptive Leveler parameters for music and speech segments, so that you can control all leveling details for speech and music parts independently:

For dialog intelligibility improvements in films and TV, it is important that the speech/dialog level and loudness range are not too low compared to the overall programme level and loudness range. This parameter allows you to use more leveling in speech parts while keeping music and FX elements less processed.
Note: Speech, music and overall loudness and loudness range of your production are also displayed in our Audio Processing Statistics!

Example Use Case:
Music live recordings or dynamic music mixes, where you want to amplify all speakers (speech dynamic range should be small) but keep the dynamic range within and between music segments (music dynamic range should be high).
Dialog intelligibility improvements for films and TV, without affecting music and FX elements.

Other Advanced Audio Algorithm Parameters

We also offer advanced audio parameters for our Noise, Hum Reduction and Global Loudness Normalization algorithms:

For more details, please see the Advanced Audio Algorithms Documentation.

Want to know more?

If you want to know more details about our advanced algorithm parameters (especially the leveler parameters), please listen to the following podcast interview with Chris Curran (Podcast Engineering School):
Auphonic’s New Advanced Features, with Georg Holzmann – PES 108

Advanced Parameters Private Beta and Feedback

At the moment the advanced algorithm parameters are for beta users only. This is to allow us to get user feedback, so we can change the parameters to suit user needs.
Please let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

jbwCVpLYrl 6zmLqq8o3z RXYIUbC6al QDmIZLuPKa JIrnGRZBgl SWQOWeZOBD ISeBCA9gTy w5FdsyhZVI qWAvANQ5mC twOjdHrit3
KwnL2Le6jB 63SE2V54KK G32AULFyaM 3H0CLYAwLU mp1GFNVZHr swzvEBRCVa rLcNJHUNZT CGGbL0O4q1 5o5dUjruJ9 hAggWBpGvj
ykJ57cFQSe 0OHAD2u1Dx RG4wSYTLbf UcsSYI78Md Xedr3NPCgK mI8gd7eDvO 0Au4gpUDJB mYLkvKYz1C ukrKoW5hoy S34sraR0BU
J2tlV0yNwX QwNdnStYD3 Zho9oZR2e9 jHdjgUq420 51zLbV09p4 c0cth0abCf 3iVBKHVKXU BK4kTbDQzt uTBEkMnSPv tg6cJtsMrZ
BdB8gFyhRg wBsLHg90GG EYwxVUZJGp HLQ72b65uH NNd415ktFS JIm2eTkxMX EV2C5RAUXI a3iwbxWjKj X1AT7DCD7V y0AFIrWo5l
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the advanced audio algorithm parameters:
Auphonic Algorithm Parameters Private Beta Activation







n

More Languages for Amazon Transcribe Speech Recognition

Until recently, Amazon Transcribe supported speech recognition in English and Spanish only.
Now they have added French, Italian and Portuguese as well - and a few other languages (including German) are in private beta.

Update March 2019:
Now Amazon Transcribe supports German and Korean as well.

The Auphonic Audio Inspector on the status page of a finished Multitrack Production including speech recognition.
Please click on the screenshot to see it in full resolution!


Amazon Transcribe is integrated as speech recognition engine within Auphonic and offers accurate transcriptions (compared to other services) at low costs, including keywords / custom vocabulary support, word confidence, timestamps, and punctuation.
See the following AWS blog post and video for more information about recent Amazon Transcribe developments: Transcribe speech in three new languages: French, Italian, and Brazilian Portuguese.

Amazon Transcribe is also a perfect fit if you want to use our Transcript Editor because you will be able to see word timestamps and confidence values to instantly check which section/words should be corrected manually to increase the transcription accuracy:


Screenshot of our Transcript Editor with word confidence highlighting and the edit bar.

These features are also available if you use Speechmatics, but unfortunately not in our other integrated speech recognition services.

About Speech Recognition within Auphonic

Auphonic has built a layer on top of a few external speech recognition services to make audio searchable:
Our classifiers generate metadata during the analysis of an audio signal (music segments, silence, multiple speakers, etc.) to divide the audio file into small and meaningful segments, which are processed by the speech recognition engine. The results from all segments are then combined, and meaningful timestamps, simple punctuation and structuring are added to the resulting text.

To learn more about speech recognition within Auphonic, take a look at our Speech Recognition and Transcript Editor help pages or listen to our Speech Recognition Audio Examples.

A comparison table of our integrated services (price, quality, languages, speed, features, etc.) can be found here: Speech Recognition Services Comparison.

Conclusion

We hope that Amazon and others will continue to add new languages, so that accurate and inexpensive automatic speech recognition becomes available in many more languages.

Don't hesitate to contact us if you have any questions or feedback about speech recognition or our transcript editor!






n

Advanced Multitrack Audio Algorithms Release (Beta)

Last weekend, at the Subscribe10 conference, we released Advanced Audio Algorithm Parameters for Multitrack Productions:

We launched our advanced audio algorithm parameters for Singletrack Productions last year. Now these settings (and more) are available for Multitrack Algorithms as well, which gives you detailed control for each track of your production.

The following new parameters are available:

Please join our private beta program and let us know how you use these new features or if you need even more control!

Fore/Background Settings

The parameter Fore/Background controls whether a track should be in foreground, in background, ducked, or unchanged, which is especially important for music or clip tracks.
For more details, please see Automatic Ducking, Foreground and Background Tracks.

We now added the new option Unchanged and a new parameter to set the level of background segments/tracks:
Unchanged (Foreground):
We sometimes received complaints from users who produce very complex music or clip tracks that Auphonic changes the levels too aggressively.
If you set the parameter Fore/Background to the new option Unchanged (Foreground), level relations within this track won't be changed at all. It will be added to the final mixdown so that foreground/solo parts of this track are as loud as (foreground) speech from other tracks.
Background Level:
It is now possible to set the level of background segments/tracks (compared to foreground segments) in background and ducking tracks. By default, background and ducking segments are 18dB softer than foreground segments.

Leveler Parameters

Similar to our Singletrack Advanced Leveler Parameters (see this previous blog post), we have now also released leveling parameters for Multitrack Productions.
The following advanced parameters for our Multitrack Adaptive Leveler can be set for each track and allow you to customize which parts of the audio should be leveled, how much they should be leveled, how much dynamic range compression should be applied, and how the stereo panorama (balance) should be set:

Leveler Preset:
Select the Speech or Music Leveler for this track.
If set to Automatic (default), a classifier will decide if this is a music or speech track.
Dynamic Range:
The parameter Dynamic Range controls how much leveling is applied: Higher values result in more dynamic output audio files (less leveling). If you want to increase the dynamic range by 3dB (or LU), just increase the Dynamic Range parameter by 3dB.
For more details, please see Multitrack Leveler Parameters.
Compressor:
Select a preset for Micro-Dynamics Compression: Auto, Soft, Medium, Hard or Off.
The Compressor adjusts short-term dynamics, whereas the Leveler adjusts mid-term level differences.
For more details, please see Multitrack Leveler Parameters.
Stereo Panorama (Balance):
Change the stereo panorama (balance for stereo input files) of the current track.
Possible values: L100, L75, L50, L25, Center, R25, R50, R75 and R100.

If you understand German and want to know more about our Advanced Leveler Parameters and audio dynamics in general, watch our talk at the Subscribe10 conference:
Video: Audio Lautheit und Dynamik.

Better Hum and Noise Reduction Controls

We now offer three parameters to control the combination of our Multitrack Noise and Hum Reduction Algorithms for each input track:
Noise Reduction Amount:
Maximum noise and hum reduction amount in dB; higher values remove more noise.
In Auto mode, a classifier decides if and how much noise reduction is necessary (to avoid artifacts). Set to a custom (non-Auto) value if you prefer more noise reduction or want to bypass our classifier.
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB; higher values remove more hum.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0) if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

Noise Reduction Amount   Hum Base Frequency   Hum Reduction Amount   Result
Auto                     Auto                 Auto                   Automatic hum and noise reduction
Auto or > 0              *                    Disabled               No hum reduction, only denoise
Disabled                 50Hz                 Auto or > 0            Force 50Hz hum reduction, no denoise
Disabled                 Auto                 Auto or > 0            Automatic dehum, no denoise
12dB                     60Hz                 Auto or > 0            Always do dehum (60Hz) and denoise (12dB)

Maximum True Peak Level

In the Master Algorithm Settings of your multitrack production, you can set the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher loudness targets, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.

Full API Support

All advanced algorithm parameters, for Singletrack and Multitrack Productions, are available in our API as well, which allows you to integrate them into your scripts, external workflows and third-party applications.

Singletrack API:
Documentation on how to use the advanced algorithm parameters in our singletrack production API: Advanced Algorithm Parameters
Multitrack API:
Documentation of advanced settings for each track of a multitrack production:
Multitrack Advanced Audio Algorithm Settings
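As a rough illustration of what such an integration could look like, here is a minimal Swift sketch that creates a production with custom leveler settings over HTTP. It is only a sketch: the field names inside "algorithms" (leveler_preset, dynamic_range, compressor) and the token are placeholders, so please check the linked Advanced Algorithm Parameters documentation for the exact parameter names before using it.

import Foundation

// Sketch: create an Auphonic production with (placeholder) advanced leveler parameters.
let token = "YOUR_API_TOKEN"  // placeholder - use your own API token
var request = URLRequest(url: URL(string: "https://auphonic.com/api/productions.json")!)
request.httpMethod = "POST"
request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")

let body: [String: Any] = [
    "metadata": ["title": "Advanced Leveler Test"],
    "algorithms": [
        "leveler": true,
        "leveler_preset": "foreground",  // placeholder name: which parts get leveled
        "dynamic_range": 6,              // placeholder name: desired dynamic range in LU
        "compressor": "soft"             // placeholder name: micro-dynamics compression
    ] as [String: Any]
]
request.httpBody = try? JSONSerialization.data(withJSONObject: body)

URLSession.shared.dataTask(with: request) { data, _, error in
    // Print the raw JSON response (or nothing on error).
    if let data = data, error == nil {
        print(String(data: data, encoding: .utf8) ?? "")
    }
}.resume()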

Join the Beta and Send Feedback

Please join our beta and let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

8tZPc3T9pH VAvO8VsDg9 0TwKXBW4Ni kjXJMivtZ1 J9APmAAYjT Zwm6HabuFw HNK5gF8FR5 Do1MPHUyPW CTk45VbV4t xYOzDkEnWP
9XE4dZ0FxD 0Sl3PxDRho uSoRQxmKPx TCI62OjEYu 6EQaPYs7v4 reIJVOwIr8 7hPJqZmWfw kti3m5KbNE GoM2nF0AcN xHCbDC37O5
6PabLBRm9P j2SoI8peiY olQ2vsmnfV fqfxX4mWLO OozsiA8DWo weJw0PXDky VTnOfOiL6l B6HRr6gil0 so0AvM1Ryy NpPYsInFqm
oFeQPLwG0k HmCOkyaX9R G7DR5Sc9Kv MeQLSUCkge xCSvPTrTgl jyQKG3BWWA HCzWRxSrgW xP15hYKEDl 241gK62TrO Q56DHjT3r4
9TqWVZHZLE aWFMSWcuX8 x6FR5OTL43 Xf6tRpyP4S tDGbOUngU0 5BkOF2I264 cccHS0KveO dT29cF75gG 2ySWlYp1kp iJWPhpAimF
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the Multitrack Advanced Audio Algorithm Parameters:
Auphonic Algorithm Parameters Private Beta Activation







n

The New Loudness Target War

In the classic loudness war, music and radio producers tried to make their recordings as loud as possible, and loudness normalization was introduced to stop that. Now we can see the start of a new loudness target war, in which podcasters set their loudness targets higher and higher, mainly triggered by the high target recommendations of platforms like Spotify or Amazon Alexa.
In this article, we will show how to resist the loudness target war and still be compliant with major platforms.

Resist the loudness target war! (Photo by Nayani Teixeira)

What's the problem?

“Two or three years ago it seemed that many stations were finally realizing that better radio could improve ratings. And the major myth brought over from AM radio – that a louder signal, regardless of quality, attracts more listeners – appeared to be losing its strength,” writes Robert Orban. The times of excessively compressed audio, putting loudness over sound quality, were coming to an end. We were hoping the same when we wrote about the CALM Act and EBU R128 in 2012. Those measures were meant to make programs sound more evened out and set a standard for a reasonable loudness level.

Except, Orban's article was published in 1979 (PDF, p. 60 ff.), and he concludes: "The loudness war has escalated, and quality is once again being sacrificed." 40 years later, a new loudness target war is emerging. While, yes, radio and TV stations have widely adopted the new standards, prominent competitors in the audio market are pushing for higher loudness targets once again.

Loudness war: the trend of increasing audio levels in recorded music over time. (Screenshot delamar.de)

Why LOUDER is not better

Historically, the loudness war escalated with the advent of digital technology. Peak level normalization and quasi peak program meters (QPPM) encouraged producers to push audio signals to the limit. Just shy of clipping, signals could now be compressed to the highest possible levels, using multiband compressors and limiters. This lifted quieter signals up and transformed waveforms into bricks, because marketers thought that louder songs on CDs and almost-yelling voices on FM radio would attract listeners. Reduced dynamics, however, makes audio less interesting and can lead to listener fatigue, as Rip Rowan pointedly illustrated in his 2002 article "Over the Limit":

WHY IS THE LOUDER IS BETTER APPROACH THE WRONG APPROACH? BECAUSE WHEN ALL OF THE SIGNAL IS AT THE MAXIMUM LEVEL, THEN THERE IS NO WAY FOR THE SIGNAL TO HAVE ANY PUNCH. THE WHOLE THING COMES SCREAMING AT YOU LIKE A MESSAGE IN ALL CAPITAL LETTERS. AS WE ALL KNOW, WHEN YOU TYPE IN ALL CAPITAL LETTERS THERE ARE NO CUES TO HELP THE BRAIN MAKE SENSE OF THE SIGNAL, AND THE MIND TIRES QUICKLY OF TRYING TO PROCESS WHAT IS, BASICALLY, WHITE NOISE. LIKEWISE, A SIGNAL THAT JUST PEGS THE METERS CAUSES THE BRAIN TO REACT AS THOUGH IT IS BEING FED WHITE NOISE. WE SIMPLY FILTER IT OUT AND QUIT TRYING TO PROCESS IT.

Hence, many spoken word producers and broadcasters luckily wised up and committed to new standards, based on loudness normalization instead of peak normalization. LUFS, Loudness Units relative to Full Scale, is a unit that measures an audio track's average loudness. All segments of a program can then be normalized to a certain LUFS value. As we have discussed before, -23 LUFS is now the standard for broadcasters of the EBU, which for example has led to advertising segments no longer being much louder than the rest of any particular program. For a short while, it seemed as if a peace agreement, or at least a truce, had been achieved in the loudness war.

Human loudness perception is based on average levels instead of peak levels. (Screenshot theproaudiofiles.com)

Loudness Targets and Dynamic Range

However, while LUFS adoption is increasing across the industry, this doesn't mean that people have stopped trying to be louder than everybody else. And indeed, -23 LUFS is not a one-size-fits-all value. We have recommended -16 LUFS for podcasts ourselves. The loudness of a production played in a cinema should be different from one made for earphones and noisy listening environments. Some headphones expect a louder signal and don't have enough gain to work with -23 LUFS under all circumstances.

The closer a production gets to 0 LUFS, though, the less dynamic range can be reproduced. However, the dynamic range also depends on the maximum True Peak level (dBTP), which indicates the highest possible peak value. For instance, the aforementioned EBU R128 standard of -23 LUFS also defines a maximum of -1 dBTP. The difference between the dBTP and LUFS levels is called the Peak to Loudness Range (PLR). For EBU R128, the PLR would be 22 LU (Loudness Units); a podcast with -16 LUFS and -1 dBTP has a PLR of 15 LU. Thus, the PLR is a measure of the maximum possible dynamic range of an audio production.
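Written out, the PLR is simply the maximum true peak level minus the loudness target, so the two examples above work out as follows:

PLR = maximum true peak level (dBTP) - loudness target (LUFS)
EBU R128:  -1 dBTP - (-23 LUFS) = 22 LU
Podcast:   -1 dBTP - (-16 LUFS) = 15 LU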

Platforms: Apple, Google, Amazon, and Spotify

Our already pretty high recommendation of -16 LUFS, -1 dBTP for mobile listening was also adopted by Apple's best practices for podcasts and recommendations for Google Assistant. Some are pushing it too far, though. A daily updated analysis by podnews shows that many podcasts are much louder than -16 LUFS.

Distribution of loudness across a selection of podcasts. (Screenshot podnews.net)

This might in part be due to specs published by other competitors in the audio space:

Amazon, for instance, recommends -14 LUFS at -2 dBTP for Alexa skills, meaning a PLR of only 12 LU. While this might work for Alexa's synthesized voice, which doesn't have much variability, it produces a dynamic range too low for spoken word content.
However, Amazon also says that a skill may be rejected if the program loudness is lower than -19 LUFS or higher than -9 LUFS, so a target of -16 LUFS is perfectly fine - it just means that Alexa's robot voice is 2 LU louder than the audio content.
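As a tiny illustration of that acceptance window (a sketch that uses only the thresholds quoted above; the function name is made up for this example):

// Amazon's stated window: rejected if quieter than -19 LUFS or louder than -9 LUFS.
func alexaAcceptsSkillLoudness(_ programLoudness: Double) -> Bool {
    return programLoudness >= -19.0 && programLoudness <= -9.0
}

print(alexaAcceptsSkillLoudness(-16.0))  // true: our recommended -16 LUFS target passes
print(alexaAcceptsSkillLoudness(-20.0))  // false: quieter than -19 LUFS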

Spotify normalizes audio to the equivalent of about -14 LUFS, -1 dBTP, but they still use ReplayGain, which is not exactly the same as LUFS but gives similar results. Spotify mainly decreases the volume of overly loud productions, but can also increase the volume on some playback devices if the audio is much too soft.
For pop music, -14 LUFS is acceptable, but for podcasts or classical music, it is too high. However, (pop) music can always be played a bit louder than speech; therefore, a loudness target of -16 LUFS for podcasts is fine on Spotify as well.

Make LUFS, not war

If producers simply use the highest recommended target, the loudness (target) war goes into another round, and we will once again hear compressed, distorted voices that lack the emotion and subtleties reflected in the dynamic range of how we speak.

Setting a loudness target higher than -16 LUFS does not improve the listening experience in any way. However, many productions would benefit from sensitively adjusting differences in dynamics throughout the production, lifting up quieter segments, lowering loud segments, and treating speech and music differently. As pointed out in a recent presentation (video, in German), you can do that directly in your DAW or use a leveler to automate dynamics processing.

Instead of just raising the loudness target, adjusting differences in levels can help to make the listening experience much more enjoyable.

Recent album releases suggest that the music industry is still in the middle of the loudness war, often limiting the peak-to-loudness range to single-digit LU values. (By the way, your vinyl-buying friend is right, vinyl records do sound better because they don't work with the highest compression rates.) Hopefully, podcasters, radio producers and other spoken word artists, as well as the platforms that host and publish their productions, can resist the temptation of louder and louder audio.
Make (no more than -16) LUFS, not war!

Conclusion

With some podcasts, smart speakers and streaming platforms trying to be louder than the competition, the listening experience deteriorates. However, podcasts can sound great and loud enough even in noisy environments, when well produced:

  • Never set a loudness target higher than -16 LUFS for spoken word audio.
  • If your audio is too quiet, try lifting up quieter sections and reducing the volume of louder sections, directly in your DAW or by using a leveler.
  • These settings will work fine on all current platforms, including Amazon Alexa and Spotify.

Resist the loudness target war!







n

Dynamic Range Processing in Audio Post Production

If listeners find themselves using the volume up and down buttons a lot, level differences within your podcast or audio file are too big.
In this article, we are discussing why audio dynamic range processing (or leveling) is more important than loudness normalization, why it depends on factors like the listening environment and the individual character of the content, and why the loudness range descriptor (LRA) is only reliable for speech programs.

Photo by Alexey Ruban.

Why loudness normalization is not enough

Everybody who has lived in an apartment building knows the problem: you want to enjoy a movie late at night, but you're constantly on the edge - not only because of the thrilling story, but because your index finger is hovering over the volume down button of your remote. The next loud sound effect is going to come sooner rather than later, and you want to avoid waking up your neighbors with some gunshot sounds blasting from your TV.

In our previous post, we talked about the overall loudness of a production. While that's certainly important to keep in mind, the loudness target is only an average value, ignoring how much the loudness varies within a production. The loudness target of your movie might be in the ideal range, yet the level differences between a gunshot and someone whispering can still be enormous - having you turn the volume down for the former and up for the latter.

While the average loudness might be perfect, level differences can lead to an unpleasant listening experience.

Of course, this doesn't apply to movies alone. The image above shows a podcast or radio production. The loud section is music, the very quiet section just breathing, and the remaining sections are different voices.

To be clear, we're not saying that the above example is problematic per se. There are many situations where a big difference in levels - a high dynamic range - is justified: for instance, in a movie theater, optimized for listening and without any outside noise, or in classical music.
Also, if the dynamic range is too small, listening can be tiring.

But if you watch the same movie in an outdoor screening in the summer on a beach next to the crashing waves or in the middle of a noisy city, it can be tricky to hear the softer parts.
Spoken word usually has a smaller dynamic range, and if you produce your podcast for a target audience of train or car commuters, the dynamic range should be even smaller, adjusting for the listening situation.

Therefore, hitting the loudness target has less impact on the listening experience than level differences (dynamic range) within one file!
What makes a suitable dynamic range does not only depend on the listening environment, but also on the nature of the content itself. If the dynamic range is too small, the audio can be tiring to listen to, whereas more variability in levels can make a program more interesting, but might not work in all environments, such as a noisy car.

Dynamic range experiment in a car

Wolfgang Rein, audio technician at SWR, a public broadcaster in Germany, did an experiment to test how drivers react to programs with different dynamic ranges. They monitored to what level drivers set the car stereo depending on speed (thus noise level) and audio dynamic range.
While the results are preliminary, it seems like drivers set the volume as low as possible so that they can still understand the content, but don't get distracted by loud sounds.

As drivers adjust the volume to the loudest voice in a program, they won't understand quieter speakers in content with a high dynamic range anymore. To some degree and for short periods of time, they can compensate by focusing more on the radio program, but over time that's tiring. Therefore, if the loudness varies too much, drivers tend to switch to another program rather than adjusting the volume.
Similar results have been found in a study conducted by NPR Labs and Towson University.

On the other hand, the perception was different in pure music programs. When drivers set the volume according to louder parts, they weren't able to hear softer segments or the beginning of a song very well. But that did not matter to them as much and didn't make them want to turn up the volume or switch the program.

Listener's reaction in response to frequent loudness changes. (from John Kean, Eli Johnson, Dr. Ellyn Sheffield: Study of Audio Loudness Range for Consumers in Various Listening Modes and Ambient Noise Levels)

Loudness comfort zone

The reaction of drivers to variable loudness hints at something that BBC sound engineer Mike Thornton calls the loudness comfort zone.

Tests (...) have shown that if the short-term loudness stays within the "comfort zone" then the consumer doesn’t feel the need to reach for the remote control to adjust the volume.
In a blog post, he highlights how the series Blue Planet 2 and Planet Earth 2 might not always have been the easiest to listen to. The graph below shows an excerpt with very loud music, followed by commentary just at the bottom of the green comfort zone. Thornton writes: "with the volume set at a level that was comfortable when the music was playing we couldn’t always hear the excellent commentary from Sir David Attenborough and had to resort to turning on the subtitles to be sure we knew what Sir David was saying!"

Planet Earth 2 Loudness Plot Excerpt. Colored green: comfort zone of +3 to -5LU around the loudness target. (from Mike Thornton: BBC Blue Planet 2 Latest Show In Firing Line For Sound Issues - Are They Right?)

As already mentioned above, a good mix considers the maximum and minimum possible loudness in the target listening environment.
In a movie theater the loudness comfort zone is big (loudness can vary a lot), and loud music is part of the fun, while quiet scenes work just as well. The opposite was true in the aforementioned experiment with drivers, where the loudness comfort zone is much smaller and quiet voices are difficult to understand.

Hence, the loudness comfort zone determines how much dynamic range an audio signal can use in a specific listening environment.

How to measure dynamic range: LRA

When producing audio for various environments, it would be great to have a target value for dynamic range (the difference between the smallest and largest signal values of an audio signal) as well. Then you could just set a dynamic range target, similar to a loudness target.

Theoretically, the maximum possible dynamic range of a production is defined by the bit-depth of the audio format. A 16-bit recording can have a dynamic range of 96 dB; for 24-bit, it's 144 dB - which is well above the approx. 120 dB the human ear can handle. However, most of those bits are typically being used to get to a reasonable base volume. Picture a glass of water: you want it to be almost full, with some headroom so that it doesn't spill when there's a sudden movement, i.e. a bigger amplitude wave at the top.
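The 96 dB and 144 dB figures follow from the usual rule of thumb that every bit of resolution adds roughly 6 dB of dynamic range:

dynamic range ≈ 20 · log10(2^n) ≈ 6.02 · n dB
16 bit: 6.02 · 16 ≈ 96 dB
24 bit: 6.02 · 24 ≈ 144 dB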

Determining the dynamic range of a production is easier said than done, though. It depends on which signals are included in the measurement: for example, whether something like background music or breathing should be considered at all.
The currently preferred method for broadcasting is called Loudness Range, LRA. It is measured in Loudness Units (LU), and takes into account everything between the 10th and the 95th percentile of a loudness distribution, after an additional gating method. In other words, the loudest 5% and quietest 10% of the audio signal are ignored. This way, quiet breathing or an occasional loud sound effect won't affect the measurement.
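To make the statistical nature of the descriptor concrete, here is a minimal, deliberately simplified sketch of an LRA-style measurement in Swift. The real EBU Tech 3342 algorithm defines the gating and percentile computation more precisely, and the short-term loudness values (in LUFS) are assumed to be precomputed:

import Foundation

// Simplified LRA-style measurement: gate out silence and very quiet parts,
// then take the difference between the 95th and 10th percentile.
func loudnessRange(shortTermLoudness: [Double]) -> Double {
    // 1. Absolute gate: ignore everything below -70 LUFS (silence).
    let absGated = shortTermLoudness.filter { $0 > -70.0 }
    guard !absGated.isEmpty else { return 0.0 }

    // 2. Relative gate: ignore everything more than 20 LU below the mean of the gated values.
    let mean = absGated.reduce(0, +) / Double(absGated.count)
    let gated = absGated.filter { $0 > mean - 20.0 }.sorted()
    guard gated.count > 1 else { return 0.0 }

    // 3. LRA = 95th percentile minus 10th percentile of the remaining distribution.
    func percentile(_ p: Double) -> Double {
        let idx = Int((Double(gated.count - 1) * p).rounded())
        return gated[idx]
    }
    return percentile(0.95) - percentile(0.10)
}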

Loudness distribution and LRA for the film 'The Matrix'. Figure from EBU Tech Doc 3343 (p.13).

However, the main difficulty is which signals should be included in the loudness range measurement and which ones should be gated. This is unfortunately often very subjective and difficult to define with a purely statistical method like LRA.

Where LRA falls short

Therefore, only pure speech programs give reliable LRA values that are comparable!
For instance, a typical LRA for news programs is 3 LU; for talks and discussions 5 LU is common. LRA values for features, radio dramas, movies or music very much depend on the individual character and might be in the range between 5 and 25 LU.

To further illustrate this, here are some typical LRA values, according to a paper by Thomas Lund (table 2):

Program                                             Loudness Range (LU)
Matrix, full movie                                  25.0
NBC Interstitials, Jan. 2008, all together (3:30)    9.4
Friends Episode 16                                   6.6
Speak Ref., Male, German, SQUAM Trk 54               6.2
Speak Ref., Female, French, SQUAM Trk 51             4.8
Speak Ref., Male, English, Sound Check               3.3
Wish You Were Here, Pink Floyd                      22.1
Gilgamesh, Battle of Titans, Osaka Symph.           19.7
Don’t Cry For Me Arg., Sinead O’Conner              13.7
Beethoven Son in F, Op17, Kliegel & Tichman         12.0
Rock’n Roll Train, AC/DC                             6.0
I.G.Y., Donald Fagen                                 3.6

LRA values of music are very unpredictable as well.
For instance, Tom Frampton measured the LRA of songs in multiple genres, and the differences within each genre are quite big. The ten pop songs that he analyzed varied in LRA between 3.7 and 12 LU, country songs between 3.6 and 14.9 LU. In the Electronic genre the individual LRAs were between 3.7 and 15.2 LU. Please see the tables at the bottom of his blog post for more details.

We at Auphonic also tried to base our Adaptive Leveler parameters on the LRA descriptor. Although it worked, it turned out that it is very difficult to set a loudness range target for diverse audio content, which does include speech, background sounds, music parts, etc. The results were not predictable and it was hard to find good target values. Therefore we developed our own algorithm to measure the dynamic range of audio signals.

In conclusion, LRA comparisons are only useful for productions with spoken word only and the LRA value is therefore not applicable as a general dynamic range target value. The more complex a production gets, the more difficult it is to make any judgment based on the LRA.
This is because the definition of LRA is purely statistical. There's no smart measurement using classifiers that distinguish between music, speech, quiet breathing, background noises and other types of audio. One would need a more intelligent algorithm (as we use in our Adaptive Leveler) that knows which audio segments should be included in and excluded from the measurement.

From theory to application: tools

Loudness and dynamic range are clearly complicated topics. Luckily, there are tools that can help. To keep short-term loudness in range, a compressor can help control sudden changes in loudness - such as p-pops or consonants like t or k. To achieve a good mid-term loudness, i.e. a signal that doesn't go outside the comfort zone too much, a leveler is a good option. Or, just use a fader or manually adjust volume curves. And to make sure that separate productions sound consistent, loudness normalization is the way to go. We have covered all of this in-depth before.

Looking at the audio from above again, with an adaptive leveler applied it looks like this:

Leveler example. Output at the top, input with leveler envelope at the bottom.

Now, the voices are evened out and the music is at a comfortable level, while the breathing has not been touched at all.
We recently extended Auphonic's adaptive leveler, so that it is now possible to customize the dynamic range - please see adaptive leveler customization and advanced multitrack audio algorithms.
If you wanted to increase the loudness comfort zone (or dynamic range) of the standard preset by 10 dB (or LU), for example, the envelope would look like this:

Leveler with higher dynamic range, only touching sections with extremely low or extremely high loudness to fit into a specific loudness comfort zone.

When a production is done, our adaptive leveler uses classifiers to also calculate the integrated loudness and loudness range of dialog and music sections separately. This way it is possible to just compare the dialog LRA and loudness of complex productions.

Assessing the LRA and loudness of dialog and music separately.

Conclusion

Getting audio dynamics right is not easy. Yet, it is an important thing to keep in mind, because focusing on loudness normalization alone is not enough. In fact, hitting the loudness target often has less impact on the listening experience than level differences, i.e. audio dynamics.

If the dynamic range is too small, the audio can be tiring to listen to, whereas a bigger dynamic range can make a program more interesting, but might not work in loud environments, such as a noisy train.
Therefore, a good mix adapts the audio dynamic range according to the target listening environment (different loudness comfort zones in cinema, at home, in a car) and according to the nature of the content (radio feature, movie, podcast, music, etc.).

Furthermore, because the definition of the loudness range / LRA is purely statistical, only speech programs give reliable LRA values that are comparable.
More "intelligent" algorithms are in development, which use classifiers to decide which signals should be included and excluded from the dynamic range measurement.

If you understand German, take a look at our presentation about audio dynamic processing in podcasts for further information:







n

Horizontal or/and Vertical Format in Kayak Photography

Like most paddlers, I have a tendency to shoot pictures in a horizontal (landscape) format. It is trickier to shoot in a vertical format from my tippy kayaks, especially when I have to use a paddle to stabilize my camera.




n

Winter Stand Up Paddling on Horsetooth Reservoir

I love paddling on the Horsetooth Reservoir in the cold season. Boat ramps are closed, there is no power boat traffic, and it is usually quiet and calm. Snow and ice can enhance the scenery. A great time to paddle, train, relax or photograph. The Horsetooth stays […]




n

How to Foster Real-Time Client Engagement During Moderated Research

When we conduct moderated research, like user interviews or usability tests, for our clients, we encourage them to observe as many sessions as possible. We find that when clients see us interview their users and get responses in real time, they learn about the needs of their users as the research happens and can be more active participants in the process. One way we help clients feel engaged with the process during remote sessions is to establish a real-time communication backchannel that empowers clients to flag responses they’d like to dig into further and to share their ideas for follow-up questions.

There are several benefits to establishing a communication backchannel for moderated sessions:

  • Everyone on the team, including both internal and client team members, can be actively involved throughout the data collection process rather than waiting to passively consume findings.
  • Team members can identify follow-up questions in real-time which allows the moderator to incorporate those questions during the current session, rather than just considering them for future sessions.
  • Subject matter experts can identify more detailed and specific follow-up questions that the moderator may not think to ask.
  • Even though the whole team is engaged, a single moderator still maintains control over the conversation which creates a consistent experience for the participant.

If you’re interested in creating your own backchannel, here are some tips to make the process work smoothly:

  • Use the chat tool that is already being used on the project. In most cases, we use a joint Slack workspace for the session backchannel but we’ve also used Microsoft Teams.
  • Create a dedicated channel like #moderated-sessions. Conversation in this channel should be limited to backchannel discussions during sessions. This keeps the communication consolidated and makes it easier for the moderator to stay focused during the session.
  • Keep communication limited. Channel participants should ask basic questions that are easy to consume quickly. Supplemental commentary and analysis should not take place in the dedicated channel.
  • Use emoji responses. The moderator can add a quick thumbs up to indicate that they’ve seen a question.

Introducing backchannels for communication during remote moderated sessions has been a beneficial change to our research process. It not only provides an easy way for clients to stay engaged during the data collection process but also increases the moderator’s ability to focus on the most important topics and to ask the most useful follow-up questions.




n

TTT in SPAAACE

By now, you’ve probably heard of TTT, our quarterly team events. If you haven’t, you should read all about their history. TTT, or Third Third Thursday, is a time for us to look back and look ahead. Twice a year, all four offices come together for an all-hands, conference-style experience. The other two TTTs are celebrated locally and casually. Each office meets for a round-table discussion followed by a fun activity out of the office.

In these meetings, we discuss team and industry changes and review business health metrics. Additionally, at each TTT, both our President, Andy Rankin, and our CEO, Brian Williams, directly field questions from any member of our team. At our TTTs we’ve talked about team diversity and tech ethics, celebrated our victories, and worked through our failures. The conversations have sparked new understanding, new initiatives, new processes, and have truly shaped the company over time. We come together in the spirit of “progress, not perfection.”

While each office is unique, and the conversation is tailored to and shaped by each audience, the People Team finds ways to make everyone’s TTT similar, particularly our afternoon activity, so we can bond over shared experiences, even miles apart. This summer, we all tried our hands at ax throwing, and just a few weeks ago each of our offices got to venture into Space.

Well, sort of.

After a morning meeting, Boulder visited the Fiske Planetarium at CU Boulder.

Durham visited UNC’s Morehead Planetarium.

And since the Smithsonian is refurbishing the Einstein Planetarium, our Falls Church office made our way to the Udvar-Hazy Center to catch an IMAX show and fly a few jets, via simulator.

Each office also got a taste of space food by trying Astronaut ice cream, to mixed reviews.

TTTs are more than fun snacks and field trips. They are about finding common ground with colleagues, challenging each other to grow, and re-connecting with folks you don’t work with day-to-day. They are about setting aside time for frank discussion across disciplines and experience levels, and getting outside the office for new perspectives. They are just a little part of what makes Viget so unique.

Are you ready to join us for our next big TTT adventure? It’s Viget20, and it’s going to be a good one. We're hiring.



  • News & Culture

n

Concurrency & Multithreading in iOS

Concurrency is the notion of multiple things happening at the same time. This is generally achieved either via time-slicing, or truly in parallel if multiple CPU cores are available to the host operating system. We've all experienced a lack of concurrency, most likely in the form of an app freezing up when running a heavy task. UI freezes don't necessarily occur due to the absence of concurrency — they could just be symptoms of buggy software — but software that doesn't take advantage of all the computational power at its disposal is going to create these freezes whenever it needs to do something resource-intensive. If you've profiled an app hanging in this way, you'll probably see a report that looks like this:

Anything related to file I/O, data processing, or networking usually warrants a background task (unless you have a very compelling excuse to halt the entire program). There aren't many reasons that these tasks should block your user from interacting with the rest of your application. Consider how much better the user experience of your app could be if instead, the profiler reported something like this:

Analyzing an image, processing a document or a piece of audio, or writing a sizeable chunk of data to disk are examples of tasks that could benefit greatly from being delegated to background threads. Let's dig into how we can enforce such behavior into our iOS applications.


A Brief History

In the olden days, the maximum amount of work per CPU cycle that a computer could perform was determined by the clock speed. As processor designs became more compact, heat and physical constraints started becoming limiting factors for higher clock speeds. Consequently, chip manufacturers started adding additional processor cores on each chip in order to increase total performance. By increasing the number of cores, a single chip could execute more CPU instructions per cycle without increasing its speed, size, or thermal output. There's just one problem...

How can we take advantage of these extra cores? Multithreading.

Multithreading is a mechanism provided by the host operating system that allows the creation and use of n threads. Its main purpose is to provide simultaneous execution of two or more parts of a program to utilize all available CPU time. Multithreading is a powerful technique to have in a programmer's toolbelt, but it comes with its own set of responsibilities. A common misconception is that multithreading requires a multi-core processor, but this isn't the case - single-core CPUs are perfectly capable of working on many threads; we'll take a look in a bit at why threading is a problem in the first place. Before we dive in, let's look at the nuances of what concurrency and parallelism mean using a simple diagram:

In the first situation presented above, we observe that tasks can run concurrently, but not in parallel. This is similar to having multiple conversations in a chatroom, and interleaving (context-switching) between them, but never truly conversing with two people at the same time. This is what we call concurrency. It is the illusion of multiple things happening at the same time when in reality, they're switching very quickly. Concurrency is about dealing with lots of things at the same time. Contrast this with the parallelism model, in which both tasks run simultaneously. Both execution models exhibit multithreading, which is the involvement of multiple threads working towards one common goal. Multithreading is a generalized technique for introducing a combination of concurrency and parallelism into your program.


The Burden of Threads

A modern multitasking operating system like iOS has hundreds of programs (or processes) running at any given moment. However, most of these programs are either system daemons or background processes that have a very low memory footprint, so what is really needed is a way for individual applications to make use of the extra cores available. An application (process) can have many threads (sub-processes) operating on shared memory. Our goal is to be able to control these threads and use them to our advantage.

Historically, introducing concurrency to an app has required the creation of one or more threads. Threads are low-level constructs that need to be managed manually. A quick skim through Apple's Threaded Programming Guide is all it takes to see how much complexity threaded code adds to a codebase. In addition to building an app, the developer has to:

  • Responsibly create new threads, adjusting that number dynamically as system conditions change
  • Manage them carefully, deallocating them from memory once they have finished executing
  • Leverage synchronization mechanisms like mutexes, locks, and semaphores to orchestrate resource access between threads, adding even more overhead to application code
  • Mitigate the risks of coding an application that itself assumes most of the costs associated with creating and maintaining the threads it uses, rather than leaving that to the host OS

This is unfortunate, as it adds enormous levels of complexity and risk without any guarantees of improved performance.


Grand Central Dispatch

iOS takes an asynchronous approach to solving the concurrency problem of managing threads. Asynchronous functions are common in most programming environments, and are often used to initiate tasks that might take a long time, like reading a file from the disk, or downloading a file from the web. When invoked, an asynchronous function executes some work behind the scenes to start a background task, but returns immediately, regardless of how long the original task might take to actually complete.

A core technology that iOS provides for starting tasks asynchronously is Grand Central Dispatch (or GCD for short). GCD abstracts away thread management code and moves it down to the system level, exposing a light API to define tasks and execute them on an appropriate dispatch queue. GCD takes care of all thread management and scheduling, providing a holistic approach to task management and execution, while also providing better efficiency than traditional threads.

Let's take a look at the main components of GCD:

What've we got here? Let's start from the left:

  • DispatchQueue.main: The main thread, or the UI thread, is backed by a single serial queue. All tasks are executed in succession, so it is guaranteed that the order of execution is preserved. It is crucial that you ensure all UI updates are designated to this queue, and that you never run any blocking tasks on it. We want to ensure that the app's run loop (called CFRunLoop) is never blocked in order to maintain the highest framerate. Additionally, the main queue has the highest priority, and any tasks pushed onto this queue will get executed immediately.
  • DispatchQueue.global: A set of global concurrent queues, each of which manages its own pool of threads. Depending on the priority of your task, you can specify which specific queue to execute your task on, although you should use the default priority most of the time. Because tasks on these queues are executed concurrently, the order in which tasks were queued is not guaranteed to be preserved.

Notice how we're not dealing with individual threads anymore? We're dealing with queues which manage a pool of threads internally, and you will shortly see why queues are a much more sustainable approach to multithreading.

Serial Queues: The Main Thread

As an exercise, let's look at a snippet of code below, which gets fired when the user presses a button in the app. The expensive compute function can be anything. Let's pretend it is post-processing an image stored on the device.

import UIKit

class ViewController: UIViewController {
    @IBAction func handleTap(_ sender: Any) {
        compute()
    }

    private func compute() -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

At first glance, this may look harmless, but if you run this inside of a real app, the UI will freeze completely until the loop is terminated, which will take... a while. We can prove it by profiling this task in Instruments. You can fire up the Time Profiler module of Instruments by going to Xcode > Open Developer Tool > Instruments in Xcode's menu options. Let's look at the Threads module of the profiler and see where the CPU usage is highest.

We can see that the Main Thread is clearly at 100% capacity for almost 5 seconds. That's a non-trivial amount of time to block the UI. Looking at the call tree below the chart, we can see that the Main Thread is at 99.9% capacity for 4.43 seconds! Given that a serial queue works in a FIFO manner, tasks will always complete in the order in which they were inserted. Clearly the compute() method is the culprit here. Can you imagine clicking a button just to have the UI freeze up on you for that long?

Background Threads

How can we make this better? DispatchQueue.global() to the rescue! This is where background threads come in. Referring to the GCD architecture diagram above, we can see that anything that is not the Main Thread is a background thread in iOS. They can run alongside the Main Thread, leaving it fully unoccupied and ready to handle other UI events like scrolling, responding to user events, animating etc. Let's make a small change to our button click handler above:

class ViewController: UIViewController {
    @IBAction func handleTap(_ sender: Any) {
        DispatchQueue.global(qos: .userInitiated).async { [unowned self] in
            self.compute()
        }
    }

    private func compute() -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

Unless specified, a snippet of code will usually default to execute on the Main Queue, so in order to force it to execute on a different thread, we'll wrap our compute call inside of an asynchronous closure that gets submitted to the DispatchQueue.global queue. Keep in mind that we aren't really managing threads here. We're submitting tasks (in the form of closures or blocks) to the desired queue with the assumption that it is guaranteed to execute at some point in time. The queue decides which thread to allocate the task to, and it does all the hard work of assessing system requirements and managing the actual threads. This is the magic of Grand Central Dispatch. As the old adage goes, you can't improve what you can't measure. So we measured our truly terrible button click handler, and now that we've improved it, we'll measure it once again to get some concrete data with regards to performance.

Looking at the profiler again, it's quite clear to us that this is a huge improvement. The task takes an identical amount of time, but this time, it's happening in the background without locking up the UI. Even though our app is doing the same amount of work, the perceived performance is much better because the user will be free to do other things while the app is processing.

You may have noticed that we accessed a global queue of .userInitiated priority. This is an attribute we can use to give our tasks a sense of urgency. If we run the same task on a global queue and pass it a qos attribute of .background, iOS will treat it as a utility task, and thus allocate fewer resources to execute it. So, while we don't have control over when our tasks get executed, we do have control over their priority.
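As a minimal, playground-style illustration of the difference (nothing here is specific to the app above; it just dispatches two trivial closures with different QoS classes):

import Foundation

// .userInitiated work is prioritized; .background work may be scheduled with fewer resources.
DispatchQueue.global(qos: .userInitiated).async {
    print("high-priority work")
}

DispatchQueue.global(qos: .background).async {
    print("low-priority, utility-style work")
}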

A Note on Main Thread vs. Main Queue

You might be wondering why the Profiler shows "Main Thread" and why we're referring to it as the "Main Queue". If you refer back to the GCD architecture we described above, the Main Queue is solely responsible for managing the Main Thread. The Dispatch Queues section in the Concurrency Programming Guide says that "the main dispatch queue is a globally available serial queue that executes tasks on the application’s main thread. Because it runs on your application’s main thread, the main queue is often used as a key synchronization point for an application."

The terms "execute on the Main Thread" and "execute on the Main Queue" can be used interchangeably.


Concurrent Queues

So far, our tasks have been executed exclusively in a serial manner. DispatchQueue.main is by default a serial queue, and DispatchQueue.global gives you four concurrent dispatch queues depending on the priority parameter you pass in.

Let's say we want to take five images, and have our app process them all in parallel on background threads. How would we go about doing that? We can spin up a custom concurrent queue with an identifier of our choosing, and allocate those tasks there. All that's required is the .concurrent attribute during the construction of the queue.

class ViewController: UIViewController {
    let queue = DispatchQueue(label: "com.app.concurrentQueue", attributes: .concurrent)
    let images: [UIImage] = [UIImage].init(repeating: UIImage(), count: 5)

    @IBAction func handleTap(_ sender: Any) {
        for img in images {
            queue.async { [unowned self] in
                self.compute(img)
            }
        }
    }

    private func compute(_ img: UIImage) -> Void {
        // Pretending to post-process a large image.
        var counter = 0
        for _ in 0..<9999999 {
            counter += 1
        }
    }
}

Running that through the profiler, we can see that the app is now spinning up 5 discrete threads to parallelize a for-loop.

Parallelization of N Tasks

So far, we've looked at pushing computationally expensive task(s) onto background threads without clogging up the UI thread. But what about executing parallel tasks with some restrictions? How can Spotify download multiple songs in parallel, while limiting the maximum number to 3? We can go about this in a few ways, but this is a good time to explore another important construct in multithreaded programming: semaphores.

Semaphores are signaling mechanisms. They are commonly used to control access to a shared resource. Imagine a scenario where a thread can lock access to a certain section of the code while it executes it, and unlock it after it's done to let other threads execute that section of the code. You would see this type of behavior in database writes and reads, for example. What if you want only one thread writing to a database and preventing any reads during that time? This is a common thread-safety concern known as the readers-writer problem. Semaphores can be used to control concurrency in our app by allowing us to lock n number of threads.

import UIKit

let kMaxConcurrent = 3 // Or 1 if you want strictly ordered downloads!
let semaphore = DispatchSemaphore(value: kMaxConcurrent)
let downloadQueue = DispatchQueue(label: "com.app.downloadQueue", attributes: .concurrent)

class ViewController: UIViewController {
    @IBOutlet weak var tableView: UITableView!

    @IBAction func handleTap(_ sender: Any) {
        for i in 0..<15 {
            downloadQueue.async { [unowned self] in
                // Lock shared resource access
                semaphore.wait()

                // Expensive task
                self.download(i + 1)

                // Update the UI on the main thread, always!
                DispatchQueue.main.async {
                    self.tableView.reloadData()

                    // Release the lock
                    semaphore.signal()
                }
            }
        }
    }

    func download(_ songId: Int) -> Void {
        var counter = 0

        // Simulate semi-random download times.
        for _ in 0..<Int.random(in: 999999...10000000) {
            counter += songId
        }
    }
}

Notice how we've effectively restricted our download system to limit itself to k number of downloads. The moment one download finishes (or thread is done executing), it decrements the semaphore, allowing the managing queue to spawn another thread and start downloading another song. You can apply a similar pattern to database transactions when dealing with concurrent reads and writes.

Semaphores usually aren't necessary for code like the one in our example, but they become more powerful when you need to enforce synchronous behavior while consuming an asynchronous API. The code above would work just as well with a custom NSOperationQueue with a maxConcurrentOperationCount, but it's a worthwhile tangent regardless.
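For comparison, a minimal sketch of that OperationQueue approach might look like this (reusing the simulated download work from above):

import Foundation

let downloadQueue = OperationQueue()
downloadQueue.maxConcurrentOperationCount = 3 // at most three downloads in flight

for songId in 1...15 {
    downloadQueue.addOperation {
        // Simulate the same semi-random download times as before.
        var counter = 0
        for _ in 0..<Int.random(in: 999_999...10_000_000) {
            counter += songId
        }
    }
}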


Finer Control with OperationQueue

GCD is great when you want to dispatch one-off tasks or closures into a queue in a 'set-it-and-forget-it' fashion, and it provides a very lightweight way of doing so. But what if we want to create a repeatable, structured, long-running task that produces associated state or data? And what if we want to model this chain of operations such that they can be cancelled, suspended and tracked, while still working with a closure-friendly API? Imagine an operation like downloading an image and then post-processing it, where each step should be observable and cancellable.

This would be quite cumbersome to achieve with GCD. We want a more modular way of defining a group of tasks while maintaining readability and also exposing a greater amount of control. In this case, we can use Operation objects and queue them onto an OperationQueue, which is a high-level wrapper around DispatchQueue. Let's look at some of the benefits of using these abstractions and what they offer in comparison to the lower-level GCD API:

  • You may want to create dependencies between tasks, and while you could do this via GCD, you're better off defining them concretely as Operation objects, or units of work, and pushing them onto your own queue. This would allow for maximum reusability since you may use the same pattern elsewhere in an application.
  • The Operation and OperationQueue classes have a number of properties that can be observed, using KVO (Key Value Observing). This is another important benefit if you want to monitor the state of an operation or operation queue.
  • Operations can be paused, resumed, and cancelled. Once you dispatch a task using Grand Central Dispatch, you no longer have control or insight into the execution of that task. The Operation API is more flexible in that respect, giving the developer control over the operation's life cycle (see the short sketch after this list).
  • OperationQueue allows you to specify the maximum number of queued operations that can run simultaneously, giving you a finer degree of control over the concurrency aspects.
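
To make those last points concrete, here's a small, self-contained sketch (names like exportOperation are illustrative, not from this article) of observing an operation via KVO and cancelling queued work:

import Foundation

let queue = OperationQueue()

let exportOperation = BlockOperation {
    // Pretend this is a long-running export.
    Thread.sleep(forTimeInterval: 2)
}

// Operation's isExecuting / isFinished / isCancelled properties are
// KVO-compliant, so we can watch state changes as they happen.
let observation = exportOperation.observe(\.isFinished, options: [.new]) { op, _ in
    print("Export finished? \(op.isFinished)")
}

queue.addOperation(exportOperation)

// Unlike a closure handed to GCD, a queued operation exposes a cancellation
// signal we can flip at any time (well-behaved operations check isCancelled).
queue.cancelAllOperations()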

The usage of Operation and OperationQueue could fill an entire blog post, but let's look at a quick example of what modeling dependencies looks like. (GCD can also create dependencies, but you're better off dividing up large tasks into a series of composable sub-tasks.) In order to create a chain of operations that depend on one another, we could do something like this:

class ViewController: UIViewController {
    let queue = OperationQueue()
    var rawImage: UIImage? = nil
    let imageUrl = URL(string: "https://example.com/portrait.jpg")!
    @IBOutlet weak var imageView: UIImageView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Downloader and ImgProcessor stand in for your own networking and
        // image-processing helpers.
        let downloadOperation = BlockOperation { [unowned self] in
            let image = Downloader.downloadImageWithURL(url: self.imageUrl)
            OperationQueue.main.addOperation {
                self.rawImage = image
            }
        }

        let filterOperation = BlockOperation { [unowned self] in
            let filteredImage = ImgProcessor.addGaussianBlur(self.rawImage)
            OperationQueue.main.addOperation {
                self.imageView.image = filteredImage
            }
        }

        filterOperation.addDependency(downloadOperation)

        [downloadOperation, filterOperation].forEach {
            queue.addOperation($0)
        }
    }
}

So why not opt for a higher level abstraction and avoid using GCD entirely? While GCD is ideal for inline asynchronous processing, Operation provides a more comprehensive, object-oriented model of computation for encapsulating all of the data around structured, repeatable tasks in an application. Developers should use the highest level of abstraction possible for any given problem, and for scheduling consistent, repeated work, that abstraction is Operation. Other times, it makes more sense to sprinkle in some GCD for one-off tasks or closures that we want to fire. We can mix both OperationQueue and GCD to get the best of both worlds.


The Cost of Concurrency

DispatchQueue and friends are meant to make it easier for the application developer to execute code concurrently. However, these technologies do not guarantee improvements to the efficiency or responsiveness in an application. It is up to you to use queues in a manner that is both effective and does not impose an undue burden on other resources. For example, it's totally viable to create 10,000 tasks and submit them to a queue, but doing so would allocate a nontrivial amount of memory and introduce a lot of overhead for the allocation and deallocation of operation blocks. This is the opposite of what you want! It's best to profile your app thoroughly to ensure that concurrency is enhancing your app's performance and not degrading it.

We've talked about how concurrency comes at a cost in terms of complexity and allocation of system resources, but introducing concurrency also brings a host of other risks like:

  • Deadlock: A situation where a thread locks a critical portion of the code and can halt the application's run loop entirely. In the context of GCD, you should be very careful with dispatchQueue.sync { } calls, as you can easily get into situations where two synchronous operations end up waiting on each other (a minimal example follows this list).
  • Priority Inversion: A condition where a lower priority task blocks a high priority task from executing, which effectively inverts their priorities. GCD allows for different levels of priority on its background queues, so this is quite easily a possibility.
  • Producer-Consumer Problem: A race condition where one thread is creating a data resource while another thread is accessing it. This is a synchronization problem, and can be solved using locks, semaphores, serial queues, or a barrier dispatch if you're using concurrent queues in GCD.
  • ...and many other sorts of locking and data-race conditions that are hard to debug! Thread safety is of the utmost concern when dealing with concurrency.
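
To make the deadlock point concrete, here's a minimal, intentionally broken sketch: dispatching synchronously onto the queue you're already running on.

import Foundation

DispatchQueue.main.async {
    // We're now running on the main queue...
    DispatchQueue.main.sync {
        // ...but this block can never start: sync blocks the main queue
        // while waiting for the main queue to become free. The app hangs.
        print("unreachable")
    }
}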

Parting Thoughts + Further Reading

If you've made it this far, I applaud you. Hopefully this article gives you a lay of the land when it comes to multithreading techniques on iOS, and how you can use some of them in your app. We didn't get to cover many of the lower-level constructs like locks, mutexes and how they help us achieve synchronization, nor did we get to dive into concrete examples of how concurrency can hurt your app. We'll save those for another day, but you can dig into some additional reading and videos if you're eager to dive deeper.




n

African American Women Leading in Tech

“Close your eyes and name three people who have impacted the tech industry.”

In all likelihood, that list might be overwhelmingly white and male.

And you are not alone. Numerous lists online yielded the same results. In recent years, many articles have chronicled the dearth of diversity in tech. Studies have shown the ways in which venture capital firms have systematically underestimated and undervalued innovation coming particularly from women of color. In 2016, only 88 tech startups were led by African American women; by 2018, that number had climbed to a little over 200. Between 2009 and 2017, African American women raised $289MM in venture/angel funding. For perspective, this only represents .0006% of the $424.7B in total tech venture funding raised in that same time frame. In 2018, only 34 African American women had ever raised more than a million in venture funding.

When it comes to innovation, it is not unusual for financial value to be the biggest predictor of what is considered innovative. In fact, a now largely controversial list posted by Forbes of America’s most innovative leaders in the fall of 2019 featured 99 men and one woman. Ironically, what was considered innovative was, in fact, very traditional in its presentation. The criteria used for the list was “media reputation for innovation,” social connections, a track record for value creation, and investor expectations for value creation.

The majority of African American women-led startups raise $42,000 from largely informal networks. Criteria weighted on the side of ‘track record for value creation’ and ‘investor expectations for value creation’ devalues the immense contributions of African American women leading the charge on thoughtful and necessary tech. Had Forbes used criteria for innovation that recognized emergent leadership, novel problem-solving, or original thinking outside the circles of already well-known and well-established entrepreneurs we might have learned something new. Instead, we're basically reminded that "it takes money to make money."

Meanwhile, African American women are the fastest-growing demographic of entrepreneurs in the United States. Their contributions to tech, amongst other fields, are cementing the importance of African American women in the innovation space. And they are doing this within and outside traditional tech frameworks. By becoming familiar with these entrepreneurs and their work, we can elevate their reputation and broaden our collective recognition of innovative leaders.

In honor of Black History Month, we have compiled a list of African American women founders leading the way in tech innovation, from Alabama to the Bay Area. From rethinking energy to debt forgiveness platforms, these women are crossing boundaries in every field.

Cultivating New Leaders

Photo of Kathryn Finney, courtesy of Forbes.com.

Kathryn Finney founder of Digitalundivided
Kathryn A. Finney is an American author, researcher, investor, entrepreneur, innovator and businesswoman. She is the founder and CEO of digitalundivided, a social enterprise that leads high potential Black and Latinx women founders through the startup pipeline from idea to exit.

Laura Weidman Powers co-founder of Code2040
Laura Weidman Powers is the co-founder and executive director of Code2040, a nonprofit that creates access, awareness, and opportunities for minority engineering talent to ensure their leadership in the innovation economy.

Angelica Ross founder of TransTech Social Enterprises
Angelica Ross is an American businesswoman, actress, and transgender rights advocate. After becoming a self-taught computer coder, she went on to become the founder and CEO of TransTech Social Enterprises, a firm that helps employ transgender people in the tech industry.

Christina Souffrant Ntim co-founder of Global Startup Ecosystem
Christina Souffrant Ntim is the co-founder of the award-winning digital accelerator platform Global Startup Ecosystem, which graduates 1,000+ companies across 90+ countries a year.

Media and Entertainment

Bryanda Law founder of Quirktastic
Bryanda Law is the founder of Quirktastic, a modern media-tech company on a mission to grow the largest and most authentically engaged community of fandom-loving people of color.

Morgan Debaun founder of Blavity Inc.
Morgan DeBaun is an African American entrepreneur. She is the Founder and CEO of Blavity Inc., a portfolio of brands and websites created by and for Black millennials.

Cheryl Contee co-founder of Do Big Things
Cheryl Contee is the award-winning CEO and co-founder of Do Big Things, a digital agency that creates new narratives and tech for a new era focused on causes and campaigns.

Photo of Farah Allen, courtesy of The Source Magazine.

Farah Allen founder of The Labz
Farah Allen is the CEO and founder of The Labz, a collaborative workspace that provides automated tracking, rights management, protection—using Blockchain technology—of your music files during and after you create them.

Health/Wellness

Marah Lidey co-founder of Shine
Marah Lidey is the co-founder & co-CEO of Shine. Shine aims to reinvent health and wellness for millennials through messaging technology.

Alicia Thomas co-founder of Dibs
Alicia Thomas is the founder and CEO of Dibs, a B2B digital platform that gives studios quick and easy access to real-time pricing for fitness classes.

Photo of Erica Plybeah, courtesy of BetterTennessee.com

Erica Plybeah Hemphill founder of MedHaul
Erica Plybeah Hemphill is the founder of MedHaul. MedHaul offers cloud-based solutions that ease the burdens of managing patient transportation.

Star Cunningham founder of 4D Healthware
Star Cunningham is the founder and CEO of 4D Healthware. 4D Healthware is patient engagement software that makes personalized medicine possible through connected data.

Kimberly Wilson founder of HUED
Kimberly Wilson is the founder of HUED. HUED is a healthcare technology startup that helps patients find and book appointments with Black and Latinx healthcare providers.

Financial

Viola Llewellyn co-founder of Ovamba Solutions
Viola Llewellyn is the co-founder and the president of Ovamba Solutions, a US-based fintech company that provides micro, small, and medium enterprises in Africa and the Middle East with microfinance through a mobile platform.

NanaEfua Baidoo Afoh-Manin, Briana DeCuir and Joanne Moreau founders of Shared Harvest Fund
NanaEfua, Briana and Joanne are the founders of Shared Harvest Fund. Shared Harvest Fund provides real opportunities for talented people to volunteer away their student loans.

Photo of Sheena Allen, courtesy of People of Color in Tech.

Sheena Allen founder of CapWay
Sheena Allen is best known as the founder and CEO of fintech company and mobile bank CapWay.

Education

Helen Adeosun co-founder of CareAcademy
Helen Adeosun is the co-founder, president and CEO of CareAcademy, a start-up dedicated to professionalizing caregiving through online classes. CareAcademy brings professional development to caregivers at all levels.

Alexandra Bernadotte founder of Beyond 12
Alex Bernadotte is the founder and chief executive officer of Beyond 12, a nonprofit that integrates personalized coaching with intelligent technology to increase the number of traditionally underserved students who earn a college degree.

Shani Dowell founder of Possip
Shani Dowell is the founder of Possip, a platform that simplifies feedback between parents, schools and districts. Learn more at possipit.com.

Kaya Thomas of We Read Too
Kaya Thomas is an American computer scientist, app developer and writer. She is the creator of We Read Too, an iOS app that helps readers discover books for and by people of color.

Kimberly Gray founder of Uvii
Kimberly Gray is the founder of Uvii. Uvii helps students to communicate and collaborate on mobile with video, audio, and text.

Nicole Neal co-founder of ProcureK12 by Noodle Markets
Nicole Neal is the co-founder and CEO of ProcureK12 by Noodle Markets. ProcureK12 makes purchasing for education simple. They combine a competitive school supply marketplace with quote request tools and bid management.

Beauty/Fashion/Consumer goods

Regina Gwynn co-founder of TresseNoire
Regina Gwynn is the co-founder & CEO of TresseNoire, the leading on-location beauty booking app designed for women of color in New York City and Philadelphia.

Camille Hearst co-founder of Kit.
Camille Hearst is the CEO and co-founder of Kit. Kit lets experts create shoppable collections of products so their followers can buy and the experts can make some revenue from what they share.

Photo of Esosa Ighodaro courtesy of Under30CEO.

Esosa Ighodaro co-founder of CoSign Inc.
Esosa Ighodaro is the co-founder of CoSign Inc., which was founded in 2013. CoSign is a mobile application that transfers social media content into commerce giving cash for endorsing and cosigning products and merchandise like clothing, home goods, technology and more.

Environment

Jessica Matthews founder of Uncharted Power
Jessica O. Matthews is a Nigerian-American inventor, CEO and venture capitalist. She is the co-founder of Uncharted Power, which made Soccket, a soccer ball that can be used as a power generator.

Etosha Cave co-founder of Opus 12
Etosha R. Cave is an American mechanical engineer based in Berkeley, California. She is the Co-Founder and Chief Scientific Officer of Opus 12, a startup that recycles carbon dioxide.

Kellee James founder of Mercaris, Inc.
Kellee James is the founder and CEO of Mercaris, Inc., a growing, minority-led start-up that makes efficient trading of organic and non-GMO commodities possible via market data service exchanges and trading platforms.

Workplace

Photo of Lisa Skeete Tatum courtesy of The Philadelphia Citizen.

Lisa Skeete Tatum founder of Landit
Lisa Skeete Tatum is the founder and CEO of Landit, a technology platform created to increase the success and engagement of women in the workplace, and to enable companies to attract, develop, and retain high-potential, diverse talent.

Netta Jenkins and Jacinta Mathis founders of Dipper
Netta Jenkins and Jacinta Mathis are founders of Dipper, a platform that acts as a safe digital space for individuals of color in the workplace.

Sherisse Hawkins founder of Pagedip
Sherisse Hawkins is the visionary and founder of Pagedip. Pagedip is a cloud-based software solution that allows you to bring depth to digital documents, enabling people to read (text), watch (video) and do (interact) all in the same place without ever having to leave the page.

Thkisha DeDe Sanogo founder of MyTAASK
Thkisha DeDe Sanogo is the founder of MyTAASK. MyTAASK is a personal planning platform dedicated to getting stuff done in real-time.

Home

Photo of Jean Brownhill, courtesy of Quartz at Work.

Jean Brownhill founder of Sweeten 
Jean Brownhill is the founder and CEO of Sweeten, an award-winning service that helps homeowners and business owners find and manage the best vetted general contractors for major renovation projects.

Reham Fagiri co-founder of AptDeco
Reham Fagiri is the co-founder of AptDeco. AptDeco is an online marketplace for buying and selling quality preowned furniture with pick up and delivery built into the service.

Stephanie Cummings founder of Please Assist Me
Stephanie Cummings is the founder and CEO of Please Assist Me. Please Assist Me is an apartment task service in Nashville, TN. The organization empowers working professionals by allowing them to outsource their weekly chores to their own personal team.

Law

Kristina Jones co-founder of Court Buddy
Kristina Jones is the co-founder of Court Buddy, a service that matches clients with lawyers.

Sonja Ebron and Debra Slone founders of Courtroom5
Sonja Ebron and Debra Slone are the founders of Courtroom5. Courtroom5 helps you represent yourself in court with tools, training, and community designed for pro se litigants.

Crowdfunding

Zuley Clarke founder of Business Gift Registry
Zuley Clarke is the founder of Business Gift Registry, a crowdfunding platform that lets friends and family support an entrepreneur through gift-giving just like they would support a couple for a wedding.



  • News & Culture

n

Markdown Comes Alive! Part 1, Basic Editor

In my last post, I covered what LiveView is at a high level. In this series, we’re going to dive deeper and implement a LiveView powered Markdown editor called Frampton. This series assumes you have some familiarity with Phoenix and Elixir, including having them set up locally. Check out Elizabeth’s three-part series on getting started with Phoenix for a refresher.

This series has a companion repository published on GitHub. Get started by cloning it down and switching to the starter branch. You can see the completed application on master. Our goal today is to make a Markdown editor, which allows a user to enter Markdown text on a page and see it rendered as HTML next to it in real-time. We’ll make use of LiveView for the interaction and the Earmark package for rendering Markdown. The starter branch provides some styles and installs LiveView.

Rendering Markdown

Let’s set aside the LiveView portion and start with our data structures and the functions that operate on them. To begin, a Post will have a body, which holds the rendered HTML string, and a title. A string of Markdown can be turned into HTML by calling Post.render(post, markdown). I think that just about covers it!

First, let’s define our struct in lib/frampton/post.ex:

defmodule Frampton.Post do
  defstruct body: "", title: ""

  def render(%__MODULE__{} = post, markdown) do
    # Fill me in!
  end
end

Now the failing test (in test/frampton/post_test.exs):

describe "render/2" do
  test "returns our post with the body set" do
    markdown = "# Hello world!"                                                                                                                 
    assert Post.render(%Post{}, markdown) == {:ok, %Post{body: "<h1>Hello World</h1>
"}}
  end
end

Our render method will just be a wrapper around Earmark.as_html!/2 that puts the result into the body of the post. Add {:earmark, "~> 1.4.3"} to your deps in mix.exs, run mix deps.get, and fill out the render function:

def render(%__MODULE__{} = post, markdown) do
  html = Earmark.as_html!(markdown)
  {:ok, Map.put(post, :body, html)}
end

Our test should now pass, and we can render posts! [Note: we’re using the as_html! method, which prints error messages instead of passing them back to the user. A smarter version of this would handle any errors and show them to the user. I leave that as an exercise for the reader…] Time to play around with this in an IEx prompt (run iex -S mix in your terminal):

iex(1)> alias Frampton.Post
Frampton.Post
iex(2)> post = %Post{}
%Frampton.Post{body: "", title: ""}
iex(3)> {:ok, updated_post} = Post.render(post, "# Hello world!")
{:ok, %Frampton.Post{body: "<h1>Hello world!</h1>
", title: ""}}
iex(4)> updated_post
%Frampton.Post{body: "<h1>Hello world!</h1>
", title: ""}

Great! That’s exactly what we’d expect. You can find the final code for this in the render_post branch.

LiveView Editor

Now for the fun part: Editing this live!

First, we’ll need a route for the editor to live at: /editor sounds good to me. LiveViews can be rendered from a controller, or directly in the router. We don’t have any initial state, so let's go straight from a router.

First, let's put up a minimal test. In test/frampton_web/live/editor_live_test.exs:

defmodule FramptonWeb.EditorLiveTest do
  use FramptonWeb.ConnCase
  import Phoenix.LiveViewTest

  test "the editor renders" do
    conn = get(build_conn(), "/editor")
    assert html_response(conn, 200) =~ ~s(data-test="editor")
  end
end

This test doesn’t do much yet, but notice that it isn’t live view specific. Our first render is just the same as any other controller test we’d write. The page’s content is there right from the beginning, without the need to parse JavaScript or make API calls back to the server. Nice.

To make that test pass, add a route to lib/frampton_web/router.ex. First, we import the LiveView code, then we render our Editor:

import Phoenix.LiveView.Router
# … Code skipped ...
# Inside of `scope "/"`:
live "/editor", EditorLive

Now place a minimal EditorLive module, in lib/frampton_web/live/editor_live.ex:

defmodule FramptonWeb.EditorLive do
  use Phoenix.LiveView

  def render(assigns) do
    ~L"""
      <div data-test="editor">
        <h1>Hello world!</h1>
      </div>
      """
  end

  def mount(_params, _session, socket) do
    {:ok, socket}
  end
end

And we have a passing test suite! The ~L sigil designates that LiveView should track changes to the content inside. We could keep all of our markup in this render/1 method, but let’s break it out into its own template for demonstration purposes.

Move the contents of render into lib/frampton_web/templates/editor/show.html.leex, and replace EditorLive.render/1 with this one liner: def render(assigns), do: FramptonWeb.EditorView.render("show.html", assigns). And finally, make an EditorView module in lib/frampton_web/views/editor_view.ex:

defmodule FramptonWeb.EditorView do
  use FramptonWeb, :view
  import Phoenix.LiveView
end

Our test should now be passing, and we’ve got a nicely separated out template, view and “live” server. We can keep markup in the template, helper functions in the view, and reactive code on the server. Now let’s move forward to actually render some posts!

Handling User Input

We’ve got four tasks to accomplish before we are done:

  1. Take markdown input from the textarea
  2. Send that input to the LiveServer
  3. Turn that raw markdown into HTML
  4. Return the rendered HTML to the page.

Event binding

To start with, we need to annotate our textarea with an event binding. This tells the liveview.js framework to forward DOM events to the server, using our liveview channel. Open up lib/frampton_web/templates/editor/show.html.leex and annotate our textarea:

<textarea phx-keyup="render_post"></textarea>

This names the event (render_post) and sends it on each keyup. Let’s crack open our web inspector and look at the web socket traffic. Using Chrome, open the developer tools, navigate to the network tab and click WS. In development you’ll see two socket connections: one is Phoenix LiveReload, which polls your filesystem and reloads pages appropriately. The second one is our LiveView connection. If you let it sit for a while, you’ll see that it's emitting a “heartbeat” call. If your server is running, you’ll see that it responds with an “ok” message. This lets LiveView clients know when they've lost connection to the server and respond appropriately.

Now, type some text and watch as it sends down each keystroke. However, you’ll also notice that the server responds with a “phx_error” message and wipes out our entered text. That's because our server doesn’t know how to handle the event yet and is throwing an error. Let's fix that next.

Event handling

We’ll catch the event in our EditorLive module. The LiveView behavior defines a handle_event/3 callback that we need to implement. Open up lib/frampton_web/live/editor_live.ex and key in a basic implementation that lets us catch events:

def handle_event("render_post", params, socket) do
  IO.inspect(params)

  {:noreply, socket}
end

The first argument is the name we gave to our event in the template, the second is the data from that event, and finally the socket we’re currently talking through. Give it a try, typing in a few characters. Look at your running server and you should see a stream of events that look something like this:
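
The inspected params should look roughly like the following (exact keys can vary between LiveView versions; the "value" key holding the textarea contents is the one we care about):

%{"value" => "# My Post"}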

There’s our keystrokes! Next, let’s pull out that value and use it to render HTML.

Rendering Markdown

Lets adjust our handle_event to pattern match out the value of the textarea:

def handle_event("render_post", %{"value" => raw}, socket) do

Now that we’ve got the raw markdown string, turning it into HTML is easy thanks to the work we did earlier in our Post module. Fill out the body of the function like this:

{:ok, post} = Post.render(%Post{}, raw)
IO.inspect(post)

If you type into the textarea you should see output that looks something like this:
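
With "# Hello world!" in the textarea, the inspected post should look much like the struct we built in IEx earlier:

%Frampton.Post{body: "<h1>Hello world!</h1>\n", title: ""}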

Perfect! Lastly, it’s time to send that rendered html back to the page.

Returning HTML to the page

In a LiveView template, we can identify bits of dynamic data that will change over time. When they change, LiveView will compare what has changed and send over a diff. In our case, the dynamic content is the post body.

Open up show.html.leex again and modify it like so:

<div class="rendered-output">
  <%= @post.body %>
</div>

Refresh the page and see:

Whoops!

The @post variable will only be available after we put it into the socket’s assigns. Let’s initialize it with a blank post. Open editor_live.ex and modify our mount/3 function:

def mount(_params, _session, socket) do
  post = %Post{}
  {:ok, assign(socket, post: post)}
end

In the future, we could retrieve this from some kind of storage, but for now, let's just create a new one each time the page refreshes. Finally, we need to update the Post struct with user input. Update our event handler like this:

def handle_event("render_post", %{"value" => raw}, %{assigns: %{post: post}} = socket) do
  {:ok, post} = Post.render(post, raw)
  {:noreply, assign(socket, post: post)}
end

Let's load up http://localhost:4000/editor and see it in action.

Nope, that's not quite right! Phoenix won’t render this as HTML because it’s unsafe user input. We can get around this (very good and useful) security feature by wrapping our content in a raw/1 call. We don’t have a database and user processes are isolated from each other by Elixir. The worst thing a malicious user could do would be crash their own session, which doesn’t bother me one bit.
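Concretely, that's a one-line change in show.html.leex (this is my reading of the raw/1 fix described above):

<div class="rendered-output">
  <%= raw @post.body %>
</div>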

Check the edit_posts branch for the final version.

Conclusion

That’s a good place to stop for today. We’ve accomplished a lot! We’ve got a dynamically rendering editor that takes user input, processes it and updates the page. And we haven’t written any JavaScript, which means we don’t have to maintain or update any JavaScript. Our server code is built on the rock-solid foundation of the BEAM virtual machine, giving us a great deal of confidence in its reliability and resilience.

In the next post, we’ll tackle making a shared editor, allowing multiple users to edit the same post. This project will highlight Elixir’s concurrency capabilities and demonstrate how LiveView builds on them to enable some incredible user experiences.



  • Code
  • Back-end Engineering

n

Committed to the wrong branch? -, @{upstream}, and @{-1} to the rescue

I get into this situation sometimes. Maybe you do too. I merge feature work into a branch used to collect features, and then continue development but on that branch instead of back on the feature branch

git checkout feature
# ... bunch of feature commits ...
git push
git checkout qa-environment
git merge --no-ff --no-edit feature
git push
# deploy qa-environment to the QA remote environment
# ... more feature commits ...
# oh. I'm not committing in the feature branch like I should be

and have to move those commits to the feature branch they belong in and take them out of the throwaway accumulator branch

git checkout feature
git cherry-pick origin/qa-environment..qa-environment
git push
git checkout qa-environment
git reset --hard origin/qa-environment
git merge --no-ff --no-edit feature
git checkout feature
# ready for more feature commits

Maybe you prefer

git branch -D qa-environment
git checkout qa-environment

over

git checkout qa-environment
git reset --hard origin/qa-environment

Either way, that works. But it'd be nicer if we didn't have to type or even remember the branches' names and the remote's name. They are what is keeping this from being a context-independent string of commands you run any time this mistake happens. That's what we're going to solve here.

Shorthands for longevity

I like to use all possible natively supported shorthands. There are two broad motivations for that.

  1. Fingers have a limited number of movements in them. Save as many as possible for late in life.
  2. Current research suggests that multitasking has detrimental effects on memory. Development tends to be very heavy on multitasking. Maybe relieving some of the pressure on quick-access short-term memory (like knowing all relevant branch names) adds up to a healthier memory down the line.

First up for our scenario: the - shorthand, which refers to the previously checked out branch. There are a few places we can't use it, but it helps a lot:

Bash
# USING -

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit -        # 🎉
git push
# hack hack hack
# whoops
git checkout -        # now on feature 🎉
git cherry-pick origin/qa-environment..qa-environment
git push
git checkout - # now on qa-environment 🎉
git reset --hard origin/qa-environment
git merge --no-ff --no-edit -        # 🎉
git checkout -                       # 🎉
# on feature and ready for more feature commits
Bash
# ORIGINAL

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit feature
git push
# hack hack hack
# whoops
git checkout feature
git cherry-pick origin/qa-environment..qa-environment
git push
git checkout qa-environment
git reset --hard origin/qa-environment
git merge --no-ff --no-edit feature
git checkout feature
# ready for more feature commits

We cannot use - when cherry-picking a range

> git cherry-pick origin/-..-
fatal: bad revision 'origin/-..-'

> git cherry-pick origin/qa-environment..-
fatal: bad revision 'origin/qa-environment..-'

and even if we could, we'd still have to provide the remote's name (here, origin).

That shorthand doesn't apply in the later reset --hard command, and we cannot use it in the branch -D && checkout approach either. branch -D does not support the - shorthand and once the branch is deleted checkout can't reach it with -:

# assuming that branch-a has an upstream origin/branch-a
> git checkout branch-a
> git checkout branch-b
> git checkout -
> git branch -D -
error: branch '-' not found.
> git branch -D branch-a
> git checkout -
error: pathspec '-' did not match any file(s) known to git

So we have to remember the remote's name (we know it's origin because we are devoting memory space to knowing that this isn't one of those times it's something else), the remote tracking branch's name, the local branch's name, and we're typing those all out. No good! Let's figure out some shorthands.

@{-<n>} is hard to say but easy to fall in love with

We can do a little better by using @{-<n>} (you'll also sometimes see it written in the older @{-N} form). It is a special construct for referring to the nth previously checked out ref.

> git checkout branch-a
> git checkout branch-b
> git rev-parse --abbrev-ref @{-1} # the name of the previously checked out branch
branch-a
> git checkout branch-c
> git rev-parse --abbrev-ref @{-2} # the name of the branch checked out before the previously checked out one
branch-a

Back in our scenario, we're on qa-environment, we switch to feature, and then want to refer to qa-environment. That's @{-1}! So instead of

git cherry-pick origin/qa-environment..qa-environment

We can do

git cherry-pick origin/qa-environment..@{-1}

Here's where we are (🎉 marks wins from -, 💥 marks the win from @{-1})

Bash
# USING - AND @{-1}

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit -                # 🎉
git push
# hack hack hack
# whoops
git checkout -                               # 🎉
git cherry-pick origin/qa-environment..@{-1} # 💥
git push
git checkout -                               # 🎉
git reset --hard origin/qa-environment
git merge --no-ff --no-edit -                # 🎉
git checkout -                               # 🎉
# ready for more feature commits
Bash
# ORIGINAL

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit feature
git push
# hack hack hack
# whoops
git checkout feature
git cherry-pick origin/qa-environment..qa-environment
git push
git checkout qa-environment
git reset --hard origin/qa-environment
git merge --no-ff --no-edit feature
git checkout feature
# ready for more feature commits

One down, two to go: we're still relying on memory for the remote's name and the remote branch's name and we're still typing both out in full. Can we replace those with generic shorthands?

Since @{-1} is the ref itself, not the ref's name, we can't do

> git cherry-pick origin/@{-1}..@{-1}
origin/@{-1}
fatal: ambiguous argument 'origin/@{-1}': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

because there is no branch origin/@{-1}. For the same reason, @{-1} does not give us a generalized shorthand for the scenario's later git reset --hard origin/qa-environment command.

But good news!

@{u} and @{push}

@{upstream}, or its shorthand @{u}, is the remote branch that would be pulled from if git pull were run. @{push} is the remote branch that would be pushed to if git push were run. So instead of

> git checkout branch-a
Switched to branch 'branch-a'
Your branch is ahead of 'origin/branch-a' by 3 commits.
  (use "git push" to publish your local commits)
> git reset --hard origin/branch-a
HEAD is now at <the SHA origin/branch-a is at>

we can

> git checkout branch-a
Switched to branch 'branch-a'
Your branch is ahead of 'origin/branch-a' by 3 commits.
  (use "git push" to publish your local commits)
> git reset --hard @{u}                                # <-- So Cool!
HEAD is now at <the SHA origin/branch-a is at>

Tacking either onto a branch name will give that branch's @{upstream} or @{push}. For example

git checkout branch-a@{u}

checks out the branch that branch-a pulls from.

In the common workflow where a branch pulls from and pushes to the same branch, @{upstream} and @{push} will be the same, leaving @{u} as preferable for its terseness. @{push} shines in triangular workflows where you pull from one remote and push to another (see the external links below).
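
As a quick illustration (the remote names origin and fork here are hypothetical), you can ask Git what each resolves to for a branch set up triangularly:

# In a triangular setup that pulls from origin but pushes to a fork:
> git rev-parse --abbrev-ref branch-a@{upstream}
origin/branch-a
> git rev-parse --abbrev-ref branch-a@{push}
fork/branch-a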

Going back to our scenario, it means short, portable commands with a minimum human memory footprint. (🎉 marks wins from -, 💥 marks the win from @{-1}, 😎 marks the wins from @{u}.)

Bash
# USING - AND @{-1} AND @{u}

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit -    # 🎉
git push
# hack hack hack
# whoops
git checkout -                   # 🎉
git cherry-pick @{-1}@{u}..@{-1} # 💥😎
git push
git checkout -                   # 🎉
git reset --hard @{u}            # 😎
git merge --no-ff --no-edit -    # 🎉
git checkout -                   # 🎉
# ready for more feature commits
Bash
# ORIGINAL

git checkout feature
# hack hack hack
git push
git checkout qa-environment
git merge --no-ff --no-edit feature
git push
# hack hack hack
# whoops
git checkout feature
git cherry-pick origin/qa-environment..qa-environment
git push
git checkout qa-environment
git reset --hard origin/qa-environment
git merge --no-ff --no-edit feature
git checkout feature
# ready for more feature commits

Make the things you repeat the easiest to do

Because these commands are generalized, we can run some series of them once, maybe

git checkout - && git reset --hard @{u} && git checkout -

or

git checkout - && git cherry-pick @{-1}@{u}..@{-1} && git checkout - && git reset --hard @{u} && git checkout -

and then those will be in the shell history just waiting to be retrieved and run again the next time, whether with Ctrl-R incremental search or history substring searching bound to the up arrow or however your interactive shell is configured. Or make it an alias, or even better an abbreviation if your interactive shell supports them. Save the body wear and tear, give memory a break, and level up in Git.
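
For instance, a git alias could bottle up the whole recovery sequence (the name unfumble is just a placeholder; adjust to taste):

git config --global alias.unfumble '!git checkout - && git cherry-pick @{-1}@{u}..@{-1} && git checkout - && git reset --hard @{u} && git checkout -'

Run git unfumble from the accumulator branch right after the whoops moment and it walks through the same steps as above.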

And keep going

The GitHub blog has a good primer on triangular workflows and how they can polish your process of contributing to external projects.

The FreeBSD Wiki has a more in-depth article on triangular workflow process (though it doesn't know about @{push} and @{upstream}).

The construct @{-<n>} and the suffixes @{push} and @{upstream} are all part of the gitrevisions spec. Direct links to each:



    • Code
    • Front-end Engineering
    • Back-end Engineering